# Spatio-temporal trends of diarrheal mortality of children in association with hydrographic regions of Brazil.

Introduction

The importance of an improved understanding of environmental factors associated with enteric diseases such as diarrhea has been widely recognized and has particular relevance in spatial epidemiology (Torok et al. 1997; Kelly-Hope et al. 2007). There is strong demand to investigate dynamics of exposure to and transmission of water-borne pathogens, and to establish improved surveillance systems that can lead to effective preventive measures of such diseases (Eyles et al. 2002; Rushton 2003; Pande et al. 2008; Chaikaew et al. 2009). Such research is of particular priority in less developed regions where the burden from enteric diseases is still very high (Fewtrell et al. 2005).

An important factor in the analysis of environmental risk factors is the geographic extent and framework of the unit of analysis. Recent efforts analyzed spatio-temporal patterns of diarrhea by exploring the influence of regional dynamics of risk factors such as climate and socio-economic conditions (Kelly-Hope et al. 2008; Jepsen et al. 2004). Often the unit of analysis for such studies are administrative reporting units (states or larger) used in disease reporting, resulting in highly aggregated outcomes with limited representation of the underlying environmental phenomenon that might be more realistically reflected by analytical units defined by natural barriers, ecological systems, and other important factors in pathogen occurrence, exposure, and transmission (Curriero et al. 2001). Such disconnect can result in a higher likelihood of error due to ecological fallacy, reduce heterogeneity of observed data, and thus reduce statistical power needed to discern association between the risk factors and incidence of disease (Peters et al. 2004; Wakefield and Shaddick 2005; Beale et al. 2008).

In this article we demonstrate a geospatial framework for identifying spatio-temporal patterns of mortality peak timing from pediatric diarrhea based on relative location of the disease reporting unit in the hydrologic regime of major river basins in Brazil. Such an approach could be useful for a better understanding of the associations between waterborne diseases and environmental processes in general, thus allowing for derivation of predictive models for exposure and transmission within a natural, rather than geopolitical, spatial unit of analysis.

Data

The geographic extent of our study area is the geopolitical boundary of Brazil. We obtained spatially registered GIS data layers of watershed boundaries and stream hydrography for eight (8) Hydrographic Regions from Agencia Nacional de Aguas (ANA; http://www2.ana.gov.br/). We used 1 km spatial resolution topographic data from the Shuttle Radar Topography Mission (SRTM; http://www2.jpl. nasa.gov/srtm/) to carry out hydrologic modeling.

We obtained Brazilian mortality data classified as intestinal infectious disease (ICD-9 codes) and selected all deaths of children younger than 5 years of age for the period 1979-1989 from the Ministerio da Saude do Brasil (http://www2.datasus. gov.br/ DATASUS/index.php?area=02). We used this time period rather than more current data because mortality rates and thus geographic heterogeneity of number of deaths were higher, allowing a more robust data set for the purpose of our study. In the dataset, monthly mortality is reported by Municipality, which is a census reporting unit of the Brazilian government with an approximate median area of 420 Km2. The number of Municipalities in Brazil changed from 3991 in 1979 to 4491 in 1989. Because of large proportions of missing data and changes of political boundaries for Municipalities over our study period, we aggregated the number of deaths per month to Census Micro Regions (CMRs) each of which contain multiple Municipalities. The number of CMRs increased from 542 in 1979 to 558 in 1989 during the study period, with a median area of 5480 Km2 in 1989. We applied screening criteria for accepting CMR-level MPT data: the presence of only one maximum value, average maximum deaths greater than 3 or if less than 3, at least 5 years of recorded data, resulting in a final dataset of 507 CMRs for use in our study, or 91% of all CMRs that reported in 1989. Due to the criteria of at least 5 years of data, newly created CMR regions could be included only if they were established no later than 1985, and had annual deaths reported for each year or had higher numbers of deaths in less than 5 years. We obtained spatial layers of polygons from the Instituto Brasileiro de Geografia e Estatistica (IBGE; http://www.ibge.gov. br/) representing administrative boundaries of CMRs and Municipalities, as well as point data (latitude, longitude) representing Municipal seats locations (cities).

Methods

In a first step we calculated the mortality peak timing variable (MPT) for each Census Micro Region (CMR) based on aggregated monthly Municipal mortality data. MPT is defined as the average digital month on a continuous time scale (between 1.0 for May 1st and 12.99 for April 30th) when maximum mortality occurred in each CMR over the study period (1979-1989). Next, we tested the mortality peak time variable for global spatial autocorrelation and local spatial clustering in order to justify the use of geostatistical models. In a third step we assigned the CMR-based MPT to the point location of the Municipality with the highest number of mean annual deaths, termed Maximum Mortality Municipality (MMM), within each CMR. These point locations were used as input to two different geostatistical models in order to create validated continuous surfaces of mortality peak timing across each Hydrographic Region. Finally we tested for trends of mortality peak timing along the major stream channels of the river network within each Hydrographic Region. We validated our modeled trends by comparing them to inspected trends based on MPT values of discrete Municipality locations in close proximity to major streams.

Extracting Mortality Peak Timing

We extracted the number of deaths occurring in Municipalities for each CMR of Brazil (Figure 1) for each month of the years 1979-89. We normalized these numbers for each year to proportional values on a continuous scale in the range [0,1] such that the annual maximum number of deaths was equal to unity, and the proportional number in each month a value between 0 and 1. These proportional values represent measures of similarity between numbers of deaths in each month and the number of deaths in the peak month. We then averaged these proportional values for each month over the years of the study period (1979-89) resulting in twelve averaged proportional values for each CMR. The maximum of these twelve values represents the estimated CMR-level mortality peak month for the study period. This step compensates for variations in seasonality since the average peak month is based on the normalized occurrence of deaths in comparison to all other months instead of simply determining the month most frequently found as annual peak. In order to derive a digital scale of continuous peak time estimates we corrected the peak month ti to a peak time value to based on the magnitudes of the values in the prior and subsequent months using the following equation:

tc = (a * [t.sub.i+1] + [t.sub.i+1])/(1+a) (1)

with a = ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII])/([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]), [t.sub.i] = 1, ..., 12

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the maximum proportional value of the considered peak month [t.sub.i], [t.sub.i-1] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the prior month and its membership, and [t.sub.i.+1] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the subsequent month and its membership, respectively.

The peak month [t.sub.i] was corrected such that ([t.sub.c] - [t.sub.i-1])/ ([t.sub.i+1] - [t.sub.c]) equals the ratio ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII])/ ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). Hence the range of values for a is any positive real number (a [right arrow] [0, 9.6] for this study). The peak month value is thus corrected toward the greater neighbor temporally. The rationale for this step is that the real peak time value would be expected closer to e.g., the prior month [t.sub.i-1] if its membership value is very similar to the maximum [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and further away in time from the subsequent month [t.sub.i+1] if its" membership [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is very small. We converted the mortality peak time values to a time scale such that May 1st represents the starting point (1.0) and increases continuously over an annual cycle (e.g., mid November has the value 7.5).

[FIGURE 1 OMITTED]

Test for Spatial Autocorrelalion and Clustering

In order to test for global spatial autocorrelation in MPT values we calculated global Moran's I (Moran 1950). In addition we derived local indicators of spatial association (LISA), i.e., local Moran's I (Anselin 1995) to test for the presence and type of significant spatial clusters and outliers in MPT. We classified clusters that were statistically significant at the 95% confidence level as clustered high or clustered low values as well as outliers (low values surrounded by high values and vice versa) in order to identify regional differences in the spatial distribution of the data. Because of the proportion of excluded data at the scale of CMRs (51 out of 558), the spatial structure of the polygon features was incomplete which did not allow to use contiguity" for determining spatial weights in the LISA analysis. We resolved this issue by assigning row-normalized spatial weights based on polygon centroids and inverse distances to neighbors.

Geostatistical Modeling and Validation

We assigned CMR-derived MPT values to the location of the Municipality where the highest number of mean annual deaths were reported within each CMR during the study period, hereafter referred to as the Maximum Mortality Municipality (MMM) (Figure 1). This procedure allowed to focus on the most relevant locations within a CMR, and avoided the use of centroids of CMR polygons, which could result in obscuring underlying environmental risk factors in locations where mortality is prevalent. Based on these locations and values, we computed and validated continuous geostatistical surfaces of mortality peak timing (MPT) at a spatial resolution of 20 km (hereafter termed, MPT Model Raster Cells). We applied two different geostatistical models: (1) Ordinary Kriging, which applies a uniform variogram across our study area; purposely we made no assumptions regarding anisotropy or global trends in order to estimate MPT based on distance-related weighting for all localities without any barriers, and (2) Diffusion Interpolation (Perona and Malik 1990; Black et al., 1998), which allowed us to constrain interpolation within the natural boundary of each Hydrographic Region of Brazil. Diffusion Interpolation uses a kernel that is based on the heat equation and changes its shape according to the diffusion equation near the physical barrier. The outcome is similar to Kernel Interpolation in that it is designed to eliminate influence from data points outside the designated barrier in estimating a variable value within a designated geography. The main reason for using these two different models was to investigate whether different results can be found if trend analysis is carried out based on physically constrained vs. non-constrained interpolation surfaces and thus to study effects of natural boundaries forming the Hydrographic Regions. If there is no association between MPT and the location of the Maximum Mortality Municipality (MMM) within the hydrologic regime, an isotropic continuous spatial distribution of MPT would be expected. If such an association exists anisotropy driven by hydrography can be expected. A subregional effect of anisotropy could be expected if such an association exists on the scale of a Hydrographic Region.

We created our geostatistical surfaces using randomly drawn MPT values from MMM locations within 355 (70%) of the 507 CMRs in our study. We sequestered MPT values for the remaining 152 MMMs in order to validate the predicted MPTs at these locations. We calculated Spearman correlation coefficient between the predicted MPT Model Raster Cell values and observed MPT values for CMRs in the validation subset. In addition we computed the Mean Absolute Error (MAE) to test for systematic bias. We repeatedly run and validated the geostatistical models 100 times in order to calculate mean values of MPT for each MPT Model Raster Cell as well as ranges of correlation and accuracy measures.

Spatially Refined Trend Analysis: Hydrologic Regime

To test for trends in mortality peak timing (MPT) along the stream networks in each Hydrographic Region we first calculated flow accumulation based on elevation data (1Km spatial resolution). We selected the cell values representing locations along the major river channels in each major river basin with contributing watersheds greater than 10,000 [km.sup.2] in land area (Figure 2A) (hereafter termed Flow Accumulation Raster Cells, FARC). In order to uniquely encode local flow direction(s),we conducted a Focal Flow Analysis (FFA), which determines at each FARC the direction(s) of reception within a moving window and assigns an additive binary code between 0 and 255 depending on the location of neighbors flowing into the cell. Each combination of such neighbors results in a different value. Since this analysis was based on flow accumulation values (greater watershed area at a neighboring FARC indicates downstream direction) the resulting Focal Flow raster values (binary codes) were known indicators of downstream direction, and thus could be used with confidence in a comparative analysis to determine if directional trends in MPT were the same.

[FIGURE 2 OMITTED]

We estimated MPT values for each FARC based on the value in the corresponding MPT Model Raster Cell encompassing the FARC (Figure 2B and C) to create a spatial surface representing continuous values of MPT of a spatial resolution of 1km along major rivers within each Hydrographic Region (hereafter MPT Stream Raster Cells, MSRC). In order to ensure that the MSRCs have continuously changing MPT cell values the extracted MPT Model Raster Cell values were distance-averaged between the values of neighboring MPT Model raster cells.

In order to determine the trend in MPT for each MSRC, we employed a second Focal Flow Analysis along the major streams but using the extracted MPT values instead of Flow Accumulation. The resulting raster values (binary codes) were compared to the Focal Flow output raster values for flow accumulation and the difference between the two surfaces was calculated using simple map algebra. If the result was 0 i.e., the binary codes have equal values, the focal flow of MPT is in the same direction as the corresponding accumulated flow (i.e., hydrologically downstream). Otherwise the focal flow direction is undefined (equal values in neighboring cells or no reception) or upstream, and counted as "no agreement" in our analyses of results. Since sudden changes in FFA binary codes can occur at FARCs spatially coincident to confluence points of two streams (Figure 2D), the corresponding

MSRCs were not included in the trend analysis of MPT. We repeated the steps described above using MPT values for MSRCs based on both Ordinary Kriging and Diffusion Interpolation models in order to compare results between these two different approaches (Figure 2E and F).

For each of the 8 Hydrographic Regions of Brazil, we calculated the proportion of raster cells for which the difference between Focal Flow binary codes for FARCs versus MSRCs, was zero, thus indicating the proportion of major stream locations with a downstream trend in MPT. We also recorded the proportions of MSRCs with a downstream trend in MPT for each stream segment within the dendritic stream network occurring between the confluences of two or more streams, or between a confluence and the mouth of the river (or country border). For each Hydrographic Region we summed the lengths of the stream segments, for which we found more than 50% of the MSRCs with a downstream trend in MPT values. We calculated summary statistics from these analyses for each Hydrographic Region as well as the entire study area.

Validation of Trend Analysis: Hydrologic Regime

We assessed the trends in predicted MPT values along major rivers within each Hydrographic Region by comparing them to trends generated based on MPT values at Maximum Mortality Municipalities (MMMs) located in close proximity to the rivers. We used a minimum of 10 MMMs in each Hydrographic Region, starting with the MMM most upstream, and assessing trend in the downstream direction of the hydrologic regime until evaluating the most downstream MMM before the confluence of the major fiver with the sea or the political boundary of Brazil.

Results

Extracting Peak Time Values and Exploratory Spatial Statistics

We calculated average annual mortality peak timing values (MPT) of pediatric mortality attributed to diarrheal disease for 507 (91%) of 558 Census Micro Regions (CMR) in Brazil for the period 1979-1989. In Table 1, we present statistical distributions of MPT values within the Hydrographic Regions of Brazil, and the country overall by three different estimation methods for deriving MPT values: from observed Municipal level mortality data aggregated to Census Micro Regions, from geostatistical MPT Model surfaces generated by Ordinary Kriging and Diffusion Interpolation, and from spatially refined MPT values at 1 km raster cells representing locations of iterative flow accumulation represented by a contributing watershed area of 10,000 km2 and thus along major streams (MSRCs). The distributions of MPT values in Hydrographic Regions are similar between different geostatistical models but show differences to CMR-level MPT distributions in that the modeled values show a decrease in interquartile ranges and thus a lower variation. Across Hydrographic Regions there are differences in statistical distributions for all data sources that are not reflected by the global distribution parameters (Table 1).

Global Moran's I statistics (Table 1) indicated highly significant (p < 0.01) spatial autocorrelation in MPT for the whole country (I = 0.36; Z-score = 25.73) suggesting that geostatistical models are appropriate for estimation. We found highly significant spatial autocorrelation in all Hydrographic Regions except regions 7 and 8 in the south (Figure 3).

The results of the spatial cluster analysis (LISA) of CMR-level MPT values are presented in Figure 3. Based on overlaid boundaries of the Hydrographic Regions it can be seen that there are significant spatial clusters of low MPT values (low-low = early peaks spatially cluster with other early peaks) in the headwater areas of Hydrographic Regions (HR) 1, 2, 4 and 6, all of which are major river basins. Significant clusters of high MPT values (high-high = late peaks spatially cluster with other late peaks) that occurred in these river basins were primarily located in the most downstream locations of Hydrographic Regions 2 and 4. Other significant clusters of high MPT values occurred in regions 3 and 5, which are composed of multiple unique river basins with separate hydrologic regimes for surface flow. Statistically significant "outliers"--CMRs with a high value surrounded by CMRs with low values--are located in the upper watersheds of major river basins (HR 2 and 6), and in areas in the upper and lower elevations of HR 3. Other outliers--CMRs with a low value surrounded by CMRs with high values--can be seen in the lower basin of HR 4, the higher elevations of HR 5, and in both upper and lower elevations of HR 3. The patterns in Figure 3 illustrate a general trend from central-west to the northeast (early to late MPT) across Brazil, but trends in Hydrographic Regions do not necessarily follow this pattern, especially in HRs that are single major river basins (e.g. HR 2).

Geostatistical Modeling and Validation

The continuous surfaces derived from Ordinary Kriging and Diffusion Interpolation are shown in Figure 4A and B. It can be seen that the use of physical barriers in Diffusion Interpolation results in a distribution that is similar to the Kriging surfaces but shows differences in local patterns in close proximity to the boundaries of Hydrographic Regions.

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

The validation analysis of the geostatistical models resulted in highly significant correlations (p < 0.01) between estimated MPT values in the MPT Model Raster Cells and spatially coincident MPT values derived for Maximum Mortality Municipalities (MMMs) in 98% of 100 total model runs. The correlation coefficients ranged between 0.59 and 0.79 when we used Ordinary Kriging to estimate MPT, with a mean prediction error of 0.111 over all model iterations. The Mean Absolute Errors (MAE) in this comparison ranged between 0.43 and 0.69. For Diffusion Interpolation the correlation coefficients ranged between 0.56 and 0.77 with a mean prediction error over all 100 model iterations of 0.133; MAE values ranged between 0.47 and 0.71.

Spatially Refined Trend Analysis considering Hydrologic Regime

The spatially relined MPT values along the major streams (MPT Stream Raster Cells, MSRCs) extracted from the MPT Model surfaces using Ordinary Kriging and Diffusion Interpolation are illustrated in Figure 5A and B, respectively. Table 2 presents the results of the MPT trend analysis along major rivers for Brazil for each of the eight Hydrographic Regions. There are considerable differences between proportions of raster cells that are downstream across Hydrographic Regions for both Ordinary Kriging (between 0.28 and 0.67) and Diffusion Interpolation (between 0.15 and 0.81). The proportions of raster cells downstream are higher in the Hydrographic Regions of larger areal extent that consist of single-river basins (i.e., regions 1, 2 and 4; all values > 0.5) as compared to the southern regions (regions 7-8; all values < 0.3). Regions along the Eastern coastline that have multi-river basins (regions 3 and 5) show mixed results. Diffusion Interpolation resulted in higher proportions of downstream raster cells: 65% for whole Brazil (58% based on Ordinary Kriging) and up to 81% (region 4) for individual Hydrographic Regions.

[FIGURE 5 OMITTED]

Considering the total study area (i.e. political boundary of Brazil), the total length of river segments with more than 50% downstream trend represents 64% and 75% of the total length of stream channels for Ordinary Kriging and Diffusion Interpolation, respectively (Table 2). These percentages increase to up to 88% for Hydrographic Region 2 when we applied Diffusion Interpolation (Figure 5B and D). The maps in Figures 5C and D show the downstream proportions of stream segments on a continuous scale between 0 and 1 in order to provide a more differentiated presentation of the results as compared to a crisp threshold of 0.5. It can be seen that in regions with strong downstream trends (Table 2; region 1, 2 and 4) the majority of segments has indeed proportions of downstream MSRCs close to 1.0 when using Diffusion Interpolation (Figure 5D).

Validation: Spatially Refined Trend Analysis considering Hydrologic Regime

The final inspection of possible trends downstream based on discrete Maximum Mortality Municipalities (MMM) confirmed the results based on the geostatistical models. For Hydrographic Regions 1, 2 and 4 an increase of MPT between MMMs located close to the origin of the river and those close to the mouth of the river was found in most cases (70-80%, Table 2). In the southern part of the country (regions 6-8) but also in multi-river basin regions (regions 3 and 5) it was difficult to detect any trends due to their small areal extent (7 and 8), a limited number of points for validation (3, 5 and 6), or simply because there was no trend.

Discussion and Concluding Remarks

In this paper, we analyze trends in mean annual peak timing values of pediatric mortality (MPT) attributed to diarrheal disease in Brazil for the period 1979-1989 using a novel approach for environmental health risk studies, that is using natural boundaries instead of political boundaries to define the unit of analysis. We evaluate the approach at varying spatial scales: (1) Country-wide based on observed Municipal level mortality data aggregated to Census Micro Regions (CMR) with a median area of 5480 Km2; (2) Country-wide based on a grid of 20 Km2 raster cells generated by geostatistical models that use CMR-level MPT values as input; (3) Within eight officially designated Hydrographic Regions of Brazil (mean area of approximately 1,086,000 Km2) based on the geostatistical models, and (4) Along longitudinal "vectors" of 1Km2 raster cells defining the stream network (hydrologic regime) within each Hydrographic Region. MPT for the latter case was extracted from spatially coincident raster cells of the geostatistical MPT models.

At the country level, we found evidence of a trend west to east of increasing MPT over an annual cycle (May to April) using the CMR-level estimates, whether the unit of analysis was a political boundary (CMRs) or a natural boundary such as the Hydrographic Regions (Figure 1). The overall trend was verified by unconstrained geostatistical surfaces i.e., using Ordinary Kriging (Figure 4A). When we examined the model results at finer scales (higher resolution of the unit of analysis), we discovered greater geographic heterogeneity in MPT in the model surface as well as in the LISA map (Figure 3). Using Hydrographic Regions (HRs) as the unit of analysis we found CMR-level MPT ranged from approximately 6.3 to 10.0, compared to 9.2 to 9.5 when calculated at the country level (Table 1). This provided justification for generating constrained geostatistical surfaces e.g., Diffusion Interpolation, using Hydrographic Region boundaries as barriers (Figure 4). A compelling finding of our study was the fact that when we increased the spatial resolution to the level of the stream network within the Hydrographic Regions, we found dominant and consistent trends of increasing MPT from the source areas (upper watersheds) to downstream locations, particularly for single river basins (Figure 5). Thus existing trends were no longer predominantly east to west as on the country level, but oriented in the direction of flow of the major river draining the basin and some tributary streams (e.g., Amazon River basin, HR 1).

The observed wide range in MPT across different spatial scales could have important implications for assessment of known or suspected environmental risk factors that have significant temporal variation. These factors can include climatological and meteorological variables, water use activities, sanitation and drainage, among others. In a previous study, we identified such factors, and demonstrated the adverse affect of temporal aggregation of data in models of mortality risk and diarrheal disease in Mexico (Leyk et al. 2011). In this current study we provide a substantial argument for evaluating natural boundaries as the preferred analytical unit in order to reflect heterogeneity and spatial variation of such variables in the context of health risk studies (Wakefield and Shaddick 2005; Beale et al. 2008). For example, we found consistent trends in increasing MPT in stream segments from upper to lower reaches of some major river basins that were not detected on coarser scales. Thus, between the two studies, we have demonstrated that both temporal and spatial aggregation of risk factors and disease data can have profound effects on understanding the association between the environment and disease, and therefore on the design and implementation of effective intervention strategies for disease prevention. If such findings hold across other geographic regions, implications of such aggregation should be part of risk communication to policy makers (Leyk et al. 2011).

We purposely selected the study period 1979-1989. Likewise, we purposely used absolute number of mortalities as opposed to mortality rates in our study; even though the latter expression of prevalence is more typical in health risk analysis. The objective of our study is to identify spatio-temporal patterns of mortality peak timing from pediatric diarrhea based on relative location of the disease reporting unit. We used data from the selected time period rather than more current data because higher mortality rates and geographic heterogeneity of the number of deaths made the dataset more robust. Peak timing would be in the same month if one uses population-based rates or absolute values of the disease outcome in question. Thus, our findings would hold for a more current time period if underlying factors that caused the wide range in MPT at different spatial scales were still operative.

A limitation to our study was missing or sparsely distributed mortality data in large Municipalities and CMRs, particularly in the northwest part of the country (Figure 1) (HR1; The Amazon River Basin); it is well established that population density is low compared to other basins, and progressively so from the mid- to upper basin subwatersheds. There is also a documented issue of under-reported mortality in regions across Brazil (Castro et al. 2010). Neither of these issues appeared to have a major impact on our findings since the trends across different basins seem consistent, both in magnitude and range of MPT. Timing of peak mortality over an annual cycle compared favorably with other studies reporting temporal trends at more localized studies in Brazil (Guerrant et al. 1983; Schorling et al. 1990; Kale et al. 2004; Melli and Waldman 2009).

Another potential limitation to our study could be the interpolation of observed data to different spatial scales. Because of large proportions of missing data and changes of political boundaries for the base reporting unit used by the Brazilian government, Municipalities (median area 420 Km2), we aggregated observed mortality to 507 Census Micro Regions (median area 5480 Km2) using the available municipal level data within each CMR, and used these data to calculate a CMR-level average annual peak timing of mortality (MPT). We use these CMR-level estimates as inputs to our geostatistical models based on the location of the Municipality with the highest number of mean annual deaths over our study period. To evaluate the accuracy of this procedure, we sequestered calculated MPT at these locations in a randomly selected sample of 30% of the CMRs and used these to validate predicted MPT for the spatially coincident modeling unit. We found correlation coefficients ranged from 0.56 to 0.79 providing some confidence. However, the absence of spatial autocorrelation (Moran's I) in some Hydrographic Regions and calculated error surfaces based on the 100 model runs demonstrate that the geostatistical model is expected to perform weakly in some subregions (HR7 and 8). Likewise, we evaluated the direction of trend in estimated MPT in 1Km2 raster cells representing locations along the stream network in each Hydrographic Region by comparing these estimates to observed trends in municipalities in close proximity to the network. This evaluation confirmed the results based on the models in that trends are detectable in some Hydrographic Regions (HR 1, 2 and 4) but cannot be identified in others. However, our validation efforts and the interpretation of the trend analysis results across eight different Hydrologic Regions justifies the suitability of the presented methods for extracting, interpolating and spatially refining observed mortality data for the purpose of our study.

In summary, we have completed a study of average peak timing of mortality due to diarrheal disease in children less than 5 years of age using spatially registered mortality data from Brazil. Our study results indicate substantial spatial variation in peak timing, which could have important ramifications in studies concerning known or suspected risk factors with significant temporal variation over an annual cycle. We found the geographic orientation of trend in mortality peak timing to be highly dependent on the geographic extent and nature of the unit of analysis. We demonstrate that a unit based on natural boundaries, in our case stream segments and watershed boundaries within the Hydrogeographic Regions of Brazil, resulted in consistent prediction of trend in large single-river basins; with mortality peak timing occurring in earlier months in the upper watersheds of major river basins, and in later months in the lower basin, during an annual cycle defined from May to April. Our results are consistent with findings of other studies in expressing caution in the use of aggregated data with political boundaries as input to spatio-temporal models for health risk assessment. Further research is needed to determine if our methods and findings are transferable to other regions.

ACKNOWLEDGEMENTS

The Etiology, Risk Factors and Interactions of Enteric Infections and Malnutrition and the Consequences for Child Health and Development Study (MAL-ED) is carried out as a collaborative study supported by the Bill & Melinda Gates Foundation, the Foundation for the NIH and the National Institutes of Health/ Fogarty International Center. SL and JRN were also funded by interagency personnel agreements between the Fogarty International Center (NIH) and the University of Colorado at Boulder and Colorado State University, respectively. The authors thank four anonymous reviewers for their constructive critics.

REFERENCES

Anselin, L. 1995. Local Indicators of Spatial Association--LISA. Geographical Analysis 27: 93-115.

Beale, L., J.J. Abellan, S. Hodgson and L. Jarup. 2008. Methodologic Issues and Approaches to Spatial Epidemiology. Environmental Health Perspectives 116: 1105-1110.

Black, M. J., G. Sapiro, D.H. Marimont and D. Heeger. 1998. Robust anisotropic diffusion. IEEE Transactions on Image Processing, 7(3): 421-432.

Castro, M.C. and C.C. Simoes. 2010. SpatioTemporal Trends of Infant Mortality in Brazil. Proceedings of the Annual Meeting. Population Association of America (PAA) 2010.

Chaikaew, N., N.K. Tripathi and M. Souris. 2009. Exploring spatial patterns and hotspots of diarrhea in Chiang Mai, Thailand. International Journal of Health Geographics 8: 36.

Curriero, F.C., J.A. Patz, J.B. Rose and S. Lele. 2001. The Association Between Extreme Precipitation and Waterborne Disease Outbreaks in the United States, 1948-1994. American Journal of Public Health 91: 1194-1199.

Eyles, R., D. Niyogi, C. Townsend, G. Benwell and R Weinstein. 2002. Spatial and Temporal Patterns of

Campylobacter Contamination Underlying Public Health Risk in the Taieri River, New Zealand. Journal of Environmental Quality 32(5): 1820-1828.

Fewtrell, L., R.B. Kaufmann, D. Kay, W. Enanoria, L. Hailer and J.M. Colford. 2005. Water, sanitation and hygiene interventions to reduce diarrhea in less developed countries: a systematic view and metanalysis. Lancet Journal of Infectious Diseases 5: 42-52.

Guerrant, R.L., L.V. Kirchhoff, D.S. Shields, M.K. Nations, J. Leslie, M.A. de Sousa, J.G. Araujo, L.L Correia, K.T. Sauer, K.E. McClelland, F.L. Trowbridge and J.M. Hughes. 1983. Prospective Study of Diarrheal Illnesses in Northeastern Brazil: Patterns of Disease, Nutritional Impact, Etiologies, and Risk Factors. The Journal of Infectious Diseases 148(6): 986-997.

Jepsen, M.R., J. Simonsen and J.S. Ethelberg. 2004. Spatio-temporal cluster analysis of the incidence of Campylobacter cases and patients with general diarrhea in a Danish county, 1995-2004. International Journal of Health Geographics 8:11.

Kale P.L., V.L. Andreozzi and F.F. Nobre. 2004. Time Series Analysis of Deaths Due to Diarrhea in Children in Rio de Janeiro, Brazil, 1980-1998. Journal of Health, Population, and Nutrition, 22(1): 27-33.

Kelly-Hope, L.A., W.J. Alonso, V.D. Thiem, D.D. Anh, G. Canh do, H. Lee, D.L. Smith and M.A. Miller. 2007. Geographical distribution and risk factors associated with enteric diseases in Vietnam. The American Journal of Tropical Medicine and Hygiene 76(4): 706-12.

Kelly-Hope, L.A., W.A. Alonso, VD. Thiem, D.G. Canh, D.D. Anh, H. Lee and M.A. Miller. 2008. Temporal Trends and Climatic Factors Associated with Bacterial Enteric Diseases in Vietnam, 1991-2001. Environmental Health Perspectives 116:7-12.

Leyk S., B.J.J.. McCormick and J.R. Nuckols. 2011. Effects of varying temporal scale on spatial models of mortality patterns of pediatric diarrhea. Spatial and Spatio-temporal Epidemipology. doi: 10.1016/j. sste.2011.03.002

Melli, L. and E.A. Waldman. 2009. Temporal trends and inequality in under-5 mortality from diarrhea. Jornal de Pediatria 85(1): 21-27.

Moran, P. 1950. Notes on Continuous Stochastic Phenomena. Biometrika 37:17-33.

Pande, S., M.A. Keyzer, A. Arouna and B.G. Sonneveld. 2008. Addressing diarrhea prevalence in the West African Middle Belt: social and geographic dimensions in a case study for Benin. International Journal of Health Geographics 7: 17.

Perona, R and J. Malik. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7): 629-639.

Peters, D. P., R.A. Pielke, B.T. Bestelmeyer, C.D. Allen, S. Munson-McGee and K.M. Havstad. 2004. Cross-scale interactions, nonlinearities, and forecasting catastrophic events. Proceedings of the National Academy of Sciences of the United States of America 101: 15130-15135.

Rushton, G. 2003. Public Health, GIS, and Spatial Analytic Tools. Annual Review of Public Health 24: 43-56.

Schorling, J.B., C.A. Wanke, S.K. Schorling, J.E McAuliffe, M.A. De Souza and R.L. Guerrant. 1990. A prospective study of persistent diarrhea among children in an urban Brazilian slum. Patterns of occurrence and etiologic agents. American Journal of Epidemiology 132(1): 144-156.

Torok, T.J., P.E. Kilgore, M.J. Clarke, R.C. Holman, J.S. Bresee, R.I. Glass and the National Respiratory and Enteric Virus Surveillance System Collaborating Laboratories. 1997. Visualizing geographic and temporal trends in rotavirus activity in the United States, 1991 to 1996. National Respiratory and Enteric Virus Surveillance System Collaborating Laboratories. Pediatric Infectious Disease Journal 16(10): 941-946.

Wakefield, J. and G. Shaddick. 2005. Health-Exposure Modelling and the Ecological Fallacy. Biostatistics 1: 1-19.

Stefan Leyk and Jeremy Smith, University of Colorado-Boulder, Boulder, Colorado 80309, USA. Email: <stefan.leyk@colorado.edu>, <jmsmith@colorado.edu>. Thomas P. Phillips, Department of Aerospace Engineering, University of Colorado, Boulder, Colorado, 80309, USA. Email: <thomas.phillips@colorado.edu>. John R. Nuckols, Division of International Epidemiology and Population Studies, Fogarty International Center, NIH, Besthesda, MD, 20892, USA. Email: <john.nuckols@colostate.edu>.

DOI: 10.1559/15230406382223

The importance of an improved understanding of environmental factors associated with enteric diseases such as diarrhea has been widely recognized and has particular relevance in spatial epidemiology (Torok et al. 1997; Kelly-Hope et al. 2007). There is strong demand to investigate dynamics of exposure to and transmission of water-borne pathogens, and to establish improved surveillance systems that can lead to effective preventive measures of such diseases (Eyles et al. 2002; Rushton 2003; Pande et al. 2008; Chaikaew et al. 2009). Such research is of particular priority in less developed regions where the burden from enteric diseases is still very high (Fewtrell et al. 2005).

An important factor in the analysis of environmental risk factors is the geographic extent and framework of the unit of analysis. Recent efforts analyzed spatio-temporal patterns of diarrhea by exploring the influence of regional dynamics of risk factors such as climate and socio-economic conditions (Kelly-Hope et al. 2008; Jepsen et al. 2004). Often the unit of analysis for such studies are administrative reporting units (states or larger) used in disease reporting, resulting in highly aggregated outcomes with limited representation of the underlying environmental phenomenon that might be more realistically reflected by analytical units defined by natural barriers, ecological systems, and other important factors in pathogen occurrence, exposure, and transmission (Curriero et al. 2001). Such disconnect can result in a higher likelihood of error due to ecological fallacy, reduce heterogeneity of observed data, and thus reduce statistical power needed to discern association between the risk factors and incidence of disease (Peters et al. 2004; Wakefield and Shaddick 2005; Beale et al. 2008).

In this article we demonstrate a geospatial framework for identifying spatio-temporal patterns of mortality peak timing from pediatric diarrhea based on relative location of the disease reporting unit in the hydrologic regime of major river basins in Brazil. Such an approach could be useful for a better understanding of the associations between waterborne diseases and environmental processes in general, thus allowing for derivation of predictive models for exposure and transmission within a natural, rather than geopolitical, spatial unit of analysis.

Data

The geographic extent of our study area is the geopolitical boundary of Brazil. We obtained spatially registered GIS data layers of watershed boundaries and stream hydrography for eight (8) Hydrographic Regions from Agencia Nacional de Aguas (ANA; http://www2.ana.gov.br/). We used 1 km spatial resolution topographic data from the Shuttle Radar Topography Mission (SRTM; http://www2.jpl. nasa.gov/srtm/) to carry out hydrologic modeling.

We obtained Brazilian mortality data classified as intestinal infectious disease (ICD-9 codes) and selected all deaths of children younger than 5 years of age for the period 1979-1989 from the Ministerio da Saude do Brasil (http://www2.datasus. gov.br/ DATASUS/index.php?area=02). We used this time period rather than more current data because mortality rates and thus geographic heterogeneity of number of deaths were higher, allowing a more robust data set for the purpose of our study. In the dataset, monthly mortality is reported by Municipality, which is a census reporting unit of the Brazilian government with an approximate median area of 420 Km2. The number of Municipalities in Brazil changed from 3991 in 1979 to 4491 in 1989. Because of large proportions of missing data and changes of political boundaries for Municipalities over our study period, we aggregated the number of deaths per month to Census Micro Regions (CMRs) each of which contain multiple Municipalities. The number of CMRs increased from 542 in 1979 to 558 in 1989 during the study period, with a median area of 5480 Km2 in 1989. We applied screening criteria for accepting CMR-level MPT data: the presence of only one maximum value, average maximum deaths greater than 3 or if less than 3, at least 5 years of recorded data, resulting in a final dataset of 507 CMRs for use in our study, or 91% of all CMRs that reported in 1989. Due to the criteria of at least 5 years of data, newly created CMR regions could be included only if they were established no later than 1985, and had annual deaths reported for each year or had higher numbers of deaths in less than 5 years. We obtained spatial layers of polygons from the Instituto Brasileiro de Geografia e Estatistica (IBGE; http://www.ibge.gov. br/) representing administrative boundaries of CMRs and Municipalities, as well as point data (latitude, longitude) representing Municipal seats locations (cities).

Methods

In a first step we calculated the mortality peak timing variable (MPT) for each Census Micro Region (CMR) based on aggregated monthly Municipal mortality data. MPT is defined as the average digital month on a continuous time scale (between 1.0 for May 1st and 12.99 for April 30th) when maximum mortality occurred in each CMR over the study period (1979-1989). Next, we tested the mortality peak time variable for global spatial autocorrelation and local spatial clustering in order to justify the use of geostatistical models. In a third step we assigned the CMR-based MPT to the point location of the Municipality with the highest number of mean annual deaths, termed Maximum Mortality Municipality (MMM), within each CMR. These point locations were used as input to two different geostatistical models in order to create validated continuous surfaces of mortality peak timing across each Hydrographic Region. Finally we tested for trends of mortality peak timing along the major stream channels of the river network within each Hydrographic Region. We validated our modeled trends by comparing them to inspected trends based on MPT values of discrete Municipality locations in close proximity to major streams.

Extracting Mortality Peak Timing

We extracted the number of deaths occurring in Municipalities for each CMR of Brazil (Figure 1) for each month of the years 1979-89. We normalized these numbers for each year to proportional values on a continuous scale in the range [0,1] such that the annual maximum number of deaths was equal to unity, and the proportional number in each month a value between 0 and 1. These proportional values represent measures of similarity between numbers of deaths in each month and the number of deaths in the peak month. We then averaged these proportional values for each month over the years of the study period (1979-89) resulting in twelve averaged proportional values for each CMR. The maximum of these twelve values represents the estimated CMR-level mortality peak month for the study period. This step compensates for variations in seasonality since the average peak month is based on the normalized occurrence of deaths in comparison to all other months instead of simply determining the month most frequently found as annual peak. In order to derive a digital scale of continuous peak time estimates we corrected the peak month ti to a peak time value to based on the magnitudes of the values in the prior and subsequent months using the following equation:

tc = (a * [t.sub.i+1] + [t.sub.i+1])/(1+a) (1)

with a = ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII])/([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]), [t.sub.i] = 1, ..., 12

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the maximum proportional value of the considered peak month [t.sub.i], [t.sub.i-1] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the prior month and its membership, and [t.sub.i.+1] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the subsequent month and its membership, respectively.

The peak month [t.sub.i] was corrected such that ([t.sub.c] - [t.sub.i-1])/ ([t.sub.i+1] - [t.sub.c]) equals the ratio ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII])/ ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). Hence the range of values for a is any positive real number (a [right arrow] [0, 9.6] for this study). The peak month value is thus corrected toward the greater neighbor temporally. The rationale for this step is that the real peak time value would be expected closer to e.g., the prior month [t.sub.i-1] if its membership value is very similar to the maximum [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and further away in time from the subsequent month [t.sub.i+1] if its" membership [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is very small. We converted the mortality peak time values to a time scale such that May 1st represents the starting point (1.0) and increases continuously over an annual cycle (e.g., mid November has the value 7.5).

[FIGURE 1 OMITTED]

Test for Spatial Autocorrelalion and Clustering

In order to test for global spatial autocorrelation in MPT values we calculated global Moran's I (Moran 1950). In addition we derived local indicators of spatial association (LISA), i.e., local Moran's I (Anselin 1995) to test for the presence and type of significant spatial clusters and outliers in MPT. We classified clusters that were statistically significant at the 95% confidence level as clustered high or clustered low values as well as outliers (low values surrounded by high values and vice versa) in order to identify regional differences in the spatial distribution of the data. Because of the proportion of excluded data at the scale of CMRs (51 out of 558), the spatial structure of the polygon features was incomplete which did not allow to use contiguity" for determining spatial weights in the LISA analysis. We resolved this issue by assigning row-normalized spatial weights based on polygon centroids and inverse distances to neighbors.

Geostatistical Modeling and Validation

We assigned CMR-derived MPT values to the location of the Municipality where the highest number of mean annual deaths were reported within each CMR during the study period, hereafter referred to as the Maximum Mortality Municipality (MMM) (Figure 1). This procedure allowed to focus on the most relevant locations within a CMR, and avoided the use of centroids of CMR polygons, which could result in obscuring underlying environmental risk factors in locations where mortality is prevalent. Based on these locations and values, we computed and validated continuous geostatistical surfaces of mortality peak timing (MPT) at a spatial resolution of 20 km (hereafter termed, MPT Model Raster Cells). We applied two different geostatistical models: (1) Ordinary Kriging, which applies a uniform variogram across our study area; purposely we made no assumptions regarding anisotropy or global trends in order to estimate MPT based on distance-related weighting for all localities without any barriers, and (2) Diffusion Interpolation (Perona and Malik 1990; Black et al., 1998), which allowed us to constrain interpolation within the natural boundary of each Hydrographic Region of Brazil. Diffusion Interpolation uses a kernel that is based on the heat equation and changes its shape according to the diffusion equation near the physical barrier. The outcome is similar to Kernel Interpolation in that it is designed to eliminate influence from data points outside the designated barrier in estimating a variable value within a designated geography. The main reason for using these two different models was to investigate whether different results can be found if trend analysis is carried out based on physically constrained vs. non-constrained interpolation surfaces and thus to study effects of natural boundaries forming the Hydrographic Regions. If there is no association between MPT and the location of the Maximum Mortality Municipality (MMM) within the hydrologic regime, an isotropic continuous spatial distribution of MPT would be expected. If such an association exists anisotropy driven by hydrography can be expected. A subregional effect of anisotropy could be expected if such an association exists on the scale of a Hydrographic Region.

We created our geostatistical surfaces using randomly drawn MPT values from MMM locations within 355 (70%) of the 507 CMRs in our study. We sequestered MPT values for the remaining 152 MMMs in order to validate the predicted MPTs at these locations. We calculated Spearman correlation coefficient between the predicted MPT Model Raster Cell values and observed MPT values for CMRs in the validation subset. In addition we computed the Mean Absolute Error (MAE) to test for systematic bias. We repeatedly run and validated the geostatistical models 100 times in order to calculate mean values of MPT for each MPT Model Raster Cell as well as ranges of correlation and accuracy measures.

Spatially Refined Trend Analysis: Hydrologic Regime

To test for trends in mortality peak timing (MPT) along the stream networks in each Hydrographic Region we first calculated flow accumulation based on elevation data (1Km spatial resolution). We selected the cell values representing locations along the major river channels in each major river basin with contributing watersheds greater than 10,000 [km.sup.2] in land area (Figure 2A) (hereafter termed Flow Accumulation Raster Cells, FARC). In order to uniquely encode local flow direction(s),we conducted a Focal Flow Analysis (FFA), which determines at each FARC the direction(s) of reception within a moving window and assigns an additive binary code between 0 and 255 depending on the location of neighbors flowing into the cell. Each combination of such neighbors results in a different value. Since this analysis was based on flow accumulation values (greater watershed area at a neighboring FARC indicates downstream direction) the resulting Focal Flow raster values (binary codes) were known indicators of downstream direction, and thus could be used with confidence in a comparative analysis to determine if directional trends in MPT were the same.

[FIGURE 2 OMITTED]

We estimated MPT values for each FARC based on the value in the corresponding MPT Model Raster Cell encompassing the FARC (Figure 2B and C) to create a spatial surface representing continuous values of MPT of a spatial resolution of 1km along major rivers within each Hydrographic Region (hereafter MPT Stream Raster Cells, MSRC). In order to ensure that the MSRCs have continuously changing MPT cell values the extracted MPT Model Raster Cell values were distance-averaged between the values of neighboring MPT Model raster cells.

In order to determine the trend in MPT for each MSRC, we employed a second Focal Flow Analysis along the major streams but using the extracted MPT values instead of Flow Accumulation. The resulting raster values (binary codes) were compared to the Focal Flow output raster values for flow accumulation and the difference between the two surfaces was calculated using simple map algebra. If the result was 0 i.e., the binary codes have equal values, the focal flow of MPT is in the same direction as the corresponding accumulated flow (i.e., hydrologically downstream). Otherwise the focal flow direction is undefined (equal values in neighboring cells or no reception) or upstream, and counted as "no agreement" in our analyses of results. Since sudden changes in FFA binary codes can occur at FARCs spatially coincident to confluence points of two streams (Figure 2D), the corresponding

MSRCs were not included in the trend analysis of MPT. We repeated the steps described above using MPT values for MSRCs based on both Ordinary Kriging and Diffusion Interpolation models in order to compare results between these two different approaches (Figure 2E and F).

For each of the 8 Hydrographic Regions of Brazil, we calculated the proportion of raster cells for which the difference between Focal Flow binary codes for FARCs versus MSRCs, was zero, thus indicating the proportion of major stream locations with a downstream trend in MPT. We also recorded the proportions of MSRCs with a downstream trend in MPT for each stream segment within the dendritic stream network occurring between the confluences of two or more streams, or between a confluence and the mouth of the river (or country border). For each Hydrographic Region we summed the lengths of the stream segments, for which we found more than 50% of the MSRCs with a downstream trend in MPT values. We calculated summary statistics from these analyses for each Hydrographic Region as well as the entire study area.

Validation of Trend Analysis: Hydrologic Regime

We assessed the trends in predicted MPT values along major rivers within each Hydrographic Region by comparing them to trends generated based on MPT values at Maximum Mortality Municipalities (MMMs) located in close proximity to the rivers. We used a minimum of 10 MMMs in each Hydrographic Region, starting with the MMM most upstream, and assessing trend in the downstream direction of the hydrologic regime until evaluating the most downstream MMM before the confluence of the major fiver with the sea or the political boundary of Brazil.

Results

Extracting Peak Time Values and Exploratory Spatial Statistics

We calculated average annual mortality peak timing values (MPT) of pediatric mortality attributed to diarrheal disease for 507 (91%) of 558 Census Micro Regions (CMR) in Brazil for the period 1979-1989. In Table 1, we present statistical distributions of MPT values within the Hydrographic Regions of Brazil, and the country overall by three different estimation methods for deriving MPT values: from observed Municipal level mortality data aggregated to Census Micro Regions, from geostatistical MPT Model surfaces generated by Ordinary Kriging and Diffusion Interpolation, and from spatially refined MPT values at 1 km raster cells representing locations of iterative flow accumulation represented by a contributing watershed area of 10,000 km2 and thus along major streams (MSRCs). The distributions of MPT values in Hydrographic Regions are similar between different geostatistical models but show differences to CMR-level MPT distributions in that the modeled values show a decrease in interquartile ranges and thus a lower variation. Across Hydrographic Regions there are differences in statistical distributions for all data sources that are not reflected by the global distribution parameters (Table 1).

Global Moran's I statistics (Table 1) indicated highly significant (p < 0.01) spatial autocorrelation in MPT for the whole country (I = 0.36; Z-score = 25.73) suggesting that geostatistical models are appropriate for estimation. We found highly significant spatial autocorrelation in all Hydrographic Regions except regions 7 and 8 in the south (Figure 3).

The results of the spatial cluster analysis (LISA) of CMR-level MPT values are presented in Figure 3. Based on overlaid boundaries of the Hydrographic Regions it can be seen that there are significant spatial clusters of low MPT values (low-low = early peaks spatially cluster with other early peaks) in the headwater areas of Hydrographic Regions (HR) 1, 2, 4 and 6, all of which are major river basins. Significant clusters of high MPT values (high-high = late peaks spatially cluster with other late peaks) that occurred in these river basins were primarily located in the most downstream locations of Hydrographic Regions 2 and 4. Other significant clusters of high MPT values occurred in regions 3 and 5, which are composed of multiple unique river basins with separate hydrologic regimes for surface flow. Statistically significant "outliers"--CMRs with a high value surrounded by CMRs with low values--are located in the upper watersheds of major river basins (HR 2 and 6), and in areas in the upper and lower elevations of HR 3. Other outliers--CMRs with a low value surrounded by CMRs with high values--can be seen in the lower basin of HR 4, the higher elevations of HR 5, and in both upper and lower elevations of HR 3. The patterns in Figure 3 illustrate a general trend from central-west to the northeast (early to late MPT) across Brazil, but trends in Hydrographic Regions do not necessarily follow this pattern, especially in HRs that are single major river basins (e.g. HR 2).

Geostatistical Modeling and Validation

The continuous surfaces derived from Ordinary Kriging and Diffusion Interpolation are shown in Figure 4A and B. It can be seen that the use of physical barriers in Diffusion Interpolation results in a distribution that is similar to the Kriging surfaces but shows differences in local patterns in close proximity to the boundaries of Hydrographic Regions.

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

The validation analysis of the geostatistical models resulted in highly significant correlations (p < 0.01) between estimated MPT values in the MPT Model Raster Cells and spatially coincident MPT values derived for Maximum Mortality Municipalities (MMMs) in 98% of 100 total model runs. The correlation coefficients ranged between 0.59 and 0.79 when we used Ordinary Kriging to estimate MPT, with a mean prediction error of 0.111 over all model iterations. The Mean Absolute Errors (MAE) in this comparison ranged between 0.43 and 0.69. For Diffusion Interpolation the correlation coefficients ranged between 0.56 and 0.77 with a mean prediction error over all 100 model iterations of 0.133; MAE values ranged between 0.47 and 0.71.

Spatially Refined Trend Analysis considering Hydrologic Regime

The spatially relined MPT values along the major streams (MPT Stream Raster Cells, MSRCs) extracted from the MPT Model surfaces using Ordinary Kriging and Diffusion Interpolation are illustrated in Figure 5A and B, respectively. Table 2 presents the results of the MPT trend analysis along major rivers for Brazil for each of the eight Hydrographic Regions. There are considerable differences between proportions of raster cells that are downstream across Hydrographic Regions for both Ordinary Kriging (between 0.28 and 0.67) and Diffusion Interpolation (between 0.15 and 0.81). The proportions of raster cells downstream are higher in the Hydrographic Regions of larger areal extent that consist of single-river basins (i.e., regions 1, 2 and 4; all values > 0.5) as compared to the southern regions (regions 7-8; all values < 0.3). Regions along the Eastern coastline that have multi-river basins (regions 3 and 5) show mixed results. Diffusion Interpolation resulted in higher proportions of downstream raster cells: 65% for whole Brazil (58% based on Ordinary Kriging) and up to 81% (region 4) for individual Hydrographic Regions.

[FIGURE 5 OMITTED]

Considering the total study area (i.e. political boundary of Brazil), the total length of river segments with more than 50% downstream trend represents 64% and 75% of the total length of stream channels for Ordinary Kriging and Diffusion Interpolation, respectively (Table 2). These percentages increase to up to 88% for Hydrographic Region 2 when we applied Diffusion Interpolation (Figure 5B and D). The maps in Figures 5C and D show the downstream proportions of stream segments on a continuous scale between 0 and 1 in order to provide a more differentiated presentation of the results as compared to a crisp threshold of 0.5. It can be seen that in regions with strong downstream trends (Table 2; region 1, 2 and 4) the majority of segments has indeed proportions of downstream MSRCs close to 1.0 when using Diffusion Interpolation (Figure 5D).

Validation: Spatially Refined Trend Analysis considering Hydrologic Regime

The final inspection of possible trends downstream based on discrete Maximum Mortality Municipalities (MMM) confirmed the results based on the geostatistical models. For Hydrographic Regions 1, 2 and 4 an increase of MPT between MMMs located close to the origin of the river and those close to the mouth of the river was found in most cases (70-80%, Table 2). In the southern part of the country (regions 6-8) but also in multi-river basin regions (regions 3 and 5) it was difficult to detect any trends due to their small areal extent (7 and 8), a limited number of points for validation (3, 5 and 6), or simply because there was no trend.

Discussion and Concluding Remarks

In this paper, we analyze trends in mean annual peak timing values of pediatric mortality (MPT) attributed to diarrheal disease in Brazil for the period 1979-1989 using a novel approach for environmental health risk studies, that is using natural boundaries instead of political boundaries to define the unit of analysis. We evaluate the approach at varying spatial scales: (1) Country-wide based on observed Municipal level mortality data aggregated to Census Micro Regions (CMR) with a median area of 5480 Km2; (2) Country-wide based on a grid of 20 Km2 raster cells generated by geostatistical models that use CMR-level MPT values as input; (3) Within eight officially designated Hydrographic Regions of Brazil (mean area of approximately 1,086,000 Km2) based on the geostatistical models, and (4) Along longitudinal "vectors" of 1Km2 raster cells defining the stream network (hydrologic regime) within each Hydrographic Region. MPT for the latter case was extracted from spatially coincident raster cells of the geostatistical MPT models.

At the country level, we found evidence of a trend west to east of increasing MPT over an annual cycle (May to April) using the CMR-level estimates, whether the unit of analysis was a political boundary (CMRs) or a natural boundary such as the Hydrographic Regions (Figure 1). The overall trend was verified by unconstrained geostatistical surfaces i.e., using Ordinary Kriging (Figure 4A). When we examined the model results at finer scales (higher resolution of the unit of analysis), we discovered greater geographic heterogeneity in MPT in the model surface as well as in the LISA map (Figure 3). Using Hydrographic Regions (HRs) as the unit of analysis we found CMR-level MPT ranged from approximately 6.3 to 10.0, compared to 9.2 to 9.5 when calculated at the country level (Table 1). This provided justification for generating constrained geostatistical surfaces e.g., Diffusion Interpolation, using Hydrographic Region boundaries as barriers (Figure 4). A compelling finding of our study was the fact that when we increased the spatial resolution to the level of the stream network within the Hydrographic Regions, we found dominant and consistent trends of increasing MPT from the source areas (upper watersheds) to downstream locations, particularly for single river basins (Figure 5). Thus existing trends were no longer predominantly east to west as on the country level, but oriented in the direction of flow of the major river draining the basin and some tributary streams (e.g., Amazon River basin, HR 1).

The observed wide range in MPT across different spatial scales could have important implications for assessment of known or suspected environmental risk factors that have significant temporal variation. These factors can include climatological and meteorological variables, water use activities, sanitation and drainage, among others. In a previous study, we identified such factors, and demonstrated the adverse affect of temporal aggregation of data in models of mortality risk and diarrheal disease in Mexico (Leyk et al. 2011). In this current study we provide a substantial argument for evaluating natural boundaries as the preferred analytical unit in order to reflect heterogeneity and spatial variation of such variables in the context of health risk studies (Wakefield and Shaddick 2005; Beale et al. 2008). For example, we found consistent trends in increasing MPT in stream segments from upper to lower reaches of some major river basins that were not detected on coarser scales. Thus, between the two studies, we have demonstrated that both temporal and spatial aggregation of risk factors and disease data can have profound effects on understanding the association between the environment and disease, and therefore on the design and implementation of effective intervention strategies for disease prevention. If such findings hold across other geographic regions, implications of such aggregation should be part of risk communication to policy makers (Leyk et al. 2011).

We purposely selected the study period 1979-1989. Likewise, we purposely used absolute number of mortalities as opposed to mortality rates in our study; even though the latter expression of prevalence is more typical in health risk analysis. The objective of our study is to identify spatio-temporal patterns of mortality peak timing from pediatric diarrhea based on relative location of the disease reporting unit. We used data from the selected time period rather than more current data because higher mortality rates and geographic heterogeneity of the number of deaths made the dataset more robust. Peak timing would be in the same month if one uses population-based rates or absolute values of the disease outcome in question. Thus, our findings would hold for a more current time period if underlying factors that caused the wide range in MPT at different spatial scales were still operative.

A limitation to our study was missing or sparsely distributed mortality data in large Municipalities and CMRs, particularly in the northwest part of the country (Figure 1) (HR1; The Amazon River Basin); it is well established that population density is low compared to other basins, and progressively so from the mid- to upper basin subwatersheds. There is also a documented issue of under-reported mortality in regions across Brazil (Castro et al. 2010). Neither of these issues appeared to have a major impact on our findings since the trends across different basins seem consistent, both in magnitude and range of MPT. Timing of peak mortality over an annual cycle compared favorably with other studies reporting temporal trends at more localized studies in Brazil (Guerrant et al. 1983; Schorling et al. 1990; Kale et al. 2004; Melli and Waldman 2009).

Another potential limitation to our study could be the interpolation of observed data to different spatial scales. Because of large proportions of missing data and changes of political boundaries for the base reporting unit used by the Brazilian government, Municipalities (median area 420 Km2), we aggregated observed mortality to 507 Census Micro Regions (median area 5480 Km2) using the available municipal level data within each CMR, and used these data to calculate a CMR-level average annual peak timing of mortality (MPT). We use these CMR-level estimates as inputs to our geostatistical models based on the location of the Municipality with the highest number of mean annual deaths over our study period. To evaluate the accuracy of this procedure, we sequestered calculated MPT at these locations in a randomly selected sample of 30% of the CMRs and used these to validate predicted MPT for the spatially coincident modeling unit. We found correlation coefficients ranged from 0.56 to 0.79 providing some confidence. However, the absence of spatial autocorrelation (Moran's I) in some Hydrographic Regions and calculated error surfaces based on the 100 model runs demonstrate that the geostatistical model is expected to perform weakly in some subregions (HR7 and 8). Likewise, we evaluated the direction of trend in estimated MPT in 1Km2 raster cells representing locations along the stream network in each Hydrographic Region by comparing these estimates to observed trends in municipalities in close proximity to the network. This evaluation confirmed the results based on the models in that trends are detectable in some Hydrographic Regions (HR 1, 2 and 4) but cannot be identified in others. However, our validation efforts and the interpretation of the trend analysis results across eight different Hydrologic Regions justifies the suitability of the presented methods for extracting, interpolating and spatially refining observed mortality data for the purpose of our study.

In summary, we have completed a study of average peak timing of mortality due to diarrheal disease in children less than 5 years of age using spatially registered mortality data from Brazil. Our study results indicate substantial spatial variation in peak timing, which could have important ramifications in studies concerning known or suspected risk factors with significant temporal variation over an annual cycle. We found the geographic orientation of trend in mortality peak timing to be highly dependent on the geographic extent and nature of the unit of analysis. We demonstrate that a unit based on natural boundaries, in our case stream segments and watershed boundaries within the Hydrogeographic Regions of Brazil, resulted in consistent prediction of trend in large single-river basins; with mortality peak timing occurring in earlier months in the upper watersheds of major river basins, and in later months in the lower basin, during an annual cycle defined from May to April. Our results are consistent with findings of other studies in expressing caution in the use of aggregated data with political boundaries as input to spatio-temporal models for health risk assessment. Further research is needed to determine if our methods and findings are transferable to other regions.

ACKNOWLEDGEMENTS

The Etiology, Risk Factors and Interactions of Enteric Infections and Malnutrition and the Consequences for Child Health and Development Study (MAL-ED) is carried out as a collaborative study supported by the Bill & Melinda Gates Foundation, the Foundation for the NIH and the National Institutes of Health/ Fogarty International Center. SL and JRN were also funded by interagency personnel agreements between the Fogarty International Center (NIH) and the University of Colorado at Boulder and Colorado State University, respectively. The authors thank four anonymous reviewers for their constructive critics.

REFERENCES

Anselin, L. 1995. Local Indicators of Spatial Association--LISA. Geographical Analysis 27: 93-115.

Beale, L., J.J. Abellan, S. Hodgson and L. Jarup. 2008. Methodologic Issues and Approaches to Spatial Epidemiology. Environmental Health Perspectives 116: 1105-1110.

Black, M. J., G. Sapiro, D.H. Marimont and D. Heeger. 1998. Robust anisotropic diffusion. IEEE Transactions on Image Processing, 7(3): 421-432.

Castro, M.C. and C.C. Simoes. 2010. SpatioTemporal Trends of Infant Mortality in Brazil. Proceedings of the Annual Meeting. Population Association of America (PAA) 2010.

Chaikaew, N., N.K. Tripathi and M. Souris. 2009. Exploring spatial patterns and hotspots of diarrhea in Chiang Mai, Thailand. International Journal of Health Geographics 8: 36.

Curriero, F.C., J.A. Patz, J.B. Rose and S. Lele. 2001. The Association Between Extreme Precipitation and Waterborne Disease Outbreaks in the United States, 1948-1994. American Journal of Public Health 91: 1194-1199.

Eyles, R., D. Niyogi, C. Townsend, G. Benwell and R Weinstein. 2002. Spatial and Temporal Patterns of

Campylobacter Contamination Underlying Public Health Risk in the Taieri River, New Zealand. Journal of Environmental Quality 32(5): 1820-1828.

Fewtrell, L., R.B. Kaufmann, D. Kay, W. Enanoria, L. Hailer and J.M. Colford. 2005. Water, sanitation and hygiene interventions to reduce diarrhea in less developed countries: a systematic view and metanalysis. Lancet Journal of Infectious Diseases 5: 42-52.

Guerrant, R.L., L.V. Kirchhoff, D.S. Shields, M.K. Nations, J. Leslie, M.A. de Sousa, J.G. Araujo, L.L Correia, K.T. Sauer, K.E. McClelland, F.L. Trowbridge and J.M. Hughes. 1983. Prospective Study of Diarrheal Illnesses in Northeastern Brazil: Patterns of Disease, Nutritional Impact, Etiologies, and Risk Factors. The Journal of Infectious Diseases 148(6): 986-997.

Jepsen, M.R., J. Simonsen and J.S. Ethelberg. 2004. Spatio-temporal cluster analysis of the incidence of Campylobacter cases and patients with general diarrhea in a Danish county, 1995-2004. International Journal of Health Geographics 8:11.

Kale P.L., V.L. Andreozzi and F.F. Nobre. 2004. Time Series Analysis of Deaths Due to Diarrhea in Children in Rio de Janeiro, Brazil, 1980-1998. Journal of Health, Population, and Nutrition, 22(1): 27-33.

Kelly-Hope, L.A., W.J. Alonso, V.D. Thiem, D.D. Anh, G. Canh do, H. Lee, D.L. Smith and M.A. Miller. 2007. Geographical distribution and risk factors associated with enteric diseases in Vietnam. The American Journal of Tropical Medicine and Hygiene 76(4): 706-12.

Kelly-Hope, L.A., W.A. Alonso, VD. Thiem, D.G. Canh, D.D. Anh, H. Lee and M.A. Miller. 2008. Temporal Trends and Climatic Factors Associated with Bacterial Enteric Diseases in Vietnam, 1991-2001. Environmental Health Perspectives 116:7-12.

Leyk S., B.J.J.. McCormick and J.R. Nuckols. 2011. Effects of varying temporal scale on spatial models of mortality patterns of pediatric diarrhea. Spatial and Spatio-temporal Epidemipology. doi: 10.1016/j. sste.2011.03.002

Melli, L. and E.A. Waldman. 2009. Temporal trends and inequality in under-5 mortality from diarrhea. Jornal de Pediatria 85(1): 21-27.

Moran, P. 1950. Notes on Continuous Stochastic Phenomena. Biometrika 37:17-33.

Pande, S., M.A. Keyzer, A. Arouna and B.G. Sonneveld. 2008. Addressing diarrhea prevalence in the West African Middle Belt: social and geographic dimensions in a case study for Benin. International Journal of Health Geographics 7: 17.

Perona, R and J. Malik. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7): 629-639.

Peters, D. P., R.A. Pielke, B.T. Bestelmeyer, C.D. Allen, S. Munson-McGee and K.M. Havstad. 2004. Cross-scale interactions, nonlinearities, and forecasting catastrophic events. Proceedings of the National Academy of Sciences of the United States of America 101: 15130-15135.

Rushton, G. 2003. Public Health, GIS, and Spatial Analytic Tools. Annual Review of Public Health 24: 43-56.

Schorling, J.B., C.A. Wanke, S.K. Schorling, J.E McAuliffe, M.A. De Souza and R.L. Guerrant. 1990. A prospective study of persistent diarrhea among children in an urban Brazilian slum. Patterns of occurrence and etiologic agents. American Journal of Epidemiology 132(1): 144-156.

Torok, T.J., P.E. Kilgore, M.J. Clarke, R.C. Holman, J.S. Bresee, R.I. Glass and the National Respiratory and Enteric Virus Surveillance System Collaborating Laboratories. 1997. Visualizing geographic and temporal trends in rotavirus activity in the United States, 1991 to 1996. National Respiratory and Enteric Virus Surveillance System Collaborating Laboratories. Pediatric Infectious Disease Journal 16(10): 941-946.

Wakefield, J. and G. Shaddick. 2005. Health-Exposure Modelling and the Ecological Fallacy. Biostatistics 1: 1-19.

Stefan Leyk and Jeremy Smith, University of Colorado-Boulder, Boulder, Colorado 80309, USA. Email: <stefan.leyk@colorado.edu>, <jmsmith@colorado.edu>. Thomas P. Phillips, Department of Aerospace Engineering, University of Colorado, Boulder, Colorado, 80309, USA. Email: <thomas.phillips@colorado.edu>. John R. Nuckols, Division of International Epidemiology and Population Studies, Fogarty International Center, NIH, Besthesda, MD, 20892, USA. Email: <john.nuckols@colostate.edu>.

DOI: 10.1559/15230406382223

Table 1. Summary statistics for MPT distributions at the scales of Hydrographic Region and country derived from different distributions i.e., CMR-level MPT, interpolated MPT Model surfaces and spatially refined MPT estimations at stream locations (MSRCs). Unit of analysis HR- Census Micro Region (CMR) Ordinary Hydrographic Region level MPT Kriging surface Med IQR Z (Moran's I) Med IQR HR 1 6.33 2.51 5.13 * 6.56 2.23 HR 2 9.11 4.50 3.26 * 8.90 1.70 HR 3 10.05 2.01 3.69 * 8.97 1.25 HR 4 8.69 3.41 5.30 * 9.13 1.60 HR 5 9.46 3.80 3.88 * 9.65 1.08 HR 6 8.75 1.41 8.37 * 8.17 1.51 HR 7 9.48 0.81 0.22 9.31 0.61 HR 8 9.50 1.94 -0.62 9.21 0.52 Country 9.10 2.46 25.73 * 8.18 2.54 Unit of analysis HR- Diffusion Ordinary Diffusion Hydrographic Region Interpolation Kriging Interpolation surface along major along major streams streams Med IQR Med IQR Med IQR HR 1 6.56 1.54 6.32 1.83 6.43 1.07 HR 2 8.78 2.23 9.12 1.37 9.07 2.12 HR 3 9.02 0.98 8.77 0.85 9.00 0.50 HR 4 9.71 2.00 9.27 1.63 9.93 2.10 HR 5 9.82 0.73 9.78 1.04 9.82 0.68 HR 6 8.59 0.76 8.06 1.32 8.51 0.81 HR 7 9.57 0.53 9.15 0.54 9.50 0.48 HR 8 9.28 0.48 9.21 0.39 9.30 0.85 Country 8.30 2.55 7.74 2.53 780 2.49 Med = Median; IQR = Interquartile Range; Z = Z value of Moran's I Test; * 1% significance level Table 2. Results of the trend analysis for Brazil and individual Hydrographic Regions presented as proportions of MPT stream raster cells (MSRC; 1 km spatial resolution) downstream, as proportional stream length of segments with > 50% downstream raster cells as well as validated downstream trend proportions. Unit of Proportion raster cells Analysis Stream raster downstream HR = cells Hydrographic [[km.sup.2]] Ordinary Diffusion Region Kriging Interpolation HR 1 31058 0.64 0.69 HR 2 5743 0.67 0.79 HR 3 6062 0.35 0.57 HR 4 4125 0.52 0.81 HR 5 3125 0.60 0.60 HR 6 8446 0.52 0.51 HR 7 876 0.28 0.15 HR 8 935 0.40 0.47 Total 60370 0.58 0.65 Total segment length Unit of (proportion) with > Analysis 50% downstream Discrete trend HR = assessment Hydrographic Ordinary Diffusion Region Kriging Interpolation HR 1 0.70 0.79 0.70 HR 2 0.72 0.88 0.80 HR 3 0.22 0.63 0.40 HR 4 0.70 0.86 0.70 HR 5 0.81 0.68 0.40 HR 6 0.59 0.62 0.50 HR 7 0.21 0.39 -- HR 8 0.54 0.46 -- Total 0.64 0.75 --

Printer friendly Cite/link Email Feedback | |

Author: | Leyk, Stefan; Phillips, Thomas P.; Smith, Jeremy M.; Nuckols, John R. |
---|---|

Publication: | Cartography and Geographic Information Science |

Article Type: | Report |

Geographic Code: | 3BRAZ |

Date: | Apr 1, 2011 |

Words: | 6731 |

Previous Article: | The impact of hurricanes on crime: a spatio-temporal analysis in the city of Houston, Texas. |

Next Article: | Customized visualization of natural hazards assessment results and associated uncertainties through interactive functionality. |

Topics: |