Interpreting posterior relative risk estimates in disease-mapping studies.

There is currently much interest in conducting spatial analyses of health outcomes at the small-area scale. This requires sophisticated statistical techniques, usually involving Bayesian models, to smooth the underlying risk estimates because the data are typically sparse. However, questions have been raised about the performance of these models for recovering the "true" risk surface, about the influence of the prior structure specified, and about the amount of smoothing of the risks that is actually performed. We describe a comprehensive simulation study designed to address these questions. Our results show that Bayesian disease-mapping models are essentially conservative, with high specificity even in situations with very sparse data but low sensitivity if the raised-risk areas have only a moderate (< 2-fold) excess or are not based on substantial expected counts (> 50 per area). Semiparametric spatial mixture models typically produce less smoothing than their conditional autoregressive counterpart when there is sufficient information in the data (moderate-size expected count and/or high true excess risk). Sensitivity may be improved by exploiting the whole posterior distribution to try to detect true raised-risk areas rather than just reporting and mapping the mean posterior relative risk. For the widely used conditional autoregressive model, we show that a decision rule based on computing the probability that the relative risk is above 1 with a cutoff between 70 and 80% gives a specific rule with reasonable sensitivity for a range of scenarios having moderate expected counts (~20) and excess risks (~1.5- to 2-fold). Larger (3-fold) excess risks are detected almost certainly using this rule, even when based on small expected counts, although the mean of the posterior distribution is typically smoothed to about half the true value.

Key words: Bayesian hierarchical models, cancer mapping, environmental epidemiology, sensitivity, small-area studies, spatial smoothing, specificity. Environ Health Perspect 112:1016-1025 (2004). doi:10.1289/ehp.6740 available via http://dx.doi.org/ [Online 15 April 2004]

Spatial analyses of health outcomes have long been recognized in the epidemiologic literature as playing a specific and important role in description and analysis. In particular, they can highlight sources of heterogeneity underlying spatial patterns in the health outcomes and consequently are able to suggest important public health determinants or etiologic clues. A good example of geographic epidemiology is the seminal monograph by Doll (1980), which described some of the first hypotheses concerning the influence of environment and lifestyle characteristics on cancer mortality and discussed how these arose from studying the geographic distribution of various cancers.
These early studies were usually performed on a large geographic scale, using mostly international or regional comparisons. Recently, the availability of local geographically indexed health and population data, together with advances in computing and geographic information systems, has encouraged the analysis of health data on a small geographic scale (Elliott et al. 2000). The motivation is the increased interpretability of small-scale studies, as they are in principle less susceptible to the component of ecologic bias created by the within-area heterogeneity of exposure or other determinants. They are also better able to detect highly localized effects such as those related to industrial pollution in the vicinity. Conversely, small-scale studies require more sophisticated statistical analysis techniques than, for example, an analysis between countries, because the data are typically sparse with low (even zero) counts of events in many of the small areas.
Further, there is frequently evidence of overdispersion of the counts with respect to the Poisson model as well as spatial patterns indicating some dependence between the counts in neighboring areas. To cope with these nonstandard characteristics, statistical models have been developed that address these issues and make best use of small-area health data. In connection with generic developments in a flexible modeling strategy using the paradigm of Bayesian hierarchical models, hierarchical disease-mapping models based on conditional autoregressions (CAR) were proposed in the 1990s through the work of Besag et al. (1991), Clayton and Bernardinelli (1992), and Clayton et al. (1993). These CAR models are now commonly used both by statisticians [...]
heterogeneous risk surfaces and particularly to allow for potential discontinuities in the risk. The main characteristic of all these models is to provide some shrinkage and spatial smoothing of the raw relative risk estimates that otherwise would be computed separately in each area. Such shrinkage gives a more stable estimate of the pattern of underlying risk of disease than that provided by the raw estimates. The pattern of the raw risks, strongly influenced by the size of the population at risk, leads to a noisy and blurred picture of the true unobserved risks.

Within the disease-mapping paradigm, questions have been raised about the performance of these models in recovering the true risk surface, the influence of the prior structure specified, and the amount of smoothing of the risks actually performed by these models. In other words, it is important to understand thoroughly the sensitivity (ability to detect true patterns of heterogeneity of risk) and the specificity (ability to discard false patterns created by noise) of Bayesian disease-mapping models. This is the focus of this article. This understanding is crucial for interpretation of any specific disease pattern derived through the use of such models. Such a calibration study cannot be performed on real data because it relies on knowing the true underlying pattern of risk. We have thus conducted an extensive simulation study where the generated data patterns are close to those found in typical disease-mapping studies. We report here the main conclusions that can be drawn.

Let us stress that we are not placing ourselves in the context of cluster detection methods based on so-called point data, that is, data where the precise geographic location of all the cases (and controls) is known.
These methods, which have been reviewed in a number of monographs or special issues (e.g., Alexander and Boyle 1996), are typically used on a localized scale, mostly to study the spatial distribution of cases around a point source or different patterns of randomness or clustering of the cases in relation to those of controls. Here we are concerned with methods for describing the overall spatial pattern of cases aggregated over small areas and the interpretation of the residual variations once the Poisson noise has been smoothed out by the disease-mapping models.

Several simulation exercises to study different aspects of the performance of disease-mapping models have been reported recently. For example, Lawson et al. (2000) [...] Bayesian information criterion. They concluded that the version of the CAR model proposed by Besag et al. (1991) [the Besag, York, and Mollie (BYM) model] was the most robust model among those with spatial structure. We also consider the BYM model here and compare its performance with two models not considered by Lawson et al. (2000): a version of the BYM model that is more robust to outliers and hence may be better able to detect abrupt changes in the spatial pattern of risk, and a spatial mixture model proposed by Green and Richardson (2002) (see the section on Bayesian disease-mapping models for details). Our study also specifically addresses the smoothing of high-risk areas and further use of the posterior distribution of the relative risks for detecting areas with excess risk, issues not considered by Lawson et al. (2000). Jarup et al. (2002) reported the results of a small simulation study similar to ours; we chose the same study area and expected counts to carry out our comprehensive exercise.

In the next section we describe the simulation setup and the Bayesian disease-mapping methods to be compared. We then discuss the interpretation of the means of the posterior distribution of the relative risk estimates typically reported in disease-mapping studies and displayed as summary maps, and we illustrate quantitatively how much smoothing is performed.
We then discuss how information on the whole posterior distribution of the relative risks can be better exploited to discriminate between areas displaying higher risk and areas with relative risk close to background level. We conclude with a short discussion that emphasizes the importance of interpreting any results from a disease-mapping exercise in the context of the size of expected counts and the potential spatial structure of the risks.

Materials and Methods

The basic setup of disease-mapping studies is as follows: The number of cases of a particular disease $Y_i$ occurring in area $A_i$ is recorded, where the set of areas $\{A_i\}$, $i = 1, 2, \ldots, n$, represents a partition of the region under study. For each area $A_i$, the expected number of cases $E_i$ is also computed using reference rates for the disease incidence (or mortality) and the sociodemographic strata (with respect to age, sex, and perhaps socioeconomic characteristics) where census data are available. The counts $Y_i$ are typically assumed to follow a Poisson distribution, as the diseases usually considered in such studies are rare and this distribution gives a good approximation to the underlying binomial distribution that would hold for each risk stratum. The local variability of the counts is thus modeled as follows:

$$Y_i \sim \text{Poisson}(E_i \theta_i), \qquad [1]$$

independently for $i = 1, 2, \ldots, n$. The parameter of interest is $\theta_i$, the relative risk that quantifies whether area $i$ has a higher ($\theta_i > 1$) or lower ($\theta_i < 1$) occurrence of cases than that expected from the reference rates. It is this parameter that we are trying to estimate to quantify the heterogeneity of the risk and to highlight unusual patterns of risks.
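As a rough illustration of equation 1 (a sketch under stated assumptions, not the code used in the study), the snippet below simulates counts under the Poisson model with no true excess risk and computes the raw estimates $Y_i / E_i$ separately in each area. The expected counts are hypothetical values chosen to span sparse and well-populated areas, and the number of replicates is arbitrary. When $\theta_i = 1$, the standard deviation of the raw estimate is $1/\sqrt{E_i}$, so its spread is driven almost entirely by the size of $E_i$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expected counts E_i, from very sparse to well populated.
expected = np.array([0.8, 1.9, 3.6, 7.3, 50.0])
theta = np.ones_like(expected)          # no true excess risk anywhere

# 10,000 replicate data sets drawn from Y_i ~ Poisson(E_i * theta_i).
counts = rng.poisson(expected * theta, size=(10_000, expected.size))

# Raw relative risk estimate computed separately in each area.
raw_rr = counts / expected

# Empirical spread of the raw estimates versus the theoretical 1 / sqrt(E_i).
spread = raw_rr.std(axis=0)
for e, s in zip(expected, spread):
    print(f"E = {e:5.1f}  sd of raw estimate = {s:.2f}  1/sqrt(E) = {1 / np.sqrt(e):.2f}")
```

With an expected count below 1, the raw estimate routinely lands far from 1 by chance alone, which is the noise that the smoothing models discussed in this article are meant to remove.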
Data Generation

The spatial structure used throughout the simulations is that of the 532 wards in the county of Yorkshire, England. Wards are administrative areas in the United Kingdom, with a total population of approximately 5,000 on average. We base our expected counts $E_i$ on those calculated by Jarup et al. (2002) for prostate cancer in males 45-64 years of age over the period from 1975 to 1991. We then simulate three spatial patterns of increased risks. For each pattern, we examine three magnitudes for the elevated risks. We also examine how the inference is changed if the expected counts are multiplicatively increased by a scale factor (SF) varying from 2 to 10.

Three spatial patterns for areas of elevated risk were chosen.
The choice of patterns was intended to span a spectrum ranging from a scenario with single isolated areas with elevated risks (the hardest test case for any smoothing method) to a scenario with a number of larger clusters of several contiguous areas with elevated risks (a situation with a substantial amount of heterogeneity). In all cases the elevated areas were selected in turn at random from the set of areas with the required expected counts. In the Simu 1 and Simu 3 cases, once an area was selected, a buffer of neighboring areas with background risk (excluded thereafter from the random selection) was placed around it to produce the required pattern of isolated high-risk clusters. The three generated patterns were defined as follows:

* Simu 1: five isolated single wards with expected counts ranging from 0.8 to 7.3 corresponding, respectively, to the 10th, 25th, 50th, 75th, and 90th percentiles of the distribution of the expected counts
* Simu 2: a group of contiguous wards representing 1% of the total expected counts. In effect, this chosen 1% cluster grouped four wards with fairly comparable expected counts ranging from 3.6 to 7.0, giving an average expected count per ward of 5.4 over the four wards
* Simu 3: a situation with high heterogeneity comprising 20 such 1% clusters that are nonoverlapping.

Note that for Simu 3, the twenty 1% clusters each have a total expected count close to 17 but a large disparity in terms of numbers of constituent areas: 10 clusters had 2 or 3 areas, whereas 8 clusters had more than 8 areas, up to a maximum of 18 areas.
Correspondingly, the expected counts in each of the wards in the clusters ranged from 0.3 for some wards in the 18-area cluster to 12 for the cluster with 2 areas. Simu 3 thus corresponds to a realistic situation of heterogeneity of risk where both small clusters with high expected counts (typically a populated area) and large clusters each with small expected counts (for example, in rural areas) are present. This high degree of heterogeneity has to be considered when interpreting the results for the Simu 3 case where an average over all the 20 clusters is presented. Note also that contrary to the Simu 2 case, about half the background areas in Simu 3 have a neighbor that belongs to one of the 20 clusters. In each case, apart from the elevated risk areas described above, all other areas are called background areas.

For each spatial pattern in Simu 1 and Simu 2, counts $Y_i$ were generated as follows: Counts in all background areas were generated from a Poisson distribution with mean $E_i$. For all the other areas, an elevated relative risk with magnitude $\theta_i > 1$ was used and counts were simulated as Poisson variables with mean $\theta_i E_i$. The simulation was repeated for three values of $\theta_i$ (1.5, 2, and 3) and for different SFs that multiply the expected counts $E_i$ for all areas. Thus, results reported, for example, for an area with $E = 1.92$, $\theta = 2$, and SF = 4, correspond to counts generated from a Poisson with mean 15.36 (2 x 4 x 1.92). For Simu 3 a slightly different procedure for generating the cases was used to ensure that $\sum_i Y_i = \sum_i E_i$ (Appendix A).
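The generation step for Simu 1 and Simu 2 can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the three-area layout, and the seed are hypothetical. The second function is an assumption about what renormalizing to internal reference rates does, since Appendix A is not reproduced here: if a share of the total expected counts carries a multiplicative contrast, every risk is divided by the resulting region-wide average. The 20% share used below is the nominal figure implied by twenty clusters of 1% each.

```python
import numpy as np

def simulate_counts(expected, elevated, theta, sf, rng):
    """Poisson counts with mean sf * E_i * theta_i (theta_i = 1 in background areas)."""
    expected = np.asarray(expected, dtype=float)
    rel_risk = np.ones_like(expected)
    rel_risk[list(elevated)] = theta          # elevated areas get relative risk theta
    mean = sf * expected * rel_risk
    return rng.poisson(mean), mean

def internal_relative_risks(contrast, elevated_share):
    """Risks relative to the study-region average (assumed renormalization)."""
    scale = 1.0 + elevated_share * (contrast - 1.0)
    return contrast / scale, 1.0 / scale

rng = np.random.default_rng(2004)
expected = [1.92, 0.8, 7.3]                   # hypothetical E_i; area 0 is elevated
counts, mean = simulate_counts(expected, elevated=[0], theta=2.0, sf=4, rng=rng)
print(round(mean[0], 2))                      # Poisson mean for the elevated area

for contrast in (3.0, 2.0, 1.5):
    elevated_rr, background_rr = internal_relative_risks(contrast, elevated_share=0.2)
    print(contrast, round(elevated_rr, 2), round(background_rr, 2))
```

For the elevated area the Poisson mean reproduces the worked value of 15.36. Under the assumed 20% share, the renormalized risks come out near 2.14 and 0.71, 1.67 and 0.83, and 1.36 and 0.91, close to the internal relative risks the article reports for Simu 3, while the ratio between elevated and background areas stays at the original contrast.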
Note that for Simu 1 and Simu 2, the simulation procedure meant only that $\sum_i Y_i \approx \sum_i E_i$. This corresponds, for instance, to an epidemiologic situation where expected counts $E_i$ are calculated based on an external reference rate. However, Simu 3 uses internal reference rates because otherwise $\sum_i Y_i$ would have been much larger than $\sum_i E_i$, which could distort the overall risk estimates. The multinomial procedure used in Simu 3 and detailed in Appendix A implies that, in effect, the multiplicative contrast between areas of elevated risk and background areas is still 3, 2, and 1.5, respectively, but the corresponding relative risks in each area (denoted $\theta_i^*$) relative to the internal (i.e., study region average) reference rates are now 2.1, 1.65, and 1.35 for the elevated areas and 0.7, 0.82, and 0.9 for the background areas.

To allow for sampling variability, each simulation case was replicated 100 times. The results presented are averaged over these 100 replications. A total of 36 simulation scenarios were investigated, corresponding to three spatial patterns (Simu 1, 2, and 3) x three different magnitudes of elevated risk ($\theta$ = 3, 2, and 1.5) x four SFs for the expected counts $E_i$ (SF = 1, 2, 4, and 10).

Bayesian Disease-Mapping Models

Bayesian disease-mapping models treat the relative risks $\{\theta_i\}$ as random variables and specify a distribution for them. This part of the model is crucial, as the distributional assumptions thus made allow borrowing of information across the areas.
The distribution specified is referred to as the second hierarchical level of the model to distinguish it from the first-level distribution specified in equation 1 that pertains to the random sampling variability of the observed counts about their local mean. It is at this second level that the spatial dependence between the relative risks is introduced. This spatial dependence is represented by means of a prescribed neighborhood graph that defines the set of neighbors (denoted by $\partial i$) for each area $i$. The most commonly used parametric model [...]

$$p(v_i \mid v_j, j \neq i) \sim N\left(\bar{v}_i, \sigma^2 / n_i\right), \qquad [2]$$

where $\sigma^2$ is an unknown variance parameter, and $\bar{v}_i = \sum_{j \in \partial i} v_j / n_i$, where $n_i$ is the number of neighbors of area $i$. Thus, essentially the log relative risk in one area is influenced by the average log relative risk of its neighbors, with variability characterized by a conditional variance $\sigma^2 / n_i$. This CAR model makes a strong spatial assumption and has only one free parameter linked to the conditional variance $\sigma^2$. To increase flexibility, Besag et al. (1991) recommend modeling $\log(\theta_i)$ as the sum of a CAR process and an unstructured exchangeable component $\delta_i \sim N(0, \tau^2)$, $i = 1, \ldots, n$, independently:

$$\log(\theta_i) = v_i + \delta_i \qquad [3]$$

This is the BYM model introduced by Besag et al.
(1991) that we referred to earlier. We use this model as a benchmark, as its use in disease-mapping studies has been widespread since 1991. The Gaussian distribution used in the CAR specification above induces a high level of smoothness. In the same article, Besag et al. (1991) discussed an alternative specification using the heavier-tailed, double-exponential distribution rather than the Gaussian distribution in Equation 2. In effect, this is similar to performing a median-based local smoothing (or L1 norm) rather than a mean-based smoothing, thus allowing more abrupt changes in the geographical pattern of risk. We will refer to this model as L1-BYM.

With any such parametric specification, the amount of smoothing performed (e.g., controlled by the parameters $\sigma^2$ and $\tau^2$) is affected globally by all the areas and is not adaptive. Concerns that such parametric models could oversmooth have led several authors to develop semiparametric spatial models that replace the continuously varying spatial distribution for $\{\theta_i\}$ by discrete allocation or partition models. Such models allow discontinuities in the risk surface and make fewer distributional assumptions. Partition models that allow a variable number of clusters have been proposed by Denison and Holmes (2001) and Knorr-Held and Rasser (2000).
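The contrast between mean-based and median-based local smoothing described above can be made concrete with a small sketch. Everything here is illustrative: the four-area neighborhood graph, the value of $\sigma^2$, and the function names are assumptions, and the median is shown only as the local summary that the L1 specification resembles, not as the full L1-BYM prior.

```python
import numpy as np

# Hypothetical neighborhood graph: area 0 borders areas 1, 2, and 3.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
v = np.array([0.0, 0.0, 0.0, np.log(3.0)])   # log relative risks; area 3 has a 3-fold excess
sigma2 = 0.5                                  # assumed value of the CAR variance parameter

def car_conditional(i, v, neighbors, sigma2):
    """Mean and variance of v_i given its neighbors under the Gaussian CAR prior."""
    nb = neighbors[i]
    return v[nb].mean(), sigma2 / len(nb)

def neighbor_median(i, v, neighbors):
    """Median of the neighboring log relative risks (the L1-style local summary)."""
    return float(np.median(v[neighbors[i]]))

cond_mean, cond_var = car_conditional(0, v, neighbors, sigma2)
print(cond_mean, cond_var, neighbor_median(0, v, neighbors))
```

In this toy layout the single high-risk neighbor pulls the Gaussian conditional mean for area 0 up to one third of log 3, whereas the median of the neighbors stays at 0. That is the sense in which a median-based prior tolerates an abrupt change next door without spreading it.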
In this article we investigate the performance of a related spatial mixture model recently proposed by Green and Richardson (2002) that we refer to as MIX. This model leads to good estimation of the relative risks compared with the BYM model for a variety of cases of discontinuities of the risk surface. The idea underlying the MIX model is to replace a continuous model for $\theta_i$ by a mixture model that uses a variable number of risk classes and a spatially correlated allocation model to assign each area to a class. By averaging over a large number of possible configurations, the marginal distribution of the relative risk is nevertheless smooth. To be precise, it is assumed that $\theta_i = \theta_{Z_i}$, where $Z_i$, $i = 1, 2, \ldots, n$, are allocation variables taking values in $\{1, 2, \ldots, k\}$ and $\theta_j$, $j = 1, 2, \ldots, k$, are the values of the relative risks that characterize the k different components or risk classes. To have maximum flexibility, the number of components k of the mixture is treated as unknown. Given k, the allocations $Z_i$ follow a spatially correlated process, the Potts model, which has been used in image processing and other spatial applications and involves a positive interaction parameter $\psi$ (similar to an autocorrelation parameter) that influences the degree of spatial dependence of the allocations. Specifically, the allocation of an area to a risk component will be favored probabilistically by the number of neighbors currently attributed to that component scaled multiplicatively by $\psi$. In this way the prior knowledge that areas close by tend to have similar risks can be reflected through the allocation structure. The interaction parameter $\psi$ is treated as unknown and jointly estimated with the number of components and their associated risk. The MIX model can adapt to various patterns of risk and model discontinuities by creating a new risk class if there is sufficient information in the data to warrant this. Further details on the specification of the model are given in Green and Richardson (2002).
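To make the role of the interaction parameter $\psi$ concrete, the following minimal sketch computes the prior conditional allocation probabilities implied by a Potts model in its standard form, $P(Z_i = j \mid \text{other allocations}) \propto \exp(\psi\, n_{ij})$, where $n_{ij}$ is the number of neighbors of area i currently allocated to component j. The function name, the toy adjacency structure, and the value of $\psi$ are illustrative assumptions and are not taken from the implementation used in the study; in the full MIX model this prior term would also be combined with the likelihood of the area's observed count.

```python
import math

def potts_allocation_probs(area, allocations, neighbors, k, psi):
    """Prior conditional probabilities P(Z_area = j | other allocations), j = 0..k-1.

    Standard Potts form: the weight of component j is exp(psi * n_j), where n_j
    counts the neighbors of `area` currently allocated to component j.
    """
    counts = [0] * k
    for nb in neighbors[area]:
        counts[allocations[nb]] += 1
    weights = [math.exp(psi * n) for n in counts]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy map: four areas in a row, two risk components (0 and 1).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
allocations = [0, 0, 0, 1]

# Area 1 has both of its neighbors in component 0, so component 0 is favored.
probs_area1 = potts_allocation_probs(1, allocations, neighbors, k=2, psi=0.5)

# With psi = 0 there is no spatial interaction and the prior is uniform.
probs_no_interaction = potts_allocation_probs(1, allocations, neighbors, k=2, psi=0.0)
```

Larger values of $\psi$ sharpen the preference for the majority component among the neighbors, which is how the allocation structure encodes the prior belief that nearby areas have similar risks.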
Thus, in the comparison described later, we have implemented one reference model BYM and two alternative models, the parametric L1-BYM and the semiparametric MIX model.

Implementation

Bayesian inference is based on the joint posterior distribution of all parameters given the data. In our case this joint distribution is mathematically intractable and is simulated using the framework of Markov chain Monte Carlo techniques now commonly used in Bayesian analyses (Gilks et al. 1996). All parameters involved in the models described above, for example, the variances $\sigma^2$ or $\tau^2$ or the interaction parameter $\psi$, are given prior distributions at a third level of the hierarchy. Implementation of the BYM and L1-BYM was carried out using the free software WinBUGS (Spiegelhalter et al. 2002). Implementation of the MIX model was carried out using purpose-built Fortran code.

Results

How Smooth Are the Posterior Means?
The results of a Bayesian disease-mapping analysis are typically presented in the form of a map displaying a point estimate (usually the mean or median of the posterior distribution) of the relative risk for each area. To interpret such maps, one needs to understand the extent to which the statistical model is able to smooth the risk estimates to eliminate random noise while at the same time avoiding oversmoothing that might flatten any true variations in risk. To address this issue, we consider the two aspects separately: a) do the Bayesian methods provide adequate smoothing of the background rates, and b) to what extent is the posterior mean estimate different from the background risk in the small number of areas simulated with a true elevated risk?

In all the cases simulated, we found substantial shrinkage of the relative risk estimates for the background rates. This is well illustrated in Figure 1, which displays raw and smooth estimates for all the background areas of Simu 2 and an SF of 1 or 4. Note that when SF = 1, the histogram of the raw standardized mortality or morbidity ratio (SMR) estimates is very dispersed (Figure 1A), with a range of 0-11, and shows a skewed distribution. Clearly, mapping the raw SMRs would present a misleading picture of the risk pattern, whereas any of the three Bayesian models give posterior mean relative risk estimates for the background areas that are well centered on 1 (Figure 1B-D), with just a few areas having estimates outside the 0.9-1.1 range. When the expected counts are higher (SF = 4), the histogram of the raw SMRs is less spread but still substantially overdispersed, whereas those corresponding to the three models are even more concentrated on 1 than when SF = 1 (Figure 1F-H). Thus the false patterns created by the Poisson noise are adequately smoothed out by all the disease-mapping models.

[FIGURE 1 OMITTED]

Details of the performance of the BYM model in estimating the relative risk of the high-risk areas are presented in Table 1, with findings for L1-BYM and MIX shown in Tables 2 and 3, respectively. Overall, for the BYM model, a great deal of smoothing of the relative risks is apparent. For the isolated areas in Simu 1, one can see that relative risks of 1.5 in any single area are smoothed away, even in the most favorable case of an area with expected counts of 70 (90% area with SF = 10).
When the simulated relative risk is 2, the posterior mean risk estimate is above 1.2 only when the expected count is around 50 or more (e.g., 75% area with SF = 10). Relative risks of 3 are smoothed to about half their values when the expected counts are around 10 (e.g., 25% area with SF = 10 or 75% area with SF = 2). Comparison of Simu 2 with Simu 1 (75% area) shows that having a cluster of high-risk areas rather than a single area with elevated risk slightly decreases the amount of smoothing for the same average expected count. Again, this is apparent in the many-cluster situation of Simu 3, where even though the true $\theta_i^*$ are smaller, the relative risk estimates are higher than those for Simu 2.

Overall, the performance of the L1-BYM model (Table 2) is similar to that of the BYM model. However, as expected, the L1-BYM model effects a little less smoothing in cases of large expected counts or high relative risk estimates. For Simu 3 the estimates are nearly identical to those of the BYM model. Thus, simply changing the distributional assumptions in the autoregressive specifications results in only a small modification in the estimates.

The results for the MIX model given in Table 3 show a different pattern than those for the BYM or L1-BYM. For Simu 1 and an elevated relative risk of 1.5, strong smoothing toward 1 is apparent as for BYM. However, for Simu 2, posterior mean relative risks become higher than 1.2 for the largest SF. At the other end of the spectrum, relative risks of 3 are well estimated with posterior means above 2.5 as soon as the expected count is above 10 either for single areas (e.g., 50% area with SF = 4) or for the 1% clustered areas with SF = 2. These results are in accordance with the nature of the MIX model.
When there is sufficient evidence in the data to create a group of areas with higher risk, the posterior mean risks for the areas in this group are well estimated and close to the simulated values. Otherwise, all areas are allocated to the background category and smoothed toward 1. Having many heterogeneous clusters as in Simu 3 does not improve the MIX performance as much as that of BYM. Because of the more diffuse nature of some of the clusters, more areas in the background are randomly included in the group of areas with higher risk. Thus, the MIX model still has a mode close to the true relative risk, but the histogram of the mean posterior risks for all the high-risk areas has a longer left-hand tail than in the Simu 2 scenario (Figure 2).

[FIGURE 2 OMITTED]

The difference in performance of the three models is further illustrated in Figure 3, which displays, for the three models, box plots of the posterior mean estimates of the relative risk in the raised-risk areas over the 100 replicates for Simu 2 with true relative risks of 3 and 2. When the true relative risk is 3, the MIX model is clearly performing better than the other two models, whereas for a relative risk of 2 and the lowest SF, the MIX model is the model that produces the most smoothing.
[FIGURE 3 OMITTED]

Interpreting the Posterior Distribution of the Risk

Mapping the posterior mean relative risk as discussed previously does not make full use of the output of the Bayesian analysis that provides, for each area, samples from the whole posterior distribution of the relative risk. Mapping the probability that a relative risk is greater than a specified threshold of interest has been proposed by several authors [e.g., Clayton and Bernardinelli (1992)]. We carry this further and investigate the performance of decision rules for classifying an area $A_i$ as having an increased risk based on how much of the posterior distribution of $\theta_i$ exceeds a reference threshold. Figure 4 presents an example of the posterior distribution of the relative risk for such an area. The shaded proportion corresponds to the posterior probability that $\theta > 1$. To be precise, to classify any area as having an elevated risk, we define the decision rule $D(c, R_0)$, which depends on a cutoff probability c and a reference threshold $R_0$ such that area $A_i$ is classified as having an elevated risk according to $D(c, R_0) \Leftrightarrow \mathrm{Prob}(\theta_i > R_0) > c$.
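The decision rule can be sketched in a few lines of code. This is a minimal illustration, assuming the posterior distribution of an area's relative risk is available as a list of MCMC samples; the function names and the ten sample values below are hypothetical and are not output from the study.

```python
def exceedance_probability(samples, r0):
    """Estimate Prob(theta > r0) as the fraction of posterior samples above r0."""
    return sum(1 for s in samples if s > r0) / len(samples)

def classify_elevated(samples, c, r0):
    """Decision rule D(c, r0): declare the area elevated if Prob(theta > r0) > c."""
    return exceedance_probability(samples, r0) > c

# Hypothetical posterior samples of the relative risk for one area.
samples = [0.9, 1.1, 1.2, 1.3, 1.4, 1.05, 0.95, 1.6, 1.25, 1.15]

prob_above_1 = exceedance_probability(samples, 1.0)   # 8 of 10 samples exceed 1
elevated_strict = classify_elevated(samples, 0.8, 1.0)   # D(0.8, 1)
elevated_relaxed = classify_elevated(samples, 0.7, 1.0)  # D(0.7, 1)
```

Because the inequality is strict, an area whose estimated exceedance probability equals the cutoff exactly is not classified as elevated; in this toy example the area fails D(0.8, 1) but passes D(0.7, 1).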
The appropriate rules to investigate will depend on the shape of the posterior distribution of $\theta_i$ for the elevated areas. We first discuss rules adapted to the autoregressive BYM and L1-BYM models. For these two models we have seen that, in general, the mean of the posterior distribution of $\theta_i$ in the raised-risk areas is greater than 1 but rarely above 1.5 in many of the scenarios investigated. Thus, it seems sensible to take $R_0 = 1$ as a reference threshold. We would also expect the bulk of the posterior distribution to be shifted above 1 for these areas, suggesting that cutoff probabilities well above 0.5 are indicated. In the first instance, we choose c = 0.8. Thus, for the BYM and L1-BYM models, we report results corresponding to the decision rule D(0.8, 1). See Appendix B for a detailed justification of this choice of value of c and the performance of different decision rules.

[FIGURE 4 OMITTED]

In contrast, we have seen that the mean of the posterior distribution of $\theta_i$ for raised-risk areas for the MIX model is closer to the true value for many scenarios, and there is clear indication that the upper tail of this distribution can be well above 1. Furthermore, the spread of this distribution is less than the corresponding one for the BYM or L1-BYM models, as noted by Green and Richardson (2002). The choice of threshold is thus more crucial for this model, making it harder to find an appropriate decision rule. After some exploratory analyses of the simple clusters in Simu 1 and Simu 2, we found that a suitable decision rule for the MIX model in these two scenarios is to choose $R_0 = 1.5$. For such a high threshold, one would expect that it is enough for a small fraction (e.g., 5 or 10%) of the posterior distribution of $\theta_i$ to be above 1.5 to indicate that an area has elevated risk. Thus, for the MIX model we report results corresponding to the decision rule D(0.05, 1.5).
Two types of errors are associated with any decision rule: a) a false-positive result, that is, declaring an area as having elevated risk when in fact its underlying true rate equals the background level (an error also traditionally referred to as type I error or lack of specificity); and b) a false-negative result, that is, declaring an area to be in the background when in fact its underlying rate is elevated (an error also referred to as type II error or lack of sensitivity). In epidemiology, performances are discussed either by reporting these error rates or by reporting their complementary quantities that measure the success rates of the decision rule. The two goals of disease mapping can be summarized as follows: not to overinterpret excesses arising by chance, that is, to minimize the false-positive rate, but to detect patterns of true heterogeneity, that is, to maximize the sensitivity. We thus choose to report these two easily interpretable quantities. To be precise, for any decision rule $D(c, R_0)$, we compute
* the false-positive rate (or 1 - specificity), that is, the proportion of background areas falsely declared elevated by the decision rule $D(c, R_0)$
* the sensitivity (or 1 - false-negative rate), that is, the proportion of areas generated with elevated rates correctly declared elevated by the decision rule $D(c, R_0)$.
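These two quantities are simple proportions, as the following minimal sketch shows. It assumes that, for one simulated map, we hold one boolean per area recording the decision and one recording the simulated truth; the function name and the ten-area example are hypothetical.

```python
def error_rates(declared_elevated, truly_elevated):
    """Return (false_positive_rate, sensitivity) for one set of area-level decisions.

    declared_elevated[i]: True if the decision rule flags area i as elevated.
    truly_elevated[i]:    True if area i was simulated with an elevated risk.
    """
    background = [d for d, t in zip(declared_elevated, truly_elevated) if not t]
    raised = [d for d, t in zip(declared_elevated, truly_elevated) if t]
    false_positive_rate = sum(background) / len(background)  # 1 - specificity
    sensitivity = sum(raised) / len(raised)                  # 1 - false-negative rate
    return false_positive_rate, sensitivity

# Hypothetical map: eight background areas followed by two raised-risk areas.
truly_elevated = [False] * 8 + [True] * 2
declared_elevated = [False] * 7 + [True] + [True, False]

false_positive_rate, sensitivity = error_rates(declared_elevated, truly_elevated)
```

In a simulation study such as this one, these per-map proportions would then be averaged over replicates.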
It is clear that there must be a compromise between these two goals: a stricter rule (i.e., one with a higher value of c or $R_0$ or both) reduces the false-positive rate but also decreases the sensitivity and thus increases the false-negative rate. Thus, to judge the performance of any decision rule, one has to consider both types of errors, not necessarily equally weighted. See Appendix B for an illustration of the implication of different weighting on the overall performance of the decision rule. Table 4 summarizes the probabilities of false-positive results for the three models. For BYM and L1-BYM, the probabilities stay below 10% with no discernible pattern for Simu 1 and Simu 2. The error rates are
clearly smaller and around 3% for Simu 3. In this scenario, the
background relative risk is shifted below 1, so a decision rule with
[R.sub.0] = 1 is, in effect, a more stringent rule than in the case of
Simu 1 and Simu 2 where the background relative risks are close to 1.
For the MIX model, the false-positive rates are quite low for Simu 1 and
Simu 2 and stay mostly below 3%. However, as shown in the last line of
Table 4, these rates have greatly increased for the Simu 3 scenario,
indicating that the decision rule D(0.05, 1.5) is no longer appropriate
in this heterogeneous context. The heterogeneity creates a lot of
uncertainty, with some background areas being grouped with nearby
high-risk areas; consequently, the rule D(0.05, 1.5) is not stringent
(specific) enough. Thus, we have investigated a series of rules D(c,
1.5) for c = 0.1-0.4 for the MIX model in the Simu 3 scenario. As c
increases, the probability of a false positive decreases; for D(0.4, 1.5),
the probability is, on average, around 3% and always below 7% (Table 5).

Concerning the detection of truly increased relative risks and sensitivity, we first discuss the results for the BYM and L1-BYM models. As expected from the posterior means shown in Tables 1 or 2, the ability to detect true increased risk areas is limited when the increase is only of the order of 1.5. If one takes as a guideline the cases where the detection of true positives is 50% or more, Tables 6 and 7 show that this sensitivity is reached for an expected count of around 50 in the case of a single isolated area and around 20 for the 1% cluster scenario. This shows that for rare diseases and small areas, there is little chance of detecting increased risks of around 1.5 while adequately controlling the false-positive rate. True relative risks of 2 are detected with at least 75% probability when expected counts are between 10 and 20 per area, depending on the spatial structure of the risk surface, whereas true relative risks of 3 are detected almost certainly when expected counts per area are 5 or more. There is no clear pattern of difference between the results for BYM and L1-BYM; overall, the sensitivity is similar. For Simu 3 we see that the sensitivity is lower than for the other simulation scenarios with equivalent expected counts (as were the rates of false positives in Table 4), in line with the true relative risks being closer to 1 than for Simu 1 and Simu 2. Hence, the decision rule D(0.8, 1) is more specific but less sensitive in this scenario. In situations comprising a large degree of heterogeneity akin to Simu 3, it thus might be advantageous to consider alternative rules, even if the rate of false positives is less well controlled.
For example, in the case of a true relative risk $\theta = 1.65$ and SF = 4, the use of rule D(0.7, 1) for the BYM model leads to a higher probability of false positives (6% compared with the 3% shown in Table 4). However, the corresponding gain in sensitivity is more than 10%, with the probability of detecting a true positive increasing to 82% compared with 71% when using the rule D(0.8, 1) (Table 5). Nevertheless, even with this relaxed and more sensitive rule, the chance of detecting a true relative risk as small as 1.3 is only around 50% if the SF is 4 (i.e., average cluster with total expected count around 80). On the other hand, true relative risks of around 2 are detected with high probability as soon as the SF is 2 (which corresponds, on average, to a cluster with total expected count of 40).

The contrasting behavior of the MIX model is again apparent in Table 8 when one compares the results for the $\theta = 1.5$ scenario with the other columns. For Simu 1 and Simu 2 the sensitivity is generally below that of the BYM model, especially when the true relative risk is 1.5; single clusters with $\theta = 1.5$ are simply not detected. In the 1% cluster case, expected counts of at least 20 (10) are necessary to be over 95% certain of detecting a true relative risk of 2 (3) (Table 8). Note that the results of the last line of Table 8 should be discounted in view of the high probability of false-positive results corresponding to this scenario (Simu 3) for the D(0.05, 1.5) rule shown in Table 4. Thus, it is apparent that for the MIX model, it is hard to calibrate a good decision rule appropriate for a variety of spatial patterns of elevated risk. In Table 5 we summarize the results corresponding to the decision rule D(0.4, 1.5), which offers a reasonable compromise between keeping the rate of false positives below 7% and an acceptable detection rate of true clusters. With this rule true relative risks of 1.65 with an SF of 2 (i.e., average cluster with total expected count slightly under 40) or larger have more than a 50% chance of being detected, and true relative risks of around 2 are nearly always detected. However, this model does not detect a true relative risk as small as 1.3.

Discussion

This comprehensive simulation study highlights some important points to be considered in interpreting any disease-mapping exercise based on hierarchical Bayesian procedures. First, the necessary control of false positives is indeed achieved using any of the models described. However, this is accompanied by a strong smoothing effect that renders the detection of localized increases in risk nearly impossible if these are not based on large (3-fold or more) excess risks or, in the case of more moderate (2-fold) excess risks, substantial expected counts of approximately 50 or more. Thus, in any study it is important to report the range of expected counts across the map and to calibrate any conclusions regarding the relative risks with respect to these expected counts.

In general, Bayes procedures offer a tradeoff between bias and variance reduction of the estimates. Particularly in cases where the sample size is small, they produce a set of point estimates that have good properties in terms of minimizing squared error loss (Carlin
and Louis 2000). This variance reduction is attained through borrowing information resulting from the adopted hierarchical structure, leading to Bayes point estimates shrunk toward a value related to the distribution of all the units included in the hierarchical structure. The effect of shrinkage is thus dependent on the prior structure that has been assumed and conditional on the latter being close to the true model in some sense. Consequently, different prior structures will lead to different shrinkage. Note that the desirable properties of the estimates thus obtained will depend on the ultimate goal of the estimation exercise. If producing a set of point estimates of the relative risk is the aim, then posterior means of the relative risk are best in squared error loss terms. However, if the goal is to estimate the histogram or the ranks of the area relative risks, different loss functions should be considered. The desirability and difficulty of simultaneously achieving these triple goals has been discussed by Shen and Louis (1998) and has been illustrated in spatial case studies by Conlon and Louis (1999) and Stern and Cressie (1999). In our study, we focus on the goal of estimating the overall spatial pattern of risk, which involves producing and interpreting a set of point estimates that will not only give a good indication of the presence of heterogeneity in the relative risks but also highlight where on the map this heterogeneity arises and whether this is linked to isolated high- and/or low-risk areas or to more general spatial aggregation of areas of similar high or low risk.
Inference about the latter will depend on the sensitivity and specificity of the posterior risk estimates, as discussed in this article. If the goal is purely the testing of heterogeneity, other methods could be used, such as the Potthoff-Whittinghill test or scan statistics [see Wakefield et al. (2000) for review] that test for particular prespecified patterns of overdispersion. Conversely, if the aim is a local study around a point source, then again, the disease-mapping framework is not appropriate, and focused models that make use of the additional information about the location of the putative cluster of high risk are required (Morris and Wakefield 2000).

We have shown that besides reporting and mapping the mean posterior relative risk, the whole posterior distribution can be usefully exploited to try to detect true raised-risk areas. For the BYM model, a decision rule based on computing the probability that the relative risk is above 1 with a cutoff between 70 and 80% gives a specific rule. With this type of rule an average expected count of 20 in each of the raised-risk areas leads to a 50% chance of detecting a true relative risk of 1.5, but at least a 75% chance if the true relative risk is 2. For the same scenarios, the posterior mean relative risks are 1.05 and 1.23, respectively, showing that the posterior probabilities rather than the mean posterior relative risks are crucial for interpreting results from the BYM model.
On the other hand, 3-fold increases in the relative risk are detected almost certainly with average expected counts of only 5 per area, although the mean of the posterior distribution is typically smoothed to about half the true excess. Note that the performance of the BYM model does improve when the risk is raised in a small group of contiguous areas with similar expected counts rather than in a single area because of the way spatial correlation is taken into account in these models.

We found no notable difference in performance between the BYM model, which uses a Gaussian distribution, and the L1-BYM version, which uses a heavier-tailed, double-exponential distribution. This finding is in agreement with that of an earlier simulation study (Best et al. 1999) that compared these two models. However, there were some clear differences between the BYM models and the spatial allocation model MIX. The performance of the latter model is characterized by an all-or-none feature in the sense that it tends to allocate the true raised-risk areas to either an elevated risk group or to a background group, depending on how much uncertainty is present in the data. If the information from the data is sufficient (i.e., moderate-size expected counts and/or high true excess risks), the MIX model is able to separate the raised-risk and background areas quite well, producing considerably less smoothing of the raised-risk estimates than BYM. When the information in the data is sparse, uncertainty in the groupings leads to more smoothing than BYM. This type of dichotomy makes any decision rule exploiting the posterior distribution of the relative risks hard to calibrate and less useful than for the BYM model.
The MIX model is best used for providing estimates of the underlying magnitude of the relative risks if those are clearly raised rather than as a tool for detecting the presence of areas with excess risk in a decision rule context.

Conclusion

We have quantified to what extent some usual and some more recently developed Bayesian disease-mapping models are conservative, in the sense that they have low sensitivity for detecting raised-risk areas that have only a small excess risk but that, conversely, any identified patterns of elevated risk are, on the whole, specific. We would view this amount of conservatism as a positive feature, as we wish to avoid false alarms when investigating spatial variation in disease risk. However, the magnitude of the risk in any areas identified as raised is likely to be considerably underestimated, and it is worth investigating a range of spatial priors that produce different amounts of smoothing. Given that most environmental risks are small, it is clear that such methods are seriously underpowered to detect them. This represents a major limitation of the small-area disease-mapping approach, although exploiting the full posterior distribution of the relative risk estimates using the decision rules proposed here can improve the discrimination between areas with background and elevated rates. For localized excesses where the geographic source of the risk can be hypothesized, these methods are not appropriate, and focused tests should be used instead. Future applications of small-area disease-mapping methods should therefore consider carefully the tradeoff between size of the areas, size of the expected counts, and the anticipated magnitude and spatial structure of the putative risks. Recently proposed multivariate
extensions of Bayesian disease-mapping models (e.g., Gelfand and Vounatsou 2003; Knorr-Held and Best 2001) also deserve further consideration, as they may lead to improved power by enabling risk estimates to borrow information across multiple diseases that share similar etiologies as well as across areas. Appendix A Generation of the observed cases for Simu 3 using the multinomial distribution In probability theory, the multinomial distribution is a generalization of the binomial distribution. The binomial distribution is the probability distribution of the number of "successes" in n . For Simu 3, the number of cases for each of the 532 areas is generated using the multinomial distribution as follows: [Y.sub.i] ~ Multinomial [N, [E.sub.i][[theta].sub.i]/[summation summation n. the final argument of an attorney at the close of a trial in which he/she attempts to convince the judge and/or jury of the virtues of the client's case. (See: closing argument) over (i)][E.sub.i][[theta].sub.i]], where Nis the total number of cases in the study region and is set equal (to the nearest integer integer: see number; number theory ) to the sum of the expected counts across all 532 areas. Hence N = 1,732 for the SF = 1 scenario and appropriate multiples of this for the other SFs. The parameter [[theta].sub.i] represents the relative risk in area i relative to some nominal external reference rate. However, the constraint Constraint A restriction on the natural degrees of freedom of a system. If n and m are the numbers of the natural and actual degrees of freedom, the difference n - m is the number of constraints. [[sigma].sub.i] [[Y].sub.i] = N = [[sigma].sub.i] [E.sub.i] imposed by the multinomial sampling effectively rescales the true relative risk in each area to be [[theta].sub.i] = [[theta].sub.i] N/[summation over (i)] [E.sub.i][[theta].sub.i]. The interpretation of [[theta].sub.i] is the relative risk in area i relative to the average risk in the study region. 
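The sampling scheme of Appendix A can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the code used in the study: the function name `simulate_cases`, the four-area toy data and the seed are hypothetical, and NumPy's `Generator.multinomial` stands in for whatever sampler the authors used.

```python
import numpy as np


def simulate_cases(expected, theta, rng):
    """Draw multinomial case counts and return the rescaled risks.

    expected: expected counts E_i, one per area
    theta:    relative risks theta_i against the external reference
    rng:      a numpy.random.Generator
    """
    expected = np.asarray(expected, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # N is the sum of the expected counts, to the nearest integer.
    n_total = int(round(expected.sum()))

    # Cell probabilities E_i * theta_i / sum_i(E_i * theta_i).
    weights = expected * theta
    probs = weights / weights.sum()

    counts = rng.multinomial(n_total, probs)

    # Rescaled risk theta*_i = theta_i * N / sum_i(E_i * theta_i).
    theta_star = theta * n_total / weights.sum()
    return counts, theta_star


# Toy example (made-up numbers): four areas, one with a 3-fold risk.
expected = [2.0, 3.0, 5.0, 10.0]
theta = [1.0, 1.0, 3.0, 1.0]
counts, theta_star = simulate_cases(expected, theta, np.random.default_rng(2004))

print(counts.sum())   # always equals N = 20 here
print(theta_star)     # the raised area is rescaled from 3 down to 2
```

In this toy example the nominal 3-fold risk becomes a 2-fold risk relative to the regional average, and the background areas fall below 1. The same mechanism explains why the Simu 3 columns in the tables are labelled 1.35, 1.65 and 2.1 rather than 1.5, 2 and 3.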
Appendix B

Tradeoff between false-positive and false-negative rates for different decision rules.

Figure B1 shows three different loss functions representing weighted tradeoffs between the two types of errors, false positive and false negative, associated with the D(c, 1) decision rule for detecting raised-risk areas using the BYM model, plotted against cutoff c. Defining, as in the text, the false-negative rate to be the probability of failing to detect a true raised risk (i.e., 1 - sensitivity) and the false-positive rate to be the probability of false detection of a background area as corresponding to a raised risk (i.e., 1 - specificity), the three loss functions used are as follows:

Loss function 1 = (false negative + false positive) / 2, with each error being equally weighted.

Loss function 2 = [(2 x false negative) + false positive] / 3, where we weight the false-negative error as twice as bad as the lack of specificity.

Loss function 3 = [false negative + (2 x false positive)] / 3, where we weight the lack of specificity as twice as bad as the false negative.

We wish to choose c to minimize the losses, and the graphs show that, on average, a value of around 0.7-0.8 is appropriate. Note that the plots in Figure B1 are based on Simu 2 with SF = 2 or 4. For a small number of other scenarios (mainly with SF = 1), a value of c < 0.7 was needed to minimize the loss. However, for consistency, we have used the same value of c (= 0.8) for all the BYM and L1-BYM results presented in this article.
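The three loss functions of Appendix B are weighted averages of the two error rates, so the choice of cutoff can be sketched as a small grid search. The code below is a hedged illustration only: the function names are hypothetical, and the error-rate curves are made-up numbers chosen to show the mechanics, not values read from Figure B1.

```python
import numpy as np


def weighted_loss(false_neg, false_pos, w_neg=1.0, w_pos=1.0):
    """Weighted average of the false-negative and false-positive rates.

    (w_neg, w_pos) = (1, 1), (2, 1) and (1, 2) give loss functions
    1, 2 and 3 of Appendix B, respectively.
    """
    false_neg = np.asarray(false_neg, dtype=float)
    false_pos = np.asarray(false_pos, dtype=float)
    return (w_neg * false_neg + w_pos * false_pos) / (w_neg + w_pos)


def best_cutoff(cutoffs, false_neg, false_pos, w_neg=1.0, w_pos=1.0):
    """Return the cutoff c on the grid that minimizes the weighted loss."""
    losses = weighted_loss(false_neg, false_pos, w_neg, w_pos)
    return float(np.asarray(cutoffs)[np.argmin(losses)])


# Illustrative error rates on a grid of cutoffs (made-up numbers):
# raising c misses more true raised-risk areas but flags fewer
# background areas.
cutoffs = [0.5, 0.6, 0.7, 0.8, 0.9]
false_neg = [0.18, 0.25, 0.30, 0.38, 0.55]
false_pos = [0.35, 0.22, 0.10, 0.05, 0.02]

for name, (w_neg, w_pos) in {"loss 1": (1, 1), "loss 2": (2, 1), "loss 3": (1, 2)}.items():
    print(name, best_cutoff(cutoffs, false_neg, false_pos, w_neg, w_pos))
```

In a real calibration, the two error-rate curves would come from applying the D(c, 1) rule to simulated data sets at each value of c, as done for Figure B1.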
Small Area Health Statistics, Department of Epidemiology and Public Health, Imperial College Faculty of Medicine, Imperial College London, Norfolk Place, London, United Kingdom

This article is part of the mini-monograph "Health and Environment Information Systems for Exposure and Disease Mapping, and Risk Assessment."

Address correspondence to S. Richardson, Department of Epidemiology and Public Health, Imperial College Faculty of Medicine, Imperial College London, Norfolk Place, London, W2 1PG, United Kingdom. Telephone: 44 0 207 594 3336. Fax: 44 0 207 402 2150. E-mail: sylvia.richardson@imperial.ac.uk

We thank P. Green for stimulating discussions and for providing the computer code of the MIX model.

The U.K. Small Area Health Statistics Unit is funded by the Department of Health, Department of the Environment, Food and Rural Affairs, Environment Agency, Health and Safety Executive, Scottish Executive, National Assembly for Wales, and the Northern Ireland Assembly.

The authors declare they have no competing financial interests.

Received 12 September 2003; accepted 2 March 2004.

REFERENCES

Alexander FE, Boyle P, eds. 1996. Methods for Investigating Localised Clustering of Disease. IARC Sci Publ 135:1-247.
Besag J, York J, Mollie A. 1991. Bayesian image restoration with applications in spatial statistics. Ann Inst Math Stat 43:1-20.
Best NG, Arnold RA, Thomas A, Waller LA, Conlon EM. 1999. Bayesian models for spatially correlated disease and exposure data (with discussion). In: Bayesian Statistics 6 (Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds). Oxford:Oxford University Press, 131-156.
Carlin BP, Louis TA. 2001. Bayes and Empirical Bayes Methods. 2nd ed. London:Chapman and Hall.
Clayton D, Bernardinelli L. 1992. Bayesian methods for mapping disease risk. In: Geographical and Environmental Epidemiology: Methods for Small Area Studies (Elliott P, Cuzick J, English D, Stern R, eds). Oxford:Oxford University Press, 205-220.
Clayton D, Bernardinelli L, Montomoli C. 1993. Spatial correlation in ecological analysis. Int J Epidemiol 22:1193-1202.
Conlon EM, Louis TA. 1999. Addressing multiple goals in evaluating region-specific risk using Bayesian methods. In: Disease Mapping and Risk Assessment for Public Health (Lawson A, Biggeri A, Bohning D, Lesaffre E, Viel JF, Bertollini R, eds). Chichester, UK:John Wiley
Denison DGT, Holmes CC. 2001. Bayesian partitioning for estimating disease risk. Biometrics 57:143-149.
Doll R. 1980. The epidemiology of cancer. Cancer 45:2475-2485.
Elliott P, Wakefield JC, Best NG, Briggs DJ. 2000. Spatial Epidemiology: Methods and Applications. Oxford:Oxford University Press.
Gelfand A, Vounatsou P. 2003. Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics 4:11-15.
Gilks WR, Richardson S, Spiegelhalter DJ. 1996. Markov Chain Monte Carlo in Practice. London:Chapman and Hall.
Green PJ, Richardson S. 2002. Hidden Markov models and disease mapping. J Am Stat Assoc 97:1055-1070.
Jarup L, Best N, Toledano M, Wakefield J, Elliott P. 2002. Geographical epidemiology of prostate cancer in Great Britain. Int J Cancer 97:695-699.
Knorr-Held L, Best NG. 2001. A shared component model for detecting joint and selective clustering of two diseases. J R Stat Soc [Ser A] 164:73-85.
Knorr-Held L, Rasser G. 2000. Bayesian detection of clusters and discontinuities in disease maps. Biometrics 56:13-21.
Lawson AB, Biggeri AB, Boehning D, Lesaffre E, Viel JF, Clark A, et al. 2000. Disease mapping models: an empirical evaluation. Stat Med 19:2217-2241.
Morris SE, Wakefield JC. 2000. Assessment of disease risk in relation to a pre-specified source. In: Spatial Epidemiology: Methods and Applications (Elliott P, Wakefield JC, Best NG, Briggs DJ, eds). Oxford:Oxford University Press.
Shen W, Louis TA. 1998. Triple-goal estimates in two-stage hierarchical models. J R Stat Soc [Ser B] 60:455-471.
Spiegelhalter DJ, Thomas A, Best NG, Lunn DJ. 2002. WinBUGS: Bayesian Inference Using Gibbs Sampling Manual, Version 1.4. London:Imperial College; Cambridge, UK:MRC Biostatistics Unit. Available: http://www.mrc-bsu.cam.ac.uk/bugs [accessed 17 January 2004].
Stern H, Cressie N. 1999. Inference for extremes in disease mapping. In: Disease Mapping and Risk Assessment for Public Health (Lawson A, Biggeri A, Bohning D, Lesaffre E, Viel JF, Bertollini R, eds). Chichester, UK:John Wiley & Sons, 63-84.
Wakefield JC, Kelsall JE, Morris SE. 2000. Clustering, cluster detection, and spatial variation in risk. In: Spatial Epidemiology: Methods and Applications (Elliott P, Wakefield JC, Best NG, Briggs DJ, eds). Oxford:Oxford University Press, 128-152.

Sylvia Richardson, Andrew Thomson
Table 1. Posterior mean relative risk estimates for the raised-risk
areas for the BYM model (average over replicate data sets).
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.02 1.06
25% area (E = 1.10) 1.03 1.04 1.10
50% area (E = 1.92) 1.02 1.05 1.15
75% area (E = 5.37) 1.03 1.05 1.31
90% area (E = 7.38) 1.03 1.07 1.34
Simu 2
1% cluster (E = 5.42) 1.04 1.08 1.45
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.04 1.23 1.63
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.02 1.12
25% area (E = 1.10) 1.00 1.03 1.15
50% area (E = 1.92) 1.00 1.05 1.28
75% area (E = 5.37) 1.03 1.07 1.55
90% area (E = 7.38) 1.03 1.10 1.62
Simu 2
1% cluster (E = 5.42) 1.04 1.14 1.76
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.07 1.3 1.74
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.03 1.20
25% area (E = 1.10) 1.01 1.05 1.28
50% area (E = 1.92) 1.02 1.08 1.46
75% area (E = 5.37) 1.04 1.12 1.86
90% area (E = 7.38) 1.04 1.15 2.07
Simu 2
1% cluster (E = 5.42) 1.05 1.23 2.11
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.12 1.38 1.84
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.07 1.40
25% area (E = 1.10) 1.02 1.09 1.52
50% area (E = 1.92) 1.03 1.16 1.79
75% area (E = 5.37) 1.05 1.33 2.35
90% area (E = 7.38) 1.07 1.40 2.47
Simu 2
1% cluster (E = 5.42) 1.09 1.45 2.43
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.19 1.48 1.95
(E range: 0.77-11.6)
Table 2. Posterior mean relative risk estimates for the raised-risk
areas for the L1-BYM model (average over replicate data sets).
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.02 1.05
25% area (E = 1.10) 1.01 1.03 1.11
50% area (E = 1.92) 1.01 1.03 1.16
75% area (E = 5.37) 1.02 1.05 1.32
90% area (E = 7.38) 1.04 1.07 1.48
Simu 2
1% cluster (E = 5.42) 1.04 1.08 1.45
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.04 1.22 1.61
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.02 1.12
25% area (E = 1.10) 1.00 1.04 1.15
50% area (E = 1.92) 1.00 1.05 1.28
75% area (E = 5.37) 1.03 1.08 1.56
90% area (E = 7.38) 1.03 1.13 1.93
Simu 2
1% cluster (E = 5.42) 1.04 1.14 1.76
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.07 1.29 1.74
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.02 1.16
25% area (E = 1.10) 1.00 1.06 1.24
50% area (E = 1.92) 1.01 1.08 1.55
75% area (E = 5.37) 1.03 1.13 1.98
90% area (E = 7.38) 1.05 1.25 2.43
Simu 2
1% cluster (E = 5.42) 1.05 1.23 2.11
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.12 1.38 1.85
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.07 1.21
25% area (E = 1.10) 1.03 1.09 1.35
50% area (E = 1.92) 1.03 1.17 2.22
75% area (E = 5.37) 1.05 1.35 2.67
90% area (E = 7.38) 1.08 1.60 2.72
Simu 2
1% cluster (E = 5.42) 1.09 1.45 2.43
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.19 1.49 1.97
(E range: 0.77-11.6)
Table 3. Posterior mean relative risk estimates for the raised-risk
areas for the MIX model (average over replicate data sets).
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.00 1.01 1.02
25% area (E = 1.10) 1.00 1.02 1.09
50% area (E = 1.92) 1.00 1.02 1.25
75% area (E = 5.37) 1.00 1.03 1.57
90% area (E = 7.38) 1.00 1.03 1.60
Simu 2
1% cluster (E = 5.42) 1.02 1.06 1.98
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.02 1.19 1.55
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.00 1.02 1.27
25% area (E = 1.10) 1.00 1.01 1.17
50% area (E = 1.92) 1.00 1.04 1.88
75% area (E = 5.37) 1.00 1.07 2.44
90% area (E = 7.38) 1.01 1.09 2.46
Simu 2
1% cluster (E = 5.42) 1.01 1.25 2.66
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.05 1.31 1.64
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.00 1.01 1.53
25% area (E = 1.10) 1.00 1.05 1.80
50% area (E = 1.92) 1.00 1.23 2.78
75% area (E = 5.37) 1.01 1.42 2.91
90% area (E = 7.38) 1.01 1.49 2.91
Simu 2
1% cluster (E = 5.42) 1.03 1.72 2.92
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.12 1.44 1.81
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 1.01 1.10 2.50
25% area (E = 1.10) 1.01 1.22 2.67
50% area (E = 1.92) 1.02 1.72 3.02
75% area (E = 5.37) 1.04 1.87 3.02
90% area (E = 7.38) 1.06 1.89 3.02
Simu 2
1% cluster (E = 5.42) 1.21 1.92 2.98
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 1.31 1.55 2.06
(E range: 0.77-11.6)
Table 4. False-positive rates (1 - specificity) for the three
models. (a)
SF = 1
Background [theta] = 1.5 [theta] = 2 [theta] = 3
BYM
Simu 1 0.08 0.10 0.05
Simu 2 0.07 0.06 0.06
Simu 3 (b) 0.02 0.03 0.02
L1-BYM
Simu 1 0.05 0.09 0.06
Simu 2 0.07 0.09 0.06
Simu 3 (b) 0.04 0.03 0.02
MIX
Simu 1 0.00 0.04 0.00
Simu 2 0.00 0.01 0.11
Simu 3 (b) 0.02 0.51 0.44
SF = 2
Background [theta] = 1.5 [theta] = 2 [theta] = 3
BYM
Simu 1 0.04 0.06 0.04
Simu 2 0.05 0.05 0.06
Simu 3 (b) 0.02 0.03 0.02
L1-BYM
Simu 1 0.06 0.10 0.05
Simu 2 0.05 0.07 0.06
Simu 3 (b) 0.02 0.03 0.02
MIX
Simu 1 0.01 0.04 0.00
Simu 2 0.00 0.04 0.04
Simu 3 (b) 0.02 0.52 0.25
SF = 4
Background [theta] = 1.5 [theta] = 2 [theta] = 3
BYM
Simu 1 0.03 0.08 0.06
Simu 2 0.05 0.05 0.07
Simu 3 (b) 0.03 0.03 0.01
L1-BYM
Simu 1 0.03 0.06 0.06
Simu 2 0.05 0.06 0.06
Simu 3 (b) 0.03 0.03 0.02
MIX
Simu 1 0.03 0.02 0.00
Simu 2 0.00 0.06 0.01
Simu 3 (b) 0.01 0.33 0.12
SF = 10
Background [theta] = 1.5 [theta] = 2 [theta] = 3
BYM
Simu 1 0.03 0.05 0.08
Simu 2 0.04 0.08 0.10
Simu 3 (b) 0.03 0.02 0.01
L1-BYM
Simu 1 0.05 0.05 0.08
Simu 2 0.04 0.07 0.08
Simu 3 (b) 0.03 0.02 0.01
MIX
Simu 1 0.02 0.00 0.08
Simu 2 0.01 0.02 0.00
Simu 3 (b) 0.00 0.14 0.03
(a) Decision rules are D(0.8, 1) for BYM and L1-BYM and D(0.05, 1.5)
for MIX.
(b) For Simu 3, [theta] * = 1.35, 1.65, or 2.1 instead of [theta] =
1.5, 2, or 3, respectively.
Table 5. Simu 3: performance of the BYM and MIX models under
alternative decision rules.
SF = 1
[theta] * [theta] * [theta] *
=1.35 = 1.65 = 2.1
BYM--D(0.7, 1)
Probability (false detection) 0.10 0.07 0.05
Probability (true detection) 0.23 0.51 0.71
MIX--D(0.4, 1.5)
Probability (false detection) 0.00 0.03 0.07
Probability (true detection) 0.00 0.23 0.76
SF = 2
[theta] * [theta] * [theta] *
=1.35 = 1.65 = 2.1
BYM--D(0.7, 1)
Probability (false detection) 0.07 0.07 0.04
Probability (true detection) 0.36 0.68 0.84
MIX--D(0.4, 1.5)
Probability (false detection) 0.00 0.06 0.05
Probability (true detection) 0.00 0.62 0.88
SF = 4
[theta] * [theta] * [theta] *
=1.35 = 1.65 = 2.1
BYM--D(0.7, 1)
Probability (false detection) 0.08 0.06 0.03
Probability (true detection) 0.56 0.82 0.93
MIX--D(0.4, 1.5)
Probability (false detection) 0.00 0.07 0.03
Probability (true detection) 0.00 0.84 0.93
SF = 10
[theta] * [theta] * [theta] *
=1.35 = 1.65 = 2.1
BYM--D(0.7, 1)
Probability (false detection) 0.08 0.05 0.02
Probability (true detection) 0.81 0.95 0.99
MIX--D(0.4, 1.5)
Probability (false detection) 0.00 0.03 0.01
Probability (true detection) 0.00 0.93 0.98
Table 6. Sensitivity (1 - false-negative rate) for the BYM model. (a)
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.08 0.06 0.08
25% area (E = 1.10) 0.36 0.48 0.38
50% area (E = 1.92) 0.32 0.48 0.40
75% area (E = 5.37) 0.08 0.30 0.74
90% area (E = 7.38) 0.12 0.22 0.74
Simu 2
1% cluster (E = 5.42) 0.18 0.42 0.95
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 0.09 0.34 0.56
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.04 0.02 0.36
25% area (E = 1.10) 0.20 0.24 0.36
50% area (E = 1.92) 0.16 0.32 0.66
75% area (E = 5.37) 0.12 0.52 0.98
90% area (E = 7.38) 0.10 0.64 0.98
Simu 2
1% cluster (E = 5.42) 0.30 0.74 1
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 0.17 0.51 0.74
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0 0.06 0.68
25% area (E = 1.10) 0.20 0.50 0.82
50% area (E = 1.92) 0.24 0.66 0.98
75% area (E = 5.37) 0.22 0.76 1
90% area (E = 7.38) 0.34 0.88 1
Simu 2
1% cluster (E = 5.42) 0.53 0.97 1
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 0.37 0.71 0.88
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.02 0.42 0.98
25% area (E = 1.10) 0.28 0.54 1
50% area (E = 1.92) 0.30 0.96 1
75% area (E = 5.37) 0.66 1 1
90% area (E = 7.38) 0.88 1 1
Simu 2
1% cluster (E = 5.42) 0.90 1 1
Simu 3 [theta] * [theta] * [theta] *
= 1.35 = 1.65 = 2.1
20 x 1 % clusters 0.66 0.90 0.94
(E range: 0.77-11.6)
(a) Decision rule is D(0.8, 1).
Table 7. Probability of true detection (sensitivity) for the L1-BYM
model. (a)
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.02 0.04 0.04
25% area (E = 1.10) 0.26 0.34 0.38
50% area (E = 1.92) 0.28 0.38 0.42
75% area (E = 5.37) 0.08 0.24 0.74
90% area (E = 7.38) 0.16 0.22 0.76
Simu 2
1% cluster (E = 5.42) 0.17 0.35 0.91
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.10 0.31 0.55
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.04 0.08 0.32
25% area (E = 1.10) 0.24 0.38 0.40
50% area (E = 1.92) 0.30 0.42 0.66
75% area (E = 5.37) 0.06 0.50 0.94
90% area (E = 7.38) 0.10 0.68 0.98
Simu 2
1% cluster (E = 5.42) 0.16 0.64 1
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.16 0.48 0.75
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.02 0.02 0.54
25% area (E = 1.10) 0.16 0.46 0.88
50% area (E = 1.92) 0.26 0.56 0.96
75% area (E = 5.37) 0.20 0.78 1
90% area (E = 7.38) 0.24 0.90 1
Simu 2
1% cluster (E = 5.42) 0.39 0.95 1
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.35 0.70 0.89
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0.04 0.28 0.54
25% area (E = 1.10) 0.44 0.52 0.98
50% area (E = 1.92) 0.44 0.86 1
75% area (E = 5.37) 0.68 1 1
90% area (E = 7.38) 0.86 1 1
Simu 2
1% cluster (E = 5.42) 0.85 1 1
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.65 0.90 0.98
(E range: 0.77-11.6)
(a) Decision rule is D(0.8, 1).
Table 8. Probability of true detection (sensitivity) for the MIX
model. (a)
SF = 1
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0 0 0.05
25% area (E = 1.10) 0 0.02 0.20
50% area (E = 1.92) 0 0.02 0.33
75% area (E = 5.37) 0 0.02 0.51
90% area (E = 7.38) 0 0.05 0.55
Simu 2
1% cluster (E = 5.42) 0.02 0.10 0.86
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.04 0.85 0.99
(E range: 0.77-11.6)
SF = 2
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0 0.02 0.35
25% area (E = 1.10) 0 0.01 0.30
50% area (E = 1.92) 0 0.10 0.77
75% area (E = 5.37) 0 0.18 0.90
90% area (E = 7.38) 0 0.19 0.93
Simu 2
1% cluster (E = 5.42) 0.01 0.46 0.99
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.04 0.99 0.99
(E range: 0.77-11.6)
SF = 4
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0 0.04 0.56
25% area (E = 1.10) 0 0.16 0.72
50% area (E = 1.92) 0 0.51 0.98
75% area (E = 5.37) 0 0.67 0.99
90% area (E = 7.38) 0 0.68 0.99
Simu 2
1% cluster (E = 5.42) 0.05 0.95 1
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.06 0.99 0.99
(E range: 0.77-11.6)
SF = 10
Raised-risk area [theta] = 1.5 [theta] = 2 [theta] = 3
Simu 1
10% area (E = 0.84) 0 0.31 0.54
25% area (E = 1.10) 0.06 0.53 0.98
50% area (E = 1.92) 0.05 0.94 1
75% area (E = 5.37) 0.10 0.98 1
90% area (E = 7.38) 0.14 0.98 1
Simu 2
1% cluster (E = 5.42) 0.47 1.00 1.00
Simu3 [theta] * = [theta] * = [theta] * =
1.35 1.65 2.1
20 x 1 % clusters 0.00 0.99 1.00
(E range: 0.77-11.6)
(a) Decision rule is D(0.5, 1.5).