Using supervised principal components analysis to assess multiple pollutant effects
BACKGROUND: Many investigations of the adverse health effects of multiple air pollutants analyze the time series involved by simultaneously entering the multiple pollutants into a Poisson log-linear model. This method can yield unstable parameter estimates when the pollutants involved suffer high intercorrelation; therefore, traditional approaches to dealing with multicollinearity, such as principal component analysis (PCA), have been promoted in this context.
OBJECTIVES: A characteristic of PCA is that its construction does not consider the relationship between the covariates and the adverse health outcomes. A refined version of PCA, supervised principal components analysis (SPCA), is proposed that specifically addresses this issue.
METHODS: Models controlling for long-term trends and weather effects were used with both SPCA and PCA to estimate the association between multiple air pollutants and mortality for U.S. cities. The methods were compared further via a simulation study.
RESULTS: Simulation studies demonstrated that SPCA, unlike PCA, was successful in identifying the correct subset of multiple pollutants associated with mortality. Because of this property, SPCA and PCA returned different estimates for the relationship between air pollution and mortality.
CONCLUSIONS: Although a number of methods for assessing the effects of multiple pollutants have been proposed, such methods can falter in the presence of high correlation among pollutants. Both PCA and SPCA address this issue. By allowing the exclusion of pollutants that are not associated with the adverse health outcomes from the mixture of pollutants selected, SPCA offers a critical improvement over PCA.
KEY WORDS: air pollution, mortality, multiple pollutants, principal components analysis, time series. Environ Health Perspect 114:1877-1882 (2006).
doi:10.1289/ehp.9226 available via http://dx.doi.org/ [Online 24 August 2006]
Numerous time-series studies have investigated the association between daily adverse health outcomes and daily ambient air pollution concentrations (Chock et al. 2000; Cifuentes [...]
morbidity, ambient air pollution, and meteorologic covariates. The fitted models are then used to quantify the adverse health effects of ambient air pollution. Because the U.S. Environmental Protection Agency regulates pollutants independently, much of the current time-series research on the adverse health effects of air pollution has focused on estimating the effect of an individual pollutant (Dominici and Burnett 2003). However, because of the potential for high correlations to exist between ambient air pollutants, the results from studies that focus on a single pollutant can be difficult to interpret in practice (Vedal et al. 2003). For example, an observed positive association could occur because the single air pollutant is a proxy for another air pollutant or for a mixture of air pollutants. To overcome the limitations of single-pollutant time-series studies, a number of studies have investigated the concurrent adverse health effects of multiple air pollutants (Moolgavkar 2000; Wong et al. 2002). In the majority of these studies, the multiple air pollutants are simultaneously entered into a single Poisson log-linear model. The results from these studies are then used to isolate the adverse health effects of the individual pollutants. However, one important question that these multiple pollutant studies fail to answer is whether there is a specific mixture of pollutants associated with adverse health outcomes. Moreover, it has recently been stated that it may be more reasonable to assume that there is a mixture of pollutants that is considered harmful to health (Dominici and Burnett 2003; Moolgavkar 2003; Stieb et al. 2002). Assessing the adverse health effects of an air pollution mix may therefore be both more interpretable and more feasible than attempting to isolate the effects of individual pollutants independent of other pollutants. The development of new methodology and models to concurrently estimate the adverse health effects of multiple air pollutants has been identified by statisticians [...] regression and the lasso (Hong et al. 1999; Roberts and Martin 2005, 2006a).
One method used or proposed by researchers to analyze the effect of multiple pollutants is principal components analysis (PCA) (Burnett et al. 2003; Cox 2000). PCA avoids the problem of unstable parameter estimates sometimes obtained in multiple pollutant studies because of the high correlations between pollutants. However, one characteristic of PCA in this context is that the mixture of pollutants identified as a principal component is constructed using only covariate information, without regard to the relationship between pollutant levels and mortality. We investigate a recently proposed modification of PCA called supervised principal component analysis (SPCA) for analyzing the adverse health effects of multiple pollutants. SPCA was developed by Bair et al. (2006) for use in regression problems in which the number of predictors greatly exceeds the number of observations. In this article we refine their implementation of SPCA to make it suitable for use in multiple pollutant studies. In this setting SPCA is similar to conventional PCA except that it uses a subset of the multiple pollutants selected on the basis of their association with the adverse health outcomes of interest, rather than only on intrinsic properties of the covariate space. As a result, SPCA can exclude pollutants not associated with the adverse health outcomes from the mixture of pollutants it returns.
In addition to PCA, a number of methods have been developed in the regression literature to deal with the problem of high correlations among covariates or predictor variables. These methods include ridge regression, partial least squares, and latent root regression (Bertrand et al. 2001). Like SPCA, partial least squares and latent root regression use information in the response variable to construct latent variables (linear combinations of the predictor variables) to be used as predictors. Latent root regression shares another feature with SPCA in that it also uses a PCA to construct the latent variables. Latent root regression and partial least squares may also prove useful in assessing the adverse health effects of multiple air pollutants, and future studies that investigate these methods for this purpose may therefore be valuable. Further information on latent root regression can be found in Gunst et al. (1976) and Webster et al. (1974), and further information on partial least squares can be found in Hoskuldsson (1988).
Materials and Methods
Materials
The data used in this article were obtained from the publicly available National Morbidity, Mortality, and Air Pollution Study (NMMAPS) database (Johns Hopkins Bloomberg School of Public Health 2005). The data extracted consist of concurrent daily time series of mortality, weather, and air pollution for nine cities in the United States from 1987 to 2000. The nine cities were selected because they had a relatively large number of days with measurements for all five air pollutants considered; many of the cities in the NMMAPS database do not collect data on all five air pollutants and/or have a large number of days with missing air pollutant concentrations. Further details on the data used can be obtained at http://www.ihapss.jhsph.edu/ (Johns Hopkins Bloomberg School of Public Health 2005). The mortality time-series data, aggregated at the county level, are nonaccidental daily deaths of individuals 65 years of age and older. Deaths of nonresidents were excluded from the mortality counts. The weather time-series data are 24-hr averages of temperature and dew-point temperature, computed from hourly observations. The five air pollutants considered are particulate matter (PM) < 10 µm in diameter (PM10), ozone (O3), sulfur dioxide (SO2), carbon monoxide (CO), and nitrogen dioxide (NO2). For PM10, SO2, CO, and NO2, average daily concentrations were used. For O3 the maximum hourly concentration for each day was used. In the analyses that follow, each of the pollutant time series was standardized to have unit variance.
Methods
The majority of time-series studies that have investigated the concurrent adverse health effects of multiple air pollutants simultaneously entered the pollutants into a single Poisson log-linear model. With this model the daily adverse health outcome counts are modeled as independent Poisson random variables with a time-varying mean $\mu_t$, where
$$\log(\mu_t) = \text{confounders}_t + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt}, \qquad [1]$$
and where $\text{confounders}_t$ represents other time-varying variables related to the adverse health outcomes, $X_{it}$, $i = 1, \ldots, k$, represent the k pollutants under investigation, and $\beta_i$, $i = 1, \ldots, k$, measure the adverse health effect of pollutant i, assuming all other pollutant levels are held fixed. Hereafter, Model 1 will be referred to as the "standard model."
A noted problem with the standard model is that the pollutant-effect parameter estimates may be unstable (i.e., have unduly large variances) because of high correlation among pollutants (Burnett et al. 2003). When high correlation exists among pollutants, one of the pollutants can be well approximated by a linear combination of the remaining pollutants; the fitted model then becomes close to unidentifiable, and the associated parameter estimates become unstable (Ramsay et al. 2003). In the context of linear regression, this is commonly referred to as the problem of multicollinearity. When generalized linear models (GLMs) or generalized additive models (GAMs) are used, an analogous problem, concurvity, can occur. Concurvity refers to the situation in which a function of one of the covariates is well approximated by a linear combination of the functions of other covariates. In our context the functions would be the smooth functions used to model the effects of the confounding covariates. Like multicollinearity, concurvity results in unstable parameter estimates from the fitted model. In the presence of concurvity, Ramsay et al. (2003) and Dominici et al. (2002) noted that the variance estimates obtained from a fitted GAM do not reflect the resulting instability of the parameter estimates. To avoid this problem, Ramsay et al. (2003) suggested that the GLM be used instead of the GAM and that the confounding covariates be modeled parametrically, for example, by using natural cubic splines.
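The instability described above can be made concrete in the linear-regression setting. With standardized predictors, the sampling variance of the j-th coefficient is inflated by the variance inflation factor $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the squared multiple correlation from regressing pollutant j on the other pollutants; equivalently, $\mathrm{VIF}_j$ is the j-th diagonal element of the inverse of the pollutant correlation matrix. The VIF is a standard diagnostic that the article itself does not use, and for a Poisson GLM it is only an approximate guide because the model weights observations by their fitted means. The following is a minimal standard-library Python sketch; the function names and the example correlations are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr):
    """VIF_j = [R^-1]_jj = 1 / (1 - R_j^2) for a pollutant correlation matrix R."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


if __name__ == "__main__":
    # Hypothetical correlations among three standardized pollutants A, B, C.
    corr = [[1.0, 0.9, 0.3],
            [0.9, 1.0, 0.4],
            [0.3, 0.4, 1.0]]
    for name, vif in zip("ABC", variance_inflation_factors(corr)):
        print(f"pollutant {name}: VIF = {vif:.2f}")
```

With a pairwise correlation of 0.9 between two pollutants, each VIF is 1/(1 − 0.81) ≈ 5.26, so the coefficient standard errors are roughly √5.26 ≈ 2.3 times what they would be with uncorrelated pollutants.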
For this reason all models in this article are posed as GLMs with natural cubic splines used to model the effect of the confounding variables. This is the same modeling approach that has been adopted in a number of recent studies (Luginaah et al. 2005; Peng et al. 2005).
PCA is a method commonly used in regression analysis to overcome the problems associated with correlated explanatory variables. In the context of the standard model, PCA finds the linear combination of the pollutant variables that has maximal variance among all such combinations. Specifically, constants $\alpha_i$, $i = 1, \ldots, k$, subject to the normalization $\sum_{i=1}^{k} \alpha_i^2 = 1$, are found such that the variance of
$$Z_{1t} = \alpha_1 X_{1t} + \alpha_2 X_{2t} + \cdots + \alpha_k X_{kt}$$
is maximized. The standard model is then refit using the derived variable $Z_{1t}$, referred to as the first principal component, in place of the k original pollutant variables $X_{it}$, $i = 1, \ldots, k$:
$$\log(\mu_t) = \text{confounders}_t + \beta_1 Z_{1t}. \qquad [2]$$
Using the single derived variable $Z_{1t}$ in Model 2 avoids the coefficient instability problems associated with fitting a model to the correlated pollutant variables. A simple justification for PCA is that by choosing the linear combination with maximum variance we retain as much as possible of the information contained in $X_{it}$, $i = 1, \ldots, k$, while using only a single variable $Z_{1t}$. It should be noted that a complete PCA of the pollutant data returns k mutually uncorrelated principal components $Z_{it}$, $i = 1, \ldots, k$, each describing successively less of the information (variance) contained in $X_{it}$, $i = 1, \ldots, k$. In many cases the first one or two principal components capture almost all the variability in the covariate space, and so for ease of interpretation we elect here to consider only the first and most important principal component in our analysis. Hereafter, Model 2 will be referred to as the "PCA model."
One characteristic of PCA is that the pollutant mixture $Z_{1t} = \alpha_1 X_{1t} + \alpha_2 X_{2t} + \cdots + \alpha_k X_{kt}$ derived from the PCA model is chosen without regard to the response variable, the daily adverse health outcomes of interest. PCA was designed to reduce the dimension of a high-dimensional covariate space so that fewer variables need be considered in an analysis. As such, the relationship between the covariates and the response is simply not considered in constructing the principal components. This is a potentially undesirable characteristic because pollutants that are not associated or only weakly associated with the adverse health outcomes will, all other things being equal, be treated exactly the same by PCA as pollutants strongly associated with the adverse health outcomes. SPCA is a modification of PCA that avoids this characteristic by explicitly incorporating information on the relationship between the predictor variables and the response. SPCA shares a feature with another multivariate analysis technique, canonical correlation analysis, which also finds linear combinations of the predictor variables that depend on the response variable or variables. However, the two methods differ in important ways, including the criterion used to find the optimal linear combinations and the fact that SPCA is used when there is a single response variable and several explanatory variables, whereas canonical correlation analysis is used when there are several response variables and several explanatory variables.
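The PCA model (Model 2) needs only the weights $\alpha_i$ of the first principal component. Because each pollutant series is standardized to unit variance, these weights form the leading unit-length eigenvector of the pollutants' correlation matrix, and the matching eigenvalue divided by k is the share of total variance that $Z_{1t}$ captures. The sketch below computes them with power iteration in standard-library Python; the article's analyses were done in R, and the function names here are hypothetical. The sign of the weights is arbitrary, as it is for any PCA.

```python
import math


def standardize(series):
    """Center a series and scale it to unit variance (divisor n)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def correlation_matrix(columns):
    """Correlation matrix of already-standardized columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns] for ci in columns]


def first_principal_component(corr, iters=1000, tol=1e-12):
    """Leading unit eigenvector and eigenvalue of a symmetric PSD matrix by power iteration."""
    k = len(corr)
    # Asymmetric start vector, so it is unlikely to be orthogonal to the leading eigenvector.
    v = [1.0 + 0.1 * i for i in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v' R v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return v, eigenvalue


def pca_mixture(columns):
    """Return (alpha weights, share of total variance, Z_1t scores) for raw pollutant series."""
    std = [standardize(c) for c in columns]
    alpha, lam = first_principal_component(correlation_matrix(std))
    # Z_1t = alpha_1 X_1t + ... + alpha_k X_kt, computed day by day.
    z = [sum(a * x for a, x in zip(alpha, day)) for day in zip(*std)]
    return alpha, lam / len(columns), z
```

For two pollutants with correlation 0.8, the eigenvalues are 1.8 and 0.2, so the weights are (1/√2, 1/√2) up to sign and the first component captures (1 + 0.8)/2 = 90% of the variance.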
As originally proposed, SPCA was designed for regression problems in which the number of predictors greatly exceeds the number of observations. Here we implement a version of SPCA that can be used as an alternative to PCA in multiple pollutant studies, a situation in which the number of observations generally exceeds the number of parameters. Our implementation of SPCA proceeded as follows:
1. Fit a separate Poisson log-linear model for each pollutant variable, relating the confounders and the given pollutant to the adverse health outcomes. That is, for each pollutant $X_{it}$, $i = 1, \ldots, k$, fit the model $\log(\mu_t) = \text{confounders}_t + \beta_1 X_{it}$.
2. For each of the k models fit in step 1, note the absolute value of Wald's statistic, $w = |b_1 / \mathrm{SE}(b_1)|$, where $b_1$ is the estimate of $\beta_1$. Order the pollutant variables from most important to least important based on decreasing values of w. Denote this ordered list $X_{[1]}, \ldots, X_{[k]}$.
3. Using the first 90% of the data, fit the following k + 1 models: $\log(\mu_t) = \text{confounders}_t + \beta_1 Q_{it}$, $i = 0, 1, \ldots, k$, where $Q_{it}$, $i = 1, \ldots, k$, is the first principal component of the i top-ranked pollutants $X_{[1]}, \ldots, X_{[i]}$, and $Q_{0t} = 0$, which corresponds to fitting a model with no pollutants, i.e., the model $\log(\mu_t) = \text{confounders}_t$.
4. Using the remaining 10% of the data, calculate the prediction error for each of these k + 1 models, and select as the "best" model the one with the smallest estimated prediction error.
5. Refit the best model using all the data, that is, fit the model $\log(\mu_t) = \text{confounders}_t + \beta_1 Q_{st}$, where s corresponds to the best model.
Step 3 in the above algorithm could be implemented using more than the first principal component. However, for the reasons discussed in our implementation of the PCA model, we have decided to use only the first principal component. The advantage of the SPCA procedure over PCA is that pollutants not associated or only weakly associated with the adverse health outcomes have a substantial chance of being excluded from the chosen model. Hereafter, our implementation of SPCA using the above algorithm will be referred to as the "SPCA model."
Simulation Study
To conduct the simulations, we required a way of generating realistic mortality time series with known air pollution mortality effects. We used a method previously shown to generate realistic mortality time series (Roberts and Martin 2006b), which proceeds by fitting the following Poisson log-linear model, similar to those used in previous analyses (Daniels et al. 2000), to the actual Cook County (Chicago), Illinois, mortality and meteorologic time-series data:
$$Y_t \sim \text{Poisson}(\omega_t),$$
$$\log(\omega_t) = \text{confounders}_t + \theta(\alpha_1 X_{1t} + \alpha_2 X_{2t} + \alpha_3 X_{3t} + \alpha_4 X_{4t} + \alpha_5 X_{5t}),$$
$$\text{confounders}_t = S_{t1}(\text{time}, 3\text{ df per year}) + S_{t2}(\text{temp}_0, 6\text{ df}) + S_{t3}(\text{temp}_{1\text{-}3}, 6\text{ df}) + S_{t4}(\text{dew}_0, 3\text{ df}) + S_{t5}(\text{dew}_{1\text{-}3}, 3\text{ df}) + \gamma\,\text{DOW}_t, \qquad [3]$$
where t refers to the day of the study, $Y_t$ is the simulated mortality count on day t, and $\omega_t$ is the expected number of deaths on day t. The quantities $S_{ti}(\cdot)$ are smooth functions of time, temperature (temp), and dew-point temperature (dew) with the indicated degrees of freedom (df). The smooth functions are represented using natural cubic splines. The quantity $\text{temp}_0$ is the current day's mean 24-hr temperature and $\text{temp}_{1\text{-}3}$ is the average of the previous 3 days' 24-hr mean temperatures. The values $\text{dew}_0$ and $\text{dew}_{1\text{-}3}$ are defined similarly for the 24-hr mean dew-point temperature, and $\text{DOW}_t$ is a set of indicator variables for the day of the week.
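Once the confounder part of Model 3 has been fitted, generating a simulated series only requires adding the prespecified mixture effect on the log scale and drawing Poisson counts. The sketch below assumes the fitted log-baseline values ($\text{confounders}_t$) are already available; in the article they come from the Chicago fit in R, which this sketch does not reproduce. The function names are hypothetical, and the Poisson sampler is Knuth's multiplication method, chosen here for simplicity rather than taken from the article.

```python
import math
import random


def expected_counts(log_baseline, pollutants, theta, alphas):
    """omega_t = exp(confounders_t + theta * sum_i alpha_i * X_it) for each day t.

    log_baseline: fitted confounders_t values (assumed given).
    pollutants: one list of k standardized pollutant values per day.
    """
    return [math.exp(b + theta * sum(a * x for a, x in zip(alphas, day)))
            for b, day in zip(log_baseline, pollutants)]


def poisson_draw(mean, rng):
    """Knuth's multiplication method; fine for daily means up to a few hundred
    (exp(-mean) underflows for very large means)."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_series(omega, seed=None):
    """One simulated mortality series Y_t ~ Poisson(omega_t)."""
    rng = random.Random(seed)
    return [poisson_draw(m, rng) for m in omega]
```

If the hypothetical weights $\alpha_i$ sum to 1 (an assumption; the weights of the 13 scenarios are not listed in this section), a simultaneous one-SD increase in every standardized pollutant multiplies $\omega_t$ by $\exp(\theta)$, which is $\exp(0.1) \approx 1.105$ for $\theta = 0.1$, in line with the approximately 10% increase cited for that value in the simulation description.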
The quantities $X_{it}$, $i = 1, \ldots, 5$, are, respectively, the current day's daily concentrations of PM10, NO2, CO, O3, and SO2; $\alpha_i$, $i = 1, \ldots, 5$, are the prespecified weights of each pollutant; and $\theta$ is the prespecified effect of the air pollutant mixture $(\alpha_1 X_{1t} + \cdots + \alpha_5 X_{5t})$ on mortality. All the analyses in this article were conducted using the statistical package R (R Development Core Team 2006). For the reasons discussed above, GLMs with natural cubic splines were used to fit the PCA and SPCA models. The offset option in the R GLM function was used to permit the relationship between the pollutants and mortality to be specified a priori and included in the fitting process. Fitting Model 3 produced an expected mortality count for each day, $\omega_t$, that incorporated the effects of the five pollutants through the explicitly specified relationship $\theta(\alpha_1 X_{1t} + \cdots + \alpha_5 X_{5t})$. Using these expected mortality counts, the simulations proceeded as follows:
1. Choose values for $(\theta, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5)$.
2. Using Model 3, compute expected mortality counts for each day, $\omega_t$, that incorporate the pollutant effects selected in step 1.
3. Generate a mortality time series using a Poisson model with mean $\omega_t$ on day t.
4. Fit the PCA model to the simulated mortality time series; that is, fit the model $\log(\mu_t) = \text{confounders}_t + \beta_1 Z_{1t}$ to the simulated mortality time series from step 3.
5. Fit the SPCA model to the simulated mortality time series; that is, using the simulated mortality time series from step 3, follow steps 1-5 in the description of the SPCA model. This procedure results in a final model of the form $\log(\mu_t) = \text{confounders}_t + \beta_1 Q_{st}$ being fitted to the simulated mortality time series.
6. Repeat steps 3-5 a total of 1,000 times.
In the simulations, 13 $(\theta, \alpha_1, \ldots, \alpha_5)$ combinations were used. For these 13 combinations, the effect of the air pollutant mixture on mortality, $\theta$, ranged from 0 to 0.1. A $\theta$ value of 0.1 corresponds to approximately a 10% increase in mortality for a simultaneous one standard deviation (SD) increment in the concentration of each air pollutant. Tables 1-3 contain the results of the simulations. Table 1 shows clearly that in most situations SPCA performed appropriately in terms of selecting the correct subset of pollutants associated with mortality. In 9 of the 13 scenarios considered, SPCA selected the correct subset of pollutants 75% or more of the time. This is an important improvement over standard PCA, which by construction never excludes pollutants. Table 2 shows that for the majority of cases considered, the bias of the individual pollutant effect estimate obtained from SPCA was smaller than the bias of the corresponding estimate obtained from PCA. The reduction in bias was particularly striking for cases in which an individual pollutant was unrelated to mortality.
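Step 5 of the simulation applies the SPCA procedure from the Methods to each simulated series, and Table 1 reports how often it keeps the correct subset. The sketch below shows the selection logic in standard-library Python under several loudly stated simplifications: an intercept stands in for the spline confounder terms, Poisson deviance on the held-out 10% is assumed as the prediction-error measure (this section does not name one), and the principal component weights are computed from all days, which leaks no response information because PCA ignores the outcome. All function names are hypothetical; this is not the authors' R implementation.

```python
import math


def fit_poisson(x, y, iters=25):
    """Newton-Raphson (= Fisher scoring) fit of log(mu_t) = b0 + b1 * x_t.

    An intercept b0 stands in for confounders_t (simplifying assumption).
    Returns (b0, b1, se_b1); with x=None only the intercept is fitted.
    """
    b0 = math.log(sum(y) / len(y))
    if x is None:
        return b0, 0.0, float("nan")
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score for b0
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))  # score for b1
        i00 = sum(mu)                                             # Fisher information
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1, math.sqrt(i00 / det)  # SE from the information at convergence


def poisson_deviance(y, mu):
    """Poisson deviance, the assumed measure of held-out prediction error."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))


def first_pc_scores(cols):
    """Scores on the first principal component of standardized columns (power iteration)."""
    n, k = len(cols[0]), len(cols)
    r = [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]
    v = [1.0 + 0.1 * i for i in range(k)]  # asymmetric, all-positive start vector
    for _ in range(500):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(a * x for a, x in zip(v, day)) for day in zip(*cols)]


def standardize(c):
    m = sum(c) / len(c)
    sd = math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
    return [(v - m) / sd for v in c]


def spca_select(columns, y, train_frac=0.9):
    """Steps 1-5 of the SPCA model; returns (ranking, s, b1), s = pollutants kept."""
    std = [standardize(c) for c in columns]
    # Steps 1-2: rank pollutants by |Wald statistic| from single-pollutant fits.
    wald = []
    for i, c in enumerate(std):
        _, b1, se = fit_poisson(c, y)
        wald.append((abs(b1 / se), i))
    order = [i for _, i in sorted(wald, reverse=True)]
    # Steps 3-4: fit Q_0 = 0, Q_1, ..., Q_k on the first 90%, score on the last 10%.
    cut = int(train_frac * len(y))
    errors = []
    for s in range(len(order) + 1):
        q = first_pc_scores([std[i] for i in order[:s]]) if s else None
        b0, b1, _ = fit_poisson(q[:cut] if q else None, y[:cut])
        mu = [math.exp(b0 + b1 * (q[t] if q else 0.0)) for t in range(cut, len(y))]
        errors.append(poisson_deviance(y[cut:], mu))
    best = min(range(len(errors)), key=errors.__getitem__)
    # Step 5: refit the chosen model on all the data.
    q = first_pc_scores([std[i] for i in order[:best]]) if best else None
    return order, best, fit_poisson(q, y)[1]
```

Because the number of retained pollutants s is chosen from the data, the pollutant weights inside $Q_{st}$ vary from one simulated series to the next, which is the source of the extra variance discussed below.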
The smaller bias of the SPCA estimates relative to the PCA estimates arises because PCA retains pollutants unrelated to mortality; by chance, these unrelated pollutants explain some of the mortality effect that should be attributed to the pollutants actually associated with mortality. However, the ability of SPCA to exclude pollutants is not without cost. It means that the weights SPCA assigns to each pollutant in the derived pollutant variable ($Q_{st}$) are random, unlike the constant weights assigned by PCA. The random nature of the SPCA weights will typically give the SPCA estimates a larger variance than the corresponding PCA estimates. The increased variance is evident in Table 2, where for the majority of cases considered, the SD of the individual pollutant effect estimate obtained from SPCA is larger than the SD of the corresponding PCA estimate. The root-mean-squared error (rmse) measures the average closeness of an estimator to the value being estimated; smaller rmse values correspond to better estimators. The rmse values in Table 3 indicate that SPCA produced estimates with smaller error in slightly more than half of the cases considered. In these cases, the increased variance of the SPCA estimates was more than compensated for by a reduction in bias.
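The rmse combines the two quantities in Table 2 through the decomposition $\text{rmse}^2 = \text{bias}^2 + \text{SD}^2$. The decomposition holds only approximately here, because the tables report rounded values and the SD may use an $n-1$ divisor. The short sketch below first checks a few Table 3 cells against their Table 2 ingredients. It then compares the SPCA and PCA columns of Table 3 cell by cell. The same values also yield the range and median figures quoted in the next paragraph. All numbers are transcribed from Tables 2 and 3 (multiplied by 1,000), and the helper names are illustrative.

```python
import math
import statistics


def rmse_from_bias_sd(bias, sd):
    """Mean squared error decomposition: rmse^2 = bias^2 + SD^2."""
    return math.sqrt(bias ** 2 + sd ** 2)


# Table 2 -> Table 3 checks for PM10 (values x 1,000).
print(round(rmse_from_bias_sd(-0.09, 1.08), 2))   # SPCA, all pollutants, 100: 1.08
print(round(rmse_from_bias_sd(-0.50, 0.53), 2))   # PCA, all pollutants, 100: 0.73
print(round(rmse_from_bias_sd(-80.99, 0.52), 2))  # PCA, PM10 only, 100: 80.99

# Table 3 (x 1,000); one tuple per row: (PM10, NO2, CO, O3, SO2).
# Row order: all five (100, 50, 25, 12.5), PM10+NO2+CO (same), PM10 only (same), none (0).
SPCA = [
    (1.08, 4.70, 2.46, 12.06, 7.85), (1.66, 3.22, 2.16, 6.59, 5.06),
    (2.73, 2.60, 2.07, 3.44, 3.01), (3.05, 1.90, 1.58, 1.71, 1.72),
    (4.82, 3.70, 0.74, 0, 0), (2.23, 2.40, 1.08, 0, 0),
    (2.10, 2.42, 2.28, 1.06, 1.02), (3.10, 2.28, 2.21, 1.23, 1.37),
    (2.39, 0, 0, 0, 0), (1.88, 0, 0, 0, 0),
    (2.71, 0, 0, 0.97, 0), (3.25, 0.45, 0.32, 2.16, 0),
    (1.80, 0.80, 0.95, 0.91, 1.04),
]
PCA = [
    (0.73, 3.54, 1.16, 9.98, 0.71), (0.54, 2.04, 0.63, 4.91, 0.53),
    (0.56, 1.35, 0.53, 2.38, 0.57), (0.58, 1.03, 0.54, 1.12, 0.59),
    (9.97, 5.22, 10.61, 12.02, 23.43), (4.87, 2.5, 5.19, 6.08, 11.86),
    (2.33, 1.21, 2.49, 3.13, 6.10), (1.12, 0.74, 1.19, 1.66, 3.24),
    (80.99, 22.91, 18.50, 9.77, 19.06), (40.52, 11.44, 9.24, 4.88, 9.52),
    (20.12, 5.92, 4.78, 2.52, 4.92), (9.92, 3.20, 2.58, 1.36, 2.66),
    (0.63, 0.76, 0.62, 0.33, 0.64),
]

spca_cells = [v for row in SPCA for v in row]
pca_cells = [v for row in PCA for v in row]
spca_better = sum(s < p for s, p in zip(spca_cells, pca_cells))
print(spca_better, "of", len(spca_cells))  # 35 of 65


def summarize(cells):
    """(minimum, maximum, median) of a flat list of rmse values."""
    return min(cells), max(cells), statistics.median(cells)


print(summarize(spca_cells))  # (0, 12.06, 1.71)
print(summarize(pca_cells))   # (0.33, 80.99, 2.58)
```

SPCA has the smaller rmse in 35 of the 65 cells, consistent with "slightly more than half." An SPCA rmse of exactly 0 marks a pollutant with no true effect that SPCA excluded in every replicate, so its estimate was always exactly zero.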
Perhaps more important, the rmse values for SPCA were much more stable across simulation scenarios than those for PCA. The rmse values for SPCA ranged from 0 to 12.06 with a median of 1.71, whereas those for PCA ranged from 0.33 to 80.99 with a median of 2.58. This indicates that the benefits of using SPCA instead of PCA, in terms of both bias and the average closeness of the estimates to their true values, outweigh its disadvantage in terms of variance alone. Thus, in addition to indicating which pollutants are unrelated to mortality, the SPCA model has the further benefit over the PCA model of producing estimates with smaller error on average.

Application

In this section, the data from the nine cities described previously are used to compare the SPCA model with the PCA model in the multiple pollutant context. For both models, the confounder adjustment had the same specification described in the previous section for Model 3. Table 4 contains the results of fitting the models to the data from each city. Table 5 provides correlation matrices of the data from two of the nine cities considered: Cleveland, Ohio, and Nashville, Tennessee. These two cities are shown because they are the cities in which the SPCA model retained the fewest (zero) and the most (four) pollutants, respectively.
The results obtained from the two methods differ substantially, because for each city SPCA concluded that one or more of the five pollutants were not sufficiently associated with mortality to warrant inclusion. As a result, the two methods return different estimates of the effect of air pollution on mortality. The simulation study suggested that SPCA usually identified the correct subset of pollutants associated with mortality, though not necessarily the magnitude of their effects. The results obtained with SPCA are therefore likely more reliable than those obtained with PCA. For example, for the Chicago data, SPCA concluded that only O3 was associated with mortality and gave it a loading of one, whereas PCA gave each pollutant a roughly equal loading. In this situation the effect estimate obtained using PCA is likely biased, because the mixture of pollutants on which it is based contains several pollutants that may be unrelated to mortality.

The results from the SPCA model offer useful insight into the effects of the five air pollutants on mortality in the nine cities considered, in particular by highlighting when a pollutant appears unrelated to mortality. This insight is unavailable from the PCA model, which invariably implicates all five pollutants, albeit with differing weights. In all cities, SPCA suggests that the pollutant mixture associated with daily mortality consists of only a subset of the five pollutants considered. Additionally, in Cleveland, Ohio; Houston, Texas; and Salt Lake City, Utah, none of the pollutants was found to be associated with mortality. This additional information could prove valuable to researchers interested in determining the specific pollutants associated with increased mortality in particular regions. Of course, the SPCA results raise a question of considerable interest: why is a particular pollutant, for example PM10, associated with mortality in some cities but not in others?

The correlation matrices in Table 5 suggest reasons for the weights, or loadings, that the PCA and SPCA models assign to each pollutant. In Cleveland the five pollutants are all interrelated to roughly the same extent, so the first PCA variable, or first derived variable (Table 4), gives each pollutant a roughly equal loading. SPCA did not retain any pollutants in Cleveland. However, the correlation structure there indicates that any pollutants SPCA had retained would, as in PCA, have received roughly equal loadings.
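The loading patterns described here follow directly from the correlation structure. The minimal sketch below uses hypothetical correlation matrices rather than the Table 5 values, which are not reproduced in this section. Power iteration finds the first principal component of two matrices. The first is an exchangeable matrix, the pattern described for Cleveland. In the second, SO2 is nearly uncorrelated with the other four pollutants, the pattern described for Nashville in the next sentence. The sketch also reports the share of total variance captured by the first component, since the single-derived-variable approach relies on that share being large. The function names and correlation values are illustrative assumptions.

```python
import math


def first_principal_component(corr, n_iter=500):
    """Leading eigenvector (loadings) and eigenvalue of a correlation matrix, by power iteration."""
    p = len(corr)
    v = [1.0] * p  # positive start: converges to the all-positive leading eigenvector here
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue


def exchangeable(p, rho):
    """Correlation matrix with every off-diagonal entry equal to rho."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]


# Hypothetical matrices (order: PM10, NO2, CO, O3, SO2); not the Table 5 values.
cleveland_like = exchangeable(5, 0.6)
nashville_like = exchangeable(5, 0.6)
for i in range(4):
    nashville_like[i][4] = nashville_like[4][i] = 0.05  # SO2 nearly unrelated to the rest

for name, corr in [("Cleveland-like", cleveland_like), ("Nashville-like", nashville_like)]:
    loadings, lam = first_principal_component(corr)
    share = lam / len(corr)  # proportion of total standardized variance explained
    print(name, [round(x, 3) for x in loadings], round(share, 3))
```

For the exchangeable matrix, every loading is $1/\sqrt{5} \approx 0.447$, and the first component explains $(1 + 4\rho)/5 = 0.68$ of the variance. In the second matrix, the four mutually correlated pollutants receive loadings close to 0.5, while SO2 receives a loading below 0.1.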
However, in Nashville, because SO2 is essentially unrelated to the other pollutants, it is given a relatively small weight in both the first derived SPCA variable and the first derived PCA variable.

Discussion

Principal components analysis is a commonly used remedial measure for multicollinearity. The generally high positive correlation among ambient air pollutants makes PCA a useful tool for multiple pollutant time-series studies. However, using PCA for this purpose raises concerns, chief among them that PCA invariably includes all pollutants in the selected mixture of air pollutants. A modified version of SPCA was shown to deal successfully with this problem, allowing a subset of the pollutants to contribute to a mixture related to the adverse health outcomes of interest. A shortcoming that SPCA shares with PCA is that, once SPCA has selected the appropriate subset of pollutants, the loadings applied to each pollutant in the subset are assigned without regard to the adverse health outcomes of interest. An interesting article by Hadi and Ling (1998) provides further cautionary notes on the use of PCA, many of which also apply to SPCA. In this article we have considered only the implementation of PCA and SPCA using the first derived variable, that is, the first principal component of all five pollutants for PCA and the first principal component of the retained pollutants for SPCA.
We made this choice for two reasons. For the pollutant data used in this article, the first derived variable captured a large proportion of the variability. In addition, using only a single derived variable yields a single linear combination of pollutants to interpret. This simplification also allowed a more concise description of the new methodology. If more than one derived variable is used, other linear combinations of pollutants must be interpreted; these combinations often lack an intuitive interpretation and explain a relatively small proportion of the overall variability. Of course, both PCA and SPCA can be extended to consider more than a single derived variable. Indeed, including additional derived pollutant variables in PCA, SPCA, or both may improve the bias and variance properties of the resulting pollutant effect estimates. For example, suppose the first derived variable in a PCA gives large weights to pollutants unrelated to mortality relative to those actually associated with mortality. Including additional derived variables may then improve the estimates. However, for brevity and clarity, we elected to consider only the first derived variable in each case when describing the proposed methodology. A number of different methodologies have been used to investigate the mixture of pollutants associated with an adverse health outcome. Hong et al. (1999) used a number of air pollution indices to evaluate the combined effects of various air pollutants. The indices used by Hong et al. were selected a priori and gave each pollutant included in the index equal weight.
This method is similar to using the first principal component of the multiple pollutants to estimate their adverse health effects. The Hong et al. (1999) method could possibly be improved by using the methodology employed in SPCA to remove unrelated pollutants. Other articles have investigated the mixture of pollutants associated with adverse health outcomes by assigning each air pollutant a weight that is explicitly estimated during the fitting process and constrained so that the weights sum to one (Roberts 2006a, 2006b). These weighted methods have an advantage over both PCA and SPCA in that the loadings assigned to each pollutant depend on the adverse health outcomes. Their disadvantage is that they do not avoid the unstable parameter estimates that can arise from the positive correlation among pollutants. Each of these methods has merit. In each case, however, whether a method deals appropriately with individual pollutants depends largely on how strongly the pollutants are correlated with one another. In the presence of high intercorrelation among the pollutants, SPCA offers a critical advantage over these techniques. Another recent study investigated the use of the "shrinkage methods" ridge regression and the lasso for assessing the adverse health effects of multiple pollutants (Roberts and Martin 2005).
Ridge regression and the lasso can be applied in a regression setting when some predictor variables are highly correlated. Again, these two methods have an advantage over both PCA and SPCA in that the loadings assigned to each pollutant depend on the adverse health outcomes. However, an important advantage SPCA has over these shrinkage methods is that it is often able to select the correct subset of pollutants associated with mortality. SPCA can thus be considered a method positioned somewhere between the shrinkage-based methods and the weighted methods. Like the shrinkage methods, SPCA avoids unstable parameter estimates due to multicollinearity. Like the weighted methods, it is often able to select the correct subset of pollutants associated with mortality.

REFERENCES

Bair E, Hastie T, Paul D, Tibshirani R. 2006. Prediction by supervised principal components. J Am Stat Assoc 101:119-137.
Bertrand D, Qannari EM, Vigneau E. 2001. Latent root regression analysis: an alternative to PLS. Chemom Intell Lab Syst 58:227-234.
Burnett RT, Brook J, Dann T, Delocla C, Philips O, Cakmak, et al. 2003. Association between particulate and gas phase components of urban air pollution and daily mortality in eight Canadian cities. Inhal Toxicol 12(suppl 4):15-39.
Chock DP, Winkler ... Pittsburgh. J Air Waste Manage Assoc 50:1481-1500.
Cifuentes LA, Vega J, Kopfer K, Lave LB. 2000. Effect of the fine fraction of particulate matter versus the coarse mass and other pollutants on daily mortality in Santiago, Chile. J Air Waste Manage Assoc 50:1287-1298.
Cox LH. 2000. Statistical issues in the study of air pollution involving airborne particulate matter. Environmetrics 11:611-626.
Daniels MJ, Dominici F, Samet JM, Zeger SL. 2000. Estimating particulate matter-mortality dose-response curves and threshold levels: an analysis of daily time-series for the 20 largest US cities. Am J Epidemiol 152:397-406.
Dominici F, Burnett RT. 2003. Risk models for particulate air pollution. J Toxicol Environ Health A 66:1883-1889.
Dominici F, McDermott A, Zeger SL, Samet JM. 2002. On the use of generalized additive models in time-series studies of air pollution and health. Am J Epidemiol 156:193-203.
Goldberg MS, Burnett RT, Valois MF, Flegel K, Bailar JC III, Brooks J, et al. 2003. Associations between ambient air pollution and daily mortality among persons with congestive heart failure. Environ Res 91:8-20.
Gunst RF, Webster JT, Mason RL. 1976. A comparison of least squares and latent root regression estimators. Technometrics 18:75-83.
Hadi AS, Ling RF. 1998. Some cautionary notes on the use of principal components regression. Am Stat 52:15-19.
Hong YC, Leem JH, Ha EH, Christiani DC. 1999. PM10 exposure, gaseous pollutants, and daily mortality in Inchon, South Korea. Environ Health Perspect 107:873-878.
Hoskuldsson A. 1988. PLS regression methods. J Chemom 2:211-228.
Johns Hopkins Bloomberg School of Public Health. 2005. iHAPSS. Internet-based Health and Air Pollution Surveillance System. Baltimore, MD: Johns Hopkins Bloomberg School of Public Health, Department of Biostatistics. Available: http://www.ihapss.jhsph.edu/ [accessed 15 January 2006].
Kelsall JE, Samet JM, Zeger SL, Xu J. 1997. Air pollution and mortality in Philadelphia, 1974-1988. Am J Epidemiol 146:750-762.
Kwon HJ, Cho SH, Nyberg F, Pershagen G. 2001. Effects of ambient air pollution on daily mortality in a cohort of patients with congestive heart failure. Epidemiology 12:413-419.
Luginaah IN, Fung KY, Gorey KM, Webster G, Willis C. 2005. Association of ambient air pollution with respiratory hospitalization in a government-designated "area of concern": the case of Windsor, Ontario. Environ Health Perspect 113:290-296.
Moolgavkar SH. 2003. Air pollution and daily mortality in two U.S. counties: season-specific analyses and exposure-response relationships. Inhal Toxicol 15:877-907.
Moolgavkar SH. 2000. Air pollution and daily mortality in three U.S. counties. Environ Health Perspect 108:777-784.
Ostro BD, Hurley S, Lipsett MJ. 1999. Air pollution and daily mortality in the Coachella Valley, California: a study of PM10 dominated by coarse particles. Environ Res 81:231-238.
Peng RD, Dominici F, Pastor-Barriuso R, Zeger SL, Samet JM. 2005. Seasonal analyses of air pollution and mortality in 100 US cities. Am J Epidemiol 161:585-594.
R Development Core Team. 2006. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available: http://www.R-project.org [accessed 7 December 2005].
Ramsay TO, Burnett RT, Krewski D. 2003. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology 14:18-23.
Roberts S. 2006. A new model for investigating the mortality effects of multiple air pollutants in air pollution mortality time-series studies. J Toxicol Environ Health A 69:417-435.
Roberts S, Martin M. 2005. A critical assessment of shrinkage-based regression approaches for estimating the adverse health effects of multiple air pollutants. Atmos Environ 39:6223-6230.
Roberts S, Martin MA. 2006a. Investigating the mixture of air pollutants associated with adverse health outcomes. Atmos Environ 40:984-991.
Roberts S, Martin MA. 2006b. The question of non-linearity in the dose-response relationship between particulate matter air pollution and mortality: can Akaike's Information Criterion ...
Smith RL, Davis JM, Sacks J, Speckman P, Styer P. 2000. Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama. Environmetrics 11:719-743.
Stieb DM, Judek S, Burnett RT. 2002. Meta-analysis of time-series studies of air pollution and mortality: effects of gases and particles and the influence of cause of death, age, and season. J Air Waste Manage Assoc 52:470-484.
Vedal S, Brauer M, White R, Petkau J. 2003. Air pollution and daily mortality in a city with low levels of pollution. Environ Health Perspect 111:45-51.
Webster JT, Gunst RF, Mason RL. 1974. Latent root regression. Technometrics 16:513-522.
Wong TW, Tam WS, Yu TS, Wong AH. 2002. Associations between daily mortalities from respiratory and cardiovascular diseases and air pollution in Hong Kong, China. Occup Environ Med 59:30-35.

Steven Roberts and Michael A. Martin
School of Finance and Applied Statistics, College of Business and Economics, Australian National University, Canberra, Australian Capital Territory, Australia
Address correspondence to S. Roberts, School of Finance and Applied Statistics, College of Business and Economics, Australian National University, Canberra ACT 0200 Australia. Telephone: 61 2 6125 3470. Fax: 61 2 6125 0087. E-mail: steven.roberts@anu.edu.au
The authors declare they have no competing financial interests.
Received 3 April 2006; accepted 24 August 2006.
Table 1. Number of pollutants retained by SPCA over sets of 1,000
simulations.
Effect ($1{,}000 \times \theta$) (a)   No. of pollutants retained by SPCA (b): 0   1   2   3   4   5   Percent correct (c)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 0 0 0 15 0 85 85
50 0 0 1 24 0 75 75
25 0 3 8 22 3 65 65
12.5 0 11 11 16 8 54 54
PM10, NO2, and CO associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = 1/3$, $\alpha_4 = \alpha_5 = 0$)
100 0 0 0 100 0 0 100
50 0 0 0 100 0 0 100
25 0 2 6 86 4 3 86
12.5 0 11 14 45 12 18 44
PM10 associated with mortality ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
100 0 100 0 0 0 0 100
50 0 100 0 0 0 0 100
25 0 100 0 0 0 0 100
12.5 0 94 5 1 0 0 94
No pollutant associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
0 30 27 16 9 8 10 30
(a) 1,000 times the actual values of $\theta$ used to generate mortality.
(b) The percentage of time over each set of 1,000 simulations that a
subset of pollutants of a particular size was retained by SPCA. (c) The
percentage of time over each set of simulations that SPCA retained the
correct subset of pollutants. The corresponding values for PCA will be
100% for the cases in which all pollutants were associated with
mortality and 0% for all other cases.
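The count quoted in the text, 9 of 13 scenarios at or above 75%, can be tallied directly from the Percent correct column. The PCA values implied by footnote (c) can be tallied the same way for comparison. This is a trivial check, with values transcribed from Table 1; the helper name is illustrative.

```python
# "Percent correct" column of Table 1, in row order.
spca_percent_correct = [85, 75, 65, 54,     # all five pollutants; 1,000*theta = 100, 50, 25, 12.5
                        100, 100, 86, 44,   # PM10, NO2, and CO
                        100, 100, 100, 94,  # PM10 only
                        30]                 # no pollutant (1,000*theta = 0)
# Footnote (c): PCA is correct 100% of the time only when all five pollutants matter, else 0%.
pca_percent_correct = [100] * 4 + [0] * 9


def count_at_least(values, cutoff=75):
    """Number of scenarios whose percent correct is at or above the cutoff."""
    return sum(v >= cutoff for v in values)


print(count_at_least(spca_percent_correct), count_at_least(pca_percent_correct))  # 9 4
```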
Table 2. Bias and SD of the individual pollutant effect estimates
obtained from SPCA and PCA over sets of 1,000 simulations.
Effect ($1{,}000 \times \theta$) (a)   PM10 bias (b)   PM10 SD (c)   NO2 bias (b)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 -0.09 (-0.50) 1.08 (0.53) 4.28 (3.48)
50 0.46 (-0.09) 1.60 (0.53) 2.71 (1.94)
25 1.18 (0.13) 2.47 (0.55) 1.63 (1.17)
12.5 1.55 (0.24) 2.63 (0.53) 0.80 (0.81)
PM10, NO2, and CO associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = 1/3$, $\alpha_4 = \alpha_5 = 0$)
100 -4.78 (-9.95) 0.62 (0.53) 3.61 (-5.18)
50 -2.06 (-4.84) 0.83 (0.51) 2.18 (-2.43)
25 -0.23 (-2.27) 2.09 (0.54) 1.24 (-1.03)
12.5 1.18 (-0.98) 2.87 (0.55) 0.25 (-0.33)
PM10 associated with mortality ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
100 -1.76 (-80.99) 1.62 (0.52) 0 (22.90)
50 0.61 (-40.51) 1.77 (0.52) 0 (11.43)
25 1.99 (-20.12) 1.84 (0.52) 0 (5.88)
12.5 2.29 (-9.90) 2.31 (0.55) 0.03 (3.13)
No pollutant associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
0 0.80 (0.34) 1.62 (0.54) 0.29 (0.41)
Table 2 (continued)
Effect ($1{,}000 \times \theta$) (a)   NO2 SD (c)   CO bias (b)   CO SD (c)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 1.95 (0.64) 0 (-1.04) 2.47 (0.51)
50 1.73 (0.64) 0.36 (-0.36) 2.13 (0.51)
25 2.03 (0.66) 0.05 (-0.02) 2.07 (0.53)
12.5 1.72 (0.64) -0.01 (0.17) 1.59 (0.52)
PM10, NO2, and CO associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = 1/3$, $\alpha_4 = \alpha_5 = 0$)
100 0.80 (0.64) -0.19 (-10.6) 0.72 (0.52)
50 1.01 (0.62) 0.22 (-5.17) 1.06 (0.50)
25 2.08 (0.65) -0.02 (-2.43) 2.28 (0.52)
12.5 2.26 (0.66) -0.52 (-1.07) 2.15 (0.53)
PM10 associated with mortality ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
100 0 (0.62) 0 (18.50) 0 (0.50)
50 0 (0.63) 0 (9.23) 0 (0.51)
25 0 (0.63) 0 (4.75) 0 (0.51)
12.5 0.45 (0.66) 0.02 (2.52) 0.32 (0.54)
No pollutant associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
0 0.75 (0.64) 0.38 (0.33) 0.87 (0.52)
Table 2 (continued)
Effect ($1{,}000 \times \theta$) (a)   O3 bias (b)   O3 SD (c)   SO2 bias (b)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 -11.51 (-9.98) 3.62 (0.27) -3.44 (-0.47)
50 -6.20 (-4.91) 2.23 (0.27) -2.60 (-0.07)
25 -3.15 (-2.37) 1.39 (0.28) -1.67 (0.13)
12.5 -1.34 (-1.09) 1.08 (0.27) -0.90 (0.25)
PM10, NO2, and CO associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = 1/3$, $\alpha_4 = \alpha_5 = 0$)
100 0 (12.01) 0 (0.27) 0 (23.4)
50 0 (6.07) 0 (0.26) 0 (11.8)
25 0.26 (3.12) 1.03 (0.28) 0.17 (6.08)
12.5 0.65 (1.64) 1.04 (0.28) 0.57 (3.19)
PM10 associated with mortality ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
100 0 (9.77) 0 (0.27) 0 (19.1)
50 0 (4.87) 0 (0.27) 0 (9.5)
25 0.05 (2.51) 0.97 (0.27) 0 (4.89)
12.5 0.50 (1.33) 2.10 (0.28) 0 (2.6)
No pollutant associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
0 0.14 (0.18) 0.90 (0.27) -0.10 (0.34)
Table 2 (continued)
Effect ($1{,}000 \times \theta$) (a)   SO2 SD (c)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 7.06 (0.53)
50 4.35 (0.53)
25 2.50 (0.55)
12.5 1.46 (0.53)
PM10, NO2, and CO associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = 1/3$, $\alpha_4 = \alpha_5 = 0$)
100 0 (0.53)
50 0 (0.51)
25 1.01 (0.54)
12.5 1.25 (0.55)
PM10 associated with mortality ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
100 0 (0.52)
50 0 (0.52)
25 0 (0.52)
12.5 0 (0.55)
No pollutant associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 0$)
0 1.03 (0.54)
(a) 1,000 times the actual values of $\theta$ used to generate mortality.
(b) 1,000x the bias of the estimated individual pollutant effects
obtained from SPCA. The bias for PCA appears in parentheses.
(c) 1,000x the SD of the estimated individual pollutant effects
obtained from SPCA. The SD for PCA appears in parentheses.
Table 3. Root-mean-squared error of the individual pollutant effect
estimates obtained from SPCA and PCA over sets of 1,000 simulations.
Effect ($1{,}000 \times \theta$) (a)   PM10 (b)   NO2 (b)   CO (b)
All pollutants associated with mortality ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = 1/5$)
100 1.08 (0.73) 4.70 (3.54) 2.46 (1.16)
50 1.66 (0.54) 3.22 (2.04) 2.16 (0.63)
25 2.73 (0.56) 2.60 (1.35) 2.07 (0.53)
12.5 3.05 (0.58) 1.90 (1.03) 1.58 (0.54)
PM, N[O.sub.2], and CO associated with mortality ([[alpha].sub.1] =
[[alpha].sub.2] = [[alpha].sub.3] = 1/3, [[alpha].sub.4] =
[[alpha].sub.5] = 0)
100 4.82 (9.97) 3.70 (5.22) 0.74 (10.61)
50 2.23 (4.87) 2.40 (2.5) 1.08 (5.19)
25 2.10 (2.33) 2.42 (1.21) 2.28 (2.49)
12.5 3.10 (1.12) 2.28 (0.74) 2.21 (1.19)
PM associated with mortality ([[alpha].sub.1] = 1,[[alpha].sub.2] =
[[alpha].sub.3] = [[alpha].sub.4] = [[alpha].sub.5] = 0)
100 2.39 (80.99) 0 (22.91) 0 (18.50)
50 1.88 (40.52) 0 (11.44) 0 (9.24)
25 2.71 (20.12) 0 (5.92) 0 (4.78)
12.5 3.25 (9.92) 0.45 (3.20) 0.32 (2.58)
No pollutant associated with mortality ([[alpha].sub.1] =
[[alpha].sub.2] = [[alpha].sub.3] = [[alpha].sub.4] = [[alpha].sub.5]
= 0)
0 1.80 (0.63) 0.80 (0.76) 0.95 (0.62)
Effect
(1,000 x [theta]) (a) [O.sub.3] S[O.sub.2]
All pollutants associated with mortality ([[alpha].sub.1] =
[[alpha].sub.2] = [[alpha].sub.3] = [[alpha].sub.4] = [[alpha].sub.5]
= 1/5)
100 12.06 (9.98) 7.85 (0.71)
50 6.59 (4.91) 5.06 (0.53)
25 3.44 (2.38) 3.01 (0.57)
12.5 1.71 (1.12) 1.72 (0.59)
PM, N[O.sub.2], and CO associated with mortality ([[alpha].sub.1] =
[[alpha].sub.2] = [[alpha].sub.3] = 1/3, [[alpha].sub.4] =
[[alpha].sub.5] = 0)
100 0 (12.02) 0 (23.43)
50 0 (6.08) 0 (11.86)
25 1.06 (3.13) 1.02 (6.10)
12.5 1.23 (1.66) 1.37 (3.24)
PM associated with mortality ([[alpha].sub.1] = 1,[[alpha].sub.2] =
[[alpha].sub.3] = [[alpha].sub.4] = [[alpha].sub.5] = 0)
100 0 (9.77) 0 (19.06)
50 0 (4.88) 0 (9.52)
25 0.97 (2.52) 0 (4.92)
12.5 2.16 (1.36) 0 (2.66)
No pollutant associated with mortality ([[alpha].sub.1] =
[[alpha].sub.2] = [[alpha].sub.3] = [[alpha].sub.4] = [[alpha].sub.5]
= 0)
0 0.91 (0.33) 1.04 (0.64)
(a) 1,000x the actual values of [theta] used to generate mortality.
(b) 1,000x the root-mean-squared error of the estimated individual
pollutant effects obtained from SPCA. The root-mean-squared error for
PCA appears in parentheses.
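Because RMSE² = bias² + SD², Tables 2 and 3 can be cross-checked against each other. The sketch below does this for the O₃ and SO₂ entries visible above. It treats the single SO₂ column in the continuation of Table 2 as the SD of the SO₂ estimates, which is the reading under which the identity holds; the values are transcribed by hand, so the check is only as reliable as that transcription, and the names `ROWS` and `implied_rmse` are illustrative.

```python
import math

# For each (pollutant, scenario, 1,000 x theta): an (bias, SD, RMSE) triple for SPCA,
# then the same triple for PCA. Bias and SD are from Table 2, RMSE from Table 3 (all 1,000x).
ROWS = {
    ("O3", "no pollutant", 0): ((0.14, 0.90, 0.91), (0.18, 0.27, 0.33)),
    ("SO2", "no pollutant", 0): ((-0.10, 1.03, 1.04), (0.34, 0.54, 0.64)),
    ("O3", "PM, NO2, CO", 25): ((0.26, 1.03, 1.06), (3.12, 0.28, 3.13)),
    ("SO2", "PM, NO2, CO", 25): ((0.17, 1.01, 1.02), (6.08, 0.54, 6.10)),
    ("O3", "PM only", 12.5): ((0.50, 2.10, 2.16), (1.33, 0.28, 1.36)),
}

def implied_rmse(bias, sd):
    """RMSE implied by the decomposition RMSE^2 = bias^2 + SD^2."""
    return math.hypot(bias, sd)

max_gap = 0.0
for key, triples in ROWS.items():
    for method, (bias, sd, rmse) in zip(("SPCA", "PCA"), triples):
        implied = implied_rmse(bias, sd)
        max_gap = max(max_gap, abs(implied - rmse))
        print(key, method, f"implied {implied:.2f} vs reported {rmse:.2f}")
print(f"largest discrepancy: {max_gap:.3f}")
```

The remaining discrepancies are small enough to be explained by the two-decimal rounding of the published figures.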
Table 4. Results of fitting PCA and SPCA to the data from nine U.S. cities for 1987-2000.
   Pollutant loadings (a)
City   PM₁₀   NO₂   CO   O₃   SO₂   Total effect (b)
Chicago, IL
SPCA   0   0   0   1   0   0.005 (0.003)
PCA   0.466   0.556   0.45   0.231   0.467   -0.001 (0.003)
Cleveland, OH
SPCA   0   0   0   0   0   0 (0)
PCA   0.496   0.512   0.437   0.351   0.42   -0.002 (0.006)
Denver, CO
SPCA   0   -0.645   -0.678   0.353   0   0.013 (0.004)
PCA   0.484   0.515   0.536   -0.155   0.435   0.014 (0.006)
El Paso, TX
SPCA   0   0.707   0   0.707   0   -0.023 (0.009)
PCA   0.493   0.538   0.573   0.128   0.35   -0.015 (0.01)
Houston, TX
SPCA   0   0   0   0   0   0 (0)
PCA   0.281   0.56   0.516   0.354   0.465   0.002 (0.008)
Jersey City, NJ
SPCA   0.707   0   0   0.707   0   0.006 (0.015)
PCA   0.487   0.522   0.502   0.046   0.485   -0.004 (0.012)
Nashville, TN
SPCA   0.587   0.568   0.552   0   0.168   -0.023 (0.010)
PCA   0.606   0.542   0.432   0.380   0.094   -0.023 (0.011)
Pittsburgh, PA
SPCA   1   0   0   0   0   0.005 (0.003)
PCA   0.512   0.53   0.486   0.196   0.427   0 (0.004)
Salt Lake City, UT
SPCA   0   0   0   0   0   0 (0)
PCA   0.415   0.54   0.539   -0.105   0.484   -0.022 (0.012)
(a) The loadings given to each pollutant by SPCA and PCA. A loading of 0 for SPCA means that the pollutant was not included in the subset of pollutants retained by SPCA.
(b) The estimated increase in mortality for a simultaneous 1-SD increment in the concentration of each pollutant, with its SE in parentheses. 100 × this value is approximately the percentage increase in mortality.
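Footnote (b) states that 100 × the total effect approximates the percentage increase in mortality. For a log-linear (Poisson) coefficient β, the exact conversion is 100 × (exp(β) - 1). The sketch below applies it to a few Table 4 entries and adds a normal-approximation 95% interval, β ± 1.96 SE, which is an assumption added here rather than something the table reports; `percent_increase` is a hypothetical helper name.

```python
import math

def percent_increase(beta, se, z=1.96):
    """Convert a log-linear coefficient and its SE into a percent change in mortality.

    Returns (percent change, lower bound, upper bound). The bounds assume a
    Wald interval beta +/- z*se on the log scale (an assumption, not from Table 4).
    """
    def to_pct(b):
        return 100.0 * (math.exp(b) - 1.0)
    return to_pct(beta), to_pct(beta - z * se), to_pct(beta + z * se)

# Total effects and SEs for a simultaneous 1-SD increase, taken from Table 4.
examples = {
    "Denver, SPCA": (0.013, 0.004),
    "El Paso, SPCA": (-0.023, 0.009),
    "Chicago, SPCA": (0.005, 0.003),
}
for label, (b, se) in examples.items():
    pct, lo, hi = percent_increase(b, se)
    print(f"{label}: {pct:+.2f}% (95% CI {lo:+.2f}% to {hi:+.2f}%); 100 x beta = {100 * b:+.1f}%")
```

For Denver's SPCA estimate (0.013, SE 0.004) the exact and approximate conversions differ by less than 0.01 percentage points and the interval excludes zero; for Chicago's SPCA estimate the interval includes zero.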
Table 5. Pairwise correlations between the mortality (Mort), temperature (Temp), dew point temperature (Dew), and pollutant time-series data for Cleveland, OH, and Nashville, TN.
   Mort (a)   Temp (b)   Dew (c)   PM₁₀   O₃   NO₂   SO₂   CO
Cleveland, OH
Mort   1   -0.07   -0.07   0.04   -0.02   0.02   0.02   0.03
Temp   -0.07   1   0.91   0.41   0.66   0.09   0.04   0.02
Dew   -0.07   0.91   1   0.36   0.52   0.07   -0.02   0.04
PM₁₀   0.04   0.41   0.36   1   0.56   0.63   0.48   0.48
O₃   -0.02   0.66   0.52   0.56   1   0.36   0.26   0.21
NO₂   0.02   0.09   0.07   0.63   0.36   1   0.56   0.67
SO₂   0.02   0.04   -0.02   0.48   0.26   0.56   1   0.4
CO   0.03   0.02   0.04   0.48   0.21   0.67   0.40   1
Nashville, TN
Mort   1   -0.19   -0.18   -0.07   -0.11   -0.04   0.03   0.01
Temp   -0.19   1   0.94   0.33   0.69   0.11   -0.23   -0.22
Dew   -0.18   0.94   1   0.31   0.56   0.08   -0.24   -0.22
PM₁₀   -0.07   0.33   0.31   1   0.45   0.44   0.08   0.40
O₃   -0.11   0.69   0.56   0.45   1   0.26   -0.1   -0.07
NO₂   -0.04   0.11   0.08   0.44   0.26   1   0.08   0.36
SO₂   0.03   -0.23   -0.24   0.08   -0.10   0.08   1   0.08
CO   0.01   -0.22   -0.22   0.4   -0.07   0.36   0.08   1
(a) Nonaccidental daily deaths of individuals ≥ 65 years of age. (b) The current day's mean 24-hr temperature. (c) The current day's mean 24-hr dew point temperature.
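As the abstract notes, PCA builds its components without reference to the health outcome. The sketch below makes that concrete: it uses only the pollutant-by-pollutant correlations for Cleveland from Table 5 (mortality is left out entirely) and extracts the first principal component with `numpy.linalg.eigh`. It assumes PCA was applied to standardized pollutants, that is, to the correlation matrix, which this section does not state; the constant and function names are illustrative.

```python
import numpy as np

# Pollutant correlations for Cleveland, OH, from Table 5 (order: PM10, O3, NO2, SO2, CO).
# Mortality is deliberately excluded: PCA never looks at the health outcome.
POLLUTANTS = ["PM10", "O3", "NO2", "SO2", "CO"]
R_CLEVELAND = np.array([
    [1.00, 0.56, 0.63, 0.48, 0.48],
    [0.56, 1.00, 0.36, 0.26, 0.21],
    [0.63, 0.36, 1.00, 0.56, 0.67],
    [0.48, 0.26, 0.56, 1.00, 0.40],
    [0.48, 0.21, 0.67, 0.40, 1.00],
])

def first_pc_loadings(corr):
    """Return (loadings, variance share) of the first principal component of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)   # symmetric input; eigenvalues in ascending order
    v = eigvecs[:, -1]                        # eigenvector for the largest eigenvalue (unit length)
    if v.sum() < 0:                           # eigenvectors are defined only up to sign
        v = -v
    share = eigvals[-1] / eigvals.sum()       # fraction of total standardized variance
    return v, share

loadings, share = first_pc_loadings(R_CLEVELAND)
for name, w in zip(POLLUTANTS, loadings):
    print(f"{name:5s} {w:.3f}")
print(f"variance explained by PC1: {share:.2f}")
```

The resulting loadings come out close to the Cleveland PCA row of Table 4 (0.496, 0.512, 0.437, 0.351, 0.42 for PM₁₀, NO₂, CO, O₃, SO₂), which is consistent with, though not proof of, a correlation-matrix PCA. Because every Cleveland pollutant correlation is positive, the leading eigenvector has entries of a single sign, matching the all-positive PCA loadings in that row.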