Epidemiologic evaluation of measurement data in the presence of detection limits.

Quantitative measurements of environmental factors greatly improve the quality of epidemiologic studies but can pose challenges because of the presence of upper or lower detection limits or interfering compounds, which do not allow for precise measured values. We consider the regression of an environmental measurement (dependent variable) on several covariates (independent variables). Various strategies are commonly employed to impute values for interval-measured data, including assignment of one-half the detection limit to nondetected values or of "fill-in" values randomly selected from an appropriate distribution. On the basis of a limited simulation study, we found that the former approach can be biased unless the percentage of measurements below detection limits is small (5-10%). The fill-in approach generally produces unbiased parameter estimates but may produce biased variance estimates and thereby distort inference when 30% or more of the data are below detection limits. Truncated data methods (e.g., Tobit regression) and multiple imputation [...]
If individualized values for measurements below detection limits are needed for additional analysis, such as relative risk regression or graphical display, then multiple imputation produces unbiased estimates and nominal confidence intervals unless the proportion of missing data is extreme. We illustrate various approaches using measurements of pesticide residues in carpet dust in control subjects from a case-control study of non-Hodgkin lymphoma.

Key words: dust, environmental exposure, imputation, missing data, non-Hodgkin lymphoma, pesticides. Environ Health Perspect 112:1691-1696 (2004).
doi:10.1289/ehp.7199 available via http://dx.doi.org/ [Online 13 September 2004]

Epidemiologic studies often collect quantitative measurement data to improve precision and reduce bias in exposure assessment and in the estimation of the effect of exposure on risk of disease, as measured by odds ratios (Hatch and Thomas 1993; Sire 2002). Some measurements serve as biomarkers for "dose" (for example, residual radiation in tooth enamel as a marker of exposure to ionizing radiation; Desrosiers and Schauer 2001), whereas other measures are more indirect (for example, urinary cotinine level as an indicator of exposure to environmental tobacco smoke; Woodward and Al Delaimy 1999). Problems in the analysis of measurement data commonly arise because measurement procedures often have detection limits (DLs).
A DL may represent a floor value, a ceiling value, or an interval within which precise quantitative levels cannot be determined. For example, exposure assessment for nuclear workers relied on radiation film badges that record radiation levels only above a fixed minimum because of limits in film photosensitivity (Gilbert et al. 1996; Kerr 1994). Investigators encountered ceiling levels of particle-bound polycyclic aromatic hydrocarbons in rural Chinese dwellings when values exceeded 60,000 ng/m³, the upper limit of the measurement protocol (Ligman et al. 2004). Although values below or above a DL are "missing," the data are not missing at random in the usual sense, because their absence reflects levels of exposure. This type of missing data is called "nonignorable missing," and the simple exclusion of such "interval-measured" data can bias results (Little and Rubin 1987; Schafer 1997). Analytic procedures for environmental measurement data with DLs are often presented in the context of environmental monitoring, where the primary goal is estimation of distributional parameters when numbers of measurements are limited (Gleit 1985; Haas and Scheff 1990; Helsel 1990; Persson and Rootzen 1977; Singh and Nocerino 2002; Travis and Land 1990).
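The bias from simply excluding nondetects can be seen in a few lines of simulation. The sketch below is our own illustration (Python standard library only), not an analysis from the study; the function name `detects_only_bias`, the seed, and the parameter values are hypothetical. It simulates $\log(Z) \sim N(0, 1)$, places the DL at the 30th percentile, and compares the mean of $\log(Z)$ over all values with the mean over detected values only.

```python
import math
import random
from statistics import NormalDist, fmean


def detects_only_bias(n=5000, mu=0.0, sigma=1.0, p_below=0.3, seed=1):
    """Return (complete-data mean, detects-only mean) of log(Z).

    log(Z) is simulated directly as N(mu, sigma^2); the DL is the p_below
    quantile of Z, so about a fraction p_below of values are nondetects.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    log_dl = NormalDist(mu, sigma).inv_cdf(p_below)  # log(DL)
    logs = [rng.gauss(mu, sigma) for _ in range(n)]
    detected = [y for y in logs if y >= log_dl]  # nondetects simply dropped
    return fmean(logs), fmean(detected)


complete, detects_only = detects_only_bias()
print(f"mean log(Z), complete data:      {complete:.3f}")
print(f"mean log(Z), nondetects dropped: {detects_only:.3f}")
```

Because the dropped values are exactly the small ones, the detects-only mean sits near $\mu + \sigma\,\phi(c)/(1 - p)$, where $c = \Phi^{-1}(p)$ is the standardized log DL and $\phi$ is the standard normal density, rather than near $\mu$.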
In epidemiologic studies, measurement data are used to characterize exposures of study subjects and are typically employed in two ways: a) to develop regression models that examine the relationship between a measured value (dependent variable) and covariates (independent variables); and b) as covariates in a risk analysis that estimates the relationship between a binary disease outcome and exposure measures and other factors. In this article, we focus on the first application, namely, the regression of an exposure measurement on covariate factors. The use of measurements with DLs in risk regression will be considered in another article. Investigators apply various strategies for measurement data with DLs, including replacement of measurements below a DL with a single value, for example, DL, DL/2, or DL/√2 (Helsel 1990; Hornung and Reed 1990). Less frequently, measurements below a DL are assigned a value of zero. Unless such measurements indicate a true zero exposure, this latter strategy clearly distorts results, and it is not considered further in this article. If the distribution of the measurement data is known (for example, if measurements are log-normally distributed), then an alternative strategy replaces values below the DL with the expected values of the missing measurements, conditional on being less than the DL (Garland et al. 1993; Gleit 1985). For measurement Z and detection limit DL, we denote this value E[Z|Z < DL]. Calculation of the conditional expected value requires the investigator to either know or estimate parameters of the measurement distribution. Substitution schemes like those described above are simple, because one value replaces all measurements below the DL, and, except for E[Z|Z < DL], distributional assumptions are not considered.
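For log-normal data the conditional expectation has a closed form, $E[Z \mid Z < \mathrm{DL}] = \exp(\mu + \sigma^2/2)\,\Phi\{(\log \mathrm{DL} - \mu - \sigma^2)/\sigma\} / \Phi\{(\log \mathrm{DL} - \mu)/\sigma\}$, where $\Phi$ is the standard normal cumulative distribution function. The Python sketch below computes it alongside DL/2 and DL/√2. It treats $\mu$ and $\sigma$ as known, whereas in practice they would be estimated; the function name and the example parameters are hypothetical.

```python
import math
from statistics import NormalDist


def substitution_values(dl, mu, sigma):
    """Single replacement values for every measurement below the DL.

    E[Z | Z < DL] uses the log-normal identity
    E[Z; Z < d] = exp(mu + sigma^2/2) * Phi((log d - mu - sigma^2) / sigma),
    divided by P(Z < d) = Phi((log d - mu) / sigma).
    """
    phi = NormalDist().cdf  # standard normal CDF
    a = (math.log(dl) - mu) / sigma
    cond_mean = math.exp(mu + sigma**2 / 2) * phi(a - sigma) / phi(a)
    return {
        "DL/2": dl / 2,
        "DL/sqrt(2)": dl / math.sqrt(2),
        "E[Z|Z<DL]": cond_mean,
    }


# Hypothetical parameters: Z ~ LN(0, 2^2), DL at the median (50% nondetects).
for name, value in substitution_values(dl=1.0, mu=0.0, sigma=2.0).items():
    print(f"{name:>10}: {value:.3f}")
```

With a wide spread ($\sigma = 2$) and the DL at the median, the conditional mean falls below DL/2, so midpoint substitution overstates the mean of the nondetects.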
However, because a single value represents all measurements below the DL, parameter estimates and their variances are likely to be biased unless the proportion of measurements below the DL is small, which potentially distorts inference. This limitation led to a single-imputation "fill-in" method (Helsel 1990; Moschandreas et al. 2001a, 2001b). An investigator first characterizes the form of the distribution and estimates its parameters and then assigns to each measurement below the DL a value sampled randomly from the estimated distribution, restricted to the range below the DL. Fill-in values, along with measured values above the DL, are then used in analyses. With appropriate estimation techniques, this approach accommodates multiple DLs. As described by Helsel (1990) and applied by Moschandreas et al. (2001b), the fill-in method did not include complex modeling of regression factors. In addition, although the fill-in approach assigned random values from an appropriate distribution, it did not account for the variability of the imputation process: the inserted values are not real data, yet they are analyzed as if they were. In this article, we illustrate methods for epidemiologic data that account for measurements with DLs, using data from a case-control study of non-Hodgkin lymphoma (NHL) (Colt et al. 2004). The example evaluates the relationship between concentrations of pesticide analytes in carpet dust and use of pesticide products in and around the home. We restrict analysis to control subjects, with adjustment for study design factors.

Example Data from a Case-Control Study of NHL and Pesticides

The principal exposure of the general population to pesticides occurs in the home (Nigg et al. 1990) as the result of indoor use, track-in or drift from outdoors, intrusion of vapors from foundation treatments, or take-home contamination from occupational use (Bradman et al. 1997; Lewis et al. 1999, 2001). Pesticide residues are retained in carpets, migrate into the underlying foam pad, and may persist for months or years.

Data source.
We consider data from controls from a multicenter, population-based case-control study of NHL conducted in the United States: the Detroit, Michigan, metropolitan area; the state of Iowa; Los Angeles County, California; and the Seattle, Washington, metropolitan area (Colt et al. 2004). Controls include 1,057 residents 20-74 years of age, frequency matched to cases on age, sex, race, and study area, with an oversampling of African-American subjects in Los Angeles and Detroit. Interviewers collected information from respondents on lifetime residential history and the frequency and form of pesticides used to treat various types of pests (e.g., flying insects, crawling insects, lawn weeds). Interviewers obtained vacuum cleaner bags from 95% of subjects who had used their vacuum cleaners within the past year and had owned at least half of their carpets or rugs for 5 years or more. Bags were shipped in insulated boxes by overnight mail to Southwest Research Institute and placed in freezers. Samples were collected and analyzed for 513 control subjects.

Measurement of carpet dust. The protocol for the collection and measurement of dust samples has been described previously (Colt et al. 2004). Briefly, before extraction and analysis, dust samples were sieved through a 100-mesh sieve to obtain the fine (< 150 µm) dust. Neutral extractions were carried out for 25 pesticides (18 insecticides, six herbicides, and ortho-phenylphenol), seven polycyclic aromatic hydrocarbons, and five polychlorinated biphenyl congeners. Acid extractions were carried out for four herbicides and pentachlorophenol.
Extracts were analyzed using gas chromatography/mass spectrometry (GC/MS) in selected ion monitoring mode. Analyte amounts were quantified using the internal standard method. In the full study, GC/MS analysts were blinded to disease status. After about half of the samples had been analyzed, investigators began monitoring additional ions for some neutral analytes to clarify identification at low levels, which raised the DLs for 14 pesticides. DLs were also raised when < 2 g of dust was available. An additional problem with some dust samples was the presence of interfering compounds (i.e., compounds that coeluted with the target analyte), which created uncertainty and prevented assignment of specific concentrations. In three scenarios analysts could provide concentrations only within an interval, which we accommodated by defining a lower bound (LB) and an upper bound (UB) of possible values. If the analyte was not detected and no interferences were present (type I), the LB was set to zero and the UB was set to the specified DL. If there was an interfering compound but insufficient evidence for the presence of the target analyte (type II), the GC/MS analyst reported the result as a nondetect with a DL equal to the entire peak of the coeluting compounds. We set the LB to zero and the UB to 20% of the raised peak or to the DL, whichever was larger. If the target analyte and the interference were both present (type III), [...]
For ease of presentation, we let "replacement with DL/2" (which applies to missing data types I and II) refer more generally to replacement with (LB + UB)/2 (which applies to missing data types I, II, and III).

Methods and Analysis

Preliminary analysis indicates that the measurement data are consistent with a log-normal distribution. If $Z$ denotes the measured value of an analyte and is log-normally distributed, denoted $Z \sim \mathrm{LN}(\mu, \sigma^2)$, then by definition $\log(Z)$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, denoted $\log(Z) \sim N(\mu, \sigma^2)$ (Singh et al. 1997). Suppose $X = (X_0, \ldots, X_K)^t$ is a column vector of covariates, where the superscript $t$ denotes the transpose. If data are complete, then a linear regression equation has the form $\log(Z) = \beta^t X + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. For each $X$, the model implies that $\log(Z)$ is normally distributed with mean $\beta^t X$; that is, $Z \sim \mathrm{LN}(\beta^t X, \sigma^2)$.

Regression analysis in control data.
We evaluate the association between analyte concentration and pesticide use by fitting a linear regression model of the logarithm of the analyte level on subject characteristics. Regression (independent) covariates include indicator variables for season of sample collection, presence of oriental rugs, study center, sex, age (< 45, 45-64, ≥ 65 years), race (African American, Caucasian, other), type of home (single family, townhouse/duplex/apartment, other), year of home construction (< 1940, 1940-1959, 1960-1979, ≥ 1980), and educational level (< 12, 12-15, ≥ 16 years). As in Colt et al. (2004), covariates vary slightly with analyte. Models also include five variables describing the use of insect treatment products: ever/never use of products to treat crawling insects, flying insects, fleas/ticks, termites, and lawn/garden insects. We use data from current homes only. Regression analysis is hampered by the presence of measurements known only within bounds. We assume that the cumulative distribution function, $F$, and the probability density function, $f$, are those for a log-normal random variable. Suppose $X_i = (X_{i0}, \ldots, X_{iK})^t$ is the covariate vector for the $i$th of $i = 1, \ldots, n$ subjects. $\mathrm{LB}_i$ and $\mathrm{UB}_i$ are recorded for $i = 1, \ldots, n_0$ individuals, whereas a specific measurement $Z_i$ is recorded for $i = n_0 + 1, \ldots, n_0 + n_1$ individuals (so $n = n_0 + n_1$). LB and UB are subscripted to allow different DLs. Using a Tobit regression approach (Gilbert 1987; Persson and Rootzen 1977; Tobin 1958), the log-likelihood function has the form

$$\ell(\beta, \sigma^2) = \sum_{i=1}^{n_0} \log\left[F(\mathrm{UB}_i; \beta^t X_i, \sigma^2) - F(\mathrm{LB}_i; \beta^t X_i, \sigma^2)\right] + \sum_{i=n_0+1}^{n_0+n_1} \log f(Z_i; \beta^t X_i, \sigma^2) \tag{1}$$

Here $F(\cdot; \beta^t X_i, \sigma^2)$ and $f(\cdot; \beta^t X_i, \sigma^2)$ are the cumulative distribution and density functions of $\mathrm{LN}(\beta^t X_i, \sigma^2)$, with $F = 0$ when $\mathrm{LB}_i = 0$. The first summand derives from the $n_0$ interval-measured values and involves the difference of the cumulative distribution function $F$ evaluated at UB and at LB, that is, the probability that the measurement lies between the LB and UB. The second summand derives from the $n_1$ detected values. Maximum likelihood estimates (MLEs) for $\beta$ and their covariance matrix are obtained by maximizing Equation 1 and computing the inverse information matrix using standard methods.

Imputation of missing concentrations. If the goal is to evaluate pesticide use and analyte levels in carpet dust, represented by the $\beta$ parameters, then the Tobit regression of Equation 1 is sufficient and no imputation is required.
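A minimal Python sketch of Equation 1 for the intercept-only case used in the simulation study ($\beta^t X_i = \mu$ for every subject), with type I bounds ($\mathrm{LB}_i = 0$, $\mathrm{UB}_i = \mathrm{DL}$). The authors maximized the likelihood with PROC LIFEREG; to stay within the standard library, this sketch swaps in an EM algorithm for interval-censored normal data on the log scale, which increases Equation 1 at every iteration and converges to its maximum here. Function names, the seed, and the simulated parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD = NormalDist()  # standard normal


def log_likelihood(mu, sigma, detected, bounds):
    """Equation 1 with an intercept only, i.e. beta^t X_i = mu for every subject."""
    ll = 0.0
    for lb, ub in bounds:  # interval-measured: log P(LB < Z < UB)
        f_lb = STD.cdf((math.log(lb) - mu) / sigma) if lb > 0 else 0.0
        f_ub = STD.cdf((math.log(ub) - mu) / sigma)
        ll += math.log(f_ub - f_lb)
    for z in detected:  # detected: log of the log-normal density at Z
        ll += math.log(STD.pdf((math.log(z) - mu) / sigma) / (sigma * z))
    return ll


def truncated_moments(mu, sigma, lb, ub):
    """E[Y] and E[Y^2] for Y = log(Z) ~ N(mu, sigma^2) restricted to (log LB, log UB)."""
    b = (math.log(ub) - mu) / sigma
    if lb > 0:
        a = (math.log(lb) - mu) / sigma
        pdf_a, cdf_a, a_pdf_a = STD.pdf(a), STD.cdf(a), a * STD.pdf(a)
    else:  # LB = 0 means log(LB) = -infinity
        pdf_a, cdf_a, a_pdf_a = 0.0, 0.0, 0.0
    mass = STD.cdf(b) - cdf_a
    ratio = (pdf_a - STD.pdf(b)) / mass
    mean = mu + sigma * ratio
    var = sigma**2 * (1 + (a_pdf_a - b * STD.pdf(b)) / mass - ratio**2)
    return mean, var + mean**2


def tobit_em(detected, bounds, max_iter=1000, tol=1e-10):
    """Maximize Equation 1 (intercept only) by EM; returns (mu_hat, sigma_hat)."""
    ys = [math.log(z) for z in detected]
    n = len(ys) + len(bounds)
    mu, sigma = fmean(ys), pstdev(ys)  # crude start from detects only
    for _ in range(max_iter):
        s1, s2 = sum(ys), sum(y * y for y in ys)
        for lb, ub in bounds:  # E-step: expected sufficient statistics
            m1, m2 = truncated_moments(mu, sigma, lb, ub)
            s1 += m1
            s2 += m2
        new_mu = s1 / n  # M-step: normal MLEs from completed statistics
        new_sigma = math.sqrt(s2 / n - new_mu**2)
        done = abs(new_mu - mu) + abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


# Hypothetical simulated data: log(Z) ~ N(0, 2^2), about 40% type I nondetects.
rng = random.Random(7)
dl = math.exp(NormalDist(0.0, 2.0).inv_cdf(0.4))
zs = [math.exp(rng.gauss(0.0, 2.0)) for _ in range(2000)]
detected = [z for z in zs if z >= dl]
bounds = [(0.0, dl)] * (len(zs) - len(detected))  # type I: LB = 0, UB = DL
mu_hat, sigma_hat = tobit_em(detected, bounds)
dl_half = fmean([math.log(z) for z in detected] + [math.log(dl / 2)] * len(bounds))
print(f"Tobit/EM: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}; DL/2 mean = {dl_half:.3f}")
```

With covariates, the E-step is unchanged apart from using $\beta^t X_i$ in place of $\mu$, and the M-step for $\beta$ becomes a least-squares fit of the completed values $E[\log Z_i]$ on $X_i$. The last line contrasts the estimated mean with the mean obtained after inserting DL/2.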
For further analysis or for graphical display, it is useful to generate values for measurements below DLs. We consider several different approaches, including inserting DL/2, inserting E[Z|Z < DL], or using single or multiple imputation (Little and Rubin 1987). A multiple imputation procedure is carried out as follows. Using all data (measured concentrations, missing data types I-III, and covariates), we construct the log-likelihood of Equation 1, solve for the MLEs of $\beta$ and $\sigma^2$ (denoted $\hat\beta$ and $\hat\sigma^2$), and impute a value by randomly sampling from a log-normal distribution with the estimated parameters. However, in selecting fill-in values we cannot ignore that $\hat\beta$ and $\hat\sigma^2$ are themselves estimates with uncertainties. We therefore do not use $\hat\beta$ and $\hat\sigma^2$ for the imputation but rather $\tilde\beta$ and $\tilde\sigma^2$, which are estimated from a bootstrap sample of the data (Efron 1979). Bootstrap data are generated, as described below, by sampling with replacement, and represent a sample from the same universe as the original data. We repeat the process to create multiple data sets, which are then independently analyzed and combined in a way that accounts for the imputation. Differences in regression results across the multiple data sets reflect variability due to the imputation process. This procedure, however, omits a source of variability. We have tacitly assumed that the LB and UB are fixed and known in advance. When there are no interfering compounds (missing type I), the assumption is justified because the DL is determined before the GC/MS dust analysis.
When there are interfering compounds (missing types II and III), the assumption cannot be fully justified, because the bounds depend on the amount of interference and are therefore random. In the NHL data, we assume this uncertainty is small relative to other uncertainties. The imputation proceeds as follows:

Step 1: Create a bootstrap sample and obtain estimates $\tilde\beta$ and $\tilde\sigma^2$ based on Equation 2. Bootstrap data are generated by sampling with replacement $n$ times from the $n$ subjects. Sampling "with replacement" selects one record at random, "puts it back," and then selects a second record. After $n$ repetitions, some subjects are selected multiple times, whereas other subjects are not selected at all. If $w_i$ is the number of times the $i$th subject is sampled, then the log-likelihood function for the bootstrap data is

$$\ell_w(\beta, \sigma^2) = \sum_{i=1}^{n_0} w_i \log\left[F(\mathrm{UB}_i; \beta^t X_i, \sigma^2) - F(\mathrm{LB}_i; \beta^t X_i, \sigma^2)\right] + \sum_{i=n_0+1}^{n_0+n_1} w_i \log f(Z_i; \beta^t X_i, \sigma^2) \tag{2}$$

Step 2: Impute analyte values by sampling from $\mathrm{LN}(\tilde\beta^t X_i, \tilde\sigma^2)$. For the $i$th subject, assign the value

$$Z_i^* = F^{-1}\left(U_i; \tilde\beta^t X_i, \tilde\sigma^2\right), \qquad U_i \sim \mathrm{Unif}\left[F(\mathrm{LB}_i; \tilde\beta^t X_i, \tilde\sigma^2),\ F(\mathrm{UB}_i; \tilde\beta^t X_i, \tilde\sigma^2)\right] \tag{3}$$

This quantity consists of several elements. $F(\mathrm{LB}_i; \tilde\beta^t X_i, \tilde\sigma^2)$ and $F(\mathrm{UB}_i; \tilde\beta^t X_i, \tilde\sigma^2)$ are the cumulative probabilities at $\mathrm{LB}_i$ and $\mathrm{UB}_i$, respectively, based on parameters $\tilde\beta$ and $\tilde\sigma^2$; both values lie between zero and one. $U_i$ is selected randomly from a uniform distribution on the interval $[a, b]$, denoted $\mathrm{Unif}[a, b]$, here the interval between these two cumulative probabilities. Applying the inverse cumulative distribution function, $F^{-1}(\cdot)$, to $U_i$ gives the required imputed value in original units between $\mathrm{LB}_i$ and $\mathrm{UB}_i$.
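The sampling in Steps 1 and 2 can be sketched in Python as two small helpers: one draws the bootstrap counts $w_i$ that weight Equation 2, and one makes the inverse-CDF draw of Equation 3 for a single interval-measured value. Maximizing Equation 2 to obtain $\tilde\beta$ and $\tilde\sigma^2$ is not shown; the helper simply takes $\tilde\beta^t X_i$ (`mean_log`) and $\tilde\sigma$ as inputs. The names and example values are hypothetical, and the sketch assumes $0 < F(\mathrm{UB}_i) < 1$.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def bootstrap_weights(n, rng):
    """Step 1 sampling: w[i] = times subject i is drawn in n draws with replacement."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts[i] for i in range(n)]


def impute_interval(lb, ub, mean_log, sigma, rng):
    """Step 2 / Equation 3: one draw of Z from LN(mean_log, sigma^2) restricted to [LB, UB].

    mean_log stands for beta~^t X_i and sigma for sigma~, both assumed to come
    from maximizing the weighted log-likelihood of Equation 2 (fit not shown).
    """
    dist = NormalDist(mean_log, sigma)  # distribution of log(Z)
    f_lb = dist.cdf(math.log(lb)) if lb > 0 else 0.0  # F(LB); F(0) = 0
    f_ub = dist.cdf(math.log(ub))  # F(UB)
    # U ~ Unif(F(LB), F(UB)]; using 1 - random() keeps U > 0 when F(LB) = 0
    u = f_lb + (f_ub - f_lb) * (1.0 - rng.random())
    z = math.exp(dist.inv_cdf(u))  # F^{-1}(U), back in original units
    return min(max(z, lb), ub)  # guard against floating-point round-off


rng = random.Random(42)
w = bootstrap_weights(10, rng)
print("bootstrap weights:", w, "sum =", sum(w))
# Hypothetical values for one subject with a type I nondetect (LB = 0, UB = DL = 1).
print("imputed value:", impute_interval(0.0, 1.0, mean_log=0.0, sigma=1.0, rng=rng))
```

Inverse-CDF sampling on $[F(\mathrm{LB}_i), F(\mathrm{UB}_i)]$ produces a draw from the truncated distribution directly, so no rejection loop is needed even when the interval carries little probability.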
Repeat using the same $\tilde\beta$ and $\tilde\sigma^2$ for each missing value. Detected values are not altered.

Step 3: Repeat steps 1 and 2 to create $M$ plausible (or "fill-in") data sets. Remarkably, $M$ need not be large; a recommended value is between 3 and 5, with larger values if greater proportions of data are missing (Little and Rubin 1987; Rubin 1987). We select $M = 10$ to fully account for the variance from the imputation.

Step 4: Fit a regression model to each of the $M$ data sets and obtain $M$ sets of parameter estimates and covariance matrices. Combine the $M$ sets of estimates to account for the imputation (Little and Rubin 1987; Schafer 1997). The imputation procedure results in confidence intervals (CIs) that are wider than those from the single-imputation, fill-in approach.

Simulation study. We conducted a simulation study, using a simple regression model with zero intercept and no covariates, to evaluate the imputation approaches and the effects of the proportion of data below the DL and of sample size. We generated data sets of size $n$ by sampling from a log-normal distribution with parameters $(\mu, \sigma^2)$ and defined the DL such that, in expectation, a proportion $p$ of the samples falls below the DL; that is, $\mathrm{DL} = F^{-1}(p; \mu, \sigma^2)$. The simulation involves 5,000 independent data sets for each set of parameters. We compared five approaches: a) direct estimation (Tobit regression) of MLEs ($\hat\beta$
and $\hat\sigma^2$) using Equation 1; b) multiple imputation with allowance for uncertainty in model parameters; c) single imputation based on a random fill-in value for each datum below the DL, using MLEs ($\hat\beta$ and $\hat\sigma^2$) from Equation 1; d) insertion of DL/2 for all data below the DL; and e) insertion of E[Z|Z < DL] for data below the DL, with the expected value based on the MLEs ($\hat\beta$ and $\hat\sigma^2$) from Equation 1. For approaches b) through e), estimators are the mean and variance of the logarithm of the observed and imputed data, with adjustment for multiple imputation in b). We compare results with estimates based on complete data.

For the NHL example, we use SAS (SAS System for Windows, version 8.2; SAS Institute Inc., Cary, NC) to generate bootstrap samples, fit linear regressions (PROC REG), solve log-likelihood Equations 1 and 2 (PROC LIFEREG), and combine results from multiple data sets (PROC MIANALYZE). The simulation was conducted using MATLAB (version 7.0; MathWorks Inc., Natick, MA).

Results

We limited results to four insecticides, which exhibited various types and proportions of missing data: propoxur and carbaryl, both carbamate insecticides; chlorpyrifos, an organophosphate; and α-chlordane, an organochlorine.

Regression analysis in control subjects. After omitting subjects with missing questionnaire data, there are 478 control subjects with carpet dust measurements and all regression variables. The percentages of measurements below DLs or known only within bounds vary from 25.7% for propoxur to 67.0% for carbaryl (Table 1). The arithmetic mean (AM), geometric mean
If each number in a list of numbers was replaced with their geometric mean, then multiplying them all together would still give the same result. (GM), and geometric standard deviation In probability theory and statistics, the geometric standard deviation describes how spread out are a set of numbers whose preferred average is the geometric mean. If the geometric mean of a set of numbers is denoted as μg (GSD GSD German Shepherd Dog GSD Graduate School of Design GSD Glycogen Storage Disease GSD General Services Division GSD Gundam Seed Destiny (anime) GSD Ground Sample Distance GSD Geometric Standard Deviation ), with fill-in imputations for interval-measured values, indicate that concentrations for the individual analytes varied substantially. For carbaryl and [alpha]-chlordane, the GM falls within the range of missing data. Figure 1A and B show quantile quantile division of a total into equal subgroups; includes terciles, quartiles, quintiles, deciles, percentiles. plots for measurements of propoxur and carbaryl and reveals good concordance concordance /con·cor·dance/ (-kord´ins) in genetics, the occurrence of a given trait in both members of a twin pair.concor´dant con·cor·dance n. with a log-normal distribution; Figure 1A and B show values used for imputation based on DL/2, denoted by stars, and the conditional expected value, denoted by triangles. For carbaryl, DL/2 values are nearly twice the conditional expected values. By construction, the fill-in values conform to Verb 1. conform to - satisfy a condition or restriction; "Does this paper meet the requirements for the degree?" fit, meet coordinate - be co-ordinated; "These activities coordinate well" the estimated distribution. [FIGURE 1 OMITTED] Table 2 shows proportional effects of the use of the insecticide insecticide Any of a large group of substances used to kill insects. 
Such substances are mainly used to control pests that infest cultivated plants and crops or to eliminate disease-carrying insects in specific areas. products in and around the home for direct estimation of regression parameters (Tobit regression), the multiple imputation approach, the replacement of missing concentrations by DL/2 and E[Z|LB < Z < UB], and a single set of fill-in values. Results differ slightly from those reported by Colt et al. (2004) due to differences in regressor variables. For the fill-in approach, we impute missing values In statistics, missing values are a common occurrence. Several statistical methods have been developed to deal with this problem. Missing values mean that no data value is stored for the variable in the current observation. using a model with regression variables (denoted "yes") and without regression variables except for an intercept variable (denoted "no"). In several instances, estimates for the various types of pests treated differ substantially, particularly for analytes with a high percentage of missing data. The multiplicative mul·ti·pli·ca·tive adj. 1. Tending to multiply or capable of multiplying or increasing. 2. Having to do with multiplication. mul standard errors for the replacement approaches (i.e., inserting DL/2, E[Z|LB < Z < UB], or a fill-in value) are smaller than standard errors from the multiple imputation approach and direct estimation. The smaller standard errors result from an inadequate account of missing data and result in CIs that are too narrow and inflated type I error rates. Table 2 shows several variables that do not achieve traditional significance levels when imputation is taken into account. In some instances, there are marked differences in estimates. 
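The fill-in step amounts to inverse-CDF sampling from the fitted log-normal model, restricted to the interval in which each unmeasured concentration is known to lie. The Python sketch below is illustrative only and is not the authors' SAS code: the functions `fill_in_value` and `fill_in`, the record layout, and the numeric coefficients are hypothetical, and β̂ and σ̂ (log scale) are assumed to have been estimated beforehand. Supplying only an intercept in `beta` corresponds to the unadjusted ("no") fill-in; adding covariates gives the adjusted ("yes") version.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution (mu = 0, sigma = 1)


def fill_in_value(lb, ub, mean_log, sd_log, rng):
    """Draw one concentration from LN(mean_log, sd_log**2) restricted to [lb, ub].

    mean_log plays the role of the fitted linear predictor (beta-hat' X_i).
    lb = 0 means the value is only known to lie below the detection limit ub.
    """
    # Cumulative probabilities of the bounds on the standardized log scale.
    p_lo = 0.0 if lb <= 0 else _STD.cdf((math.log(lb) - mean_log) / sd_log)
    p_hi = _STD.cdf((math.log(ub) - mean_log) / sd_log)
    # Uniform draw between the two probabilities, kept strictly inside (0, 1).
    u = min(max(rng.uniform(p_lo, p_hi), 1e-12), 1.0 - 1e-12)
    # Map back through the inverse normal CDF and undo the log transform.
    return math.exp(mean_log + sd_log * _STD.inv_cdf(u))


def fill_in(records, beta, sigma, rng):
    """Keep observed values and replace each interval-measured value by a draw.

    Each record has covariates "x" (first entry 1.0 for the intercept) and
    either an observed "value" or interval bounds "lb" and "ub".
    """
    completed = []
    for rec in records:
        if rec.get("value") is not None:
            completed.append(rec["value"])
        else:
            mean_log = sum(b * x for b, x in zip(beta, rec["x"]))
            completed.append(fill_in_value(rec["lb"], rec["ub"], mean_log, sigma, rng))
    return completed


# Hypothetical fitted model on the log scale: intercept and one indicator covariate.
beta = [4.0, 0.4]
sigma = 1.8
records = [
    {"x": [1.0, 1.0], "value": 310.0},          # measured concentration (ng/g)
    {"x": [1.0, 0.0], "lb": 0.0, "ub": 20.7},   # nondetect below a DL of 20.7
    {"x": [1.0, 1.0], "lb": 21.1, "ub": 75.7},  # known only within bounds
]
completed = fill_in(records, beta, sigma, random.Random(2004))
print(completed)
```

Each call produces one completed data set. Analyzing it as if every value had been measured ignores the imputation uncertainty, which is why standard errors from a single fill-in tend to be too small.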
Estimated increases in carpet dust levels of α-chlordane among subjects treating for termites are 2.6- and 3.1-fold based on DL/2 insertion and fill-in methods, respectively, and 3.7-fold based on the multiple imputation and direct estimation approaches. Comparing the two fill-in approaches, standard errors are smaller when covariate information is included than when it is omitted. Fill-in values are obtained from the regression models by sampling from LN(β̂ᵀXᵢ, σ̂²) within each subject's bounds. Figure 1C and D show quantile plots of residuals, that is, of exp[log(Z) − β̂ᵀX] for each subject. Although the GMs of the residuals are close to 1.0, the expected value for the error distributions, the plots suggest slight underprediction at extreme values for propoxur and carbaryl.

Simulation study. We set μ = 0 and σ² = 1 without loss of generality and present results for n = 50, 100, 200, and 400, with DLs chosen so that the expected proportions of values below the DL are p = 10, 30, 50, and 70%. With 5,000 repetitions, the standard error for the coverage of 95% CIs is about 0.003 (√[0.95 × 0.05/5,000]). Table 3 shows that estimates of μ based on Tobit regression, multiple imputation, and single-imputation fill-in are generally unbiased. Insertion of DL/2 or E[Z|Z < DL] results in substantial bias unless the proportion of missing data is small (≤ 10%). Table 3 also shows coverage of the 95% CI for the estimate of μ. Apart from the complete-data analysis, Tobit regression and multiple imputation are the only methods that achieve nominal coverage over a broad range of simulation parameters, although multiple imputation begins to degrade when more than about 50% of the measurements are below DLs. The single imputation approach yields anomalous CIs when about 30% or more of the data are below DLs.

Discussion

Results of our analysis of the use of pesticide products in and around the home and pesticide residues in carpet dust, and of the simulation study, suggest that the method of imputing missing environmental measurement data can substantially affect estimation of effects and statistical inference. The practice of inserting a single value, such as DL/2, the conditional expected value E[Z|Z < DL], or by analogy DL/√2, is ill-advised unless there are relatively few measurements below DLs. Using a single random fill-in value for missing data is unbiased or minimally biased quite generally but gives inaccurate variance estimates and, consequently, CIs that are too narrow, particularly when missing data exceed about 30%. The best protection against biased inference in the presence of nonignorable missing data is multiple imputation, although with a high proportion of values below the DL, a large number of measurements is needed. It is worth reiterating, however, that multiple imputation is necessary only if explicit values are needed for measurements below DLs. If the purpose is to estimate regression parameters, then procedures for truncated data, such as Tobit regression, suffice and yield nominal CIs (Little and Rubin 1987).
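Multiple imputation repeats the fill-in step on several data sets, each drawn with parameters that reflect their estimation uncertainty, and then pools the separate analyses. A minimal sketch of the pooling step, Rubin's (1987) combining rules as implemented by PROC MIANALYZE, follows; the function name and the five example estimates are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m >= 2 completed-data analyses with Rubin's (1987) combining rules.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance of the pooled estimate
    if b == 0:
        return q_bar, t, math.inf    # imputations agree: no extra uncertainty
    r = (1 + 1 / m) * b / w          # relative increase in variance due to missing data
    df = (m - 1) * (1 + 1 / r) ** 2  # Rubin's degrees of freedom for the t reference
    return q_bar, t, df


# Hypothetical results from five imputed data sets (e.g., log-scale slopes).
q, t, df = rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"pooled = {q:.3f}, total variance = {t:.4f}, df = {df:.1f}")
```

A 95% CI is then the pooled estimate plus or minus the t quantile (with df degrees of freedom) times the square root of the total variance; with large df the multiplier approaches 1.96. Because the between-imputation variance enters the total, the pooled SE exceeds the average within-imputation SE whenever the imputations disagree, consistent with the larger SEs for multiple imputation than for a single fill-in in Table 2.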
In environmental monitoring, estimation of distributional parameters is often problematic because of limited numbers of measurements and an inability to evaluate distributional forms precisely. With 5-15 measurements, MLEs can be biased (Gleit 1985), suggesting the need for more robust approaches (Helsel 1990). With epidemiologic data, which usually include hundreds or thousands of measurements, MLEs are approximately unbiased and fully efficient (Gilliom and Helsel 1986), and more detailed regression analyses are feasible. When analyzing environmental data on pesticides, Moschandreas et al. used a fill-in imputation approach that applied the "best-fitting" probability distribution for values above a DL (Helsel 1990; Moschandreas et al. 2001a, 2001b), although Helsel and Hirsch (1991) had cautioned that the approach should be used primarily for estimating summary statistics. The approach we outline permits multiple DLs, incorporates regression parameters, and applies multiple imputation to account correctly for interval-measured data and to allow unbiased inference. However, our simulation study suggests that the fill-in approach may be quite adequate when measurements below the DL account for less than about 30% of the data.

The Tobit regression and multiple imputation approaches assume that the limits of detection are fixed and known in advance. In our example, we are justified in assuming that DLs are fixed for type I missing measurements, but not for type II and III missing data, where DLs depend on the amount of interfering compounds and are random variables. If the DL is not known, an estimate of its value is the minimum order statistic of the observed measurements, that is, the smallest measured value. Simulations suggest that for a random DL, estimates remain unbiased but variances are underestimated (Zuehlke 2003). Thus, CIs in Table 2 may be too narrow. However, relative to other sources of uncertainty that arise in the collection and handling of carpet dust samples and in the accuracy of questionnaire information, the additional uncertainty induced by random DLs for type II and III missing values is likely small.

Environmental data are frequently well approximated by a log-normal distribution, and our data on concentrations of pesticide analytes in carpet dust are consistent with this assumption. Equations 1 and 2 remain valid for more general distributions, although estimation of parameters may be more problematic and may require computer-intensive search algorithms. Validity of parameter estimates and their variances depends, of course, on the correct choice of error distribution. Our simulation study was based on a correct distributional form; however, misspecification of the probability model can lead to markedly biased results (Paarsch 1984). In the absence of knowledge about the error distribution, semiparametric and nonparametric methods have been proposed (Austin 2002a; Chay and Powell 2001; DiNardo and Tobias 2001). Bayesian approaches have also been suggested in the Tobit regression context (Austin 2002b).

A reviewer suggested treating the set of measurements for a subject as a vector of multivariate outcomes, so that the covariance structure among the analytes could provide information for the imputation process. In our example, this requires the assumption that the logarithms of the measurements are multivariate normally distributed. The suggestion, however, adds complexity as the number of analytes increases, and additional work is needed to evaluate its practical feasibility.

The motivation for this work arose from the analysis of pesticide analytes in carpet dust and use of pesticide products in and around the home. However, data with DLs arise in a variety of settings, including upper DLs in health-related questionnaire data (Austin 2002a) and psychological profile scores, such as the Fagerstrom test for nicotine dependence (Fagerstrom and Schneider 1989; Heatherton et al. 1991), and lower DLs in radiation film badge measurements (Gilbert et al. 1996; Kerr 1994).

In summary, with epidemiologic data, our analyses indicate that unless there are very few measurements below DLs (< 5-10%), inserting DL/2, E[Z|Z < DL], or any other single value to impute missing measurement data is not advisable. Further, inserting a randomly selected fill-in value is also inadvisable unless the proportion of missing data is less than about 30%. When explicit values are needed, multiple imputation of missing data is the best approach for ensuring unbiased estimates of effects and nominal CIs.

Support for this study included contracts with the National Cancer Institute: N01-PC-67010, N01-PC-67008, N02-PC-71105, N01-PC-67009, and N01-PC-65064.

The authors declare they have no competing financial interests.

Received 21 April 2004; accepted 13 September 2004.

REFERENCES

Austin PC. 2002a. A comparison of methods for analyzing health-related quality-of-life measures. Value Health 5:329-337.
Austin PC. 2002b. Bayesian extensions of the Tobit model for analyzing measures of health status. Med Decis Making 22:152-162.
Bradman MA, Harnly ME, Draper W, Seidel S, Teran S, Wakeham D, et al. 1997. Pesticide exposures to children from California's Central Valley: results of a pilot study. J Expo Anal Environ Epidemiol 7:217-234.
Chay KY, Powell JL. 2001. Semiparametric censored regression models. J Econ Perspect 15:29-42.
Colt JS, Lubin J, Camann D, Davis S, Cerhan J, Severson RK, et al. 2004. Comparison of pesticide levels in carpet dust and self-reported pest treatment practices in four US sites. J Expo Anal Environ Epidemiol 14:74-83.
Desrosiers M, Schauer DA. 2001. Electron paramagnetic resonance (EPR) biodosimetry. Nucl Instr Meth B 184:219-228.
DiNardo J, Tobias J. 2001. Nonparametric density and regression estimation. J Econ Perspect 15:11-28.
Efron B. 1979. Bootstrap methods: another look at the jackknife. Ann Stat 7:1-26.
Fagerstrom KO, Schneider NG. 1989. Measuring nicotine dependence--a review of the Fagerstrom tolerance questionnaire. J Behav Med 12:159-182.
Garland M, Morris JS, Rosner BA, Stampfer MJ, Spate VL, Baskett CJ, et al. 1993. Toenail trace-element levels as biomarkers--reproducibility over a 6-year period. Cancer Epidemiol Biomarkers Prev 2:493-497.
Gilbert ES, Fix JJ, Baumgartner WV. 1996. An approach to evaluating bias and uncertainty in estimates of external dose obtained from personal dosimeters. Health Phys 70:336-345.
Gilbert RO. 1987. Statistical Methods for Environmental Pollution Monitoring. New York:Van Nostrand Reinhold.
Gilliom RJ, Helsel DR. 1986. Estimation of distributional parameters for censored trace level water quality data. I. Estimation techniques. Water Resour Res 22:135-146.
Gleit A. 1985. Estimation for small normal data sets with detection limits. Environ Sci Technol 19:1201-1206.
Haas CN, Scheff PA. 1990. Estimation of averages in truncated samples. Environ Sci Technol 24:912-919.
Hatch M, Thomas D. 1993. Measurement issues in environmental epidemiology. Environ Health Perspect 101:49-57.
Heatherton TF, Kozlowski LT, Frecker RC, Fagerstrom KO. 1991. The Fagerstrom test for nicotine dependence--a revision of the Fagerstrom tolerance questionnaire. Br J Addict 86:1119-1127.
Helsel DR. 1990. Less than obvious--statistical treatment of data below the detection limit. Environ Sci Technol 24:1766-1774.
Helsel DR, Hirsch RM. 1991. Statistical methods in water resources. In: Techniques of Water-Resources, Book 4. Reston, VA:U.S. Geological Survey. Available: http://water.usgs.gov/pubs/twri/twri4a3/ [accessed 13 August 2004].
Hornung RW, Reed LD. 1990. Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg 5:46-51.
Kerr GD. 1994. Missing dose from mortality studies of radiation effects among workers at Oak Ridge National Laboratory. Health Phys 66:206-208.
Lewis RG, Fortune CR, Blanchard FT, Camann DE. 2001. Movement and deposition of two organophosphorus pesticides within a residence after interior and exterior applications. J Air Waste Manage Assoc 51:339-351.
Lewis RG, Fortune CR, Willis RD, Camann DE, Antley JT. 1999. Distribution of pesticides and polycyclic aromatic hydrocarbons in house dust as a function of particle size. Environ Health Perspect 107:721-726.
Ligman B, Shaughnessy R, Kleinerman R, Lubin J, Fisher E, Wang ZY, et al. 2004. Indoor air pollution characterization of underground dwellings in China, 1997. Blacksburg, VA:Virginia Polytechnic Institute and State University Press, 51-56.
Little RJA, Rubin DB. 1987. Statistical Analysis with Missing Data. New York:John Wiley & Sons.
Moschandreas DJ, Karuchit S, Kim Y, Ari H, Lebowitz MD, O'Rourke MK, et al. 2001a. On predicting multi-route and multimedia residential exposure to chlorpyrifos and diazinon. J Expo Anal Environ Epidemiol 11:56-65.
Moschandreas DJ, Kim Y, Karuchit S, Ari H, Lebowitz MD, O'Rourke MK, et al. 2001b. In-residence, multiple route exposures to chlorpyrifos and diazinon estimated by indirect method models. Atmos Environ 35:2201-2213.
Nigg N, Beier RC, Carter O, Chaisson C, Franklin C, Lavy T, et al. 1990. Exposure to pesticides. In: The Effects of Pesticides on Human Health, Vol 18 (Baker SR, Wilkinson CF, eds). Princeton, NJ:Princeton Scientific, 35-130.
Paarsch HJ. 1984. A Monte-Carlo comparison of estimators for censored regression models. J Econ 24:197-213.
Persson T, Rootzen H. 1977. Simple and highly efficient estimators for a type I censored normal sample. Biometrika 64:123-128.
Rubin DB. 1987. Multiple Imputation for Nonresponse in Surveys. New York:John Wiley & Sons.
Schafer JL. 1997. Analysis of Incomplete Multivariate Data. New York:Chapman & Hall.
Sim M. 2002. Case studies in the use of toxicological measures in epidemiological studies. Toxicology 181:405-409.
Singh A, Nocerino J. 2002. Robust estimation of mean and variance using environmental data sets with below detection limit observations. Chemometr Intell Lab 60:69-86.
Singh AK, Singh A, Engelhardt M. 1997. The Lognormal Distribution in Environmental Applications. Washington, DC:U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response.
Tobin J. 1958. Estimation of relationships for limited dependent variables. Econometrica 26:24-36.
Travis CC, Land ML. 1990. Estimating the mean of data sets with nondetectable values. Environ Sci Technol 24:961-962.
Woodward A, Al Delaimy W. 1999. Measures of exposure to environmental tobacco smoke--validity, precision, and relevance. Ann NY Acad Sci 895:156-172.
Zuehlke TW. 2003. Estimation of a Tobit model with unknown censoring threshold. Appl Econ 35:1163-1169.

Jay H. Lubin, (1) Joanne S. Colt, (1) David Camann, (2) Scott Davis

(1) Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA; (2) Southwest Research Institute, San Antonio, Texas, USA; (3) Fred Hutchinson … Mayo Clinic College of Medicine, Rochester, Minnesota, USA; (5) Karmanos Cancer Institute and Department of Family Medicine, Wayne State University, Detroit, Michigan, USA; (6) Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA

Address correspondence to J. Lubin, National Cancer Institute, Biostatistics Branch, 6120 Executive Boulevard, Room 8042, Rockville, MD 20852 USA. Telephone number: (301) 496-3357. Fax: (301) 402-0081. E-mail: lubinj@mail.nih.gov
Table 1. Percentage of measurements below DLs or known only
within bounds and AMs, GMs, and GSDs based on fill-in values
from a single imputation (data on 478 control subjects).
                 Measurements known only within bounds
                 Type I               Type II              Type III                 Dust (ng/g)
Insecticide      Percent  Range       Percent  Range       Percent  Range           AM      GM     GSD
Propoxur         21.1     (0-46.0)    2.9      (0-65.0)    1.7      (21.1-75.7)     456.6   65.6   6.0
Carbaryl         37.9     (0-260.0)   11.1     (0-268)     18.0     (20.7-694.8)    1503.0  64.0   14.0
Chlorpyrifos     28.2     (0-77.4)    0.2      (0-20.9)    0.0      --              893.1   105.6  8.3
α-Chlordane      60.9     (0-44.7)    0.0      --          0.4      (20.8-29.1)     90.7    11.6   8.0
Types of missing measurements are as follows: no analyte detected
and no interfering compound (I), no analyte detected but with an
interfering compound present (II), and analyte and interfering
compounds both present (III). The range for the DLs reflects
the minimum of LBs and the maximum of UBs for the nondetected
measurements.
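Once a completed data set is available, the Table 1 summaries follow directly. The sketch below assumes the fill-in values have already replaced the interval-measured concentrations and assumes an n - 1 divisor for the SD of the logs (the table does not state which divisor was used); the function name and the toy values are hypothetical.

```python
import math
from statistics import fmean, stdev


def summarize(concentrations):
    """Return (AM, GM, GSD) for positive concentrations, e.g., in ng/g.

    Intended for a completed data set (measured values plus fill-in values).
    """
    logs = [math.log(z) for z in concentrations]
    am = fmean(concentrations)    # arithmetic mean on the original scale
    gm = math.exp(fmean(logs))    # geometric mean: back-transformed mean of the logs
    gsd = math.exp(stdev(logs))   # geometric SD: back-transformed SD of the logs (n - 1 divisor)
    return am, gm, gsd


am, gm, gsd = summarize([1.0, 10.0, 100.0])  # toy values, not study data
print(f"AM = {am:.1f}, GM = {gm:.1f}, GSD = {gsd:.1f}")
```

For log-normal data the AM exceeds the GM, and the large gaps in Table 1 (for example, 1503.0 versus 64.0 ng/g for carbaryl) reflect the strong right skew implied by GSDs of 6.0 to 14.0.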
Table 2. Proportional increase in analyte concentration
in carpet dust (ng/g) for selected uses.
Crawling insects
Insecticide, imputation method (a)   Adjustment   exp(β)   SE
Propoxur
DL/2 No 1.426 (b) 1.167
E[Z|LB < Z < UB] No 1.432 (b) 1.170
Fill-in No 1.459 (b) 1.189
Fill-in Yes 1.511 (b) 1.182
Multiple impute Yes 1.487 (b) 1.196
Direct estimate Yes 1.503 (c) 1.276
Carbaryl
DL/2 No 0.853 1.201
E[Z|LB < Z < UB] No 0.849 1.226
Fill-in No 0.830 1.311
Fill-in Yes 0.940 1.274
Multiple impute Yes 0.826 1.338
Direct estimate Yes 0.785 1.499
Chlorpyrifos
DL/2 No 1.578 (b) 1.209
E[Z|LB < Z < UB] No 1.620 (b) 1.218
Fill-in No 1.917 (b) 1.243
Fill-in Yes 1.745 (b) 1.244
Multiple impute Yes 1.770 (b) 1.252
Direct estimate Yes 1.796 (c) 1.378
α-Chlordane
DL/2 No 0.966 1.129
E[Z|LB < Z < UB] No 0.954 1.153
Fill-in No 1.060 1.230
Fill-in Yes 0.762 1.206
Multiple impute Yes 0.852 1.363
Direct estimate Yes 0.858 1.379
Flying insects
Insecticide, imputation method (a)   Adjustment   exp(β)   SE
Propoxur
DL/2 No 0.987 1.144
E[Z|LB < Z < UB] No 0.986 1.147
Fill-in No 0.966 1.163
Fill-in Yes 1.030 1.157
Multiple impute Yes 1.016 1.165
Direct estimate Yes 0.994 1.235
Carbaryl
DL/2 No 0.661 (b) 1.173
E[Z|LB < Z < UB] No 0.629 (b) 1.194
Fill-in No 0.591 (b) 1.265
Fill-in Yes 0.432 (b) 1.235
Multiple impute Yes 0.508 (b) 1.272
Direct estimate Yes 0.512 (c) 1.413
Chlorpyrifos
DL/2 No 0.779 1.181
E[Z|LB < Z < UB] No 0.771 1.188
Fill-in No 0.757 1.210
Fill-in Yes 0.740 1.210
Multiple impute Yes 0.763 1.223
Direct estimate Yes 0.740 1.323
α-Chlordane
DL/2 No 0.938 1.112
E[Z|LB < Z < UB] No 0.925 1.132
Fill-in No 0.828 1.198
Fill-in Yes 0.927 1.177
Multiple impute Yes 0.915 1.235
Direct estimate Yes 0.919 1.316
Fleas/ticks
Insecticide, imputation method (a)   Adjustment   exp(β)   SE
Propoxur
DL/2 No 1.231 1.153
E[Z|LB < Z < UB] No 1.231 1.156
Fill-in No 1.225 1.173
Fill-in Yes 1.251 1.166
Multiple impute Yes 1.247 1.170
Direct estimate Yes 1.245 1.250
Carbaryl
DL/2 No 1.560 (b) 1.185
E[Z|LB < Z < UB] No 1.703 (b) 1.208
Fill-in No 1.812 (b) 1.285
Fill-in Yes 2.337 (b) 1.252
Multiple impute Yes 2.047 (b) 1.313
Direct estimate Yes 2.180 (b) 1.452
Chlorpyrifos
DL/2 No 1.264 1.182
E[Z|LB < Z < UB] No 1.300 1.190
Fill-in No 1.389 (c) 1.212
Fill-in Yes 1.383 (c) 1.212
Multiple impute Yes 1.401 (c) 1.223
Direct estimate Yes 1.392 1.327
α-Chlordane
DL/2 No 0.910 1.118
E[Z|LB < Z < UB] No 0.894 1.140
Fill-in No 0.868 1.210
Fill-in Yes 0.908 1.188
Multiple impute Yes 0.804 1.202
Direct estimate Yes 0.803 1.339
Termites
Insecticide, imputation method (a)   Adjustment   exp(β)   SE
Propoxur
DL/2 No 1.145 1.219
E[Z|LB < Z < UB] No 1.135 1.223
Fill-in No 1.072 1.249
Fill-in Yes 1.209 1.239
Multiple impute Yes 1.082 1.244
Direct estimate Yes 1.090 1.363
Carbaryl
DL/2 No 1.129 1.266
E[Z|LB < Z < UB] No 1.199 1.300
Fill-in No 1.486 1.417
Fill-in Yes 1.538 1.366
Multiple impute Yes 1.326 1.490
Direct estimate Yes 1.281 1.651
Chlorpyrifos
DL/2 No 1.581 (c) 1.276
E[Z|LB < Z < UB] No 1.613 (c) 1.288
Fill-in No 1.669 (c) 1.322
Fill-in Yes 1.631 (c) 1.323
Multiple impute Yes 1.689 (c) 1.336
Direct estimate Yes 1.698 1.492
α-Chlordane
DL/2 No 2.626 (b) 1.168
E[Z|LB < Z < UB] No 3.031 (b) 1.199
Fill-in No 3.110 (b) 1.303
Fill-in Yes 3.898 (b) 1.271
Multiple impute Yes 3.686 (b) 1.290
Direct estimate Yes 3.666 (b) 1.442
Lawn/garden insects
Insecticide, imputation method (a)   Adjustment   exp(β)   SE
Propoxur
DL/2 No 0.756 (b) 1.151
E[Z|LB < Z < UB] No 0.751 (b) 1.154
Fill-in No 0.737 (c) 1.171
Fill-in Yes 0.687 (b) 1.165
Multiple impute Yes 0.704 (b) 1.173
Direct estimate Yes 0.714 1.249
Carbaryl
DL/2 No 1.660 (b) 1.183
E[Z|LB < Z < UB] No 1.746 (b) 1.205
Fill-in No 1.735 (b) 1.282
Fill-in Yes 1.779 (b) 1.249
Multiple impute Yes 1.950 (b) 1.351
Direct estimate Yes 2.115 (b) 1.444
Chlorpyrifos
DL/2 No 0.759 1.188
E[Z|LB < Z < UB] No 0.746 1.196
Fill-in No 0.713 (c) 1.219
Fill-in Yes 0.731 1.219
Multiple impute Yes 0.708 1.234
Direct estimate Yes 0.702 1.338
α-Chlordane
DL/2 No 1.091 1.117
E[Z|LB < Z < UB] No 1.110 1.138
Fill-in No 1.079 1.208
Fill-in Yes 1.293 1.186
Multiple impute Yes 1.169 1.270
Direct estimate Yes 1.211 1.334
Entries are exponentials of parameter estimates (β) and their
SEs from linear regression models of the logarithm of pesticide
analyte concentration on age, sex, race, education, study site,
season, and pesticide use variables. Regression models also
included year house was built (propoxur, carbaryl, α-chlordane),
type of home (carbaryl), and presence of oriental rugs
(α-chlordane).
(a) See "Materials and Methods" for a description of methods;
adjusted imputation includes regression variables. (b) 95% CI
excludes 1. (c) 90% CI excludes 1.
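Because the SE column holds exponentiated standard errors, a CI for a proportional change is exp(β) divided and multiplied by SE raised to the normal quantile. The sketch below recovers the (b) and (c) flags from rounded Table 2 entries, assuming normal quantiles of 1.96 and 1.645; the function name is hypothetical, and since the published flags were presumably computed from unrounded estimates, an entry lying very close to a boundary could come out differently.

```python
import math


def ci_flag(ratio, mult_se):
    """Return the Table 2 flag implied by exp(beta) and its multiplicative SE.

    "b" if the 95% CI excludes 1, "c" if only the 90% CI excludes 1, else "".
    The CI runs from exp(beta) / mult_se**z to exp(beta) * mult_se**z.
    """
    beta, se = math.log(ratio), math.log(mult_se)
    for flag, z in (("b", 1.96), ("c", 1.645)):   # check 95% first, then 90%
        lower = math.exp(beta - z * se)
        upper = math.exp(beta + z * se)
        if lower > 1 or upper < 1:
            return flag
    return ""


# Rows taken from Table 2 as (exp(beta), SE).
examples = {
    "propoxur, crawling insects, DL/2": (1.426, 1.167),
    "carbaryl, crawling insects, DL/2": (0.853, 1.201),
    "chlorpyrifos, fleas/ticks, multiple impute": (1.401, 1.223),
}
for label, (ratio, mult_se) in examples.items():
    print(f"{label}: {ci_flag(ratio, mult_se) or 'no flag'}")
```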
Table 3. Results of simulation study of imputation
approaches (a) for log-normally distributed data with
μ = 0 and σ² = 1 with a DL (entries
are means of 5,000 repetitions).
Sample size (no.)   Percent < DL   Complete data   Tobit analysis
50
Estimate of μ 10.0 0.002 0.000
30.0 0.002 -0.003
50.0 0.002 -0.004
70.0 0.002 -0.006
Coverage of 95% CI 10.0 0.947 0.944
30.0 0.947 0.949
50.0 0.947 0.953
70.0 0.947 0.931
100
Estimate of μ 10.0 0.003 0.002
30.0 0.003 0.001
50.0 0.003 0.000
70.0 0.003 -0.006
Coverage of 95% CI 10.0 0.944 0.945
30.0 0.944 0.949
50.0 0.944 0.948
70.0 0.944 0.940
200
Estimate of μ 10.0 -0.001 -0.002
30.0 -0.001 -0.003
50.0 -0.001 -0.002
70.0 -0.001 -0.003
Coverage of 95% CI 10.0 0.952 0.950
30.0 0.952 0.955
50.0 0.952 0.948
70.0 0.952 0.947
400
Estimate of μ 10.0 0.001 0.001
30.0 0.001 0.000
50.0 0.001 0.001
70.0 0.001 0.000
Coverage of 95% CI 10.0 0.954 0.954
30.0 0.954 0.948
50.0 0.954 0.954
70.0 0.954 0.947
Sample size (no.)   Percent < DL   Multi-impute using (μ̂, σ̂²)   Single impute using (μ̂, σ̂²)
50
Estimate of μ 10.0 -0.003 -0.003
30.0 -0.003 -0.004
50.0 -0.003 -0.003
70.0 -0.005 -0.002
Coverage of 95% CI 10.0 0.943 0.943
30.0 0.938 0.928
50.0 0.928 0.876
70.0 0.895 0.707
100
Estimate of μ 10.0 0.000 0.000
30.0 0.000 0.000
50.0 0.000 -0.001
70.0 -0.004 -0.002
Coverage of 95% CI 10.0 0.940 0.940
30.0 0.938 0.929
50.0 0.922 0.870
70.0 0.904 0.721
200
Estimate of μ 10.0 -0.002 -0.002
30.0 -0.003 -0.003
50.0 -0.002 -0.002
70.0 -0.001 -0.002
Coverage of 95% CI 10.0 0.951 0.950
30.0 0.936 0.926
50.0 0.925 0.874
70.0 0.914 0.725
400
Estimate of μ 10.0 0.001 0.001
30.0 0.000 0.000
50.0 0.001 0.001
70.0 0.000 0.000
Coverage of 95% CI 10.0 0.952 0.951
30.0 0.938 0.928
50.0 0.927 0.880
70.0 0.914 0.723
Sample size (no.)   Percent < DL   Insert DL/2   Insert E[Z|Z < DL]
50
Estimate of μ 10.0 -0.020 0.007
30.0 -0.017 0.032
50.0 0.052 0.073
70.0 0.229 0.143
Coverage of 95% CI 10.0 0.943 0.942
30.0 0.942 0.928
50.0 0.938 0.832
70.0 0.280 0.520
100
Estimate of μ 10.0 -0.019 0.009
30.0 -0.015 0.034
50.0 0.055 0.076
70.0 0.232 0.142
Coverage of 95% CI 10.0 0.943 0.942
30.0 0.942 0.914
50.0 0.910 0.781
70.0 0.036 0.440
200
Estimate of μ 10.0 -0.023 0.006
30.0 -0.019 0.031
50.0 0.052 0.074
70.0 0.229 0.142
Coverage of 95% CI 10.0 0.941 0.946
30.0 0.940 0.904
50.0 0.877 0.708
70.0 0.000 0.306
400
Estimate of μ 10.0 -0.021 0.008
30.0 -0.017 0.034
50.0 0.053 0.076
70.0 0.230 0.144
Coverage of 95% CI 10.0 0.931 0.949
30.0 0.941 0.874
50.0 0.776 0.545
70.0 0.000 0.128
(a) Parameter estimation using observed data with DLs (Tobit
analysis), giving the MLEs (μ̂, σ̂²); multiple imputation with
allowance for uncertainty in model parameters using (μ̂, σ̂²);
a single imputation using (μ̂, σ̂²); insertion of DL/2; and
insertion of the expected value conditional on being below
the DL, E[Z|Z < DL].
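The bias of the two single-value insertions in Table 3 can be anticipated analytically. Under the simulation design (μ = 0, σ² = 1, one common DL, estimator = mean of the log concentrations), and writing c = Φ⁻¹(p) for the log DL in standard units, the detected values contribute φ(c) to the mean of the logs (φ and Φ are the standard normal density and distribution function), so the bias comes entirely from the inserted value. The sketch below uses a hypothetical function name and gives only the large-sample limit, so it ignores the sampling noise in the 5,000-repetition averages; it yields about -0.022 and +0.230 for DL/2 at p = 10% and 70%, and about +0.007 and +0.144 for E[Z|Z < DL], in line with the tabulated means. It also checks the Monte Carlo SE of 0.003 quoted for coverage.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def large_sample_mu_hat(p):
    """Limit of the estimate of mu (true mu = 0, sigma^2 = 1) as n grows.

    p is the proportion of LN(0, 1) concentrations below the DL, and the
    estimator is the mean of the log concentrations after replacing each
    nondetect. Returns the limits for DL/2 insertion and for E[Z | Z < DL]
    insertion.
    """
    c = _STD.inv_cdf(p)        # log(DL) in standard units
    above = _STD.pdf(c)        # detects' contribution: (1 - p) * E[log Z | Z >= DL]
    dl_half = p * (c - math.log(2)) + above
    # For LN(0, 1), E[Z | Z < DL] = exp(1/2) * Phi(c - 1) / p on the concentration scale.
    cond_mean = math.exp(0.5) * _STD.cdf(c - 1) / p
    conditional = p * math.log(cond_mean) + above
    return dl_half, conditional


for p in (0.10, 0.30, 0.50, 0.70):
    dl_half, conditional = large_sample_mu_hat(p)
    print(f"{p:.0%} below DL: DL/2 -> {dl_half:+.3f}, E[Z|Z<DL] -> {conditional:+.3f}")

# Monte Carlo SE of an estimated 95% coverage from 5,000 repetitions.
coverage_se = math.sqrt(0.95 * 0.05 / 5000)
print(f"SE of coverage: {coverage_se:.4f}")
```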