# A simplified framework for using multiple imputation in social work research.

Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from conditional probability distributions, is an appropriate method for handling missing data when values are not missing completely at random. However, use of this method requires developing an imputation model from the observed data. This is typically a rigorous and time-consuming process. To encourage wider adoption of multiple imputation in social work research, a simple framework for designing imputation models is presented. The framework and its ability to generate unbiased estimates are demonstrated in a simulation study.

KEY WORDS: missing data; multiple imputation; nonresponse

**********

Missing data are ubiquitous in social research, and missingness or nonresponse can represent a threat to the validity of inferences because of undue effects on efficiency, power, and parameter bias (Shadish, Cook, & Campbell, 2002). Social work researchers are now addressing missing data in a more rigorous manner. Recently, Saunders et al. (2006) and Choi, Golder, Gillmore, and Morrison (2005) described important data imputation methods and dispelled misunderstandings regarding popular imputation methods, such as mean substitution.

Recent advances in analytic methods, such as multiple imputation (MI), are taking hold in social work research. With MI, missing values are replaced with values repeatedly drawn from simulated conditional probability distributions (Schafer, 1997), thus creating multiple versions of the data set. Each version of the data set is analyzed according to the data analysis model, and the multiple results are combined into point estimates (Rubin, 1996). A critical task in MI is to devise an imputation model (Allison, 2002) or missing data model (Graham, Olchowski, & Gilreath, 2007), which involves specifying the measures that are putatively associated with the missing values. Although this process adds additional steps, the specification of an imputation model and the creation of multiple data sets can produce less-biased estimates in the presence of missing data across a wide variety of data analysis techniques (Schafer, 1997).

Besides MI, there are many other methods for addressing missing data (Schafer, 1999; Schafer & Graham, 2002). An equally rigorous method known as direct or full information maximum likelihood (FIML) estimation can produce unbiased estimates and correct standard errors in the presence of missing data. When the number of imputations is sufficiently large, identical missing data models will produce the same estimates under MI and FIML (Graham et al., 2007). Unlike MI, FIML is limited to maximum likelihood analytic techniques and the missing data model must be included in the analysis model. Although we focus on MI, the steps we describe for developing an imputation model are equally appropriate for use in the missing data model for FIML. On the basis of the MI literature, this article describes a framework for developing an imputation model for use with any free or commercial software package that performs MI.

BRIEF REVIEW OF MISSING DATA CONCEPTS

Generally, both MI and a broad range of missing data issues have received ample attention in the applied literature (Graham et al., 2007). We briefly discuss the three types of distributions that describe the randomness of nonresponse given that this property has consequences for the development of an imputation model. For a discussion of general missing data concepts that are not critical to understanding our discussion of imputation model development, we refer readers to Schafer and Graham (2002).

Distribution of Nonresponse

The probability distribution of nonresponse--more frequently referred to as the missing data or nonresponse mechanism (Rubin, 1976)--is both an important factor in the decision to impute with MI and a context for the development of an imputation model. Technical definitions are available in Rubin (1976) and Schafer (1997); for less technical definitions see Saunders et al. (2006), Allison (2002), and von Hippel (2004). Schafer and Graham's (2002) work provides helpful diagrams and promotes use of the term distribution over that of mechanism because of the latter's misleading implication of an underlying process. The distribution describes the randomness of missing data in the context of observed data (that is, the extent to which being missing is systematically related to observed or missing values).

Completely Random. As described by Rubin (1976), data may be missing completely at random (MCAR). In MCAR, the probability of nonresponse is independent of either observed or missing values. When nonresponse is MCAR, deletion--also known as complete case analysis--is a potential strategy given that the randomness of the missing values does not induce bias, although efficiency and power will be negatively affected (Schafer & Graham, 2002). However, deletion is the easiest strategy to implement and is regarded as desirable if the assumption of MCAR is supported. MCAR can be tested with a likelihood ratio chi-square test of the null MCAR hypothesis that compares the means of the observed data for the pattern of nonresponse observed on each variable (for a detailed explanation, see Little, 1988).

Systematic. Nonresponse may be systematic, that is, it may be conditioned on observed or missing values. In the worst case scenario, nonresponse may be associated with the missing values, which is known variously as missing not at random (MNAR) (Schafer & Graham, 2002), not missing at random (NMAR) (von Hippel, 2004), or nonignorable (Allison, 2002). This type of nonresponse can be a problem because there is no widely available theoretically grounded strategy for handling MNAR missing data (see Schafer & Graham, 2002). A more desirable distribution occurs when nonresponse is randomly distributed over the unobserved data but is associated with the observed data. This type of nonresponse is known as missing at random (MAR). This is also referred to as ignorable (Allison, 2002). When the data are MAR, techniques used to create the multiple imputations eliminate missing data bias (Schafer, 1999).

Multiple Imputation

With MI, missing values are replaced with values repeatedly drawn from conditional probability distributions by using a simulation method called Markov Chain Monte Carlo (MCMC). As noted previously, this process yields multiple versions of a data set, and the same analysis (for example, a linear regression) is conducted on each version, generating multiple estimates for each parameter. By using a set of rules that adjusts the standard errors for the uncertainty of the imputed values, the multiple estimates are combined to obtain a single parameter estimate (Rubin, 1987). This process adds an additional step to data analysis because it requires the analyst to specify an imputation model before data analysis.

MI can produce unbiased estimates when the fraction of missing information approaches 90% (Graham et al., 2007). This does not imply that 90% of the data points can be missing: The fraction of missing information depends on the proportion of missing data points as well as the covariance of the missing data points with observed data; the lower this relationship is, the higher is the proportion of missing information (Graham et al., 2007). For example, if 50% of the data points on income were missing, then the fraction of missing information is likely to be lower if education is in the imputation model (which has a high putative association with income) than if it is excluded. Therefore, the proportion of missing data points alone is a poor measure for determining whether MI is appropriate for a given situation.

Unfortunately, the MAR assumption underlying MI may not be plausible. This assumption cannot be tested because doing so would require having access to the values that are missing. If the data cannot be tested for MAR, then at best they can only be assumed to be MNAR, and MI does not eliminate missing data bias when data are MNAR. However, Schafer (1997) and Allison (2002) argued that by using an informed and well-constructed imputation model, performing MI on MNAR data could at least reduce bias, even to negligible levels. As Schafer (1997) noted, "The crucial assumption made by [multiple imputation] is not that the propensity to respond is completely unrelated to the missing data, but that this relationship can be explained by data that are observed" (p. 27). For example, if income is frequently missing in a data set--and higher income respondents were less likely to respond--then the missing data are MNAR. However, MI may yield less bias if an observed variable such as education (which covaries with income) is included in the imputation; even less bias is produced when both education and gender (also a variable that is predictive of income) are included. Thus, the focus should be on constructing the best possible imputation model and, arguably, not on assumptions about the nonresponse distribution. The process of setting up an imputation model is complicated, and there is a risk of increasing parameter bias if it is done poorly.

DEVELOPING AN IMPUTATION MODEL

The intended purpose of the imputation model is to capture the association among variables as related to the missing values. Treatments of MI dealing directly with the development of the imputation model from an applied perspective are rare (see Barnard & Meng, 1999; Sinharay, Stern, & Russell, 2001). Instead, most tend to focus on specific issues (for example, variable format) rather than a general strategy (see Carpenter & Kenward, 2006, for an annotated bibliography). In this article, we combine these treatments into a framework for development of an imputation model. We first discuss considerations for creating the model.

Considerations for the Imputation Model

Several considerations guide the development of imputation models, including the covariation of the imputation and analysis models, the distributions of the variables, and the amount of information not used in the data analysis but available for imputation.

Model for Data Analysis. The covariance structure of the model for data analysis must be accounted for in the imputation model; otherwise, parameters estimated from the imputed data may be biased (Rubin, 1996). This means that all variables to be analyzed and any interactions must appear in the imputation model (Barnard & Meng, 1999).

Joint Multivariate Normality. In MI, the variables are assumed to have a joint multivariate normal distribution (Allison, 2002). However, imputing categorical data by using a normal model usually performs well (Graham & Schafer, 1999). Clustered and panel data structures, which violate the assumptions of joint multivariate normality, must also be addressed (Allison, 2002).

Relationship between Model Size and Bias. On balance, the more information included in an imputation model, the better. In other words, any variables that are assumed to have modest association with the missing values should be included in the imputation model because they are likely to improve estimates and reduce bias (Collins, Schafer, & Kam, 2001; Rubin, 1996). In this sense, developing the imputation model is more like forecasting than model fitting: Instead of parsimony for achieving best fit, the set of variables for imputing is allowed to be large to predict values as close as possible to the real, unobserved values (Schafer & Graham, 2002). This assumption implies that as many variables as possible should be included in the model. However, exceptions must be made for the robustness of the data set, including the potential for collinearity (Barnard & Meng, 1999). The robustness of the data, rather than strictly the percentage of missing data points (which factors into the overall robustness by constraining the amount of variability), helps determine whether imputation is appropriate for a given set of variables (Schafer, 1997).

Three-Step Reduction Strategy for Preparing an Imputation Model

On the basis of these considerations, we describe a three-step reduction strategy for developing an imputation model from a data set containing many variables, some of which may be used in the data analytic model whereas others may not. The framework assumes that a data analysis model has been established, that this model is included in the imputation model, and that the Little (1988) MCAR test (available in SPSS) confirms deletion would produce biased estimates--thus requiring a more rigorous strategy, such as imputation.

Identify Potential Imputation Variables. The first step is to select variables that should be candidates for the imputation model. The selected variables should have at least minimal association with the variables containing missing values (Allison, 2002; Collins et al., 2001). The absence of the observed values from which to estimate correlations requires an external basis for claiming a putative association. For example, research has established a high correlation between gender and physical aggression in childhood. Therefore, it is advisable for the analyst to include a variable for gender when imputing for a model of aggressive behavior, even if gender is not a focus of the analysis. Allison (2000) cautions that relying solely on the association of the nonanalytic variables with the pattern of nonresponse is not appropriate because this association may increase bias if not informative of the missing values. Although variables with missing values can be used in the imputation model, those with relatively fewer missing values will provide more information at a lower cost (in terms of model size) (Schafer, 1997). We recommend eliminating variables that can be assumed to have no association with the missing values and those with more than a relatively small number of missing data points as both of these types increase the size of the model but contribute little to bias reduction (Collins et al., 2001). Interactions can be implemented in one of two ways: by performing separate imputations for each category of one of the interacted variables or by calculating a product of the interacted variables before imputation and including this product variable in the imputation model (Allison, 2002).
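The second way of handling an interaction (computing the product before imputation) can be sketched in a few lines of Python. This is only an illustration: the record layout, the variable names, and the helper `add_product_term` are hypothetical and are not taken from the study data.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"treatment": 1, "gender": 0, "t1_aggression": 2.5},
    {"treatment": 0, "gender": 1, "t1_aggression": None},
    {"treatment": 1, "gender": 1, "t1_aggression": 1.0},
]


def add_product_term(rows, a, b, name):
    """Return copies of rows with a new column holding a * b.

    The product is left missing when either factor is missing, so that
    it can be imputed along with the other variables.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        if row[a] is None or row[b] is None:
            new_row[name] = None
        else:
            new_row[name] = row[a] * row[b]
        out.append(new_row)
    return out


# The product variable is created first and then enters the imputation model.
with_interaction = add_product_term(records, "treatment", "gender", "treat_x_gender")
```

The product column is then treated like any other variable in the imputation model.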

Address the Distribution Assumptions. Categorical, skewed, and range-limited variables pose special problems for imputation. A special package for imputing only categorical variables (CAT) has been developed in S-Plus and R (Schafer, 1997). When the data contain a mix of multivariate normal and categorical variables, another package (MIX) is available for S-Plus and R (Schafer, 1997). However, Schafer (1997) demonstrated that using MI under the assumption of normality works well even when variables are nonnormal, including binomial. Consequently, if the missing values are dispersed between categorical and continuous variables, then software for multivariate normal variables can be used. In this situation, continuous variables should be given priority (for example, continuous variables should not be collapsed into dummy variables before imputation). Similarly, polychotomous variables with k levels may be converted into k - 1 dummy variables (Allison, 2002). Dummy variables can then be imputed under the assumption of multivariate normality, which can result in imputed values other than zero and one. Nevertheless, out-of-range values should not be rounded after imputation, as this may induce bias (Horton, Lipsitz, & Parzen, 2003). Variables that have skewed or range-limited distributions should be considered for transformation (for example, logarithmic transformation for right-skewed or nonnegative data), requiring retransformation afterward (Allison, 2002; Schafer & Graham, 2002).
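The two recoding steps described above, converting a k-level variable into k - 1 dummy variables and log-transforming a right-skewed variable before imputation, might look like the following Python sketch. The function names, category labels, and values are hypothetical. The log transform shown assumes strictly positive values; a variable that includes zeros would need an offset, which is not shown here.

```python
import math


def dummy_code(values, reference):
    """Turn a categorical variable with k levels into k - 1 indicator columns.

    The reference level is omitted; missing values (None) stay missing so
    that the indicators can be imputed.
    """
    levels = sorted({v for v in values if v is not None and v != reference})
    return {
        level: [None if v is None else int(v == level) for v in values]
        for level in levels
    }


def log_transform(values):
    """Natural log of each observed value (assumes values > 0)."""
    return [None if v is None else math.log(v) for v in values]


def back_transform(values):
    """Undo log_transform after imputation."""
    return [None if v is None else math.exp(v) for v in values]


# Hypothetical three-level variable: two indicators are produced.
race = ["white", "african_american", "latino", None, "latino"]
dummies = dummy_code(race, reference="white")

# Hypothetical right-skewed variable.
income = [12000.0, 45000.0, None, 230000.0]
logged = log_transform(income)      # impute on this scale
restored = back_transform(logged)   # return to the original scale afterward
```

Imputed indicator values are left unrounded, in line with the caution from Horton et al. (2003) cited above.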

Strategies for imputing clustered data continue to be developed, but few imputation packages account explicitly for clustering. PAN is a package developed for the S-Plus and R statistical computing languages designed for use with panel data (which suffer from the same failure of independence assumptions as do clustered data) (Schafer, 1997). Although some researchers have suggested that including the clustering variable in the imputation model may circumvent the problem of nesting in MCMC, this method is ad hoc and does not preserve the associations between cases within the same cluster as effectively as does specialized software such as PAN. For repeated measures on individual cases (longitudinal or panel data), Allison (2002) suggested that multivariate normal imputation software can be used by first reformatting the data so there is one observation per case and then performing a normal imputation. Subsequently, the data should be converted to the period-by-case form required for repeated measures analysis.
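The reshaping suggested for repeated measures (one observation per case before imputation, period-by-case form afterward) can be illustrated with a small Python sketch. The helper names, the `_t1`/`_t2` column suffixes, and the data are assumptions made for the example.

```python
def long_to_wide(rows, id_key, time_key, value_keys):
    """Collapse period-by-case rows into one row per case.

    Each measure becomes one column per time point, e.g. aggression_t1.
    """
    wide = {}
    for row in rows:
        case = wide.setdefault(row[id_key], {id_key: row[id_key]})
        for key in value_keys:
            case[f"{key}_t{row[time_key]}"] = row[key]
    return list(wide.values())


def wide_to_long(rows, id_key, value_keys, times):
    """Expand one-row-per-case data back to period-by-case form."""
    long_rows = []
    for row in rows:
        for t in times:
            record = {id_key: row[id_key], "time": t}
            for key in value_keys:
                record[key] = row.get(f"{key}_t{t}")
            long_rows.append(record)
    return long_rows


# Hypothetical panel with two time points; None marks a missing value.
long_data = [
    {"id": 1, "time": 1, "aggression": 2.0},
    {"id": 1, "time": 2, "aggression": None},
    {"id": 2, "time": 1, "aggression": 3.5},
    {"id": 2, "time": 2, "aggression": 3.0},
]

wide_data = long_to_wide(long_data, "id", "time", ["aggression"])
# ... imputation would be run on wide_data here ...
round_trip = wide_to_long(wide_data, "id", ["aggression"], times=[1, 2])
```

In the wide layout each case contributes a single row, so the time 1 and time 2 measures of the same case can inform one another during imputation.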

Remove Collinearity. Collinearity is often a problem when reducing data to a working imputation model, particularly when starting with hundreds of variables. Collinearity (including linear dependence) can result in a failure of the MCAR test to provide a meaningful result and a failure of MCMC to run properly (Schafer, 1997). Linear dependence often arises when scales and their constituent items are included in a data set; therefore, either the scales or the items should be excluded (with a preference for normally distributed variables, we recommend retaining the scales, particularly when the items are binomial). Linear dependence also occurs when k dummy variables, representing k categories, are included in the model. In MI, the solution is the same as in regression analysis: Omit one of the categories.

If the previous steps have been followed, then linear dependence should not be a problem. However, high collinearity among the variables selected in the previous steps may still produce biased estimates (Barnard & Meng, 1999). Therefore, we recommend testing for collinearity with a variance inflation factor (VIF) test before the MCAR test and imputation. A VIF is calculated for a variable as $(1 - R^2)^{-1}$ using the coefficient of determination $R^2$ from a model regressing this variable on every other variable in the model (Kennedy, 2003). A VIF approaching 10 usually indicates high collinearity. Note, however, that a VIF test is a multivariate complete case analysis in which cases with missing values are deleted. Therefore, higher VIFs may emerge--and collinearity can seemingly worsen--as deletion of the missing data in this step reduces the sample size and produces a more homogeneous sample. Therefore, we recommend using the VIF test only as a guide before imputation. Data robustness diagnostics that are conducted during the imputation, available in some software packages, can be reviewed after imputation. These are better indicators of the robustness of the model (Schafer, 1997) and are discussed in more detail in the following section. Conventional strategies for dealing with collinearity can be used when collinearity is found to be high, such as deletion of highly collinear variables or the use of a transformation such as mean centering.
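As a rough illustration of the VIF check, the Python sketch below computes a VIF for each variable after listwise deletion. It relies on a standard identity rather than on separate regressions: the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix of the variables. The data are simulated and the function names are hypothetical; this is not the SAS procedure used in the demonstration.

```python
import random


def complete_cases(rows):
    """Keep only rows with no missing (None) entries, as a VIF test would."""
    return [row for row in rows if all(value is not None for value in row)]


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    norms = [sum(c[j] ** 2 for c in centered) ** 0.5 for j in range(p)]
    return [[sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(rows):
    """VIF per column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(complete_cases(rows)))
    return [inverse[j][j] for j in range(len(inverse))]


# Simulated example: x2 is nearly a copy of x1, x3 is unrelated.
rng = random.Random(42)
data = []
for _ in range(200):
    x1 = rng.gauss(0, 1)
    data.append([x1, x1 + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)])
data[0][2] = None  # two incomplete cases, dropped before the VIFs are computed
data[1][0] = None

vifs = variance_inflation_factors(data)
```

With exact linear dependence the correlation matrix cannot be inverted, which this sketch reports as an error.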

Imputation

The theory behind imputation and the specific procedures and algorithms used in MI are described in nontechnical language by Schafer and Graham (2002), Allison (2002), and Saunders et al. (2006). A more technical and advanced treatment is available in Schafer (1997). These treatments also discuss decisions involved in the imputation process that are beyond the scope of this article. These include deciding on the number of iterations of the algorithms used in the simulation process and selecting diagnostics such as autocorrelation plots. Software documentation should be consulted to determine how to specify these options and diagnostics.

After Imputation

After imputation, the imputation diagnostics are reviewed to assess the robustness of the imputation model and then the desired analyses are conducted. Many programs report diagnostics statistics such as the fraction of missing information and relative efficiency for each variable (see Schafer, 1997, for formulas). If a large number of imputations are performed, the fraction of missing information will be lower and relative efficiency can exceed 99% (Graham et al., 2007). The relative efficiency will be lower if parameter estimates vary widely across the imputations, which can occur as a result of either a poorly constructed imputation model or imputations with few imputed data sets (Graham et al., 2007). In the latter case, these statistics are potentially misleading (Allison, 2002).
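The relative efficiency diagnostic can be reproduced by hand. The sketch below assumes the commonly cited expression from Rubin (1987), in which relative efficiency equals (1 + γ/m)^(-1) for a fraction of missing information γ and m imputed data sets. The function name and the input values are illustrative only.

```python
def relative_efficiency(fraction_missing_info, n_imputations):
    """Relative efficiency of m imputations versus infinitely many.

    Assumes the expression RE = 1 / (1 + gamma / m), where gamma is the
    fraction of missing information and m the number of imputed data sets.
    """
    return 1.0 / (1.0 + fraction_missing_info / n_imputations)


# Illustrative inputs, not values from the demonstration.
for gamma in (0.1, 0.3, 0.5):
    for m in (5, 20, 100):
        print(f"gamma={gamma:.1f}  m={m:3d}  RE={relative_efficiency(gamma, m):.3f}")
```

Under this expression, five imputations with γ = 0.1 give a relative efficiency of about 98%, and the efficiency rises toward 100% as m increases.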

After reviewing the diagnostics, the researcher should conduct the desired analysis on each copy of the data set that is generated during the simulation process. The estimates from each version of the analysis should then be combined into a single set of parameter estimates using Rubin's (1987) rules for inference, which inflate the standard errors to account for the uncertainty of the simulated values.
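For readers who want to see the combining step spelled out, the following Python sketch pools one parameter across imputed data sets in the usual way: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name is hypothetical and the five estimates and standard errors are made-up numbers, not results from this article.

```python
import math
import statistics


def pool_rubin(estimates, standard_errors):
    """Combine one parameter's estimates from m imputed data sets."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = statistics.fmean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = statistics.variance(estimates)
    total = within + (1.0 + 1.0 / m) * between
    return {
        "estimate": pooled,
        "se": math.sqrt(total),
        "within": within,
        "between": between,
        # Share of total variance attributable to the missing data.
        "missing_share": (1.0 + 1.0 / m) * between / total,
    }


# Made-up results for one coefficient from m = 5 imputed data sets.
estimates = [0.18, 0.21, 0.20, 0.19, 0.22]
standard_errors = [0.08, 0.08, 0.08, 0.08, 0.08]
pooled = pool_rubin(estimates, standard_errors)
```

The pooled standard error is never smaller than the one implied by the average within-imputation variance, which is how the rules reflect the uncertainty of the imputed values.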

DEMONSTRATION

We conducted a simulation to demonstrate that the framework is useful in designing an imputation model that will reduce parameter bias. We randomly selected 300 of 481 complete case records from data collected during an evaluation of a school-based skills training program (see Fraser et al., 2005). We then simulated MCAR, MAR, and MNAR data by subsampling and deleting selected items in a manner consistent with the desired distribution (Schafer, 1997). We tested each of these data sets for MCAR. Subsequently, for the MAR and MNAR data we followed the steps described in the previous section. We conducted the same analysis for all four data sets--complete, MCAR, MAR, and MNAR--and then compared the estimates, standard errors, t statistics, and p values (Allison, 2002).

Simulation of Missing Data Distributions

We simulated the MCAR data by randomly sampling and deleting selected items. Furthermore, we simulated the MAR data by conditioning the deletion of these items on the values of another variable. Similarly, we simulated the MNAR data by conditioning the deletion of the same items on the values of the items to be deleted. Overall, we deleted 150 data points from each data set. Before conducting the reduction, we confirmed the missing data distributions using the Little (1988) MCAR test.
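The three deletion schemes can be mimicked on toy data. The Python sketch below is not the procedure used in this study; it is one possible way to delete a fixed number of values so that the chance of deletion is constant (MCAR), depends on another observed variable (MAR), or depends on the value being deleted (MNAR). The variable names, the weights, and the sample sizes are assumptions, and the weighted sampling without replacement uses the key method of Efraimidis and Spirakis.

```python
import math
import random


def delete_values(rows, target, n_delete, weight, seed):
    """Set `target` to None in n_delete rows, chosen without replacement
    with probability increasing in weight(row). Returns new rows."""
    rng = random.Random(seed)
    # Key log(u) / w is a monotone transform of u ** (1 / w);
    # the rows with the largest keys are selected.
    keys = [math.log(1.0 - rng.random()) / weight(row) for row in rows]
    order = sorted(range(len(rows)), key=keys.__getitem__, reverse=True)
    chosen = set(order[:n_delete])
    return [dict(row, **{target: None}) if i in chosen else dict(row)
            for i, row in enumerate(rows)]


# Toy complete data: standardized education and a correlated income.
data_rng = random.Random(0)
complete = []
for _ in range(300):
    education = data_rng.gauss(0, 1)
    income = 0.7 * education + math.sqrt(1 - 0.49) * data_rng.gauss(0, 1)
    complete.append({"education": education, "income": income})

N_DELETE = 60  # hypothetical number of deleted income values

# MCAR: every row is equally likely to lose its income value.
mcar = delete_values(complete, "income", N_DELETE, lambda r: 1.0, seed=1)
# MAR: deletion depends on an observed variable (education).
mar = delete_values(complete, "income", N_DELETE,
                    lambda r: math.exp(2 * r["education"]), seed=2)
# MNAR: deletion depends on the income value that is deleted.
mnar = delete_values(complete, "income", N_DELETE,
                     lambda r: math.exp(2 * r["income"]), seed=3)
```

In the MNAR version, the incomes that remain observed are systematically lower than those in the complete data, which is the situation deletion-based analyses cannot correct.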

Analysis Model

The analysis involved scale scores calculated from five sets of survey items, each measured before and after an intervention was administered: (1) social competence (10 items); (2) social aggression (six items); (3) physical aggression (seven items); (4) social engagement (five items); and (5) cognitive concentration (12 items). The scale scores were calculated from the items after the deletion procedure; consequently, if the items were missing, the scale scores were missing as well.

The analytic model that we tested contained six variables: time 1 social aggression (T1SA), time 2 social aggression (T2SA), gender, race/ethnicity, treatment condition, and school indicators. T2SA was regressed on the other variables with treatment condition as the independent variable of interest. This model was tested to demonstrate that it did not contain harmful levels of collinearity. We identified candidates for imputation from among (1) the remaining variables, which consisted of age, all items at T1 and T2 (72 at each time point), and (2) the other eight scales that were not analyzed (that is, those at T1 and T2 not being modeled). A data reduction was required because the number of variables was too high to use all of them. To be sure, we wanted to use as many variables as possible, but the observed data would not have supported an imputation model containing all of the variables. Linear dependencies would have caused the imputation to fail. If it did run, bias would be a concern.

Constructing the Imputation Model

Identify Potential Imputation Variables. In reducing the number of variables, our goal was to construct an imputation model that would yield unbiased estimates in a model of T2SA on T1SA controlling for theoretically relevant demographics. Any items used in the calculation of the scales would result in linear dependence if all were included; we preferred to remove the items and focus on the scales (requiring a smaller set of variables in the model). This step left us with age, the four other unused scales at T1 and T2, the eight unused items at T1 and T2 (for a total of eight scales and 16 items), and the variables in the analysis model. We assumed that many, if not all, of these variables had modest association with the missing values and should be retained as candidates for imputation.

Address the Distributional Assumptions. Of the variables that were candidates for the imputation model, all were continuous or discrete measures. All demographic variables were dichotomous, with the exception of race/ethnicity, which was polychotomous. For the purpose of the analytic model, we intended to examine African American and Latino race/ethnicity. Therefore, two binary variables were created from the race/ethnicity source variable.

Remove Collinearity. We used SAS Proc Reg to estimate a VIF for each variable. For the MAR data, social competence (an imputation variable) was probably collinear across the two time points, with a VIF of 8.7 at T1 and 11.3 at T2. Because the T2 VIF was higher than 10, we removed T2 social competence from the model and then retested it, which produced satisfactory results. Our analysis did not detect any collinearity in the MNAR data. Having satisfactorily created imputation models, we then conducted the imputation and performed the analysis.

Imputation and Analysis

We imputed five data sets for the MAR and MNAR data by using SAS Version 9 Proc MI (Clark, 2004). This procedure imputed missing values for nine scales and one item. In both MAR and MNAR data, we obtained a relative efficiency of 98% or greater. We conducted the same analysis on all four simulations.

For the imputed data, we summarized the results by using Proc MIANALYZE, a procedure that implements Rubin's (1987) rules for inference of imputed data. We then compared the estimates, standard errors, t statistics, and p values of the program effect across the four types of data (Allison, 2002) (see Table 1). We assumed that the complete data produced the correct estimate of the program effect. The basis for comparison was whether we would have made the same inference about the effect of the treatment with each type of data. We found that we would have drawn the same conclusions from nearly every version of the analysis, with one exception: For MCAR data, we would have concluded that the treatment was only marginally significant (p < .10). This was probably because of the reduced power of the model that resulted from the deletion of cases with missing values. The actual effect (as demonstrated by the full simulation data) was both positive and significant (b = .16, p < .05); this was supported for both MAR and MNAR data (b = .20 and .18, respectively, p < .05). Overall, the imputation of MNAR data performed nearly as well as the MAR imputation and did better than the deletion under MCAR. Differences in standard errors were negligible, and the effect was slightly inflated in the MAR and MNAR data.

DISCUSSION

The core idea of MI is that plausible values may be imputed for missing values by simulating a relatively small number of data sets from observed measures, conducting the same data analyses in each data set, and aggregating the findings into point estimates. In this article, we concentrated on a strategy for developing an imputation model that sidesteps assumptions about the missing data distribution and instead focuses on the quality of the imputation model. Though we demonstrate MI, the strategy is equally relevant for researchers wishing to develop an informed missing data model for use with FIML (Graham et al., 2007).

On the basis of a review of guidelines specified in the MI literature, we described a three-step process for building an imputation model that can yield parameter estimates with reduced bias. This framework provides guidance for the selection of imputation variables, the recoding and transforming of variables, and diagnostic procedures to check for robustness and collinearity before and after imputation. A simulation showed that this process can reduce bias even when the data are known to be MNAR.

We used SAS (Version 9) for the simulation. However, many other software packages are capable of imputing data through the use of the most advanced algorithms, including specialized packages available for S-Plus and R (MIX, PAN, CAT, and NORM) (Schafer, 1997). The theory and application of MI are constantly evolving, and we recommend consulting the software documentation to determine whether a preferred package is capable of imputing. Equally important, software should be capable of analyzing imputed data. A user could conduct each analysis separately and then combine the estimates by using Rubin's (1987) rules in longhand, but a quicker and easier approach is to use software that is capable of doing this automatically. SAS Proc MIANALYZE does this. Currently, Mplus (Muthen & Muthen, 2007) and HLM 6.0 (Raudenbush, Bryk, Cheong, & Congdon, 2004) both provide combined estimates from imputed data if the correct options are specified.

The overarching purpose of this article was to encourage wider adoption of MI in social work research. When used properly, MI provides researchers with a tool that ensures missing data do not cause substantial parameter bias. Indeed, our simulation suggests that the use of MI can reduce bias even when data are not missing at random.

Original manuscript received February 21, 2007

Accepted May 9, 2008

REFERENCES

Allison, P. D. (2000). Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research, 28, 301-309.

Allison, P. D. (2002). Missing data [Series: Quantitative applications in the social sciences 136]. Thousand Oaks, CA: Sage Publications.

Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17-36.

Carpenter, J., & Kenward, M. (2006). Annotated bibliography. Available from http://www.lshtm.ac.uk/msu/missingdata/bibliography.pdf

Choi, Y., Golder, S., Gillmore, M. R., & Morrison, D. M. (2005). Analysis with missing data in social work research. Journal of Social Service Research, 31, 23-48.

Clark, V. (2004). SAS/STAT 9.1 user's guide. Cary, NC: SAS Institute.

Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351.

Fraser, M. W., Galinsky, M. J., Smokowski, P. R., Day, S. H., Terzian, M. A., Rose, R. A., & Guo, S. (2005). Social information-processing skills training to promote social competence and prevent aggressive behavior in the third grade. Journal of Consulting and Clinical Psychology, 73, 1045-1055.

Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213.

Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1-29). Thousand Oaks, CA: Sage Publications.

Horton, N. J., Lipsitz, S. R., & Parzen, M. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57, 229-232.

Kennedy, P. (2003). A guide to econometrics (5th ed.). Cambridge, MA: MIT Press.

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198-1202.

Muthen, L. K., & Muthen, B. O. (2007). Mplus user's guide (5th ed.). Los Angeles: Author.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. T. (2004). HLM 6: Hierarchical linear and nonlinear modeling. Lincolnwood, IL: Scientific Software International, Inc.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473-489.

Saunders, J. A., Morrow-Howell, N., Spitznagel, E., Dore, P., Proctor, E. K., & Pescarino, R. (2006). Imputing missing data: A comparison of methods for social work researchers. Social Work Research, 30, 19-31.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: Chapman & Hall/CRC.

Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3-15.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317-329.

von Hippel, P. T. (2004). Biases in SPSS 12.0 missing value analysis. The American Statistician, 58, 160-164.

Roderick A. Rose, MS, is research associate, and Mark W. Fraser, Ph.D., is Tate Distinguished Professor, School of Social Work, University of North Carolina at Chapel Hill. This article was supported in part by a grant from the National Institute on Drug Abuse (R21 5-33627). The authors appreciate the thoughtful comments of the reviewers of this article. Address correspondence to Roderick A. Rose, 301 Pittsboro, CB 3550, Chapel Hill, NC 27599-3550; e-mail: rarose@email.unc.edu.

Table 1: Estimates, Standard Errors, t Statistics, and p Values from Four Simulations

| Characteristic | Raw | MCAR | MAR | NMAR |
|---|---|---|---|---|
| Estimates | | | | |
| African American | -0.09 | -0.11 | -0.13 | -0.09 |
| Latino | 0.27 | 0.24 | 0.32 | 0.30 |
| Treatment | 0.16 | 0.15 | 0.20 | 0.18 |
| Gender | 0.05 | 0.04 | 0.05 | 0.01 |
| School | 0.06 | 0.10 | 0.10 | 0.13 |
| Pretest SA | 0.64 | 0.6 | 0.62 | 0.59 |
| Standard Errors | | | | |
| African American | 0.11 | 0.11 | 0.11 | 0.11 |
| Latino | 0.09 | 0.1 | 0.09 | 0.10 |
| Treatment | 0.08 | 0.08 | 0.08 | 0.08 |
| Gender | 0.07 | 0.08 | 0.07 | 0.07 |
| School | 0.09 | 0.09 | 0.09 | 0.09 |
| Pretest SA | 0.05 | 0.05 | 0.05 | 0.05 |
| t Statistics | | | | |
| African American | -0.85 | -1.02 | -1.23 | -0.80 |
| Latino | 2.92 | 2.45 | 3.40 | 3.13 |
| Treatment | 2.10 | 1.82 | 2.45 | 2.28 |
| Gender | 0.67 | 0.48 | 0.74 | 0.08 |
| School | 0.75 | 1.06 | 1.18 | 1.48 |
| Pretest SA | 12.85 | 11.40 | 11.71 | 11.85 |
| p Values | | | | |
| African American | .40 | .31 | .22 | .42 |
| Latino | .00 | .01 | .00 | .00 |
| Treatment | .04 | .07 | .01 | .02 |
| Gender | .50 | .63 | .46 | .93 |
| School | .46 | .29 | .24 | .14 |
| Pretest SA | .00 | .00 | .00 | .00 |

Notes: MCAR = missing completely at random. MAR = missing at random. NMAR = not missing at random. SA = social aggression.


Author: Rose, Roderick A.; Fraser, Mark W.

Publication: Social Work Research

Article Type: Report

Geographic Code: 1USA

Date: Sep 1, 2008
