# Comparing the Sample-Weighted and Unweighted Meta-Analysis: An Applied Perspective

An extensive comparison of the sample-weighted method (Hunter & Schmidt, 1990) and a newer unweighted method (Osburn & Callender, 1992) of meta-analysis is presented using actual data. Several of the advantages of the unweighted method predicted by Osburn and Callender's simulation research did not always hold in actual application. Specifically, the unweighted method did not always produce larger estimates of observed variance, credibility intervals, and confidence intervals than the sample-weighted method when large sample outliers were present. Also, Osburn and Callender's research on mean sampling variance formulae did not generalize to meta-analysis using the average correlation estimator to measure sampling error variance. Finally, results show that while both methods may generate similar parameter and variance estimates in primary meta-analysis, they may lead researchers to reach different substantive conclusions in the analysis of moderators. (c) 1999 Elsevier Science Inc. All rights reserved.

The development of new refinements in meta-analytic techniques and methods has generally been the result of analytical research (e.g., Huffcutt & Arthur, 1995; Hunter & Schmidt, 1994; Schmidt, Hunter, & Raju, 1988) or computer simulations (e.g., Law, Schmidt, & Hunter, 1994; Osburn & Callender, 1992; Raju, Burke, Normand, & Langlois, 1991; Sackett, Harris, & Orr, 1986). Only in rare cases are these new refinements validated using actual meta-analytic data (e.g., Huffcutt & Arthur, 1995; Schmidt, Law, Hunter, Rothstein, Pearlman, & McDaniel, 1993). Consequently, applied research scholars may adopt new meta-analytic techniques with only limited evidence that the proposed advantages hold in actual application or make any substantial difference.

The most widely used form of meta-analysis is the one developed by Hunter and Schmidt (1990). Their method of meta-analysis is based on sample-weighting. This means that when the mean correlation is computed, the individual correlations are weighted by their respective sample sizes. The observed variance across the individual correlations is computed in a similar manner--by weighting each difference (mean correlation minus the individual study correlation) by sample size of the study.

Recently, Osburn and Callender (1992) presented an alternative method for estimating means and variances in a meta-analysis. In their method, the mean correlation is computed as the simple average of the individual correlations, without accounting for sample size. Similarly, the observed variance is computed in the traditional manner without weighting each difference by the study sample size. As discussed later, there are also other differences between the two methods with regard to how they compute sampling error variance across the individual correlations (which relates to credibility intervals) and sampling variance of the mean correlation (which relates to confidence intervals).
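The computational contrast between the two methods can be sketched in a few lines. This is a minimal illustration with invented correlations and sample sizes (the function names are ours, not from either paper):

```python
def weighted_meta(rs, ns):
    """Hunter & Schmidt sample-weighted mean correlation and observed variance."""
    total_n = sum(ns)
    mean_r = sum(r * n for r, n in zip(rs, ns)) / total_n
    # Each squared deviation is weighted by its study's sample size
    var_r = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    return mean_r, var_r

def unweighted_meta(rs):
    """Osburn & Callender unweighted mean correlation and observed variance."""
    k = len(rs)
    mean_r = sum(rs) / k
    var_r = sum((r - mean_r) ** 2 for r in rs) / k
    return mean_r, var_r

rs = [0.05, 0.10, 0.15, 0.30]   # hypothetical observed correlations
ns = [10000, 80, 120, 100]      # note the large sample outlier (n = 10,000)

print(weighted_meta(rs, ns))    # mean is pulled toward the n = 10,000 study
print(unweighted_meta(rs))      # every study counts equally
```

With the outlier included, the sample-weighted mean sits very close to the outlier's own correlation, while the unweighted mean treats all four studies identically, which is precisely the contrast the two methods turn on.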

Osburn and Callender's (1992) unweighted method has been used in several recent applied research efforts (e.g., Fuller, Hester, Dickson, Allison, & Birdseye, 1996; Williams & Livingstone, 1994). However, there has yet to be a direct comparison of the unweighted and sample-weighted methods using actual meta-analytic data. Directly comparing the two methods from an applied perspective is important because they may lead to different conclusions regarding the overall strength of an effect, how consistent the effect is across studies, and whether other variables moderate the relationship. Additionally, other critical issues should be addressed when comparing the two methods. For example, to what extent does the presence of a large sample outlier affect the results of each method? We begin by discussing three critical issues relating to this comparison: (1) the influence of large sample outliers; (2) the homogeneity of the relationship; and (3) the use of different procedures for estimating sampling error variance and mean sampling variance.

Critical Issues

Large Sample Outliers

In statistics, an observation is generally considered to be an outlier if it appears to be inconsistent with the other data of which it is a part (Huffcutt & Arthur, 1995). In meta-analysis, a large sample outlier (LSO) would be a sample so large that it is inconsistent with the other sample sizes included in the sample population. Large sample outliers deserve careful attention in meta-analytic research because they may have substantial influence upon estimates of mean effect size, observed variance, and sampling error variance (Hunter, Schmidt, & Jackson, 1982; Hunter & Schmidt, 1990). Osburn and Callender (1992) proposed that one of the most important advantages of the unweighted method is that it deals with the influence of large sample outliers more effectively than the sample-weighted method.

Prior to Osburn and Callender's (1992) research, the prescribed technique for assessing the potential influence of large sample outliers was to conduct a second meta-analysis with the LSO removed, thereby providing an indication of the influence of the outlier (Hunter et al., 1982: 41). If the LSO has a significant influence upon the size of the estimated mean correlation, Hunter and Schmidt (1990) suggest that the LSO should simply be discarded. In Osburn and Callender's (1992) new unweighted method, it is not necessary to remove an LSO since each sample is given the same weight. In general, the unweighted meta-analysis provides a more stable estimate of mean correlation than the sample-weighted method because no one study has undue influence because of its sample size. Thus, the advantage of the unweighted method is that it does not sacrifice power by dropping large sample studies from the analysis.

While LSOs can increase or decrease mean correlation estimates (Osburn & Callender, 1992: 117), they tend to have greater influence upon variance estimates (Hunter & Schmidt, 1990). Previous research has shown that LSOs tend to reduce sample-weighted estimates of observed variance (Petty, McGee, & Cavender, 1984; Williams & Livingstone, 1994). Therefore, estimates of residual variance (i.e., observed variance minus sampling error variance) may be lowered if the weighted method is used to compute the observed variance. The removal of an LSO also results in a larger estimate of sampling error variance due to the reduction in mean sample size, which could further lower the residual variance.

Bias in the estimation of variances is important because these estimates can affect substantive conclusions drawn from a meta-analysis. For example, potential moderators are generally assumed to be present if sampling error and other artifacts account for less than 75% of the observed variance (Hunter & Schmidt, 1990). If the sample-weighted method does underestimate the observed variance, then sampling error variance will comprise a larger portion of the observed variance, raising the possibility that an LSO could conceal the presence of an important moderator variable. Credibility intervals, another indicator of the presence of moderator variables, could also be artificially small when an LSO is included in the dataset. Credibility intervals show the typical range of possible values that individual study correlations can have when moderator variables are present. [1]
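As a rough sketch of how the 75% rule and a credibility interval interact, consider the following. The summary numbers are invented, and sampling error is the only artifact corrected here, so this is a bare-bones approximation of the full Hunter-Schmidt procedure:

```python
import math

def moderator_check(mean_r, observed_var, k, total_n):
    """Percent of observed variance explained by sampling error, plus the
    95% credibility interval implied by the residual standard deviation."""
    n_bar = total_n / k
    # Average-correlation estimate of first-order sampling error variance
    sampling_var = (1 - mean_r ** 2) ** 2 / (n_bar - 1)
    pct_explained = min(sampling_var / observed_var, 1.0) * 100
    residual_var = max(observed_var - sampling_var, 0.0)
    sd_res = math.sqrt(residual_var)
    return pct_explained, (mean_r - 1.96 * sd_res, mean_r + 1.96 * sd_res)

pct, cred = moderator_check(mean_r=0.25, observed_var=0.02, k=12, total_n=2400)
print(round(pct, 1))                       # under 75: moderators presumed present
print(tuple(round(x, 2) for x in cred))    # wide credibility interval
```

The point of the sketch is the coupling the text describes: anything that shrinks the observed variance (such as an LSO under sample-weighting) simultaneously raises the percent explained and narrows the credibility interval.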

The presence of an LSO may also affect estimates of mean sampling variance around the mean correlation, which are used to form confidence intervals. Confidence intervals provide an estimate of the range of values the mean correlation could take if a different set of studies from the same population were used. As the number of studies in a meta-analysis decreases, the mean correlation estimate gets increasingly unstable--an effect commonly referred to as "second order" sampling error. Osburn and Callender's (1992) simulation research indicated that sample-weighted estimates of mean sampling variance underestimate the true mean sampling variance when one or more LSOs exist. Smaller estimates of mean sampling variance lead to smaller width confidence intervals, which would be less likely to include zero. Under traditional significance testing, a relationship would be presumed to not exist if the confidence interval includes zero.

In summary, when an LSO is present, the unweighted method is proposed to have three distinct advantages over the sample-weighted method. First, the estimate of the mean correlation is not biased. Removing the LSO from the sample-weighted analysis should make the mean estimate more similar to the unweighted estimate, but the unweighted method is simpler because the LSO does not have to be identified and no data have to be eliminated. Second, the variance estimates are not biased; specifically, the estimate of the observed variance is not negatively biased, while the estimate of sampling error variance is not positively biased. Therefore, the unweighted method should always produce larger residual variances and wider credibility intervals than the sample-weighted method, which in turn may affect conclusions about the presence of moderator variables. Third, the unweighted method does not underestimate mean sampling variance, and thus should always generate wider confidence intervals than the sample-weighted method.

Homogeneity/Heterogeneity

The issue of "homogeneity" in meta-analysis has been the topic of debate, particularly regarding its definition and occurrence in applied research. In a homogeneous case, all correlations are considered to be equal (i.e., coming from the same population). In a heterogeneous case, correlations are considered to come from more than one population. One perspective defines homogeneity by outlining a set of statistical assumptions, which effectively eliminates the possibility of finding a homogeneous case in applied research. According to this perspective, four conditions must be met for homogeneity to exist:

(a) true validities (correlations) must be constant across all studies, (b) predictor and criterion reliabilities must be constant across all studies, (c) restriction of range must be constant across all studies, and (d) the mean correlation must be estimated by sample-size weighting of the obtained correlations (Osburn & Callender, 1992: 115).

With this perspective, homogeneity would only be encountered in simulation research where these assumptions can be imposed upon the data.

Another perspective on homogeneity is that it is best defined as an approximation (Hunter & Schmidt, 1990: 426). This view of homogeneity is based upon the concept of validity generalization, which states that if the residual variance is zero or near zero, then a homogeneous sample population is presumed to exist (Burke, Rupinski, Dunlap, & Davison, 1996). More specifically, the generally accepted rule is that if sampling error, measurement error, and range restriction account for at least 75% of the observed variance, then the remaining variance will be accounted for by other artifacts. In such a case, the sample population is assumed to be homogeneous (Sagie & Koslowsky, 1993; Schmidt, Hunter, & Raju, 1988). Unlike the first perspective on homogeneity, cases where 75% or even 100% of observed variance is accounted for by sampling error are commonly encountered in applied research.

Formulas for Sampling Variance of the Mean Correlation

The issue of homogeneity is important because it may influence the choice of formulae used to estimate sampling variance of the mean, which in turn affects the width of its confidence interval. There are a number of different formulae that can be used to calculate the sampling variance of the mean uncorrected correlation. For homogeneous cases (i.e., where residual variance is less than 25%), Whitener (1990; see also Schmidt et al., 1988) suggested using:

V = (1 - r^2)^2 / (N - K),   (1)

where r is the mean observed correlation, N is the total sample size, and K is the number of studies. For heterogeneous cases, Whitener (1990) suggested using:

V = [(1 - r^2)^2 / (N - K)] + V_res / K,   (2)

where V_res is the residual variance. Both of these formulas are based on the sample-weighted method. Osburn and Callender (1992) found that while Whitener's formula for homogeneous cases produces accurate results, her formula for heterogeneous cases underestimates the sampling variance of the mean correlation.

Osburn and Callender (1992) proposed an alternate formula for estimating the sampling variance of the mean correlation, one based on the unweighted method. Their formula is:

V = V_r / K,   (3)

where V_r is the unweighted observed variance. This formula produced larger, but equally accurate, estimates for the unweighted method regardless of the heterogeneity/homogeneity distinction. [3] Therefore, Osburn and Callender (1992) concluded that the unweighted method would produce more accurate results if EQ3 were used rather than EQ1 or EQ2. They further stated that using EQ3 avoids the need to determine sample heterogeneity/homogeneity before selecting a mean sampling variance formula (i.e., EQ1 or EQ2). Only in homogeneous cases, which Osburn and Callender dismiss as rarely found in applied situations, should the sample-weighted meta-analytic method (using EQ1 to calculate the mean sampling variance) produce more accurate results than the unweighted method due to the smaller mean sampling variance.
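The three estimators are simple enough to transcribe directly; the sketch below pairs each with the 95% confidence interval it implies. The inputs are illustrative only, not values from any meta-analysis discussed here:

```python
import math

def eq1(mean_r, total_n, k):
    """EQ1: Whitener's homogeneous-case estimator (sample-weighted)."""
    return (1 - mean_r ** 2) ** 2 / (total_n - k)

def eq2(mean_r, total_n, k, v_res):
    """EQ2: Whitener's heterogeneous-case estimator (EQ1 plus V_res / K)."""
    return eq1(mean_r, total_n, k) + v_res / k

def eq3(v_obs_unweighted, k):
    """EQ3: Osburn & Callender's unweighted estimator."""
    return v_obs_unweighted / k

def confidence_interval(mean_r, v, z=1.96):
    """95% confidence interval implied by a mean sampling variance v."""
    half = z * math.sqrt(v)
    return (mean_r - half, mean_r + half)

# Illustrative inputs: mean r = .20, total N = 3,000 across K = 15 studies
print(confidence_interval(0.20, eq1(0.20, 3000, 15)))
print(confidence_interval(0.20, eq2(0.20, 3000, 15, v_res=0.01)))  # always at least as wide as EQ1
print(confidence_interval(0.20, eq3(0.012, 15)))
```

Because EQ2 adds a non-negative term (V_res / K) to EQ1, its interval can never be narrower than EQ1's, which is why EQ1 is reserved for the homogeneous case where residual variance is negligible.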

Osburn and Callender (1992) presented strong evidence that the unweighted method is superior to the sample-weighted method, stating that:

with realistic amounts of skew in sample sizes, there is a danger of underestimating the variance of the observed correlations, which will affect other important parameters, including the variance of the true correlations and the standard error of the mean uncorrected correlation (p. 121).

Although Osburn and Callender (1992) used sample distributions from a number of previously published meta-analyses to set up their simulation database, they did not use any of these datasets to illustrate the differences between the sample-weighted and unweighted procedures. Given that the unweighted method should always generate wider credibility and wider confidence intervals, it seems probable that choice of method could influence the results of meta-analytic research.

Comparison of the Methods

In order to illustrate the impact choice of methods may have on substantive conclusions of applied meta-analytic research, we replicate select portions of several research efforts. The meta-analyses included as examples were chosen to illustrate specific points and issues, not because of any shortcomings. Our primary research questions include: (a) How close are the sample-weighted and unweighted mean correlation estimates? (b) If an LSO is present, do sample-weighted mean correlation estimates become more similar to unweighted estimates when the LSO is removed? (c) Is the unweighted observed variance estimate always larger than the sample-weighted estimate? (d) If an LSO is present, is the sample-weighted observed variance estimate with the LSO removed closer to the original sample-weighted estimate or the unweighted estimate? (e) Is the indication of a moderator variable (based on the 75% explained variance rule) consistent across both the sample-weighted and unweighted methods? (f) If an LSO is present, is the sample-weighted method with the LSO removed more similar (in terms of suggesting a moderator variable) to the original sample-weighted results or to the unweighted results? (g) Are the credibility intervals substantially different across methods? (h) Is the unweighted estimate of mean sampling variance always greater than the sample-weighted estimate, and does choice of mean sampling variance estimator make a substantial difference in the resulting confidence intervals? and (i) Does it make a difference whether one chooses the individual or average correlation estimator to estimate sampling error variance?

This first example contains two LSOs. The purpose of this example is a simple comparison of the basic differences between the two methods.

Example One: Lord, DeVader, and Alliger (1986)

Lord, DeVader, and Alliger's (1986) research examined the relationship between personality traits and perceptions of leader emergence. One of the specific relationships they examined was between leadership perceptions and dominance. Table 1 shows the results of our replication of Lord et al.'s (1986) work based upon the data reported in their article. The sample-weighted mean observed correlation is larger than the unweighted correlation (.09 and .07). However, when the LSOs are removed, the sample-weighted mean correlation estimate (.10) does not converge on the unweighted estimate (.07)--it actually moves away from the unweighted estimate, although it should be noted that all three estimates are relatively close. Consistent with Osburn and Callender's (1992) assertion, the observed variance of the unweighted method is larger than the observed variance of the sample-weighted method, and the sample-weighted estimate increases when the LSOs are removed. The larger observed variance of the unweighted method also results in a much lower percent of explained variance, which leads to different indications of sample homogeneity (76% vs. 29%). However, the indications of homogeneity change when the LSOs are removed (62% explained variance). The unweighted method generates larger credibility interval estimates than the sample-weighted method even when the LSOs are removed.

Table 1 also shows that the unweighted estimate of mean sampling variance is much larger than the sample-weighted estimate when LSOs are present. The confidence intervals constructed for the two different methods reflect the different mean sampling variance estimates. Regardless of mean sampling variance estimator, the confidence intervals for the unweighted method are larger than the confidence intervals of the sample-weighted method. Unlike the sample-weighted method, the confidence interval for the unweighted method also includes zero, which some researchers would view as an indication that no relationship exists between leadership and dominance. However, when the LSOs are removed, the sampling variance estimates and confidence intervals of both methods converge.

While there was some difference in estimates of mean sampling variance depending upon which estimator was used (i.e., EQ2 vs. EQ3 for the unweighted, and EQ1 or EQ2 vs. EQ3 for the sample-weighted), there was virtually no difference in the confidence intervals within each method (unweighted, (-.02, .16) vs. (-.01, .16); and sample-weighted, ((.04, .15) or (.04, .14) vs. (.04, .15)). This also holds for the sample-weighted analysis with the two LSOs removed, except for the noticeably smaller confidence interval generated using EQ1. As estimates of explained variance decrease, EQ1 will generate increasingly smaller confidence intervals relative to EQ2 because residual variance is not included in the formula for EQ1. This example illustrates why EQ1 is not used in cases where explained variance is less than 75% (i.e., the heterogeneous case). These results indicate that while there may be differences in the estimates of mean sampling variance depending upon which estimator is used, this difference has little substantive influence upon what is really important--confidence intervals. This is a new finding since Osburn and Callender (1992) did not include calculations of confidence intervals in their research on mean sampling variance estimators.

Example 2: Cohen and Hudecek (1993)

Cohen and Hudecek (1993) had a large-sample outlier in their recent meta-analysis relating organizational commitment and turnover. Table 2 shows the results of both sample-weighted and unweighted procedures. Because Cohen and Hudecek reported only corrected correlation values in their research, corrected correlations form the basis for our replication.

The results of the primary meta-analyses show a comparison between the two methods, as well as an additional meta-analysis performed with the large-sample outlier (Blegan, Mueller, & Price, 1988; n = 1,813) removed from the sample-weighted method. Similar to the first example, the sample-weighted correlation does not converge on the unweighted mean correlation when the LSO is removed. Although there is no change in the sample-weighted estimate of mean observed correlation, the observed variance estimate increases. Again, the unweighted method generates larger credibility interval estimates than the sample-weighted method even when the LSO is removed.

In the sample-weighted analysis, the mean sampling variance estimate also increases. This results in a smaller percentage of variance explained, but no corresponding increase in the width of the confidence interval (.07 and .07), unlike the first example where the smaller mean sampling variance estimates of the sample-weighted method resulted in smaller confidence intervals. Also, a comparison of the widths of the confidence intervals across methods shows little difference (.07 sample-weighted vs. .08 unweighted). The critical ratio Z scores and a comparison of the overlap in confidence intervals suggest that there is little difference in the correlation estimates of the three meta-analyses.

The moderator analyses in this example illustrate what may occur when only one or two large sample outliers exist in a meta-analysis that examines theoretically predicted moderators (e.g., Mento, Steel, & Karren, 1987; Petty, McGee, & Cavender, 1984; Russell et al., 1994; Tett & Meyer, 1993; Williams & Livingstone, 1994). When the total sample is subgrouped into moderator categories, the large sample outlier may fall in only one subgroup--leaving the other subgroup(s) without any sample that might be considered an outlier. In this case, the moderator subgroup without the sample-size outlier would not necessarily require any special analysis (i.e., the unweighted procedure, or influence assessment), unlike the subgroup with the large-sample outlier. This is evident in the sample-weighted moderator meta-analyses. When the LSO is removed, the moderator meta-analyses show that the sample-weighted and unweighted methods produce virtually the same mean correlation estimates.

One of the most interesting and problematic illustrations of the Cohen and Hudecek (1993) example is that the two methods generate very different estimates of observed variance for the 9-item OCQ subgroup. This difference in observed variance estimates results in 100% of the variance being explained (homogeneous) in the sample-weighted method, but only 43% of variance explained (heterogeneous) in the unweighted method. When the LSO is removed from the sample-weighted meta-analysis, the observed variance estimate and the sampling error variance both increase and the amount of variance explained drops marginally to 96%, which still indicates homogeneity. The estimates of residual variance of the two methods are also affected by the difference in observed variance estimates. The estimate of residual variance in the unweighted method is much larger than the estimate of residual variance of both sample-weighted meta-analyses. This results in a relatively wide 95% credibility interval in the unweighted meta-analysis (-.29 to -.06) and a relatively narrow credibility interval in the sample-weighted analysis with the LSO removed (-.20 to -.14). The lack of any residual variance in the sample-weighted meta-analysis that includes the LSO results in a point estimate of the credibility interval. Koslowsky and Sagie's (1993) research on credibility intervals indicates both sample-weighted credibility intervals provide strong indications of sample homogeneity, but the unweighted interval is wide enough to indicate the presence of moderators.

In the moderator analysis, the confidence intervals for the 15-item OCQ subgroup are exactly the same across methods, while the unweighted confidence interval estimate for the 9-item OCQ subgroup is 50% larger than the sample-weighted estimate. The sample-weighted mean sampling variance and confidence interval width increases when the LSO is removed, again illustrating the suppression effect. Also similar to the first example, 95% confidence intervals are virtually the same within each meta-analysis regardless of which mean sampling variance estimator is used.

Both methods indicate that the relationship between commitment and turnover is moderated by type of OCQ measure. The critical ratio Z scores for the sample-weighted method (Z = 3.81) and the unweighted method (Z = 4.07) indicate a moderating effect (one-tailed test, Z = 1.64, p = .05). Perhaps a stronger indication of the moderating effect is that the confidence intervals clearly do not overlap for either the sample-weighted (-.36, -.25; and -.21, -.15) or the unweighted method (-.37, -.26; and -.22, -.13).
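The text does not reproduce the critical-ratio formula itself. A commonly used form divides the difference in subgroup mean correlations by the pooled standard error; the sketch below is that generic version, with made-up subgroup summaries, and should be read as an assumption rather than the exact computation the original authors used:

```python
import math

def critical_ratio_z(r1, v1, r2, v2):
    """Critical-ratio test for a moderator: r1, r2 are subgroup mean
    correlations; v1, v2 are their mean sampling variances."""
    return (r1 - r2) / math.sqrt(v1 + v2)

# Hypothetical subgroup summaries (mean r and mean sampling variance)
z = critical_ratio_z(-0.31, 0.0008, -0.18, 0.0003)
print(abs(z) > 1.64)  # compare against the one-tailed .05 critical value
```

Because the denominator is built from the mean sampling variances, any method-driven difference in those variances (EQ1/EQ2 vs. EQ3) flows directly into the Z score, which is why the two methods can disagree on moderator significance.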

Example Three: Williams and Livingstone (1994)

Williams and Livingstone's (1994) research examining the relationship between performance and voluntary turnover provided a comparison of the unweighted and sample-weighted methods in their primary meta-analyses. However, unlike Williams and Livingstone, our focus is on providing an extensive comparison of the two methods. Therefore, we replicate Williams and Livingstone's research and extend the comparison of meta-analytic methods to the analysis of moderators.

Table 3 shows our replication of Williams and Livingstone's primary meta-analysis based upon the data reported in their article. The primary meta-analyses illustrate the results of the sample-weighted procedure that removes the large-sample outlier (i.e., Ofsanko, 1979) to assess its influence. This procedure shows that, unlike the Cohen and Hudecek (1993) example, when the LSO is removed, the mean correlation changes somewhat (-.12 to -.16), converging on the unweighted correlation estimate. The observed variance also increases after the Ofsanko (1979) study is removed. Finally, the results show that removing the Ofsanko (1979) study results in an increase in the magnitude of the mean sampling variance of the sample-weighted method. This results in a confidence interval that is marginally larger than the unweighted confidence interval.

One of the moderators that Williams and Livingstone (1994) looked at was reward contingency. Table 3 also shows the results of the reward contingency moderator analysis. This extends the comparison of both methods, which was not the primary purpose of Williams and Livingstone's original research. A comparison of the mean observed correlation estimates for the reward contingency (NO) subgroup shows that removing the LSO causes the sample-weighted estimate to converge on the unweighted estimate. However, a similar comparison of the reward contingency (YES) subgroup (which does not contain an LSO) shows that the mean correlations are somewhat different (-.21 vs. -.25). Similar to previous examples, the unweighted observed variance estimate is larger than the sample-weighted estimate, resulting in marginally larger credibility intervals. The critical ratio Z scores for the sample-weighted method indicate there is a significant moderating effect based upon reward contingency (Z = 3.42, Z = 2.54; p < .05).

However, the Z score for the unweighted method does not indicate a significant moderating effect (Z = 1.44, p = .07). In cases such as this, where the results of the critical ratio Z test approach significance, a comparison of the confidence intervals of the moderator groups may prove to be more informative than the significance level of the Z score. The confidence intervals of the sample-weighted method clearly do not overlap (-.34, -.17; vs. -.13, -.06, or -.17, -.09), while there is considerable overlap in the confidence intervals of the unweighted method (-.29, -.12; vs. -.18, -.10). A comparison of the critical ratio Z scores and the overlap of the confidence intervals would probably lead researchers to different conclusions about moderating influences, depending upon which method was utilized. Therefore, unlike the Cohen and Hudecek (1993) example, choice of method can make a difference in conclusions concerning moderating influences.

Perhaps the most interesting aspect of this example is the reason why the two methods reach different conclusions about moderating influences. Once the LSO is removed from the reward contingency (NO) moderator subgroup, the widths of the confidence intervals for both sample-weighted and unweighted methods are identical (.08). Further, the widths of the confidence intervals of the reward contingency (YES) subgroup, which does not contain an LSO, are identical for both methods as well (.17). Yet, the two methods yield different indications of the moderating effect of reward contingency. The reason this occurs is that the two meta-analytic methods generate different estimates of mean correlations for both subgroups, with the estimates for the reward contingency (YES) subgroup being large enough to indicate different moderating effects. Therefore, the reason the sample-weighted and unweighted meta-analyses reach different substantive conclusions about moderating effects is not because of different estimates of observed variance or mean sampling variance, but rather different estimates of mean observed correlation--for the subgroup without an LSO!

Even though the unweighted observed variance and mean sampling variance estimates are larger than the sample-weighted estimates, two results of this example are different from the prior examples. One difference is that EQ2 and EQ3 produce exactly the same estimates of mean sampling variance within each method. Consequently, the confidence intervals within each method are also identical. Another difference from prior examples is the widths of the confidence intervals are identical across methods. Therefore, the primary reason the unweighted and sample-weighted methods have different indications of the moderating influence of reward contingency is due to the estimates of mean correlation.

There is a possible explanation for the identical mean sampling variance estimates found in the present example. Based upon Hunter and Schmidt's (1990) recommendation that the average correlation estimator (VEAVE) provides a more accurate estimate of first-order sampling error variance than the individual correlation estimator (VEIND), many researchers use the following formula with mean observed correlation r, and the average sample size N:

VEAVE = (1 - r^2)^2 / (N - 1).

However, most validity generalization research is conducted using the individual correlation estimator. When using the individual correlation estimator, the sampling error variance for each study is estimated with observed correlation r_i and sample size N_i:

ve_i = (1 - r_i^2)^2 / (N_i - 1),

and sampling error variance for the meta-analysis as a whole is calculated as follows:

VEIND = Ave(ve_i).
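Under these definitions, the two estimators can be sketched as follows. The study values are hypothetical; note that with equal sample sizes the two estimators differ only through the nonlinearity of (1 - r^2)^2 across studies:

```python
def veave(rs, ns):
    """Average-correlation estimator: one formula evaluated at the mean
    correlation and the average sample size. (An unweighted mean is used
    here; the sample-weighted method would weight it by sample size.)"""
    mean_r = sum(rs) / len(rs)
    n_bar = sum(ns) / len(ns)
    return (1 - mean_r ** 2) ** 2 / (n_bar - 1)

def veind(rs, ns):
    """Individual-correlation estimator: per-study sampling error
    variances, then averaged across the K studies."""
    ves = [(1 - r ** 2) ** 2 / (n - 1) for r, n in zip(rs, ns)]
    return sum(ves) / len(ves)

# Hypothetical studies with equal sample sizes
rs = [0.05, 0.25, 0.45]
ns = [100, 100, 100]
print(veave(rs, ns), veind(rs, ns))  # close, but not identical
```

In this particular dataset VEAVE comes out slightly larger, but as the examples in the text show, which estimator is larger depends on the data, and the difference typically has little substantive effect on the resulting confidence intervals.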

Careful replication of Williams and Livingstone's (1994) research indicates they used the average correlation estimator, while Osburn and Callender's (1992) conclusions were based upon simulation data generated using the individual correlation estimator.

This difference may be important because recent research indicates that choice of estimator may influence the accuracy of meta-analytic research (Hunter & Schmidt, 1994; Law et al., 1994; Schmidt et al., 1993). Hunter and Schmidt (1994) conclude that sampling error is generally underestimated in most meta-analytic research. Results of analytical and simulation research indicate that in homogeneous (Hunter & Schmidt, 1994) and heterogeneous cases (Law et al., 1994), VEAVE provides larger, and thus more accurate, estimates of sampling error variance than VEIND. By including VEAVE and VEIND in our next examples, we can assess the extent to which Osburn and Callender's (1992) research generalizes to general effects meta-analysis utilizing VEAVE, and provide an applied investigation of recent research examining these two estimators (Hunter & Schmidt, 1994; Law et al., 1994; Schmidt et al., 1993).

Example Four: Transformational Leadership and Performance

The question of whether percept-percept methods produce inflated relationship estimates has been the source of considerable debate (Crampton & Wagner, 1994; Podsakoff & Organ, 1986; Spector, 1987; Williams, Cote, & Buckley, 1989). In order to test this proposition in the leadership domain, we aggregated studies relating Bass' (1985) measures of transformational leadership (Multifactor Leadership Questionnaire) to percept-percept and multisource measures of performance. These studies are included in the appendix.

The final dataset had two sample-size outliers (Roush & Atwater, 1992, n = 1235; Yammarino & Bass, 1990, n = 793), which were included in all of the analyses. The magnitude of these outliers is similar to the sample-size distributions used by Osburn and Callender (1992) and encountered by Williams and Livingstone (1994). Tables 4 and 5 present comparisons of the sampling error variance estimates generated by the average correlation estimator (VEAVE) and the individual correlation estimator (VEIND). The differences resulting from using VEAVE rather than VEIND are negligible and do not change any conclusion about moderating influences or heterogeneity/homogeneity (i.e., proportion of variance or credibility interval).

One of the most notable trends found in Tables 4 and 5 is that VEIND generates the same estimate for the sample-weighted method as for the unweighted method, while VEAVE generates different results for each method. This is not unexpected because VEIND is based upon the observed correlation in each individual study, while VEAVE is based on an average that is either weighted or unweighted by sample size depending on method. In the present example, the VEAVE estimate for the unweighted meta-analytic method is larger than the estimate for the sample-weighted method. However, this is due to the fact that the unweighted correlation estimates are mostly smaller in magnitude than the sample-weighted estimates in our example, rather than to any artifactual influence. Comparison of VEAVE and VEIND for heterogeneous populations shows that VEAVE generates larger estimates than VEIND for all of the unweighted meta-analyses, but only 1 of the 6 sample-weighted meta-analyses. Therefore, our results indicate that Law et al.'s (1994) conclusions generalize to the unweighted procedure, but not to the sample-weighted procedure upon which Law et al. based their research. VEAVE also provided larger estimates than VEIND for the three homogeneous subgroups (sample-weighted), which supports Hunter and Schmidt's (1994) research.

Perhaps the most important finding of the comparison between VEAVE and VEIND is that the credibility intervals presented in Tables 4 and 5 are identical regardless of which sampling error variance estimator is used. Similar to our conclusion that mean-sampling variance formulas make little difference in confidence intervals, sampling error variance formulas (VEAVE or VEIND) make little difference in credibility intervals.

Support for the expected moderating influence of performance measurement type differs strongly depending upon meta-analytic method. The results of the sample-weighted moderator analyses (see Table 5) indicate a moderating influence based upon choice of research design (i.e., percept-percept or multisource). The two sample-size outliers fell into the percept-percept subgroup, which leaves the multisource subgroup without a significant sample-size outlier. Critical ratio Z scores indicate that the moderator subgroups were significantly different for all three dimensions (Z = 2.68, Z = 3.32, and Z = 1.92). An examination of the confidence intervals indicates there is no overlap of moderator subgroups for the charisma and intellectual stimulation dimensions, and only a small amount of overlap for the individualized consideration dimension. The Z scores of the unweighted analyses present a different picture of the extent to which research design influences leadership-performance relationships. None of the three transformational leadership dimensions have significantly different subgroups (Z = 1.25, Z = 1.01, and Z = .37). The lack of a moderating effect indicated by the Z scores is also supported by the extent of the overlap of the confidence intervals.
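The article does not reproduce its critical ratio formula; a common form (an assumption on our part, not confirmed by the source) divides the difference between two subgroup mean correlations by the square root of the sum of their mean sampling variances:

```python
import math

def critical_ratio_z(r1, samp_var1, r2, samp_var2):
    # Critical ratio Z for two independent moderator subgroups:
    # the difference in mean correlations divided by the pooled
    # standard error. samp_var1 and samp_var2 are the mean sampling
    # variances (squared standard errors) of the subgroup means.
    return (r1 - r2) / math.sqrt(samp_var1 + samp_var2)
```

Under this form, a one-tailed test at the .05 level flags a moderator when Z exceeds 1.645.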

Both meta-analytic methods produced similar results indicative of homogeneity for the multisource moderator subgroups. Unlike the previous examples, this is the first case where the unweighted method clearly indicates homogeneity. Sampling error alone accounted for 100% of the variance in two of the three dimensions, providing a strong indication of sample homogeneity (Hunter & Schmidt, 1994).

Finally, the individualized consideration dimension (Table 4) and the multisource subgroup of the charisma dimension (Table 5) provide our first indication that the unweighted estimates of observed variance and mean sampling variance are not always greater than the sample-weighted estimates. The results of these two meta-analyses indicate that two of the proposed advantages of the unweighted method do not always hold in applied research.

While the previous four examples have illustrated some of the most important aspects of accounting for large sample outliers in applied meta-analytic research, a number of issues remain. First, Law et al.'s (1994) conclusions about VEAVE generating larger estimates of second-order sampling error variance were not supported in most of the sample-weighted meta-analyses, but they were supported in the unweighted meta-analyses. These deviations may be due to the small number of studies (K) in our example. Second, the deviations from research on observed variance (Osburn & Callender, 1992), mean sampling variance (Osburn & Callender, 1992), and sampling error variance (Law et al., 1994; Schmidt et al., 1993) found in the transformational leadership example may be unusual. Third, while the transformational leadership example indicates that choice of sampling error variance estimator (VEAVE or VEIND) makes no difference in credibility interval estimates, we have not examined the influence choice of estimator has upon estimates of mean sampling variance (i.e., standard error) and confidence intervals--an issue previously mentioned in the Williams and Livingstone (1994) example. In order to address these three issues, we present three additional examples.

Additional Examples

The first study, Petty et al. (1984), examined the relationship between performance and job satisfaction. We reanalyzed two of these meta-analyses (satisfaction with supervision and coworkers). The second example is a meta-analysis by Huffcutt and Arthur (1994) that examined the relationship between structure and interview validity. The third example is a meta-analysis by Fuller et al. (1996), which examined the relationship between job satisfaction and turnover intentions. All three of these examples contain large sample outliers and provide a wide range of sample sizes and numbers of studies.

The results presented in Table 6 provide additional evidence that VEAVE does not always generate larger estimates of sampling error variance than VEIND, and that the unweighted observed variance is not always larger than the sample-weighted observed variance. In both the Petty et al. (1984) supervision subgroup and the Huffcutt and Arthur (1994) example, VEIND generates larger estimates than VEAVE in the sample-weighted method. In all three examples the observed variance of the unweighted method is smaller than that of the sample-weighted method, except for the civilian subgroup in Fuller et al. (1996).

Table 6 also shows that the unweighted mean sampling variance is not always larger than the sample-weighted variance. Only in the civilian subgroup of the Fuller et al. (1996) example is the unweighted mean sampling variance greater than the sample-weighted mean sampling variance. However, the larger mean sampling variance estimates in the sample-weighted meta-analyses translate into wider confidence intervals in only two of the five cases (i.e., Petty et al., 1984 supervision and coworker).

A comparison of sampling error variance estimators (VEAVE and VEIND) and mean sampling variance estimators (EQ2 and EQ3) found in Table 6 is revealing. Similar to the transformational leadership example, the choice of VEAVE or VEIND does not influence credibility interval estimates in any of the examples. However, the data in Table 6 show that choice of sampling error variance estimator has a notable influence on estimates of mean sampling variance. When VEAVE is used, the corresponding estimates of second-order mean sampling variance generated by EQ2 and EQ3 are identical (see formulae for EQ2 and EQ3). This explains the unusual results that were found in the Williams and Livingstone (1994) example. Within either the sample-weighted or unweighted method for heterogeneous populations, it makes no difference whether mean sampling variance is calculated with the common variance formula (EQ3) or Whitener's (1990) heterogeneous formula (EQ2). [4] Thus, Osburn and Callender's (1992) conclusion that EQ3 provides a better estimate of mean sampling variance than EQ2 does not generalize to general effects meta-analyses utilizing VEAVE. Perhaps more important from an applied perspective, even when the estimates of mean sampling variance are different (i.e., when VEIND is used), there is no impact upon the 95% confidence interval estimates.

Discussion

The primary purpose of this research was to compare the sample-weighted and unweighted methods of meta-analysis and assess the extent to which the advantages of the unweighted method proposed by Osburn and Callender (1992) hold in actual application. While we expected to find that sample-weighted mean correlation estimates would converge on the unweighted estimates when an LSO is removed, we found cases where this did not happen. Osburn and Callender (1992) indicated that the unweighted observed variance estimate will always be larger than the sample-weighted observed variance when LSOs are present. Our findings provide clear evidence that this is not always true. Therefore, one cannot simply assume that the unweighted observed variance will always lead to a better estimate of the variance of the true population correlations (i.e., residual variance). Further, since residual variance is used as the basis for moderator detection (i.e., percent of variance explained by sampling error variance, credibility intervals, and chi-square tests), one cannot assume that the unweighted method will always produce more conservative indications of sample homogeneity than the sample-weighted method. This issue is important for researchers who believe that homogeneity is not as rare as assumed by Osburn and Callender. We feel some of the examples in our study support this perspective.

A more important issue in meta-analysis is the calculation of mean sampling variance. Osburn and Callender (1992) indicated that the unweighted mean sampling variance estimate will always be larger than the sample-weighted mean sampling variance estimate when an LSO is present. Our findings again failed to support this contention. More important from an applied perspective, when the unweighted mean sampling variance is larger, it does not always translate into a wider confidence interval. Therefore, one cannot assume that unweighted confidence intervals will always be more conservative (i.e., wider) than sample-weighted confidence intervals, even when the unweighted mean sampling variance is larger than that of the sample-weighted method. This means that the risk of a Type I error is not always reduced by using the unweighted method. Indeed, some of our examples included cases where the sample-weighted confidence interval was wider than the unweighted confidence interval.

A major focus of Osburn and Callender's (1992) research was the relative superiority of different mean sampling variance estimators in heterogeneous cases. Our findings indicate that while the different estimators (EQ2 vs. EQ3) may yield different estimates within meta-analytic method when VEIND is used to estimate sampling error variance, there is virtually no difference in the corresponding confidence interval estimates. Therefore, in the heterogeneous case, it appears to make little difference which mean sampling variance estimator is chosen.

Our comparison of sampling error variance estimators (i.e., VEAVE vs. VEIND) provides some of the more interesting results of our study. Osburn and Callender (1992) were able to suggest that EQ3 generates more accurate estimates of mean sampling variance than EQ2 because they used the individual correlation estimator (VEIND). However, if the average correlation estimator (VEAVE) is used to calculate sampling error variance, EQ2 and EQ3 generate identical estimates of mean sampling variance and the corresponding confidence interval (recall that EQ2 is calculated by adding and subtracting equal quantities to EQ3). Therefore, when VEAVE is used to estimate sampling error variance, the only advantage EQ3 has over EQ2 is simple parsimony. Altogether, our results indicate choice of mean sampling variance estimator is of much less importance than suggested in recent analytical and simulation research.

We included the average correlation estimator (VEAVE) in the present study due to the pervasiveness of this estimator in general random effects meta-analysis and to explain the seemingly unusual results of the mean sampling variance comparison found in the Williams and Livingstone (1994) example. Further, recent research indicates that choice of sampling error variance estimator is an important consideration in meta-analysis. Hunter and Schmidt (1994) and Law et al. (1994) both suggested that VEAVE gives better (i.e., larger) estimates of sampling error variance than the individual correlation estimator (VEIND) utilized by Osburn and Callender (1992). Our results support Hunter and Schmidt's (1994) proposition that, when 100% of the observed variance is accounted for by sampling error variance (i.e., the homogeneous case), VEAVE produces larger estimates of sampling error variance than VEIND. However, when less than 100% of the observed variance is accounted for by sampling error variance, we found a number of cases where VEAVE did not produce larger estimates of sampling error variance than VEIND. These findings are limited to the sample-weighted method, as the unweighted results fully support Law et al.'s (1994) research. Therefore, if VEAVE is used with the unweighted meta-analysis, some of the advantage gained by virtue of the method's larger observed variance estimate is offset by the larger sampling error variance estimate. If Hunter and Schmidt (1994) and Law et al. (1994) are correct that sampling error variance is generally underestimated in meta-analytic research, then VEAVE appears to be a better choice than VEIND when using the unweighted method, since it also negates any difference in mean sampling variance estimates. However, different estimates of sampling error variance made a negligible difference in the percentage of variance accounted for by sampling error variance, and virtually no difference in credibility interval estimates.
Based upon these results, it seems fair to conclude that choice of sampling error variance estimator is of less importance than other issues in meta-analysis.

One of our biggest questions was whether or not it makes any difference which method is used in meta-analytic research where the sample population is skewed by one or more large sample outliers. In some of our examples, there were minimal differences between the sample-weighted and unweighted analyses, particularly after removing LSOs from the sample-weighted analyses. However, this was not always the case. Our first example clearly shows the potential in primary meta-analysis for each method to indicate different substantive conclusions about sample homogeneity and the significance of the mean correlation if the influence of the LSO is not taken into account. This difference was due, in part, to the small size of the estimated mean correlation. Most of the remaining examples had larger correlations where the issue of significance was not really in question. While some of the direct comparisons between methods provide different indications of sample homogeneity, a comparison of the overlap of confidence intervals and Z scores indicates no difference in estimates of mean correlation--in primary meta-analysis.

However, we feel one of the important contributions of this study is to point out that it is not in primary meta-analysis that the difference between the two methods is likely to be noteworthy, but rather in the analysis of moderators. Small differences in parameter and interval estimates can be amplified in moderator analysis, leading to substantially different indications about moderators that are investigated for theoretical reasons. In cases where the two methods provide different conclusions about hypothesized moderators, the researcher is left with two choices--choosing one method over the other and justifying that choice, or concluding that the evidence does not strongly support the hypothesized moderating influence. However, there were some cases where both methods indicate a hypothesized moderator has a significant influence on the relationship in question (e.g., Cohen & Hudecek, 1993). In situations such as these, one can be much more confident about conclusions regarding moderating effects (e.g., Fuller & Hester, 1998).

The remaining question is whether it is worth the effort to conduct both sample-weighted and unweighted meta-analysis at the same time. The unweighted method can be considered a more conservative meta-analytic method than the sample-weighted method when it generates a larger observed variance and mean sampling variance. However, because the results of this study indicate that these proposed advantages of the unweighted method do not always hold in applied research, it would be risky, indeed, to simply claim its superiority without the benefit of evidence to support that claim (i.e., conducting a parallel sample-weighted analysis for purposes of comparison).

It is important to note that there are two possible limitations to the present study. One possible limitation is that some of our results may be due to the presence of unidentified LSOs remaining in the datasets. Another potential limitation is that it is impossible to know which method provides the better estimate of population parameters of actual meta-analytic data. However, these are limitations found in all applied meta-analytic research.

In conclusion, should the unweighted method be rejected simply because it does not always produce larger variance estimates? Of course not. Moreover, most meta-analysis programs, whether based upon microcomputer spreadsheet software or BASIC, can easily be altered to provide the results of both methods. Since the results of this study show that one method may lead to different conclusions than the other, we suggest that conducting parallel meta-analyses is well worth both the time and effort.

Perhaps even more important in a broader sense, our results indicate that analytical and computer simulation research does not always generalize well to applied research. Some of the claims about mean sampling variances and sampling error variances made by previous researchers (i.e., Hunter & Schmidt, 1994; Law et al., 1994; Osburn & Callender, 1992) did not hold in some of our examples. This indicates a need for further research to help explain these discrepancies. The best way to investigate these deviations from the expected behavior of meta-analytic methods is to use simulations or Monte Carlo studies in which the population parameters can be specified and subsequently estimated using various meta-analytic estimation procedures. We strongly urge researchers engaged in applied meta-analytic research to verify that meta-analytic refinements suggested by analytical and simulation studies actually enhance their research.

Acknowledgment: We thank Coleman Patterson, Donna Stringer, Mickey Petty, and this journal's reviewers for their helpful comments on earlier drafts of this article.

Appendix

Studies Used In Transformational Leadership Meta-Analysis

Avolio, B. J., Waldman, D. A., & Einstein, W. O. 1988. Transformational leadership in a management game simulation: Impacting the bottom line. Group and Organization Studies, 13: 59-80.

Deluga, R. J. 1991. The relationship of leader and subordinate influencing activity in naval environments. Military Psychology, 3: 25-39.

Ehrlich, S. B., Meindl, J. R., & Viellieu, B. 1990. The charismatic appeal of a transformational leader: An empirical case study of a small, high-technology contractor. Leadership Quarterly, 1: 229-247.

Hackman, M. Z., Hills, M. J., Furniss, A. H., & Paterson, T. J. 1992. Perceptions of gender-role characteristics and transformational and transactional leadership behaviours. Perceptual and Motor Skills, 75: 311-320.

Hater, J. J., & Bass, B. M. 1988. Superiors' evaluations and subordinates' perceptions of transformational and transactional leadership. Journal of Applied Psychology, 73: 695-702.

Howell, J. M., & Avolio, B. J. 1993. Transformational leadership, transactional leadership, locus of control, and support for innovation: Key predictors of consolidated business unit performance. Journal of Applied Psychology, 78: 891-903.

Keller, R. T. 1992. Transformational leadership and the performance of research and development project groups. Journal of Management, 18: 489-501.

Roush, P. E., & Atwater, L. E. 1992. Using the MBTI to understand transformational leadership and self-perception theory. Military Psychology, 4: 17-34.

Seltzer, J., & Bass, B. M. 1990. Transformational leadership: Beyond initiation and consideration. Journal of Management, 16: 693-703.

Silins, H. C. 1992. Effective leadership for school reform. Alberta Journal of Educational Research, 38: 317-334.

Waldman, D. A., Bass, B. M., & Einstein, W. O. 1987. Leadership and outcomes of performance appraisal processes. Journal of Occupational Psychology, 60: 177-186.

Yammarino, F. J., & Bass, B. M. 1990. Transformational leadership and multiple levels of analysis. Human Relations, 43: 975-995.

Notes

(1.) A 95% credibility interval is calculated by the following formula, where V_res is residual variance:

95% Credibility Interval = r +/- (1.96 x (V_res)^0.5).

(2.) Residual variance, or the true variance of the population correlations, is the amount of observed variance left after removing sampling error variance (Whitener, 1990: 317).
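Notes 1 and 2 combine into a short computation (a sketch with our own function names; clamping a negative residual variance to zero is our convention, not from the article):

```python
import math

def residual_variance(v_obs, v_e):
    # Note 2: residual (true) variance is observed variance minus
    # sampling error variance.
    return v_obs - v_e

def credibility_interval_95(r_bar, v_res):
    # Note 1: 95% credibility interval around the mean observed
    # correlation, r_bar +/- 1.96 * sqrt(V_res). Negative residual
    # variance is clamped to zero (our convention).
    half = 1.96 * math.sqrt(max(v_res, 0.0))
    return (r_bar - half, r_bar + half)
```

When sampling error accounts for all of the observed variance, the residual variance is zero and the credibility interval collapses to a point, as in the homogeneous subgroups of the examples.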

(3.) For purposes of comparison with the four different mean sampling variance formulae used in Osburn and Callender's (1992) research (V1_r through V4_r), where r is the mean observed correlation, N = the cumulative sample size over all studies, K = number of studies, V_r = observed variance, and V_e = sampling error variance:

EQ1 (Whitener, 1990, homogeneous case) = V1_r = (1 - r^2)^2/(N - K)

EQ2 (Whitener, 1990, heterogeneous case) = V4_r = (1 - r^2)^2/(N - K) + V_res/K

EQ3 (Osburn & Callender) = V3_r = V_r/K

EQ4 (Osburn & Callender) = V2_r = V_e/K
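The four formulae can be sketched as functions (a minimal sketch; the function and argument names are ours):

```python
def eq1(r_bar, n_total, k):
    # EQ1, homogeneous case: (1 - r^2)^2 / (N - K)
    return (1.0 - r_bar ** 2) ** 2 / (n_total - k)

def eq2(r_bar, n_total, k, v_res):
    # EQ2, heterogeneous case: EQ1 plus residual variance over K
    return eq1(r_bar, n_total, k) + v_res / k

def eq3(v_r, k):
    # EQ3: observed variance over K
    return v_r / k

def eq4(v_e, k):
    # EQ4: sampling error variance over K
    return v_e / k
```

Note that EQ2 reduces to EQ1 when the residual variance is zero, and that EQ3 and EQ4 depend only on the observed and sampling error variances, not on the mean correlation directly.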

(4.) There is an analytical explanation for this effect. Osburn and Callender showed that if residual variance (V_res) is expressed as

V_res = V_r - V_e,

then EQ2 can be rewritten as

(1 - r^2)^2/(N - K) + V_res/K = V_r/K + (1 - r^2)^2/(N - K) - V_e/K,

or

EQ2 = EQ3 + EQ1 - EQ4.

When mean r (VEAVE) is used instead of individual r (VEIND) as the estimate of sampling error variance in EQ4, EQ4 is equal to EQ1, which results in EQ2 = EQ3. We appreciate this being pointed out by an anonymous reviewer.
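This collapse can be checked numerically: with VEAVE, V_e = (1 - r^2)^2/(N_bar - 1), and since K(N_bar - 1) = N - K, EQ4 equals EQ1 exactly, leaving EQ2 = EQ3. A sketch with illustrative numbers (the function name is ours):

```python
def mean_sampling_variances(r_bar, n_bar, k, v_r):
    # With VEAVE as the sampling error variance estimate, EQ4
    # collapses into EQ1 and hence EQ2 collapses into EQ3.
    n_total = n_bar * k                          # cumulative sample size N
    v_e = (1 - r_bar ** 2) ** 2 / (n_bar - 1)    # VEAVE
    eq1 = (1 - r_bar ** 2) ** 2 / (n_total - k)
    eq3 = v_r / k
    eq4 = v_e / k                                # equals eq1, since K(n_bar - 1) = N - K
    eq2 = eq1 + (v_r - v_e) / k                  # EQ1 + V_res/K
    return eq1, eq2, eq3, eq4

e1, e2, e3, e4 = mean_sampling_variances(r_bar=0.2, n_bar=101, k=10, v_r=0.02)
assert abs(e1 - e4) < 1e-12 and abs(e2 - e3) < 1e-12
```

The equalities hold for any choice of inputs, which is why the EQ2 and EQ3 columns in Table 6 are identical whenever VEAVE is used.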

References

Bass, B. M. 1985. Leadership and performance beyond expectations. New York: Free Press.

Blegen, M. A., Mueller, C. W., & Price, J. L. 1988. Measurement of kinship responsibility for organizational research. Journal of Applied Psychology, 73: 402-409.

Burke, M. J., Rupinski, M. T., Dunlap, W. P., & Davison, H. K. 1996. Do situational variables act as substantive causes of relationships between individual difference variables? Two large-scale tests of "common cause" models. Personnel Psychology, 49: 573-598.

Cohen, A., & Hudecek, N. 1993. Organizational commitment-turnover relationship across occupational groups. Group & Organization Management, 18: 188-213.

Crampton, S. M., & Wagner, J. A. III 1994. Percept-percept inflation in microorganizational research: An investigation of prevalence and effect. Journal of Applied Psychology, 79: 67-76.

Fuller, J. B., Hester, K., Dickson, P., Allison, B. J., & Birdseye, M. 1996. A closer look at select cognitive precursors to organizational turnover: What has been missed and why? Psychological Reports, 76: 1311-1352.

Fuller, J. B., & Hester, K. 1998. The effect of labor relations climate on the union participation process. Journal of Labor Research, 19: 173-187.

Huffcutt, A. I., & Arthur, W. 1994. Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of Applied Psychology, 79: 184-190.

Huffcutt, A. I., & Arthur, W. 1995. Development of a new outlier statistic for meta-analytic data. Journal of Applied Psychology, 80: 327-334.

Hunter, J. E., & Schmidt, F. L. 1990. Methods of meta-analysis. Newbury Park, CA: Sage Publications, Inc.

Hunter, J. E., & Schmidt, F. L. 1994. Estimation of sampling error variance in the meta-analysis of correlations: Use of average correlation in the homogeneous case. Journal of Applied Psychology, 79: 171-177.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. 1982. Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage Publications.

Koslowsky, M., & Sagie, A. 1993. On the efficacy of credibility intervals as indicators of moderator effects in meta-analytic research. Journal of Organizational Behavior, 14: 695-699.

Law, K. S., Schmidt, F. L., & Hunter, J. E. 1994. A test of two refinements in procedures for meta-analysis. Journal of Applied Psychology, 79: 978-986.

Lord, R. G., DeVader, C. L., & Alliger, G. M. 1986. A meta-analysis of the relation between personality traits and leadership perceptions: An application of validity generalization procedures. Journal of Applied Psychology, 71: 402-410.

Mento, A. J., Steel, R. P., & Karren, R. J. 1987. A meta-analytic study of the effects of goal setting on task performance: 1966-1985. Organizational Behavior and Human Decision Processes, 39: 54-83.

Ofsanko, F. J. 1979. Employee turnover by job performance level. Dissertation Abstracts International, 40: 2419B.

Osburn, H. G., & Callender, J. 1992. A note on the sampling variance of the mean uncorrected correlation in meta-analysis and validity generalization. Journal of Applied Psychology, 77: 115-122.

Petty, M. M., McGee, G. W., & Cavender, J. W. 1984. A meta-analysis of the relationships between individual job satisfaction and individual performance. Academy of Management Review, 9: 712-721.

Podsakoff, P. M., & Organ, D. W. 1986. Self-reports in organizational research: Problems and prospects. Journal of Management, 12: 531-544.

Raju, N. S., Burke, M. J., Normand, J., & Langlois, G. M. 1991. A new meta-analytic approach. Journal of Applied Psychology, 76: 432-446.

Roush, P. E., & Atwater, L. E. 1992. Using the MBTI to understand transformational leadership and self-perception theory. Military Psychology, 4: 17-34.

Russell, C. J., Settoon, R. P., McGrath-Blanton, R. N., Kidwell, R. B., Lohrke, F. T., Scifres, E. L., & Danforth, G. W. 1994. Investigator characteristics as moderators of personnel selection research: A meta-analysis. Journal of Applied Psychology, 79: 163-170.

Sackett, P. R., Harris, M. M., & Orr, J. M. 1986. On seeking moderator variables in the meta-analysis of correlational data: A Monte Carlo investigation of statistical power and resistance to Type I error. Journal of Applied Psychology, 71: 302-310.

Sagie, A., & Koslowsky, M. 1993. Detecting moderators with meta-analysis: An evaluation and comparison of techniques. Personnel Psychology, 46: 629-640.

Schmidt, F. L., Hunter, J. E., & Raju, N. S. 1988. Validity generalization and situational specificity: A second look at the 75% rule and Fisher's z transformation. Journal of Applied Psychology, 73: 665-672.

Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M. 1993. Refinements in validity generalization methods: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78: 3-12.

Spector, B. 1987. Transformational leadership: The new challenge for U.S. unions. Human Resource Management, 26: 3-16.

Tett, R. P., & Meyer, J. P. 1993. Job satisfaction, organizational commitment, turnover intention, and turnover: Path analyses based on meta-analytic findings. Personnel Psychology, 46: 259-293.

Whitener, E. M. 1990. Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of Applied Psychology, 75: 315-321.

Williams, C. R., & Livingstone, L. P. 1994. Another look at the relationship between performance and voluntary turnover. Academy of Management Journal, 37: 269-298.

Williams, L. J., Cote, J. A., & Buckley, M. R. 1989. Lack of method variance in self-reported affect and perceptions at work: Reality or artifact? Journal of Applied Psychology, 74: 462-468.

Yammarino, F. J., & Bass, B. M. 1990. Transformational leadership and multiple levels of analysis. Human Relations, 43: 975-995.

Results Comparison of Lord, DeVader, and Alliger's (1986) Traits-Leadership Perceptions Meta-Analysis

Each row reports: N; K; mean observed r; observed variance; sampling error variance; explained variance; residual variance; 95% credibility interval; then, for each mean sampling variance estimator, the mean sampling variance and its 95% confidence interval.

Sample-Weighted Procedure (Hunter & Schmidt, 1990)
Dominance Trait: N = 1,649; K = 11; r = .09; .008432; .006392; 76%; .002040; (.00, .18); EQ1 = .000600, (.04, .14); EQ2 = .000786, (.04, .15); EQ3 = .000767, (.04, .15)
Dominance Trait: N = 731; K = 9; r = .10; .018877; .011722; 62%; .007155; (-.07, .18); EQ1 = .001360, (.02, .17); EQ2 = .002155, (.00, .19); EQ3 = .002097, (.01, .18)

Unweighted Procedure (Osburn & Callender, 1992)
Dominance Trait: N = 1,649; K = 11; r = .07; .022387; .006392; 29%; .015995; (-.17, .32); EQ2 = .002058, (-.02, .16); EQ3 = .002035, (-.01, .16)

Note: 95% Credibility Interval = mean observed correlation [plus or minus] (1.96 X (residual variance) [caret] 0.5).

95% Confidence Interval = mean observed correlation [plus or minus] (1.96 X (sampling variance) [caret] 0.5).

Equation 1 - Sampling Variance (Homogeneous) = (1 - r [caret] 2) [caret] 2/(N - K).

Equation 2 - Sampling Variance (Heterogeneous) = (1 - r [caret] 2) [caret] 2/(N - K) + residual variance/K.

Equation 3 - Sampling Variance = observed variance/K.

**Select Results of Cohen and Hudecek's (1993) Organizational Commitment-Turnover Meta-Analysis**

| Analysis | N | K | Mean corrected r+ | Observed variance | Sampling error variance | Explained variance | Residual variance | 95% credibility interval | Sampling variance (Eq. 1/Eq. 2) | 95% confidence interval | Sampling variance (Eq. 3) | Critical ratio Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Primary meta-analyses** | | | | | | | | | | | | |
| a. Overall (N-weighted) | 10,596 | 36 | -.22 | .011099 | .002954 | 27% | .008146 | (-.40, -.04) | .000312 | (-.25, -.18) | .000308 | |
| b. Overall (N-weighted) | 8,783 | 35 | -.22 | .013206 | .003458 | 26% | .009749 | (-.42, -.03) | .000382 | (-.26, -.19) | .000377 | 0.00 (a,b) |
| c. Overall (unweighted) | 10,596 | 36 | -.24 | .014602 | .002954 | 20% | .011648 | (-.45, -.03) | .000407 | (-.28, -.20) | .000406 | 0.75 (a,c) |
| **Moderator meta-analyses: sample-weighted procedure (Hunter & Schmidt, 1990)** | | | | | | | | | | | | |
| d. OCQ-15 item | 3,817 | 17 | -.30 | .013229 | .003553 | 27% | .009676 | (-.50, -.11) | .000786 | (-.36, -.25) | .000778 | |
| e. OCQ-9 item | 4,350 | 12 | -.18 | .002578 | .002579 | 100% | .000000 | -.18 | .000216 | (-.21, -.15) | .000215 | 3.81* (d,e) |
| f. OCQ-9 item | 2,537 | 11 | -.17 | .004261 | .004057 | 96% | .000204 | (-.20, -.14) | .000373 | (-.21, -.13) | .000387 | 3.81* (d,f) |
| **Moderator meta-analyses: unweighted procedure (Osburn & Callender, 1992)** | | | | | | | | | | | | |
| OCQ-15 item | 3,817 | 17 | -.32 | .014679 | .003553 | 24% | .011126 | (-.52, -.11) | .000868 | (-.37, -.26) | .000863 | |
| OCQ-9 item | 4,350 | 12 | -.17 | .005972 | .002579 | 43% | .003393 | (-.29, -.06) | .000500 | (-.22, -.13) | .000498 | 4.07* |

Note: r+ = corrected for measurement error. Critical ratio Z: *p < .05, one-tailed. Rows b and f: Blegen, Mueller, and Price (1988; n = 1,813; r = -.19) removed.

**Select Results of Williams and Livingstone's (1994) Performance-Voluntary Turnover Meta-Analysis**

| Analysis | N | K | Mean observed r | Observed variance | Sampling error variance | Explained variance | Residual variance | 95% credibility interval | Sampling variance (Eq. 1/Eq. 2) | 95% confidence interval | Sampling variance (Eq. 3) | Critical ratio Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Primary meta-analyses** | | | | | | | | | | | | |
| a. Overall (N-weighted) | 15,138 | 55 | -.12 | .019169 | .003534 | 18% | .015635 | (-.37, .12) | .000349 | (-.16, -.09) | .000349 | |
| b. Overall (N-weighted) | 11,152 | 54 | -.16 | .020692 | .004613 | 22% | .016079 | (-.41, .09) | .000383 | (-.20, -.12) | .000383 | 1.48 (a,b) |
| c. Overall (unweighted) | 15,138 | 55 | -.15 | .020527 | .003474 | 17% | .017052 | (-.41, .10) | .000373 | (-.19, -.12) | .000373 | 1.12 (a,c) |
| **Moderator meta-analyses: sample-weighted procedure (Hunter & Schmidt, 1990)** | | | | | | | | | | | | |
| d. Reward Contingency-YES | 2,907 | 14 | -.25 | .025918 | .004246 | 16% | .021672 | (-.54, .04) | .001851 | (-.34, -.17) | .001851 | |
| e. Reward Contingency-NO | 12,231 | 41 | -.09 | .012829 | .003303 | 26% | .009526 | (-.29, .10) | .000313 | (-.13, -.06) | .000313 | 3.42* (d,e) |
| f. Reward Contingency-NO | 8,245 | 40 | -.13 | .015036 | .004710 | 31% | .010326 | (-.33, .07) | .000376 | (-.17, -.09) | .000376 | 2.54* (d,f) |
| **Moderator meta-analyses: unweighted procedure (Osburn & Callender, 1992)** | | | | | | | | | | | | |
| Reward Contingency-YES | 2,907 | 14 | -.21 | .027463 | .004433 | 16% | .023030 | (-.50, .09) | .001962 | (-.29, -.12) | .001962 | |
| Reward Contingency-NO | 12,231 | 41 | -.14 | .016891 | .003239 | 19% | .013652 | (-.37, .09) | .000412 | (-.18, -.10) | .000412 | 1.44 |

Note: Critical ratio Z: *p < .05, one-tailed. Rows b and f: Ofsanko (1979; n = 3,986; r = 1.02) removed.

**Results of Transformational Leadership-Performance Primary Meta-Analyses**

| Relationship | Method | K | N | Mean observed r | Observed variance | Sampling error variance (VEAVE) | Explained variance | 95% credibility interval | Sampling error variance (VEIND) | Explained variance | 95% credibility interval | 95% CI (Eq. 2) | 95% CI (Eq. 3) | Critical ratio Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Charisma | N-weighted | 12 | 3,115 | .43 | .027600 | .002548 | 9% | (.12, .74) | .002625 | 10% | (.12, .74) | (.34, .53) | (.34, .53) | |
| Charisma | Unweighted | 12 | 3,115 | .40 | .032172 | .002711 | 8% | (.07, .74) | .002625 | 8% | (.07, .74) | (.30, .50) | (.30, .50) | 0.43 |
| Intellectual Stimulation | N-weighted | 12 | 3,115 | .38 | .020454 | .002832 | 14% | (.12, .64) | .002957 | 14% | (.12, .64) | (.30, .46) | (.30, .46) | |
| Intellectual Stimulation | Unweighted | 12 | 3,115 | .34 | .021185 | .003036 | 14% | (.07, .60) | .002957 | 14% | (.07, .60) | (.26, .42) | (.27, .46) | 0.69 |
| Individual Consideration | N-weighted | 11 | 3,067 | .41 | .032399 | .002488 | 8% | (.07, .75) | .002642 | 8% | (.07, .75) | (.30, .52) | (.30, .52) | |
| Individual Consideration | Unweighted | 11 | 3,067 | .36 | .026056 | .002715 | 10% | (.06, .66) | .002642 | 10% | (.06, .66) | (.27, .46) | (.27, .46) | 0.69 |

Note: N-weighted = Hunter and Schmidt (1990); Unweighted = Osburn and Callender (1992). Chi-square: *p < .05, **p < .10. Critical ratio Z: +p < .05.

**Results of Transformational Leadership Research Design Moderator Meta-Analyses**

| Relationship | Design | Method | K | N | Mean observed r | Observed variance | Sampling error variance (VEAVE) | Explained variance | 95% credibility interval | Sampling error variance (VEIND) | Explained variance | 95% credibility interval | 95% CI (Eq. 1 or Eq. 2) | 95% CI (Eq. 3) | Critical ratio Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Charisma | Percept-percept | N-weighted | 7 | 2,680 | .46 | .025706 | .001623 | 6% | (.16, .77) | .001575 | 6% | (.16, .77) | (.34, .58) | (.34, .58) | |
| Charisma | Percept-percept | Unweighted | 7 | 2,680 | .45 | .046155 | .001671 | 4% | (.04, .86) | .001575 | 3% | (.03, .86) | (.29, .61) | (.29, .61) | |
| Charisma | Multi-source | N-weighted | 5 | 435 | .27 | .006696 | .010034 | 150%** | .27 | .009094 | 136%** | .27 | (.18, .35) | (.19, .34) | 2.68+ |
| Charisma | Multi-source | Unweighted | 5 | 435 | .34 | .005720 | .009095 | 159%** | .34 | .009094 | 159%** | .34 | (.26, .42) | (.27, .41) | 1.25 |
| Intellectual Stimulation | Percept-percept | N-weighted | 7 | 2,680 | .41 | .014025 | .001805 | 13% | (.20, .63) | .001878 | 13% | (.20, .63) | (.32, .50) | (.32, .50) | |
| Intellectual Stimulation | Percept-percept | Unweighted | 7 | 2,680 | .37 | .022249 | .001941 | 9% | (.09, .65) | .001878 | 8% | (.09, .65) | (.26, .48) | (.26, .48) | |
| Intellectual Stimulation | Multi-source | N-weighted | 5 | 435 | .18 | .014099 | .010878 | 77%** | (.07, .29) | .009607 | 68%** | (.05, .31) | (.08, .29) | (.08, .29) | 3.32+ |
| Intellectual Stimulation | Multi-source | Unweighted | 5 | 435 | .29 | .015496 | .009779 | 63%* | (.14, .44) | .009607 | 62%* | (.14, .44) | (.18, .40) | (.18, .40) | 1.01 |
| Individual Consideration | Percept-percept | N-weighted | 7 | 2,680 | .43 | .033814 | .001744 | 5% | (.08, .78) | .001844 | 5% | (.08, .78) | (.29, .57) | (.29, .57) | |
| Individual Consideration | Percept-percept | Unweighted | 7 | 2,680 | .37 | .037653 | .001936 | 5% | (.00, .74) | .001844 | 5% | (.00, .75) | (.23, .52) | (.23, .52) | |
| Individual Consideration | Multi-source | N-weighted | 4 | 387 | .28 | .004285 | .008825 | 206%** | .28 | .008166 | 191%** | .28 | (.19, .38) | (.22, .35) | 1.92+ |
| Individual Consideration | Multi-source | Unweighted | 4 | 387 | .34 | .005119 | .008137 | 159%** | .34 | .008166 | 160%** | .34 | (.25, .43) | (.27, .41) | 0.37 |

Note: N-weighted = Hunter and Schmidt (1990); Unweighted = Osburn and Callender (1992). Chi-square: *p < .05, **p < .10. Eq. 2 is used when sampling error variance accounts for greater than 100% of observed variance. Critical ratio Z: +p < .05.
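The critical ratio Z reported in these tables compares two meta-analytic mean correlations. A minimal sketch, assuming the usual two-independent-estimates form (the difference between means divided by the square root of the sum of their sampling variances; the function name is our own) reproduces the tabled values when the Eq. 3 sampling variances are used:

```python
import math

def critical_ratio_z(mean_r1, var1, mean_r2, var2):
    """Critical ratio comparing two mean correlations, assuming
    independent subgroups; var1/var2 are sampling variances of the means."""
    return abs(mean_r1 - mean_r2) / math.sqrt(var1 + var2)

# Rows d and e of the commitment-turnover table (Eq. 3 sampling variances):
z = critical_ratio_z(-0.30, 0.000778, -0.18, 0.000215)  # ~3.81
```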

**Results of Select Examples: Standard Errors and 95% Confidence Intervals by Sampling Error Estimator**

| Analysis | Method | N | K | Mean observed r | Observed variance | Sampling error variance (VEAVE) | Sampling error variance (VEIND) | Credibility interval | 95% confidence interval | Eq. 2 variance (VEAVE) | Eq. 2 variance (VEIND) | Eq. 3 variance (VEAVE) | Eq. 3 variance (VEIND) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Petty, McGee, and Cavender (1984), JDI-Performance** | | | | | | | | | | | | | |
| Supervision | N-weighted | 1,927 | 10 | .27 | .016601 | .004467 | .004566 | (.06, .49) | (.19, .35) | .001660 | .001651 | .001660 | .001660 |
| Supervision | Unweighted | 1,927 | 10 | .24 | .010249 | .004628 | .004556 | (.09, .39) | (.18, .30) | .001025 | .001032 | .001025 | .001025 |
| Coworker | N-weighted | 1,467 | 8 | .20 | .023484 | .005073 | .005070 | (-.07, .46) | (.09, .30) | .054181 | .054184 | .054181 | .054181 |
| Coworker | Unweighted | 1,467 | 8 | .16 | .016648 | .005219 | .005070 | (-.05, .37) | (.07, .25) | .045619 | .045822 | .045619 | .045619 |
| **Huffcutt and Arthur (1994), Structure-Interview Validity** | | | | | | | | | | | | | |
| L3 | N-weighted | 4,358 | 27 | .34 | .022107 | .004894 | .004940 | (.08, .59) | (.28, .39) | .000819 | .000817 | .000819 | .000819 |
| L3 | Unweighted | 4,358 | 27 | .32 | .021705 | .005049 | .004940 | (.06, .57) | (.26, .37) | .000804 | .000808 | .000804 | .000804 |
| **Fuller, Hester, Dickson, Allison, and Birdseye (1996)** | | | | | | | | | | | | | |
| Military | N-weighted | 2,571 | 9 | -.39 | .018262 | .002546 | .002502 | (-.63, -.14) | (-.47, -.30) | .002029 | .002034 | .002029 | .002029 |
| Military | Unweighted | 2,571 | 9 | -.38 | .017862 | .002561 | .002502 | (-.62, -.14) | (-.47, -.29) | .001985 | .001991 | .001985 | .001985 |
| Civilian | N-weighted | 30,182 | 52 | -.44 | .010543 | .001123 | .000993 | (-.63, -.25) | (-.47, -.41) | .000203 | .000205 | .000203 | .000203 |
| Civilian | Unweighted | 30,182 | 52 | -.49 | .017694 | .001004 | .000993 | (-.74, -.23) | (-.52, -.45) | .000340 | .000340 | .000340 | .000340 |

Note: Sample-size outliers: Petty et al. (1984), n = 579; Huffcutt and Arthur (1994), n = 1,050; Fuller et al. (1996), n = 13,388.
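Every comparison tabulated above reduces to two ways of averaging the same set of correlations. A minimal sketch of the two estimators, with hypothetical data (`rs` and `ns` are illustrative, not from any study above), shows why a single large-sample study can pull the two methods apart:

```python
def sample_weighted(rs, ns):
    """Hunter-Schmidt style: weight each correlation and each squared
    deviation from the mean by its study's sample size."""
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    obs_var = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    return mean_r, obs_var

def unweighted(rs):
    """Osburn-Callender style: every study counts equally, regardless of N."""
    k = len(rs)
    mean_r = sum(rs) / k
    obs_var = sum((r - mean_r) ** 2 for r in rs) / k
    return mean_r, obs_var

# Hypothetical correlations; the last study is a large-sample outlier
# that dominates the weighted mean but not the unweighted one:
rs = [-0.30, -0.25, -0.20, 0.05]
ns = [100, 120, 90, 2000]
# sample_weighted(rs, ns)[0] is near .01; unweighted(rs)[0] is -.175
```

This is the mechanism behind the outlier-removal rows (b, f) in the tables above: removing one dominant study moves the sample-weighted estimates far more than the unweighted ones.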


Author: Fuller, J. Bryan; Hester, Kim
Publication: Journal of Management
Article Type: Statistical Data Included
Geographic Code: 1USA
Date: Nov 1, 1999
Words: 11,202
