Practical significance: the use of effect sizes in school counseling research.
Numerous publications have advocated for additional and higher-quality school counseling research, especially in regard to comprehensive school counseling programs (e.g., American School Counselor Association [ASCA], 2005b; Bauman et al., 2002; Brown & Trusty, 2005a, 2005b; Lapan, 2001, 2005; Sink, 2005). In fact, prominent school counselor educators (e.g., Astramovich, Coker, & Hoskins, 2005; Baker & Gerler, 2004; Gysbers & Henderson, 2006; Myrick, 2003; Schmidt, 2002) and the authors of the ASCA National Model® (2005a) have suggested that accountability data must be collected to appraise the efficacy of program interventions, activities, and services. Moreover, ASCA's (2004) Ethical Standards for School Counselors indicate that a practitioner who acts in a professional manner "conducts appropriate research and report findings in a manner consistent with acceptable educational and psychological research practices" (F.1.c). To underscore this ethical declaration and to reinforce Bauman et al.'s sentiments, Whiston (2003) argued that "the future of professional school counseling is at risk unless there is a shared commitment to conducting research that clearly documents how professional school counselors make a positive difference in students' lives" (p. 447).
One of the major challenges with school counseling research is determining when an investigation's statistically significant findings are actually useful to practitioners. In other words, how do school counselors know when the results of a statistical analysis (e.g., t test or correlational analysis) have some real-world application? This practical "applicability" problem is chiefly an issue in quantitative studies, that is, those investigations that attempt to understand and summarize school counseling trends using numerical data.
Take, for example, a hypothetical quantitative study that includes five urban school districts and nearly 150 middle school students who were referred more than 10 times over a period of 2 years to school principals for inappropriate classroom behavior. A school counseling researcher wants to determine the effectiveness of three individual counseling methods (the independent variable called counseling method)--including Glasser's reality therapy (RT), Rogerian counseling (RC), and solution-focused counseling (SFC)--on the mean (M) recidivism rate of inappropriate classroom behavior (the dependent variable). The researcher then computes an analysis of variance (ANOVA) to statistically compare potential differences among the M recidivism scores. A statistically significant difference is found among the three individual counseling methods. The derived statistical output from this analysis includes the following: F(2, 142) = 15.89, p < .05 ($M_{\text{Recidivism}}$ for SFC = 5; $M_{\text{Recidivism}}$ for RT = 10; $M_{\text{Recidivism}}$ for RC = 20, with the smaller the M, the lower the recidivism rate for behavior problems).
Interpreting these results, and others not reproduced here, the researcher demonstrates that SFC yielded the greatest amount of behavior change (i.e., lowest recidivism rate of inappropriate classroom behavior) in middle school students. However, there is the question of practical applicability even though the above ANOVA result is statistically significant. Is it important enough for school counselors in the five districts to take seriously and change their individual counseling approach when working with students with major classroom behavior problems? Or more critically, should school counselors use SFC with middle school students more frequently than Rogers's feeling-focused approach and Glasser's RT based on this statistical finding? A good way to estimate the applicability of research findings to school counseling practice is to calculate an effect size (ES).
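As a rough illustration of what an ES adds to the hypothetical ANOVA result above, eta squared can be recovered from the reported F ratio and its degrees of freedom, because in a one-way design eta squared equals (F x df between) / (F x df between + df within). The sketch below uses only the hypothetical values already given; the function name is an illustrative assumption rather than part of any statistical package.

```python
def eta_squared_from_f(f_value: float, df_between: int, df_within: int) -> float:
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    explained = f_value * df_between
    return explained / (explained + df_within)


# Values from the hypothetical study: F(2, 142) = 15.89
eta_sq = eta_squared_from_f(15.89, 2, 142)
print(f"eta squared = {eta_sq:.3f}")
```

Under these assumptions, roughly 18% of the variance in recidivism would be associated with counseling method, which is the kind of magnitude information the p value alone does not convey.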
In this article, we focus on the most widely used parametric ES measures and how they can be used in school counseling outcomes research. (Parametric data are data--e.g., test scores--that approximate a bell-shaped or normal curve.) In particular, we (a) discuss the rationale for reporting ESs, (b) overview the primary ES categories and the ESs associated with each, (c) explore how disparate research designs and sample sizes affect the magnitude of ESs, (d) highlight ES computations, (e) consider how these indices should be reported and interpreted in school counseling settings, and (f) provide additional ES resources to consult. Before discussing the two principal types of ESs, a brief caveat is in order. Given limitations on article length, we cannot explicate fully much of the technical material presented below. Before applying ESs to the results of a small or large study, readers are encouraged to first read up on the topic (see below for recommendations) and consult with a research-oriented counselor educator at a local university.
WHY SHOULD EFFECT SIZES BE REPORTED?
Although the value of ESs in counseling, educational, and social science research has certainly received more attention in recent years, the idea is not new. Ronald Fisher suggested more than 75 years ago that researchers include a correlation ratio or an eta squared ($\eta^2$) with their ANOVA findings to indicate the magnitude of association between the independent and dependent variables (Huberty, 2002). Since then, more than 40 "effect magnitude" indices have been suggested (Kirk, 1996). What is new or at least what has become more pressing, given the No Child Left Behind (U.S. Department of Education, 2001) legislation, is the need for school counseling researchers to present data and findings that reflect both the statistical and practical significance of their work. Cohen (1990), in fact, adamantly stated that the point of research should be to measure the magnitude of an effect rather than simply its statistical significance.
Why, then, should school counseling researchers and those publishing in Professional School Counseling report both a p value indicating the research finding's statistical significance and an ES index as an estimate of its practical significance (Kirk, 1996)? In this section, we attempt to make the case for including ESs in research studies and program evaluations. First, when investigators only report a statistical test's derived significance level (i.e., its p value), readers are left without a useful statistical tool to determine the importance or practicality of the finding for their own work settings. Suppose, for instance, an instructional method used during an elementary school classroom guidance lesson increased the mean self-concept score (the dependent variable) of third graders in the experimental group by 5 points over those students in the control group. This fairly modest mean difference actually may be statistically significant (p < .05), especially if the sample size is large enough. In other words, "whether or not such a 5-point difference (i.e., magnitude of effect) between the groups is meaningful from an instructional standpoint depends on many factors besides the statistically significant p value" (Snyder & Lawson, 1993, p. 335).
Vasquez, Gangstead, and Henson (2000) further elaborated on how statistically derived p values are sharply influenced by the sample size:
Small differences/relationships can be interpreted as statistically significant based upon the presence of large sample sizes, or conversely, large differences/relationships can be declared statistically non-significant due to small sample sizes.... A relatively small p value does not necessarily mean that there is a strong relationship between the independent and dependent variables of interest in a study. (pp. 4-5)
Other writers have suggested that a single study resulting in a decision to reject or accept a null hypothesis based on an a priori alpha level does little to advance the development of theory, whereas the reporting of an ES allows the comparison of research studies and the contextualization of findings (e.g., Huberty, 1987). In short, if researchers fail to report ESs, and only include the research findings' derived significance levels, key information is missing that assists in understanding the practical value of the results.
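The dependence of p values on sample size can be sketched numerically. The example below holds a small standardized mean difference constant (d = .10, a hypothetical value) and varies only the number of students per group. It assumes two equal-sized independent groups, for which t = d x sqrt(n / 2), and it uses a normal approximation to the t distribution to avoid external libraries, so the p values are approximate.

```python
import math


def approx_two_tailed_p(d: float, n_per_group: int) -> float:
    """Approximate two-tailed p value for two equal-sized independent groups.

    Assumes t = d * sqrt(n / 2) and approximates the t distribution with the
    standard normal distribution (reasonable only for moderately large n).
    """
    t_value = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(t_value) / math.sqrt(2))


p_small_sample = approx_two_tailed_p(0.10, 20)
p_large_sample = approx_two_tailed_p(0.10, 2000)

for n, p in ((20, p_small_sample), (2000, p_large_sample)):
    print(f"n per group = {n:>4}: approximate p = {p:.4f}")
```

The effect is identical in both runs; only the sample size changes whether it crosses the conventional .05 threshold.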
A second reason for reporting ESs comes from the American Psychological Association's (APA) Publication Manual (2001; see also Fidler, 2002, for an overview), the most widely accepted writing style manual for school counseling, psychological, and educational researchers. It certainly encourages the use of ESs:
For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section. You can estimate the magnitude of the effect or the strength of the relationship with a number of common effect size estimates.... The general principle to be followed ... is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (APA, pp. 25-26)
According to Thompson (2006), 24 journals have heeded APA's suggestion (see Wilkinson & APA Task Force on Statistical Inference, 1999) and now require or strongly recommend that ESs be reported in manuscripts submitted for publication, including, for example--
* Career Development Quarterly
* Contemporary Educational Psychology
* Counseling & Values
* Educational and Psychological Measurement
* Exceptional Children
* Journal of Applied Psychology
* Journal of Community Psychology
* Journal of Consulting and Clinical Psychology
* Journal of Counseling & Development
* Journal of Educational and Psychological Consultation
* Measurement and Evaluation in Counseling and Development
* Research in the Schools.
Thompson (2000) provided additional reasons for reporting ESs: doing so (a) facilitates higher-quality meta-analytic research reviews, (b) helps future researchers devise more specific study parameters and outcome expectations, and (c) aids in evaluating how a study's results fit within the context of previous research. (That is, how similar are these results compared to previous studies? What features of the research contributed to these similarities and/or differences?)
To recap, the inclusion of ESs above and beyond the reported significance level is increasingly encouraged, and in many cases required, for the publication of research in reputable, peer-reviewed journals. Effect sizes provide an indication of the magnitude of an effect and offer "comparative standards with past and present research ... assist[ing] the researcher in identifying important characteristics for subsequent follow-up research" (Vasquez et al., 2000, p. 4). Effect sizes allow investigators to say something more definitive about the practical strength of the findings than merely reporting the derived p value (Thompson, 1998, 2006).
FAMILIES OF EFFECT SIZES
Effect sizes are referred to in numerous ways in counseling, educational, and social science research. They often are interchangeably called ES estimates, ES indices, or ES measures. Kline (2004) indicated that the most commonly used ESs fall into one of two general families: (a) standardized mean differences (also called group difference indices; e.g., Cohen's d), or (b) strength of association (also referred to as relationship variance-accounted-for or variance-explained ESs; e.g., Pearson $r^2$, $\eta^2$). Vacha-Haase and Thompson (2004) added a third class of ES called "corrected" effect sizes. These three categories are briefly explained next.
Tables 1 and 2 summarize sample standardized mean differences and strength of association ES indices. (Whereas the standardized mean difference ESs are reported in a nonsquared, standardized score metric, the relationship variance-accounted-for ESs are reported in a squared metric--e.g., $r^2$ and $R^2$ [Thompson, 2002].) The mean difference ESs are computed when the focus of the statistical analysis uses mean outcome scores to compare potential group differences. For instance, a t test may be computed to determine whether there was a significant (p < .05, two-tailed) difference between girls' and boys' (group is the independent variable) mean math test scores (outcome or dependent variable). The means for each group also could be compared and reported, for example, using a Cohen's d ES.
Strength of association ES indices are calculated across a wide variety of research designs (e.g., ex post facto or causal comparative, experimental, quasi-experimental, prediction [regression and correlational]) using the general linear model (GLM; for detailed explanations, see Rutherford, 2001; Trochim, 2005) for the statistical analyses. Strength of association ESs indicate how much of the variance associated with the dependent variable (e.g., students' English test scores) can be accounted for (or explained) by the independent variable(s) (e.g., gender, ethnicity, method of counseling; Snyder & Lawson, 1993; Vacha-Haase & Thompson, 2004).
In other words, the GLM statistical procedure deployed to compute, for example, a factorial ANOVA produces strength of association or variance-accounted-for ESs. These are based on the ratio of the specific variance that is related to the independent variable(s) (e.g., variance due to the differential effect of the intervention on group [independent variable]) to total variance (the combined variance relating to the independent and dependent variables as well as to residual or error variance). These ESs also can be reported as (a) coefficients of determination, where $r^2$ represents the squared bivariate Pearson correlation coefficient and $R^2$ symbolizes the squared multiple correlation among three or more variables; and (b) squared etas ($\eta^2$s) generated from a statistical analysis with perhaps several independent and dependent variables.
Both ES families can be further categorized into biased (uncorrected) and unbiased (corrected) measures (Fan, 2001; Roberts & Henson, 2002; Snyder & Lawson, 1993; Thompson, 1999, 2002, 2006; Vacha-Haase & Thompson, 2004; see Tables 1 and 2). The former relates to "sampling error" and how the ESs derived from statistical procedures computed with smaller groups of participants (samples) drawn from a larger population will be higher than would be found either in the original population or in participant samples studied at a later date. For example, assume a school counseling researcher randomly selects a sample of 15 girls and 15 boys from a 10th-grade physical education class in one high school. This 30-person sample then is assumed to represent the entire population of male and female students attending all Grade 10 physical education classes across a district's five high schools. Because there is always sampling error (i.e., problems with selecting the students to participate in the research study), uncorrected ESs (e.g., Cohen's d, Glass' $\Delta$, $\eta^2$, $R^2$) will be higher (i.e., positively biased or overestimated) for the 30 students than if the researcher actually could have included in the study every 10th-grade physical education student across all five high schools.
In contrast to biased ESs, unbiased ones (e.g., adjusted $R^2$, Hays' $\omega^2$, $\epsilon^2$, and Wherry formulas; see Tables 1 and 2) statistically correct for this positive bias by calculating the strength of association indices so that they better reflect the true ESs for the entire population and those calculated in future samples (e.g., see the Herzberg and Lord formulas in Tables 1 and 2). Because these formulas must adjust for sampling error occurring both in the present study and in subsequent investigations, ES estimates for future research samples result in more shrinkage (i.e., are smaller or more conservative) than those ES estimates based on an entire population (Snyder & Lawson, 1993). Therefore, interpretation of ESs is often less than straightforward and must be exercised with caution.
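The shrinkage produced by a correction formula can be sketched with the adjusted R squared formula listed in Table 1, 1 - (1 - R squared)(N - 1)/(N - k - 1), where N is the sample size and k is the number of predictors. The R squared, N, and k values below are hypothetical and were chosen only to show how the correction behaves in a small versus a large sample.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("Sample size n must exceed k + 1.")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Same uncorrected R squared (.30) with 3 predictors, two hypothetical sample sizes
small_sample = adjusted_r_squared(0.30, n=20, k=3)
large_sample = adjusted_r_squared(0.30, n=200, k=3)

print(f"n = 20:  adjusted R squared = {small_sample:.3f}")
print(f"n = 200: adjusted R squared = {large_sample:.3f}")
```

With 20 participants the corrected value drops to about .17, whereas with 200 participants it stays near .29, so the correction matters mainly when samples are small relative to the number of predictors.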
To summarize, parametric ES indices are available both for mean comparison statistical designs (standardized mean difference ESs) and for correlational and GLM statistical designs (strength of association or relationship variance-accounted-for ESs). Moreover, some ESs are uncorrected or biased, while others are corrected or unbiased. Statistical software can now be used to compute ESs and many of these can be found online at various Web sites.
HOW DO DIFFERENT RESEARCH DESIGNS AND SAMPLE SIZES INFLUENCE ES?
The type of research design selected by the school counseling investigator can determine whether to use, for example, a Cohen's d or an $\eta^2$ ES. As alluded to previously, if the research design, for instance, calls for a statistical comparison between pretest and posttest mean scores for a group of students receiving a school counseling intervention versus those who are not (the control group), the ES to use is a standardized differences index. However, if the research design examines the relationship among variables (i.e., correlational design) and calls for a statistical procedure that uses GLM (e.g., ANOVA), then relationship variance-accounted-for ESs should be reported.
Numerous factors in research can influence the amount of ES bias (O'Grady, 1982; Olejnik & Algina, 2000), including (a) limited reliability of the outcome or measurement instruments (generally, the better the reliability of the tests used, the higher the ESs); (b) sample size (larger sample sizes generally will produce less biased results); (c) number of independent variables to number of participants (the more participants per variable, the less bias); (d) heterogeneity of the study sample (more homogenous samples generally yield smaller effect sizes); and (e) type of research design (experimental studies generally produce smaller ESs). A well-designed and implemented study will, therefore, diminish the size of the ES bias and make it less likely that a corrected ES will be needed. If the research design is far less than optimal, always report the corrected ESs.
When larger samples (n > 50) are used, the magnitude of biased (uncorrected) and unbiased (corrected) ESs will be essentially the same (Snyder & Lawson, 1993). However, statistical corrections tend to be larger with small sample sizes (n < 30) and produce smaller ESs (Thompson, 1990, 1997). Tables 3 and 4 exemplify two different group sizes and their resulting ESs using various measures of strength of association. Sample 1 is relatively undersized (n = 24) and produces a small ES ($\eta^2$ = .17), whereas sample 2 is large (n = 150) and yields a moderate-level ES ($\eta^2$ = .44). These examples also illustrate how sample size can influence how much strength of association indices can vary across different ES formulas. With a small sample size, the ESs range from .02 to .17. In contrast, with a large sample, the ES indices vary little, ranging from .42 to .44. In short, a larger sample size tends to produce more stable ESs.
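The way corrected and uncorrected indices diverge in small samples can be sketched from a one-way ANOVA's sums of squares, using the eta squared, epsilon squared, and Hays' omega squared formulas from Table 1. The sums of squares below are hypothetical: they were chosen only so that eta squared equals .17 (n = 24) and .44 (n = 150), and they are not the data behind Tables 3 and 4. Three groups are assumed in both cases.

```python
def association_indices(ss_between: float, ss_within: float,
                        n_total: int, n_groups: int):
    """Return (eta squared, epsilon squared, omega squared) for a one-way ANOVA."""
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - n_groups)
    # Numerator shared by the two corrected indices
    corrected = ss_between - (n_groups - 1) * ms_within
    eta_sq = ss_between / ss_total
    epsilon_sq = corrected / ss_total
    omega_sq = corrected / (ss_total + ms_within)
    return eta_sq, epsilon_sq, omega_sq


# Hypothetical sums of squares, three groups in each sample
small = association_indices(ss_between=17, ss_within=83, n_total=24, n_groups=3)
large = association_indices(ss_between=44, ss_within=56, n_total=150, n_groups=3)

for label, values in (("n = 24", small), ("n = 150", large)):
    eta_sq, epsilon_sq, omega_sq = values
    print(f"{label}: eta2 = {eta_sq:.2f}, epsilon2 = {epsilon_sq:.2f}, "
          f"omega2 = {omega_sq:.2f}")
```

In this sketch the three indices spread from about .09 to .17 in the small sample but stay within about .01 of one another in the large sample, which mirrors the pattern described above.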
HOW ARE EFFECT SIZES COMPUTED?
In general, standardized mean difference ESs are relatively easy to calculate. In the numerator, one computes the difference between the control (or comparison) and experimental groups' average scores, and this figure then is divided by an estimate of the groups' standard deviation. Strength of association ESs, generally speaking, are derived using the ratio of between-group (or independent variable) variance to total variance. Because strength of association measures are based on a regression model (GLM), most can be interpreted as the percent of variance accounted for in the dependent variable. For example, an $\eta^2$ of .26 means that 26% of the variance in the dependent variable can be accounted for by the independent variable(s). In GLM statistical analyses, most statistical programs (e.g., SPSS, 2005) will include a printout with a "partial eta squared" if one selects "estimates of effect sizes" under the "options" on the GLM tab (this is equal to an "R squared," as they are both calculated using the same formula). Most ES measures can be calculated by hand as well using the formulas provided in Tables 1 and 2. For additional information, Coe (2002) and Thompson (1997) have provided a useful overview of ES calculations.
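The hand calculation described above can be sketched in a few lines. The example computes Cohen's d with the pooled standard deviation as the denominator and Glass' delta with the control group's standard deviation as the denominator, which are the conventional definitions of these two indices. The scores are hypothetical, and the function names are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def cohens_d(experimental, control):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(experimental), len(control)
    s1, s2 = stdev(experimental), stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(experimental) - mean(control)) / pooled_sd


def glass_delta(experimental, control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean(experimental) - mean(control)) / stdev(control)


# Hypothetical posttest scores for two small groups
experimental_scores = [12, 15, 14, 10, 13, 16]
control_scores = [9, 11, 10, 8, 12, 10]

d = cohens_d(experimental_scores, control_scores)
delta = glass_delta(experimental_scores, control_scores)
print(f"Cohen's d = {d:.2f}, Glass' delta = {delta:.2f}")
```

The two indices differ here because the groups' standard deviations differ; which denominator is more defensible depends on whether the intervention is expected to change score variability.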
HOW SHOULD EFFECT SIZES BE REPORTED AND INTERPRETED?
There are key ES reporting guidelines in counseling research (Vacha-Haase & Thompson, 2004). First, when including ESs in the results, it is vital to clearly indicate what type of ES is being reported. Without the name of the ES, readers cannot evaluate its strength. A Cohen's d of .75 means something quite different in practical terms than an $R^2$ of .75. Second, report the derived ESs in the context of the normality of the data used to calculate them (Huberty & Lowman, 2000; Vacha-Haase & Thompson). As mentioned earlier, ESs, like other statistics (e.g., t, F, r), are subject to the parametric assumptions underlying them. When "distribution or homogeneity assumptions are severely violated, F and p calculated values may be compromised, but so too will be the effect estimates" (Vacha-Haase & Thompson, p. 477). Another interpretation practice that is strongly encouraged in the ES literature is to formulate confidence intervals around a derived ES estimate (Bird, 2002; Thompson, 2002; Vacha-Haase & Thompson). A confidence interval provides a "band" around the derived ES that indicates how much the ES could fluctuate based on sampling error and, thus, vary from one sample or group of students to the next sample (Thompson).
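One simple way to see what such a "band" looks like is a percentile bootstrap, sketched below for Cohen's d. This is a stand-in technique chosen because it needs no specialized software; it is not necessarily the interval procedure described in the sources cited above. The scores, the number of resamples, and the random seed are hypothetical choices made only for illustration.

```python
import math
import random
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def bootstrap_ci(group_a, group_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for Cohen's d."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        try:
            estimates.append(cohens_d(resample_a, resample_b))
        except ZeroDivisionError:
            continue  # skip resamples in which both groups have zero variance
    estimates.sort()
    lower_index = int(len(estimates) * (1 - level) / 2)
    upper_index = max(lower_index, int(len(estimates) * (1 + level) / 2) - 1)
    return estimates[lower_index], estimates[upper_index]


# Hypothetical scores for two small groups
experimental = [12, 15, 14, 10, 13, 16, 11, 14, 15, 12]
control = [9, 11, 10, 8, 12, 10, 9, 13, 11, 10]

d = cohens_d(experimental, control)
ci_low, ci_high = bootstrap_ci(experimental, control)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

With only 10 students per group the interval is wide, which is the point of reporting it: the band shows how far the ES could plausibly move from one sample to the next.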
Cohen (1992) published general guidelines for assessing and interpreting the magnitude of standardized difference effect sizes. As a rule, the stronger (or higher) the ESs, the more compelling the evidence that the statistically significant results are useful for school counseling practice. Table 1 suggests the threshold numbers (cutoffs) for labeling the magnitude of each type of ES. For example, small, medium, and large ds are the absolute values, .2, .5, and .8, respectively. However, the final interpretation of the practical or clinical significance of all ESs ultimately remains with the counseling researcher (Thompson, 2002; Vacha-Haase & Thompson, 2004). Any ES should always be considered within the context of previous related research, the design of the study, and the educational impact of the findings. When comparing ESs across school counseling studies, therefore, make sure to attend to the differences in research designs and how they may influence the size of the reported effects.
Thompson (1999, 2002) also cautioned against applying ES cutoffs with the same rigidity that has typically been applied in statistical significance testing, noting that Cohen only intended the cutoffs for small, medium, and large ESs as broad, general guidelines, not as inflexible universal standards. More succinctly, Thompson (2002) warned that if researchers use these guidelines (the ES cutoffs) blindly "with the same rigidity that the $\alpha$ = .05 criterion has been used, we would merely be being stupid in a new metric" (p. 68).
SUPPLEMENTARY RESOURCES ON EFFECT SIZES
Because there are numerous Web sites that address the basics and the nuances of ES usage, listing one or two here would not cover the wide variety of Web-based resources available to readers. An Internet search (e.g., using "Google") for the term effect sizes would produce a substantial number of "hits." Some Web sites and their documents are straightforward and readily comprehensible (see, e.g., Coe, 2002), though others are less so. Google also will lead searchers to simple-to-use ES calculators. If readers would prefer helpful texts, we recommend four relatively recent books by Green and Salkind (2004), Grissom and Kim (2005), Kline (2004), and Rosenthal, Rosnow, and Rubin (2000). Many of the publications cited in this article are useful to consult as well.
The importance of ES reporting and interpretation is widely recognized in the educational and counseling literature. Effect size reporting is increasingly a preference--if not a requirement--by many educational and counseling journals (e.g., Journal of Counseling & Development) and APA's (2001) Publication Manual. It is imperative that both consumers of research and those interested in conducting school counseling studies understand the value of ESs as a measure of practical significance. Even though this article is by no means an exhaustive discussion of ESs and their uses, we hope that it will serve as a valuable introduction and resource for school counselor educators, practitioners, and graduate students to consult as they attempt to make better sense of statistical findings they produce or read. With the reporting of ESs, administrators and policymakers as well should be more equipped to interpret whether the school counseling research outcomes they are reviewing are applicable to their schools.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
American School Counselor Association. (2004). Ethical standards for school counselors. Alexandria, VA: Author. Retrieved December 1, 2005, from http://www.schoolcounselor.org/content.asp?contentid=173
American School Counselor Association. (2005a). The ASCA national model: A framework for school counseling programs (2nd ed.). Alexandria, VA: Author.
American School Counselor Association. (2005b). Foundations and basics. Alexandria, VA: Author.
Astramovich, R. L., Coker, J. K., & Hoskins, W. J. (2005). Training school counselors in program evaluation. Professional School Counseling, 9, 49-54.
Baker, S. B., & Gerler, E. R. (2004). School counseling in the twenty-first century (4th ed.). Upper Saddle River, NJ: Pearson.
Bauman, S., Siegel, J. T., Davis, A., Falco, L. D., Seabolt, K., & Szymanski, G. (2002). School counselors' interest in professional literature and research. Professional School Counseling, 5, 346-352.
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational & Psychological Measurement, 62, 197-226.
Brown, D., & Trusty, J. (2005a). School counselors, comprehensive school counseling programs, and academic achievement: Are school counselors promising more than they can deliver? Professional School Counseling, 9, 1-8.
Brown, D., & Trusty, J. (2005b). The ASCA national model, accountability, and establishing causal links between school counselors' activities and student outcomes: A reply to Sink. Professional School Counseling, 9, 13-15.
Coe, R. (2002, September). It's the effect size, stupid: What effect size is and why it is important. Paper presented at the annual conference of the British Educational Research Association, University of Exeter, England. Retrieved December 13, 2005, from http://www.leeds.ac.uk/educol/documents/00002182.htm
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Fan, X. (2001). Statistical significance and effect size in education research: Two sides of a coin. Journal of Educational Research, 94, 275-282.
Fidler, F. (2002). The fifth edition of the APA publication manual: Why its statistical recommendations are so controversial. Educational and Psychological Measurement, 62, 749-770.
Green, S. B., & Salkind, N. J. (2004). Using SPSS for Windows and Macintosh: Analyzing and understanding (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research. Mahwah, NJ: Erlbaum.
Gysbers, N. C., & Henderson, P. (2006). Developing and managing your school guidance program (4th ed.). Alexandria, VA: American Counseling Association.
Huberty, C. J. (1987). On statistical testing. Educational Researcher, 16, 4-9.
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227-240.
Huberty, C. J., & Lowman, L. L. (2000). Group overlap as a basis for effect size. Educational & Psychological Measurement, 60, 543-563.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational & Psychological Measurement, 56, 746-759.
Kline, R. B. (2004). Beyond significance testing. Washington, DC: American Psychological Association.
Lapan, R. T. (2001). Results-based comprehensive guidance and counseling programs: A framework for planning and evaluation. Professional School Counseling, 4, 289-299.
Lapan, R. T. (2005). Evaluating school counseling programs. In C. A. Sink (Ed.), Contemporary school counseling: Theory, research, and practice (pp. 257-293). Boston: Houghton-Mifflin.
Myrick, R. D. (2003). Developmental guidance and counseling (4th ed.). Minneapolis, MN: Educational Media.
O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766-777.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Application, interpretations, and limitations. Contemporary Educational Psychology, 25, 241-286.
Roberts, J. K., & Henson, R. K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement, 62, 241-253.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge, UK: Cambridge University Press.
Rutherford, A. (2001). Introducing ANOVA and ANCOVA: A GLM approach. Thousand Oaks, CA: Sage.
Schmidt, J. (2002). Counseling in schools: Essential services and comprehensive programs (4th ed.). Boston: Allyn and Bacon.
Sink, C. A. (2005). Comprehensive school counseling programs and academic achievement--A rejoinder to Brown and Trusty. Professional School Counseling, 9, 9-12.
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334-349.
SPSS. (2005). Data analysis with comprehensive statistics software. Retrieved December 10, 2005, from http://www.spss.com/spss/
Thompson, B. (1990). Finding a correction for the sampling error in multivariate measures of relationship: A Monte Carlo study. Educational and Psychological Measurement, 50, 15-31.
Thompson, B. (1997). Computing effect sizes. College Station, TX: Texas A & M University. Retrieved on December 2, 2005, from http://www.coe.tamu.edu/~bthompson/effect.html
Thompson, B. (1998). Statistical significance and effect size reporting: Portrait of a possible future. Research in the Schools, 5, 33-38.
Thompson, B. (1999). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology, 9, 165-181.
Thompson, B. (2000). A suggested revision to the forthcoming 5th edition of the APA publication manual. Retrieved on December 2, 2005, from http://www.coe.tamu.edu/~bthompson/apaeffec.htm
Thompson, B. (2002). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling & Development, 80, 64-71.
Thompson, B. (2006). The role of effect sizes in contemporary research in counseling. Counseling and Values, 50, 176-186.
Trochim, W. M. K. (2005). General linear model. Retrieved December 2, 2005, from http://www.socialresearchmethods.net/kb/genlin.htm
U.S. Department of Education. (2001). No Child Left Behind Act (Pub. L. No. 107-110). Retrieved January 6, 2006, from http://www.ed.gov/nclb/overview/intro/index.html
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473-481.
Vasquez, L. M., Gangstead, S. K., & Henson, R. K. (2000, January). Understanding and interpreting effect size measures in general linear model analyses. Paper presented at the Annual Meeting of the Southwest Educational Research Association, Dallas, TX.
Whiston, S. C. (2003). Outcomes research on school counseling services. In B. T. Erford (Ed.), Transforming the school counseling profession (pp. 435-447). Upper Saddle River, NJ: Merrill Prentice-Hall.
Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Christopher A. Sink, Ph.D., NCC, LMHC, is a professor and chair with the School of Education, Seattle Pacific University, WA. E-mail: firstname.lastname@example.org
Heather R. Stroh, Ed.D., is a research analyst with the Washington School Research Center, Lynnwood, WA.
Table 1. Effect Size Matrix

| Category | ES Index | Symbol | Formula |
|---|---|---|---|
| Standardized difference, biased | Cohen's d | $d$ | $(\bar{X}_A - \bar{X}_B)/SD_{pooled}$ |
| Standardized difference, biased | Eta squared (independent samples) | $\eta^2$ | $t^2/[t^2 + (N_1 + N_2 - 2)]$ |
| Standardized difference, biased | Eta squared (paired samples) | $\eta^2$ | $t^2/(t^2 + N - 1)$ |
| Standardized difference, biased | Glass' delta | $\Delta$ | $(\bar{X}_E - \bar{X}_C)/SD_{control}$ |
| Standardized difference, unbiased | Hedges g | $g$ | $(\bar{X}_A - \bar{X}_B)/\sqrt{MS_w}$ |
| Variance accounted for, biased | Pearson r | $r$ | $\sqrt{t^2/(t^2 + df_{within})}$ |
| Variance accounted for, biased | R squared | $R^2$ | $SS_{regression}/SS_{total}$ |
| Variance accounted for, biased | Univariate eta squared | $\eta^2$ | $SS_{between}/SS_{total}$ |
| Variance accounted for, biased | Multivariate eta squared | $\eta^2$ | $1 - \Lambda^{1/s}$ |
| Variance accounted for, biased | Eigenvalue | $\lambda$ | $SS_{between}/SS_{within}$ |
| Variance accounted for, unbiased | Adjusted R squared | adjusted $R^2$ | $1 - (1 - R^2)(N - 1)/(N - k - 1)$ |
| Variance accounted for, unbiased | Partial eta squared | $\eta^2_p$ | $SS_{between}/(SS_{between} + SS_{error})$ |
| Variance accounted for, unbiased | Hays' $\omega^2$ | $\omega^2$ | $[SS_{between} - (v - 1)MS_{within}]/(SS_{total} + MS_{within})$ |
| Variance accounted for, unbiased | Epsilon squared | $\epsilon^2$ | $[SS_{between} - (v - 1)MS_{within}]/SS_{total}$ |
| Variance accounted for, unbiased | Wherry | none | $1 - [(n - 1)/(n - k - 1)](1 - R^2)$ |
| Variance accounted for, unbiased | Herzberg | $R^2$ | $1 - [(n - 1)/(n - k - 1)][(n + k + 1)/n](1 - R^2)$ |
| Variance accounted for, unbiased | Lord | none | $1 - (1 - R^2)[(n + k + 1)/(n - k - 1)]$ |

| ES Index | Procedures Used With | Interpretation (a) |
|---|---|---|
| Cohen's d | Independent-samples t tests (2 groups) | .2 ≈ small; .5 ≈ medium; .8 ≈ large |
| Eta squared (independent samples) | Independent-samples t tests | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Eta squared (paired samples) | Paired-samples (or dependent) t tests (e.g., pretest and posttest scores) | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Glass' delta | Meta-analyses | .2 ≈ small; .5 ≈ medium; .8 ≈ large |
| Hedges g | Independent-samples t tests | .2 ≈ small; .5 ≈ medium; .8 ≈ large |
| Pearson r | Bivariate linear regression, partial correlations | $\lvert .1 \rvert$ ≈ small; $\lvert .3 \rvert$ ≈ medium; $\lvert .5 \rvert$ ≈ large |
| R squared | GLM: Multiple regression | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Univariate eta squared | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Multivariate eta squared | Multivariate GLM: MANOVA, MANCOVA, repeated measures MANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Eigenvalue | GLM: Discriminant analysis | |
| Adjusted R squared | GLM: Multiple regression | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Partial eta squared | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≥ small; .06 ≥ medium; .14 ≥ large |
| Hays' $\omega^2$ | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Epsilon squared | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Wherry | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Herzberg | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |
| Lord | Univariate GLM: ANOVA, ANCOVA, repeated measures ANOVA | .01 ≈ small; .06 ≈ medium; .14 ≈ large |

| ES Index | Notes |
|---|---|
| Cohen's d | Ranges in value from 0 to 1. One of the most widely used and commonly known ESs. |
| Eta squared (independent samples) | Ranges in value from 0 to 1. 0 indicates that the mean of the differences in scores is equal to zero. 1 indicates that the differences in scores in the sample are all the same nonzero value. |
| Eta squared (paired samples) | Ranges in value from 0 to 1. 0 indicates that the mean of the differences in scores is equal to zero. 1 indicates that the differences in scores in the sample are all the same nonzero value. |
| Glass' delta | Ranges in value from 0 to 1. Difference between group means standardized using the SD of the control/comparison group. |
| Hedges g | Ranges in value from 0 to 1. Difference between group means standardized using the pooled variance estimate. |
| Pearson r | Ranges from -1 to +1; + means that as X increases, Y increases; - means that as X increases, Y decreases. Values closer to +/-1 indicate stronger linear relationships. $r^2$ = the amount of variance accounted for in the criterion variable by the predictor variable. |
| R squared | Ranges from 0 to 1. 0 = no linear relationship; 1 = perfect linear relationship. Indicates the amount of variance accounted for in the criterion variable by the predictor variable(s). Shows greater bias when the sample size is small and the number of predictors is large. |
| Univariate eta squared | Ranges from 0 to 1. 0 = no association; 1 = perfect association. Same as $R^2$. Estimates the degree of association for the sample. |
| Multivariate eta squared | Ranges from 0 to 1. 0 = no association; 1 = perfect association. Unclear what should be considered as small, medium, and large effects; interpret similar to univariate $\eta^2$. |
| Eigenvalue | Value is ≥ 0. Eigenvalue has no upper limit. Difficult to interpret. See Green and Salkind (2004) for more information. |
| Adjusted R squared | Ranges from 0 to 1. 0 = no linear relationship; 1 = perfect linear relationship. Indicates the amount of variance accounted for in the criterion variable by the predictor variable(s). Adjusted for overestimation of the population $R^2$. Use when sample size is small and number of predictors/independent variables is large. |
| Partial eta squared | Ranges from 0 to 1. Values are usually smaller than those from an $\eta^2$. Small, medium, and large cutoffs for $\eta^2$ are probably too large, so interpret with caution. |
| Hays' $\omega^2$ | Estimate of the degree of association in the population. It is unclear what should be considered as small, medium, and large effects, so interpret with caution. |
| Epsilon squared | Estimate of the degree of association in the population. It is unclear what should be considered as small, medium, and large effects, so interpret with caution. |
| Wherry | Estimate of the degree of association in the population. It is unclear what should be considered as small, medium, and large effects, so interpret with caution. |
| Herzberg | Corrects for estimates potentially realized in future samples. It is unclear what should be considered as small, medium, and large effects, so interpret with caution. |
| Lord | Corrects for estimates potentially realized in future samples. It is unclear what should be considered as small, medium, and large effects, so interpret with caution. |

Note. SS = sum of squares; MS = mean square; n = number of participants in sample; v = number of levels in independent variable (factor); $\bar{X}$ = mean (average score); SD = standard deviation; $SD_{pooled}$ = average within groups SD; k = number of predictor variables; df = degrees of freedom; t = t statistic; s = number of levels of the factor minus 1, or number of dependent variables (whichever is smaller); GLM = general linear model; ANOVA = analysis of variance; ANCOVA = analysis of covariance; MANOVA = multivariate analysis of variance.
(a) These are general guidelines from Green and Salkind (2004).

Table 2. Effect Size Measures by Statistical Procedure

| Statistical Procedure | Bias | Effect Size Estimate | Symbol Used | Formula |
|---|---|---|---|---|
| Independent samples t test | Biased | Cohen's d | $d$ | $(\bar{X}_A - \bar{X}_B)/SD_{pooled}$ |
| | Biased | Eta squared | $\eta^2$ | $t^2/[t^2 + (N_1 + N_2 - 2)]$ |
| | Unbiased | Hedges g | $g$ | $(\bar{X}_A - \bar{X}_B)/\sqrt{MS_w}$ |
| One-sample | Biased | Glass' delta | $\Delta$ | $(\bar{X}_E - \bar{X}_C)/SD_{control}$ |
| Dependent (paired) samples | Biased | Eta squared | $\eta^2$ | $t^2/(t^2 + N - 1)$ |
| Meta-analysis | Biased | Glass' delta | $\Delta$ | $(\bar{X}_E - \bar{X}_C)/SD_{control}$ |
| Bivariate linear regression and partial correlation | Biased | Pearson r | $r$ | $\sqrt{t^2/(t^2 + df_{within})}$ |
| Multiple regression | Biased | R squared | $R^2$ | $SS_{regression}/SS_{total}$ |
| | Unbiased | Adjusted R squared | adjusted $R^2$ | $1 - (1 - R^2)(N - 1)/(N - k - 1)$ |
| ANOVA (univariate), ANCOVA, or ANOVA with repeated measures (or within-subjects ANOVA) | Biased | Univariate eta squared | $\eta^2$ | $SS_{between}/SS_{total}$ |
| | Unbiased | Partial eta squared | Partial $\eta^2$ | $SS_{between}/(SS_{between} + SS_{error})$ |
| | Unbiased | Hays' omega | $\omega^2$ | $[SS_{between} - (v - 1)MS_{within}]/(SS_{total} + MS_{within})$ |
| | Unbiased | Epsilon squared | $\epsilon^2$ | $[SS_{between} - (v - 1)MS_{within}]/SS_{total}$ |
| | Unbiased | Wherry | none | $1 - [(n - 1)/(n - k - 1)](1 - R^2)$ |
| | Unbiased | Herzberg | $R^2$ | $1 - [(n - 1)/(n - k - 1)][(n + k + 1)/n](1 - R^2)$ |
| | Unbiased | Lord | none | $1 - (1 - R^2)[(n + k + 1)/(n - k - 1)]$ |
| MANOVA, MANCOVA, or MANOVA with repeated measures (or within-subjects MANOVA) | Biased | Omnibus multivariate eta squared | $\eta^2$ | $1 - \Lambda^{1/s}$ |
| Discriminant analysis | Biased | Eigenvalue | $\lambda$ | $SS_{between}/SS_{within}$ |

Table 3.
Sample 1 Showing Results of GLM ANOVA (Test of Between-Subjects Effect), with Group as Independent Variable and Self-Concept Scale Score as Dependent Variable, and ES Computations (N = 24)

| Source | Sum of Squares | df | Mean Square | F | p (a) | Partial $\eta^2$ |
|---|---|---|---|---|---|---|
| Corrected model (b) | 27.75 | 2 | 13.88 | 2.14 | .14 | .17 |
| Intercept | 759.38 | 1 | 759.38 | 117.36 | .00 | .85 |
| Group | 27.75 | 2 | 13.88 | 2.14 | .14 | .17 |
| Error | 135.88 | 21 | 6.47 | | | |
| Total | 923.00 | 24 | | | | |
| Corrected total | 163.63 | 23 | | | | |

ES Computations

| ES | Computation |
|---|---|
| $R^2$ | $R^2 = SS_{regression}/SS_{total} = 27.75/163.625 = .17$ |
| $\eta^2$ | $\eta^2 = SS_{between}/SS_{total} = 27.75/163.625 = .17$ |
| Hays' $\omega^2$ | $\omega^2 = [SS_{between} - (v - 1)MS_{within}]/(SS_{total} + MS_{within}) = [27.75 - (3 - 1)(6.47)]/(163.625 + 6.47) = (27.75 - 12.94)/170.095 = .09$ |
| $\epsilon^2$ | $\epsilon^2 = [SS_{between} - (v - 1)MS_{within}]/SS_{total} = [27.75 - (3 - 1)(6.47)]/163.625 = (27.75 - 12.94)/163.625 = .09$ |
| Wherry | $1 - [(n - 1)/(n - k - 1)](1 - R^2) = 1 - [(24 - 1)/(24 - 1 - 1)](1 - .17) = 1 - (1.045)(.83) = .13$ |
| Herzberg | $R^2 = 1 - [(n - 1)/(n - k - 1)][(n + k + 1)/n](1 - R^2) = 1 - [(24 - 1)/(24 - 1 - 1)][(24 + 1 + 1)/24](1 - .17) = 1 - (1.045)(1.083)(.83) = .06$ |
| Lord | $1 - (1 - R^2)[(n + k + 1)/(n - k - 1)] = 1 - (1 - .17)[(24 + 1 + 1)/(24 - 1 - 1)] = 1 - (.83)(1.18) = .02$ |

(a) Computed using alpha = .05.
(b) $R^2$ = .21 (adjusted $R^2$ = .17).

Table 4.
Sample 2 Showing Results of GLM ANOVA (Test of Between-Subjects Effect), with Group as Independent Variable and Self-Concept Scale Score as Dependent Variable, and ES Computations (N = 150)

| Source | Sum of Squares | df | Mean Square | F | p (a) | Partial $\eta^2$ |
|---|---|---|---|---|---|---|
| Corrected model (b) | 449.92 | 2 | 224.96 | 57.27 | .00 | .44 |
| Intercept | 4118.64 | 1 | 4118.64 | 1048.49 | .00 | .88 |
| Group | 449.92 | 2 | 224.96 | 57.27 | .00 | .44 |
| Error | 577.44 | 147 | 3.93 | | | |
| Total | 5146.00 | 150 | | | | |
| Corrected total | 1027.36 | 149 | | | | |

ES Computations

| ES | Computation |
|---|---|
| $R^2$ | $R^2 = SS_{regression}/SS_{total} = 449.92/1027.36 = .44$ |
| $\eta^2$ | $\eta^2 = SS_{between}/SS_{total} = 449.92/1027.36 = .44$ |
| Hays' $\omega^2$ | $\omega^2 = [SS_{between} - (v - 1)MS_{within}]/(SS_{total} + MS_{within}) = [449.92 - (3 - 1)(3.928)]/(1027.36 + 3.928) = (449.92 - 7.856)/1031.288 = .43$ |
| $\epsilon^2$ | $\epsilon^2 = [SS_{between} - (v - 1)MS_{within}]/SS_{total} = [449.92 - (3 - 1)(3.928)]/1027.36 = (449.92 - 7.856)/1027.36 = .43$ |
| Wherry | $1 - [(n - 1)/(n - k - 1)](1 - R^2) = 1 - [(150 - 1)/(150 - 1 - 1)](1 - .44) = 1 - (1.01)(.56) = .43$ |
| Herzberg | $R^2 = 1 - [(n - 1)/(n - k - 1)][(n + k + 1)/n](1 - R^2) = 1 - [(150 - 1)/(150 - 1 - 1)][(150 + 1 + 1)/150](1 - .44) = 1 - (1.01)(1.01)(.56) = .43$ |
| Lord | $1 - (1 - R^2)[(n + k + 1)/(n - k - 1)] = 1 - (1 - .44)[(150 + 1 + 1)/(150 - 1 - 1)] = 1 - (.56)(1.03) = .42$ |

(a) Computed using alpha = .05.
(b) $R^2$ = .44 (adjusted $R^2$ = .43).
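The ES computations reported in Tables 3 and 4 can be checked with a short script. What follows is a minimal Python sketch under stated assumptions, not the authors' procedure: the function name `anova_effect_sizes` and its parameter names are hypothetical, the design is assumed to be a one-way between-subjects ANOVA (so the corrected total equals SS between plus SS error), and k defaults to 1 only because the worked computations in the tables use that value. The unrounded error sum of squares for Sample 1 (135.875) is inferred from the reported corrected total of 163.625.

```python
def anova_effect_sizes(ss_between, ss_error, n, levels, k=1):
    """Variance-accounted-for effect sizes for a one-way ANOVA.

    ss_between: sum of squares for the grouping factor
    ss_error:   within-groups (error) sum of squares
    n:          total number of participants
    levels:     number of levels of the factor (v in Table 1)
    k:          number of predictors; 1 mirrors the tables (an assumption)
    """
    ss_total = ss_between + ss_error        # corrected total sum of squares
    ms_within = ss_error / (n - levels)     # error mean square
    r2 = ss_between / ss_total              # R squared equals eta squared here
    adjusted_numerator = ss_between - (levels - 1) * ms_within
    shrink = (n - 1) / (n - k - 1)
    return {
        "eta_sq": r2,
        "omega_sq": adjusted_numerator / (ss_total + ms_within),
        "epsilon_sq": adjusted_numerator / ss_total,
        "wherry": 1 - shrink * (1 - r2),
        "herzberg": 1 - shrink * ((n + k + 1) / n) * (1 - r2),
        "lord": 1 - (1 - r2) * (n + k + 1) / (n - k - 1),
    }


# Sample 1 (Table 3) and Sample 2 (Table 4), using the reported sums of squares.
sample1 = anova_effect_sizes(27.75, 135.875, n=24, levels=3)
sample2 = anova_effect_sizes(449.92, 577.44, n=150, levels=3)

for label, result in (("Sample 1", sample1), ("Sample 2", sample2)):
    rounded = {name: round(value, 2) for name, value in result.items()}
    print(label, rounded)
```

Rounded to two decimals, the sketch agrees with the values shown in the tables: the corrected estimates for the small sample (N = 24) fall well below the uncorrected .17, whereas for the large sample (N = 150) they stay within .02 of the uncorrected .44.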
Author: Stroh, Heather R.
Publication: Professional School Counseling
Date: Jun 1, 2006