Being an Informed Consumer of Quantitative Educational Research.
ALTHOUGH WE like to think of the United States as the land of equal opportunity, we recognize that not all children receive the same quality of education or exhibit the same level of academic achievement -- despite the fact that public schooling is available to all students. Research has found that a passing grade in one school may be a failing grade in another, depending on several factors, including the socioeconomic status of the student, the location of the school, and the demographic characteristics of the school's students.1 As long as such disparities remain, we will continue to invest substantial resources in the search for "what works" in educating our students. This question is at the very core of some of the research evaluating school reform programs, but it is not being asked by the people who need answers the most: the consumers of those programs.
Exploring "what works" involves identifying the programs and practices that provide the clearest evidence of positive, significant effects on school and student performance. Researchers have at various times recommended homogeneous grouping, heterogeneous grouping, drills, no drills, mastery teaching, mastery learning, remediation, and early intervention. The list of often contradictory answers to "what works" already seems endless. Yet we continue to pour resources into the search for successful approaches.
Although we should continuously strive to find innovative ways to improve our education systems, we should not ignore evidence of the effectiveness of existing programs. Rather, we should foster effective communication among researchers, teachers, administrators, policy makers, and other stakeholders. As a consequence of the communication gap between educational researchers and practitioners, the "why" of turning positive research results into practice is often missed by those who most need to understand it -- teachers and administrators. This article strives to bridge this gap between researchers and practitioners by exploring how to become an informed consumer of educational research.
Over the years it has become clear that there is no panacea for what ails our education system. Before we can determine what works, when, and where, we must understand the conditions that individual schools or districts are facing. The challenges affecting troubled schools are numerous and varied, and solutions must be tailored to meet specific needs.2
A challenge for both researchers and practitioners is that a solution that effectively addresses one problem in one setting may not necessarily prove to be successful in another. For example, a program that is successful in improving the reading scores of School A may have no effect on the reading scores of School B because of differences between the schools or their students. With this knowledge, is it possible to ensure that the results obtained in pilot studies will also be obtained when programs are implemented in different school settings?
Being an informed consumer of educational research requires an understanding of a district's or school's capacity, needs, and goals and, more important, an understanding of the barriers to meeting these needs and attaining these goals. Being an informed consumer also requires an understanding of the components of solutions that may address specific needs. For example, if a school's third- and fourth-grade students exhibit weak reading skills, then that school would need to find a program that is designed to improve the reading of students in these grades and that provides the training, materials, and technical assistance needed for success. In addition, the school should select a program that has demonstrated effectiveness with students with similar needs and in similar contexts. Schools or districts should consider such factors as the turnover rate of their teachers and students, their students' socioeconomic status and other demographic characteristics, and the motivation and capacity of their staff, administration, and students to effectively implement reform models. These are just some of the criteria that informed consumers of educational research must take into account.
After diagnosing a district's or school's needs, it is important to research existing programs to determine if they have been widely disseminated and effectively replicated and, most important, if they will fit the identified needs. Schools and districts must understand that they are not required to adopt the first program presented to them and that simply because a program can provide evidence of effectiveness or has been adopted by thousands of other schools does not necessarily mean that it is suited to their own specific needs. Instead of making hasty, uninformed decisions, consumers of educational research must closely examine a variety of programs and select the one most likely to help them achieve their goals in their schools.
Being an informed consumer is especially important in the field of education because it is the consumers who either make or break a program. In other words, if the teachers who are to use a program are not completely informed as to how its components specifically address their needs, it is unlikely that they will fully implement the program, and thus they may limit its potential benefits for the students.3
FACTORS TO CONSIDER AS AN INFORMED CONSUMER OF EDUCATIONAL RESEARCH
In 1997, history was made in U.S. education policy with the passage of a bipartisan piece of legislation titled the Comprehensive School Reform Demonstration (CSRD) Program, commonly referred to as Obey-Porter, after the co-sponsors of the bill, Rep. David Obey (D-Wis.) and Rep. John Porter (R-Ill.). CSRD aimed to improve the educational opportunities of all of the nation's children by providing participating schools with at least three years of funding to implement certain research-based programs. The legislation created an opportunity for teachers, researchers, legislators, administrators, policy makers, school board members, parents, and other education stakeholders to learn from one another about the goals, needs, problems, and outcomes related to school reform efforts. In addition, CSRD provided funding to reach the goals, meet the needs, solve the problems, eliminate the obstacles to success, and eventually achieve the outcomes desired and promised. The No Child Left Behind (NCLB) Act, which was signed into law on 8 January 2002, takes an even firmer stance, requiring schools to adopt only reform programs that are "scientifically based" and expressing a preference for research based on randomization, random assignment, or random selection.
Under the auspices of CSRD, many programs tout evidence of their evaluation. This is not enough -- consumers need evidence of effectiveness, and not all programs that have been evaluated have demonstrated success. Evidence of effectiveness should show not only that a program has been successful but also that the success is strictly a result of the program, not extraneous factors. Evidence of effectiveness is defined as including replicability, wide dissemination, rigorous evaluation designs, fair measures, and improvement in student academic outcomes, such as scores on externally developed standardized tests.4 In addition to showing that a program contributed to the desired outcomes, evidence of its effectiveness should be presented by means of reported effect sizes.
CATEGORIES OF REFORM PROGRAMS
There are basically three different types of comprehensive school reform programs, each with features that consumers should consider when choosing and implementing them.
Organizational programs. These programs focus on the organizational and administrative needs of the school rather than directly address academic achievement in specific subject areas. While they may not include academic curricula, they often restructure the organizational makeup of the school and improve the school climate through goal setting and similar activities. Such programs include Accelerated Schools, Consistency Management and Cooperative Discipline (CMCD), and the School Development Program (SDP). (See the sidebar on page 536 for information on these and other programs mentioned in this article.) These programs are based in part on the finding that schools are sometimes able to improve the academic achievement of their students by restructuring the school organization.5 For example, Accelerated Schools and SDP have provided some evidence that students' academic achievement has improved as a result of the implementation of their programs.6
Schoolwide reform programs with curricula. These programs are typically designed to increase student academic achievement in specific curricular areas, and they provide level- and grade-specific curriculum and testing materials, training sessions and manuals, and many other forms of technical assistance. Because of their academic focus, these programs tend to be more structured than the broader organizational programs. They most often cover basic curriculum areas such as reading and math, but also may address subjects such as science and social studies. Some examples of curricular programs include Direct Instruction, Core Knowledge, Different Ways of Knowing, Open Court, Move It Math, and the University of Chicago Mathematics Program. In some cases a school may implement two or more different curricular programs concurrently in order to address all of its academic needs.
Combinations of organizational and curriculum-specific programs. The third type of schoolwide reform, which combines several programs or program types, is appropriate for schools that may need to make changes to the overall school environment in order to be able to implement programs that will improve student achievement in specific subject areas. Some schools in the Houston Independent School District chose this approach, initially implementing CMCD, an organizational and discipline program, and then, after the schools had developed the necessary capacity, adding a mathematics program that helped improve students' mathematics scores.7
Regardless of the type of reform that a district or school adopts, it is necessary to have a clear understanding of the research behind programs' claims of success in order to make an informed decision about which program best meets the site's needs. Specifically, to be an informed consumer of educational research, one must be aware of how various programs are implemented and evaluated. This requires an understanding of designs, assessment measures, how to interpret results, limitations, and targeting. The next section will discuss these aspects in detail.
THE RESEARCH DESIGN
In order to ensure that results are replicable and that a reform will succeed in various school settings, a program's evaluation must be rigorous and robust. In this case rigor refers to the extent to which the evaluation adhered to strict standards of research methodology. Robustness refers to the degree to which changes in outcomes can be explained by the program rather than by other factors, such as student or school characteristics. The ideal evaluation is one that uses a true experimental design: students or schools are randomly assigned to treatment groups (those receiving the program) and control groups (those included for comparison purposes), and both groups are tested with identical assessment instruments before and after the treatment. This ideal design is not always possible because of financial, time, ethical, and other constraints. An alternative is to use a quasi-experimental design in which students are not randomly assigned; instead, the treatment and control groups are matched on characteristics such as race, gender, socioeconomic status, grade, standardized test scores, English-language proficiency, and other relevant variables. The idea behind this design is to make the two groups as similar as possible on these characteristics in order to isolate the effects of the program on student outcomes.
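As a rough illustration of what random assignment means in practice, the sketch below shuffles a list of schools and splits it in half. It is a minimal sketch in Python, not a procedure taken from any program evaluation; the function name `randomly_assign`, the seed, and the school labels are all hypothetical.

```python
import random

def randomly_assign(units, seed=0):
    """Split units (students or schools) into treatment and control groups at random."""
    rng = random.Random(seed)   # fixed seed so the assignment can be reproduced and audited
    shuffled = list(units)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical school labels, for illustration only.
schools = [f"School {letter}" for letter in "ABCDEFGH"]
treatment, control = randomly_assign(schools, seed=42)
```

Because chance alone decides which sites receive the program, neither researchers nor administrators can steer the strongest schools into the treatment group, which is what allows later differences to be credited to the program.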
MEASURES USED TO ASSESS ACADEMIC ACHIEVEMENT
In addition to determining the quality of the research design, consumers need to examine the measures used to assess changes in student academic achievement. "Fair measures" are defined as those that are externally developed, independent, reliable, and valid.8 Standardized tests are considered to be the most objective measures of student achievement and are the most frequently used means by which states and districts hold schools accountable. Thus a program's ability to repeatedly improve its participants' scores on standardized tests that meet the criteria of fair measures is an indication of the program's effectiveness and replicability.
In order to avoid biased results, it is important for measures to be based on an objective set of goals rather than on treatment-specific goals determined by a program's creators. This is more likely to be the case when measures are developed externally.
It is also important to recognize that standardized tests often vary in their level of difficulty. Thus the tests used in a program evaluation must be of the same level of difficulty as those administered in schools to produce reliable results.
Once it has been established that an evaluation is based on fair measures, consumers must address the availability of pre- and posttest information. It is important to have access to baseline test scores of the two groups to ensure that they were not significantly different prior to implementation of the treatment, especially if students were not randomly assigned to groups. It is equally important that the program present posttest information that is based on similar or identical measures, so that any differences between pre- and posttest scores can be attributed to the program and not to differences in the tests themselves.
Some program evaluations present only posttest information that shows that at the end of the treatment one group may have outperformed the other. While this information may be correct, it does not tell us whether or not the difference is due to the treatment or to some other factor, and so the data are neither conclusive nor generalizable. Pretest information is crucial to establishing the degree of similarity between the two groups at the onset of the project.
THE CHARACTERISTICS OF RESEARCH SUBJECTS
In order to determine whether results are generalizable to other settings, it is also important to know the characteristics of the schools and students participating in program evaluations. Consumers of educational research should seek to determine whether the program's pilot sites experienced challenges similar to those faced by their own schools, including teacher turnover, student turnover or attrition, and high student absences, as well as whether the pilot sites had similar racial, gender, and socioeconomic makeup. Consumers of research should ask researchers how and why they selected their populations and what factors make the results found at the pilot sites generalizable to other schools.
Even if programs have presented evidence of effectiveness that can be generalized beyond the original pilot sites, more questions remain. This section explains the differences between gain scores, standardized test scores, and different types of effect sizes.
Gain scores. A gain score measures just what it implies -- the difference between pretest and posttest scores. It is a single number that does not tell us much if there is nothing to which we can compare the results. For example, a program may claim to have large gain scores, but those scores are meaningful only in comparison to the gain scores of control groups. Information about pre- and posttest scores and gain scores for both groups tells us where the groups started, what progress they have made, and whether there are significant differences between the two groups' achievement that may be ascribed to the program treatment. While these scores provide consumers and researchers with important information about the achievement of the two groups, they are only one part of the eventual interpretation of the data.
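To make the comparison concrete, the following sketch computes average gain scores for a treatment group and a control group. It is only an illustration in Python: the scores are invented, and the names `gain_scores`, `baseline_gap`, and `net_gain` are hypothetical.

```python
from statistics import mean

def gain_scores(pretest, posttest):
    """Return each student's gain: posttest score minus pretest score."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must list the same students")
    return [post - pre for pre, post in zip(pretest, posttest)]

# Invented scores, for illustration only.
treatment_pre  = [40, 45, 50, 55, 60]
treatment_post = [52, 55, 61, 66, 71]
control_pre    = [41, 44, 51, 54, 60]
control_post   = [46, 50, 55, 60, 64]

treatment_gain = mean(gain_scores(treatment_pre, treatment_post))  # 11
control_gain = mean(gain_scores(control_pre, control_post))        # 5
baseline_gap = mean(treatment_pre) - mean(control_pre)             # 0: groups started level
net_gain = treatment_gain - control_gain                           # 6
```

In this invented case an 11-point gain sounds impressive on its own, but the control group gained 5 points without the program, so only about 6 points remain to be explained by the treatment.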
Norm-referenced standardized test scores. Even when it has been established that the two groups were equal prior to the program treatment, it is still difficult to compare one gain score to another -- and be impressed -- without knowing what the actual test scores were. Norm-referenced standardized test scores help us to measure the progress of both groups in terms of the program and in comparison to other students in the nation.
Many schools adopt new schoolwide reform programs in an attempt to improve their average test scores to at least the 50th percentile -- meaning that half of the nation's students scored higher and half lower than those in the adopting school. Norm-referenced standardized test scores provide us with this information. This does not mean that all new schoolwide reform efforts should be expected to improve test scores to the 50th percentile immediately. However, school reformers should aim to reach this level over a reasonable period and so should evaluate programs based in part on their rate of success at raising test scores to at least the 50th percentile.
When using test scores to evaluate school reform program outcomes, program developers and consumers must understand the strengths and limitations of specific programs. Some remedial reform programs, for example, are designed to improve student performance to functional, but not necessarily competent, levels. With this information, consumers may plan to begin their reform efforts with a remedial program and then replace it with a second curricular program once students' test scores demonstrate that they have achieved functional levels.
Normal-curve equivalents. It is well known that there are various standardized tests that have different means and standard deviations, meaning that a particular score on one test does not necessarily translate into the same score on another. It is possible to standardize the various test scores by turning each of the scores into normal-curve equivalents (NCEs). With this technique, the results of different tests become uniformly interpretable. As a rule, NCEs have a mean of 50 and a standard deviation of 21. In addition, if a program presents gain scores alone as evidence of its effectiveness, these can be compared to the average NCE expected gain of 8. So if a program reports NCE gains of 9 or more, consumers can consider this evidence of effectiveness.
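For readers who want to see how the conversion works, the sketch below turns a national percentile rank into an NCE. It assumes the conventional definition of the NCE scale (a normal distribution with a mean of 50 and a standard deviation of 21.06, which rounds to the 21 cited above); the function name `percentile_to_nce` is hypothetical, and the code is an illustration rather than any test publisher's procedure.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional value; the "21" in the text, to two decimal places

def percentile_to_nce(percentile_rank):
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    # Find the standard normal deviate (z score) for this percentile rank.
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return NCE_MEAN + NCE_SD * z

nce_at_20th = percentile_to_nce(20)  # about 32.3
nce_at_50th = percentile_to_nce(50)  # 50.0
```

NCEs and percentile ranks agree at 1, 50, and 99 but differ in between. Unlike percentile ranks, NCEs form an equal-interval scale, which is why they can be averaged and subtracted to produce gain scores.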
Statistical significance. It is quite possible to find differences between the test scores of two groups of test-takers, but what is important is whether that difference is statistically significant. Tests for statistical significance ask, "How likely is a difference this large to arise by chance alone if the program had no effect?" Statisticians use levels of significance -- the most common being .05 and .01 -- to make this judgment. These significance levels are expressed as p<.05 or p<.01 and mean that, if the program had no effect, a difference at least as large as the one observed would occur by chance fewer than five times in one hundred or one time in one hundred, respectively. In other words, you can be fairly confident that the differences are not simply the product of chance; whether they can be credited to the program treatment depends on the strength of the research design.
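One way to see what a significance test is doing is a permutation test, sketched below in Python. This is a teaching device chosen for its transparency, not the method used in the evaluations discussed here, which may rely on other tests such as t-tests. The scores are invented and the function name `permutation_p_value` is hypothetical. The test regroups the pooled scores in every possible way and asks how often chance alone produces a gap as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Returns the share of all possible regroupings of the pooled scores whose
    difference in means is at least as large, in absolute value, as the
    difference actually observed.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    size_a, size_b = len(group_a), len(group_b)
    extreme = 0
    regroupings = 0
    for indices in combinations(range(len(pooled)), size_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / size_a - (total - sum_a) / size_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
        regroupings += 1
    return extreme / regroupings

# Invented posttest scores, for illustration only.
treatment = [60, 62, 65, 67, 70]
control = [50, 52, 55, 57, 58]
p = permutation_p_value(treatment, control)  # 2 of 252 regroupings, about 0.008
```

Here p is below .01, so a gap this large would rarely arise by chance. Enumerating every regrouping is practical only for small groups; with realistic sample sizes, analysts sample regroupings at random or use a formula-based test.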
Effect sizes. The final step in the search for evidence of program effectiveness involves computing effect sizes. An effect size measures the magnitude of the differences in the scores between the two groups involved. An effect size is:
a standard means of expressing achievement gains and losses across studies, showing differences between experimental and control groups in terms of standard deviation. An effect size of +1.00 indicates that the experimental group outperformed the control group by one full standard deviation. To give a sense of scale, this would be equivalent to an increase of 100 points on the SAT scale, two stanines, 21 NCEs (normal-curve equivalent ranks) or 15 points of IQ -- enough to move a student from the 20th percentile (the normal level of performance for children in poverty) to above the 50th percentile (in range with mainstream America). Because of differences among study designs and assessments, this can only be considered a "rough" measure of comparison.9
By Cohen's widely used conventions, +.20 is considered a small effect size, +.50 a medium effect size, and +.80 a large effect size. While statistical analyses may determine that the differences between two or more groups' test scores are significant, effect sizes tell us the magnitude of the differences.
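The calculation behind an effect size can be sketched in a few lines of Python. The version below divides the difference between group means by the pooled standard deviation of the two groups (often called Cohen's d). Pooling is an assumption made here for illustration; some studies divide by the control group's standard deviation instead. The scores are invented and the function name `effect_size` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def effect_size(treatment, control):
    """Standardized mean difference (Cohen's d) using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)

# Invented posttest scores, for illustration only.
treatment = [52, 58, 61, 65, 69]
control = [50, 55, 59, 62, 64]
d = effect_size(treatment, control)  # about 0.49
```

In this invented case the treatment group scored 3 points higher on average, and the pooled standard deviation is about 6.1 points, so the effect size is roughly +.49, or about half a standard deviation.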
FACTORS CONTRIBUTING TO PROGRAM EFFECTIVENESS
We have now seen that, for a program to show effectiveness, it must have comparison groups, use fair measures, show significant differences in test scores between the groups, and be able to produce solid effect sizes. Several factors contribute to the likelihood that a program will be able to meet these criteria.
Clear goals and regular assessment. Effective programs are designed to achieve a set of clearly defined goals. For example, some programs, such as Exemplary Center for Reading Instruction (ECRI) or Project SEED, focus on one curricular area and provide specific services within that area. Effective programs are those that develop specialized, research-based methods to achieve their goals and assess students' progress on a regular basis. For instance, Direct Instruction conducts assessments at the end of each unit to inform teachers whether or not their students are meeting program expectations, and Success For All administers externally developed assessments every eight weeks.
Curricula, materials, and implementation. A program is defined by its objectives -- for example, improving the reading, math, or science skills of students. However, even a clearly specified program may not succeed if it is not implemented correctly. To avoid this, program developers create manuals, books, and curricular materials to be used in the classrooms. While some consumers may feel "constrained" by a program's level of detail and some programs may be perceived as too scripted, the programs that researchers have developed depend on the correct use of the curricula, materials, and teaching methods that have shown evidence of effectiveness. Program developers are accountable to their funders and consumers to produce results based upon what they prescribe. If schools do not implement the program correctly and thus do not achieve the optimal results, then both the schools and the program developers could ultimately lose funding. Therefore, consumers must understand that if they choose to adopt "scripted" programs, they must implement the programs thoroughly and willingly in order to yield the expected results.
Professional development. Training and technical assistance are two essential components of schoolwide reform programs. Given the specificity of the materials, curricula, and methods, programs generally require extensive professional development initially and follow-up training at regular intervals. Effective and successful professional development rarely occurs in a one-day workshop. In fact, well-developed professional development programs often last a week or longer and may include peer coaching, teacher learning communities, expert coaching, role-playing, and modeling. Follow-up training is also important because, as with many new and innovative endeavors, problems often arise after a program is up and running. Because schools have unique conditions, program developers may need to modify their training slightly to meet schools' specific needs. Effective programs realize this and are willing to make appropriate modifications to their professional development offerings as long as the changes do not alter the treatment provided to students. In addition to site trainings, many programs conduct annual meetings, regional training sessions, and other training and technical assistance events.
Support. Support for a program means that the stakeholders -- including district and school administrators, school staffs, and parents -- are ready to commit to the program. In order to garner support for reform, stakeholders should be involved in the selection process from the beginning. The principal should involve the staff in identifying the source of the school's problems and should consult with staff members, parents, and, in some cases, community organizations about the needs of the school. Stakeholders, in turn, must recognize their obligation to contribute to the changes that may be required to improve the school. Once a reform program is adopted, all parties must become active participants in the change process and commit themselves to implementing the program with the informed belief that they can and will achieve positive outcomes.
Most programs require the support of 80% of the staff prior to entering a school. An important question to ask is "What constitutes an 80% vote?" The most common and most effective method of determining support for a program is to survey all of the teachers by secret ballot and consider only programs that receive a thumbs up from 80% of them. There are other, less effective, methods of ensuring support that should raise a red flag with consumers and program developers, as they can readily contribute to the failure of a program. One method involves school reconstitution, wherein a state government takes over the administration of a school and mandates that teachers who remain implement a new reform program. Another method involves terminating the teachers and rehiring them only if they agree to implement the desired reform model. While these tactics will most likely yield high-percentage votes, it is unlikely that staff members who supported a reform program out of fear of job loss will implement it effectively.
Consumers should ask researchers whether and how different voting methods have affected program results at different sites. They should then compare the various results to their desired outcomes. This approach may prove useful if, for example, a district or a state is requiring a school to support and implement a program without involving stakeholders in the program selection process. Knowing the results from schools that have been in similar situations may not affect the ultimate selection decision, but such information can alert the newly adopting school to potential pitfalls in program implementation.
Many people become educators because they truly desire to make a positive change in the lives of children. While their intention is noble, educators must realize that if the strategies they use to create that change are not effective, then they must either do something new or consider changing schools or exiting the profession. Educators desire to effect change, but they must also be open to being affected by change. All must become informed consumers of educational research, for the research exists to help teachers, educators, community members, administrators, legislators, and -- most important -- the children.
1. National Education Longitudinal Study of 1988 (Washington, D.C.: National Center for Education Statistics, U.S. Department of Education, 1988); and What Do Grades Mean? Differences Across Schools (Washington, D.C.: Office of Educational Research and Improvement, U.S. Department of Education, 1994).
2. Sam Stringfield, Mary Ann Millsap, and Rebecca Herman, Special Strategies for Educating Disadvantaged Children: Results and Policy Implications (Washington, D.C.: U.S. Department of Education, 1997); Rebecca Herman et al., An Educators' Guide to Schoolwide Reform (Arlington, Va.: Educational Research Service, 1999); and Olatokunbo S. Fashola and Robert E. Slavin, "Schoolwide Reform Programs: What Works?," Phi Delta Kappan, January 1998, pp. 370-79.
3. Robert Cooper, Socio-Cultural and Within-School Factors That Affect the Quality of Implementation of Schoolwide Programs (Baltimore: Center for Research on the Education of Students Placed At Risk, Johns Hopkins University and Howard University, Technical Report No. 28, 1998).
4. Olatokunbo S. Fashola and Robert E. Slavin, "Promising Programs for Elementary and Middle Schools: Evidence of Effectiveness and Replicability," Journal of Education for Students Placed at Risk, vol. 2, 1997, pp. 251-307; Olatokunbo S. Fashola, "Implementing Effective After-School Programs," Here's How, National Association of Elementary School Principals, March 1999, pp. 1-4; and Olatokunbo S. Fashola, Building Effective After-School Programs (Thousand Oaks, Calif.: Corwin Press, 2001).
5. James P. Comer, "Educating Poor Minority Children," Scientific American, November 1988, pp. 42-48; and James P. Comer et al., Rallying the Whole Village: The Comer Process for Reforming Education (New York: Teachers College Press, 1996).
6. Comer, op. cit.; and Stephanie L. Knight and Jane A. Stallings, "The Implementation of the Accelerated School Model in an Urban Elementary School," in Richard L. Allington and Sean A. Walmsley, eds., No Quick Fix: Rethinking Literacy Programs in America's Elementary Schools (New York: Teachers College Press, 1995), pp. 236-52.
7. H. Jerome Freiberg, Consistency Management and Cooperative Discipline: A Sample Design (Houston: University of Houston, 1996).
8. Robert E. Slavin and Olatokunbo S. Fashola, Show Me the Evidence: Proven and Promising Programs for America's Schools (Thousand Oaks, Calif.: Corwin Press, 1998); and Fashola and Slavin, "Promising Programs for Elementary and Middle Schools."
9. Fashola and Slavin, "Promising Programs for Elementary and Middle Schools," p. 256.
OLATOKUNBO S. FASHOLA is a research scientist at the Center for Social Organization of Schools, Johns Hopkins University, Baltimore, and a senior research scientist at the American Institutes for Research, Washington, D.C.
Selected Comprehensive School Reform Programs
National Center for the Accelerated Schools Project
Stanford, CA 94305-3084
415/725-7158 or 415/725-1676
Consistency Management and
Cooperative Discipline (CMCD)
H. Jerome Freiberg
University of Houston
College of Education
Houston, TX 77204-5872
E. D. Hirsch
Core Knowledge Foundation
2012-B Morton Dr.
Charlottesville, VA 22903
Association for Direct Instruction
P.O. Box 10252
Eugene, OR 97440
School Development Program
Child Study Center
School Development Program
230 South Frontage Rd.
P.O. Box 20790
New Haven, CT 06520-7900
Success For All
Robert E. Slavin
Center for Social Organization of Schools
The Johns Hopkins University
3003 North Charles St.
Baltimore, MD 21218
Different Ways of Knowing
The Galef Institute
11050 Santa Monica Blvd., 3rd Floor
Los Angeles, CA 90025-3594
|Author:||Fashola, Olatokunbo S.|
|Publication:||Phi Delta Kappan|
|Date:||Mar 1, 2004|