High-stakes research: the campaign against accountability has brought forth a tide of negative anecdotes and deeply flawed research. Solid analysis reveals a brighter picture. (Feature).


"MAKE-OR-BREAK EXAMS GROW, BUT BIG STUDY DOUBTS VALUE," intoned a front-page New York Times headline in December 2002. The article continued, "Rigorous testing that decides whether students graduate, teachers win bonuses, and schools are shuttered does little to improve achievement and may actually worsen academic performance and dropout rates, according to the largest study ever on the issue." Thus a deeply flawed study was catapulted to national prominence. More important, its conclusions were the opposite of those found through rigorous scientific studies.

The report in question, authored by Arizona State University researchers Audrey Amrein and David Berliner, purported to examine student-performance trends on national exams in states where legislators have attached "high stakes" to test scores. High-stakes testing has become a lightning rod as more and more states adopt accountability measures in response to the mandates of the federal No Child Left Behind Act. While it is crucial to analyze and debate the wisdom of such policies, the discussion must be informed by evidence of the highest quality. The controversial nature of high-stakes testing has led to the hurried release and dissemination of research that lacks scientific rigor, of which the Amrein and Berliner study is one of the more egregious examples.

This says much about the standards for research in education today. The situation is so contentious that in 2000 the National Research Council found it necessary to convene a panel to decide which scientific principles should apply to educational research--the kind of question that other fields of social science settled long ago. In the case at hand, Amrein and Berliner trumpet the fact that their report was reviewed by a panel of four scholars based at other schools of education, yet this should only be a source of greater concern. Sharing a paper with sympathetic colleagues is no substitute for a system of blind peer review--a bedrock principle of scientific research.

Here we closely examine Amrein and Berliner's underlying data and methodology. Our results are astonishing: applying basic statistical techniques to their data reverses nearly every one of their conclusions. Later we also present the results of separate research on accountability that we conducted for a June 2002 Federal Reserve Bank of Boston conference. Rigorous analysis reveals that accountability policies have had a positive impact on test scores during the past decade.

The Unscientific Method

Amrein and Berliner identified 28 states where test scores are used to determine various consequences, such as bonuses for teachers, the promotion of students, or allowing children to transfer out of a failing school. These stakes go beyond less controversial accountability measures such as publishing test scores in the newspaper. The states range from Georgia and Minnesota--where the only penalty is experienced by students who fail a high-school graduation exam--to North Carolina and Texas, where the authors found a total of six stakes each, stakes that affect both schools and students.

Once Amrein and Berliner identified the high-stakes states, they looked at changes in the average scores students earned on the National Assessment of Educational Progress (NAEP). Choosing this test as a basis for considering the impact of high-stakes tests on students in the 4th and 8th grades (ages 9 and 13, respectively) is a sensible idea, because the validity and reliability of NAEP, often called the "nation's report card," are well accepted. It is a test for which students cannot easily be prepped and, since the performance of individual school districts, schools, or students is not reported, there is little incentive to cheat or even to prepare for the test. It also provides a neutral standard for assessing the effects of state policies. But if the Arizona State team's decision to look at NAEP scores was correct, less can be said for their other analytical choices.

Amrein and Berliner's basic strategy was to look at how each high-stakes state's scores changed with the introduction of accountability and to compare this with the national trend. If the state's gains exceeded the national gains, they deemed that an increase in scores. If the state's gains trailed the national gains, they deemed that a decrease. But whenever the rate at which students were excluded from the NAEP because of a disability or lack of language proficiency moved in the same direction as that state's NAEP scores (in other words, an increase in test scores coupled with an increase in test exclusions), Amrein and Berliner declared the results contaminated and simply tossed out the state as inconclusive. (At least that is what they claimed to do; in fact, they applied the rule inconsistently.)

As a result, their conclusions are based on only a fraction of the high-stakes states. For instance, they recorded positive or negative results on the NAEP 4th-grade math test for just 12 of the 26 states with stakes for K-8 students (as noted earlier, two of the states, Georgia and Minnesota, had only a high-school graduation exam and thus were not used for this analysis). Amrein and Berliner found that 4th-grade math scores increased at a slower rate than the national average in 8 of the 12 states, faster in just 4. Yet they write this up in a highly misleading fashion, claiming that "67 percent of the states posted overall decreases in NAEP math grade 4 performance as compared to the nation after high-stakes tests were implemented." Actually, Amrein and Berliner witnessed gains slower than the national average in just 8 of 26 high-stakes states, or 31 percent.

Amrein and Berliner's misleading reporting practices took on new importance when the media dutifully broadcast their results as they were written. Consider the article in Education Week, which reported, "Movement in elementary-school reading scores was evenly split--better than the national average in half the states, worse in the other half." In fact, Berliner and Amrein based their conclusions in 4th-grade reading on just ten states, five of which they recorded as gaining against the national average and five as losing. So less than a fifth of the high-stakes states saw decreases against the national average in reading, not "half." At the 8th-grade level in math, Amrein and Berliner were able to look at only eight states, five of which gained against the national average and three of which lost. Here, again, Amrein and Berliner wrongly reported this as "63 percent of the states posted increases in NAEP math grade 8 performance as compared to the nation after high-stakes tests were implemented."
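The arithmetic behind these corrections is simply a change of denominator. The short Python sketch below recomputes each reported share twice: once over the states Amrein and Berliner actually classified, and once over all 26 states with stakes for K-8 students. The counts come from the text; the function name `share` and the half-up rounding are our own illustrative choices.

```python
# Sketch: the same count reads very differently depending on the denominator.
# Counts come from the text; 26 is the number of states with stakes for K-8
# students. Percentages are rounded half-up, as a reader would round by hand.

ALL_K8_HIGH_STAKES = 26


def share(count: int, denominator: int) -> int:
    """Return count / denominator as a whole-number percentage (half-up rounding)."""
    return int(100 * count / denominator + 0.5)


# (description, states with the outcome, states Amrein and Berliner classified)
reports = [
    ("4th-grade math, gains below the nation", 8, 12),
    ("4th-grade reading, gains below the nation", 5, 10),
    ("8th-grade math, gains above the nation", 5, 8),
]

for description, count, classified in reports:
    print(
        f"{description}: {share(count, classified)}% of classified states, "
        f"{share(count, ALL_K8_HIGH_STAKES)}% of all {ALL_K8_HIGH_STAKES} high-stakes states"
    )
```

Run as written, it prints 67% vs. 31% for 4th-grade math, 50% vs. 19% for 4th-grade reading, and 63% vs. 19% for 8th-grade math: the headline figures all rest on the smaller denominator.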

All of this ignores the truly fatal flaw of Amrein and Berliner's methods: their point of comparison. If one wants to assess the effect of high-stakes testing, the obvious comparison is between states that adopted accountability systems and those that did not. Amrein and Berliner's decision instead to compare the gains in high-stakes states with the national average violates a basic principle of social-science research. The national gain on NAEP incorporates any gains in high-stakes states, so Amrein and Berliner's strategy is akin to a medical trial where the treatment group receives the full dose of a medication while the control group receives a half-dose. It would not be surprising to find that the full dose was not dramatically more effective. The real question is whether the full dose is more effective than no medication at all.
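The half-dose analogy can be made exact under a simplifying assumption. Suppose the national gain is an unweighted average over states, a share s of which are high-stakes (average gain g_HS) and the rest of which have no accountability (average gain g_NA). The real NAEP national figure is weighted by students and also includes "report card" states, so this is only a sketch:

```latex
\bar{g}_{\mathrm{nat}} = s\,\bar{g}_{\mathrm{HS}} + (1-s)\,\bar{g}_{\mathrm{NA}}
\quad\Longrightarrow\quad
\bar{g}_{\mathrm{HS}} - \bar{g}_{\mathrm{nat}}
  = (1-s)\,\bar{g}_{\mathrm{HS}} - (1-s)\,\bar{g}_{\mathrm{NA}}
  = (1-s)\left(\bar{g}_{\mathrm{HS}} - \bar{g}_{\mathrm{NA}}\right)
```

If, say, 26 of 50 states were high-stakes, s is about 0.52 and the gap measured against the nation is only about 48 percent of the gap measured against no-accountability states: the control group has already absorbed roughly half the "dose."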

On Their Terms

Amrein and Berliner concluded, as announced in their press release, "High-stakes tests may inhibit the academic achievement of students, not foster their academic growth." Let's take a look at their evidence in more detail.

Before doing so, however, we need to be clear: we are not in any way endorsing Amrein and Berliner's analytical approach. We return below to discuss the results from a more scientific study of accountability. But using their approach in a systematic manner will at least reveal the degree to which their decisions about what information to include and to exclude distorted the facts and thereby confused the debate over accountability.

An initial problem with their analysis is that Amrein and Berliner disregarded the magnitude of any changes in test scores. By simply listing the results as "Increase," "Decrease," or "Unclear" (in cases where exclusion rates rose), Amrein and Berliner discarded rich information. They converted useful continuous data (test scores) into hollow binary data (test scores went up or down). In a purely hypothetical example, say six of the high-stakes states gained 20 percent, while the other 20 gained 2 percent each and the no-accountability states made no gains whatsoever--yielding a national average gain of 3 percent. Amrein and Berliner's approach would supposedly demonstrate the failure of accountability: just six states beat the national average, while 20 were below the average. In fact, ignoring any complications from test exclusions, Amrein and Berliner would report this as something like, "Just 23 percent of states posted gains on NAEP higher than the national average after high stakes were introduced." The right approach is to compare the average gains of high-stakes states with those of no-accountability states.
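The hypothetical example can be checked directly. The Python sketch below assumes, beyond what the text states, that the remaining 24 states (50 in all) have no accountability and that the "national average" is a simple unweighted average across states; under those assumptions the national gain comes out at 3.2 percent, close to the 3 percent in the text. It contrasts the binary tally with a comparison of group means.

```python
from statistics import mean

# Hypothetical scenario from the text: 26 high-stakes states, 6 gaining
# 20 percent and 20 gaining 2 percent. ASSUMPTION: the other 24 states
# (50 in all) have no accountability and gain nothing, and the national
# average is an unweighted average across states.
high_stakes = [20.0] * 6 + [2.0] * 20
no_accountability = [0.0] * 24

national = mean(high_stakes + no_accountability)

# Binary tally in the Amrein-Berliner style: count states that beat the nation.
beat_nation = sum(gain > national for gain in high_stakes)
binary_share = 100 * beat_nation / len(high_stakes)

# Continuous comparison: average gain in each group.
hs_mean = mean(high_stakes)
na_mean = mean(no_accountability)

print(f"national average gain: {national:.1f}%")
print(f"high-stakes states beating the nation: {beat_nation} of "
      f"{len(high_stakes)} ({binary_share:.0f}%)")
print(f"mean gain, high-stakes: {hs_mean:.1f}% vs. no accountability: {na_mean:.1f}%")
```

The binary tally reports that only 6 of 26 states (23 percent) beat the nation, while the group means show high-stakes states gaining about 6.2 percent against 0 percent for states without accountability.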

When this is done, the analysis yields starkly different results than Amrein and Berliner report. Table 1 compares the math gains among 4th and 8th graders in the same way as Amrein and Berliner--by following different cohorts as they reach 4th or 8th grade in different years. In other words, they compared the 4th graders of 1996 with the 4th graders of 2000, two completely different cohorts of students. For each of the comparisons, data were available for 34-36 states, 18-20 of which were part of the high-stakes group, due to the varying participation of states in the NAEP testing program. For either the 1992-2000 period or the 1996-2000 period, the average gain in math among high-stakes states noticeably exceeded that of the no-accountability states. The differences in performance were statistically significant at conventional levels, meaning that we can be highly confident that they are not just chance occurrences. (By contrast, Amrein and Berliner did no significance testing whatsoever, neglecting one of the oldest and most basic tools of social-science research.)
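The text reports that these group differences are statistically significant but does not say which test was used. As one illustration, and not necessarily the method behind Table 1, a permutation test compares two groups of state-level gains without distributional assumptions. The function below is a hypothetical sketch; one would pass the high-stakes states' gains as `gains_a` and the no-accountability states' gains as `gains_b`.

```python
import random
from statistics import fmean


def permutation_test(gains_a, gains_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean gains.

    Under the null hypothesis that accountability status is unrelated to
    gains, the group labels are exchangeable. We reshuffle the pooled gains,
    split them into groups of the original sizes, and count how often the
    shuffled difference in means is at least as large (in absolute value)
    as the observed one. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = fmean(gains_a) - fmean(gains_b)
    pooled = list(gains_a) + list(gains_b)
    n_a = len(gains_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(shuffled) >= abs(observed):
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value says that a gap as large as the observed one would rarely arise if states had been sorted into the two groups at random, which is precisely the check that the binary "increase/decrease" tally never makes.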

Amrein and Berliner might object that we have included states where students were excluded from tests at higher rates after accountability reforms were introduced, possibly contaminating the results. Amrein and Berliner's solution was just to toss these states out, no matter how small the change in exclusion rate or how large the change in achievement. As Table 1 shows, we instead adjusted the achievement gains for observed changes in exclusion rates. And the results barely changed: high-stakes states still significantly outperformed no-accountability states across the board. In fact, the changes in test-participation rates were not statistically different in high-stakes states from those in other states, indicating that this was not even remotely as influential a factor as Amrein and Berliner declared it to be.
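The text does not spell out how the exclusion adjustment was made. One standard linear approach, shown here as an assumption rather than as the authors' procedure, is to regress state gains on a high-stakes indicator and the change in exclusion rate, then read off the indicator's coefficient. By the Frisch-Waugh-Lovell theorem that coefficient can be computed from group means and a pooled within-group slope, as in this hypothetical helper.

```python
from statistics import fmean


def exclusion_adjusted_advantage(hs_gains, hs_excl, na_gains, na_excl):
    """High-stakes advantage in gains after a linear exclusion adjustment.

    Equals the coefficient on a high-stakes dummy in the OLS regression
    gain ~ 1 + high_stakes + exclusion_change. The exclusion slope (beta)
    is estimated from deviations around each group's own means, which is
    what the dummy-plus-intercept regression implies. Raises
    ZeroDivisionError if exclusion changes never vary within a group.
    Returns (raw gap, beta, adjusted gap).
    """
    sxy = sxx = 0.0
    for gains, excl in ((hs_gains, hs_excl), (na_gains, na_excl)):
        g_bar, x_bar = fmean(gains), fmean(excl)
        for g, x in zip(gains, excl):
            sxy += (x - x_bar) * (g - g_bar)
            sxx += (x - x_bar) ** 2
    beta = sxy / sxx  # pooled within-group slope of gain on exclusion change
    raw_gap = fmean(hs_gains) - fmean(na_gains)
    adjusted_gap = raw_gap - beta * (fmean(hs_excl) - fmean(na_excl))
    return raw_gap, beta, adjusted_gap
```

For instance, with constructed data where gain = 2 + 3 * high_stakes + 1.5 * exclusion_change exactly (high-stakes exclusion changes 0, 1, 2 with gains 5, 6.5, 8; others 1, 2, 3 with gains 3.5, 5, 6.5), the raw gap is 1.5 but the adjusted gap recovers the built-in advantage of 3. Unlike dropping states, this uses every observation and scales the correction to the size of the exclusion change.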

Scientific quality is determined not only by the overall methodology, but also by the care and precision of any measurements. To assess the latter, let's focus on the eight states where Amrein and Berliner concluded that 4th-grade math scores decreased following the introduction of high-stakes testing. Consider Table 2. Three of the eight states--New Mexico, Oklahoma, and West Virginia--adopted high-stakes testing during the 1980s. However, NAEP scores at the state level became available only during the 1990s. For these states, Amrein and Berliner lacked the "before" data for their "before and after" analytical strategy, but went ahead and labeled their scores as "decreasing" anyway. The other five "decreasing" states all experienced greater gains than no-accountability states during the time that they introduced high-stakes testing; New York even beat the national average gain in every time period. And this is the group of states that Amrein and Berliner identify as being harmed by accountability! Not a single one provides evidence of harm following the introduction of high-stakes testing.

Even where before-and-after data were available, Amrein and Berliner did not always use the data from the NAEP tests immediately preceding and following the adoption of high stakes. In several cases, they apparently chose an interval that began after the state's accountability system came on-line--an "after-after" comparison. These procedures yielded results that reflected negatively on accountability, but they have no scientific justification. To see this, consider the table on p. 52 and try to think of a consistent rule that justifies Amrein and Berliner's decision to place both Maryland and Missouri in the "decreasing" category.

In short, Amrein and Berliner used scientifically inappropriate methods and applied them in an even shoddier manner. Simply taking Amrein and Berliner's approach and applying it correctly to all of the data on NAEP achievement reverses their conclusions. Again, these simple comparisons are not the best way to examine these questions, but the results of even these crude analyses confirm the findings from the more sophisticated approach we describe below: greater accountability is accompanied by improved student performance.

Amrein and Berliner also used trends on the SAT, the ACT, and Advanced Placement (AP) exams to assess the effectiveness of minimum-competency exams in the 18 states where students must pass such tests in order to graduate from high school. This comparison suffers from all the same problems as the NAEP comparison and more. For example, does anyone believe that nothing else has changed in North Carolina since the introduction of a graduation test in 1980? Amrein and Berliner's simplistic trend analysis attributes all subsequent changes in graduation rates and dropouts to the introduction of this high-stakes exam. Nonetheless, because these discussions are less directly related to the current state accountability debates and these data are more difficult to interpret than NAEP scores, we do not pursue them.

Results of Rigorous Analysis

Assessing the impact of state accountability systems is clearly complicated. In many states, these systems are quite young: in 1996, just ten states had active accountability systems. Moreover, states differ in many ways other than their accountability provisions--ways that can make it difficult to isolate the impact of high-stakes testing. They also change in different ways over time, adopting new accountability provisions and other legislation at different times and being influenced by shifting demographics at different rates. This does not make gathering evidence about the effects of accountability impossible. It simply reinforces the need to apply stringent scientific methods to the analysis.

Here we report results from our own analysis of state accountability systems using NAEP data. These results were reviewed at a high-profile conference and were subject to a blind peer review for publication in a Brookings Institution volume, No Child Left Behind? The Politics and Practice of Accountability, which is slated for release this fall.

NAEP tested 4th graders in mathematics in 1992 and 1996 and 8th graders four years after each of these assessments, in 1996 and 2000. As noted earlier, whereas Amrein and Berliner simply compared the test scores of 4th graders in one year with those of a different set of 4th graders four years later, we measured students' growth in achievement between the 4th and 8th grades. In other words, we compared 4th graders' math achievement in 1996 with their performance four years later, when they were 8th graders. The exact same students were not tested in each grade, but the two samples are at least representative of the same cohort of students. We also adjusted the data to account for changes in state spending on education and for parents' educational levels, which provides controls for simultaneous changes in state policies or differences in demographics that might confound the analysis of how accountability systems influenced student achievement. Amrein and Berliner used no statistical controls at all.
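The difference between the two designs is simply which pair of observations gets subtracted. The Python sketch below assumes a hypothetical dictionary `scores[(grade, year)]` holding one state's average NAEP math result; the function names are illustrative. The article's actual outcome is a proficiency score calibrated to each grade's standards and adjusted with controls, so plain subtraction here only shows how observations are paired.

```python
# Hypothetical layout: scores[(grade, year)] is one state's average NAEP math
# result. Per the text, NAEP tested 4th graders in 1992 and 1996 and 8th
# graders four years later, in 1996 and 2000.


def cross_section_change(scores, grade, start_year, end_year):
    """Change for one grade across years, with different students each year
    (e.g., 4th graders of 1996 vs. 4th graders of 2000)."""
    return scores[(grade, end_year)] - scores[(grade, start_year)]


def cohort_growth(scores, start_year, span=4):
    """Growth of one cohort: 4th graders in start_year vs. 8th graders
    span years later (e.g., 4th graders of 1996 vs. 8th graders of 2000)."""
    return scores[(8, start_year + span)] - scores[(4, start_year)]


# The two cohorts the NAEP schedule allows: 1992 -> 1996 and 1996 -> 2000.
COHORT_START_YEARS = (1992, 1996)
```

The cohort version follows the same group of students as they progress through school, so it measures learning between grades rather than differences between two unrelated classes of 4th graders.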

Our analysis focuses on state testing and accountability systems that impose consequences on schools rather than on students. These are the most relevant policies for evaluating the potential impact of the No Child Left Behind Act. Our statistical analysis includes all states that have relevant NAEP data, and we explicitly allow for the timing of states' introduction of their accountability systems.

Figure 1 summarizes our findings in mathematics. The typical student progressing from grade 4 in 1996 to grade 8 in 2000 in a state with a consequential accountability system could expect to see a 1.6 percent increase in his NAEP proficiency score (calibrated to the appropriate learning standards for each grade). By contrast, the typical student in a state with no accountability system could expect to experience only a 0.7 percent gain in mathematics proficiency, a statistically significant difference. Students in states with "report card" systems, where scores are publicly reported but no consequences are attached to performance, fell in the middle: they could expect to gain 1.2 percent in achievement between grades 4 and 8, over and above what they would normally learn from grade to grade. In short, states with high-stakes and even low-stakes systems for schools performed significantly better on NAEP than states with no stakes at all.

We are not the only ones reporting positive effects of accountability. In a forthcoming paper, Stanford University economists Martin Carnoy and Susanna Loeb conducted a similar analysis but expanded it to include testing policies that impose high stakes on students. They found that NAEP performance increased in states with high-stakes systems compared with states that had not yet attached consequences to schools' test scores. Carnoy and Loeb also investigated the impact of accountability on student retention and high-school graduation rates and demonstrated that there is no discernible negative effect on either outcome.

Other rigorous studies have examined accountability systems within states and school districts. Unlike the Amrein-Berliner study, they have been vetted at scientific conferences and are being peer reviewed according to normal scientific practice. The Brookings Institution volume is one example. The accumulated literature generally supports two conclusions. First, student performance on the available measures, usually state tests, improves after accountability reforms are introduced. Second, other short-run changes--such as students' being excluded from taking the tests at greater rates, or explicit cheating--are observed. In other words, some unintended consequences often accompany the introduction of accountability, although there is little evidence that these influences continue over time.

Schools may exclude low-performing students from taking the test in an attempt to "game" the system--to increase their performance artificially by removing scores that bring down their averages. We looked at differences among the states in terms of their placement rates into special education--often one way to exclude students from state tests--and at whether these differences were related to the introduction of state accountability systems. From 1995 to 2000, the time when many state accountability systems were coming on-line, we found no evidence that special-education placement increased in reaction to the introduction of accountability. Special-education placements did increase nationally, just not in any systematic way suggestive of a relationship to state accountability.

No Accountability for Research

That a study of such dubious scientific quality could make the front page of the nation's most respected newspaper is disturbing, but perhaps not so unusual. In the contentious environment of K-12 education, the media too often gives attention to findings that are relevant to policy regardless of their scientific merit. This discussion shows that education studies vary so much in their scientific rigor that one cannot judge them on the basis of press releases and the sensationalism of the reported results.

Reporters need not be experts in statistical analysis any more than they must be fully versed in biochemistry or investment-banking regulations. But when a report is commissioned by an organization like the Great Lakes Center for Education Research and Practice, a Midwestern group sponsored by six state affiliates of the National Education Association, it would seem to call for a reasonable dose of skepticism. Why not bring in some outside expertise to review such a report before heralding its arrival? There will definitely be further opportunities for review. After all, the Arizona State shop promises that this is just the first of many annual reports on the impact of high-stakes testing.

The media is not alone. Resources at the state and federal levels must be committed to evaluating the quality of research and disseminating evidence of effective practices to schools and the public. The No Child Left Behind Act's emphasis on research-based practices, the creation of the federal Institute of Education Sciences, and efforts such as the What Works Clearinghouse, which will review and disseminate research findings, are important developments in this regard. State policymakers must also devote resources to evaluating their programs and synthesizing available research. Identifying effective reforms using rigorous evaluative techniques is a crucial task, especially since improving the education system is likely to have a greater economic impact than any of the medical breakthroughs of the past decade.

We also do not mean to suggest that the book has been closed on accountability. It appears that high-stakes states performed better than no-accountability states during the 1990s, but there is still much to be learned. For instance, there is uncertainty about the best way to translate test scores into overall school ratings. Also, states have yet to design accountability systems that directly link test-score performance to appropriate incentives. The vast majority of state accountability systems simply report the average scores for each school, sometimes disaggregating by racial and ethnic groups. However, average scores are highly dependent on socioeconomic factors outside the control of schools. States--and researchers--must become adept at discerning the components that make up the scores and how they can be influenced by high-stakes regimes. Measuring the gains that students make over time would provide a better measure of school performance and serve as a proper basis for reward or sanction, but such value-added techniques need some work before they can serve as reliable performance measures. There are other issues as well. Nonetheless, the evidence points in the direction of refining accountability systems rather than scrapping them altogether.
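To make the contrast between reported averages and measured gains concrete, here is a minimal Python sketch using invented scores for two hypothetical schools. School A serves students who arrive with high scores but improve little; School B's students start lower but improve more. Ranking by average level favors School A, while ranking by average gain favors School B. The simple per-student gain used here is only a stand-in for the more elaborate value-added models the text alludes to; the school names, scores, and function names are all hypothetical and do not come from the article.

```python
from statistics import mean

# Hypothetical student records: (school, prior-year score, current-year score).
# These numbers are invented for illustration; they are not NAEP or state data.
records = [
    ("School A", 260, 263),
    ("School A", 270, 272),
    ("School A", 280, 284),
    ("School B", 200, 212),
    ("School B", 210, 221),
    ("School B", 220, 233),
]


def average_level(records, school):
    """Average current-year score: the kind of figure most report cards publish."""
    return mean(cur for s, _, cur in records if s == school)


def average_gain(records, school):
    """Average per-student gain: a very simple value-added-style measure."""
    return mean(cur - prior for s, prior, cur in records if s == school)


for school in ("School A", "School B"):
    print(school, "level:", average_level(records, school),
          "gain:", average_gain(records, school))
```

School A posts the higher average level (273 versus 222), but School B posts the larger average gain (12 points versus 3), which is the distinction the paragraph above draws.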
Table 1

Rerunning the Amrein-Berliner Data

When the actual test scores in the states Audrey Amrein and David
Berliner identified as "high stakes" are compared with those in states
without accountability systems, the high-stakes states show much more
improvement.

                                    Increase in NAEP
                                    4th-grade math scores

                                    1992-2000              1996-2000

High-stakes states                  9.2                    4.2
No accountability states            3.8                    2.3
High-stakes advantage               5.3 points *           1.9 points *
High-stakes advantage after         5.2 points *           2.3 points *
 adjusting for changes in students
 excluded from NAEP

                                    Increase in NAEP
                                    8th-grade math scores

                                    1992-2000              1996-2000

High-stakes states                  8.8                    4.5
No accountability states            4.0                    1.7
High-stakes advantage               4.8 points *           2.8 points *
High-stakes advantage after         3.7 points *           2.5 points *
 adjusting for changes in students
 excluded from NAEP

* statistically significant at the .05 level

SOURCE: Authors
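The unadjusted "high-stakes advantage" rows in Table 1 are simple differences between the average gain in high-stakes states and the average gain in no-accountability states. The Python sketch below recomputes them from the rounded gains printed in the table; the dictionary and function names are illustrative, not the authors'. Recomputing from one-decimal figures gives 5.4 rather than the published 5.3 for 4th grade, 1992-2000, presumably because the authors subtracted unrounded averages (an assumption). The adjusted rows are not reproduced, since the adjustment method is not described in this section.

```python
# Average gains in NAEP math scores (points), copied from the unadjusted
# rows of Table 1. Keys are (grade, period); names are illustrative only.
TABLE_1_GAINS = {
    ("4th grade", "1992-2000"): {"high_stakes": 9.2, "no_accountability": 3.8},
    ("4th grade", "1996-2000"): {"high_stakes": 4.2, "no_accountability": 2.3},
    ("8th grade", "1992-2000"): {"high_stakes": 8.8, "no_accountability": 4.0},
    ("8th grade", "1996-2000"): {"high_stakes": 4.5, "no_accountability": 1.7},
}


def high_stakes_advantage(gains: dict) -> float:
    """Gain in high-stakes states minus gain in no-accountability states,
    rounded to one decimal place as in the table."""
    return round(gains["high_stakes"] - gains["no_accountability"], 1)


for (grade, period), gains in TABLE_1_GAINS.items():
    print(f"{grade}, {period}: {high_stakes_advantage(gains):.1f} points")
```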

Table 2

A Panoply of Mistakes

Test scores actually increased at a faster rate than in
no-accountability states in almost all of the high-stakes states where
Audrey Amrein and David Berliner (AB) claimed to find decreases in
scores. In New Mexico, Oklahoma, and West Virginia, where AB also found
decreases, high-stakes testing was introduced too early to make a valid
before-and-after comparison.

States where AB     Introduction of    Change in 4th-grade NAEP
declared decreases  high-stakes        math scores between:
in NAEP scores      testing (AB date)  1992-1996  1996-2000  1992-2000

Kentucky            1994               4.9 (b)    1.0        5.9 (c)
Maryland            1993               3.4 (b)    1.6        5.0 (c)
Missouri            1993               2.5 (c)    3.8 (b)    6.3 (c)
Nevada              1998               N/A        2.7 (c)    N/A
New Mexico          1989 (a)           0.5        0.0        0.6
New York            1999               4.2 (b)    3.9 (b)    8.1 (b)
Oklahoma            1989 (a)           N/A        N/A        4.7 (c)
West Virginia       1989 (a)           8.1 (b)    1.5        9.6 (b)

Notes:

N/A - NAEP data unavailable for this time period

(a) No NAEP tests at or before introduction of high-stakes testing

(b) Change in NAEP scores exceeds the average change both for the nation
and for states not adopting high-stakes testing

(c) Change in NAEP scores exceeds the average change for states not
adopting high-stakes testing

SOURCE: Authors
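The footnote rules in Table 2 amount to a small decision procedure. The Python sketch below encodes note (a), which requires a NAEP test at or before the year high-stakes testing was introduced, and the (b)/(c) flags. It is an illustration under stated assumptions: the function names are hypothetical; the no-accountability average of 2.3 points for 1996-2000 is taken from Table 1, assuming it is the same comparison group the notes refer to; and the national average change is not given in this section, so the 3.0 used below is a placeholder, not a reported figure. With those inputs the sketch reproduces the flags in the table's 1996-2000 column.

```python
from typing import Optional, Sequence

# NAEP 4th-grade math years covered by Table 2's columns.
NAEP_YEARS = (1992, 1996, 2000)


def has_baseline(intro_year: int, naep_years: Sequence[int] = NAEP_YEARS) -> bool:
    """Note (a): a before-and-after comparison needs at least one NAEP test
    given at or before the year high-stakes testing was introduced."""
    return any(year <= intro_year for year in naep_years)


def naep_flag(change: Optional[float], no_acc_avg: float,
              national_avg: float) -> Optional[str]:
    """Return "b" if the change beats both averages, "c" if it beats only the
    no-accountability average, and None otherwise or when data are missing."""
    if change is None:  # N/A in the table
        return None
    if change > no_acc_avg and change > national_avg:
        return "b"
    if change > no_acc_avg:
        return "c"
    return None


# 4th-grade changes for 1996-2000, copied from Table 2 (None = N/A).
CHANGES_1996_2000 = {
    "Kentucky": 1.0, "Maryland": 1.6, "Missouri": 3.8, "Nevada": 2.7,
    "New Mexico": 0.0, "New York": 3.9, "Oklahoma": None, "West Virginia": 1.5,
}
NO_ACC_AVG_1996_2000 = 2.3    # from Table 1 (assumed same comparison group)
NATIONAL_AVG_1996_2000 = 3.0  # hypothetical placeholder, not a reported figure

flags = {state: naep_flag(change, NO_ACC_AVG_1996_2000, NATIONAL_AVG_1996_2000)
         for state, change in CHANGES_1996_2000.items()}
print(flags)
```

The baseline check explains the (a) rows: with the AB introduction date of 1989, New Mexico, Oklahoma, and West Virginia have no NAEP test before high-stakes testing began, whereas Nevada (1998) has a 1996 baseline.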

Figure 1

Accountability Works

States that reward or sanction schools for their academic performance
made greater gains on the National Assessment of Educational Progress in
math from 1996 to 2000.

                          % Gain in Math Scores
                          from 4th to 8th Grade

No Accountability System          0.7% *
Report Card System              * 1.2% *
Accountability System             1.6% *

* Statistically significant at the .05 level

SOURCE: Authors

Note: Table made from bar graph


Margaret E. Raymond is the director of CREDO, an education policy research group at the Hoover Institution. Eric A. Hanushek is a senior fellow at the Hoover Institution.
COPYRIGHT 2003 Hoover Institution Press
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Hanushek, Eric A.
Publication:Education Next
Geographic Code:1USA
Date:Jun 22, 2003
Words:4078