Continuing Tensions in Standardized Testing
Standardized test scores usually appear on the front page of the newspaper, which lends them an air of importance. Students talk about taking the test. Legislators talk about the test scores. School board members either break out the champagne to celebrate high scores or, when scores are low, blame the superintendent, who in turn blames the teachers. Editorial writers lament poor scores as evidence of the sorry state of the schools, often criticizing the quality of teaching as if nothing else contributed. Teachers question the usefulness of the scores. What conditions surrounding these tests provoke such varied responses?
In this article, the authors examine the tensions that result from the use of these test scores. Three interwoven themes provide a background for these tensions. The first theme is that mass education was a great social experiment, first tried in the United States in the mid-1800s. The nation sought not only to provide educational opportunities to all of its citizens, but also to do so efficiently. The second theme is that the public has always used achievement tests to evaluate educational progress. Policymakers, including state and national legislators and school boards, make policy decisions and allocate resources based on test scores. It stands to reason, then, that large-scale standardized testing at the national, state and school district levels is likely to continue. The third theme is that U.S. schools have used tests to weed out students and eliminate them from further educational opportunities, rather than to identify learning problems that need intervention. Amid these tensions, many students are not being well served, in particular those who live in poverty and/or lack the language skills necessary to succeed in school and in society. This article examines the roles that educators might play in the future of standardized testing.
A standardized achievement test is designed to provide norm-referenced interpretations of student achievement in specific content areas at certain points in students' educational careers. Norm-referenced interpretations are relative: they show how a student's performance compares with that of other students in the nation.
Part One: A Brief History of Standardized Testing in the U.S.
The impetus for standardized testing emerged in the 1800s and has persisted ever since. The problems with standardized testing today differ little from those of the past.
The Inception of Standardized Testing
The first documented achievement tests were administered between 1840 and 1875, when American educators changed their focus from educating the elite to educating the masses. Cremin (1964) pointed out that the earliest tests were intended for individual evaluation, but that test results were inappropriately used to compare schools and children without regard for non-school influences. As millions of immigrants came to the United States in the 19th century, the standardized test was seen as a way to ensure that all children were receiving the same standard of education. In practice, however, test results were often used to emphasize the need for school reform (Office of Technology Assessment, 1992).
Ability (Intelligence) Testing
At the turn of the 20th century, the focus shifted from achievement testing to ability testing for the purpose of sorting and classifying students. Schools wanted to identify and weed out students who were not expected to succeed academically. Consequently, many ethnic groups new to the United States faced discrimination on the basis of new "intelligence" tests, such as the Binet Intelligence Scale.
In 1922, Walter Lippmann wrote a series of articles in the New Republic protesting the misuse of standardized ability tests; his protests echo those of current critics. Lippmann characterized intelligence tests as
"[a] gross perversion by muddleheaded and prejudiced men. . . . Intelligence is not an abstraction like length and weight; it is an exceedingly complicated notion which nobody has yet succeeded in defining. If the impression takes root that these tests really measure intelligence, that they constitute a sort of last judgment on the child's capacity, that they reveal 'scientifically' his predetermined ability, then it would be a thousand times better if all the intelligence testers and all their questionnaires were sunk without warning in the Sargasso Sea. (cited in Perrone, 1976, pp. 14-15)"
Despite criticism, standardized ability testing quickly took hold in the United States. According to Deffenbaugh (1925), both ability and achievement tests were being used to sort and classify students, reflecting education's lingering elitism, as well as educators' failure to address the problems of low achievers.
The Beginning of Multiple-Choice, Standardized Achievement Tests
As noted earlier, two prevailing goals in American education have been 1) providing equal access to public education and 2) efficient delivery. The drive for greater efficiency turned American schools away from essay tests and toward multiple-choice tests. Critics and test specialists argued vehemently about the strengths and weaknesses of the two types of tests (e.g., O'Dell, 1928), an argument that continues today (Haladyna, 1994; Shepard, 1994).
Technological advances meant that multiple-choice tests could yield data on many students at very low cost, which made it easy to compare teachers, schools, school districts and even states. The Stanford Achievement Test is recognized as the first of the large-scale publishers' tests (Haladyna, in press). First published in 1923, it remains an acknowledged leader in its field. Other popular standardized tests that trace their origins to that era include the Iowa Test of Basic Skills (ITBS), the American College Testing Program, the Scholastic Aptitude Test (SAT), the California Achievement Test and the Metropolitan Achievement Test. The ACT Assessment and the SAT became the nation's leading college admissions tests.
Haertel and Calfee (1983) stated that these tests at first reflected school learning only in a vague and general way, without reference to any curriculum or instructional objectives. Prescriptive and didactic textbooks, however, began to influence the tests, and diagnosis and prescription became central themes in standardized testing. Critics noticed that these tests measured concrete, lower-level school outcomes very well, but neglected more complex types of learning. In the 1950s, the Bloom taxonomy (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956) emerged; it justified teaching by objectives and pushed testing to measure more than simple memory-type learning. The biggest disappointment with the standardized achievement test, however, was its remote connection to classroom teaching and the school district curriculum. This mismatch led to a different approach to testing.
Originating in Ralph Tyler's work in the 1930s, teaching and testing by instructional objectives came into vogue in the 1970s. The criterion-referenced test was supposed to be linked to objectives or learning domains that were easily tested, and this kind of testing fostered systematic instruction. Proponents of criterion-referenced testing, such as James Popham (1995), called for tests that inform teachers about their successes and failures. Publishers' standardized achievement tests, by contrast, are non-specific and unfocused with respect to the variety of objectives that teachers address in their classrooms. Interest in the criterion-referenced test has since waned, because many of the outcomes it measured seemed too easy to teach and too easy to test. Furthermore, the criterion-referenced test lacked the normative data that nationally normed standardized achievement tests could provide.
Testing in the Latter Part of the 20th Century
Three factors affected standardized testing in the latter part of the 20th century: 1) changing demographics caused by immigration, 2) technological challenges introduced during the Cold War and exacerbated by the computer age and 3) racial inequality. Arthur Jensen (1980) wrote essays and published studies that fueled concern over intelligence and racial differences. Studies show that low scholastic aptitude (i.e., intelligence) predicts low levels of education, and that for most people low levels of education lead to unproductive lives (Herrnstein & Murray, 1994). Breaking this pattern, while difficult, continues to be a goal of American education. The question remains: Can we educate all students well enough that they can function in society in more positive ways?
Problems We Face Today
How do we measure achievement of today's students? For what purposes should standardized achievement tests be used? Do standardized achievement tests adequately measure school achievement, or are performance tests and portfolios more appropriate? Tensions continue to mount as both educators and non-educators seek answers to these questions.
Year after year, polls show that the public wants the information provided by publishers' standardized achievement tests: 68 percent of respondents to a USA/Gallup poll favored President Clinton's national testing program, and other polls continue to show support for standardized testing (Rose, Elam & Gallup, 1997). Parents are increasingly willing to pay for independent evaluations of their children's achievement in basic skills, as the rise of private testing and tutoring centers shows.
Until they are specifically designed to reflect curriculum and instruction, publishers' standardized achievement tests do not seem to be appropriate measures of either. They cannot show how instruction has affected learning or how demographic factors produce differences in performance. Whenever the achievement tests are given, however, newspapers clamor to publish the scores and then try to interpret what they mean. Reporters seem compelled to rank the scores from top to bottom and then to judge the effectiveness of schools from that ranking. Even college admissions tests are used in this invalid way. The ACT and SAT were never intended to evaluate states or school districts, and the students who take them never constitute an adequate sample for this purpose. Nonetheless, members of the press continue to repeat the refrain that schools, and teachers in particular, are failing.
The Message from Testing's History
Four major conclusions emerge from our history of standardized testing:
* Testing has been, and will remain, a basis for knowing how schools affect students, despite the potential for misinterpretation and misuse of test scores
* Two governing principles continue to influence school testing: 1) all students must be given equal opportunities regarding their education and 2) schooling must be offered in an efficient manner
* The increase in the amount of testing, as well as the misuse of test scores, increases tensions in education, to the detriment of children and teachers
* Schooling plays an important role in each citizen's life; those with more education lead more productive lives, yet the "quality" of that education is unfairly judged by test scores. (Office of Technology Assessment, 1992)
Standardized testing is entrenched in American education. The public continues to support testing because it perceives that test scores are valid indicators of children's learning. While it seems unlikely that educators will be able to change the public's taste for large-scale standardized tests, it is possible to ensure that the test results are responsibly interpreted and used.
Part Two: The Role of Professional Organizations
Many national and international organizations have issued position papers or statements on standardized testing, including the following: Association for Childhood Education International (1991), Association for Supervision and Curriculum Development (1987), Council for Exceptional Children (1993), National Association for the Education of Young Children (1988), National Association of Early Childhood Teacher Educators (1989), National Association of Elementary School Principals (1989), National Association of State Boards of Education (1988), National Council of Teachers of English (1989), National Council of Teachers of Mathematics (1989), National Commission on Testing and Public Policy (1990), American Psychological Association [APA], National Council on Measurement in Education and American Educational Research Association (1985), and the National Education Association (1972). These organizations recognize the mounting evidence that standardized testing often has detrimental and counterproductive effects on children and teachers.
Kamii (1990) summarized the growing concerns of professional organizations, teachers and parents about standardized testing: "[We] are not against accountability. We are all for it. Our reasons for opposing the use of achievement tests are that they are not valid measures of accountability and that they are producing classroom practices harmful to young children's development" (p. ix). The organizations' conclusions about standardized testing can be summarized as follows:
* Testing increases pressure and stress on children, which sets them up for failure, lowered self-esteem and potential health risks
* Testing compels teachers to spend valuable time preparing children to take tests and teaching to the test, undermining what otherwise could be sound, responsive teaching and learning
* Testing limits children's educational possibilities, resulting in a mediocre curriculum and mediocre learning
* Testing discourages the development of social and intellectual skills, such as cooperation, creativity and problem solving, because time is spent instead on learning exactly what appears on the test.
* Testing leads to harmful tracking and labeling of children, especially those of minority and low socioeconomic backgrounds.
The professional organizations have stated, in no uncertain terms, that testing does not provide useful information about individual children; yet, test scores often become the basis for making decisions about retention, promotion, kindergarten entrance, ability grouping and special education placements (Council for Exceptional Children, 1993; National Association for the Education of Young Children/Council for Exceptional Children, 1996).
Professional organizations have redoubled their efforts to protect children and teachers. They propose the cessation of all standardized testing below the 4th grade. Recognizing, however, that schools are under public pressure to test children, they have made detailed recommendations about testing practices. For instance, the National Association for the Education of Young Children (NAEYC, 1988) issued strong recommendations about the selection, administration, interpretation and use of tests and scores, stating that:
"The most important consideration in evaluating and using standardized tests is the utility criterion: The purposes of testing must be to improve services for children and ensure that children benefit from their educational experiences. The ritual use even of 'good tests' (those judged to be valid and reliable measures) is to be discouraged without documented research showing that children benefit from their use. (p. 53)"
The above excerpt points to something deeply ironic about testing practices: There is no evidence to support their pervasive use. What are their direct benefits to children? What great advantage does testing provide teachers?
Professional advocacy organizations agree that testing needs to be more humane, meaningful and varied (see, especially, Association for Childhood Education International/Perrone, 1991; Bredekamp & Copple, 1997; Council for Exceptional Children, 1993; National Association for the Education of Young Children, 1988; National Association for the Education of Young Children/Council for Exceptional Children, 1996; Perrone, 1976, 1977, 1981, 1991).
Many educators consider continued standardized testing to be essentially irresponsible (e.g., Meisels, 1987; Shepard & Smith, 1986; Weber, 1977). Recommendations from these educators and organizations include the following:
* All standardized testing of children - preschool through later elementary - should cease or at least be severely reduced
* Teachers and parents should oppose all standardized testing, especially group-administered tests
* If tests are given, teachers and parents should oppose using test results alone to make any important judgments about a child
* Testing must recognize and be sensitive to individual diversity (age, ability, gender, culture, language and race)
* Tests should be used solely for their intended purpose
* Administrators and teachers must critically evaluate and select tests for validity
* Administrators and policymakers have the responsibility to ensure that schooling is both psychologically and morally prudent for children
* Administrators and teachers must be knowledgeable about interpreting test results, and cautious and conservative when sharing test results publicly.
Increasingly, professional organizations argue that standardized testing is an extremely high-stakes practice in which children's worth is "measured" by a score that is not likely to be validly interpreted. Furthermore, teachers' effectiveness is often unfairly judged on the basis of a classroom average. Annual standardized tests make no allowance for the fact that students' development and cognitive abilities in the early years are uneven. While children's developmental growth is not uniform, standardized test norms are based on average growth without regard to individual developmental patterns. Individuality is lost, and children who do not fare well on standardized tests (e.g., those with special needs or language barriers) remain "guilty until proven innocent" (Bredekamp & Copple, 1997; Meisels, 1987, 1993; Shepard, 1994; Smith, 1991).
Despite strong public support, no convincing evidence exists that standardized testing is beneficial. It can, however, increase chaos and reduce teachers' sense of efficacy (Hartman, 1991; Rosenholtz, 1989). It also narrows the scope of the curriculum, as teachers spend precious time on the mechanics of test taking and focus narrowly on test content (Haladyna, Nolen & Haas, 1991; Nolen, Haladyna & Haas, 1992).
In fact, Perrone (1991) stated that many school districts do not use any standardized testing programs, yet they can produce alternative "evidence" of students' productivity and teachers' effectiveness (see also Bredekamp & Copple, 1997; Bredekamp & Rosegrant, 1992, 1995; Meisels, 1993). The use of student portfolios, for example, is gaining many supporters, including parents. The portfolio appears to reflect directly what students are learning in ways that standardized tests never can. One advantage of the portfolio over any test is the perspective of time: a good portfolio shows a student's growth in some important ability, such as writing, over the entire school year. Laypersons and parents can assess this growth without the need for technical data.
Another advantage of the portfolio is that the students' personal written reflections it contains show how motivation and attitude can affect their growth. Thus, the portfolio yields a much richer assessment, especially when its contents are guided by a school district curriculum.
Professional organizations need to promote such alternative assessments. More important, organizations must be more active in assessment design, thereby providing school districts and instructional programs with valid and more humane methods to assess students.
Part Three: Valid and Invalid Interpretations and Uses of Test Results
The misinterpretation and blatant misuse of test scores are pervasive. Some questions to consider are:
What time span does a test score represent? Test scores can reflect the sum of a child's learning over several years. Most policymakers, however, as well as the lay public, want to know how much learning occurred in a particular school year. A standardized test given once a year is not a good measure of this kind of learning: the tests are not precise enough, nor is instruction geared to reflect exactly what the test measures. At best, we get a rough year-to-year measure that does not accurately capture the sum of school learning, because the school curriculum is seldom aligned with what the test measures. If the curriculum and instruction did match the test, we would have another kind of problem, "teaching to the test," which is discussed later in this article.
The causes of a test score or set of scores are complex and difficult to assess. There are, in fact, many causes, some of which reside in the school and some of which originate in the home or community. It would be incorrect to attribute test results solely to the teacher's expertise when so many other factors affect them. In Arizona, for example, 28 percent of the students live in poverty. It is no stretch of the imagination to reason that these students lack many of the opportunities the other 72 percent enjoy.
The very name "standardized test" conveys to laypersons a precision that no publisher would claim. Each test samples a large body of knowledge that teachers often feel compelled to teach. When teachers decide to "teach to the test," a practice that most of them deplore, the interpretation of the test score is corrupted. The standardized test score was never meant to be a precise measure of student learning; it was designed merely as a general survey instrument.
It has been widely reported that pressure from unfair accountability demands leads a number of teachers to alter their teaching in order to get high scores (Haladyna et al., 1991; Mehrens & Kaminski, 1989; Weber, 1977). While no one is proud of these practices, they are extensive and can be traced to the reductionist belief that a test score precisely measures a teacher's or school's merit. The invalid interpretation of test scores, coupled with constant public scrutiny and the demand for higher performance from chronically low-scoring students, drives some teachers into this unethical trap.
Valid Uses of Test Scores
National rank. One of the major selling points for any standardized survey achievement test is that it can rank students through percentiles. In a competitive world, such information helps shape expectations. We cannot overlook national ranking, since test scores drive decisions about college admission. Yet it would be unwise to form expectations from national rank alone without considering circumstances that might explain why certain students scored higher or lower than expected.
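Because a national rank is a percentile rank, it helps to see how relative such a number is. The Python sketch below is illustrative only and rests on assumptions not found in the original: it uses one common convention for percentile rank (the percent of a norm group scoring below a given raw score, counting tied scores as half), the function name percentile_rank is hypothetical, and both norm groups are invented numbers. Publishers derive their norms tables from large national samples, usually with smoothing, so no actual test is scored exactly this way.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Return the percentile rank of `score` within a norm group.

    Assumes the "midpoint" convention: the percent of the norm group scoring
    below `score`, plus half the percent scoring exactly `score`.
    """
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)          # number of scores strictly lower
    ties = bisect_right(ordered, score) - below  # number of scores equal to it
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm groups (invented numbers, for illustration only).
typical_group = [12, 15, 18, 20, 20, 22, 25, 27, 30, 33]
stronger_group = [20, 24, 26, 28, 30, 31, 33, 35, 36, 40]

# The same raw score of 20 earns very different ranks in the two groups.
print(percentile_rank(20, typical_group))   # 40.0
print(percentile_rank(20, stronger_group))  # 5.0
```

The last two lines carry the point: a raw score of 20 is a 40th-percentile score against one group and a 5th-percentile score against the other, although the student's performance is identical. The rank reports standing relative to the norm group, not how much the student learned.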
Future achievement. The best predictors of future test scores are past test scores, and the best predictors of student grades are prior grades. These simple truisms show that standardized test scores have a constancy that teaching does not easily change. If these tests truly sample general achievement, then one test score can generally predict future test scores and, thus, future achievement. Interventions or other changes may affect future achievement, but test scores are fairly dependable predictors of future test performance.
Curriculum evaluation. At the individual level, a set of test scores is hardly dependable enough to provide good information, but at the school or school district level, subscores such as mathematics computation and mathematics problem solving provide enough information to give central administration and teachers ideas about student performance relevant to the curriculum. If the scores for mathematics problem solving are low compared with other areas, the curriculum could be revised, which would in turn change instruction and, ultimately, future test results. This is a positive use of standardized test information. The negative side of this evaluation issue, and a common problem, is that if a district's curriculum focuses on content and processes that are not well represented on the publishers' tests, then test results can be very misleading about the effectiveness of its instructional program.
Policy decisions. The primary education policymakers are school boards and federal and state legislators. They need information to make policies and allocate resources. Standardized test scores provide information that can be useful, but also misleading. A key requirement for making interpretations about the adequacy of student learning and program effectiveness is the linkage of any standardized test to the current curriculum.
In an experiment conducted in a small Western school district, the first author met with district representatives who were trying to justify their programs to an increasingly critical public. An examination of standardized achievement test results showed the district to be slightly above average, reflecting the community's social class and economic wealth. Teachers were asked to examine each test item and classify it as either instructionally relevant or instructionally irrelevant. On relevant items, the district's performance was well above the national average; on irrelevant items, its average was slightly below the national average. The lesson to be learned: If we test what we teach, we are more likely to get positive results than if we test what we do not teach.
Grouping students for instruction. In the early years of the 20th century, schools customarily used test scores to group students for instruction. Students who needed more time and patience were grouped for remedial instruction, while advanced students were permitted to work ahead. Multiage grouping strategies seem to offer many benefits, according to Ong, Allison and Haladyna (submitted for publication). Since test scores have good predictive value regarding current and future performance, it is desirable to have good test information when determining groupings for instruction.
Diagnosis of weak areas in the curriculum. Traditional standardized test scores provide convincing breakdowns of student performance by specific topics. These breakdowns can help school districts and schools plan shifts in instructional emphasis to shore up lagging performance in critical areas. If a school's mathematics computation scores are low, for instance, renewed emphasis on computation across all grades might improve results the following year.
Invalid Uses of Test Scores
Invalid uses of test scores contribute mightily to the increasing tensions associated with standardized testing.
Cash for high test scores. Arizona is currently considering legislation that would reward teachers with a $1,200 pay bonus if their students' test scores are high. Connecting pay bonuses to students' test performance has a great many flaws, not the least of which is that some teachers and school leaders will do almost anything, even cheat, to achieve a high score (Mehrens & Kaminski, 1989; Nolen, Haladyna, & Haas, 1992). Some educators might then produce fraudulent results by excluding students who are likely to score low, reading the answers to students, or simply correcting students' answer sheets after the test. Such practices have been well documented in cases where teachers and other educators feel they have no recourse other than to tamper with the testing process.
Graduation or certification testing. Many states are experimenting with graduation or certification testing. Oregon, for example, has developed the Certificate of Initial Mastery, which requires additional qualifications to receive a high school diploma. Graduation or certification testing is certainly legitimate, because it reflects the public's current interest in having high standards in public schools. It remains to be proven, however, that making pass/fail decisions on the basis of test scores is always valid. The City of Chicago, for example, recently failed 8th-grade students on the basis of test scores from the Iowa Test of Basic Skills.
The schools' failure to provide students the opportunity to learn the Iowa Test material, however, may provide a legal basis for striking down such action. How the passing score is set is another important issue. Downing and Haladyna (1996) identified the types of validity evidence needed in such high-stakes testing and the legal implications of such testing. While states like Oregon approach high school certification using validity evidence as a guide, do all sponsors of such tests adhere to the Standards for Educational and Psychological Testing (APA et al., 1985)?
Evaluating teaching. Many researchers and experts in teacher evaluation strongly reject the idea of using test scores to evaluate teaching (Berk, 1988; Haertel, 1986). The most common argument against this practice is that students' learning can be affected by many powerful factors outside of school. Among these factors are mental ability and social capital (Coleman, 1987), a broad concept that encompasses family and home factors as well as neighborhood factors. Coleman argued that in the most extreme circumstances, no amount of teaching will overcome profoundly low social capital. Teachers in low-income areas already know this. Whatever progress they make with these vulnerable children will never earn them plaudits as teachers, even though they work heroically under adverse conditions.
Evaluating schools and school districts. This practice hardly seems defensible, because factors well beyond the control of teachers and schools influence student learning. How, then, can school districts and schools be held accountable for test scores, particularly when the standardized achievement test does not sample the domain of instruction found at the school? A good case in point arises in Arizona, where every elementary and secondary school student took a version of the Iowa test. A study by Noggle (1987) showed only a 26 percent correlation between an Iowa test and the state's content standards.
Curriculum alignment. Some schools ask teachers to abandon the regular curriculum in order to prepare students for the standardized achievement test; in other schools, the curriculum is aligned directly with the test (Nolen et al., 1992). Abandoning the curriculum seems to disrupt students and the learning process. Curriculum continuity and coherence are critical in the formative years, especially as children learn how to read and write. Allowing the test to dictate the curriculum results in a watered-down curriculum.
Teaching to the test. Nolen et al. (1992) showed that some schools abandon part of the academic year in favor of test preparation, and develop instructional packets to coach students for the test. Suspending normal instruction clearly has a negative effect on student development.
Part Four: Effects on Students and Teachers
The effects of testing on students and teachers have been studied by Smith (1991); Paris, Lawton, Turner and Roth (1991); and the authors of this article (Haas, Haladyna & Nolen, 1990; Hartman, 1991; Nolen et al., 1992).
Students are adversely affected by standardized tests in three ways: 1) the tests heighten student anxiety about the testing experience, 2) they decrease student motivation and learning and 3) by the time students reach high school, they do not believe the tests hold much value (Paris et al., 1991).
Test anxiety. Test anxiety is a chronic problem for as many as 25 percent of all students (Haladyna et al., 1991). Pressure to perform well on tests may exacerbate students' natural anxiety or create other, related problems. Hartman (1991) and Haas et al. (1990) collected many anecdotal comments from teachers who described their students' test anxiety. The tests left many of the students feeling angry, frustrated, tired and upset. One teacher complained:
"The CAT [California Achievement Test] is a nightmare of testing these children. And I mean a nightmare: Kids crying and throwing up, breaking their pencils, going to the bathroom, saying 'I don't want to come to school' after going 15 minutes of a two-week ordeal. (Hartman, 1991, p. 53)"
Another teacher said:
"The children are tense. They don't eat or sleep well the night before the test. Many parents put tremendous pressure on the children to score high. (Haas et al., 1990, p. 50)"
Nolen et al. (1992) reported a variety of student problems, including truancy, upset stomach, irritability, crying, wetting or soiling, excessive bathroom breaks, concern over the time limit, "freezing" up on timed parts of the test, headaches, hiding, refusing to take the test, and increased aggression. The prevalence of these incidents, as reported by teachers, ranged from 6.7 percent for wetting and soiling to 44 percent for excessive concern over time limits. The extent of this anxiety is considerable, and it would seem to be associated with damage to students' self-concept and to their attitudes toward school and the subject matter.
Loss of valuable learning time. Another dimension of this problem is that students spend an enormous amount of time studying for, and taking, this test. Nolen et al. (1992) reported that only 12.5 percent of the elementary teachers surveyed spent no time preparing students for the test, while others reported spending up to two months in preparation. Although this time may be counted as learning time, it takes away time that students could spend on curriculum-appropriate learning. A junior high school teacher summarized this problem in the following way:
"Because of the standardized test, I have found that my creativity and flexibility as a teacher have been greatly reduced. I spend a great deal of time zeroing in on skills that I know are on the test. This leaves only a bare minimum of opportunity to explore writing and enrichment reading. In reviewing the test I find that what I am going over is the same thing that the teachers in one grade lower and one grade higher are covering as well. This makes for a very redundant curriculum. Also, the skills we emphasize before the tests do help them to perform better on a day-to-day basis. (Haas et al., 1990, p. 63)"
The cognitive losses caused by over-studying test-specific material may be difficult to assess. The point is that class time spent learning test-specific material is time taken away from other material that needs to be covered.
Inadequate effort or inattention. A number of reports focus on students' motivation to perform. A 7th-grade teacher reported:
"This testing is unacceptable. In my homeroom this year out of 28 students, 19 showed up for part of the testing. Many of those students "bubbled" randomly. The situation exists in many classrooms in this district. The students are highly transient and from poor homes. They expect to do poorly and don't try. (Haas et al., 1990, p. 62)"
Another teacher said:
"Any little thing could distract them. So it might be a pencil that dropped. It might be something, say we're sitting here and there's a bird out there on the wire and they might look over and see that bird. For ten questions they're looking at the blue jay on the wire and mark anything! . . . Here [the administration] has a test score that they are using to evaluate me on the entire school year and it's all based on the blue jay that was sitting on the rail out there! (Hartman, 1991, p. 66)"
Paris et al. (1991) reported that lack of student effort is a problem that increases with age. Students in the early grades think that the test is relevant to measuring what they have learned, but by high school most students know that the test does not reflect their intelligence or learning and has no bearing on their future. They believe the tests that really count toward their future are the college admissions tests: the ACT or SAT. Reports indicate that students pay less attention to the test because they are unmotivated or find the tasks too hard.
Effects on Teachers
This section addresses problems that afflict teachers under the conditions introduced by testing. Mary Lee Smith (1991) put it succinctly:
"To understand the perceived effects of external testing on teachers, one needs only to ask. Their statements on questionnaires, in interviews, and during conversations in meetings and lounges reveal the anxiety, shame, loss of esteem, and alienation they experience from publication and use of test scores. (p. 8)"
Teachers suffer when standardized tests are used as the sole indicator of student learning. Haas et al. (1990) interviewed nearly 400 teachers, whose comments echoed much of what Smith (1991) observed.
Invalidity of test interpretation and use. As stated earlier, test results often are interpreted and used in an invalid manner. If a school's scores are lower than parents expect, the teachers may be shamed into thinking that they have not done a good enough job. Even teachers who are lucky enough to teach in affluent areas, where test scores are generally very high, may realize that they had very little to do with that achievement. Using test scores without context brings shame and embarrassment to many teachers. They know that it is invalid to use these test scores to hold teachers accountable, but they are powerless in the face of media commentary and political attacks.
Another factor contributing to the invalidity problem is the presence of children from different cultures who may have English language deficiencies that inhibit learning and affect the measurement of their learning. A 2nd-grade teacher put it this way:
"My students are mostly Native American and Hispanic [of] low SES [socioeconomic status], and as a result are exposed to many hardships. It hardly seems fair to compare their scores to students of the same age who have grown up with a well developed foundation in the English language. I think a national standardized norm-referenced test is a good idea for some things but the results should be taken into consideration with cultural and SES factors. (Haas et al., 1990, p. 102)"
A 6th-grade teacher said: "How can the test be appropriate for my Navajo and Hispanic students when the only time they speak English is when they are in school?" (Haas et al., 1990, p. 36).
Teachers seem to be divided roughly into two camps. Both camps acknowledge that the tests poorly reflect what they teach and how well they teach. Unfortunately, members of the first camp may resort to some type of strategy to raise student performance on these invalid tests, even to the point of cheating. Members of the second camp simply ignore the test and teach according to their beliefs. The second group, while having to endure criticism for low test scores, may still be more satisfied.
Curriculum mismatch. Most teachers recognize that mandated publishers' tests do not match well with state-dictated standards, district curriculum, textbook series or what they themselves deem to be appropriate content. One 1st-grade teacher said:
"I feel that by mandating the standardized test in the primary grades, we are allowing test makers to design curriculum, at least in my district. A much better measure of what children are learning is through the district's curriculum referenced tests. Much valuable time is being lost to test preparation. (Haas et al., 1990, p. 37)"
Because the state mandates a test that does not match what teachers are expected to teach, teachers must either ignore the state's test and do what they think is professionally and responsibly correct, or cave in and teach to the test. This dilemma breeds the apathy that Smith (1991) summarizes as, "Why should we worry about these scores when we all know they are worthless?" (p. 9).
Teacher distress. As reported in the previous section, students suffer from the testing experience. Teachers see firsthand the effects of testing on students, especially young students. These teachers also see the effects of testing on themselves, their colleagues and even the administrators. In the words of a 6th-grade teacher:
"The test adds stress to everyone. In this district administrators are made to think that the test reflects on how well they do their job, which makes them more concerned for themselves than for the children. The children in this school are scared to death because they have been warned by their parents to do well and make them proud. The parents take the results of the test personally, as if they were being evaluated. (Haas et al., 1990, p. 49)"
This feeling of remorse about what is happening to students and colleagues seems to gnaw at these teachers, and a sense of hopelessness seems to pervade the atmosphere. It is hard for teachers to be satisfied with their chosen profession when published test scores can be used against them in so many ways.
Part Five: Where Do We Go from Here?
As we have learned, publishers' standardized achievement tests are not going away. In fact, standardized testing is gaining in popularity, despite its obvious harmful effects. Educators must continue to inform the public about these tests' negative effects, without compromising their integrity or commitment to sound curriculum, good instruction and appropriate assessment. Audiences for this message include: 1) state legislators and school board members, 2) the media, 3) parents, 4) the general public and 5) the few fellow educators who do not understand this message. It is important that educators be united about standardized testing.
What Message Do We Send To These Audiences?
To reiterate, accountability is a good idea. It is important to know where each student stands relative to standards, and to know what to do to help each student improve. Simplistic uses of test scores that are only remotely connected to classroom teaching will not achieve this accountability. Test scores alone are not the complete picture, because they ignore environmental factors. What the authors advocate is a broad assessment that includes worthwhile outcomes of student learning, as well as information about factors within and outside of school that affect learning. Additionally, test batteries should be aligned with the school district's curriculum and with classroom instruction.
To advance this continuous dialogue with the public regarding students' education, we support four propositions that guide us toward responsible standardized testing:
Test score interpretations must be valid. Learning is a lifetime endeavor, and these standardized survey tests sample only a small part of a large domain of knowledge. When we see scores, we need to ask what the test measures and what factors probably contributed to that level of performance. In other words, a test score reflects performance on only a small sample from a very large domain. We must squarely face the fact that these tests do not adequately reflect a school district's curriculum or the instructional emphasis in a particular classroom. Because a score reflects learning accumulated over a lifetime, we should consider which factors contributed to high or low performance. Some of these factors are: the quality and quantity of instruction, motivation, cognitive ability, the amount and quality of family support, attitude, the adequacy of instructional materials, and the quality of educational leadership, both within the school and at the district level. Do not make judgments from a test score alone. Be careful about assigning a single cause to test scores, such as holding the teacher solely responsible for student outcomes. Such irresponsible interpretation unfairly judges teachers and children.
Test score uses must be valid. Insist that test scores be used only for valid purposes. Ask for evidence about the validity of any test use. Evaluating teachers on the basis of test scores is invalid, because no evidence can show that the test is well matched to instruction, nor can it be shown whether other, non-school factors were taken into account. Test scores are simply inadequate to the task of evaluating teaching and teachers. Tests that are supposed to hold students accountable for their education are defensible if students are given adequate preparation for the test, as well as remedial instruction when necessary.
Keep standardized tests standardized. Once a standardized test's results are interpreted accurately and used fairly and defensibly, the administration of the test ought to be standardized as well. Too many reports from all parts of the United States describe educators who capitulated to accountability pressures by doing something unethical. To reiterate:
* Do not teach to the test. Do not alter the curriculum to conform to the test's content.
* Do not change the test administration times or conditions.
* Policy should spell out who is to be tested. Excluding certain students from testing is one way to inflate test scores, which may lead to invalid interpretations.
* Excluding students from testing is sometimes appropriate, however, as when the test result would not validly reflect a student's knowledge. Language barriers, emotional difficulties, health and illness factors, and physical and mental handicaps are prominent factors that may hinder students' ability to perform well. Exclusion criteria should be standardized from class to class, school to school, and district to district.
Examine and evaluate the consequences of standardized testing. Students should not take tests unless the interpretations and uses of test results are fair and useful. If it is shown that the consequences of testing have harmful effects on students, then such testing cannot be justified. As pointed out in Part Three, many educators have argued that standardized testing below grade 3 seldom provides true measures of student ability or development.
In addition, the reporting of test scores by racial or ethnic categories seldom serves useful purposes. These reports often perpetuate negative stereotypes of minority groups, masking relevant factors, such as poverty, that may account for poor performance. If test interpretation and use lead to unequal treatment of students instead of providing equal opportunities, then we must take steps to resolve the problem. More and more states, for example, are adopting stiff high school graduation requirements. If students are not instructed adequately, then large numbers of students will not complete high school. Test scores already tell us that the students most likely to fail these graduation tests will be those living in poverty and those with the added task of trying to learn to read, write, speak and understand English. As we examine and evaluate the consequences of standardized testing, we will probably continue to offer the public what it demands, but we should insist that the testing is done in a manner that does not harm those whom the tests are intended to serve.
American Psychological Association, American Educational Research Association, National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Association for Childhood Education International/Perrone, V. (1991). On standardized testing: A position paper. Childhood Education, 67, 131-142.
Association for Supervision and Curriculum Development. (1987). Testing concerns. In Forty years of leadership: A synthesis of ASCD resolutions through 1987 (pp. 17-19). Alexandria, VA: Author.
Berk, R. A. (1988). Fifty reasons why student achievement gain does not mean teacher effectiveness. Journal of Personnel Evaluation in Education, 1(4), 345-364.
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives. New York: D. McKay.
Bredekamp, S., & Copple, C. (Eds.). (1997). Developmentally appropriate practice in early childhood programs (Rev. ed.). Washington, DC: National Association for the Education of Young Children.
Bredekamp, S., & Rosegrant, T. (Eds.). (1992). Reaching potentials: Appropriate curriculum and assessment for young children: Volume 1. Washington, DC: National Association for the Education of Young Children.
Bredekamp, S., & Rosegrant, T. (Eds.). (1995). Reaching potentials: Transforming early childhood curriculum and assessment: Volume 2. Washington, DC: National Association for the Education of Young Children.
Coleman, J. S. (1987). Families and schools. Educational Researcher, 16, 32-38.
Council for Exceptional Children. (1993). Division for Early Childhood recommended practices: Indicators of quality in programs for infants and young children with special needs and their families. Washington, DC: Author.
Cremin, L. A. (1964). The transformation of the school: Progressivism in American education, 1876-1957. New York: Vintage Books.
Deffenbaugh, W. S. (1925). Uses of intelligence tests in 215 cities. City School Leaflet No. 20. Washington, DC: Bureau of Education, U.S. Department of the Interior.
Downing, S. M., & Haladyna, T. M. (1996). Model for evaluating high-stakes testing programs: Why the fox should not guard the chicken coop. Educational Measurement: Issues and Practice, 15, 5-12.
Haas, N. S., Haladyna, T. M., & Nolen, S. B. (1990, April). War stories from the trenches: What teachers and administrators say about the test. Paper presented at a symposium at the annual meeting of the National Council on Measurement in Education, Boston.
Haertel, E. (1986). The valid use of student performance measures for teacher evaluation. Educational Evaluation and Policy Analysis, 8, 45-60.
Haertel, E., & Calfee, R. (1983). School achievement: Thinking about what to test. Journal of Educational Measurement, 20, 119-130.
Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence Erlbaum Associates.
Haladyna, T. M. (in press). Review of the Stanford Achievement Test (8th ed.). Mental Measurements Yearbook.
Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 20(5), 2-7.
Hartman, J. A. (1991). How mandated student assessment programs affect kindergarten teachers: Two steps forward, three steps backward. Unpublished doctoral dissertation. Urbana, IL: The University of Illinois.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York: Free Press.
Jensen, A. R. (1980). Bias in mental testing. New York: The Free Press.
Kamii, C. (Ed.). (1990). Achievement testing in the early grades: The games grown-ups play. Washington, DC: National Association for the Education of Young Children.
Mehrens, W. A., & Kaminski, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless, or fraudulent? Educational Measurement: Issues and Practice, 8, 14-22.
Meisels, S. J. (1987). Uses and abuses of developmental screening and school readiness testing. Young Children, 42, 4-6, 68-73.
Meisels, S. J. (1993). Remaking classroom assessment with The Work Sampling System. Young Children, 48(5), 34-40.
National Association for the Education of Young Children. (1988). Position statement on standardized testing of young children 3 through 8 years of age. Young Children, 43(3), 42-47.
National Association for the Education of Young Children/Council for Exceptional Children. (1996). Guidelines for preparation for early childhood professionals. Washington, DC: Authors.
National Association of Early Childhood Teacher Educators. (1989). Resolution: Testing in the early years. The Journal of Early Childhood Teacher Education, 10(1), 16-17.
National Association of Elementary School Principals. (1989). Standardized tests. In Platform 1988-1989 (p. 7). Alexandria, VA: Author.
National Association of State Boards of Education. (1988). Right from the start. Alexandria, VA: Author.
National Commission on Testing and Public Policy. (1990). From gatekeepers to gateways: Transforming testing in America. Chestnut Hill, MA: Boston College.
National Council of Teachers of English. (1989). Testing and evaluation. In NCTE forum: Position statements on issues in education from the National Council of Teachers of English (pp. VI:1-VI:4). Urbana, IL: Author.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Education Association. (1972). Moratorium on standardized testing. Today's Education, 61, 41.
Noggle, N. L. (1987, October). Report on the match of the standardized tests to the Arizona Essential Skills. Tempe, AZ: College of Education.
Nolen, S. B., Haladyna, T. M., & Haas, N. S. (1992). Uses and abuses of achievement test scores. Educational Measurement: Issues and Practice, 11, 9-15.
O'Dell, C. W. (1928). Traditional examinations and new type tests. New York: Century.
Office of Educational Research and Improvement. (1996). Youth indicators 1996. Washington, DC: U.S. Department of Education.
Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions. Washington, DC: Author.
Ong, W. S., Allison, J. M., & Haladyna, T. M. (submitted for publication). A comparison of reading, writing and mathematics achievement in comparable single-age and multi-age classrooms.
Paris, S., Lawton, T. A., Turner, J. C., & Roth, J. L. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20, 12-20, 40.
Perrone, V. (1976). On standardized testing and evaluation. Olney, MD: Association for Childhood Education International.
Perrone, V. (1977). The abuses of standardized testing (Fastback 92). Bloomington, IN: Phi Delta Kappa Educational Foundation.
Perrone, V. (1981). Testing, testing, and more testing. Childhood Education, 58, 76-80.
Perrone, V. (1991). Standardized testing. ERIC Digest. Urbana, IL: ERIC Clearinghouse on Elementary and Early Childhood Education.
Popham, W. J. (1995). Classroom assessment: What teachers need to know. Boston: Allyn & Bacon.
Rose, L. C., Elam, S. M., & Gallup, A. C. (1997). The 29th annual Phi Delta Kappa/Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 79(1), 41-58.
Rosenholtz, S. J. (1989). Teachers' workplace: The social organization of schooling. New York: Longman.
Shepard, L. A. (1994). The challenges of assessing young children appropriately. Phi Delta Kappan, 76(3), 206-212.
Shepard, L. A., & Smith, M. L. (1986). Synthesis of research on school readiness and kindergarten retention. Educational Leadership, 44, 78-86.
Smith, M. L. (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20, 8-11.
Weber, G. (1977). Uses and abuses of standardized testing in the schools. Washington, DC: Council for Basic Education.
Thomas Haladyna is Professor, Educational Psychology, Nancy Haas is Associate Professor, Instructional Design, and Jeanette Allison is Assistant Professor, Early Childhood Education, Arizona State University West, Phoenix.
Date: Aug 6, 1998