The State of Performance Assessments

Beyond short-term burdens and costs, they could be a key element in standards-based reform

From national and state policymakers to educators in local districts, most of us are committed to helping all children achieve high standards of performance and to preparing them to be successful citizens in the world of today and tomorrow. Assessment and accountability are seen as driving forces in attaining these goals, and indeed nearly every state and many school districts are involved in developing new assessment systems to gauge student progress and support accountability.

But just as today's complex world demands new standards of accomplishment for students, so too do those new standards demand new forms of assessment. A rationale exists for such new forms--what is broadly termed "performance assessment"--and evidence supports the claims.

Communicating Reality

"What will be on the test, teacher? What do I need to do to get an A, to pass or to otherwise get by?" These common student refrains reflect the essence of why assessment is integral to achieving high standards: Tests communicate what is important to learn, and if those who are tested care about the results, they'll be motivated to perform. We see this phenomenon at all levels within the educational system--as students cram for final exams, for instance, or as teachers gear their curriculum to what's tested, be it the state assessment or the Advanced Placement exam.
It's so pervasive, in fact, that assessment researchers have coined the acronym WYTIWYG: "What you test is what you get." Standards-based assessment, in part, is based on the notion of WYTIWYG. The idea is not so much that tests, strictly speaking, ought to drive teaching and learning, but rather that our assessments should reflect our standards for student performance. In fact, these expectations should guide both instruction and assessment. As Lauren and Dan Resnick, leading researchers at the University of Pittsburgh, have put it, the idea is to construct assessments that are "worth teaching for" because they embody the standards we hold for student performance.

We purposely use assessments to communicate both a vision and the reality of what's expected of students, to illustrate models for teaching and learning practice, to provide useful feedback to support improvement and to motivate performance. Assessments can serve these purposes only if they are well aligned with standards for student performance. Herein lies an important rationale for using performance assessments.
If we accept WYTIWYG and its opposite--that what is not assessed is ignored--then our assessments, like our standards, must reflect the complex thinking and problem solving that students will need for future success. Life is not multiple choice. As children and as adults we must be able to apply what we know to create solutions, approach and solve novel problems and communicate effectively, to name just a few areas that call for something other than multiple-choice assessment.

The essence of performance assessments--whether in the form of open-ended questions, essays, experiments or portfolios--is that they ask students to create something of meaning. A good performance assessment taps complex thinking and/or problem solving, addresses important disciplinary content, invokes authentic or real-world applications and uses tasks that are instructionally meaningful (see related story, page 20). Furthermore, because they require students to construct a unique answer, performance assessments typically are scored by humans exercising judgment rather than by machines.

Evidence of Quality

While the rationale for performance assessment rests largely on its consequences for teaching and learning, this does not mean that the technical quality of the measures ceases to be essential. We value assessments in large part because they give us accurate information for planning, decision making, grading and other purposes and because they help ensure the fairness of our judgments and actions. Lacking sufficient technical quality, however, assessment results will provide misinformation about individual students, classrooms, schools and districts.
Sound performance measures require evidence of validity and reliability, including the degree to which intended uses of the assessment are justified and the degree to which scores are free of measurement error.

Are performance measures reliable? Just as we expect our height to be the same regardless of whose yardstick is used, so too do we want our measures of student performance to be reliable and consistent. Because performance assessments typically require that students' responses be evaluated by scorers (or raters), they pose special reliability challenges. Raters judging student performance must be in basic agreement, within some tolerable limits, as to what scores should be assigned to students' work. Otherwise, the scores are a measure of who does the scoring rather than of the quality of the work.

This is not only a technical psychometric issue but has important implications for school reform as well. When scorers (typically teachers) do not agree on what score should be given, this indicates that no consensus exists on the meaning of good performance or of the underlying standard(s) and that no agreement exists on expectations for students. Yet shared standards and expectations are basic tenets of standards-based reform. Research suggests that in the early years of a new assessment, achieving reliable scoring can be a challenge because it occurs while the processes of building consensus on standards and expectations and of encouraging ownership and support for the new assessment are still incomplete. The technical process of reaching high levels of agreement, however, is fairly well understood.
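Rater agreement of this kind is commonly summarized with a few simple statistics. The Python sketch below is illustrative only. It assumes two raters score the same set of papers on an integer rubric scale (say, 1 to 4).

The following are hypothetical choices, not CRESST procedures or any state's published standards:

* the function names;
* the one-point tolerance for "adjacent" agreement;
* the retraining thresholds.

Cohen's kappa, a standard chance-corrected agreement index, is included for comparison. The article itself does not name a particular statistic.

```python
from collections import Counter


def agreement_rates(scores_a, scores_b, tolerance=1):
    """Return (exact, adjacent) agreement proportions for two raters.

    'Adjacent' means the two scores differ by at most `tolerance` rubric points.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of papers")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, adjacent


def cohens_kappa(scores_a, scores_b):
    """Exact agreement corrected for the agreement expected by chance alone."""
    n = len(scores_a)
    observed, _ = agreement_rates(scores_a, scores_b)
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: probability that both raters, scoring independently
    # with their own observed score distributions, land on the same score point.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:  # both raters gave every paper the same single score
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical calibration thresholds, for illustration only.
MIN_EXACT = 0.70
MIN_ADJACENT = 0.95


def flag_for_retraining(rater_scores, anchor_scores):
    """Check a rater against pre-scored anchor (benchmark) papers."""
    exact, adjacent = agreement_rates(rater_scores, anchor_scores)
    return exact < MIN_EXACT or adjacent < MIN_ADJACENT


if __name__ == "__main__":
    rater_1 = [3, 2, 4, 1, 3, 2, 4, 3]  # made-up scores on a 1-4 rubric
    rater_2 = [3, 3, 4, 1, 2, 2, 4, 3]
    print(agreement_rates(rater_1, rater_2))          # (0.75, 1.0)
    print(round(cohens_kappa(rater_1, rater_2), 2))   # 0.65
    print(flag_for_retraining(rater_2, rater_1))      # False
```

In the made-up example, the two raters disagree by one point on two of eight papers. The sketch reports 75 percent exact agreement, 100 percent adjacent agreement and a kappa of about 0.65. Whether such figures are acceptable remains a judgment for the people running the scoring; the thresholds in the code only stand in for that decision.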
Reaching that agreement requires good training and scoring procedures, including well-documented scoring rubrics exemplified by benchmark or anchor papers; ample opportunities for scorers to discuss and practice applying the rubric to student responses; and systematic checks before and during the scoring itself to ensure that evaluators are consistent, with retraining as necessary.

The Meaning of Scores

When we assess students, we typically want to use the results to generalize about their learning or capability--whether the content of a unit has been mastered, whether progress has been made in the discipline, whether written communication skills have improved and so forth. Yet we know from research that a student's performance will vary depending on which specific tasks are included on the assessment. Research by Richard Shavelson and others has shown that a student's success on one mathematics problem-solving task does not mean the student will do well on a second such task, and the students who perform well on one task may not be the same students who perform best on the second. This variability means that it takes a number of tasks to get an accurate estimate of student achievement in a particular domain, such as scientific problem solving or algebraic
thinking. Research by Shavelson, Eva Baker, Gail Baxter, Robert Linn and others suggests that quite a range of tasks may be necessary to obtain reliable individual estimates--from 5 to 23 tasks, depending on the specific study.

Because of the time demands of performance assessments and the costs of scoring, it typically is not feasible to have students respond to enough different tasks to ensure adequate evaluation. To produce accurate individual scores, most assessment systems thus rely on a combination of measures, some of which are multiple choice. Others may be constructed responses administered in relatively short periods of time, and still others may require more extended administration time.

Where individual results are not required--for example, in assessments designed to evaluate the effectiveness of the school curriculum or where school or grade-level progress is the issue--matrix sampling can be used to achieve accurate school-level results. In this approach, different groups of students respond to different tasks. This process covers more of the curriculum while still minimizing the testing time required of individual students.

Contributing to Reform

As we have seen, the time demands of authentic tasks and the special and consistent scoring they require pose technical and practical challenges. Yet these same characteristics are essential to the instructional value of performance assessments and their leverage in standards-based reform.
Not only do the assessments communicate what's important to learn, but they also model the kinds of instructional tasks and processes that teachers should use in their classrooms. The scoring rubrics communicate what is expected in student performance, and research has shown that teachers adapt and use these rubrics in their classroom assessments. Involving teachers in the development and scoring of assessments thus serves multiple purposes: Teachers can better understand and come to consensus about what is expected, gain familiarity with examples of standards-based practice and gain new standards-based ways to assess their students' work.

The effects of performance assessments on curriculum and teaching, and the value of scoring and scoring rubrics, are well documented in the research.

* Effects on curriculum and teaching. A growing body of research suggests that performance assessments do indeed influence classroom instruction. Studies by Daniel Koretz, Brian Stecher, Mary Lee
Smith, Lorraine McDonnell, Hilda Borko and Shelby Wolf, which examine assessment systems in Vermont, Maryland, Arizona, North Carolina and Kentucky, concluded that educators modify their classroom practices based on state assessments and standards. In line with the state's standards-based reform goals, teachers in Kentucky, for example, reported increased coverage of topics measured by the state assessment system and increased use of standards-based teaching and learning practices. As a result, pedagogy changed to include such practices as applying mathematics concepts to real-world problems, requiring students to explain their work and introducing novel, non-routine problems. One Kentucky math teacher said she and her colleagues "devote a lot of time to talking about math and talking about solving problems and solving problems in different ways. ... I'd say our computation instruction is maybe 20 or 30 percent of what we do, whereas before it was ... about 80 percent of what we did. So that's flipped completely."

* New ways of thinking about teaching. Several studies concur
that involving teachers in scoring student work is a powerful professional development experience. Teachers report that the process opens new windows of understanding: It suggests new ideas for classroom activities, reveals potential gaps in their classroom curriculum and, perhaps most importantly, yields insights into their students' strengths and weaknesses. One teacher put it this way: "Because we're listening to what they have to say instead of just grading the numbers on a page ... I see a lot of kids that have some strength in math that just 20 computation problems would not pick up."

* The power of rubrics. The rubrics used during scoring sessions have important effects. Teachers not only have a tangible tool to evaluate their students' work, but the tool itself provides a way of thinking about what they are trying to accomplish and a focus for their planning and teaching. One teacher involved in scoring students' literature-based writing assessments noted, "Rubrics clarified what I wanted so I could plan more ... structured lessons in which students clearly knew what was expected ... and what had to be done in order to succeed." Access to such rubrics seems particularly important.
Researchers Geoffrey Saxe, Maryl Gearhart and Megan Franke have shown that while teachers may be eager to adopt open-ended problems in their curriculum, they typically lack effective ways of looking at the work to evaluate student understanding, diagnose strengths and weaknesses and provide useful feedback to students.

The Downside

Such benefits do not come without costs. Most studies have found that performance assessments increase burdens and pressures on teachers and schools.

* Short-term burdens. Prime among these problems is the time element. Teachers need time:
  - to become familiar with the new assessments and their administration;
  - to understand how tasks are developed and scored;
  - to apply criteria for assessing students' work;
  - to develop the content and pedagogical knowledge to change their practice;
  - to reflect upon and fine-tune their instructional and assessment practices.
These demands and the professional development required to implement performance assessments represent important costs, but as noted above, they are investments that can have important payoffs in classroom practice. Strong and continuous leadership support is essential for standards-based reform to succeed.

* Costs. It is difficult to provide precise cost estimates.
Actual costs will vary depending on the assessment--the materials and time required, the length of student responses, whether the responses are scored by one or two raters and the reports that are provided, among other considerations. Nonetheless, the costs of performance assessment are clearly much higher than those of traditional multiple-choice tests. Scoring costs are a prime example. Researcher Brian Stecher estimates the cost of scoring hands-on science tasks comprising one class period at $4 to $5 per student. In comparison, the complete battery of the Iowa Tests of Basic Skills, a nationally standardized multiple-choice test, costs about $1 per student. Beyond scoring, performance assessments that involve hands-on materials and exercises carry substantially higher costs still.

Improving Learning

The ultimate question, however, is how and whether the use of performance-based assessment affects student learning. Researchers disagree on this point. On the one hand, where performance assessments have been introduced, student scores are initially very low but rise over time, suggesting that student learning has increased. On the other hand, the gains are not necessarily evident in other measures of achievement.
For example, while Kentucky's statewide assessment showed dramatic improvement over the first two biennia, results of the National Assessment of Educational Progress over the same period showed only very slight improvement. Some divergence would be expected because of differences in the two tests, but the extent of the discrepancy in some areas, such as middle school math, raised questions about the credibility of the gains and led to charges of inflated test scores. In fact, Kentucky teachers surveyed by Dan Koretz and colleagues questioned whether the gains represented meaningful improvement in learning. Other factors contributing to the gains might include students' greater familiarity with, and practice on, the new performance item formats. In any event, research indicates that improvement in performance assessment scores does not necessarily mean that results on other tests also will improve.

Research shows that teachers and principals take new performance assessments and the goals they represent seriously and often try to incorporate new pedagogical practices into their teaching. Teachers attempt to engage their students in the kinds of activities they see embodied in the assessment.
However, in the absence of sustained professional development, these classroom innovations will likely lead to superficial changes in practice that have little impact on student learning.

Creating Capacity

Authentic assessments may create the will to change but not the capacity to do so. Teachers cannot become experts in new types of standards-based teaching and assessment without serious opportunities to learn about, observe, apply, reflect upon and refine their use of these new strategies over a sustained period of time. Teachers also need time with other teachers to score and analyze student assessments to achieve mutual understanding of the meaning of the standards to be used.

Projects that engage teachers in serious professional development around these issues document a variety of important effects on classroom teaching and learning. In the Los Angeles Unified School District, for example, teacher researcher Charlotte Higuchi and her colleagues involved teachers in a multiyear project developing standards-based curriculum and assessment units at various grade levels in language arts. The voices of teachers in that project bespeak
the power of a standards-based assessment system that included performance assessments:

* Teacher No. 1: "Before this project ... I was very much into the process of the teaching, but I totally overlooked that what I taught wasn't the point. It's where it took the child that's important. So that was the shift, from being focused only on the process of teaching to thinking about what it was that I wanted children to know and then designing assessments and curriculum to support the standards."
* Teacher No. 2: "My views and my teaching methods have undergone great change. My planning and lessons are much tighter because my goals are clearer. ... Now my lessons are more focused, the expectations are clearer to my students [because of] rubrics, and student outcomes have improved."
* Teacher No. 3: "It raised my expectations and I was pleased to see the students meet those expectations. A lot of teachers at my school say that our students can't do this kind of work. They did it and quite well too!"

It is clear that performance assessment alone will not solve educational problems, but it is an important first step. Ultimately, it is these kinds of transformations, and not assessments per se, that are needed to enable all students to achieve the high standards we hold for them.

Joan Herman is the associate director of the National Center for Research on Evaluation, Standards and Student Testing at the UCLA Graduate School of Education and Information Sciences, Box 951522, Los Angeles, Calif. 90095. E-mail: joan@cse.ucla.edu

What Constitutes a Quality Assessment?
The following criteria were developed by CRESST to help educators evaluate a performance assessment. Further details about these criteria are available from the National Center for Research on Evaluation, Standards and Student Testing at UCLA, Box 951522, Los Angeles, Calif. 90095.

Consequences. To what extent do the assessments model and encourage good teaching practice? Are intended positive consequences achieved? What are the unintended negative consequences?

Alignment. Does the assessment reflect content and performance standards that have been established for students? Does the assessment measure important curriculum goals?

Fairness. Does the assessment enable students, regardless of race, ethnicity, gender or economic status, to show what they know and can do? Have students had the opportunity to learn what's being assessed?

Transfer and Generalizability. Will the results of an assessment support accurate generalizations about student achievement?

Content Quality. Is the assessment content consistent with the best current understanding of the subject matter? Does it reflect the enduring themes and/or priority principles, concepts and topics of the discipline?

Cognitive Complexity. Does the assessment require students to use complex thinking and problem solving?

Content Coverage. To what extent does the assessment cover the key elements of content standards and/or curriculum?

Linguistic Appropriateness. Does the assessment allow students to display what they know and what they are able to do without being swamped
by language demands not required by the content?

Meaningfulness. Do students find the assessment tasks realistic and worthwhile?

Practicality and Cost. Is the information about students worth the cost and time to obtain it?

For Further Reading

Joan Herman suggests the following resources for administrators who would like to read more about performance assessments:

* "New Directions in Student Assessment," edited by Pamela Aschbacher, Theory Into Practice, Autumn 1997.
* "Assessing Student Learning: New Rules, New Realities," edited by Ron Brandt, 1998, available from Educational Research Service, Arlington, Va., 703-243-2100.
* "A Policy Maker's Guide to Standards-Led Assessment," by Robert Linn and Joan Herman, 1997, available from Education Commission of the States, Denver, Colo., 303-299-3692.
* "Understanding by Design," by Grant Wiggins and Jay McTighe, 1998, available from Association for Supervision and Curriculum Development, Alexandria, Va., 800-933-2723 or 703-578-9600.

Other resources on performance assessment are available from the CRESST Web site, www.cse.ucla.edu