Teacher-developed mathematics performance assessments in the context of reform-based professional development.

One of the disappointments associated with the mathematical reform movement is the increasing mismatch between the improvements made in curriculum and instruction and prevalent assessment modes (Firestone & Schorr
NCTM), recent research into the teaching and learning of mathematics that has provided detailed consideration of its "socially situated nature" has not focused to the same degree on mathematics assessment (Morgan, 1998). Therefore interest has increased in matching assessment methods to developments in curriculum. There is a pressing need to assess a much wider range of abilities than has been the case heretofore, including problem posing and solving, representing, and understanding. Traditional mathematical assessment has frequently relied upon the ability of students to display behavior that matches their assessor's expectations rather than on any underlying understanding (Morgan, 1998). These traditional assessments communicate that mathematics is an endeavor that involves determining a quick answer using a preexisting, memorized method (Bell, 1995; Clarke, Clarke & Lovitt, 1990; Hancock & Kilpatrick
, 1993; Wheeler, 1993). In contrast, assessment data that provide direct information about improving the learning experience increase legitimate mathematical learning that is thorough and connected (Black & Wiliam, 1998; NCTM, 1995). The measurement of decontextualized technical skills should be replaced with measures that reflect what is known about what it means to know and do mathematics, i.e., that capture the degree of acquisition of both conceptual and procedural knowledge and the connections between them, and that assess the solving of worthwhile problems, the communication and justification of conjecture, and the representation of mathematical thinking in multiple ways (NCTM, 2000). As Ridgeway
change in assessment systems will foster concurrent improvement in professional and curriculum development" (p. 2). The 1989 Standards states, "As the curriculum changes, so must the tests. Tests also must change because they are one way of communicating what is important for students to know.... In this way tests can effect change" (pp. 189, 190). Both the Assessment Standards (NCTM, 1995) and the Principles and Standards (NCTM, 2000) state that assessment tasks communicate what types of mathematical knowledge and performance are valued (p. 22). Therefore, standards-based assessment complements standards-based instruction (Dunbar & Witt, 1993).

Paralleling reform in mathematics curriculum and instruction have been calls to authenticate student assessment in all subject areas. Terms such as "Authentic Assessment," "Alternative Assessment," and "Performance Assessment" have become banners to rally focused efforts to change paradigms about the nature and purpose of assessment. According to McMillan (2004), a performance assessment is "... one in which the teacher observes and makes a judgment about the student's demonstration of a skill or competency
in creating a product, constructing a response, or making a presentation. They possess several important characteristics:
1. Students perform, create, construct, or produce something.
2. Deep understanding and/or reasoning skills are assessed.
3. They involve sustained work.
4. They call on students to explain, justify, and defend.
5. Performance is directly observable.
6. They involve engaging ideas of importance and substance.
7. There is a reliance on trained assessor's judgments for scoring.
8. Multiple criteria and standards are prespecified and public.
9. There is no single correct answer.
10. The performance is grounded in real-world contexts and constraints" (p. 213).

Although defined in many ways, performance assessment that is designed according to the above criteria provides many benefits that are closely tied to instruction, including the integration of instruction and assessment and the tying of assessment to real-world challenges and reasoning processes, thus helping instruction target more important outcomes, providing an alternative to traditional assessment, and authenticating the assessment process (Wiggins, 1993). In other words, performance assessments conceptualized in this manner are legitimately alternative and authentic in nature. Mathematics educators have joined in calling for the use of performance assessments that incorporate the aspects identified by McMillan in mathematics as both a means to align assessment with new reform curricula (Firestone & Schorr, 2004; Shepard, 2000) and as a means to improve the links between teaching practice and assessment (Pellegrino, Chudowsky, & Glaser, 2001).
Although performance assessments provide benefits not previously available through more traditional assessment procedures, they are not without limitations. They are usually expensive in terms of both the amount of time required and the materials needed to administer them. In addition, the results obtained from them can be subjective, an issue of interrater reliability. Finally, those results often provide an inadequate basis for generalizing across tasks.

Statement of Purpose

The purpose of this study was to describe the principles that guided the teaching of elementary teachers in the development and administration of reform-oriented, grade level-specific performance assessments in the context of a professional development project, and to assess the internal consistency of the ratings of student performance obtained from the teachers in the course of providing that teaching.

Assessment Development

As an integral part of a two-year professional development program, 85 elementary teachers representing virtually the entire faculties of three schools and an additional smattering of teachers from fifteen other schools, all in central Utah, were involved in the development and implementation of a performance assessment system. We reasoned that this involvement would not only provide a natural context to enhance their understanding of appropriate assessment practice, but also serve to accelerate and enhance the acquisition of fundamental notions of mathematics education reform.
We designed the assessment creation task to call for adequate attention to the interplay among cognitive processes, content categories, and task levels (Dunbar & Witt, 1993). Teachers worked in grade level teams to create two assessments: (a) a number sense assessment, and (b) an operation sense assessment. (Two examples of assessment instructions that were developed appear in Appendices A and B.) These two topics were chosen because of the need to assess important mathematics (NCTM, 2000; Dunbar & Witt, 1993; Morgan, 1998). Number and operations are the cornerstone of the entire mathematics curriculum internationally (NCTM, 2000; Reys & Nohda, 1994). A worthwhile mathematical task, i.e., an engaging word problem incorporating the intended mathematics (NCTM, 1991), was written for each assessment in such a way as to allow for the incorporation of varied levels of numerical complexity. The number sense assessments for each grade level are similar in that they each call for demonstrating number comprehension
in a real-life context. For example, the fifth-grade assessment, based on U.S. geography, begins with a simple task:

I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?

This general, open-ended question (NCTM, 2000), "What does that number mean?" is a part of the number sense assessments of all grades. Its open-endedness is designed to elicit an initial unprompted response. Subsequent questions are then asked as needed to further probe the nature of the students' number sense, such as, "Can you draw a picture of that number?"
"Can you represent that number in expanded form?" "How many groups of ______ are in ______?" "What number is 100 (or 10, or 1,000) less than ______" etc. A sample set of responses for a threedigit task appears in Figure 1. The operation sense assessments for each grade level were based upon the operation most emphasized in the state curriculum for that grade: e.g., subtraction subtraction, fundamental operation of arithmetic; the inverse of addition. If a and b are real numbers (see number), then the number a−b is that number (called the difference) which when added to b (the subtractor) equals in third grade, multiplication multiplication, fundamental operation in arithmetic and algebra. Multiplication by a whole number can be interpreted as successive addition. For example, a number N multiplied by 3 is N + N + N. in fourth grade, etc. The fifthgrade operation sense assessment, for example, begins with the following worthwhile mathematical task: [FIGURE 1 OMITTED] I have ______ pieces of candy candy: see confectionery. candy Sweet sugar or chocolatebased confection. The Egyptians made candy from honey (combined with figs, dates, nuts, and spices), sugar being unknown. that I am going to put into bags of ______. How many bags will I have? A sample set of responses for a twodigit division task appears in Figure 2. [FIGURE 2 OMITTED] Note that numbers of varying numerical complexity can be inserted in the blanks depending upon the child's estimated level. These levels were based upon a "Hierarchy of Numerical Complexity" relative to number and the four operations that was also developed, which appears in Table 1. Multiple levels allows for obtaining data that is developmental in nature (Pegg, 2003; Wilson, 1999). In order to estimate the level at which the assessments would be administered, quick inventories were designed. The number sense inventory simply calls for the reading of numerals of varying sizes and appears in Figure 3. 
The operation sense inventory calls for the solving of simple exercises of varying complexity in the operation associated with the grade. Since the fifth-grade curriculum focuses on division, the fifth-grade inventory consists of various division exercises and appears in Table 2.

Note that both inventories are quite procedural in nature. It seemed logical to assume that procedural performance would be a good way to obtain a quick, rough estimate of the level at which the assessments would be administered as long as a child's instructional experience includes the development of solid connections between the learning of concepts and procedures (NCTM, 2000). If the child's performance in the initial stages of the assessment warranted an adjustment in level, the teacher would make that adjustment by readministering the task with numbers of greater or lesser complexity.

This assumption was borne out as the teachers implemented the assessments in their classrooms. Those whose instructional programs promoted conceptual-procedural connections found the inventories to produce accurate level estimates. If the teachers tended to promote less conceptual thinking, the inventory results tended to provide level estimates that had to be adjusted. This latter situation served to inspire teacher change as the students' responses consistently revealed that procedural knowledge does not necessarily imply an underlying conceptual understanding. For example, one child's inventory revealed a procedural knowledge of multiplying
two-digit numerals by one-digit numerals, such as 12 x 3. When she was presented with a worthwhile task that incorporated 12 x 3, she had absolutely no idea as to how to solve it, let alone solve or represent it in multiple ways. In fact, this particular student could not comprehend a multiplication situation as simple as 2 x 3 without at least some minimal assessor support.

Seven criteria were selected as standards by which student performance would be judged: five analytical criteria based upon the NCTM "Process Standards" (2000) as suggested by Dunbar & Witt (1993) and two holistic criteria as suggested by the "Learning Principle," also part of the Principles and Standards document. (We recognize that this suggestion by Dunbar & Witt was written seven years prior to the Principles and Standards document. However, the same five fundamental processes were key components of the predecessor to the Principles and Standards, namely the Curriculum and Evaluation Standards (NCTM, 1989).) These criteria serve, as stated by Morgan (1998), "... to provide a language that teachers and students can use both to help students to display the behaviors that will lead to success in the assessment process and critically to interrogate the assessment practices themselves" (paragraph 3).

The five analytical criteria were:
1. Problem solving:
accurately solving a worthwhile task using multiple strategies,
2. Communicating: explaining problem solving strategies clearly,
3. Reasoning: justifying those strategies in a mathematically sound manner,
4. Representing: showing or modeling mathematical ideas in multiple ways, and
5. Connecting: explaining the connections between strategies and/or representations.

The two holistic criteria were:
1. Conceptual: demonstrating an overall understanding of the mathematics involved with solving the task, and
2. Procedural: demonstrating a knowledge of the rules or algorithms involved with solving the task.

The criteria were used to design a four-point rubric with its scoring hierarchy based upon the degree of assessor prompting required in order for a student to experience success in the assessment. The incorporation of prompting as a factor in distinguishing rubric levels results in a blurring of the line between instruction and assessment in harmony with current assessment philosophy (McMillan, 2004). The rubric appears in Table 3. Both assessments also included an instruction guide and suggested questions or prompts to ensure
to ensure that the key components of number sense were addressed (NCTM, 2000; Sowder, 1992). In like manner, the operation sense questions were analyzed to ensure that the key components of operation sense were addressed (NCTM, 2000). In this way we became confident that important mathematical knowledge was assessed (Dunbar & Witt, 1993; Morgan, 1998; NCTM, 2000) and were provided with evidence that the interpretations associated with the assessment possessed construct validity (Messick, 1989). A form for students to record their work and a teacher recording form were also developed and appear in Figures 4 and 5.

Besides its use as a guide in estimating the level of numerical complexity at which the assessments should be administered, the "Hierarchy of Numerical Complexity" was also intended to be used after assessment administration to record the actual level of numerical functioning relative to both tasks. An additional determination of level regarding place value comprehension was designed for the number sense assessment, in cases where the numerals being examined were at least two digits in size, based upon the Ross Five-Stage Place Value Understanding Model (Ross, 1990, 1999). Those levels are:
1. Interprets a numeral
as the whole number it represents, but assigns no meaning to individual digits
2. Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places
3. Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens
4. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable
5. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable

Additional problem-solving labels for the strategies used in the operation sense assessment were based upon the Cognitively Guided Instruction single-digit and multidigit invented algorithms (Carpenter, Fennema, Franke, Levi, & Empson, 1999). When the assessment is administered, single-digit strategies would be labeled as Direct Modeling, Counting, Memorized Fact, or Derived Fact. Multidigit strategies would be labeled as Direct Modeling with Tens and Ones, Incrementing, Combining, or Compensating.
[FIGURE 4 OMITTED]

Three limitations associated with performance assessments that were discussed previously were addressed in our development efforts. First, the issue of expense with regard to materials was managed by the creation of tasks that allowed for the use of mathematics manipulatives commonly used by the teachers. The issue of expense with regard to time was managed in two ways. Upper grade teachers developed a response sheet that listed the task as well as prompts that paralleled many of the questions that would normally be asked in a one-on-one interview. In this way, all students could respond to the performance assessment simultaneously with a record of their work being completed in written and pictorial form. The teacher would then review these records and invite students to respond to further probing and prompting questions should the written record warrant it. A fifth-grade teacher reported being able to accomplish the assessments for his entire class of 20 in about two hours in this way.

[FIGURE 5 OMITTED]

Inasmuch as younger children do not often possess sufficient independent writing ability to allow for simultaneous assessment administration, lower grade teachers determined to use paraprofessionals and preservice university students who were already a part of the school culture to free them up from some of their other instructional responsibilities so that they could deliver the assessments one-on-one.

A second limitation of performance assessments relates to the issue of subjectivity or lack of interrater reliability (Linn & Baker, 1996). This is an issue of consistency in the scoring process that is well understood, however, and can be easily controlled with appropriate training of raters (Dunbar & Witt, 1993).
Videotaped assessment administrations conducted by the instructor were shown to the teachers in order to deepen their comprehension of assessment procedures. In addition, the teachers each scored these videotaped assessments and the teachers' scores were used to promote scoring consistency. The data obtained from this video training will be discussed in further detail in the next section.

A third limitation relates to the inability of performance assessment results to provide a basis for generalizing across tasks. For example, if a fifth-grade student performed well on a division operation sense task, that does not necessarily mean that he or she could perform well on all division tasks. This limitation was addressed by the consistent instruction to teachers that the performance assessments that were developed serve as only one component of a more comprehensive assessment system. That is to say, each teacher was educated in multiple forms of assessment, namely observations, interviews, conferences, open-ended questions, portfolios, student self-assessment, constructed response items, selected-response items, and short-answer items (NCTM, 2000; Stenmark, 1991), so that data obtained from the performance assessments would only partially constitute a more complete collection of information about student mathematical achievement.

Interrater Reliability Analysis Procedures

The instructor in the program administered the performance assessments to six different children, one in each of the grades K-5. The teachers then viewed and scored videotaped administrations of those assessments. Following the presentation of each videotaped performance assessment, teachers recorded their scores, and then a discussion was conducted in an effort to deepen understanding of the scoring criteria.
The scores were subsequently analyzed using Cronbach's Alpha. Cronbach's Alpha is a test for a model's or survey's internal consistency and is sometimes referred to as a "scale reliability coefficient" (Moffatt, 2005, p. 1). Multiple ratings of the same performance are analogous to a test or survey in which several items purport to measure the same factor or attribute. In that sense, determining the degree of consistency between ratings compares to computing the degree of internal consistency (or reliability) between items. One method for determining the degree of internal consistency is referred to as "split-half," in which a correlation coefficient is computed between scores obtained from half of the items that measure an attribute and scores obtained from the other half. Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates, although that is not exactly how it is computed (Trochim, 2005).

The first analysis involved computing the alpha for all ratings for all assessments in each of the four cohorts of teachers, as well as an overall alpha for all cohorts combined.
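
To illustrate how such a coefficient can be computed when raters are treated as "items," the following is a minimal sketch, not the procedure or software used in this study. It assumes the common variance form of Cronbach's Alpha, k/(k-1) * (1 - sum of per-rater variances / variance of the summed scores). The function name `cronbach_alpha` and the rubric scores in `sample_ratings` are hypothetical and are not data from the project.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's Alpha with raters treated as items (illustrative sketch).

    ratings: list of rows, one row per scored performance (or criterion);
             each row holds one score per rater, in the same rater order.
    """
    n_rows = len(ratings)
    k = len(ratings[0])  # number of raters
    if n_rows < 2 or k < 2:
        raise ValueError("need at least two rows and two raters")

    # Sample variance of each rater's column of scores.
    rater_columns = list(zip(*ratings))
    sum_of_rater_variances = sum(variance(col) for col in rater_columns)

    # Sample variance of the row totals (scores summed across raters).
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum_of_rater_variances / total_variance)


# Hypothetical four-point rubric scores: 5 scored performances x 3 raters.
sample_ratings = [
    [4, 4, 4],
    [3, 3, 4],
    [2, 2, 2],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(sample_ratings)
print(round(alpha, 3))
```

With these made-up scores the raters disagree on only two of the fifteen ratings, so the coefficient comes out close to 1. Identical columns for every rater would give exactly 1.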
As shown in Table 4, an extremely high degree of consistency existed among raters. A second analysis was performed in which the degree of consistency among all raters for each of the two assessments in each grade level was computed, the results of which are displayed in Table 5. It is not surprising that somewhat lower alpha coefficients were obtained when compared to those in Table 4 because only ten percent of the ratings used for obtaining the coefficients in Table 4 were used to compute each of the coefficients in Table 5. With that statistical fact kept in mind, they do indicate a comfortable degree of reliability overall.

We also intended that similar statistical analyses would be conducted in order to determine the degree of consistency among level determinations, both numerical complexity levels and place value comprehension levels, and problem solving labels. However, there was complete agreement among all raters with regard to these level and label determinations, which rendered a more sophisticated statistical analysis of no value.

It is clear that the overall statistical analyses reveal a very high rate of correlation among teachers' ratings, with one caveat. Cronbach's alpha ignores differences associated with rater means due to generosity or leniency errors.
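
The reason for this caveat can be seen from the variance form of the coefficient. The short derivation below is a sketch in notation introduced here for illustration only (k raters, Y_i the scores assigned by rater i, X the sum of those scores, and c a constant amount by which one rater is uniformly more generous); none of these symbols appear in the original analysis.

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\qquad X = \sum_{i=1}^{k} Y_i .

% Suppose rater j scores every performance c points higher than before:
Y_j' = Y_j + c
\;\Longrightarrow\;
\sigma^{2}_{Y_j'} = \sigma^{2}_{Y_j},
\qquad
X' = X + c
\;\Longrightarrow\;
\sigma^{2}_{X'} = \sigma^{2}_{X},
\qquad\text{hence}\qquad
\alpha' = \alpha .
```

Because adding a constant changes a rater's mean but none of the variances, a uniformly lenient or severe rater leaves the coefficient unchanged, so a high alpha by itself cannot rule out such errors.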
We intend to conduct more thorough statistical analyses using Many-Facets Rasch Modeling (Linacre, 1989, 2003) in order to further investigate the presence of such errors.

Conclusions

We learned that the incorporation of thorough, extended work in performance assessment provides viable support for professional development in reform pedagogy. Our efforts also appear to be founded upon sound psychometric principles, as demonstrated by the high degree of correlation among teacher ratings. If the ideals of the mathematics reform movement are to achieve widespread adherence, then there must be a synchrony of improvement efforts in the areas of curriculum, instruction, and assessment (Morgan, 1998). Improvements in each of those areas call for concurrent efforts in the other two. Educating teachers in the designing and implementing of performance assessments provides a natural context in which reform-based assessment philosophy and research can be fostered.

References

Bell, K. N. (1995). How assessment impacts attitudes toward mathematics held by prospective elementary teachers. (Doctoral dissertation, Boston University, 1995). Dissertation Abstracts International, 56, 09A.
Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7-74.
Carpenter, T. P., Fennema, E., Franke, M., Levi, L. & Empson, S. B. (1999). Children's mathematics: Cognitively guided instruction. Portsmouth, N.H.: Heinemann.
Clarke, D. C., Clarke, D. M. & Lovitt, C. J. (1990). Changes in mathematics teaching call for assessment alternatives. In T. J. Cooney (Ed.), Teaching and learning mathematics in the 1990s (pp. 118-129). Reston, VA: National Council of Teachers of Mathematics.
Dunbar, S. B. & Witt, E. A. (1993). Design innovations in measuring mathematics achievement. In Mathematical National Research Council. Measuring What Counts: A Conceptual Guide for Mathematics Assessment. Washington, D.C.: National Academy Press.
Firestone, W. A., & Schorr, R. Y. (2004). Introduction. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test (pp. 1-18). Mahwah, NJ: Lawrence Erlbaum.
Galbraith, P. (1993). Paradigms, problems and assessment: Some ideological implications. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 73-86). Boston: Kluwer Academic Publishers.
Glaser, R., Raghavan, K., & Baxter, G. P. (1992). Cognitive theory
Hancock, L. & Kilpatrick, J. (1993). Effects of mandated testing on instruction. In Measuring what counts (pp. 149-174). Washington, D.C.: National Academy Press.
Izard, J. (1993). Challenges to the improvement of assessment practice. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 185-194). Boston: Kluwer Academic Publishers.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.
Linacre, J. M. (2003). A user's guide to FACETS [computer program manual]. Chicago: MESA Press.
Linn, R. L. & Baker, E. L. (1996). Can performance-based assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities. Ninety-fifth yearbook of the National Society for the Study of Education (pp. 84-103). Chicago: University of Chicago Press.
McMillan, J. H. (2004). Classroom assessment: Principles and practice for effective instruction. Boston: Pearson.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: American Council on Education and Macmillan Publishing Company.
Mewborn, D. & Huberty, P. (1999). Questioning your way to the Standards. Teaching Children Mathematics, 6(4), 226-246.
Moffatt, M. (2005). Cronbach's Alpha - Dictionary definition of Cronbach's Alpha. Retrieved September 20, 2005, from About Economics: http://economics.about.com/cseconomicsglossary/g/cronbackalpha.htm
Morgan, C. (1998). Assessment of mathematical behaviour: A social perspective. In P. Gates (Ed.), Mathematics education and society. Proceedings of the First International Mathematics Education and Society Conference (MEAS 1) (pp. 277-283). Nottingham: Nottingham University.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (1995). Assessment standards for school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
Niss, M. (Ed.). (1993). Investigations into assessment in mathematics education. Boston: Kluwer Academic Press.
Pegg, J. (2003). Assessment in mathematics: A developmental approach. In J. Royer (Ed.), Mathematical cognition (pp. 227-259). Greenwich, CT: Information Age Publishing.
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Reys, R. E. & Nohda, N. (Eds.). (1994). Computational alternatives for the twenty-first century: Cross-cultural perspectives from Japan and the United States. Reston, VA: National Council of Teachers of Mathematics.
Ridgeway, J. (1998). From barrier to lever: Revising roles for assessment in mathematics education. NISE Brief, 2(1), 1-9.
Ross, S. R. H. (1990). Children's acquisition of place-value numeration concepts: The roles of cognitive development and instruction. Focus on Learning Problems in Mathematics, 12(3), 1-18.
Ross, S. R. H. (1999). Place value. Using digit correspondence tasks for problem solving and written assessment. Focus on Learning Problems in Mathematics, 21(3), 28-36.
Shepard, L. A. (2000). The role of classroom assessment in teaching and learning. CSE Technical Report. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Sowder, J. T. (1992). Making sense of numbers in school mathematics. In G. Leinhardt, P. Putnam, & R. A. Hattrup (Eds.), Analysis of arithmetic for mathematics teaching (pp. 1-51). Hillsdale, NJ: Lawrence Erlbaum Associates.
Stenmark, J. K. (Ed.). (1991). Mathematics assessment: Myths, models, good questions, and practical suggestions. Reston, VA: National Council of Teachers of Mathematics.
Trochim, W. M. (2005). Types of reliability. Retrieved September 20, 2005, from Types of Reliability: http://www.socialresearchmethods.net/kb/reltypes.htm
Wheeler, D. (1993). Epistemological issues and challenges to assessment: What is mathematical knowledge? In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 87-95). Boston: Kluwer Academic Publishers.
Wiggins, G. P. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.
Wilson, M. (1999). Measurement of development levels. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 151-163). New York: Pergamon.

Appendix A

5th Grade Level Number Sense

Instructions:
1. Administer the Number Sense Inventory to the child in order to estimate the level at which you should present your worthwhile mathematical task.
2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses. Provide manipulatives and invite the student to record responses on the Student Recording Form as needed.
   Worthwhile mathematical task: I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?
3. Ask additional questions as needed to prompt and probe student thinking.
   Clarifying Questions:
   1. How many groups of ______ are in ______? (P.S.)
   2. How many ones (tens, hundreds, etc.) are in ______? (P.S.)
   3. What is one (ten, hundred, etc.) less than ______? (P.S.)
   4. Group this number another way. (P.S.)
   5. Is the number ______ higher or lower than this number? (P.S.)
   6. What is the place and the value of the digit ______ in this number? (P.S.)
   7. Is this number big or small? What about compared to ______? (P.S.)
   8. How do you know that ... (referring to above questions)? (Communicating)
   9. Why do you think that ... (referring to the above question)? (Reasoning)
   10. Show this number in another way (pictures, manipulatives, numerals, expanded form). (Reasoning)
   11. How does this picture (or manipulatives) match with these numbers? (Connecting)
4. Adjust the complexity of the number involved in the task, up or down, if necessary at any time during the assessment.
5. Score and record the level according to the Hierarchy of Numerical Complexity.
6. Score and record the number comprehension level according to the following hierarchy:
   1: Interprets a 2-digit numeral as the whole number it represents, but assigns no meaning to individual digits
   2: Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places
   3: Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens
   4: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable
   5: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable
7. Score the overall performance according to the rubric.

Appendix B

Fifth-grade Whole Number Operation Sense

Instructions:
1. Administer the Operation Sense Inventory to the entire class to determine the level at which you should present your worthwhile mathematical task.
2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses and to use the Student Recording Form as needed.
   Worthwhile mathematical task: There are ______ pieces of candy. We need to put them into ______ bags. How many pieces of candy will be in each bag? (Real-life application) (Partitive)
3. Ask additional questions as needed to prompt and probe student thinking and communication.
   a. What type of a problem is this? Which operation would you use to find the answer? (Operation Sense) (Problem Solving)
   b. Explain how you solved the problem. (Communication)
   c. Solve the problem in a different way. Explain or show me. (Relationship between operations) (Connecting)
   d. What would happen to the numbers in the question if you multiplied them? (If student multiplied, ask what would happen to the numbers if they divided them.) (Relative effects of operations) (Reasoning)
   e. I have ______ pieces of candy that I am going to put into bags of ______. How many bags will I have? (Measurement) (Multiple definitions of operations) (Real-life application)
   f. Show this problem as a fraction. (Representing)
   g. Solve this problem using pictures, manipulatives, etc. (Connecting)
4. Adjust the complexity of the number involved in the task, up or down, if necessary at any time during the assessment.
5. Score and record the numerical level according to the Hierarchy of Numerical Complexity.
6. Score and record the problem solving level according to the following list:
   1. Direct Modeling
   2. Counting Strategy
   3. Memorized Fact
   4. Derived Fact
   5. Direct modeling with ones and tens
   6. Incrementing
   7. Combining with ones and tens
   8. Compensating
   9. Standard Algorithm
7. Score their overall performance according to the rubric.

Damon L. Bahr
Brigham Young University

Richard R Sudweeks
Brigham Young University

Table 1.
Hierarchy of Numerical Complexity

Level | Number Sense | Addition | Subtraction
------|--------------|----------|------------
A | Rote counting | Joining sets | Separating sets
B | One-to-one correspondence | Single digit addends & sum: 3 + 2 = 5 | 1 digit - 1 digit = 1 digit: 5 - 3 = 2
C | Single digit < 5 | Single digit addends & double digit sum: 3 + 9 = 12 | 2 digits - 1 digit = 1 digit (decomposing): 13 - 5 = 8
D | Single digit > 5 | Multiple single digit addends: 3 + 2 + 4 = 9 | 2 digits - 1 digit = 2 digits (no decomposing): 27 - 5 = 22
E | 2 digit > 15 < 20 | 2 digits + 2 digits, no composing: 32 + 24 = 56 |
F | 2 digit > 9 < 16 | 2 digits + 2 digits with composing: 32 + 29 = 61 | 2 digits - 1 digit = 2 digits (decomposing): 27 - 9 = 18
G | 2 digit > 20 | 3 2-digit addends with composing: 32 + 25 + 46 = 103 | 2 digits - 2 digits = 2 digits (no decomposing): 36 - 24 = 12
H | 3 digit | 3 digits + 3 digits, varying composing: 391 + 467 = 858 | 2 digits - 2 digits = 1 or 2 digits (decomposing): 32 - 18 = 14
I | 3 digit, zeroes in ones or tens places | 3 3-digit addends with composing: 323 + 257 + 469 | 3 digits - 2 or 3 digits = 1, 2, or 3 digits (decomposing involving 1 zero): 406 - 178 = 228
J | 4 digit | 3 4-digit addends with composing: 3235 + 2579 + 4696 | 3 or 4 digits - 2 or 3 digits (1 decomposing): 1469 - 635
K | 5 digit | 4 digits + 4 digits, varying composing: 4625 + 1856 | 4 or 5 digits - 4 or 5 digits (2 alternating decomposes): 4628 - 1809
L | 6 digit | | Variable digit number (2 consecutive decomposes): 631 - 253
M | 6 digit, zeroes | | Variable digit number (3 consecutive decomposes): 54363 - 14581
N | 7 digit | | Variable digit number (decomposing involving 2 or more zeroes): 4001 - 1376
O | Whole #'s & tenths | |
P | Whole #'s & hundredths | |
Q | Whole #'s & thousandths | |

Level | Multiplication | Division
------|----------------|---------
A | 1 digit x 1 digit = 1 digit: 2 x 3 = 6 | 1 digit / 1 digit = 1 digit: 8 / 2 = 4
B | 1 digit x 1 digit = 2 digits (composing): 2 x 6 = 12 | 2 digit / 1 digit = 1 digit: 12 / 2 = 6
C | 10 x single digit: 10 x 3 = 30 | 1 or 2 digits / 1 digit = 1 digit with remainder: 7 / 3 = 2 r 1
D | 10 multiple x single digit = 2 digits: 20 x 3 = 60 | 10 multiple / 1 digit = 10 multiple (no decomposing): 60 / 2 = 30
E | 10 multiple x single digit = 3 digits (composing): 30 x 4 = 120 | 3 digit 10 multiple / 1 digit (no decomposing): 120 / 4 = 30
F | 2 digits x 1 digit (no composing): 13 x 2 = 26 | 2 or 3 digits / 1 digit = 2 digits (no decomposing): 65 / 2 = 32 r 1
G | 2 digits x 1 digit (composing): 14 x 3 = 42 | 2 or 3 digit 10 multiple / 1 digit = 2 digits (decomposing): 150 / 7 = 21 r 3
H | 10 multiple x 10 multiple = 3 digits (composing): 20 x 20 = 400 | 2 or 3 digits / 1 digit = 2 digits (decomposing): 78 / 3 = 26
I | 2 digits x 2 digits (no composing): 13 x 12 | 3 digits / 1 digit = 3 digits: 432 / 2 = 216
J | 2 digits x 2 digits (one composing on first row): 23 x 14 | 3 digits / 1 digit = 3 digits (zero in quotient): 432 / 2 = 216
K | 2 digits x 2 digits (one composing on second row): 43 x 24 | 2 or 3 digits / 10 multiple: 60 / 20 = 3
L | 2 digits x 2 digits (two composes): 23 x 65 |
M | 2 digits x 2 digits (larger digits): 67 x 98 |
N | |
O | |
P | |
Q | |

Ask the child to perform tasks A and B, then ask the child to read the remaining numerals until she/he is not successful.
A. How far can you count?
B. Count these blocks (up to 10)
C. 4
D. 7
E. 18
F. 12
G. 68
H. 492
I. 709
J. 3,579
K. 24,683
L. 147,836
M. 305,284
N. 3,548,921
O. 73.9
P. 697.34
Q. 28.108

Figure 3. Number Sense Inventory

Table 2. 5th Grade Operation Sense Inventory: Whole Number Division

8 / 2
12 / 2
7 / 3
60 / 2
120 / 4
65 / 2
150 / 7
78 / 3
60 / 20

Table 3.
Performance Assessment Rubric

Rubric level | Problem Solving | Communicating | Reasoning
-------------|-----------------|---------------|----------
4 Independent understanding | Can solve the problem in two ways independently | Can clearly explain the problem solving strategies | Can clearly justify the problem solving strategies
3 Understanding with minimal help | Can solve the problem in two ways with minimal help or one way independently | Can clearly explain all but one part of the problem solving strategies | Can justify all but one part of the problem solving strategies
2 Understanding with substantial help | Can solve the problem at least one way with help | Can explain portions of the problem solving strategies | Can justify portions of the problem solving strategies
1 Little understanding | Cannot solve the problem even with help | Cannot explain the strategies even with help | Cannot justify the strategies even with help

Rubric level | Representing | Connecting | Procedural
-------------|--------------|------------|-----------
4 Independent understanding | Can represent the problem in at least two ways independently | Can independently connect representations or strategies | Can solve the problem using a procedure independently
3 Understanding with minimal help | Can represent the problem in two ways with minimal help or one way independently | Can connect representations or strategies with minimal help | Can solve the problem procedurally with minimal help
2 Understanding with substantial help | Can represent some of the problem with help | Can connect representations or strategies only with substantial help | Can solve the problem procedurally with substantial help
1 Little understanding | Cannot represent the problem even with help | Cannot connect representations or strategies even with help | Cannot solve the problem procedurally even with help

Rubric level | Conceptual
-------------|-----------
4 Independent understanding | Can show thorough understanding of the problem and of the associated mathematics independently
3 Understanding with minimal help | Can show some understanding with minimal help
2 Understanding with substantial help | Can show some understanding with substantial help
1 Little understanding | Cannot show understanding even with help

Table 4. Cohort and Combined Alpha Coefficients

Cohort | N | Cronbach's Alpha
-------|---|-----------------
A | 21 | .992
B | 19 | .998
C | 28 | .993
D | 17 | .947
Combined | 85 | .978

Table 5. Grade Level Alpha Coefficients (Cronbach's Alpha)

Grade Level | Number Sense | Operation Sense
------------|--------------|----------------
Kindergarten | .932 | .865
First | .711 | .633
Second | .909 | .846
Third | .806 | .925
Fourth | .928 | .881
Fifth | .915 | .910
