# Teacher-developed mathematics performance assessments in the context of reform-based professional development.

One of the disappointments associated with the mathematics reform
movement is the increasing mismatch between the improvements made in
curriculum and instruction and prevalent assessment modes (Firestone & Schorr, 2004; Niss, 1993). Despite early calls for change in
assessment practice, such as the 1989 Curriculum and Evaluation
Standards (NCTM), recent research into the teaching and learning of
mathematics that has provided detailed consideration of its
"socially situated nature" has not focused to the same degree
on mathematics assessment (Morgan, 1998). Interest has therefore
increased in matching assessment methods to developments in curriculum.

There is a pressing need to assess a much wider range of abilities than has been the case heretofore, including problem posing and solving, representing, and understanding. Traditional mathematical assessment has frequently relied upon the ability of students to display behavior that matches their assessor's expectations rather than on any underlying understanding (Morgan, 1998). These traditional assessments communicate that mathematics is an endeavor that involves determining a quick answer using a preexisting, memorized method (Bell, 1995; Clarke, Clarke & Lovitt, 1990; Hancock & Kilpatrick, 1993), thus failing to represent the true complexity of mathematics (Galbraith, 1993; Izard, 1993; Wheeler, 1993). In contrast, assessment data that provide direct information about improving the learning experience increase legitimate mathematical learning that is thorough and connected (Black & Wiliam, 1998; NCTM, 1995). The measurement of de-contextualized technical skills should be replaced with measures that reflect what is known about what it means to know and do mathematics, i.e., that capture the degree of acquisition of both conceptual and procedural knowledge and the connections between them, and that assess the solving of worthwhile problems, the communication and justification of conjectures, and the representation of mathematical thinking in multiple ways (NCTM, 2000). As Ridgway (1998) states, "As an issue of policy, the implementation of standards-based curricula should always be accompanied by the implementation of standards-based assessment. In fact, incremental change in assessment systems will foster concurrent improvement in professional and curriculum development" (p. 2). The 1989 Standards states, "As the curriculum changes, so must the tests. Tests also must change because they are one way of communicating what is important for students to know.... In this way tests can effect change" (pp. 189-190).
Both the Assessment Standards (NCTM, 1995) and the Principles and Standards (NCTM, 2000) state that assessment tasks communicate what type of mathematical knowledge and performance are valued (p. 22). Therefore, standards-based assessment complements standards-based instruction (Dunbar & Witt, 1993).

Paralleling reform in mathematics curriculum and instruction have been calls to authenticate student assessment in all subject areas. Terms such as "Authentic Assessment," "Alternative Assessment," and "Performance Assessment" have become banners to rally focused efforts to change paradigms about the nature and purpose of assessment. According to McMillan (2004), a performance assessment is "... one in which the teacher observes and makes a judgment about the student's demonstration of a skill or competency in creating a product, constructing a response, or making a presentation." Such assessments possess several important characteristics:

Although defined in many ways, performance assessment designed according to the above criteria provides many benefits closely tied to instruction: it integrates instruction and assessment, ties assessment to real-world challenges and reasoning processes, helps instruction target more important outcomes, provides an alternative to traditional assessment, and authenticates the assessment process (Wiggins, 1993). Performance assessments conceptualized in this manner are therefore legitimately alternative and authentic in nature. Mathematics educators have joined in calling for the use of performance assessments that incorporate the aspects identified by McMillan in mathematics as both a means to align assessment with new reform curricula (Firestone & Schorr, 2004; Shepard, 2000) and as a means to improve the links between teaching practice and assessment (Pellegrino, Chudowsky, & Glaser, 2001).

Although performance assessments provide benefits unavailable through more traditional assessment procedures, they are not without limitations. They are usually expensive in terms of both the amount of time required and the materials needed to administer them. In addition, the results obtained from them can be subjective, an issue of inter-rater reliability. Finally, those results often provide an inadequate basis for generalizing across tasks.

Statement of Purpose

The purpose of this study was to describe the principles that guided the training of elementary teachers in the development and administration of reform-oriented, grade-level-specific performance assessments in the context of a professional development project, and to assess the internal consistency of the ratings of student performance obtained from the teachers during that training.

Assessment Development

As an integral part of a two-year professional development program, 85 elementary teachers representing virtually the entire faculties of three schools and an additional smattering of teachers from fifteen other schools--all in central Utah--were involved in the development and implementation of a performance assessment system. We reasoned that this involvement would not only provide a natural context to enhance their understanding of appropriate assessment practice, but also serve to accelerate and enhance the acquisition of fundamental notions of mathematics education reform.

We designed the assessment creation task to call for adequate attention to the interplay among cognitive processes, content categories, and task levels (Dunbar & Witt, 1993). Teachers worked in grade level teams to create two assessments: (a) a number sense assessment, and (b) an operation sense assessment. (Two examples of assessment instructions that were developed appear in Appendices A and B.) These two topics were chosen because of the need to assess important mathematics (NCTM, 2000; Dunbar & Witt, 1993; Morgan, 1998). Number and operations are the cornerstone of the entire mathematics curriculum internationally (NCTM, 2000; Reys & Nohda, 1994). A worthwhile mathematical task, i.e., an engaging word problem incorporating the intended mathematics (NCTM, 1991), was written for each assessment in such a way as to allow for the incorporation of varied levels of numerical complexity. The number sense assessments for each grade level are similar in that they each call for demonstrating number comprehension in a real-life context. For example, the fifth-grade assessment, based on U.S. geography, begins with a simple task:

I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?

This general, open-ended question (NCTM, 2000), "What does that number mean?" is a part of the number sense assessments of all grades. Its open-endedness is designed to elicit an initial unprompted response. Subsequent questions are then asked as needed to further probe the nature of the students' number sense, such as, "Can you draw a picture of that number?" "Can you represent that number in expanded form?" "How many groups of ______ are in ______?" "What number is 100 (or 10, or 1,000) less than ______?" etc. A sample set of responses for a three-digit task appears in Figure 1.

The operation sense assessments for each grade level were based upon the operation most emphasized in the state curriculum for that grade: e.g., subtraction in third grade, multiplication in fourth grade, etc. The fifth-grade operation sense assessment, for example, begins with the following worthwhile mathematical task:

[FIGURE 1 OMITTED]

I have ______ pieces of candy that I am going to put into bags of ______. How many bags will I have?

A sample set of responses for a two-digit division task appears in Figure 2.

[FIGURE 2 OMITTED]

Note that numbers of varying numerical complexity can be inserted in the blanks depending upon the child's estimated level. These levels were based upon a "Hierarchy of Numerical Complexity" relative to number and the four operations, which was also developed and appears in Table 1.

Multiple levels allow for obtaining data that are developmental in nature (Pegg, 2003; Wilson, 1999). In order to estimate the level at which the assessments would be administered, quick inventories were designed. The number sense inventory simply calls for the reading of numerals of varying sizes and appears in Figure 3. The operation sense inventory calls for the solving of simple exercises of varying complexity in the operation associated with the grade. Since the fifth-grade curriculum focuses on division, the fifth-grade inventory consists of various division exercises and appears in Table 2.

Note that both inventories are quite procedural in nature. It seemed logical to assume that procedural performance would be a good way to obtain a quick, rough estimate of the level at which the assessments would be administered as long as a child's instructional experience includes the development of solid connections between the learning of concepts and procedures (NCTM, 2000). If the child's performance in the initial stages of the assessment warranted an adjustment in level, the teacher made that adjustment by re-administering the task with numbers of greater or lesser complexity. This assumption was borne out as the teachers implemented the assessments in their classrooms. Those whose instructional programs promoted conceptual-procedural connections found the inventories to produce accurate level estimates. If the teachers tended to promote less conceptual thinking, the inventory results tended to provide level estimates that had to be adjusted. This latter situation served to inspire teacher change as the students' responses consistently revealed that procedural knowledge does not necessarily imply an underlying conceptual understanding. For example, one child's inventory revealed a procedural knowledge of multiplying two-digit numerals by one-digit numerals, such as 12 x 3. When she was presented with a worthwhile task that incorporated 12 x 3, she had absolutely no idea as to how to solve it, let alone solve or represent it in multiple ways. In fact, this particular student could not comprehend a multiplication situation as simple as 2 x 3 without at least some minimal assessor support.

Seven criteria were selected as standards by which student performance would be judged: five analytical criteria based upon the NCTM "Process Standards" (2000) as suggested by Dunbar & Witt (1993) and two holistic criteria as suggested by the "Learning Principle," also part of the Principles and Standards document. (We recognize that this suggestion by Dunbar & Witt was written seven years prior to the Principles and Standards document. However, the same five fundamental processes were key components of the predecessor to the Principles and Standards, namely the Curriculum and Evaluation Standards (NCTM, 1989).) These criteria serve, as stated by Morgan (1998), "... to provide a language that teachers and students can use both to help students to display the behaviors that will lead to success in the assessment process and critically to interrogate the assessment practices themselves" (paragraph 3). The five analytical criteria were:

1. Problem solving -- accurately solving a worthwhile task using multiple strategies,

2. Communicating -- explaining problem solving strategies clearly,

3. Reasoning -- justifying those strategies in a mathematically sound manner,

4. Representing -- showing or modeling mathematical ideas in multiple ways, and

5. Connecting -- explaining the connections between strategies and/or representations.

The two holistic criteria were:

1. Conceptual -- demonstrating an overall understanding of the mathematics involved with solving the task, and

2. Procedural -- demonstrating a knowledge of the rules or algorithms involved with solving the task.

The criteria were used to design a four-point rubric with its scoring hierarchy based upon the degree of assessor prompting required in order for a student to experience success in the assessment. The incorporation of prompting as a factor in distinguishing rubric levels results in a blurring of the line between instruction and assessment in harmony with current assessment philosophy (McMillan, 2004). The rubric appears in Table 3.

Both assessments also included an instruction guide and suggested questions or prompts to ensure that opportunities were provided for students to express themselves verbally as well as in written form (Dunbar & Witt, 1993; Glaser, Raghavan, & Baxter, 1992), and to ensure that students were invited to display behavior that addressed all analytical criteria, i.e., the Process Standards (Mewborn & Huberty, 1999). In addition, the questions associated with the number sense assessment were analyzed to ensure that the key components of number sense were addressed (NCTM, 2000; Sowder, 1992). In like manner, the operation sense questions were analyzed to ensure that the key components of operation sense were addressed (NCTM, 2000). In this way we became confident that important mathematical knowledge was assessed (Dunbar & Witt, 1993; Morgan, 1998; NCTM, 2000) and were provided with evidence that the interpretations associated with the assessment possessed construct validity (Messick, 1989). A form for students to record their work and a teacher recording form were also developed and appear in Figures 4 and 5.

Besides its use as a guide in estimating the level of numerical complexity at which the assessments should be administered, the "Hierarchy of Numerical Complexity" was intended to be used after assessment administration to record the actual level of numerical functioning relative to both tasks. An additional determination of level regarding place value comprehension was designed for the number sense assessment, in cases where the numerals being examined were at least two digits in size, based upon the Ross Five-Stage Place Value Understanding Model (Ross, 1990, 1999). Those levels are:

1. Interprets a numeral as the whole number it represents, but assigns no meaning to individual digits

2. Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places

3. Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens

4. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable

5. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable

Additional problem-solving labels for the strategies used in the operation sense assessment were based upon the Cognitively Guided Instruction single-digit and multidigit invented algorithms (Carpenter, Fennema, Franke, Levi, & Empson, 1999). When the assessment is administered, single-digit strategies are labeled as Direct Modeling, Counting, Memorized Fact, or Derived Fact; multidigit strategies are labeled as Direct Modeling with Tens and Ones, Incrementing, Combining, or Compensating.

[FIGURE 4 OMITTED]

Three limitations associated with performance assessments that were discussed previously were addressed in our development efforts. First, the issue of expense with regard to materials was managed by the creation of tasks that allowed for the use of mathematics manipulatives commonly used by the teachers. The issue of expense with regard to time was managed in two ways. Upper grade teachers developed a response sheet that listed the task as well as prompts that paralleled many of the questions that would normally be asked in a one-on-one interview. In this way, all students could respond to the performance assessment simultaneously, with a record of their work being completed in written and pictorial form. The teacher would then review these records and invite students to respond to further probing and prompting questions should the written record warrant it. A fifth-grade teacher reported being able to accomplish the assessments for his entire class of 20 in about two hours in this way.

[FIGURE 5 OMITTED]

Inasmuch as younger children often do not possess sufficient independent writing ability to allow for simultaneous assessment administration, lower grade teachers chose to use paraprofessionals and preservice university students who were already a part of the school culture. These assistants took over some of the teachers' other instructional responsibilities, freeing the teachers to deliver the assessments one-on-one.

A second limitation of performance assessments relates to the issue of subjectivity or lack of inter-rater reliability (Linn & Baker, 1996). This is an issue of consistency in the scoring process that is well understood, however, and can be easily controlled with appropriate training of raters (Dunbar & Witt, 1993). Videotaped assessment administrations conducted by the instructor were shown to the teachers in order to deepen their comprehension of assessment procedures. In addition, each teacher scored these videotaped assessments, and the resulting scores were used to promote scoring consistency. The data obtained from this video training will be discussed in further detail in the next section.

A third limitation relates to the inability of performance assessment results to provide a basis for generalizing across tasks. For example, a fifth-grade student's strong performance on one division operation sense task does not necessarily mean that he or she could perform well on all division tasks. This limitation was addressed by the consistent instruction to teachers that the performance assessments that were developed serve as only one component of a more comprehensive assessment system. That is to say, each teacher was educated in multiple forms of assessment (observations, interviews, conferences, open-ended questions, portfolios, student self-assessment, constructed-response items, selected-response items, and short-answer items; NCTM, 2000; Stenmark, 1991), so that data obtained from the performance assessments would constitute only part of a more complete collection of information about student mathematical achievement.

Inter-rater Reliability Analysis Procedures

The instructor in the program administered the performance assessments to six different children, one in each of the grades K-5. The teachers then viewed and scored videotaped administrations of those assessments. Following the presentation of each videotaped performance assessment, teachers recorded their scores, and then a discussion was conducted in an effort to deepen understanding of the scoring criteria. The scores were subsequently analyzed using Cronbach's Alpha. Cronbach's Alpha is a test for a model's or survey's internal consistency and is sometimes referred to as a "scale reliability coefficient" (Moffatt, 2005, p. 1). Multiple ratings of the same performance are analogous to a test or survey in which several items purport to measure the same factor or attribute. In that sense, determining the degree of consistency between ratings compares to computing the degree of internal consistency (or reliability) between items. One method for determining the degree of internal consistency is referred to as "split-half," in which a correlation coefficient is computed between scores obtained from half of the items that measure an attribute and scores obtained from the other half. Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates, although that is not exactly how it is computed (Trochim, 2005).
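The computation itself is straightforward when each rater is treated as an "item" and each videotaped performance as a "case." The following sketch (using hypothetical rubric scores, not the study's actual data) illustrates the standard formula, alpha = k/(k-1) x (1 - sum of rater variances / variance of total scores):

```python
from statistics import pvariance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a ratings matrix.

    ratings: one row per rated performance; each row holds
    the scores assigned by the k raters ("items").
    """
    k = len(ratings[0])  # number of raters
    # Variance of each rater's scores across all performances
    rater_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed score for each performance
    total_var = pvariance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)

# Hypothetical data: 6 performances scored by 4 raters on a 4-point rubric
scores = [
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [1, 1, 2, 1],
    [4, 3, 4, 4],
    [2, 2, 1, 2],
]
print(round(cronbach_alpha(scores), 3))  # → 0.95
```

An alpha near 1.0 indicates that the raters ordered and scored the performances consistently; note, as discussed below, that the coefficient is insensitive to uniform differences in rater severity.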

The first analysis involved computing the alpha for all ratings for all assessments in each of the four cohorts of teachers, as well as an overall alpha for all cohorts combined. As shown in Table 4, an extremely high degree of consistency existed among raters.

A second analysis was performed in which the degree of consistency among all raters for each of the two assessments in each grade level was computed, the results of which are displayed in Table 5. It is not surprising that somewhat lower alpha coefficients were obtained when compared to those in Table 4, because only ten percent of the ratings used for obtaining the coefficients in Table 4 were used to compute each of the coefficients in Table 5. With that statistical fact in mind, the coefficients do indicate a comfortable degree of reliability overall.

We also intended that similar statistical analyses would be conducted in order to determine the degree of consistency among level determinations, both numerical complexity levels and place value comprehension levels, as well as problem solving labels. However, there was complete agreement among all raters with regard to these level and label determinations, which rendered a more sophisticated statistical analysis of no value. It is clear that the overall statistical analyses reveal a very high degree of consistency among teachers' ratings, with one caveat: Cronbach's alpha ignores differences associated with rater means due to generosity or leniency errors. We intend to conduct more thorough statistical analyses using Many-Facets Rasch Modeling (Linacre, 1989, 2003) in order to further investigate the presence of such errors.

Conclusions

We learned that the incorporation of thorough, extended work in performance assessment provides viable support for professional development in reform pedagogy. Our efforts also appear to be founded upon sound psychometric principles as demonstrated by the high degree of consistency among teacher ratings. If the ideals of the mathematics reform movement are to achieve widespread adherence, then there must be a synchrony of improvement efforts in the areas of curriculum, instruction, and assessment (Morgan, 1998). Improvements in each of those areas will have concurrent effects on the other two. Educating teachers in the designing and implementing of performance assessments provides a natural context in which reform-based assessment philosophy and research can be fostered.

References

Bell, K. N. (1995). How assessment impacts attitudes toward mathematics held by prospective elementary teachers. (Doctoral dissertation, Boston University, 1995). Dissertation Abstracts International, 56, 09-A.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7-74.

Carpenter, T. P., Fennema, E., Franke, M., Levi, L. & Empson, S. B. (1999). Children's mathematics: Cognitively guided instruction. Portsmouth, N.H.: Heinemann.

Clarke, D. C., Clarke, D. M., & Lovitt, C. J. (1990). Changes in mathematics teaching call for assessment alternatives. In T. J. Cooney (Ed.), Teaching and learning mathematics in the 1990s (pp. 118-129). Reston, VA: National Council of Teachers of Mathematics.

Dunbar, S. B. & Witt, E. A. (1993). Design innovations in measuring mathematics achievement. In National Research Council, Measuring what counts: A conceptual guide for mathematics assessment. Washington, D.C.: National Academy Press.

Firestone, W. A., & Schorr, R. Y. (2004). Introduction. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test (pp. 1-18). Mahwah, NJ: Lawrence Erlbaum.

Galbraith, P. (1993). Paradigms, problems and assessment: Some ideological implications. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 73-86). Boston: Kluwer Academic Publishers.

Glaser, R., Raghavan, K., & Baxter, G. P. (1992). Cognitive theory as the basis for design of innovative assessment: Design characteristics of science assessments (CSE Tech. Rep. No. 349). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Hancock, L. & Kilpatrick, J. (1993). Effects of mandated testing on instruction. In Measuring what counts (pp. 149-174). Washington, D.C.: National Academy Press.

Izard, J. (1993). Challenges to the improvement of assessment practice. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 185-194). Boston: Kluwer Academic Publishers.

Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.

Linacre, J. M. (2003). A user's guide to FACETS [computer program manual]. Chicago: MESA Press.

Linn, R. L. & Baker, E. L. (1996). Can performance-based assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities. Ninety-fifth yearbook of the National Society for the Study of Education (pp. 84-103). Chicago: University of Chicago Press.

McMillan, J. H. (2004). Classroom assessment: Principles and practice for effective instruction. Boston: Pearson.

Messick, S. (1989). Validity. In R. Linn (Ed.) Educational Measurement (3rd ed., pp. 13-103) New York: American Council on Education and Macmillan Publishing Company.

Mewborn, D. & Huberty, P. (1999). Questioning your way to the Standards. Teaching Children Mathematics, 6(4), 226-246.

Moffatt, M. (2005). Cronbach's Alpha-Dictionary definition of Cronbach's Alpha. Retrieved September 20, 2005, from About Economics: http://economics.about.com/cseconomicsglossary/g/cronbackalpha.htm

Morgan, C. (1998). Assessment of mathematical behaviour: A social perspective. In P. Gates (Ed.), Mathematics education and society. Proceedings of the First International Mathematics Education and Society Conference (MEAS 1) (pp. 277-283). Nottingham: Nottingham University.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1995). Assessment standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

Niss, M. (Ed.) (1993). Investigations into assessment in mathematics education. Boston: Kluwer Academic Publishers.

Pegg, J. (2003). Assessment in mathematics: A developmental approach. In J. Royer (Ed.) Mathematical Cognition (pp. 227-259). Greenwich, CT: Information Age Publishing.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.) (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.

Reys, R. E. & Nohda, N. (Eds.) (1994). Computational alternatives for the twenty-first century: Cross-cultural perspectives from Japan and the United States. Reston, VA: National Council of Teachers of Mathematics.

Ridgway, J. (1998). From barrier to lever: Revising roles for assessment in mathematics education. NISE Brief, 2(1), 1-9.

Ross, S. R. H. (1990). Children's acquisition of place-value numeration concepts: The roles of cognitive development and instruction. Focus on Learning Problems in Mathematics, 12(3), 1-18.

Ross, S. R. H. (1999). Place value. Using digit correspondence tasks for problem solving and written assessment. Focus on Learning Problems in Mathematics, 21(3), 28-36.

Shepard, L. A. (2000). The role of classroom assessment in teaching and learning. CSE Technical Report. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.

Sowder, J. T. (1992). Making sense of numbers in school mathematics. In G. Leinhardt, R. Putnam, & R. A. Hattrup (Eds.), Analysis of arithmetic for mathematics teaching (pp. 1-51). Hillsdale, NJ: Lawrence Erlbaum Associates.

Stenmark, J. K. (Ed.). (1991). Mathematics assessment: Myths, models, good questions, and practical suggestions. Reston, VA: National Council of Teachers of Mathematics.

Trochim, W. M. (2005). Types of reliability. Retrieved September 20, 2005, from Types of Reliability: http://www.socialresearchmethods.net/kb/reltypes.htm

Wilson, M. (1999). Measurement of development levels. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 151-163). New York: Pergamon.

Wheeler, D. (1993). Epistemological issues and challenges to assessment: What is mathematical knowledge? In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 87-95). Boston: Kluwer Academic Publishers.

Wiggins, G. P. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.

Appendix A

5th Grade Level Number Sense Instructions:

1. Administer the Number Sense Inventory to the child in order to estimate the level at which you should present your worthwhile mathematical task.

2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses. Provide manipulatives and invite the student to record responses on the Student Recording Form as needed.

I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?

3. Ask additional questions as needed to prompt and probe student thinking.

Clarifying Questions:

1. How many groups of ______ are in ______? (P.S.)

2. How many ones (tens, hundreds, etc.) are in ______? (P.S.)

3. What is one (ten, hundred, etc.) less than ______? (P.S.)

4. Group this number another way. (P.S.)

5. Is the number ______ higher or lower than this number? (P.S.)

6. What is the place and the value of the digit ______ in this number? (P.S.)

7. Is this number big or small? What about compared to ______? (P.S.)

8. How do you know that ... (referring to above questions)? (Communicating)

9. Why do you think that ... (referring to the above question)? (Reasoning)

10. Show this number in another way. (pictures, manipulatives, numerals, expanded form) (Representing)

11. How does this picture (or these manipulatives) match with these numbers? (Connecting)

4. Adjust the complexity of the number involved in the task if necessary at any time during the assessment--up or down.

5. Score and record the level according to the Hierarchy of Numerical Complexity.

6. Score and record the number comprehension level according to the following hierarchy:

1: Interprets a 2-digit numeral as the whole number it represents, but assigns no meaning to individual digits

2: Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places

3: Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens

4: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable

5: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable

7. Score the overall performance according to the rubric.

Appendix B

Fifth-grade Whole Number Operation Sense Instructions:

1. Administer the Operation Sense Inventory to entire class to determine the level at which you should present your worthwhile mathematical task.

2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses and to use the Student Recording Form as needed.

Worthwhile mathematical task:

There are ______ pieces of candy. We need to put them into ______ bags. How many pieces of candy will be in each bag? (Real-life application) (Partitive)

3. Ask additional questions as needed to prompt and probe student thinking and communication

a. What type of a problem is this? Which operation would you use to find the answer? (Operation Sense) (Problem Solving)

b. Explain how you solved the problem. (Communication)

c. Solve the problem in a different way. Explain or show me. (Relationship between operations) (Connecting)

d. What would happen to the numbers in the question if you multiplied them? (If student multiplied, ask what would happen to the numbers if they divided them.) (Relative effects of operations) (Reasoning)

e. I have ______ pieces of candy that I am going to put into bags of ______. How many bags will I have? (Measurement) (Multiple definitions of operations)(Real-life application)

f. Show this problem as a fraction. (Representing)

g. Solve this problem using pictures, manipulatives, etc. (Connecting)

4. Adjust the complexity of the number involved in the task if necessary at any time during the assessment--up or down.

5. Score and record the numerical level according to the Hierarchy of Numerical Complexity.

6. Score and record the problem solving level according to the following list:

1. Direct Modeling

2. Counting Strategy

3. Memorized Fact

4. Derived Fact

5. Direct modeling with ones and tens

6. Incrementing

7. Combining with ones and tens

8. Compensating

9. Standard Algorithm

7. Score their overall performance according to the rubric.

Damon L. Bahr

Brigham Young University

Richard R. Sudweeks

Brigham Young University

Both the Assessment Standards (NCTM, 1995) and the Principles and Standards (NCTM, 2000) state that assessment tasks communicate what types of mathematical knowledge and performance are valued (p. 22). Therefore, standards-based assessment complements standards-based instruction (Dunbar & Witt, 1993).

Paralleling reform in mathematics curriculum and instruction have been calls to authenticate student assessment in all subject areas. Terms such as "Authentic Assessment," "Alternative Assessment," and "Performance Assessment" have become banners to rally focused efforts to change paradigms about the nature and purpose of assessment. According to McMillan (2004), a performance assessment is "... one in which the teacher observes and makes a judgment about the student's demonstration of a skill or competency in creating a product, constructing a response, or making a presentation. They possess several important characteristics:

1. Students perform, create, construct, or produce something.

2. Deep understanding and/or reasoning skills are assessed.

3. They involve sustained work.

4. They call on students to explain, justify, and defend.

5. Performance is directly observable.

6. They involve engaging ideas of importance and substance.

7. There is a reliance on trained assessor's judgments for scoring.

8. Multiple criteria and standards are pre-specified and public.

9. There is no single correct answer.

10. The performance is grounded in real-world contexts and constraints" (p. 213).

Although defined in many ways, performance assessment designed according to the above criteria provides many benefits that are closely tied to instruction: it integrates instruction and assessment, ties assessment to real-world challenges and reasoning processes (thus helping instruction target more important outcomes), provides an alternative to traditional assessment, and authenticates the assessment process (Wiggins, 1993). Performance assessments conceptualized in this manner are therefore legitimately alternative and authentic in nature. Mathematics educators have joined in calling for the use of performance assessments that incorporate the aspects identified by McMillan, both as a means to align assessment with new reform curricula (Firestone & Schorr, 2004; Shepard, 2000) and as a means to improve the links between teaching practice and assessment (Pellegrino, Chudowsky, & Glaser, 2001).

Although performance assessments provide benefits unavailable through more traditional assessment procedures, they are not without limitations. They are usually expensive in terms of both the time and the materials required to administer them. In addition, the results obtained from them can be subjective, raising issues of inter-rater reliability. Finally, those results often provide an inadequate basis for generalizing across tasks.

Statement of Purpose

The purpose of this study was to describe the principles that guided the training of elementary teachers to develop and administer reform-oriented, grade-level-specific performance assessments in the context of a professional development project, and to assess the internal consistency of the teachers' ratings of student performance obtained in the course of that training.

Assessment Development

As an integral part of a two-year professional development program, 85 elementary teachers, representing virtually the entire faculties of three schools as well as a smattering of teachers from fifteen other schools in central Utah, were involved in the development and implementation of a performance assessment system. We reasoned that this involvement would not only provide a natural context in which to enhance their understanding of appropriate assessment practice, but would also serve to accelerate and enhance their acquisition of fundamental notions of mathematics education reform.

We designed the assessment creation task to call for adequate attention to the interplay among cognitive processes, content categories, and task levels (Dunbar & Witt, 1993). Teachers worked in grade-level teams to create two assessments: (a) a number sense assessment, and (b) an operation sense assessment. (Two examples of assessment instructions that were developed appear in Appendices A and B.) These two topics were chosen because of the need to assess important mathematics (Dunbar & Witt, 1993; Morgan, 1998; NCTM, 2000). Number and operations are the cornerstone of the entire mathematics curriculum internationally (NCTM, 2000; Reys & Nohda, 1994). A worthwhile mathematical task, i.e., an engaging word problem incorporating the intended mathematics (NCTM, 1991), was written for each assessment in such a way as to allow for the incorporation of varied levels of numerical complexity. The number sense assessments for each grade level are similar in that they each call for demonstrating number comprehension in a real-life context. For example, the fifth-grade assessment, based on U.S. geography, begins with a simple task:

I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?

This general, open-ended question (NCTM, 2000), "What does that number mean?" is part of the number sense assessments at all grades. Its open-endedness is designed to elicit an initial unprompted response. Subsequent questions are then asked as needed to probe further the nature of the student's number sense, such as, "Can you draw a picture of that number?" "Can you represent that number in expanded form?" "How many groups of ______ are in ______?" "What number is 100 (or 10, or 1,000) less than ______?" etc. A sample set of responses for a three-digit task appears in Figure 1.

The operation sense assessments for each grade level were based upon the operation most emphasized in the state curriculum for that grade: e.g., subtraction in third grade, multiplication in fourth grade, etc. The fifth-grade operation sense assessment, for example, begins with the following worthwhile mathematical task:

[FIGURE 1 OMITTED]

I have ______ pieces of candy that I am going to put into bags of ______. How many bags will I have?

A sample set of responses for a two-digit division task appears in Figure 2.

[FIGURE 2 OMITTED]

Note that numbers of varying numerical complexity can be inserted in the blanks depending upon the child's estimated level. These levels were based upon a "Hierarchy of Numerical Complexity," relative to number and the four operations, that was also developed and appears in Table 1.

Multiple levels allow for obtaining data that are developmental in nature (Pegg, 2003; Wilson, 1999). To estimate the level at which the assessments would be administered, quick inventories were designed. The number sense inventory simply calls for the reading of numerals of varying sizes and appears in Figure 3. The operation sense inventory calls for the solving of simple exercises of varying complexity in the operation associated with the grade. Since the fifth-grade curriculum focuses on division, the fifth-grade inventory consists of various division exercises and appears in Table 2.

Note that both inventories are quite procedural in nature. It seemed logical to assume that procedural performance would provide a quick, rough estimate of the level at which the assessments should be administered, as long as a child's instructional experience included the development of solid connections between the learning of concepts and procedures (NCTM, 2000). If the child's performance in the initial stages of the assessment warranted an adjustment in level, the teacher would make that adjustment by re-administering the task with numbers of greater or lesser complexity. This assumption was borne out as the teachers implemented the assessments in their classrooms. Those whose instructional programs promoted conceptual-procedural connections found that the inventories produced accurate level estimates. If the teachers tended to promote less conceptual thinking, the inventory results tended to provide level estimates that had to be adjusted. This latter situation served to inspire teacher change, as the students' responses consistently revealed that procedural knowledge does not necessarily imply an underlying conceptual understanding. For example, one child's inventory revealed a procedural knowledge of multiplying two-digit numerals by one-digit numerals, such as 12 x 3. When she was presented with a worthwhile task that incorporated 12 x 3, she had no idea how to solve it, let alone solve or represent it in multiple ways. In fact, this particular student could not comprehend a multiplication situation as simple as 2 x 3 without at least some minimal assessor support.
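The estimate-then-adjust cycle just described can be sketched as a simple loop. This is purely illustrative; the function names, the numeric level bounds, and the "too hard"/"too easy" judgments are hypothetical stand-ins for the teacher's observation of the child's initial work on the task:

```python
def settle_level(inventory_estimate, observe, min_level=1, max_level=9):
    """Start at the inventory's estimated level of numerical complexity,
    then re-administer the task with simpler or more complex numbers
    until the level matches the child's performance.

    observe(level) stands in for the teacher's judgment of the child's
    attempt: it returns "too hard", "too easy", or "match".
    """
    level = inventory_estimate
    while True:
        judgment = observe(level)
        if judgment == "too hard" and level > min_level:
            level -= 1   # re-administer with less complex numbers
        elif judgment == "too easy" and level < max_level:
            level += 1   # re-administer with more complex numbers
        else:
            return level  # record this level per the hierarchy
```

A child whose procedural inventory overestimates her conceptual level, as in the 12 x 3 example above, would simply be stepped down until the task becomes comprehensible.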

Seven criteria were selected as standards by which student performance would be judged: five analytical criteria based upon the NCTM "Process Standards" (2000), as suggested by Dunbar & Witt (1993), and two holistic criteria suggested by the "Learning Principle," also part of the Principles and Standards document. (We recognize that this suggestion by Dunbar & Witt was written seven years prior to the Principles and Standards document. However, the same five fundamental processes were key components of its predecessor, the Curriculum and Evaluation Standards (NCTM, 1989).) These criteria serve, as stated by Morgan (1998), "... to provide a language that teachers and students can use both to help students to display the behaviors that will lead to success in the assessment process and critically to interrogate the assessment practices themselves" (paragraph 3). The five analytical criteria were:

1. Problem solving -- accurately solving a worthwhile task using multiple strategies,

2. Communicating -- explaining problem solving strategies clearly,

3. Reasoning -- justifying those strategies in a mathematically sound manner,

4. Representing -- showing or modeling mathematical ideas in multiple ways, and

5. Connecting -- explaining the connections between strategies and/or representations.

The two holistic criteria were:

1. Conceptual -- demonstrating an overall understanding of the mathematics involved with solving the task, and

2. Procedural -- demonstrating a knowledge of the rules or algorithms involved with solving the task.

The criteria were used to design a four-point rubric with its scoring hierarchy based upon the degree of assessor prompting required in order for a student to experience success in the assessment. The incorporation of prompting as a factor in distinguishing rubric levels results in a blurring of the line between instruction and assessment in harmony with current assessment philosophy (McMillan, 2004). The rubric appears in Table 3.
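As an illustration only (the actual rubric appears in Table 3, and the level descriptions below are hypothetical paraphrases, not the study's wording), a prompting-based scoring hierarchy of this kind can be encoded as a simple mapping from the degree of assessor prompting required to a four-point score:

```python
def rubric_score(prompting):
    """Hypothetical sketch of a four-point, prompting-based rubric:
    the score falls as the assessor prompting needed for the student
    to experience success increases."""
    levels = {
        "none": 4,       # succeeds independently
        "minimal": 3,    # succeeds after occasional clarifying prompts
        "moderate": 2,   # succeeds only with sustained prompting
        "extensive": 1,  # succeeds only with step-by-step support
    }
    return levels[prompting]
```

Encoding the rubric this way makes the blurring of instruction and assessment visible: the prompts that lower the score are the same moves a teacher would make while teaching.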

Both assessments also included an instruction guide and suggested questions or prompts to ensure that opportunities were provided for students to express themselves verbally as well as in written form (Dunbar & Witt, 1993; Glaser, Raghavan, & Baxter, 1992), and to ensure that students were invited to display behavior that addressed all analytical criteria, i.e., the Process Standards (Mewborn & Huberty, 1999). In addition, the questions associated with the number sense assessment were analyzed to ensure that the key components of number sense were addressed (NCTM, 2000; Sowder, 1992). In like manner, the operation sense questions were analyzed to ensure that the key components of operation sense were addressed (NCTM, 2000). In this way we became confident that important mathematical knowledge was assessed (Dunbar & Witt, 1993; Morgan, 1998; NCTM, 2000) and were provided with evidence that the interpretations associated with the assessment possessed construct validity (Messick, 1989). A form for students to record their work and a teacher recording form were also developed; they appear in Figures 4 and 5.

Besides its use as a guide in estimating the level of numerical complexity at which the assessments should be administered, the "Hierarchy of Numerical Complexity" was also intended to be used after assessment administration to record the actual level of numerical functioning on both tasks. For the number sense assessment, an additional determination of place value comprehension level, applicable when the numerals being examined were at least two digits long, was based upon the Ross Five-Stage Place Value Understanding Model (Ross, 1990, 1999). Those levels are:

1. Interprets a numeral as the whole number it represents, but assigns no meaning to individual digits

2. Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places

3. Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens

4. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable

5. Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable

Additional problem-solving labels for the strategies used in the operation sense assessment were based upon the Cognitively Guided Instruction single-digit and multidigit invented algorithms (Carpenter, Fennema, Franke, Levi, & Empson, 1999). When the assessment is administered, single-digit strategies are labeled as Direct Modeling, Counting, Memorized Fact, or Derived Fact; multidigit strategies are labeled as Direct Modeling with Tens and Ones, Incrementing, Combining, or Compensating.

[FIGURE 4 OMITTED]

Three limitations associated with performance assessments, discussed previously, were addressed in our development efforts. First, the issue of expense with regard to materials was managed by creating tasks that allowed for the use of mathematics manipulatives commonly used by the teachers. The issue of expense with regard to time was managed in two ways. Upper-grade teachers developed a response sheet that listed the task as well as prompts paralleling many of the questions that would normally be asked in a one-on-one interview. In this way, all students could respond to the performance assessment simultaneously, with a record of their work being completed in written and pictorial form. The teacher would then review these records and invite students to respond to further probing and prompting questions should the written record warrant it. A fifth-grade teacher reported being able to complete the assessments for his entire class of 20 in about two hours in this way.

[FIGURE 5 OMITTED]

Inasmuch as younger children often do not possess sufficient independent writing ability to allow for simultaneous administration, lower-grade teachers chose to use paraprofessionals and preservice university students who were already a part of the school culture to relieve them of some of their other instructional responsibilities so that they could deliver the assessments one-on-one.

A second limitation of performance assessments relates to the issue of subjectivity, or lack of inter-rater reliability (Linn & Baker, 1996). This is an issue of consistency in the scoring process that is well understood, however, and can be controlled with appropriate training of raters (Dunbar & Witt, 1993). Videotaped assessment administrations conducted by the instructor were shown to the teachers in order to deepen their comprehension of assessment procedures. In addition, the teachers each scored these videotaped assessments, and their scores were used to promote scoring consistency. The data obtained from this video training are discussed in further detail in the next section.

A third limitation relates to the inability of performance assessment results to provide a basis for generalizing across tasks. For example, if a fifth-grade student performed well on a division operation sense task, that does not necessarily mean that he or she would perform well on all division tasks. This limitation was addressed by consistently instructing teachers that the performance assessments they developed serve as only one component of a more comprehensive assessment system. That is to say, each teacher was educated in multiple forms of assessment (observations, interviews, conferences, open-ended questions, portfolios, student self-assessment, constructed-response items, selected-response items, and short-answer items; NCTM, 2000; Stenmark, 1991) so that data obtained from the performance assessments would constitute only part of a more complete collection of information about student mathematical achievement.

Inter-rater Reliability Analysis Procedures

The instructor in the program administered the performance assessments to six different children, one in each of grades K-5. The teachers then viewed and scored videotaped administrations of those assessments. Following the presentation of each videotaped performance assessment, teachers recorded their scores, and a discussion was then conducted in an effort to deepen understanding of the scoring criteria. The scores were subsequently analyzed using Cronbach's alpha, a measure of a scale's internal consistency that is sometimes referred to as a "scale reliability coefficient" (Moffatt, 2005, p. 1). Multiple ratings of the same performance are analogous to a test or survey in which several items purport to measure the same factor or attribute. In that sense, determining the degree of consistency between ratings compares to computing the degree of internal consistency (or reliability) between items. One method for determining the degree of internal consistency, referred to as "split-half," computes a correlation coefficient between the scores obtained from half of the items that measure an attribute and the scores obtained from the other half. Cronbach's alpha is mathematically equivalent to the average of all possible split-half estimates, although that is not exactly how it is computed (Trochim, 2005).
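Treating each rater as an "item," the alpha described above can be computed directly from a matrix of ratings using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The following is a minimal sketch (function and variable names are ours, not the study's):

```python
def cronbach_alpha(ratings):
    """Internal consistency of a set of ratings.

    ratings: one row per rated performance, one column per rater
    (columns play the role of test items in the usual formula).
    """
    k = len(ratings[0])  # number of raters

    def variance(xs):  # sample variance
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    # sum of the variances of each rater's column of scores
    rater_vars = sum(variance([row[j] for row in ratings]) for j in range(k))
    # variance of each performance's total score across raters
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - rater_vars / total_var)
```

Raters who always agree (or always differ by a constant) yield an alpha of 1.0; unsystematic disagreement lowers the coefficient toward zero.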

The first analysis involved computing the alpha for all ratings for all assessments in each of the four cohorts of teachers, as well as an overall alpha for all cohorts combined. As shown in Table 4, an extremely high degree of consistency existed among raters.

A second analysis computed the degree of consistency among all raters for each of the two assessments in each grade level; the results are displayed in Table 5. It is not surprising that somewhat lower alpha coefficients were obtained compared to those in Table 4, because only ten percent of the ratings used for obtaining the coefficients in Table 4 were used to compute each of the coefficients in Table 5. With that statistical fact kept in mind, they do indicate a comfortable degree of reliability overall.

We also intended to conduct similar statistical analyses to determine the degree of consistency among level determinations, both numerical complexity levels and place value comprehension levels and problem-solving labels. However, there was complete agreement among all raters with regard to these level and label determinations, which rendered a more sophisticated statistical analysis of no value. It is clear that the overall statistical analyses reveal a very high rate of agreement among teachers' ratings, with one caveat: Cronbach's alpha ignores differences associated with rater means due to generosity or leniency errors. We intend to conduct more thorough statistical analyses using Many-Facets Rasch Modeling (Linacre, 1989, 2003) in order to further investigate the presence of such errors.

Conclusions

We learned that the incorporation of thorough, extended work in performance assessment provides viable support for professional development in reform pedagogy. Our efforts also appear to be founded upon sound psychometric principles, as demonstrated by the high degree of consistency among teacher ratings. If the ideals of the mathematics reform movement are to achieve widespread adherence, then there must be a synchrony of improvement efforts in the areas of curriculum, instruction, and assessment (Morgan, 1998). Improvements in each of those areas will have concurrent effects on the other two. Educating teachers in designing and implementing performance assessments provides a natural context in which reform-based assessment philosophy and research can be fostered.

References

Bell, K. N. (1995). How assessment impacts attitudes toward mathematics held by prospective elementary teachers. (Doctoral dissertation, Boston University, 1995). Dissertation Abstracts International, 56, 09-A.

Black, P. & William, D. (1998). Assessment and classroom learning, Assessment in Education, 5, 7-74.

Carpenter, T. P., Fennema, E., Franke, M., Levi, L. & Empson, S. B. (1999). Children's mathematics: Cognitively guided instruction. Portsmouth, N.H.: Heinemann.

Clarke, D. C., Clarke, D. M., & Lovitt, C. J. (1990). Changes in mathematics teaching call for assessment alternatives. In T. J. Cooney (Ed.), Teaching and learning mathematics in the 1990s (pp. 118-129). Reston, VA: National Council of Teachers of Mathematics.

Dunbar, S. B. & Witt, E. A. (1993). Design innovations in measuring mathematics achievement. In National Research Council, Measuring what counts: A conceptual guide for mathematics assessment. Washington, D.C.: National Academy Press.

Firestone, W. A., & Schorr, R. Y. (2004). Introduction. In W. A. Firestone, R. Y. Schorr, & L. F. Monfils (Eds.), The ambiguity of teaching to the test (pp. 1-18). Mahwah, NJ: Lawrence Erlbaum.

Galbraith, P. (1993). Paradigms, problems and assessment: Some ideological implications. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 73-86). Boston: Kluwer Academic Publishers.

Glaser, R., Raghavan, K., & Baxter, G. P. (1992). Cognitive theory as the basis for design of innovative assessment: Design characteristics of science assessments (CSE Tech. Rep. No. 349). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing.

Hancock, L. & Kilpatrick, J. (1993). Effects of mandated testing on instruction. In Measuring what counts (pp. 149-174). Washington, D.C.: National Academy Press.

Izard, J. (1993). Challenges to the improvement of assessment practice. In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 185-194). Boston: Kluwer Academic Publishers.

Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.

Linacre, J. M. (2003). A user's guide to FACETS [computer program manual]. Chicago: MESA Press.

Linn, R. L. & Baker, E. L. (1996). Can performance-based assessments be psychometrically sound? In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities. Ninety-fifth yearbook of the National Society for the Study of Education (pp. 84-103). Chicago: University of Chicago Press.

McMillan, J. H. (2004). Classroom assessment: Principles and practice for effective instruction. Boston: Pearson.

Messick, S. (1989). Validity. In R. Linn (Ed.) Educational Measurement (3rd ed., pp. 13-103) New York: American Council on Education and Macmillan Publishing Company.

Mewborn, D. & Huberty, P. (1999). Questioning your way to the Standards. Teaching Children Mathematics, 6(4), 226-246.

Moffatt, M. (2005). Cronbach's Alpha-Dictionary definition of Cronbach's Alpha. Retrieved September 20, 2005, from About Economics: http://economics.about.com/cseconomicsglossary/g/cronbackalpha.htm

Morgan, C. (1998). Assessment of mathematical behaviour: A social perspective. In P. Gates (Ed.), Mathematics education and society. Proceedings of the First International Mathematics Education and Society Conference (MEAS 1) (pp. 277-283). Nottingham: Nottingham University.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1991). Professional standards for teaching mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (1995). Assessment standards for school mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

Niss, M., Ed. (1993). Investigations into assessment in mathematics education. Boston: Kluwer Academic Press.

Pegg, J. (2003). Assessment in mathematics: A developmental approach. In J. Royer (Ed.) Mathematical Cognition (pp. 227-259). Greenwich, CT: Information Age Publishing.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.) (2001). Knowing what students know: The science and design of educational assessment. Washington DC: National Academy Press.

Reys, R. E. & Nohda, N. (Eds.) (1994). Computational alternatives for the twenty-first century: Cross-cultural perspectives from Japan and the United States. Reston, VA: National Council of Teachers of Mathematics.

Ridgeway, J. (1998). From barrier to lever: Revising roles for assessment in mathematics education. NISE Brief, 2(1), 1-9.

Ross, S. R. H. (1990). Children's acquisition of place-value numeration concepts: The roles of cognitive development and instruction. Focus on Learning Problems in Mathematics, 12(3), 1-18.

Ross, S. R. H. (1999). Place value. Using digit correspondence tasks for problem solving and written assessment. Focus on Learning Problems in Mathematics, 21(3), 28-36.

Shepard, L. A. (2000). The role of classroom assessment in teaching and learning. CSE Technical Report. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.

Sowder, J. T. (1992). Making sense of numbers in school mathematics. In G. Leinhardt, R. Putnam, & R. A. Hattrup (Eds.), Analysis of arithmetic for mathematics teaching (pp. 1-51). Hillsdale, NJ: Lawrence Erlbaum Associates.

Stenmark, J. K. (Ed.). (1991). Mathematics assessment: Myths, models, good questions, and practical suggestions. Reston, VA: National Council of Teachers of Mathematics.

Trochim, W. M. (2005). Types of reliability. Retrieved September 20, 2005, from Types of Reliability: http://www.socialresearchmethods.net/kb/reltypes.htm

Wilson, M. (1999). Measurement of developmental levels. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 151-163). New York: Pergamon.

Wheeler, D. (1993). Epistemological issues and challenges to assessment: What is mathematical knowledge? In M. Niss (Ed.), Investigations into assessment in mathematics education (pp. 87-95). Boston: Kluwer Academic Publishers.

Wiggins, G. P. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.

Appendix A

Fifth-grade Number Sense Instructions:

1. Administer the Number Sense Inventory to the child in order to estimate the level at which you should present your worthwhile mathematical task.

2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses. Provide manipulatives and invite the student to record responses on the Student Recording Form as needed.

I went on a trip to see some of the wonderful tourist attractions in the United States, like the Grand Canyon, the Black Hills, and the Florida Everglades. I have flown ______ miles in my travels. What does that number mean?

3. Ask additional questions as needed to prompt and probe student thinking.

Clarifying Questions:

1. How many groups of ______ are in ______? (P.S.)

2. How many ones (tens, hundreds, etc.) are in ______? (P.S.)

3. What is one (ten, hundred, etc.) less than ______? (P.S.)

4. Group this number another way. (P.S.)

5. Is the number ______ higher or lower than this number? (P.S.)

6. What is the place and the value of the digit ______ in this number? (P.S.)

7. Is this number big or small? What about compared to ______? (P.S.)

8. How do you know that ... (referring to above questions)? (Communicating)

9. Why do you think that ... (referring to the above question)? (Reasoning)

10. Show this number in another way. (pictures, manipulatives, numerals, expanded form) (Reasoning)

11. How does this picture (or manipulatives) match with these numbers? (Connecting)

4. Adjust the complexity of the number involved in the task if necessary at any time during the assessment--up or down.

5. Score and record the level according to the Hierarchy of Numerical Complexity.

6. Score and record the number comprehension level according to the following hierarchy:

1: Interprets a 2-digit numeral as the whole number it represents, but assigns no meaning to individual digits

2: Recognizes place value names ("ones," "tens") but attaches no meaning to the digit in those places

3: Interprets digits by their "face value," e.g. that the "2" in 25 means 2 of something but not necessarily 2 tens

4: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, but the understanding is limited and performance is unreliable

5: Recognizes digits represent groups of the particular place value, e.g. that the "2" in 25 means 2 tens or 20, and the understanding is complete and performance is reliable

7. Score the overall performance according to the rubric.
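The five-level number comprehension hierarchy in step 6 forms an ordinal scale. As a purely illustrative sketch (the dictionary, function name, and response coding below are our assumptions for record-keeping, not part of the published protocol), the levels could be coded as follows:

```python
# Ordinal coding of the number comprehension hierarchy from step 6.
# Descriptors are abridged from the instructions; this structure is an
# illustrative convenience, not part of the published assessment.
COMPREHENSION_LEVELS = {
    1: "Interprets a 2-digit numeral as a whole number; no meaning for digits",
    2: "Recognizes place value names but attaches no meaning to the digits",
    3: "Interprets digits by face value only ('2' in 25 means 2 of something)",
    4: "Digits represent groups of the place value, but performance unreliable",
    5: "Digits represent groups of the place value; performance reliable",
}

def record_level(student: str, level: int) -> dict:
    """Validate and record an assigned comprehension level (hypothetical helper)."""
    if level not in COMPREHENSION_LEVELS:
        raise ValueError(f"level must be 1-5, got {level}")
    return {"student": student, "level": level,
            "descriptor": COMPREHENSION_LEVELS[level]}
```

Treating the levels as ordinal (rather than numeric) data is consistent with the scoring instructions, which ask the assessor to record a level, not to average scores.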

Appendix B

Fifth-grade Whole Number Operation Sense Instructions:

1. Administer the Operation Sense Inventory to the entire class to determine the level at which you should present your worthwhile mathematical task.

2. Present the worthwhile mathematical task at an estimated level of number complexity. Encourage the student to solve the task in any way she/he chooses and to use the Student Recording Form as needed.

Worthwhile mathematical task:

There are ______ pieces of candy. We need to put them into ______ bags. How many pieces of candy will be in each bag? (Real-life application) (Partitive)

3. Ask additional questions as needed to prompt and probe student thinking and communication.

a. What type of problem is this? Which operation would you use to find the answer? (Operation Sense) (Problem Solving)

b. Explain how you solved the problem. (Communication)

c. Solve the problem in a different way. Explain or show me. (Relationship between operations) (Connecting)

d. What would happen to the numbers in the question if you multiplied them? (If the student multiplied, ask what would happen to the numbers if they divided them.) (Relative effects of operations) (Reasoning)

e. I have ______ pieces of candy that I am going to put into bags of ______. How many bags will I have? (Measurement) (Multiple definitions of operations)(Real-life application)

f. Show this problem as a fraction. (Representing)

g. Solve this problem using pictures, manipulatives, etc. (Connecting)

4. Adjust the complexity of the number involved in the task if necessary at any time during the assessment--up or down.

5. Score and record the numerical level according to the Hierarchy of Numerical Complexity.

6. Score and record the problem solving level according to the following list:

1. Direct Modeling

2. Counting Strategy

3. Memorized Fact

4. Derived Fact

5. Direct modeling with ones and tens

6. Incrementing

7. Combining with ones and tens

8. Compensating

9. Standard Algorithm

7. Score the student's overall performance according to the rubric.

Damon L. Bahr

Brigham Young University

Richard R. Sudweeks

Brigham Young University

Table 1. Hierarchy of Numerical Complexity

| Level | Number Sense | Addition | Subtraction |
|---|---|---|---|
| A | Rote counting | Joining sets | Separating sets |
| B | One-to-one correspondence | Single-digit addends & sum (3 + 2 = 5) | 1 digit - 1 digit = 1 digit (5 - 3 = 2) |
| C | Single digit < 5 | Single-digit addends & double-digit sum (3 + 9 = 12) | 2 digits - 1 digit = 1 digit, decomposing (13 - 5 = 8) |
| D | Single digit > 5 | Multiple single-digit addends (3 + 2 + 4 = 9) | 2 digits - 1 digit = 2 digits, no decomposing (27 - 5 = 22) |
| E | 2 digit > 15 < 20 | 2 digits + 2 digits, no composing (32 + 24 = 56) | |
| F | 2 digit > 9 < 16 | 2 digits + 2 digits, with composing (32 + 29 = 61) | 2 digits - 1 digit = 2 digits, decomposing (27 - 9 = 18) |
| G | 2 digit > 20 | Three 2-digit addends with composing (32 + 25 + 46 = 103) | 2 digits - 2 digits = 2 digits, no decomposing (36 - 24 = 12) |
| H | 3 digit | 3 digits + 3 digits, varying composing (391 + 467 = 858) | 2 digits - 2 digits = 1 or 2 digits, decomposing (32 - 18 = 14) |
| I | 3 digit, zeroes in ones or tens places | Three 3-digit addends with composing (323 + 257 + 469) | 3 digits - 2 or 3 digits = 1, 2, or 3 digits, decomposing involving 1 zero (406 - 178 = 228) |
| J | 4 digit | Three 4-digit addends with composing (3235 + 2579 + 4696) | 3 or 4 digits - 2 or 3 digits, 1 decomposing (1469 - 635) |
| K | 5 digit | 4 digits + 4 digits, varying composing (4625 + 1856) | 4 or 5 digits - 4 or 5 digits, 2 alternating decomposes (4628 - 1809) |
| L | 6 digit | | Variable-digit number, 2 consecutive decomposes (631 - 253) |
| M | 6 digit, zeroes | | Variable-digit number, 3 consecutive decomposes (54363 - 14581) |
| N | 7 digit | | Variable-digit number, decomposing involving 2 or more zeroes (4001 - 1376) |
| O | Whole #'s & tenths | | |
| P | Whole #'s & hundredths | | |
| Q | Whole #'s & thousandths | | |

| Level | Multiplication | Division |
|---|---|---|
| A | 1 digit x 1 digit = 1 digit (2 x 3 = 6) | 1 digit / 1 digit = 1 digit (8 / 2 = 4) |
| B | 1 digit x 1 digit = 2 digits, composing (2 x 6 = 12) | 2 digits / 1 digit = 1 digit (12 / 2 = 6) |
| C | 10 x single digit (10 x 3 = 30) | 1 or 2 digits / 1 digit = 1 digit with remainder (7 / 3 = 2 r 1) |
| D | 10 multiple x single digit = 2 digits (20 x 3 = 60) | 10 multiple / 1 digit = 10 multiple, no decomposing (60 / 2 = 30) |
| E | 10 multiple x single digit = 3 digits, composing (30 x 4 = 120) | 3-digit 10 multiple / 1 digit, no decomposing (120 / 4 = 30) |
| F | 2 digits x 1 digit, no composing (13 x 2 = 26) | 2 or 3 digits / 1 digit = 2 digits, no decomposing (65 / 2 = 32 r1) |
| G | 2 digits x 1 digit, composing (14 x 3 = 42) | 2- or 3-digit 10 multiple / 1 digit = 2 digits, decomposing (150 / 7 = 21 r3) |
| H | 10 multiple x 10 multiple = 3 digits, composing (20 x 20 = 400) | 2 or 3 digits / 1 digit = 2 digits, decomposing (78 / 3 = 26) |
| I | 2 digits x 2 digits, no composing (13 x 12) | 3 digits / 1 digit = 3 digits (432 / 2 = 216) |
| J | 2 digits x 2 digits, one composing on first row (23 x 14) | 3 digits / 1 digit = 3 digits, zero in quotient (432 / 2 = 216) |
| K | 2 digits x 2 digits, one composing on second row (43 x 24) | 2 or 3 digits / 10 multiple (60 / 20 = 3) |
| L | 2 digits x 2 digits, two composes (23 x 65) | |
| M | 2 digits x 2 digits, larger digits (67 x 98) | |
| N-Q | | |

Figure 3. Number Sense Inventory

Ask the child to perform tasks A and B, then ask the child to read the remaining numerals until she/he is not successful.

A. How far can you count?
B. Count these blocks (up to 10)
C. 4
D. 7
E. 18
F. 12
G. 68
H. 492
I. 709
J. 3,579
K. 24,683
L. 147,836
M. 305,284
N. 3,548,921
O. 73.9
P. 697.34
Q. 28.108

Table 2. 5th Grade Operation Sense Inventory--Whole Number Division

8 / 2, 12 / 2, 7 / 3, 60 / 2, 120 / 4, 65 / 2, 150 / 7, 78 / 3, 60 / 20

Table 3. Performance Assessment Rubric

| Rubric level | Problem Solving | Communicating | Reasoning |
|---|---|---|---|
| 4 Independent Understanding | Can solve the problem in two ways independently | Can clearly explain the problem solving strategies | Can clearly justify the problem solving strategies |
| 3 Understanding with minimal help | Can solve the problem in two ways with minimal help or one way independently | Can clearly explain all but one part of the problem solving strategies | Can justify all but one part of the problem solving strategies |
| 2 Understanding with substantial help | Can solve the problem at least one way with help | Can explain portions of the problem solving strategies | Can justify portions of the problem solving strategies |
| 1 Little understanding | Cannot solve the problem even with help | Cannot explain the strategies even with help | Cannot justify the strategies even with help |

| Rubric level | Representing | Connecting | Procedural |
|---|---|---|---|
| 4 Independent Understanding | Can represent the problem in at least two ways independently | Can independently connect representations or strategies | Can solve the problem using a procedure independently |
| 3 Understanding with minimal help | Can represent the problem in two ways with minimal help or one way independently | Can connect representations or strategies with minimal help | Can solve the problem procedurally with minimal help |
| 2 Understanding with substantial help | Can represent some of the problem with help | Can connect representations or strategies only with substantial help | Can solve the problem procedurally with substantial help |
| 1 Little understanding | Cannot represent the problem even with help | Cannot connect representations or strategies even with help | Cannot solve the problem procedurally even with help |

| Rubric level | Conceptual |
|---|---|
| 4 Independent Understanding | Can show thorough understanding of the problem and of the associated mathematics independently |
| 3 Understanding with minimal help | Can show some understanding with minimal help |
| 2 Understanding with substantial help | Can show some understanding with substantial help |
| 1 Little understanding | Cannot show understanding even with help |

Table 4. Cohort and Combined Alpha Coefficients

| Cohort | N | Cronbach's Alpha |
|---|---|---|
| A | 21 | .992 |
| B | 19 | .998 |
| C | 28 | .993 |
| D | 17 | .947 |
| Combined | 85 | .978 |

Table 5. Grade Level Alpha Coefficients

| Grade Level | Number Sense | Operation Sense |
|---|---|---|
| Kindergarten | .932 | .865 |
| First | .711 | .633 |
| Second | .909 | .846 |
| Third | .806 | .925 |
| Fourth | .928 | .881 |
| Fifth | .915 | .910 |
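The alpha coefficients reported in Tables 4 and 5 are Cronbach's alpha, computed as α = k/(k-1) × (1 - Σσ²ᵢ/σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of total scores. The following sketch illustrates the computation on made-up rubric scores; the study's actual data are not reproduced here, and the function and data names are our own.

```python
# Cronbach's alpha: internal-consistency reliability of a set of item scores.
# alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)

def cronbach_alpha(scores):
    """scores: list of examinee rows, each a list of k item scores."""
    k = len(scores[0])

    def variance(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows = students, columns = four rubric strands scored 1-4.
data = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(data), 3))  # -> 0.949
```

High alpha values such as those in Table 4 indicate that the rubric strands rank students consistently; values near or above .9, as here, are commonly read as strong internal consistency.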


Author: Bahr, Damon L.; Sudweeks, Richard R.
Publication: Focus on Learning Problems in Mathematics
Article Type: Report
Geographic Code: 1USA
Date: Jan 1, 2008
Words: 6194
