Chronic Noncorrespondence Between Elementary Math Curricula and Arithmetic Tests

Tests of academic achievement are administered routinely to estimate student learning. In turn, estimates of student learning are used to make a number of educational decisions, including classification, placement, program planning, program evaluation, and grading. Inferences about learning, however, are accurate only to the extent that students have had an opportunity to learn the material tested. Thus, it is implicit that the content of assessment must bear a close correspondence to the content of instruction (Armbruster, Stevens, & Rosenshine, 1977, p. 2). For this reason, attainment and achievement are usually distinguished: "Attainment is what an individual has learned, regardless of where it has been learned. Achievement is what has been learned as a result of instruction in the schools" (Salvia & Ysseldyke, 1985, p. 9).

This need for tests to correspond to the curriculum conflicts directly with test publishers' need to market profitable products. Commercially prepared standardized tests are designed for broad appeal; they are intended to reflect the typical curricula offered to the majority of students, i.e., nonhandicapped students in the middle instructional tracks. The correspondence between test and curricular content is a particularly thorny issue for special educators because the curricula adopted for most exceptional students (and especially handicapped students) often differ systematically from the curricula offered to nonexceptional students. Even when mainstreaming is a goal for handicapped students, curricular differences would be expected.

In the last decade, several researchers have examined curriculum-test correspondence in reading and mathematics. Armbruster et al. (1977) compared the content coverage and emphasis of three third-grade reading curricula and two standardized tests. They concluded that the tests were similar in their emphasis on reading comprehension, but that the curricula differed widely in their emphases: "Only a small percentage of the skills emphasized in the curricula have counterparts on the standardized test" (p. 38). Jenkins and Pany (1978a) also examined reading tests and curricula in a paper that appeared in the Journal of Reading Behavior. They discussed a portion of that research in Exceptional Children (1978b), where they compared first- and second-grade books from five reading series with five word-recognition subtests. They found that the correspondence between tests and curricula varied widely and that the assumption that achievement tests are representative of curricular content was not supported.

Leinhardt and Seeward (1981) examined correspondence between curriculum and test by evaluating first- and third-graders' opportunity to learn what was covered on the reading and math subtests of a standardized achievement battery. They used two different methods, instruction-based measurement and curriculum-based measurement. In instruction-based measurement, opportunity to learn was based on curricular materials and teacher-reported activities. In curriculum-based measurement, opportunity to learn was based only on curricular materials. Their results were summarized by Airasian and Madaus (1983, p. 113):

1) Correlations between measures of instructional overlap are moderate, indicating some, but not total redundancy in the information obtained from the two estimates; 2) the mean curriculum-based-test overlap measure is lower than the instruction-based overlap measure because the former does not encompass instruction covered by textbooks; 3) both overlap estimates tend to be stable; 4) overlap measure contribute significantly to the prediction of post-test scores on the test and curriculum under study.

The Institute for Research on Teaching conducted a series of studies on the correspondence between mathematics curricula and tests. Schmidt (1978) developed a taxonomy for classifying math curricula based on three dimensions: (a) mode of presentation, (b) nature of material, and (c) operation required. After examining the fourth-grade math subtests of four widely used group achievement batteries, he concluded that the subtests' content varied so much that some subtests would be more relevant than others for a given instructional program. Freeman et al. (1983) analyzed three fourth-grade mathematics curricula and compared them with the math subtests of four widely used standardized achievement batteries. Only 6 of the 22 topics were emphasized in all textbooks and tests.

Three important questions about test and curriculum correspondence in mathematics remain unanswered in the current literature. First, previous work has concentrated on classifying curricula by topic. Curriculum theorists (e.g., Beauchamp, 1968; Brandt, 1981; McNeil, 1977; Tanner & Tanner, 1980; Unruh, 1975) frequently distinguish between curricular content and process. In this distinction, process refers to the functions or requirements demanded of a student and may mean several different things (e.g., psychomotor, affective, and cognitive demands). Thus, in addition to comparing tests and curricula in terms of content, it is also important to determine their correspondence by process. Although Gagne's "Hierarchy of Ideas" (Gagne, 1971) and Guilford's (1967) "Structure of Intellect" have often been cited as bases for classifying by process, Bloom's taxonomy usually is considered the most systematic approach to classifying curriculum content by process. If tests assess a student's knowledge of content with a type of cognitive demand that differs systematically from the type used to teach that student, the tests cannot be considered valid assessments of what has been taught.

A second question concerns the degree of correspondence over time. Curricular sequences in mathematics may differ in any one year, but one might expect them to converge over several years. Previous research has ignored this cumulative aspect of curricula. A third question relates to the grade levels at which curriculum and test content have been compared. Only correspondence at the middle-elementary level has been examined. The correspondence between mathematics curricula and test content in the earlier grades is especially important because most handicapped students are identified in the primary grades. This study addresses these three unanswered questions by (a) examining curriculum-test correspondence by both content and process and (b) examining correspondence between two mathematics curricula and two tests over the first three grades.



Two elementary mathematics curricula and two mathematics tests, all frequently used, were selected. The first curriculum, Scott, Foresman Mathematics (S,FM) (Bolster et al., 1983), is intended primarily for regular-education classrooms and thus may be used by handicapped students enrolled in mainstream settings. S,FM is designed to teach daily math lessons by a "motivate, teach, practice, and apply" approach (Bolster et al., 1983, p. T10). The second curriculum was Distar Arithmetic I, II, & III (Distar) (Engelmann & Carnine, 1976). This curriculum, often used with handicapped students, employs a carefully programmed sequence of activities that the teacher must follow. One individually administered test and one group-administered test were also examined: the KeyMath Diagnostic Arithmetic Test (KMDAT) (Connolly, Nachtman, & Pritchett, 1976) and the mathematics subtest of the Iowa Tests of Basic Skills (ITBS) (Hieronymus, Lindquist, & Hoover, 1983).

Criteria for Classifying Content

A three-dimensional taxonomy of mathematics content was used to quantify the coverage and emphases of the curricula and tests. Two of the dimensions, material and operation, were used in the original form given by Kuhs et al. (1979). Material refers to the content of the problem (e.g., single digits, number sentences, decimals); operation refers to the arithmetic operation the problem requires (e.g., identifying equivalents, addition without regrouping, estimation). In addition, Wilson's (1971) adaptation of Bloom's (1956) Taxonomy of Educational Objectives was used to classify curricula and tests by the type of process required. Whereas Bloom included basic computational skills within the knowledge category of his taxonomy, Wilson distinguished between computation and knowledge of facts and specifics. Our taxonomy used four process types: knowledge, computation, comprehension, and application. Knowledge refers to the recall or recognition of symbols and terms; computation refers to solving mathematical problems expressed in numbers (e.g., "3 + 5 = "); comprehension refers to the translation of material (e.g., solving word problems); and application refers to using abstract principles and rules in concrete situations. Together, these four process types form the dimension we call mastery. Column 1 in Table 1 lists all dimensions used for classification in this study.
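To make the coding scheme concrete, each classified item can be thought of as a record carrying one value on each of the three dimensions. The Python sketch below is illustrative only, not the authors' instrument: the names `Mastery` and `ClassifiedItem` are hypothetical, and material and operation are kept as free-text labels because the full category lists (Column 1 of Table 1) are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Mastery(Enum):
    """The four process types that make up the mastery dimension."""
    KNOWLEDGE = "knowledge"          # recall or recognition of symbols and terms
    COMPUTATION = "computation"      # solving problems expressed in numbers, e.g. "3 + 5 ="
    COMPREHENSION = "comprehension"  # translation of material, e.g. solving word problems
    APPLICATION = "application"      # using abstract principles and rules in concrete situations


@dataclass(frozen=True)
class ClassifiedItem:
    """One curriculum or test item, coded on all three dimensions (hypothetical record type)."""
    material: str   # content of the problem, e.g. "single digits", "decimals"
    operation: str  # operation required, e.g. "addition without regrouping"
    mastery: Mastery


# Hypothetical coding of the computation example "3 + 5 =".
item = ClassifiedItem(
    material="single digits",
    operation="addition without regrouping",
    mastery=Mastery.COMPUTATION,
)
print(item.mastery.value)  # computation
```

Freezing the record reflects that an item's coding is fixed once assigned; the counts reported in the tables are then simply tallies over such records.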


To obtain a representative sample of S,FM's content, every other item on every page of the student textbooks for Grades 1 through 3 was analyzed. Items were defined as any activity requiring a student response and included computational and story problems, questions, and exercises. For Grades 1, 2, and 3 of S,FM, 2,057, 2,154, and 2,457 items were analyzed, respectively.

Different page-by-page procedures were used to obtain a representative sample of Distar's content. For Distar 1, all activities in Lessons 1 through 20 were analyzed; thereafter, the required "Take Home" worksheets for Lessons 21 through 160 were analyzed. For Distar 2, all "Take Home" worksheets and Teacher Presentation Book D were examined. (Presentation Book D contained lessons that were not covered in the "Take Homes" but that were part of the required curriculum.) For Distar 3, Student Workbooks 1, 2, and 3 were analyzed. Every second problem, worksheet, or exercise in the Distar materials was analyzed. For Distar, 1,941 items were classified at the first-grade level; 3,980 at the second-grade level; and 3,034 at the third-grade level.

Every item on the KMDAT was analyzed. Separate tabulations were made for the end of first grade, the end of second grade, and the end of third grade. These end points were defined by the expected end-of-year attainment, expressed as grade equivalents: 1.9 for Grade 1, 2.9 for Grade 2, and 3.9 for Grade 3.

All items on the ITBS Early Primary Battery (Levels 5 and 6), the Primary Battery (Levels 7 and 8), and Level 9 of the ITBS Multilevel Booklet were included. Every item of Test M (Mathematics) at Levels 5 and 6 was analyzed. For Levels 7, 8, and 9, the mathematics items from Test W-1 (Visual Materials) and all items in Tests M-1, M-2, and M-3 (Math Concepts, Math Problems, and Math Computation) were analyzed.

The number of items in each category of each dimension was tabulated for each grade level of both curricula and for each level of the ITBS. Percentages were computed for each dimension by dividing the category frequencies by the appropriate total frequency; these percentages are shown in Table 1. The numbers of items in each category of each dimension were then summed for Grades 1 and 2 combined and for Grades 1, 2, and 3 combined, for the two curricula, the ITBS, and the KMDAT. For the cumulative ITBS data, the Early Primary levels (Levels 5 and 6) were combined with Grade 1 (Level 7). Once again, percentages were computed for each dimension by dividing the category frequency by the appropriate total frequency. These percentages appear in Table 2.
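The tabulation steps described above (category frequencies per grade, each expressed as a percentage of that grade's total, then frequencies pooled across Grades 1 and 2 and Grades 1 through 3) can be sketched in a few lines of Python. The counts are invented for illustration, and `percentages` and `cumulative` are hypothetical helper names. Following the text, cumulative percentages are computed from summed frequencies rather than by averaging per-grade percentages, so a grade with more items carries more weight.

```python
from collections import Counter


def percentages(counts):
    """Percentage of items in each category: category frequency / total frequency * 100."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def cumulative(counts_by_grade, grades):
    """Sum category frequencies across several grades (e.g. Grades 1-2 or 1-3)."""
    combined = Counter()
    for grade in grades:
        combined.update(counts_by_grade[grade])  # Counter.update adds counts
    return combined


# Hypothetical mastery-dimension frequencies for one curriculum (not the study's data).
mastery_counts = {
    1: Counter(knowledge=30, computation=50, comprehension=15, application=5),
    2: Counter(knowledge=20, computation=60, comprehension=15, application=5),
    3: Counter(knowledge=10, computation=70, comprehension=10, application=10),
}

grade1_pct = percentages(mastery_counts[1])
grades_1_to_3 = cumulative(mastery_counts, [1, 2, 3])
print(grade1_pct["computation"])                  # 50.0
print(percentages(grades_1_to_3)["computation"])  # 60.0
```

The same `cumulative` helper could pool ITBS Levels 5, 6, and 7 into a single first-grade tabulation by passing those level keys in place of grade numbers.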

Reliability of the classifications was estimated by computing interrater agreement on each dimension (mastery, material, and operation) for each grade level of each curriculum and for the two tests. For curriculum content, a 5% random sample of curriculum items was selected (i.e., 333 items for S,FM and 446 items for Distar); for tests, a 10% random sample of items was selected (i.e., 12 items for the KMDAT and 34 items for the ITBS). The independent raters were the first author and a second person specifically trained to use the classification system. Percentages of simple agreement (Salvia & Ysseldyke, 1985, p. 114) were calculated. Because all percentages of agreement equaled or exceeded 90% (see Table 3), the data were considered sufficiently reliable for further analysis.
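Simple percentage agreement is the number of items two raters coded identically, divided by the number of items both coded, times 100. A minimal Python sketch for one dimension follows; the function name and the rater codes are hypothetical. Unlike chance-corrected indices such as Cohen's kappa, this statistic makes no adjustment for agreement expected by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Simple percentage agreement: items coded identically / items coded * 100."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    if not rater_a:
        raise ValueError("no items to compare")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreements / len(rater_a)


# Hypothetical mastery codes assigned by two raters to the same ten sampled items.
rater_1 = ["computation"] * 6 + ["knowledge", "comprehension", "application", "computation"]
rater_2 = ["computation"] * 6 + ["knowledge", "comprehension", "computation", "computation"]

agreement = percent_agreement(rater_1, rater_2)
print(agreement)        # 90.0
print(agreement >= 90)  # True: meets the study's 90% criterion
```

In the study, one such percentage would be computed per dimension, per grade level of each curriculum, and per test.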


Chi square for independent samples was the appropriate analysis. However, the computational procedures for chi square do not allow interactions to be evaluated. Furthermore, omnibus chi squares should not be computed because of the nature of the data: Interactions between grade and curriculum were expected on all three dimensions (mastery, operation, and material) because content coverage was likely to vary. (For example, the number of single-digit addition problems could be expected to decrease from first to second grade while the number of multiple-digit addition problems increased; however, there was no reason to anticipate that the decrease in single-digit problems and the increase in multiple-digit problems would be consistent from curriculum to curriculum.) Thus, the appropriate analysis was a series of 90 chi squares comparing the number of items in each category of each dimension, at each grade and for cumulative grades, between curricula, between tests, and between curricula and tests. The disadvantage of so many analyses is an increased probability of making a Type I error. To offset this disadvantage, alpha was reduced to .001.
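Each of these comparisons is a Pearson chi-square test of independence on a table whose rows are the categories of one dimension and whose columns are the two sources being compared. The self-contained Python sketch below computes the statistic, its degrees of freedom, and an exact upper-tail p-value using only the standard library; the function names and the counts are hypothetical, not the study's data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Starts from Q(1, x) = erfc(sqrt(x / 2)) or Q(2, x) = exp(-x / 2) and applies
    Q(k + 2, x) = Q(k, x) + (x / 2) ** (k / 2) * exp(-x / 2) / Gamma(k / 2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of observed counts.

    Rows are categories of one dimension; columns are the sources compared
    (e.g. one curriculum and one test). Returns (statistic, df, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical mastery-dimension counts (not the study's data).
# Rows: knowledge, computation, comprehension, application; columns: curriculum, test.
table = [[30, 10], [50, 20], [15, 25], [5, 15]]
statistic, df, p = chi_square_independence(table)
ALPHA = 0.001  # the study's reduced alpha level
print(df, round(statistic, 1), p < ALPHA)  # 3 25.9 True
```

On the choice of alpha: if every null hypothesis were true, 90 tests at α = .05 would be expected to yield about 4.5 significant results by chance (90 × .05); at α = .001 that expectation falls to 0.09. For comparison, a Bonferroni adjustment would set α at .05/90 ≈ .00056. SciPy's `scipy.stats.chi2_contingency` returns the same Pearson statistic for tables larger than 2 × 2; by default it applies Yates' continuity correction only when df = 1.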

The detailed breakdowns of the material and operation dimensions produced frequencies of zero in some comparisons; these rows were deleted from the chi square tables. In addition, the correction procedure described by Siegel (1956), combining adjacent categories, was applied to the 58 tables that contained cells with expected frequencies of less than five.
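The two adjustments described above (deleting all-zero rows, then combining adjacent categories while any expected frequency is below five) can be sketched as follows. The study does not say which neighboring category absorbed a sparse one, so the merge rule here (merge with the next category, or with the previous one for the last category) is an assumption, as are all function names, labels, and counts.

```python
def drop_empty_rows(table, labels):
    """Delete categories (rows) whose frequency is zero in every column."""
    kept = [(label, row) for label, row in zip(labels, table) if any(row)]
    return [row for _, row in kept], [label for label, _ in kept]


def smallest_expected_per_row(table):
    """Smallest expected frequency in each row under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [row_total * min(col_totals) / n for row_total in row_totals]


def collapse_small_expected(table, labels, threshold=5.0):
    """Merge adjacent categories until every expected frequency reaches threshold.

    Hypothetical merge rule: the row with the smallest expected frequency is
    combined with the next row (or the previous row if it is the last one).
    Merging stops when only two rows remain.
    """
    table = [list(row) for row in table]
    labels = list(labels)
    while len(table) > 2:
        smallest = smallest_expected_per_row(table)
        i = min(range(len(table)), key=smallest.__getitem__)
        if smallest[i] >= threshold:
            break
        j = i + 1 if i + 1 < len(table) else i - 1
        first, second = min(i, j), max(i, j)
        table[first] = [a + b for a, b in zip(table[first], table[second])]
        labels[first] = labels[first] + " + " + labels[second]
        del table[second], labels[second]
    return table, labels


# Hypothetical material-dimension counts (columns: curriculum, test).
labels = ["single digits", "decimals", "two-digit numbers",
          "three-digit numbers", "four-digit numbers"]
table = [[40, 30], [0, 0], [35, 25], [20, 15], [3, 1]]

table, labels = drop_empty_rows(table, labels)
table, labels = collapse_small_expected(table, labels)
print(labels[-1])  # three-digit numbers + four-digit numbers
print(table)       # [[40, 30], [35, 25], [23, 16]]
```

Merging rows leaves the column totals unchanged, so the collapsed table can be passed directly to an ordinary chi-square computation.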

Ninety percent (81 of 90) of the analyses produced values significant at the .001 level. Only nine chi squares were nonsignificant, and four of these were significant at the .05 level (see Table 4). Of primary interest were the comparisons between tests and curricula. At first grade, S,FM and the ITBS did not differ significantly on material or operation, but they did differ on mastery. Distar and the ITBS differed significantly on all dimensions at first grade. At second and third grades and for all cumulative grades, both S,FM and Distar differed significantly from the two tests on all dimensions, with two exceptions: S,FM and the ITBS did not differ on material at the second and third grades (although these comparisons were significant at the .05 level). Thus, the curricula and tests disagreed on at least one dimension at every grade level and every cumulative level; with four exceptions, the curricula and tests disagreed on all three dimensions at every grade level and cumulative level at the .001 level.

Also of interest was the correspondence between the two curricula. S,FM and Distar differed significantly on all dimensions within each grade and cumulatively across grades. Finally, the correspondence between the ITBS and the KMDAT was examined. The two tests corresponded more closely with each other than the two curricula did: only four of nine comparisons between the two tests differed significantly. Material and operation both differed significantly cumulatively through second grade and through third grade. In addition, material differed at the .05 level, and mastery differed at the .05 level through third grade.


The findings of this study are consistent with previous research reported by Jenkins and Pany (1976, 1978a, 1978b) and Armbruster et al. (1977), who documented significant discrepancies between the content of reading curricula and tests, and by Freeman et al. (1983), who found substantial variation in the content of elementary math curricula and tests. In this study, differences between curricular and test content were again observed. Moreover, these differences did not disappear when the cumulative curricula and tests were examined. Additionally, the results of this study demonstrate that the types of learning required by curricula and assessed by tests differed substantially.

The chronic noncorrespondence between the content of curricula and tests has several implications for their use in educational programs and decision making. Clearly, the KMDAT and ITBS lack content validity for S,FM and Distar. Using tests with low content validity for a given curriculum to make educational decisions is improper for two reasons. First, neither test could provide an accurate measure of achievement for either curriculum; assessing the outcomes of learning would be impossible under these circumstances. Second, estimating attainment of math skills would be difficult because the KMDAT and ITBS do not correspond with each other; no generalization about student abilities could be made with confidence.

Although not all curricula, tests, and grade levels have been investigated, a clear pattern of findings has emerged at the elementary levels for reading and mathematics. All data reported in the literature are consistent: Commercially prepared tests do not match the curriculum that is taught. Educators should take seriously the dictum that achievement tests must have content validity. Special educators should no longer accept the results of standardized achievement tests unless the tests have been demonstrated to match a student's curriculum. Sensitivity to the issue of correspondence should be an integral consideration in assessment practices to ensure an appropriate education for all students.


Airasian, P., & Madaus, G. (1983). Linking testing and instruction: Policy issues. Journal of Educational Measurement, 20, 103-118.

Armbruster, B., Stevens, R., & Rosenshine, B. (1977). Analysing content coverage and emphasis: A study of three curricula and two tests. (Technical Report No. 26). Urbana, IL: University of Illinois, Center for the Study of Reading.

Beauchamp, G. (1968). Curriculum theory. Wilmette, IL: Kagg Press.

Bloom, B. (1956). Taxonomy of educational objectives, handbook 1: Cognitive domain. New York: McKay.

Bolster, L., Gibb, E., Hansen, T., Kirkpatrick, J., McNerney, C., Robitaille, D., Trimble, H., Vance, I., Walsh, R., & Wisner, R. (1983). Scott, Foresman mathematics, teacher's edition (Books 1, 2, 3). Glenview, IL: Scott, Foresman.

Brandt, R. (Ed.) (1981). Applied strategies for curriculum evaluation. Alexandria, VA: Association for Supervision and Curriculum Development.

Connolly, A., Nachtman, W., & Pritchett, R. (1976). KeyMath Diagnostic Arithmetic Test. Circle Pines, MN: American Guidance Service.

Engelmann, S., & Carnine, D. (1976). Distar arithmetic I, II, & III. Circle Pines, MN: American Guidance Service.

Freeman, D., Kuhs, T., Porter, A., Floden, R., Schmidt, W., & Schwille, J. (1983). Do textbooks and tests define a national curriculum in elementary school mathematics? The Elementary School Journal, 83, 501-513.

Gagne, R. (1971). The conditions of learning (2nd ed.). New York: Holt, Rinehart, & Winston.

Guilford, J. (1967). The nature of human intelligence. New York: McGraw-Hill.

Hieronymus, A., Lindquist, E., & Hoover, H. (1983). Iowa Tests of Basic Skills. Chicago, IL: Riverside.

Jenkins, J., & Pany, D. (1976). Curriculum biases in reading achievement tests. (Technical Report No. 16). Urbana, IL: University of Illinois, Center for the Study of Reading.

Jenkins, J., & Pany, D. (1978a). Curriculum biases in reading achievement tests. Journal of Reading Behavior, 10, 345-357.

Jenkins, J., & Pany, D. (1978b). Standardized achievement tests: How useful for special education? Exceptional Children, 44, 448-453.

Kuhs, T., Schmidt, W., Porter, A., Floden, R., Freeman, D., & Schwille, J. (1979). A taxonomy for classifying elementary school mathematics content. East Lansing, MI: Michigan State University, Institute for Research on Teaching.

Leinhardt, G., & Seeward, A. (1981). Overlap: What's tested, what's taught? Journal of Educational Measurement, 18, 85-95.

McNeil, J. (1977). Curriculum: A comprehensive introduction. Boston: Little, Brown.

Salvia, J., & Ysseldyke, J. (1985). Assessment in special and remedial education (3rd ed.). Boston: Houghton Mifflin.

Schmidt, W. (1978). Measuring the content of instruction. East Lansing, MI: Michigan State University, Institute for Research on Teaching.

Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.

Tanner, D., & Tanner, L. (1980). Curriculum development (2nd ed.). New York: Macmillan.

Unruh, G. (1975). Responsive curriculum development: Theory and action. Berkeley, CA: McCutchan.

Wilson, J. (1971). Secondary school mathematics. In B. Bloom, J. Hastings, & G. Madaus (Eds.), Handbook of formative and summative evaluation of student learning. New York: McGraw-Hill.

JAMES SHRINER is Doctoral Student and Graduate Fellow, Department of Special Education, University of Minnesota, Minneapolis. JOHN SALVIA is Professor, Division of Special Education and Communication Disorders, The Pennsylvania State University, University Park.
COPYRIGHT 1988 Council for Exceptional Children

Article Details
Title Annotation:grades 1 through 3
Author:Shriner, James; Salvia, John
Publication:Exceptional Children
Date:Nov 1, 1988