
Examining the validity of two measures for formative teaching: reading aloud and maze.

ABSTRACT: This study examined the relations among performance on standardized reading tests, two informal measures (reading aloud and maze), and teacher judgment. Subjects were 335 students from Grades 2-6. Correlations between oral reading fluency and two achievement tests varied according to students' grade level, with a declining trend in the coefficients as grade level increased. Correlations between oral reading and achievement tests at the upper grade levels were lower than those reported in previous studies. In contrast, coefficients between performance on maze tasks and achievement tests did not show a similar decline from lower to upper grade levels. Results are discussed in the context of assessment procedures used in formative instruction.

The idea that teaching can be informed by regularly gauging the current status of the learner's skill and knowledge has a long tradition in education. Socrates, for example, selected questions based on students' answers to earlier questions. More recent examinations of teaching show that experienced teachers employ routines with which they assess students' current knowledge and understanding, and adjust their instruction accordingly (Leinhardt & Greeno, 1986).

Formative teaching, or the principle of adapting instruction to students' current knowledge, represents one of the most basic tenets of effective instruction. In the main, learner assessment for formative teaching has been conducted through informal observations and classroom questions; but periodically educators have attempted to formulate more systematic means for teachers to monitor student learning in ways that are useful for instructional planning.

Deno (1985) proposed an alternative approach to the assessment component in formative instruction that relies on frequent, small-scale achievement tests. In his approach, teachers gauge students' global reading proficiency by using reading tasks that remain relatively uniform in difficulty throughout the year. The merits of his approach to assessment hinge on three criteria: (a) the validity of the measures used to monitor students' global reading proficiency--that is, the extent to which the measures provide a valid index of general reading proficiency; (b) the sensitivity of the measures to changes in reading proficiency that occur over relatively brief periods; and (c) the utility of the information for instructional planning--that is, whether teachers can use this kind of general assessment information to adjust and modify instruction in ways that advance students' achievement. The present study addresses the first criterion, validity.

The aim of Deno's assessment approach is not to measure specific skills, but to provide an index of global reading proficiency, similar to that obtained from an omnibus achievement test. Thus, the validity of the formative measures depends on the extent to which they correspond to other commonly accepted indexes of reading proficiency--that is, on concurrent validity estimates of the formative measures that use traditional achievement test performance as the criterion. In the words of Kaplan (1964), "Here, the validity of a measurement is a matter of the success with which the measures obtained in particular cases allows us to predict the measures that would be arrived at by other procedures and in other contexts" (p. 199).

The present study was an attempt to further explore the concurrent validity of two reading measures that have been used to index general reading proficiency in the context of formative teaching. Both may be characterized as relatively informal measures. The first, reading aloud, was recently noted by the Commission on Reading (1985): "A valid assessment of reading proficiency... can be obtained by having students read unfamiliar text but with acceptable fluency" (p. 99). Further, measuring the number of words read accurately during brief oral reading assessments seems to satisfy basic standards of technical adequacy. Reported correlations between reading aloud (using words read correctly as the measure) and standardized reading achievement performance typically fall between .70 and .90, with most coefficients above .80 (Deno, Mirkin, & Chiang, 1982; Shinn, 1989). Nevertheless, many reading experts are troubled by any reading measure that does not provide more explicit evidence of comprehension.

Procedures for assessing reading proficiency that include an explicit comprehension element have better face validity than oral reading. Guthrie (1973) and Guthrie, Siefert, Burnham, and Caplan (1974) proposed one such measure, the maze task. This measure requires students to read a passage in which every fifth word has been replaced by a multiple-choice item consisting of the correct word and two distractors, and to select the correct alternative at each point. That the maze was intended for use in formative instruction is evident from Guthrie et al.'s (1974) comments:

Teachers and reading specialists need a simple, accurate means to monitor the progress of children during the course of a reading program. Particularly if the program emphasizes comprehension skills, the comprehension levels of an individual or a group should be assessed regularly to supply feedback to the teacher about the effectiveness of the instructional approach. Standardized tests are insufficient for this purpose since they require time and money and cannot be given with sufficient frequency to provide the feedback that is needed for continuous revision and improvement of the teaching program. (p. 165)

There are fewer validity studies of maze than of reading aloud. Guthrie et al. (1974) reported a correlation of .82 between performance on this measure and standardized achievement tests, with test-retest reliability "over .90" (p. 165).

The high correlations between achievement tests and relatively informal reading measures such as reading aloud and maze tasks have been taken as evidence that the latter measures satisfy standards for concurrent validity and can be used as an index of general reading proficiency. However, generalizations drawn from earlier concurrent validity studies may be broader than warranted. The Guthrie et al. (1974) investigation of the maze procedure was limited to 2nd-grade students and used a single standardized achievement test as the validity criterion. Performance on maze tasks may not be as highly correlated with achievement test performance at other grade levels or with other achievement tests. Likewise, Deno et al. (1982); Fuchs, Fuchs, and Maxwell (1988); and Shinn (1989) reported high correlations between the number of correct words read aloud and standardized achievement tests. The testing samples used in these studies, however, were extremely heterogeneous (i.e., students drawn from several grade levels), a fact that could artificially inflate the correlations. Thus, one purpose of the present research was to examine the relationship between conventional measures of reading proficiency and other less formal measures for both grade-heterogeneous and grade-homogeneous samples to determine if correlations previously reported on heterogeneous groupings overestimate the magnitude of the relationship between the different reading measures.

Criterion validity is based on the idea that different measurement procedures tap a common trait. In the present context, traditional achievement tests and informal assessments may measure the same construct, reading proficiency, or different but related aspects of this construct. The surface features of the three types of reading measures investigated in this research bear little obvious similarity. Moreover, whereas reading aloud and maze retain the same test format across grade levels, achievement tests tend to use different tasks to estimate reading proficiency in lower and upper grade levels. For example, levels of the Metropolitan Achievement Tests (MAT) (Prescott, Balow, Hogan, & Farr, 1984) used in Grades 2-4 require students to match words based on beginning, middle, and ending sounds, and to discriminate among word affixes and compound words; but levels used in Grades 5 and 6 do not test these skills. In contrast, reading aloud and maze tests not only use the same task form across all grade levels, but they sometimes even use the same passages for all grades (Deno et al., 1982; Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1988). Thus, informal and formal reading tests could be tapping different aspects of the general construct "reading proficiency," and the type of reading proficiency measured by the achievement tests may change across grade levels.

Because of the methodological characteristics of earlier research on the criterion validity of informal measures, specifically the practice of sampling students across several grade levels, differences in what is measured by achievement tests and more informal reading assessments may have been obscured. Thus, a second purpose of the present research was to examine correlations between formal and informal reading measures at different grade levels to determine if the latter measures remain uniformly "valid" with respect to the former--that is, whether reading proficiency as measured by the two approaches is defined differently at specific grade levels.

We address a third issue in this research--the criterion used in estimating the validity of reading measures. In recent years, traditional reading achievement tests have drawn criticism from reading researchers, as well as practitioners. For example, Valencia and Pearson (1988) stated, "We have been so seductively drawn to the so-called objectivity, reliability and validity of standardized norm-referenced tests that we have forgotten that they may only be minimally useful for making instructional decisions" (p. 27). Hence, attempts to establish the validity of newer measures such as maze tasks by correlating their results with those of traditional tests may be questioned because of doubts about the standard used for establishing criterion validity--traditional achievement tests. In the present research, we also included a third estimate of reading proficiency, teacher judgments, and examined its relationship to both the traditional and informal measures.

Method

Subjects



The sample comprised 335 subjects from two elementary schools in the Pacific Northwest. The numbers of students in Grades 2-6 were 47, 50, 66, 47, and 125, respectively. Roughly one third of the sample came from lower socioeconomic status homes; 109 of the 335 subjects qualified for free or reduced-price lunches, and 17 (5%) were special education students. All but 2 special education students were classified as having learning disabilities; 1 was classified as having a mental disability and 1 as having an emotional disturbance. Fourteen classroom teachers participated in ranking students in their classrooms according to their estimated reading proficiency.

Measures


Gates-MacGinitie Reading Tests. We used one of three levels of the Gates-MacGinitie Reading Tests (MacGinitie, Kamons, Kowalski, MacGinitie, & McKay, 1978). Level B was administered to 2nd-grade students, Level C to 3rd-grade classes, and Level D to students in Grades 4-6. Each level of the test contains two subtests: vocabulary and comprehension. Vocabulary is tested without sentence contexts and uses a multiple-choice format; the comprehension subtest requires students to indicate which of four pictures matches a sentence or short paragraph, or to answer multiple-choice questions based on brief paragraphs. For our analyses, we used scaled scores for the reading comprehension subtest and total reading.

The technical manual of the Gates-MacGinitie (MacGinitie et al., 1978) presents concurrent correlations for Levels D and E with corresponding subtests of the fifth edition of the MAT. These are between .79 and .92, with the higher correlations for total test scores (Stahl, 1989). In describing the validity of the 1972 edition of the Gates-MacGinitie, Salvia and Ysseldyke (1978) stated, "The authors [of the Gates-MacGinitie] do report the results of an unpublished doctoral dissertation by Davis (1968) in which the subtests of the Gates-MacGinitie were found to correlate in the .70 to .85 range with four other standardized reading tests" (p. 154). In addition, Ryckman (1982) reported correlations between the Gray Oral Reading Test and the Gates-MacGinitie that ranged from .48 to .69. In our study, the correlations of .82 to .85 (overall) and .60 to .76 (specific grade levels) between the BASS Reading subtest and the Gates-MacGinitie Reading Tests appeared to be of a magnitude comparable to those reported in this somewhat sparse literature.

Metropolitan Achievement Tests (MAT). We used three levels of the MAT (Prescott et al., 1984): the Primary 2 for 2nd grade, the Elementary for 3rd and 4th grades, and the Intermediate for 5th and 6th grades. All levels include vocabulary and reading comprehension subtests in multiple-choice formats. Vocabulary is tested within sentence contexts and comprehension with questions that follow reading selections. The Primary 2 and Elementary levels include a word recognition subtest in which students identify consonant and vowel sounds. The subtests together yield an overall total reading score. We used scaled scores for the reading comprehension subtest and the total reading score.

Maze Passages. The maze task consisted of three passages with a mean Spache (1953) readability of 2.3, drawn from the Basic Academic Skills Samples (BASS) (Espin et al., 1989). (The Spache readability formula weights sentence length and word frequency to estimate the difficulty of text.) Passages ranged from 226 to 313 words (mean of 263), in which every seventh word was replaced by a multiple-choice item that consisted of the correct word and two distractors that were clearly incongruous with the context of the story. Students had 1 min to read each passage, selecting the correct words. We discontinued scoring after a student made three consecutive incorrect choices. The total number of correct maze choices across three passages was used in our analyses.
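As a concrete illustration, the discontinue rule used in maze scoring can be sketched in a few lines. This is a hypothetical reconstruction of the scoring logic described above, not the authors' actual scoring program, and all function and variable names are our own.

```python
def score_maze_passage(choices):
    """Score one maze passage: count correct choices, but stop
    scoring after three consecutive incorrect selections.

    `choices` is a list of booleans, one per maze item attempted
    within the 1-minute limit (True = correct word selected).
    """
    correct = 0
    consecutive_errors = 0
    for is_correct in choices:
        if is_correct:
            correct += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == 3:
                break  # discontinue scoring for this passage
    return correct


def maze_total(passages):
    # Study score: total correct choices summed across the three passages.
    return sum(score_maze_passage(p) for p in passages)
```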

Oral Reading Measures. Three narrative passages developed by Deno, Marston, Deno, and Marston (1988), ranging from 180 to 200 words, were used for the oral reading assessment. The mean readability of these passages was 1.7, according to the Spache formula. Students read each passage to an examiner for 1 min. If students hesitated for more than 3 s, the examiner supplied the word. If students completed a passage in less than 60 s, they returned to the beginning of the passage and continued reading until the minute ended. Substitutions, mispronunciations, and omissions were scored as errors. The score on a passage was the number of words read correctly. We used each student's median score for the three passages as the overall reading-aloud score.
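The oral reading score described above reduces to a simple computation; the following sketch is our own illustration of that rule (words attempted minus errors per passage, then the median across the three passages), not the authors' procedure.

```python
import statistics


def score_oral_reading(words_attempted, errors):
    # Words read correctly in 1 minute: substitutions, mispronunciations,
    # and omissions all count as errors.
    return words_attempted - errors


def reading_aloud_score(passage_scores):
    # The overall score is the median of the three passage scores.
    return statistics.median(passage_scores)
```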

Teacher Judgment. We included a third validity criterion, teacher judgment. Classroom teachers were asked to identify their 15 lowest readers and rank them from lowest to highest in reading ability.

Procedure


During the second week of the school year, students were first given the Gates test by their classroom teacher. During the third week of school, classroom teachers administered the maze, and research staff administered the oral reading test. The former was group administered; the latter was individually administered, with all three reading samples presented at one testing, conducted outside of the classroom. During the fourth week of school, before any test results had been released, classroom teachers identified and rank ordered their 15 lowest readers.

Teachers administered the MAT in the third week of April. During the fourth week of April and the first week in May, teachers again administered the maze, and research staff gave oral reading tests.

Results


Means and standard deviations for all measures broken out by grade level are shown in Tables 1 and 2. We report normal curve equivalents (NCEs). NCEs are similar to percentiles except that the former have equal interval properties, allowing them to be added, multiplied, and so forth. The mean Gates NCE computed for each grade level ranged from 50 to 62, with an overall mean NCE of 56, indicating that students scored slightly above national norms for this test. MAT NCE means ranged from 52 to 62, with a mean of 56.
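The NCE scale is a standard rescaling of the normal curve in which scores of 1, 50, and 99 coincide with the corresponding percentile ranks, using a standard deviation of about 21.06. A minimal sketch of the percentile-to-NCE conversion follows; this is our own illustration, since the article does not describe its conversion procedure.

```python
from statistics import NormalDist


def percentile_to_nce(percentile):
    """Convert a national percentile rank (0-100, exclusive) to a
    normal curve equivalent (NCE).

    The percentile is mapped to a z score via the inverse normal CDF,
    then rescaled to a mean of 50 and SD of 21.06, which makes NCEs of
    1, 50, and 99 line up with those same percentiles while preserving
    equal intervals between score points.
    """
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z
```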

Cross-Grade Correlations

Table 3 gives the Pearson correlations between the achievement tests, the maze, and the reading-aloud measures. These correlations for tests given in both autumn and spring test periods were computed for the group as a whole. All correlations were .80 or above, and were statistically significant (p < .01).

Grade-Level Correlations

Table 4 shows grade-level correlations between the two achievement tests and the two informal measures. All coefficients were statistically significant (p < .01). Correlations between maze and Gates scores were similar whether total reading or comprehension subtest scores were employed. The same was true for maze and MAT total reading and comprehension. Whereas the correlations between the standardized achievement tests and either of the informal tests for all grades combined were at or above .80, none of the grade-level correlations between the maze test and either achievement test reached this level. Between the maze and Gates total reading scores, correlations ranged between .65 and .76, with a median of .71, achieved by the Grade 6 sample. Between maze and MAT total reading, correlations ranged from .66 to .76, with a median of .73 obtained by Grades 4 and 5.

The oral reading measures correlated with Gates total reading and comprehension scores at similar levels, as did reading aloud with MAT total reading and comprehension subtests. Correlations between words correct read aloud and Gates total reading ranged from .67 to .88, with a median of .83 obtained by Grade 2. Correlations between reading aloud and MAT total reading ranged from .60 to .87, with a median of .70, obtained by the Grade 3 group.

Table 4 shows a negative trend across grade levels for correlations between reading aloud and both achievement tests. Between oral reading and Gates total reading, correlations of .83, .88, and .86 found in Grades 2, 3, and 4, respectively, declined to .67 at Grade 6. Similarly, the correlations between reading aloud and MAT total reading dropped from .87 at Grade 2 to .60 at Grade 6. For oral reading and Gates total reading correlations, differences between 2nd and 6th grade were significant, z = 2.15, p < .05, as were 3rd versus 6th, z = 3.32, p < .01, and 4th versus 6th, z = 3.11, p < .01. The same pairs of correlations differed significantly when the comprehension subtest scores were substituted for total reading. Oral reading-Gates total reading correlations also differed significantly between Grades 3 and 5, z = 2.13, p < .05. Although at Grade 5, the oral reading-Gates total reading correlation did not differ significantly from those of Grades 2 and 4, the oral reading-Gates comprehension correlations did, for Grades 2 versus 5, z = 2.17, p < .05, and 4 versus 5, z = 2.36, p < .05.
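The article does not name the statistical test behind these z values, but the standard procedure for comparing Pearson correlations drawn from independent samples (here, different grade levels) is Fisher's r-to-z transformation. A sketch, written as our own illustration:

```python
import math
from statistics import NormalDist


def fisher_z(r):
    # Fisher r-to-z transformation: z = atanh(r).
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two Pearson
    correlations from independent samples.

    The transformed correlations are approximately normal with
    standard error 1/sqrt(n - 3), so the difference is tested
    against a standard normal distribution.
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

Applied to the Grade 2 sample (n = 47, r = .83) versus the Grade 6 sample (n = 125, r = .67), this procedure yields z of approximately 2.15, consistent with the value reported for that contrast.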

Correlations between oral reading words correct and MAT reading scores declined across grades (from .87 at Grade 2 to .60 at Grade 6) in a manner similar to that observed with the Gates. The correlations between oral reading and total reading at Grade 2 exceeded that at Grade 5, z = 2.36, p < .05, and Grade 6, z = 3.64, p < .01. We observed similar differences when we substituted the reading comprehension subtest score for total reading. The Grade 4 correlation between oral reading and the comprehension subtest exceeded that in Grade 5, z = 2.03, p < .05, and that in Grade 6, z = 3.19, p < .01, as did the oral reading-total reading correlations, z = 2.44, p < .05. However, the Grade 3 correlations did not significantly exceed those of Grade 6.

In contrast to the pattern observed with oral reading-achievement test correlations, we did not observe clear trends across grade levels for the correlations between maze and either achievement test. No pairwise contrasts between grade-level correlations were statistically significant.

Teacher Judgment and Test Scores

Correlations were computed between teacher ranking of the lowest 15 readers in the classrooms and these students' scores on the various reading measures. Mean coefficients were calculated for each combination and are shown in Table 5. The mean correlation between teacher judgments and the Gates scores was comparable to those between teacher judgment and the two informal tests. Note that the correlations reported in Table 5 are all somewhat lower than those reported earlier; this is to be expected because they are based on a restricted sample--the lowest 15 readers in each classroom. The three highest coefficients all involved reading aloud.

Discussion


Grade-Homogeneous and -Heterogeneous Correlations

Children's performance on the two informal reading measures and standardized achievement tests were highly and significantly correlated. These findings are consistent with those of other studies (Deno et al., 1982; Fuchs et al., 1988; Fuchs & Deno, 1992), which were conducted on heterogeneous groups of children who spanned several grade levels. For example, the .86 coefficient between reading aloud and Gates total is comparable to those reported between reading aloud and (a) the Stanford Achievement Test (.90), (b) the SRA Achievement Test (.80) (Marston & Deno, 1982), and (c) the Stanford Achievement Test (.91) (Fuchs et al., 1988).

We had hypothesized that correlations computed on age-heterogeneous groups would tend to overestimate the relation between scores derived from informal measures and achievement tests at particular grade levels. Our prediction was supported in the case of the maze measure; none of the individual grade-level correlations between maze and either achievement test reached the level of the comparable cross-grade coefficients. Although oral reading and achievement test performance were as highly correlated for some specific grade levels as they were across the span of grades, correlations at other grade levels were considerably lower. Overall, both informal measures showed a strong relationship to reading achievement test scores, particularly at the lower grade levels. However, the concurrent validity of reading aloud, using standardized achievement tests as the criterion, may depend in part on the student's grade level.

Concurrent Validity at Different Grade Levels

Performance on oral reading and achievement tests appeared to be less strongly associated at the higher grade levels, with the lowest coefficients observed for measurements taken at the beginning (Gates) and end (MAT) of the 6th grade. Our confidence in the negative trend between oral reading-achievement test correlations is strengthened by the fact that the result was observed with two achievement tests and with oral reading measurements gathered on two separate occasions.

We considered four possible explanations for this declining relationship. First, oral reading performance may be a more sensitive measure of individual differences in reading ability for students in the primary grades than for those in the intermediate grades. That is, decreased group variance in oral reading scores at intermediate grade levels could depress the correlation between this measure and standardized tests. However, examination of the standard deviations in Tables 1 and 2 indicates that group variability on the oral reading measure did not decrease with grade level.

A second potential explanation for the declining relationship between scores on oral reading and achievement tests involves differences between these measures in their sensitivity to gains in reading proficiency. It could be argued that achievement tests in the intermediate grades continue to reflect growth in reading proficiency, but oral reading scores have, by then, reached asymptote for most students and thus no longer reflect growth in reading proficiency. However, examination of oral reading scores in Table 2 reveals that within-year growth means at the higher grade levels (28 and 21 for Grades 5 and 6, respectively) were comparable to those at the lower grades (36 and 22 for Grades 3 and 4, respectively). We take these oral reading gains as evidence of improving "reading proficiency," although the proficiency measured on oral reading tests may differ from that measured on achievement tests, which brings us to a third potential explanation.

Reading achievement tests may emphasize different aspects of reading proficiency at lower and upper grade levels. One indication of this difference in emphasis can be seen in the MAT. For Grades 2-4, the total reading score is based on a combination of vocabulary, word recognition and comprehension subtests; but by Grades 5 and 6, the word recognition subtest has been eliminated. However, the declining relationship between oral reading and achievement tests cannot be explained solely by differences in the number of subtests contributing to the total reading score, because we observed the same declining pattern of correlations when just the comprehension subtest, rather than the total reading score, was used to compute the correlations with oral reading.

Nevertheless, the differences in the number of subtests at lower and upper grade levels may be taken as an indication that test developers think about reading proficiency differently for younger and older readers. Even within a specific subtest, subtle differences can be observed in the type of text selections and in the reasoning demands inherent to the test items that appear in lower and upper grade batteries. For example, there are twice as many narrative selections on the Primary 2 MAT comprehension subtest as on the Intermediate level, and twice as many nonfiction selections on the Intermediate as on the Primary 2 level. The reasoning demands of the test items also appeared to increase. Specifically, between the Primary 2 and Intermediate levels of the comprehension subtest we observed increases in the proportion of text- and script-implicit questions (Pearson & Johnson, 1978), the proportion of items requiring students to judge the meaning of a word or phrase in context, and the proportion of test items requiring students to discriminate among three versus four multiple-choice alternatives (45% of the items at Grade 2 versus 100% at Grades 5 and 6 used four choice alternatives).

Changes in the character of achievement tests used in lower versus higher grade levels may partially explain the negative trend in correlations between achievement and oral reading tests, but it is not the whole story. Students in Grades 2, 3, and 4 all took different forms of the Gates, yet as Table 4 indicates, the correlations between comprehension and oral reading scores were of comparable magnitudes. Furthermore, the scores of students in Grades 4, 5, and 6, all generated from the same form of the Gates, showed a declining relationship with oral reading (.87 to .67 for total reading and .86 to .62 for comprehension). Thus, not only differences across grade levels in the tests themselves, but also differences across grade levels in readers' abilities may be implicated in the changing relationships between oral reading and achievement test performance.

Although decoding skills, language comprehension abilities, and world knowledge contribute to answering comprehension questions and to oral reading fluency (Stanovich, Cunningham, & Freeman, 1984), their relative contribution to test performance is probably affected by factors such as the reader's facility in decoding, the reader's familiarity with and the complexity of the information conveyed in the text, and the demands of the specific test items. We propose that there are two conditions under which individual differences in decoding skills will have larger effects on task performance than will individual differences in language comprehension and world knowledge. The first occurs when a reading task requires levels of language comprehension and world knowledge that most individuals possess (e.g., the oral reading passages used in the present study and many of the passages used in the lower levels of reading achievement tests). The second condition occurs when the reading task requires levels of language comprehension and world knowledge that differentiate individuals in the reader group, but individual differences in decoding skills create problems of access to information inherent in the text and questions. In the second condition, individual differences in language comprehension and world knowledge would be expected to make increasingly larger contributions to test performance as individual differences in decoding become less important, that is, as the proportion of the students who possess decoding skills that are adequate to the task increases.

Applying this analysis to the present results, oral reading and achievement test performance may have been more highly correlated at the lower grades because both types of measures were affected primarily by individual differences in decoding. In the upper grade levels, however, the correlations diminished somewhat, because individual differences in language comprehension and world knowledge began to have larger effects on achievement test performance. By then, more of the students had gained access to text information through improved decoding skills. Conversely, performance on the oral reading measure, which made minimal demands on students' language comprehension and world knowledge, continued to be affected primarily by individual differences in decoding ability. Thus, the "reading proficiency" measured by achievement tests, which at lower grade levels was reasonably congruent with the "reading proficiency" measured by oral reading tests, seemed to differ from that measured in the upper grade levels, in part because of (a) the altered demands of achievement tests and (b) improvements in readers' decoding skills which in turn allow them to gain basic access to printed information. However, our oral reading measures were based on relatively easy material, which probably emphasized individual differences in decoding speed. Further research is warranted in which students' oral reading fluency is assessed with passages of graded difficulty and in which language and world knowledge demands are more comparable to those of reading achievement tests.

Concerning the results of the maze tests, the strength of the association between maze and achievement test performance did not show a similar decline over the elementary grades. Maze-achievement correlations fluctuated between the mid-.60s and mid-.70s with no clear trend emerging. The correlations involving maze are somewhat more difficult to interpret because this task generated a different performance profile from that of either the achievement or oral reading tests. Whereas the standard deviations for the latter measures remained comparable across grade levels, standard deviations on the maze showed marked increases across grades, from a 2nd-grade level of 4.9 to a 6th-grade level of 11.1 (autumn tests), and 8.0 to 12.8 for 2nd and 6th grades, respectively (spring tests). Increases in variance such as these should have been reflected in increased correlations between achievement tests and maze performance, if both measures were tapping the same construct. That increases in the correlations were not observed suggests that maze and achievement tests may have actually been more similar measures at lower than they were at upper grade levels. The scatterplot of 2nd-grade autumn maze-Gates results showed a heavy concentration of low maze scores, suggesting that this measure was not particularly sensitive to individual differences among the less skilled readers. The maze seems more sensitive to individual differences at upper than at lower grade levels.

The absence of a definitive explanation notwithstanding, performance on oral reading and achievement tests seems more closely aligned in lower than in upper grades, whereas the relationship between maze and achievement test performance remains relatively stable across grades. Teachers who use these informal measures to gauge and monitor their students' reading development should be aware that the two measures may not be entirely congruent, and that they do not necessarily maintain a uniform level of concurrent validity with standardized achievement tests across grade levels.

However, to date, formative instruction systems using maze and oral reading assessments have been employed exclusively with students with learning disabilities and those educationally at risk (Fuchs, Deno, & Mirkin, 1984; Fuchs, Fuchs, & Hamlett, 1989; Fuchs, Fuchs, Hamlett, & Ferguson, 1990). Typically, the reading performance of older students with learning disabilities resembles that of younger students without disabilities. For these older low-achieving students, changes in oral reading performance may reflect performance on standardized achievement tests in a manner similar to that observed with younger students without disabilities.

Teacher Judgment as Validity Criterion

Both formal and informal tests rely on somewhat uniformly applied criteria and standardized procedures to obtain information about students' reading abilities. In contrast, classroom teachers use variable judgment criteria and relatively nonstandardized procedures (multiple observations across a broad range of reading tasks) to obtain information about their students' reading abilities. Students' maze and oral reading performance correlated with teacher judgments of their reading proficiency at .56 and .66, respectively. As Table 4 shows, maze and oral reading correlated nearly as well with this validity criterion (i.e., teacher judgment) as they did with achievement tests.

The "accuracy" of achievement judgments by teachers in our sample was comparable to that found in other studies that have used similar procedures. Hoge and Coladarci (1989) reported a mean correlation of .61 between teacher judgments of students' reading abilities and students' achievement test performance, whereas our teachers' judgments of reading proficiency correlated with achievement test performance at .65. However, across teachers the range of correlation coefficients between teacher judgment and reading test performance was considerable. Correlations between individual teachers' judgments and the Gates test ranged from .08 to .89; for maze, from .07 to .83; and for oral reading, from .35 to .89. Variability among teachers in their ability to accurately judge students' relative and absolute achievement levels has also been reported by other researchers (e.g., Helmke & Schrader, 1987; Hopkins, George, & Williams, 1985).

Finally, in generalizing from the findings of this study one must be mindful that we employed oral reading and maze passages that were relatively simple, with readability estimates around 2nd-grade difficulty. Our interest in examining the validity claims of earlier studies that had employed age-heterogeneous samples dictated our use of easy texts. That is, if we were to include all students in the same correlational analyses, it was essential to provide them with the same maze and oral reading passages, and to use passages that were within the reach of even the youngest (2nd grade) reader group. In addition, we were interested in learning whether very simple texts would continue to reflect improvements in reading proficiency of students whose reading skills far exceeded this level of text. As noted previously, students at all grade levels registered significant within-year gains on both maze and oral reading tasks. Measures that are sensitive to growth in reading proficiency (i.e., that generate large within-year gains) are essential to the approach to formative instruction advocated by Deno (1985) and Guthrie et al. (1974), in which teachers base instructional decisions on students' slope, or rate of improvement. The relative sensitivity to growth in reading proficiency of using easy versus grade-level materials is a topic requiring further investigation, particularly as it relates to the standard error of estimate. Ideally, measures that produce a steep slope with a relatively low standard error of estimate (an index of intrastudent instability) are most suited for formative instruction (Fuchs & Fuchs, 1990).
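The slope-based decision rule described here can be sketched as an ordinary least-squares fit over a student's repeated measurements, from which both the rate of improvement and the standard error of estimate are computed. This is a minimal sketch; the weekly scores and the words-correct-per-minute scale below are invented for illustration, not data from this study:

```python
def fit_slope(weeks, scores):
    """Ordinary least-squares slope, intercept, and standard error of
    estimate (an index of intrastudent instability) for a series of
    repeated curriculum-based measurements."""
    n = len(weeks)
    mw, ms = sum(weeks) / n, sum(scores) / n
    sxx = sum((w - mw) ** 2 for w in weeks)
    sxy = sum((w - mw) * (s - ms) for w, s in zip(weeks, scores))
    slope = sxy / sxx
    intercept = ms - slope * mw
    # Residual scatter around the fitted line, with n - 2 degrees
    # of freedom, gives the standard error of estimate.
    residuals = [s - (intercept + slope * w) for w, s in zip(weeks, scores)]
    see = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return slope, intercept, see

# Hypothetical weekly oral-reading scores (words correct per minute).
weeks = list(range(1, 9))
wcpm = [42, 45, 44, 49, 51, 50, 55, 57]
slope, intercept, see = fit_slope(weeks, wcpm)
print(f"gain per week: {slope:.2f}, SEE: {see:.2f}")
```

In this framework, a measure that yields a steep slope with a small standard error of estimate supports instructional decisions; comparing the two statistics obtained with easy versus grade-level passages would be one way to pursue the question raised above.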

To summarize, our findings indicate that performance on reading-aloud and maze tests, like achievement tests, discloses individual differences in students' reading proficiency. As the grade level of the students tested increases, higher scores are recorded on each index, indicating that all three measures continue to reflect changes in the development of reading skill. Given that the surface features of these reading measures differ considerably, their overall congruence as indexes of general reading proficiency is quite remarkable. However, correlations that are derived from grade-heterogeneous groups tend to overestimate at some grade levels the relationship between achievement tests and these less formal measures. Across grade levels, we observed changes in the magnitude of the relationship between oral reading and achievement tests, suggesting that in the early grades these measures tap similar aspects of reading proficiency; but for reasons that are not entirely clear, oral reading and achievement tests begin to reflect somewhat different aspects of reading proficiency in later grades. We speculated that individual differences in decoding skills account for much of the variance in achievement test performance in the primary grades, and that individual differences in language comprehension and world knowledge begin to play a more prominent role in achievement test performance in the intermediate grades.


Commission on Reading. (1985). Becoming a nation of readers: The report of the Commission on Reading. Washington, DC: The National Institute of Education.

Deno, S.L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deno, S.L., Marston, D., Deno, D.D., & Marston, D. (1988). Reading Progress Monitoring Passages. Minneapolis, MN: Children's Educational Services.

Deno, S.L., Mirkin, P.K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49, 36-45.

Espin, C., Deno, S.L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Survey (BASS): An instrument for screening and identification of children at risk for failure in the regular education classroom. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Fuchs, L.S., & Deno, S.L. (1992). Effects of curriculum within curriculum-based measurement. Exceptional Children, 58, 232-243.

Fuchs, L.S., Deno, S.L., & Mirkin, P.K. (1984). The effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21, 449-460.

Fuchs, L.S., & Fuchs, D. (1990). Identifying an alternative reading measure for curriculum-based measurement. Nashville, TN: Peabody College, Vanderbilt University.

Fuchs, L.S., Fuchs, D., & Hamlett, C.L. (1989). Monitoring reading growth using student recalls: Effects of two teacher feedback systems. Journal of Educational Research, 83(2), 103-110. [Research Report No. 417].

Fuchs, L.S., Fuchs, D., Hamlett, C.L., & Ferguson, C. (1990). Effects of instructional consultation within curriculum-based measurement using a reading maze task. Nashville, TN: Peabody College, Vanderbilt University.

Fuchs, L.S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9(2), 20-29.

Guthrie, J.T. (1973). Reading comprehension and syntactic responses in good and poor readers. Journal of Educational Psychology, 65(3), 294-300.

Guthrie, J.T., Siefert, M., Burnham, N.A., & Caplan, R.I. (1974). The maze technique to assess, monitor reading comprehension. The Reading Teacher, 28(2), 161-168.

Helmke, A., & Schrader, F.W. (1987). Interactional effects of instructional quality and teacher judgment accuracy on achievement. Teaching and Teacher Education, 3, 91-98.

Hoge, R.D., & Coladarci, T. (1989). Teacher-based judgments of academic achievement: A review of literature. Review of Educational Research, 59(3), 297-313.

Hopkins, K.D., George, C.A., & Williams, D.D. (1985). The concurrent validity of standardized achievement tests by content area using teachers' ratings as criteria. Journal of Educational Measurement, 22, 177-182.

Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco: Chandler.

Leinhardt, G., & Greeno, J.G. (1986). The cognitive skill of teaching. Journal of Educational Psychology, 78(2), 75-95.

MacGinitie, W.H., Kamons, J., Kowalski, R.L., MacGinitie, R.K., & McKay, T. (1978). Gates-MacGinitie Reading Tests (2nd ed.). Chicago: Riverside.

Marston, D., & Deno, S.L. (1982). Implementation of direct and repeated measurement in the school setting (Research Report No. 106). Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.

Pearson, P.D., & Johnson, D.D. (1978). Teaching reading comprehension. New York: Holt, Rinehart & Winston.

Prescott, G.A., Balow, I.H., Hogan, T.P., & Farr, R. C. (1984). Metropolitan Achievement Tests (MAT-6). San Antonio, TX: The Psychological Corporation.

Ryckman, D.B. (1982). Gray Oral Reading Tests: Some reliability and validity data with learning disabled children. Psychological Reports, 50, 673-674.

Salvia, J.A., & Ysseldyke, J.E. (1978). Assessment in special and remedial education. Boston: Houghton Mifflin.

Shinn, M.R. (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.

Spache, G. (1953). A new readability formula for primary-grade reading materials. Elementary School Journal, 53, 410-413.

Stahl, S.A. (1989). Gates-MacGinitie Reading Tests (2nd ed.). In V.L. Willson (Ed.), Academic achievement and aptitude testing: Practical applications and test reviews (pp. 158-167). Austin, TX: Pro-Ed.

Stanovich, K.E., Cunningham, A.E., & Freeman, D. J. (1984). Relation between early reading acquisition and word decoding with and without context: A longitudinal study of first-grade children. Journal of Educational Psychology, 76(4), 668-677.

Valencia, S.W., & Pearson, P.D. (1988). Principles for classroom comprehension assessment. Remedial and Special Education, 9, 26-35.


JOSEPH R. JENKINS (CEC WA Federation), Professor of Special Education, MARK JEWELL, doctoral student in special education, University of Washington, Seattle.

Address correspondence to Joseph R. Jenkins, EEU, WJ-10, University of Washington, Seattle, WA 98195.

This research was supported in part by grants from the U.S. Department of Education, Nos H023F80013-88. Points of view or opinions stated in this article do not necessarily represent official agency positions.

Manuscript received June 1991; revision accepted January 1992.
Correlations Between Gates-MacGinitie, Maze, Metropolitan Achievement Tests (MAT-6), and Oral Reading

Autumn Tests (n = 335)

Test                 1     2
1. Gates[a] Total
2. Maze             .85
3. Oral Reading     .88   .89

Spring Tests (n = 326)

Test                 1     2
1. MAT-6[b] Total
2. Maze             .80
3. Oral Reading     .80   .81

[a] Gates-MacGinitie Reading Tests (MacGinitie, Kamons, Kowalski, MacGinitie, & McKay, 1978).
[b] Metropolitan Achievement Tests (MAT-6) (Prescott, Balow, Hogan, & Farr, 1984).
Table 4. Mean Correlations Among Teachers' Judgments and Measures of Reading Ability

Measure                    1     2     3
1. Teacher judgment
2. Gates-MacGinitie[a]    .64
3. Maze                   .56   .61
4. Oral reading           .66   .73   .70

Note: Correlations are based on the 15 lowest readers (according to teacher judgment) per classroom. N = 210.
[a] Gates-MacGinitie Reading Tests (MacGinitie, Kamons, Kowalski, MacGinitie, & McKay, 1978).

COPYRIGHT 1993 Council for Exceptional Children

Author: Jenkins, Joseph R.; Jewell, Mark
Publication: Exceptional Children
Date: March 1, 1993