
Monitoring children with reading disabilities' response to phonics intervention: are there differences between intervention aligned and general skill progress monitoring assessments?

Traditionally, assessment in special education has focused on identifying students in need of special education through the use of standardized measures (Deno, 1997). Although this type of assessment has had an important historical role in special education, the use of norm-referenced tests has been shown to be unreliable for monitoring student progress over time (Deno, 1992). Assessments used for progress monitoring need to be sensitive to small changes in skills over time. As special educators become more responsible for ensuring effective interventions for students, there is a need for reliable and valid assessments that can continually monitor student progress and index performance (Vaughn & Fuchs, 2003).

Progress-monitoring assessments must meet several requirements. First, the material used for progress monitoring must be representative of the academic competence expected of students at the end of the school year. The measure must also be free of floor or ceiling effects, as well as demonstrate sensitivity to change over a short period of time as students gain more skills (L. S. Fuchs & Fuchs, 1999). In addition, the assessment must be authentic and have adequate reliability and validity (Deno, 1997; L. S. Fuchs & Fuchs, 1999). Finally, the assessment must accurately predict gains on more generalized outcome measures, such as standardized assessments (Good, Simmons, & Kame'enui, 2001). Progress-monitoring instruments that meet these requirements allow teachers to have confidence that the instrument is providing valid and reliable data.

Teachers use progress-monitoring assessments to make educational decisions and improve instructional effectiveness. Progress-monitoring assessments provide frequent data that teachers can use to make decisions about an individual student's instructional needs. For example, based on a student's progress, a teacher may decide to increase the amount of instruction, slow the pace of the instruction, or change the instructional method completely. The use of progress-monitoring instruments in special education has been shown to improve student outcomes in academic areas (e.g., L. S. Fuchs & Fuchs, 1986b; L. S. Fuchs, Fuchs, & Hamlett, 1989).

Progress monitoring is a broad classification encompassing a variety of measures designed to track student progress and assist in instructional decision making. Examples of progress-monitoring instruments include mastery measurement and curriculum-based assessment. Curriculum-based measurement, one type of curriculum-based assessment, is one of the most widely used and scientifically validated progress-monitoring instruments.


Curriculum-based measurement (CBM) was a tool initially designed for special education teachers to measure performance level, monitor progress in basic academic skills, and assist teachers in decision making (Deno & Mirkin, 1977). Research has shown that CBM monitors progress in special education students more accurately than norm-referenced tests (Elliot & Fuchs, 1997). Since its inception in 1977, more than 150 studies have documented the psychometric reliability and validity of CBM as well as its ability to document student growth and inform instructional decisions (for a concise review of the strengths and limitations of CBM, see L. S. Fuchs & Fuchs, 1999). CBM has gathered interest in both special and general education (Deno, 2003) due in part to the lower cost and increased sensitivity to small changes in growth as compared to norm-referenced tests, as well as its ease of administration by teachers (Knutson & Shinn, 1991; Marston, Fuchs, & Deno, 1985).

The vast majority of CBM reading research has focused on measuring a student's oral reading fluency (ORF) as an index of reading competence. ORF is typically used with students reading at a mid-first-grade level or higher. Students reading below this level are monitored using other forms of CBM, such as word lists or letter sounds (e.g., D. Fuchs, Fuchs, & Compton, 2004; L. S. Fuchs, Fuchs, & Compton, 2004; Good & Kaminski, 2002). Oral reading fluency measures a student's rate and accuracy when reading a short passage for a specified time period. ORF has been shown to be a reliable and valid measure of a student's overall reading competence, including reading comprehension, because fluent text reading requires the automatization and coordination of multiple reading skills (e.g., L. S. Fuchs, Fuchs, Hosp, & Jenkins, 2001; Shinn, Good, Knutson, Tilly, & Collins, 1992). In addition, ORF is a sensitive measure of reading progress (see L. S. Fuchs, 1995, for a summary).


Although ORF has a widely established research base in progress monitoring, we were interested in developing a different, theoretically based assessment that would be highly sensitive to individual differences in word reading acquisition and would change systematically and predictably with decoding instruction. An intervention aligned word list (IAWL) was specifically designed as a progress-monitoring assessment for this study. This measure comprised words representing the decoding skills students would acquire as they progressed through the reading intervention and was developed to assess children's responsiveness to instruction. The IAWL was designed as a progress-monitoring instrument that would directly monitor changes in decoding and word reading acquisition as a result of instruction and that would generalize to a larger corpus of words. The IAWL represents a potentially innovative method of accurately measuring student progress for program evaluation, one that can also identify students not responding to treatment who might need further remediation. For example, school administrators could use an IAWL to easily track classroom progress in specific reading interventions, whereas teachers could identify students in need of more intensive instruction.


The IAWL is a theoretically based progress-monitoring assessment containing a list of 50 words. The IAWL differs from other progress-monitoring assessments in that it is tied directly to the specific intervention students receive. In this case, the IAWL is designed to measure a student's ability to apply the strategies taught in the Phonological and Strategy Training (PHAST; Lovett, Lacerenza, & Borden, 2000; Lovett, Lacerenza, Borden et al., 2000) decoding and word reading program. To accomplish this, a systematic procedure was developed for sampling words for the IAWL. This sampling technique allowed individual growth on the IAWL to be aligned with individual growth in the intervention program and to generalize to a larger corpus of decodable words.

A series of steps was followed to develop the IAWL. First, a list of the 10,000 most frequently read words (Zeno, Ivens, Millard, & Duvvuri, 1995) was coded using a hierarchical set of coding rules to determine the lesson in which an individual word became decodable. The hierarchy of strategies was chosen to represent larger to smaller orthographic units. This set of rules ensured that, by the identified lesson, a student had been exposed to the strategies necessary to decode each word and that those strategies, if applied correctly, would successfully decode it. Based on the coding scheme, 6,335 words were determined to be decodable using strategies taught in the first 60 lessons of the PHAST reading program. From this corpus of 6,335 words, 50 words were chosen. The word list was constructed to represent the potential of each PHAST lesson to add decodable words to a student's reading lexicon. In addition, words were selected to reflect increases in the average word length of decodable words across the 60 lessons. To help encourage the application of decoding strategies, lower frequency words were selected. The resulting list reflected an increase in difficulty as students moved down the list. It was expected that words higher on the list would require fewer strategies to decode, whereas words further down the list would require the student to make decisions about the type and number of strategies necessary to decode the word. In addition, the list was designed to determine whether students would become better at decoding words after the lessons teaching the required skills for a word had been presented. It was expected that there would be an increase in percentage passing for an individual word once the necessary skills to decode the word had been taught.

It is important to note that the IAWL is not a single skill measure or mastery measure being used for progress monitoring, as criticized by L. S. Fuchs (2004) and L. S. Fuchs and Fuchs (1999). Single skill and mastery measures assess whether a student has mastered a single discrete skill, for example, decoding consonant-vowel-consonant words. In addition, mastery measurement rarely establishes reliability and validity, whereas the IAWL was designed to demonstrate adequate reliability and validity.

The IAWL differs from ORF in two important ways. First, the IAWL can be considered a curriculum-dependent assessment, due to the specific alignment between the development of the measure and the PHAST reading curriculum. The ORF passages used in this study reflected a more generalized outcome of reading competency. Second, the IAWL required students to read a list of words in an untimed manner, emphasizing decoding and word reading accuracy, whereas ORF passages emphasized passage reading accuracy and fluency under timed conditions.

This study evaluated whether the IAWL and ORF measures differentially predicted growth in reading skills. We hypothesized that the IAWL had the potential to be more sensitive to changes in decoding and word reading acquisition as a result of instruction because it was directly tied to the intervention, whereas a traditional measure of progress monitoring, ORF, would be a more sensitive proxy of growth in passage reading accuracy, fluency, and comprehension. If true, the results of the study would highlight the importance of selecting a progress-monitoring instrument that addresses the specific goals of the intervention. In addition, our research purposes included evaluating the psychometric properties of the IAWL. First, we examined whether the IAWL functioned according to its theoretical development. Second, we evaluated whether the hierarchical coding scheme accurately predicted each word's rank order in terms of difficulty.



Participants were 40 students recruited from resource room classrooms in a metropolitan school district in the southeastern United States. All participants were identified as having a learning disability according to state guidelines, which require students with learning disabilities to demonstrate a significant discrepancy between academic achievement and intellectual capacity. To recruit potential students, elementary special education teachers were asked to identify students with word-level reading difficulties. The identified students were screened to determine those who met the study's eligibility criteria. To be selected for participation in the study, students (a) received resource room services for reading instruction; (b) had individualized education program (IEP) goals in the area of decoding skill acquisition; (c) had a composite score on the Test of Word Reading Efficiency (Torgesen, Wagner, & Rashotte, 1997), a measure of decoding and word reading efficiency, below the 25th percentile; (d) had an estimated IQ above 70; and (e) had no documented neurological or emotional problems, no uncorrected sensory deficits, and were not second language learners. The present study included 18 boys and 22 girls who completed 70 hr of an intensive reading intervention from a larger study (see Compton et al., 2005). There were 20 third graders, 16 fourth graders, and 4 fifth graders. Students ranged in age from 8.0 to 11.5 years (M = 9.61, SD = 0.82). More than half of the participants (55%) were minority students, including 21 African American students and one Hispanic student. Most students (82.5%) were eligible for free or reduced-price lunch as identified by the local school district.

The intervention group was approximately 1 to 1 1/2 standard deviations below the mean on standardized measures of intelligence (Wechsler Abbreviated Scale of Intelligence; Wechsler, 1999; M = 83.8, SD = 11.0), rapid naming speed (Comprehensive Test of Phonological Processing; Wagner, Torgesen, & Rashotte, 1999; M = 85.1, SD = 12.3), phonological awareness (Comprehensive Test of Phonological Processing; Wagner et al.; M = 78.3, SD = 9.4), and receptive language (Peabody Picture Vocabulary Test; Dunn & Dunn, 1997; M = 87.0, SD = 11.7).


The Phonological and Strategy Training Program developed by Lovett and colleagues at the University of Toronto (Lovett, Lacerenza, & Borden, 2000; Lovett, Lacerenza, Borden, et al., 2000) was the reading intervention used in this study. PHAST is a systematic and sequential reading program in which students receive phonologically based remediation along with word identification strategies. Phonological remediation is presented in a direct instruction format using the Reading Mastery Fast Cycle I/II Program (Engelmann & Brunner, 1988). Reading Mastery presents lessons in a systematic and sequential format with opportunities for repeated practice. The Reading Mastery lessons were presented to students as the first word identification strategy (Sounding Out). Students were also explicitly taught three other word identification strategies: word identification by analogy (Rhyming), trying two vowel sounds (Vowel Alert), and segmenting affixes in a multisyllabic word (Peeling Off). The PHAST program consists of 70 lessons, though because of time constraints, the current study taught only the first 60 lessons and did not teach the seeking-the-part-you-know (I Spy) strategy. For further information regarding the PHAST program, see Lovett, Lacerenza, and Borden (2000) and Lovett, Lacerenza, Borden, et al. (2000).

Graduate research assistants were trained by trainers from the University of Toronto and followed the scripted lessons and instructional materials provided by the Lovett group. All students received 60 PHAST lessons over 70 hr of instruction in groups of three to five. Lessons were conducted in the student's classroom or in another familiar schoolroom. Treatment fidelity was assessed in two ways. The project coordinator observed each reading group every 2 weeks to assess fidelity of treatment. In addition, 25% of the lessons were audiotaped. The project coordinator listened to the audiotapes and scored each lesson using a checklist noting the important components of each strategy. Treatment fidelity was greater than 95% across all groups over the course of the study.


All assessments were individually administered in a familiar and quiet environment by trained project staff. Table 1 lists the mean performance of participants at pretest and posttest.

Pretest and Posttest Assessments. The Gray Oral Reading Test-3 (GORT; Wiederholt & Bryant, 1992) provides separate scores for timed passage reading fluency, accuracy, and comprehension. This nationally normed test consists of increasingly more difficult passages in which a student reads a passage in a timed manner and then answers five multiple-choice questions. Split-half reliability exceeds .80.

The Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1997) consists of two subtests, which assess timed word reading and nonword reading fluency and accuracy. The word reading subtest consists of a list of real printed words and the nonword reading subtest consists of a list of decodable nonsense words. The student's score reflects the number of words and nonwords pronounced correctly in 45 s. Test-retest reliability for the subtests ranges from .83 to .96.

The word identification and word attack subtests of the Woodcock Reading Mastery Test-R/NU (WRMT-R/NU; Woodcock, 1987) were used in this study to assess untimed decoding and word reading ability. The Word Identification subtest consists of a list of words that a student reads. The Word Attack subtest consists of a list of nonsense words that assesses phonic and structural analysis skills. These untimed subtests are nationally normed and split-half reliability exceeds .90 for each subtest.

Progress Monitoring Assessments. All participants were assessed six times throughout the study with the IAWL and ORF measures. The first assessment occurred before the start of the intervention (lesson 0). The remaining five assessments were equally spaced and administered after students completed lessons 12, 24, 36, 48, and 60. Students did not receive feedback regarding performance on the IAWL or the ORF measure.

ORF data were collected using second-grade passages to assess reading accuracy and fluency. Second-grade passages were chosen for the study because they represented long-term challenging material for the majority of students. The use of challenging-level material has been shown to reduce variability of performance and possible ceiling effects, as well as to provide a better index of progress toward the desired goals (Shinn, 1989). In addition, the use of challenging material has been shown to correlate more strongly with global achievement tests than material chosen to reflect short-term gains (L. S. Fuchs, 1989). However, the use of challenging material can result in floor effects if the passages are too difficult to show sensitivity to small changes in word reading abilities. An examination of students scoring at the lower end of oral reading fluency revealed growth rates equivalent to those of the larger sample; therefore, floor effects did not appear to be present. To reduce measurement error, students read two passages in each assessment wave. Alternate-form passages were used throughout the study so that no student ever read the same passage twice. Test-retest and parallel-form reliabilities ranged from .82 to .97 in a previous study with second graders, with most above .90 (Compton, Appleton, & Hosp, 2004). Each passage was rated as appropriate for second-grade students (L. S. Fuchs & Fuchs, 1992) and ranged from 1.9 to 2.6 grade-level equivalency as calculated with the Spache readability formula (Compton et al., 2004). The ORF passages were generally equivalent to reading materials used in the intervention and other classroom reading instruction.

At each assessment wave, a research assistant presented an ORF passage to the student with directions to read the story as quickly and carefully as possible for 1 min. The research assistant started the timer when the student read the first word. If the student did not attempt a word in 3 s, then the research assistant provided the word and asked the student to continue reading. Words read incorrectly, omitted, or provided to the student were counted as errors. Inserted words or repeated words were not counted as errors. The student's ORF score was the number of words read correctly in 1 min averaged across the two passages.
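The scoring rule described above reduces to simple arithmetic. The sketch below is only illustrative; the function names and example counts are ours, not part of the study's protocol.

```python
# Minimal sketch of ORF scoring as described above: words read incorrectly,
# omitted, or supplied by the examiner count as errors; insertions and
# repetitions do not. The wave score is words read correctly in 1 minute,
# averaged across the two passages. Names and numbers are illustrative only.

def passage_score(words_attempted, errors):
    """Words read correctly in one minute for a single passage."""
    return words_attempted - errors

def orf_wave_score(passage_a, passage_b):
    """Average words correct per minute across a wave's two passages."""
    return (passage_score(*passage_a) + passage_score(*passage_b)) / 2

# Hypothetical example: 52 words attempted with 4 errors on one passage,
# 48 words attempted with 2 errors on the other.
print(orf_wave_score((52, 4), (48, 2)))  # -> 47.0
```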

The IAWL consisted of one word list that was used in each of the six assessment waves. The IAWL was developed starting with a set of hierarchical coding rules to determine the lesson in which an individual word became decodable. The hierarchical order in which the strategies were applied was Peeling Off, Rhyming, Sounding Out, and Vowel Alert. Two research assistants (coders) individually coded each of the 10,000 words, with the third author of this article resolving disagreements. Approximately 63% of the 10,000 words (6,335) were decodable based on the coding scheme. Words that were not decodable were removed from the pool. The coders first applied the Peeling Off strategy and recorded the lesson number at which any prefixes or suffixes were taught. Next, coders identified any spelling patterns (Rhyming) in the word or remaining base word, and then recorded the lesson number in which the spelling pattern was taught. Third, coders identified the letter sequences in which Sounding Out and/or Vowel Alert strategies could be used to decode the word or base word and recorded the lesson number in which the letter sounds or vowel variations were taught. The lesson at which the word was decodable was determined by recording the highest lesson number after applying the hierarchical coding scheme. For example, the word mat was considered decodable at lesson 11, when the blending skills and letter sounds for m, a, and t had been taught. The word unlikely was considered decodable at lesson 42, when the affixes un and ly had been taught, as well as the letter sounds for l, i, and k, and the final e rule.
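The two worked examples above (mat and unlikely) can be traced through a sketch of the hierarchical coding rules. The lesson-number lookup tables below are hypothetical stand-ins for the actual PHAST scope and sequence, chosen only so that the two article examples come out at lessons 11 and 42.

```python
# Illustrative sketch of the hierarchical coding scheme: a word becomes
# decodable at the highest lesson number among all the units needed to
# decode it. The lookup tables are HYPOTHETICAL stand-ins for the real
# PHAST scope and sequence; they encode only the article's two examples.

AFFIX_LESSON = {"un": 40, "ly": 42}            # Peeling Off units
RIME_LESSON = {"at": 11}                       # Rhyming spelling patterns
GRAPHEME_LESSON = {"m": 3, "a": 5, "t": 7,     # Sounding Out / Vowel Alert
                   "l": 9, "i": 20, "k": 30, "e_final": 35}

def lesson_decodable(affixes, rimes, graphemes):
    """Apply the strategies in hierarchical order (Peeling Off, Rhyming,
    then Sounding Out / Vowel Alert) and return the highest lesson number
    among the required units, i.e., the lesson at which the word becomes
    decodable."""
    lessons = []
    for a in affixes:
        lessons.append(AFFIX_LESSON[a])        # step 1: Peeling Off
    for r in rimes:
        lessons.append(RIME_LESSON[r])         # step 2: Rhyming
    for g in graphemes:
        lessons.append(GRAPHEME_LESSON[g])     # step 3: Sounding Out / Vowel Alert
    return max(lessons) if lessons else None

# 'mat' (onset m + rime 'at') and 'unlikely' (un- + -ly + l, i, k, final e)
print(lesson_decodable([], ["at"], ["m"]))                          # -> 11
print(lesson_decodable(["un", "ly"], [], ["l", "i", "k", "e_final"]))  # -> 42
```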

After the words were coded, each PHAST lesson was examined to determine its potential to add decodable words to a student's reading lexicon. For example, 112 words, or approximately 1.8% of the 6,335 words, were coded as becoming decodable during lesson 31, whereas 471 words, or approximately 7.5% of the words, were coded as becoming decodable during lesson 49. To reflect this difference in the potential of each lesson, one word (string) was added to the IAWL from lesson 31, whereas 4 words (shiny, horizon, isolated, silent) were added from lesson 49 to the IAWL. Lessons that did not provide adequate potential to add words to a student's lexicon were not represented on the 50-word list. Lower frequency words were chosen that would be in the student's oral vocabulary, but not in their reading vocabulary, to encourage students to use their acquired strategies to decode each word. In addition, words were chosen to reflect increases in average word length across the 60 PHAST lessons. All 50 words on the IAWL were represented in the grade-level corpora for Grades 3-5 (Zeno et al., 1995). The words on the IAWL were not directly taught to the students during the intervention to control for the effects of familiarity. Students needed to rely on their knowledge and application of the taught strategies to decode the words. Appendix A lists the order of words as determined by the hierarchical coding scheme on the IAWL as well as other relevant information regarding construction of the IAWL.
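The proportional allocation described above can be checked arithmetically: a lesson's share of the 50-word list roughly equals 50 times its share of the 6,335-word decodable corpus. A quick sketch (the function name is ours):

```python
# Check of the proportional sampling described above: the number of IAWL
# words drawn from a lesson is approximately 50 * (lesson's decodable
# words) / (6,335-word corpus), rounded to the nearest whole word.

def words_for_lesson(n_decodable, corpus=6335, list_len=50):
    """Approximate IAWL allocation for a lesson, proportional to its
    potential to add decodable words to a student's lexicon."""
    return round(list_len * n_decodable / corpus)

print(words_for_lesson(112))  # lesson 31: -> 1 word ('string')
print(words_for_lesson(471))  # lesson 49: -> 4 words
```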

At each assessment wave, a research assistant presented the student with the 50-word IAWL broken into lists of 10 words printed on cardstock. The student was directed to read each word and to use the strategies they had been taught for unfamiliar words. After the student read the list of 10 words, the next 10 words were presented to the student until the student failed to read 10 words in a row. After this ceiling was reached, the research assistant asked the student to look at each remaining list of 10 words and see if there were any words that could be read.

The internal consistency reliability (Cronbach's alpha) of the set of 50 words exceeded .94 at each of the six assessment waves.



Hierarchical linear modeling (HLM; Raudenbush & Bryk, 2002) was used to model each student's initial status (score at lesson 0) and slope (rate of change across each assessment wave) parameters. HLM models a best-fit ordinary least squares regression line for each student and provides overall mean initial status and slope coefficients. Two separate models were run, with IAWL and ORF as the outcome measures. Preliminary analysis indicated that linear growth models provided adequate fit for the IAWL and ORF data. Hypothesis testing for the fixed effects (t ratio) revealed that the initial status and slope parameters were significantly different from zero. The test of homogeneity of growth parameters (χ² statistic) revealed significant individual variation between students; therefore, the intercept and slope terms were allowed to vary randomly. Table 2 lists the results and fit statistics for the IAWL and ORF measures. Results for the IAWL indicate that, on average, students read 11.2 words correctly at lesson 0 and gained 3.1 words read correctly each assessment wave. Results for the ORF measure indicate that, on average, students read 44.9 words correctly per min at lesson 0 and gained 2.6 words per min each assessment wave. These results demonstrate a general trend of improvement for students on both the IAWL and ORF assessments. Figure 1 shows the estimated individual growth curves for each student on the IAWL and ORF measures. This general trend of improvement is consistent with standardized test comparisons of word reading ability between intervention and control students in the larger study. Intervention participants showed significant pretest to posttest gains on measures of untimed word decoding and timed word reading, as compared to control participants. Effect sizes for untimed and timed decoding and word reading, fluency, and comprehension ranged from -.06 to 1.19 (for detailed results, see Compton et al., 2005).
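The linear growth model just described can be written out explicitly. The notation below is a standard two-level HLM formulation (Raudenbush & Bryk, 2002), not a reproduction of the exact specification used in the analysis:

```latex
% Level 1 (within student): outcome for student i at assessment wave t
Y_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Wave})_{ti} + e_{ti}

% Level 2 (between students): initial status and slope vary randomly
\pi_{0i} = \beta_{00} + r_{0i}, \qquad \pi_{1i} = \beta_{10} + r_{1i}
```

Under this notation, the reported fixed effects correspond to beta_00 of about 11.2 and beta_10 of about 3.1 for the IAWL, and beta_00 of about 44.9 and beta_10 of about 2.6 for the ORF measure.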


The ORF and IAWL initial status and slope parameters were then entered into a multiple regression analysis to determine the unique variance explained (expressed as R²) by the IAWL and ORF slope parameters in predicting raw change scores on standardized tests, after controlling for a student's initial status on the IAWL and ORF measures. For example, to calculate the unique variance explained by the IAWL slope parameter, the IAWL and ORF initial status parameters were first entered as a block. These two parameters were significantly correlated (r = .67, p < .001).

Next, the ORF slope parameter was entered. Finally, the IAWL slope parameter was entered. Table 3 displays the variance explained (R²) by the IAWL and ORF slope parameters. The unique variance accounted for by the IAWL slope parameter after controlling for the IAWL and ORF initial status and the ORF slope parameters ranged from .01 to .29, with significant results for measures of untimed and timed decoding and word reading ability, as well as passage reading accuracy. The unique variance accounted for by the ORF slope parameter after controlling for the IAWL and ORF initial status and the IAWL slope parameters ranged from .00 to .19, with significant results on a measure of passage fluency. The unique variance accounted for by the IAWL and ORF slopes is similar regardless of the order entered because the two slope parameters were essentially uncorrelated (r = .01). In summary, the results partially supported our hypothesis that the IAWL and ORF measures would differentially predict growth on standardized tests of reading. The IAWL accounted for unique variance on measures of decoding and word reading accuracy, as well as passage reading accuracy, whereas the ORF measure accounted for unique variance on a measure of passage reading fluency, although it did not account for unique variance in passage reading accuracy or comprehension.
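The block-entry logic above can be sketched in a few lines. The data here are synthetic (generated only to mimic the reported pattern of correlations), so the printed value is illustrative rather than a reproduction of Table 3.

```python
# Sketch of the hierarchical regression used above: the unique variance
# (R-squared change) explained by the IAWL slope after entering both
# initial status parameters and the ORF slope. All data are SYNTHETIC.
import numpy as np

rng = np.random.default_rng(0)
n = 40
iawl_init = rng.normal(size=n)
orf_init = 0.67 * iawl_init + rng.normal(scale=0.74, size=n)  # correlated intercepts
iawl_slope = rng.normal(size=n)
orf_slope = rng.normal(size=n)                                # near-zero slope correlation
change = 0.5 * iawl_slope + 0.3 * orf_slope + rng.normal(size=n)

def r2(predictors, y):
    """R-squared from an OLS fit with an intercept column."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Blocks: (1) both initial statuses, (2) ORF slope, (3) IAWL slope.
r2_without = r2([iawl_init, orf_init, orf_slope], change)
r2_with = r2([iawl_init, orf_init, orf_slope, iawl_slope], change)
unique_iawl = r2_with - r2_without  # R-squared change for the IAWL slope
print(round(unique_iawl, 3))
```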


To evaluate the construction of the overall word list, generalized estimating equations (GEE) were used. Because each child was tested repeatedly, the independence assumption of analysis of variance was violated, and a longitudinal analysis of the data was necessary. In addition to providing hypothesis testing, GEE modeling provides model-based scores that can be used to display the main trends visually. GEE can be used when the outcome is 0 or 1 (as in incorrect or correct), and it accounts for the lack of independence among observations (Liang & Zeger, 1986). Statistical Analysis Software (SAS; SAS Institute, 2001) PROC GENMOD was used to estimate a model that examined the main effects of "Assessment Wave," "Word Order," and "Taught," and the interaction of "Assessment Wave by Taught."

According to this model, whether a word was pronounced correctly was a function of three main effects and one interaction: (a) The "Assessment Wave" (0, 12, 24, 36, 48, and 60), representing each of the assessments given every 12 lessons; (b) "Words" (scored from 1-50), an ordinal number representing each word's relative theory-based difficulty; (c) "Taught" (0-1), an indicator of whether the lesson directly pertaining to a given word had been taught; and (d) "Assessment Wave by Taught," the interaction showing how "Assessment Wave" moderated the effects of "Taught." For the effect of Taught, words were coded as "0" when the lesson in which the word became decodable had not been taught, and "1" after the lesson in which that particular word became decodable had been taught. Taught was thus a time-varying covariate in the longitudinal GEE model.
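The model described in (a) through (d) corresponds to a marginal logistic regression of the form below; this is our formalization of the prose, not an equation printed in the original analysis:

```latex
% Marginal (GEE) model for the log odds that student i reads word j
% correctly at assessment wave t
\operatorname{logit} \Pr(Y_{ijt} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Wave}_{t}
  + \beta_2\,\mathrm{Word}_{j}
  + \beta_3\,\mathrm{Taught}_{ijt}
  + \beta_4\,(\mathrm{Wave}_{t} \times \mathrm{Taught}_{ijt})
```

Here Wave takes the values 0 through 60, Word is the ordinal difficulty rank 1-50, and Taught is the time-varying 0/1 covariate; a working correlation structure absorbs the dependence among repeated observations within a student (Liang & Zeger, 1986).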

The results for the GEE modeling are listed in Table 4. Results were significant for all three main effects and the interaction effect. Significant results for "Assessment Wave" demonstrated an increase in percentage passing at each assessment wave. Significant results for "Word" revealed that as words progressed from 1-50, the percentage passing went down, indicating that words were more difficult to decode the further they were down the list, hence the negative coefficient. Significant results for "Taught" showed that when the lesson in which individual words became decodable was taught, percentage passing improved. The significant interaction of "Assessment Wave by Taught" indicated a weakening effect in the potency of a "Taught" lesson for words further down the list. These results are best understood in a visual representation as shown in Figure 2. Each line represents one of the 50 words on the list, with the words rank ordered 1-50 from top to bottom. The effect of "Assessment Wave" is seen by the general upward trend from lesson 0 to lesson 60. The white diamond-shaped spaces illustrate the interaction of "Assessment Wave by Taught," the improvement in percentage passing after the lesson in which a word becomes decodable is taught. As seen in Figure 2, this effect is greater for words higher in the word list, showing the strength of the intervention in improving students' skills in decoding those words. As words become more difficult, the intervention's strength in improving percentage passing weakened.


The hierarchical coding scheme used to determine the rank ordering of words on the IAWL was evaluated using a one-parameter model based on item response theory (IRT; Bock, 1997; Embretson, 1996; Embretson & Reise, 2000). IRT's model-based difficulty scores provide equal interval scaling and scores that are not sample-dependent. Although percentage passing can be used to estimate word difficulty, IRT estimates are more accurate at the extremes.
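The one-parameter model takes the standard Rasch form, in which each word j has a single difficulty parameter b_j:

```latex
% One-parameter (Rasch) IRT model: probability that a student with
% ability \theta reads word j (difficulty b_j) correctly
\Pr(Y_j = 1 \mid \theta) = \frac{1}{1 + e^{-(\theta - b_j)}}
```

The final rank ordering shown in Figure 3 orders the 50 words by the estimated b_j values.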

Using IRT, we examined the final rank ordering to see whether the hierarchical coding scheme was effective in correctly ordering the difficulty of the words. Figure 3 shows the theoretical rank ordering as determined by the hierarchical coding rules in the left column, and the final rank ordering as determined by the IRT model in the right column. Despite the GEE analysis demonstrating the overall ability of the IAWL to predict when words become decodable and the ability of the taught lessons to increase the percentage passing, IRT analysis showed considerable shifting in the difficulty rank ordering. Some words remained relatively stable, for example the words grid (initial rank 5, final rank 5), bold (initial rank 23, final rank 22), and duplicate (initial rank 39, final rank 39), whereas other words showed considerable shifting in their difficulty rank ordering, such as crayon (initial rank 43, final rank 2), painful (initial rank 24, final rank 4), and grind (initial rank 19, final rank 45). In summary, the evaluation of the psychometric properties and the hierarchical coding scheme was mixed. Overall, the IAWL operated as expected, with an improvement in percentage passing occurring over time, as well as an improvement in percentage passing after the lessons teaching the skill necessary to decode an individual word had been taught. However, the IRT results suggest that the hierarchical coding scheme was not effective for determining the final word difficulty ordering for students in this study.



In this study we introduced a theoretically based progress-monitoring assessment developed to monitor decoding and word reading acquisition within the PHAST reading intervention. Results indicated that the IAWL accounted for unique variance on measures of timed and untimed word reading and decoding and timed passage accuracy. ORF, a traditional progress-monitoring assessment, accounted for unique variance on a measure of passage reading fluency. These results address the importance of choosing an appropriate progress-monitoring assessment based on the skill level of the student, the targeted remediation, and the goals of the intervention. For example, if a student requiring extensive word reading remediation receives an intervention targeting decoding and word reading, such as the PHAST intervention, and the goals of the intervention are to improve a student's word reading ability, then the IAWL appears to be an effective progress-monitoring instrument. However, if the goals of the intervention are to generalize decoding and word reading gains to passage reading fluency, then the ORF measure would be an effective progress-monitoring instrument. It is interesting to note that gains in passage reading fluency have been shown to generalize to gains in reading comprehension (e.g., L. S. Fuchs et al., 2001; Shinn et al., 1992), yet in this sample, neither the IAWL nor the ORF measure accounted for unique variance on a standardized measure of reading comprehension. This may be due to the explicit focus of the PHAST reading intervention on decoding and word reading, as well as the severity of the reading disability exhibited by this sample of students. These results partially support our hypothesis that the IAWL appears to monitor more intervention specific goals, whereas the ORF measure appears to monitor more generalized effects (L. S. Fuchs & Fuchs, 1986a). 
It is also important to note that the combined IAWL and ORF initial status and slope estimates left 49% to 74% of the variance in raw score growth on standardized reading measures unexplained. Clearly, other factors accounted for substantial variance in this sample. Further research should investigate methods of improving the variance accounted for by progress-monitoring instruments such as the IAWL. It may be that word lists offer an advantage for monitoring progress in students with severe reading disabilities, and that a more generalized word list outcome would be more effective than passages or a list as intervention specific as the IAWL.

The adequacy of the test construction for the IAWL was important in establishing the validity of the measure. The IAWL appeared to meet the requirements for progress-monitoring instruments as addressed earlier in this article (Deno, 1997; L. S. Fuchs & Fuchs, 1999; Good et al., 2001). The IAWL was free from floor or ceiling effects, and appeared to be sensitive to small changes in decoding and word reading gains, with students gaining an average of 3.1 words per assessment wave, or about 1 word each week.

When analyzing the overall measure, the IAWL assessment behaved as designed. Words coded as becoming decodable earlier in the intervention had a higher percentage passing than words coded as becoming decodable later in the program. As students progressed through the reading intervention, the percentage passing improved at each assessment wave. In earlier assessment waves, the percentage passing increased as students were exposed to the skills necessary to decode a word; however, the effect of being "taught" the necessary skills diminished with each assessment wave. This may be a result of several factors. Words appearing earlier on the word list required only one or two strategies to decode. In addition, students had been exposed to only a few decoding strategies and therefore had fewer decisions to make about which strategy to use. Words appearing later on the word list required the use of multiple strategies, as well as requiring the student to determine the most effective set of strategies to decode the word. Students also had less exposure to the strategies taught later in the program than to those taught earlier. For instance, the Sounding Out strategy was taught in all 60 lessons and is also a familiar decoding strategy taught in schools. The first keywords for the Rhyming strategy were introduced in lesson 11, and students learned two new keywords each day until lesson 60. In contrast, the Vowel Alert strategy was not introduced until lesson 46. Words on the IAWL requiring the Vowel Alert strategy, therefore, did not have the benefit of many repetitions for students to become proficient. A final reason that taught lessons did not benefit later words is that the words, although listed in the grade 3-5 corpus (Zeno et al., 1995), may not have been familiar to the students.
The students in this study were almost a full standard deviation below typical peers on a measure of receptive vocabulary. It may be that they were unable to determine which set of strategies resulted in a familiar word.

Despite the results demonstrating the overall effectiveness of the IAWL's test construction, IRT analysis showed that the coding system for the words was not as accurate. There was considerable shifting between the theoretical rank ordering of the words and their final rank ordering. To achieve reliable coding of the words, a hierarchical coding scheme was used. This coding scheme resulted in some words being coded at unrealistically late lessons in the reading intervention. Although all students had been exposed to the necessary skills to decode the word by the coded lesson, many students could read certain words before being exposed to them in the reading intervention. The hierarchical coding scheme coded words with simple affixes (e.g., -ing, un-, -less) and rime patterns (e.g., -ig, -en, -up) later in the lesson sequence, yet students could use simple letter-sound correspondences taught earlier in the lesson sequence to sound out these words.

The hierarchical coding scheme also did not account for background knowledge or environmental exposure that may have allowed students to recognize a word earlier than expected. For example, the word crayon was coded as becoming decodable at lesson 49, very late in the reading intervention; however, IRT analysis showed that it was easier for students to read than expected. Although words were chosen for the word list because they were infrequent in children's books, crayons are a common feature of children's classrooms, and students see the word on every crayon box. Further research is needed to develop reliable coding procedures that more closely reflect the psycholinguistic difficulty of words.

Future research should also investigate whether similar procedures can be used to develop responsiveness measures for other reading interventions, or measures that can be used across multiple interventions. One limitation of the IAWL in this study is that it is specific to the PHAST intervention. Research is currently underway to determine whether an IAWL can be developed for use with multiple interventions, allowing for comparisons between different reading interventions. In addition, it is important to determine the feasibility of developing similar responsiveness measures for interventions that target other reading skills, such as fluency or comprehension. It may be that an IAWL is most appropriate for reading programs that primarily address decoding and word reading accuracy, such as PHAST.

Finally, the theoretically based IAWL is in the initial stages of development. Further research is needed to investigate the instructional utility of the IAWL and to determine whether the development process can be simplified for teachers. At its current stage of development, it is unlikely that teachers will be able to apply similar techniques to design an IAWL for specific curricula, given the considerable time required to develop such a measure. However, in the future, teachers may be able to use this measure in several ways. First, the IAWL could be used as a measure of responsiveness to intervention. Students who are not gaining skills in decoding may require additional review or a slower pace. Teachers may also be able to identify the most appropriate starting point in the intervention or to determine whether a student has gained the necessary skills to move to the next unit of instruction. Furthermore, the results from this study provide initial evidence of the importance of selecting a progress-monitoring instrument that addresses the specific goals of the intervention and the individual student. Further research could explore whether monitoring instruments more closely aligned with a curriculum's purpose are more accurate. Ultimately, any progress-monitoring instrument must provide information that is instructionally useful to teachers.
50-Item Responsiveness Measure

Word Decodable at Lesson Frequency Letters

mat 11 5807 3
deed 11 7866 4
fate 15 4832 4
hail 19 8016 4
grid 21 8938 4
forgot 25 3125 6
shake 27 2971 5
artist 30 8562 6
string 31 1919 6
cactus 34 7082 6
wax 35 4134 3
hive 35 8970 4
beside 36 1053 6
shooting 36 4111 8
vanished 37 5963 8
smelled 37 4281 7
admit 38 3424 5
monster 38 3903 7
grind 39 7274 5
repeat 40 3640 6
shortest 41 8343 8
unlikely 42 5960 8
bold 43 4762 4
painful 44 4435 7
throat 45 2316 6
robot 46 6338 5
choking 46 9867 7
crater 47 8735 6
discarded 47 7178 9
rotating 47 9370 8
fever 48 3346 5
shout 48 4113 8
shiny 49 3273 5
horizon 49 4400 7
isolated 49 4058 8
silent 49 1990 6
demanding 50 6612 9
screen 51 2246 6
duplicate 51 8810 9
considerable 53 2251 12
amuse 53 6478 5
expanded 54 4382 8
crayon 54 9955 6
construction 55 2127 12
greeting 55 5746 8
confusion 56 3442 9
beneath 56 1425 7
argument 57 2479 8
impressive 59 5771 10
ignore 59 4220 6

This study was supported in part by Grant H324D0100003 to Vanderbilt University from the U.S. Department of Education, Office of Special Education Programs, and Core Grant HD15052 from the National Institute of Child Health and Human Development to Vanderbilt University. Statements do not reflect the position or policy of these agencies, and no official endorsement by them should be inferred.

We thank Maureen Lovett and her research team for their generous assistance in helping us implement the PHAST intervention program.

Manuscript received February 2005; accepted August 2005.


Bock, R. D. (1997). A brief history of item response theory. Educational Measurement: Issues and Practice, 16(4), 21-33.

Compton, D. L., Appleton, A. C., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second-grade students who are average and poor decoders. Learning Disabilities Research and Practice, 19, 176-184.

Compton, D. L., Olinghouse, N. G., Elleman, A., Vining, J., Appleton, A. C., Vail, J., et al. (2005). Putting transfer back on trial: Modeling individual differences in the transfer of decoding skill gains to other aspects of reading acquisition. Journal of Educational Psychology, 97, 55-69.

Deno, S. L. (1992). The nature and development of curriculum-based measurement. Preventing School Failure, 36(2), 5-10.

Deno, S. L. (1997). Whether thou goest ... Perspectives on progress monitoring. In J. W. Lloyd, E. J. Kameenui, & D. Chard (Eds.), Issues in educating students with disabilities (pp. 77-99). Mahwah, NJ: Erlbaum.

Deno, S. L. (2003). Developments in curriculum-based measurement. Journal of Special Education, 37, 184-192.

Deno, S. L., & Mirkin, P. K. (1977). Data-based program modification: A manual. Reston, VA: Council for Exceptional Children.

Dunn, L. M., & Dunn, L. M. (1997). Peabody picture vocabulary test (3rd ed.). Circle Pines, MN: AGS.

Elliott, S. N., & Fuchs, L. S. (1997). The utility of curriculum-based measurement and performance assessment as alternatives to traditional intelligence and achievement tests. School Psychology Review, 26, 224-233.

Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8, 341-349.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

Engelmann, S., & Bruner, E. C. (1988). Reading Mastery I/II Fast Cycle: Teacher's guide. Chicago: Science Research Associates.

Fuchs, D., Fuchs, L. S., & Compton, D. L. (2004). Identifying reading disabilities by responsiveness-to-instruction: Specifying measures and criteria. Learning Disability Quarterly, 27, 216-227.

Fuchs, L. S. (1989). Evaluation solutions: Monitoring progress and revising intervention plans. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 153-181). New York: Guilford Press.

Fuchs, L. S. (1995, May). Curriculum-based measurement and eligibility decision making: An emphasis on treatment validity and growth. Paper presented at the National Research Council Workshop on Alternatives to IQ Testing, Washington, DC.

Fuchs, L. S. (2004). The past, present, and future of curriculum-based research. School Psychology Review, 33, 188-192.

Fuchs, L. S., & Fuchs, D. (1986a). Curriculum-based assessment of progress toward long- and short-term goals. Journal of Special Education, 20, 69-82.

Fuchs, L. S., & Fuchs, D. (1986b). Effects of systematic formative evaluation on student achievement: A meta-analysis. Exceptional Children, 53, 199-208.

Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring student reading progress. School Psychology Review, 21, 45-58.

Fuchs, L. S., & Fuchs, D. (1999). Monitoring student progress toward development of reading competence: A review of three forms of classroom-based assessment. School Psychology Review, 28, 659-671.

Fuchs, L. S., Fuchs, D., & Compton, D. L. (2004). Monitoring early reading development in first grade: Word identification fluency versus nonsense word fluency. Exceptional Children, 71, 7-21.

Fuchs, L. S., Fuchs, D., & Hamlett, C. (1989). Effects of instrumental use of curriculum-based measurements to enhance instructional program. Remedial and Special Education, 10(2), 43-52.

Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 239-256.

Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic indicators of basic early literacy skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.

Good, R. H., Simmons, D. C., & Kame'enui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high-stakes outcomes. Scientific Studies of Reading, 5, 257-288.

Knutson, N., & Shinn, M. R. (1991). Curriculum-based measurement: Conceptual underpinnings and integration into problem-solving assessment. Journal of School Psychology, 29, 371-393.

Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

Lovett, M. W., Lacerenza, L., & Borden, S. L. (2000). Putting struggling readers on the PHAST track: A program to integrate phonological and strategy-based remedial reading instruction and maximize outcomes. Journal of Learning Disabilities, 33, 458-476.

Lovett, M. W., Lacerenza, L., Borden, S. L., Frijters, J. C., Steinbach, K. A., & De Palma, M. (2000). Components of effective remediation for developmental reading disabilities: Combining phonological and strategy-based instruction to improve outcomes. Journal of Educational Psychology, 92, 263-283.

Marston, D., Fuchs, L. S., & Deno, S. L. (1985). Measuring pupil progress: A comparison of standardized tests and curriculum-related measures. Diagnostique, 11(2), 77-90.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.

Shinn, M. R., Good, R. H., Knutson, N., Tilly, W. D., & Collins, V. L. (1992). Curriculum-based measurement of oral reading fluency: A confirmatory analysis of its relation to reading. School Psychology Review, 21, 459-479.

SAS Institute. (2001). Statistical analysis software [Computer software]. Cary, NC: Author.

Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1997). Test of word reading efficiency. Austin, TX: Pro-Ed.

Vaughn, S., & Fuchs, L. S. (2003). Redefining learning disabilities as inadequate response to instruction: The promise and potential problems. Learning Disabilities Research and Practice, 18, 137-146.

Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive test of phonological processing. Austin, TX: Pro-Ed.

Wechsler, D. (1999). Wechsler abbreviated scale of intelligence. San Antonio, TX: The Psychological Corporation.

Wiederholt, J., & Bryant, B. (1992). Gray oral reading test-3. Austin, TX: Pro-Ed.

Woodcock, R. W. (1987). Woodcock reading mastery test-revised. Allen, TX: DLM.

Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator's word frequency guide. Brewster, NY: Touchstone Applied Science Associates.


NATALIE G. OLINGHOUSE, Assistant Professor, Teacher Education and Counseling, Educational Psychology and Special Education Department, Michigan State University, East Lansing, Michigan. WARREN LAMBERT, Director, Statistical & Methodological Core, Vanderbilt University Kennedy Center, Nashville, Tennessee. DONALD L. COMPTON (CEC TN Federation), Associate Professor, Department of Special Education, Peabody College, Vanderbilt University, Nashville, Tennessee.

Correspondence concerning this article should be addressed to: Natalie G. Olinghouse, Assistant Professor, Teacher Education and Counseling, Educational Psychology and Special Education Department, 318 Erickson Hall, Michigan State University, East Lansing, MI 48824.
Mean Performance of Participants at Pretest and Posttest

Achievement Measure (a) Pretest Mean (SD) Posttest Mean (SD)

WRMT-R: Word Identification 81.78 (7.68) 83.44 (9.87)
WRMT-R: Word Attack 84.93 (10.69) 88.03 (12.21)
TOWRE: Word Efficiency 80.53 (11.51) 83.64 (11.46)
TOWRE: Decoding Efficiency 80.00 (7.95) 80.21 (8.41)
GORT-3: Rate 70.25 (10.06) 73.46 (10.40)
GORT-3: Accuracy 72.25 (10.37) 81.67 (15.74)
GORT-3: Comprehension 78.50 (13.78) 84.23 (8.31)
Intervention Aligned
 Word List (b) 11.55 (9.01) 26.62 (12.07)
Oral Reading Fluency
 Rate (c) 44.43 (24.30) 57.64 (27.43)

Note. WRMT-R = Woodcock Reading Mastery Test-Revised; TOWRE = Test
of Word Reading Efficiency; GORT-3 = Gray Oral Reading Test-3.

(a) M = 100, SD = 15, unless otherwise noted. (b) Number of words
read correctly on 50-word list. (c) Words read correctly per minute.

Hierarchical Linear Modeling Unconditional Models: Intervention
Aligned Word List and Oral Reading Fluency

                Fixed Effect

Measure         Coefficient   SE      t

IAWL
 Intercept       11.239      1.408    7.984 *
 Slope            3.104      0.244   12.709 *
ORF
 Intercept       44.866      3.598   12.469 *
 Slope            2.564      0.343    7.473 *

                Random Effect

Measure         Variance   χ²(39)      Reliability

IAWL
 Intercept       73.517    546.638 *   0.927
 Slope            1.738    148.246 *   0.728
ORF
 Intercept      495.388    912.622 *   0.956
 Slope            2.166     74.791 *   0.460

Note. IAWL = Intervention aligned word list; ORF = Oral reading
fluency.

* p < .01.

Variance Explained by Intervention Aligned Word List and Oral
Reading Fluency Parameters in Raw Score Gains

Predictors                    Word ID   Attack   Words   Nonwords

Step 1: IAWL initial status,
 ORF initial status            .09      .33 **   .02     .23 *
Step 2: ORF Slope              .00      .02      .07     .06
Step 3: IAWL Slope             .29 **   .16 **   .18 *   .02
Step 1: IAWL initial status,
 ORF initial status            .09      .33 **   .02     .23 *
Step 2: IAWL Slope             .29 **   .15 **   .18 *   .02
Step 3: ORF Slope              .00      .03      .06     .06
Total R² Explained             .38      .51      .26     .31

Predictors                    Rate     Accuracy   Comprehension

Step 1: IAWL initial status,
 ORF initial status            .19 *    .18        .08
Step 2: ORF Slope              .19 **   .02        .06
Step 3: IAWL Slope             .01      .16 *      .01
Step 1: IAWL initial status,
 ORF initial status            .19 *    .18        .08
Step 2: IAWL Slope             .01      .17 *      .01
Step 3: ORF Slope              .19 **   .02        .06
Total R² Explained             .39      .36        .15

Note. IAWL = Intervention aligned word list; ORF = Oral reading
fluency.

* p < .05. ** p < .01.

Longitudinal Generalized Estimating Equations
Model Results

Parameter Estimate SE Z-score

Assessment Wave 0.012 0.001 8.54 *
Word -0.025 0.002 -11.95 *
Taught 0.344 0.080 4.32 *
Assessment Wave
 x Taught -0.007 0.002 -3.92 *

Note. Coefficients are expressed in log units.

* p < .001.
COPYRIGHT 2006 Council for Exceptional Children

Publication: Exceptional Children, September 22, 2006