Effects of expert system consultation within curriculum-based measurement, using a reading maze task.
Research investigating teachers' use of CBM is promising. It appears that with CBM, instructional quality and student achievement increase (e.g., Fuchs, Deno, & Mirkin, 1984; Fuchs & Fuchs, 1986; Fuchs, Fuchs, Hamlett, & Stecker, 1990; Jones & Krouse, 1988). In response to a mounting database supporting the effectiveness of this strategy, curriculum-based assessment frequently is cited as a potential method for enhancing the quality of services within both regular and special education settings (e.g., Christenson, Ysseldyke, & Thurlow, 1989; Gersten, Carnine, & Woodward, 1987; Reisberg & Wolf, 1988; Will, 1986; Zigmond & Miller, 1986).
Despite support for CBM efficacy and calls for CBM implementation, research indicates that the specific structure of CBM may be critical to successful implementation and to differential instructional quality and related achievement gains. For example, teachers' systematic use of the assessment information, rather than measurement alone, represents one key to better achievement outcomes (Fuchs, Fuchs, & Hamlett, 1989b; Wesson et al., 1988). Use of the database to monitor the appropriateness of goals and to adjust goals upward when possible also appears related to improved student growth (Fuchs, Fuchs, & Hamlett, 1989a). In addition, information to supplement graphed displays of students' overall scores--to provide teachers with more specific information about students' curricular strengths and deficiencies--relates to differential achievement (Fuchs, Fuchs, & Hamlett, 1989c; Fuchs, Fuchs, Hamlett, & Allinder, 1991; Fuchs, Fuchs, Hamlett, & Stecker, 1990).
Consequently, investigation of the conditions under which CBM enhances instructional quality appears essential. One dimension that needs systematic study is the importance of consultative support to teachers in successfully adjusting instructional programs, in response to CBM data. Within all studies in which CBM was related to enhanced instructional quality and achievement, support systems to teachers have been intact. For example, in a large CBM efficacy study conducted in the New York City Schools (Fuchs et al., 1984), consultants visited teachers each week to help them formulate ideas for improving instruction when the CBM data indicated student progress was unsatisfactory. Previous research (Casey, Deno, Marston, & Skiba, 1988; Tindal, Fuchs, Christenson, Mirkin, & Deno, 1981) has suggested that teachers experience difficulty in formulating ideas to modify their instructional routines. Thus, the question of the importance of instructional consultation to successful CBM implementation appears critical. The primary purpose of this study was to assess the contribution of instructional consultation to CBM effectiveness.
We structured this investigation in the following way. We developed a CBM-based expert system that provided computerized, systematic instructional consultation to teachers. Teachers employed the expert system only when the CBM graph indicated a student's rate of progress under the current instructional routine was inadequate and that an instructional adjustment was required. The expert system relied on information about (a) the student's CBM graph, which displayed the student's CBM reading scores over time; (b) the student's performance on decoding, fluency, and comprehension skills, as judged by the teacher; (c) the student's work performance history in the classroom; (d) the teacher's previous instructional program; (e) the teacher's curricular priorities; and (f) feasibility issues. The expert system potentially recommended (a) up to two comprehension, fluency, decoding, or sight vocabulary instructional strategies, along with detailed directions for implementing the strategies, and (b) a motivational strategy for improving classroom work performance.
For this study, we employed two contrasting treatment groups and a control group. We randomly assigned one third of the teachers to a CBM with expert system consultation group. Every 1-2 weeks, project staff visited these teachers. When an instructional adjustment was recommended according to the CBM graphed decision rules, project staff provided no advice; however, the teacher employed the expert system for consultation.
We assigned one third of the teachers to a CBM with no expert system consultation group. These teachers employed CBM in the same way as did the CBM with expert system consultation group, with one difference: Although project staff visited teachers every 1-2 weeks in both groups, the CBM without expert system consultation group received no instructional consultation. They neither employed the expert system nor received any advice from project staff. We assigned the remaining third of the teachers to a control group, which did not employ CBM. Teachers employed treatments for 17 school weeks, and we assessed effects on teacher planning and student achievement.
This study is important for several reasons. First, it provides an experimental test of the importance of instructional consultation within the CBM process. If teachers and districts are to consider implementing CBM, the potential contribution of instructional consultation to CBM effectiveness must be determined. Second, the study addressed the effectiveness of an expert system for providing instructional consultation; if successful, it may represent a cost-effective means for providing consultation to support CBM. Third, expert system validity typically is indexed by agreement between the expert system and human experts (e.g., Parry & Hofmeister, 1986). This study extends previous expert system work by illustrating how expert system validity can be indexed in terms of teacher and student outcomes. Finally, within this investigation, we had the opportunity to study the instructional decisions teachers make with and without external support and advice. Improved understanding of instructional planning and program adjustment is important to the development of teacher training to improve the quality of instruction in our schools.
Teachers. Participants were 33 special education teachers in 15 schools in a southeastern metropolitan area, who taught students in Grades 1-9. Teachers were assigned randomly to three treatment groups: (a) CBM with expert system recommendations about the nature of instructional changes (CBM-ES), (b) CBM with no expert system (or other) consultation (CBM-NES), and (c) control (i.e., no CBM or consultation). One-way analyses of variance (ANOVAS) revealed no significant differences among groups on age level, total years of teaching experience, years teaching special education, and years in current position. The two CBM groups also were comparable in terms of previous CBM experience. Descriptive statistics on these variables are shown in Table 1. A chi-square test applied to teachers' highest educational degree revealed no significant relation: In the CBM-ES, CBM-NES, and control groups, respectively, 5, 5, and 6 teachers had bachelor's degrees; 5, 6, and 5, master's degrees; and 1, 0, and 0, specialist's certificates.
Information also was collected on personal and general teaching efficacy. These factors of the Teacher Efficacy Scale (Gibson & Dembo, 1984), corresponding to Bandura's (1982) two-factor theoretical model of self-efficacy, demonstrate adequate convergent and discriminant validity and are associated with effective teaching variables, including size of instructional group, use of criticism, and persistence in failure situations (Gibson & Dembo, 1984). A multivariate ANOVA (MANOVA) was conducted on these two scales, using Wilks's lambda to test for equality of group centroids. This analysis revealed no significant difference among groups, F(4, 58) = .55, ns (Wilks's lambda = .929). See Table 1 for descriptive statistics and univariate F values.
Students. Each teacher selected two students to participate in the study. All pupils were classified as mildly to moderately disabled, were in Grades 2-8, had a current individualized education program (IEP) reading goal, were functioning at least 1 grade below expected reading level, and had been classified as learning disabled (LD) or seriously emotionally disturbed (SED), according to state regulations.
During the study (3 weeks of training; 17 weeks of implementation), one student in CBM-ES and two in CBM-NES moved. One-way ANOVAs conducted on the remaining 63 students' age, grade, teacher estimated reading instructional level, years in special education, and individually measured IQ (available for 17 [81%] CBM-ES students, 16 [80%] CBM-NES students, and 16 [73%] control students) indicated no significant differences among groups (see Table 1). Chi-square tests on students' race, sex, and disability also indicated group comparability. In the CBM-ES, CBM-NES, and control groups, respectively, there were (a) 13, 15, and 15 boys and 8, 5, and 7 girls; (b) 5, 6, and 8 minority and 16, 14, and 14 nonminority students; and (c) 20, 16, and 20 LD and 1, 4, and 2 SED pupils.
Teachers in CBM-ES and CBM-NES conditions employed CBM to track their pupils' progress toward reading goals for 17 weeks. CBM monitoring comprised (a) goal selection and ongoing measurement on the goal material and (b) evaluation of the database to develop instructional programs.
Goal Selection and Ongoing Measurement. Teachers determined the appropriate reading level on which to establish each student's goal. The teachers hoped the student would master this level of material by the year's end.
Teachers assessed the pupil's reading performance at least twice weekly, each time on a different passage that had been sampled randomly from the goal-level pool of passages. These reading assessments were administered and scored automatically by computers (contact the first author for information on the software). With this software, students have 2.5 minutes (min) to complete a maze of a 400-word passage on the screen. For this maze, the first sentence is intact; thereafter, every seventh word is deleted and replaced with three choices. Only one choice is semantically correct. Distractors are not auditorily or graphically similar to the correct replacement; they are either the same length or within one letter of the correct replacement. Each maze was edited twice by independent editors to ensure compliance with these requirements. The task requires students to use the space bar and <RETURN> keys. Performance is scored as number of correct replacements. Reliability and validity of this CBM maze task have been demonstrated (e.g., Espin, Deno, Maruyama, & Cohen, 1989; Fuchs & Fuchs, 1990; Jenkins & Jewell, 1990).
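The maze construction rule described above can be illustrated with a short sketch. This is not the project software, which is not described in enough detail to reproduce; `build_maze`, `score_maze`, and the `distractor_pool` argument are hypothetical names introduced here. The sketch enforces only the mechanical constraints (first sentence intact, every seventh word replaced, distractors within one letter of the target length). Semantic correctness and auditory or graphic dissimilarity were checked by human editors in the study and are not automated here.

```python
import random
import re


def build_maze(passage, distractor_pool, step=7, seed=0):
    """Return (first_sentence, items); an item is a plain word or a choice dict."""
    rng = random.Random(seed)
    # The first sentence stays intact; deletions start after it.
    parts = re.split(r"(?<=[.!?])\s+", passage.strip(), maxsplit=1)
    first = parts[0]
    rest = parts[1].split() if len(parts) > 1 else []
    items = []
    for i, word in enumerate(rest, start=1):
        if i % step != 0:
            items.append(word)
            continue
        core = word.strip(".,;:!?\"'")
        # Assumed filter: distractors are the same length or within one letter.
        candidates = [d for d in distractor_pool
                      if d.lower() != core.lower()
                      and abs(len(d) - len(core)) <= 1]
        if len(candidates) < 2:
            items.append(word)  # too few distractors: leave the word in place
            continue
        choices = rng.sample(candidates, 2) + [core]
        rng.shuffle(choices)
        items.append({"answer": core, "choices": choices})
    return first, items


def score_maze(items, responses):
    """Score as the number of correct replacements, in blank order."""
    blanks = [it for it in items if isinstance(it, dict)]
    return sum(1 for it, r in zip(blanks, responses) if r == it["answer"])
```

In this sketch a blank with too few eligible distractors is simply left as ordinary text; an editor would instead supply suitable distractors by hand.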
Each student was trained by project staff to use software as follows. Staff (a) introduced students individually to the keyboard, using a structured tutorial; (b) taught students individually to use the software; (c) observed students using software until they demonstrated correct use on two separate occasions; and (d) observed students using software and provided corrective feedback weekly for 1-2 subsequent weeks.
After mastery was demonstrated and after at least 3 weeks of acclimation to the software, teachers calculated a median baseline performance, using the most recent three scores. Next, teachers set a performance criterion. (Goals and instructional programs employed for this study were not part of the student's IEP, but rather used as "lesson plans or more detailed objectives based on the IEP" [Office of the Federal Register, 1986, p. 84]. Therefore, modifying them did not require IEP meetings [Office of the Federal Register, 1986].) This performance criterion represented their best estimate of the score the student might achieve by year's end. Teachers were instructed to be ambitious but realistic. For the next 17 weeks, students were tested at the computer twice weekly.
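As a small illustration of the baseline step, the sketch below takes the median of the three most recent scores and computes the score expected at a given week on a straight line from baseline to the performance criterion. The function names and the week-based time axis are assumptions made for illustration; the study's software is not documented here.

```python
from statistics import median


def baseline_median(scores):
    """Median of the three most recent scores."""
    return median(scores[-3:])


def goal_line_value(baseline, goal, total_weeks, week):
    """Expected score at `week` on a straight line from baseline to goal."""
    return baseline + (goal - baseline) * week / total_weeks
```

For example, with illustrative scores of 8, 12, 10, 15, and 11, the baseline is the median of 10, 15, and 11.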
Evaluation of the Database. To evaluate the data, each week teachers employed software to automatically (a) graph the student's score, (b) apply decision rules to the graphed scores, and (c) receive feedback communicating decisions.
The software displays a graph on the computer screen, showing (a) the pupil's performance over time, (b) a goal line reflecting the desired slope of improvement from baseline to goal, and (c) a quarter-intersect (White & Haring, 1980) line of best fit superimposed over the scores that have been collected since the last vertical line and extrapolated to the goal date. Figure 1 shows a sample graph.
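The quarter-intersect line can be sketched as follows. The data are split into a first and second half, the median position and median score are found in each half, and a line is drawn through those two points. Two details here are assumptions rather than anything stated in the text: the x-axis is the measurement index (0, 1, 2, ...) rather than calendar dates, and with an odd number of scores the middle score is left out of both halves. White and Haring (1980) should be consulted for the exact procedure.

```python
from statistics import median


def quarter_intersect(scores):
    """Return (slope, intercept) of the quarter-intersect line.

    x is the measurement index; the middle score is skipped when the
    number of scores is odd (an assumed convention).
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least four scores")
    half = n // 2
    xs = list(range(n))
    # Median position and median score in each half of the series.
    x1, y1 = median(xs[:half]), median(scores[:half])
    x2, y2 = median(xs[n - half:]), median(scores[n - half:])
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept
```

The returned slope can be compared with the slope of the goal line, and the line can be extrapolated to the goal date by evaluating `intercept + slope * x` at the corresponding index.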
Decision rules guided the development of the student's instructional program as follows. If, subsequent to any vertical line signifying a goal or teaching change, four consecutive scores fell below the goal line, the teacher was to introduce a teaching change in an attempt to improve the student's rate of progress. If, on the other hand, four consecutive scores were above the goal line, the teacher was to raise the goal. However, if no decision had occurred when eight scores had been collected since the last vertical line (i.e., no four consecutive scores fell below or above the goal line), the following trend-based rules applied: If the line of best fit was flatter than the goal line, the teacher introduced a teaching change; if the line of best fit was steeper than the goal line, the teacher raised the goal. Below the graph, the decision appeared: "Uh-oh. Make a teaching change"; "OK! Raise the goal"; or "Insufficient data for analysis." When the teacher pressed <RETURN>, an explanation for the decision appeared.
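These decision rules can be written as a short function. The sketch below is a hypothetical rendering, not the study's software: `cbm_decision` and its arguments are names chosen here, the inputs cover only the scores collected since the last vertical line, and the trend and goal slopes are assumed to be computed elsewhere. The text does not say how ties are handled, so a score exactly on the goal line is treated as breaking a run, and equal slopes yield no decision.

```python
def cbm_decision(scores, goal_values, trend_slope, goal_slope):
    """Apply the four-point and trend-based rules to scores since the last change.

    goal_values[i] is the goal-line value at the time of scores[i].
    """
    run_below = run_above = 0
    for score, goal in zip(scores, goal_values):
        # Count consecutive scores strictly below or above the goal line.
        run_below = run_below + 1 if score < goal else 0
        run_above = run_above + 1 if score > goal else 0
        if run_below == 4:
            return "Make a teaching change"
        if run_above == 4:
            return "Raise the goal"
    # Trend-based rules apply once eight scores exist without a decision.
    if len(scores) >= 8:
        if trend_slope < goal_slope:
            return "Make a teaching change"
        if trend_slope > goal_slope:
            return "Raise the goal"
    return "Insufficient data for analysis"
```

Keeping the four-point rule ahead of the trend rule mirrors the order in which the rules are stated: the trend comparison is consulted only when no run of four has occurred.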
When the decision rules dictated a teaching change, teachers in the CBM-NES group determined the nature of their teaching adjustments on their own. However, in the CBM-ES group, teachers relied on an expert system for recommendations about the nature of their teaching adjustments. This expert system requested teachers to enter information about their student's graphed performance pattern; the nature of the previous instructional program; judgments about the quality of the student's daily performance in terms of fluency, accuracy of decoding, comprehension, and independent work completion; and the extent of their own implementation of previous teaching adjustments and willingness to continue or improve implementation. Based on the information provided by the teacher, the expert system recommended a teaching adjustment, along with instructions on how to implement the change.
CBM Teacher Training
Initial teacher training was delivered over 4 weeks, including two 2-hour (hr) after-school workshops and individual staff meetings with teachers to ensure appropriate and timely selection of goals, completion of Instructional Plan Sheets (IPS), use of teacher software, and student access to computers for measurement. After initial training, staff met with teachers individually once every 1-2 weeks for 20-40 min to inspect graphs, discuss pupil performance patterns, assist teachers in problem solving about treatment implementation, and (for the CBM-ES teachers) supervise use of the expert system. Staff were advanced doctoral students, experienced as teachers or school psychologists. On the average, during the 17-week study, CBM-ES teachers received 9.64 visits (SD = 1.96); CBM-NES teachers, 10.27 visits (SD = 2.72), F(1, 20) = .40, ns. It is important to note that we deemed it unnecessary to routinely visit control teachers to discuss their students' instructional programs, based on previous evidence (e.g., Fuchs et al., 1984) that this type of visit and support cannot account for differential achievement or planning outcomes.
As reported on posttreatment questionnaires, teachers in the control group relied primarily on criterion-referenced tests (mean = 22% of the time), daily work grades (mean = 29% of the time), unsystematic observation of performance (mean = 23% of the time), and teacher-made tests (mean = 15% of the time). They reported relying on standardized achievement test information and systematic monitoring data, respectively, for means of 6% and 5% of the time. In addition, they reported introducing a mean of .31 (SD = .22) adjustments in their students' programs during the 17-week study. These figures are similar to those reported in other studies of teacher planning (e.g., Mirkin & Potter, 1982). However, they differ in key ways from the reports of the CBM teachers on the same posttreatment questionnaire: The control group reported statistically significantly greater reliance on criterion-referenced tests, F(2, 30) = 3.34, p < .05, and less reliance on systematic, ongoing performance monitoring, F(2, 30) = 13.3, p < .0001. Consequently, the control group served as a benchmark, representing typical teacher assessment practices that differed from the systematic monitoring procedures of the CBM teachers.
Program Adjustments. The number of goal changes introduced for each student was counted from computer files on which teachers recorded these changes. Goal ambitiousness was calculated by subtracting the baseline median performance from the final goal level, both of which were derived from computer files. The number of instructional adjustments introduced by teachers during the study was counted from computer files. The number of expert system interactions (for CBM-ES teachers) was counted from expert system log files maintained on teachers' expert system disks. All percentages of agreement, calculated for 20 cases, were 100. (Percentage of agreement = [agreement between Rater A and Rater B/(agreements between A and B, disagreements between A and B, and omissions)] x 100; see Coulter, cited in Thompson, White, & Morgan, 1982. Unless otherwise noted, this formula was used for agreements reported in this article.)
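The percentage-of-agreement formula above can be made concrete with a brief sketch. The representation is an assumption: each rater's record is a list of codes, with `None` marking an item that a rater omitted; how omissions were actually recorded in the study is not stated.

```python
def percentage_agreement(rater_a, rater_b):
    """Agreements / (agreements + disagreements + omissions) x 100."""
    agreements = disagreements = omissions = 0
    for a, b in zip(rater_a, rater_b):
        if a is None or b is None:
            omissions += 1      # at least one rater left the item unscored
        elif a == b:
            agreements += 1
        else:
            disagreements += 1
    total = agreements + disagreements + omissions
    return 100 * agreements / total if total else float("nan")
```

With three agreements, one disagreement, and one omission, the function returns 60.0. Omissions lower the index just as disagreements do, which is what distinguishes this formula from simple agreement over jointly scored items.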
Nature of instructional adjustments. During the study, teachers maintained IPSs. These have five categories within which teachers describe their programs: Instructional Procedures, Arrangement, Time, Materials, and Motivational Strategies. Teachers described their initial instructional programs. Then, each time they made an adjustment in their teaching program, they described the nature of that adjustment, along with the date on which the adjustment was introduced (see Wesson & Deno, 1989, for additional information on the IPS). IPSs were coded in terms of the number of times teachers employed interventions that fell in each of the following categories: decoding, fluency, comprehension, sight vocabulary, and cloze. Average interobserver agreement on the coding of the IPS, calculated on 10 IPSs by two independent coders, was 92 (range 86-98).
Achievement. Reading achievement was assessed with the Comprehensive Reading Assessment Battery (CRAB) (Fuchs et al., 1989c). The CRAB provides six scores: words read correctly, questions answered correctly, total words written in recall summaries, matched words written in recall summaries, number of correct restorations in maze passages, and percentage of correct restorations in maze passages. The CRAB employs four 400-word traditional folktales, used in previous studies of reading comprehension (e.g., Brown & Smiley, 1977; Jenkins, Heliotis, Haynes, & Beck, 1986). The stories had been rewritten by Jenkins et al. to approximate a second- to third-grade readability level (Fry, 1968) while preserving the gist of the stories.
These folktales serve as stimuli for CRAB tasks. On one passage, pupils are required, first, to read orally for 3 min; second, to write a summary of the passage for 5 min; and, third, to answer 10 questions. On another passage, students have 2 min to complete a maze; then they read aloud for 3 min; and, finally, they answer 10 questions. The comprehension questions, which were developed by Jenkins et al. (1986), require short answers, reflecting recall of information contained in idea units of high thematic importance. The maze was prepared by maintaining the first sentence intact; thereafter, every seventh word was replaced with a 3-item multiple choice, where only one item provided a semantically correct replacement. Across pretesting and posttesting, each student read from all four passages. Tasks associated with passages and orders of administration of the tasks were counterbalanced across treatment groups.
To generate the words correct score, examiners mark insertions, omissions, mispronunciations, and substitutions as students read. Omissions of endings (ed, s, and ing) are scored as errors; self-corrections are not. Performance is scored as the average number of correct words read across the two oral reading samples. Test-retest reliability ranged from .93 to .96 (Fuchs, Deno, & Marston, 1983), and concurrent validity with the Stanford Achievement Test-Reading Comprehension Subtest was .91 (Fuchs, Fuchs, & Maxwell, 1988).
For number of correct questions answered, examiners ask questions and record the student's oral responses. When the student makes five consecutive incorrect responses, the examiner terminates testing. The score is the average number of questions answered correctly across the two passages. Percentage of agreement, calculated on 20 protocols, was 92. The correct questions score correlated .82 with the Stanford Achievement Test-Reading Comprehension Subtest (Fuchs, Fuchs, & Maxwell, 1988).
To generate the total words written and matched words written recall scores, the following administration procedure is employed. After reading, the passage is removed and students are presented with a lined piece of paper. They have 5 min to write a summary. If students complete the recall before the time limit, a maximum of four controlled prompts is delivered before the recall is terminated. Examiners allow 30 seconds (s) of no response before delivering consecutive prompts and before termination of the recall. Performance is scored as the total number of words written and number of unique, matched words written. Total words written is the number of words written in recalls. Percentage of agreement for total number of words written was 99, as calculated on 20 recalls. For matched words, every word in the student's recall is examined to determine whether it matches a word in the original passage. A match is awarded if at least 50% of the letter sequences (White & Haring, 1980) are spelled correctly. Once a word within the original passage is deemed a "match," it is removed from further consideration. Consequently, if the word the were written in the student's recall 10 times and appeared in the original passage 20 times, it would be counted as only one match. Student recalls were entered into a computer program (Hamlett & Fuchs, 1988) that automatically scores number of matched words. Percentage of agreement for computer scoring, against human scoring, was 93, as calculated on 20 recalls. The number of matched words is considered a proxy for the number of content words and correlates highly with content words (Fuchs, Fuchs, & Maxwell, 1988). The total words and matched words written scores have been shown to demonstrate adequate criterion validity, with the correlation between matched words and total words, respectively, and the Stanford Achievement Test-Reading Comprehension Subtest of .76 and .81 (Fuchs, Fuchs, & Maxwell, 1988).
For number and percentage of correct maze responses, scorers count the number of correct and incorrect responses. To derive percentage scores, scorers divide the number correct by the sum of correct and incorrect responses. Percentage of agreement, on 20 protocols, was 99 for each index. Correlations for the number and percentage scores with the Stanford Achievement Test-Reading Comprehension Subtest were .82 and .43, respectively; between number correct and oral reading rate, from .77 to .86 (Espin et al., 1989).
Program adjustment scores were tallied from document files after the study. For the nature of instructional programs, IPSs were coded after the study. To index achievement, the CRAB was administered individually preceding and following the study.
Fidelity of Treatment
Fidelity Measures. The accuracy with which teachers implemented the treatment was assessed using the Reading-Modified Accuracy of Implementation Rating Scale-Revised (R-MAIRS; Fuchs, 1988), which comprises three subscales: Structure (taking baseline, graphing data, writing goals, and drawing goal lines), Measurement (task administration, reliability of scoring, and frequency of measurement), and Evaluation (describing instructional procedures and timing instructional changes). Each item is rated on a 5-point Likert-type scale (0 = low; 4 = high), in accordance with detailed scoring guidelines. Staff were trained in scoring the R-MAIRS during one 3-hr session. Percentage of agreement, calculated on 15 protocols, was 95. (See Fuchs, 1988, for descriptions of items and scoring guidelines.)
Three additional measures of teacher fidelity were isolated for analysis. First, number of measurements was counted from scores stored in computer files. Second, percentage of expert system recommendations implemented was computed for CBM-ES teachers, by dividing the number of recommended program elements implemented by teachers (tallied from IPSs that teachers maintained) by the number of recommendations (tallied from log files maintained on teachers' expert system disks). Third, immediately following the specification of each instructional change, teachers rated the potential effectiveness of the change on a 4-point Likert-type scale (1 = very effective; 4 = not very effective). These ratings were averaged over the course of the study to produce an average rating of perceived effectiveness of instructional changes. Percentages of agreement, calculated on 20 cases, were 100 and 82 for the first two measures, respectively.
Student accuracy during measurement was indexed through the Student Computer Observation, which assesses student accuracy in (a) entering dates, (b) entering names, (c) swapping disks, (d) using the keyboard to respond to items, (e) responding deliberately, and (f) test time. Each item is scored nominally as "correct," "incorrect," or "not applicable" (for entering name or date if the student already had been using the computer program). Four observers were trained in the Student Computer Observation during one 1-hr session. Percentage of agreement calculated on five observations across all observers was 100.
Students' understanding of their graphs was assessed using the CBM Graph Test (Stecker, Whinnery, & Fuchs, 1988). The CBM Graph Test requires students to respond to 10 multiple-choice or recall questions that assess knowledge of labeling axes, identifying academic areas and maximum possible scores, naming dates and scores of graphed points, and judging patterns of improvement. Internal consistency reliability (Cronbach's alpha), on the current sample, was .83. Percentage of agreement on 20 students was 98.
Fidelity Data Collection. For teacher fidelity, R-MAIRS observations for the Measurement subscale were conducted 10 weeks into the study, and scoring from documents for the Structure, Measurement, and Evaluation subscales was completed after the study. An R-MAIRS assessment was completed for one randomly selected student per teacher. The number of measurement points, percentage of expert system recommendations implemented, and rating of perceived effectiveness of changes were tallied from document files after the study. For student fidelity, CBM students were observed using the Student Computer Observation 3 months after students had been trained to use the data collection software. CBM Graph Tests were administered individually to CBM students before treatment, after measurement training, and after treatment; CBM Graph Tests were administered individually to students in the control group at pretreatment and posttreatment.
Fidelity Results. Because fidelity information was relevant for the CBM groups only, one-way ANOVAs were used (CBM-ES vs. CBM-NES). ANOVAs conducted on the R-MAIRS subscales and on two additional measures of teacher fidelity revealed no significant differences between groups. See Table 2 for descriptive statistics and F values.
On the Student Computer Observations, of the 21 students in the CBM-ES condition, 11 scored "yes" and 10 scored "not applicable" for accurate entry of date; 11 scored "yes" and 10 scored "not applicable" for accurate name entry; and all scored "yes" on accurate disk swapping, use of keyboard for item response, deliberate test taking, and time in measurement. Among the 20 CBM-NES students, 12 scored "yes" and 8 scored "not applicable" on accurate entry of date; 12 scored "yes" and 8 scored "not applicable" on accurate entry of name; 19 scored "yes" on accurate disk swapping whereas 1 scored "no" on accurate disk swapping; and all scored "yes" on accurate use of keyboard for item response, deliberate test taking, and time in measurement. Chi-square tests indicated that accuracy during measurement was not related to treatment condition.
On the pretreatment CBM Graph Test, scores for the three treatment groups were comparable, F(2, 61) = 2.41, ns (for CBM-ES, M = 4.76 [SD = 1.87]; for CBM-NES, M = 3.65 [SD = 2.11]; for control, M = 3.52 [SD = 2.09]). After test-taking training, scores for the two CBM groups were comparable, F(1, 39) = .01, ns (for CBM-ES, M = 8.33 [SD = 1.35]; for CBM-NES, M = 8.30 [SD = 1.22]). When the study ended, scores for the two CBM treatment groups were comparable to each other, but higher than those for the control group, F(2, 61) = 20.76, p < .001 (for CBM-ES, M = 8.91 [SD = 1.18]; for CBM-NES, M = 7.40 [SD = 1.82]; for control, M = 4.74 [SD = 2.18]).
For number of goal changes, goal ambitiousness, and number of instructional changes, ANOVAs comparing CBM-ES and CBM-NES were employed with the student as the unit of analysis. (Information was irrelevant for the control group.) No significant differences between groups were revealed. Table 2 shows descriptive statistics and F values for all program adjustment measures.
Nature of Instructional Adjustments
For each category coded from the IPSs, means, standard deviations, and F values are shown in Table 2. ANOVAs comparing CBM-ES and CBM-NES groups were used with the student as the unit of analysis. (Control group information was not relevant.) The CBM-ES group employed statistically significantly more instructional activities that focused on decoding, fluency, and sight vocabulary. In addition, although only approaching statistical significance (p = .07), this sample of CBM-ES teachers adopted more instructional activities focusing on comprehension. Conversely, although not statistically significant (p = .18), the sample of CBM-NES teachers employed more instructional activities resembling the monitoring measurement task, that is, cloze activities.
Scores from each pretreatment and from each posttreatment CRAB task were aggregated by teacher. Then, for each CRAB task, deviation scores were calculated on the basis of analysis of covariance (ANCOVA) using each CRAB task's pretreatment and posttreatment scores. This deviation score indicates the extent to which a teacher's actual posttest score deviated above (if positive) or below (if negative) the teacher's predicted posttest score. Table 3 displays pretreatment, posttreatment, deviation, and posttreatment adjusted scores for each CRAB task.
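The idea of a deviation score (actual posttest minus the posttest predicted from the pretest) can be illustrated with a minimal sketch. The exact model behind the reported deviation scores is not specified in the text, so this sketch makes a simplifying assumption: predicted scores come from a single ordinary least-squares regression of posttest on pretest across all teachers, rather than from a pooled within-group slope. The function name is hypothetical.

```python
from statistics import mean


def deviation_scores(pre, post):
    """Residuals from a simple regression of posttest on pretest.

    Positive values mean the actual posttest exceeded the predicted one.
    """
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]
```

Because these are least-squares residuals, they sum to zero across the whole sample; group differences then show up as group means of the deviation scores lying above or below zero.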
Tests indicated that the assumption of homogeneous within-class regression coefficients was tenable for each datum. A preliminary multivariate analysis of variance (MANOVA) (CBM-ES vs. CBM-NES vs. Control), applied to the deviation scores, indicated a reliable difference associated with the treatment, F(12, 50) = 2.31, p < .05 (Wilks's lambda = .414).
To examine the multivariate effect in more depth, ANCOVA was applied to each CRAB task. As shown in Table 3, the ANCOVAs revealed significant effects for words read correctly, total words written on the recall, and number of correct maze responses. F values also approached significance for number of questions answered correctly (p = .06) and matched words written correctly (p = .10). There was no reliable difference for percentage of maze correct.
Follow-up tests were employed for the three statistically significant ANCOVAs. For words read correctly and number of correct maze responses, the performance of both CBM groups exceeded that of the control group. For total words written on the recall, the performance of the CBM-ES group exceeded that of the CBM-NES and the control groups.
Effect sizes were computed for ANCOVA, where effect size = FORMULA OMITTED (Hedges & Olkin, 1985). Effect sizes, comparing each CBM group with the control group, are shown in Figure 2. Effect sizes comparing the two CBM groups were .02 for words read correctly, .14 for questions correct, .55 for total words written on the recall, .54 for matched words written on the recall, .16 for number of maze correct, and 0 for percentage of maze correct.
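The effect size formula is omitted in this copy of the article, and the sketch below does not attempt to reconstruct it. It shows only a generic standardized mean difference (difference between two group means divided by the pooled standard deviation), with the commonly used small-sample correction factor 1 - 3/(4(n1 + n2 - 2) - 1). Whether the authors used adjusted means, this correction, or another variant is an assumption left open; all names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def small_sample_correction(n1, n2):
    """Approximate bias-correction factor for small samples."""
    return 1 - 3 / (4 * (n1 + n2 - 2) - 1)


# Hypothetical scores (illustration only, not the study's data).
treatment = [12.0, 14.0, 16.0, 18.0]
control = [10.0, 12.0, 14.0, 16.0]
d = standardized_mean_difference(treatment, control)
g = d * small_sample_correction(len(treatment), len(control))
print(f"d = {d:.3f}, corrected g = {g:.3f}")
```

On this scale, a value of .55 (as reported for total words written on the recall) would mean the two group means differ by a little more than half a pooled standard deviation.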
Treatment fidelity in this study was strong. For students, computer observations indicated that measurement occurred with high and comparable accuracy for both CBM groups; and, as reflected on the CBM Graph Test, students interpreted graphs with similar and high levels of understanding. For teachers, both CBM groups implemented CBM comparably and accurately, as indicated on the R-MAIRS. They also collected CBM data with similar frequency and assigned their instructional adjustments comparable effectiveness ratings. In addition, teachers appeared to employ the expert system with fidelity, implementing nearly 70% of all recommendations. Finally, control teachers appeared to represent conventional special education assessment practice, as indicated in previous work (Mirkin & Potter, 1982). Moreover, their assessment practices differed in key ways from those of CBM teachers: In contrast to CBM teachers' responses on posttreatment questionnaires, control teachers reported greater reliance on criterion-referenced tests and less use of systematic monitoring data for formulating instructional decisions. Given that project staff frequently visited teachers in both CBM groups to help solve CBM implementation problems, these high and similar levels of CBM fidelity are not surprising. However, in conjunction with (a) the documented differences between CBM and control groups and (b) previous evidence (Fuchs et al., 1984) that visits or consultation alone cannot explain planning or achievement differences, the fidelity data increase confidence that results can be attributed to the experimental treatment.
In terms of program development, teachers in both CBM groups increased goals with comparable frequency; and their levels of goal ambitiousness were similar. They adjusted instructional programs in similar ways and for relatively large numbers of times: During the 17-week study, CBM teachers modified student programs an average of 2.54 times. This compares with the mean of only .31 program adjustments reported by control teachers. Consequently, as in related studies (e.g., Fuchs et al., 1991; Fuchs, Fuchs, Hamlett, & Stecker, 1990), CBM was associated with teachers' more frequent revision of student programs to stimulate better rates of progress.
Also corroborating previous research (e.g., Fuchs et al., 1984; Jones & Krouse, 1988), the use of CBM and its attendant increase in teachers' revision of student programs produced superior student achievement. For two reading outcome measures (i.e., fluency and maze), both CBM groups outperformed the control group, with statistically significant differences and strong effect sizes. In addition, for number of questions answered correctly, the performance of both CBM groups approached statistical significance (p = .06), and both CBM groups were associated with strong effect sizes: .52 for the CBM-ES group and .39 for the CBM-NES group. Consequently, both CBM groups achieved more than did a comparable control sample over a 17-week CBM implementation on important measures of reading proficiency. Teachers appeared able to use CBM (a) to determine when instructional programs were producing inadequate student growth and when adjustments to those instructional programs were necessary, and (b) to identify instructional adjustments that produced superior achievement.
Nevertheless, despite (a) the accuracy and comparability of teachers' CBM implementation, (b) the apparent responsiveness of both CBM groups to students in the form of frequent program revision, and (c) better achievement on fluency, maze, and (perhaps) question-answering measures, teachers in the two CBM groups did plan differently for their students, as evidenced on their Instructional Plan Sheets. The CBM-ES teachers, who received systematic "expert" system consultation about how to revise student programs, included more fluency, decoding, and sight vocabulary activities in their program adjustments; these teachers also tended to employ more comprehension activities. On the other hand, the CBM-NES teachers, who received no advice about their instructional adjustments, tended to incorporate more activities that resembled the CBM monitoring measure, that is, cloze activities similar to the maze measurement task. Along with these differences in the substantive nature of instructional revisions, the CBM-ES group outperformed CBM-NES students on one key measure: the CRAB summaries of the stories they retold.
As indicated in related studies of teacher planning, these findings suggest that, on their own, teachers may experience difficulty in determining strategies for modifying their programs (Tindal et al., 1981) or revising well-rehearsed, smooth-running instructional routines (Clark & Peterson, 1984). Without the benefit of consultation, teachers may hug the measurement system closely to formulate program adjustments (e.g., Fuchs, Fuchs, Hamlett, & Stecker, in press). Moreover, in this study, differences in instructional activities were associated with differential outcomes on the CRAB recall task, a frequently used measure of reading comprehension (Fuchs & Maxwell, 1988) and a measure that resembles some of the story grammar instructional activities recommended by the expert system and employed by the CBM-ES teachers. Consequently, results suggest the possibilities that (a) with consultation, teachers adjust programs in more diverse ways, and (b) instructional diversity may enhance a more diverse set of skills.
A final note of interest is that in contrast to previous CBM studies, an alternative reading monitoring measure was employed in this CBM implementation. Instead of the standard CBM oral reading measure, we incorporated a maze task that requires students to read passages and choose replacements for deleted words. This monitoring task demonstrates adequate psychometric features and, in many ways, appears to mirror the standard oral reading CBM task (see Fuchs & Fuchs, in press, for discussion). In addition, current findings indicate that, as with oral reading, teachers can use the CBM maze to monitor reading improvement and to revise programs in effective ways that support differential reading achievement. Based on these findings, additional research exploring use of the maze appears warranted, because (a) computers can be used to automatically collect maze performance data, thus improving the feasibility of CBM, and (b) the face validity for the maze, as an overall indicator of reading proficiency, may be greater than for oral reading.
TABULAR DATA OMITTED
REFERENCES
Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37, 122-147.
Brown, A. L., & Smiley, S. S. (1977). Rating the importance of structural units of prose passages: A problem of metacognitive development. Child Development, 48, 1-8.
Casey, A., Deno, S. L., Marston, D., & Skiba, R. (1988). Experimental teaching: Changing teacher beliefs about effective instructional practices. Teacher Education and Special Education, 1, 123-132.
Christenson, S. L., Ysseldyke, J. E., & Thurlow, M. L. (1989). Critical instructional factors for students with mild handicaps: An integrative review. Remedial and Special Education, 10(5), 21-31.
Clark, C. M., & Peterson, P. L. (1984). Teachers' thought processes (Occasional Paper No. 72). East Lansing, MI: Michigan State University, Institute for Research on Teaching. (ERIC Document Reproduction Service No. ED 251 449)
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.
Deno, S. L. (1986). Formative evaluation of individual student programs: A new role for school psychologists. School Psychology Review, 15, 358-382.
Espin, C., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Unpublished manuscript.
Fry, E. (1968). A readability formula that saves time. Journal of Reading, 11, 513-516, 575-578.
Fuchs, L. S. (1988). Reading modified accuracy of implementation rating scale-revised. Unpublished manuscript (available from author).
Fuchs, L. S., Deno, S. L., & Marston, D. (1983). Improving the reliability of curriculum-based measures of academic skills for psychoeducational decision making. Diagnostique, 6, 135-149.
Fuchs, L. S., Deno, S. L., & Mirkin, P. K. (1984). Effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21, 449-460.
Fuchs, L. S., & Fuchs, D. (in press). Identifying a measure for monitoring student reading progress. School Psychology Review.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation on student achievement. Exceptional Children, 53, 199-208.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989a). Effects of alternative goal structures within curriculum-based measurement. Exceptional Children, 55, 429-438.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989b). Effects of instrumental use of curriculum-based measurement to enhance instructional programs. Remedial and Special Education, 10(2), 43-52.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989c). Monitoring reading growth using student recalls: Effects of two teacher feedback systems. Journal of Educational Research, 83, 103-111.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Allinder, R. M. (1991). The contribution of skills analysis within curriculum-based measurement in spelling. Exceptional Children, 57, 443-452.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1990). The role of skills analysis in curriculum-based measurement in math. School Psychology Review, 19, 6-22.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (in press). Effects of curriculum-based measurement and consultation on teacher planning and student achievement in mathematics computation. American Educational Research Journal, 28, 617-641.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9(2), 20-29.
Fuchs, L. S., & Maxwell, L. (1988). Interactive effects of reading mode, production format, and structural importance of text among learning disabled pupils. Learning Disability Quarterly, 11, 97-105.
Gersten, R., Carnine, D., & Woodward, J. (1987). Direct instruction research: The third decade. Remedial and Special Education, 8(6), 48-56.
Gibson, S., & Dembo, M. H. (1984). Teacher efficacy: A construct validation. Journal of Educational Psychology, 76, 569-582.
Hamlett, C. L., & Fuchs, L. S. (1988). Scoring student recalls. [Unpublished computer program.]
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Jenkins, J. R., Heliotis, J., Haynes, M., & Beck, K. (1986). Does passive learning account for disabled readers' comprehension deficits in ordinary reading situations? Learning Disability Quarterly, 9, 69-75.
Jenkins, J. R., & Jewell, M. (1990). Examining the validity of two measures for formative teaching: Reading aloud and maze. Manuscript submitted for publication.
Jones, E. D., & Krouse, J. P. (1988). The effectiveness of data-based instruction by student teachers in classrooms for pupils with mild handicaps. Teacher Education and Special Education, 11(1), 9-19.
Mirkin, P. K., & Potter, M. L. (1982). A survey of program planning and implementation practices of LD teachers (Research Report No. 80). Minneapolis: University of Minnesota, Institute for Research on Learning Disabilities.
Office of the Federal Register. (1986). Code of federal regulations, Parts 300-399. Washington, DC: Author.
Parry, J. D., & Hofmeister, A. M. (1986). Development and validation of an expert system for special educators. Learning Disability Quarterly, 9, 124-132.
Reisberg, L., & Wolf, R. (1988). Instructional strategies for special education consultants. Remedial and Special Education, 9(6), 29-40.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.
Stecker, P. M., Whinnery, K., & Fuchs, L. S. (1988). Curriculum-based measurement graph test. Unpublished instrument.
Thompson, R. H., White, K. R., & Morgan, D. P. (1982). Teacher-student interaction patterns in classrooms with mainstreamed mildly handicapped students. American Educational Research Journal, 19, 220-236.
Tindal, G., Fuchs, L. S., Christenson, S., Mirkin, P. K., & Deno, S. L. (1981). The effect of IEP monitoring strategies on teacher behavior (Research Report No. 61). Minneapolis, MN: University of Minnesota, Institute for Research on Learning Disabilities. (ERIC Document Reproduction Service No. ED 218 846)
Wesson, C., & Deno, S. L. (1989). An analysis of long-term instructional plans in reading for elementary resource room students. Remedial and Special Education, 10(1), 21-28.
Wesson, C., Deno, S. L., Mirkin, P. K., Maruyama, G., Skiba, R., King, R. P., & Sevcik, B. (1988). A causal analysis of the relationships among ongoing measurement and evaluation, structure of instruction, and student achievement. The Journal of Special Education, 22, 330-344.
White, O. R., & Haring, N. G. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Merrill.
Will, M. (1986). Educating children with learning problems: A shared responsibility. Exceptional Children, 52, 411-415.
Zigmond, N., & Miller, S. E. (1986). Assessment for instructional planning. Exceptional Children, 52, 501-509.
ABOUT THE AUTHORS
LYNN S. FUCHS (CEC Chapter #185) is an Associate Professor, DOUGLAS FUCHS (CEC Chapter #185) is a Professor, CAROL L. HAMLETT (CEC TN Federation) is a Research Associate, and CARL FERGUSON (CEC Chapter #639) is a Doctoral Student in the Department of Special Education at Peabody College of Vanderbilt University, Nashville, Tennessee.
The research reported in this article was supported by Grant #G008730087-88 from the Office of Special Education Programs, U.S. Department of Education, to Vanderbilt University. Points of view or opinions stated in this report do not necessarily represent official agency positions.
Requests for reprints should be sent to Lynn S. Fuchs, Box 328, Peabody College, Vanderbilt University, Nashville, TN 37203.
Manuscript received May 1990; revision accepted December 1990.
Exceptional Children, Vol. 58, No. 5, pp. 436-450. © 1992 The Council for Exceptional Children.