Comparison of academic and cognitive programs for young handicapped children.
Progress in the education of handicapped children can come only from a reciprocal interaction between theoretical innovations and a careful evaluation of the effectiveness of models when they are actually implemented in programs. The experience of recent decades suggests, perhaps surprisingly, that innovations may be more easily implemented than evaluated. Many early childhood special education programs have been developed to address the cognitive limitations of young handicapped learners, but it remains difficult to draw any firm conclusions from evaluations of programs for these children. As Odom and Fewell (1983) pointed out, "The state of the art of program evaluation in early childhood special education is far from ideal."
White and Casto (1985), in a review of early intervention efficacy studies of disadvantaged, at-risk, and handicapped children, found that less than a quarter of the data (effect sizes) could be classified as coming from "good quality" studies (a rating based on design and various threats to internal validity). Many of the difficulties of interpretation stem from a lack of random assignment to treatments, with the consequent possibility of observed effects being due to history, instrumentation, statistical regression, and so forth. Other limitations to the interpretability of evaluation studies result from the use of limited sets of assessment measures (most often, IQ) and limited follow-up past the immediate end of the program. In addition, for the greatest external validity, there is a need for evaluations conducted independently of the program developers.
Because of the small number of evaluation studies available on programs for young handicapped children, educators and others have drawn on the more extensive literature on disadvantaged children. Casto and Mastropieri (1986), however, have found in their meta-analysis that such a generalization may be inappropriate. For example, they concluded that the degree of structure of program has not been shown to be associated with more positive outcomes for handicapped children, though research with disadvantaged children has found such an association, at least for short term outcomes. Conversely, program intensity and duration appear to be related to intervention effectiveness for handicapped children, though not for disadvantaged children.
The research currently available (Casto & Mastropieri, 1986; Dunst & Rheingrover, 1981; Odom & Fewell, 1983) supports a general conclusion that early intervention has significant effects on young handicapped children. It is far less conclusive, however, about the effects of contrasting program models or specific program features. Because instructional models differ greatly in early childhood special education, knowledge of relative effectiveness would be quite valuable. The purpose of this article is to report a comparative evaluation of two highly contrasting programs. Both programs represent special education models currently in wide, and growing, use. Both have theoretical foundations that enjoy strong support in both academic and educational settings. The two models, however, differ in numerous aspects of both theory and classroom practice.
The first program is Direct Instruction, developed by Engelmann and his colleagues (Becker, 1977; Becker, Engelmann & Thomas, 1975). A recent review of the essentials of this program and research on its effectiveness has been provided by Gersten, Woodward, and Darch (1986). Direct Instruction (DI) is based on extensive task analysis of academic skills. The analysis is then used as the basis of systematic and explicit teaching of academic skills such as language, reading, and mathematics, with a goal of maximizing academic learning time. The method, like the content, is "direct" in that the program is teacher directed and fast paced, utilizing highly structured presentation of material with frequent opportunities for student response and reinforcement or correction.
The second program is Mediated Learning; it is based on the work of the Israeli psychologist Reuven Feuerstein (Feuerstein, Rand, Hoffman, & Miller, 1980; Harth, 1982), as modified for the preschool level by Haywood and his associates at the Cognitive Education Project at Peabody College (Burns et al., 1983). Mediated Learning (ML) emphasizes the development and generalization of cognitive processes of input, elaboration, and output rather than specific academic content. Units are organized around such cognitive processes as comparison, classification, perspective changing, and sequencing. These processes are assumed to underlie academic learning. The role of the teacher is to interpret the environment and cognitive problems for children, assisting them in their attempts to use cognitive processes, rather than simply modeling or teaching them directly.
Feuerstein saw the problem of generalization as central to a wide variety of poor learners, including disadvantaged, learning disabled, and retarded students. In ML there is great emphasis on the generalization of processes, referred to as "bridging." For example, children are encouraged to think of other situations in which a certain kind of thinking could be used. The goal of generalization is also addressed by minimizing the use of extrinsic motivation, because Feuerstein believed that intrinsic motivation is more likely to maintain cognitive activities. Mediated Learning activities generally include an explicit statement of the meaning and purpose of the activity, in an attempt to foster a sense of personal cognitive competence. Similarly, the program attempts to promote reflection in learners. This reflection includes the inhibition of impulsive responding, as well as the promotion of the learner's ability to monitor his or her success in responding and the processes that led to that response.
DIFFERENCES BETWEEN PROGRAMS
The difference between the two programs may be viewed as one of tactics. Both programs attempt to ameliorate the academic difficulties of young handicapped children. The DI program addresses this problem "directly," by maximizing the time spent on academic material; whereas the ML program addresses the problem "indirectly," by facilitating the development of general cognitive processes that can serve as the basis of more efficient and productive academic learning later.
Because the two programs differ in their concentration on academic and cognitive variables, both domains are included in the assessment battery. In addition, both domains are viewed as inherently multifactorial, necessitating assessment across the domains of language, reading, and mathematics.
In the first year of the project, 83 pupils, 61 in the preschool program (up to age 6) and 22 in the kindergarten/primary program (age 6 to 8), served as subjects. Washington State has a noncategorical system of funding for preschoolers with handicapping conditions. To qualify, children must score 1.5 standard deviations below the mean on two or more of the following measures: gross motor, fine motor, language, cognition, or social/emotional development. Alternatively, they may score 2 standard deviations below the mean on any one of these. Students age 6 and above are categorized in conventional special education funding categories such as mentally retarded, language delayed, emotionally disabled, and so forth.
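The two-part eligibility rule described above can be written as a short check. This is only an illustrative sketch: the function name, the domain keys, and the choice to treat a score exactly at the cutoff as qualifying are assumptions, and scores are assumed to be expressed as z-scores (standard deviations from the mean).

```python
# Hypothetical helper illustrating the noncategorical eligibility rule.
# Names and the inclusive cutoffs are assumptions, not part of the study.

DOMAINS = ("gross_motor", "fine_motor", "language", "cognition", "social_emotional")


def qualifies(z_scores):
    """Return True if a child meets either eligibility criterion.

    z_scores maps a domain name to a score in standard deviation units.
    """
    values = [z_scores[d] for d in DOMAINS if d in z_scores]
    # Criterion 1: at least 1.5 SD below the mean on two or more measures.
    moderate_count = sum(1 for z in values if z <= -1.5)
    # Criterion 2: at least 2 SD below the mean on any single measure.
    severe = any(z <= -2.0 for z in values)
    return moderate_count >= 2 or severe
```

For example, a child scoring -1.6 on language and -1.5 on cognition would qualify under the first criterion, whereas a child scoring -1.6 on language alone would not.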
The pupils were referred to the Experimental Education Unit, Child Development and Mental Retardation Center, University of Washington, from two public school districts in the Seattle metropolitan area. The sample was relatively representative of the special education population in the districts, because referral is based largely on space availability in the home district rather than on pupil characteristics. Many of the students had a global developmental delay (mean McCarthy General Cognitive Index at pretest = 80.1); some had language deficits greater than their overall deficit (mean language quotient of 76.3 on the Peabody Picture Vocabulary Test, Revised). The sample was also representative ethnically of the student population in the area, being approximately 70% Caucasian, 23% Black, and the remainder being Asian or Pacific Islander, Hispanic, or Native American.
Classes and Assignment
Preschool subjects (3 years of age through 5 years, 11 months) attended class for 2 hours per day, 5 days per week for 180 school days. There were six preschool classes, three for each program, with 12 students each. One of the three classes for each program contained four normally developing students and eight handicapped children. The other classes were composed of students with handicapping conditions only. Data from the handicapped students only are reported here. Kindergarten/primary subjects attended class for 5.5 hours per day, 5 days a week, for 180 school days. There was one such class for each program, with 14 students per class. Students were randomly assigned to a DI or ML classroom by a two-step procedure, with both steps utilizing a random number generator. First, students were assigned randomly to the DI or ML program. Second, preschool students were randomly assigned to one of the three classrooms for their previously determined program. The only exception allowed was placement into morning or afternoon preschool classes on parent request; in no case was the program changed.
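The two-step assignment of preschool students can be sketched as follows. The article does not say how class sizes were kept equal, so the shuffle-and-deal approach here is an assumption, as are the function name and the seed parameter; the parent-request exception for morning or afternoon classes is not modeled.

```python
import random


def two_step_assignment(preschoolers, seed=None):
    """Return {student: (program, classroom)} for a list of preschoolers.

    Illustrative only: balancing by shuffling and dealing is an assumption.
    """
    rng = random.Random(seed)

    # Step 1: randomly split the students between the two programs.
    pool = list(preschoolers)
    rng.shuffle(pool)
    half = len(pool) // 2
    by_program = {"DI": pool[:half], "ML": pool[half:]}

    # Step 2: within each program, randomly assign to one of three classrooms.
    assignment = {}
    for program, members in by_program.items():
        rng.shuffle(members)
        for index, student in enumerate(members):
            assignment[student] = (program, index % 3 + 1)
    return assignment
```

Passing a fixed seed makes the assignment reproducible, which is useful when documenting how a randomization was carried out.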
The classroom staff included a head teacher with a Master's degree in special education, one assistant teacher, and additional staff including related services personnel, practicum students, and volunteers, resulting in a student-teacher ratio of approximately 4:1.
Fidelity of Implementation
Two of the three DI head teachers received teaching degrees from a program with specific emphasis in Direct Instruction (University of Oregon); a third received inservice training from that program. In addition, the fidelity of the DI implementation was monitored by a consultant with extensive DI training experience, including authorship of materials.
The fidelity of the ML program, a much newer curriculum, was also established by means of expert consultants. Developers of the original Feuerstein-based preschool program at Vanderbilt University evaluated the fidelity of the ML program based on observations during a site visit and videotape observations, and concluded that the program was a highly appropriate implementation. Two staff members were provided inservice training at Vanderbilt. In addition, two teachers met directly with Dr. Feuerstein concerning materials developed for use in the program.
In addition to these procedures, an observation system, described in the next section, was devised to confirm differences between the two sets of classrooms.
The following measures were used for the study:
* McCarthy Scales of Children's Abilities.
* Peabody Picture Vocabulary Test-Revised (PPVT-R).
* Test of Early Language Development (TELD).
* Mean Length of Utterance (MLU). MLU is widely considered the best single measure of young children's expressive language. A 20-minute language sample was recorded for each subject, in conversation with a certified speech/language pathologist or a developmental psychologist. The recordings were made during unstructured one-to-one interaction in a small room furnished with various toys and books. A sample of 100 utterances per subject was utilized for computing MLU, except for a small number of subjects who failed to produce this many utterances. The procedures for computing MLU suggested by Miller (1981) were followed.
* Basic Language Concepts (BLC). The BLC was developed as part of the Direct Instruction program. It is primarily a test of common names, descriptive terms, and tense and plural markers. Both receptive ability and expressive ability are assessed. Unlike the other tests, it yields an error score, rather than a correct score.
* Test of Early Reading Ability (TERA).
* Test of Early Mathematics Ability (TEMA).
* Stanford Early School Achievement Test (SESAT): Kindergarten-Beginning Grade 1 Form.
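As an illustration of the Mean Length of Utterance measure listed above, the following sketch computes MLU from per-utterance morpheme counts. It assumes the morphemes have already been counted by a trained transcriber; the counting rules of Miller (1981) are not reproduced here, and the function name and the rule of using the first 100 utterances are assumptions.

```python
def mean_length_of_utterance(morpheme_counts, max_utterances=100):
    """Average number of morphemes per utterance.

    morpheme_counts: one integer per utterance, in the order spoken.
    Only the first max_utterances are used; shorter samples use all
    available utterances, as was done for subjects who produced fewer.
    """
    sample = morpheme_counts[:max_utterances]
    if not sample:
        raise ValueError("no utterances in sample")
    return sum(sample) / len(sample)
```

A child whose three utterances contain 2, 3, and 4 morphemes would have an MLU of 3.0.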
The measures listed previously were administered from October through December of 1984, as pretests, and again from May through August of 1985 as posttests. In each case a minimum of 6 months elapsed between the two administrations of each test. Assessment procedures were administered by speech and language pathologists (for the PPVT-R and TELD) and by project staff (all other measures). The latter included students in speech pathology, special education, and psychology. Because both instructional programs and the project staff were housed in the same facility, "blind" testing was not feasible. All testing staff members, however, tested approximately equal numbers of DI and ML students. Furthermore, testing staff members were not informed of specific hypotheses with respect to individual measures.
Procedural Validity Assessment
In order to validate differences between the DI and ML classrooms, an observational coding system was developed and applied during the spring of 1985. For the first year of this study, a relatively simple system was devised. A limited sample of teacher behaviors, far from exhaustive, was identified on the grounds of theoretical interest, program principles, and feasibility of real-time coding, using portable microcomputers. Only 14 categories were included (see Table 1), and the system was limited to frequencies of discrete behaviors. It was, therefore, not sensitive to differences in sequencing of behaviors or in the rhythm and pacing of classroom activity. For these reasons, observed differences should be considered as an underestimate of the actual differences between programs.
Each child was observed for a total of 40 minutes (four 10-minute sessions on separate days), and all adult utterances directed toward that child (or his or her group) were coded, using a time-sampling technique of 10 seconds observe, 10 seconds code. Three coders applied this system. Reliability estimates were based on a set of ten 10-minute sessions coded independently by two observers. Three such estimates were obtained, for the three possible observer pairs, for a total of 30 intervals, or approximately 9% of the total. Pearson correlations between the frequencies of the various behaviors had a median of .78.
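The reliability summary above, a median of Pearson correlations between observers' frequency counts, can be sketched as below. The article does not specify exactly how the correlations were organized, so computing one correlation per behavior category across jointly coded sessions is an assumption, and the function names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: undefined (division by zero) if either observer's counts are constant.
    return sxy / math.sqrt(sxx * syy)


def median_reliability(counts_by_category):
    """Median Pearson r over categories.

    counts_by_category maps a category name to a pair
    (observer_1_counts, observer_2_counts), one count per session.
    """
    correlations = [pearson_r(a, b) for a, b in counts_by_category.values()]
    return statistics.median(correlations)
```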
In the comparison of the two programs, there were significant differences for nine of the categories, all in the hypothesized direction. Only two of the categories yielded results in the nonhypothesized direction, both nonsignificantly. The behaviors may be grouped into three broad classifications, as shown in Table 1. The first is the category of teaching behaviors. Eliciting verbal imitation and eliciting unison responding were hypothesized to be more common in DI, as they were. Modeling or otherwise presenting a cognitive process or general behavioral approach (such as counting, slowing down, sequencing, or turn-taking, when these were means to an end, rather than the end itself) was expected to be higher in ML, as it was. The frequency of goal-setting and goal-questioning did not differ in the two programs, though it had been expected to. Generalization is of particular interest; it refers to explicitly calling attention to the applicability of a word, label, skill, or process to a situation different from that present. No difference was expected for generalization of content items of labels, and none was observed; but as expected, process generalization was higher in ML.
Teacher questions are particularly revealing of program style. The DI classrooms were expected to have more limited response questions (yes/no questions and what/who/where/which questions for which there is a limited set of responses specified by the context) and more labeling questions, which they did. The ML classrooms were expected to have more questions requesting a relevant process rather than the answer, and more open-ended observation questions (including reports of own thoughts and feelings). Both these predictions were confirmed; however, a third prediction was not confirmed--that ML classrooms should have more projection and generalization questions that go beyond the present situation (such as asking for consequences of actions, the future, someone else's thoughts or feelings, or alternative solutions after one has been generated).
Three types of teacher-response behaviors were identified for the third broad category. Here the predictions were generally not confirmed, except that there was more immediate correction of an incorrect answer in the DI program.
The performance of the pupils in both programs at pretest and posttest is summarized in Table 2. Means are reported for the total sample, for the preschool sample alone, and for the kindergarten sample alone. In most cases both raw scores and scaled developmental scores were used for statistical analysis. For the McCarthy, however, only the scale scores were analyzed: the GCI for an overall measure, and the scaled scores for the five subscales of Verbal, Perceptual, Memory, Quantitative, and Motor. These scores are developmental quotients with a mean of 50 and a standard deviation of 10.
Program Effectiveness Comparison
Differences in program effectiveness were assessed by the interaction terms in repeated measures analyses of variance, with Program as a between-subjects factor, and Time of Testing as a within-subjects factor. The first column of Table 2 summarizes the analyses for the total sample. For the McCarthy, the gain was greater for the ML group. This was due primarily to differences on the Verbal and Memory subscales. The subscales are quotients; and therefore children's performance can increase in an absolute sense even when the quotient decreases. The reverse pattern is observed for the Test of Early Language Development, where there is a greater gain for the DI group for both raw score and language quotient. Thus there is a differential effect of program; each program has at least one measure on which it is superior.
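With two programs and two testing times, the Program by Time interaction in a repeated measures analysis of variance is equivalent to an independent-samples t test on gain scores (posttest minus pretest), with F equal to t squared. The sketch below illustrates that equivalence; it is not the authors' analysis code, and the function name and any data passed to it are hypothetical.

```python
import math


def gain_score_interaction(pre_a, post_a, pre_b, post_b):
    """Return (t, F) for the Program x Time interaction in a 2 x 2 design.

    t is the pooled-variance independent-samples t on gain scores;
    F = t ** 2 is the interaction F with (1, n_a + n_b - 2) degrees of freedom.
    """
    gains_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gains_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gains_a), len(gains_b)
    mean_a, mean_b = sum(gains_a) / n_a, sum(gains_b) / n_b

    # Pooled within-group variance of the gain scores.
    ss_a = sum((g - mean_a) ** 2 for g in gains_a)
    ss_b = sum((g - mean_b) ** 2 for g in gains_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)

    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / standard_error
    return t, t * t
```

A positive t indicates a larger mean gain in the first group; the sign carries the direction that the F statistic alone does not.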
The second and third columns of Table 2 separate the results from the preschool and kindergarten students. For the preschool sample alone, the overall findings concerning the McCarthy MA, Verbal, and Memory scores, and the Test of Early Language Development, are replicated (not surprisingly, since they make up the great majority of the combined sample). There are, however, two new findings specific to the preschool group. The first concerns the Basic Language Concepts test, which is in fact a "near" test for DI. Because this is the one test that has an error score, improvement corresponds to a decrease. The improvement is greater for DI than for ML. Second, there is a significant difference favoring the ML group on Mean Length of Utterance. As discussed earlier, MLU is generally viewed as a good overall measure of expressive language in the preschool period, though not later. Weak trends also favor the ML group on both the TERA and the TEMA.
The dissociation among the language measures is even clearer for the preschool group than for the full sample. The McCarthy Verbal and Memory, and Mean Length of Utterance, favor the ML group, whereas the TELD and the BLC test favor the DI group. Though it is difficult to objectify or quantify the distinction, the first set of measures seems more open ended, perhaps more divergent, whereas the second set seems more specific and focused, perhaps more convergent. If this is a valid characterization, it suggests that the programs are having differential effects, even within the language domain, highly consistent with their overall philosophies.
The third column of Table 2 presents the results for the kindergarten group alone. Because the kindergarten sample is substantially smaller, statistical significance is more difficult to obtain. For this sample, the few significant differences observed consistently favor the DI group, including a marginally significant difference on the McCarthy General Cognitive Index. Most of the other comparisons also favor the DI group. The Stanford Early School Achievement Test was given only to the kindergarteners. No significant differences were observed on either the total score or subtests.
Aptitude-by-Treatment Interaction Analyses
Following Pedhazur (1982), multiple regression analyses were used to evaluate aptitude-by-treatment interactions (ATIs). To keep the sample size as large as possible (ATI analyses have extremely low power with small sample sizes, which may account for the dearth of significant ATIs reported in the literature), the combined sample was used. In each analysis, Program was entered as the first predictor variable, followed by pretest aptitude measure, followed by the interaction term. Both a global cognitive measure (McCarthy GCI) and a language measure (TELD Language Quotient) were evaluated as pretest aptitude measures. There were 24 such analyses, two each for the posttest measures of McCarthy Mental Age, General Cognitive Index, Verbal, Perceptual, Quantitative, Memory and Motor Scales, and PPVT-R, BLC, TELD, TERA, and TEMA Raw Scores. Although 18 of the 24 analyses were in the predicted direction, with steeper regression slopes for Mediated Learning than for Direct Instruction (i.e., higher functioning children scoring better on the posttest in ML, whereas lower functioning children scored better on the posttest in DI), none were significant.
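The direction of an aptitude-by-treatment interaction can be illustrated by comparing the regression slope of posttest on pretest aptitude in each program. In a regression with a dummy-coded Program variable, the aptitude measure, and their product, the interaction coefficient equals this difference in slopes. The sketch below computes only the slope difference and performs no significance test; the function names and the record layout are assumptions for illustration.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def ati_slope_difference(records):
    """Return (ML slope minus DI slope, per-program slopes).

    records: iterable of (program, pretest_aptitude, posttest_score),
    where program is "DI" or "ML". A positive difference matches the
    predicted direction (steeper slope for Mediated Learning).
    """
    records = list(records)
    slopes = {}
    for program in ("DI", "ML"):
        aptitude = [a for p, a, _ in records if p == program]
        posttest = [s for p, _, s in records if p == program]
        slopes[program] = ols_slope(aptitude, posttest)
    return slopes["ML"] - slopes["DI"], slopes
```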
Perhaps the major finding of this study is that the two programs do indeed have differential effects that are consistent with the program philosophies. These results are particularly intriguing with respect to the role of language. There are clear and striking differential effects of program on language. Far from being a unitary construct, language is multidimensional. Even an informal observation of the programs reveals that language is being used with a higher cognitive load, with more abstractness, in the Mediated Learning program, and this higher load may require more language ability than was present in the lower functioning children in the sample. It is probable that the ability to use language as a representational tool is critical. In the second year of the project, we have added the Preschool Language Assessment Instrument of Blank, Rose and Berlin (1978) to the test battery. This test was specifically designed to measure the ability to use language for cognitive purposes. It may be the most appropriate language aptitude measure to demonstrate an ATI.
The failure to obtain a significant aptitude-by-treatment interaction is consistent with a previous study (Cole & Dale, 1986) comparing two contrasting language intervention programs. Significant ATIs are difficult to obtain, in part because of the very conservative nature of the statistical procedure. In addition, it is plausible that global-developmental measures such as language and cognitive quotients are not sensitive to the specific cognitive and behavioral characteristics that influence program effectiveness for individual students. Measures of learning style reflecting affective and social characteristics as well as cognitive ones (for example, locus of control) may have more promise here.
One of the most serious limitations in our research, and other work like it, is the absence of reliable and valid means of assessing strategic thinking in young children, the cognitive processes which are the focus of Mediated Learning. Most appropriate would be methods of dynamic assessment, which examine learners' abilities to gain from graduated instruction, and which examine the specific strategies manifested during the learning process. Interestingly, Direct Instruction programs have been moving in the direction of cognitive process instruction in recent years (Gersten et al., 1986). For example, programs for somewhat older learners now focus on critical reading skills, such as the detection of faulty arguments, and the use of story grammar to promote reading comprehension of narratives. These activities are more difficult to teach than, for example, decoding skills, but they are essential for successful performance at upper elementary levels. Judging from the examples in Gersten et al., DI teaching methods used in this area may be much less different from ML than in earlier phases.
Finally, another aspect of these programs and the results must be addressed. It is clear from informal observation that behavior problems present a particular difficulty for ML. In our program and others, there is more behavioral disruption in the Mediated Learning classrooms than in the Direct Instruction classrooms. Although only a small minority (less than 10%) of the children ever experienced a time-out period in a separate classroom over the course of the year, these children were consistently from ML classrooms; time-outs occurred in three of the four ML classrooms. In the kindergarten data presented earlier, the superiority of the Direct Instruction results was particularly apparent. One interpretation is that, by kindergarten, these handicapped students are ready to take advantage of the academic content of DI. But there is another possibility. The ML kindergarten program during this year was particularly hampered by serious behavioral disruption. Such problems can indeed serve as fields for mediation, if they are relatively mild and relatively infrequent. As they become more substantial, teachers face a fundamental dilemma of how to blend more direct approaches to behavioral control with a mediated learning approach without giving up the essence of the program. It should be kept in mind that Mediated Learning is still very much a program under development, whereas Direct Instruction is well established.
To conclude, these two programs are clearly effective. Not only do all raw scores increase substantially, but many of the standard scores (for PPVT-R, TELD, and TEMA), which are uncorrelated with age, also increase, reflecting growth at a higher rate than during the period preceding entry into these programs. The pattern of test scores for the groups reflects the orientation of the programs. Both represent tactics for early education which are at least initially fruitful. Only longitudinal research can establish the long-term effects of each strategy.
REFERENCES
Becker, W. C. (1977). Teaching reading and language to the disadvantaged--What we have learned from field research. Harvard Educational Review, 47, 518-543.
Becker, W. C., Engelmann, S., & Thomas, D. R. (1975). Teaching 2: Cognitive learning and instruction. Chicago: Science Research Associates.
Blank, M., Rose, S., & Berlin, L. J. (1978). Preschool Language Assessment Instrument. New York: Grune & Stratton.
Brown, R. (1973). A first language. Cambridge, MA: Harvard University Press.
Burns, S., Haywood, C., Cox, J., Brooks, P., Green, L., Ransom, O., Goodroe, P., & Willis, E. (1983, December). Let's think about it: A cognitive curriculum for young children. Paper presented at the Handicapped Children's Early Education Programs and Division of Early Childhood Conference, Washington, DC.
Campbell, D. T., & Stanley, J. (1966). Experimental and quasi-experimental designs in research. Chicago: Rand McNally.
Casto, G., & Mastropieri, M. A. (1986). The efficacy of early intervention programs: A meta-analysis. Exceptional Children, 52, 417-424.
Cole, K. N., & Dale, P. S. (1986). Direct language instruction and interactive language instruction with language delayed preschool children: A comparison study. Journal of Speech and Hearing Research, 29, 206-217.
Dunst, C. J., & Rheingrover, R. H. (1981). An analysis of the efficacy of infant intervention programs with organically handicapped children. Evaluation and Program Planning, 4, 287-293.
Feuerstein, R., Rand, Y., Hoffman, M. B., & Miller, R. (1980). Instrumental enrichment: Redevelopment of cognitive functions of retarded performers. Baltimore: University Park Press.
Gersten, R., Woodward, J., & Darch, C. (1986). Direct Instruction: A research-based approach to curriculum design and teaching. Exceptional Children, 53, 17-31.
Harth, R. (1982). The Feuerstein perspective on the modification of cognitive performance. Focus on Exceptional Children, 15, 1-12.
Miller, J. F. (1981). Assessing language production in children. Baltimore: University Park Press.
Odom, S. M., & Fewell, R. R. (1983). Program evaluation in early childhood special education: A meta-evaluation. Educational Evaluation and Policy Analysis, 5, 445-460.
Pedhazur, E. J. (1982). Multiple regression in behavioral research. New York: Holt, Rinehart & Winston.
White, K., & Casto, G. (1985). An integrative review of early intervention efficacy studies with at-risk children: Implications for the handicapped. Analysis and Intervention in Developmental Disabilities, 5, 7-31.
PHILIP S. DALE is Associate Professor, Department of Psychology and KEVIN N. COLE is Principal, Experimental Education Unit, University of Washington, Seattle.
Title annotation: includes bibliography
Author: Dale, Philip S.; Cole, Kevin N.
Date: Feb 1, 1988