Progress monitoring with objective measures of writing performance for students with mild disabilities.
In the current reappraisal of writing in schools (Scardamalia & Bereiter, 1986; Stewart, 1985), emphasis is given to the need both for effective instruction and for adequate tests of writing proficiency, since remediation of writing deficiencies implies accurate assessment (Isaacson, 1985). Writing assessment in special education should provide both evaluation and formative adjustment of instruction through progress monitoring (Moran, 1987). Therefore, writing tests are needed that are sensitive to small increments of skill growth across short and medium periods of time (Tindal, 1989)--placing demands for high technical adequacy on the tests.
Direct writing assessment methods--which examine actual student writing samples--have received considerable attention (Stiggins, 1982), because they are considered to have high face and content validity (Charney, 1984; Moran, 1987; Stiggins, 1982). In this study we investigated the technical adequacy of a direct assessment methodology for progress monitoring.
The two primary methods for directly scoring writing samples are "holistic," in which subjective judgments are used to rank or rate papers, and "atomistic," in which discrete, countable components of the written product are considered (Isaacson, 1985). The holistic evaluations carried out in this study resulted in a single, global judgment of writing quality (Conlan, 1978; Spandel, 1981), while the atomistic indexes, including "number of correct word sequences" and "number of correctly spelled words," yielded counts, averages, and proportions or percents (Deno, Marston, & Mirkin, 1982; Hunt, 1965). While direct holistic evaluations may be more suited to a "writing process" instructional approach (Lynch & Jones, 1989), the atomistic indexes are suited to an instructional approach based on building mastery of subskills.
TECHNICAL ADEQUACY OF HOLISTIC JUDGMENTS
With prior training and the use of "anchor" papers (Charney, 1984; McColly, 1970), it is possible to attain moderate to strong intrascorer and interscorer reliability levels (r = .75-.85) with holistic judgments (Mishler & Hogan, 1982; White, 1984). Furthermore, these judgments are often moderately related to criterion measures such as standardized writing tests (Veal & Hudson, 1983), handwriting or neatness (McColly, 1970), spelling errors, and length of writing sample (Nold & Freedman, 1977).
Holistic judgment scoring methods may be adequate for eligibility or program entry/exit decisions (Spandel & Stiggins, 1980), but not for planning individualized programs, since they are not referenced to any specific instructional features (Moran, 1987; Spandel & Stiggins, 1980). Such judgments also may lack appropriate scale properties for use in frequent, repeated assessments. Because of their undeniable face validity (communication to a reader) and wide use in schools, however, holistic judgments are included in the present study as the chief criterion against which atomistic scoring methods are compared.
TECHNICAL ADEQUACY OF ATOMISTIC INDEXES
Several different atomistic scoring methods have been investigated and found to be internally consistent (r = .85) and to provide acceptable agreement between judges (r = .95). Retest stability, which is critically important for progress monitoring, is lower (range .48-.71) (Marston & Deno, 1981; Tindal & Parker, 1989).
The criterion-related validity of more complex indexes (e.g., "T-units" and "correct word sequences") is supported through moderate-size relationships with holistic ratings (Isaacson, 1985; Tindal & Parker, 1989). With regular education students, simpler indexes, such as "number of words written" and "number of spelling errors," have also produced moderate correlations (Grobe, 1981; Moss, Cole, & Khampalikit, 1982) with holistic ratings and with published writing tests such as the Test of Written Language (TOWL) (Hammill & Larsen, 1983). These relationships have not been found consistently in writing samples from special education populations, however (Tindal & Parker, 1989).
Despite their lack of face validity (Charney, 1984; McColly, 1970) several atomistic indexes appear technically adequate for both program eligibility and program planning decisions, if the subskills they represent are directly teachable (Gorell, 1983; Simmons, 1984). Progress monitoring has also been recommended with atomistic indexes (Deno et al., 1982), but empirical support is lacking. Although the indexes "seem ideal for use in routine, systematic formative evaluation," low stability estimates (r = .60) (Deno et al., 1982) indicate caution.
The technical adequacy of a test should be established under the types of student performance, task conditions, and time frames that mirror the real-world context of its use (Cronbach, 1971). For progress-monitoring purposes, analytic methods should be sensitive to the time series nature of repeated measurement. To establish the reliability of a test for this purpose, the conservative coefficient of stability and equivalence is the most appropriate reliability estimate, although it commonly yields lower estimates than do parallel forms or internal consistency analyses (Sabers, Feldt, & Reschly, 1988). An appropriate analytic method for establishing the validity of progress-monitoring tests is profile analysis (Bock, 1975) conducted on series of scores obtained over time.
Although atomistic indexes of writing quality appear technically adequate for eligibility and program-planning decisions, little evidence supports their use for progress monitoring with students in special education. The purpose of this study was to investigate the use of seven different atomistic writing indexes for progress monitoring with middle school (Grades 6-8) students with mild learning disabilities over a 6-month period. The main concern was to obtain reliability and validity estimates under conditions that mirror those of real-world classroom use, and to use analytic procedures that are sensitive to performance changes over time.
Two middle schools (Grades 6-8) with enrollments of 475 and 650, respectively, were selected for the study from a suburban, west coast, lower-middle socioeconomic status (SES) school district of 9,000 students. From these schools we selected all 54 students (41 males and 13 females) enrolled in the six language arts resource classes. All students were identified as learning disabled according to the Oregon Department of Education discrepancy formula and were on active IEPs for language arts (spelling, written expression) and/or reading. Seven of the students additionally qualified as educable mentally retarded under state guidelines. Ages ranged from 11 years 4 months to 15 years 8 months (Md = 12 years 7 months). No racial minorities were represented in the student sample.
Extant individual intelligence test scores on the Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974) and Woodcock-Johnson Psycho-Educational Battery (Woodcock & Johnson, 1977), available for 32 students, ranged from 63 to 91 (Md = 73). Extant reading achievement test scores on the Woodcock-Johnson and the Woodcock Reading Mastery Test (Woodcock, 1973) ranged from mid-first to high-third grade levels (Md = Grade 2.5). Median percentiles for California Achievement Tests (CATs), Form E (CTB/McGraw Hill, 1985) subtests were: Vocabulary, 8th; Total Reading, 7th; Spelling, 6th; Language Mechanics, 10th; Language Expression, 12th; Total Language, 10th. For all CAT subtests, percentile ranks ranged from the 3rd to the 29th. School withdrawals and high absenteeism during the 6 months of this study reduced the initial group of 54 students to only 36, for whom results are reported.
Four classrooms used a synthetic phonics approach to remedial teaching in spelling and reading. The instructional program in the other two classrooms was eclectic, including high-interest readers, workbooks, and both teacher-made and published handouts. Writing instruction was not a priority in any of the six classrooms, occupying less than 5% of class time, an estimate based on 48 hours of "momentary time sample" and "event record" observations per classroom (Parker & Tindal, 1988).
Seven objective, direct-scoring indexes were applied to four sets of writing samples. Holistic ratings of these writing samples were the primary validation criterion; the TOWL served as a secondary criterion.
Writing Samples. Writing samples were obtained from students at four points during the school year: October, January, February, and April. Students were read a story starter and asked to write on that topic for 6 minutes (min). The four story starters were: (a) "It was the night before Halloween, and the students planned to ..."; (b) "Christmas was coming ..."; (c) "If I won the lottery ..."; (d) "My favorite sport is ...." Previous research indicated the comparability (r = .79-.87) of writing samples drawn from various story starters (Marston & Deno, 1981). After 3 min of writing, students were directed to draw a small star, which allowed the writing samples to be scored in two 3-min sections, for calculation of split-half reliability.
Objective Scoring. All writing samples were objectively scored by undergraduate students trained in the seven atomistic scoring methods. The methods are described here in a rough, logical skill sequence, with a brief rationale for each in parentheses. More details are available in the training manual (Hasbrouck, 1989).
1. Tot.Wd: total number of words written, regardless of spelling or handwriting legibility. (First, students must put letter groupings on paper.)
2. Leg.Wd: number of letter groupings recognizable as real English words. (Second, letters must be identifiable and letter groupings must approximate known words.) Leg.Wd. was scored independent of context clues by scanning each writing sample from the end to the beginning, looking through a cardboard mask that revealed only one word at a time.
3. CSWd: number of correctly spelled words. (Third, words must be spelled correctly to improve readability.)
4. CWSeq: number of adjacent, correctly spelled word pairs that make sense together, given the context of the sentence (Videen, Deno, & Marston, 1982). (Fourth, correctly spelled words must be sequenced in a sensible manner.)
5. ML/CWSeq: average length of all continuous strings of CWSeq. (Fifth, the strings of correctly spelled and sequenced words should increase in length.) ML/CWSeq was devised as an alternative to Hunt's (1977) T-units, because dependent and independent clauses were often not distinguishable, as T-unit computation requires.
6. %Leg.Wd: proportion of the total words written that are legible, that is, the ratio, Leg.Wd/Tot.Wd. (Throughout this skill-development sequence, the proportion of words that are legible should increase toward 100%.)
7. %CSWd: proportion of words written that are correctly spelled, that is, the ratio, CSWd/Tot.Wd. (The proportion of words that are correctly spelled should also increase toward 100%.)
Figure 1 shows the five primary scores (Tot.Wd, CSWd, Leg.Wd, CWSeq, ML/CWSeq) and two derived scores (%CSWd, %Leg.Wd) applied to two typical writing samples. Tot.Wd, a simple count of the words in the sample, is not marked in the figure. All CSWd are circled. A caret (^) is placed above and between each correct word sequence. Words identified as illegible are underlined with a wavy line; Leg.Wds are the remaining unmarked words. Parentheses mark each unbroken string of CWSeq. ML/CWSeq is then computed as the average of the number of carets in these strings.
Both "production" and "production-independent" indexes (Rafoth & Rubin, 1984) were included in the study. Tot.Wd, CSWd, CWSeq, and Leg.Wd are considered production measures, because they depend in part on the length of the writing sample, unlike the production-independent indexes, %CSWd, %Leg.Wd, and ML/CWSeq. Three of the production indexes (CWSeq, Tot.Wd, and CSWd) were researched at the University of Minnesota Institute for Research in Learning Disabilities (IRLD). The remaining four indexes have not, to our knowledge, been previously researched.
Interrater reliabilities were computed for all seven indexes from the scoring of 32 writing samples by two graduate students. All coefficients were significant at the p < .001 level: Tot.Wd: .99; CSWd: .98; %CSWd: .89; Leg.Wd: .95; %Leg.Wd: .92; CWSeq: .87; ML/CWSeq: .83. Split-half reliabilities for the seven indexes were calculated by separately scoring the first and last 3 min of the 32 writing samples (separated by a student-produced star) and correlating the two sets of scores. Reliability coefficients were all significant at the p < .001 level: Tot.Wd: .77; CSWd: .78; %CSWd: .77; Leg.Wd: .81; %Leg.Wd: .79; CWSeq: .75; ML/CWSeq: .69. Stability over time was assayed by correlating the score for each index across adjacent assessment periods. The three stability estimates for each index are presented in order, with medians in italic: Tot.Wd: .69, .83, .82; CSWd: .68, .79, .75; %CSWd: .45, .75, .46; Leg.Wd: .69, .82, .83; %Leg.Wd: .41, .76, .17; CWSeq: .49, .77, .65; ML/CWSeq: .59, .66, .26.
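The stability estimates described above are Pearson correlations between scores from adjacent assessment periods. The following Python sketch shows one way such estimates could be computed and how the median of each reported triple follows from the coefficients listed in the text. The function names `pearson_r` and `adjacent_stability` are illustrative assumptions, not part of the study's procedures; only the coefficients in `REPORTED_STABILITY` are taken from the study.

```python
from math import sqrt
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def adjacent_stability(periods: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each assessment period with the next one (same students, same order)."""
    return [pearson_r(periods[i], periods[i + 1]) for i in range(len(periods) - 1)]


# Stability coefficients as reported in the text, in order of adjacent periods.
REPORTED_STABILITY: Dict[str, Tuple[float, float, float]] = {
    "Tot.Wd": (0.69, 0.83, 0.82),
    "CSWd": (0.68, 0.79, 0.75),
    "%CSWd": (0.45, 0.75, 0.46),
    "Leg.Wd": (0.69, 0.82, 0.83),
    "%Leg.Wd": (0.41, 0.76, 0.17),
    "CWSeq": (0.49, 0.77, 0.65),
    "ML/CWSeq": (0.59, 0.66, 0.26),
}

median_stability = {name: median(vals) for name, vals in REPORTED_STABILITY.items()}
```

With four assessment periods, `adjacent_stability` returns three coefficients per index, matching the three estimates reported for each index.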
Two criterion measures were used: holistic judgments of the "communicative effectiveness" of the writing samples, and the first edition of the TOWL (Hammill & Larsen, 1983).
Holistic Rating. Holistic ratings of the writing samples were accomplished by four practicing special education teachers from upper intermediate and middle school grade levels. The 4 raters were selected from a larger group of 10 volunteer teachers on the basis of their high interrater agreement.
To encourage consistent holistic judgments, the four raters were asked to adhere to a definition of "good writing" that emphasized communicative effectiveness (Hasbrouck, 1989). With this guidance, the raters scored a set of 36 writing samples from "poor" to "very good," on a 1-7 scale (with no intermediate descriptors provided). Interrater reliabilities were computed among the four raters, producing a median r = .80 (range .74-.87) (all with p < .001). Interrater agreement was computed again for each new set of writing samples. The resulting mean Pearson r coefficients were as follows: October .74, January .72, February .81, April .75, all with p < .001--all within the .70-.80 range recommended for the "early stages of research on predictor tests" for groups (Nunnally, 1978, p. 245). Stability of holistic ratings across the four testing periods was estimated in the same manner as for the objective indexes. The Pearson correlations for 36 holistic ratings between the four adjacent assessment periods (varying from 1 to 3 months apart) were: .75, .58, .77 (all with p < .001).
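Interrater agreement among four raters involves six rater pairs, each yielding one correlation. The sketch below illustrates how a median and range of pairwise coefficients could be obtained. The ratings are invented for illustration on the 1-7 scale and are not data from this study, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two raters' scores for the same samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_agreement(
    ratings: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of raters; four raters give six coefficients."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    }


# Invented 1-7 holistic ratings of eight samples (illustration only).
hypothetical_ratings = {
    "rater_1": [2, 4, 3, 6, 5, 1, 7, 4],
    "rater_2": [3, 4, 2, 6, 5, 2, 6, 4],
    "rater_3": [2, 5, 3, 5, 6, 1, 7, 3],
    "rater_4": [1, 4, 4, 7, 5, 2, 6, 5],
}

agreement = pairwise_agreement(hypothetical_ratings)
summary = {
    "median": median(agreement.values()),
    "low": min(agreement.values()),
    "high": max(agreement.values()),
}
```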
TOWL. The TOWL (Hammill & Larsen, 1983) was administered in May, and results were correlated with scores from April's writing samples. A Total Quotient and six subtest scores were obtained: Vocabulary, Thematic Maturity, Spelling, Word Use, Style, and Handwriting. A more recent revision of the TOWL (1988), unavailable during the study, includes additional atomistic, countable indexes.
To determine whether the effects of attrition (from 54 to 36 students) resulted in a highly skewed or nonnormal sample, skewness and kurtosis of TOWL and atomistic scores were established on the first two sets of writing samples, using D'Agostino and Tietjen's (1973) critical value table. Of the 46 separate statistical tests conducted at a liberal .05 level, only 7 showed significant nonnormality, which justified proceeding with further analyses.
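For readers unfamiliar with the statistics involved, the sketch below computes the conventional moment-based sample skewness and kurtosis for a set of scores. It assumes the standard definitions based on central moments divided by n. The critical values against which the study compared these statistics are not reproduced here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def skewness_kurtosis(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (sqrt_b1, b2): moment-based sample skewness and kurtosis.

    sqrt_b1 = m3 / m2**1.5 and b2 = m4 / m2**2, where m_k is the k-th
    central moment computed with divisor n. For a normal distribution,
    sqrt_b1 is near 0 and b2 is near 3.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three scores")
    m = mean(scores)
    m2 = sum((v - m) ** 2 for v in scores) / n
    m3 = sum((v - m) ** 3 for v in scores) / n
    m4 = sum((v - m) ** 4 for v in scores) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant scores")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric illustration: skewness is 0; a long right tail gives a positive value.
symmetric = skewness_kurtosis([1, 2, 3, 4, 5])
right_skewed = skewness_kurtosis([1, 1, 1, 2, 10])
```

Each computed statistic would then be compared with a tabled critical value for the sample size at the chosen significance level.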
Summary statistics on the seven objective indexes and holistic ratings from four sets of writing samples are presented in Table 1.
An average of 20-30% of written words were misspelled, and 6-12% were not legible or recognizable as real words. Only "total number of words written" (Tot.Wd) and "number of legible words written" (Leg.Wd) appear to increase regularly over the 6 months. For the other five objective indexes, as well as for holistic ratings, a trend of regular improvement was not apparent. The two percent indexes -- %CSWd and %Leg.Wd -- showed little variance; for the other indexes, SDs were roughly one-half to one-third the size of the group's mean score. Because no significant (p < .05) between-grade differences were found on any of the objective indexes, data from Grades 6-8 were aggregated for all further analyses.
For each of the four sets of writing samples, the seven objective scores were correlated with holistic ratings, averaged over four raters (see Table 2).
On the average, the strongest predictors of holistic ratings (p < .001) were "percent of legible words" (%Leg.Wd), "correct word sequences" (CWSeq), and "mean length of correct word sequences" (ML/CWSeq). These three measures produced moderately strong correlations averaging .63 (range .48-.75). The next most consistent predictors were "number of correctly spelled words" (CSWd) and "percent of correctly spelled words" (%CSWd), yielding average correlations around .53 (range .43-.76). Of the remaining two indexes, "number of legible words" (Leg.Wd) was a moderately strong predictor in only two sampling periods. The weakest predictor of holistic scores was "total number of words written" (Tot.Wd), producing only one significant correlation.
Validation with the TOWL
In a second test of concurrent validity, TOWL scores were correlated with the holistic ratings and the seven objective direct scoring measures, respectively (see Table 3).
Three of the six TOWL subtests -- Thematic Maturity, Spelling, and Word Use -- were significantly related to direct scoring indexes; the remaining subtests -- Vocabulary, Style, and Handwriting -- were not significantly related to any objective measure or to holistic ratings. Word Use was the strongest predictor, correlating moderately with all seven objective indexes and with holistic ratings (range .42-.65). In general, the three objective indexes that correlated most highly with TOWL subtests (%Leg.Wd, CWSeq, and ML/CWSeq) also were most strongly related to holistic ratings.
For assaying sensitivity to growth, "static" comparisons can be made among scores obtained at a single point in time; or "dynamic" analyses can be conducted on performance profiles obtained over time (Bock, 1975). Static comparisons include (a) traditional stability estimates, by correlating test results obtained at two points in time (Nunnally, 1978) and (b) repeated comparisons between the test prototype and a validation criterion measure at several occasions.
Stability estimates between adjacent assessment periods were uniform and of at least moderate size for Tot.Wd, CSWd, %CSWd, Leg.Wd, and CWSeq. Two indexes, %Leg.Wd and ML/CWSeq, showed considerable "bounce," however, with stability coefficients ranging from .17 to .76. Results from repeated comparisons between holistic judgments and objective indexes are displayed in Table 2. Correlations for three of the indexes -- %Leg.Wd, CWSeq, and ML/CWSeq -- were quite stable across the four assessment periods. Highly variable correlations were evidenced by Tot.Wd and the two other weakest predictors, %CSWd and Leg.Wd. Tot.Wd correlations ranged from .26 to .61, and those for %CSWd ranged from .27 to .76 over the four assessment periods.
For dynamic comparisons, profile analysis (Bock, 1975; Stevens, 1986) was employed to determine whether countable indexes paralleled holistic ratings over time. Profiles for each of the eight measures were first graphed by plotting standardized mean scores, with their grand means and ±.5 SD bands (see Figure 2).
The performance profiles were next tested for linear, simple curvilinear, and complex curve fits, through polynomial contrasts (after adjustments for unequal intervals) (Stevens, 1986). Hotelling-Lawley Trace, an omnibus F test, was used to detect the presence of any significant trend in the profile (see Table 4). The existence of complex curves is not of particular interest to this study, but gives us a statistical "handle" for comparing the profiles of objective indexes with that of the criterion variable. Holistic ratings showed no significant linear or curvilinear trend over the four assessment periods, partly due to the variability among individual scores. The profile of ML/CWSeq shows no significant linear or curvilinear trend either, and visually appears to be the most parallel with holistic ratings. In contrast, significant linear trends were exhibited by Tot.Wd, CSWd, Leg.Wd, %Leg.Wd, and CWSeq; that is, steady improvement in performance was noted in these indexes over the four assessment periods.
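Polynomial contrasts for unequally spaced occasions can be derived by orthogonalizing the powers of the time variable. The Python sketch below illustrates that adjustment only. It does not reproduce the multivariate test of significance, and the spacing of 0, 3, 4, and 6 months for the October, January, February, and April samples is an assumption, since exact testing dates are not reported. The function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonal_polynomial_contrasts(times: Sequence[float]) -> List[List[float]]:
    """Gram-Schmidt orthogonalization of 1, t, t**2, ... at the given time points.

    Returns unit-length contrasts in order: linear, quadratic, cubic, and so on.
    Each contrast sums to zero and is orthogonal to the others, even when
    the time points are unequally spaced.
    """
    if len(set(times)) != len(times):
        raise ValueError("time points must be distinct")
    basis: List[List[float]] = []
    for degree in range(len(times)):
        v = [float(t) ** degree for t in times]
        # Remove the components already explained by lower-order terms.
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    contrasts = []
    for v in basis[1:]:  # drop the constant term
        norm = sqrt(dot(v, v))
        contrasts.append([vi / norm for vi in v])
    return contrasts


# Assumed spacing in months: October, January, February, April.
ASSUMED_MONTHS = (0, 3, 4, 6)
linear, quadratic, cubic = orthogonal_polynomial_contrasts(ASSUMED_MONTHS)

# Hypothetical period means that rise in a straight line over time.
straight_line_means = [2.0 + 0.5 * t for t in ASSUMED_MONTHS]
trend_estimates = {
    "linear": dot(linear, straight_line_means),
    "quadratic": dot(quadratic, straight_line_means),
    "cubic": dot(cubic, straight_line_means),
}
```

For means that lie on a straight line, only the linear estimate differs from zero, which is the sense in which a profile is said to show a purely linear trend.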
Special and remedial education teachers need writing tests for progress monitoring to formatively evaluate programs and adjust instruction (Moran, 1987). This study focused on the technical adequacy of seven direct, objective writing indexes in serving that purpose. Technical adequacy was assessed through (a) stability estimates, (b) repeated comparisons with one validation criterion--holistic ratings, and (c) dynamic comparisons of profiles obtained over time. A second criterion variable, the TOWL, was also used in a static validation test with a single set of writing samples.
On the basis of both direct assessment and informal judgments of the research team, students appeared not to improve in writing over the 6-month study. This finding may be attributable to the very small amount of active writing instruction noted in the language arts resource rooms, a fact also noted by other researchers (Leinhardt et al., 1980). The observed lack of progress within a 6-month period is consistent with the low writing levels displayed by students after 6 to 8 years of public schooling--much of it with the support of special instruction. The observation of no progress is also consistent with the lack of performance differences noted earlier (ANOVA results) among students in Grades 6, 7, and 8 on the TOWL and objective indexes.
Against this backdrop of apparent lack of progress within the 6-month period, the trend analysis results are especially interesting. Pronounced linear growth was noted in three indexes, "total number of words written" (Tot.Wd), "number of legible words written" (Leg.Wd), and "number of correctly spelled words" (CSWd). The first two of these indexes, however, yielded some of the lowest correlations in static comparisons with holistic ratings and the TOWL. Although the linear trends of these three objective indexes and their strong stability estimates superficially suggest "sensitivity to growth," that conclusion is not corroborated by the two criterion measures or by informal observations by the research team. From the eight performance profiles, it appears that although students increased in writing speed or productivity (as measured by reliable and stable metrics), their communicative effectiveness did not improve. This conclusion, however, is based on the single criterion of holistic judgment with its rather narrow ordinal scale and its unknown sensitivity to skill improvement.
The objective indexes that correlated most highly with the TOWL and holistic ratings were as follows: "percent of legible words" (%Leg.Wd), "mean length of correct word sequences" (ML/CWSeq), and "number of correct word sequences" (CWSeq). The size of these correlations varied considerably over time, however. The ML/CWSeq correlations ranged from .48 to .75, and %Leg.Wd ranged from .53 to .72. This variability might be anticipated from the "bounce" in stability estimates for all three indexes, especially ML/CWSeq and %Leg.Wd. Therefore, these three indexes may lack the stability needed for making decisions from progress monitoring for writing-deficient middle school students.
Considerable variability in holistic criterion correlations by Tot.Wd (range .26 to .61) and %CSWd (range .27 to .76) was also noted. This variability is likely due not to random measurement error, but to systematic error (unexplained variance) among the different assessment occasions. As Phelps-Gunn and Phelps-Terasaki (1982) have noted, variability in writing performance from one day to the next may result from a range of factors, including fatigue, anxiety, emotional state, or interest. Sources of systematic error could also include the content of recent writing lessons, peculiarities of each "story starter," or evolving ideas by students of "what is wanted in the test."
The three objective indexes that consistently produced moderate-size correlations over several static comparisons (%Leg.Wd, ML/CWSeq, CWSeq) and with two criterion measures (holistic ratings and the TOWL) did not demonstrate sufficient stability over time, or perform well in dynamic profile analysis. Among the three profiles, only ML/CWSeq was reasonably parallel with holistic ratings.
The seven atomistic indexes were originally described in a loose developmental sequence. The implication is that certain indexes may be differentially appropriate for students with more or less deficient writing. The present sample size does not allow the analyses required to confirm this possibility, although scrutiny of the data suggests that it may be a fruitful course for future research.
This study has four main limitations. First, the sample is marginal in size, and restricted geographically, by community SES, and by grade level. Second, a single 6-min writing sample was gathered at each assessment period; two or three would be preferable. Six minutes may appear insufficient for development of an idea that can be communicated effectively and holistically evaluated. However, across all assessment periods 90% of the students finished writing before the allotted time. A third limitation is that little dedicated writing instruction occurred during the course of the study. This fact could limit the generalizability of findings to schools with more active instructional programs. However, a lack of active writing instruction may be more common than not (Leinhardt et al., 1980). A fourth limitation is the scope of only 6 months of instruction. Whereas 6 months is a reasonable time period for assessing growth in regular education, a full year or more would have been preferable with these students. This design required the same students at each assessment period, however, and the large dropout rate precluded a more lengthy study.
Three implications can be drawn from this study for the assessment of writing in special and remedial education, while acknowledging the limitations described earlier. First, three of the seven indexes show promise for use in screening and eligibility decisions with very deficient writers in middle school settings: "number of correct word sequences," "mean length of correct word sequences," and "percent of legible words written." As noted earlier, these indexes appear more congruent with a "subskill mastery" approach to instruction than a "writing process" approach. For instructional programs with a "writing process" approach, holistic judgments appear to be sufficiently reliable for "static" (no judgments of improvement over time) evaluation of writing samples.
Program-planning decisions require the identification of specific subskills that are relevant to instruction. The objective indexes may also be useful for that purpose, but currently there is no empirical evidence to support that application. Using holistic judgments as the criterion, none of the seven measures proved sufficiently valid or stable for measuring skill growth of writing-deficient middle school students over time. Perhaps testing less frequently and combining scores from two or three samples would improve stability of the indexes. However, greater standardization of the writing topics and greater structuring of the writing task also appear to be needed.
The second implication of these results should be skepticism toward use of "total number of words written" and "number of correctly spelled words" with writing-deficient students of middle school age, if the intent is to make improvements in overall writing quality. Although these two indexes have been researched (Marston & Deno, 1981) and are cited in the literature (Isaacson, 1985), the results from this study are not supportive. These results are consistent with a recent study of 175 writing-deficient students, in which "total number of words written" bore the weakest relationship (r = .10) with holistic ratings (Tindal & Parker, 1989). This factor analytic study offers additional insights into the present results. Eight atomistic indexes yielded two orthogonal factors: production-dependent ("number of" indexes) and production-independent (percent indexes). The production-independent factor scores were much stronger predictors of holistic judgments (r = .69) than were the production-dependent factor scores (r = .24) (Tindal & Parker, 1989).
The third implication of this study is the need to insist on a high degree of technical adequacy if assessment procedures are to be used for progress monitoring in special education. Simple ANOVAs or static comparisons with criterion measures do not provide sufficient information; those results may not be confirmed by results from designs that more closely replicate the intended classroom use of the instruments. Given the importance of holistic judgments to these findings, additional research is needed on the sensitivity of holistic judgments and on the stability in judgments made under different judgment criteria. In summary, further research is needed before the three most promising indexes, "number of correct word sequences," "mean length of correct word sequences," and "percent of legible words written," can be recommended for use in progress monitoring with writing-deficient middle school students.
Barenbaum, E., Newcomer, P., & Nodine, B. (1987). Children's ability to write stories as a function of variation in task, age, and developmental level. Learning Disability Quarterly, 10, 175-188.
Bock, R. D. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill.
Bridge, C. A., & Hiebert, E. H. (1985). A comparison of classroom writing practices, teachers' perceptions of their writing instruction, and textbook recommendations on writing practices. Elementary School Journal, 86, 155-172.
Charney, D. (1984). The validity of using holistic scoring to evaluate writing: A critical overview. Research in the Teaching of English, 18, 65-81.
Conlan, G. (1978). How the essay in the College Board English Composition Test is scored. An introduction to the reading for readers. Princeton, NJ: Educational Testing Service.
Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 443-507). Washington, DC: American Council on Education.
CTB/McGraw Hill. (1985). California Achievement Tests, Form E. Monterey, CA: CTB/McGraw Hill.
D'Agostino, R. B., & Tietjen, G. L. (1973). Approaches to the null distribution of √b1. Biometrika, 60, 169-173.
Deno, S., Marston, D., & Mirkin, P. (1982). Valid measurement procedures for continuous evaluation of written expression. Exceptional Children, 48, 368-371.
Englert, C. S., Raphael, T. E., Fear, K. L., & Anderson, L. M. (1988). Students' metacognitive knowledge about how to write informational texts. Learning Disability Quarterly, 11, 18-46.
Gorell, R. (1983). How to make Mulligan stew: Process and product again. College Composition & Communication, 34, 272-277.
Grobe, C. (1981). Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 15, 75-85.
Hammill, D., & Larsen, S. (1983). Test of written language. Austin, TX: Pro-Ed.
Hammill, D., & Larsen, S. (1988). Test of written language-revised. Austin, TX: Pro-Ed.
Hasbrouck, J. (1989). Training manual for direct, objective scoring of writing samples. (Resource Consultant Training Program Module No. 2). Eugene: University of Oregon.
Hunt, K. W. (1965). Grammatical structures written at 3 grade levels. (Research Report No. 3). Champaign, IL: National Council of Teachers of English.
Hunt, K. W. (1977). Early blooming and late blooming syntactic structures. In C. Cooper & L. Odell (Eds.), Evaluating writing (pp. 91-104). Buffalo: National Council of Teachers of English, State University of New York at Buffalo.
Isaacson, S. (1985). Assessing written language skills. In C. S. Simon (Ed.), Communication skills and classroom success: Assessment methodologies for language-learning disabled students (pp. 403-424). San Diego: College-Hill Press.
Leinhardt, G., Zigmond, N., & Cooley, W. W. (1980, April). Reading instruction and its effects. Paper presented at the annual meeting of the American Educational Research Association, Boston.
Lynch, E. M., & Jones, S. D. (1989). Process and product: A review of the research on LD children's writing skills. Learning Disability Quarterly, 12, 74-86.
Marston, D., & Deno, S. (1981). The reliability of simple, direct measures of written expression. (Research Report No. 50). Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.
McColly, W. (1970). What does educational research say about the judging of writing ability? Journal of Educational Research, 64, 147-156.
Mishler, C., & Hogan, T. (1982). Holistic scoring of essays. Diagnostique, 8, 4-16.
Moran, M. R. (1987). Options for written language assessment. Focus on Exceptional Children, 19(5), 1-10.
Moss, P. A., Cole, N. S., & Khampalikit, C. (1982). A comparison of procedures to assess written language skills at grades 4, 7, and 10. Journal of Educational Measurement, 19, 37-47.
Nodine, B. F., Barenbaum, E., & Newcomer, P. (1985). Story composition by learning disabled, reading disabled, and normal children. Learning Disability Quarterly, 8, 167-179.
Nold, E., & Freedman, S. (1977). An analysis of readers' response to essays. Research in the Teaching of English, 11, 164-174.
Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill.
Parker, R., & Tindal, G. (1988). A curriculum evaluation strategy to guide district-level basal text adoption decision-making. Special Services in the Schools, 5(1/2), 33-66.
Phelps-Gunn, T., & Phelps-Terasaki, D. (1982). Written language instruction: Theory and remediation. Rockville, MD: Aspen Systems.
Rafoth, B. S., & Rubin, D. L. (1984). The impact of content and mechanics on judgments of writing quality. Written Communication, 1(4), 446-458.
Roit, M., & McKenzie, R. (1985). Disorders of written communication: An instructional priority for LD students. Journal of Learning Disabilities, 19, 258-260.
Sabers, D. L., Feldt, L. S., & Reschly, D. J. (1988). Appropriate and inappropriate use of estimated true scores for normative comparisons. The Journal of Special Education, 22(3), 358-366.
Scardamalia, M., & Bereiter, C. (1986). Research on written composition. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed.) (pp. 778-803). New York: Macmillan.
Schenck, S. J. (1981). The diagnostic/instructional link in individualized education programs. Journal of Special Education, 14(3), 337-345.
Simmons, J. (1984). The one-to-one method of teaching composition. College Composition & Communication, 35, 222-229.
Spandel, V. (1981). Classroom applications for writing assessment: A teacher's handbook. Portland, OR: Northwest Regional Educational Laboratory.
Spandel, V., & Stiggins, R. J. (1980). Direct measures of writing skill: Issues and applications. Portland, OR: Northwest Regional Educational Laboratory.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum.
Stewart, S. R. (1985). Development of written language proficiency: Methods for teaching text structure. In C. S. Simon (Ed.), Communication skills and classroom success: Therapy methodologies for language-learning disabled students (pp. 341-364). San Diego: College-Hill Press.
Stiggins, R. J. (1982). A comparison of direct and indirect writing assessment methods. Research in the Teaching of English, 16(2), 101-114.
Thomas, C. C., Englert, C. S., & Gregg, S. (1987). An analysis of errors and strategies in the expository writing of learning disabled students. Remedial and Special Education, 8, 21-30.
Tindal, G. (1989). Evaluating the effectiveness of educational programs at the system level using CBM. In M. Shinn (Ed.), Applications of curriculum-based measurement to the development of programs for mildly handicapped students. New York: Guilford Press.
Tindal, G., & Parker, R. (1988). Direct observation in special education classrooms: Concurrent use of two instruments and their validation. The Journal of Special Education, 21(2), 43-59.
Tindal, G., & Parker, R. (1989). Assessment of written expression for students in compensatory and special education programs. The Journal of Special Education, 23(2), 169-184.
Veal, R. L., & Hudson, S. A. (1983). Direct and indirect measures for large-scale evaluation of writing. Research in the Teaching of English, 17, 290-296.
Videen, J., Deno, S., & Marston, D. (1982). Correct word sequences: A valid indicator of proficiency in written expression. (Research Report No. 84). Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.
White, E. (1984). Holisticism. College Composition & Communication, 35, 400-409.
Woodcock, R. W. (1973). Woodcock reading mastery tests. Circle Pines, MN: American Guidance Service.
Woodcock, R. W., & Johnson, M. B. (1977). Woodcock-Johnson Psycho-Educational Battery. Allen, TX: DLM Teaching Resources.
RICHARD I. PARKER (CEC Chapter #375) is a Research Associate; GERALD TINDAL (CEC Chapter #375) is an Associate Professor; and JAN HASBROUCK (CEC Chapter #375) is an Instructor in the Mild Handicap Area, in the Teacher Education Department at the University of Oregon, Eugene.
Author: Parker, Richard I.; Tindal, Gerald; Hasbrouck, Jan
Date: Sep 1, 1991