
Curriculum-based measurement in writing: predicting the success of high-school students on state standards tests.

In this study, we examine the validity and reliability of curriculum-based measures (CBMs) as indicators of performance on a state standards test in writing. More than 41 states require students to take a test in writing or require a writing component on their English/Language Arts assessments (McCombs, Kirby, Barney, Darilek, & Magee, 2005). In 20 of these states, students are required to pass the state test in order to graduate from high school (Kober, Chudowsky, Chudowsky, Gayler, & McMurrer, 2006). Yet, many high school students struggle with writing, especially those students with disabilities. Results of the 2002 National Assessment of Educational Progress indicate that 70% of 12th-grade students with disabilities score below a basic level of writing proficiency, defined as partial mastery of prerequisite knowledge and skills that are fundamental for proficient work.

Secondary teachers must work to ameliorate the writing difficulties of students with disabilities. To do so, they must have at their disposal writing interventions with a strong evidence base; yet, they must also have the means to evaluate the success of such interventions for their particular students. A system of progress measurement that allows teachers to evaluate the success of their instruction and monitor the growth of students toward success on a state standards test would prove helpful. One such system is CBM.


Curriculum-based measurement (CBM) is a system of progress monitoring used to enhance the instructional decision making of teachers and the achievement of students (Deno, 1985; Deno & Fuchs, 1987). In a CBM approach, students are administered probes on a frequent basis (e.g., once per week), and the scores are graphed. Teachers examine the graphs to evaluate the effects of an instructional program. The probes in a CBM progress-monitoring system are designed to be valid and reliable indicators of general proficiency in an academic area (Deno). The concept of indicator is important in a CBM approach. If CBM measures are to be used for instructional decision making, they must be technically adequate; that is, they must have a relatively high level of validity and reliability for gauging performance and progress in an academic area. At the same time, if the measures are to be given on a frequent basis for progress monitoring, they must be feasible; that is, they must be time efficient, easy to administer and score, and easy to understand (Deno).

The initial research on CBM (see Deno, 1985) was conducted at the elementary level and focused on identifying measures that would meet the two-pronged criterion of technical adequacy and feasibility. In the area of writing, early work supported the reliability and validity of a 3-min writing sample produced in response to a story starter and scored for number of words written (WW), words written correctly (WWC), or correct word sequences (CWS) written (Deno, Marston, & Mirkin, 1982; Deno, Mirkin, & Marston, 1980; Marston, 1989; Marston, Lowry, Deno, & Mirkin, 1981; Tindal & Parker, 1991; Videen, Deno, & Marston, 1982). In the 1990s, the research on the development of writing measures was extended to the secondary level. Two critical issues arose in this early research. First, would the simple measures and scoring procedures used at the elementary-school level be valid and reliable at the secondary level or would new measures need to be developed? Second, how would the construct of writing proficiency be defined and measured?


Serving as a backdrop to the development of CBM secondary-level writing measures was a body of research conducted in the 1980s on the writing characteristics of students with learning disabilities (LD; see Newcomer & Barenbaum, 1991, for a review). This body of research yielded a surprisingly consistent, yet sobering, picture. Students with LD were found to struggle with nearly every aspect of writing, including production, fluency, spelling, capitalization, punctuation, word usage, syntax, text structures, cohesion, and overall quality of writing. These difficulties were evident across writing genres (e.g., narrative and expository), modes of production (e.g., handwriting vs. dictation), and age levels (e.g., elementary to college). The persistence of these difficulties suggested a general writing deficiency: "the correlation of certain mechanical skills, particularly spelling and the ability to sequence words correctly, with holistic evaluations of writing and general language test scores suggest that these problems are part of a general or global deficiency" (Newcomer & Barenbaum, p. 583). The existence of a general writing deficiency raised the possibility that a single CBM indicator might be found to represent writing across age, genre, and task. Yet, the technical adequacy of such an indicator would need to be established experimentally, and to do so would require identification of criterion measures of writing.

Selection of criterion measures proved problematic. Many writing assessments provided no or minimal technical adequacy information (see Huot, 1990, and Miller & Crocker, 1990, for reviews). Those that reported validity data revealed low- to mid-level correlations among various writing assessments (e.g., .25 to .60) and correlations often were not reported separately for secondary students (e.g., Test of Written Language-3; Hammill & Larsen, 1996; Woodcock-Johnson III, McGrew & Woodcock, 2001). Difficulty in establishing validity was seen across a variety of assessment approaches, not just commercially produced tools (see Huot; Miller & Crocker).

How could the technical adequacy of potential CBM indicators be determined when the criterion measures themselves correlated moderately at best with each other? An answer could be found in Messick's (1989a, 1989b) conceptualization of construct validity. Messick described construct validity as a unified concept consisting of both test interpretation and test use. Evidence for test interpretation referred to the pattern of relations, or the nomological net (Cronbach & Meehl, 1955), between predictor and criterion variables. Evidence for test use referred to the effects associated with implementation of a measure.

With regard to the development of CBM measures, Messick's (1989a, 1989b) work suggested that decisions about the validity of a CBM writing measure could be based on a pattern of relations with several criterion measures, each reflecting some aspect of writing. Further his work suggested that validity decisions should include consideration of use; that is, the effects of progress monitoring using CBM writing measures on teacher instruction and student achievement. To date, research on CBM writing measures at the secondary level has focused primarily on examining the pattern of relations between CBM and various criterion measures. However, like the early research on CBM, this research is conducted always with an eye toward future implementation. Thus, factors such as ease of measurement and scoring are always considered in the development of the measures.


Tindal, Parker, and colleagues were the first to examine the validity and reliability of writing measures for secondary-level students (Parker, Tindal, & Hasbrouck, 1991a, 1991b; Tindal & Parker, 1989). Students wrote for 6 min in response to a narrative prompt, and samples were scored in several ways, including words written, correctly written words, correct word sequences, percentage of correctly spelled words, and percentage of correct word sequences. A correct word sequence was defined as two adjacent correctly spelled words considered acceptable within the context of the sentence as judged by a native speaker of the English language (Videen et al., 1982). Criterion variables included teachers' holistic ratings of writing and the Test of Written Language (TOWL; Hammill & Larsen, 1983).

Results across studies revealed that the simple scoring procedures used at the elementary level--WW and WWC--were not technically adequate at the secondary level. Somewhat better technical adequacy was found for more complex scoring procedures involving the use of CWS. Results further revealed that percentage measures were better predictors of the criterion variables than counting measures. Both percentages of WWC and CWS emerged as reasonably good predictors of students' general writing performance (Parker et al., 1991a, 1991b; Tindal & Parker, 1989; Watkinson & Lee, 1992). However, percentage measures were seen to be problematic for progress monitoring because of a potential lack of sensitivity to change. This problem was confirmed in a study by Malecki and Jewell (2003), who found that percentages of WWC and CWS did not differentiate elementary and middle school students and did not reflect fall to spring growth for these students.

Subsequent research at the secondary level extended the work on the development of written expression progress measures to examine alternate-form reliability, introduce a new scoring procedure (correct minus incorrect word sequences; CIWS), and address the effects of writing time and genre on reliability and validity (Espin, De La Paz, Scierka, & Roelofs, 2005; Espin, Scierka, Skare, & Halverson, 1999; Espin et al., 2000; Weissenburger & Espin, 2005). Results of this subsequent research replicated earlier findings that a more complex indicator of writing proficiency was needed at the secondary level: CWS and CIWS yielded higher validity coefficients than WW and WWC (Espin et al., 1999; Espin et al., 2000; Weissenburger & Espin, 2005). One exception to this general pattern of results (Espin et al., 2005) is discussed later. Correlations between CWS, CIWS, and various criterion variables generally were between .50 and .80, with most between .60 and .70. CIWS tended to produce stronger correlations than CWS, but these results were not consistent across studies.

With regard to writing time, effects were seen for reliability but not for validity. Increased time to write was associated with increased alternate-form reliability, especially for older students. In Espin et al. (2000) alternate-form reliability for CWS and CIWS ranged from .72 to .80, and slight increases were found from 3 to 5 min of writing. In Weissenburger and Espin (2005), reliability coefficients for CWS and CIWS ranged from .67 to .82 for 8th-grade students, and from .59 to .80 for 10th-grade students. Increases in alternate-form reliabilities were found from 3 to 5 to 10 min of writing. The effect of time was especially notable for 10th-grade students. In terms of validity, few effects have been found related to time. The one exception to this pattern is Espin et al. (2005) in which students were given 35 min to write an expository essay. Results revealed that WW correlated with the criterion variables at levels equal to or above CWS and CIWS, with correlations ranging from .58 to .90 for WW and from .66 to .83 for CWS and CIWS. The authors surmised that the correlations for WW might have been related to the length of time given to the students to write.

With regard to genre, results generally have revealed no effects on the reliability or validity of CBM measures. Espin et al. (2000) compared narrative and expository writing samples and found similar levels of reliability and validity for the two types of writing. Espin et al. (2005) examined reliability and validity for expository essays and found coefficients similar to those seen in other studies employing narrative writing.


Although a fair number of studies have been done on the development of CBM measures at the secondary level, important gaps still remain. First, most of the research at the secondary-school level has been conducted with middle school and not high school students. The three studies that have included high school students have suggested that correlations between CBM and criterion measures decrease as students get older (Espin et al., 1999; Parker et al., 1991a; Weissenburger & Espin, 2005). Of these three studies, only Weissenburger and Espin examined the effect of sample duration, and their criterion variable was a general language arts measure, not a writing measure. It is possible that a longer writing sample will produce a more valid and reliable indicator of performance for older students. This hypothesis is supported by the strong correlations found in the Espin et al. (2005) study, where even WW was found to predict performance in essay writing when a 35-min sample was collected.

Second, only two studies at the secondary level have examined alternate-form reliability (Espin et al., 2000; Weissenburger & Espin, 2005), and only one of these has examined reliability of the measures for high-school students (Weissenburger & Espin). Third, no study has examined the use of CBM measures to predict performance on state standards tests for high school students. Examination of the relation between CBM scores and performance on state standards tests in writing is important, especially given that performance on the state tests might be tied to high school graduation. Finally, little CBM research has been done to examine the reliability and validity of the measures for English Language Learners (ELL). The research that has been done with ELL students has been done in reading (e.g., Baker & Good, 1995; McMaster, Wayman, & Cao, 2006; Wiley & Deno, 2005). Although often CBM is viewed as a viable assessment alternative for ELL students, it is not known whether the measures function in the same way for ELL and non-ELL students (Bentz & Pavri, 2000). It is important, where possible, to examine the reliability and validity of the measures in writing for ELL students.

In the current investigation, we address the issues outlined earlier. First, we conduct our study at the high school level and examine the alternate-form reliability and validity of the measures. Second, we examine the effects of scoring method and timeframe on reliability and validity. In terms of scoring, we examine correct word sequence measures because they have demonstrated the greatest technical adequacy in previous research; however, we also include WW and WWC, because of the surprising results of Espin et al. (2005) in which WW related strongly to ratings of expository writing when students were given 35 min to write. Third, we connect the CBM measures to performance on the state standards test, focusing specifically on developing a method for reporting data to enhance instructional decision making. Finally, although we did not set out in our study to compare measures for ELL and non-ELL students, we had a large number of ELL students in our sample, allowing us to conduct an exploratory analysis of how the measures work for ELL students.

Four research questions were addressed in the study:

1. What is the alternate-form reliability of CBM in writing? Does alternate-form reliability differ with scoring method or timeframe?

2. What is the validity of CBM for predicting performance on a state writing test for high school students? Does validity differ with scoring method or timeframe?

3. Can the relation between performance on CBM measures and the state standards test be presented in a usable format?

4. Do reliability and validity differ for ELL and non-ELL students?

One issue we did not address in the current study was the issue of writing genre. Practical constraints related to school scheduling kept us from collecting both narrative and expository writing samples. We used narrative rather than expository writing because previous research has revealed similarities in results across genre (Espin et al., 2000; Espin et al., 2005). We also took into account practical considerations. If teachers are to use CBM measures on a weekly basis to monitor student progress, they will have to create multiple (20 to 30) equivalent prompts that minimize reliance on background knowledge, and they will have to score each sample. Our experience has been that creation and scoring of narrative probes is easier than creation and scoring of expository probes.



Participants in the study were 183 high school students (57% female and 43% male) from two large, urban, midwestern high schools. In the first school (enrollment 1,636 students), 70% of the students qualified for free/reduced lunch, 37% received ELL services, and 15% received services for special education. Fifty-eight percent of the students were African American, 20% Caucasian, 10% Asian Pacific American, 8% Hispanic/Latino/Chicano, and 4% American Indian. In the second school (enrollment 1,647), 35% of the students qualified for free/reduced lunch, 25% received ELL services, and 11% of the students received services for special education. Seventeen percent of the students were African American, 53% Caucasian, 6% Asian Pacific American, 23% Hispanic/Latino/Chicano, and 1% American Indian.

For this study, participants were recruited across a range of performance levels to examine whether the CBM indicators ordered student performance similarly to the state standards test. (In future research on the effects of implementation, the focus will be on students with writing disabilities.) Participants were recruited from all 10th-grade English classes (10 total) in the schools in which the study took place. All students for whom consent was obtained were included in the study. Five percent of the participants were receiving special education services. Fifty-four percent of the participants were identified as students of color (3.8% American Indian, 30.6% African American, 10.9% Asian/Pacific Islander, and 8.2% Hispanic/Latino/Chicano). The passing rate for the sample on the state writing test was 80.4%; this compared to a passing rate of 91% for the entire state, and 71% for the district in which the study took place.

Thirty-eight students in the sample were classified as English Language Learners (38% male, 62% female; 22% African, 9% Asian, and 9% Hispanic). Decisions regarding the need for ELL services were based on multiple sources of information, including home language, educational history, performance on language assessment measures, and, for more fluent English speakers, scores on a norm-referenced reading comprehension test. Students in the school were placed into one of five ELL categories: 1 (non-English speaker); 2, 3 (limited English speaker); 4, 5 (fluent English speaker). All ELL students included in our study were levels 4 and 5 ELL or, relatively speaking, fluent English speakers and were in general education English classes. Fifty percent of the ELL students in our study spoke Somali as their home language, whereas 23% spoke Spanish. The remainder of the students spoke the following languages at home: Hmong (13%); Laotian (7%); and Yoruba Nigerian, Chinese, and Oromo Ethiopian (2% to 3% each). For the non-ELL students (44% male, 56% female; 5% American Indian, 25% African American, 8% Asian, 4% Hispanic, and 57% White), 91% spoke English, 4% Hmong, 3% Spanish, and 1% Shon Ethiopian, English dialect, or Serbian.


Predictor Variables. The predictor variables in this study were scores on curriculum-based measures of writing. Students wrote for 10 min in response to a narrative writing prompt. Students marked their progress at 3, 5, and 7 min. Writing samples were scored four different ways: words written (WW), words written correctly (WWC), correct word sequences (CWS), and correct minus incorrect word sequences (CIWS). Separate scores were calculated for 3-, 5-, 7-, and 10-min samples of writing performance. Words written was the total number of word units written in the sample, regardless of spelling or usage. Words written correctly was the total number of correctly spelled words in the sample. Any correctly spelled English word was counted, regardless of the appropriateness of usage (in the same way that a spell-checker in a word processing program would score words spelled correctly). Correct word sequences was the number of sequences between two adjacent correctly spelled words. Correct sequences were those considered to be acceptable within the context of the sample and both syntactically and semantically correct (Videen et al., 1982). In our scoring of CWS, we also took into consideration the beginning and end of sentences. For a word sequence to be scored as correct at the beginning of the sentence, the first word had to be capitalized; for it to be correct at the end of the sentence, correct punctuation needed to be present. Correct minus incorrect word sequences was the number of correct sequences minus the number of incorrect word sequences. A description of the scoring procedures is presented in Table 1.
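The arithmetic behind the four scoring procedures can be sketched in code. The sketch below is illustrative and simplified: in practice, spelling and word-sequence correctness are judged by trained human scorers (including the sentence-boundary conventions described above), so those judgments are supplied here as inputs rather than computed; the function name and example data are ours.

```python
# Simplified sketch of the four CBM scoring procedures.
# Human judgments of spelling and sequence correctness are inputs.

def score_sample(words, spelled_ok, seq_ok):
    """words: word tokens in the writing sample.
    spelled_ok: per-word spelling judgments (True = correctly spelled).
    seq_ok: judgments for the len(words) + 1 word sequences (adjacent
            word pairs plus the sentence-initial and sentence-final
            boundaries, per Videen et al., 1982)."""
    ww = len(words)                   # words written
    wwc = sum(spelled_ok)             # words written correctly
    cws = sum(seq_ok)                 # correct word sequences
    ciws = cws - (len(seq_ok) - cws)  # correct minus incorrect sequences
    return {"WW": ww, "WWC": wwc, "CWS": cws, "CIWS": ciws}

# Hypothetical 4-word sample with one misspelling ("fastt").
sample = score_sample(
    words=["The", "dog", "ran", "fastt"],
    spelled_ok=[True, True, True, False],
    seq_ok=[True, True, True, False, False],
)
# sample -> {"WW": 4, "WWC": 3, "CWS": 3, "CIWS": 1}
```

Note that CIWS penalizes errors rather than merely not crediting them, which is one reason it can behave differently from CWS as a predictor.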

Criterion Variable. The criterion variable in the study was performance on the Minnesota Basic Standards Test/Minnesota Comprehensive Assessments (MBST/MCA; Minnesota Department of Education & Beck Evaluation and Testing Associates, 1997) in written expression. The MBSTs are high-stakes tests required for graduation and are designed to reflect a minimum level of skill needed for survival. The MCAs are high-standards tests designed to rank student performance across a continuum and are used for purposes of meeting the No Child Left Behind requirements. The MBST/MCA writing test is an untimed test, administered annually to 10th graders in the state. The prompts used in the MBST/MCA writing test are designed to elicit expository writing that does not require specialized knowledge. Students complete one writing sample for both the MBST and MCA. The sample is scored twice to meet the requirements of both tests.

The writing test is scored using a process called focused holistic scoring (Minnesota Basic Skills Test Technical Manual, 2001-2002). Scoring is holistic because the writing samples are scored as a whole. The scoring is focused because preestablished criteria are used in scoring. These criteria include clarity of central idea, focus, organization, support or elaboration, and language conventions. The writing samples are first scored on a scale of 0 to 4 to meet MBST requirements. A score of 3 is necessary to pass the test. Students who do not earn a score of 3 or higher on the initial test may take the test again until they pass. Writing samples receiving scores of 4 on the initial scoring are rescored on a scale of 4 to 6 to meet MCA requirements.

Two raters score each sample for the state test. Raters assign only whole number scores to the sample. If scores are adjacent or within 1 point, the average of the two scores is assigned to the sample. If scores are not within 1 point of each other, a third "expert" scores the writing sample, and the student receives the score assigned by the expert. Minnesota contracts with PEM Pearson to administer and score the MBST. Educators from across the state are trained as scorers by PEM Pearson. Scorers qualify by demonstrating a 70% rate of perfect agreement and 100% adjacent agreement on two of three sets of qualifying papers. Interscorer agreement for the MBST/MCA is calculated by determining the percentage of papers that are scored with exact agreement and those scored within 1 point of each other. Agreement is calculated separately for MBST and MCA scoring. Interscorer agreement for the MBST in the year in which the study took place was 67% for exact agreement and 99.6% for agreement within 1 point. Interscorer agreement for the MCA was 69.7% for exact agreement and 98.4% for agreement within 1 point.
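As an illustration, the exact-agreement and within-1-point (adjacent) agreement indices described above amount to the following computation; the function name and scores are hypothetical.

```python
# Sketch of exact and within-1-point interscorer agreement for
# pairs of holistic ratings (illustrative; not the state's software).

def agreement_rates(scores_a, scores_b):
    """Return (percent exact agreement, percent within 1 point)."""
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n * 100
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n * 100
    return exact, adjacent

# Hypothetical ratings from two scorers on four papers.
exact, adjacent = agreement_rates([3, 4, 2, 3], [3, 3, 4, 3])
# exact -> 50.0, adjacent -> 75.0
```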

Validity of the MBST/MCA is defined as the extent to which the test is aligned with the content it is intended to measure. Validity is ensured by involving educators, item development experts, assessment experts, and state staff members in the development and annual review of items and scoring systems selected for the tests. No other validity data are reported for the writing test.


In the fall, students completed two CBM writing samples during one of their English classes. Samples were collected on the same day because only one class period was available for data collection. The narrative prompts were: "It was a dark and stormy night ..." and "I stepped into the time machine and.... " The order in which the prompts were given was counterbalanced across students. Participants were given 30 s to think and 10 min to write each story. At 3, 5, and 7 min, students were instructed to make a slash mark on their paper to indicate how far they were at each time point.

Writing samples were scored by three graduate students. Prior to scoring, scorers participated in a 3-hr training session. At the end of the session, participants were required to score five samples and reach a level of 80% agreement with the trainer before proceeding with scoring. Interscorer agreement was calculated by dividing the smaller score by the larger score and multiplying by 100. All scorers reached 80% agreement on their first attempt. To ensure that scorer drift did not occur, every 10th sample was scored by an independent rater and interscorer agreement was calculated. Average interscorer agreement across the four timeframes was as follows: WW (100%), WWC (99%), CWS (95.5%), CIWS (91.7%).
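The smaller-over-larger agreement formula used for the CBM count scores can be expressed as follows (a sketch; the function name is ours, and the article does not describe handling of negative CIWS values, so nonnegative counts are assumed):

```python
# Interscorer agreement for count-based CBM scores: the smaller of the
# two scorers' counts divided by the larger, multiplied by 100.
# Assumes nonnegative counts (handling of negative CIWS is unspecified).

def percent_agreement(score_a, score_b):
    low, high = sorted([score_a, score_b])
    return 100.0 if high == 0 else low / high * 100

# Example: two scorers count 46 and 50 CWS on the same sample,
# yielding approximately 92% agreement.
agreement = percent_agreement(46, 50)
```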

Students completed the state writing test in January. The prompt given to the students in the year that the study took place was: "What would be your dream job? Tell about that job and explain why it would be a good job for you. Include details so your readers will understand your choice." Students were given as much time as needed to complete their essays. They were encouraged to first write a draft of their essay, but were not required to do so. Students were also prompted to write neatly and to include the following in their essays: a clear focused central idea; supporting details; a logical organization; and correct spelling, grammar, punctuation, and capitalization.



Research questions 1 and 2 addressed the reliability and validity of the CBM written expression measures and differences in this reliability and validity by scoring procedure and timeframe. In this section, we report results using our entire sample. Later, we report results of an exploratory analysis of the technical adequacy of the measures for ELL and non-ELL students.

Alternate-form reliability was determined by examining correlations between scores on the first and second writing sample and was calculated separately for each timeframe (i.e., 3, 5, 7, and 10 min). Means and standard deviations for student performance on individual writing samples are reported in Table 2. Students wrote more on the second sample than on the first, reflecting a practice effect (recall that the order in which the prompts were given was counterbalanced across students).

Correlations between scores on the two writing samples are reported in Table 3. Alternate-form reliability coefficients ranged from .64 to .85. Although correlations were similar across scoring procedure, differences were seen for timeframe. Alternate form reliability increased steadily with writing time up to 7 min of writing. The strongest coefficients were obtained for 7 and 10 min of writing, which differed little from each other for most scoring procedures.
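Alternate-form reliability of this kind is simply the Pearson product-moment correlation between students' scores on the two probes. A minimal sketch, with hypothetical CIWS scores for five students:

```python
import statistics as st

def pearson_r(x, y):
    # Pearson product-moment correlation between paired scores.
    mx, my = st.mean(x), st.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (st.pstdev(x) * st.pstdev(y))

# Hypothetical CIWS scores on probe 1 and probe 2 for five students.
probe1 = [30, 45, 52, 60, 75]
probe2 = [28, 50, 55, 58, 80]
r = pearson_r(probe1, probe2)  # high reliability for these made-up data
```

In the study itself, such coefficients were computed separately for each scoring procedure and timeframe.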

Predictive validity was calculated using the mean score of the two probes. Correlations were examined separately for scoring procedure and timeframe. Pearson product moment correlations were calculated between CBM writing scores and scores on the MBST/MCA written expression test. Means and standard deviations for the combined CBM scores are reported in Table 2. In general, students wrote at a steady rate across the 10 min, averaging about 17 words per minute. Students did not slow down as they approached the 10-min mark. Differences between WW and WWC were small, indicating that students wrote few words incorrectly. The distributions for CWS and CIWS were slightly negatively skewed at lower time limits, but were fairly normal at higher limits.

The mean score on the MBST/MCA for our sample was 3.10 with a standard deviation of .66. The distribution of the MBST/MCA scores for our sample was limited, ranging from 1 to 4 (half-points were possible), with the largest number of students receiving scores of 3. This led to a slightly negatively skewed distribution. The mean and standard deviation for our sample were similar to those of all students taking the test that year (M = 3.16, SD = .73 for students in Grades 10 to 12; M = 3.24, SD = .68 for students in Grade 10 only).

Correlations between the writing scores and MBST/MCA scores are reported in Table 4. Correlations ranged from .23 to .60. Differences in correlations by timeframe were small, but differences by scoring procedure were large. Correlations for WW and WWC ranged from .23 to .31, for CWS from .43 to .48, and for CIWS from .56 to .60.

To summarize, the results across the entire student sample revealed that the scoring procedure with the strongest reliability and validity coefficients was CIWS. Relatively reliable scores were obtained from a 5-min sample, but stronger reliabilities were found for 7- and 10-min samples. Validity coefficients differed little by timeframe. Based on these results, we chose to use the number of correct minus incorrect word sequences written in 7 min to create a Table of Probable Success.


Although from a research perspective, it is important to know that performance on the CBM writing measures predicts success (that is, a passing score) on a state standards test, for these data to be user friendly, they must be reported in a format that enhances decision making. One simple method for making the data user friendly is to report a cut-off score; that is, the score that best predicts passing the state standards test. The limitation of a cut-off score is that it does not provide goal-setting information for students along the entire performance continuum. An alternative to a cut-off score is to report data in something like a Table of Probable Success. A Table of Probable Success shows the probability of passing a state standards test along the entire continuum of CBM scores. Thus, teachers can view the probability of passing the state standards test associated with a CBM score of 5, 20, or 30. Such a table would provide more information than a simple cut-off score.

To create a Table of Probable Success, it is necessary to use data from a large, representative sample of students from a school, district, or state. Our sample was small; thus, our purpose here is not to produce a table of scores for general consumption, but rather to illustrate the process for creating such a table. The values for the table were generated based on a logistic regression solution. Logistic regression is appropriate for examining the predictive power of one or more predictors when the outcome variable is dichotomous (see, e.g., Agresti & Finlay, 1997, chap. 15). In this case, the dichotomous outcome variable was passing or not passing the high-stakes test. Similar to traditional regression analysis, logistic regression provides a prediction curve and confidence bounds for prediction. Unlike traditional regression, the logistic regression prediction curve indicates the probability of being in one category or another (i.e., passing or not passing) as a function of values of the predictor (i.e., correct minus incorrect word sequences). In addition, the prediction curve is inherently nonlinear as it is based on a logistic function rather than a linear function.


The results of the logistic regression are displayed in Figure 1. The predictor variable was CIWS. Scores on the state standards test were transformed to be dichotomous, with a passing score of 3 or greater (recall that a score of 3 is required to pass the test) coded as 1 and a score of less than 3 coded as 0. Results of the logistic regression revealed statistically significant prediction, β̂1 = .045, χ²(1) = 24.53, p < .0001, with an estimated intercept value of β̂0 = -1.31, χ²(1) = 4.92, p < .027. The exponent of the slope was 1.05, indicating that the odds of passing were multiplied by this amount for a 1-unit increase in CIWS. The dashed line in Figure 1 indicates the predicted probabilities and the solid lines indicate the 95% confidence bounds. For example, a student with a score of 60 CIWS in 7 min is predicted to have an 80% chance of passing the state test (dotted line), with a lower confidence bound of 70% and a higher confidence bound of approximately 88% (solid lines).
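The reported coefficients are sufficient to reproduce the point predictions on the curve. A sketch using the estimates from this study (intercept -1.31, slope .045); the function name is ours, and the confidence bounds shown in Figure 1 would additionally require the coefficients' standard errors, which are not reproduced here.

```python
import math

# Estimated logistic regression coefficients reported in the text.
B0, B1 = -1.31, 0.045

def p_pass(ciws):
    # Logistic prediction: probability of passing the state test
    # given a correct-minus-incorrect-word-sequences score.
    return 1 / (1 + math.exp(-(B0 + B1 * ciws)))

odds_multiplier = math.exp(B1)  # ~1.05: odds of passing per 1-unit CIWS gain
prob_at_60 = p_pass(60)         # ~.80, matching the worked example above
```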

Some or all of the information from Figure 1 can be reported in table format, as illustrated in Table 5. Table 5 shows the lower-confidence-bound probability of passing as a function of CIWS. Using this table, teachers can locate a given student's score from the fall CBM testing and predict, on the basis of the norming sample used to create the table, the student's likelihood of success on the state writing test given in January. For example, imagine that on the fall testing a student obtained a CBM score of 40 CIWS. In our sample, this score is associated with a 48% likelihood of passing the state standards test (.48 being the lower confidence bound in Figure 1). Knowing this, the teacher may decide that the student needs 2 years to obtain the skills necessary to pass the state standards test and may thus set an intermediate goal for the current school year (e.g., a goal of 50 CIWS on the CBM measures, which would be associated with a 60% likelihood of passing the test). This goal would bring the student closer to success on the test.
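
The point predictions underlying such a table can be generated in a few lines. Note that Table 5 reports the more conservative lower confidence bounds, which cannot be reproduced from the intercept and slope alone; the sketch below (with names of our choosing) therefore prints point predictions only, which run higher than the tabled values:

```python
import math

B0, B1 = -1.31, 0.045  # reported intercept and slope

def p_pass(ciws):
    # Logistic prediction: probability of passing for a given CIWS score
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * ciws)))

# Point-prediction analog of the Table of Probable Success.
table = {ciws: round(100 * p_pass(ciws)) for ciws in range(-30, 201, 10)}
for ciws, pct in sorted(table.items()):
    print(f"{ciws:>4}  {pct}%")
```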

There are at least two caveats to keep in mind when using a Table of Probable Success. First, the accuracy of such a table depends on the strength of the relationship between the CBM predictor and the state test outcome. In this analysis, the analog of R2 for logistic regression, Nagelkerke's (1991) R2, had a value of .42. This value indicates that the predictive power of the CBM writing measure was far from perfect. In fact, if we conceptualize R2 as the proportion of variance accounted for, then the CBM writing measure did not account for the majority of the variance. For this reason, it may be best to use the lower confidence bound as a reference for estimating the probability associated with a particular CBM writing score, as is done in Table 5. Second, it is important to keep in mind that the probabilities in the table are not deterministic. Even when a probability is very high, it does not ensure that a student will pass the MBST/MCA. No doubt there are many intervening influences on MBST/MCA performance that are not considered in the model, such as motivation.
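
Nagelkerke's R2 rescales the Cox-Snell R2 by its maximum attainable value so that it ranges from 0 to 1. A minimal sketch, assuming the null-model and full-model log-likelihoods are available (the values in the example are hypothetical, not taken from this study):

```python
import math

def nagelkerke_r2(ll_null, ll_full, n):
    """Nagelkerke (1991) R^2 from null and full-model log-likelihoods."""
    # Cox-Snell R^2 based on the likelihood ratio
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_full))
    # Maximum attainable Cox-Snell value, used for rescaling
    max_cs = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cs
```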


Our last set of analyses examined differences in the technical adequacy of the CBM measures for ELL and non-ELL students. We consider this analysis exploratory because we did not set out in our study to examine differences in correlations between ELL and non-ELL students, and, thus, we did not have two groups of students who differed on language status but not achievement status. Further, the ELL students in our sample were at the higher end of English language proficiency in the school district in which we worked, so it is unclear how our results apply to ELL students in general.

Despite these caveats, we thought it important and useful to tentatively look at how our measures functioned with ELL students, especially given the paucity of research in the area. We had an ideal sample in that we had a large number of ELL students, reflecting the demographics of the school in which we worked, and the sample was fairly homogeneous, with a majority of the students speaking one of two home languages (Somali or Spanish).

The mean scores on the MBST/MCA for the ELL and non-ELL groups were 2.65 (SD = .65) and 3.25 (SD = .59), respectively. This difference was both statistically significant (t = 5.46, p < .001) and practically important, in the sense that the mean score for the ELL students was below passing whereas the mean score for the non-ELL students was above passing.

Means and standard deviations for the ELL and non-ELL sample on the CBM measures are reported in Table 6. CBM scores for the ELL students were lower than for the non-ELL students. This pattern is evident across both scoring method and timeframe. Group differences are statistically significant for all measures (see Table 6). The ELL students wrote less than the non-ELL students (e.g., 153 words vs. 185 words in 10 min) and the words they wrote contained relatively fewer correct word sequences (e.g., 8 vs. 9 CWS per 10 words). The differences in CWS were not due primarily to spelling errors. Both ELL and non-ELL students wrote on average only four incorrectly spelled words in 10 min.

In Table 7, the alternate-form reliabilities are presented. For both ELL and non-ELL students, alternate-form reliability increased with time, with acceptable coefficients at 5 min and stronger coefficients at 7 and 10 min. Reliability coefficients were generally larger for the ELL students than for the non-ELL students, especially at 5, 7, and 10 min and for CIWS scores.

In Table 8, correlations between the CBM measures and scores on the MBST/MCA are presented by scoring method and timeframe. Inspection of scatterplots for the initial correlational analyses revealed two outliers in the analyses for the non-ELL students. These outliers were removed, and the correlations were run again. We discuss only the analyses with the outliers removed. The general pattern of results was the same for ELL and non-ELL students: The strength of the validity coefficients changed little with timeframe, but differences were seen for scoring method, with CIWS producing the strongest correlations and WW and WWC the weakest. Although we found a similar pattern of results between the two groups, we also found differences in the strength of the correlations. Correlations for the ELL students were much stronger than those for the non-ELL students, especially for CWS and CIWS scores. To examine the reliability of the differences in correlations, we chose the most promising scoring method (CIWS) and timeframes (5 and 7 min). Differences in the correlations for CIWS between ELL and non-ELL students were significant at both 5 min (z = 2.90, p < .05) and 7 min (z = 2.79, p < .05).
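
The test for a difference between two independent correlations uses Fisher's r-to-z transformation. A sketch (the group sizes in the example below are hypothetical, as the separate ELL and non-ELL ns are not broken out here):

```python
import math

def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of the difference
    return (z1 - z2) / se
```

Values of |z| greater than 1.96 are significant at p < .05 (two-tailed).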

The magnitude of the correlations for the non-ELL students was low, with correlations for the best scoring procedure, CIWS, ranging only from .39 to .42. These lower correlations were not due to a ceiling in the scores on the MBST/MCA. As with the ELL students, the distribution of scores on the MBST/MCA was restricted, with the majority of scores being 3. The similarity in the pattern of scores between ELL and non-ELL students on the MBST/MCA can also be seen in the similarity of standard deviations for the two groups.

In summary, we found a similar pattern of results for both ELL and non-ELL students with respect to scoring method and timeframe; however, we also found that alternate-form reliability and predictive validity were stronger for the ELL group than for the non-ELL group, and that the validity coefficients for the non-ELL group were low in magnitude.


DISCUSSION

Our study examined the use of CBM measures to predict performance on a statewide standards test in writing. In conducting this study, we extended existing research on CBM to include high school students as participants, examined alternate-form reliability and validity, compared timeframes and scoring procedures, explored methods for reporting data in a user-friendly manner, and explored the technical adequacy of the measures for ELL and non-ELL students. Our study is the first step in the development of a data-based decision-making system for teachers to use to monitor student progress in written expression.


Our first two research questions addressed the reliability and validity of CBM writing measures and whether they were affected by timeframe and scoring procedures. Similar to previous research (Weissenburger & Espin, 2005), alternate-form reliabilities did not differ by method of scoring but did differ by timeframe, with reliability coefficients increasing with the amount of time given to write. The strongest reliability coefficients were found for 7 and 10 min, which did not differ appreciably from each other. The increase in alternate-form reliability from 3 to 7 min is not surprising given that more time to write increases the number of "test items" included in the writing sample.
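
The "more test items" logic can be made concrete with the Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened by a factor k. This is illustrative only, because it assumes the added writing time behaves like parallel test content:

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by factor k."""
    return k * r / (1 + (k - 1) * r)

# Starting from the observed 3-min CWS reliability of .66, lengthening
# the sample by a factor of 7/3 predicts a reliability of about .82,
# close to the observed 7-min coefficient.
predicted = spearman_brown(0.66, 7 / 3)
```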

In contrast to reliability, validity coefficients were not affected by time, but were affected by method of scoring. Results of our study with high school students replicated earlier results seen with middle school students: Criterion-related validity coefficients were stronger when the writing sample was scored for CWS or CIWS than when the sample was scored for WW or WWC (Espin et al., 2000; Espin et al., 2005; Parker et al., 1991b; Tindal & Parker, 1989; Watkinson & Lee, 1992). CWS and CIWS scores take into account a greater number of factors that separate good and poor writers, including production, fluency, spelling, word usage, grammar, and punctuation; thus, it is perhaps not surprising that they produce stronger validity coefficients than the simpler measures of WW and WWC, which rely solely on production, fluency, and spelling. In addition, CWS and CIWS are more sensitive to errors than are WW or WWC. For example, if a student spells a word incorrectly, it does not affect WW and reduces WWC by only 1; however, the same error reduces CWS by 2 and CIWS by 4.
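
The arithmetic of that example can be checked with a deliberately simplified scorer (a sketch of our own construction that handles spelling only; real CWS scoring also weighs grammar, word usage, and punctuation):

```python
def score(words, misspelled):
    """Simplified CBM writing scores from a list of words and a set of
    misspellings. A word sequence is each adjacent pair of units, where
    sentence boundaries count as correct units."""
    ok = [w not in misspelled for w in words]
    units = [True] + ok + [True]            # boundary ^ word ... word ^ boundary
    cws = sum(a and b for a, b in zip(units, units[1:]))
    iws = (len(units) - 1) - cws            # incorrect word sequences
    return {"WW": len(words), "WWC": sum(ok), "CWS": cws, "CIWS": cws - iws}

sample = "it was a dark and stormy nite".split()
base = score(sample, set())       # treat every word as spelled correctly
err = score(sample, {"nite"})     # flag one word as misspelled
```

Comparing `base` and `err` shows the pattern described above: the misspelling leaves WW unchanged, lowers WWC by 1, CWS by 2, and CIWS by 4.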

Similar to the results of the study by Espin et al. (2000), we found differences in the strength of the coefficients between CWS and CIWS, with CIWS producing stronger validity coefficients than CWS. Espin et al. (2005), however, found few differences in validity coefficients between CWS and CIWS. Espin et al. (2005) differed from the other two studies in terms of criterion variables, writing genre (expository vs. narrative), and time (30 min of writing vs. 3 to 10 min of writing). Further examination of the relative validity of CWS and CIWS with regard to these variables is in order.

The magnitude of the validity coefficients for CWS and CIWS found in our study is somewhat lower than that found in previous studies. Whereas our validity coefficients for CWS were .43 to .48, validity coefficients for CWS in previous research have tended to be in the mid-.50s to low .60s (Espin et al., 2000; Parker et al., 1991a, 1991b) or mid-.60s to high .70s (Espin et al., 2005). Our validity coefficients for CIWS were .56 to .60, whereas coefficients for CIWS in previous research have tended to be between .60 and .80 (Espin et al., 2000; Espin et al., 2005). One possible reason for the differences in correlations between our study and previous studies is the age of our participants: Our participants were high school students, whereas most participants in previous research have been middle school students. It is possible that as students become older and more proficient in writing, the validity of CBM measures decreases. This hypothesis is supported by the results of Parker et al. (1991a) and Weissenburger and Espin (2005), who found decreases in correlations between CBM and criterion measures with increases in student age, and by our finding that coefficients for the non-ELL students (who were higher performing in our sample) were significantly lower than those for the ELL students (who were lower performing). Unfortunately, Parker et al. (1991a) did not examine validity for longer writing samples or for samples scored for CIWS, Weissenburger and Espin did not have a direct measure of writing proficiency for their 10th-grade students, and our students differed on both performance level and language status. Additional research is needed to examine the relative validity and reliability of progress measures across age and skill levels. Such comparisons should be done within a single study in which materials and methods are held constant.

Viewed within the context of the technical adequacy of writing measures in general, the criterion-related validity coefficients of .56 to .60 that we obtained for CIWS are quite respectable. For example, the TOWL-3 (Hammill & Larsen, 1996) reports correlations of .50 to .57 between composite scores on the TOWL-3 and the Writing Scale of the Comprehensive Scales of Student Abilities (CSSA; Hammill & Hresko, 1994) for 76 elementary-level students. For secondary-level students, correlations between performance on the TOWL-3 and age range from nonsignificant to .25, indicating that scores on the measure do not change appreciably as students increase in age from 13 to 17 years. The Woodcock-Johnson III (WJIII; McGrew & Woodcock, 2001) reports cross-age correlations between the WJIII writing subtests and the Wechsler Individual Achievement Test (WIAT; Wechsler, 1992) of .31 to .57 for the WIAT Writing Composites and .42 to .58 for the Written Expression subtest. Note that correlations for the TOWL-3 and WJIII are calculated across grades and primarily for elementary-level students. Our coefficients are well within the range of these other writing assessments, even though they are calculated for high school students within a single grade.

To summarize our first set of analyses, the results supported the use of 5- to 7-min writing samples produced in response to narrative prompts and scored for CIWS. If writing is sampled only one to three times a year for screening purposes, a 7-min sample is recommended because of its higher alternate-form reliability. If writing is sampled on an ongoing and regular basis, for example once a week, a 5-min sample is recommended. Although alternate-form reliability is lower for 5 min, it is still in the .70 range, and scoring several 7-min writing samples each week might prove burdensome for teachers.


Identifying the validity and reliability of CBM measures as indicators of performance is important, but it is also important to be able to report the information in a user-friendly manner. Thus, our third research question addressed the feasibility of reporting the relation between the CBM measures and performance on the state test in a form that could enhance decision making. To address this question, we used logistic regression to create a Table of Probable Success where CBM scores were associated with the probability of passing the state standards test. The validity of these tables (e.g., the actual number of students receiving a particular score who pass compared to the predicted number) and the usability of these tables have yet to be examined. We once again caution the reader that the numbers reported in the table in this paper are for illustrative purposes only and should not be used for decision making.


In our final analyses, we examined the technical adequacy of the measures for ELL and non-ELL students. We found that the measures functioned in the same way for both ELL and non-ELL students: Reliability increased as time increased, with the strongest coefficients seen for 7 and 10 min, and the strongest validity coefficients seen for CIWS. However, we also found group differences in alternate-form reliability and validity for the two groups, with larger coefficients obtained for the ELL students than for the non-ELL students. Further, we found that the validity coefficients for the non-ELL students were relatively low.

Although our results imply that the reliability and validity of the CBM writing measures are not negatively affected by language proficiency status, the reasons for the group differences between the ELL and non-ELL students are not clear. The groups differed not only on language, but also on performance status. Regardless of the reason for the group differences, the results do support the use of the CBM measures for high school students who are ELL, fairly fluent in English, and lower-performing. The validity coefficients for CIWS for the ELL students in our sample were fairly strong, ranging from .70 to .75. These coefficients are especially impressive given the fact that writing measures in general tend to correlate at relatively moderate levels with each other.

At the same time, our results call into question the validity of the CBM measures for high school students who are non-ELL and higher performing. On the one hand, it is not surprising that the correlations for a selected subgroup would be lower than for a group representing a range of performance levels: Selecting a subgroup restricts the range of scores, which can lead to a decrease in correlations. However, examination of the standard deviations and histograms for the ELL and non-ELL students reveals fairly similar distributions on both criterion and predictor variables, with no ceiling in scores for the non-ELL group. Our results thus raise questions about the validity of CBM measures for higher-performing high school students who are fluent English speakers.


A first limitation of our study was our use of only one criterion variable, a state writing test. Although the test was developed on the basis of careful and representative feedback from educators, and its validity was established based on the match between the test and this expert input, other forms of validity evidence are not reported. In addition, a large number of students scored 3 on the test; this restricted distribution of scores would serve to lower the correlations between the CBM and criterion variables. Despite these limitations, given the high-stakes nature of state tests for students and schools, it is worthwhile to include state tests as a criterion variable. Replication of our results using other criterion variables is in order.

A second limitation of our study was our sample. First, based on the MBST/MCA writing scores, the sample was somewhat higher performing than students in general in the urban district in which the study took place, and somewhat lower performing than students in the state in which the study took place. If, as seems to be the case, the validity of the writing measures decreases with an increase in grade (and thus perhaps also performance level), our validity coefficients may underestimate those that might be obtained from lower-performing districts and overestimate those that might be obtained from higher-performing districts. More research is needed to examine the validity of the measures across grade levels and performance levels within a grade. Second, with regard to the ELL sample, students in our study were at the highest levels of performance in terms of English language proficiency and were functioning in mainstream classrooms. Our results do not address the validity and reliability of writing measures for ELL students who are at lower levels of language proficiency.

A final limitation of our study is that we focused on the technical adequacy of CBM measures as static measures rather than dynamic measures of performance. That is to say, we examined the validity and reliability of the measures as indicators of performance at a single time point rather than as indicators of progress across time. (We attempted to collect longitudinal data on a small subset of students from the larger study reported here. We asked students from one classroom to complete weekly writing probes for a period of 10 weeks. Unfortunately, the final data set was too small to conduct longitudinal analyses.) The validity and reliability of the measures as progress measures should be examined in future research. In this research, the focus should be on students with writing disabilities, and special care should be given to determining whether the measures are sensitive to fine gradations in writing performance.


In summary, our results support the criterion-related validity and reliability of CBM written expression measures for predicting performance on a state standards test. Results reveal that high school students need to write samples that are 5 to 7 min in length and that are scored for CIWS. Results reveal further that there is a process that can be used to present these data in a usable format for decision making for teachers and school districts. Finally, our data support the use of these measures for students receiving ELL services who are lower performing and relatively fluent in English, but raise questions about the use of the measures for non-ELL students who are higher performing. Future research should examine the technical adequacy of the measures, separating out the effects of language and performance status.

Manuscript received July 2005; accepted December 2006.


REFERENCES

Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.

Baker, S., & Good, R. (1995). Curriculum-based measurement of English reading with bilingual Hispanic students: A validation study with second-grade students. School Psychology Review, 24, 561-578.

Bentz, J., & Pavri, S. (2000). Curriculum-based measurement in assessing bilingual students: A promising new direction. Diagnostique, 25(3), 229-248.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deno, S. L., & Fuchs, L. S. (1987). Developing curriculum-based measurement systems for data-based special education problem solving. Focus on Exceptional Children, 19(8), 1-15.

Deno, S. L., Marston, D., & Mirkin, P. (1982). Valid measurement procedures for continuous evaluation of written expression. Exceptional Children, 48, 368-371.

Deno, S. L., Mirkin, P., & Marston, D. (1980). Relationships among simple measures of written expression and performance on standardized achievement tests. (Research Rep. No. 22). Minneapolis: University of Minnesota, Institute for Research on Learning Disabilities.

Espin, C. A., De La Paz, S., Scierka, B. J., & Roelofs, L. (2005). Relation between curriculum-based measures in written expression and quality and completeness of expository writing for middle-school students. Journal of Special Education, 38, 208-217.

Espin, C. A., Scierka, B. J., Skare, S., & Halverson, N. (1999). Criterion-related validity of curriculum-based measures in writing for secondary students. Reading and Writing Quarterly, 15, 5-27.

Espin, C. A., Skare, S., Shin, J., Deno, S. L., Robinson, S., & Brenner, B. (2000). Identifying indicators of growth in written expression for middle-school students. Journal of Special Education, 34, 140-153.

Hammill, D. D., & Hresko, W. P. (1994). Comprehensive Scales of Student Abilities. Austin, TX: PRO-ED.

Hammill, D. D., & Larsen, S. C. (1983). Test of Written Language. Austin, TX: PRO-ED.

Hammill, D. D., & Larsen, S. C. (1996). Test of Written Language (3rd ed.). Austin, TX: PRO-ED.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60, 237-263.

Kober, N., Chudowsky, N., Chudowsky, V., Gayler, K., & McMurrer, J. (2006). State high school exams: A challenging year. Retrieved October 12, 2006, from www.

Malecki, C. K., & Jewell, J. (2003). Developmental, gender, and practical considerations in scoring curriculum-based measurement in writing probes. Psychology in the Schools, 40, 379-390.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What is it and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.

Marston, D., Lowry, L., Deno, S. L., & Mirkin, P. (1981). An analysis of learning trends in simple measures of reading, spelling, and written expression: A longitudinal study (Research Rep. No. 49). Minneapolis: University of Minnesota, Institute for Research on Learning Disabilities.

McCombs, S., Kirby, S., Barney, H., Darilek, H., & Magee, S. (2005). Achieving state and national literacy goals, a long uphill road: A report to Carnegie Corporation of New York. Retrieved October 12, 2006, from

McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III technical manual. Itasca, IL: Riverside.

McMaster, K. L., Wayman, M., & Cao, M. (2006). Monitoring the reading progress of secondary-level English learners: Technical features of oral reading and maze tasks. Assessment for Effective Intervention, 31(4), 17-32.

Messick, S. (1989a). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.

Messick, S. (1989b). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Miller, M. D., & Crocker, L. (1990). Validation methods for direct writing assessment. Applied Measurement in Education, 3, 285-296.

Minnesota Basic Skills Test Technical Manual. (2001-2002). Minnesota Department of Children, Families, and Learning.

Minnesota Department of Education & Beck Evaluation and Testing Associates, Inc. (1997). Minnesota Basic Skills Test of Written Composition/Minnesota Comprehensive Assessments (BST/MCA). St. Paul, MN: Minnesota Department of Education.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691-692.

Newcomer, P. L., & Barenbaum, E. M. (1991). The written composing ability of children with learning disabilities: A review of the literature from 1980 to 1990. Journal of Learning Disabilities, 24, 578-593.

Parker, R., Tindal, G., & Hasbrouck, J. (1991a). Countable indices of writing quality: Their suitability for screening-eligibility decision. Exceptionality, 2, 1-17.

Parker, R., Tindal, G., & Hasbrouck, J. (1991b). Progress monitoring with objective measures of writing performance for students with mild disabilities. Exceptional Children, 58, 61-73.

Tindal, G., & Parker, R. (1989). Assessment of written expression for students in compensatory and special education programs. The Journal of Special Education, 23, 169-183.

Tindal, G., & Parker, R. (1991). Identifying measures for evaluating written expression. Learning Disabilities Research & Practice, 6, 211-218.

Videen, J., Deno, S. L., & Marston, D. (1982). Correct word sequences: A valid indicator of proficiency in written expression (Research Rep. No. 84). Minneapolis: University of Minnesota, Institute for Research on Learning Disabilities.

Watkinson, J. T., & Lee, S. W. (1992). Curriculum-based measures of written expression for learning-disabled and nondisabled students. Psychology in the Schools, 29, 184-191.

Wechsler, D. (1992). Wechsler Individual Achievement Test. San Antonio, TX: Psychological Corporation.

Weissenburger, J. W., & Espin, C. A. (2005). Curriculum-based measures of writing across grade levels. Journal of School Psychology, 43, 153-169.

Wiley, H. I., & Deno, S. L. (2005). The relative effectiveness of oral reading and maze measures for predicting success on a state standards assessment: A comparison of limited English proficient and non-limited English proficient students. Manuscript submitted for publication.




CHRISTINE ESPIN (CEC MN Federation), Professor, Department of Educational Psychology and TERI WALLACE (CEC MN Federation), Research Associate, Institute on Community Integration, University of Minnesota, Twin Cities. HEATHER CAMPBELL (CEC MN Federation), Assistant Professor, Education Department, St. Olaf College, Northfield, Minnesota. ERICA S. LEMBKE (CEC MO Federation), Assistant Professor, Department of Special Education, University of Missouri, Columbia. JEFFREY D. LONG, Associate Professor, Department of Educational Psychology and RENATA TICHA (CEC MN Federation), Research Fellow, Institute on Community Integration, University of Minnesota, Twin Cities.

The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We wish to thank Mary Pickart for her contributions to this research. We also wish to thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.

Address correspondence to Christine Espin, 250B Burton Hall, 178 Pillsbury Drive SE, Minneapolis, MN 55407 (e-mail:

CBM Scoring Procedures for Writing Samples

I. Count the number of words written and words written correctly:

It was a dark and stormy night and ... I was a lone in my house wen
the lites went suddnly out. I don't no what to do so I found a
flashlight and walked around the house and thot that I herd a sound
but didn't see anything. My freind Alicia called my cell phone. I
asked her if her lites were out, to. She said no. I asked her to come
over to my house and she said she would come over.



II. Divide the writing sample into sentence units. Mark and count
incorrect word sequences.

It was a dark and stormy night and ... I was v a v lone v in my house
v wen v the v lites v went v suddnly v out. / I v don't v no v what to
do so I found a flashlight and walked around the house v / v and v
thot v that I v herd v a sound but didn't see anything. / v my v
freind v Alicia called my cell phone. / I asked her if her v lites v
were out, v to v. / She said no. / I asked her to come over to my
house and she said she would come over. /


III. Mark and count correct word sequences. Calculate correct minus
incorrect sequences.

It was a dark and stormy night and ...> ^ I^ was v a v lone v in A my
v house v wen v the v lites v went v suddnly v out ^. / ^I v don't v
no v what ^to^ do ^so^ I^ found^ a ^flashlight^ and^ walked^ around^
the^ house v / v and v thot v that ^I v herd v a ^sound ^but^ didn't
^see ^anything^. / v my v freind v Alicia^ called^ my^ cell^ phone^.
/^ I^ asked ^her ^if^ her v lites v were^ out, v to v./^She^ said^
no^. ^/^ asked ^her ^to ^come ^over^ to^ my^ house^ and ^she^ said^
said^ would^ come^ over^. /




Means and Standard Deviations for Single and
Combined Writing Samples


Scoring Method 3 Min 5 Min 7 Min 10 Min

First Writing Sample
Words written 50.64 84.57 117.96 169.56
 (17.81) (26.43) (35.76) (50.04)
Words written correctly 49.54 82.88 115.43 165.84
 (17.67) (26.16) (35.66) (50.07)
Correct word sequences 47.31 78.43 108.87 156.09
 (18.33) (28.34) (39.35) (54.31)
Correct minus incorrect 39.65 65.52 91.04 129.19
 word sequences
 (20.31) (31.81) (45.50) (60.65)
Second Writing Sample
Words written 57.37 93.62 129.98 184.16
 (17.29) (26.49) (35.40) (50.12)
Words written correctly 56.28 91.65 127.19 179.98
 (16.95) (26.02) (34.86) (49.84)
Correct word sequences 52.69 85.90 119.05 169.11
 (18.29) (28.03) (38.32) (55.37)
Correct minus incorrect 43.20 70.74 97.50 138.55
 word sequences
 (21.15) (32.25) (44.57) (64.51)
Combined Writing Samples
 Words written 54.00 89.02 123.97 176.86
 (15.89) (24.67) (33.80) (48.02)
Words written correctly 52.91 87.27 121.31 172.91
 (15.67) (24.34) (33.49) (47.67)
Correct word sequences 50.00 82.16 113.96 162.60
 (16.68) (26.47) (37.00) (52.78)
Correct minus incorrect 41.42 68.13 94.27 133.87
 word sequences
 (18.86) (30.14) (42.76) (60.07)


Alternate-Form Reliability Between Two Samples


Scoring Method 3 Min 5 Min 7 Min 10 Min

Words written .64 ** .74 ** .81 ** .84 **
Words written correctly .64 ** .74 ** .80 ** .82 **
Correct word sequences .66 ** .76 ** .82 ** .85 **
Correct minus incorrect .66 ** .77 ** .80 ** .84 **
 word sequences

** p < .01.


Correlations Between Scores on Curriculum-Based
Measures and Minnesota Basic Standards Test


Scoring Method 3 Min 5 Min 7 Min 10 Min

Words written .26 *** .23 ** .27 *** .29 ***
Words written correctly .29 *** .26 *** .29 *** .31 ***
Correct word sequences .44 *** .43 *** .46 *** .48 ***
Correct minus incorrect .57 *** .56 *** .58 *** .60 ***
 word sequences

** p <.01. *** p <.001.


TABLE 5
Table of Probable Success: CBM Writing Scores and Corresponding
Probability of Passing the State Writing Test

Correct Minus Incorrect      Probability of Passing
Word Sequences, 7 Min        the State Writing Test

 -30 2%
 -20 4%
 -10 5%
 0 7%
 10 13%
 20 24%
 30 33%
 40 48%
 50 60%
 60 70%
 70 77%
 80 84%
 90 87%
 100 91%
 110 93%
 120 95%
 130 96%
 140 97%
 150 98%
 160 98%
 170 99%
 180 99%
 190 99%
 200 99%

Note. Probabilities are lower-limit probabilities.
See Figure 1.
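A table like the one above can be generated from a logistic regression of pass/fail outcomes on the 7-minute correct minus incorrect word sequences (CIWS) score. The sketch below shows only the mechanics: the intercept b0 and slope b1 are hypothetical placeholders, not the coefficients fitted in the study.

```python
import math

def pass_probability(ciws, b0=-2.6, b1=0.05):
    # Logistic model: P(pass) = 1 / (1 + exp(-(b0 + b1 * CIWS))).
    # b0 and b1 are hypothetical illustration values; the study's
    # fitted coefficients are not reproduced here.
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * ciws)))

# Tabulate predicted probabilities across a range of CIWS scores.
for ciws in range(-30, 201, 30):
    print(ciws, round(pass_probability(ciws), 2))
```

The logistic form guarantees the two properties visible in the published table: probabilities stay between 0 and 1, and they rise monotonically with the CIWS score.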


Means (Standard Deviations) for ELL and Non-ELL
Students on CBM Measures

Scoring Method                           3 Min    5 Min    7 Min    10 Min

ELL Students

Words written                             47.84    79.45   108.91   152.79
                                         (13.76)  (22.12)  (29.91)  (42.55)
Words written correctly                   46.91    77.50   106.06   148.43
                                         (14.02)  (22.06)  (29.97)  (42.62)
Correct word sequences                    40.04    65.80    89.65   125.54
                                         (14.87)  (24.02)  (33.44)  (48.29)
Correct minus incorrect word sequences    26.99    45.31    60.74    85.26
                                         (18.81)  (29.35)  (40.87)  (58.28)

Non-ELL Students

Words written                             56.21    92.63   129.10   184.86
                                         (15.68)  (24.25)  (33.00)  (46.65)
Words written correctly                   55.04    90.80   126.46   181.02
                                         (15.36)  (23.80)  (32.50)  (46.00)
Correct word sequences                    53.28    87.72   121.89   174.58
                                         (15.60)  (24.45)  (33.93)  (47.90)
Correct minus incorrect word sequences    45.91    75.54   104.90   149.30
                                         (16.35)  (26.11)  (37.18)  (51.62)
Note. Means and standard deviations are for combined scores on the two
writing samples. ELL and non-ELL students differed significantly on
words written, F(1, 172) = 12.12; words written correctly, F(1, 172) =
12.59; correct word sequences, F(1, 172) = 28.51; and correct minus
incorrect word sequences, F(1, 172) = 44.39; all ps < .001.


Alternate-Form Reliability: ELL Versus Non-ELL


Scoring Method                           3 Min   5 Min   7 Min   10 Min

ELL Students

Words written                            .55 **  .73 **  .82 **  .81 **
Words written correctly                  .59 **  .74 **  .83 **  .81 **
Correct word sequences                   .61 **  .78 **  .85 **  .87 **
Correct minus incorrect word sequences   .64 **  .78 **  .84 **  .88 **

Non-ELL Students

Words written                            .63 **  .72 **  .79 **  .83 **
Words written correctly                  .62 **  .71 **  .78 **  .81 **
Correct word sequences                   .61 **  .71 **  .77 **  .81 **
Correct minus incorrect word sequences   .56 **  .69 **  .73 **  .78 **

** p < .01.


Correlations Between Scores on Curriculum-Based Writing Measures
and Minnesota Basic Standards Test, ELL Versus Non-ELL

Scoring Method                           3 Min   5 Min   7 Min   10 Min

ELL Students

Words written                            .34 *   .33 *   .36 *   .39 *
Words written correctly                  .39 *   .37 *   .39 *   .43 **
Correct word sequences                   .56 **  .60 **  .62 **  .64 **
Correct minus incorrect word sequences   .70 **  .74 **  .74 **  .75 **

Non-ELL Students, Outliers Included

Words written                            .16     .12     .15     .15
Words written correctly                  .19 *   .14     .16     .17 *
Correct word sequences                   .30 **  .27 **  .30 **  .32 **
Correct minus incorrect word sequences   .42 **  .40 **  .42 **  .44 **

Non-ELL Students, Outliers Removed

Words written                            .25 **  .21 *   .23 **  .24 **
Words written correctly                  .27 **  .23 **  .25 **  .25 **
Correct word sequences                   .33 **  .30 **  .32 **  .34 **
Correct minus incorrect word sequences   .39 **  .38 **  .40 **  .42 **

* p < .05. ** p < .01.
COPYRIGHT 2008 Council for Exceptional Children

Article Details
Author: Espin, Christine; Wallace, Teri; Campbell, Heather; Lembke, Erica S.; Long, Jeffrey D.; Ticha, Renata
Publication: Exceptional Children
Article Type: Table
Geographic Code: 1USA
Date: Jan 1, 2008