Productivity, fluency, and grammaticality measures from narratives: potential indicators of language proficiency?
Keywords: language assessment; narratives; evidence-based practice; outcome measures; school age; children; language impairment
Many children with early language impairment exhibit persistent language and academic difficulties in later childhood and adolescence (Beitchman, Wilson, Brownlie, Walters, & Lancee, 1996; Catts, Fey, Tomblin, & Zhang, 2002; Naucler & Magnusson, 2002; Rescorla, 2002; Tomblin, Zhang, Buckwalter, & Catts, 2000). Given this relation between language proficiency and academic success, it is important for educators and speech-language pathologists (SLPs) to identify school-age children who are not acquiring academic language skills as readily as peers and to determine whether language-based interventions are resulting in desired changes in academic language proficiency (Ehren & Nelson, 2005; Justice, 2006; Ukrainetz, 2006).
Much of the research on language assessment of school-age children has emphasized diagnosis of language impairment (LI) from other disability or English learner profiles (e.g., Bishop, North, & Donlan, 1996; Botting & Conti-Ramsden, 2001; Redmond, 2005; Scott & Windsor, 2000). Whereas it is important to identify unique language features that differentiate children with LI from those without LI, there is also a growing need for SLPs to implement evidence-based intervention practices (Dollaghan, 2004; No Child Left Behind Act of 2001 [NCLB]). In many schools, decisions about the type and intensity of services a child receives are based on a child's rate of progress when given high-quality instruction (Catts, Hogan, & Adlof, 2005; Ehren & Nelson, 2005; L. S. Fuchs, Fuchs, & Speece, 2002; NCLB, 2001).
Previous attempts to document treatment effectiveness, such as the American Speech-Language-Hearing Association's (ASHA's) National Outcomes Measurement System (NOMS; ASHA, 1993), have addressed broad questions related to outcomes of speech and language intervention, such as "How many intervention sessions are needed?" or "Does the intervention need to be provided by an SLP, or can an assistant complete intervention services?" (Baum, 1998; Mullen, 2004). Current legislative trends (e.g., NCLB, 2001) require SLPs to justify specific intervention methods and demonstrate change in the child's performance as a result of this intervention.
Aligning with these trends, there is growing interest in the development of evidence-based intervention guidelines to inform SLPs' clinical decision making (ASHA, 2006; Dollaghan, 2004; Gillam & Gillam, 2006). However, the current research base from which to develop evidence-based practice guidelines for language intervention is limited (Cirrin & Gillam, 2008; Law, Garrett, & Nye, 2005), and individual SLPs maintain primary responsibility for documenting effectiveness of intervention by monitoring a child's progress (Cirrin & Gillam, 2008). One challenge is that few language assessments are designed for frequent, repeated use to quantitatively measure subtle changes in language skills (Ukrainetz, 2006).
An individual school-age child's progress in language intervention has traditionally been measured using norm-referenced assessments or criterion-referenced measures (ASHA, 1993; Laing & Kamhi, 2003). An alternate method to document a child's response to intervention is use of a general outcome indicator (GOI; Deno, 1985; L. S. Fuchs, 2004; L. S. Fuchs et al., 2002). GOIs are discrete measures that reflect proficiency in a broader skill area, much as a measurement of body temperature is one indicator of overall health. For example, oral reading fluency in words per minute (wpm) is used as a reflection of reading proficiency (e.g., Deno, Mirkin, & Chiang, 1982; Good & Kaminski, 2002). In the following section, we briefly describe traditional methods for documenting progress in language interventions and their limitations for measuring response to intervention. After that we describe GOIs and their role in progress monitoring and outline stages to develop GOIs for language proficiency.
Limitations of Traditional Methods for Documenting Progress
Norm-referenced measures. Norm-referenced standardized language tests are used to generate a single quantitative score of a child's language proficiency. Standardized administration and scoring procedures allow for consistent administration and comparison among children of the same age. However, standardized formal language tests are designed to generate a stable score over time, which limits their usefulness for documenting a child's progress in intervention (Deno, 2003; Marston, 1989; Pearson, Hiebert, & Kamil, 2007).
Comprehensive language sampling (e.g., Leadholm & Miller, 1992; J. F. Miller et al., 2005) offers an ecologically valid, norm-referenced method to measure a child's language proficiency (Aram, Morris, & Hall, 1993; Hewitt, Hammer, Yont, & Tomblin, 2005). Multiple discrete language measures are analyzed from typed transcripts of a child and examiner's spoken interactions during play, interview, narratives, or other elicitation tasks. Results can be compared with norms of English- and Spanish-speaking children up to age 13 (J. F. Miller et al., 2005).
Language sampling has been used to measure preschool language skills but has limitations for use with school-age children (Hadley, 1998; Hewitt et al., 2005) and as a measure of response to language intervention.
First, the recommended 50- to 100-utterance language sample (Leadholm & Miller, 1992; J. F. Miller et al., 2005) requires a significant amount of time to collect, transcribe, and analyze (Kemp & Klee, 1997). Second, procedures for eliciting language samples vary (Gazella & Stockman, 2003; Hadley, 1998). Normative information on varied elicitation procedures is limited (Justice et al., 2006), making direct comparisons between language samples difficult (Hadley, 1998). Third, of the many language measures generated, limited information is available to help SLPs determine the most relevant measures for school-age children (Hewitt et al., 2005; Justice et al., 2006). Comprehensive language sampling in its recommended form seems impractical for frequent, repeated use to measure school-age children's response to language intervention.
Criterion-referenced measures. A child's progress in intervention may be measured using criterion-referenced methods such as mastery of sequential goals or observational assessments of a child's functional language (ASHA, 1993). Both methods have limitations for frequent, repeated use as indicators of a child's response to intervention. When documenting mastery of sequential goals, the SLP assumes that mastery of each goal will result in accumulated change in the broader targeted skill (L. S. Fuchs, 2004). For example, goals targeting discrete aspects of syntactic constructions should result in generalized improvement in expressive language. Some children may master discrete goals but have difficulty applying these separate skills in a task requiring simultaneous use of multiple skills (L. S. Fuchs, 2004). In addition, some goals may take longer than others to master, resulting in different rates of progress based on the nature of the goal (L. S. Fuchs, 2004). If rate of progress is to be used to make a decision for educational placement and intensity of service, it is inappropriate to compare progress on goals that differ in attainment time.
Observational assessments can be used to document changes in functional language skills. For example, an SLP might observe the level of prompting needed for the child to perform the targeted language skills in the classroom. Observational measures offer ecological validity, but they can be time-consuming to administer; often lack standardization; and may rely on descriptive rather than quantitative information to measure intervention effectiveness (Marquardt & Gillam, 1999), limiting comparisons between observations. In sum, common assessment methods used by SLPs do not align well with the goal of documenting progress during intervention.
General Outcome Indicators
In school settings, GOIs such as Curriculum-Based Measurement (CBM; Deno, 1985; L. S. Fuchs, 2004; Shinn, 1989) or the Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002) are often used to measure a child's response to academic interventions (L. S. Fuchs, 2004; L. S. Fuchs et al., 2002). GOIs are brief, specific measurements administered repeatedly using a standardized elicitation procedure (Deno, 2003; Deno et al., 1982; L. S. Fuchs, 2004) that yield a quantitative score strongly related to overall performance in a broader skill area such as reading (see Wayman, Wallace, Wiley, Ticha, & Espin, 2007, for a recent review).
GOIs are based on a test-teach-retest process, with a baseline measure, intervention, and repeated retesting (D. Fuchs, Fuchs, Compton, Bouton, Caffrey, & Hill, 2007). For example, a child's progress in reading intervention might be measured by a GOI such as wpm read aloud correctly from a grade-level passage. An initial score, or several initial scores, on the GOI would be used as baseline data. Intervention procedures might target multiple components of reading such as phonological awareness, letter-sound correspondence, and specific word patterns. Repeated measurement using alternate forms of grade-level passages would be used to indicate growth in overall reading proficiency. GOIs used in educational settings may be useful to inform development of GOIs of language proficiency. In the following section, we define recommended stages for developing potential GOIs and outline our method for examining GOIs for language.
Developing GOIs for Language
Stages for developing GOIs. L. S. Fuchs (2004) listed three stages for development of a GOI. The first is to establish technical features of the static score. In this stage, researchers demonstrate that the measure yields reliable scores when elicited from alternate test forms (alternate form reliability) and when scored by varied examiners (interrater reliability) and that the static score is a valid indicator of the overall skill as shown by strong relation to other acceptable measures of the broader construct (criterion validity). The second stage is to establish technical features of the slope. In this stage, researchers demonstrate that scores reflect growth during relatively brief periods of time. In the third stage, researchers establish the instructional usefulness of the GOI measure. For example, changes in intervention are reflected by changes in GOI scores. In the present study, we implemented Stage 1 for developing a GOI for school-age language proficiency: establishing technical features of a static score. Below, we describe our rationale for elicitation procedures and language measures and outline specific study questions.
Structure of GOI Tasks and Measures
Elicitation method. To align with the framework of GOIs, we selected a single-picture elicitation method to generate expressive narratives using a standardized protocol. Expressive language samples from spoken narratives are recognized as ecologically valid measures of school-age children's discourse (Finestack, Fey, & Catts, 2006; Justice et al., 2006) with potential to differentiate children with and without LI (e.g., Hayes, Norris, & Flaitz, 1998; Kaderavek & Sulzby, 2000; Liles, Duffy, Merritt, & Purcell, 1995; Scott & Windsor, 2000). Single pictures were used to elicit narratives to avoid differences in narratives based on text features. Some evidence suggests that single-picture elicitation methods may be more sensitive to differences in children's verbal organizational skills than more structured narrative tasks (Luo, Timler, & Vogler-Elias, 2006). Similar elicitation methods (e.g., story starter sentence) have been used for GOIs of written language (McMaster & Espin, 2007).
Selection of language measures. For the purposes of this study, we selected discrete measures of narrative microstructure because of their potential sensitivity to subtle differences in language (Finestack et al., 2006; Justice et al., 2006; Liles et al., 1995). Researchers generally agree that measures of language productivity, grammatical skill, and verbal fluency are important developmentally as well as for children and adults with LI and English learners (Gillam & Johnston, 1992; Ling-yu & Tomblin, 2005; Loban, 1976; J. F. Miller et al., 2005, 2006; Nichols & Brookshire, 1993; Scott & Windsor, 2000; Tomblin, Freese, & Records, 1992). Drawing from these research findings, we selected potentially viable language measures of productivity, verbal fluency, and grammaticality to examine as potential GOIs. Language measures were chosen that clinicians could calculate with minimal training.
The purpose of this study was to examine the technical adequacy of specific language measures obtained from brief picture elicitation tasks and to identify language measures with potential application as GOIs of language proficiency. Specifically, three questions were addressed: First, when examining language measures of productivity, fluency, and grammaticality, which measures produce reliable scores across three narratives? Second, of the language measures showing sufficient alternate-form reliability for all grades, which measures show criterion validity with a norm-referenced, standardized measure of language proficiency? Third, of those language measures showing sufficient alternate-form reliability, which measures differentiate students by grade, thus showing potential as a growth measure?
Forty-five 5- to 9-year-old children participated in this study (mean age 7 years, 7 months) with equal numbers of participants (n = 15) from kindergarten, first-, and third-grade classrooms in a suburban, Midwestern elementary school. Participants represented a broad range of language and reading skills. Children with uncorrected hearing or vision impairment or multiple physical disabilities were excluded from the sample. See Table 1 for demographic information.
Narrative elicitation protocol. For this study, a single-picture scene containing setting and character information provided a context for an expressive narrative (Hadley, 1998; Justice et al., 2006). All pictures were black-and-white photographs or line drawings from commonly used educational and clinical materials and lent themselves to narrative generation.
Before formulating independent narratives, participants listened to the examiner tell a model narrative with major components of story grammar (e.g., setting, characters, events, problem, and resolution) from a sample picture. Then participants were asked to create a narrative about each of three different pictures. The participant was given 1 minute to look at a picture and instructed to "think of ideas for your story. Think about what happened before, what happened in the picture, and what happened afterward." After 1 minute of thinking time, the participant was asked to "tell the best story you can." The picture remained in front of the participant during the task. To encourage participants during the task, the examiners made acknowledging comments such as "oh, right, um-hm" or repeated words or phrases from the participant's narratives. When the participant completed his or her narrative as indicated by a statement (e.g., "That's all.") or lengthy pause, the examiner prompted one time with the phrase "What else?" or "And then ...?" with an expectant tone. If the participant did not add to the narrative, the examiner asked if he or she was finished and stopped timing. Timing of narratives always began on the examiner's last word (can). Elicitation directions were adapted from the School Language Sample Procedure (Hughes, McGillivray, & Schmidek, 1997) and are available from the first author.
Measures of language productivity. Five measures were used to assess language productivity: (a) total productive words (TPW)--total number of words (TNW) minus any maze words (revisions, repetitions, false starts, or filler words); (b) TNW--total number of words in the narrative, including maze words; (c) total C-units--total number of C-units in the narrative (a C-unit was defined as an independent clause and all of its modifiers; Loban, 1976); (d) total number of clauses--total number of verbs (excluding helping verbs) in each narrative (R. Miller, Gillam, & Pena, 2001); and (e) total time--total time in minutes from the last word in the examiner's narrative prompt to the last word in the participant's narrative.
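The word-count measures above can be illustrated with a small sketch. The transcript and maze-marking scheme below are hypothetical (loosely following the SALT convention of enclosing maze words in parentheses), not materials from the study.

```python
# Hypothetical sketch: counting TNW, maze words, and TPW from a coded
# utterance in which maze words are marked with parentheses.
utterance = "the boy (um) (the) the dog ran away".split()

tnw = len(utterance)                                   # all words, mazes included
maze = sum(1 for w in utterance if w.startswith("("))  # marked maze words
tpw = tnw - maze                                       # TPW = TNW minus maze words
print(tnw, maze, tpw)  # 8 2 6
```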
Measures of total talking time, total C-units, and total number of clauses increase developmentally in school-age children (Loban, 1976; J. F. Miller et al., 2005) and show diagnostic importance for children with LI (Gillam & Johnston, 1992; Scott & Windsor, 2000). Productivity was measured both with and without maze words because of the potential role of word-finding difficulties in oral language proficiency (Simon & German, 1991).
Measures of verbal fluency. Four measures of verbal fluency were calculated by dividing language productivity measures by the total time: (a) TPW per minute, (b) TNW per minute, (c) C-units per minute, and (d) clauses per minute. TNW per minute and pause time have been shown to change developmentally between the ages of 3 and 13 years (Loban, 1976; J. F. Miller et al., 2005) and correlate with age and second language proficiency in conversational and narrative language samples (J. F. Miller et al., 2006). C-units and clauses per minute were measured based on evidence that a measure of correct information units per minute showed greater specificity and sensitivity to language proficiency of adults with aphasia than straight counts (Nichols & Brookshire, 1993). TPW per minute was calculated to tap potential word-finding difficulties (Simon & German, 1991).
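The four fluency measures are simple rate calculations. The sketch below uses invented counts for a hypothetical 2-minute narrative; actual counts would come from a coded transcript and the timed recording.

```python
# Illustrative sketch of the four verbal fluency measures: each
# productivity count divided by total narrative time in minutes.

def fluency_measures(tpw, tnw, c_units, clauses, total_time_min):
    """Return the four fluency measures as counts per minute."""
    return {
        "tpw_per_min": tpw / total_time_min,
        "tnw_per_min": tnw / total_time_min,
        "c_units_per_min": c_units / total_time_min,
        "clauses_per_min": clauses / total_time_min,
    }

# Hypothetical 2-minute narrative: 120 total words, 8 of them maze words
scores = fluency_measures(tpw=112, tnw=120, c_units=14, clauses=18,
                          total_time_min=2.0)
print(scores["tpw_per_min"])  # 56.0
```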
Measures of grammaticality. Four measures of grammatical complexity and errors were calculated: (a) total grammatical errors--total number of morphosyntactic errors recognized by a native speaker of English, including omission or misuse of verb tenses, pronouns, and articles; (b) clauses per C-unit--total number of clauses divided by the total C-units; (c) C-unit length-productive--TPW divided by the total C-units; and (d) C-unit length-all--TNW divided by the number of C-units. Developmentally, school-age children use longer C-units, more clauses per C-unit, and exhibit fewer grammatical errors as their language develops (Loban, 1976). Children with LI use shorter, less developed C-units and have more grammatical errors per C-unit than peers and language-matched younger children (Gillam & Johnston, 1992; Greenhalgh & Strong, 2001; Scott & Windsor, 2000). Again, measures of C-unit length in productive words were examined to account for potential word finding difficulties (Simon & German, 1991).
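The three ratio measures of grammaticality defined above are per-C-unit averages. The counts in this sketch are invented for illustration, not data from the study.

```python
# Sketch of the three ratio measures of grammatical complexity.

def clauses_per_c_unit(clauses, c_units):
    """Mean number of clauses per C-unit (subordination index)."""
    return clauses / c_units

def c_unit_length_productive(tpw, c_units):
    """Mean C-unit length in productive words (mazes excluded)."""
    return tpw / c_units

def c_unit_length_all(tnw, c_units):
    """Mean C-unit length in all words (mazes included)."""
    return tnw / c_units

# Hypothetical narrative: 14 C-units, 18 clauses, 112 TPW, 120 TNW
print(c_unit_length_productive(112, 14))  # 8.0
```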
Oral and Written Language Scales (OWLS; Carrow-Woolfolk, 1996). The OWLS Oral Language Scale is designed to assess children's ability to use and understand spoken language. Performance is scored on the Listening Comprehension Scale (LCS) and Oral Expression Scale (OES). From these scales, an Oral Composite Score is generated. On the LCS, participants listen to a sentence or paragraph and select a picture that corresponds to the prompt. On the OES, students are shown a picture or words and then asked to answer a question, complete a sentence, or construct a one- to two-sentence response that relates to the prompt. The test manual reports interrater reliabilities for the OWLS from .91 to .98. Internal consistency reliabilities ranged from .77 to .94. Test-retest reliabilities were .66 for ages 8 to 10 years. Validity correlations were .74 with the Clinical Evaluation of Language Fundamentals--Revised (CELF-R; Semel, Wiig, & Secord, 1987) and .61 with the Wechsler Intelligence Scale for Children--Third Edition (WISC-III; Wechsler, 1991) Verbal IQ Scale. This standardized test provided a brief global indication of overall oral language level.
Participants were tested individually during one to two sessions, in a quiet room at the school, by the first author or one of four graduate students in speech-language pathology. All participants completed narrative tasks with the same pictures presented in the same order followed by the OWLS. To establish rapport during the first narrative, participants chose a narrative prompt picture from one of two line drawings. In the last two narratives, all participants told one narrative for each of two photographic pictures. Assessment sessions were audiotaped using a Sony TCM-929 cassette recorder for later transcription.
Training, reliability, and fidelity. Graduate student data collectors completed two 2-hour training sessions led by the first author in which they were trained in administration and scoring of all measures. Then data collectors independently scored additional language transcripts until agreement with the first author for C-unit divisions and language measure counts was .90 or higher. Reliability of language measures was calculated using point-by-point agreement (agreement divided by agreement plus disagreement). Interrater reliability checks were conducted on 10% of coded transcripts. All coders achieved 95% or higher agreement with the first author. The first author observed at least 50% of all assessment sessions. Formal fidelity checks for 10% of sessions indicated 97% or better adherence to a detailed protocol checklist.
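The point-by-point agreement formula stated above (agreements divided by agreements plus disagreements) can be sketched as follows; the coding labels are hypothetical examples, not the study's actual coding categories.

```python
# Point-by-point interrater agreement: the proportion of coding
# decisions on which two raters agree.

def point_by_point_agreement(coder_a, coder_b):
    """Agreements / (agreements + disagreements) over paired codes."""
    agreements = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return agreements / len(coder_a)

# Hypothetical codes from two raters for five transcript segments
rater1 = ["c-unit", "c-unit", "maze", "c-unit", "maze"]
rater2 = ["c-unit", "c-unit", "maze", "maze", "maze"]
print(point_by_point_agreement(rater1, rater2))  # 0.8
```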
Transcription and coding. Measures of language productivity, verbal fluency, and grammaticality were examined for all narrative samples. Narratives were transcribed by trained graduate student research assistants using the conventions of Systematic Analysis of Language Transcripts (SALT; J. F. Miller & Chapman, 2000). Total time for narrative samples was measured from the final word of the examiner's instructions to the final word of the participant's narrative, including in some cases significant delay at the onset of the narrative. Because examiner comments contributed minimally to the sample, total time measures included both the participant's narrative and brief examiner prompts.
Alternate-form reliability. To determine reliability of scores on 13 language measures across each of the three narratives (alternate-form reliability), Pearson product-moment correlations (Pearson's rs) were calculated. Language measures were considered to have strong reliability if they had a significant reliability coefficient (p < .01) with a magnitude of r = .70 or greater. Moderate reliability was indicated for significant coefficients (p < .01) with a magnitude of r = .50 to .69. A more stringent alpha level (p = .01) was used for statistical significance to account for potential Type I error due to multiple correlation coefficients.
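The classification rule described above can be expressed as a small decision function. The thresholds are those stated in the text; the example coefficients are invented.

```python
# Decision rule for classifying a reliability coefficient, using the
# criteria stated above: significant (p < .01) and |r| >= .70 is strong;
# significant and |r| in .50-.69 is moderate.

def classify_reliability(r, p):
    if p >= .01:
        return "not reliable"
    if abs(r) >= .70:
        return "strong"
    if abs(r) >= .50:
        return "moderate"
    return "not reliable"

print(classify_reliability(r=.84, p=.001))  # strong
print(classify_reliability(r=.55, p=.004))  # moderate
```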
Criterion validity of language measures. Pearson's r correlations were computed for language measures demonstrating strong alternate-form reliability at all individual grade levels and for the combined group to examine the relation between the mean score of each language measure and the OWLS Oral Composite Score. An alpha level of p = .05 was used to determine significance. Strength of correlation was determined using the magnitude cutoffs indicated above.
Differences between grades. To examine whether measures differentiated children by grade level, multivariate analyses of variance (MANOVAs) were completed for measures with sufficient alternate-form reliability. Post hoc comparisons using the Tukey HSD procedure were used to examine specific differences in language measures by grade.
The purpose of this study was to examine the technical adequacy of language measures obtained from brief picture elicitation tasks and to identify measures with potential application as GOIs of language proficiency. Three questions were addressed: First, which measures of productivity, fluency, and grammaticality produced reliable scores across three narratives? Second, of the language measures showing sufficient alternate-form reliability for all grades, which measures showed criterion validity with a standardized measure of language proficiency? Third, of those language measures showing sufficient alternate-form reliability, which measures differentiated students by grade, thus showing potential as a measure of language proficiency growth? Table 2 shows means and standard deviations for all measures by grade and total sample.
To examine the possibility of differences in language measures due to order or picture effects, mean language scores for each picture were submitted to a MANOVA. MANOVA results for total group means, Λ(1, 24) = 0.84, p = .69, and for each grade level (kindergarten: Λ(1, 26) = 1.20, p = .34; first grade: Λ(1, 26) = 0.67, p = .22; third grade: Λ(1, 26) = 1.06, p = .31) were nonsignificant, indicating that mean language measures did not differ significantly by order or picture.
Alternate-form reliability was examined using Pearson's r correlation coefficients and the criteria described previously to evaluate strength of correlation. Correlations were examined for all children combined and for each grade level. When the total group results were examined, 10 language measures (TPW, TNW, total C-units, total clauses, total time, TPW per minute, TNW per minute, C-units per minute, clauses per minute, and total grammatical errors) produced strong correlations ranging from .70 to .94 (p < .01), indicating that these measures were reliably obtained across three separate narrative samples with identical elicitation scripts.
When reliability was examined by grade level, two measures of verbal fluency, TPW per minute and TNW per minute, had moderately strong to strong correlations (r = .67 to .90, p < .01) at all grade levels. Notably, measures of productivity (TPW, TNW, total C-units, total clauses) produced significant, positive, moderately strong correlations for third-grade students only. Total grammatical errors produced significant, positive, moderately strong correlations for kindergarten students only. Reliability coefficients are listed for each grade and the total sample in Table 3.
Criterion Validity of Language Measures
To examine criterion validity of the two measures of verbal fluency meeting alternate-form reliability standards, Pearson product-moment correlations were computed between the OWLS Oral Composite Score and each measure of verbal fluency. TPW per minute (r = .65, p < .001) and TNW per minute (r = .64, p < .001) yielded moderate correlations with OWLS composite scores when total group results were examined. For individual grade levels, correlations were weak and nonsignificant. Correlation coefficients are listed in Table 4.
Differentiation by Grade
To determine whether TPW per minute and TNW per minute were significantly different for students by grade level, a MANOVA was computed. Results indicated significant differences between grade levels, Λ(1, 10) = 9.39, p < .001, and post hoc comparisons using the Tukey HSD procedure indicated that third-grade TPW per minute (p < .001) and TNW per minute (p < .001) differed significantly from both kindergarten and first-grade scores. Kindergarten and first-grade measures of TPW per minute (p = .57) and TNW per minute (p = .68) were not significantly different.
Summary of Results
In this study, 10 measures of language productivity, fluency, and grammaticality showed strong reliability across three narrative samples for kindergarten, first-, and third-grade students when the total sample was examined. Two verbal fluency measures, TPW per minute and TNW per minute, were reliably obtained at each grade level and from the total sample. TPW per minute and TNW per minute demonstrated moderate criterion validity when the total sample was examined. Statistical analysis of grade-level differences indicated significant differences in TPW per minute and TNW per minute for children in third grade compared with children in kindergarten and first grade.
In school settings, SLPs are expected to show evidence that language interventions are effective (Ehren & Nelson, 2005; NCLB, 2001). There is a lack of brief, quantitative, standardized methods to measure language proficiency (Ukrainetz, 2006). In other fields, GOIs have been used to predict intervention outcomes, reflect an individual's response to intervention in a specific area, and signal need for instructional changes (L. S. Fuchs, 2004). The purpose of this study was to examine technical adequacy of specific language measures obtained from brief picture elicitation tasks and to identify language measures with potential application as GOIs of language proficiency. The results of this study suggest that measures of verbal fluency can be reliably obtained for kindergarten, first-grade, and third-grade children from brief narrative language samples elicited using a single picture scene and highly standardized protocol and that these measures distinguish children in third grade from children in kindergarten and first grade.
Consistent with previous research (Leadholm & Miller, 1992; Loban, 1976; J. F. Miller et al., 2005), our findings show a trend of increasing productivity and verbal fluency for students in each grade. However, only verbal fluency measures produced reliable scores at all grades and differentiated third-graders from kindergartners and first-graders. This may be the result of overlap in the chronological age of the kindergarten and first-grade groups or may suggest that differences in verbal fluency are more discriminating as children progress in school. Although TNW per minute calculations were reliably obtained in this study, mean scores were slightly lower than wpm findings in previous research (J. F. Miller et al., 2005, 2006; Redmond, 2005). These results likely reflect differences in the length of the language samples, differences in elicitation methods, or the fact that initial formulation time was included in verbal fluency calculations for the present study.
Productivity measures were reliably obtained only for third-grade students. It may be that the shorter language samples generated by younger children in this study were too brief to reliably examine productivity. Total grammatical errors, which have also been identified as important developmentally (Leadholm & Miller, 1992; Loban, 1976; J. F. Miller et al., 2005) and for school-age children with LI (Gillam & Johnston, 1992; Scott & Windsor, 2000), were reliably measured only for kindergartners. Perhaps intermittent grammatical errors of some older children would be more evident in written samples (Scott & Windsor, 2000), whereas younger children's developing mastery of grammatical forms was more evident in brief oral language samples.
For the purposes of this study, criterion validity was explored in a limited way, with modest findings. These modest findings raise questions about whether verbal fluency measures from brief samples may be considered valid indicators of language proficiency. It may be that they reflect differences in how language proficiency was sampled in the open-ended narrative procedure compared with the more highly structured OWLS assessment, which required brief word or sentence responses. To further explore criterion validity, it will be important to examine relations between the brief verbal fluency measures from this study and more comprehensive measures that incorporate multiple dimensions of language proficiency, such as the Index of Narrative Microstructure proposed by Justice et al. (2006).
Limitations and Implications for Research
Limitations are present in this research. The relatively small sample size (N = 45) of this study and the heterogeneous population may have influenced the magnitude of correlation coefficients. Alternate-form reliability was established through three narratives elicited on the same day and presented in the same order. The first narrative was elicited with a picture choice condition and the second and third narratives with obligatory picture prompts. Alternate-form reliability appeared strong in spite of these differences, but additional research could examine whether the choice condition or order of picture presentation affects language sample characteristics. Additional aspects of reliability such as test-retest need to be examined, as well as details of the elicitation procedure such as the level of standardization required to collect reliable measures and the importance of initial formulation time in calculations of verbal fluency.
This study explored the feasibility of collecting reliable language measures from a brief narrative language sample and the preliminary criterion validity of language measures demonstrating alternate-form reliability. Analysis of expressive narratives has been proposed as an ecologically valid assessment method sensitive to differences in children's language proficiency (e.g., Justice et al., 2006; Liles et al., 1995; Pena & Gillam, 2000; Scott & Windsor, 2000). When developing a GOI, one challenge is to determine salient outcome variables to measure. Findings from this study suggest that measures of verbal fluency at least partially meet Stage 1 criteria (L. S. Fuchs, 2004), showing promise as a GOI of language proficiency. However, if verbal fluency measures are to be considered for application as GOIs, there must be evidence of criterion validity with "gold standard" language measures as well as examination of Stage 2 and Stage 3 criteria: sensitivity to subtle growth in language skills and sensitivity to intervention changes. Despite limitations and remaining questions, the findings of this study support further examination of verbal fluency measures as brief indicators of language proficiency.
Authors' Note: Please address correspondence to Janet Tilstra or Kristen McMaster, University of Minnesota, Educational Psychology, 350A Educational Sciences Building, 56 East River Road, Minneapolis, MN 55455; e-mail: firstname.lastname@example.org.
American Speech-Language-Hearing Association (ASHA). (1993). National outcomes measurement system. Retrieved June 28, 2007, from http://www.asha.org
ASHA. (2006). Introduction to evidence-based practice. Retrieved July 20, 2007, from http://www.asha.org/members/EBP/intro.htm
Aram, D., Morris, R., & Hall, N. (1993). Clinical and research congruence in identifying children with specific language impairment. Journal of Speech and Hearing Research, 36, 580-591.
Baum, H. M. (1998). Overview, definitions, and goals for ASHA's treatment outcomes and clinical trials activities (What difference do outcome data make to you?). Language, Speech, and Hearing Services in Schools, 29, 246-249.
Beitchman, J. H., Wilson, B., Brownlie, E. B., Walters, H., & Lancee, W. (1996). Long-term consistency in speech/language profiles: Developmental and academic outcomes. Child and Adolescent Psychiatry, 35(6), 804-814.
Bishop, D. V. M., North, T., & Donlan, C. (1996). Non-word repetition as a behavioral marker for inherited LI: Evidence from a twin study. Journal of Child Psychology and Psychiatry, 37, 391-403.
Botting, N., & Conti-Ramsden, G. (2001). Non-word repetition and language development in children with specific language impairment (SLI). International Journal of Language and Communication Disorders, 36(4), 421-432.
Carrow-Woolfolk, E. (1996). Oral and written language scales: Listening comprehension and oral expression scales. Circle Pines, MN: AGS.
Catts, H. W., Fey, M. E., Tomblin, J. B., & Zhang, X. (2002). A longitudinal investigation of reading outcomes in children with language impairments. Journal of Speech, Language, and Hearing Research, 45, 1142-1157.
Catts, H. W., Hogan, T. P., & Adlof, S. M. (2005). Developmental changes in reading and reading disabilities. In H. W. Catts & A. G. Kamhi (Eds.), The connections between language and reading disabilities. Mahwah, NJ: Lawrence Erlbaum Associates.
Cirrin, F. M., & Gillam, R. B. (2008). Review of evidence-based practices for school-age children with spoken language disorders: A systematic review. Language, Speech, and Hearing Services in Schools, 39(1), S110-S137.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 16(2), 99-104.
Deno, S. L. (2003). Developments in curriculum based measurement. Journal of Special Education, 37(3), 184-192.
Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49(1), 36-45.
Dollaghan, C. A. (2004). Evidence-based practice in communication disorders: What do we know, and when do we know it? Journal of Communication Disorders, 37, 391-400.
Ehren, B. J., & Nelson, N. W. (2005). The responsiveness to intervention approach and language impairment. Topics in Language Disorders, 25(2), 120-131.
Finestack, L. H., Fey, M. E., & Catts, H. W. (2006). Pronominal reference skills of second and fourth grade children with language impairment. Journal of Communication Disorders, 39, 232-248.
Fuchs, D., Fuchs, L. S., Compton, D., Bouton, B., Caffrey, E., & Hill, L. (2007). Dynamic assessment as responsiveness to intervention: A scripted protocol to identify young at-risk readers. Teaching Exceptional Children, 39(5), 58-63.
Fuchs, L. S. (2004). The past, present, and future of curriculum-based measurement research. School Psychology Review, 33(2), 188-192.
Fuchs, L. S., Fuchs, D., & Speece, D. (2002). Treatment validity as a unifying construct for identifying learning disabilities. Learning Disability Quarterly, 25(1), 33-46.
Gazella, J., & Stockman, I. J. (2003). Children's narrative retelling under different modality and task conditions: Implications for standardizing language sampling procedures. American Journal of Speech-Language Pathology, 12, 61-72.
Gillam, R. B., & Gillam, S. L. (2006). Making evidence-based decisions about child language intervention in schools. Language, Speech, and Hearing Services in Schools, 37, 304-315.
Gillam, R. B., & Johnston, J. R. (1992). Spoken and written language relationships in language/learning-impaired and normally achieving school-age children. Journal of Speech and Hearing Research, 35, 1303-1315.
Good, R. H., & Kaminski, R. A. (2002). DIBELS oral reading fluency passages for first through third grades (Technical Report 10). Eugene: University of Oregon.
Greenhalgh, K. S., & Strong, C. J. (2001). Literate language features in spoken narratives of children with typical language and children with language impairments. Language, Speech and Hearing Services in Schools, 32, 114-125.
Hadley, P. A. (1998). Language sampling protocols for eliciting text-level discourse. Language, Speech, and Hearing Services in Schools, 29, 132-147.
Hayes, P. A., Norris, J., & Flaitz, J. R. (1998). A comparison of oral narrative abilities of underachieving and high-achieving gifted adolescents: A preliminary investigation. Language, Speech, and Hearing Services in Schools, 29, 158-171.
Hewitt, L. E., Hammer, C. S., Yont, K. M., & Tomblin, J. B. (2005). Language sampling for kindergarten children with and without SLI: Mean length of utterance, IPSYN, and NDW. Journal of Communication Disorders, 38, 197-213.
Hughes, D., McGillivray, L., & Schmidek, M. (1997). Guide to narrative language: Procedures for assessment. Eau Claire, WI: Thinking Publications.
Justice, L. M. (2006). Evidence-based practice, response to intervention, and the prevention of reading difficulties. Language, Speech, and Hearing Services in Schools, 37, 284-297.
Justice, L. M., Bowles, R. P., Kaderavek, J. N., Ukrainetz, T. A., Eisenberg, S. L., & Gillam, R. B. (2006). The index of narrative microstructure: A clinical tool for analyzing school-age children's narrative performances. American Journal of Speech-Language Pathology, 15, 177-191.
Kaderavek, J. N., & Sulzby, E. (2000). Narrative production by children with and without specific language impairment: Oral narratives and emergent readings. Journal of Speech, Language, and Hearing Research, 43, 34-49.
Kemp, K., & Klee, T. (1997). Clinical language sampling practices: Results of a survey of speech-language pathologists in the United States. Child Language Teaching and Therapy, 13(2), 161-176.
Laing, S. P., & Kamhi, A. (2003). Alternative assessment of language and literacy in culturally and linguistically diverse populations. Language, Speech, and Hearing Services in Schools, 34, 44-55.
Law, J., Garrett, Z., & Nye, C. (2005). Speech and language therapy: Interventions for children with primary speech and language delay or disorder. Cochrane Database of Systematic Reviews 3, No. CD004110. Available from http://www.cochrane.org/reviews/en/ab004110.html
Leadholm, B. J., & Miller, J. F. (1992). Language sample analysis: The Wisconsin guide. Madison: Wisconsin Department of Public Instruction.
Liles, B. Z., Duffy, R. J., Merritt, D. D., & Purcell, S. L. (1995). Measurement of narrative discourse ability in children with language disorders. Journal of Speech and Hearing Research, 28, 123-133.
Ling-yu, G., & Tomblin, J. B. (2005, June). Pauses in narratives of English-speaking children with specific language impairment. Poster session presented at the Symposium on Research in Child Language Disorders, Madison, WI.
Loban, W. D. (1976). Language development: Kindergarten through grade twelve. Urbana, IL: National Council of Teachers of English.
Luo, F., Timler, G., & Vogler-Elias, D. (2006, November). Narrative skills of school age children with language impairment with and without attention deficit and hyperactivity disorder. Paper presented at the 2006 annual convention of the American Speech, Language, and Hearing Association, Miami, FL.
Marquardt, T. P., & Gillam, R. B. (1999). Assessment in communication disorders: Some observations on current issues. Language Testing, 16(3), 249-269.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford.
McMaster, K., & Espin, C. (2007). Technical features of curriculum-based measurement in writing: A literature review. Journal of Special Education, 41(2), 68-84.
Miller, J. F., & Chapman, R. S. (2000). Systematic analysis of language transcripts (Version 6.1a) [Computer software]. Madison: University of Wisconsin, Waisman Research Center, Language Analysis Laboratory.
Miller, J. F., Heilmann, J., Nockerts, A., Iglesias, A., Fabiano, L., & Francis, D. J. (2006). Oral language and reading in bilingual children. Learning Disabilities Research & Practice, 21, 30-43.
Miller, J. F., Long, S., McKinley, N., Thormann, S., Jones, M., & Nockerts, A. (2005). Language sample analysis II: The Wisconsin guide. Madison: Wisconsin Department of Public Instruction.
Miller, R., Gillam, R. B., & Pena, E. D. (2001). Dynamic assessment and intervention: Improving children's narrative abilities. Austin, TX: Pro-Ed.
Mullen, R. (2004). Evidence for whom? ASHA's national outcomes measurement system. Journal of Communication Disorders, 37, 413-417.
Naucler, K., & Magnusson, E. (2002). How do preschool language problems affect language abilities in adolescence? In F. Windsor, L. Kelly, & N. Hewlett (Eds.), Investigations in clinical phonetics and linguistics. Mahwah, NJ: Lawrence Erlbaum.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness and fluency of the connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36(2), 338-350.
No Child Left Behind Act of 2001. Pub. L. No. 107-110, 115 Stat. 1425 (2001).
Pearson, P. D., Hiebert, E. H., & Kamil, M. L. (2007). Theory and research into practice in vocabulary assessment: What we know and what we need to learn. Reading Research Quarterly, 42(2), 282-296.
Pena, E. D., & Gillam, R. B. (2000). Dynamic assessment of children referred for speech and language evaluations. Dynamic Assessment: Prevailing Models and Applications, 6, 543-575.
Redmond, S. (2005). Differentiating SLI from ADHD using children's sentence recall and production of past tense morphology. Clinical Linguistics & Phonetics, 19(2), 109-27.
Rescorla, L. (2002). Language and reading outcomes to age 9 in late-talking toddlers. Journal of Speech, Language, and Hearing Research, 45(2), 360-371.
Scott, C. M., & Windsor, J. (2000). General language performance measures in spoken and written narrative and expository discourse of school-age children with language learning disabilities. Journal of Speech, Language, and Hearing Research, 43, 324-339.
Semel, E., Wiig, E. H., & Secord, W. (1987). Clinical Evaluation of Language Fundamentals-Revised. San Antonio, TX: Psychological Corporation.
Shinn, M. (1989). Curriculum-based measurement: Assessing special children. New York: Guilford.
Simon, E., & German, D. J. (1991). Analysis of children's word-finding skills in discourse. Journal of Speech and Hearing Research, 34, 309-316.
Tomblin, J. B., Freese, P. R., & Records, N. L. (1992). Diagnosing specific language impairment in adults for the purpose of pedigree analysis. Journal of Speech and Hearing Research, 35, 832-843.
Tomblin, J. B., Zhang, X., Buckwalter, P., & Catts, H. (2000). The association of reading disability, behavioral disorders, and language impairment among second-grade children. Journal of Child Psychology and Psychiatry and Allied Disciplines, 41(4), 473-482.
Ukrainetz, T. A. (2006). The implications of RTI and EBP for SLPs: Commentary on L.M. Justice. Language, Speech, and Hearing Services in Schools, 37, 298-303.
Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41(2), 85-120.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children: III. San Antonio, TX: Psychological Corporation.
University of Minnesota, Minneapolis
Janet Tilstra, PhD, is a certified speech-language pathologist and recently completed her doctoral degree in educational psychology with emphasis in special education. Her research interests include reading comprehension and methods for measuring school-age children's response to language and reading interventions.
Kristen McMaster, PhD, is an assistant professor of special education at the University of Minnesota. Her research interests include the development of assessments and interventions to promote student response to classroom-based and individualized instruction.
Table 1
Demographic Information for Study Participants

Participant Characteristic          Kindergarten   Grade 1       Grade 3       Total
                                    (n = 15)       (n = 15)      (n = 15)      (N = 45)
Age in years: Mean (SD)             6.29 (0.55)    7.35 (0.46)   9.12 (0.38)   7.59 (1.26)
Female (%)                          33             53            53            47
Race (%)
  Caucasian                         73             100           93            89
  African American                  20             0             7             9
  Other                             7              0             0             2
Special services (%) (a)            13             20            27            20
ESL (%) (b)                         7              0             0             2

a. Receives special education services for reading, speech, or language.
b. English as a second language: speaks a language other than English at home.

Table 2
Mean Language Scores by Grade, M (SD)

Language Measure                    Kindergarten   Grade 1        Grade 3         Total
                                    (n = 15)       (n = 15)       (n = 15)        (N = 45)
Productivity
  TPW                               57.64 (24.92)  59.60 (27.47)  104.78 (54.98)  74.01 (43.41)
  TNW                               71.89 (28.76)  71.91 (31.54)  130.64 (78.25)  91.48 (57.55)
  Total C-units                     9.31 (3.02)    9.89 (4.42)    14.31 (7.81)    11.17 (5.80)
  Total clauses                     11.64 (4.78)   12.36 (5.97)   20.36 (10.25)   14.79 (8.25)
  Total time (minutes)              1.54 (0.75)    1.22 (0.59)    1.17 (0.81)     1.31 (0.73)
Fluency
  TPW per minute                    46.52 (24.13)  56.62 (26.97)  98.29 (29.47)   67.14 (34.74)
  TNW per minute                    58.60 (29.01)  68.14 (30.54)  118.83 (32.67)  81.86 (40.23)
  C-units per minute                7.78 (3.25)    9.55 (4.47)    13.22 (3.64)    10.19 (4.38)
  Clauses per minute                9.70 (4.76)    11.52 (5.59)   19.27 (5.71)    13.50 (6.71)
Grammaticality
  Total grammatical errors          3.76 (6.96)    1.13 (0.93)    0.87 (0.99)     1.92 (4.21)
  Clauses per C-unit                1.22 (0.20)    1.25 (0.20)    1.46 (0.17)     1.31 (0.22)
  C-unit length-productive words    6.00 (1.33)    6.33 (1.30)    7.45 (1.05)     6.60 (1.36)
  C-unit length-all words           7.65 (1.32)    7.60 (1.61)    9.07 (1.11)     8.11 (1.50)

Note: TPW = total productive words; TNW = total number of words.
Table 3
Alternate-Form Reliability (r) of Language Measures Across Three Narratives

Language Measure                    Kindergarten  First Grade  Third Grade  Total
Productivity
  TPW
    Narratives 1 & 2                .38           .62*         .92**        .81***
    Narratives 1 & 3                .25           .68**        .85**        .76***
    Narratives 2 & 3                .48           .72**        .89**        .76***
  TNW
    Narratives 1 & 2                .29           .59*         .95**        .83***
    Narratives 1 & 3                .30           .61*         .88**        .79***
    Narratives 2 & 3                .48           .64*         .89**        .81***
  Total C-units
    Narratives 1 & 2                .30           .36          .89**        .74***
    Narratives 1 & 3                .33           .51          .82**        .66***
    Narratives 2 & 3                .67**         .47          .89**        .74***
  Total clauses
    Narratives 1 & 2                .32           .11          .92          .70***
    Narratives 1 & 3                .43           .40          .87          .70***
    Narratives 2 & 3                .69**         .64**        .86**        .75***
  Total time
    Narratives 1 & 2                .70**         .62*         .98**        .79***
    Narratives 1 & 3                .24           .43          .80**        .50**
    Narratives 2 & 3                .49           .82**        .82**        .68***
Fluency
  TPW per minute
    Narratives 1 & 2                .58*          .85***       .88***       .86***
    Narratives 1 & 3                .73**         .77**        .77***       .85***
    Narratives 2 & 3                .80**         .90***       .67**        .86***
  TNW per minute
    Narratives 1 & 2                .61*          .85***       .88***       .87***
    Narratives 1 & 3                .74**         .70**        .84***       .85***
    Narratives 2 & 3                .78**         .66**        .76**        .88***
  C-units per minute
    Narratives 1 & 2                .72**         .63*         .72**        .76***
    Narratives 1 & 3                .27           .70**        .64*         .65***
    Narratives 2 & 3                .51           .66**        .55*         .66***
  Clauses per minute
    Narratives 1 & 2                .66*          .71**        .81***       .83***
    Narratives 1 & 3                .61*          .72**        .73**        .79***
    Narratives 2 & 3                .77**         .83***       .65**        .82***
Grammaticality
  Total grammatical errors
    Narratives 1 & 2                .90**         .40          .23          .86**
    Narratives 1 & 3                .97**         .12          .65**        .94**
    Narratives 2 & 3                .92**         -.14         .21          .85***
  Clauses per C-unit
    Narratives 1 & 2                .07           -.13         .21          .18
    Narratives 1 & 3                .25           .33          -.01         .43**
    Narratives 2 & 3                .27           .03          .44          .26
  C-unit length-productive words
    Narratives 1 & 2                .18           .01          .30          .22
    Narratives 1 & 3                .30           .27          .36          .33
    Narratives 2 & 3                .10           .39          .63          .38
  C-unit length-all words
    Narratives 1 & 2                .07           .25          .34          .19
    Narratives 1 & 3                .33           .42          .35          .36*
    Narratives 2 & 3                .21           .29          .47          .39**

Note: TPW = total productive words; TNW = total number of words.
* p < .05. ** p < .01. *** p < .001.

Table 4
Bivariate Correlations (Pearson's r) Between Language Fluency Measures and OWLS Composite Scores

Language Fluency Measure            Kindergarten  First Grade  Third Grade  Total
TPW per minute                      .36           .31          .18          .65***
TNW per minute                      .31           .28          .15          .64***

Note: OWLS = Oral and Written Language Scales; TPW = total productive words; TNW = total number of words.
*** p < .001.
Authors: Janet Tilstra and Kristen McMaster
Publication: Communication Disorders Quarterly
Date: September 22, 2007