Response to "Dynamic Indicators of Basic Early Literacy Skills (DIBELS)" by Kamii and Manning.
Keywords: reading assessment, DIBELS, reading readiness, reading disability
The Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002) is used as a screening tool by more than 15,000 school districts, according to the official DIBELS site (DIBELS Data Systems, 2012). As many as 1.5 million students identified as at risk of not reading at grade level took part in the Reading First program, in which DIBELS was a widely used assessment tool (Roehrig, Petscher, Nettles, Hudson, & Torgesen, 2008). Many research articles affirm the value of DIBELS as a tool for evaluating the effectiveness of a variety of programs, including programs for improving literacy skills (Coyne & Ham, 2006; Fien et al., 2010) and for early screening for educational difficulties (Elliott, Huai, & Roach, 2007).
In their article "Dynamic Indicators of Basic Early Literacy Skills (DIBELS): A Tool for Evaluating Student Learning?" Kamii and Manning (2005) claimed that "the DIBELS is based on an outdated scientific theory and the evidence presented in this study does not justify its use for the evaluation of an instructional program" (p. 90). This finding also was cited in Manning, Kamii, and Kato's chapter in Kenneth Goodman's anti-DIBELS book Examining DIBELS: What It Is, What It Does (Manning, Kamii, & Kato, 2006). Although the limited scope of our response does not address their claim that DIBELS is based on an outdated scientific theory, we requested the opportunity to review their research study. Stacey Neuharth-Pritchett, editor of the journal at the time of the article's publication, provided the "ground rules," notifying us that we could write a critique article. She told us that, after receiving our manuscript, Kamii and Manning could write a response to our critique.
RESPONSE BY ARTICLE'S AUTHORS
After reading this article, we questioned Kamii and Manning's conclusions and many aspects of their study but decided to ask the authors for only two pieces of information: (1) their rationale for using a Somers' d statistical analysis and (2) a copy of their data. Dr. Kamii did not respond. Dr. Manning's e-mail response was,
I'm sorry, but there is no way that I have time or Dr. Kamii has time to get you everything that you asked concerning the study. We want to cooperate with you and will to the extent that we can. We are both in the middle of big writing projects and teaching full loads so going back to that study just isn't possible.
Her refusal to provide their data does not comply with the American Psychological Association (APA, 2001) publication requirements in effect at the time of publication:
To permit interested readers to verify the statistical analysis, an author should retain the raw data after publication of the research. Authors of manuscripts accepted for publication in APA journals are required to have available their raw data through the editorial review process and for at least 5 years after the date of publication (section 8.05 includes a discussion about sharing data in the subsection on data verification). (p. 137)
Kamii and Manning's response put us in a difficult situation. They refused to provide information that would assist us in our critical analysis, and yet they will be able to criticize our response. (1)
THEIR PURPOSE OF THE STUDY AND LITERATURE REVIEW SECTION OF ARTICLE
Essentially, this is a concurrent validity study involving DIBELS and a writing task developed by Kamii and Manning as predictors of reading achievement (based on scores from the Slosson Oral Reading Test [SORT]). Finding an acceptable reading measure was important in the Reading First era. DIBELS was one of the 24 reading measures selected by the Reading First Assessment Committee (Kame'enui et al., 2006), but the authors' writing task was not one of them. The authors said nothing about the importance of their study other than their opinion that DIBELS is a flawed reading test.
The beginning of the article should contain a review of the research literature on early reading assessment for prediction. Instead, the article starts with two paragraphs describing DIBELS without any citations of validity and reliability studies (e.g., Roehrig et al., 2008; Rouse & Fantuzzo, 2006; Schilling, Carlisle, Scott, & Zeng, 2007). After providing the names of DIBELS subtests, the authors stated, "Many teachers doubt the validity of the DIBELS, especially of the first two subtests" (p. 75). They are referring to the DIBELS Nonsense Word Fluency and DIBELS Letter Naming Fluency subtests. Unfortunately, the authors cited no research to support their statement about teacher doubts. Given this unsubstantiated comment about the DIBELS Letter Naming Fluency subtest, it seems odd that Kamii and Manning excluded any analysis of that subtest in their study. (For a review of reliability and validity evidence of the DIBELS, see Goffreda and DiPerna, 2010, which provides multiple examples of studies showing high reliability and validity.)
The rest of their literature review section was spent describing Emilia Ferreiro's theory of assessing young children's reading and writing development. The authors stated, "For the past fifteen years, researchers have been trying to develop ways of assessing young children's development in early reading and writing by following the work of Emilia Ferreiro" (p. 75), but they do not say who those researchers are or provide a citation. Then, Ferreiro's three-level system is explained without any research evidence about its validity or reliability.
At the end of this section, we expected that a research question might be stated along these lines: "How well do the DIBELS Phoneme Segmentation Fluency, Nonsense Word Fluency, and Oral Reading Fluency subtests and the Kamii/Manning Writing Task predict reading achievement based on Slosson Oral Reading Test (SORT) scores?" In other words, this is a concurrent validity study with four predictor variables: (1) DIBELS Phoneme Segmentation Fluency, (2) DIBELS Nonsense Word Fluency, (3) DIBELS Oral Reading Fluency, and (4) the Kamii/Manning Writing Task--and one criterion variable--the SORT. Actually, because they stated they were "analyzing the relationships between scores on each DIBELS subtest" in the Abstract section of their study, we were expecting that all five DIBELS subtests would be involved. Instead, Kamii and Manning stated, "The present study was conducted to investigate the value of two of the DIBELS subtests for the evaluation of an instructional program--Phoneme Segmentation Fluency (PSF) and Nonsense Word Fluency (NWF)" (p. 78). Phoneme Segmentation Fluency and Nonsense Word Fluency are not instructional programs; they are tests that can be used to evaluate instructional programs.
Later in the paragraph, they stated, "Moreover, this study was designed to see if the PSF also correlated with current achievement in reading and writing" (p. 78). Taken together with the first sentence, it appears that there are two predictor variables (DIBELS PSF and NWF) and two criterion variables (the Kamii/Manning Writing Task and the SORT). The DIBELS Oral Reading Fluency subtest is not mentioned in this purpose statement, even though its scores were collected. Its omission from the stated design is surprising, given the research support for the predictive relationship between oral reading fluency and reading comprehension (e.g., Petscher, Kim, & Petscher, 2011; Schilling et al., 2007). In addition, although the DIBELS Letter Naming Fluency is usually administered, it was not included in this study, for unknown reasons.
THEIR SAMPLE SECTION
The APA style guide (2001) stresses the need to provide sufficient information about participant characteristics to permit replication of studies. Kamii and Manning's sample came from two geographical locations. The 107 kindergartners, from six classrooms, were from a middle- to upper-middle-class suburb, and the 101 1st-graders, from seven classrooms, came from a rural town. Income-level information was not provided for the 1st-graders, and the authors did not give free/reduced-price lunch information with which to evaluate the income level of either group. The authors provided ethnicity information for both groups but did not explain why they selected students from different locations. This section fails to provide sufficient information to meet APA standards.
THEIR MEASUREMENT SECTION
This is one of the most important sections in a study involving concurrent validity of tests/tasks. The tests involved should have high reliability (consistency of scores) and strong validity (each test measures what it is supposed to measure). For example, if you wanted to conduct a study for predicting intelligence, the criterion measure would probably be the Wechsler Intelligence Scale for Children III (WISC-III), because it is the most popular IQ test. If you decided to test the same students using the Stanford-Binet III IQ Test (the second-most popular IQ test) as one of the predictor variables, you would find high correlations between those two tests, because both of these well-established tests have good reliability and validity. As another predictor measure, you could decide to use one of the "Learn Your IQ" books sometimes located by grocery checkout lines. Usually, those "tests" involve timed mazes and other tasks. It is doubtful that scores on do-it-yourself IQ books are highly correlated with WISC-III scores. Why? It is doubtful that the do-it-yourself IQ test is reliable or valid. The point behind this example is that a correlation between tests is only as good as the individual tests involved. A low correlation (e.g., .15) could mean that a predictor variable is a poor predictor of a criterion variable. It also could mean that one or both of the tests have poor reliability or validity, which depresses the correlation. Importantly, a low correlation by itself cannot indicate that a particular test is flawed; it merely establishes that there is an unspecified problem.
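The point above follows from classical test theory: the correlation actually observed between two measures is the true-score correlation attenuated by the square root of each measure's reliability. The short Python sketch below is illustrative only; the reliability and correlation values are hypothetical and are not taken from any of the tests discussed here.

```python
import math

def observed_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Classical test theory attenuation: the observed correlation equals
    the true-score correlation shrunk by the square root of each
    measure's reliability."""
    return true_r * math.sqrt(rel_x * rel_y)

# Two well-established tests (hypothetical reliabilities of .95 each):
print(round(observed_correlation(0.80, 0.95, 0.95), 2))  # 0.76

# Same true relationship, but one unreliable checkout-line "test" (.30):
print(round(observed_correlation(0.80, 0.95, 0.30), 2))  # 0.43
```

The second correlation is low even though the underlying relationship is identical, which is exactly why a low correlation alone cannot tell you which instrument is at fault.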
That said, the authors have no measurement section in this study. Instead, they put partial information about three instruments (DIBELS, the Writing Task, and the SORT) in the Procedures section. The following is what they said about DIBELS:
As part of the Alabama Reading Initiative, all the children, both in kindergarten and 1 st grade, were given the DIBELS in January according to the standardized instructions. Each child's percentile rank was available for each subtest. All of the participants' scores on Phoneme Segmentation Fluency (PSF) and Nonsense Word Fluency (NWF) were used. In addition, the DIBELS Oral Reading Fluency scores that were available in 1st grade were used. (pp. 80-81)
The authors mentioned that DIBELS was given in January "according to the standardized instructions" (p. 81), but they do not explain who tested the students at either school, where they were tested, how long the testing took, or whether there were inter-rater reliability checks. Most importantly, there is no mention of why all DIBELS subtests were not included in the statistical analysis in their study.
The authors provide no information about DIBELS reliability (e.g., test-retest, internal consistency, equivalent forms, inter-rater reliability) or validity (e.g., construct validity and other concurrent validity studies). Given that this is a concurrent validity study, it seems odd that the authors do not include information on these topics. This information was available in the Research section of the DIBELS Data Systems website (2012). An example of the available research studies and technical reports is Hintze, Ryan, and Stoner (2003).
The authors provided more information about how their writing task was administered, and they noted that writing tasks were different from those used in their earlier study (Kamii & Manning, 1999). However, they provided no reliability or validity information about either version of their writing task and, more specifically, any concurrent validity studies comparing their writing task to SORT or other standardized reading tests. The authors did not conduct any fidelity checks involving inter-rater reliability to ensure that two independent reviewers rated the writing samples in the same way. This appears to be a fatal flaw in this study, especially given the unequal distribution of writing task scores (as found later in this study).
The authors then briefly describe the SORT criterion test. They report that the SORT was given the same day as the writing task. Each student read a list of words to the examiner, a graduate student who marked the answer as "correct," "incorrect," or "don't know." The testing was stopped when the student missed five words. The authors' selection of the SORT as the criterion variable for the DIBELS subtests presents many problems. First, why did they use a test that was created in 1963? Second, they did not provide any reliability or validity information about the SORT. Third, the SORT is their operational definition of reading for this study, and it is merely word calling--with only 20 words per grade level--and not a comprehensive reading test.
For the Ferreiro-based writing test, young children who do not yet know how to write are asked to write sentences. It is not clear how this procedure is standardized in terms of encouraging, explaining, or assisting the children, as they are asked to do an unfamiliar task.
THEIR RESULTS SECTION
Normally, descriptive statistics are presented first. This information was missing from the DIBELS and SORT testing, and there was no mention of why the DIBELS Initial Sound Fluency and Nonsense Word Fluency results for kindergartners were not reported. Without this information, and because the authors ran a Z transformation on the DIBELS and SORT scores, it is difficult to interpret their results. However, there is a red flag with respect to the SORT results. On page 86, the authors mention that the kindergarten student with the highest score was able to read 74 words after only 4 months of kindergarten. After reviewing Figures 9 and 10, it appears that one student has a similar score and another student has a close score. There are 20 words per grade level, which means that at least two (possibly three) students are reading at the 3rd-grade level (20 words at the kindergarten level + 20 at the 1st-grade level + 20 at the 2nd-grade level + 14 into the 3rd-grade level).
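The grade-level arithmetic above can be sketched in a few lines of Python. The 20-words-per-grade assumption follows the article's description of the SORT; the helper function name is ours, for illustration only.

```python
def sort_grade_placement(words_read: int, words_per_grade: int = 20):
    """Rough grade placement implied by a SORT raw score, assuming
    20 words per grade level starting at kindergarten."""
    full_levels, remainder = divmod(words_read, words_per_grade)
    return full_levels, remainder

# The highest-scoring kindergartner read 74 words:
levels, extra = sort_grade_placement(74)
print(levels, extra)  # 3 full levels (K, 1st, 2nd) plus 14 words into 3rd grade
```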
Next, we conducted a statistical analysis of the authors' DIBELS results (Tables 1-4 in their study) to evaluate whether the distribution of DIBELS scores was within the normal range. The authors mentioned the percentage scores without any statistical analysis. Our chi-squared analyses, shown in our Table 1, indicate that the DIBELS score distributions were all within normal distribution ranges.
In contrast, our statistical analysis of the distribution of their writing scores (described in Tables 1-4 of their article) shows that it differs significantly from the expected distribution. Although the authors trumpet the quality of their writing task, their results clearly show flaws in that instrument, as shown in our Table 2. For example, it would be expected that kindergartners, after only 4 months of school, would be in the low levels (0-2). Instead, 61% of the students are in the highest two levels (Level 3YC = 41% and Level 4 = 20%). The chi-squared results for the kindergarten students ([chi square] = 22.4, df = 5) and the 1st-grade students ([chi square] = 85.56, df = 6) show non-normal distributions at the .001 level.
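A chi-squared goodness-of-fit test of the kind reported in our Table 2 can be sketched as follows. Because the authors declined to share their raw data, the counts below are illustrative only (a hypothetical 100 students spread across six writing-task levels, with the same skew toward the upper levels); they reproduce the shape of the problem, not the authors' exact statistics. The SciPy `chisquare` function is assumed.

```python
from scipy.stats import chisquare

# Hypothetical counts for 100 students across six writing-task levels,
# piled heavily on the upper levels (as in the kindergarten results):
observed = [5, 17, 8, 9, 41, 20]
expected = [100 / 6] * 6  # a uniform spread across the six levels

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 1), p < 0.001)  # a large statistic; p well below .001
```

A significant result here means only that the scores are not spread evenly across the levels, which is the restriction-of-range problem discussed below.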
A major part of Kamii and Manning's results section describes the significant correlation between the writing task and DIBELS subtest scores for kindergarten and 1st-grade students. As mentioned earlier, correlations are dependent on the quality of the measures involved. The correlations with phonemic segmentation and nonsense word fluency are significant at the kindergarten level, with Somers' d = .56 (p < .001) and d = .61 (p < .001), respectively, but the Somers' d values drop in 1st grade (d = .10, not significant, for phonemic segmentation; d = .23, not significant, for nonsense word fluency). The authors offer several explanations for why the scores drop in 1st grade, but they ignore the obvious one: there is little range in the writing task scores. Almost 80% of the 1st-grade students received a writing task score of 3YC or 3YD.
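Somers' d is an asymmetric ordinal measure of association, so the choice of predictor and criterion matters. As a minimal sketch (with hypothetical ordinal data, assuming SciPy's `somersd` implementation), it can be computed like this:

```python
from scipy.stats import somersd

# Hypothetical ordinal data for ten students: a writing-task level
# paired with a DIBELS subtest score band. These values are invented
# for illustration and are not the authors' data.
writing_level = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
dibels_band = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

result = somersd(writing_level, dibels_band)
print(round(result.statistic, 2))  # between -1 and 1; positive here
```

Note that if nearly all students fall into one or two levels of the predictor (as with the 1st-grade writing scores), most pairs become ties and the statistic is pulled toward zero regardless of the true relationship.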
THEIR DISCUSSION SECTION
Their Discussion section was a reiteration of why they thought that DIBELS is an inadequate test for primary-age readers.
The article critiqued here does not meet APA standards for publication. Furthermore, it fails to convince because of assertions made without citations, untested instruments used in questionable contexts, and conclusions that show preference for the authors' own test despite the absence of generally accepted validity or reliability evidence for it. Finally, no study should be published that cannot be reviewed and analytically evaluated. The fact that the authors declined to provide their data or to acknowledge a contradictory body of evidence regarding DIBELS suggests a lack of academic candor.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Coyne, M. D., & Ham, B. A. (2006). Promoting beginning reading success through meaningful assessment of early literacy skills. Psychology in the Schools, 43(1), 33-43.
DIBELS Data Systems. (2012). Retrieved January 23, 2011, from https://dibels.uoregon.edu/
Elliott, S. N., Huai, N., & Roach, A. T. (2007). Universal and early screening for educational difficulties: Current and future approaches. Journal of School Psychology, 45(2), 137-161.
Fien, H., Park, Y., Baker, S. L., Mercier Smith, J. L., Stoolmiller, M., & Kame'enui, E. J. (2010). An examination of the relation of nonsense word fluency initial status and gains to reading outcomes for beginning readers. School Psychology Review, 39(4), 631-653.
Goffreda, C. T., & DiPerna, J. C. (2010). An empirical review of psychometric evidence for the Dynamic Indicators of Basic Early Literacy Skills. School Psychology Review, 39(3), 463-483.
Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic Indicators of Basic Early Literacy Skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.
Hintze, J. M., Ryan, A. L., & Stoner, G. (2003). Concurrent validity and diagnostic accuracy of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and the Comprehensive Test of Phonological Processing. School Psychology Review, 32(4), 541-556.
Kame'enui, E. J., Fuchs, L., Francis, D. J., Good, R., O'Connor, R. E., Simmons, D. C., ... Torgesen, J. K. (2006). Adequacy of tools for assessing reading competence: A framework and review. Educational Researcher, 35(4), 3-11.
Kamii, C., & Manning, M. (1999). Before "invented" spelling: Kindergartners' awareness that writing is related to the sounds of speech. Journal of Research in Childhood Education, 14, 16-25.
Kamii, C., & Manning, M. (2005). Dynamic Indicators of Basic Early Literacy Skills (DIBELS): A tool for evaluating student learning? Journal of Research in Childhood Education, 20(2), 75-90.
Manning, M., Kamii, C., & Kato, T. (2006). DIBELS: Not justifiable. In K. Goodman (Ed.), Examining DIBELS: What it is and what it does (pp. 71-78). Portsmouth, NH: Heinemann.
Petscher, Y., Kim, Y., & Petscher, Y. (2011). The utility and accuracy of oral reading fluency score types in predicting reading comprehension. Journal of School Psychology, 49(1), 107-129.
Roehrig, A. D., Petscher, Y., Nettles, S. M., Hudson, R. F., & Torgesen, J. K. (2008). Accuracy of the DIBELS Oral Reading Fluency Measure for predicting third grade reading comprehension outcomes. Journal of School Psychology, 46, 343-366.
Rouse, H. L., & Fantuzzo, J. W. (2006). Validity of the Dynamic Indicators for Basic Early Literacy Skills as an indicator of early literacy for urban kindergarten children. School Psychology Review, 35(3), 341-355.
Schilling, S. G., Carlisle, J. F., Scott, S. E., & Zeng, J. (2007). Are fluency measures accurate predictors of reading achievement? Elementary School Journal, 107(5), 429-448.
Portland State University, Portland, Oregon
Korea International School, Seongnam, Republic of Korea
Liberty University, Lynchburg, Virginia
Eastern Washington University, Cheney, Washington
Submitted February 19, 2011; accepted May 9, 2011.
Address correspondence to Gary Adams, Portland State University, 2818 NE Klickitat Street, Portland, OR 97212. E-mail: email@example.com
(1.) Upon reading this article, Kamii and Manning have simply noted that their philosophy regarding the reading process is a constructivist one, and so do not agree with the behaviorist view taken in this critique.
TABLE 1
Distribution of Dynamic Indicators of Basic Early Literacy Skills Scores in the Kamii and Manning Study

Kamii & Manning's DIBELS scores             1-20th   21-40th   41-60th   61-80th   81-100th   Results
                                            (%)      (%)       (%)       (%)       (%)
Expected                                    20       20        20        20        20
Actual-kindergarten phoneme segmentation    16       18        21        21        25         [chi square] = 1.15 (4), ns
Actual-kindergarten nonsense word           22       18        22        25        14         [chi square] = 1.91 (4), ns
Actual-1st grade phoneme segmentation       18       30        22        18        12         [chi square] = 2.10 (4), ns
Actual-1st grade nonsense word              15       15        24        27        20         [chi square] = 2.83 (4), ns

Note. Columns are percentile bands; values are percentages of students. ns = not statistically significant.

TABLE 2
Distribution of Kamii and Manning's Level of Writing Scores for Kindergarten and 1st-Grade Students

Kamii & Manning's       0 and 1   2     2Y-2YD   3YA & 3YB   3YC & 3YD   4     5     Results
Level of Writing        (%)       (%)   (%)      (%)         (%)         (%)   (%)
Kindergarten expected   17        17    17       17          17          15    --
Kindergarten actual     5         17    8        9           41          20    --    [chi square] = 22.4 (5), p < .001
1st grade expected      14        14    14       14          14          14    14
1st grade actual        1         4     1        8           78          7     2     [chi square] = 85.56 (6), p < .001

Note. Values are percentages of students. 2Y-2YD = levels 2Y, 2YA, 2YB, 2YC, and 2YD.
Authors: Adams, Gary; Cathers, Steve; Swezey, James; Haskins, Tara
Publication: Journal of Research in Childhood Education
Article Type: Critical essay
Date: Oct 1, 2012