
Response to "dynamic indicators of basic early literacy skills (DIBELS)" by Kamii and Manning.

This article is a response to "Dynamic Indicators of Basic Early Literacy Skills (DIBELS): A Tool for Evaluating Student Learning?" by Kamii and Manning (2005). The intent of their study was to evaluate how DIBELS and a writing test predict reading achievement, which was measured by scores on the Slosson Oral Reading Test (SORT) as well as by a writing task. Approximately 200 students from two schools in different geographic locations were included in this study. The authors concluded that their test was a superior predictor of reading achievement. Among our many criticisms of their study are the insufficient descriptions of their sample and their test administration procedures. Also, their study lacked fidelity of implementation checks. Their results actually showed that the DIBELS score distribution was in the normal range, but that was not true of their writing task.

Keywords: reading assessment, DIBELS, reading readiness, reading disability


The Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002) is used as a screening tool by more than 15,000 school districts, according to the official DIBELS site (DIBELS Data Systems, 2012). As many as 1.5 million students are at risk for not reading at grade level and were a part of the Reading First program. DIBELS was a widely used assessment tool in that program (Roehrig, Petscher, Nettles, Hudson, & Torgesen, 2008). Many research articles affirm the value of DIBELS as a tool for evaluating the effectiveness of a variety of programs, including improving literacy skills (Coyne & Ham, 2006; Fien et al., 2010) and early screening for educational difficulties (Elliott, Huai, & Roach, 2007).

In their article "Dynamic Indicators of Basic Early Literacy Skills (DIBELS): A Tool for Evaluating Student Learning?" Kamii and Manning (2005) claimed that "the DIBELS is based on an outdated scientific theory and the evidence presented in this study does not justify its use for the evaluation of an instructional program" (p. 90). This finding also was cited in Manning, Kamii, and Kato's chapter in Kenneth Goodman's anti-DIBELS book Examining DIBELS: What It Is, What It Does (Manning, Kamii, & Kato, 2006). Although the limited scope of our response does not address their claim that DIBELS is based on an outdated scientific theory, we requested the opportunity to review their research study. Stacey Neuharth-Pritchett, editor of the journal at the time of the article's publication, provided the "ground rules," with her notification that we could write a critique article. She told us that, after receiving our manuscript, Kamii and Manning could write a response to our critique.


After reading this article, we questioned Kamii and Manning's conclusions and many aspects of their study but decided to ask the authors for only two pieces of information: (1) their rationale for using a Somers' d statistical analysis and (2) a copy of their data. Dr. Kamii did not respond. Dr. Manning's e-mail response was,

I'm sorry, but there is no way that I have time or Dr. Kamii has time to get you everything that you asked concerning the study. We want to cooperate with you and will to the extent that we can. We are both in the middle of big writing projects and teaching full loads so going back to that study just isn't possible.

Her refusal to provide their data is in noncompliance with the American Psychological Association (APA, 2001) publication requirements that were in effect at the time of publication:
   To permit interested readers to verify the statistical analysis, an
   author should retain the raw data after publication of the
   research. Authors of manuscripts accepted for publication in APA
   journals are required to have available their raw data through the
   editorial review process and for at least 5 years after the date of
   publication (section 8.05 includes a discussion about sharing data
   in the subsection on data verification). (p. 137)

Kamii and Manning's response put us in a difficult situation. They refused to provide information that would assist us in our critical analysis, and yet they will be able to criticize our response. (1)


Essentially, this is a concurrent validity study involving DIBELS and a writing task developed by Kamii and Manning as predictors of reading achievement (based on scores from the Slosson Oral Reading Test [SORT]). Finding an acceptable reading measure was important in the Reading First era. DIBELS was one of the 24 reading measures selected by the Reading First Assessment Committee (Kame'enui et al., 2006), but the authors' writing task was not one of those measures. The authors said nothing about the importance of their study other than their opinion that DIBELS was a flawed reading test.

The beginning of the article should contain a review of the research literature on early reading assessment for prediction. Instead, the article starts with two paragraphs describing DIBELS without any citations of validity and reliability studies (e.g., Roehrig et al., 2008; Rouse & Fantuzzo, 2006; Schilling, Carlisle, Scott, & Zeng, 2007). After providing the names of DIBELS subtests, the authors stated, "Many teachers doubt the validity of the DIBELS, especially of the first two subtests" (p. 75). They are referring to the DIBELS Nonsense Word Fluency and DIBELS Letter Naming Fluency subtests. Unfortunately, the authors cited no research to support their statement about teacher doubts. Given this unsubstantiated comment about the DIBELS Letter Naming Fluency subtest, it seems odd that Kamii and Manning excluded any analysis of that subtest in their study. (For a review of reliability and validity evidence of the DIBELS, see Goffreda and DiPerna, 2010, which provides multiple examples of studies showing high reliability and validity.)

The rest of their literature review section was spent describing Emilia Ferreiro's theory of assessing young children's reading and writing development. The authors stated, "For the past fifteen years, researchers have been trying to develop ways of assessing young children's development in early reading and writing by following the work of Emilia Ferreiro" (p. 75), but they do not say who those researchers are or provide a citation. Then, Ferreiro's three-level system is explained without any research evidence about its validity or reliability.

At the end of this section, we expected that a research question might be stated along these lines: "How well do the DIBELS Phoneme Segmentation Fluency, Nonsense Word Fluency, and Oral Reading Fluency subtests and the Kamii/Manning Writing Task predict reading achievement based on Slosson Oral Reading Test (SORT) scores?" In other words, this is a concurrent validity study with four predictor variables--(1) DIBELS phoneme segmentation fluency, (2) DIBELS nonsense word fluency, (3) DIBELS oral reading fluency, and (4) the Kamii/Manning Writing Task--and one criterion variable--the SORT. Actually, because they stated they were "analyzing the relationships between scores on each DIBELS subtest" in the Abstract section of their study, we were expecting that all five of the DIBELS subtests were involved. Instead, Kamii and Manning stated, "The present study was conducted to investigate the value of two of the DIBELS subtests for the evaluation of an instructional program--Phoneme Segmentation Fluency (PSF) and Nonsense Word Fluency (NWF)" (p. 78). Phoneme Segmentation Fluency and Nonsense Word Fluency are not instructional programs; they are tests that can be used to evaluate instructional programs.

Later in the paragraph, they stated, "Moreover, this study was designed to see if the PSF also correlated with current achievement in reading and writing" (p. 78). Taken together with the first sentence, it appears that there are two predictor variables (DIBELS PSF and NWF) and two criterion variables (the Kamii/Manning Writing Task and the SORT). The DIBELS Oral Reading Fluency subtest is not mentioned at this point, although it was included in the study. This omission is surprising given the research support for the predictive relationship between oral fluency and reading comprehension (e.g., Petscher, Kim, & Petscher, 2011; Schilling et al., 2007). In addition, although the DIBELS Letter Naming Fluency is usually administered, it was not included in this study, for unknown reasons.


The APA style guide (2001) stresses the need to provide sufficient participant characteristics information for possible replication of studies. Kamii and Manning's sample came from two geographical locations. The 107 kindergartners, from six classrooms, were from a middle- to upper-middle-class suburb, and the 101 1st-graders, from seven classrooms, came from a rural town. Income-level information was not provided for the 1st-graders, and the authors did not give free/reduced-price lunch information to evaluate the income level of either group. The authors provided ethnicity information for both groups but did not explain why they selected students from different locations. This section fails to provide sufficient information to meet APA standards.


The measurement section is one of the most important sections in a study involving concurrent validity of tests/tasks. The tests involved should have high reliability (consistency of scores) and strong validity (the test measures what it is supposed to measure). For example, if you wanted to conduct a study for predicting intelligence, the criterion measure would probably be the Wechsler Intelligence Scale for Children III (WISC-III), because it is the most popular IQ test. If you decided to test the same students using the Stanford-Binet III IQ Test (the second-most popular IQ test) as one of the predictor variables, you would find high correlations between those two tests, because both of these well-established tests have good reliability and validity. As another predictor measure, you could decide to use one of the "Learn Your IQ" books sometimes located by grocery checkout lines. Usually, those "tests" involve timed mazes and other tasks. It is doubtful that scores on do-it-yourself IQ books are highly correlated with WISC-III scores. Why? It is doubtful that the do-it-yourself IQ test is reliable or valid. The point behind this example is that a correlation between tests is only as good as the individual tests involved. A low correlation (e.g., .15) could mean that a predictor variable is a poor predictor of a criterion variable. It could also mean that one or both of the tests have poor reliability and validity. Importantly, a low correlation cannot by itself indicate which test is flawed; it merely establishes that there is an unspecified problem.
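The dependence of a correlation on the reliability of both measures can be illustrated with the attenuation relationship from classical test theory, in which the expected observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is an illustration only and is not part of the original analysis; the function name and all numeric values are hypothetical.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under classical test theory.

    true_r         -- correlation between the error-free (true) scores
    reliability_x  -- reliability coefficient of the predictor (0 to 1)
    reliability_y  -- reliability coefficient of the criterion (0 to 1)
    """
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values chosen only for illustration (not from any study).
both_reliable = attenuated_correlation(0.8, 0.9, 0.9)   # about 0.72
one_unreliable = attenuated_correlation(0.8, 0.4, 0.9)  # about 0.48

print(round(both_reliable, 2), round(one_unreliable, 2))
```

In this hypothetical case the underlying relationship is identical in both calls; only the reliability of one instrument changes, and the observed correlation drops accordingly.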

That said, the authors have no measurement section in this study. Instead, they put partial information about three instruments (DIBELS, the Writing Task, and the SORT) in the Procedures section. The following is what they said about DIBELS:
   As part of the Alabama Reading Initiative, all the children, both
   in kindergarten and 1st grade, were given the DIBELS in January
   according to the standardized instructions. Each child's percentile
   rank was available for each subtest. All of the participants'
   scores on Phoneme Segmentation Fluency (PSF) and Nonsense Word
   Fluency (NWF) were used. In addition, the DIBELS Oral Reading
   Fluency scores that were available in 1st grade were used. (pp.

The authors mentioned that DIBELS was given in January "according to the standardized instructions" (p. 81), but they do not explain who tested the students at either school, where they were tested, how long the testing took, if there were inter-rater reliability checks, and so on. Most importantly, there is no mention of why all DIBELS subtests were not included in the statistical analysis in their study.

The authors provide no information about DIBELS reliability (e.g., test-retest, internal consistency, equivalent forms, inter-rater reliability) or validity (e.g., construct validity and other concurrent validity studies). Given that this is a concurrent validity study, it seems odd that the authors do not include information on these topics. This information was available in the Research section of the DIBELS Data Systems website (2012). One example of the research studies and technical reports available there is Hintze, Ryan, and Stoner (2003).

The authors provided more information about how their writing task was administered, and they noted that writing tasks were different from those used in their earlier study (Kamii & Manning, 1999). However, they provided no reliability or validity information about either version of their writing task and, more specifically, any concurrent validity studies comparing their writing task to SORT or other standardized reading tests. The authors did not conduct any fidelity checks involving inter-rater reliability to ensure that two independent reviewers rated the writing samples in the same way. This appears to be a fatal flaw in this study, especially given the unequal distribution of writing task scores (as found later in this study).

The authors then briefly describe the SORT criterion test. They report that the SORT was given the same day as the writing task. Each student read a list of words to the examiner, a graduate student who marked the answer as "correct," "incorrect," or "don't know." The testing was stopped when the student missed five words. The authors' selection of the SORT as the criterion variable for the DIBELS subtests presents many problems. First, why did they use a test that was created in 1963? Second, they did not provide any reliability or validity information about the SORT. Third, SORT is their operational definition of reading for this study, and it is merely word calling--with only 20 words per grade level--and not a comprehensive reading test.

For the Ferreiro-based writing test, young children who do not yet know how to write are asked to write sentences. It is not clear how this procedure is standardized in terms of encouraging, explaining, or assisting the children, as they are asked to do an unfamiliar task.


Normally, descriptive statistics are presented first. This information was missing for the DIBELS and SORT testing, and there was no mention of why the DIBELS Initial Sound Fluency and Nonsense Word Fluency results for kindergartners were not reported. Without this information, and because the authors ran a z transformation on the DIBELS and SORT scores, it is difficult to interpret their results. However, there is a red flag with respect to the SORT results. On page 86, the authors mention that the kindergarten student with the highest score was able to read 74 words after only 4 months of kindergarten. After reviewing Figures 9 and 10, it appears that one student has a similar score and another student has a close score. There are 20 words per grade level, which means that at least two (possibly three) students are reading at the 3rd-grade level (20 words for kindergarten level + 20 words for 1st-grade level + 20 words for 2nd-grade level + 14 words for 3rd-grade level).
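The arithmetic behind the 74-word example can be written out as a short sketch. It assumes only what the paragraph above states: 20 words per list, with lists ordered kindergarten, 1st grade, 2nd grade, 3rd grade. The names below are hypothetical, and the official SORT scoring rules are not reproduced here.

```python
WORDS_PER_LIST = 20  # per the description above
# Assumed ordering of word lists, following the breakdown in the text.
LIST_LABELS = ["kindergarten", "1st grade", "2nd grade", "3rd grade"]


def locate_score(words_read):
    """Split a raw word count into (completed lists, words into next list)."""
    return divmod(words_read, WORDS_PER_LIST)


completed_lists, words_into_next = locate_score(74)
# 74 = 3 * 20 + 14: three lists completed, 14 words into the fourth list.
current_list = LIST_LABELS[completed_lists]
print(completed_lists, words_into_next, current_list)
```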

Next, we conducted a statistical analysis of the authors' DIBELS results (Tables 1-4 in their study) to evaluate whether the distributions of DIBELS scores were within the normal range. The authors mentioned the percentage scores without any statistical analysis. Our chi-squared analyses, summarized in our Table 1, show that the DIBELS score distributions were all within the normal distribution ranges.

In contrast, our statistical analysis shows that the distribution of their writing scores (described in Tables 1-4 of their article) is significantly different from the expected distribution. Although the authors trumpet the quality of their writing task, their results clearly show flaws with that instrument, as shown in our Table 2. For example, it would be expected that kindergartners, after only 4 months of school, would be in the low levels (0-2). Instead, 61% of the students are in the highest two categories (Levels 3YC and 3YD = 41% and Level 4 = 20%). The chi-squared values for the kindergarten students (χ² = 22.4, df = 5) and the 1st-grade students (χ² = 85.56, df = 6) show non-normal distributions at the .001 level.
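For readers unfamiliar with this kind of comparison, the following is a minimal sketch of a chi-squared goodness-of-fit calculation against an equal-bands expectation. The counts are hypothetical and are not the study's data; the sketch does not attempt to reproduce the statistics reported in our tables. The function name is also hypothetical, and the critical value is the standard tabled value for 4 degrees of freedom at the .05 level.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 100 students across five percentile bands.
observed_counts = [18, 22, 20, 19, 21]
expected_counts = [20] * 5  # equal bands: 20% of 100 students in each

statistic = chi_square_gof(observed_counts, expected_counts)
degrees_of_freedom = len(observed_counts) - 1  # categories minus one

# Tabled critical value of chi-squared for df = 4 at alpha = .05.
CRITICAL_VALUE_DF4_ALPHA_05 = 9.488
departs_from_expected = statistic > CRITICAL_VALUE_DF4_ALPHA_05

print(statistic, degrees_of_freedom, departs_from_expected)
```

The statistic is computed on counts rather than percentages, because its size depends on the number of students as well as on the shape of the distribution.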

A major part of Kamii and Manning's results section describes the significant correlation between the writing task and DIBELS subtest scores for kindergarten and 1st-grade students. As mentioned earlier, correlations are dependent on the quality of the measures involved. The phonemic segmentation and nonsense word fluency correlations are significant at the kindergarten level, with Somers' d = .56 (p < .001) and d = .61 (p < .001), respectively, but then the Somers' d values drop in 1st grade (d = .10, not significant, for phonemic segmentation and d = .23, not significant, for nonsense word fluency). The authors offer several explanations for why the values drop in 1st grade, but they ignore the obvious: there is little range in the writing task scores. Almost 80% of the 1st-grade students received a writing task score of 3YC or 3YD.
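The effect of a restricted score range on Somers' d can be sketched with a small pure-Python calculation. This is an illustration under stated assumptions, not a reconstruction of the authors' analysis: the data are hypothetical, the function name is ours, and the writing level is treated as the dependent variable, a direction the original article is not known to have specified.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' d of y given x: (concordant - discordant) / pairs untied on x."""
    concordant = discordant = untied_on_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded from the denominator
        untied_on_x += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # pairs tied on y stay in the denominator but add nothing on top
    if untied_on_x == 0:
        raise ValueError("all pairs are tied on x")
    return (concordant - discordant) / untied_on_x


# Hypothetical data: six students with distinct predictor scores.
predictor = [10, 20, 30, 40, 50, 60]
wide_range_levels = [0, 1, 2, 3, 4, 5]    # writing levels spread out
narrow_range_levels = [3, 3, 3, 3, 3, 4]  # nearly everyone at one level

print(somers_d(predictor, wide_range_levels))
print(somers_d(predictor, narrow_range_levels))
```

In both hypothetical cases the writing level never decreases as the predictor increases, yet the second value is much smaller because most pairs of students are tied on the writing level.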


Their Discussion section was a reiteration of why they thought that DIBELS is an inadequate test for primary-age readers.


The article critiqued here does not meet APA standards for publication. Furthermore, it fails to convince because of assertions made without citations, untested instruments used in questionable contexts, and conclusions drawn that show preference for the authors' own test, in spite of the absence of any generally accepted validity or reliability evidence for it. Finally, no study should be published that cannot be reviewed and analytically evaluated. The fact that the authors declined to provide their data or to acknowledge a contradictory body of evidence regarding DIBELS suggests a lack of academic candor.

DOI: 10.1080/02568543.2012.711801


American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.

Coyne, M. D., & Ham, B. A. (2006). Promoting beginning reading success through meaningful assessment of early literacy skills. Psychology in the Schools, 43(1), 33-43.

DIBELS Data Systems. (2012). Retrieved January 23, 2011, from

Elliott, S. N., Huai, N., & Roach, A. T. (2007). Universal and early screening for educational difficulties: Current and future approaches. Journal of School Psychology, 45(2), 137-161.

Fien, H., Park, Y., Baker, S. L., Mercier Smith, J. L., Stoolmiller, M., & Kame'enui, E. J. (2010). An examination of the relation of nonsense word fluency initial status and gains to reading outcomes for beginning readers. School Psychology Review, 39(4), 631-653.

Goffreda, C. T., & DiPerna, C. (2010). An empirical review of psychometric evidence for the dynamic indicators of basic early literacy skills. School Psychology Review, 39(3), 463-483.

Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic Indicators of Basic Early Literacy Skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.

Hintze, J. M., Ryan, A. L., & Stoner, G. (2003). Concurrent validity and diagnostic accuracy of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and the Comprehensive Test of Phonological Processing. School Psychology Review, 32(4), 541-556.

Kame'enui, E. J., Fuchs, L., Francis, D. J., Good, R., O'Connor, R. E., Simmons, D. C., ... Torgesen, J. K. (2006). Adequacy of tools for assessing reading competence: A framework and review. Educational Researcher, 35(4), 3-11.

Kamii, C., & Manning, M. (1999). Before "invented" spelling: Kindergartners' awareness that writing is related to the sounds of speech. Journal of Research in Childhood Education, 14, 16-25.

Kamii, C., & Manning, M. (2005). Dynamic Indicators of Basic Early Literacy Skills (DIBELS): A tool for evaluating student learning? Journal of Research in Childhood Education, 20(2), 75-90.

Manning, M., Kamii, C., & Kato, T. (2006). DIBELS: Not justifiable. In K. Goodman (Ed.), Examining DIBELS: What it is and what it does (pp. 71-78). Portsmouth, NH: Heinemann.

Petscher, Y., Kim, Y., & Petscher, Y. (2011). The utility and accuracy of oral reading fluency score types in predicting reading comprehension. Journal of School Psychology, 49(1), 107-129.

Roehrig, A. D., Petscher, Y., Nettles, S. M., Hudson, R. F., & Torgesen, J. K. (2008). Accuracy of the DIBELS Oral Reading Fluency Measure for predicting third grade reading comprehension outcomes. Journal of School Psychology, 46, 343-366.

Rouse, H. L., & Fantuzzo, J. W. (2006). Validity of the Dynamic Indicators for Basic Early Literacy Skills as an indicator of early literacy for urban kindergarten children. School Psychology Review, 35(3), 341-355.

Schilling, S. G., Carlisle, J. F., Scott, S. E., & Zeng, J. (2007). Are fluency measures accurate predictors of reading achievement? Elementary School Journal, 107(5), 429-448.

Gary Adams

Portland State University, Portland, Oregon

Steve Cathers

Korea International School, Seongnam, Republic of Korea

James Swezey

Liberty University, Lynchburg, Virginia

Tara Haskins

Eastern Washington University, Cheney, Washington

Submitted February 19, 2011; accepted May 9, 2011.

Address correspondence to Gary Adams, Portland State University, 2818 NE Klickitat Street, Portland, OR 97212. E-mail:


(1.) Upon reading this article, Kamii and Manning have simply noted that their philosophy regarding the reading process is a constructivist one, and so do not agree with the behaviorist view taken in this critique.
Table 1
Distribution of Dynamic Indicators of Basic Early Literacy Skills Scores in the Kamii and Manning Study

Kamii & Manning's           1-20th      21-40th      41-60th      61-80th      81-100th
DIBELS scores             Percentile   Percentile   Percentile   Percentile   Percentile
                             (%)          (%)          (%)          (%)          (%)        Results

Expected                      20           20           20           20           20
Actual-kindergarten           16           18           21           21           25        χ² = 1.15 (4), ns
  phoneme segmentation
Actual-kindergarten           22           18           22           25           14        χ² = 1.91 (4), ns
  nonsense word
Actual-1st grade              18           30           22           18           12        χ² = 2.10 (4), ns
  phoneme segmentation
Actual-1st grade              15           15           24           27           20        χ² = 2.83 (4), ns
  nonsense word

Note. ns = not statistically significant.

Table 2
Distribution of Kamii and Manning's Level of Writing Scores for Kindergarten and 1st-Grade Students

Kamii & Manning's       0 and 1    2    2Y, 2YA, 2YB,     3YA &     3YC &      4     5
Level of Writing          (%)     (%)   2YC, & 2YD (%)   3YB (%)   3YD (%)   (%)   (%)   Results

Kindergarten expected     17      17         17            17        17       15
Kindergarten actual        5      17          8             9        41       20         χ² = 22.4 (5), p < .001
1st grade expected        14      14         14            14        14       14    14
1st grade actual           1       4          1             8        78        7     2   χ² = 85.56 (6), p < .001
COPYRIGHT 2012 Association for Childhood Education International
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2012 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Adams, Gary; Cathers, Steve; Swezey, James; Haskins, Tara
Publication:Journal of Research in Childhood Education
Article Type:Critical essay
Geographic Code:1USA
Date:Oct 1, 2012