Evaluating large-scale studies to accurately appraise children's performance.
Educational policy is often developed using a top-down approach. Recently, there has been a concerted shift in policy for educators to develop programs and research proposals that evolve from "scientific" studies and focus less on their intuition, aided by professional wisdom. This article analyzes several national and international educational studies that have recently been made available to the public. Recommendations for teachers, personnel preparation professionals, and policymakers are based upon selected aggregate and disaggregated data. Results challenge the negative public perception of U.S. education, question the current trend to spend more class and homework time on informational literacy, and advocate for a more balanced approach to early childhood preparation across the content areas.
Keywords: large-scale studies, performance, achievement, literacy, evaluation
The purpose of this review is to use Rothstein's (1998) argument that achieving further educational improvement requires a focus on what could be termed an accurate appraisal rather than an exaggeration of a few current needs or issues in education. One of the most difficult issues is the relative competence of schoolchildren that has been attributed to their formal education. A number of significant national and international educational studies have recently been made available to the public that can be used to challenge the negative public perception of education. However, it can be difficult to develop an accurate appraisal of children's competence when some of the "best" assessments of achievement provide contrary results. For example, Phillips (2007) reported that only 31% of American students achieved proficient or better results on the 2000 National Assessment of Educational Progress (NAEP) science assessment. In contrast, 55% of Singapore students would be considered proficient or better. Many readers of the NAEP results would concur with Phillips when he wondered how it was possible that roughly one half of Singapore's students--considered the best in the world, according to the Trends in International Mathematics and Science Study (TIMSS)--are not even considered proficient in science, according to NAEP standards.
Aggregating, disaggregating, and comparing data from studies can be misleading, as confounding contextual factors include sampling issues for comparisons: social, economic, fiscal, and policy factors that affect the number of hours and days of school, teacher salaries, teacher preparation, and so on; within and across national cultural values associated with education that include values attributed by parents and communities toward the value of education; and varying curricula values and beliefs.
Although it is difficult to make comparisons across states, nations, or internationally, it is still important to recognize the influence of these types of studies on policy. First, this article provides a common perspective of how the public, policymakers, and professionals have viewed the competence of children in the United States over time. Second, the article selectively reviews several recent large-scale assessments to provide an indication of how well children in the United States are doing in literacy, math, and science. Finally, this article provides recommendations. The intent is to stimulate the debate among teachers, teacher educators, and policymakers toward a more balanced curriculum in early childhood/elementary education.
THE PAST AS PROLOGUE TO THE PRESENT
A historical review of American education shows that perceptions of it have not been kind. Richard Rothstein's (1998) book The Way We Were? documents a selection of 150 years of warnings concerning the state of education. The record starts with Horace Mann's (1845) criticism of Boston's brightest "Brag Scholars": Mann was troubled that the students could, by rote, recite numerous facts, yet were lacking the critical thinking skills to use their knowledge in a constructive way. In 1909, Ellwood P. Cubberley, the dean of Stanford's education school, stated, "In an ever more interdependent world economy, whether we like it or not, we are beginning to see that we are pitted against the world in a gigantic battle of brains and skill" (p. 50). In the 1950s, the historian Arthur Bestor compared standards for high school graduation in 1905 and 1955 and summarized, "Our standard for high school graduation has slipped badly. Fifty years ago a high-school diploma meant something" (as cited in Rothstein, 1998, p. 13).
In the past 20 years, perceptions in the media and at the national level haven't changed much: A prominently featured article in Newsweek (Kantrowitz & Wingert, 1992) concerning U.S. education was titled "An 'F' in World Competition." In 2002, Secretary of Education Rod Paige (2002, para. 12) stated that "The national report cards in recent years show we are destroying that future--one child at a time." In just the past few years, we have seen a U.S. Department of Education report (National Mathematics Advisory Panel, 2008) conclude that "without substantial and sustained changes to the educational system, the United States will relinquish its leadership in the 21st century" (p. xi). The current Secretary of Education Arne Duncan provided a similarly bleak view of education: "The fact is that we are not just in an economic crisis; we are in an educational crisis" (Ramirez & Clark, 2009, para. 1).
As Rothstein (1998) explained, many problems of the past are similar to the challenges that we face today--regardless of when today is. Rothstein reviewed statistical and anecdotal evidence and came to the conclusion that we were doing better than the public believes. What seems to be clear is that public perception rarely seems to be consistent with the facts. For example, Gallup polls (Bushaw & McNee, 2009) from the late 1960s to today have consistently shown that the greater proportion of parents rate American schools with a C grade or lower. Only 19% of the 2009 poll participants rated the nation's schools as an A or B grade. However, roughly 74% of parents rate their children's school with an A or B grade. Said another way, parents routinely rate their own schools highly but rate everyone else's as poor to failing. Bracey (2009) put it succinctly: "The reasons for this disconnect are simple: Americans never hear anything positive about the nation's schools and haven't since the years just before Sputnik in 1957" (p. 11).
There has been a tendency for critics to think the worst of American schools (Bracey, 1998), and large-scale databases have frequently been used to provide support for these assertions. Benjamin Disraeli is often credited as the originator of the well-known phrase "There are three types of lies: lies, damned lies, and statistics." Although recognizing that numbers and statistics can tell many a tale, the author believes that recent publications of national and international studies can provide an alternate interpretive analysis of children's success in the areas of literacy, math, and science. These databases then can be used to support recommendations about day-to-day practices in teacher education.
LITERACY ACHIEVEMENT OF U.S. CHILDREN
An International Perspective
The most recent publication of the Progress in International Reading Literacy Study (PIRLS; Baer, Baldi, Ayotte, & Green, 2007) came out in November 2007. This is a longitudinal study of 4th-graders' literacy progress. The study is conducted by the International Association for the Evaluation of Educational Achievement and collects data from teachers and parents about instruction, classroom characteristics, and students' exposure to reading materials. The most recent analysis was the third of its kind, a review of what has happened from 2001 to 2006, and included data from more than 215,000 students in 45 jurisdictions around the world. Oversight for the study in the United States comes from the National Center for Education Statistics, an entity of the U.S. Department of Education. The sampling procedure was designed to be representative of 4th-graders in the 50 states and District of Columbia. Schools were randomly selected, and additional "replacement" schools were added if the schools declined to participate. From each school, a random sample of one or two classes was chosen, and all students within the class were asked to participate in data collection. In the United States, the representative sample consisted of 222 schools, of which 214 were asked to participate (eight schools had closed). Of these, 120 elected to participate and an additional 63 replacement schools were added, for a total of 183 schools. This provided a sample of 255 classes, from which 5,190 children completed the assessment.
The study provided an index of literacy comprehension and two subscales: one for reading fictional texts labeled as "literary experience" and a second called "informational" for "acquiring and using information" (Baer et al., 2007, p. iii). The test range is from 0 to 1,000, with the average country's score fixed at 500 with a standard deviation of 100. With large-scale studies, caution should be used when interpreting statistical differences in scores/percentages. For example, Baer et al.'s (2007) report of PIRLS data identified a two-point percentage difference as statistically significant at the alpha = .05 level, even though the practical difference was meaningless when considering trends (e.g., 14% of U.S. children read for information every day vs. 16% of children internationally). Although Glass, McGaw, and Smith (1981) criticized using absolute values for determining clinical or noteworthy effects in research, Cohen's (1969) arbitrary values of .2 as a small effect size (ES), .5 as a medium effect, or .8 as a large effect are often used as a useful reference point. Another measure that can be applied to help interpret the scores is the set of benchmark scores provided in the PIRLS report (Baer et al., 2007). Students who score 400 are considered "low," "intermediate" students are at the 475 level, "high" students are at the 550 level, and "advanced" students scored 625 or above.
When looking at aggregate data from the PIRLS report (Baer et al., 2007) and comparing them nation to nation, the reflex interpretation for judging U.S. children can be augmented with contextual interpretation to provide a more accurate appraisal of children's relative achievement. The following provides a typical summary of U.S. achievement: "U.S. falls 9 places to barely being in top 20 internationally," with contextual information about the meaningfulness of the data--"the difference between the highest scoring country and the U.S. is considered a small functional difference":
1. Overall, literacy scores are not getting better relative to other countries. The United States in 2001 had a score of 542 and ranked 9th out of 35 countries. In 2006, the United States, with a score of 540, ranked 18th out of the 45 nations/jurisdictions. In 2001, there were only three nations that statistically significantly outperformed the United States--Sweden (by 19 points), the Netherlands (by 12 points), and England (by 11 points). Given either the standard deviation for the scale as 100 points or the PIRLS use of 75 points to differentiate benchmarks (Baer et al., 2007), a rough estimate of the difference between Sweden as the highest scoring nation and the United States (19 points, or roughly a .19 effect size) can be considered as less than a small effect.
2. In 2006, 11 nations had better combined literacy scores than the United States. Although the 2-point negative difference between 2006 and 2001 is negligible, out of the 28 nations that were part of the study in 2001 and 2006, 14 nations showed improvement. Only four of these countries could be considered to have small effect changes, whereas the rest can be called insignificant. The largest change was by the Russian Federation, which improved by 37 points--or roughly a .37 effect size difference (Baer et al., 2007).
3. Looking at the PIRLS subscales (Baer et al., 2007), the U.S. 4th-graders did improve 4 points to 537 on the informational subscale. The United States ranked 13th out of 28 jurisdictions, with Singapore increasing their score by 36 points. Increasing performance relative to reading informational text might contraindicate increased reading outcomes that are deemed of importance to educators (see below).
4. Although there was minor progress on the informational subscale, the United States was ranked 24th out of 28 nations in terms of their "progress" on the literary subscale, with a focus on reading and comprehending text in novels and/or stories. Although not statistically significant (at the alpha = .05 level) and therefore easily dismissed, the United States' score of 541 was 9 points lower than in 2001. Though not significant, the downward trend might be one of the more important results from the PIRLS (Baer et al., 2007) study (see below).
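The rough effect-size reasoning used in the points above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the PIRLS scale standard deviation of 100 as the denominator and Cohen's (1969) reference values as cut points, and the function names and labels are hypothetical, not part of the PIRLS report.

```python
# Rough effect-size estimate: a score difference divided by the scale's standard deviation.
# Assumption: the PIRLS scale SD of 100 is the denominator; names below are illustrative only.

PIRLS_SCALE_SD = 100


def rough_effect_size(point_difference, sd=PIRLS_SCALE_SD):
    """Convert a difference in scale points into an approximate effect size."""
    return point_difference / sd


def cohen_label(es):
    """Classify |es| against Cohen's (1969) reference values of .2, .5, and .8."""
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "less than small"


# Point differences quoted in the text (Baer et al., 2007).
differences = {
    "Sweden vs. United States, 2001": 19,
    "Russian Federation, 2001 to 2006": 37,
    "United States, 2001 to 2006": -2,
}

results = {
    name: (rough_effect_size(diff), cohen_label(rough_effect_size(diff)))
    for name, diff in differences.items()
}

for name, (es, label) in results.items():
    print(f"{name}: ES = {es:+.2f} ({label})")
```

Under these assumptions, the 19-point gap between Sweden and the United States falls below Cohen's "small" threshold, whereas the Russian Federation's 37-point gain falls in the "small" range.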
Literacy Performance Trends in the United States
Data from the PIRLS report (Baer et al., 2007) and the National Assessment of Educational Progress (NAEP) were compared to look for relationships between more and less successful achievement and current trends affecting early education. The analysis focused on data related to the type of reading that children engaged in, when and how much homework children received, and the type of reading instruction children were given.
Reading stories versus reading informational text when at home. Summary data from the PIRLS 2006 (Baer et al., 2007) indicated that the greater the frequency that children read novels or stories at home, the better the students were likely to score (Table 1). Thirty-six percent of children read "every day or almost every day" and scored an average of almost 50 points better (558 vs. 509 points) than children who "never or almost never" read (18% of the children). Given that the Russian Federation made the most improvement of any nation (improving its score by 37 points), the 50-point difference can be considered a substantial difference.
In contrast, when children read for information, the trend worked in the opposite direction. Contrary to the dictum of more is better, children who read for information "once or twice a month" (score of 553, 33% of children) did better than children who read for information "every day" (score of 519, 14% of children), or "once or twice a week" (score of 538, 43%), or "never or almost never" (score of 546, 10% of children). Supporting this finding, a statistically significant and meaningful difference in scores (39 points) was found between children who read novels or stories at home every day (score of 558) versus children who read for information every day (score of 519). These trends are similar, regardless of whether you compare children within the United States or whether you look at international trends.
Less or no homework for younger children. Data from the most recent NAEP Trends in Academic Progress longitudinal study (Perie, Moran, Lutkus, & Tirre, 2004) provided insight into how a range of different literary experiences are related to differences in literacy achievement. Figure 1 (Perie et al., 2004, p. 50) shows little difference (4 points) between 4th-grade children who have no homework and children given less than an hour or 1 to 2 hours of work. The children not given homework did statistically significantly better than the children given more than 2 hours of work a night. As children got older, the positive and linear relationship between homework and reading scores was what we might expect, but this did not hold true for younger children. Children in 4th grade who had no or little homework did substantially better than children given more than 2 hours a night. This is worrisome, given recent trends to increase homework for younger children.
Although the amount of homework for 17-year-olds (typically in 11th grade) has remained relatively stable over a 25-year period from 1980 to 2004, the trend for more homework for younger children is shown in Figure 2 (Perie et al., 2004, p. 51). We see that the amount of homework that changed the most was "an increase in the number of children who do up to one hour of homework a night" and a comparable decrease in the percentage of 4th-graders who were "not given homework." Such organizations as the American Federation of Teachers (2009) tend to provide general recommendations about homework:
Over the past 40 years, most research studies on homework have found that students benefit from doing schoolwork outside of class, both in terms of achievement gains and in developing independence, responsibility, organizational and time management skills, and good study habits. (p. 3)
Given the trend of greater academic expectations (e.g., No Child Left Behind [NCLB] legislation), it is not likely that the trend in the future will be to give "little/no homework," which is correlated with the best NAEP test scores.
Formal versus informal reading: More is not better. Given that current PIRLS data (Baer et al., 2007) indicated that general literacy scores in the United States haven't improved or decreased substantially from 2001 to 2006, it is perhaps surprising that 68% of 4th-grade students in the United States received "6 or more hours" of reading instruction a week. This compares with an international average of just 25% of students receiving "6 or more hours" of literacy instruction a week. Whereas 44% of international students received "up to and including 3 hours" of instruction, only 10% of students in the United States did. Most significant is the statement that "[a]lthough the amount of reading instruction may vary across students and schools, average scores for U.S. students on the combined reading literacy scale did not measurably differ by the amount of reading instruction received" (Baer et al., 2007, p. 14). One plausible reason was presented: Baer et al. (2007) reported that 95% of children from the U.S. receive "informal" reading initiatives, compared to an international average of 80%. From a cost-benefit perspective, it is important to consider that teachers in the United States spend more time teaching reading, and the amount of time spent teaching reading does not correlate with increased literacy scores.
The international PIRLS data parallel NAEP 2004 data (Perie et al., 2004). In the United States, students who read 6 to 10 pages during the school day and at home (NAEP score of 220) have reading scores similar to children who read 11 to 15 pages (NAEP score of 222), 16 to 20 pages (NAEP score of 223), or more than 20 pages (NAEP score of 222). When comparing changes from 1984 to 2004 in the number of pages read by 4th-grade students (age 9), questions should be raised about quantity versus quality.
Interpreting data from Figure 3, the most obvious trend over the past 20 years has been an increase in the percentage of students reading more than 20 pages of text a day and a decrease in almost every other category. Even if children read more in the United States, their scores were no better than children who read 6 to 10 pages a day.
Interestingly, the percentage of children in 2004 who read for fun "almost every day" (54%), "once or twice a week" (26%), or "once or twice a month" (7%) hasn't changed more than a couple of percentage points from either 1984 or 1999 (see Figure 4). The children who scored the highest on the reading scale were the children who read for fun "once or twice a week" (NAEP score of 224) and "almost every day" (NAEP score of 220). We also know that at "ages 13 and 17, the percentage saying they read for fun almost every day was lower in 2004 than in 1984. This trend accompanied an increase over the same 20-year time period in the percentage indicating that they never or hardly ever read for fun" (Perie et al., 2004, p. 54).
MATH AND SCIENCE
A recurring theme in education is to keep a focus on the 3 R's: reading, writing, and arithmetic. For some educators, this becomes a narrower focus. An oft-heard slogan is: "The first few years of formal education are for teaching children to learn to read so that afterwards, children can read to learn." There is little doubt that the focus on reading as literacy trumps any other content/subject area without considering the potential psychological costs (Egan, 2001). Even with national evaluations, we see that the NAEP report (Perie et al., 2004) provided "contextual" factors for reading/literacy for the 4th-graders (9-year-olds), 8th-graders (13-year-olds), and 11th-graders (17-year-olds). It's telling that the NAEP contextual factors for math were limited to "course-taking patterns" of children ages 13 and 17 (Perie et al., 2004, p. 56). No contextual factors were studied for young children.
With the shift toward using evidence-based practices in schools, data have indicated that a more balanced shift toward math is warranted. In what has been referred to as a controversial study, 13 researchers (Duncan et al., 2007) examined data from six longitudinal databases from three countries (the United States, the United Kingdom, and Canada). One of the more compelling findings is that early math skills are the single best predictor of later school achievement. The more surprising finding was that early math skills, rather than early reading achievement, are a better predictor of later reading achievement.
Another data-based study in support of the early focus on math, Lessons Learned From U.S. International Science Performance (Ginsberg, Leinwand, & Pollock, 2007), was published by the American Institutes for Research. This international study's strength was that it compared the United States with 11 other industrialized nations in 4th, 8th, and 11th grade on three science assessments. The implication of early math instruction in the executive summary is clear:
The United States' consistently low international mathematics scores are one explanation for declining U.S. science performance in the upper grades. By contrast, Australia's and New Zealand's relative mathematics performances are higher with successive assessments and parallel their successively higher relative science performance. (Ginsberg et al., 2007, p. iv)
A few reasons were offered for the decline in math and science achievement scores as children get older in the United States. The report's first recommendation was to "strengthen U.S. mathematics performance as a strategy to strengthen U.S. science performance" (Ginsberg et al., 2007, p. 23). As one of the authors of that study noted, "When we look at Japan, whose students consistently rank very high in math and science, we see that they don't even begin science instruction until third grade, instead focusing on a strong math foundation" (cited in McQuillan, 2007, para. 9). That author also stated, "In elementary grades in the U.S., science tends to involve primarily factual recall, but as students progress, they need to employ more mathematics to understand and explain science concepts" (McQuillan, 2007, para. 8).
Wenglinsky's (2002) study provided an empirical argument for the role of teachers and classroom practices in affecting math scores, over and above the influence of socioeconomic status. An Association for Supervision and Curriculum Development (ASCD) research brief (Laitsch, 2003) concurred: "Teacher quality and classroom practice can have an effect on student achievement equal to or exceeding that of socioeconomic status" (para. 7). Even though the results of Wenglinsky's article are controversial (see, e.g., Lubienski, 2006) and the ASCD research brief (Laitsch, 2003) should be limited to math, data support the proposition that current interventions are better able to change math achievement than other subject areas. As Good, Burross, and McCaslin (2005) reminded us, when states push for high-stakes testing, potential changes to achievement scores are often the result of motivational effects. The Good et al. (2005) Comprehensive School Reform study of 24 matched schools concluded that "these findings support the assertion that student performance in mathematics can be enhanced. They also illustrate the need for renewed efforts to identify programs that help students further develop language and reading performance" (p. 2222).
DISAGGREGATING THE DATA
As is typical in many large-scale studies, differences within a country are usually much more significant, statistically and in terms of educational meaningfulness, than differences between countries. Berliner (2006) made a compelling case that the United States is unlike most other nations in its demographic composition. Therefore, what are often seen as equal comparisons (country vs. country comparisons) are often inequitable based on the relative heterogeneity of the United States and relative homogeneity of other countries. It is argued that comparisons that can be viewed as apples and oranges (comparing high socioeconomic status children in the United States to children in Sweden) are still comparisons of the same type and can add to the interpretive analysis. Although the difference between the United States (ranked 18th) and the Russian Federation (ranked 1st) was 26 points on PIRLS combined literacy scores (Baer et al., 2007), whether we should be primarily concerned with international comparisons is put in context when differences in countries are juxtaposed with scores within different U.S. social groups. For example, data from PIRLS 2006 (Baer et al., 2007) indicated that:
1. Similar to every other jurisdiction or country (e.g., Canada is split into several jurisdictions), girls in the United States outperform boys. U.S. girls in the 4th grade scored on average 10 points higher than boys (the international average difference was 17). On average, girls scored 12 points higher on the literary scale and 9 points higher on the informational scale. Data indicated that the only countries that did not show a statistically significant difference in scores were Luxembourg (a 3-point difference favoring girls) and Spain (a 4-point difference for girls).
2. Students in private schools (score of 561) in the United States outperformed students from public schools (538) by an average of 24 points, which is roughly the difference between the United States and Russian Federation (26 points). Using benchmark descriptors, the average private-school student was considered in the "high" range, whereas students taught at public schools were, on average, at the intermediate level. As a subset of students, the private school students would be ranked third in the world, better than Canada (Alberta), but not quite as good as Hong Kong. If we only consider public school students, they would be one place below England (19th) in their international standings.
3. Asian and White, non-Hispanic children score at the top of a racial/ethnic comparison. Black, non-Hispanic and American Indian/Alaska Native, non-Hispanic children scored lowest. There was a 99-point difference in scores between Asian and American Indian/Alaska Native groups. Said another way, if U.S. schools consisted only of Asian and White children (scores of 567 and 560, respectively), they would rank 1st out of the 45 nations. However, if the United States only consisted of American Indian/Alaska Native children (score of 468), they would rank 37th, somewhere between Georgia (the country) and Macedonia.
4. Roughly 11% of U.S. schools reported that all of their children received free or reduced-priced lunch. Conversely, about 2% of schools had no children who received free or reduced-priced lunch. The low-income schools scored, on average, 93 points lower (average score equals 493) than the schools that did not serve children from low-income homes (average score equals 586). Similar to the comparison of race/ethnicity, this highlighted one of the greatest disparities between social groups in U.S. schools.
POVERTY: THE 600-POUND GORILLA
As sociologist Elizabeth Cohen once stated, "Poverty constitutes the unexamined 600-pound gorilla that most affects American education today" (Biddle, 2001, as cited in Berliner, 2006, p. 952). Berliner reviewed several different databases; one example was the 2003 Trends in International Mathematics and Science Study (TIMSS). As shown earlier, as international data have become more readily available, a natural outcome has been the focus on perceived failures of U.S. children when compared to their peers. In response, Berliner wrote at length of the differences "within" the United States and how the greater proportion of U.S. children are, in fact, doing very well. For example, there is about a 100-point difference between wealthy and poor 4th-graders and 8th-graders for math and science. Extrapolating from the data, and to put these scores in perspective, students in schools where there is less than 10% poverty would be ranked third out of 25 countries in the world in math at 4th grade and sixth of 45 countries at 8th grade. In comparison, the poorest students would rank 19th and 29th, respectively. In science, wealthy children would be ranked first at 4th grade and would share second rank at 8th grade. Children in poverty would be ranked 18th at 4th grade and 25th at 8th grade.
The TIMSS 2007 results were published in December 2008 (Martin, Mullis, & Foy, 2008; Mullis, Martin, & Foy, 2008). They were strikingly similar to the TIMSS 2003 data reported by Berliner. If anything, the data indicated a consistent and unsettling trend for the difference between wealthy and poor students. Math scores for wealthy 4th- and 8th-graders placed them third out of 36 and sixth out of 48 countries, respectively. The poorest children would be ranked 25th and 40th. For science, wealthy children would be ranked first at 4th and 8th grade, whereas poor children would be ranked 25th and 27th. Although 4th-graders in math did better from 2003 to 2007, the difference between the poorest and wealthiest students increased from 94 points in 2003 to 104 points in 2007. In 4th-grade science, the difference between the poorest and wealthiest students increased from 99 points in 2003 to 113 points in 2007. Said another way, the poverty disparity in science and math was worse in 2007 than in 2003. In 2007, the effect-size difference between the poorest and wealthiest 4th-grade public school children for math and science was 1.5 standard deviations, which is a difference of great magnitude.
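The widening of the 4th-grade poverty gap described above is simple arithmetic, and a brief Python sketch makes the comparison explicit. The gap values are those quoted in the text; the dictionary layout and function name are illustrative assumptions only, and the sketch works in raw scale points rather than attempting to reproduce the reported effect size.

```python
# Poorest-versus-wealthiest 4th-grade gaps, in TIMSS scale points, as quoted in the text.
# Assumption: the data layout and helper name are illustrative, not from the TIMSS reports.
gaps = {
    "math": {2003: 94, 2007: 104},
    "science": {2003: 99, 2007: 113},
}


def gap_change(subject_gaps, start=2003, end=2007):
    """Return the change in the gap, in scale points, between two assessment years."""
    return subject_gaps[end] - subject_gaps[start]


changes = {subject: gap_change(by_year) for subject, by_year in gaps.items()}

for subject, change in changes.items():
    print(f"{subject}: gap change from 2003 to 2007 = {change:+d} points")
```

On these figures, the gap grew in both subjects, and by more points in science than in math.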
Although girls outperform boys, and children in private schools outperform children in public schools, the differences were less than the difference between the average U.S. child and the average Russian Federation child. These differences paled in comparison to the differences between racial/ethnic groups and differences related to wealth. When these differences are viewed through the lens of past and current trends in literacy, many questions are raised concerning our current practices in literacy education.
DISCUSSION AND CONCLUSIONS
Joe Frost (2003) reminded us that there is life beyond reading and math. Continuing a theme that he had spoken about for decades, Frost highlighted several "societal gaps in the care and education of young children" (p. 29). The gaps included (1) the child obesity gap; (2) the standardized testing gap and associated "accountability movement"; (3) the pill generation gap--those children given psychiatric drugs; (4) the child crime and violence gap; (5) the morals/ethics gap; and (6) the international gaps. As teachers face increasing pressure from a variety of sources, we should frame the role of an early childhood educator in an appropriate--not exaggerated--place. Several recommendations from the data follow:
1. Play should be the child's work: Teachers, teacher educators, and policymakers should act on the principle that we are spending more time on teaching literacy than in the past without seeing a corresponding improvement in literacy scores. Given the same number of hours in the day, the more time we take for reading literacy, the less time we invariably have for anything else--including play. Baker (2007) noted in an opinion column in The Guardian newspaper that every European country that did better on the PIRLS started formal schooling later (age 6 or 7) than Britain did (age 4 or 5). British children were also considered the unhappiest children in a UNICEF study of 21 industrialized countries.
2. The first three years are not (only) for teaching the child to read: People in the field of early childhood education should do a better job of advocating for "other" content areas. The International Reading Association (IRA, 2001) position statement "Investing in Teacher Preparation in the United States" cited a study by Hoffman and Roller (2001) noting that whereas some beginning teachers have only three semester credit hours related to "reading" instruction, some have 24 semester hours. The position statement continues, "Better-prepared teachers who are competent to teach reading are essential if national and state goals for closing the reading achievement gap (i.e., differences in reading achievement between African American, Hispanic, and Native American students and their white counterparts) are to be realized" (p. 2).
For every credit hour devoted to one content area, there is one fewer hour of preparation for other areas. A more equitable balance between literacy/language arts, math, science, fine arts, health and physical activity, music, languages, and social studies is needed. When school systems require 120 minutes (or even 180 minutes) of literacy instruction a day, teachers have limited choices in how to cover other content in an equitable way. Children aren't likely to see the value in a broad range of subject areas and be well-rounded learners if teachers overtly or even implicitly prioritize some forms of knowledge over others.
3. Large-scale studies provide specific recommendations: We have an ethical and moral responsibility to take Rothstein's suggestion about focusing on an accurate appraisal of our strengths and weaknesses seriously. Even if we take a purely academic approach to this, the American Institutes for Research report (Ginsberg et al., 2007) provided some direction for consideration: have a greater focus on math in the early years; concentrate more on the physical sciences, which tend to be the weakest area in science; spend more time on the conceptual understanding of math and science; and spend more time reinforcing girls' exposure to science.
4. Be wary: Professionals don't always have it right: For the NAEP data, consensus levels for "basic," "proficient," and "advanced" performance were determined to a degree by professional wisdom (Bracey, 2004). Professional wisdom varies greatly. For example, using 2003 data from Texas, Bracey (2004) noted that "91 percent of its eighth graders were proficient in mathematics but NAEP put only 24 percent at that level" (p. 72). Additionally, in one of the first reviews of the NCLB accountability system in 2003, more than 1,500 schools in Michigan were failing, yet not a single school in Arkansas was deemed as such (Bracey, 2004). Nevertheless, Bracey noted that the NAEP data indicated that Michigan students outperformed Arkansas students. Given the disparate results, one could claim (1) the standards in Michigan were too high, (2) the standards in Arkansas were too low, or (3) something in the middle. However, it might be that (4) both are too low or (5) both are too high. Even knowing that Michigan students do better than Arkansas students might not help, as Phillips (2007) reported that almost one half of the "world's best" science students (from Singapore) wouldn't be considered at the proficient level on the 2000 NAEP science test. Knowing some of the limitations of these measures will help to guard against proposed changes to a curriculum based on an exaggeration of deficiencies of education.
5. Are we asking the correct questions?: Often, we are not asking the right question when we ask if children in the United States can do better. There's always room for improvement, but improvements are typically subject to laws of diminishing returns. A different question might be: Should we spend our limited resources chasing arguably inconsequential reading improvements at the expense of other areas of development? Children in the United States have historically been "proficient" compared to their international peers. Analysis of data from the PIRLS 2001 study (Mullis, Martin, Gonzalez, & Kennedy, 2003, p. 26) indicated that only three countries statistically significantly outperformed the United States in reading. Two of these countries scored 11 and 12 points more. In 2006, only two countries scored over 20 points higher than the United States. The scores are on a 1,000-point scale with a standard deviation of 100 points. A reasonable interpretation is that a 20-point difference is a relatively small difference with little functional difference. Given that Torgesen et al.'s study (2007) found that a large-scale randomly assigned implementation of four well-known reading interventions with matched control groups resulted in the children doing worse on statewide tests, targeted interventions with a good "return" on time and money are warranted. In their rationale for their study, Torgesen et al. wrote:
For their part, the nation's 16,000 school districts are spending hundreds of millions of dollars on educational products and services developed by textbook publishers, commercial providers, and nonprofit organizations. Yet we know little about the effectiveness of these interventions. Which ones work best? For whom do they work best? Do these programs have the potential to close the reading gap? (p. vii)
The irony is twofold. First, Torgesen et al. (2007) chose well-known and "widely used remedial reading instructional programs: Corrective Reading, Failure Free Reading, Spell Read P.A.T., and Wilson Reading" (p. ix), and provided an average of 90 hours of instruction in small groups. The best results were on "word attack" performance, where the average scale score moved from 92.4 to 97.4. Any increases for fluency or comprehension were 1 or 2 points at most. To many, the most significant result was that the interventions "did not improve PSSA scores" (p. xiv). The Pennsylvania System of School Assessment (PSSA) is Pennsylvania's end-of-year statewide test. Not only did the children receiving the reading intervention "not improve" on PSSA scores, the 3rd-graders scored an average of 15 points lower and the 5th-graders scored an average of 27 points lower than the control group on the statewide reading assessment. Compounding this problem, the reading intervention groups in 5th grade did significantly worse on their statewide math assessments.
Second, the stated figure of hundreds of millions of dollars spent was an underestimate. The Reading First program, evaluated in the Reading First Impact Study (Gamse, Jacob, Horst, Boulay, & Unlu, 2008), cost more than $1 billion a year for 5 years. Even though data at the state level are still being accumulated, results from the national large-scale (30,000 to 40,000 students a year) study of literacy indicated that teachers increased the use of evidence-based practices in the five areas of reading instruction. One positive outcome was that 1st-graders' decoding skills increased significantly. However, the primary results of spending more than $1 billion per year for 5 years could be summed up in the study's executive summary: "The study finds, on average, that after several years of funding the Reading First program, it has a consistent positive effect on reading instruction yet no statistically significant impact on student reading comprehension" (Gamse et al., 2008, p. xviii).
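As a rough check on the earlier observation that a 20-point PIRLS difference is relatively small, the gap can be expressed in standard-deviation units. The sketch below assumes the scale standard deviation of 100 points stated in the text and uses Cohen's conventional benchmarks (0.2, 0.5, 0.8) as rules of thumb; the function names are illustrative and the labels are not a substitute for judgment about practical importance.

```python
SCALE_SD = 100.0  # PIRLS scale standard deviation, as stated in the text


def standardized_difference(point_gap, sd=SCALE_SD):
    """Express a raw score gap in standard-deviation units."""
    return point_gap / sd


def cohen_label(d):
    """Rule-of-thumb label based on Cohen's conventional benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Gaps mentioned in the text: 11 and 12 points (2001) and 20 points (2006).
for gap in (11, 12, 20):
    d = standardized_difference(gap)
    print(f"{gap}-point gap: d = {d:.2f} ({cohen_label(d)})")
```

Under these assumptions, even the 20-point gap sits at the lower edge of what is conventionally called a small effect.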
When looking at the data, it is certainly debatable whether "formal reading" for all--in its current approach--is the primary skill that should dominate early education. Many educators intuitively know that a play-based approach to education is preferred to what often becomes a more narrowly focused academic approach centered on reading, writing, and arithmetic. It isn't that they are mutually exclusive; it's that it's easier to think and respond to dichotomies than it is to work/play in the gray area in-between (e.g., progressive vs. traditional education; whole-language vs. phonics reading; reasoning and understanding vs. facts and procedures in math; inquiry vs. studying the disciplines of physical, life, earth, and space sciences).
One interpretation of the data for educators to consider is that children who read for information once or twice a month were better readers than children who read for information every day or weekly. The children in the first group had more time to explore stories or novels. Data indicate that children should read for fun--it's a good way to get children to continue to read. The amount of time that children spend reading for fun hasn't changed over the past 20 years, yet children who read for fun every day outperformed children who don't. However, children who read for fun once or twice a week outperformed children who read for fun every day. This is certainly an area that warrants further study.
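The comparison of reading-for-information groups can be checked directly against the Table 1 values. The following is a minimal sketch using the scores and percentages transcribed from that table (Baer et al., 2007); the dictionary and function names are illustrative, and the comparison is descriptive only, with no claim about causation.

```python
# Table 1 values: frequency -> (score on the combined literacy scale, % of students).
read_for_information = {
    "Every day or almost every day": (519, 14),
    "Once or twice a week": (538, 43),
    "Once or twice a month": (553, 33),
    "Never or almost never": (546, 10),
}
stories_or_novels = {
    "Every day or almost every day": (558, 36),
    "Once or twice a week": (541, 28),
    "Once or twice a month": (539, 18),
    "Never or almost never": (509, 18),
}


def highest_scoring(table):
    """Return the frequency category with the highest average score."""
    return max(table, key=lambda category: table[category][0])


print("Read for information:", highest_scoring(read_for_information))
print("Stories or novels:", highest_scoring(stories_or_novels))
```

The two activities show opposite patterns in these data: the highest scores go with infrequent reading for information but with daily reading of stories or novels.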
We know that even though children in the United States are provided more reading instruction than children in almost any other country, this is not related to increases in literacy scores. Over the past 30 years, the biggest increase is in the number of pages read per day (including homework), but this was unrelated to changes in achievement scores. Analysis of data also revealed that the amount of homework assigned each day has increased over time. For younger children, this was not correlated with better achievement scores, and the children who performed the best were given no, or limited amounts of, homework in the early grades.
Spend more time on exploratory math; focus on conceptual understanding by using play as a medium for learning. According to Sarama and Clements (2009), "Constructive play is critical to high-quality preschool" (p. 322) and "Children benefit from richer play experiences, preparation for learning later mathematics, and new ways to understand their world" (p. 333). Revel in the fact that young children's play experiences involve physical and logico-mathematical knowledge (Kamii & DeVries, 1993), thus helping children with later success in the sciences.
The difference in achievement levels among social groups is a complex problem with a larger attribution to societal differences than to problems within the teaching field (Berliner, 2006). This is different from the well-established fact that quality of teaching (however defined) can have a large effect on achievement "gain-scores," and even this holds only when students are randomly assigned to teachers (as is rarely the case; see Fuhrman, 2010). Although it is unreasonable to expect teachers to be the ones to solve the social inequities that lead to achievement gaps (Berliner, 2006), they nevertheless can be a significant factor in addressing imbalances. Sarama and Clements (2009) reminded us that one of the best ways to help low-income children who are the most disadvantaged in math is "to help children discuss and think about the mathematics they learn in their play" (p. 332).
REFERENCES
American Federation of Teachers. (2009, August). Classroom tips: Assigning effective homework (Item no. 39-0090D). Retrieved from www.aft.org/pubs-reports/teachers/CT-Homework.pdf
Baer, J., Baldi, S., Ayotte, K., & Green, P. (2007). The reading literacy of U.S. fourth-grade students in an international context: Results from the 2001 and 2006 Progress in International Reading Literacy Study (PIRLS, NCES 2008-017). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. Retrieved from http://nces.ed.gov/surveys/pirls/
Baker, M. (2007, December 18). All work and no play makes Jack a sad boy. The Guardian. Retrieved from www.guardian.co.uk/education/mortarboard/2007/dec/18/allworkandnoplaymakesjac
Berliner, D. C. (2006). Our impoverished view of educational reform. Teachers College Record, 108(6), 949-995.
Bracey, G. W. (1998, February 18). TIMSS, the new basics, and the schools we need. Education Week, 17(23), 36, 38-39.
Bracey, G. W. (2004). Setting the record straight: Responses to misconceptions about public education in the U.S. (2nd ed.). Portsmouth, NH: Heinemann.
Bracey, G. W. (2009). Experience outweighs rhetoric. Phi Delta Kappan, 91(1), 11.
Bushaw, W. J., & McNee, J. A. (2009). Americans speak out: Are educators and policy makers listening? The 41st annual Phi Delta Kappa/Gallup poll of the public's attitudes toward the public schools. Phi Delta Kappan, 91(1), 9-23.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.
Cubberly, E. P. (1909). Changing conceptions. Boston, MA: Houghton Mifflin Press.
Duncan, G. J., Dowsett, C. J., Claessens, A., Magnuson, K., Huston, A. C., Klebanov, P., ... Japel, C. (2007). School readiness and later achievement. Developmental Psychology, 43(6), 1428-1446.
Egan, K. (2001). The cognitive tools of children's imagination (ERIC Document Reproduction Service No. ED 469 669)
Frost, J. L. (2003). Bridging the gaps. Childhood Education, 80, 29-34.
Fuhrman, S. H. (2010, September 24). Education Week, 29(28), 32-33.
Gamse, B. C., Jacob, R. T., Horst, M., Boulay, B., & Unlu, F. (2008). Reading First impact study final report executive summary (NCEE 2009-4039). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.
Ginsberg, A., Leinwand, S., & Pollock, E. (2007). Lessons learned from U.S. international science performance. American Institutes for Research. Retrieved from www.air.org/news/documents/lessons.learned.in.science.pdf
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. London, England: Sage.
Good, T. L., Burros, H. L., & McCaslin, M. M. (2005). Comprehensive school reform: A longitudinal study of school improvement in one state. Teachers College Record, 107(10), 2205-2226.
Hoffman, J. V., & Roller, C. M. (2001). The IRA excellence in reading teacher preparation commission's report: Current practices in reading teacher education at the undergraduate level in the United States. In C. M. Roller (Ed.), Learning to teach reading: Setting the research agenda (pp. 32-79). Newark, DE: International Reading Association.
International Reading Association. (2001). Investing in teacher preparation in the United States: A position statement from the Board of Directors of the International Reading Association. Newark, DE: Author.
Kamii, C., & DeVries, R. (1993). Physical knowledge in preschool education. New York, NY: Teachers College Press. (Original work published 1978)
Kantrowitz, B., & Wingert, P. (1992, February 17). An "F" in world competition. Newsweek, p. 57.
Laitsch, D. (2003, May 27). The effect of classroom practice on student achievement. ASCD Research Brief 1(11). Retrieved from www.ascd.org/portal/site/ascd/menuitem.03e1753c019b7a9f989ad324d3108a0c/
Lubienski, S. T. (2006). Examining instruction, achievement, and equity with NAEP mathematics data. Education Policy Analysis Archives, 14(14), 1-29. Retrieved from http://epaa.asu.edu/epaa/v14n14/
Mann, H. (1845). Tenth annual report of the Board of Education. Boston, MA: Dutton and Wentworth Printers.
Martin, M. O., Mullis, I. V. S., & Foy, P. (with Olson, J. F., Erberber, E., Preuschoff, C., & Galia, J.). (2008). TIMSS 2007 international science report: Findings from IEA's trends in international mathematics and science study at the fourth and eighth grades. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
McQuillan, L. (2007). New analysis shifts view of U.S. students' science performance compared with other countries, offers clues to differences. American Institutes for Research. Retrieved from www.air.org/news/pr/new.analysis.aspx
Mullis, I. V. S., Martin, M. O., & Foy, P. (with Olson, J. F., Preuschoff, C., Erberber, E., Arora, A., & Galia, J.). (2008). TIMSS 2007 international mathematics report: Findings from IEA's trends in international mathematics and science study at the fourth and eighth grades. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 international report: IEA's study of reading literacy achievement in primary schools. Chestnut Hill, MA: Boston College.
National Mathematics Advisory Panel. (2008). Foundations for success: The final report of the National Mathematics Advisory Panel. Washington, DC: U.S. Department of Education.
Paige, R. (2002, June 12). Education Secretary Paige addresses first annual teacher quality evaluation conference. U.S. Department of Education Speeches Archives. Retrieved from http://www2.ed.gov/news/speeches/2002/06/061102.html
Perie, M., Moran, R., Lutkus, A. D., & Tirre, W. (2004). NAEP 2004 trends in academic progress: Three decades of student performance in reading and mathematics (NCES 2005-464). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. Retrieved from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2005464
Phillips, G. W. (2007). Expressing international educational achievement in terms of U.S. performance standards: Linking NAEP achievement levels to TIMSS. Washington, DC: American Institutes for Research.
Ramirez, E., & Clark, K. (2009, February 5). What Arne Duncan thinks of No Child Left Behind. U.S. News and World Report. Retrieved from www.usnews.com/articles/education/2009/02/05/what-arne-duncan-thinks-of-no-child-leftbehind.html?errors=socialweb_1
Rothstein, R. (1998). The way we were? The myths and realities of America's student achievement. New York, NY: Century Foundation Press.
Sarama, J., & Clements, D. H. (2009). Building blocks and cognitive building blocks. American Journal of Play, 1(3), 313-337. Retrieved from http://www.journalofplay.org/sites/www.journalofplay.org/files/pdf-articles/1-3-articlebuilding-blocks-cognitive-building-blocks.pdf
Torgesen, J., Schirm, A., Castner, L., Vartivarian, S., Mansfield, W., Myers, D., ... Haan, C. (2007). National assessment of Title I final report: Volume II: Closing the reading gap, findings from a randomized trial of four reading interventions for striving readers (NCEE 2008-4013). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics.
Wenglinsky, H. (2002, February 13). How schools matter: The link between teacher classroom practices and student academic performance. Education Policy Analysis Archives, 10(12). Retrieved from http://epaa.asu.edu/epaa/v10n12/
James M. Ernest
University of Alabama at Birmingham, Birmingham, Alabama
Submitted November 23, 2009; accepted February 3, 2011.
Address correspondence to James M. Ernest, Associate Professor of Early Childhood Education, Department of Curriculum and Instruction, University of Alabama at Birmingham, 117 SOE, 901 13th Street South, Birmingham, AL 35294-1250. E-mail: jernest@uab.edu
TABLE 1
Reading Activities Outside of School for U.S. Students and Scores on the Combined Literacy Scale

Stories or Novels                  Score    %
Every day or almost every day      558      36
Once or twice a week               541      28
Once or twice a month              539      18
Never or almost never              509      18

Read for Information               Score    %
Every day or almost every day      519      14
Once or twice a week               538      43
Once or twice a month              553      33
Never or almost never              546      10

Source. Constructed from data on p. 16 of Baer et al. (2007).

FIGURE 1
Relationship between amount of homework and National Assessment of Educational Progress reading scores at ages 9 and 17.

Scale scores                Age 9    Age 17
Did not have homework       217      270
Did not do homework         204      279
Less than 1 hour            221      287
1 to 2 hours                221      295
More than 2 hours           207      304

Note: Table made from bar graph.

FIGURE 2
Percentage of 9-year-old students varying amounts of homework from 1984 to 2004.

Year    Did not have homework    Did not do homework    Less than 1 hour    1 to 2 hours    More than 2 hours
1984    35                       4                      41                  13              6
1999    26                       4                      53                  12              5
2004    21                       3                      59                  13              5

Note: Table made from bar graph.

FIGURE 3
Percentage of 9-year-old students by number of pages read per day in school and for homework.

Year    5 or fewer    6 to 10    11 to 15    16 to 20    More than 20
1984    36            25         14          13          13
1999    28            24         15          14          19
2004    25            21         13          15          25

FIGURE 4
Average reading scores for students aged 9 and 13 by frequency of reading for fun.

Year    Almost every day    Once or twice a week    Once or twice a month    A few times a year    Never or hardly ever
Age 9
1984    53                  28                      7                        3                     9
1999    54                  26                      6                        4                     10
2004    54                  26                      7                        5                     8
Age 13
1984    35                  35                      14                       7                     8
1999    28                  36                      17                       10                    9
2004    30                  34                      15                       9                     13

Note: Table made from bar graph.