Eisner's evaluation in the age of Race to the Top.


Elliot Eisner's approach to evaluating teachers differs so radically from the assessment protocols encouraged by Arne Duncan's Race to the Top program that it is counterintuitive to suggest that Duncan and Eisner share certain commitments to the ability of ordinary language to grasp and communicate the quality of teachers. When I suggest that this conclusion may seem counterintuitive, I refer to the fact that in direct contradiction to Eisner's misgivings about the expanding role of quantitative assessment in the evaluation of schools and schooling, Race to the Top has only raised the profile of the yearly state-level assessments of students' achievement in the domains of math and reading. Race to the Top recruits complex statistical machinery to the cause of translating these achievement scores into evaluation tools aimed not only at schools and districts as under No Child Left Behind, but also at individual teachers. Under Race to the Top, the year-to-year growth in this particular measurement of student achievement (controlling for a wide variety of factors both within and without the school) is taken to reveal the essences of both teaching and learning, as well as the quality of a school's administration and faculty. Indeed, far from exemplifying any Eisnerian ideas, this state of affairs, the complete quantification of teacher and student evaluation, might well represent the very apotheosis of Eisner's dystopian nightmares.

But in rhetorically proposing the policies under discussion to a broad American public during the summer of 2009, Duncan verbally insists that this quantification represents merely a means to a particular end, an end that resonates with the everyday experience of quality education. In his policy speeches, Duncan seems well aware of the ability of official numbers to lie: the very name of Duncan's program plays upon the unintended consequences of his predecessor's No Child Left Behind Act, wherein states found themselves incentivized to define "proficiency" in artificially low terms. However, in response to this tendency, Duncan posits the practical criterion of "college and career readiness" to ground the notion of proficiency at work in his policies. The fact that even this sense of proficiency would remain officially expressible in quantitative terms--and that the same would hold true for the ostensibly practical standard of "college and career readiness"--does not appear to trouble Duncan. Duncan's proposals contain a contradiction.

On the one hand, then, Duncan is acutely aware of, and indeed responsive to, the limitations visible in attempts to quantify something like the concept of proficiency. On the other hand, the fix he proposes in Race to the Top seems to be merely better or more careful quantification of the same. Eisner, one imagines, would like to object to each of these proverbial hands. Yet, in the marketing of the Race to the Top program during the summer of 2009, when Duncan ties the process of quantification to a certain goal, he finds himself describing that which the quantification of teacher effectiveness would reveal in the decidedly nonquantitative form of repeated appeals to common experience. What he perhaps unwittingly demonstrates is the sufficiency of language to address the value of teaching. In the speeches meant to speak to the need for more rigorously quantitative measures of teacher effectiveness and student achievement, Duncan, in fact, engages in a form of Eisnerian connoisseurship and criticism.

This piece (a) explains why Duncan's proposed fix cannot close the gap between (quantitative) teacher effectiveness ratings and the (nonquantitative) concept of great teaching that the policies are meant to address; it (b) uses Duncan's rhetoric to demonstrate that no such gap presents a problem apart from attempts to quantify teacher effectiveness in the first place; and it (c) suggests that, given points (a) and (b), a reappraisal of Eisner's notion of educational criticism and connoisseurship, with an eye toward Duncan's call for better teacher evaluation, is in order.


Race to the Top represents a concerted attempt to make dramatic strides in the quality of American education through several channels, but most directly through improving the overall quality of the national teaching corps. When speaking of the necessity of his legislation, Duncan is quite explicit that the development of great teaching is his motivating factor. Even when touting the need for improvements only indirectly related to teaching practice--the creation of massive databases on student achievement, in the following case--he grounds the need in terms of its utility to teacher development: "We can one day do a better job of understanding what makes great teachers tick, why they succeed, why they stay in the classroom and how others can be like them" (Duncan, 2009b). Great teachers will be the objects of study, the objects to be replicated.

Duncan frequently provides examples of great teaching and great schools in these speeches, and he exhorts his audiences to recall similar experiences in their own lives; he thereby vividly describes the sort of teaching he counts as deserving replication and emulation. Speaking to the NEA, for instance, Duncan says,

All of us remember an educator or coach who changed our life ... They're the ones who commit those everyday acts of kindness and love and never ask for anything in return. They counsel troubled teens, take phone calls at night, and reach into their pockets for lunch money for children who are too ashamed to ask. (Duncan, 2009a).

Speaking at Columbia Teachers College, Duncan expands: "Now there is a reason why so many of us remember a favorite teacher forever. A great teacher can literally change the course of a student's life. They light a lifelong curiosity, a desire to participate in democracy, and instill a thirst for knowledge" (Duncan, 2009d). The urgency in identifying and rewarding these teachers has directly to do with the monumental effects that Duncan attributes to them.

In making these attributions, he draws upon his own personal experience, as well, citing his upbringing in a situation of "tremendous poverty" and "staggering neighborhood violence." Despite these circumstances, his neighborhood produced not only a Secretary of Education, but also "a brain surgeon, a Hollywood movie star, one of my top administrators at the Chicago Public Schools, and one of IBM's international corporate leaders" (Duncan, 2009a). In effecting these results, Duncan directly credits his mother, who founded an "after-school, inner-city tutoring program," which, on his telling, resulted in the counterintuitive success he describes. The remarkable overcoming of odds was only possible because the children in Duncan's narrative "had my mother and others in their lives who gave them real opportunities, real support and guidance over the years, and had the highest expectations for them" (Duncan, 2009a).

The examples that Duncan offers underscore the dramatic impact of high-quality teachers and remind his listeners of the gratitude owed to the transformative teachers in their own lives. Duncan associates the sort of great teaching his policies would support and reproduce with life-changing power, with the "light[ing] of a lifelong curiosity," the "counsel[ing of] troubled teens," the "instilling [of] a thirst for knowledge," the "tak[ing of] phone calls late at night," the providing of "real support and guidance over the years," the having of "the highest expectations," and the "commit[ting of] those everyday acts of kindness and love."

However, in Duncan's view, the policies encouraged by Race to the Top are required precisely because we neither identify in any official capacity the kinds of great teachers he cites above, nor reward them for their excellence, nor promote or purge teachers on the basis of quality at all. But the problem, for him, is still deeper than that: we do not appear even to know which teachers are worthy of reward and which are worth rejection. He notes that we "have to fix our method of evaluating teachers, which is basically broken," pointing out that "a recent report by the New Teacher Project shows that 99% of teachers are all rated the same," a result due, in his understanding, to the fact that "most teacher rating systems don't take student achievement into account." Not to factor student achievement into teacher evaluations is, for Duncan, "not fair," and "not honest" (Duncan, 2009c). The upshot is that, as Duncan puts it:

In California, they have 300,000 teachers. If you took the top 10 percent, they have 30,000 of the best teachers in the world. If you took the bottom 10 percent, they have 30,000 teachers that should probably find another profession, yet no one in California can tell you which teacher is in which category. Something is wrong with that picture. (Duncan, 2009b)

Duncan takes the above facts to suggest that we lack the requisite knowledge to tell good teachers from bad teachers. The use of student achievement data in teacher evaluations, he implies, will solve this problem, and thus the project of "understanding what makes great teachers tick" can commence. The new and robust standards according to which state cut scores will be generated under his program, in combination with burgeoning statistical methods of partitioning out a given teacher's particular impact on students' growth toward those proficiency cut scores, will shortly empower such administrators to know the "great" teachers from the "struggling" and the "failing" (Duncan, 2009a; Glazerman et al., 2011; Kane & Staiger, 2008; Kane, Taylor, Tyler, & Wooten, 2011). With a generation of baby-boomer teachers set to retire, these several factors--advances in measurement, a wealth of student achievement data suddenly available, and a certain amount of political wind at his back--allow Duncan to view the upgrading of the teaching profession as a genuine possibility.

Internal Problems With Duncan's Proposals

As one can readily see, however, a great deal hinges upon a certain interpretation of the New Teacher Project's 99% statistic, and it is by no means obvious that Duncan's reading has it right. Duncan puts two facts together: (1) "99% of all teachers are rated the same," on the one hand, and (2) "most teacher rating systems don't take student achievement into account," on the other, and he seems to read (1) as a direct consequence of (2). Teachers are indistinguishable from one another, in this reading, because we have so far lacked the means to distinguish among them on the basis of quality. We do not know what makes a good teacher and what makes a bad one; we seem to lack the criteria by which to identify either the one or the other, and so because of this fact, we simply have not been able to differentiate at all in our evaluations of teachers. If this were indeed the case, then taking student achievement into account would solve the problem that he sees.

But it is not necessarily the case that the failure to distinguish among teachers on the basis of quality--which is indeed a failure--represents any failure of knowledge or criteria. In fact, in addition to the existence of alternative readings of the relation of the two facts that Duncan cites, it ought to be fairly clear from Duncan's own words that a failure of knowledge is among the least likely of candidates to explain the aforementioned failure.

In the first place, the 99% statistic may well indicate a certain discomfort with justifying assertions of teacher quality in the absence of "hard data," which is a problem over and above any ability to know good teaching from bad; and in the second place, it is possible that the 99% statistic is an idiopathic one, as it were: it might simply be the case that no one is making any assertions of teacher quality. As Duncan and the New Teacher Project note elsewhere, there has never been any official incentive to do so: making fine distinctions among the vast majority of teachers has not made any practical difference in personnel or compensation determinations. Only in rare cases, as Duncan also points out, has official evaluative differentiation had any implications at all for salary, promotion, tenure, or any other professional concern. This does not mean that it has never been possible to make those distinctions, whatever the basis. That evaluation procedures have not done so is not evidence for the claim that an epistemological lack is at fault.

But most strongly, the claim that no one can know good teaching from bad in the absence of test scores, and that this fact accounts for the failure to differentiate among 99% of teachers nationwide, flies in the face of lived experience, even the experience that Duncan cites in his speeches. Duncan takes the time, as quoted above, to describe what great teachers do, which presupposes (a) the ability to tell great teachers from poor or mediocre ones and (b) the shared recognizability of quality teaching. If the assessment and evaluation policies of Race to the Top are meant to generate knowledge of quality teaching and to make it publicly communicable, it seems from Duncan's own rhetoric that such a mechanism is unnecessary. It simply, as a statement of fact, is not the case that people do not and cannot know who the great and who the awful teachers are. Duncan asserts this commonsense conclusion himself:

We know what success looks like. I see it the moment I enter a school. It's clean, orderly, the staff is positive and welcoming, and the kids and the classroom are the focus. I see award-winning school work on the walls. I see discipline and enthusiasm in the children. I see parents engaged and teachers collaborating on instruction. (Duncan, 2009e)

Duncan, in his rhetoric, depicts teacher quality as being plainly and immediately accessible, and not only accessible, but also shareable or conveyable by means of examples and descriptions.

However, because the problem concerning teacher quality is constructed at the policy level as one of inadequate knowledge, it suddenly seems that we require a superior means of knowing teacher quality to the one that Duncan and his audiences ordinarily use, and it is this superior need to which student achievement data proves uniquely responsive. The danger of Eisner's dystopian nightmare is particularly apparent here: if the use of student achievement data proposes to solve the problem indicated by a glaring failure to differentiate among teachers on the basis of quality, and if teachers' ability to raise test scores will come to indicate their relative worth, then it is difficult to see how one avoids the wholesale quantification of teacher evaluation. Indeed, Duncan's policy proposals seem to describe a situation in which a teacher's "effectiveness" is simply equated with a teacher's "quality," such that a teacher's impact on students' achievement scores will function as the "preponderant criterion" (National Council on Teacher Quality, 2011, emphasis in original) for making termination and remuneration determinations. Delaware has already gone ahead with this sort of evaluation policy. Teachers will be evaluated in five areas, but, "All other components aside, if a teacher does not meet or exceed student growth requirements in Delaware, the teacher cannot be rated any higher than needs improvement overall, regardless of ratings in the other four components" (National Council on Teacher Quality, 2011).

In translation into state-level policy, then, Duncan's observation that "most teacher rating systems don't take student achievement data into account" threatens to bleed into the opposite situation, one in which teacher evaluations take nothing but student achievement into account. This consequence would clearly seem equally problematic in terms of the epistemological inadequacy it seeks to overcome, and the danger is not lost on Duncan: his rhetoric militates explicitly against this very possibility.

Duncan's resistance to the wholesale quantification of teacher evaluation comes out in two forms: first, he repeatedly and vocally insists that there remains necessary evaluative work that student achievement data cannot do: despite the fact that it "doesn't lie," data also cannot "tell the whole truth" (Duncan, 2009b). In order to access the whole truth, something further must come into play, which makes up the second form of his resistance. He names a specific means of speaking to the "whole truth": Duncan maintains throughout his policy speeches the need for "multiple measures" of teacher quality, noting that "it would be unfair to reduce the complex, nuanced work of teaching to a simple multiple-choice exam" (Duncan, 2009a), and that "I absolutely respect the concerns of teachers that test scores should never be used solely to determine salaries" (Duncan, 2009b). The multiple measures, including classroom observations and surveys of students, would, on this construction, complement the quantitative measures of teacher effectiveness and yield a well-rounded and holistic picture of a teacher's true quality, thus resisting the utter quantification of the evaluative process. Duncan proposes that quantified metrics of teacher effectiveness ought to be merely one among a melange of modes of revealing teacher quality.

In his words, then, he gives voice to an Eisnerian skepticism about the ability of quantified effectiveness measures to do the evaluative work the educative endeavor demands. In combination with other means of knowing and representing teacher quality, Duncan and the reform movement for which he speaks seem to have an unproblematic conception of high-quality teacher evaluation.

However, the multiplicity of the multiple measures is an illusion. Take, for example, the following two (of several): the use of classroom observations and the development of student surveys as means of evaluating teachers. Conducting "rigorous" observations apparently requires that evaluators know what to look for, and that they all look for the same features of practice across the vast variety of classrooms. Tennessee, a leader in developing teacher evaluation protocols, provides a 116-point rubric for its evaluators to follow--the behaviors that signal teacher quality are specifiable aprioristically, and are derived, as it happens, according to correlations with achievement score gains (Anderson, 2012; Nocera, 2012). In other words, the observational means of supplementing the input of achievement data are admitted to the process on the basis of their agreement with precisely those achievement measures, and under one or both of two unstated assumptions: either the behaviors and practices associated with gains in reading and math will also be associated with gains in history or science or music, or indeed any of the domains not directly measured; or else, and additionally, that achievement scores really do access all that we value, but that they themselves are prone to year-to-year unreliability, the effects of which including observable behaviors correlated with score gains helps to mitigate. In each case, the primacy of achievement measures is simply assumed, and rather than complementing that which test scores reveal, classroom observations merely serve to secure, under the best of circumstances, the elements of teaching that already show up in student achievement data.

The above tendency is rife in the academic discussion. In scholarly and policy journals, the debate over whether or not classroom observations are useful in evaluation hinges upon whether or not observations predict measures of teacher value-added. Even in drawing divergent conclusions, opponents agree with one another on the value of this criterion. One thus finds Kane (2011) and colleagues noting that "well-executed classroom observations" can indeed pick out the quality teachers: "Teachers' scores on the classroom observation components of Cincinnati's evaluation system reliably predict the achievement gains made by their students in both math and reading" (emphasis added). Strong (2011) and colleagues arrive at the opposite conclusion, but for the same reason, as it were: "In every case, judges achieved relatively high levels of agreement but were absolutely inaccurate, leading us to question whether educators can identify effective teachers when they see them" (Strong, Gargani, & Hacifazlioglu, 2011). In each case, the merit of observing teachers in practice and evaluating their quality requires confirmation from the quantified measures of teacher effectiveness. It is worth pointing out that in each case above, the human evaluators arrive at consensus; what compels Strong to declare the judges, rather than the teacher value-added scores, "absolutely inaccurate" speaks to bias in favor of quantifiable measures: the designation of "inaccuracy" could equally apply to either the judges or the effectiveness measures; the assignation of the epithet is nonlogical.

It is the same with other "multiple measures," including surveys of students. Here, too, the survey items that count are determined by their agreement with quantitative measures of teacher quality (MET, 2012). In her recent article in The Atlantic, Amanda Ripley notes that Ronald Ferguson's student surveys "did indeed help predict which classes would have the most test-score improvement at the end of the year" (Ripley, 2012). Duncan's declarations about the imperfection of tests, the unfairness of reducing "the complex, nuanced work of teaching" to a test score, render these examples all the more disheartening: Obama administration policy on teacher evaluation really is quantification all the way down.

Duncan has professed a worry that to tether teacher evaluations solely to achievement data amounts to a "reduction" of something "complex" and "nuanced"; he has declared that these facts call for "multiple measures" of teacher quality, and that through said multiplicity, we will reach the composite whole that the examples of great teaching he so often cites are meant to recall. But since the multiplicity of his measures represents merely a many-voices-in-unison expression of the same measure, the gulf between the quantitative estimates and the real-world sense of great teaching to which they are meant to speak remains unaddressed. Duncan has posited a larger whole of which student achievement is a part; that his multiple measures merely proxy for effectiveness in terms of raising student test scores means that the rest of the whole remains unilluminated.

But Duncan has been clear about the kind of teachers that justify his call for better teacher evaluation: the thirst-instillers, the democracy-encouragers, the takers-of-late-night-phone-calls. As state assessments of students in the limited domains of reading and math certainly do not purport to shed light on any of these teacher qualities, it seems hasty to attribute the aforementioned properties to teachers on the basis of effectiveness at raising test scores. If teacher value-added cannot speak to the domain of "instill[ing] a thirst for knowledge," then it is entirely unclear why it ought to ground all forms of teacher evaluation. This is not an argument against the utility of teacher value-added in evaluations: among the many tasks the public expects of teachers, and even high on the list, is the inculcation of basic skills. As such, measures of student achievement ought to be among the metrics employed in teacher evaluations. But other measures have to be more than mere proxies for teacher value-added if the policies are to have the effect that Duncan explicitly intends them to have. The current state of affairs, to borrow an analogy from Wittgenstein, puts us in the unfortunate position of reading several copies of today's newspaper in order "to assure [ourselves] that what it said was true" (Wittgenstein, 2009, sec. 265).


In The Educational Imagination, Eisner seeks to resist what he sees as the conflation of "evaluation" with "assessment." More specifically, he notes that assessment is coming to play the role that the former term once did, despite assessment's being "more of an aspiration than a concept that has a socially confirmed technical meaning" (Eisner, 2002, p. 195). Eisner spies a danger in both the way that "assessment" smuggles in certain narrow views of teaching and learning, and also in the way that "assessment" serves or creates the expectation that the many different purposes of the school might be adequately appraised by a single method. Assessment too easily presupposes (a) that those conducting the assessment already know (e.g. by having a list of articulated and affirmed criteria, a rubric) everything that will be relevant to a judgment of quality, and (b) that the means of assessment are in every case adequate to revealing what Eisner elsewhere calls the "universe of particulars" that any educative endeavor in fact is (Eisner, 1976, p. 137).

Eisner recounts in numerous pieces the history of education's relation to quantified assessments, both in its positive and its negative aspects. He, like Duncan, is fully aware of the appeal of quantitative measures, saying that, "a type of precise objectivity is implied when complex forms of learning are reduced to a single score or letter ... Parents often want an unambiguous indicator of their child's or their school's performance" (Eisner, 2002, p. 188). To Eisner, such an attraction is historically reinforced--the possibility opened, as it were--by developments in the social sciences going back as far as the Enlightenment:

Measurement, rationality, theoretical explanation, and eventually prediction and control were the hallmarks of the emerging science. The overall aim, rooted in the Enlightenment, was to create an objectively detached true description of the world as it really is. (Eisner, 2002, p. 196)

That detached measurement procedures aspire to a "true description of the world as it really is" dovetails nicely with parents' (and politicians') felt need for reliable knowledge as to the state of schooling in their respective purviews.

But Eisner raises a Schwabian point that seeks to trouble the waters for the Enlightenment view. Put briefly, Eisner and others advance serious doubts against the coherence of imagining that a "single numerical test score" can adequately "symbolize a universe of particulars." As he puts it a few sentences prior, "The uniqueness of the particular is considered 'noise' in the search for general tendencies and main effects. This, in turn, leads to the oversimplification of the particular through a process of reduction aimed at the characterization of complexity by a single set of scores" (Eisner, 1976, p. 137). The difficulty here lies in the brute fact that educators and education itself are always engaged in particular situations: the utility of general propositions or maxims for any given educator, parent, or school is more seriously restricted than policy makers and researchers tend to realize. As Schwab explains:

There are thousands of ingenious ways in which commands on what and how to teach can, will and must be modified or circumvented in the actual moments of teaching ... Moments of choice of what to do, how to do it, and to whom and at what pace, arise hundreds of times a school day and arise differently every day and with every group of students. No command or instruction can be so formulated as to control that kind of artistic judgement and behavior, with its demand for frequent, instant choices of ways to meet an ever varying situation. (Schwab, 1983, p. 245)

Bringing the general down to the level of particular situations cannot itself be accomplished by any general means; doing it right always means exercising situational judgment. General stipulations about "what works" in classrooms, as Schwab notes, must be ingeniously modified or adapted in unpredictable ways so as to function properly in any particular context; descriptions, assessments, or evaluations of any general conditions only apply contingently and clumsily to particular teachers.

Capturing and conveying the quality of an educative process, for Eisner, defies the sort of language favored among the scientific community, and for similar reasons:

Education is a normative enterprise. What counts educationally depends on one's educational values. The reduction of educational evaluation to a set of limited quantitative methods, ones that harbor their own values, which often go unacknowledged or unappraised, is to reduce educational endeavors to a technical process. (Eisner, 2002, p. 192)

Eisner's fears, enumerated above, pertain to the distinct possibility of losing the sense of the "normative" that underlies the broadly conceived project of education, the sense of why, and the many particular ways, education is taken to matter. What education is, for Eisner, depends upon this (protean) normative feature, a feature systematically eliminated from consideration in all quantitative or scientific attempts to understand it. The scientific approach assumes that education has a sort of essence separable from the messiness of human aims, and that, in fact, such messiness must be cleared away so that education can be properly understood. Thus one finds that, as Eisner points out, "instead of talking about children, we are urged to talk about subjects. Instead of talking about teaching, we must talk about treatments" (Eisner, 1976, p. 138).

Speaking in the language of the objective experiment, we cast off the conceptual haze that clings to emotionally--and socially--encumbered notions of "children" and "teaching." The success of this casting-off, however, is also a serious failure, from an Eisnerian view. For the very reason that the unencumbering of terms like "students" and "teaching" works, Race to the Top's teacher evaluation policies blind themselves a priori to the instances of great teaching recognizable by the very means Arne Duncan so often cites. The examples of great teaching Duncan calls on us to remember, after all, require for communication none of the complex statistical maneuvering that Duncan proposes as a matter of policy. This is also to hazard the strong claim, effectively, that the "conceptual haze" of language, embodied here by "children" and "teaching," is indispensable: clear it away, and one finds that one has lost the thing itself. (Wittgenstein compares concepts to artichokes in this regard; Umberto Eco prefers onions; the point is the same.) To the extent that we seek out great teachers, we require a means of seeing and reporting on teacher quality that fits the motivating needs.

Eisner's development of a method of educational connoisseurship and criticism aims at providing just such a means of seeing and conveying educational quality. If the mismatch between the concept of "effectiveness" that Race to the Top policies seek out and the "great" teaching that justifies the work they do presents a fundamental problem for evaluation protocols, Eisner's connoisseurship and criticism readily overcomes such an objection. Without delving too deeply into the inner workings of Eisner's methodology, which he has fleshed out in numerous places (Eisner, 1976, 1998, 1999, 2001), connoisseurship and criticism operate at bottom on the very same assumptions that underwrite Duncan's policies, namely that aspects of education, including teachers, do differ on a qualitative axis, and that these differences are readily available to someone appropriately positioned with respect to a given classroom.

One might, therefore, say that Eisner manifests the spirit of Duncan's rhetoric taken seriously. The connoisseurship portion of Eisner's method rests upon two major elements: familiarity and training. As with connoisseurship in any domain--wine tasting, baseball scouting, art criticism--one requires familiarity with the subject matter in order to see the fine distinctions inherent therein. While in a sense, we are all familiar with education, and with its normative aspect, the type of familiarity Eisner has in mind can and must also be developed through practice, training, and immersion: "The development of educational connoisseurship is much more complicated than simply a species of discrimination training ... interaction effects are the rule, not the exception. To discern what an event means requires an understanding of the context" (Eisner, 2002, p. 218).

It follows that communicating the knowledge gleaned or glimpsed through the methods of connoisseurship must communicate the meaning (and therefore the context) of the event, rather than merely the "context-free facts" one observes. Eisner's earlier thoughts on the inappropriateness to this purpose of the language of science reemerge here. We wish to know something about teaching and about children, on Eisner's view; of what value is the language of "interventions" or "subjects"? The desire to sanitize the object of study is laudable in the hard sciences, but it is worse than futile here: once one translates "teaching" and "children" into an alternate register, under the auspices of getting clearer about them, one finds that one cannot translate the results back into ordinary usage. To the extent that "teaching" is more than the administration of "interventions," the language of interventions cannot adequately communicate anything about the quality of teaching under investigation.

Duncan's "effective teaching," once more, therefore fails to map onto the "great teaching" at which Duncan's policies are directed. The former is mysterious, revealed only in complicated statistical manipulations of student achievement data; the latter is wide open to observation, immediately ascertainable by parents and students alike. "All of us," as Duncan says, "remember a coach or teacher that changed our life." The greatness "so many of us" remember is greatness precisely in virtue of the particular contexts in which it manifests itself, not in spite of them. It is not only the case that we do not need measures of teacher effectiveness to convince ourselves that our great teachers are great; it is also the case that it is impossible to reliably spot great teachers through the means of teacher effectiveness ratings. If it happens that we also value the aspect of teaching these assessments measure--and we do--then so much the better. But the criteria according to which a teacher is "effective" are not the criteria according to which a teacher is "great," and therefore we cannot make valid inferences from one to the other. Hubert Dreyfus (2005) puts this problem succinctly: "Only if we stand back from our engaged situation in the world and represent things from a detached theoretical perspective do we confront the problem. That is, if you strip away relevance and start with context-free facts, you can't get relevance back" (p. 49).

For just this reason, Eisner's educational criticism stresses the importance of language in conveying the "ineffable" elements of great teaching. Eisner finds it "ironic" that "in the professional socialization of educational researchers, the use of metaphor is regarded as a sign of imprecision; yet in making public the ineffable, nothing is more precise than the artistic use of language" (Eisner, 2002, p. 223). A version of Duncan would surely agree with Eisner--this is the Duncan who lauds the metaphorical "thirst for knowledge," the "lighting of a lifelong curiosity," and so on. This is the Duncan whose illustrative examples of great teaching serve to compel robust teacher evaluations. The very success of Duncan's four policy speeches in the summer of 2009 indicates precisely the rigor of our ordinary language in terms of conveying something like teacher quality. There is no reason, in all senses of the term, to prefer "effectiveness" talk to our ordinary sense of great teaching.


I proposed earlier that Eisner's connoisseurship and criticism represent Duncan's rhetorical claims taken seriously. This essay has demonstrated that the effectiveness measures privileged in Race to the Top policies cannot answer the plaint that motivates their adoption. The hope lies in the fact that, in Eisner's work, we already have at our disposal a means of discerning and discussing qualitative aspects of schooling that does fit the need expressed in Duncan's 2009 speeches. There are such things as better and worse teachers. It is scandalous that distinguishing among them takes place only in school hallways and faculty lounges and parental gossip. But distinguishing among teachers on the basis of quality, and attaching meaningful consequences to these distinctions, requires both practically and morally that we value officially what we value unofficially. Eisner provides a means of maintaining the connection between the need for rigor and the requirements of the normative. In bending Eisner's methodology to the evaluation of teachers, we sacrifice something like measurable uncertainty. The type of accuracy we stand to gain, however, is simply immeasurable.


