
The evolution of quality teaching and four questions for change.

Quality teaching should shift students' thinking and stretch their capacity. How do you measure that?

The meaning of quality teaching has always been personal for me. It is personal both because it is informed by my own memories of teachers and teaching in school and because I view it as an individual, one-to-one accomplishment between a teacher and student. When I think of quality teaching, I think of teachers, students, and texts that shifted my thinking or stretched my capacity, and I can see them very clearly in my mind. I visit these examples often and hold new examples up against their etched silhouettes. Outside of my own definition of quality teaching, the national conversation about quality teaching is much less clear and much less stable.

Since the seminal Widget Effect report (Weisberg et al., 2009) sounded the alarm about the need to revise teacher evaluation systems, and since Race to the Top (RttT) criteria set a blueprint for such reforms, 46 states and the District of Columbia have revised their teacher evaluation systems to outline what counts as teacher quality. I entered academia on the strength of a dissertation focused on the decision-making process and metaphors of the first state to create a new generation teacher evaluation system under RttT--Tennessee. Since then, I have studied the shifting conceptions of teachers' potential value and the implications of current and future measures of effectiveness in teaching. This has meant several years of rating videos, earning certification in and then deconstructing observation systems, working with administrators on routines and tools for observation, and working with teachers to prepare for observation. In this work, I have come to view conceptions of quality as necessarily shifting and contingent, and all existing measurement tools as painfully incomplete. Still, I have learned from those who measure up to my personal memories of effectiveness that there are questions worth asking when evaluating instruction. These aren't always the questions that commercially available tools for evaluation lead us to ask.

Valuing effectiveness

Political definitions of what counts as quality have shifted in subtle but deeply consequential ways in the last six years. Before 2009, federal policies outlined a conception of quality that was synonymous with qualifications. That is, a teacher's quality or value could be ensured at induction and measured throughout a career based on his or her academic credentials. Raising the bar on quality meant raising the bar on the degrees and certifications needed to be hired or promoted. Professional development in many states was similarly organized: A teacher's development was measured by the accumulated number of hours, credits, and course equivalents he or she logged.

In some ways, this conception of teacher quality aligned with my own: The teachers I admired were academically oriented. They read, they outwardly valued school, talked about their own educations, respected terminal degrees, and earned a few of their own--including honorary degrees for distinguished practice. In other ways, degrees and certifications had little to do with my understanding of quality teaching. The development-by-course-credit approach seemed to place too much stock in titles that could be bought and sold. It assumed professional development in which teachers are vessels to be filled or vests to bedazzle. It required a focus on the inputs to an absurdly flawed plug-and-chug equation for good teaching: Degrees + training + experience = highly qualified teacher + students = quality teaching.

Measuring effectiveness

In 2009, the federal government shifted from a focus on inputs of teaching (e.g., qualifications) to outputs of teaching as measured by teacher effects on student achievement. This began with Race to the Top but has continued with every federal report and policy since. This shift was made possible by advances in statistical methodologies over the past 20 years that allow statisticians to isolate classroom-level contributions to student test scores in a given year and over time (as with value-added measurement). Even preservice teacher education programs are under increasing pressure to demonstrate candidates' effects on student achievement before graduation via performance assessment and after via value-added analysis.

New generation teacher evaluation systems are often rationalized with references to the Bill & Melinda Gates Foundation's mammoth Measures of Effective Teaching (MET) Project. The MET Project was the largest (more than 2,000 teachers), best-funded ($45 million) investigation of ways to measure teaching in history. Observation tools, student surveys, and entire systems of evaluation are based on ideas or findings pioneered by MET study researchers. The fundamental premise of the MET study was that value-added scores (the calculation of an individual student's predicted vs. actual growth on standardized tests) are the gold standard against which all other measures should be compared. Observation systems, surveys, and other indicators that match up well with value-added scores are considered valid while those that do not are not.

When tools for measuring outcomes were unavailable or relatively untested, the focus on inputs for defining quality teaching was considered a necessary evil. Now, the focus on outputs has created its own evil. The magic of correlational thinking, in which we assume that what we can observe or gather by survey explains the mechanisms of higher- or lower-than-predicted test scores, also has the unfortunate consequence of elevating observables to indicators of effectiveness. Herein lies the rub: The things we can observe in classroom interactions are themselves products or correlates, not sources, of quality. Measuring and focusing feedback on these things may change a teacher's behavior to include more "best practices" but do little to change the quality of teaching.

For example, a teacher might try cooperative grouping, ask more higher-order questions, or use a strategy to increase student participation in response to rubrics that value these observables. Yet, cooperative group work can be a waste of time if the work itself isn't valuable; higher-order questions are not always the most fruitful feature of a classroom discussion; and calling on more students may just take more time away from valuable independent practice. It isn't the presence of these best practices that makes them "best," it's their strategic deployment. In fact, some practices that do not appear on most rubrics and are not considered the best may be the best approach for a specific moment in a classroom.

Finding effectiveness beyond measure(s)

One of the best lessons I remember from my 10th-grade English class was a day the teacher didn't even show up to class. She left directions on the board and a phone number to call when we were done working through a problem on our own. In that class session, we addressed a vexing question about a character's situation in a scene from "Macbeth" and came up with a way to hold each other accountable for keeping up with the reading so we could solve the next problem the next week. One of the best routines of that class was a monthly book walk that involved chatting about our independent reading books while walking around the local reservoir for 47 minutes. No exit slip, no higher-order questions, no oversight, just the opportunity to choose to talk about text without interruption (Metzger, 2002, 1997).

Those practices would never be considered best practices for general or frequent use. They will never appear on a rubric for classroom observations, but that has nothing to do with a teacher's ability to deploy them in ways that support the quality of teaching and learning. Mine did. And she knew what she was doing. She had a theory of adolescent development that prioritized the need for autonomy, responsibility, and creativity. She had a system of classroom management that offered freedom in exchange for focus. She had a practice-based theory of reader development that demanded opportunities to talk about text in informal conversations not constrained by teacher-written questions, norms of classroom discussion, or Socratic interaction patterns (see Metzger, 2002). So, her best was different (and better) than what appears on most rubrics.

I have come to believe that a teacher's observable practices are only as good as their awareness of student needs and strengths, their funds of professional knowledge, their intentional use of that knowledge, and their responsiveness to dynamic interactions of students, texts, and activities in their particular classroom context. Even if no higher-order question was asked, no exit slip was given, and no student work was posted on the walls, a walk around the reservoir could score at exemplary levels on most rubrics: Students were engaged in describing their texts to naive audiences, comparing them to a range of other texts, gathering ideas for what to read next, building knowledge about authors and genres they hadn't read ... and they thought they were basically at recess.

This does not mean that anything goes or that quality can't be measured in meaningful ways. It means that its measurement cannot be limited to the observed volume or frequency of a list of best practices. The reigning combination of observation, surveys, and test scores fails to gather data that answers the most important question about quality teaching: How is it accomplished? How has this teacher used questioning today, and what drove her decision to do so? How has this teacher structured independent reading assignments, and what drove his decision to do so? How has this teacher decided to assess students' understanding of a text, and what drove her decision to do so? The quality lies in a lesson's active ingredients, not just in its final products.

Questions for teacher evaluation

When I think about how to evaluate quality teaching, I no longer look for the presence or absence of a predetermined set of general good teaching behaviors. I think about these three questions:

* Is this teacher's goal for instruction of value to my school and community? (If not, it doesn't matter how many questions are asked or how high students score.)

* How has a teacher decided to accomplish their goal for instruction: What knowledge, beliefs, intentions, and assumptions drive his or her decisions?

* What do they know about their students, how have they come to know it, and how do they use it in their instruction?

As a teacher, when I think about enacting quality teaching, I add one more: "What evidence can I gather for outside eyes that demonstrates my instructional goal was achieved?" This last question forces me to hold myself accountable for knowing when and how goals are met. This is increasingly important in higher education settings, especially those seeking or maintaining accreditation for their teacher preparation programs while attempting to reform how teachers are prepared. It also insulates me from pressures to conform to a set of predetermined best practices for the sake of giving a good performance: I can prove that what I am doing is best for students even if it is not on the list.

The beauty of the questions above is that they cannot inadvertently condone lackluster or irrelevant teaching that bears the hallmarks of best practices in all the wrong ways for all the wrong reasons. If we are to peel our thinking away from indicators of teaching associated with test scores, we have to start asking how teachers accomplish these indicators. What awareness, thinking, and coordination took place? It is in these processes, not in observable outcomes, that quality is generated.

As Cronbach, a father of modern statistics, famously wrote, "The majority of studies of educational effects--whether classroom experiments, or evaluations of programs, or surveys--have collected and analyzed data in ways that conceal more than they reveal" (1976). By focusing on observable indicators that can be statistically matched with trends in student test scores, we shift our focus from the active ingredients for quality to the form of its products. Though this represents some form of progress from a time when we avoided using any outcome data to measure quality, new generation evaluation systems are not yet evolved enough to fruitfully measure quality or inform teacher development (Gabriel & Wolfin, 2015). They require a new set of questions to guide definitions of quality.

RACHAEL GABRIEL is an assistant professor of reading education at the University of Connecticut Neag School of Education, Storrs, Conn.


Cronbach, L. (1976). Research on classrooms and schools: Formulation of questions, design, and analysis. Stanford, CA: Stanford Evaluation Consortium.

Gabriel, R. & Wolfin, S. (2015). Evaluating the structure and content of observation instruments. In R. Gabriel & R. Allington (Eds.), Evaluating literacy instruction: Principles and promising practices (pp. 15-30). New York, NY: Routledge.

Metzger, M. (1997). Teaching reading beyond the plot. Phi Delta Kappan, 80 (1), 240-246.

Metzger, M. (2002). Learning to discipline. Phi Delta Kappan, 84 (1), 1-8.

Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect. New York, NY: The New Teacher Project.


COPYRIGHT 2016 Phi Delta Kappa, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Gabriel, Rachael
Publication:Phi Delta Kappan
Article Type:Essay
Geographic Code:1USA
Date:May 1, 2016
