When principals rate teachers: the best--and the worst--stand out.


Elementary- and secondary-school teachers in the United States traditionally have been compensated according to salary schedules based solely on experience and education. Concerned that this system makes it difficult to retain talented teachers and provides few incentives for them to work to raise student achievement while in the classroom, many policymakers have proposed merit-pay programs that link teachers' salaries directly to their apparent impact on student achievement.

Until recently, only a handful of isolated districts had attempted such programs. Now entire state systems are moving toward merit pay, with new policies established recently in Florida and Texas requiring districts to set teachers' salaries based in part on the gains their students are making on the state's accountability exam.

Implementing a merit-pay system, however, comes with challenges. Students often have more than one teacher but take only one high-stakes test. How do we know which teacher to reward? If students are not tested annually in each subject, how do we determine the merit of a teacher in a year without testing? How do we fairly assess the impact of a teacher during a testing year if we do not know how students performed during the previous school year? Can a merit-pay system overcome these obstacles?

One option is to turn to principals and ask them to help determine the size of pay raises. Such subjective performance assessments are already used to evaluate untenured teachers, and they play a large role in promotion and compensation decisions in other occupations. While principals can and do judge teachers' performance, however, there is little good evidence on the accuracy of their judgments.

The research reported in this paper fills this gap. We found that principals in a western school district did a good job of assessing teachers' effectiveness. In fact, principals are quite good at identifying those teachers who produce the largest and smallest standardized achievement gains in their schools (the top and bottom 10-20 percent). They are less able to distinguish among teachers in the middle of this distribution (the middle 60-80 percent), suggesting that merit-pay programs that reward or sanction teachers should be based on evaluations by principals and should be focused on the highest- and lowest-performing teachers.

A Representative Sample

We surveyed all 13 elementary-school principals in a midsized school district in the western United States that asked to remain anonymous. We asked them to rate the teachers in their schools on a variety of performance dimensions. The survey, conducted in February 2003, provides evaluations by their principals of 202 elementary-school teachers in grades 2 through 6.

The teachers included in the study are fairly representative of elementary-school teachers nationwide. Sixteen percent of them are men, the average age is 42, and average teaching experience is 12 years. Most of these teachers attended a local university; 10 percent attended another in-state college; and 6 percent attended a school out of state. Seventeen percent of them have a master's degree or higher, and most are licensed in either early childhood education or elementary education. Finally, 8 percent of the teachers in our sample taught in a mixed-grade classroom in 2002-03, and 5 percent were in a "split" classroom, sharing a single contract and dividing the school day with another teacher. The students in grades 2 through 6 in the district are predominantly white (73 percent), with a sizable ethnic minority (Latino students compose 21 percent of the elementary population); 48 percent of them receive a free or reduced-price lunch. Achievement levels in the district are almost exactly at the average of the nation (49th percentile on the Stanford Achievement Test).

All elementary-school students in the district take a set of exams each year, in reading and math. These multiple-choice, criterion-referenced tests cover topics that are closely linked to the district's learning objectives. While student achievement results have not been linked to rewards or sanctions for schools until recently, the results of the exams have been distributed to parents annually for at least the past decade, years before implementation of the No Child Left Behind law. This latter fact is important because our study relies on a consistent data set covering the years 1998 through 2003. The district has not had a merit-pay program for teachers at any time during this period.

To ensure that we could link student achievement data to the appropriate teacher, we limited our sample to classroom teachers, omitting music and gym teachers as well as librarians. We excluded kindergarten and first-grade teachers because earlier achievement exams were not available for their students; this prevented us from developing a "value-added" measure of student learning. We retain in our analysis the small number of teachers who share a contract, each teaching only half of the school day. For our analysis, the gains made by students in these classes count toward the estimated value added of each of the two teachers.

Can Principals Identify Effective Teachers?

Principals were asked not only to provide a rating of overall teacher effectiveness, but also to assess, on a scale from one (inadequate) to ten (exceptional), specific teacher characteristics (ten altogether), including dedication and work ethic, classroom management, parent satisfaction, positive relationship with administrators, and ability to improve math and reading achievement. Principals were assured that their responses would be completely confidential and would not be revealed to the teachers or to any other employee of the school district.

While there was some variation among principals, the overall assessments they gave teachers were generally quite high, with an average of 8.1. Only 10 percent of the assessments fell below a 6, and the average rating for the least-generous principal was still a 6.7. At the same time, principals did not simply assign similar scores to each of their teachers. In fact, the principals generally used 5 to 6 different ratings for the teachers in their school.

Because principals differ in the generosity and degree of variation in the ratings they give, we placed all the ratings on the same scale by subtracting from each teacher's rating the average rating given by that teacher's principal and then dividing by the standard deviation of that principal's ratings. We did this separately for each specific aspect of teacher performance about which principals were asked.
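
The rescaling described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the tuple layout, and the sample ratings are hypothetical, and the article does not say whether a population or sample standard deviation was used (the sketch uses the population form).

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_principal(ratings):
    """Rescale each rating relative to the ratings given by the same principal.

    ratings: list of (principal_id, teacher_id, rating) tuples (hypothetical layout).
    Returns {teacher_id: (rating - principal mean) / principal standard deviation}.
    """
    by_principal = defaultdict(list)
    for principal, _teacher, rating in ratings:
        by_principal[principal].append(rating)

    standardized = {}
    for principal, teacher, rating in ratings:
        values = by_principal[principal]
        center = mean(values)
        spread = pstdev(values)  # assumption: population SD; the article does not specify
        # A principal who gives every teacher the same score has zero spread;
        # this sketch maps those ratings to 0 rather than dividing by zero.
        standardized[teacher] = 0.0 if spread == 0 else (rating - center) / spread
    return standardized


# Hypothetical data: principal "A" rates generously, principal "B" rates strictly.
sample = [
    ("A", "t1", 9), ("A", "t2", 8), ("A", "t3", 10),
    ("B", "t4", 6), ("B", "t5", 5), ("B", "t6", 7),
]
z_scores = standardize_within_principal(sample)
```

In this made-up sample, teachers t3 and t6 end up with the same standardized score even though their raw ratings differ by three points, because each sits the same distance above his or her own principal's average.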

We compared a principal's assessment of how effective a teacher is at raising student reading or math achievement, one of the specific items principals were asked about, with that teacher's actual ability to do so as measured by their value added, the difference in student achievement that we can attribute to the teacher. To estimate the value added by a teacher, we examine the performance of her students after accounting for a wide variety of student and classroom characteristics that could affect achievement independent of the teacher's ability. These characteristics include race, gender, eligibility for the federal lunch program, limited English proficiency, and, most important, previous student achievement. We also take advantage of the availability of data on the same teachers from as far back as the 1996-97 school year; this enables us to distinguish long-term teacher quality from the possibly idiosyncratic performance of a class in any one year.
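
To make the idea of value added concrete, here is a deliberately simplified sketch. It is not the authors' model: it controls only for each student's previous score (the article lists many more controls and several years of data), fits one ordinary least-squares line, and averages the leftover residuals by teacher. The function name, record layout, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def simple_value_added(records):
    """Average, per teacher, how far students land above or below prediction.

    records: list of (teacher_id, prior_score, current_score) tuples (hypothetical).
    Fits current = intercept + slope * prior by ordinary least squares over all
    students, then averages each teacher's student residuals.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    prior_mean, current_mean = mean(priors), mean(currents)

    sxx = sum((p - prior_mean) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit a slope")
    sxy = sum((p - prior_mean) * (c - current_mean) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = current_mean - slope * prior_mean

    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(values) for teacher, values in residuals.items()}


# Hypothetical classes with identical incoming scores but different gains.
records = [
    ("X", 50, 56), ("X", 60, 66), ("X", 70, 76),
    ("Y", 50, 52), ("Y", 60, 62), ("Y", 70, 72),
]
value_added = simple_value_added(records)  # X: +2 points, Y: -2 points
```

Because both made-up classes start from the same prior scores, the difference in residuals reflects only the difference in gains, which is the quantity a value-added measure tries to isolate.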

We find a positive correlation between a principal's assessment of how effective a teacher is at raising student achievement and that teacher's success in doing so as measured by the value-added approach: 0.32 for reading and 0.36 for math. These correlations are based not on a principal's overall rating of the teacher, but rather on the principal's personal assessment of how effective the teacher is at "raising student math (or reading) achievement." Previous studies of evaluations by principals have used only the overall rating of the teacher, a less direct assessment of a teacher's ability to raise student performance. Using the overall rating in that way could compromise the accuracy of subjective performance evaluations, especially if principals value characteristics of teachers that are unrelated to their effect on student performance. Our findings lead us to conclude that principals are able to identify accurately this dimension of teacher effectiveness.
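
For readers unfamiliar with the statistic, a correlation such as the 0.32 and 0.36 figures above can be computed as a Pearson coefficient between standardized ratings and value-added estimates. The sketch below assumes that is the statistic in question (the article says only "correlation"), and the numbers in it are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(var_x * var_y)


# Hypothetical standardized principal ratings and value-added estimates
# for five teachers (not data from the study).
principal_ratings = [-1.2, -0.4, 0.0, 0.5, 1.1]
value_added_estimates = [-0.3, 0.2, -0.1, 0.1, 0.4]
r = pearson(principal_ratings, value_added_estimates)
```

A coefficient of 1 would mean principals rank teachers exactly as the value-added measure does; 0 would mean the ratings carry no linear information about value added.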

Why aren't these correlations even higher? One possible explanation is that principals focus on the average test scores in a teacher's classroom rather than on student improvement. There is some evidence for this conjecture. The correlation between ratings by principals and the average test scores of a teacher's students is significantly higher than the correlation between ratings by principals and the teacher's value-added rating in reading (0.56 versus 0.32), though not in math.

Another reason could be that principals focus on their most recent observations of teachers. We do find, for example, that the average achievement gains in a teacher's classroom in 2002-03 are a modestly stronger predictor of the principal's rating than the gains in any previous year. In theory, it is possible that principals are correct in assuming that a teacher's effectiveness changes over time so that teachers' most recent experience is the best indicator of their actual effectiveness. If that were the case, however, we would expect to find that principals' ratings are more highly correlated with value-added measures that have been adjusted to account for the fact that teachers tend to be less effective in their first one or two years in the classroom. In fact, the correlation between principals' ratings and experience-adjusted value-added measures is no higher than the correlation with our baseline value-added measures. The bigger mistake principals make, it seems, is not adequately accounting for students' incoming ability.

While informative about principals' overall abilities, a simple correlation does not tell us whether principals are more or less effective at identifying teachers at certain points on the ability distribution. We therefore estimated the percentage of teachers that a principal can correctly identify in the top group within his or her school. We found that the teachers identified by principals as being in the top category were, in fact, in the top category according to the value-added measures about 52 percent of the time in reading and 69 percent of the time in mathematics. If principals randomly assigned ratings to teachers, we would expect the corresponding probabilities to be 14 and 26 percent, respectively. This suggests that principals have considerable ability to identify teachers in the top of the distribution. The results are similar if one examines principals' ability to identify teachers in the bottom of the ability distribution.
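
The comparison between principals' hit rates and random chance can be illustrated as follows. The article does not spell out how the top categories or the random baseline were constructed, so this sketch makes its own assumptions: the top groups are given as sets of teacher identifiers, and the chance rate is simply the share of teachers in the school who belong to the value-added top group. All names and data are hypothetical.

```python
def top_group_hit_rate(rated_top, value_added_top, n_teachers):
    """Compare a principal's top picks with the value-added top group.

    rated_top: set of teachers the principal placed in the top category.
    value_added_top: set of teachers in the top category by value added.
    n_teachers: number of teachers in the school.
    Returns (hit_rate, chance_rate), where chance_rate is the probability that
    a teacher picked at random belongs to the value-added top group
    (an assumption of this sketch, not the article's stated procedure).
    """
    if not rated_top or n_teachers <= 0:
        raise ValueError("need at least one top-rated teacher and a positive school size")
    hits = len(rated_top & value_added_top)
    hit_rate = hits / len(rated_top)
    chance_rate = len(value_added_top) / n_teachers
    return hit_rate, chance_rate


# Hypothetical school of ten teachers.
rated_top = {"t1", "t2", "t3"}
value_added_top = {"t1", "t3", "t7"}
hit_rate, chance_rate = top_group_hit_rate(rated_top, value_added_top, n_teachers=10)
```

In this invented school the principal is right about two of three top picks, against a chance rate of three in ten; the article's reported gaps (52 versus 14 percent in reading, 69 versus 26 percent in math) are comparisons of the same kind.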

Despite their success with the top and bottom of the distribution, principals are significantly less successful at distinguishing among teachers in the middle of the ability distribution. Principals correctly identify only 49 percent of teachers as being better than the median teacher in their school in boosting students' reading scores, relative to the 33 percent that one would expect if principals' ratings were randomly assigned. Principals appear somewhat better at distinguishing between teachers in the middle of the distribution in math (they correctly placed 54 percent of teachers above the median, compared with the 26 percent expected if ratings were random), but they again appear to be better at identifying the best and worst teachers.

One reason that principals might have difficulty distinguishing between teachers in the middle is that the distribution of teachers' value-added ratings is highly compressed. However, our analysis of the data suggests that this is not the case. Teachers who receive ratings at or close to the median in the school have estimated value-added measures that are quite widely dispersed.

What Characteristics of Teachers Do Principals Value?

Of course, the effects of moving to a system of compensation based on assessment by principals depend on the relative importance they place on a teacher's ability to raise standardized test scores when making overall assessments of teachers' effectiveness. While such preferences could theoretically be set by district administrators or other policymakers, it is likely that principals would retain some autonomy over personnel decisions, so their preferences are important to investigate. We therefore compared principals' overall rating of each teacher with their assessment of various teacher attributes to examine how principals value different dimensions of quality in teachers.

Perhaps not surprisingly, teachers' ratings on many (though not all) of the individual survey items are highly correlated. Based on the relationships between the questions, we created three groups of teachers' quality characteristics and reanalyzed the results. The first group captures what might be described as traditional teaching ability and includes the ratings of classroom management, organization, and ability to improve students' test scores. The second, including the principal's assessments of a teacher's relationship with colleagues and administrators, measures a teacher's collegiality. The third measures student satisfaction and includes the principal's ratings of student satisfaction and the teacher as a role model.

Ability, collegiality, and student satisfaction all contribute independently to a principal's overall evaluation of a teacher, but principals weigh the set of questions measuring teachers' ability to improve student achievement and to manage a classroom most heavily. An increase of one standard deviation in a principal's evaluation of a teacher's management and teaching ability, for example, is associated with an increase of 0.56 standard deviations in the principal's overall rating. In comparison, an increase of one standard deviation in teacher collegiality is associated with an increase of roughly one-third of a standard deviation in overall rating. Meanwhile, teachers scoring one standard deviation higher in student satisfaction score just 0.15 standard deviations higher in their overall rating, all else being equal.
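
As a worked illustration of what these weights imply, the sketch below combines the three reported associations into a predicted change in overall rating. It assumes the effects add linearly, which is consistent with the "all else being equal" framing but is not a formula stated in the article; the dictionary keys and function name are hypothetical.

```python
# Associations reported in the article: change in overall rating (in standard
# deviations) per one-standard-deviation change in each composite.
COEFFICIENTS = {
    "ability": 0.56,
    "collegiality": 1 / 3,          # the article says "roughly one-third"
    "student_satisfaction": 0.15,
}


def predicted_overall_shift(deltas):
    """Predicted change in overall rating, in standard deviations.

    deltas: {composite name: change in that composite, in standard deviations}.
    Assumes the three associations combine additively (an assumption of this sketch).
    """
    unknown = set(deltas) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown composite(s): {sorted(unknown)}")
    return sum(COEFFICIENTS[name] * change for name, change in deltas.items())


# A teacher rated one SD higher on ability but one SD lower on collegiality.
shift = predicted_overall_shift({"ability": 1.0, "collegiality": -1.0})
```

Under these assumptions the teacher in the example would still be predicted to receive a higher overall rating, since the weight on ability exceeds the weight on collegiality.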

Predicting Performance

We should care about the quality of principals' assessments of teacher quality not just for their reliability in a merit-pay system, but also for their ability to identify teachers who will continue to improve student achievement. In order to get a sense of how well principals' assessments forecast teachers' performance, we examined how well these assessments predict future student achievement gains. For our February 2003 survey of principals, that meant evaluating scores on the spring 2003 tests. We compared the predictive accuracy of a principal's assessment of teacher effectiveness with the predictive accuracy of a teacher's value-added rating. We also measured the accuracy of the traditional determinants of teachers' salaries, experience and education, in predicting those scores. Throughout, we accounted for differences in previous student achievement, student demographics, and classroom characteristics.

Our findings suggest that ratings by principals, both overall ratings and ratings of a teacher's ability to improve achievement, effectively predict a student's future achievement gains (see Figure 1). Students whose teachers receive an overall rating one standard deviation above the mean are predicted to score roughly 0.06 standard deviations higher in reading than students whose teacher received an average rating. By way of comparison, students receiving free or reduced-price lunch in the same district experience achievement gains approximately 0.16 standard deviations lower than similar students who are not eligible for such programs. Assignment to a teacher with a favorable evaluation by her principal appears to be more important for math performance. An increase of one standard deviation in the principal's evaluation predicts an increase of 0.14 standard deviations in math performance, roughly on par with the disadvantage associated with coming from a low-income family.

Measures of teachers' value added in previous years are an even better predictor of future gains in students' achievement than are principal ratings. These results, which are similar for math and reading, suggest that teachers' impact on student achievement, as measured by simple value-added measures of teacher effectiveness, remains fairly stable over time and that principals' ratings effectively capture a substantial fraction of these stable differences in teachers' effectiveness.

We do not find any statistically significant relationship between the number of years a teacher has taught and students' achievement, though this is probably due to the necessary omission of first-year teachers (because we cannot measure their value added for a previous school year). Other studies have found that first-year teachers tend to perform worse on average than experienced teachers. Education does have some predictive power. Teachers with advanced degrees have students who score roughly 0.10 standard deviations higher. We hesitate to say that education itself is producing these gains, because a teacher's level of education is likely to be associated with personal characteristics not accounted for in our analysis, and these may be the very factors responsible for the improvements in student achievement.

Perhaps our most interesting finding is that the salaries teachers in this district received in 2002-03 bore no relation at all to their impact on student achievement. Students with highly paid teachers made no more progress than those with teachers who had low salaries.

Conclusions

In sum, our results suggest that student achievement (as measured by standardized test scores) would probably improve more under a system based on principals' assessments than in systems where compensation is based solely on education and experience. This is because principals would be able to identify and reward the very best teachers while, at the same time, identifying the least competent teachers for remediation or dismissal.

To the extent that the most important staffing decisions involve sanctioning incompetent teachers and rewarding the very best teachers, a principal-based assessment system may affect achievement as positively as a merit-pay system based solely on student test results. Moreover, evaluation by the principal has the potential to offset some of the potential negative consequences of test-based accountability systems. If principals can observe inputs as well as outputs, they may be able to ensure that teachers increase student achievement through improvements in pedagogy, classroom management, or curriculum rather than teaching to the test. Principals can also evaluate teachers on the basis of a broader spectrum of educational outputs in addition to test scores that parents may value. At the same time, the inability of principals to distinguish between a broad middle range of teacher quality suggests caution in relying on principals for fine-grained performance determinations, as might be required under certain merit-pay policies.

Two important caveats should be considered when interpreting our results. First, we conducted our analysis in a context where principals were not being evaluated on the basis of their ability to identify effective teachers. It is possible that principals' ability to identify the best-performing teachers would be enhanced by a school system where the principals had more responsibility for monitoring teachers' effectiveness. At the same time, social or political pressures might make principals less willing to assess teachers honestly if their judgments directly influenced teachers' compensation. Second, our analysis focuses on the source of the teacher assessment; we do not address the type of rewards or sanctions associated with teacher performance. This is clearly an important dimension of any performance management system, and one would not expect either a principal-based or a test-based assessment system to have a substantial impact on student outcomes unless it were accompanied by meaningful consequences.

Brian Jacob is assistant professor of public policy at the John F. Kennedy School of Government, Harvard University, and a faculty research fellow with the National Bureau of Economic Research. Lars Lefgren is assistant professor of economics, Brigham Young University.

Principal Distinctions (Figure 1)

Principals do a reasonably good job of identifying those teachers who
are better (and worse) at raising student test scores. Not surprisingly,
the best way to predict how effective a teacher will be is to find out
how effective the teacher has been in the past. Differences in teachers'
salaries within a school system are entirely unrelated to teachers'
effectiveness.

Predictors of Teacher Ability to Improve Student Performance

Test-score performance explained by measure
(percent of a standard deviation)

Measure                                Math  Reading

Teacher's previous performance         21    9
Teacher's overall rating by principal  14    6
Teacher's salary                       No explanatory value

Note: The figure shows the degree to which an increase of one standard
deviation in each variable is related to student achievement in 2003.
Previous performance is measured by the teacher's estimated success in
raising test scores between 1998 and 2002. The analysis controls for
student demographic characteristics, classroom characteristics, fixed
effects for grade and school, and lagged math and reading scores. All
reported effects are significant at the 0.05 level.
SOURCE: Authors' calculations from district's data

Note: Table made from bar graph.
COPYRIGHT 2006 Hoover Institution Press
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:research
Author:Lefgren, Lars
Publication:Education Next
Geographic Code:1USA
Date:Mar 22, 2006
Words:3337