Fairness in Grading.


This article reviews the difficulties in assigning grades to student work, briefly reviewing highlights from the history of grading practices. It concludes with the suggestion that, given the impossibility of comparing grades either within an institution or between institutions, instructors should base grades on a measure of an individual student's progress during a course.

One of the most important duties of a faculty member at the end of a term is determining the final grade that each student will receive in the class. The task is difficult in itself, and it is made harder still, first by the need to settle on a procedure for arriving at the final grade, and then by the various interpretations that others will later place on that grade.

These grades are meant to serve a variety of purposes. Dr. James S. Terwilliger wrote that grades serve three primary functions: administrative, guidance, and informational. He held that grades should be viewed only "as an arbitrarily selected set of symbols employed to transmit information from teachers to students, parents, other teachers, guidance personnel, and school administrators."(1) Unless the meaning of the assigned grades is universally understood, however, the system, no matter how carefully the instructor has designed and understood it, will not communicate effectively to others, nor will it support cumulative evaluation over time.

This is true even when the purpose of grading is defined more specifically, as in the following list by Professor James M. Thyne: "To ascertain whether a specified standard has been reached; To select a given number of candidates; To test the efficiency of the teaching; To indicate to the student how he (sic) is progressing; To evaluate each candidate's particular merit; and To predict each candidate's subsequent performance."(2)

In developing an individual or institutional grading policy, it is important to decide what the assessment is for. If the aim is merely to have twelve grades recorded by the end of the term, or to comply with a departmental policy that all work be graded, the grades become ends in themselves, and the final grade becomes even harder to interpret. Even with a definite purpose beyond "institutional policy," it is extremely difficult to reach a consensus on how to arrive at a grade that properly reflects the progress made by an individual student in a particular course.

In 1947, Dr. William L. Wrinkle described six fallacies commonly made in interpreting course grades. The first fallacy he listed was the belief that anyone can tell from the assigned grade what the student's level of achievement was or what progress the student made in the class.(3) That belief is probably as widespread today as it was in 1947, and Wrinkle's objection to it is probably just as valid.

Even earlier, in a study published in 1912, Dr. Daniel Starch and Edward Elliott questioned the reliability of grades as a measure of pupil accomplishment. They mailed two English papers to two hundred high schools, asking that each be graded according to the practices and standards of that school and its English teacher, on a scale of 1 to 100 with 75 as the passing grade. Teachers at one hundred forty-two schools graded and returned the papers. On one paper the grades ranged from 64 to 98, with an average of 88.2; on the other, the range was 50 to 97, with an average of 80.2. With more than thirty different grades assigned and a range of more than forty points for the same paper, it is no wonder that assigned grades are so difficult to interpret.(4)

Perhaps the earliest study of differences among individual graders was done by Dr. F. Y. Edgeworth of the University of Oxford in 1889. Edgeworth included a portion of a Latin prose composition in an article he wrote for the Journal of Education and invited readers to grade it and send him the result. His only other instructions were that the composition had been submitted by a candidate for the Indian Civil Service, that it was to be graded as if the reader were the appointed examiner, and that 100 was the maximum possible grade.

He received twenty-eight responses distributed as follows: 45, 59, 67, 67.5, 70, 70, 72.5, 75, 75, 75, 75, 75, 75, 77, 80, 80, 80, 80, 80, 82, 82, 85, 85, 87.5, 88, 90, 100, 100. In his conclusions Edgeworth wrote, "I find the element of chance in these public examinations to be such that only a fraction--from a third to two thirds--of the successful candidates can be regarded as quite safe, above the danger of coming out unsuccessful if a different set of equally competent judges had happened to be appointed."(5)
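As a quick check on the spread Edgeworth reported, the short Python sketch below computes summary statistics for the twenty-eight marks listed above. It only restates the published marks; the helper name `summarize` is a hypothetical choice made for illustration.

```python
from statistics import mean, median, mode

# The 28 marks Edgeworth reports for the same Latin prose composition (maximum 100).
EDGEWORTH_MARKS = [
    45, 59, 67, 67.5, 70, 70, 72.5, 75, 75, 75, 75, 75, 75, 77,
    80, 80, 80, 80, 80, 82, 82, 85, 85, 87.5, 88, 90, 100, 100,
]


def summarize(marks):
    """Return basic descriptive statistics for a list of marks."""
    return {
        "count": len(marks),
        "low": min(marks),
        "high": max(marks),
        "range": max(marks) - min(marks),
        "mean": round(mean(marks), 1),
        "median": median(marks),
        "mode": mode(marks),          # most frequent single mark
        "distinct": len(set(marks)),  # how many different marks were given
    }


if __name__ == "__main__":
    for name, value in summarize(EDGEWORTH_MARKS).items():
        print(f"{name:>8}: {value}")
```

For these marks the range is 55 points, the median is 78.5, and even the most common mark, 75, was given by only six of the twenty-eight examiners.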

The criteria for evaluation vary not only from institution to institution but also from course to course within an institution, and even among instructors teaching the same course. Because instructors' methods differ so considerably, it is extremely difficult to read a transcript and determine a student's standing among others at the same institution, let alone among students at other institutions across the country. The National Collegiate Athletic Association's Division I institutions voted down a requirement that student athletes maintain a standard grade point average to retain their eligibility for collegiate sports from year to year; the major reason given was the difference in grading standards between institutions and between courses and programs within the same institution.

One difficulty is that the methods used to arrive at the final course grade are almost too numerous to list. They include averaging all grades earned during the term, dropping the lowest one or two test marks, basing the entire course grade on the final exam or a single term paper, counting only the final course grades, grading relative to the class average, and giving only a written comment instead of a course grade.
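To make the point concrete, here is a minimal Python sketch that applies several of the methods listed above to one hypothetical student's marks. The marks, the 90/80/70/60 letter cutoffs, and the function names are all assumptions for illustration, not any institution's actual policy.

```python
from statistics import mean

# Hypothetical marks for one student on a 100-point scale; the last is the final exam.
MARKS = [55, 72, 80, 88, 91]


def letter(score):
    """Convert a numeric score to a letter using an assumed 90/80/70/60 cutoff scale."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"


def average_all(marks):
    """Average every mark earned during the term."""
    return mean(marks)


def drop_lowest(marks, n):
    """Average after discarding the n lowest marks."""
    return mean(sorted(marks)[n:])


def final_exam_only(marks):
    """Base the course grade solely on the last mark (the final exam)."""
    return marks[-1]


METHODS = {
    "average all marks": average_all,
    "drop lowest mark": lambda m: drop_lowest(m, 1),
    "drop two lowest": lambda m: drop_lowest(m, 2),
    "final exam only": final_exam_only,
}

if __name__ == "__main__":
    for name, method in METHODS.items():
        score = method(MARKS)
        print(f"{name:<18} {score:6.2f}  {letter(score)}")
```

With these invented marks, the same body of work earns a C under straight averaging, a B when one or two low marks are dropped, and an A when only the final exam counts.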

Compounding the problem of interpretation is the fact that both standards and the grades themselves change over time. During the 1960s, for example, many institutions began experimenting with a variety of grading systems, both institution-wide and in selected courses. This was the era of student protests, demonstrations, and revolts on college campuses. The institutions that changed their grade reporting systems included both small private institutions and those with longstanding Ivy League reputations.

These new grading systems included allowing pass/fail grades in selected courses; replacing traditional grades with "High Pass, Pass, Fail" or with "Credit/No Credit"; excluding from the grade point average courses that had been failed and then repeated; and an "A, B, C, No Credit" scale in which NC did not count in the GPA. Institutions also had to change the rules used to determine academic suspension, semester and graduation honors, and class rank. In many cases class rank had become virtually impossible to determine and was left off the transcript entirely.

Many institutions that made sweeping changes to grade recording during the 1960s have since returned to a system based on the instructor's evaluation as expressed in traditional grades. But the old problems of interpretation remain, with the added difficulty of interpreting the transcript of a student who was enrolled during a transition period. At the University of South Carolina, for example, a part-time student who took seven years to complete a baccalaureate degree, as many part-time students need to, could have a transcript showing course grades assigned under four different grading systems, and would have been subject to three different sets of suspension and graduation honors criteria.

Even where the items to be rated appear to be standardized, difficulties remain. In his book The Pyramid Climbers, Vance Packard reproduces two report cards published in The New York Times Magazine. One was used in a kindergarten for four-year-olds, and the other to evaluate executives in one of the largest corporations in the country.

The first report card used a rating system of Very Satisfactory, Satisfactory, and Unsatisfactory. The items to be evaluated were: Dependability, Stability, Imagination, Originality, Self-expression, Health and vitality, Ability to plan and control, and Cooperation. The second report card used a rating of Satisfactory, Improving, and Needs Improvement. The items to be evaluated were: Can be depended upon, Contributes to the good work of others, Accepts and uses criticism, Thinks critically, Shows initiative, Plans work well, Physical resistance, Self-expression, and Creative ability. The first report card was used to evaluate the executives, and the second to evaluate the four-year-olds.(6)

In the January 1988 issue of Academic Leader, Dr. Stephen J. Huxley points out another difficulty and recommends a possible remedy.(7) He observes that the student's final record, the college transcript, is blind to the differences described above. In computing the grade point average on the transcript, an "A" in Organic Chemistry earned under an instructor who rarely gives one carries the same weight as an "A" in Outdoor Fly Casting under an instructor who rarely gives any grade other than "A." Because employers, scholarship committees, and graduate and professional schools generally disregard these differences, students learn through their peer networks which courses and instructors to choose to bolster their grade point averages.

Dr. Huxley recommends that, in addition to the student's grade in the course, the transcript show the average grade the instructor assigned in that course and section. A reader of the transcript could then judge more easily whether a particular grade reflects individual academic performance or enrollment in an "easy" course. However, because this would require not only a sophisticated computerized grading program but also the official recording of instructors' grading practices, it is doubtful that many institutions will adopt Dr. Huxley's proposal.
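The core of Huxley's proposal can be pictured with a small Python sketch. This is a hypothetical illustration, not his specification: each transcript line stores the student's grade together with every grade the instructor gave in that section, and prints the section average beside the student's grade. The 4-point mapping, the `TranscriptLine` class, and the sample section grades are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed 4-point scale for illustration; Huxley's article does not prescribe one.
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class TranscriptLine:
    """One hypothetical transcript entry that also carries the section's grade record."""
    course: str
    grade: str                 # the student's own letter grade
    section_grades: list[str]  # every grade the instructor assigned in that section

    def section_average(self) -> float:
        """Mean grade points awarded by the instructor in this section."""
        return sum(POINTS[g] for g in self.section_grades) / len(self.section_grades)

    def render(self) -> str:
        """Format the line with the section average and the student's distance from it."""
        avg = self.section_average()
        diff = POINTS[self.grade] - avg
        return f"{self.course:<22} {self.grade}  section avg {avg:.2f} ({diff:+.2f})"


if __name__ == "__main__":
    # Invented sections: a demanding chemistry instructor and a lenient one.
    chem = TranscriptLine("Organic Chemistry", "A", ["A", "B", "C", "C", "D"])
    fly = TranscriptLine("Outdoor Fly Casting", "A", ["A", "A", "A", "A", "B"])
    for line in (chem, fly):
        print(line.render())
```

Both lines still show an A, but in this invented data the section averages of 2.40 and 3.80 show how differently the two A's set the student apart from classmates.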

Whatever grading system is used, a central problem in interpreting grades is determining what they are meant to measure. Is a grade meant to measure an individual's achievement against others in the same class, course, or school, or only the student's progress since the start of the course? If it measures individual progress, comparing one student's achievement with that of others becomes almost impossible. It is as if one were using an elastic ruler to measure the heights of students in a class: if the ruler stretches differently for each student, one student can measure taller than another simply because the ruler was stretched differently for the two measurements, even when simple observation shows that the other student is taller. In educational jargon, such a measuring device would be labeled "unreliable." Yet in many courses the "measuring device" changes from semester to semester and possibly from student to student.

In certain courses designed to develop competency, some instructors assign the final grade on the basis of a student's sustained performance, regardless of the student's actual average or the averages of others in the course. For example, suppose a student enters a writing course earning grades of D on submitted work. Over the semester, the student receives the following grades: D, D, C, D, C, C, B, C, B, B+, B. What final grade should be assigned? A strict average gives the student a C or, at best, a C+; however, some faculty believe that since the student is writing at a B level by the end of the course, B is the final grade that should be assigned.
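The arithmetic in this example can be checked with a short Python sketch. The grade-point values (including B+ = 3.3), the rule that an average maps to the highest letter it reaches, and the choice of the last three grades as the "sustained" level are all assumptions; the article specifies none of them.

```python
from statistics import mean

# Assumed grade-point values; institutions differ (B+ may be 3.3 or 3.5, for instance).
POINTS = {"A": 4.0, "B+": 3.3, "B": 3.0, "C+": 2.3, "C": 2.0, "D": 1.0, "F": 0.0}

# The writing-course sequence from the example above.
GRADES = ["D", "D", "C", "D", "C", "C", "B", "C", "B", "B+", "B"]


def to_letter(gpa):
    """Return the highest letter whose point value the average reaches."""
    best = "F"
    for letter, value in sorted(POINTS.items(), key=lambda kv: kv[1]):
        if gpa >= value:
            best = letter
    return best


def strict_average(grades):
    """Average the grade points of every graded assignment."""
    return mean(POINTS[g] for g in grades)


def sustained_performance(grades, window=3):
    """Average of the last `window` grades: one possible reading of 'sustained'."""
    return mean(POINTS[g] for g in grades[-window:])


if __name__ == "__main__":
    strict = strict_average(GRADES)
    sustained = sustained_performance(GRADES)
    print(f"strict average:         {strict:.2f} -> {to_letter(strict)}")
    print(f"sustained (last three): {sustained:.2f} -> {to_letter(sustained)}")
```

Under these assumptions the strict average is about 2.12, a C, while the last three grades average 3.1, a B, matching the two outcomes described above.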

In courses where competency is the goal, the latter approach seems a reasonable way to assign a final mark, and it is certainly reasonable from the student's standpoint. However, it makes it impossible to compare the student with others in the class, or with students in other classes, even in the same subject at the same institution.

Attempts to arrive at a fair and equitable grade for an individual student, without distorting either the student's standing in the class or his or her ranking relative to students at that or other institutions, have proven to be among the most difficult tasks a faculty member faces. To be a reliable, true measure of achievement over time, the assigned grade must be understood by the instructor, students, colleagues, and future evaluators.

Since there appears to be little doubt that a given mark is interpreted differently by different readers, perhaps the best choice is for the faculty member to follow, within departmental and institutional guidelines, the course that in his or her opinion best measures the student's progress during the grading period, without being overly concerned with the grading practices of other faculty and other institutions.


(1) James S. Terwilliger, Assigning Grades to Students (Glenview, IL, 1971), p. 108.

(2) James M. Thyne, Principles of Examining (London, 1974), p. 33.

(3) William L. Wrinkle, Improving Marking and Reporting Practices (New York, 1947), p. 36.

(4) Howard Kirschenbaum, Rodney Napier, and Sidney B. Simon, Wad-Ja-Get? The Grading Game in American Education (New York, 1971), pp. 80-81.

(5) Frederick J. Kelly, Teachers' Marks, Their Variability and Standardization (New York, 1914), pp. 51-52.

(6) Vance Packard, The Pyramid Climbers (New York, 1964), pp. 80-81.

(7) Stephen Huxley, "Blind Transcripts," Academic Leader, IV (January 1988), pp. 1-2.

J. Thomas Davis, University of South Carolina

Davis is Associate Dean in the College of Applied Professions.
COPYRIGHT 2001 Rapid Intellect Group, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author: Davis, J. Thomas
Publication: Academic Exchange Quarterly
Date: Mar 22, 2001