The role of gender in teaching effectiveness ratings of faculty.
This paper examines the influence of gender on student ratings of faculty teaching effectiveness. The study recorded professor effectiveness ratings by 930 undergraduate students, 472 females and 458 males. The results reveal several gender differences. In general, female students rated faculty effectiveness higher than male students did. When gender of faculty was considered, female students rated male faculty higher than male students did, but did not rate female faculty higher than male students did. Gender differences were also examined using an integrated model of student rating behavior. This model combines the theories of motivation, grade leniency/stringency, and construct validity in a single structural equation model; previous research generally treated these theories independently. The effect of gender on the role of each of the competing theories was studied. Rating behavior was generally consistent between male and female students, although females seemed to exhibit lower academic expectations than males. The results of this study, in conjunction with previous studies, continue to show that significant bias exists in student ratings of faculty teaching effectiveness regardless of the gender of students or faculty.
Many studies have examined the factors that influence student ratings of professor effectiveness. Those that have focused on gender differences have produced inconsistent findings. Some studies have shown higher ratings for instructors by females, though in some instances same-sex preferences were also found (Ferber & Huber, 1975; Tieman & Rankin-Ullock, 1985). Other studies have shown little or no gender interaction (Elmore & LaPointe, 1974, 1975; Wilson & Doyle, 1976). Hancock, Shannon, and Trentham (1993) considered gender and college disciplines (human sciences, liberal arts, etc.) and found no uniform patterns, although they did find that female students rated instructors higher than males did. Feldman (1993) conducted an extensive analysis and found minor differences and inconsistent results, though students rated same-sex instructors somewhat higher. Fernandez (1997) reviewed the literature and concluded that gender differences were minimal with regard to ratings of faculty. His own study supported these conclusions, stating "...that the effect of student and faculty gender on teaching quality assessment is slight to non-existent".
Other factors also affect teaching evaluations. Studies of academic success expectancy have found small gender differences, with females' expectancy slightly lower than males' (Gigliotti & Seacrest, 1988). The effect of motivation is probably the most widely agreed upon systematic influence on student ratings of faculty. It has been demonstrated that student motivation, represented by student interest and course type (elective/required), plays a significant role in student ratings of professor effectiveness (Howard & Maxwell, 1980; Hoyt, 1973; Marsh, 1984; Marsh & Dunkin, 1992). Howard and Maxwell (1980, 1982) modeled the relationship between student motivation, student learning, expected grades, and student satisfaction with the instructor and field of study. These and other studies show that motivation and learning are more highly correlated with ratings of professor effectiveness than expected grade is. The authors conclude that student motivation drives the correlation between grades and student satisfaction with the instructor. Therefore, the correlation between grades and ratings of professor effectiveness is an expected artifact rather than an indication of a direct relationship between the two. Using path analysis, Marsh (1984) also concluded that prior subject interest had a stronger impact on student ratings of various professor effectiveness characteristics than did grades. Additionally, simple classifications (required versus elective) and expanded categories of course type have been found to be significantly correlated with ratings of professor effectiveness (Aleamoni, 1981; Centra, 1993; Feldman, 1978; Marsh & Dunkin, 1992).
Construct validity theory proposes that student ratings reflect student learning and, therefore, measure professor teaching effectiveness. That is, higher student ratings for the instructor indicate greater student learning. Some studies of multi-section courses have demonstrated that the sections with the highest student ratings also performed best on standardized final examinations (Marsh & Roche, 1997). In addition to the earlier studies that provided the foundation for validity theory, numerous factor analytic studies have been conducted to investigate the validity of student ratings (Cashin, 1988; Feldman, 1989; Howard, Conway, & Maxwell, 1985; Marsh, 1984; Marsh & Dunkin, 1992).
Using SEM, Greenwald and Gillmore (1997a, 1997b) found support for grade leniency theory, arguing that only grade leniency allows for a negative workload → grade relationship. This relationship is explained by students' willingness to work harder in order to avoid very low grades. This negative relationship between workload and grades has been observed in other studies (Marsh, 1980). However, other explanations have been offered for the negative relationship, such as subject difficulty and student capability (McKeachie, 1997).
In an effort to integrate the competing theories, Hatfield and Kohn (2004) conducted an SEM analysis, which confirmed the presence of all the competing theories and their interactions.
Based on the literature and the findings of several studies, we propose that there are differences in the way male and female students rate male and female faculty.
H1: Female students rate faculty higher than male students do.
H2: Female students rate male faculty higher than male students do.
H3: Female students rate female faculty higher than male students do.
Theories of student rating behavior suggest a variety of hypotheses, which form the basis of an integrated approach to the study of gender differences. Several theories have been offered to explain the positive relationship between grades and ratings of faculty (Greenwald & Gillmore, 1997a). Grade leniency suggests that grades directly affect ratings of faculty. Construct validity and motivation assert that a third variable (learning) positively affects both grades and ratings, thus resulting in a positive relationship between grades and ratings. While these theories have been studied extensively in the past, the effect of gender has not been studied when all three theories are present simultaneously.
Grade leniency/stringency theory suggests that praise induces liking for the individual giving the praise (Aronson & Linder, 1965; Hatfield & Kohn, 2003). In the context of student ratings, praise is interpreted as high grades, and liking translates into high faculty ratings. The theory therefore posits a causal relationship between expected grades and ratings of faculty. Further, Greenwald and Gillmore (1997a) suggest that there is a negative relationship (grade stringency) between students working hard and expected grades. In courses with strict grading policies, students have to work hard to avoid very low grades, yet overall grades are still lower than in classes with easy-grading professors. These premises suggest the following hypotheses:
H4: The higher the expected grade, the higher the professor effectiveness rating.
H5: The higher the student effort (worked harder), the lower the expected grade.
Construct validity theory suggests that high instructional quality induces high student learning, which results in higher grades and higher professor ratings (Cashin & Downey, 1992; Cohen, 1981; Feldman, 1976, 1989; Marsh, 1984). Therefore, the following hypotheses are provided to evaluate construct validity:
H6: The higher the student learning, the higher the professor effectiveness rating.
H7: The higher the student learning, the higher the expected grade.
H8: The higher the professor effectiveness rating, the higher the student learning.
H9: The higher the worked hard rating, the higher the student learning.
Motivation theory suggests that student motivation positively affects both grades and ratings of faculty, through student learning, thereby resulting in a positive correlation between grades and ratings of faculty (Aleamoni, 1981; Braskamp & Ory, 1994; Centra, 1993; Kohn & Hatfield, 2001; Marsh, 1984; Marsh & Dunkin, 1992). Student motivation results in more student learning and appreciation for the course and instructor, which leads to higher grades and higher professor effectiveness ratings. Researchers have identified two measures of student motivation: course-specific and general (Howard & Maxwell, 1980; Marsh, 1984). Two indicators of student motivation are examined in this study: student interest in the subject matter of the rated course, and course type (major or elective versus required or core course). Student interest is a course-specific measure, whereas course type is a general measure. The following hypotheses are designed to test the impact of student motivation on student rating behavior:
H10: The higher the student interest, the higher the student learning.
H11: Lack of choice in course (required or core courses) results in lower student learning.
H12: The higher the student learning, the higher the expected grade.
H13: The higher the student learning, the higher the professor effectiveness rating.
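The structural hypotheses H4 through H13 can be read as signed, directed paths between the rating-behavior variables. Written that way, H12 and H13 restate H7 and H6, so the ten statements reduce to eight distinct paths. The Python sketch below is illustrative only: the variable labels, the `HYPOTHESIZED_PATHS` name, and the `paths_into` helper are assumptions introduced here, and "Required Course" is a stand-in label for the required/core category of course type.

```python
# Each entry: (source, target, expected sign, hypotheses that state it).
# Labels are illustrative; "Required Course" means course type = required/core.
HYPOTHESIZED_PATHS = [
    ("Expected Grade", "Professor Effectiveness", +1, ("H4",)),
    ("Worked Harder", "Expected Grade", -1, ("H5",)),
    ("Student Learning", "Professor Effectiveness", +1, ("H6", "H13")),
    ("Student Learning", "Expected Grade", +1, ("H7", "H12")),
    ("Professor Effectiveness", "Student Learning", +1, ("H8",)),
    ("Worked Harder", "Student Learning", +1, ("H9",)),
    ("Student Interest", "Student Learning", +1, ("H10",)),
    ("Required Course", "Student Learning", -1, ("H11",)),
]


def paths_into(target):
    """Return (source, expected sign) pairs hypothesized to affect `target`."""
    return [(src, sign) for src, tgt, sign, _ in HYPOTHESIZED_PATHS if tgt == target]


for source, sign in paths_into("Student Learning"):
    print(f"{source} -> Student Learning ({'+' if sign > 0 else '-'})")
```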
The student rating survey contained eight items, which students rated on a six-point Likert scale: (1) strongly agree, (2) agree, (3) slightly agree, (4) slightly disagree, (5) disagree, (6) strongly disagree. The first six items were designed to examine professor effectiveness, with the sixth item being a global item. Student learning was assessed by item 7 and course-specific student interest by item 8.
1. The course requirements, including grading system, were explained at the beginning of the semester.
2. The professor provides feedback on exams and assignments.
3. The professor is willing to answer questions and assist students upon request.
4. The professor uses examples and practical applications in class, which aid in my understanding of the material.
5. The professor encourages students to analyze, interpret, and apply concepts.
6. The professor was effective teaching this course.
7. I learned a significant amount in this course.
8. I am interested in the subject matter of this course.
These items were selected based on past research, which suggests the desirability of global items that address professor effectiveness (#6) and student learning (#7) factors, and the need to control for student interest (#8) (Cashin, 1995). Items one through five address commonly used dimensions of professor effectiveness in student rating research (Braskamp & Ory, 1994; Cashin, 1995; Centra, 1993; Feldman, 1989; Marsh, 1991).
Students completed a student data sheet that contained demographic items, two grade-related items, and one general student motivation item: (1) The grade I expect to achieve in this course, (2) I worked harder in this course than in most of my other courses, and (3) course type. All response options were designed so that students could use opscan sheets to report their ratings. The scale for expected grade was: 1 = A, 2 = B, 3 = C, 4 = D, 5 = F. The agree-disagree Likert scale noted above was also used for the 'worked harder' item. Five categories of course type were provided: A. required by major/minor, B. elective in major/minor, C. general education requirement, D. free elective, E. program core course. These items reflect commonly used measures in testing for grade leniency and motivation effects on student ratings of faculty (Greenwald & Gillmore, 1997a, 1997b; Howard & Maxwell, 1980; Marsh, 1984).
SAMPLE AND PROCEDURES
Data were collected from students and professors in the three colleges (business, arts and sciences, and education) at Shippensburg University at the end of the first semester of the 1997-1998 academic year. Classes were drawn from professors who volunteered and from professors who were asked to participate, in order to ensure adequate representation from all colleges and departments, a mix of student classes (such as freshman and senior), and a mix of professor characteristics (such as gender, race, degree, and rank).
Nine hundred and thirty students (472 females and 458 males) and 44 professors (17 females and 27 males) were included in the sample, with the largest percentage of faculty (51 percent) in Arts and Sciences and equal percentages in Business and Education. The largest group of students was seniors at 36 percent, followed by sophomores at 19 percent, juniors at 18 percent, freshmen at 14 percent, and graduate students at 13 percent.
VARIABLES AND MEASURES
The professor effectiveness dependent variable is a composite measure, developed by averaging the ratings of the six professor effectiveness items. The reliability coefficient, alpha, for the composite professor effectiveness measure is 0.84. Expected Grade is both a dependent and independent variable, and is used directly as reported. Student Learning, Student Interest, and Worked Hard are also used as directly reported in the survey instrument. The Course Type categories were collapsed into a two-category independent variable: (1) major/minor/elective, and (2) required/core course. There are two measures of student motivation: Course Type and "Student Interest in the subject matter of this course". There are two measures of grade leniency: Expected Grade and "Worked harder in this course than in most other courses". Student Learning, a self-reported rating, is a construct validity measure.
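To make the composite measure concrete, the sketch below averages six item ratings per student and computes Cronbach's alpha with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is a minimal illustration, not the authors' procedure: the function names and the five rows of ratings are invented for the example and are not the study's data.

```python
from statistics import mean, variance


def composite_scores(responses):
    """Average each student's item ratings into one composite score."""
    return [mean(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student rows of item ratings."""
    k = len(responses[0])                  # number of items
    items = list(zip(*responses))          # one tuple of ratings per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical ratings: 5 students x 6 items on a 1-6 low-to-high scale.
ratings = [
    [6, 5, 6, 5, 6, 6],
    [4, 4, 5, 4, 4, 4],
    [3, 2, 3, 3, 2, 3],
    [5, 5, 4, 5, 5, 5],
    [2, 3, 2, 2, 3, 2],
]
print(composite_scores(ratings))
print(round(cronbach_alpha(ratings), 3))
```

Values of alpha near 1 indicate that the items move together closely enough to justify averaging them into a single score.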
The scales for five variables (professor effectiveness, expected grade, student learning, student interest, worked hard) were reversed so that the findings could be interpreted in the way these variables are typically described, i.e., from low to high. For example, the higher the student learning rating, the more the student learned. The course type variable is categorical and thus did not need to be reversed.
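Reversing a scale of this kind is a one-line transformation: a response v on a scale running from low to high becomes low + high - v. The sketch below assumes the six-point Likert coding and the five-point expected grade coding described earlier; the `reverse_score` name is introduced here for illustration.

```python
def reverse_score(value, low=1, high=6):
    """Flip a rating so that the largest original code becomes the smallest."""
    return low + high - value


# Six-point Likert item: 1 (strongly agree) becomes 6, 6 becomes 1.
print([reverse_score(v) for v in range(1, 7)])

# Expected grade coded 1 = A ... 5 = F: after reversal, A is the highest value.
print([reverse_score(g, high=5) for g in range(1, 6)])
```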
ANALYSIS AND RESULTS
Descriptive statistics (means, standard deviations, and correlations) for all the variables used in this study are provided in Tables 1, 2, and 3. The hypotheses will be tested on the within-class data using structural equation modeling (SEM) and the Amos 5.0 modeling software. While there are many goodness-of-fit statistics in SEM, this study will report three of the most popular measures (CFI, NFI, and Chi-square/df), with the Comparative Fit Index (CFI) serving as the primary fit statistic (see End Notes). Path coefficients are tested for significance using Critical Ratios (CR). Amos 5.0 reports both the CRs and the P values for each path so that the level of significance can be determined. In addition, professor effectiveness ratings will be compared among the combined male and female, male only, and female only samples. Average scores will be tested for differences among the groups using a difference of means test. Finally, a comparison will be made to determine whether male and female students rate male and female faculty members differently. A one-way analysis of variance will be performed to see if there are any significant differences in rating behaviors.
A comparison was performed to contrast average professor effectiveness rating scores among males and females combined, females only, and males only (see Tables 4 & 5). A test for the difference of means was performed (assuming unequal variances), and a highly significant difference (Z = 3.453, P < .001) was found between females only and males only. Thus, female average professor effectiveness rating scores are significantly higher than those of males. These results support H1 and agree with similar findings (Benz & Blatt, 1995), which also showed female ratings to be higher than male ratings.
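A difference of means test with unequal variances divides the difference between the two group means by the standard error sqrt(s1²/n1 + s2²/n2). The sketch below shows that calculation and the two-sided P value under a normal approximation. The group sizes come from the sample description, but the means and standard deviations are placeholders, since Tables 4 and 5 are not reproduced here; the function names are likewise assumptions.

```python
import math


def z_difference_of_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for two independent means without pooling the variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p(z):
    """Two-sided P value for a Z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Placeholder summary statistics (NOT the study's values).
z = z_difference_of_means(5.20, 0.70, 472, 5.05, 0.75, 458)
print(round(z, 3), two_sided_p(z))

# The reported Z of 3.453 corresponds to a two-sided P value below .001.
print(two_sided_p(3.453) < 0.001)
```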
An analysis was conducted to determine whether the gender of either students or faculty played a role in ratings of faculty effectiveness. Because of fewer missing data for this analysis, the total sample size was 936 instead of 930 students. The gender of the faculty was noted, and a one-way analysis of variance was conducted among the four combinations of male and female students and male and female faculty. Table 6 presents the averages and standard deviations of faculty effectiveness ratings, along with sample sizes. Table 7 presents the results of the analysis of variance, indicating significant differences among groups (P < .001). Scheffé multiple comparisons were made between male and female student ratings for male and female faculty. Female students rated male faculty significantly higher than did male students (mean difference = .177, P < .05), supporting H2. There was no significant difference between female and male students' ratings of female faculty (mean difference = .103). Therefore, H3 is not supported.
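The one-way analysis of variance compares between-group and within-group variation across the four student-by-faculty gender combinations. The sketch below computes the F statistic and its degrees of freedom from raw scores. It is a hedged illustration: the `one_way_anova` name and the four small groups of ratings are invented for the example, and the Scheffé follow-up comparisons are not implemented.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical composite ratings for the four student/faculty combinations.
groups = [
    [5.2, 5.5, 4.9, 5.8],   # female students, male faculty
    [5.0, 5.1, 4.7, 5.4],   # female students, female faculty
    [4.8, 5.0, 4.6, 5.2],   # male students, male faculty
    [4.9, 5.0, 4.5, 5.3],   # male students, female faculty
]
f_stat, df_b, df_w = one_way_anova(groups)
print(round(f_stat, 3), df_b, df_w)
```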
NEW PERSPECTIVES ON STUDENT RATINGS
One of the problems with testing each of the above theories in isolation from the others is that intervening and moderating effects on the predicted relationships are not taken into account. Such effects may suppress or reinforce the predicted relationships. Thus, to accurately assess the presence of the theorized relationships, all the variables of interest need to be included in the same model. This section will integrate the findings predicted by the various theories, using structural equation modeling. The impact of student gender will then be studied to determine the role gender plays in rating behavior.
Following a methodology similar to that of Hatfield and Kohn (2004), all of the predicted direct relationships proposed in the grade leniency/stringency, construct validity, and motivation theories were used to construct integrated structural models for males and females combined, males only, and females only. The initial analyses of these models are presented in the left hand column of Figures 1, 2, and 3. Analysis reveals that the fit for all three models was very good (CFI = .94 for all three models) and resulted in R² values of .35 (M & F), .41 (F), and .28 (M) for professor effectiveness. However, several path coefficients were not significant. For each model, removal of these paths was evaluated by iteratively removing the path with the highest P value greater than .05, rerunning the model with the path deleted, and then inspecting the P values of the remaining paths for any that were greater than .05. The procedure stopped when all remaining paths were significant. The final models are presented on the right hand side of Figures 1, 2, and 3.
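The path-removal procedure described above is a backward elimination loop, sketched below. The model fitting itself is left to a caller-supplied function, because the study used Amos 5.0; `prune_paths`, the `fit_model` callback, and the toy P values are all assumptions made for illustration and are not the study's estimates.

```python
def prune_paths(paths, fit_model, alpha=0.05):
    """Drop the least significant path, refit, and repeat until all P <= alpha.

    `fit_model` is a hypothetical callback: given the current list of paths,
    it refits the model and returns a dict mapping each path to its P value.
    """
    current = list(paths)
    while current:
        p_values = fit_model(current)
        worst = max(current, key=lambda path: p_values[path])
        if p_values[worst] <= alpha:
            break                      # every remaining path is significant
        current.remove(worst)          # delete one path, then refit
    return current


# Toy stand-in for an SEM fit: P values are invented and held fixed,
# whereas a real refit would produce new P values after each deletion.
TOY_P = {
    "Student Interest -> Student Learning": 0.001,
    "Course Type -> Student Learning": 0.40,
    "Professor Effectiveness -> Student Learning": 0.20,
    "Expected Grade -> Professor Effectiveness": 0.03,
}


def toy_fit(paths):
    return {path: TOY_P[path] for path in paths}


print(prune_paths(list(TOY_P), toy_fit))
```

Removing one path at a time matters because deleting a path changes the estimates, and therefore the P values, of the paths that remain.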
[FIGURES 1-3 OMITTED]
As a result of these procedures, both Course Type → Student Learning (Motivation, H11) and Professor Effectiveness → Student Learning (Construct Validity, H8) were deleted from all three models (M & F, M, F). In addition, Worked Harder → Student Learning (Construct Validity, H9) was deleted from the female model. Inspection of the modification indices of the final models for males and females combined, males only, and females only indicated that no additional paths would strengthen the models. These results are presented in the right hand column of Figures 1, 2, and 3. All paths of the final models are significant, the fit is very good (CFI = 1.00 (F) and .999 (M & F, M)), and the R² values are .35 (M & F), .41 (F), and .29 (M). While the R² values of the final models remained at their original levels, the fit indices of all models improved significantly. The R² values for all models are quite strong. R² for females only is 46% higher than for males only. All three of these R² values are much higher than those usually reported in other studies. In addition, there is a high degree of consistency in the structure of all models, with only the Worked Harder → Student Learning linkage being omitted from the female only model.
Thus, 9 of the original 12 hypotheses were strongly supported, lending considerable support to the three theories of student rating behavior, regardless of gender. Interestingly, both grade leniency (H4) and grade stringency (H5) are supported. While grade leniency is commonly understood and generally accepted, grade stringency (a negative workload → expected grade relationship) has rarely been observed (Greenwald & Gillmore, 1997a, 1997b). In this analysis, it is observed not only in the combined sample of males and females but also in both the male only and female only sub-groups. Moreover, the standardized path coefficient between Worked Harder and Expected Grade is much stronger for females (-.31) than for males (-.17).
In all final models, the Professor Effectiveness → Student Learning path (H8) was removed. This hypothesis is part of construct validity theory, and its removal may indicate that the feedback loop between student learning and ratings of professor effectiveness is not as well defined as assumed. A possible explanation for this weakness is that higher effectiveness ratings may not be a good measure of teaching ability and thus do not lead to greater student learning.
Although found in other studies, the Course Type link (H11) was dropped from all final models. It is generally assumed that students are more motivated in electives or courses in their major and less so in required courses. This facet of motivation theory then leads to higher professor effectiveness ratings. Our results do not indicate this to be so. Course Type is an indirect effect, influencing student learning, which in turn affects professor effectiveness ratings. In an integrated model, Student Interest has a major impact (path coefficients of .70 (M & F), .81 (F), and .64 (M)), eliminating the role of the Course Type component of motivation theory. Thus, in an integrated model Course Type may be a redundant variable, with Student Interest providing a much stronger indication of student motivation.
Some inconsistencies continue to show up in the study of gender and faculty effectiveness ratings. In general, female students rated faculty higher than male students did. When gender of faculty was considered, female students rated male faculty higher than male students did. Higher female rating scores have been observed in other studies, and the results of our study strongly support this difference. However, the differences ended there. In addition, our findings provide strong support for the need to integrate theories that explain student rating behavior of faculty. Integration is necessary because of the interactions and indirect effects among the theoretical premises. Structural equation modeling provides an ideal analytical methodology for studying the complexities of student rating behavior.
Using an integrated approach based on SEM analysis, we have found surprisingly consistent results among all the models. Structurally, little difference in overall student rating behavior based on gender was observed. Except for the removal of one path for females (Worked Harder → Student Learning, H9, Construct Validity), the three models are identical. All exhibit the same simultaneous effects of construct validity, grade leniency and stringency, and motivation theories. All models fit the data very well and have much higher coefficients of determination than previously reported. These high values lend further support to the view that significant bias exists in ratings of professor effectiveness and that it continues to be ignored by many schools. Females experience the negative workload → expected grade effect to a much higher degree than males. Some studies have found that females have a lower expectancy of success (Gigliotti & Seacrest, 1988). The results of our study tend to support these findings. Faced with a difficult course, females may be less likely to have confidence in themselves and thus expect a lower grade, though they will continue to work harder.
This study continues to find significant bias, including gender bias, in student ratings of faculty, suggesting the need to reconsider how student ratings are used to evaluate faculty teaching effectiveness. In order to evaluate professor effectiveness more accurately, administrators and faculty need to control for, or at least acknowledge, the complexity of student rating behavior.
A 1.0 CFI or NFI suggests a perfect fit and if under .9 the model can probably be improved (Bentler and Bonnett, 1980). Chi-square/df ratios of up to 3 are indicative of acceptable fit models (Marsh and Hocevar, 1985). CFI is less affected by sample size than is NFI or the Chi-square ratio (Kline, 1998).
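For readers who want to see how these three statistics relate to the chi-square values an SEM package reports, the sketch below uses the standard definitions: NFI compares the model chi-square with that of the null (independence) model, and CFI does the same after subtracting degrees of freedom. The function names and the example chi-square values are assumptions for illustration, not output from the study's models.

```python
def nfi(chi2_model, chi2_null):
    """Normed Fit Index: proportional improvement over the null model."""
    return (chi2_null - chi2_model) / chi2_null


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index, bounded between 0 and 1."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    if d_null == 0:
        return 1.0
    return 1.0 - d_model / d_null


def chi2_df_ratio(chi2_model, df_model):
    """Chi-square divided by degrees of freedom; up to 3 is read as acceptable."""
    return chi2_model / df_model


# Hypothetical chi-square values (NOT taken from the study).
print(round(nfi(10.0, 100.0), 3))
print(round(cfi(10.0, 5, 100.0, 10), 3))
print(chi2_df_ratio(10.0, 5))
```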
Aleamoni, L.M. 1981. Student ratings of instruction. In J. Millman (Ed.), Handbook of teacher evaluation (pp. 110-145). Beverly Hills, CA: Sage.
Anderson, J.C. & Gerbing, D.W. 1988. Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin, 103(3): 411-423.
Aronson, E. & Linder, D.E. 1965. Gain and loss of esteem as determinants of interpersonal attractiveness. Journal of Experimental Social Psychology, 1: 156-171.
Bentler, P.M. & Bonnett, D.G. 1980. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88: 588-606.
Bentler, P.M. & Chou, C. 1987. Practical issues in structural modeling. Sociological Methods and Research, 16: 78-117.
Benz, C. & Blatt, S.J. 1995. Factors Underlying Effective College Teaching: What students Tell Us. Mid-Western Educational Researcher, 8 (1): 27-31.
Bollen, K.A. 1989. Structural Equations with Latent Variables. New York: Wiley.
Braskamp, L.A. & Ory, J.C. 1994. Assessing faculty work: Enhancing individual and institutional performance. San Francisco: Jossey-Bass.
Bridgeman, W.J. 1986. Student evaluations viewed as a group process factor. Journal of Psychology, 120: 183-190.
Cashin, W.E. 1988. Student ratings of teaching: A summary of the research. Idea Paper No. 20. Manhattan: Kansas State University, Center for Faculty Evaluation and Development.
Cashin, W.E. 1995. Student Ratings of Teaching: The Research Revisited IDEA Paper No.32. Manhattan: Kansas State University, Center for Faculty Evaluation and Development.
Cashin, W.E. & Downey, R. 1992. Using Global Student Rating Items for Summative Evaluation. Journal of Educational Psychology, 84: 563-572.
Centra, J.A. 1993. Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Chacko, T.I. 1983. Student ratings of instruction: A function of grading standards. Education Research Quarterly, 8(2): 14-25.
Chapman, J.W. & Lawes, M.M. 1984. Consistency of causal attributions for expected and actual examination outcome: A study of the expectancy confirmation and egotism models. British Journal of Educational Psychology, 54: 177-188.
Cohen, P.A. 1981. Student ratings of instruction and student achievement. A meta-analysis of multi-section validity studies. Review of Educational Research, 51: 281-309.
D'Apollonia, S. & Abrami, P.C. 1997. Navigating student ratings of instruction. American Psychologist, 52(11): 1198-1208.
Davis, M.H. & Stephan, W.G. 1980. Attributions for exam performance. Journal of Applied Social Psychology, 10: 235-248.
Elmore, P.B. & LaPointe, K.A. 1975. Effects of teacher sex, student sex, and teacher warmth on the evaluation of college instructors. Journal of Educational Psychology, 67: 368-374.
Feldman, K.A. 1976. Grades and college students' evaluations of their courses and teachers. Research in Higher Education, 4: 69-111.
Feldman, K.A. 1978. Course characteristics and variability among college students in rating their teachers and courses: A review and analysis. Research in Higher Education, 9: 199-242.
Feldman, K.A. 1989. The association between student ratings of specific instructional dimensions and student achievement: Refining an extending the synthesis of data from multi-section validity studies. Research in Higher Education, 30(6): 583-645.
Feldman, K.A. 1993. College Students' Views of Male and Female College Teachers--Part II: Evidence from Social Laboratory and Experiments. Research in Higher Education, 34(2): 151-211.
Fernandez, M.A.M. 1997. Student and Faculty Gender in Ratings of University Teaching Quality. Sex Roles: A Journal of Research, 37(11-12): 997-1003.
Gigliotti, R.J. 1987. Expectations, observations, and violations: Comparing their effects on course ratings. Research in Higher Education, 26: 401-415.
Gigliotti, R.J., & Buchtel, F.S. 1990. Attritional bias and course evaluations. Journal of Educational Psychology, 82: 341-351.
Gigliotti, R.J., & Seacrest, S.E. 1988. Academic success expectancy: The interplay of gender, situation, and meaning. Research in Higher Education, 29: 281-297.
Greenwald, A.G. & Gillmore, G.M. 1997a. Grading leniency is a removable contaminant of student ratings. American Psychologist, 52(11): 1209-1217.
Greenwald, A.G. & Gillmore, G.M. 1997b. No pain, no gain? The importance of measuring course workload in student ratings of instruction. Journal of Educational Psychology, 89(4): 743-752.
Hair, J.F., Anderson, R.E., Tatham, R.L., & Black, W.C. 1998. Multivariate Data Analysis, 5th ed. New Jersey: Prentice Hall, 603-604.
Haladyna, T. & Hess, R.K. 1994. The detection and correction of bias in student ratings of instruction. Research in Higher Education, 35(6): 669-687.
Hancock, G.R., Shannon, D.M., & Trentham, L.L. 1993. Student and Teacher Gender in Rating of University Faculty: Results from Five Colleges of Study. Journal of Personnel Evaluation in Education, 6: 235-248.
Hatfield, L. & Kohn, J.W. 2003. Attribution Theory Reveals Grade-Leniency/Stringency Effects in Student Ratings of Faculty. Academy of Educational Leadership Journal, 7(2): 1-14.
Hatfield, L. & Kohn, J.W. 2004. Student Ratings of Faculty: Back to Square One - Integrating Theoretical Perspectives Using Structural Equation Modeling. Academy of Educational Leadership Journal, Vol. 1, No. 10: 29-46.
Holmes, D.S. 1972. Effects of grades and disconfirmed grade expectancies on students' evaluations of their instructor. Journal of Educational Psychology, 63(2): 130-133.
Howard, G.S. & Maxwell, S.E. 1980. Correlation between student satisfaction and grades: A case of mistaken causation? Journal of Educational Psychology, 72(6): 810-820.
Howard, G.S. & Maxwell, S.E. 1982. Do grades contaminate student evaluations of instruction? Research in Higher Education, 16: 175-188.
Howard, G.S., Conway, C.G. & Maxwell, S.E. 1985. Construct validity of measures of college teaching effectiveness. Journal of Educational Psychology, 77(2): 187-196.
Hoyt, D.P. 1973. Measurement of instructional effectiveness. Research in Higher Education, 1: 367-378.
Kennedy, W.R. 1975. Grades expected and grades received: Their relationship to students' evaluations of faculty performance. Journal of Educational Psychology, 67: 109-115.
Kline, R.B. 1998. Principles and Practice of Structural Equation Modeling. New York: Guilford Press.
Kohn, J.W. & Hatfield, L. 2001. Student Ratings of Faculty and Motivational Bias--A Structural Equation Approach, Academy of Educational Leadership Journal, 5(1):65-74.
Marsh, H.W. 1980. The influence of student, course, and instructor characteristics on evaluations of university teaching. American Educational Research Journal, 17: 219-237.
Marsh, H.W. 1984. Students' Evaluations of university teaching: Dimensionality, reliability, validity, potential biases, and utility. Journal of Educational Psychology, 76(5): 707-754.
Marsh, H.W. 1986. Self-serving effect (bias?) in academic attributions: Its relation to academic achievement and self-concept. Journal of Educational Psychology, 78:190-200.
Marsh, H.W. 1991. Multidimensional Students' Evaluations of Teaching Effectiveness: A test of Alternative higher-Order Structures. Journal of Educational Psychology, 83: 285-296.
Marsh, H.W. & Duncan, M. 1992. Students' evaluations of university teaching: A multidimensional perspective. In J.C. Smart (Ed.), Higher education: Handbook of theory and research, 8: 143-233. New York: Agathon.
Marsh, H.W. & Hocevar, D. 1985. Application of confirmatory factor analysis to the study of self-concept: First- and higher order factor models and their invariance across groups. Psychological Bulletin, 97(3): 562-582.
MacCallum, R.C., Roznowski, M. & Necowitz, L.B. 1992. Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111(3): 490-504.
Marsh, H.W. & Roche, L.A. 1997. Making students' evaluations of teaching effectiveness effective. American Psychologist, 52(11): 1187-1197.
McHugh, M.C., Fisher, J.E. & Frieze, I.H. 1982. Effect of situational factors on the self-attributions of females and males. Sex Roles, 8: 389-396.
McKeachie, W.J. 1997. Student ratings: The validity of use. American Psychologist, 52(11): 1218-1225.
Miller, D.C. 1991. Handbook of Research Design and Social Measurement. Newbury Park, California: Sage Publications, Inc.
Owie, I. 1985. Incongruence between expected and obtained grades and students' ratings of the instructor. Journal of Instructional Psychology, 12: 196-199.
Powell, R.W. 1977. Grades, learning, and student evaluation of instruction. Research in Higher Education, 7: 193-205.
Ross, M. & Fletcher, G.J.O. 1985. Attribution and social perception. In G. Lindzey & E. Aronson (Eds.), Handbook of Social Psychology (vol. 2, pp. 73-122). New York: Random House.
Simon, J.G. & Feather, N.T. 1973. Causal attribution for success and failure at university examinations. Journal of Educational Psychology, 64: 46-56.
Stumpf, S.A. & Freedman, R.D. 1979. Expected grade covariation with student ratings of instruction: Individual versus class effects. Journal of Educational Psychology, 71: 293-302.
Tieman, C.R. & Rankin-Ullock, B. 1985. Student evaluations of teachers. Teaching Sociology, 12: 177-191.
Jonathan Kohn, Shippensburg University
Louise Hatfield, Shippensburg University
Table 1: Descriptive Statistics, Males and Females: Correlations, Means and Standard Deviations (N = 930)

                                Prof.     Student   Student   Expect.   Worked    Course              Std.
Variable          Statistic     Effect.   Learn.    Interest  Grade     Harder    Type      Mean      Dev.
Prof. Effect.     Correlation   1         .565      .381      .350      .065      -.182     5.34      .653
                  Significance            .000      .000      .000      .047      .000
Student Learn.    Correlation             1         .570      .314      .196      -.141     5.00      .959
                  Significance                      .000      .000      .000      .000
Student Interest  Correlation                       1         .360      .053      -.174     4.75      1.224
                  Significance                                .000      .107      .000
Expect. Grade     Correlation                                 1         -.111     -.079     4.17      .775
                  Significance                                          .001      .016
Worked Harder     Correlation                                           1         -.178     4.07      1.332
                  Significance                                                    .000
Course Type       Correlation                                                     1         1.26      .441

Table 2: Descriptive Statistics, Females Only: Correlations, Means and Standard Deviations (N = 472)

                                Prof.     Student   Student   Expect.   Worked    Course              Std.
Variable          Statistic     Effect.   Learn.    Interest  Grade     Harder    Type      Mean      Dev.
Prof. Effect.     Correlation   1         .619      .433      .367      .053      -.191     5.41      .633
                  Significance            .000      .000      .000      .255      .000
Student Learn.    Correlation             1         .602      .332      .140      -.1751    5.07      .970
                  Significance                      .000      .000      .002      .000
Student Interest  Correlation                       1         .417      .056      -.203     4.71      1.247
                  Significance                                .000      .223      .000
Expect. Grade     Correlation                                 1         -.209     -.041     4.17      .775
                  Significance                                          .000      .372
Worked Harder     Correlation                                           1         -.206     4.11      1.307
                  Significance                                                    .000
Course Type       Correlation                                                     1         1.26      .441

Table 3: Descriptive Statistics, Males Only: Correlations, Means and Standard Deviations (N = 458)

                                Prof.     Student   Student   Expect.   Worked    Course              Std.
Variable          Statistic     Effect.   Learn.    Interest  Grade     Harder    Type      Mean      Dev.
Prof. Effect.     Correlation   1         .506      .340      .317      .072      -.175     5.27      .665
                  Significance            .000      .000      .000      .126      .000
Student Learn.    Correlation             1         .543      .284      .251      -.107     4.94      .944
                  Significance                      .000      .000      .000      .022
Student Interest  Correlation                       1         .312      .051      -.144     4.78      1.200
                  Significance                                .000      .273      .002
Expect. Grade     Correlation                                 1         -.023     -.119     4.08      .770
                  Significance                                          .618      .011
Worked Harder     Correlation                                           1         -.150     4.03      1.358
                  Significance                                                    .001
Course Type       Correlation                                                     1         1.26      .441

Table 4: Means and Variances for Faculty Effectiveness Ratings for Females and Males Combined, Females Only, Males Only

Gender               N      Mean     Variance
Males and Females    930    5.340    .426
Females Only         472    5.412    .401
Males Only           458    5.265    .442

Table 5: Differences in Mean Faculty Effectiveness Ratings for Females Only and Males Only

Difference           Z Score    P value
Females - Males      3.453      0.000

Table 6: Measures for Faculty Effectiveness Rating Scores by Gender Groups

                      Female Fac.    Female Fac.    Male Fac.      Male Fac.
                      Female Stu.    Male Stu.      Female Stu.    Male Stu.
Average               5.43           5.33           5.40           5.223
Standard Deviation    0.627          0.640          0.638          0.6767
Sample size           189            159            283            305

Table 7: ANOVA Table: Faculty Effectiveness Rating Scores by Gender Groups

Source            S.S.      D.F.    MSQ      F        Sig.
Between Groups    6.65      3       2.218    5.269    .001
Within Groups     392.30    932     .421
Total             398.95    935
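
The Z score in Table 5 and the F ratio in Table 7 can be approximately reproduced from the summary statistics in Tables 4 and 6. The Python sketch below is illustrative only and is not the authors' original computation. It assumes a two-sample Z test with unpooled variances and a one-way ANOVA computed from group means, standard deviations, and sample sizes; the function names are hypothetical. Because the published inputs are rounded, the recomputed values (z of roughly 3.45, F of roughly 5.3) differ slightly from the reported 3.453 and 5.269.

```python
import math


def two_sample_z(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Z statistic for the difference between two independent sample means
    (assumption: unpooled variances, large-sample normal approximation)."""
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def anova_from_summary(groups):
    """One-way ANOVA F ratio from (mean, sd, n) tuples, one per group."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Table 4 inputs: female students vs. male students (mean, variance, N).
z = two_sample_z(5.412, 0.401, 472, 5.265, 0.442, 458)
p = two_sided_p(z)

# Table 6 inputs: (average, standard deviation, sample size) per group.
table6_groups = [
    (5.43, 0.627, 189),    # female faculty, female students
    (5.33, 0.640, 159),    # female faculty, male students
    (5.40, 0.638, 283),    # male faculty, female students
    (5.223, 0.6767, 305),  # male faculty, male students
]
f_ratio, df_between, df_within = anova_from_summary(table6_groups)

print(f"z = {z:.3f}, two-sided p = {p:.4f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

The degrees of freedom returned by the ANOVA helper (3 and 932) match those listed in Table 7, which suggests the table was computed on the 936 ratings summed across the four groups in Table 6.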