
Lecturing performance appraisal criteria: Staff and student differences.

The assessment of academic staff teaching performance is an area of considerable concern and debate. The questions revolve around what should be assessed and by whom. In this study, the ratings of academic staff and tertiary students in a new institution were compared on 21 criteria of lecturing. Analysis of variance demonstrated that the academics placed significantly greater importance than students on a range of performance criteria (e.g. non-sexist language, independent learning, challenging the world view), with the students placing greater importance on one criterion--pace of the presentation. Separate factor analyses of the ratings by staff and students demonstrated differences in the schematic models of these two groups. They agree that the criteria are important, but portray different pictures of the ways in which the criteria combine to produce the understanding of what is a good lecture. The findings of this study contribute to the questions on the assessment of academic staff performance. This study demonstrates that staff and students differ significantly in their interpretations of what is to be measured in the assessment of a good lecture. These findings raise questions regarding the use of student and staff ratings in performance appraisal.

Since the late 1980s, there has been a movement within Australian higher education towards a formal system of academic staff appraisal. The then Education Minister, John Dawkins (1988), set out in his policy statement on higher education a requirement that tertiary institutions initiate systematic procedures for summative staff appraisal to facilitate the 'rewarding of excellence', assist decision making about tenure and promotion, and ensure accountability of academic staff.

The policy of the Department of Employment, Education and Training (DEET) also alluded to a formative dimension of appraisal. According to DEET's principles of staff appraisal (Lonsdale, Dennis, Openshaw, & Mullins, 1988), one purpose of these procedures is to provide the basis for staff development. Thus academics are able to be assessed, and action taken to remedy problems and to support improvements in teaching.

Teaching appraisal in higher education has used information from: (a) students, (b) colleagues, (c) expert/trained raters, and (d) self-reports (Marsh, 1986; Moses, 1986; Thompson, Deer, Fitzgerald, Kensell, Low, & Porter, 1990). By far the most widely used method has been student appraisal (Cruse, 1987). All the approaches have, however, been shown to suffer from measurement flaws, and a variety of studies have found serious inconsistencies between the judgements made by different types of raters.

A number of methodological problems associated with self-appraisal centre on the accuracy of ratings (Meyer, 1980; Thornton, 1980). Self-ratings have been found to suffer from inflation in comparison with others' judgements, and from a tendency for raters to exhibit socially desirable response patterns (Howard, Conway, & Maxwell, 1985; Moses, 1986) or a self-serving bias (Campbell & Lee, 1988). These errors have the potential to adversely affect the value of self-ratings. However, self-appraisal is considered less problematic when used by individuals to predict their future performance for developmental purposes (Campbell & Lee, 1988; Thompson et al., 1990) rather than to assess their past performance.

Colleague and expert appraisals have been proposed as means of overcoming some of the limitations of self-appraisal. However, colleague and expert appraisals also pose a number of specific problems. For practical reasons, raters are not likely to be as familiar with an appraisee's teaching as the students or the appraisee. Consequently, sampling bias is considered to be a potential problem (Cohen & McKeachie, 1981; Doyle, 1975; Scriven, 1987).

The validity of appraisals, based on limited observations of teaching performance, can be questioned. Such observations cannot assure the representativeness of a lecturer's performance over the length of a course. In addition, colleague ratings are influenced by halo effects (Naftulin, Ware, & Donnelly, 1973), and are dependent on the appraiser's knowledge of the subject matter (Scriven, 1987; Sorcinelli, 1984). Despite these problems, colleague and expert ratings are considered by some to be more reliable than student ratings, even though some evidence indicates that student ratings have better validity than colleague, expert, or self-ratings (Costin, Greenough, & Menges, 1971; Howard et al., 1985; Marsh, 1982; McKeachie, 1986).

Student ratings are liable to be affected by a variety of factors, categorised by Kulik and Kulik (1974) under two broad headings: (a) course setting--class size, elective versus compulsory subjects, subject level, workload/difficulty, and the academic discipline; (b) teacher characteristics--personality, verbal fluency, and rank. Woehr (1992) indicated problems with student appraisals because of the variety of personal and contextual factors which impinge upon the judgements made.

Other reviewers identified physical attractiveness, prior interest, expected grades, and timing of rating as additional factors which may affect student ratings (Braskamp, 1984; Marsh, 1987). Freeman (1994) found that the gender role of the instructor was the most salient factor, independent of the actual gender of the person rated.

Marsh (1984, 1986, 1987) used Multi-Trait Multi-Method (MTMM) procedures to test the validity of student and self-ratings on the Students' Evaluations of Educational Quality (SEEQ). Of particular interest to assessing lecturing were the dimensions: organisation/clarity, breadth of coverage, learning/value, enthusiasm, group interaction (Marsh, 1987). High correlations were reported between student and self-ratings by staff across the studies, and he contended that convergent and divergent validity was demonstrated for both the measures used and the methods of measurement. Thus Marsh concluded that both the items and the types of measures used (i.e. student and self-rating) displayed construct validity.

Other studies employing MTMM procedures do not, however, fully support Marsh's conclusions. Howard et al. (1985) assessed the validity of student, colleague, expert, and self-ratings of effective teaching. They found expert, colleague, and self-ratings to have poor reliability. Student ratings were found to be reliable and to display construct validity. These results contributed to the reliability/validity controversy surrounding student, peer, and self-assessment of lecturer performance (Marsh, 1984; Miller, 1988; Scriven, 1987).

One source of the discrepancy between the sets of studies may be the inferential accuracy of raters (Nathan & Alexander, 1985). That is, although agreement may be reached between academics and students as to the criteria indicative of effective teaching, they may differ in their cognitions of the implicit properties of effective teaching. Such cognitive differences may result in the two groups making dissimilar judgements about a lecturer's performance, despite the same items and measures being used, and lead to the conclusion that student or academic ratings are unreliable.

The problems of differing cognitions can be found in other research in higher education. Powles (1988, 1989) has demonstrated that research students and their supervisors often have quite different understandings and expectations of their roles, rights, and responsibilities. Although each is involved in the same process, the cognitions about that process vary greatly.

The present study deals with the development (using MTMM procedures) of criteria for one facet of effective teaching in higher education--lecturing. Lecturing is the most traditional method of teaching in higher education, and many commentators (e.g. Marsh, 1987; Marsh & Roche, 1992; Murray, 1980) contend that there is agreement between academics and students as to what constitutes good university lecturing. Marsh (1994) reviewed data from 29 543 university classes to demonstrate the efficacy of student evaluations in the improvement of college teaching. However, the effective lecture is a multi-faceted activity which can be difficult to measure directly, as only selected indicators of its efficacy can be tested (Marsh, 1986; McKeachie, 1986).

Importantly, procedures were included in this study to examine the differential understandings of staff and students, and the impact these may have on the construct validity of the proposed criteria and methods of measurement. This was achieved by using statistical techniques, including factor analysis, to explore the convergence and divergence of academics' and students' cognitive constructions of an effective lecture. Differing from many of the previous studies, the present one also used direct comparisons of academic staff and student ratings of the importance of specific performance criteria, as well as studying the ways in which these criteria are grouped to produce schemas that may differ between the groups--that is, the underlying structures that give meaning to the set of criteria.



Academics

With the exception of sessional lecturers, all academics from across the four higher education faculties (Arts, Health Science, Applied Science, and Business) of one campus of an Australian university were approached to participate in the survey. A total of 89 academic surveys were returned, of which 88 were usable for statistical analyses. The number of respondents represented approximately one-third of the academic staff. There were approximately equal numbers of males and females in the sample.

Students

A representative sample of undergraduate students from across the four faculties was approached to participate in the survey. A total of 320 student surveys were returned, all of which were included in the study. The student sample represented approximately equal numbers of students from the four faculties. The age of students in the sample ranged from less than 21 years to more than 25 years, with the vast majority being less than 21 years old. Approximately 70 per cent of the student sample was female.


Academics and students from the four faculties of the campus participated in separate focus groups. Group members were asked to relate and discuss what they thought were the characteristics of an effective lecture. The discussions were recorded on audiotape and systematically analysed using theme analysis techniques.

The criteria derived from the focus groups were combined with criteria that appeared in the literature. A total of 21 criteria were identified by these processes and are shown in Table 1. A survey instrument was then designed incorporating the 21 criteria. The instrument measured academics' and students' ratings of the importance of each of the criteria. Respondents were asked to indicate their level of agreement or disagreement about the importance of each criterion statement on a 9-point interval scale, from 'strongly disagree' (1) to 'strongly agree' (9). Fourteen of the statements were framed as positive statements, and seven as negative. The survey also requested demographic information.
Table 1 Twenty-one lecturing criteria

Criterion Abbreviation

Provide clear explanations Explanation
Present material in an interesting way Interesting
Stimulate students' interest Stimulate
Pace lecture to allow note taking Pace
Arouse students' curiosity Curiosity
Use non-sexist language Non-sexist language
Ensure lectures have defined structure Structure
Use examples relevant to students Relevant examples
Display mastery of subject matter Mastery
Provide up-to-date research Up-to-date
Stimulate independent learning Independent learning
Use inclusive examples and expression Inclusive presentation
Display high level of verbal fluency Verbal fluency
Interact with students Interact
Possess good public-speaking qualities Public speaking
Challenge students' world views Challenge
Build on students' previous knowledge Build
Pause to allow memory consolidation Pause
Provide periodic summaries during lecture Summaries
Act as academic role model Role model
Project enthusiasm for the subject matter Enthusiasm

Ratings for each criterion provided data for direct comparison between the groups on differential levels of importance. The ratings also provided data to be used in factor analyses to construct cognitive models of good lecturing.


The data for the seven negatively framed criterion statements were reverse-coded so that analyses could be conducted with all items scored in the same direction (low values--low importance; high values--high importance). Separate group means were calculated for the academic and student ratings of each criterion; these are shown in Figure 1 (ranked by magnitude of rated importance). It can be seen that the scores for each group were similar, but that the academics placed greater value on 19 of the 21 criteria. The only two criteria rated more important by students were the pace of the lecture for note taking, and the public-speaking skills of the lecturer.
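The reverse-coding step can be sketched briefly. On a 1-9 agreement scale, a negatively framed item is recoded by subtracting each response from 10, so that high values always indicate high importance. The values below are hypothetical, for illustration only:

```python
# Hypothetical responses to one negatively framed item, on the 1-9 scale.
raw = [2, 1, 3, 9, 5]

# On a 1-9 scale, reverse-coding maps x -> 10 - x,
# so 1 <-> 9, 2 <-> 8, and the midpoint 5 is unchanged.
recoded = [10 - x for x in raw]

print(recoded)  # [8, 9, 7, 1, 5]
```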


In order to determine the significance of the observed differences in importance ratings, one-way analyses of variance (ANOVAs) between the academic and student group scores were conducted. As there were 21 dependent variables to be analysed, there was an increased chance of Type I errors. To protect the alpha level, a multivariate analysis of variance was conducted using status (staff versus student) as the independent variable and including all 21 criteria as dependent variables. This was found to be significant (F = 8.85, p < .001), indicating an overall significant difference between staff and students across the range of dependent variables and providing confidence for univariate comparisons.
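The univariate step described above can be illustrated with simulated data. This is a sketch only: the group sizes match the study's samples (88 academics, 320 students) but the ratings are randomly generated, and the protective multivariate test would in practice be run first (e.g. with `statsmodels`' MANOVA) before the per-criterion ANOVAs below:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical importance ratings for one criterion (1-9 scale):
# 88 academics versus 320 students, as in the study's sample sizes.
academics = rng.normal(7.6, 1.5, 88).clip(1, 9)
students = rng.normal(6.5, 2.0, 320).clip(1, 9)

# One-way ANOVA for a single criterion; with two groups this is
# equivalent to an independent-samples t-test (F = t squared).
f_stat, p_value = stats.f_oneway(academics, students)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

In the study this comparison was repeated for each of the 21 criteria, which is why the alpha-protecting multivariate test was needed first.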

Univariate analyses of variance demonstrated a series of significant differences between the scores for staff and students. From Table 2 it can be seen that students rated only one criterion significantly more important than did the academics--pace of the lecture for note taking (F = 26.88, p < .001). The academics also placed more value on some of the structural elements of lecturing. In particular, they saw using non-sexist language (F = 27.72, p < .001), using examples that did not discriminate or alienate certain groups of students (F = 24.26, p < .001), and examples that were specifically relevant to the students' experience (F = 7.11, p < .05) as significantly more important than did the students.
Table 2 Comparisons of academic and student ratings of 21 criteria

                         Academic        Student

Criterion                M      SD      M      SD      F

Explanation            8.33    .81    8.26    .99     0.34
Interesting            8.09    .87    7.75   1.26     5.49(*)
Stimulate              8.03   1.30    7.63   1.36     6.27(*)
Pace                   6.94   1.50    7.80   1.31    26.88(***)
Curiosity              7.90   0.98    6.97   1.46    31.46(***)
Non-sexist language    7.71   1.55    6.48   2.02    27.72(***)
Structure              7.66   1.48    7.61   1.31     0.08
Relevant examples      7.54   1.47    7.06   1.52     7.11(**)
Mastery                7.58   1.34    7.37   1.41     1.62
Up-to-date             7.59   1.65    7.50   1.41     0.21
Independent learning   7.57   1.59    5.70   2.07    60.50(***)
Inclusive presentation 7.46   1.57    6.36   1.88    24.26(***)
Verbal fluency         7.38   1.08    7.30   1.49     0.22
Interact               7.34   1.85    6.98   1.98     2.23
Public speaking        7.12   1.92    7.27   1.83     0.42
Challenge              7.04   2.10    5.23   1.89    60.67(***)
Build                  6.67   2.09    5.93   2.24     7.84(**)
Pause                  6.52   1.40    6.26   1.81     1.59
Summaries              6.44   1.68    6.19   2.02     1.01
Role model             5.86   2.07    5.51   1.86     2.23
Enthusiasm             7.60   1.57    6.67   2.02    15.72(***)

(*) p < .05 (**) p < .01 (***) p < .001

For academics, challenging the students' world view (F = 60.67, p < .001) and stimulating independent learning (F = 60.50, p < .001) were significantly more important than for the students. These were supported by the higher values they placed on stimulating students' curiosity (F = 31.46, p < .001), stimulating students' interest in the material (F = 6.27, p < .05), and demonstrating enthusiasm for the material (F = 15.72, p < .001).

In order to understand how academics and students grouped or conceptualised the set of criteria, principal component factor analyses were conducted separately for each group. The solutions were forced to seven factors following an examination of scree plots, with the criteria for inclusion being a loading of at least 0.5 and no higher loading on any other factor. The factor structure matrix for academics appears in Table 3.
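The extraction and inclusion rule described above can be sketched as follows. This is a simplified illustration on randomly generated data (so the resulting assignments are meaningless): it uses plain principal component extraction, whereas the study's correlated factors (Tables 4 and 6) imply an oblique rotation was also applied. With random data, few or no criteria may pass the 0.5 threshold:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data: 88 respondents rating 21 criteria on the 1-9 scale.
X = rng.integers(1, 10, size=(88, 21)).astype(float)

# Principal component extraction, forced to seven components
# (the study chose seven after inspecting scree plots).
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=7).fit(Z)

# Loadings: component weights scaled by the square root of each
# component's variance, giving item-factor correlations.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # (21, 7)

# The study's inclusion rule: keep a criterion on a factor only if it
# loads at least 0.5 there and loads no higher on any other factor.
for item in range(loadings.shape[0]):
    row = np.abs(loadings[item])
    best = row.argmax()
    if row[best] >= 0.5:
        print(f"criterion {item} -> factor {best} "
              f"(loading {loadings[item, best]:.2f})")
```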
Table 3 Factor structure for academic staff

Factor/Variables Factor loading

Social Equity
 Non-sexist Language .79
 Inclusive Presentation .77
 Up-to-date .72
Presentation Skills
 Mastery .73
 Public Speaking .52
Motivation
 Curiosity .79
 Stimulate .69
 Interesting .68
Modelling
 Enthusiasm .81
 Role Model .73
Lecturing Mode
 Pace .70
 Summaries .69
 Relevant Examples .60
Critical Thinking
 Independent Learning .65
 Challenge .65
Cognitive Processes
 Structure .79
 Build .77
 Pause .58

For academics, the criteria that loaded on the seven factors reflected reasonably distinct and meaningful dimensions. Items loading highly on Factor 1 were interpreted as reflecting social equity/content bias--a general dimension concerned with avoiding discrimination in the content of lectures. Although the criterion Up-to-date research does not appear to match the other two criteria conceptually, it was apparent from the pre-survey interview data that academics interpreted up-to-date research not strictly in terms of recency: research that did not perpetuate gender, ethnic, or socioeconomic biases was considered up-to-date, regardless of its date of publication.

Items loading on the remaining six factors posed few problems for interpretation: Factor 2, Presentation Skills--subject knowledge, proficient communication, and confidence in dealing with students; Factor 3, Motivation--intellectual stimulation of students; Factor 4, Modelling--behaviour exemplary of positive academic attributes; Factor 5, Lecturing Mode--relating to the mechanics of disseminating information; Factor 6, Critical Thinking; Factor 7, Cognitive Processes--use of mnemonic and cognitive principles to assist students' information processing.

Correlations between the academics' factors (see Table 4) supported the interpretations and the appropriateness of the factor solution by demonstrating convergence between factors that were conceptually related, and divergence between conceptually distinct factors. For example, Factor 1, Social Equity, interpreted as a general dimension, was found to correlate highly with five of the other six factors, supporting the factor's generality. Conversely, Factor 2, Presentation Skills, which was interpreted as a distinct factor, behaved consistently with its uniqueness, correlating poorly with the other six factors.
Table 4 Factor correlation matrix for academic staff

Factor 1 2 3 4 5 6 7

Factor 1 1.00
Factor 2 .06 1.00
Factor 3 .20(*) -.08 1.00
Factor 4 .11 .06 .04 1.00
Factor 5 .20(*) -.01 .13 .10 1.00
Factor 6 .15 .06 .00 .17(*) .03 1.00
Factor 7 -.25(*) .12 -.21(*) -.14 -.13 .10 1.00

(*) p < .05

From the factor analysis results for students (see Table 5), the first factor seemed in part to be a synthesis of criteria that matched two of Marsh's (1987) dimensions: Organisation/Clarity, and Breadth of Coverage. In this sense, Factor 1 seemed to be concerned primarily with the quality of a lecture's content, and represented Content Merit. Criteria loading highest on Factor 2 suggested Social Skills. This differed conceptually from the academics' presentation skills factor in that the configuration of criteria only dealt with lecturers' competence in communicating with students, whereas the academics' presentation factor relates more to projecting a competent image. There was little ambiguity in the criteria loading highest on Factor 4; they clearly represent Social Equity.
Table 5 Factor structure for students

Factor/Variables Factor loadings

Content Merit
 Explanation .70
 Up-to-date .61
 Mastery .61
 Interesting .61
 Structure .50
Social Skills
 Public Speaking .77
 Verbal Fluency .55
 Interact .52
Logical Progression
 Summaries .73
 Build .60
Social Equity
 Inclusive Presentation .83
 Non-sexist Language .81
Flow of Lecture
 Pace .78
 Pause .60
 Stimulate .60
Instructional Environment
 Role Model .74
 Curiosity .60
Dynamism
 Challenge .79
 Enthusiasm .58

The configuration of items loading highest on Factors 3, 5, 6, and 7 all reflect some aspect of lecturer style. Items loading highest on Factors 3 and 5 suggest procedural aspects of lecture presentation: Factor 3, Logical Progression; Factor 5, Flow of Lecture. Items loading highest on Factors 6 and 7 related to certain traits of lecturers that enhance presentation: Factor 6, Instructional Environment, represented how good a presenter the lecturer was; Factor 7, Dynamism, encompassed a sense of entertainment value in a lecturer's style.

The resultant correlations between the student factors (see Table 6) tended to demonstrate the validity of the factors' interpretation. For example, Factor 1, Content Merit, correlated highly with Factors 4, 5, and 6, all of which were concerned to some degree with lecture content. Conversely, Factor 1 correlated poorly with Factors 2 and 7, both of which correlated highly with each other, and related to a conceptually distinct dimension concerned with social skills.
Table 6 Factor correlation matrix for students

Factor      1       2       3      4      5      6     7

Factor 1   1.00
Factor 2    .09    1.00
Factor 3    .02     .08    1.00
Factor 4    .13(*)  .06     .05   1.00
Factor 5    .19(*)  .05     .12(*) .07   1.00
Factor 6    .18(*)  .01     .07    .11    .20(*) 1.00
Factor 7    .04    -.20(*) -.16(*) .07    .08    .05  1.00

(*) p < .05

From Table 7, it can be seen that there are distinct conceptual differences in the factor structures for the students and academics. The academic factor structure has a stronger focus on the social equity issues that are beginning to be much more important in higher education--and which match the official ethos of the institution to which the staff members belonged--and to the critical thinking and quality issues of education. For students, the most important factor relates much more to the specific presentation aspects of the lecture--clarity, interest, and structure.
Table 7 Items loading on staff and student factor structures

Staff Students

Factor 1
 Non-sexist Clear presentation
 Non-discriminatory Up-to-date
 Up-to-date Interest
Factor 2
 Mastery Public speaking
 Public speaking Fluency
Factor 3
 Curiosity Summary
 Stimulate Build
Factor 4
 Enthusiasm Non-discriminatory
 Role model Non-sexist
Factor 5
 Pace Pace
 Summary Memory
 Relevance Stimulate
Factor 6
 Independence Role model
 Challenge Curiosity
Factor 7
 Structure Challenge
 Build Enthusiasm


Analysis of the ratings indicated that all 21 criteria were rated by academics and students as important to the effectiveness of a lecturer's performance. This provided face validity for the criteria as applied to good lecturing. The overall ratings of importance for all the criteria supported the contention of Murray (1980) and Marsh (1987) that there is agreement between academics and students as to what constitutes good university teaching.

From the analyses of variance, there was strong evidence that the relative importance placed on the criteria differed greatly between academics and students. The factor analyses and differing factor structures showed that there may be quite different interpretations of what a good lecture should be.

The analyses of variance demonstrated that the academic staff placed significantly greater importance on a number of the criteria. These were aspects of lecturing that focus on challenging the status quo for students and developing independent learners. Examples are: the use of non-sexist language, stimulating the students' curiosity, and an emphasis on independent learning.

The only two criteria on which students placed a greater importance were the pace of the lecture and the speaking voice of the lecturer--with only pace of the lecture rated as significantly more important by the students. Each of these criteria represents an instrumental component of the lecture. The students need to hear the material clearly and write it down. For students, the notion that lectures should go beyond the presentation and recording of facts was not evident.

Factor analyses were used to determine the ways in which criteria combined to provide meaningful clusters that could be used to explain, in more depth, the ways in which the groups perceived and interpreted a good lecture. The finding that the two factor structures were dissimilar is of little analytic value, unless the resultant structures can be meaningfully interpreted. One means of determining the appropriateness of a factor structure relates to the convergence and divergence (or discriminability) between factors (Kerlinger, 1973). That is to say, the validity of factor interpretations may be demonstrated by showing that factors correlate higher with factors to which they are conceptually related, rather than with factors to which they are not conceptually related.
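The convergence/divergence logic can be made concrete with a small simulation. The factor scores below are synthetic (real factor scores would come from the fitted factor model), and the shared component injected into two factors stands in for a genuine conceptual relationship:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical factor scores: 88 respondents on 7 factors.
base = rng.normal(size=(88, 1))
scores = rng.normal(size=(88, 7))
scores[:, 0] += base[:, 0]   # factors 1 and 4 share variance,
scores[:, 3] += base[:, 0]   # standing in for conceptual relatedness

# 7 x 7 inter-factor correlation matrix, as in Tables 4 and 6.
corr = np.corrcoef(scores, rowvar=False)

# Convergence: conceptually related factors should correlate;
# divergence: unrelated factors should not.
print(f"factors 1 & 4: r = {corr[0, 3]:.2f}")  # expected to be elevated
print(f"factors 1 & 2: r = {corr[0, 1]:.2f}")  # expected to be near zero
```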

Analysis of the inter-factor correlation matrices was able to demonstrate that academic and student factor structures were appropriate solutions for each group's data. The divergence between the academic and student factor structures, and the related dimensions, suggested that academics and students differed in their conceptualisations of what constitutes an effective lecture. Although both groups agreed that the 21 criteria were important, the combinations in which academics and students perceived the criteria to be relevant appeared to be sharply delineated.

One possible source of the divergence between the academic and student factor structures may relate to the intrinsically different roles academics and students possess within the higher education system. Although both the academic and student data were collected using the same survey instrument, academics' and students' differential experience of higher education may have resulted in academics and students responding in contextually different ways to the survey questions. The role of a student is in many respects that of a consumer. Students may have responded from the perspective of: `What do I want from a lecturer?'. Therefore student data may reflect their more immediate or instrumental concerns about instruction.

Conversely the role of academics is that of providers of education. In such a case, academics may have responded from the perspective of what students require from a lecturer; what is valuable for their education. Academics' data may reflect their identification with the more abstract goals of higher education that correspond to their dual identity within higher education as both researchers and educationalists.

On a broader issue, these findings bear on the problems of how academic performance should be assessed. As Cruse (1987) has indicated, student appraisals represent the most common assessment tool; yet it appears that academic staff and students have constructed the notion of good teaching to mean very different things. Thus the validity of such measures must be drawn into question.

To some extent, the academics in the present study appeared to have been reiterating institutional values of equity in, and access to, higher education, which are proclaimed to be basic tenets of their very new institution. Academics' affirmation of these values was evident in the importance they attributed to those criteria related to social justice. The academic data, therefore, may reflect either their fundamental concern for educational equity issues (perhaps indicating a self-selection to the institution because of its goals) or a degree of socially desirable responding (Howard et al., 1985; Moses, 1986). This raises the question of whether academics in older institutions would respond in similar ways or whether these academics are divergent from the rest of their colleagues.

Commenting on the North American experience, McKeachie (1986) noted that, whereas in the past only the academically [and socially] elite students entered university, student populations now exhibit a broader range of academic skills and levels of preparation. As a consequence, there may be a greater demand from students for academics to teach, compared with the past, when students were expected to be more academically self-reliant.

In the present study, the institution was developed to provide college education to areas and populations previously under-represented in the university pool. As Terry (1995) indicated, in his discussion of West University, the newer institutions may require quite different pedagogical approaches to serve a more diverse student population.

The reliable agreement between academics and students as to the criteria for effective teaching found by Marsh (1987) and Murray (1980) was not evident in the present study. The study indicated that the inferential accuracy of raters, drawn from the population of academics and students in question, could be seen as a potential threat to the reliability and validity of assessments made using the developed criteria. Importantly, this would be the case only if one of the two groups' cognitive constructions of an effective lecture were given primacy over the other. Therefore the results of this study clearly indicate that some caution is warranted in assuming the generality of criteria for effective teaching. Changes in student populations over time may result in criteria for effective teaching, that previously were shown to be valid and reliable, no longer exhibiting these characteristics. Similarly the findings question the transferability of criteria validated in one educational setting to another. For that matter, criteria validated by academics to be used by students to evaluate lecturers' performance must be treated with great circumspection. Thus the assessment of academics by students will represent a very different form of measurement from either self- or peer-review, as the construct being used may be totally different. Such differences provide explanations for the findings that appeared in the past to be discrepancies between student and peer appraisals (Arubayi, 1987; Howard et al., 1985; Marsh, 1987).

In sum, the academic and student data in the present study may be thought of as representing samples from two different populations due to differences, first, in the roles academics and students possess within the higher education setting, secondly, in their psychological experiences of higher education, and thirdly, in their perceptions of the purpose and goals of higher education. The factor analyses demonstrated support for this premise, by identifying what appear to be marked conceptual differences between the two groups, as to how the proposed criteria relate to the construct of an effective lecture. Convergent validity could not be demonstrated, due to the dissimilarity in the factor structures between academics and students. Nor was discriminant validity demonstrated, as the criteria that combined to form conceptually distinct factors for the academic data combined differently to form conceptually different and distinct factors for the student data.

Keywords: academic staff development, higher education, lecture method, student attitudes, teacher attitudes


This research was funded by a grant from the Western Institute Council (Victoria University of Technology) Teaching/Learning Fund. The authors wish to thank Professor Geoffrey George for his support for the project and Professor Susan Moore for her constructive comments on a draft.

An earlier version of this paper was presented at the AARE/NZARE Joint Conference, Deakin University, Geelong, Australia, November 1992.


Arubayi, E. A. (1987). Improvement of instruction and teacher effectiveness: Are student ratings reliable and valid? Higher Education, 16, 267-278.

Braskamp, L. A. (1984). Evaluating teacher effectiveness. Beverly Hills, CA: Sage.

Campbell, D.J. & Lee, C. (1988). Self-appraisal in performance evaluation: Development versus evaluation. Academy of Management Review, 13, 302-314.

Cohen, P. A. & McKeachie, W. J. (1981). The role of colleagues in the evaluation of college teaching. Improving College and University Teaching, 28, 147-154.

Costin, F., Greenough, W. T., & Menges, R. J. (1971). Student ratings of college teaching: Reliability, validity, and usefulness. Review of Educational Research, 41, 511-535.

Cruse, D. B. (1987). Student evaluation and the university professor: Caveat professor. Higher Education, 16, 723-737.

Dawkins, J. S. (1988). Higher education: A policy statement. Canberra: AGPS.

Doyle, K. (1975). Student evaluation of instruction. Lexington, MA: Lexington Books.

Freeman, H. R. (1994). Student evaluations of college instructors: Effects of type of course taught, instructor gender and gender role, and student gender. Journal of Educational Psychology, 86, 627-630.

Howard, G. S., Conway, C. G., & Maxwell, S. E. (1985). Construct validity of measures of college teaching effectiveness. Journal of Educational Psychology, 77, 187-196.

Kerlinger, F. N. (1973). Foundations of behavioral research (2nd ed.). New York: Holt, Rinehart & Winston.

Kulik, J. A. & Kulik, C. C. (1974). Student ratings of instruction. Teaching of Psychology, 1, 51-57.

Lonsdale, A., Dennis, N., Openshaw, D., & Mullins, G. (1988). Academic staff appraisal in Australian higher education, Part 1: Principles and guidelines. Canberra: AGPS.

Marsh, H. W. (1982). Validity of students' evaluation of college teaching: A multitrait-multimethod analysis. Journal of Educational Psychology, 74, 264-279.

Marsh, H. W. (1984). Students' evaluation of university teaching: Dimensionality, reliability, validity, potential bias, and utility. Journal of Educational Psychology, 76, 707-754.

Marsh, H. W. (1986). Applicability paradigm: Students' evaluation of teaching effectiveness in different countries. Journal of Educational Psychology, 78, 465-473.

Marsh, H. W. (1987). Students' evaluation of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.

Marsh, H. W. (1994). Weighting for the right criteria in the Instructional Development and Effectiveness Assessment (IDEA) system: Global and specific ratings of teaching effectiveness and their relations to course objectives. Journal of Educational Psychology, 86, 631-648.

Marsh, H. W. & Roche, L. A. (1992). The use of student evaluations of university teaching in different settings: The applicability paradigm. Australian Journal of Education, 36, 278-300.

McKeachie, W.J. (1986). Teaching tips: A guidebook for the beginning college teacher (8th ed.). Lexington, MA: Heath.

Meyer, H. (1980). Self-appraisal of job performance. Personnel Psychology, 33, 291-295.

Miller, A. H. (1988). Student assessment of teaching in higher education. Higher Education, 17, 3-15.

Moses, I. (1986). Self and student evaluation of academic staff. Assessment and Evaluation in Higher Education, 11, 76-86.

Murray, H. G. (1980). A comprehensive plan for the evaluation of teaching at the University of Queensland. St Lucia: Tertiary Education Institute.

Naftulin, D. H., Ware, J. E., & Donnelly, F. A. (1973). The Doctor Fox lecture: A paradigm for educational seduction. Journal of Medical Education, 48, 630-635.

Nathan, B. R. & Alexander, R. A. (1985). The role of inferential accuracy in performance ratings. Academy of Management Review, 10, 109-115.

Powles, M. (1988). Know your PhD students and how to help them. Parkville, Vic.: University of Melbourne, Centre for the Study of Higher Education.

Powles, M. (1989). How's the thesis going? Parkville, Vic.: University of Melbourne, Centre for the Study of Higher Education.

Scriven, M. (1987). The validity of student ratings. Nedlands: University of Western Australia, Department of Education.

Sorcinelli, M. D. (1984). An approach to colleague evaluation of classroom instruction. Journal of Instructional Development, 7, 11-17.

Terry, L. (1995). Teaching for justice in the age of the Good Universities Guide. St Albans, Vic.: Victoria University of Technology.

Thompson, H. R., Deer, C. E., Fitzgerald, J. A., Kensell, H. G., Low, B. C., & Porter, R. A. (1990). Staff self-appraisal at the departmental level: A case study. Higher Education Research and Development, 9, 39-48.

Thornton, G. L. (1980). Psychometric properties of self-appraisal and job performance. Personnel Psychology, 33, 263-271.

Woehr, D. J. (1992). Performance dimension accessibility: Implications for rating accuracy. Journal of Organizational Behavior, 13, 357-368.

Dr Adrian Fisher is a Senior Lecturer, John Alder is a Lecturer, and Mark Avasalu was a research student in the Department of Psychology, Victoria University of Technology, Werribee Campus, PO Box 14428, Melbourne City, Victoria MC 8001.

Adrian T. Fisher John G. Alder Mark W. Avasalu

Victoria University
COPYRIGHT 1998 Australian Council for Educational Research

Article Details
Author:Avasalu, Mark W.
Publication:Australian Journal of Education
Geographic Code:8AUST
Date:Aug 1, 1998