Uses and misuses of student opinion surveys in eight Australian universities.


Student opinion surveys (SOSs) are commonly used in universities to measure student perceptions of teaching performance. Ostensibly their prime purpose is to improve teaching quality. This paper critically examines SOSs in eight Australian universities where survey design and analysis were examined and compared with literature recommendations. We find that current SOSs are neither designed nor structured according to sound questionnaire technique and that, as part of a teaching evaluation system, they are seriously flawed. Deficiencies include: their use as the sole measure of teaching effectiveness, the tendency for universities to rely on unmoderated student opinion without tempering the results with contextual factors, and a lack of testing for reliability and validity that renders the data of unknown precision. We argue that, at present, SOSs expose teachers to unreliable, invalid opinions that influence teacher career advancement and job security.

Introduction

Increased competition between universities in attracting student fee revenue, greater requirements for accountability, and reduced government funding have led to a prolific use of student opinion surveys (SOSs) in Australian tertiary institutions. In these surveys, students `evaluate' teachers and course units (referred to as subjects) by completing survey instruments. The surveys contain questions about the teacher's perceived performance, and the perceived worthiness of aspects of the subject. In Australia, the surveys became widely used in universities in synchrony with total quality management initiatives in the late 1980s. A decade later total quality management receives diminished emphasis whereas SOSs are firmly entrenched; some university students are surveyed as many as eight times each year.

The surveys provide a convenient vehicle for staff appraisal, often making direct input to hiring, promotion and tenure decisions. Lecturers might take careful note of survey results and modify their teaching accordingly, perhaps both in style and content, to improve their `ratings'. Such modification could be desirable, if indeed teaching is improved, but it may be undesirable, if pleasing students is at the expense of objectives such as learning.

For professional purposes, surveys that have not been checked for reliability or validity must be regarded as unusable. For example, Marsh (1987) advises that `criterion measures that lack reliability or validity should not be used as indicators of effective teaching for research, policy formation, feedback to faculty or administrative decision making' (p.286). People who design, collect, analyse, report and use student opinion surveys should be aware of the validity and reliability of the data.

The aim of this study is to compare present survey practices of Australian universities with the current state of knowledge available in the academic literature. We commence with considerations of how one can be sure that the survey data are meaningful, and proceed with a brief summary of whether or not SOSs might be beneficial.

Survey reliability and validity

If an instrument is reliable and valid, the observed score on some aspect X (denoted $X_0$) will equal the underlying, or `true' score (denoted $X_T$). However, random ($X_R$) and systematic ($X_S$) errors distort the measurement:

$$X_0 = X_T + X_R + X_S$$

It is the researcher's job to reduce or eliminate error, and to be aware of the likely magnitude of the error. Both random and systematic errors must be small in comparison with $X_T$ if meaningful observations are to be obtained (Churchill, 1979). Further, the absence of one source of error does not guarantee absence of the other.
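To illustrate why the two error sources behave differently, the following minimal sketch simulates repeated administrations of a single item. All numbers (a true score of 3.5, a systematic bias of 0.6, Gaussian random error with a standard deviation of 0.5) are hypothetical and chosen only for illustration; none is taken from the surveys studied.

```python
import random
import statistics

def observed_scores(true_score, systematic_bias, random_sd, n, seed=0):
    """Simulate n observations of X_0 = X_T + X_R + X_S."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0.0, random_sd) + systematic_bias
            for _ in range(n)]

TRUE_SCORE = 3.5  # hypothetical underlying score X_T
scores = observed_scores(true_score=TRUE_SCORE, systematic_bias=0.6,
                         random_sd=0.5, n=10_000)

# Averaging many observations shrinks the random component X_R,
# but the systematic component X_S survives in the mean.
residual_error = statistics.fmean(scores) - TRUE_SCORE
print(round(residual_error, 2))
```

Under these assumptions the averaged score stays offset from the true score by roughly the size of the bias, which is why a large sample alone cannot establish validity.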

$X_R$ results from spurious uncontrolled factors, so that the same instrument administered under nearly identical circumstances yields different results. This type of error detracts from the reliability of the measurements. Reliability can be tested using several techniques. For example, internal consistency can be checked using Cronbach's alpha coefficient (Peter, 1979), provided that at least three questions that ask about the same concept (termed a construct) are included in the survey. The reliability of certain student questionnaires has been tested; for example, Marsh (1987) established an acceptable level of reliability for his instrument named SEEQ (Student Evaluation of Educational Quality). However, demonstrating reliability for some instruments does not guarantee that other untested instruments will be similarly reliable.
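As a rough illustration of the internal consistency check, the sketch below computes Cronbach's alpha with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The ratings are hypothetical, and the function name and data layout are assumptions made for this example. The function does not itself enforce the three-question minimum mentioned above.

```python
import statistics

def cronbach_alpha(responses):
    """responses: one list of k item scores per respondent, all for one construct."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # regroup scores by item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point ratings from six students on three items
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Values close to 1 indicate that the items move together; what counts as an acceptable threshold is a judgment that depends on the intended use of the scores.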

Validity is reduced by systematic errors ($X_S$). Differences between the views of the respondent and the analyst over what a particular response means introduce systematic error. For example, students who learned nothing may strongly agree in a survey that `this teacher is good' if the teacher was kind, generous and devout. An analyst of the data may erroneously infer that the teaching was effective. Systematic error is sensitive to the particular questionnaire format and question wording. Should the items in a survey change, or should the wording of an item change, then the observed score is likely to change.

Known sources of error

There are several known sources of random and systematic error in SOSs:

1 Poor survey design. Question ambiguity, which includes unclear questions and items that ask two or more questions (double-barrelled statements), should be avoided. Some `negative' or reverse-order questions should be included in each questionnaire to minimise any tendency for respondents simply to circle numbers without careful consideration. Reverse-order questions permit the analyst to determine whether each survey form is fit for inclusion in analysis.

2 Poor survey content. Eley (1994) comments that `it is only reasonable to include questions to which students are qualified to respond' (p.6). Questions relating to overall course structure, curriculum content and coordination between courses should generally be excluded, except where they relate to students' direct experiences.

3 Influencing factors. Course characteristics, such as class size, level of instruction, discipline and workload difficulty, are known to influence the way students perceive instruction (Cranton & Smith, 1990; Marsh, 1987). Instructor characteristics, such as grading standards, empathy and gender, affect teacher evaluations (Abrami, d'Apollonia, & Cohen, 1990; Tatro, 1995). Student characteristics, such as ability, motivation, effort, expected grade, prior subject interest, reason for taking course and student gender are also correlated with student ratings (Marsh, 1987; Tatro, 1995). On the basis of these findings, it should be expected that female instructors with high grading standards, who are teaching large classes, will receive significantly lower ratings than others, ceteris paribus. In the present study, it is expected that those responsible for developing, analysing and using SOSs engage in some effort to determine the degree to which such factors influence teacher evaluations, and interpret the results in context.
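The reverse-order check described in item 1 can be sketched as follows. The sketch assumes a 5-point scale, a hypothetical pairing of one positively worded item with its reverse-worded counterpart, and an arbitrary tolerance of one scale point; none of these choices comes from the surveys studied.

```python
SCALE_MAX = 5  # assumed 5-point agreement scale (1 = strongly disagree)

def reverse_score(value, scale_max=SCALE_MAX):
    """Map a reverse-worded response back onto the positive direction."""
    return scale_max + 1 - value

def is_consistent(positive_item, reversed_item, tolerance=1):
    """True when the two answers agree once the reversed item is re-scored."""
    return abs(positive_item - reverse_score(reversed_item)) <= tolerance

# Hypothetical forms: (answer to "explained clearly", answer to "explained poorly")
forms = [(5, 1), (4, 2), (5, 5), (3, 3), (1, 1)]

# Forms that give the same extreme answer to both wordings are set aside
usable = [form for form in forms if is_consistent(*form)]
print(usable)
```

A form that circles the same extreme number for both wordings contradicts itself, so the analyst has grounds to exclude it; the tolerance controls how strict that screening is.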

Establishing validity

Validity can be demonstrated in several ways:

1 Face validity is expert consensus that the measure adequately represents a particular concept. It provides a modest starting point and merely establishes that the measure appears valid without empirical testing.

2 Criterion validity establishes that the construct behaves as expected; that the measure accurately predicts some criterion measure. For example, if the teaching objective is learning, then there should be a correlation between teacher effectiveness ratings and a learning measure. However, Crooks (1988) reports that other factors can obstruct criterion validity using student learning, because students may perform poorly or well for reasons other than the teacher's performance. Therefore other checks for criterion validity may be necessary. Marsh (1994) recommends recording the ratings of the same instructor in different courses, noting changes in student behaviours, conducting experimental manipulations, and measuring progress on specific course objectives.

3 Construct validity establishes that a rating is directly related to the construct it purports to represent. This involves determining whether the rating is correlated with several other measures of the same concept (convergent validity) and showing that it is not correlated with measures of a different concept (discriminant validity). Construct validity can be checked if SOS ratings are supplemented with: observation by trained experts, self-evaluation/reflection, peer assessments, student work samples, and retrospective evaluations by former students (see, for example, Abrami et al., 1990; Ingvarson, 1994; Marsh, 1995; Ramsden & Dodds, 1989). Drawing upon such sources is recommended in the Hoare Report (Higher Education Management Review Committee, 1995) which argues that feedback to staff on their performance should be solicited from `supervisors, colleagues, staff, students and others with whom the staff member deals' (section 12b).

All three types of validity should be established to minimise systematic error. Establishing face and criterion validity alone may be insufficient.
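A minimal sketch of the convergent and discriminant checks, using Pearson correlation on hypothetical per-teacher scores. The variable names and all values are invented for illustration, and the choice of a campus parking rating as the unrelated concept is an assumption of this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for six teachers
student_rating = [4.2, 3.1, 4.8, 2.5, 3.9, 3.4]
peer_rating = [4.0, 3.3, 4.5, 2.9, 3.6, 3.5]  # another measure of the same concept
parking_rating = [3, 3, 2, 2, 4, 5]           # a measure of a different concept

convergent = pearson_r(student_rating, peer_rating)
discriminant = pearson_r(student_rating, parking_rating)
print(round(convergent, 2), round(discriminant, 2))
```

Evidence of construct validity would pair a high convergent correlation with a discriminant correlation near zero; with samples this small, real data would also need a significance test.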

Attempts to assess the validity of teaching effectiveness measures have met with equivocal results (e.g. Cashin, Downey, & Sixbury, 1994; Marsh, 1994, 1995). In a review of validity studies that tested for both criterion and construct validity, Abrami et al. (1990) found that many rating factors have low correlation with student learning. Additionally, using factor analysis, Kremer (1990) obtained only moderate construct validity for three teaching evaluation measures, namely peer ratings, student ratings, and teaching awards.

Rather than attempting to establish the validity of new or current instruments, universities could use surveys in which reliability and validity have already been established. In using such a tool, professional researchers refrain from significant customisation for fear of invalidating the results. Databases of questions, from which many combinations of instruments could be drawn, will not be assured of validity and are unable to be tested for internal consistency (the latter because not enough items for each construct are likely to be included).

Cashin et al. (1994) note that different courses have different teaching objectives, and that various teaching methods will be differentially related to student achievement. Biggs (1991) comments that teachers should engage students in ways appropriate to the task, and that the thrust of teaching should vary according to the discipline. This implies that a standard evaluation form will not be appropriate for all subjects, for all courses, or for all fields of study.

The validity of SOSs remains uncertain. Although surveys should adequately represent the qualities of effective teaching (Abrami & d'Apollonia, 1991), a cogent and unambiguous definition of effective teaching is not easily identified. Abrami et al. (1990) conclude that, despite decades of testing student ratings for validity, further work is required. Perhaps future research should concentrate on defining effective teaching to enable rigorous testing.

Are student surveys beneficial?

There is general agreement that SOSs can serve a formative purpose by providing information that might help teachers improve their teaching (e.g. Marsh, 1994; Ramsden, 1991). Ingvarson (1994) claims that evaluation encourages continual improvement in the quality of teaching and learning. Ramsden and Dodds (1989) suggest that evaluations could be legitimately requested by a faculty member to gauge the effect of new and innovative teaching methods.

However, feedback from student ratings apparently contributes only modestly to improvements in the quality of teaching. For example, L'Hommedieu, Menges and Brinko (1988) found that, although the use of student ratings is positively associated with instructional improvement, the effect is small and of little practical significance. Ramsden (1991) argues that voluntary, quasi-voluntary or compulsory survey evaluations are unlikely to encourage behavioural change. He claims that teachers who improve as a result of these evaluations are those who would have improved regardless of whether or not they had been evaluated.

Survey results are also commonly linked with hiring, promotion and tenure decisions and thus serve the summative purposes of management. Some authors argue against these summative uses on the grounds that there is a lack of evidence linking staff appraisal with better teaching in higher education (e.g. Ramsden, 1991). Others advise caution due to the inconsistent reports on SOS validity, particularly in regard to student learning (e.g. Abrami et al., 1990). We conclude that, according to the literature, SOSs can be put to beneficial formative uses, but that their summative uses are dubious. It is not our intention to attempt to resolve the latter issue. Our interest is to examine current practices in view of present knowledge.

Methodology

We analysed the SOSs in use at 8 Victorian and New South Wales universities, and conducted interviews with people responsible for the design and use of the surveys. A total of 30 surveys, containing 836 questions, were collected; 26 discrete questionnaires were received from 4 universities, and databanks of questions were received from the other 4 universities. To preserve confidentiality, the participating universities are codified using the letters A to H. The letter `R' preceding this university code indicates an interview at the university corresponding to that code. Two respondents were interviewed at university E and are identified as RE1 and RE2.

Content analysis was conducted in which each question was categorised according to its meaning. Categorisation determines the number of questions asked per construct in each survey. Three broad dimensions were used:

1 Service quality has received considerable attention in the marketing literature. Most notably, Parasuraman, Zeithaml, and Berry's (1988) SERVQUAL instrument has been recommended (Kotler & Fox, 1995) and used (DeSouza & Soutar, 1990) for measuring service quality at tertiary institutions. Using SERVQUAL-type categories, a total of 14 categories, containing 76 sub-categories, were developed that pertain to aspects of service quality in higher education.

2 Satisfaction comprises a number of factors, including: student enthusiasm and demographics (McInnis, James, & McNaught, 1995), life-satisfaction (Rain, Lane, & Steiner, 1991), attractiveness of academic success (Geiger & Cooper, 1995), affiliation or congruency with the university (Nafziger, Holland, & Gottfredson, 1973) and the desire to study further (Abrami et al., 1990). For the satisfaction dimension, 6 categories with 32 sub-categories were developed.

3 Learning was represented by 5 categories with 34 sub-categories, covering areas such as independent/dependent learning, student learning outcomes, and student characteristics.

Following categorisation, sub-categories were collapsed to form 25 main categories to increase frequency counts to reasonable levels but without appreciable loss of information.

Questions that could be classified in more than one sub-category are ambiguous. For example, `the academic support for this subject was appropriate' is ambiguous because the attributes that students may consider as `academic support' are not obvious. The number of sub-categories per question is an indication of the extent of ambiguity. To prevent ambiguous questions over-weighting frequency counts, fractional frequencies were allocated such that each ambiguous question contributed a total frequency of unity. For example, a question classified in four sub-categories was coded as a count of 0.25 in each sub-category.
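The fractional allocation can be sketched as below. The sub-category names and the three coded questions are hypothetical; only the rule itself (each question contributes a total frequency of one, split evenly across its sub-categories) follows the description above.

```python
from collections import defaultdict
from fractions import Fraction

def fractional_counts(classified_questions):
    """Give each question a total weight of 1, split evenly over its sub-categories."""
    counts = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for sub_categories in classified_questions:
        share = Fraction(1, len(sub_categories))
        for sub_category in sub_categories:
            counts[sub_category] += share
    return dict(counts)

# Hypothetical coding of three questions; sub-category names are illustrative
coded = [
    ["clarity"],                                       # unambiguous: weight 1
    ["clarity", "feedback"],                           # weight 1/2 each
    ["clarity", "feedback", "resources", "workload"],  # weight 1/4 each
]
counts = fractional_counts(coded)
print({name: float(value) for name, value in counts.items()})
```

Exact fractions are used so that the weights for each question sum to exactly one; the grand total therefore equals the number of questions coded.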

Five interviews were conducted at four universities with senior people responsible for, or otherwise closely associated with, survey design and analysis. The purpose of these interviews was to determine analysis procedures and uses of SOS data, and to determine if recommendations from the literature are practised in aspects not evident from our content analysis.

Results

Subject vs. teacher evaluations

Some survey titles indicate that the instrument ostensibly evaluates either the `subject' or the `teacher' (13 surveys in total from universities E, F and G). These surveys were analysed to determine whether there is a difference between the types of questions asked. For this purpose, questions were categorised as being specific to teacher behaviour, specific to subject matters (i.e. within the control of the subject leader or principal lecturer) or as being indistinct. The frequencies were compared using the chi-square test, and in each case it was found that the survey titles are associated with different emphases (university E: $\chi^2$ = 36.57, df = 2, n = 85, p < .05; university F: $\chi^2$ = 51.67, df = 2, n = 68, p < .05; university G: $\chi^2$ = 19.52, df = 2, n = 80, p < .05).
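For readers unfamiliar with the test, the following sketch computes a Pearson chi-square statistic for a 2 x 3 table (survey title by question type), which gives the two degrees of freedom reported above. The cell counts are hypothetical, since the underlying frequencies are not reported here, so the resulting statistic does not reproduce any of the published values.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if survey title and question type were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts: rows = survey title (teacher, subject),
# columns = question type (teacher-specific, subject-specific, indistinct)
counts = [
    [28, 4, 8],
    [12, 12, 26],
]
statistic, df = chi_square(counts)

CRITICAL_05_DF2 = 5.991  # chi-square critical value for df = 2 at the .05 level
significant = statistic > CRITICAL_05_DF2
print(round(statistic, 2), df, significant)
```

A statistic above the critical value leads to rejecting the hypothesis that question type is independent of survey title.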

Teacher surveys contain a large proportion (78%) of questions that concern the teacher's behaviour. However, subject surveys contain a mixture of questions, including aspects of teacher behaviour and institutional infrastructure. Overall, only 21 per cent of these surveys' questions evaluate subject matters, 24 per cent of the questions evaluate teacher behaviour, and indistinct questions are predominant (55%). An example of the latter is: `The library was able to provide me with the references I required'.

Survey administration

Interviews revealed that SOSs are considered to be vehicles for improving teacher performance, and that results are essentially a requirement for promotion or tenure applications (universities B, D-G). For example, university E publishes a book to assist those applying for academic promotions, wherein it is stipulated that SOS results should be supplied. However, the mechanisms in place to ensure that quality improvements follow from survey results are indirect. For example, although surveys evaluating a subject are compulsory over a two to five year cycle (RB, RD, RE1, RE2, RF), mechanisms to enforce the required frequency are generally weak. Similarly, RB, RD and RF report that surveys designed to evaluate teaching are voluntary and that the results are confidential to the requesting teacher. An exception is university E where teacher evaluations are also compulsory (every two years, RE1) and the teacher does not know when they are going to be evaluated until three to four weeks prior to the date (RE2). Teachers can also request a survey voluntarily. Here results are routinely distributed to heads of departments; in some disciplines, the results are posted on notice boards visible to all (RE1).

All interviewees reported a growing use of teacher surveys. For example, university F now conducts an estimated 500 surveys per semester compared with fewer than 100 three years ago. It is likely that this growth is driven by promotion and tenure requirements. SOSs are administered near the end of the subject, but before final examinations or grades are known. This timing coincides with the most anxious period of study for students.

Survey design

None of the discrete surveys contain sufficient items for testing all constructs included; overall, only 21 per cent of constructs contain 3 or more questions. At most universities with discrete surveys, there was little indication that questionnaire design is important. Where questions are drawn from a database, teachers select their own questions. Question ordering is generally not considered.

There are many ambiguous questions in the surveys, especially in those from universities B, F and G (see Table 1). This high level of ambiguity indicates a serious lack of reliability and construct validity. Few surveys contain questions that measure influencing factors. For example, 8 per cent of questions dealt with instructor characteristics (instructor empathy); however, fewer than 3 per cent of questions dealt with student or course characteristics, and instructor gender received no attention. Furthermore, reverse-ordered (negatively worded) questions are scarce, with only 3 surveys (out of 26) containing any such questions. In those 3 surveys, the reverse-order questions comprise fewer than 5 per cent of the total number of questions. Although students are not usually deemed qualified to evaluate the content of a subject, surveys from six universities (A-F) contain questions on subject content (between 4 and 11 per cent of questions).

Table 1 Percentage of questions allocated to more than one category

                  2 or more    3 or more    4 or more    Total no. of
University        categories   categories   categories   questions

A                    16%           --           --            56
B                    39%          17%           --            18
C                    18%           1%           1%           163
D                    16%           2%           1%           129
E                    21%           1%           --           203
F                    25%           6%           2%           173
G                    28%           5%           2%            88
H                    17%           --           --             6
Total freq.          179           23            8           836
Total freq. (%)      21%           3%           1%          100%
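
The totals in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures printed in the table; the dictionary names are chosen for this example.

```python
# Number of questions per university, as listed in Table 1
questions = {"A": 56, "B": 18, "C": 163, "D": 129,
             "E": 203, "F": 173, "G": 88, "H": 6}

# Total frequencies of ambiguous questions, from the "Total freq." row
ambiguous_totals = {"2 or more": 179, "3 or more": 23, "4 or more": 8}

total_questions = sum(questions.values())

# Express each total frequency as a rounded percentage of all questions
shares = {label: round(100 * freq / total_questions)
          for label, freq in ambiguous_totals.items()}
print(total_questions, shares)
```

The computed values agree with the table's total of 836 questions and with the percentage row (21%, 3% and 1%).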


Reporting of results

Once SOS data have been collated, a report is issued in which aggregated responses to each question are presented in the form of histograms, means, and standard deviations. Covariance information is not provided, which means that users of the information cannot discern whether those students who gave low or high ratings to one item also gave low or high ratings to another. An exception occurs at university A, where teachers receive raw data on floppy disk.
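The information lost by reporting only per-item summaries can be shown with a small constructed example. The two hypothetical classes below produce identical means and standard deviations for both items, yet the relationship between the items is opposite; all data are invented for illustration.

```python
import statistics

# Hypothetical raw responses: each tuple is one student's ratings on two items
class_a = [(1, 1), (2, 2), (4, 4), (5, 5)]
class_b = [(1, 5), (2, 4), (4, 2), (5, 1)]

def item_summary(rows):
    """Mean and standard deviation per item, as an aggregated report would show."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*rows)]

def item_covariance(rows):
    """Sample covariance between the two items; this needs the raw rows."""
    xs, ys = zip(*rows)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)

# The aggregated reports are indistinguishable...
same_report = item_summary(class_a) == item_summary(class_b)
# ...but the covariances have opposite signs.
cov_a, cov_b = item_covariance(class_a), item_covariance(class_b)
print(same_report, round(cov_a, 2), round(cov_b, 2))
```

In the first class, students who rate one item highly also rate the other highly; in the second the pattern is reversed. Only the raw responses reveal the difference.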

Specific guidance on the results is rarely given to individual teachers, unless the teacher seeks further advice from academic development staff. It was reported that very few teachers actually do this; for example RD remarked that only three teachers sought further advice in the last semester. RD, RF and RB acknowledged that the results could be incorrectly interpreted, leading to possible misuse and damage. For example, RF considered that interpreting poor results as a mark against the teacher would constitute misuse. Similarly, RD considered that interpreting the results as evidence of `good' or `bad' teaching would be a misuse. Despite such reservations, the eventual distribution of results to others (superiors, promotion committees, etc.) encourages superficial and erroneous interpretation.

Although knowing that the surveys are used for both summative and formative purposes, RB, RD and RE2 believe that results may be influenced by students' different perspectives and experiences, and by the way the students feel on the survey day. Conversely, others thought that the results were objective because students are considered astute (RE1) and because the surveys are thought to be free of judgemental components (RF).

Discussion

Quality assurance

Although SOSs were largely developed as part of quality assurance initiatives, the present mechanisms are insufficient to ensure that improvements are properly enabled and subsequently made. Teachers are almost exclusively evaluated on the basis of student opinion surveys, and particularly on their perceived communication abilities (which account for 13% of all questions). However, universities provide staff with little or no specific advice on teaching on the basis of SOS results, nor do they commonly solicit feedback and information regarding teaching from the other sources recommended in academic literature. Making improvements is left to the individual teacher. This state of affairs has several implications. First, it presumes that surveys give useful feedback; however, our results show that this may not be the case. Second, a teacher who receives low ratings may not know how their teaching should be altered, or might not be inclined to make changes. Third, confidentiality precludes departments and faculties from knowing what aspects of individual teacher performance should be improved. Fourth, because SOSs are requested voluntarily, those teachers who might benefit from feedback may not receive it. Fifth, the timing of administering SOSs does not allow teachers the opportunity to respond or provide direct feedback to students. This precludes making changes for those who indicated a need for change, and the next class may have different requirements.

Survey reliability and validity

It is axiomatic in reputable survey methodology that instruments are tested for reliability and validity before the data can be considered usable. With the discrete surveys, attempts to establish internal consistency (reliability) would prove futile, as most (79%) of the constructs were allocated fewer than three questions. Furthermore, the high levels of ambiguity indicate severe systematic bias, which produces invalid ratings. For example, the question, `Did the tutor have a professional attitude?' is likely to produce unreliable and invalid data. Students are ill-equipped to discern professionalism, and are in no position to determine the attitude of another person. In one instance, the question, `Did classes start on time?' was asked in the last week of semester when nearly all students were present; in preceding weeks only 15-20% of students had attended classes. Therefore 80-85% of respondents could not properly comment on class starting times, yet responded to this question none the less. Even questions that appear to be straightforward can produce erroneous results.
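
To make the point about internal consistency concrete, the following Python sketch computes Cronbach's alpha. This statistic is an assumption on our part: it is a widely used measure of internal consistency, but the text above does not name a particular one. The function name, the item names and all ratings are hypothetical and invented for illustration. Alpha is undefined for a single item and tends to be low when a construct has only two, which is one reason constructs allocated fewer than three questions leave little to test.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for one construct.

    `items` is a list of items; each item is a list of ratings, one per
    respondent, with respondents in the same order for every item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items per construct")
    # Sum of the variances of the individual items.
    item_var_sum = sum(pvariance(item) for item in items)
    # Variance of each respondent's total score across the items.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical ratings (1-5) from five students on three items intended
# to measure a single construct. Values are invented.
clarity = [4, 5, 3, 2, 4]
pace = [4, 4, 3, 2, 5]
organisation = [5, 5, 2, 2, 4]

alpha = cronbach_alpha([clarity, pace, organisation])
print(round(alpha, 3))
```

Population variance is used for both the items and the totals; using sample variance throughout would give the same ratio and therefore the same alpha.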

Basic survey design

Question phraseology, question order, the number of questions used to measure the same construct, and whether questions are positively or negatively expressed, all affect an instrument's reliability and validity (Churchill, 1979).

Questions should be clear, specific, and reflect the university's realistic expectations of how staff should perform. The question, `The academic support for the subject was appropriate (e.g., counselling, advice and help with problems)' is unrealistic. Although academics may be expected to counsel students on approaches to learning, students may interpret `counselling' at a more personal level since the word is more closely associated with social workers than with academics.

Students cannot properly assess the breadth, accuracy or up-to-dateness of curriculum content (Eley, 1994; Ramsden & Dodds, 1989) because they are not expert in any given field. However, questions concerning subject content were included in 16 of the 30 surveys from universities A-F. The types of questions or surveys were not changed according to the respondents' year of study, level of preparation, depth of learning, course objectives, etc., despite the fact that learning objectives and teaching styles may vary between disciplines and year levels (Cashin et al., 1994; Eley, 1994). Four of the universities examined in this study use standard SOSs, implying that teaching is standard for all teachers in all class settings. A standard survey is unlikely to be adequate.

Formative vs. summative uses

University managers use SOS results for summative purposes despite warnings to the contrary in the literature. Several factors evident from this study compound the dangers. First, the validity of the instruments has not been established, rendering the data useless for any professional purpose. Second, when making promotion, tenure and hiring decisions, SOS results should not be used alone (Ramsden, 1991). Our study shows that other methods for evaluating teaching are not commonly used. Third, the critical variables and conditions that affect the interpretation of SOS results are largely unmeasured. This means that both the summative and formative uses of current SOSs are dubious. Without any independent checks in place to temper SOS results, teachers receiving low ratings have little choice but to infer, perhaps mistakenly, that their teaching performance is poor. Only 25 questions (3% of the total) deal with course, instructor or student characteristics, and even then, it remains unclear how the information is used to contextualise results. At the least, teachers and managers should be advised of the limitations of SOSs and encouraged to interpret the data with consideration of extraneous influencing factors and other methods of evaluation.

The direction of assessment is unusual and inconsistent. Personnel assessment is normally a `top-down' activity; however, universities have implemented `bottom-up' assessment of teaching staff. To our knowledge, this principle rarely extends to non-teaching staff, despite the recommendation of the Hoare Report (Higher Education Management Review Committee, 1995) that `employee satisfaction should form part of the assessment of the performance of the Vice-Chancellor and leaders at every level of the university' (section 11b). Given the lack of complementary methods, supervisors of teaching staff have largely devolved their responsibility for assessing teaching to students.

Conclusions

The known usefulness of well-designed student opinion surveys (SOSs) is modest. When questionnaires are thoroughly tested and extraneous factors accounted for, they appear to have some value for teachers who are willing and able to use the feedback for improving their teaching. The problem of adjusting questionnaires to the different teaching and learning objectives of various disciplines, while ensuring that each survey is a reliable and valid instrument, remains unresolved in the student ratings literature. Furthermore, the ill-defined concept of teaching effectiveness and its many constructs hinders the establishment of validity of these surveys.

In their current form, surveys used by universities may be doing more harm than good. Unsupported assumptions associated with SOSs are prevalent, with few checks in place to temper results. Under the guise of quality assurance, the surveys are widely implemented as an agent of change and conformity. Claims of continuous improvement and accountability cannot be taken seriously when, despite literature warnings of the dangers, data from SOSs are invariably the sole basis of evaluation. Complementary evidence of student learning and other methods of evaluation are largely neglected.

In an attempt to make university teachers accountable, current practices limit teacher accountability to the opinions of students. Thus a tool has been placed in the hands of students which has potential for damage. Students can favourably or maliciously use SOSs to affect the destiny of their teachers. This may be to the immediate disadvantage of staff, and to the ultimate disadvantage of successive students since teachers desiring high scores may pander to student satisfaction, to the detriment of learning. Frequent surveying is likely to reinforce the role of students as customers. In an era of rising course fees and credential-driven clients, universities concerned with learning should discourage a detrimental customer mentality that emphasises short-term student satisfaction. Although student opinions might be considered, they should not outweigh professional academic and educational standards, broader university objectives, and recommendations drawn from sound empirical research.

Keywords
reliability
student surveys
teacher evaluation
teaching effectiveness
universities
validity


Acknowledgement

This work was conducted as part of the doctoral studies of Rowan Bedggood.

References

Abrami, P. C. & d'Apollonia, S. (1991). Multidimensional students' evaluations of teaching effectiveness--generalizability of `N=1' research: Comment on Marsh (1991). Journal of Educational Psychology, 83, 411-415.

Abrami, P. C., d'Apollonia, S., & Cohen, P. A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82, 219-231.

Biggs, J. B. (1991). Teaching: Design for learning. Paper presented at the 16th Annual Conference of HERDSA. In Bob Ross (Ed.), Teaching for effective learning (pp.11-26). Sydney: Higher Education Research & Development Society of Australasia.

Cashin, W. E., Downey, R. G., & Sixbury, G. R. (1994). Global and specific ratings of teaching effectiveness and their relation to course objectives: Reply to Marsh (1994). Journal of Educational Psychology, 86, 649-657.

Churchill, G. A. Jr. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16(1), 64-73.

Cohen, P. A. (1990, Fall). Bringing research into practice. New Directions for Teaching and Learning, No.43, pp. 123-132.

Cranton, P. & Smith, R. A. (1990). Reconsidering the unit of analysis: A model of student ratings of instruction. Journal of Educational Psychology, 82, 207-212.

Crooks, T. (1988). Assessing student performance (Green Guide No. 8). Sydney: Higher Education Research & Development Society of Australasia.

DeSouza, M. & Soutar, G. (1990). Measuring service quality in a tertiary institution. Perth: Curtin University of Technology, Division of Business and Administration, Curtin Business School.

Eley, M. G. (1994). Using students' ratings in the evaluation of teaching: Monash University's MonQueST Project (1993 John Smyth Memorial Lecture). VIER Bulletin, 72, 3-28.

Geiger, M. A. & Cooper, E. A. (1995). Predicting academic performance: The impact of expectancy and needs theory. Journal of Experimental Education, 63, 251-262.

Higher Education Management Review Committee (D. Hoare, Chair). (1995). Report of the Committee of Inquiry. Canberra: AGPS.

Ingvarson, L. (1994). Teacher evaluation for a teaching profession (1993 John Smyth Memorial Lecture, Part 2). VIER Bulletin, 2, 29-68.

Kotler, P. & Fox, K. F. A. (1995). Strategic marketing for educational institutions (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Kremer, J. F. (1990). Construct validity of multiple measures in teaching, research, and service and reliability of peer ratings. Journal of Educational Psychology, 82, 213-218.

L'Hommedieu, R. L., Menges, R. J., & Brinko, K. T. (1990). Validity issues in meta-analysis: Suggestions for research and policy. Higher Education Research and Development, 7(2), 119-130.

McInnis, C., James, R., & McNaught, C. (1995). First year on campus. Project of the Committee for the Advancement of University Teaching, CSHE, University of Melbourne. Canberra: AGPS.

Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues and directions for future research. International Journal of Educational Research, 11, 253-388.

Marsh, H. W. (1994). Weighting for the right criteria in the Instructional Development and Effectiveness Assessment (IDEA) system: Global and specific ratings of teaching effectiveness and their relation to course objectives. Journal of Educational Psychology, 86, 631-648.

Marsh, H. W. (1995). Still weighting for the right criteria to validate student evaluations of teaching in the IDEA system. Journal of Educational Psychology, 87, 666-679.

Nafziger, D. H., Holland, J. L., & Gottfredson, G. D. (1973). Student-college congruency as a predictor of satisfaction. Baltimore, MD: The Johns Hopkins University, Center for Social Organization of Schools.

Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1988). SERVQUAL: A multiple-item scale for measuring customer perceptions of service quality. Journal of Retailing, 64, 12-40.

Peter, J. P. (1979). Reliability: A review of psychometric basics and recent marketing practices. Journal of Marketing Research, 16, 6-17.

Rain, J. S., Lane, I. M., & Steiner, D. D. (1991). A current look at the job satisfaction/life satisfaction relationship: Review and future considerations. Human Relations, 44, 287-306.

Ramsden, P. (1991). Evaluating teaching; supporting learning. Paper presented at the 16th Annual Conference of HERDSA. In Bob Ross (Ed.), Teaching for effective learning (pp.27-42). Sydney: Higher Education Research & Development Society of Australasia.

Ramsden, P. & Dodds, A. (1989). Improving teaching and courses: A guide to evaluation. Parkville: University of Melbourne, Centre for the Study of Higher Education.

Tatro, C. N. (1995). Gender effects on student evaluation of faculty. Journal of Research and Development in Higher Education, 28, 169-173.

Rowan E. Bedggood
Robin J. Pollard
Monash University

Rowan Bedggood is a doctoral student in the Department of Marketing, Faculty of Business and Economics, Monash University, 300 Dandenong Road, Caulfield East, Victoria 3145. Professor Robin Pollard is Head of the School of Business and Information Technology, Monash University, 2 Jalan Kolej, Bandar Sunway, 46150 Petaling Jaya, Selangor Darul Ehsan, Malaysia.
COPYRIGHT 1999 Australian Council for Educational Research
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1999, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Pollard, Robin J.
Publication:Australian Journal of Education
Article Type:Statistical Data Included
Geographic Code:8AUST
Date:Aug 1, 1999
Words:5412