Combining online and paper assessment in a web-based course in undergraduate mathematics.
Online assessment in mathematics is becoming more prominent as mathematics and the internet become more compatible. This paper is based on a case study of a web-based calculus course in which a combination of paper and online assessment is used for assessing various activities. The paper first investigates what student assessment preferences are and shows that there is a leaning towards a preference for online assessment, yet students still show a preference for a combination of online and paper assessment for term tests. Secondly, the paper shows that standards are maintained when incorporating an equally weighted online component into the assessment model for the case under consideration. The investigations are conducted from a background of having been involved in teaching online mathematics courses for the last four years.
Issues such as why one should consider online assessment, how viable online assessment is and what the role of online assessment is in mathematics, in particular, need to be addressed. Computer technology is establishing itself as playing an integral role in teaching mathematics, yet many teachers of mathematics still shy away from granting technology the same significant role in the assessment process. Whether it is at all possible to sensibly assess mathematics online is an issue that is often regarded with scepticism. Concerns exist about whether valuable skills such as the development of a mathematical argument or the exposition of a problem solution, normally conducted on paper, will be forfeited. On the other hand, one can speculate whether it is possible to completely replace paper assessment in mathematics with online assessment, the precedent for which has been set in a number of other disciplines. If possible, it would cut down on grading work for the teachers and fall in line with trends of increasing technology application. Midway between these options is the possibility of combining online assessment and paper assessment. If this option is considered, then one needs to investigate how to deploy such a combination. In other words, how does one maintain the best features of both worlds while venturing into a new way of doing things? Hand in hand with this is another important issue: the question of whether it is possible to maintain standards while assessing mathematics online.
In this paper we address the issues stated above from the perspective of having taught mathematics courses online for a number of years and making use of both online and paper assessment.
A danger amongst mathematics teachers is for assessment to be considered as an add-on to the course and not as an integral part of the curriculum. This is not the case with students. For most students, assessment is the central driving force in the learning process. Teachers spend much time on developing detailed study guides containing carefully formulated learning objectives. The majority of students, however, have as their primary objective passing the course. For this purpose they are quick to consider whether their primary focus should be on preparing according to the study guide or being guided by the content of past papers. In most cases they prefer the latter. What we do not assess, students will not learn. This notion is clearly not ideal and in the long run teachers should work towards changing this attitude, but currently it is reality.
It has become customary to distinguish between four types of assessment. All assessment activities contain elements of one or more of the following components:
* Diagnostic assessment that enables the teacher and the student to detect weaknesses in an individual's or the group's progress continuously;
* Formative assessment where the primary purpose is to provide feedback to students on their progress;
* Summative assessment where the main objective is to generate a mark for grading purposes, and
* Accountable assessment certifying public accountability.
Diagnostic and accountable assessments are mainly of interest to the teacher, whereas formative and summative assessments are more in the interest of the student. In general, students are mostly interested in the summative component. They are concerned about the accountability and realize the value of formative assessment. Yet formative assessment is only effective if students make mistakes in a test, mistakes from which they can learn. Students with full marks for a test do not truly benefit in a formative sense. At most they benefit from peace of mind and increased confidence. So to benefit formatively from a test, a student has to lose marks in the test, not a popular occurrence in a student's life.
In formal tests, we distinguish between two types of questions: Provided Response Questions (PRQs), in which possible answers are provided, such as Multiple Choice Questions (MCQs), multiple response questions and matching questions; and Constructed Response Questions (CRQs), in which students have to construct their own answers. Both formats are feasible in both paper and online assessment environments. More detail is provided in Engelbrecht and Harding (2003).
Why Online Assessment?
The answer to the question why one should venture into online assessment probably lies in the fact that more and more situations are presently created where online assessment seems the obvious route to take. The most common of these stems from the idea that there is a core of knowledge that is essential in any particular course before problem solving could even be attempted. The process of assessing this core of knowledge, the "must knows," is also sometimes referred to as gateway testing. This type of testing is particularly suited to online assessment.
It is important to note, however, that online assessment is not only used to test the "must knows." The traditional perception is that MCQs can only be used for testing lower level cognitive skills. This is not true, according to Hibberd (1996): "... they can be implemented to measure deeper understanding if questions are imaginatively constructed."
From the teacher's perspective the value of decreased grading time when assessing online should not be underestimated. When working with large groups of a hundred and more students, the grading load impacts on valuable research time and affects staff budgets.
Online assessment has the further advantage of enabling the teacher to readily obtain question-by-question profiles. Subsequent refinement of questions and tests can be carried out. The empirical data that become available make online testing a valuable diagnostic instrument. The objective of every MCQ should be clearly understood and a careful selection of the distracters can itself be utilised to provide diagnostic information.
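A question-by-question profile of this kind can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the function name `item_profile`, the option labels, and the ten responses are hypothetical and are not taken from the course data. It computes the fraction of students answering an MCQ correctly and counts how often each distracter was chosen.

```python
from collections import Counter


def item_profile(responses, correct_option):
    """Summarise one MCQ: fraction correct and how often each distracter was chosen."""
    counts = Counter(responses)
    total = len(responses)
    # Fraction of students who chose the correct option (0.0 if nobody answered).
    facility = counts[correct_option] / total if total else 0.0
    # Every chosen option other than the correct one is a distracter.
    distracters = {opt: n for opt, n in counts.items() if opt != correct_option}
    return facility, distracters


# Hypothetical responses of ten students to one question whose correct option is "B".
responses = ["B", "A", "B", "C", "B", "B", "A", "D", "B", "A"]
facility, distracters = item_profile(responses, "B")
print(f"fraction correct = {facility:.2f}")
print("distracter counts:", dict(sorted(distracters.items())))
```

A distracter that attracts many students may point to a specific misconception, whereas one that nobody chooses contributes little to the question.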
The issue of assessment innovation is addressed widely, such as by the United Inventors Association in the U.S.A. with their Innovation Assessment Program (United Inventors Association), the Freudenthal Institute in the Netherlands (Freudenthal Institute) with their programme on Research on Assessment Practices, the Center for Innovation in Assessment at Indiana University (Indiana University) and the MathSkills Discipline Network (The MathSkills Discipline Network) in the United Kingdom.
However, we still venture to say that in many instances much effort is expended on curricular innovation without the same effort being applied to assessment innovation. Many academics claim that they are "reformed teachers" without thinking seriously about new and innovative ways of assessing. Innovation in teaching methods requires innovation in assessment.
When teaching almost exclusively by means of technology, such as in a web-based environment, the foundations are laid for assessing online and it seems almost natural to assess online. This paper largely deals with such a case study. If technology is incorporated in the presentation of the course, it makes little sense to avoid technology in the assessment part of the course. According to Gretton and Chalis (1999),
Assessment has various purposes. Is it for grading and sorting students? Is it for encouraging learning? The answer is yes to both, but when both technology and students' skills are evolving so rapidly, then assessment style must also evolve to ensure it continues to fulfil these objectives.
Smith and Wood (2000) believe that ... appropriate assessment methods are of major importance in encouraging students to adopt successful approaches to their learning. Changing teaching without due attention to assessment is not sufficient.
The National Council of Teachers of Mathematics 1995 Assessment Standards (NCTM) states that if students are to increase their mathematical power, a number of related shifts in assessment practice are warranted, including a shift toward using multiple and complex assessment tools such as performance tasks, projects, writing assignments, oral demonstrations, and portfolios, and away from sole reliance on answers to brief questions.
Critics view traditional paper-and-pencil tests as inadequate measures of student abilities because they fail to elicit complex thinking and deep subject matter understanding (Frederikson, 1984; Resnick & Resnick, 1992). This need not be the case but in traditional tests the emphasis is often on manipulative skills or rote learning.
How Should Online Assessment be Applied?
Although it is not possible to lay down rigid rules as to how assessment should best be conducted, we start from the premise that assessment should match the mode of presentation. Based on the assumption that learning styles differ amongst students (Litzinger & Osif, 1993), the first issue to be investigated is to what extent assessment preferences differ. Assessment possibilities include online, paper and oral assessment, none of which need be used exclusively. Even within each of these categories, the possibilities are numerous. The second issue to be investigated is whether and how a combination of different modes of assessment could be used.
The Research Environment
We turn to our own experience in teaching online courses to address the issues above.
The teaching model. In 2000 the first web-based calculus course was introduced at the University of Pretoria. Experiences and findings are reported in Engelbrecht and Harding (2001(1), 2001(2), 2002). Due to the success of the project, it has been expanded to three successive web-based courses. So there are students now that have completed three semesters of calculus on the web. Large groups of students are involved, up to two hundred students per course.
All the web-based calculus courses run along the same model. A textbook (Stewart, 1999) is assigned and the student is guided through the course on a dynamic, day-by-day basis. The course provides for one contact hour per week, a discussion session. The platform is WebCT, the reason being that the university subscribes to this software and so provides the necessary infrastructure and support. The study material is broken down, first into themes and secondly into units, each of which provides for more or less a daily portion. For each of the units detailed study objectives, short lecture notes, and problems of the day are provided. None of these activities is monitored and they therefore require a fair amount of self-discipline from the student's side. Because of the importance of this matter, all activities are geared towards cultivating self-discipline. Communication takes place via online discussion forums and e-mail.
The courses considered in this paper are first and second semester calculus courses for the first year for the period 2001 to 2002. After trial runs in 2000, these were the first two years of running these web-based courses formally. Two teachers (the authors) were involved in working cooperatively but alternating in taking the main responsibility of the course. Test papers were drawn up jointly.
The assessment model. Online assessment forms an integral part of the assessment strategy. The most basic assessment activity is a weekly quiz done on the web with rapid feedback. Students do these quizzes individually. Although there is no security check, the quiz mark does contribute to the semester aggregate, and students soon come to use the quizzes as a formative tool and as a fair judge of their progress.
Also as a formative tool but in a totally different style of assessment is the assignment and project activity of the course. Students hand in four hard copy assignments and one project during the semester. Assignments mainly consist of selected problems and the project requires the use of mathematical software such as Matlab or Maple. The assignments and the project are graded, after which the solutions are published on the website. These activities are done in groups, and the benefit of cooperative learning for this mode of learning is discussed in Engelbrecht and Harding (2002).
The two formative tools discussed above differ in their role of contributing to the learning process but complement each other. Whereas the online quizzes assess almost continuously in an environment suited to the presentation style, the assignments cultivate discussion and the art of writing mathematics. Whereas the online quizzes have a time limit, simulating examination conditions, the assignments offer open-ended time. The quizzes offer no partial credit but the assignments do. Online quizzes consist of MCQs, as well as Constructed Response Questions (CRQs), whereas assignments consist only of CRQs.
For the two semester tests and the final examination, two modes of assessment are again combined. Each of these tests consists of an online as well as a paper section carrying equal weight. The online and paper sections of the semester tests under discussion differ in nature. The paper section lends itself to problems involving exposition of the solution, formulation of concepts, graphing and modelling of situations. Grading allows partial credit, a feature that is normally absent in online assessment.
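As a minimal sketch of the equal weighting described above, the combined mark for a test can be computed as follows. The function name `combined_mark` and the example marks are hypothetical; the only detail taken from the text is that the online and paper sections carry equal weight.

```python
def combined_mark(online_pct, paper_pct, online_weight=0.5):
    """Combine the online and paper section marks (both in percent) into one test mark."""
    if not 0.0 <= online_weight <= 1.0:
        raise ValueError("online_weight must lie between 0 and 1")
    # The paper section receives whatever weight the online section does not.
    return online_weight * online_pct + (1.0 - online_weight) * paper_pct


# Hypothetical student: 60% in the online section, 50% in the paper section.
print(combined_mark(60, 50))
```

Keeping the weight as a parameter makes it easy to explore other splits between the two sections.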
The online section is done in a computer lab under supervision. The online section consists mainly of MCQs, single answer questions, and other related types of questions like matching and multiple response. PRQs, such as MCQs, lend themselves to assessing concept understanding, whereas online CRQs, such as single answer questions, are used for assessing the result of calculations or manipulations ranging over a number of steps.
It is clear that we have different objectives with the paper and online sections of the semester tests, and we are convinced that each of these modes has its place in the assessment system and that the two modes have different advantages. Having been exposed to the weekly quizzes, students are familiar with this way of assessment and again it fits in with the mode of instruction. On the other hand, as is the case with the assignments, the paper section assesses skills such as formulation, exposition, and sketching that are not possible to assess online. From the lecturer's perspective, there is the added benefit of reduced grading and the diagnostic features of the online section.
When setting an online test, it is important to use a combination of question types. Students tend to perform better in online MCQs than in online CRQs and better in paper CRQs than in online CRQs. This issue is discussed in detail in Engelbrecht and Harding (2003).
Online CRQs requiring a single answer are often criticised both by teachers and students for the lack of partial credit. Perhaps there is merit in training students to work with care in order to deliver the completely correct product and not to rely on partial credit. Single answer CRQs without partial credit are fine as long as this forms part of the assessment process and is not the only way of assessing. This matter is discussed in more detail in Engelbrecht and Harding (2003).
The aim in constructing the model was to incorporate each of the diagnostic, formative, summative, and accountability components of assessment.
Student Assessment Preferences
We discuss responses to a questionnaire on preferred assessment modes, issued to a group of 106 first year calculus students. These students were all doing an online course for the first time and the questionnaire was issued towards the end of the semester. The purpose of the questionnaire was simply to determine with what mode of assessment they felt more comfortable. The understanding and learning experienced by the students through the different modes of assessment were not investigated.
In this group the majority (56.6%) stated that they prefer online assessment, 21.7% prefer paper assessment and 21.7% of the students have no particular preference. Although the majority of the students prefer online testing, almost half of the students either prefer paper tests or a combination of the two modes of assessment.
Students were asked to give reasons why they prefer a particular mode of assessment. We list reasons mostly given, illustrated by quotes.
Reasons given for online preference:
* The absence of examination stress.
It makes maths seem fun and stress free, I say this because I tend to stress when I get into an exam situation then I panic and forget everything I've learnt. Also, I hate being around people who stress after the paper; it really gets my spirit down! (student 21083119)
This student refers in particular to the quizzes that can be done any place, any time. This response is somewhat surprising because it was anticipated that students would experience the "all-or-nothing" situation in single answer questions in online tests as a reason for stress.
* The immediate feedback and availability of the results.
You know exactly where you stand, i.e., you know how much time you have left. It is neat and you get your mark immediately. (student 23217538)
* Suitability for formative assessment.
I prefer to see my results immediately so I can see if I need to further study the week's work or I am "up to date." (student 23031434)
* Flexibility of the online environment.
I like the fact that I can do the test on any computer, on the internet. This means that I do not even have to go to the university to do a test. (student 23238152)
It allows me to organise my own schedule and plan my learning activities a lot better. I know what to expect ... (student 23030713)
* Being exposed to modern technology.
By incorporating computers into the module, we learn far more than just mathematics. (student 23134772)
Reasons given for paper preference:
* The rigid way of marking in online assessment, no partial credit.
The computer-based test doesn't leave room for rough work where the examiner could find one or two marks, which can make a difference between passing and failing. (student 23167158)
* Difficulty of adapting to an unfamiliar way of testing.
I think better when I sit and write, then I see what I think. (student 23094436)
Comments offered by students with no preference include:
With both computer and written tests we can get "the best of both worlds" having equal usage of both. (student 23148978)
Both are equally acceptable. I enjoy the computer modules more but find the written section more practical since you don't always have a computer with you ... (student 23319055)
Both have advantages and disadvantages; you can guess and sometimes get something right not knowing anything and on the other hand on paper you can get marks for steps so it balances, however doing both simultaneously has a much better effect. (student 23163692)
On what the relative contributions of the online tests should be to their total grade for the course, the leaning is towards a bigger contribution from the online component as can be seen from Figure 1. This response corresponds to the preference for online assessment discussed earlier.
Although students tend to prefer a heavier online portion, only 8.7% of the students are of the opinion that no paper tests should be written, pointing to the difference in student preferences.
[FIGURE 1 OMITTED]
Maintaining the Standard
Having given an example of how assessment styles can be combined, the question then is whether it is possible to set an online examination in which the performance standard is maintained. In addressing the issue of accountability, there is a concern that standards are sacrificed for the sake of accommodating online assessment.
Judging the level of difficulty of MCQs is not easy. Early research by Lorge and Kruglov (1952) shows that mathematics teachers can judge the relative difficulty of MCQs fairly well but cannot predict the pass rate. Finney et al. (1999) report that pooled judgment is better than individual judges' opinion. It stands to reason that judging the degree of difficulty and predicting performance of online questions is something one instinctively becomes more proficient at the longer the involvement, just as is the case with paper questions.
In an attempt to compare the performance in paper sections of tests with the online sections, data were collected on performance in both the online and the paper sections over a period of two years from eight semester tests.
The average performance as a percentage for each group taking the particular test is given in Table 1. The Pearson correlation coefficients of the students' performance in the paper and online sections in each of the tests are also included in this table.
The reason for determining these correlation coefficients is to get an idea of whether the same students do well in both the online and paper sections of the tests, or whether a difference in performance indicates a difference in preference in the two sections.
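For readers who wish to reproduce such a coefficient, the Pearson correlation can be computed directly from its definition. The sketch below uses only the Python standard library; the function name `pearson_r` and the five pairs of marks are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences of marks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two marks")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paper and online marks (percent) of five students in one test.
paper = [45, 60, 52, 70, 38]
online = [55, 58, 49, 75, 50]
r = pearson_r(paper, online)
print(f"r = {r:.2f}")
```

A value near 1 would mean that the same students do well in both sections, while a value near 0 would mean that performance in one section says little about performance in the other.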
The results in Table 1 on the average performance in the tests are represented graphically in Figure 2.
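The section averages in Table 1 also lend themselves to a quick numerical summary. The sketch below copies the paper and online averages from Table 1 and computes the gap between the two sections for each test, together with the range of the averages within each section. The variable names are illustrative only.

```python
# (test, paper average %, online average %) as listed in Table 1.
table1 = [
    ("2001 sem 1 test 1", 46.29, 58.30),
    ("2001 sem 1 test 2", 40.86, 53.45),
    ("2001 sem 2 test 1", 51.96, 47.49),
    ("2001 sem 2 test 2", 54.52, 69.33),
    ("2002 sem 1 test 1", 51.88, 61.56),
    ("2002 sem 1 test 2", 71.29, 74.58),
    ("2002 sem 2 test 1", 54.35, 54.62),
    ("2002 sem 2 test 2", 46.51, 50.82),
]

# Online minus paper average per test; a positive gap means the online average was higher.
gaps = {label: round(online - paper, 2) for label, paper, online in table1}
largest = max(gaps, key=lambda label: abs(gaps[label]))

paper_avgs = [paper for _, paper, _ in table1]
online_avgs = [online for _, _, online in table1]
paper_range = round(max(paper_avgs) - min(paper_avgs), 2)
online_range = round(max(online_avgs) - min(online_avgs), 2)

print(f"largest gap: {gaps[largest]:.2f} ({largest})")
print(f"range of paper averages: {paper_range:.2f}")
print(f"range of online averages: {online_range:.2f}")
```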
[FIGURE 2 OMITTED]
The following observations can be made from Table 1 and Figure 2:
* There has never been any "disturbing" difference between the online and paper sections. The biggest difference of 14.81% is still acceptable in light of the fact that there is a fluctuation of 30.43% in the paper tests and of 27.09% in the online tests.
* Students do seem to perform slightly better in the online section in general, although this is marginal in most cases and not even always the case.
* The differences in performance in the online and paper sections seem to be getting smaller as time progresses. Setting online quizzes is a skill that has to be cultivated and although still far from having perfected it, it seems as if there is improvement.
* Although most of the correlations between the online and paper sections in Table 1 are significant at the 5% level, the values are still relatively low. The somewhat low values should not be viewed negatively. In fact, this could be an indication that some students perform better in the online section and others better in the paper section and that the combination accommodates students with different learning styles.
* The overall Pearson correlation coefficient between performances of individual students in the online and paper sections of all tests is 0.27. Taken over a sample of 896 students, this is highly significant even at the 0.1% level of significance. This significant positive correlation, together with the comparability of the averages, confirms the notion that although individuals could perform better in either one of the sections, overall the standards are maintained.
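The significance of the overall correlation can be checked with the standard t test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The source does not state which test was actually used, so the sketch below is an assumption about the procedure; the function name `t_statistic` is illustrative, and the test presupposes roughly bivariate normal marks.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs differs from zero."""
    if n < 3:
        raise ValueError("need at least three pairs of marks")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Overall correlation and sample size as reported in the text.
t = t_statistic(0.27, 896)
print(f"t = {t:.2f} on {896 - 2} degrees of freedom")
```

With this many degrees of freedom the two-sided critical value at the 0.1% level is roughly 3.3, so a modest coefficient such as 0.27 can still be highly significant because the sample is large.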
To get an indication of the standards of the online and paper sections, three experienced colleagues independently rated the level of difficulty of each question in each of the tests on a four-point scale in which 1 indicates an easy question and 4 a difficult question. A weighted average level of difficulty for each of the tests could then be calculated and these ratings are illustrated in Figure 3.
Figure 3 indicates no noteworthy difference in the level of difficulty between the questions in the paper and online sections, and supports all the earlier observations.
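One way in which a weighted average level of difficulty could be calculated is sketched below. The source does not specify the weights, so two assumptions are made here: each question is weighted by the marks it carries, and the three ratings of a question are averaged first. The function name and the example section are hypothetical.

```python
def weighted_difficulty(questions):
    """Mark-weighted average difficulty of a test section.

    questions: list of (marks, ratings) pairs, where ratings holds one rating
    per rater on the four-point scale (1 = easy, 4 = difficult).
    """
    total_marks = sum(marks for marks, _ in questions)
    if total_marks <= 0:
        raise ValueError("the section must carry a positive number of marks")
    weighted_sum = 0.0
    for marks, ratings in questions:
        if not ratings or any(not 1 <= rating <= 4 for rating in ratings):
            raise ValueError("each question needs ratings between 1 and 4")
        # Average the raters' opinions, then weight by the marks for the question.
        weighted_sum += marks * sum(ratings) / len(ratings)
    return weighted_sum / total_marks


# Hypothetical section: three questions, each rated by three colleagues.
section = [(2, [1, 1, 2]), (4, [2, 3, 3]), (4, [4, 3, 4])]
print(f"weighted difficulty = {weighted_difficulty(section):.2f}")
```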
When comparing performances in the online and paper sections above, averages were considered. Unfortunately, an average blocks out the distribution of marks within a certain group and valuable information could be lost. We have a closer look at the performance distribution for a particular test, the 2002 second semester test 1, where the average marks in the online and paper sections were almost identical. The distribution of marks is graphically represented in Figure 4. Both sets of marks are reasonably normally distributed, although the paper marks are more centred towards the average.
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
All assessment is ultimately subjective: there is no such thing as an "objective test." Even when there is a high degree of standardisation, the judgement of what concepts or techniques are assessed and what constitutes a criterion of satisfactory performance lies in the hands of the assessor.
Choosing between online assessment, paper assessment, or a combination is ultimately also a subjective decision. From the collected empirical information on student assessment preferences, it seems as if the assessment model under discussion provides for different assessment preferences. The combination falls in line with student preferences and corresponds to the structure of the teaching model.
The significant positive correlation between performances in the online and paper sections, together with the comparability of the averages, confirms the notion that overall performance standards can be maintained when combining assessment modes.
Figure 1. Students' preference on the contribution of online tests to the total mark.

Contribution of online tests    Students (%)
0%                              1.9
30%                             14.4
50%                             39.4
70%                             35.6
100%                            8.7

Note: Table made from bar graph.

Table 1
Comparison of Performance in Online and Paper Sections of Semester Tests

Test                 Number of Students   Paper Section   Online Section   Correlation Coefficient
2001 sem 1 test 1    87                   46.29           58.30            0.10
2001 sem 1 test 2    87                   40.86           53.45            0.13
2001 sem 2 test 1    180                  51.96           47.49            0.49
2001 sem 2 test 2    180                  54.52           69.33            0.39
2002 sem 1 test 1    170                  51.88           61.56            0.36
2002 sem 1 test 2    170                  71.29           74.58            0.29
2002 sem 2 test 1    83                   54.35           54.62            0.58
2002 sem 2 test 2    83                   46.51           50.82            0.35
Averages                                  52.21           58.77            0.27
Engelbrecht, J. & Harding, A. (2001). WWW Mathematics at the University of Pretoria: The trial run. South African Journal of Science, 97(9/10), 368-370.
Engelbrecht, J. & Harding, A. (2001). Internet Calculus: An option? Quaestiones Mathematicae, Supplement 1, 183-191.
Engelbrecht, J. & Harding, A. (2002). Cooperative learning as a tool for enhancing a web-based Calculus course, Proceedings of the ICTM2, Crete, July.
Engelbrecht, J. & Harding, A. (2003). Online assessment in mathematics: Multiple assessment formats, New Zealand Journal of Mathematics, 32 (Supplement), 57-66.
Finney, S. J., Smith, R. W. & Wise, S. L. (1999). The effects of judgment-based stratum classifications on the efficiency of stratum scored CATs, Paper presented at the annual meeting of the National Council on Measurement in Education, Quebec, (23 pages).
Frederikson, N. (1984). The real test bias, American Psychologist, 39, 193-202.
Freudenthal Institute, Research on Assessment Practices, Retrieved September 2, 2003, from http://www.freudenthal.nl/en/projects/.
Gretton, H. & Chalis, N. (1999). Assessment: Does the punishment fit the crime? Proceedings of the International Conference on Technology in Education, San Francisco.
Hibberd, S. (1996). The mathematical assessment of students entering university engineering courses, Studies in Educational Evaluation, 22(4), 375-384.
Indiana University, Center for Innovation in Assessment, Retrieved September 2, 2003 from http://www.indiana.edu/~cia/.
Litzinger, M. L. & Osif, B. (1993). Accommodating diverse learning styles: Designing instruction for electronic information sources. In What is Good Instruction Now? Library Instruction for the 90s. Linda Shirato (ed.), Ann Arbor, MI: Pierian Press.
Lorge, I. & Kruglov, L. (1952). A suggested technique for the improvement of difficulty prediction of test items, Educational and Psychological Measurement, 12, 554-561.
NCTM (National Council of Teachers of Mathematics) (1995). Assessment standards in Principles and Standards for School Mathematics, Retrieved June 10, 2003 from http://standards.nctm.org/Previous/AssStds/PurpMon.htm
Resnick, L. B. & Resnick, D. P. (1992). Assessing the thinking curriculum: New Tools for Educational Reform, In B. R. Gifford & M. C. O'Connor (Eds.), Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction, 33-75, Kluwer.
Smith, G. & Wood, L. (2000). Assessment of learning in university mathematics, International Journal for Mathematics Education in Science and Technology, 31(1), 125-132.
Stewart, J. (1999). Calculus, Early Transcendentals, Brooks/Cole.
The MathSkills Discipline Network, Department of Education and Employment, U.K., Retrieved September 2, 2003 from http://www.hull.ac.uk/mathskills/themes/theme3/mathskill.html.
United Inventors Association, Innovation Assessment Program, Retrieved September 2, 2003 from http://www.uiausa.com/UIAIAP.htm.
University of Pretoria