
Operational Definitions for Higher-Order Thinking Objectives at the Post-secondary Level.


As a first step toward studying ways to develop higher-order thinking in undergraduate students, we used a modified version of Bloom's taxonomy to assess study questions in two computer-mediated psychology courses. Three assessors developed operational definitions of the thinking levels required to answer study questions (or components of the questions). For each course, there was a high level of independent agreement between these assessors and a second group of assessors who used the operational definitions constructed by the first group to assess the level of each question. This indicates that the operational definitions developed are reliable. Future studies will focus on determining at what level students answer the questions in a given course and generating higher-order thinking by raising the average level at which students answer questions.


Post-secondary institutions are places at which the highest levels of thinking are fostered and developed. Thus, a challenge for university educators is developing students' critical, or higher-order, thinking about course material. Research might provide a solution to this problem. Before embarking on such research, however, a fundamental question must be addressed: how do we define critical or higher-order thinking? Williams (1999) has argued cogently that operational definitions of cognitive constructs used in education are necessary in order to reliably assess and promote the processes they encompass. A number of definitions of higher-order thinking have been proposed. Some emphasize reasoned argumentation as an essential feature (Newmann, 1991a, b; Nelson, 1997), while others include other elements (Bloom, 1956; Carnine, 1991; Hohn, Gallagher, & Byrne, 1990; Paul & Heaslip, 1995).

The most detailed and comprehensive set of definitions that appears to encompass all aspects of higher-order thinking is Bloom's Taxonomy of Educational Objectives in the Cognitive Domain (1956), which identifies six categories of exam or test questions: (1) Knowledge, (2) Comprehension, (3) Application, (4) Analysis, (5) Synthesis, and (6) Evaluation. The categories are often assumed to be hierarchical, and thus are often referred to as "levels." (It should be noted that Bloom includes subcategories of the six broad categories mentioned above; these subcategories are not part of our focus at this stage.)

As a first step toward studying ways to develop higher-order thinking in undergraduate students, we sought to apply Bloom's taxonomy to study questions in several computer-mediated psychology courses. These courses were taught using computer-aided personalized system of instruction (CAPSI), which is based on the Keller (1968) approach, or "personalized system of instruction" (PSI) (Kinsner & Pear, 1990; Pear & Crone-Todd, 1999; Pear & Kinsner, 1998; Pear & Novak, 1996). PSI-taught courses are typified by (a) short study units that require mastery (e.g., a score of 80% or better) before the next unit may be attempted, (b) restudy and re-testing on a unit when a student fails to demonstrate mastery of the unit, and (c) the use of students who have already demonstrated mastery of the material to provide feedback to students learning the material. Studies show that courses using PSI produce higher examination scores than courses taught using traditional methods (Kulik, Kulik, & Bangert-Drowns, 1990). Moreover, Reboy and Semb (1991) provided evidence that PSI can be effective in generating higher-order thinking. Extending their work seems to require the development of more rigorous definitions of higher-order thinking.

Although Bloom's taxonomy has been widely used in curriculum development at primary, secondary, and post-secondary levels (e.g., Coletta, 1975; Frazier & Caldwell, 1977; Freese, 1998; Lipscomb, 1985; Onosko, 1991; Paul, 1985; Willson, 1973), some researchers have reported problems in reliably or consistently applying it (e.g., Calder, 1983; Gierl, 1997; Kottke & Schuster, 1990; Roberts, 1976; Seddon, 1978; Seddon, Chokotho, & Merritt, 1981). Specifically, the levels of agreement between assessors of questions are often low, indicating that the categories may not be well delineated. In our initial attempt to use the taxonomy, we also obtained low inter-observer reliability. This led us to modify Bloom's definitions of the six levels to obtain more precise operational definitions of them (Williams, 1999), at least with regard to the study questions used in our CAPSI-taught courses.

In this report we outline the modified version of the taxonomy that we developed and present reliability data obtained when we applied this version to the study questions in two second-year undergraduate psychology courses -- Behavior Modification Principles and Orientations to Psychological Systems -- taught at the University of Manitoba.

Modification of Bloom's Taxonomy


In each of these courses there were nine content-based units that followed a unit on the teaching method itself. Only the content-based units are considered in this report. Each unit contained 14 to 28 study questions. These questions were broken down into sub-components. For example, a question such as "What is the most common treatment for obsessive-compulsive disorders? What may account for the effectiveness of this treatment?" would be broken into two sub-components: (1) What is the most common treatment for obsessive-compulsive disorders, and (2) What may account for the effectiveness of this treatment? In this report "question" refers to any sub-component of a question.


Two assessors, the instructor (professor) and the teaching assistant (graduate student) for the courses, first independently assessed the questions from the first unit on the basis of the six categories in Bloom's taxonomy. Each question was assessed at the lowest level required to answer it. Inter-observer reliability (IOR) was calculated by dividing the number of agreements by the number of agreements plus the number of disagreements and multiplying by 100%. (Thus, IOR measured point-to-point agreement.) Initial IOR was low (50%). The assessors discussed and resolved their points of disagreement, and refined the definitions of the categories accordingly. The assessment was repeated with a third assessor, who had a BA honours degree in psychology and philosophy. As subsequent units were assessed we further refined the definitions of the levels. The final operational definitions that resulted from this refinement process are summarized in Table 1. In addition, to increase our IOR scores, we found it useful to construct a flowchart to use as an aid in categorizing each question. The flowchart was also refined at each meeting. The final flowchart (with some modifications as described below) is shown in Figure 1.
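The point-to-point agreement calculation described above is simple enough to express directly. The following sketch (the level ratings in the usage example are hypothetical, not the study's data) computes the IOR percentage between two assessors:

```python
def point_to_point_ior(ratings_a, ratings_b):
    """Point-to-point inter-observer reliability: the percentage of
    questions to which both assessors assign the same category level."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both assessors must rate the same questions.")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    disagreements = len(ratings_a) - agreements
    # Agreements divided by agreements plus disagreements, times 100%.
    return 100.0 * agreements / (agreements + disagreements)

# Hypothetical level ratings (1-6) for four study questions:
print(point_to_point_ior([1, 2, 3, 4], [1, 2, 3, 2]))  # 75.0
```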


Table 1. Operational definitions of the modified Bloom categories

Categories I and II

The answers to these types of questions are always found in the
assigned material (e.g., textbook or lecture) and require no
extrapolation.

I. Knowledge      Answers may be memorized or closely paraphrased
                  from the assigned material.

II. Comprehension Answers must be in the student's own words, while
                  still using terminology appropriate to the course
                  material.

Categories III, IV, V, and VI

These questions go beyond the assigned material in that the answers
must be inferred or extrapolated from the information in the text.
That is, the questions require some "processing" of the information
that is not already performed in the assigned material.

III. Application  Requires the recognition, identification, or
                  application of a concept or principle learned at
                  Category II to a new situation, or the solution of
                  a new problem. Questions in this category present
                  or require examples not found in the assigned
                  material.

IV. Analysis      Requires breaking down concepts into their
                  constituent parts, or the identification or
                  explanation of the essential components of
                  concepts, principles, or processes. Questions in
                  this category may also require the student to
                  compare and contrast, or to explain how an example
                  illustrates a given concept or principle.

V. Synthesis      Requires putting parts together to form a whole
                  (i.e., the opposite of Category IV). Questions may
                  require the generation of definitions not given in
                  the assigned material (i.e., going from the
                  specific to the general), or an explanation of how
                  principles or concepts combine to produce
                  something new.

VI. Evaluation    Requires the presentation and evaluation of
                  reasons for and against a particular position,
                  and (ideally) a conclusion regarding the validity
                  of that position. The most important part of the
                  answer is the justification or rationale for the
                  conclusion, rather than the conclusion per se. A
                  good discussion in this category involves the use
                  of all preceding categories.
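The actual flowchart appears in Figure 1, but its general shape, a series of yes/no questions applied in a fixed order, can be illustrated with a sketch. The boolean cues below are hypothetical simplifications for illustration, not the flowchart's actual decision points:

```python
def classify_question(answer_in_assigned_material, verbatim_acceptable,
                      weighs_pro_and_con, combines_parts_into_whole,
                      breaks_concept_into_parts):
    """Illustrative decision procedure loosely following the Table 1
    definitions. Each argument is a hypothetical yes/no cue about the
    lowest level required to answer the question."""
    if answer_in_assigned_material:
        # Categories I and II: no extrapolation is required.
        return "Knowledge" if verbatim_acceptable else "Comprehension"
    # Categories III-VI: the answer must be inferred or extrapolated.
    if weighs_pro_and_con:
        return "Evaluation"
    if combines_parts_into_whole:
        return "Synthesis"
    if breaks_concept_into_parts:
        return "Analysis"
    # Remaining extrapolation questions ask for new examples or new
    # applications of a concept learned at Category II.
    return "Application"

# A question answerable by close paraphrase of the assigned material:
print(classify_question(True, False, False, False, False))  # Comprehension
```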

The average IORs between the three assessors across all units were 78.00% in Behavior Modification and 85.00% in Systems. The assessors reached complete agreement by the end of each meeting through discussion, and, where appropriate, the definitions were revised. Although the average IORs derived from independent assessments were high, this might have been due to idiosyncrasies of the assessors. It was therefore necessary to determine whether new assessors using the same operational definitions (see Table 1 and Figure 1) would agree closely with the original assessors.

A second group of assessors (Group B) was asked to use the modified definitions and flowchart to assess the same set of questions that the initial assessors (Group A) had assessed, with Group A's ratings serving as the standard against which Group B's assessments were compared. There were three new assessors for the Behavior Modification course and two for the Systems course; one new assessor was assigned to both courses and coordinated meetings and data delivery with the first author. Two of the new assessors were graduate students, and the other two were second- and third-year undergraduate students. Group B followed the same general method used by Group A: the members categorized the questions independently, then met to discuss their differences and to reach agreement. The categories Group B arrived at using the definitions in Table 1 and the flowchart in Figure 1 were compared with those arrived at by Group A.


The overall IORs across all question levels by unit between the new assessors and the initial assessors were 89% in Behavior Modification and 87% in Systems. As a result of meetings between the original and new assessors, minor changes were made to the flowchart and instructions to make the differentiation between Levels III and IV more precise (these changes have been incorporated in Figure 1).

Table 2 shows the levels obtained for the questions in each course, the between-groups agreement for occurrence and non-occurrence of each level, and the kappa statistic for each course. Kappa corrects for chance agreement by taking into account the number of occurrences and non-occurrences of each level for each assessor. The table also includes the interpretation of the kappa values (Landis & Koch, 1977), which range from slight to almost perfect agreement. As the table shows, the agreement calculated using kappa was higher than the point-to-point agreement reported above, because kappa considers the total number of occurrences and non-occurrences for each level, whereas point-to-point agreement does not. Note that agreement decreased as the question level increased. There are two reasons for the lower agreement at the higher levels. First, the initial instructions for distinguishing Levels 3 and 4 lacked precision, and Group B often rated at Level 4 questions that asked for descriptions of original examples (such questions are at Level 3 according to our operational definitions). Second, Group B had little practice in identifying Levels 5 and 6 because of the low number of questions at those levels (see Table 2).

Table 2. Question levels, between-group agreement, and kappa by course

Behavior Modification (number of questions = 417)

Question  Group A      Group B      Between-Group  Between-Group  Kappa
Level     No. (%)      No. (%)      Agreements     Agreement (%)  (Interpretation)

1         170 (40.77)  168 (40.28)  354            85.89          .69 (Substantial)
2         130 (31.18)  142 (34.05)  392            94.00          .88 (Almost Perfect)
3          71 (17.03)   51 (12.23)  393            94.24          .77 (Substantial)
4          36 (8.63)    53 (12.71)  376            90.17          .47 (Moderate)
5           5 (1.20)     2 (0.48)   412            98.80          .17 (Slight)
6           5 (1.20)     1 (0.24)   412            98.80          .01 (Slight)

Systems (number of questions = 340)

Question  Group A      Group B      Between-Group  Between-Group  Kappa
Level     No. (%)      No. (%)      Agreements     Agreement (%)  (Interpretation)

1         208 (61.18)  184 (54.12)  269            79.12          .57 (Moderate)
2         147 (43.24)  126 (37.06)  297            87.35          .76 (Substantial)
3           3 (0.88)     2 (0.59)   339            99.71          .80 (Substantial)
4          14 (4.12)    24 (7.06)   320            94.12          .45 (Moderate)
5           1 (0.29)     4 (1.18)   335            98.53          .01 (Slight)
6           2 (0.59)     0 (0.00)   338            99.41          .00 (Poor)
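For a single level, kappa can be computed from the two groups' occurrence/non-occurrence judgments. The sketch below assumes each rater's judgments are a list of binary labels (1 = question assigned to the level, 0 = not); the data in the usage example are hypothetical, not the study's:

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    p_e = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n)
              for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical occurrence (1) / non-occurrence (0) judgments for one level:
print(round(cohens_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0]), 2))  # 0.67
```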


The major problem encountered in conducting the ratings was assessing questions at the lowest possible level necessary to adequately answer them. We often heard statements such as, "Oh yes, but to really answer this question well ...", or "Well, sure, one could answer it by saying what's in the text, but I think to really answer this one should also ...," and so forth. We observed that at times the initial assessment was perhaps based upon an assessor's history as an above-average student. We found that the meetings resulted in greater understanding of the categories, and helped us make the operational definitions tighter.

The high IOR and kappa scores obtained for levels 1 through 4 suggest that the modified taxonomy is useful and reliable for assessing thinking levels in courses at the post-secondary level. The lower scores we obtained for levels 5 and 6, however, indicate that further refinements are needed.

Instructors may use the taxonomy to establish whether their course requirements are at the appropriate level. An instructor may think that his or her course is at an advanced undergraduate level. However, upon assessing the questions in the course, the instructor may discover that the course is actually pitched at a lower level. Conversely, an instructor may find that a course intended for lower level students asks questions that are more appropriate for advanced students.

It is a frequent recommendation that courses should contain an adequate number of higher-level questions. Bruner (1966; 1973), for example, has suggested that questions requiring reflection combined with some minimal amount of knowledge may be imperative to successful learning. Similarly, Coletta (1975) pointed out that too much reliance on lower-level questions hinders higher-level thinking. It is also likely, however, that too much emphasis on higher-level questions could make a course unnecessarily difficult, which could lead to students performing poorly or dropping out (e.g., Solman & Rosen, 1986). What proportion of questions at each level would be optimal in a given course is an empirical question. This may be particularly true in computer-mediated and other self-paced courses, where the onus for learning is on the student. In such courses, knowledge and comprehension questions can serve as a base for the higher levels. In this connection, the use of guided study has been shown to develop higher-level thinking that transfers to other courses (Reboy & Semb, 1991).

The taxonomy may also be used to help clarify to the student what a question requires. The fact that a question is assessed at a given level does not guarantee that each student will know to answer it at that level (Gierl, 1997). For example, if the text provides a number of examples of a concept but never clearly defines it, asking for a definition of the concept may imply to the student that a knowledge-level answer is required. However, the minimally required answer is actually one of synthesis, since students must go from the specific to the general to generate the definition. A change in the wording of a question, a "hint," or some form of training in identifying what a question is asking may be required for students' answer levels to match or exceed the level required by the question.

Once an instructor has decided on the questions and the levels for a given course, the next step is to determine at what level students are answering the questions. The stage is then set to provide feedback that may serve to encourage, or reinforce, higher-order thinking. There are several ways in which instructors can encourage higher-order thinking. One is to provide praise to students for going above what is expected in an answer. Another is to provide additional credit toward the course grade for answering at a level higher than the question requires. Research is needed to determine the relative effectiveness of these and other procedures for developing higher-order thinking. We view the operational definitions developed in this study as a step toward that goal.


Bloom, B.S. (1956). Taxonomy of educational objectives: Cognitive and affective domains. New York: David McKay.

Bruner, J.S. (1966). Toward a theory of instruction. Cambridge, MA: Belknap.

Bruner, J.S. (1973). Some elements of discovery. In J.S. Bruner, The relevance of education. New York: Norton.

Calder, J.R. (1983). In the cells of the "Bloom Taxonomy." Journal of Curriculum Studies, 15, 291-302.

Carnine, D. (1991). Curricular interventions for teaching higher order thinking to all students: Introduction to the special series. Journal of Learning Disabilities, 24(5), 261-269.

Coletta, A.J. (1975). Reflective didactic styles for teachers of the young, gifted and poor children. Gifted Children Quarterly, 19, 230-240.

Dorow, L.G., & Boyle, M. E. (1998). Instructor feedback for college writing assignments in introductory classes. Journal of Behavioral Education, 8, 115-129.

Frazier, L., & Caldwell, E. (1977). Testing higher cognitive skills in young children. Reading Teacher, 30, 475-478.

Freese, J.R. (1998). An old friend of the social studies teacher. Canadian Social Studies, 32, 124-125,129.

Gierl, M.J. (1997). Comparing cognitive representations of test developers and students on a mathematics test with Bloom's taxonomy. The Journal of Educational Research, 91, 26-32.

Hohn, R.L., Gallagher, T., & Byrne, M. (1990). Instructor-supplied notes and higher-order thinking. Journal of Instructional Psychology, 17(2), 71-74.

Kazdin, A.E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.

Keller, F.S. (1968). "Good-bye, teacher...". Journal of Applied Behavior Analysis, 1, 79-89.

Kinsner, W., & Pear, J. J. (1990). A dynamic educational system for the virtual campus. In U.E. Gattiker, L. Larwood, & R.S. Stollenmaier (Eds.), End-User Training (pp. 201-238).

Kottke, J.L., & Schuster, D.H. (1990). Developing tests for measuring Bloom's learning outcomes. Psychological Reports, 66, 27-32.

Kulik, C.-L., Kulik, J. A., & Bangert-Drowns, R. L. (1990). Effectiveness of mastery learning programs: A meta-analysis. Review of Educational Research, 60, 265-299.

Landis, J., & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Lipscomb, J.W. (1985). Is Bloom's taxonomy better than intuitive judgments for classifying test questions? Education, 106, 102-107.

Lunenburg, F.C. (1998). Constructivism and technology: Instructional designs for successful education reform. Journal of Instructional Psychology, 25, 75-81.

Martin, G., & Pear, J. J. (1999). Behavior modification: What it is and how to do it. (6th ed.). Upper Saddle River, NJ: Prentice Hall.

McDaniel, T. R. (1979). Designing essay questions for different levels of learning. Improving College and University Teaching, 27, 120-123.

Newmann, F.M. (1991a). Promoting higher order thinking in social studies: Overview of a study of 16 high school departments. Theory and Research in Social Education, 19, 324-340.

Newmann, F.M. (1991b). Classroom thoughtfulness and students' higher order thinking: Common indicators and diverse social studies courses. Theory and Research in Social Education, 19, 410-433.

Onosko, J.J. (1991). Barriers to the promotion of higher-order thinking in social studies. Theory and Research in Social Education, 19, 341-366.

Paul, R.W. (1985). Bloom's taxonomy and critical thinking instruction. Educational Leadership, 42(8), 36-39.

Paul, R.W., & Heaslip, P. (1995). Critical thinking and intuitive nursing practice. Journal of Advanced Nursing, 22, 40-47.

Pear, J.J., & Crone-Todd, D.E. (1999). Personalized system of instruction in cyberspace. Journal of Applied Behavior Analysis, 32, 205-209.

Pear, J.J., & Kinsner, W. (1998). Computer-aided personalized system of instruction: An effective and economical method for short- and long-distance education. Machine-Mediated Learning, 2, 213-237.

Pear, J.J., & Novak, M. (1996). Computer-aided personalized system of instruction: A program evaluation. Teaching of Psychology, 23, 119-123.

Poole, R.L. (1971). Characteristics of the taxonomy of educational objectives: Cognitive domain. Psychology in the Schools, 8, 379-385.

Reboy, L.M., & Semb, G.B. (1991). PSI and critical thinking: Compatibility or irreconcilable differences? Teaching of Psychology, 18, 212-215.

Roberts, N. (1976). Further verification of Bloom's taxonomy. Journal of Experimental Education, 45, 16-19.

Seddon, G. (1978). The properties of Bloom's taxonomy of educational objectives for the cognitive domain. Review of Educational Research, 48, 303-323.

Seddon, G.M., Chokotho, N.C., & Merritt, R. (1981). The identification of radex properties in objective test items. Journal of Educational Measurement, 18, 155-170.

Shoemaker, D.A. (1981). Beyond the Bloom taxonomy. E + M Newsletter, 36, 1-4.

Solman, R., & Rosen, G. (1986). Bloom's six cognitive levels represent two levels of performance. Educational Psychology, 6, 243-263.

Willson, I.A. (1973). Changes in mean levels of thinking in grades 1-8 through use of an interaction analysis system based on Bloom's taxonomy. Journal of Educational Research, 66, 423-429.

D. E. Crone-Todd is a doctoral student in psychology, carrying out her dissertation research on higher-order thinking. In addition, she has published on the use of computer-aided personalized system of instruction <umtodd06@cc.UManitoba.CA>. J. J. Pear is a Professor of Psychology, conducting research in basic and applied behavior analysis. He has co-authored (with Dr. Garry Martin) a popular textbook on behavior modification and has just completed a book on learning to be published by Psychology Press. C.N. Read is a master's student in philosophy, working on a thesis in philosophy of psychology.

Authors' Notes This research was supported in part by a grant to J. J. Pear from the Social Sciences and Humanities Research Council of Canada. D. E. Crone-Todd was supported by a fellowship from the Social Sciences and Humanities Research Council of Canada.
COPYRIGHT 2000 Rapid Intellect Group, Inc.

Article Details
Author: Read, Cynthia N.
Publication: Academic Exchange Quarterly
Article Type: Statistical Data Included
Date: Sep 22, 2000
