Direct observation: factors affecting the accuracy of observers.
Direct observation is a method of data collection in which the target behavior is observed and recorded as it occurs. Although thousands of studies have used direct observation, research on the accuracy of the observers who provide data is not commensurate with the widespread use of observation methodologies. There exists a general presumption that observers are collecting accurate data, as well as a belief that adequate reliability scores are synonymous with adequate levels of accuracy. Some research, however, has suggested that these views are not necessarily correct (e.g., DeMaster, Reid, & Twentyman, 1977).
The first purpose of this article is to review research on the accuracy of observers. The review is organized around seven major factors that may potentially affect accuracy: (a) reactivity; (b) observer drift; (c) the recording procedure; (d) location of the observation; (e) reliability; (f) expectancy and feedback; and (g) characteristics of subjects, observers, and settings. The second purpose is to offer recommendations for increasing the accuracy of observers.
FACTORS AFFECTING OBSERVER ACCURACY
Reactivity

In direct observation, we concurrently observe and record the behaviors of interest (Repp, 1983), and in many of these situations, the observer's presence is known to the subject. A common presumption would seem to be that the subject's behavior is the same as it would be if the observer were not present. However, reactivity surely occurs in some of these cases, as subjects respond to the presence of observers by changing their behaviors. Haynes and Horn (1982) presented a comprehensive review of studies related to reactivity and suggested that behaviors may be increased, decreased, made more variable, or not affected at all. For example, some subjects may present themselves in their "best light," and socially desirable behaviors may increase in probability as a function of observer presence; other subjects may have the opposite reaction. When such reactivity occurs, the study's internal validity is threatened, because the effects of reactivity cannot be separated from any effects of the experimental variable. External validity, or the extent to which the findings of a study can be generalized, may also be affected, and two concerns arise (Kazdin, 1980). One is whether the findings of a study in which reactivity occurred apply to nonreactive situations. A second is pretest sensitization, wherein reactivity during baseline may sensitize subjects to the intervention and make them either more or less receptive to the experimental variable. Therefore, the results may not apply to individuals who are not similarly sensitized.
Whereas most research has focused on subject reactivity, some has focused on the effect that observation has on the behavior of the observers. This effect may be termed observer reactivity, and it has been demonstrated in two studies. In one (Hay, Nelson, & Hay, 1977), teachers who were instructed to record the behavior of students in their classrooms began giving more prompts to the observed students. In another (Hay, Nelson, & Hay, 1980), one of four teachers acting as observers increased her rate of instructing and of giving positive feedback.
Observer Drift

Drift is a cognitive phenomenon that involves a gradual shift by the observer from the original response definition, and it results in behavior being inconsistently recorded (Hersen & Barlow, 1976). As Lipinski and Nelson (1974) noted, it is relevant to both between-group and within-group designs and thus may warrant considerable concern. When drift occurs, the data collected are no longer directly comparable across conditions, because they no longer quantify the same precise response. For example, an intervention study cannot be properly evaluated if the target response has been defined differently in the baseline and treatment phases.
This phenomenon also points out the distinction between observer agreement and observer accuracy. The former is obtained by comparing the scores of observers, whereas the latter is obtained by comparing these scores with a previously established criterion (Mash & McElwee, 1974). These two measures may differ: high observer agreement, for example, may be gained at the expense of accuracy. Kent, O'Leary, Diament, and Dietz (1974) conducted a study of observer variables and found a consistent difference between observation pairs in their use of the behavioral code. Members of pairs developed agreement between themselves, but there was far less agreement between pairs. Thus each pair had high agreement scores, but at least one pair had to have inaccurate data. This condition, in which both members of an observation pair similarly change definitions, has been termed consensual observer drift (Johnson & Bolstad, 1973).
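The difference between agreement and accuracy can be shown with a small numerical sketch. The interval scores and the simple percent-agreement function below are hypothetical, invented for illustration rather than taken from the studies cited:

```python
def percent_agreement(a, b):
    """Interval-by-interval percent agreement between two score lists."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical interval scores (1 = behavior recorded, 0 = not recorded).
criterion = [1, 0, 1, 1, 0, 0, 1, 0]  # previously established criterion scoring
observer1 = [1, 1, 1, 1, 1, 0, 1, 0]  # a pair that drifted to a broader definition
observer2 = [1, 1, 1, 1, 1, 0, 1, 1]

print(percent_agreement(observer1, observer2))  # within-pair agreement: 87.5
print(percent_agreement(observer1, criterion))  # accuracy vs. criterion: 75.0
```

In this sketch the pair's within-pair agreement (87.5%) exceeds either member's accuracy against the criterion (75.0% and 62.5%), which is exactly the pattern of high agreement at the expense of accuracy described above.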
The Recording Procedure

The recording procedure itself can probably introduce more error into the data than would most conscientious observers, yet there are no experiments showing whether observer-contributed errors are greater under one procedure than under another. In behavioral research using direct observation, data are collected either through continuous recording or through one of three time-sampling procedures: whole interval recording, partial interval recording, and momentary time sampling. Time-sampling procedures are appealing when multiple behaviors are observed and the equipment necessary to record them continuously is unavailable.
The accuracy of the data produced by these procedures, however, has been seriously questioned. A small body of literature has examined how closely time-sampling methods can approximate continuous measurement (e.g., Brulle & Repp, 1984; Repp, Roberts, Slack, Repp, & Berkler, 1976). In general, these studies have found that (a) partial interval recording overestimates the continuous measure; (b) whole interval recording underestimates it; (c) momentary time sampling is preferred, because it randomly overestimates and underestimates the continuous measure and thus produces a fairly accurate average; and (d) smaller observation intervals produce far more accurate data than do larger ones.
Much of the observational research employs partial interval recording at 10-second intervals, yet this interval size has been shown to produce unrepresentative data. Researchers who use time sampling and are interested in accurate data should use extremely small intervals (cf. Sanson-Fisher, Poole, & Dunn, 1980). Ideally, with this procedure, one should first sample responding, select an interval length such that only one response can occur per interval, and then begin formal data collection (Repp, 1983; Repp et al., 1976).
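These over- and underestimation relationships, and the effect of interval size, can be checked with a small simulation. This is a sketch under assumed conditions: the two-state behavior stream, the session length, the switching probability, and all function names are invented for illustration, not drawn from the studies cited.

```python
import random

def simulate_session(session_s=600, p_switch=0.02, seed=1):
    """Second-by-second record of whether the target behavior is occurring,
    generated by a simple two-state (on/off) random process."""
    rng = random.Random(seed)
    state, record = False, []
    for _ in range(session_s):
        if rng.random() < p_switch:
            state = not state
        record.append(state)
    return record

def continuous_percent(record):
    """The continuous measure: percentage of seconds with the behavior on."""
    return 100 * sum(record) / len(record)

def time_sample(record, interval_s, method):
    """Percentage of intervals scored under each time-sampling procedure."""
    scored = intervals = 0
    for start in range(0, len(record), interval_s):
        chunk = record[start:start + interval_s]
        intervals += 1
        if method == "whole":        # behavior must fill the whole interval
            scored += all(chunk)
        elif method == "partial":    # any occurrence scores the interval
            scored += any(chunk)
        elif method == "momentary":  # observe only at the interval's end
            scored += chunk[-1]
    return 100 * scored / intervals

record = simulate_session()
print(f"continuous: {continuous_percent(record):.1f}%")
for method in ("partial", "whole", "momentary"):
    print(f"{method:9s}: {time_sample(record, 10, method):5.1f}%")
```

By construction, the partial interval estimate can never fall below the continuous measure and the whole interval estimate can never exceed it; shrinking `interval_s` pulls all three estimates toward the continuous value, consistent with finding (d) above.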
Location of the Observation
Although most of the data from direct observation are collected in situ, some are collected from audio or videotapes in an effort to reduce the obtrusiveness of observations (e.g., Schoggen, 1964). Though the devices used may cause some reactivity initially, studies have suggested that this effect is at most ephemeral.
For example, in the first part of a two-part study, Christensen and Hazzard (1983) questioned whether families who were being audiotaped would change their rate of positive and negative interactions over time, and they found no systematic changes over 16 sessions. In the second part of their study, they compared conditions in which families were either aware or unaware that their conversations were being taped. Results for two of the three families studied showed no effects; results for the third showed an initial but transient change.
Additional studies (e.g., Fulton & Rupiper, 1962; Kent, O'Leary, Dietz, & Diament, 1979) suggested that there is little difference for most behaviors between data collected in the natural setting and those collected from videotape, although some behaviors (e.g., vocalization) may show a difference.
Reliability

In this article, the term reliability is used specifically to refer to interobserver agreement, or the degree to which two observers agree that responding has occurred. Although observers can be trained to evaluate themselves, Boykin and Nelson (1981) cautioned experimenters to perform the calculations themselves, lest observers reach high agreement scores at the expense of accuracy.
In addition, observer awareness during reliability checks has been shown to affect both observer accuracy and reliability. Reid (1970), for example, found that observers were more accurate when they believed they were being monitored, and Romanczyk, Kent, Diament, and O'Leary (1973) found that interobserver agreement was higher when observers believed reliability was being assessed. The concept of reactivity discussed earlier is thus also applicable here.
The question of how to calculate interobserver agreement has generated considerable discussion and different formulas (Hartmann, 1977; Hawkins, 1979; Hopkins, 1979; Kratochwill, 1979; Repp, 1983; Repp, Deitz, Boles, Deitz, & Repp, 1976; Rojahn & Schroeder, 1983). Much of the discussion has revolved around the contribution of chance to interobserver agreement scores. For example, high-frequency behaviors inflate percentage agreement on occurrence, whereas low-frequency behaviors inflate percentage agreement on nonoccurrence (Rojahn & Schroeder, 1983). Because there is a relationship between the rate of behavior and the formula used to calculate reliability, no single standard of acceptable agreement levels has been adopted.
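The chance-inflation problem can be demonstrated with the common occurrence/nonoccurrence agreement formulas. This is a sketch: the function names and the random-scoring setup are illustrative assumptions, not the specific formulas or data of the works cited.

```python
import random

def occurrence_agreement(a, b):
    """Agreements on occurrence, out of intervals in which either
    observer scored an occurrence."""
    either = [(x, y) for x, y in zip(a, b) if x or y]
    return 100 * sum(1 for x, y in either if x and y) / len(either)

def nonoccurrence_agreement(a, b):
    """Agreements on nonoccurrence, out of intervals in which either
    observer scored a nonoccurrence."""
    either = [(x, y) for x, y in zip(a, b) if not x or not y]
    return 100 * sum(1 for x, y in either if not x and not y) / len(either)

rng = random.Random(0)
n = 1000
# Two observers scoring a high-frequency behavior independently at random:
# each marks 90% of intervals with no reference to the other's scoring.
a = [rng.random() < 0.9 for _ in range(n)]
b = [rng.random() < 0.9 for _ in range(n)]

print(f"occurrence agreement:    {occurrence_agreement(a, b):.1f}%")    # high purely by chance
print(f"nonoccurrence agreement: {nonoccurrence_agreement(a, b):.1f}%")  # low
```

With a low-frequency behavior (e.g., 10% of intervals) the inflation reverses, occurrence agreement collapsing and nonoccurrence agreement soaring, which is why no single agreement statistic suffices across behavior rates.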
The purpose of observation is also a factor. Data that are used to diagnose and place children must be highly reliable and accurate, but a teacher who is collecting data to determine whether recess is a reinforcer for a child may tolerate more error. In sum, reliability scores and the methods used to calculate them should continue to be provided to consumers of research so they can make their own evaluations of the data.
Observer Expectancy and Feedback
Observers may be biased through expectations of subject performance that are based on factors such as sex, the behavior of peers, or the purpose of the intervention. Since bias in a study weakens any possible conclusions concerning independent variables, its presence is serious. O'Leary and Kent (1977), in a review of their own studies in observation, reported that global evaluations can be influenced by expectations alone. For example, if observers are informed that an intervention to reduce stereotypic behavior is occurring, they are likely to report at the end of the intervention that the target behavior decreased. However, research shows that error-producing bias can be substantially reduced through the use of observers who are trained in systematic direct observation methods (Kent et al., 1974; Redfield & Paul, 1976).
Experimenter feedback may also affect the behavior of observers. For example, O'Leary, Kent, and Kanowitz (1975) showed that observer expectations alone were insufficient to affect the behaviors of observers using systematic procedures such as time sampling. They found other factors at work: expectation of behavior change and contingent experimenter feedback on whether the data supported the researcher's hypotheses. In this study feedback was provided by the experimenter; in other instances, however, feedback could come from events in the setting. For example, if a teacher has just praised a child for being on task, an observer might presume (cf. Strain, Lambert, Kerr, Stagg, & Lenkner, 1983) that the child should have just been scored as being on task. This issue was addressed by Harris and Ciminero (1978), who found that when neither of two behaviors had actually occurred, some observers nevertheless increased their scoring of the behavior for which they witnessed a consequence, but not of the behavior for which no consequence was witnessed.
Subject and Setting Variables
Sex of Subjects. The majority of the research on demographic characteristics of observed subjects has focused on the variable of sex. Yarrow and Waxler (1979) noted that separate analyses are provided for males and females in most child development studies and that a significant finding often appears for only one sex. They suggested that these differences may be either genuine or due to differences in the way observers score behaviors for the two sexes. Their data indicated that for many behaviors, equal reliability was attained for both sexes; for other behaviors, however, significant differences were found.
In two studies, for instance, Yarrow and Waxler found that observers scored aggressive behavior more reliably for males than for females. Other studies have reported an interaction between the sex of the subject and the sex of the observer. One study (Gurwitz & Dodge, 1975) found that adult observers tended to rate opposite-sex children more positively. However, another study (Horn & Haynes, 1981) suggested that training observers in an objective coding method that focuses their attention on overt, operationally defined responses may reduce sex bias. In this study, male and female observers were trained to code disruptive behaviors in children and then were asked to rate the subjects along 12 subjective dimensions. Results showed no sex differences among the behavioral ratings and a difference in the subjective ratings on only one dimension.
Socioeconomic Status. Another study (Moss & Jones, 1977) examined the demographic variable of socioeconomic status. It found that reliability was significantly higher when observers were scoring the behavior of middle-class mothers than when scoring that of lower-class mothers. Because all observers were middle class, the authors suggested that this finding may reflect an interaction between the social class of observer and subject.
Subject Behavior Patterns. Subject characteristics also include the nature of behavior patterns. The effect of behavioral complexity on observer accuracy was studied by Jones, Reid, and their colleagues (Jones, Reid, & Patterson, 1975; Taplin & Reid, 1973). These authors defined behavioral complexity as the number of discriminations required during an observation session, as measured by the number of different categories rated. In a series of studies, they consistently found negative correlations between the complexity of observed behavior and reliability coefficients. They thus concluded that observation is more difficult when there is a broad range of responses to code. Along these lines, reliability coefficients will be misleading if behavioral complexity differs systematically between reliability and nonreliability sessions (Jones et al., 1975). Interestingly, complexity levels have been found to be lower during reliability sessions (Jones et al., 1975).
Predictability of Subject Responses. Behavior may be predictable because it occurs in a sequence with one response always following another, or because it occurs either frequently or infrequently each session. Mash and McElwee (1974) hypothesized that behaviors which often occurred in predictable sequences would be more easily scored than those in unpredictable sequences. Other researchers have suggested that rate may be a factor in observer accuracy (Johnson & Bolstad, 1973; Thomas, Loomis, & Arrington, 1983). In our own training of large numbers of undergraduates, we have found that certain categories of behavior, when occurring more than 80% of the time, tend to be scored in every recording interval whether they are occurring or not. This seems to be especially true when a behavior occurs frequently in the earlier part of the session and seldom in the later part. We have also found that additional training is needed to reach acceptable levels of accuracy on low-rate behaviors, which are more often overlooked.
Familiarity with Setting or Subjects. Finally, several authors have suggested that setting characteristics may affect the accuracy of observers. For example, familiarity with the setting may make observation easier and thereby increase observer accuracy. Kent and Foster (1977) noted that reliability seems to be lower when observers first enter a new setting but increases with practice in the setting. Similarly, familiarity with the subject population may make observation easier. Other setting characteristics, such as high activity and noise levels, may make observation more difficult (Wasik & Loven, 1980).
INCREASING THE ACCURACY OF OBSERVERS
The second purpose of this article is to offer recommendations for increasing the accuracy of observers. Of course, observer accuracy is a moot point if the selected target behaviors do not reflect the questions and concerns of clients, staff, parents, and so forth. The first step is thus to select the behaviors to be measured. Kazdin (1980) suggested that several dependent measures be used, because many of the behaviors assessed are multifaceted and complex. For example, if a child is referred to special education because of academic deficits and noncompliance, many academic and social behaviors may be assessed. In addition, teacher behaviors such as instructional antecedents and the delivery of reinforcers may be observed. This is not to suggest, however, that all the measures are likely to converge on a single conclusion; because of the complexity of behavior and settings, such convergence would be the exception rather than the rule. Rather than being regarded as introducing ambiguity into the process, disagreements among multiple measures should be regarded as a means of elaboration (Kazdin, 1980).
After the behaviors to be measured are selected and defined, they must be observed accurately. Table 1 provides a summary of five recommendations for increasing observer accuracy, along with the threats to accuracy that each addresses. The recommendations are explained and illustrated as follows.
Formal Training of Observers

Hartmann and Wood (1982) provided a model for training observers that includes learning the observation manual, practice sessions, retraining and recalibration sessions, and postinvestigation debriefing. Observation is thus a skill to be taught systematically, and extensive practice is advised before beginning a study (Hersen & Barlow, 1976). Recalibration and retraining throughout a study serve to maintain this skill and guard against observer drift (O'Leary & Kent, 1977).
In addition, other precautions may be taken to increase the level of observer performance. First, the observation codes used should not be more complex than necessary, and observation schedules should be reasonable in order to reduce fatigue-related errors. Second, observers of both sexes should be used equally in the study, and observers should be balanced across sessions by experimental conditions. A possible exception arises when phase changes would signal the hypothesis; in this case, new observers should be brought in, for they cannot be biased by what has already occurred. Third, as alluded to, all observers should be blind to the experimental hypothesis, and they should be praised for accuracy rather than for obtaining the desired results. Fourth, consensual observer drift can be avoided by eliminating interaction between observers and by giving feedback on interobserver agreement only after the study is done. Experimenters should perform reliability calculations themselves (Boykin & Nelson, 1981). Finally, where possible, the data should be compared with a standard. For example, some sessions can be videotaped, coded by experienced observers, and used as a criterion for novice observers.
Adaptation Periods

A potential correction for reactivity is the use of adaptation periods, wherein both subjects and observers can become familiar with the observation process (Sulzer-Azaroff & Mayer, 1977). For example, Barkley (1981) discussed the use of observational methods in the diagnosis of hyperactivity and suggested that observations can be made in clinic playrooms equipped with a sound system and a one-way mirror. He further suggested that children be given at least an hour to adapt to the playroom. In some cases, the length of the adaptation period can be defined empirically; i.e., it ends when certain behaviors (e.g., looking at the observer) decrease, or when behavior becomes more stable.
Unobtrusive Observation

Observing unobtrusively means taking steps to ensure that subjects are relatively unaware of assessment and that observers are unaware of reliability evaluations. Of course, for ethical reasons both parties must consent to being observed, but they can agree to remain unaware of the exact observation schedules in order to reduce reactivity and increase reliability. For instance, an observer coding interactions on the playground can sit indoors by a window facing the play area, rather than on the playground itself.
Permanent Products of Behavior

Permanent products of behavior, such as written responses or completed projects, are invaluable; they can be coded after the behavior has taken place and as many times as necessary to achieve accuracy and reliability. Audiotapes and videotapes that capture more fleeting responses, such as talking, can likewise be scored repeatedly to avoid bias and error. For example, pages from a child's daily workbooks can be used as natural samples of academic task performance. A cassette recorder can be used to record interactions among adolescents in a group discussion in order to facilitate coding specific social skills, such as waiting for another to complete a statement before expressing one's own point of view.
Systematic and Frequent Observation
Objective rather than subjective methods of recording increase accuracy by providing, along with explicit response definitions, rules for scoring behavior. These are more likely to reduce biases contributed by characteristics of the subjects, observers, or setting. In cases where time sampling is preferred over continuous recording for practical reasons, extremely small and numerous intervals are best (Repp, 1983). Portable lap computers, programmed for the entry and storage of continuous data (e.g., Repp, Harmon, & Felce, 1984), may serve to increase the feasibility of continuous measures and may be used to gather data for use in developing and monitoring Individual Educational Plans (Olinger & Brusca, 1985).
The results of the research on the accuracy of observers suggest that more caution should be exercised in conducting observational studies than is generally evidenced. A number of factors that contribute to observer error emerge; fortunately, many are correctable. Formally training observers, using an adaptation period, observing unobtrusively, using permanent products of behavior, and observing frequently and systematically are all good practices to follow.
REFERENCES

Barkley, R. A. (1981). Hyperactive children. New York: Guilford Press.
Boyd, R. D., & deVault, M. V. (1966). The observation and recording of behavior. Review of Educational Research, 36, 529-551.
Boykin, R. A., & Nelson, R. O. (1981). The effects of instructions and calculation procedures on observers' accuracy, agreement, and calculation correctness. Journal of Applied Behavior Analysis, 14, 479-489.
Brooks, P. H., & Baumeister, A. A. (1977). A plea for consideration of ecological validity in the experimental psychology of mental retardation: A guest editorial. American Journal of Mental Deficiency, 81, 407-416.
Brulle, A. R., & Barton, L. E. (1980). The accuracy of momentary time sampling procedures when used in applied settings. Paper presented at the Annual Meeting of the Association for Behavior Analysis, Dearborn, Michigan.
Brulle, A. R., & Repp, A. C. (1984). An investigation of the accuracy of momentary time sampling procedures with time series data. British Journal of Psychology, 75, 481-485.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin Company.
Christensen, A., & Hazzard, A. (1983). Reactive effects during naturalistic observation of families. Behavioral Assessment, 5, 349-362.
Cunningham, T. R., & Tharp, R. G. (1981). The influence of settings on accuracy and reliability of behavioral observation. Behavioral Assessment, 3, 67-68.
DeMaster, B., & Reid, J. B. (1973). Effects of feedback procedures in maintaining observer reliability. Eugene, OR: Oregon Research Institute (cited in Johnson, S. M., & Bolstad, O. D., Methodological issues in naturalistic observation: Some problems and solutions for field research. In L. A. Hamerlynck, L. C. Hardy, & E. J. Mash (Eds.), Behavior change: Methodology, concepts, and practice. Champaign, IL: Research Press).
DeMaster, B., Reid, J., & Twentyman, C. (1977). The effects of different amounts of feedback on observer's reliability. Behavior Therapy, 8, 317-329.
Dunbar, R. I. M. (1976). Some aspects of research design and their implications in the observational study of behaviour. Behaviour, 58, 78-98.
Flanders, N. A. (1965). Teacher influences, pupil attitudes, and achievement. U.S. Department of Health, Education, and Welfare, Office of Education, Cooperative Research Monograph No. 12. Washington, DC: Government Printing Office.
Fulton, W. R., & Rupiper, O. J. (1962). Observation of teaching: Direct vs. vicarious experiences. Journal of Teacher Education, 13, 157-164.
Green, S. B., & Alverson, L. G. (1978). A comparison of indirect measures for long-duration behaviors. Journal of Applied Behavior Analysis, 11, 530.
Gurwitz, S. B., & Dodge, K. A. (1975). Adults' evaluation of a child as a function of sex of adult and sex of child. Journal of Personality and Social Psychology, 32, 822-828.
Harris, F. C., & Ciminero, A. R. (1978). The effect of witnessing consequences on the behavioral recordings of experimental observers. Journal of Applied Behavior Analysis, 11, 513-521.
Hartmann, D. P. (1977). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10, 103-116.
Hartmann, D. P., & Wood, D. D. (1982). Observational methods. In A. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International handbook of behavior modification and therapy. New York: Plenum.
Hawkins, R. P. (1979). The functions of assessment: Implications for selection and development of devices for assessing repertoires in clinical, educational, and other settings. Journal of Applied Behavior Analysis, 12, 501-516.
Hay, L. R., Nelson, R. O., & Hay, W. M. (1977). Some methodological problems in the use of teachers as observers. Journal of Applied Behavior Analysis, 10, 345-348.
Hay, L. R., Nelson, R. O., & Hay, W. M. (1980). Methodological problems in the use of participant observers. Journal of Applied Behavior Analysis, 13, 501-504.
Haynes, S. N., & Horn, W. F. (1982). Reactivity in behavioral observation: A review. Behavioral Assessment, 4, 369-385.
Hersen, M., & Barlow, D. H. (1976). Single-case experimental designs: Strategies for studying behavior change. New York: Pergamon Press.
Hopkins, B. L. (1979). Proposed conventions for evaluating observer reliability: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior Analysis, 12, 561-564.
Horn, W. F., & Haynes, S. N. (1981). An investigation of sex bias in behavioral observations and ratings. Behavioral Assessment, 3, 173-183.
Johnson, S. M., & Bolstad, O. D. (1973). Methodological issues in naturalistic observation: Some problems and solutions for field research. In L. A. Hamerlynck, L. C. Hardy, & E. J. Mash (Eds.), Behavior change: Methodology, concepts, and practice. Champaign, IL: Research Press.
Jones, R. R., Reid, J. B., & Patterson, G. R. (1975). Naturalistic observation in clinical assessment. In P. McReynolds (Ed.), Advances in psychological assessment (Vol. 3). San Francisco: Jossey-Bass.
Kauffman, J. M. (1981). Characteristics of children's behavior disorders, second edition. Columbus, OH: Charles E. Merrill Publishing.
Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 10, 141-150.
Kazdin, A. E. (1979). Unobtrusive measures in behavioral assessment. Journal of Applied Behavior Analysis, 12, 713-724.
Kazdin, A. E. (1980). Research design in clinical psychology. New York: Harper & Row.
Kent, R. N., & Foster, S. L. (1977). Direct observational procedures: Methodological issues in naturalistic settings. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment. New York: John Wiley & Sons.
Kent, R. N., O'Leary, K. D., Diament, C., & Dietz, A. (1974). Expectation biases in observational evaluation of therapeutic change. Journal of Consulting and Clinical Psychology, 42, 774-780.
Kent, R. N., O'Leary, K. D., Dietz, A., & Diament, C. (1979). Comparison of observational recordings in vivo, via mirror, and via television. Journal of Applied Behavior Analysis, 12, 517-522.
Kratochwill, T. R. (1979). Just because it's reliable doesn't mean it's believable: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior Analysis, 12, 553-558.
LaGrow, S. L., & Repp, A. C. (1984). Stereotypic responding: A review of intervention research. American Journal of Mental Deficiency, 88, 595-609.
Landesman-Dwyer, S. (1981). Living in the community. American Journal of Mental Deficiency, 86, 223-234.
Lipinski, D., & Nelson, R. (1974). Problems in the use of naturalistic observation as a means of behavioral assessment. Behavior Therapy, 5, 341-357.
Mash, E. J., & McElwee, J. D. (1974). Situational effects on observer accuracy: behavior predictability, prior experience, and complexity of coding categories. Child Development, 45, 367-377.
Moss, H. A., & Jones, S. J. (1977). Relations between maternal attitudes and maternal behavior as a function of social class. In P. H. Leiderman & S. R. Tulkin (Eds.), Cultural and social influences on behavior in infancy and early childhood. New York: Academic Press.
Nelson, R. O. (1983). Behavioral assessment: Past, present, and future. Behavioral Assessment, 5, 195-206.
O'Leary, K. D., & Kent, R. N. (1977). Sources of bias in observational recording. In B. C. Etzel, J. M. LeBlanc, & D. M. Baer (Eds.), New developments in behavioral research: Theory, method, and application. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
O'Leary, K. D., Kent, R. N., & Kanowitz, J. (1975). Shaping data collection congruent with experimental hypotheses. Journal of Applied Behavior Analysis, 8, 43-51.
Powell, J., Martindale, A., & Kulp, S. (1975). An evaluation of time-sample measures of behavior. Journal of Applied Behavior Analysis, 8, 463-469.
Powell, J., Martindale, B., Kulp, S., Martindale, A., & Bauman, R. (1977). Taking a closer look: Time sampling and measurement error. Journal of Applied Behavior Analysis, 10, 325-332.
Redfield, J., & Paul, G. L. (1976). Bias in behavioral observation as a function of observer familiarity with subjects and typicality of behavior. Journal of Consulting and Clinical Psychology, 44, 156.
Reid, J. B. (1970). Reliability assessment of observation data: A possible methodological problem. Child Development, 41, 1143-1150.
Reid, J. B. (1973a). The relationship between complexity of observer protocol and interobserver agreement for twenty-five reliability assessment sessions: A technical note. Unpublished manuscript, University of Oregon.
Reid, J. B. (1973b). Differences in the complexity of reliability assessment vs. adjacent non-reliability assessment observation sessions: A technical note. Unpublished manuscript, University of Oregon.
Repp, A. C. (1983). Teaching the mentally retarded. Englewood Cliffs, NJ: Prentice Hall.
Repp, A. C., Deitz, D. E. D., Boles, S. M., Deitz, S. M., & Repp, C. F. (1976). Differences among common methods for calculating interobserver agreement. Journal of Applied Behavior Analysis, 9, 109-113.
Repp, A. C., Harmon, M. L., & Felce, D. (1984). A real-time, parallel entry, portable computer system for observational research. Paper presented at the annual meeting of the Association for Behavior Analysis.
Repp, A. C., Roberts, D. M., Slack, D. J., Repp, C. F., & Berkler, M. S. (1976). A comparison of frequency, interval, and time-sampling methods of data collection. Journal of Applied Behavior Analysis, 9, 501-508.
Rojahn, J., & Schroeder, S. R. (1983). Behavioral assessment. In J. L. Matson & J. A. Mulick (Eds.), Handbook of mental retardation (pp. 227-243). New York: Pergamon Press.
Romanczyk, R. G., Kent, R. N., Diament, C., & O'Leary, K. D. (1973). Measuring the reliability of observational data: A reactive process. Journal of Applied Behavior Analysis, 6, 175-184.
Rumford, H. P. (1962). An experiment in teaching elementary school methods via closed circuit television. Journal of Educational Research, 56, 139-143.
Sackett, G. P. (Ed.) (1978a). Observing behavior, Vol. I: Theory and application in mental retardation. Baltimore, MD: University Park Press.
Sackett, G. P. (Ed.) (1978b). Observing behavior, Vol. II: Data collection and analysis methods. Baltimore, MD: University Park Press.
Salvia, J., & Ysseldyke, J. E. (1981). Assessment in special and remedial education (2nd ed.). Boston: Houghton Mifflin.
Sanson-Fisher, R. W., Poole, A. D., & Dunn, J. (1980). An empirical method for determining an appropriate interval length for recording behavior. Journal of Applied Behavior Analysis, 13, 493-500.
Schoggen, P. (1964). Mechanical aids for making specimen records of behavior. Child Development, 35, 985-988.
Schueler, H., & Gold, M. H. (1964). Video records of student teachers: A report of the Hunter College research project evaluating the use of kinescopes in preparing student teachers. Journal of Teacher Education, 15, 358-364.
Schwartz, M. S., & Schwartz, C. G. (1955). Problems in participant observation. American Journal of Sociology, 60, 343-353.
Skindrud, K. D. (1973). An evaluation of observer bias in experimental-field studies of social interaction. Dissertation Abstracts International, 34, 4989.
Stoneman, Z., Brody, G. H., & Abbott, D. (1983). In-home observations of young Down syndrome children with their mothers and fathers. American Journal of Mental Deficiency, 87, 591-600.
Strain, P. S., Lambert, D. L., Kerr, M. M., Stagg, V., & Lenkner, D. A. (1983). Naturalistic assessment of children's compliance to teachers' requests and consequences for compliance. Journal of Applied Behavior Analysis, 16, 243-249.
Sulzer-Azaroff, B., & Mayer, G. R. (1977). Applying behavior-analysis procedures with children and youth. New York: Holt, Rinehart & Winston.
Taplin, P. S., & Reid, J. B. (1973). Effects of instructional set and experimental influences on observer reliability. Child Development, 44, 547-554.
Thomas, D. S., Loomis, A. M., & Arrington, R. E. (1933). Observational studies of social behavior. New Haven: Institute of Human Relations, Yale University.
Thomas, J. D., Presland, I. E., Grant, M. D., & Glynn, T. L. (1978). Natural rates of teacher approval and disapproval in grade-7 classrooms. Journal of Applied Behavior Analysis, 11, 91-94.
Wasik, B. H., & Loven, M. O. (1980). Classroom observational data: Sources of inaccuracy and proposed solutions. Behavioral Assessment, 2, 211-227.
White, M. A. (1975). Natural rates of teacher approval and disapproval in the classroom. Journal of Applied Behavior Analysis, 8, 367-372.
Wildman, B. G., Erickson, M. T., & Kent, R. N. (1975). The effect of two training procedures on observer agreement and variability of behavior ratings. Child Development, 46, 520-524.
Wildman, B. G., & Erickson, M. T. (1977). Methodological problems in behavioral observation. In J. D. Cone & R. P. Hawkins (Eds.), Behavioral assessment: New directions in clinical psychology. New York: Brunner/Mazel.
Yarrow, M. R., & Waxler, C. Z. (1979). Observing interactions: A confrontation with methodology. In R. B. Cairns (Ed.), The analysis of social interactions: Methods, issues, and illustrations. Hillsdale, NJ: Erlbaum.
ALAN C. REPP is Professor, and GAYLA S. NIEMINEN, ELLEN OLINGER, and RITA BRUSCA are Instructors, Department of Learning, Development, and Special Education, Northern Illinois University, DeKalb.
Authors: Repp, Alan C.; Nieminen, Gayla S.; Olinger, Ellen; Brusca, Rita
Date: Sep 1, 1988