Assessment of a rotating time sampling procedure: implications for interobserver agreement and response measurement.
The current study was designed to evaluate a rotating momentary time sampling (MTS) data collection system. A rotating MTS system has been used to measure activity preferences of preschoolers but not to collect data on responses that vary in duration and frequency (e.g., talking). We collected data on talking for 10 preschoolers using a 5-s MTS and interpolated larger MTS values. Results indicated that 30-, 60-, and 90-s intervals approximated the 5-s MTS. Next, we implemented a rotating MTS for each value in which we observed either 10 or 19 students. For all interval durations, interval-by-interval interobserver agreement (IOA) was high. However, occurrence IOA was low overall. These results illustrate the importance of selecting an IOA algorithm based on the type of data collected.
Keywords: data collection, interobserver agreement, momentary-time sampling, response measurement
Behavior analytic interventions are frequently used in classroom settings to address problem behavior (e.g., Wright-Gallo, Higbee, Reagon, & Davey, 2006), off-task behavior (e.g., Lo & Cartledge, 2004), skill acquisition (e.g., Smith, Spooner, Jimenez, & Browder, 2013), and engagement (e.g., Hanley, Cammilleri, Tiger, & Ingvarsson, 2007). Objective measurement of behavior is one of the critical and defining features of applied behavior analysis (Baer, Wolf, & Risley, 1968). For the past five decades, behavioral researchers have refined methods of measurement by evaluating discontinuous (e.g., Bijou, Peterson, & Ault, 1968; Hanley et al., 2007; Powell, Martindale, Kulp, Martindale, & Bauman, 1977; Rapp, Colby-Dirksen, Michalski, Carroll, & Lindenberg, 2008; Wirth, Slaven, & Taylor, 2014) and continuous (e.g., MacLean, Tapp, & Johnson, 1985; Mudford, Martin, Hui, & Taylor, 2009; Repp, Deitz, Boles, Deitz, & Repp, 1976) measurement systems.
Continuous measurement allows one to record three measurable properties of behavior suggested by Johnston and Pennypacker (1993): count (the number of responses or events that occurred), temporal locus (when in time a response or event occurs), and temporal extent (the duration of a response or event). Continuous measurement is increasingly becoming the method of choice due to the availability of data collection technologies (Mudford, Taylor, & Martin, 2009). However, continuous measurement can be time consuming because data collectors must continuously observe participant(s). In addition, it is typically only possible to record data for one individual at a time. Therefore, continuous measurement may not be practical for teachers or clinicians who do not have the time or resources to continuously observe the behavior of one child.
Discontinuous methods, although used less frequently than continuous measurement, are still fairly common. Mudford, Taylor et al. (2009) report that 45% of articles published in the Journal of Applied Behavior Analysis between 1995 and 2005 used discontinuous methods. In a review of six recent issues of Education and Treatment of Children (Volume 37, Issues 1-2, and Volume 36, Issues 1-4), approximately 46% (11 of 24) of research articles that directly observed behavior used discontinuous measurement. It is possible that discontinuous methods are even more common outside of the research community (e.g., when teachers collect data in educational settings). There are three discontinuous recording methods: partial interval recording (PIR), whole interval recording (WIR), and momentary time sampling (MTS; Powell, Martindale, & Kulp, 1975). The level of behavior tends to be overestimated in PIR and underestimated in WIR as compared to continuous recording (Powell et al., 1977). MTS typically is the least biased of the three discontinuous methods when compared to continuous recording of duration events (Meany-Daboul, Roscoe, Bourret, & Ahearn, 2007; Powell et al., 1977; Rapp et al., 2008). However, Wirth et al. (2014) conducted a computer simulation in which they applied MTS, PIR, and WIR methods to randomly distributed target events and demonstrated that MTS tends to underestimate low-rate duration events and overestimate high-rate duration events.
Regardless of the level of behavior in a given observation, Powell et al. (1977) demonstrated that all three discontinuous observation methods had minimal error as compared to continuous recording when the observation interval was 5 s. Thus, if it is not possible to use continuous recording, discontinuous recording with intervals of 5 s may be acceptable. However, 5-s intervals may be just as impractical as continuous observation. Fortunately, MTS intervals as large as 120 s have been shown to produce less than 10% difference in percentage of intervals compared to an MTS of 5 s, suggesting that large MTS intervals may acceptably estimate duration events (Hanley et al., 2007). Further, error may be affected by the duration of each individual event and the duration of the observation. MTS intervals have minimal absolute error when event duration is similar to or greater than the MTS interval (Wirth et al., 2014). Therefore, the longer the total observation, the more similar MTS-generated data are to continuous data (Devine, Rapp, Testa, Henrickson, & Schnerch, 2011).
A practical advantage of MTS over continuous observation, PIR, and WIR is that a target individual's behavior does not need to be observed for the entire interval. Therefore, it is possible to observe multiple participants in one observation, particularly if larger MTS values can be used. Hanley et al. (2007) measured 20 preschoolers' activity preferences by rotating from child to child in a fixed order during free playtime (hereafter referred to as rotating MTS [rMTS]). Observers recorded the location of each child in the room, which corresponded to certain types of activities (e.g., art). Based on findings that intervals up to 120 s estimate a 5-s MTS with minimal error, Hanley et al. evaluated the rMTS with intervals of 30, 60, 90, and 120 s. For example, in the 30-s rMTS, each child was observed in each 30-s interval, and there were exactly 30 s between observations of the same child. Hanley et al. concluded that the rMTS with 60-, 90-, and 120-s intervals might be an acceptable method of collecting data on multiple participants in one observation because interobserver agreement (IOA) was above 90%.
Despite the apparent advantages of the rMTS, only a few studies have applied this data collection system (e.g., Hanley, Tiger, Ingvarsson, & Cammilleri, 2009; Lo & Cartledge, 2004). Thus, the feasibility of the rMTS system for various responses is unknown. In the studies by Hanley et al. (2007; 2009), observers recorded the physical location of each child. Hanley et al. (2007) noted that most children stayed in one or two play areas, suggesting that switching between play areas was infrequent. Other responses that teachers may be interested in recording may be more variable in that the responses may start and stop more frequently, which could affect properties of data collection. Given the practical benefits of the rMTS, it is important to determine whether it could be feasible with other types of responses.
One method to assess the feasibility of a data collection system is to calculate error, or the difference in levels of a response obtained by two different data collection systems (Hanley et al., 2007; Powell et al., 1977; Wirth et al., 2014). However, calculating error requires that more than one data collection system be used and would render the advantages of using discontinuous methods with larger intervals moot. The most common alternative approach to quantify properties of response measurement is IOA, or the extent to which two independent observers agree that event(s) did or did not occur (Mudford, Taylor et al., 2009).
Most methods of IOA involve dividing the observation into intervals and determining the extent to which observers agree about events that occurred in each interval. The agreement scores are then averaged and multiplied by 100 to yield a percentage. In exact IOA (e.g., Repp et al., 1976), agreement is either 1 (observers scored the same number of responses) or 0 (observers did not score the same number of responses). In block-by-block IOA (Mudford, Martin et al., 2009; also called proportional agreement, Rolider, Iwata, & Bullock, 2012), agreement is based on the quotient of responses scored by each observer and can be anywhere from 0 to 1 (see Mudford, Martin et al., 2009).
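The two continuous-data algorithms just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the study; the function names and the per-interval response counts are hypothetical.

```python
def exact_ioa(counts_a, counts_b):
    """Exact agreement: an interval scores 1 only when both observers
    recorded the same number of responses, and 0 otherwise."""
    scores = [1.0 if a == b else 0.0 for a, b in zip(counts_a, counts_b)]
    return 100 * sum(scores) / len(scores)

def block_by_block_ioa(counts_a, counts_b):
    """Block-by-block (proportional) agreement: an interval scores the
    smaller count divided by the larger count (1 when the counts are
    equal, including when both are zero)."""
    scores = [1.0 if a == b else min(a, b) / max(a, b)
              for a, b in zip(counts_a, counts_b)]
    return 100 * sum(scores) / len(scores)

# One observer scores 3 responses in the first interval, the other 2;
# both score 0 in the second interval.
exact_ioa([3, 0], [2, 0])           # 50.0
block_by_block_ioa([3, 0], [2, 0])  # ~83.3
```

The example shows why block-by-block agreement can take intermediate values: the near-miss interval (3 vs. 2 responses) contributes 2/3 rather than 0.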
Because discontinuous recording systems indicate whether or not a response occurred in an interval (i.e., they do not indicate how many responses occur in each interval), different methods must be used to evaluate discontinuous data (Rapp, Carroll, Stangeland, Swanson, & Higgins, 2011). Common IOA algorithms for discontinuous observations are occurrence, nonoccurrence, and interval-by-interval IOA. Only intervals in which at least one observer recorded that the response did occur (as in occurrence IOA) or did not occur (as in nonoccurrence IOA) are included in the algorithm (Bijou et al., 1968). Interval-by-interval agreement, the discontinuous corollary to block-by-block agreement, is a combination of occurrence and nonoccurrence IOA (Rolider et al., 2012). Interval-by-interval IOA is also sometimes called overall agreement or interval agreement (Hopkins & Hermann, 1977).
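The three discontinuous-data algorithms differ only in which intervals enter the denominator. One way to make that explicit is a single hypothetical helper (ours, not from the study) that filters the interval records before scoring agreements:

```python
def interval_ioa(rec_a, rec_b, mode="interval"):
    """IOA for discontinuous (binary) records, where each entry is
    True if the response was scored in that interval.

    mode: "occurrence"    -> count only intervals in which at least one
                             observer scored an occurrence
          "nonoccurrence" -> count only intervals in which at least one
                             observer scored a nonoccurrence
          "interval"      -> count every interval (interval-by-interval)
    """
    pairs = list(zip(rec_a, rec_b))
    if mode == "occurrence":
        pairs = [(a, b) for a, b in pairs if a or b]
    elif mode == "nonoccurrence":
        # At least one nonoccurrence means not both scored an occurrence.
        pairs = [(a, b) for a, b in pairs if not (a and b)]
    agreements = sum(a == b for a, b in pairs)
    return 100 * agreements / len(pairs)
```

For instance, with records `[True, True, False, False]` and `[True, False, False, False]`, interval-by-interval IOA is 75%, occurrence IOA is 50%, and nonoccurrence IOA is about 67%, illustrating how the same records yield different values under each algorithm.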
In general, exact agreement tends to be the most stringent method. That is, agreement tends to be lower when the algorithm is calculated using the exact method, particularly with high-rate behavior (Mudford, Martin et al., 2009; Repp et al., 1976; Rolider et al., 2012). However, with low-rate behavior, the exact method may inflate IOA and, as such, may be an inappropriate method for calculating IOA. For example, exact IOA may be very high (close to 100%) if one observer scores two responses in a 10-min observation and the other observer scores zero instances because both observers would have scored zero instances of the response in the majority of intervals. In that same case, if occurrence IOA was used, agreement would be 0%.
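The hypothetical two-responses-versus-zero example can be worked through directly. The numbers below assume a 10-min observation divided into 120 five-second intervals (the observation length and interval size are illustrative):

```python
# Observer A scores talking in 2 of 120 intervals; observer B in none.
a = [1, 1] + [0] * 118
b = [0] * 120

# Exact agreement (equivalent to interval-by-interval here, since the
# records are binary): the observers match in 118 of 120 intervals.
overall = 100 * sum(x == y for x, y in zip(a, b)) / len(a)

# Occurrence IOA: only the 2 intervals in which at least one observer
# scored an occurrence are counted, and the observers disagree in both.
scored = [(x, y) for x, y in zip(a, b) if x or y]
occurrence = 100 * sum(x and y for x, y in scored) / len(scored)

print(overall)     # about 98.3
print(occurrence)  # 0.0
```

Despite describing the same pair of records, the two algorithms yield agreement values at opposite extremes.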
The rate of behavior and the method of data collection (continuous versus discontinuous) are important considerations for selecting an IOA algorithm. For low-rate discontinuous data, Rapp et al. (2011) suggest that interval-by-interval agreement may be too lenient. Further, Bijou et al. (1968) recommend reporting occurrence IOA for very low-rate responses and nonoccurrence IOA for high-rate responses. Moreover, Hopkins and Hermann (1977) suggest that multiple methods of calculating IOA can and should be reported. A review of six issues of Education and Treatment of Children (Volume 37, Issues 1-2 and Volume 36, Issues 1-4) indicates that of the 11 research articles that utilized discontinuous measurement, eight reported interval-by-interval IOA only and three did not report which IOA method was used. In addition, approximately half of the research articles that used continuous observation reported interval-by-interval agreement (6 of 13). Therefore, despite nearly 30-year-old recommendations regarding choice of IOA, interval-by-interval IOA appears to be one of the most common IOA algorithms, particularly for discontinuous measurement.
Although the rMTS has the potential to be a very useful data collection system when data are collected on more than one individual (Hanley et al., 2007), discontinuous methods of data collection might lead to inflated IOA scores (Rapp et al., 2011). As such, it is important to evaluate how different methods of calculating IOA may impact an rMTS data collection system. Studies that have applied an rMTS used interval-by-interval IOA both when the rate of behavior is high (e.g., Hanley et al., 2007) or low (e.g., Lo & Cartledge, 2004). However, no study has utilized more than one IOA algorithm. Therefore, it is unclear how different types of responses will affect different types of IOA.
The rMTS may also be suitable for other types of responses that are of interest to clinicians and teachers. For example, a teacher may wish to decrease the amount of time that students are talking out of turn during independent work periods. Talking is also a behavior that could be recorded as a duration event and may be amenable to discontinuous observation techniques. Unlike play area location (Hanley et al., 2007), talking is highly variable, it is likely to start and stop rapidly, and it may occur only briefly (e.g., 2 s) or for extended periods of time (e.g., 3 min). One purpose of the current study was to systematically replicate the procedures of Hanley et al. (2007) with a response of variable duration: talking. First, we planned to determine the correspondence of records using various MTS interval durations with a record using brief (5 s) intervals (error analysis). If, like Hanley et al., larger intervals estimated 5-s MTS data with minimal error, it would be possible to evaluate an rMTS that takes advantage of larger MTS values to record talking. Second, we planned to determine rMTS intervals that resulted in high levels of IOA. Initially, like Hanley et al., we calculated IOA using interval-by-interval agreement. We expected that the rMTS would not result in acceptable IOA (≥ 80%), particularly with smaller MTS intervals. However, interval-by-interval IOA was quite high. Despite this, experienced data collectors reported that they were not confident in the accuracy of their records. Thus, the third purpose of the current study was to evaluate whether the rMTS would be deemed effective when different methods of IOA were used to compare the data collectors' records.
Experiment 1: Error Analysis
Experiment 1 was a systematic replication of the error analysis described by Hanley et al. (2007). Experiment 1 was conducted to determine acceptable MTS interval sizes for recording data on talking. Like Hanley et al., we interpolated intervals from 5-s MTS data streams and compared the percentage of intervals obtained in each interval to the percentage of intervals obtained with the 5-s MTS.
Participants and setting
Ten typically-developing preschool students (5 female, 5 male) between the ages of 4.57 and 5.32 years (M=5.01) participated in the study. Observations were conducted during free play periods in participants' typical classroom. Observations prior to the start of the study indicated that children talked at variable levels during free play periods.
Talking was defined as the child moving his or her mouth and making sounds that could form words. For instance, "ah" would be counted, but a cough or laugh would not. If observers could not see the child's face or locate the child in the designated interval, the interval was scored as "no talking." The purpose of scoring "no talking" was to reflect what would typically occur. That is, if a teacher could not tell if a child was talking, he or she may assume the child was not talking. The primary dependent variable was the difference score, which was the difference in the percentage of intervals in which a child was talking given one of several MTS values and the 5-s MTS value, the standard for comparison. A 5-s MTS was chosen as the standard (a) to replicate Hanley et al. (2007) and (b) based on prior research that indicates that discontinuous measurement intervals of 5 s yielded nearly identical levels of responses as continuous recording (Powell et al., 1977).
Two observers independently collected data for 60% of observations in the error analysis. The primary data collectors were Board Certified Behavior Analysts who were enrolled in a doctoral program with five or more years of experience with behavioral data collection. Secondary observers were undergraduate students with less than one year of experience with data collection. Secondary observers were trained in the data collection procedures until interval-by-interval IOA was 80% or greater for three consecutive observations. To calculate interval-by-interval IOA, each interval was 5 s in duration (identical to the MTS value), and the interval was scored as an agreement if both observers recorded that the participant was (or was not) talking. The number of agreements was divided by the total number of 5-s intervals and multiplied by 100. To calculate occurrence IOA, the number of agreements that a participant was talking was divided by the number of intervals in which at least one observer recorded that talking occurred and multiplied by 100. Mean interval-by-interval IOA was 91.90%, and mean occurrence IOA was 75.38%. Occurrence IOA was fairly low, likely because the overall occurrence of talking was somewhat low (28.52% of intervals).
We replicated the error analysis procedure described by Hanley et al. (2007) with ten students. Each student was observed in one 18-min observation. Observers recorded talking using a 5-s MTS. In the final second of each 5-s interval, observers recorded whether or not the target student was talking. This procedure was repeated for the remaining nine students. From the 5-s record, we interpolated 10-, 20-, 30-, 60-, 90-, 120-, 180-, 360-, 540-, and 1080-s MTS intervals (see Table 1). To interpolate the 10-s MTS, we only included values scored at multiples of 10 s in the calculation. For example, if the observer scored talking at seconds 5, 10, and 20, only seconds 10 and 20 contributed to the percentage of intervals calculation for the 10-s MTS. This was repeated for each of the MTS values. We subtracted the percentage of intervals for each MTS value from the percentage of intervals obtained by the 5-s MTS to obtain a difference score. As in Hanley et al., we considered intervals with difference scores less than 10% as acceptable and evaluated those values in the IOA analysis.
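The interpolation step can be sketched as follows. This is a minimal sketch under our own assumptions (not the authors' code): the 5-s record is stored as one boolean per sample, so that entry i corresponds to second 5(i + 1) of the observation.

```python
def percent_intervals(record):
    """Percentage of sampled intervals scored as talking."""
    return 100 * sum(record) / len(record)

def interpolate_mts(record_5s, interval_s):
    """Keep only the 5-s samples that fall at multiples of the larger
    MTS interval (e.g., seconds 10, 20, 30, ... for a 10-s MTS)."""
    step = interval_s // 5
    return percent_intervals(record_5s[step - 1::step])

def difference_score(record_5s, interval_s):
    """Absolute difference between the interpolated MTS and the 5-s
    MTS standard, in percentage points."""
    return abs(interpolate_mts(record_5s, interval_s)
               - percent_intervals(record_5s))
```

As a limiting case, for a record in which talking alternates every 5 s, an interpolated 10-s MTS samples only the nonoccurrences, producing the maximum possible difference score of 50 percentage points.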
Results and Discussion
Figure 1 depicts the mean difference scores collapsed across participants at each interpolated MTS value. Intervals of 10, 20, 30, 60, and 90 s produced mean difference scores of 2.87%, 4.35%, 4.26%, 6.57%, and 6.48%, respectively. Values greater than 90 s produced mean difference scores above 10%. Therefore, 30-, 60-, and 90-s intervals were considered acceptable and assessed in the IOA analysis. The difference scores for intervals of 10 and 20 s were also less than 10%; however, like Hanley et al. (2007), these intervals were not considered for the IOA analysis because (a) the difference scores were similar to the 30-s MTS and (b) the interval sizes in the rMTS would have been extremely small (e.g., 0.5 s).
A potential limitation of Experiment 1 is that a 5-s MTS was used as the standard rather than continuous measurement. Even small intervals may have some nonsystematic error. However, prior research has shown that 5-s MTS values result in nearly identical percentage of intervals scored as continuous measurement (Powell et al., 1977). Furthermore, we chose to use a 5-s MTS to ensure that our results and those of Hanley et al. (2007) were mutually interpretable.
In addition, like Hanley et al. (2007), we interpolated larger MTS intervals from the 5-s MTS data. However, it is possible that data collected in real time with larger MTS intervals would differ from interpolated data. Unfortunately, it is extremely difficult to address this limitation. Two (or more, if collecting IOA) data collectors could record data for a given observation. One could use a 5-s MTS and the other could record data using a large MTS interval. However, discrepancies could be due to individual differences, poor IOA, or both, rather than measurement error. Alternatively, researchers could video-record observations and then have an observer score the data using each of the MTS intervals. However, differences in interpolated and real-time observations may be due to practice effects.
Like prior studies, we found that fairly large MTS intervals (up to 90 s) may still estimate a duration event similarly to MTS intervals of 5 s. Unlike in Hanley et al. (2007), intervals of 120 s had difference scores larger than 10%. This may be due to the variable nature of talking.
Talking started and stopped frequently, and each individual instance of talking was fairly short. Because we did not collect continuous data on talking, we cannot specify exactly how long each bout of talking lasted. However, it was not uncommon for talking to be scored in one interval but not the surrounding intervals, suggesting that it was sometimes 5 s or less. These results contrast with the findings of Wirth et al. (2014), which report that large MTS intervals (e.g., 90 s) are not appropriate for events that occur for a short duration (e.g., 5 s). Because intervals up to 90 s resulted in minimal error, it was possible to evaluate the rMTS for talking.
Experiment 2: Interobserver Agreement Analysis
Experiment 1 suggested that the percentage of intervals obtained in 30-, 60-, and 90-s MTS intervals were within 10% of the 5-s MTS. Therefore, the initial purpose of Experiment 2 was to replicate the IOA analysis of Hanley et al. (2007), in which acceptable MTS interval sizes were used to observe up to 20 children at a time using an rMTS procedure. We expected that IOA would be low, particularly with smaller intervals, due to the variable nature of our dependent variable. However, interval-by-interval IOA was high. Thus, the secondary purpose of Experiment 2 was to analyze different algorithms for calculating IOA to determine whether different conclusions would be drawn.
Participants and setting
Nineteen students (8 female, 11 male), ranging in age from 4.53 to 5.47 years (M=5.02), participated in the IOA analysis. All were from the same classroom as Experiment 1, including the 10 participants from Experiment 1. The setting was identical to Experiment 1.
Observers recorded data on talking, which was defined as in Experiment 1. Although data were recorded on talking, the primary dependent variable in Experiment 2 was the percentage of agreement between observers for talking. We analyzed interval-by-interval and occurrence IOA. The algorithms were identical to those used to assess IOA in Experiment 1, except that the interval size varied depending on the size of the MTS interval. For example, for MTS intervals of 30 s when all students were observed, the IOA interval was 1.5 s (see procedure for details on how interval sizes were calculated). Agreement scores are reported in the Results section.
We replicated the interobserver analysis conducted by Hanley et al. (2007). Each observation was 18 min. Data were recorded using an rMTS with interval durations of 30, 60, or 90 s, which resulted in acceptable difference scores in Experiment 1. In other words, these intervals had less than a 10% difference with a 5-s MTS. As noted for Experiment 1, intervals of 10 and 20 s were not included because the difference scores were nearly identical to 30 s and the interval sizes were deemed too small for the rMTS to be practical. Observers stood in a central location of the classroom where they could view all students. It was not always possible to have a clear view of every child's face from this location. Data collectors did not change positions, however, because it would have required data collectors to constantly move during the observation, and we wanted the system to be feasible and simple for teachers or clinicians to implement. The primary and secondary data collectors were the same observers described in Experiment 1. The data collection procedure was explained and practiced with both research assistants. However, undergraduate research assistants were not trained to an IOA criterion because the purpose of Experiment 2 was to investigate how IOA was affected by the rMTS.
Nineteen students. The rMTS was conducted as initially described by Hanley et al. (2007). Children were observed in a fixed order that was determined at the start of each observation. To determine the order, observers wrote down the names of each child in the classroom on their data sheets from left to right in the room. To ensure that both data collectors recorded data for the same student, prior to collecting data, the data collectors said the names of the students aloud as they wrote them on the data sheet and practiced locating each child in the order listed on the data sheet at least once prior to each observation. If a child changed locations during an observation, data for that child were still collected at the time that was determined at the start of session. In each observation, data were collected using an rMTS with intervals of either 30, 60, or 90 s. That is, only one interval size was used to collect data in each observation. The interval duration between observations of the same child in a given observation was 30, 60, or 90 s. The interval duration between observations of one child to the next was 1.5, 3.0, and 4.5 s, respectively. Values were identical to those used by Hanley et al. and calculated as the rMTS interval divided by 20. Because there were only 19 students in the class, the last interval of each round was not assigned to a student and was skipped. Furthermore, intervals for absent students were skipped. The missing trials were not included in the calculations for percentage of intervals or IOA. Two observers wore headphones connected to one MP3 player, which played a tone at the end of each interval. For instance, during the MTS 30 s, the tone played every 1.5 s.
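The timing arithmetic for the rotation can be sketched as follows. The helper below is hypothetical (ours, not code from the study); the number of rotation slots is 20 per Hanley et al. (2007), or 10 when only 10 students are observed.

```python
def rotation_timing(mts_interval_s, n_slots):
    """Inter-child observation interval for a rotating MTS: the MTS
    interval divided by the number of rotation slots, so each child
    is revisited exactly one MTS interval after the last observation."""
    return mts_interval_s / n_slots

# With 20 slots (19 students plus one skipped slot), the 30-, 60-, and
# 90-s rMTS values yield inter-child intervals of 1.5, 3.0, and 4.5 s.
[rotation_timing(v, 20) for v in (30, 60, 90)]  # [1.5, 3.0, 4.5]
```

With 10 slots, the same MTS values yield inter-child intervals of 3, 6, and 9 s, which is why observing fewer students leaves more time to locate each child.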
Ten students. Because we expected IOA to be poor when all students were observed due to very small inter-child observation intervals, we also evaluated the procedure with only 10 students. Prior to the observation, the primary data collector wrote down the names of all students that were present. Then, 10 of the present students were randomly selected. Their names were recorded on the data sheets from left to right in the room. The procedure was the same as with all students except that the inter-child interval duration was the MTS value divided by 10 (3, 6, and 9 s, respectively).
Results and Discussion
Figure 2 depicts the results of the IOA analysis. The data paths show IOA for each MTS value across observations. Each data point reflects one observation. Interval-by-interval IOA is presented on the leftmost panels and occurrence IOA is shown on the rightmost panels. The gray horizontal bar depicts the range of IOA (interval and occurrence, respectively) from Experiment 1 (the error analysis) to make comparisons to the IOA analysis. Results for the 19-student sample are presented on the top, and results for the 10-student sample are on the bottom. Interval-by-interval IOA was high for 30 s (M=90.48%), 60 s (M=88.11%), and 90 s (M=91.56%) for the 19-student sample. Similarly, mean interval-by-interval IOA was 90.28%, 90.00%, and 91.39%, respectively, when 10 students were observed. Occurrence IOA was low for all MTS values but varied systematically with interval size. Mean occurrence IOA was lowest for MTS 30 s (19.37% for 19 students and 23.59% for 10 students). Mean occurrence IOA for MTS 60 s was 46.46% for 19 students and 48.60% for 10 students. For MTS 90 s, mean occurrence IOA was 56.96% for 19 students and 51.86% for 10 students.
With both samples of students, occurrence IOA was higher in Experiment 1 than all the rMTS conditions. In addition, occurrence IOA was much lower in the 30-s MTS conditions for both samples of students than any of the other conditions. Table 2 summarizes the mean difference of occurrence IOA in each of the MTS conditions. Therefore, despite somewhat low occurrence IOA in Experiment 1 (75.38%), these results indicate that (a) the rMTS results in lower occurrence IOA than recording MTS data for only one person at a time and (b) the smaller the rMTS interval, the lower the occurrence IOA.
Figure 3 depicts the percentage of intervals with talking in each condition. Overall, data collectors recorded fewer instances of talking when using the rMTS than was observed in Experiment 1. Lower recorded levels of talking could be due to less actual talking or be an artifact of the rMTS system.
Had we assessed only interval-by-interval IOA, we would have evidence of a systematic replication of the findings reported by Hanley et al. (2007). That is, high interval-by-interval IOA was obtained, suggesting that the rMTS was a feasible system. However, our cautious interpretation of the interval-by-interval IOA findings led us to consider an alternative metric. Specifically, we assessed occurrence IOA, which was low and suggested that this system may not be appropriate for collecting data on a variable response such as talking.
We were somewhat surprised by the high interval-by-interval IOA. However, IOA measures the extent to which two observers agree rather than the accuracy of the data recorded (Johnston & Pennypacker, 1993). High interval-by-interval IOA may be due to the relatively low level of talking. As previously noted, prior research indicates that interval-by-interval IOA may be inflated with low-rate responses (Bijou et al., 1968; Hopkins & Hermann, 1977). In Experiment 1, talking was recorded in 28.52% of intervals. The low rate of talking may have been an artifact of our definition of talking and data collection system. Because the rMTS was fast-paced, particularly with smaller intervals, it was difficult to locate every child at every predetermined interval. In addition, children were in various locations throughout the room and occasionally changed locations, which made it difficult to track them. Therefore, there were many non-occurrences of talking, which may have inflated interval-by-interval IOA. To reiterate, we chose to measure talking as the dependent variable for two reasons. First, observations indicated that the behavior occurred variably, and we wanted to evaluate whether the rMTS would be feasible with a variable response. Second, teachers and practitioners may be interested in collecting data on behavior like talking (e.g., talking out of turn), and we wanted our dependent variable to be a response that was relevant in an educational setting.
Anecdotally, data collectors reported that they did not believe that they were accurately recording data in many of the rMTS sessions. In particular, observers reported that they had the most difficulty locating the next child in the rotating sequence in the 30-s rMTS, in which they recorded talking of participants every 1.5 s. Two pieces of objective evidence supported this. First, the percentage of intervals with talking was lower using the rMTS than it was using the single-student 5-s MTS. Second, occurrence IOA decreased as rMTS interval size decreased. Although occurrence IOA was below acceptable levels even in Experiment 1, it was much lower in the IOA analysis. Hanley et al. (2007) did not report occurrence IOA, but it is probable that occurrence IOA was high because observers recorded in which of eight play areas a child was located or if he or she was out of a play area. Based on chance alone, the probability that a child was in a play area was .89 (i.e., 8 out of 9). Therefore, it is unlikely that a large number of non-occurrences were scored in Hanley et al., and occurrence IOA was likely more similar to interval-by-interval IOA.
The original purpose of this study was to replicate the method of data collection used by Hanley et al. (2007) to determine if the system was feasible in a classroom setting with a response that was of variable duration (i.e., talking). Experiment 1 was consistent with the results of the error analysis in Hanley et al. In Experiment 2, interval-by-interval, but not occurrence, IOA was high when we observed either 10 or 19 children in a single observation. These results are partially consistent with Hanley et al., which reported high interval-by-interval agreement. However, there is little reason to believe that occurrence IOA would have been low in Hanley et al. This finding led us to further examine different methods of computing IOA.
Results suggest that the selection of IOA method may affect the conclusions that are drawn. That is, had we not considered occurrence IOA, we might have concluded that we replicated the findings of Hanley et al. (2007) and that the rMTS is a feasible measure for variable responses like talking. Prior research on measurement and IOA has recommended that different algorithms for calculating IOA be used with different types of behavior (e.g., Hopkins & Hermann, 1977; Mudford, Martin, et al., 2009; Repp et al., 1976). For example, occurrence IOA may be more stringent than interval-by-interval IOA with low-frequency events. Nonetheless, our review of recent articles published in Education and Treatment of Children suggests that studies using discontinuous methods report interval-by-interval IOA to the exclusion of other algorithms. The current study adds to prior research indicating that it is important to consider the best method of calculating IOA for each response rather than universally applying one method.
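To make the distinction between the two algorithms concrete, the sketch below (our illustration, not code or data from the study) computes interval-by-interval and occurrence IOA from two hypothetical observers' interval records. With a low-rate response, the many shared non-occurrence intervals keep interval-by-interval IOA high even when the observers rarely agree on the intervals they actually scored.

```python
# Illustrative sketch of two IOA algorithms; the records are hypothetical.
# Each record is a list of 0/1 scores: did the observer record the
# response in that interval?

def interval_by_interval_ioa(a, b):
    """Agreements (on occurrence or non-occurrence) divided by total intervals."""
    agreements = sum(x == y for x, y in zip(a, b))
    return 100.0 * agreements / len(a)

def occurrence_ioa(a, b):
    """Agreements on occurrence divided by intervals in which either
    observer scored an occurrence."""
    either = sum(x or y for x, y in zip(a, b))
    both = sum(x and y for x, y in zip(a, b))
    return 100.0 * both / either if either else 100.0

# Hypothetical low-rate data: 20 intervals; observer A scores intervals
# 4 and 11, observer B scores intervals 4 and 16 (1-indexed).
obs_a = [1 if i in (3, 10) else 0 for i in range(20)]
obs_b = [1 if i in (3, 15) else 0 for i in range(20)]

print(interval_by_interval_ioa(obs_a, obs_b))   # 90.0 (inflated by non-occurrences)
print(round(occurrence_ioa(obs_a, obs_b), 1))   # 33.3 (1 of 3 scored intervals)
```

The 17 shared non-occurrence intervals yield an interval-by-interval score of 90% even though the observers agreed on only one of the three intervals in which either recorded the response.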
In addition, any measurement evaluation must consider individual differences of observers. In the current study, the primary data collectors were all Board Certified Behavior Analysts and doctoral students with over five years of experience with various data collection systems. However, the secondary observers were undergraduate students with less than one year of experience collecting behavioral data. Experience with data collection may have impacted occurrence IOA. It is possible that occurrence IOA would have been higher had both observers for each observation been highly experienced. Therefore, an rMTS may require more extensive training than was provided in Hanley et al. (2007) or the current study when recording data on variable responses.
Furthermore, we did not evaluate how the rMTS is affected by phase changes in which the level of behavior changes. Frequently, data are collected on responses during different phases of an intervention. For example, a teacher may record baseline data on talking and then implement an intervention to decrease talking. It is important to determine whether the data collection system can detect potential changes in a response across phases. Devine et al. (2011) found that MTS was more sensitive to phase changes than partial-interval recording (PIR). In addition, the longer the observation, the more likely the MTS was to detect small changes in the level of the response. The results of Devine et al. may translate to the rMTS; however, additional research evaluating the sensitivity of the rMTS to phase changes is warranted.
The rMTS may not be a feasible data collection strategy for highly variable behavior like talking. However, this does not mean that the rMTS is not a feasible strategy in general. It may be useful for measuring responses of interest to teachers and clinicians such as staying on-task during academic work, staying in one's seat, or crying. Unlike talking, these responses may be less likely to start and stop rapidly (e.g., staying in-seat) or may be more obvious (e.g., crying); therefore, they may be easier to record using an rMTS. Furthermore, teachers may wish to record data on multiple students in a single observation when implementing group contingencies. In addition, the rMTS may be useful when children are all located in fixed positions in a small space and are unlikely to move for a period of time (e.g., at circle time). In the current study, the students frequently changed positions, which made it difficult to track each student during the rMTS conditions. Moreover, it may not always be necessary to observe all children in a classroom, and the rMTS may be more useful with smaller numbers of participants. We attempted to demonstrate this by recording data for a small sample of students, but results were nearly identical to those for the large sample. However, results may have differed with a different target response or even fewer participants. For example, a teacher may identify 4-5 students who are frequently out-of-seat during independent instructional times. The rMTS may be an efficient procedure for recording data in this scenario. Additional research should evaluate the ideal conditions for the use of the rMTS system.
The utility of behavior analytic principles and interventions will be determined, in part, by the measurement systems that provide the evidence for convincing demonstrations. The professional practice of behavior analysis is growing rapidly, and familiarity with the nuances, strengths, and weaknesses of various measurement systems may affect a significant portion of the clinical populations that behavior analysts serve. In principle, measurement is simple. In practice, measurement can be challenging. For this reason, we hope to promote research that sets useful boundaries for the implementation of common measurement strategies.
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97. doi: 10.1901/jaba.1968.1-91
Bijou, S. W., Peterson, R. F., & Ault, M. H. (1968). A method to integrate descriptive and experimental field studies at the level of data and empirical concepts. Journal of Applied Behavior Analysis, 1, 175-191. doi: 10.1901/jaba.1968.1-175
Devine, S. L., Rapp, J. T., Testa, J. R., Henrickson, M. L., & Schnerch, G. (2011). Detecting changes in simulated events using partial-interval recording and momentary time sampling III: Evaluating sensitivity as a function of session length. Behavioral Interventions, 26, 103-124. doi: 10.1002/bin.328
Hanley, G. P., Cammilleri, A. P., Tiger, J. H., & Ingvarsson, E. T. (2007). A method for describing preschoolers' activity preferences. Journal of Applied Behavior Analysis, 40, 603-618. doi: 10.1901/jaba.2007.603-618
Hanley, G. P., Tiger, J. H., Ingvarsson, E. T., & Cammilleri, A. P. (2009). Influencing preschoolers' free-play activity preferences: An evaluation of satiation and embedded reinforcement. Journal of Applied Behavior Analysis, 42, 33-41. doi: 10.1901/jaba.2009.42-33
Hopkins, B. L., & Hermann, J. A. (1977). Evaluating interobserver reliability of interval data. Journal of Applied Behavior Analysis, 10, 121-126. doi: 10.1901/jaba.1977.10-121
Johnston, J. M., & Pennypacker, H. S. (1993). Readings for strategies and tactics of behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.
Lo, Y., & Cartledge, G. (2004). Total class peer tutoring and interdependent group oriented contingency: Improving the academic and task related behaviors of fourth grade. Education and Treatment of Children, 27, 235-262.
MacLean, W. E., Tapp, J. T., & Johnson, W. L. (1985). Alternate methods and software for calculating interobserver agreement for continuous observation data. Journal of Psychopathology and Behavioral Assessment, 7, 65-73. doi:10.1007/BF00961847
Meany-Daboul, M. G., Roscoe, E. M., Bourret, J. C., & Ahearn, W. H. (2007). A comparison of momentary time sampling and partial-interval recording for evaluating functional relations. Journal of Applied Behavior Analysis, 40, 501-514. doi: 10.1901/jaba.2007.40-501
Mudford, O. C., Martin, N. T., Hui, J. K. Y., & Taylor, S. A. (2009). Assessing observer accuracy in continuous recording of rate and duration: Three algorithms compared. Journal of Applied Behavior Analysis, 42, 527-539. doi: 10.1901/jaba.2009.42-527
Mudford, O. C., Taylor, S. A., & Martin, N. T. (2009). Continuous recording and interobserver agreement algorithms reported in the Journal of Applied Behavior Analysis (1995-2006). Journal of Applied Behavior Analysis, 42, 165-169. doi: 10.1901/jaba.2009.42-165
Powell, J., Martindale, B., & Kulp, S. (1975). An evaluation of time-sample measures of behavior. Journal of Applied Behavior Analysis, 8, 463-469. doi: 10.1901/jaba.1975.8-463
Powell, J., Martindale, B., Kulp, S., Martindale, A., & Bauman, R. (1977). Taking a close look: Time sampling and measurement error. Journal of Applied Behavior Analysis, 10, 325-332. doi: 10.1901/jaba.1977.10-325
Rapp, J. T., Carroll, R. A., Stangeland, L., Swanson, G., & Higgins, W. J. (2011). A comparison of reliability measures for continuous and discontinuous recording methods: Inflated agreement scores with partial interval recording and momentary time sampling for duration events. Behavior Modification, 35, 389-402. doi: 10.1177/0145445511405512
Rapp, J. T., Colby-Dirksen, A. M., Michalski, D. N., Carroll, R. A., & Lindenberg, A. M. (2008). Detecting changes in simulated events using partial-interval recording and momentary time sampling. Behavioral Interventions, 23, 237-269. doi: 10.1002/bin.269
Repp, A. C., Deitz, D. E. D., Boles, S. M., Deitz, S. M., & Repp, C. F. (1976). Differences among common methods for calculating interobserver agreement. Journal of Applied Behavior Analysis, 9, 109-113. doi: 10.1901/jaba.1976.9-109
Rolider, N. U., Iwata, B. A., & Bullock, C. E. (2012). Influences of response rate and distribution on the calculation of interobserver reliability scores. Journal of Applied Behavior Analysis, 45, 753-762. doi: 10.1901/jaba.2012.45-753
Smith, B. R., Spooner, F., Jimenez, B. A., & Browder, D. (2013). Using an early science curriculum to teach science vocabulary and concepts to students with severe developmental disabilities. Education and Treatment of Children, 36, 1-31. doi: 10.1353/etc.2013.0002
Wirth, O., Slaven, J., & Taylor, M. A. (2014). Interval sampling methods and measurement error: A computer simulation. Journal of Applied Behavior Analysis, 47, 83-100. doi:10.1002/jaba.93
Wright-Gallo, G. L., Higbee, T. S., Reagon, K. A., & Davey, B. J. (2006). Classroom-based functional analysis and intervention for students with emotional/behavioral disorders. Education and Treatment of Children, 29, 421-436.
Jessica L. Becraft
John C. Borrero
Barbara J. Davis
Amber E. Mendres-Smith
University of Maryland, Baltimore County
Author note: We would like to thank Sayeh Chaharbaghi, Patricia Hallberg, and Kathleen Hand for their assistance with data collection. Completion of this study and preparation of this manuscript were supported in part by Grant R01HD049753 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). Its contents are solely the responsibility of the authors and do not represent the official views of NICHD.
Address correspondence to: J. C. Borrero, Department of Psychology, University of Maryland, Baltimore County, 1000 Hilltop Road, Baltimore, MD 21250. Phone: 410-455-2326. Email: firstname.lastname@example.org
Table 1
Momentary Time Sampling Interpolation Intervals

Measurement     Number of       Percentage of
interval (s)    observations    observation      Interpolation rule
5               216             100.0            Each interval
10              108              50.0            Every second interval
20               54              25.0            Every fourth interval
30               36              16.7            Every sixth interval
60               18               8.3            Every 12th interval
90               12               5.6            Every 18th interval
120               9               4.2            Every 24th interval
180               6               2.8            Every 36th interval
360               3               1.4            Every 72nd interval
540               2               0.9            108th and final interval
1080              1               0.5            Final interval only

Table 2
Mean Difference of Occurrence IOA Scores during the 5-s MTS Interval and the 30-, 60-, and 90-s Rotating MTS Intervals

MTS interval      M (10)    5-s      Rot. 90-s   Rot. 60-s   Rot. 30-s
M (19)                      75.38    56.96       46.46       19.37
5-s               75.38     --       18.43       28.92       56.01
Rotating 90-s     51.86     23.53    --          10.50       37.59
Rotating 60-s     48.60     26.79     3.26       --          27.09
Rotating 30-s     23.59     51.80    28.27       25.01       --

Note. Values above the diagonal represent the mean difference in occurrence IOA between MTS conditions with 19 students; values below the diagonal represent the mean difference in occurrence IOA between MTS conditions with the sample of 10 students. M (19) and M (10) are the mean occurrence IOA scores for each condition with 19 and 10 students, respectively. Conditions are listed in descending order of mean IOA. Dashes appear along the diagonal because a difference between a condition and itself cannot be calculated. IOA = interobserver agreement; MTS = momentary time sampling.
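The interpolation rule in Table 1 can be sketched as follows. This is our illustration, not the authors' software; the function name is an assumption, and whether the first or last moment in each block is kept is also an assumption (here, the last, i.e., the moment a slower observer would next look up).

```python
# A minimal sketch of the Table 1 interpolation rule: a larger MTS value
# is derived from the 5-s record by keeping every (interval / 5)th
# momentary sample. Illustrative only; not the authors' code.

def interpolate_mts(record_5s, interval_s, base_s=5):
    """Return the subset of momentary samples a slower MTS would capture.

    record_5s:  list of 0/1 samples taken every base_s seconds.
    interval_s: the larger MTS interval to simulate (a multiple of base_s).
    """
    step = interval_s // base_s
    # Keep the step-th, 2*step-th, ... samples from the 5-s record.
    return record_5s[step - 1::step]

# A 1080-s observation yields 216 five-second samples.
record = [0] * 216
assert len(interpolate_mts(record, 10)) == 108   # every second interval
assert len(interpolate_mts(record, 30)) == 36    # every sixth interval
assert len(interpolate_mts(record, 90)) == 12    # every 18th interval
assert len(interpolate_mts(record, 1080)) == 1   # final interval only
```

The observation counts produced by the slice match the "Number of observations" column for each interval size in Table 1.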
Publication: Education & Treatment of Children
Date: February 1, 2016