
Comparison of Probe Procedures in the Assessment of Chained Tasks.

The fields of special education and behavioral science have devoted considerable attention to identifying evidence-based practices (EBPs) for individuals with disabilities (Cook & Odom, 2013; Slocum et al., 2014). Many of these practices have been empirically evaluated using single-case designs (SCDs). This is not surprising given the benefits and historical precedent of using SCDs with individuals with disabilities when evaluating behavioral interventions (Baer, Wolf, & Risley, 1968; Horner et al., 2005). To validate practices as evidence based, the methods used should minimize possible threats to internal validity. For example, testing threats may occur when repeated measurement alone produces therapeutic or contratherapeutic changes in the value of the dependent variable (Gast, 2014). Uncontrolled facilitative effects can cause the dependent variable to improve in baseline, confounding interpretation of the intervention effects. An improving baseline may also prevent or delay the participant from receiving intervention (Johnston & Pennypacker, 2009). Inhibitive testing effects can also occur, suppressing responding and prompting an earlier introduction of intervention than would otherwise occur. In this scenario, visual analysis may attribute an overestimated effect to the independent variable. Unfortunately, testing threats in baseline are difficult to assess, given the likelihood of publication bias in the case of facilitative effects (e.g., these effects may result in no identification of a functional relation or a smaller effect, leading to less compelling findings and lower likelihood of publication; Shadish, Zelinsky, Vevea, & Kratochwill, 2016) and the inability to differentiate inhibitive effects from true effects (e.g., a baseline consisting of 0% correct responding may reflect behaviors the participant truly cannot perform or inhibitive testing effects).
Content validity, the degree to which the measurement procedure captures the true value of a participant's ability, is another consideration when evaluating measurement procedures (Gast, 2014). Like testing threats, problems with content validity may lead to under- or overestimating the potency of the intervention (Johnston & Pennypacker, 2009). It is critical that EBPs for individuals with disabilities be identified on the basis of studies using valid measurement procedures.

Much of the research on daily living and vocational instruction for individuals with disabilities focuses on teaching skills that occur in a chain, such as prevocational (e.g., Smith et al., 2013) and daily living skills (e.g., Godsey, Schuster, Lingo, Collins, & Kleinert, 2008). Assessors may face challenges in measuring an individual's baseline performance on a chained task, which is inherently more complex than measuring a single discrete response (Noell, Call, & Ardoin, 2011). For example, mastered steps may go undetected if they are preceded by steps not yet mastered. Chained tasks are typically measured with one of two procedures: the single-opportunity probe (SOP) or the multiple-opportunity probe (MOP; Snell & Brown, 2000).

When SOP procedures are used, the assessor presents an opportunity to perform the first step of the task. The session continues until the participant makes an error or completes all steps correctly. If the participant makes an error, the assessor scores the error and all subsequent (not attempted) steps as incorrect. Despite compelling rationales for their use (e.g., time and cost efficiency; Moon, Inge, Wehman, Brooke, & Barcus, 1990), SOP procedures pose problems in applied research. From an operant view, SOP procedures can affect the participant's future responding via exposure to punishment, extinction, and reinforcement. When the evaluation ends contingent on the first incorrect response (i.e., error), future attempts to complete steps may decrease (i.e., attempts are punished). Likewise, if opportunities for reinforcement of correct responding become unavailable, extinction of previously reinforced behaviors may occur, resulting in future nonresponding (Farlow, Lloyd, & Snell, 1987). Lastly, if the repeated inability to perform the task or its steps is aversive to the participant, inappropriate behaviors or incorrect responses may be negatively reinforced when the session is stopped and the task is removed. Given these potential problems, visual analysis of graphed SOP data may lead to false conclusions regarding baseline performance. Plotted data may suggest that a participant can do few, if any, steps of a task analysis simply because sessions end after a single error. Thus, SOP procedures may compromise content validity (via suppression of baseline responding) and lead to overestimated intervention effects.
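The SOP scoring rule above can be made concrete with a short Python sketch. It assumes, purely for illustration, that a participant's correct or incorrect performance on each step is known in advance (the hypothetical `can_perform` list); `score_sop` and `percent_correct` are illustrative names, not part of the study's materials.

```python
def score_sop(can_perform):
    """Score a single-opportunity probe (SOP) trial.

    can_perform: hypothetical list of booleans in task-analysis order,
    True if the participant would perform that step correctly.
    The trial ends at the first error, so the error and every later
    (not attempted) step are scored incorrect.
    """
    scored = []
    trial_ended = False
    for correct in can_perform:
        if trial_ended:
            scored.append(False)   # not attempted: scored incorrect
        elif correct:
            scored.append(True)
        else:
            scored.append(False)   # the error itself
            trial_ended = True     # assessor ends the session here
    return scored


def percent_correct(scored):
    """Percentage of task-analysis steps scored correct."""
    return round(100 * sum(scored) / len(scored), 1)


# Hypothetical participant who errs on Step 1 but could do Steps 2-6.
ability = [False, True, True, True, True, True]
print(percent_correct(score_sop(ability)))  # 0.0
```

Five of the six steps are in this hypothetical repertoire, yet the SOP record shows 0%, which is the content-validity problem described in the paragraph above.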

MOP procedures address many of the weaknesses of SOP procedures by giving individuals an opportunity to complete each step in the task. When an error occurs, the assessor arranges the environment out of view of the participant and then gives the participant an opportunity to complete the next step. The session ends when either the participant or the assessor performs the last step in the chain. Thus, the individual can demonstrate his or her ability on every step, even if he or she does not perform all steps correctly in sequence (Snell & Brown, 2000). In chained behaviors, the environmental arrangement following the completion of one step functions as the discriminative stimulus (SD) for initiating the next step (Noell et al., 2011). When the assessor arranges the environment (i.e., completes the step correctly out of view of the participant), that arrangement should signal the availability of reinforcement for the response required in the next step. Because every step can be attempted, the probe can document the presence of the correct behaviors needed to complete individual steps of a task analysis (Gast, 2014). Additionally, if identified reinforcers are delivered contingent on correct responding in baseline, MOP procedures provide an opportunity to reinforce every correct response, potentially improving task engagement.
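For comparison, a minimal sketch of MOP scoring, under the same simplifying assumption that per-step performance is fixed and known in advance (`can_perform` and `run_mop_trial` are hypothetical names). Because the assessor completes any missed step out of view, an error does not end the trial and every step is scored on its own. In real sessions, performance on a step may change from one session to the next, which is precisely the facilitative threat this study examines.

```python
def run_mop_trial(can_perform):
    """Simulate and score one multiple-opportunity probe (MOP) trial.

    can_perform: hypothetical list of booleans in task-analysis order,
    True if the participant performs the step correctly when its SD
    (the arrangement left by the previous step) is present.
    Returns (scores, completed_by) with one entry per step.
    """
    scores, completed_by = [], []
    for correct in can_perform:
        scores.append(correct)
        # After an error the assessor completes the step out of the
        # participant's view, so the next step still starts from its
        # usual arrangement instead of the trial ending.
        completed_by.append("participant" if correct else "assessor")
    # The trial ends once the last step is done by either person.
    return scores, completed_by


scores, who = run_mop_trial([False, True, True, True, True, True])
print(round(100 * sum(scores) / len(scores), 1))  # 83.3
print(who[0])                                     # assessor
```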

Although MOP procedures address many of the concerns with using SOP procedures, testing threats exist. Researchers have anecdotally noted the risk of facilitative (Farlow, Lloyd, & Snell, 1988) and inhibitive testing effects (Schuster, Gast, Wolery, & Guiltinan, 1988) when using MOP procedures. Others have abandoned the recommendation of using MOP procedures and instead used SOP procedures in an attempt to reduce the likelihood that the participant will acquire the skill in baseline (Tekin-Iftar, 2008). Although improving behavior is typically the goal of applied research, SCD research uses strict guidelines for stable baseline responding prior to implementation of the intervention. Changes in baseline data in a therapeutic direction compromise an assessor's ability to evaluate a functional relation and call into question the need for intervention (Cooper, Heron, & Heward, 2007). However, although participants' behavior may improve with repeated testing, they are not receiving intervention with practices likely to be more effective or efficient than learning from MOP procedures. Further, if repeated testing is sufficient for acquisition of targeted chained behaviors, the value of any instructional variable the assessor intended to evaluate may be questionable. Thus, weaknesses exist with both SOP procedures and MOP procedures for the assessment of chained behaviors.

To address documented problems, researchers have proposed and have preliminarily begun using an alternative procedure: the fixed-opportunity probe (FOP; previously referred to as the natural opportunity probe in Shepley, Smith, Ayres, & Alexander, 2017). With FOP procedures, the participant is given a set amount of time to complete the entire task (e.g., 2 min) unless he or she fails to correctly complete any steps within a shorter amount of time (e.g., 30 s). The total time is calculated by multiplying the number of steps by the number of seconds given for each step. For example, if evaluating a participant's baseline performance on assembling a flashlight--(a) insert battery, (b) insert second battery, (c) screw light bulb in top section, (d) assemble light bulb cover, and (e) connect body to top--where each step must be completed within 10 s, the experimenter would give the participant 50 s to complete the entire task. The first occurrence of a step adhering to the topographical definition is scored as correct (e.g., inserting the first battery in the correct way or securely placing the light bulb on top). Sequence is only considered when the order of steps is critical to correct task completion (e.g., Step 5 is critical to Step 4). The probe ends when one of the following occurs: (a) the participant notifies the assessor that he or she is finished, (b) a predetermined interval of time elapses without the completion of a correct step (e.g., 30 s), (c) the session timer ends (e.g., 50 s), or (d) the participant completes all steps correctly. Comparatively, not completing a sequential step correctly within 10 s would end the trial with SOP; with MOP, this would result in the assessor completing the step out of sight and the trial ending following either the assessor or participant completing all steps.
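The FOP time budget and stopping rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: `FOPSettings` and `fop_end_reason` are hypothetical names, the order in which simultaneous stopping conditions are checked is an assumption, and the no-progress interval is assumed to be timed from the task direction or the most recent correct step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FOPSettings:
    n_steps: int
    seconds_per_step: int    # e.g., 10 s per step in the flashlight example
    no_progress_limit: int   # e.g., 30 s without a correct step

    @property
    def session_limit(self) -> int:
        # Total time = number of steps x seconds allowed per step.
        return self.n_steps * self.seconds_per_step


def fop_end_reason(settings: FOPSettings, elapsed: float,
                   since_last_correct: float, steps_correct: int,
                   participant_says_done: bool = False) -> Optional[str]:
    """Return why an FOP probe ends, or None if it should continue.

    Assumption: since_last_correct is timed from the task direction or
    the most recent correct step, whichever is later.
    """
    if participant_says_done:
        return "participant finished"
    if since_last_correct >= settings.no_progress_limit:
        return "no correct step within limit"
    if elapsed >= settings.session_limit:
        return "session timer ended"
    if steps_correct == settings.n_steps:
        return "all steps correct"
    return None


flashlight = FOPSettings(n_steps=5, seconds_per_step=10, no_progress_limit=30)
print(flashlight.session_limit)               # 50
print(fop_end_reason(flashlight, 35, 31, 2))  # no correct step within limit
```

With six steps and 10 s per step (5 s to initiate plus 5 s to complete), the same calculation gives a 60-s session.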

Although preliminary, FOP procedures have the potential to solve problems associated with SOP and MOP procedures. Specifically, FOP procedures give the participant an opportunity to complete all steps of a task while decreasing the likelihood that the participant will stop responding (an inhibitive effect; SOP procedures) and decreasing the likelihood that the participant will learn via exposure during baseline (a facilitative effect; MOP procedures). Although opportunities for correct responding for each step exist, the environment is not explicitly arranged (i.e., relevant SD for each step are not provided).

Alexander, Smith, Mataras, Shepley, and Ayres (2015) conducted a meta-analysis of SCD studies measuring chained tasks using multiple baseline and probe designs. Specifically, the researchers were interested in determining whether responding differed when SOP or MOP procedures were used to assess baseline performance (i.e., absolute-level change within and across conditions and slope within conditions). No statistically significant differences were found between data from studies using MOP versus SOP procedures. The authors concluded that the limited number of articles found or publication bias may have contributed to the nonsignificant findings. Additionally, Alexander et al. proposed that the differential effects of probe procedures should be evaluated experimentally. Therefore, the current study sought to answer the following research questions: (a) What are the testing effects of SOP, MOP, and FOP procedures on responding during chained nonsense tasks? and (b) What are the potential implications of differences in data patterns?

Method

Participants

Twenty-one individuals volunteered for the study from a pool of 39 college students enrolled in special education courses at a large university. The experimenters recorded the order in which participants returned written consent and selected the first 12 responders for the study. Two of the original participants failed to attend the first session. Of these, one dropped out based on limited availability, and the other did not return emails requesting times for rescheduling a makeup session. Therefore, two additional volunteers were recruited for participation from the original list.

Participants met the following criteria, as verified via self-report: (a) enrolled as an undergraduate or graduate student in special education or school psychology, (b) naive to the procedures or purpose of the study, and (c) no sensory impairments that could prohibit participation in block-building tasks. Participants' ages ranged from 21 to 27 years (M = 23); one man and 11 women participated. Before they were exposed to any of the three assessment procedures, participants answered two questions: (a) "What are the two most widely used probe procedures for assessing chained tasks (if unsure, take a guess or answer with 'I don't know')?" and (b) "If you are familiar with them, have you used one, both, or neither of the probe procedures in practice or research?" Six participants answered the first question correctly; of those, five reported using one or both procedures. Based on the courses the students were taking at the time of the study, all had been exposed to information regarding SOP and MOP procedures.

Setting and Arrangements

All sessions took place in one of five rooms located in the College of Education building. Each room included a table and one to three chairs. Additional materials used by other individuals occupying the space were present (e.g., dry erase markers, sanitizing wipes, file cabinet). The participant completed tasks while sitting at the table; the experimenter stood to the right and the secondary reliability observer stood to the left. No other individuals were present.

Materials

The experimenter created three sets of five large blocks (i.e., Mega Bloks). Each set was given a name to represent a specific nonsense structure (i.e., ruzzer, lifton, galtee). Sets were created from a pool of 15 similar but nonidentical blocks in a variety of colors (red, yellow, green, blue, white, lime, turquoise), shapes (single, long, square, slope, two), and sizes (short, tall). The experimenter counterbalanced color, shape, and size across the three sets to decrease the likelihood of differential difficulty in building structures. A black-and-white grid was printed and laminated for use during all sessions. The grid measured 12.7 × 12.7 cm and was composed of 16 (4 × 4) 3.18-cm squares, each labeled with a letter and a number (i.e., A1-D4) in 36-point Times New Roman font. The blocks and nonsense structures were chosen not only to allow for tasks of similar difficulty across procedures but also to simulate situations in which participants might be familiar with the materials and their functions but are initially unable to complete the task (e.g., typical of situations in which functional skills are taught to individuals with disabilities).

Response Definitions and Recording Procedures

Data Collection. Trained observers independently collected data on responding for each task. Observers scored steps as correct when participant behavior met the topographical definition. In the SOP and MOP conditions, observers scored a step as incorrect if a topographical, sequential, latency, or duration error occurred. In the SOP and FOP conditions, observers scored all steps not attempted or completed incorrectly as incorrect at the end of the session (see Table 1).
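A sketch of how a single SOP or MOP step might be classified against the four error types. The 5-s initiation and completion limits come from the SOP and MOP condition descriptions; the function name, the boolean inputs, and the order in which error types are checked (a step with several errors receives only the first label) are assumptions for illustration.

```python
from typing import Optional

INITIATE_LIMIT_S = 5.0   # seconds to initiate (touch the correct block)
COMPLETE_LIMIT_S = 5.0   # seconds to complete the step once initiated


def classify_step(matches_topography: bool, in_required_order: bool,
                  latency_s: float, duration_s: float) -> Optional[str]:
    """Return the first error type that applies to a step, or None.

    Assumed check order: latency, duration, sequence, topography.
    Responding exactly at a limit is treated as within the limit.
    """
    if latency_s > INITIATE_LIMIT_S:
        return "latency"
    if duration_s > COMPLETE_LIMIT_S:
        return "duration"
    if not in_required_order:
        return "sequential"
    if not matches_topography:
        return "topographical"
    return None


print(classify_step(True, True, latency_s=3.2, duration_s=4.0))  # None
print(classify_step(True, True, latency_s=6.0, duration_s=1.0))  # latency
```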

Task Analyses. Each task (i.e., building of a nonsense structure) included six steps and incorporated both critical and noncritical steps (Williams & Cuvo, 1986). Critical steps are those that must be completed prior to subsequent steps; noncritical steps can occur in any order. For example, to complete the third step in a task of washing clothes in a washing machine (i.e., pour detergent into machine), a participant must remove the cap from the container (Step 1, critical for Step 2) and then pour the detergent into the cap (Step 2, critical for Step 3). Each task was designed to simulate typical chained tasks by including steps that were critical to the completion of other steps. For each task, Steps 1 and 2 were critical for Step 3 to occur, and Step 5 was critical for Step 6 to occur.
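The critical-step structure of these tasks can be represented as a small mapping from each step to the steps that must precede it. The sketch below, with hypothetical names `PREREQUISITES` and `credited_steps`, credits a step only when its critical predecessors have already been credited, mirroring the rule that sequence matters only for critical steps when order is otherwise free; how a later repeat of an out-of-order step should be scored is an assumption here.

```python
# Critical-step structure described for every task in this study:
# Steps 1 and 2 must precede Step 3, and Step 5 must precede Step 6.
PREREQUISITES = {3: {1, 2}, 6: {5}}


def credited_steps(completion_order, prerequisites=PREREQUISITES):
    """Return the set of steps credited for a given completion order.

    A step is credited only if all of its critical predecessors were
    already credited; noncritical steps may occur in any order.
    Repeats of an already-credited step are ignored. Assumption: an
    out-of-order critical step is not credited but may be credited if
    it is performed again after its prerequisites.
    """
    done = set()
    for step in completion_order:
        if step in done:
            continue
        if prerequisites.get(step, set()) <= done:
            done.add(step)
    return done


print(sorted(credited_steps([2, 1, 3, 4, 6, 5])))  # [1, 2, 3, 4, 5]
```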

Each task also included one manipulation of a block already placed, to mimic chained tasks in which the same material is manipulated more than once and the later manipulation does not affect correct completion of the final product. For example, when preparing a peanut butter sandwich, Step 1 (i.e., open jar) is necessary; Step 4 (i.e., close jar) is not necessary to correctly create the sandwich but may be desirable to prevent spoiling. The order and number of steps in which placement changed were consistent across all tasks to maintain similar response difficulty. Color photos of block structures and a list of steps for completing each structure are available from the first author.

Experimental Design

An adapted alternating treatments design (AATD) was used for each of the 12 participants to compare the effects of SOP, MOP, and FOP procedures on acquisition of chained nonsense tasks. Traditional use of AATDs includes baseline data on all skills and compares responding with at least two different interventions on at least two sets of behaviors (Wolery, Gast, & Ledford, 2014). In this study, SOP, MOP, and FOP procedures were compared and treated as three independent variables. One structure was assigned to each assessment procedure for each participant (see Table 2). Assignments were counterbalanced across participants; two participants were assigned to each possible variation. This allowed the experimenter to compare the effects of repeated exposure to each probe procedure and helped detect differential task difficulty. Data were collected for six sessions across all participants and conditions to allow for visual inspection and comparison of effects.
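With three structures and three procedures there are 3! = 6 ways to pair them, and assigning two participants to each pairing yields the 12 participants described. The sketch below enumerates those pairings; it does not reproduce the actual assignments in Table 2, and `assignment_plan` is a hypothetical helper.

```python
from itertools import permutations

PROCEDURES = ("SOP", "MOP", "FOP")
STRUCTURES = ("ruzzer", "lifton", "galtee")


def assignment_plan(participants_per_variation=2):
    """Enumerate every structure-to-procedure pairing.

    Each of the 3! = 6 pairings is listed once per participant assigned
    to it, giving 6 x 2 = 12 assignments by default.
    """
    plan = []
    for structures in permutations(STRUCTURES):
        mapping = dict(zip(PROCEDURES, structures))
        plan.extend(dict(mapping) for _ in range(participants_per_variation))
    return plan


plan = assignment_plan()
print(len(plan))  # 12
print(plan[0])    # {'SOP': 'ruzzer', 'MOP': 'lifton', 'FOP': 'galtee'}
```

Because every structure is paired with every procedure equally often, a structure that is simply harder to build should show up as a task effect across procedures rather than masquerading as a procedure effect.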

Interobserver Agreement (IOA) and Procedural Reliability (PR)

IOA and PR data were collected for all probe procedures for 33% of sessions for each participant. To calculate IOA, data from each step of the task analyses were compared and scored as agreed or disagreed. Percent agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100 (Ayres & Ledford, 2014). Mean agreement for the SOP, MOP, and FOP conditions was 100%, 100%, and 98.6% (range of 83%-100%), respectively. To assess PR, the observer rated each experimenter behavior (e.g., blocks set to the left of the grid, task direction given, praise provided) as correct, incorrect, or not applicable. To calculate PR, the number of correct responses was divided by the sum of correct plus incorrect responses and multiplied by 100 (Ayres & Ledford, 2014). The mean PR across participants for the SOP, MOP, and FOP conditions was 99.4% (range of 93%-100%), 98.5% (range of 91.5%-100%), and 99.4% (range of 93%-100%), respectively. The overall mean was 99.1%.
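The IOA and PR formulas translate directly into code. A minimal sketch with hypothetical function names; step records are assumed to be aligned lists of scores for the same task-analysis steps, and PR ratings are assumed to be the strings "correct", "incorrect", and "n/a".

```python
def percent_agreement(observer_a, observer_b):
    """Point-by-point IOA across task-analysis steps."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both records must cover the same steps")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    disagreements = len(observer_a) - agreements
    return 100 * agreements / (agreements + disagreements)


def procedural_reliability(ratings):
    """PR from ratings of 'correct', 'incorrect', or 'n/a'.

    Not-applicable items are excluded from the denominator.
    """
    correct = ratings.count("correct")
    incorrect = ratings.count("incorrect")
    return 100 * correct / (correct + incorrect)


primary = [1, 1, 0, 0, 0, 0]    # 1 = step scored correct
secondary = [1, 0, 0, 0, 0, 0]
print(round(percent_agreement(primary, secondary), 1))  # 83.3
```

On a six-step task, a single disagreement yields 5/6, or about 83%, which is consistent with the lowest IOA value reported for the FOP condition.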

Procedures

Each session included three trials (one for each probe procedure). One or two sessions occurred each day for 1 to 2 days per week, with at least 1 hr between sessions. All participants completed the sessions within 2 weeks (range of 5-14 days, M = 8.5 days). The three trials in a session could be ordered in six possible ways; these orders were randomly assigned and counterbalanced for each participant in an attempt to control for potential order effects. At the beginning of the session, the experimenter positioned the grid on the table in front of the participant. For each trial, the experimenter placed the corresponding blocks for the task on the table to the left of the grid and delivered the task direction (e.g., "Create the ruzzer"). The experimenter provided a single direction to complete the nonsense task to simulate a typical testing situation in which the target behavior is unknown but the participant likely has a learning history with the materials (e.g., an individual who is being assessed on his ability to make a sandwich has likely encountered bread and deli meat and thus is familiar with how they might be manipulated). The experimenter delivered general, nondescript praise contingent on correct responses and at the end of each trial. Specific steps for SOP, MOP, and FOP procedures are described in the following sections and are displayed in Table 3.
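One way the six possible trial orders might be randomized and counterbalanced is to shuffle all 3! orders independently for each participant and use each order once across the six sessions. This scheme is an assumption (the text does not specify it), and `session_orders` is a hypothetical helper; seeding `random.Random` makes a participant's schedule reproducible.

```python
import random
from itertools import permutations


def session_orders(seed=None):
    """Return six trial orders, one per session, using each of the
    3! = 6 orders of SOP, MOP, and FOP exactly once (assumed scheme)."""
    orders = list(permutations(("SOP", "MOP", "FOP")))
    rng = random.Random(seed)  # seeded for a reproducible schedule
    rng.shuffle(orders)
    return orders


for session, order in enumerate(session_orders(seed=1), start=1):
    print(f"Session {session}: " + " -> ".join(order))
```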

SOP Condition. The participant had 5 s to initiate (i.e., touching the correct block) and 5 s to complete each step in the task, beginning with the provision of the task direction. The trial continued until the participant made an error or completed all steps correctly. When an error occurred, the experimenter stopped the participant from engaging with the materials by saying "stop," and the trial ended.

MOP Condition. As in the SOP condition, the participant had 5 s to initiate (i.e., touching the correct block) and 5 s to complete each step in the task analysis. If the participant made an error, the experimenter asked the participant to close his or her eyes, blocked his or her view with a binder, completed the step, asked the participant to open his or her eyes, and allowed the participant 5 s to initiate the next step. The trial ended when either the experimenter or the participant completed the last step in the task analysis.

FOP Condition. Following the task direction, the participant had a total of 60 s to complete the entire task (i.e., [5 s to initiate + 5 s to complete] × 6 steps). The trial ended when 60 s elapsed, 30 s elapsed without a correct response, or the participant completed all steps.

Results

Correct responding was measured via direct observation, calculated as a percentage correct, and evaluated via visual analysis in the context of the AATD. Figure 1 displays the individual probe data for all 12 participants. Each row of graphs contains data from the two participants who were assigned to the same condition variation (e.g., MOP procedures with ruzzer structure). Figure 2 displays participants' mean responding across each probe procedure.

SOP Condition. Toni was the only participant to respond correctly; she completed one step correctly in the last session. Responding for the other 11 participants was 0% in all sessions.

MOP Condition. All but one participant (i.e., Kaylee; 17%) responded with 0% accuracy during the first session. By the second session, half of the participants increased responding to between 17% and 50% correct (i.e., between one and three steps correct; M = 16.7%). By the sixth session, half of the participants obtained 100% correct responding, with the remainder responding between 17% and 83% correct by the last session (M = 80.5%). Within-condition analyses showed that 11 participants increased correct responding from the first to the last session, with a mean absolute-level change of 79.1% (i.e., the value of the last data point minus the value of the first data point; Gast & Spriggs, 2014). Relative-level changes were also calculated and resulted in a mean of 51.3% across participants (i.e., the value of the median of the last 3 data points minus the value of the median of the first 3 data points; Gast & Spriggs, 2014).
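The two within-condition metrics can be written as short functions. This sketch assumes session data are stored as percentages correct for a six-step task (one step = 1/6, reported as 17%); the function names and the six-session series are hypothetical, not data from Figure 1.

```python
from statistics import median


def percent_correct(steps_correct, n_steps=6):
    """Percentage of task-analysis steps completed correctly."""
    return 100 * steps_correct / n_steps


def absolute_level_change(series):
    """Last data point minus first data point."""
    return series[-1] - series[0]


def relative_level_change(series, k=3):
    """Median of the last k data points minus median of the first k."""
    return median(series[-k:]) - median(series[:k])


# Hypothetical six-session record, in steps correct per session.
steps = [0, 1, 1, 3, 5, 6]
series = [round(percent_correct(s)) for s in steps]
print(series)                         # [0, 17, 17, 50, 83, 100]
print(absolute_level_change(series))  # 100
print(relative_level_change(series))  # 66
```

Applied to the SOP data, where Toni completed one step only in the last session and everyone else stayed at 0%, the absolute-level changes are 17 for Toni and 0 for the other 11 participants; 17/12 ≈ 1.4, matching the SOP mean absolute-level change of 1.4%.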

FOP Condition. Correct responding ranged from 0% to 50%; seven participants responded correctly at least once, and the other five remained at 0% for all sessions. Four participants engaged in correct responding in one session and completed one step correctly (i.e., 17%). The remaining three participants (i.e., Kassie, Aricia, and Adele) responded correctly in more than one session. Aricia, Liza, and Adele increased correct responses from the first session to the last session (i.e., 33%, 17%, and 33%), and Kassie decreased correct responses (i.e., 17%). The mean absolute-level change across participants was 5.5%. Aricia was the only participant with a relative increase in trend (i.e., 17%). Kassie's and Adele's data demonstrated a relative decrease in trend (i.e., -17% and -33%). The mean relative-level change across the participants was -2.8%.

Across Conditions. Variability was relatively low across all probe procedures. Stable responding was almost always associated with the SOP condition, where responding remained at 0% for the majority of sessions. Some variability was observed in both the FOP and MOP conditions, but more variability occurred in the MOP condition. When looking at changes in trend within probe procedures, absolute and relative change was lowest in the SOP condition (i.e., 1.4% and 0%, respectively). FOP conditions resulted in a mean absolute-level increase of 5.5% but a decrease of 2.8% when calculating relative change. MOP conditions resulted in the greatest mean increase for both absolute-level (79.1%) and relative-level (51.3%) changes.

Discussion

The purpose of this study was to evaluate possible testing threats with SOP, MOP, and FOP procedures. When we examined the effects of probe procedures on completion of nonsense tasks, distinct patterns of responding emerged. First, visual analysis showed stable data in the SOP condition, with 11 participants making no correct responses in any session, suggesting a possible inhibitive testing effect. Although it is unlikely that the participants had prior knowledge of how to build the structures, they likely had block building in their repertoires. This is analogous to steps in chains that individuals may know how to complete out of sequence or within other chains. Consider two task analyses related to washing surfaces, where the participant can perform one task at 100% (e.g., cleaning a window) and a new one is being assessed (e.g., cleaning a wooden table). Both tasks have similar steps, such as wiping the surface, but differ in how the solution gets to the surface (e.g., a Windex bottle for windows vs. a Pledge bottle for furniture). In this case, the wiping behavior is in the participant's repertoire, but if a SOP procedure were used and the participant performed the spray step incorrectly, wiping would never be assessed on the furniture-cleaning task.

All participants responded correctly at some point during MOP conditions, and half achieved 100% correct responding between Sessions 4 and 6. Most data in the MOP conditions were moderately variable, with steep trends and large absolute- and relative-level changes, indicative of a facilitative testing effect. From an operant perspective, this likely occurred because the experimenter's completion of the step served as a model that increased the likelihood that the participant would engage in the correct behavior in the next session and thus contact reinforcement (i.e., praise for correct responding or simply negative reinforcement in the form of completing a step and being closer to finishing the task). Accessing reinforcement thus resulted in maintenance of previously acquired steps and acquisition of new steps. This finding is important given the widespread use of MOP procedures to assess chained tasks and the little attention given to facilitative testing threats. Although texts and published studies discuss such concerns, to the best of the researchers' knowledge this was the first time the effects were experimentally evaluated.

FOP conditions resulted in two patterns of responding: Five participants maintained 0% correct responding across all sessions, and seven participants engaged in some correct responding (i.e., range of 0%-50%). Of these seven, four participants correctly performed a step during one trial but failed to repeat the correct response during subsequent trials (Jonas, Liza, Kassie, and Kaylee); one participant correctly performed a step and maintained correct performance (Aricia); one participant correctly performed three steps but performed them variably (e.g., performed Step 4 in Trials 1, 2, 3, and 6; performed Step 1 in Trials 2, 3, and 6; performed Step 2 in Trials 2 and 6); and one participant performed a step correctly during the last trial only. Overall, no participant demonstrated consistent learning over time (consistent with a testing threat), with the exception of Aricia, who performed zero, then one, then two steps correctly and maintained responding through Session 6.

Variable responding across participants is interesting because it may illustrate that for some steps (e.g., noncritical), participants had more of an opportunity to respond when compared to the SOP condition but did not necessarily acquire or maintain responding like they did in the MOP condition. Alternatively, the FOP condition may have resulted in similar inhibitive effects as the SOP condition for some steps (e.g., critical). Unlike a MOP procedure, both FOP and SOP procedures do not include an opportunity for participants to perform steps in which the previous step is completed for them following an error or no response. Additionally, studies are needed to evaluate trends in responding during FOP procedures.

In summary, SOP conditions produced zero-celerating stable data, MOP conditions produced accelerating trends with large level changes, and FOP conditions produced variable but minimal responding. Data comparisons confirm the potential presence of threats for SOP and MOP procedures. For example, if using data from SOP procedures alone, the likely conclusion would be that participants were unable to complete the task. However, correct responding occurred in both the FOP and MOP conditions. Further, participants learned in the MOP condition even though the steps of the nonsense tasks were not logically related, unlike tasks in which setting up the environment for a step may evoke a behavior previously acquired within a different chain (e.g., holding a cup of laundry detergent evokes pouring behavior). Given the minimal correct responding in the SOP and FOP conditions, these data confirm that MOP procedures may result in facilitative testing threats. The results are unlikely to be a by-product of unequal task difficulty, given the counterbalanced assignment of structures to procedures across participants.

Implications for Research

The results of this study suggest that MOP procedures are likely to lead to facilitative testing threats and that SOP procedures may lead to inhibitive testing effects. We recommend that researchers avoid using MOP procedures alone when possible and instead use other assessment methods such as combining procedures or exploring new ones. For example, a researcher could conduct the first baseline session with a MOP procedure and then use FOP procedures for the remainder of the sessions. With this method, the participant is initially given an opportunity to complete each step to minimize inhibitive testing threats associated with SOP procedures while minimizing facilitative testing threats and acquisition of other unrelated behaviors (e.g., closing eyes between steps in the chain) from repeated exposure to MOP procedures.

Additional research is needed to determine whether facilitative testing threats are more likely to be associated with particular types of tasks or participants. When MOP procedures are used, extended minimum baselines (e.g., a minimum of five vs. three sessions) should be used. This recommendation stems from some participants demonstrating stable responding in Sessions 1 to 3, followed by ascending trends in Sessions 4 to 6. Baseline conditions with a minimum of 5 data points are also consistent with some contemporary recommendations (What Works Clearinghouse, 2010). The potential benefits of extended baseline conditions when using MOP procedures are illustrated in Figure 3. For simulation, each graph contains a condition change line between the first and last sets of 3 data points. If the first three sessions are treated as a baseline, it is clear that intervention might have been introduced for some participants (i.e., the left column in Figure 3), and the researcher may have falsely attributed improvements in behavior to the intervention procedures. Conversely, had the researcher collected baseline data for five sessions, the increasing trend would have alerted the researcher that the participant was learning during the baseline condition (although attributing this learning to the probe procedures rather than to history or maturation effects would not be possible).
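The logic of the simulated condition change line can be mimicked with a hypothetical series that is flat for three sessions and then ascends. The sketch below (hypothetical data and helper names, not values from Figure 3) compares a three-session and a five-session baseline: the within-baseline absolute-level change is 0 when the line falls after Session 3 but large when it falls after Session 5, which is the signal an extended baseline would provide.

```python
from statistics import mean


def split_phases(series, baseline_length):
    """Split a session series at a simulated condition change line."""
    return series[:baseline_length], series[baseline_length:]


# Hypothetical MOP percentages: stable for Sessions 1-3, then ascending.
series = [0, 0, 0, 33, 67, 83]

for n in (3, 5):
    baseline, after = split_phases(series, n)
    within = baseline[-1] - baseline[0]   # absolute-level change in baseline
    apparent_gain = mean(after) - mean(baseline)
    print(f"baseline of {n}: within-baseline change {within}, "
          f"apparent gain {apparent_gain:.1f}")
```

With the shorter baseline, the entire rise lands after the condition line and would look like an intervention effect even though no intervention occurred in this simulation.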

Similarly, SOP procedures should be used with caution and perhaps only following the use of MOP or FOP procedures in which a participant exhibited 0% correct responding. If SOP procedures are used alone, results should be interpreted conservatively to avoid overly attributing large level changes across conditions to the intervention. Additionally, researchers should identify the use of SOP procedures as a limitation potentially tempering the conclusions to be drawn.

Given the lack of valid measurement procedures for chained tasks, researchers should continue to evaluate the variables that make testing threats more likely with SOP and MOP procedures. The results of this study suggest that threats exist when SOP and MOP procedures are used; however, it is unclear which variables increase or decrease the likelihood of such effects. Additionally, alternative options such as FOP procedures or combination approaches should be explored. Although FOP is a promising alternative, data patterns from the FOP condition suggest potential problems as well. For example, some participants built similar structures from one session to the next, suggesting that a chain of errors was reinforced. Other procedural variations (e.g., giving more explicit instructions) may help increase the validity of measurement procedures.

The last suggestion relates to publication bias. Most of the recommendations for the use of particular probe procedures have come from discussion sections or personal experience. Alexander et al. (2015) concluded that publication bias was a possible reason for the statistically nonsignificant differences in baseline trends between SOP and MOP procedures. It is likely that researchers do not publish data when participants learn from MOP procedures during baseline, making the planned intervention unnecessary. Including such data in future research can only advance the field's understanding of patterns in threats to validity from probe procedures.


A number of limitations of this study are important to mention. First, although careful consideration was paid to creating three equivalent tasks, there were some patterns in responding by task type. With FOP procedures, correct responding never occurred with the ruzzer task, and correct responding occurred most frequently with the lifton task. It is possible that the ruzzer task was more difficult when paired with FOP procedures, whereas the lifton task was easier. Second, some participants reported having knowledge of SOP and MOP procedures: half of the 12 participants correctly answered the question about these two types of probe procedures. Knowledge about the procedures could cue participants about how to respond, so they may have responded differently than participants without such knowledge. The only evident pattern between MOP procedures and self-reported knowledge was that the two participants who did not have an ascending baseline were unable to answer the question correctly. Otherwise, participants both with and without self-reported knowledge demonstrated facilitative testing effects. There were no discernible differences in participant performance in FOP conditions based on self-reported knowledge, which is not surprising given the limited dissemination of FOP procedures. Third, it is possible that the praise provided for correct responding served as a reinforcer for accurate block-building behavior, thus accounting for some or all of the reported effects instead of the probe procedures themselves. The decision was made to include praise for correct responding to reflect what is typically observed in research evaluating the effects of an intervention on chained tasks (see Batu, 2008; Ersoy, Tekin-Iftar, & Kircaali-Iftar, 2009; and Stonecipher, Schuster, Collins, & Grisham-Brown, 1999, for examples). Lastly, this study was conducted with college students rather than with the population in which chained tasks are typically assessed. The research question could not easily have been answered using typical chained tasks with individuals with disabilities, because it is essentially impossible to equate task difficulty given heterogeneous steps and participants' differing histories with materials. However, this limits the generality of the findings to individuals with disabilities, which matters because the special education literature and a large portion of the behavioral sciences literature use the probe procedures under investigation. Therefore, researchers should evaluate the effects of these procedures with participants with disabilities.


This study sought to answer two research questions related to probe procedures and their possible threats to measurement validity. The findings suggest that facilitative effects are a testing threat with MOP procedures; therefore, researchers should (a) avoid them altogether, (b) combine them with other procedures, or (c) include a minimum of five baseline sessions with MOP procedures prior to intervention. SOP procedures pose possible inhibitive testing threats and lack content validity; therefore, researchers should avoid them altogether or combine them with other procedures. If SOP or MOP procedures are used, researchers should interpret their results conservatively and acknowledge the use of these procedures as a limitation to content validity. As outlined in this article, researchers and practitioners should consider using FOP procedures when evaluating performance on chained tasks. More research on probe procedures for assessing chained tasks is needed to increase the validity of the measurement procedures being used.

DOI 10.1007/s40732-017-0257-9

Compliance with Ethical Standards All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Conflict of Interest On behalf of all authors, the corresponding author states that there are no conflicts of interest.

Informed Consent Informed consent was obtained from all individual participants included in the study.


Alexander, J. L., Smith, K. A., Mataras, T., Shepley, S. B., & Ayres, K. M. (2015). A meta-analysis and systematic review of the literature to evaluate potential threats to internal validity in probe procedures for chained tasks. Journal of Special Education, 49, 135-145.

Ayres, K. M., & Ledford, J. R. (2014). Dependent measures and measurement procedures. In D. L. Gast & J. R. Ledford (Eds.), Single case research methodology: Applications in special education and behavioral sciences (2nd ed., pp. 124-153). New York, NY: Routledge.

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97.

Batu, S. (2008). Caregiver-delivered home-based instruction using simultaneous prompting for teaching home skills to individuals with developmental disabilities. Education and Training in Developmental Disabilities, 43, 551-555.

Cook, B. G., & Odom, S. L. (2013). Evidence-based practices and implementation science in special education. Exceptional Children, 79, 135-144.

Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Columbus, OH: Pearson.

Ersoy, G., Tekin-Iftar, E., & Kircaali-Iftar, G. (2009). Effects of antecedent prompt and test procedure on teaching simulated mental care skills to females with developmental disabilities. Education and Training in Developmental Disabilities, 44, 54-66.

Farlow, L. J., Lloyd, B. H., & Snell, M. E. (1987, October). Assessing student performance: The effect of procedural contrast between training and probe conditions. Paper presented at the 14th Annual Conference of the Association for the Severely Handicapped, Chicago, IL.

Farlow, L. J., Lloyd, B. H., & Snell, M. E. (1988, March). The implications of the procedural contrast between training and probe conditions on the interpretation of student performance data. Paper presented at the annual conference of the Council for Exceptional Children, Washington, DC.

Gast, D. L. (2014). General factors in measurement and evaluation. In D. L. Gast & J. R. Ledford (Eds.), Single case research methodology: Applications in special education and behavioral sciences (2nd ed., pp. 85-104). New York, NY: Routledge.

Gast, D. L., & Spriggs, A. D. (2014). Visual analysis of graphic data. In D. L. Gast & J. R. Ledford (Eds.), Single case research methodology: Applications in special education and behavioral sciences (2nd ed., pp. 176-210). New York, NY: Routledge.

Godsey, J. R., Schuster, J. W., Lingo, A. S., Collins, B. C., & Kleinert, H. L. (2008). Peer-implemented time delay procedures on the acquisition of chained tasks by students with moderate and severe disabilities. Education and Training in Developmental Disabilities, 43, 111-122.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-179.

Johnston, J. M., & Pennypacker, H. S. (2009). Strategies and tactics of behavioral research (3rd ed.). New York, NY: Routledge.

Moon, M. S., Inge, K. J., Wehman, P., Brooke, V., & Barcus, J. M. (1990). Helping persons with severe mental retardation get and keep employment: Supported employment issues and strategies. Baltimore, MD: Brookes.

Noell, G. H., Call, N. A., & Ardoin, S. P. (2011). Building complex repertoires from discrete behaviors by establishing stimulus control, behavioral chains, and strategic behavior. In W. W. Fisher, C. C. Piazza, & H. S. Roane (Eds.), Handbook of applied behavior analysis (pp. 250-269). New York, NY: Guilford Press.

Schuster, J. W., Gast, D. L., Wolery, M., & Guiltinan, S. (1988). The effectiveness of a constant time-delay procedure to teach chained responses to adolescents with mental retardation. Journal of Applied Behavior Analysis, 21, 169-178.

Shadish, W. R., Zelinsky, N. A., Vevea, J. L., & Kratochwill, T. R. (2016). A survey of publication practices of single-case design researchers when treatments have small or large effects. Journal of Applied Behavior Analysis, 49, 656-673.

Shepley, S. B., Smith, K. A., Ayres, K. M., & Alexander, J. L. (2017). Use of video modeling to teach adolescents with an intellectual disability to film their own video prompts. Education and Training in Autism and Developmental Disabilities, 52, 158-169.

Slocum, T. A., Detrich, R., Wilczynski, S. M., Spencer, T. D., Lewis, T., & Wolfe, K. (2014). The evidence-based practice of applied behavior analysis. The Behavior Analyst, 37, 41-56.

Smith, K. A., Ayres, K. M., Mechling, L. C., Alexander, J. L., Mataras, T. K., & Shepley, S. B. (2013). Evaluating the effects of a video prompt in a system of least prompts procedure. Career Development and Transition for Exceptional Individuals, 38, 39-49.

Snell, M. E., & Brown, F. (2000). Instruction of students with severe disabilities (5th ed.). Upper Saddle River, NJ: Pearson.

Stonecipher, E. L., Schuster, J. W., Collins, B. C., & Grisham-Brown, J. (1999). Teaching gift wrapping skills in a quadruple instructional arrangement using constant time delay. Journal of Developmental and Physical Disabilities, 11, 139-158.

Tekin-Iftar, E. (2008). Parent-delivered community-based instruction with simultaneous prompting for teaching community skills to children with developmental disabilities. Education and Training in Developmental Disabilities, 43, 249-265.

What Works Clearinghouse. (2010). Procedures and standards handbook. Version 3.0. Retrieved from Docs/referenceresources/wwc_procedures_v3_0_standards_handbook.pdf

Williams, G., & Cuvo, A. J. (1986). Training apartment upkeep skills to rehabilitation clients: A comparison of task analytic strategies. Journal of Applied Behavior Analysis, 19, 39-51.

Wolery, M., Gast, D. L., & Ledford, J. R. (2014). Comparison designs. In D. L. Gast & J. R. Ledford (Eds.), Single case research methodology: Applications in special education and behavioral sciences (2nd ed., pp. 297-345). New York, NY: Routledge.

Jennifer L. Alexander (1) * Kevin M. Ayres (2) * Sally B. Shepley (3) * Katie A. Smith * Jennifer R. Ledford (4)

Published online: 27 September 2017

Jennifer L. Alexander

(1) Comprehensive Behavior Change, 3870 Peachtree Industrial Blvd. Suite 340-177, Duluth, GA 30096, USA

(2) Department of Communication Sciences and Special Education, The University of Georgia, Athens, GA, USA

(3) Department for Early Childhood, Special Education, and Rehabilitation Counseling, The University of Kentucky, Lexington, KY, USA

(4) Department of Special Education, Vanderbilt University, Nashville, TN, USA

Caption: Fig. 1 Participants' data during SOP (open circles), MOP (open triangles), and FOP (open squares) trials. Trials are arranged by participant groupings. All tasks included the same number of steps. SOP = single-opportunity probe; MOP = multiple-opportunity probe; FOP = fixed-opportunity probe

Caption: Fig. 2 Mean data by session for participants. Data are displayed for SOP (open circles), MOP (open triangles), and FOP (open squares) trials. All tasks included the same number of steps. SOP = single-opportunity probe; MOP = multiple-opportunity probe; FOP = fixed-opportunity probe

Caption: Fig. 3 Illustration of the facilitative testing threat inherent with MOP. The left column includes graphs in which the first 3 data points represent steady-state responding. The right column includes graphs in which beginning intervention would be unlikely given unstable baseline data. All tasks included the same number of steps. MOP = multiple-opportunity probe
Table 1 Description of Possible Errors

Error           Definition                        SOP   MOP   FOP

Topographical   Engaging in a behavior            X     X
                other than that described
                for the step

Sequential      Engaging in a behavior            X     X
                described for a step
                but out of order

Latency         Elapse of 5 s without             X     X
                initiation following the
                task direction (i.e., Step 1)
                or completion of previous step
                (i.e., Steps 2-6)

Duration        Elapse of 5 s after               X     X
                initiating without completion
                of the step

No response     Step not attempted or             X           X

SOP single-opportunity probe, MOP multiple-opportunity probe,
FOP fixed-opportunity probe

Table 2 Grouping Assignments

Participant         Assignment no.   Probe procedure   Task type

Jonas and Liza      1                SOP               Ruzzer
                                     MOP               Lifton
                                     FOP               Galtee
Aricia and Adele    2                SOP               Ruzzer
                                     MOP               Galtee
                                     FOP               Lifton
Kaylee and Kassie   3                SOP               Galtee
                                     MOP               Ruzzer
                                     FOP               Lifton
Sandy and Ellie     4                SOP               Lifton
                                     MOP               Ruzzer
                                     FOP               Galtee
Starla and Toni     5                SOP               Lifton
                                     MOP               Galtee
                                     FOP               Ruzzer
Hannah and Carol    6                SOP               Galtee
                                     MOP               Lifton
                                     FOP               Ruzzer

SOP single-opportunity probe, MOP multiple-opportunity probe, FOP
fixed-opportunity probe

Table 3 Description of Probe Procedures

Single-opportunity      Multiple-opportunity    Fixed-opportunity
probe                   probe                   probe

1. Set blocks to the    1. Set blocks to the    1. Set blocks to the
   left of the grid.       left of the grid.       left of the grid.

2. Provide a task       2. Provide a task       2. Provide a task
   direction               direction               direction
   ("Create the--").       ("Create the--").       ("Create the--").

3. Allow 5 s to         3. Allow 5 s to         3. Start trial timer
   initiate each           initiate each           for completion
   step.                   step.                   amount (i.e.,
                                                   initiation time of
                                                   each step +
                                                   completion time of
                                                   each step; e.g.,
                                                   [5 s x 6 steps] +
                                                   [5 s x 6 steps] =
                                                   60 s).

4. Allow 5 s to         4. Allow 5 s to         4. If participant
   complete each           complete each           performs a step
   step.                   step.                   correctly, mark
                                                   (+) on the data
                                                   sheet and provide
                                                   praise.

5. If participant       5. If participant       5. Assessment ends
   performs the step       performs the step       when:
   correctly, mark         correctly, mark         a. Participant
   (+) on the data         (+) on the data            completes all
   sheet and provide       sheet and provide          steps correctly
   praise.                 praise.                 b. 30 s elapse
                                                      without correct
                                                   c. Timer ends

6. If participant       6. If participant       6. Provide praise
   performs an error,      performs an error,      for participating.
   mark (-) on the         mark (-) on the
   data sheet and          data sheet, ask
   end the probe           the participant
   trial.                  to close his or
                           her eyes, block
                           view with binder,
                           complete the step
                           correctly, remove
                           binder from view,
                           ask the
                           participant to
                           open his or her
                           eyes, and allow
                           him or her to
                           attempt the next
                           step.

7. Assessment           7. Assessment ends      7. Mark all steps
   ends when:              when the last           not completed
   a. First error          step is completed       correctly as
      occurs               by the participant      incorrect (-).
   b. Participant          or researcher.
      completes all
      steps correctly
      in order

8. Provide praise for   8. Provide praise
   participating.          for participating.

9. Mark all steps
   not attempted as
   incorrect (-).
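The scoring rules in Table 3 can be summarized in a short Python sketch (Python 3.9+). It is an illustration under stated assumptions, not the authors' data-collection procedure: the function names are hypothetical, each trial is reduced to per-step outcomes, and FOP end condition b is read here as 30 s elapsing without a correctly completed step. The FOP timer reproduces the table's arithmetic, (5 s × 6 steps) + (5 s × 6 steps) = 60 s.

```python
STEP_INITIATION_S = 5    # seconds allowed to initiate each step (Table 3)
STEP_COMPLETION_S = 5    # seconds allowed to complete each step (Table 3)
NO_CORRECT_LIMIT_S = 30  # FOP end condition b; read as "30 s without a correct step" (assumption)


def percent(correct: int, n_steps: int) -> float:
    return 100.0 * correct / n_steps


def score_sop(step_correct: list[bool]) -> float:
    """SOP: the probe ends at the first error; that step and all later steps are scored (-)."""
    correct = 0
    for ok in step_correct:
        if not ok:
            break
        correct += 1
    return percent(correct, len(step_correct))


def score_mop(step_correct: list[bool]) -> float:
    """MOP: after an error the researcher completes the step out of view,
    so the participant attempts, and is scored on, every step."""
    return percent(sum(step_correct), len(step_correct))


def fop_time_limit(n_steps: int) -> int:
    """FOP trial timer: initiation time plus completion time for every step."""
    return STEP_INITIATION_S * n_steps + STEP_COMPLETION_S * n_steps


def score_fop(correct_times_s: list[float], n_steps: int) -> float:
    """FOP: count steps completed correctly before the assessment ends.

    correct_times_s holds the elapsed time (s after the task direction) at which
    each correctly completed step was finished; steps never done correctly are absent.
    """
    limit = fop_time_limit(n_steps)
    last_correct = 0.0
    correct = 0
    for t in sorted(correct_times_s):
        if t > limit or t - last_correct > NO_CORRECT_LIMIT_S:
            break  # timer ended, or too long passed without a correct step
        correct += 1
        last_correct = t
    # Steps not completed correctly before the assessment ended are scored (-).
    return percent(correct, n_steps)


# Hypothetical six-step trial (not data from the study): errors on Steps 3 and 5.
attempts = [True, True, False, True, False, True]
print(fop_time_limit(6))                          # 60
print(round(score_sop(attempts), 1))              # 33.3
print(round(score_mop(attempts), 1))              # 66.7
print(round(score_fop([8.0, 15.0, 52.0], 6), 1))  # 33.3 (37 s passed after the 15-s step)
```

Because SOP stops scoring at the first error while MOP lets the participant attempt every step, the same hypothetical performance yields 33.3% under SOP and 66.7% under MOP, which shows how the choice of probe procedure alone can change the recorded level of responding within a single session.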
COPYRIGHT 2017 Springer

Article Details
Author: Alexander, Jennifer L.; Ayres, Kevin M.; Shepley, Sally B.; Smith, Katie A.; Ledford, Jennifer R.
Publication: The Psychological Record
Article Type: Report
Date: Dec 1, 2017
