Competition among Stimulus-Stimulus Relations in an Intelligence-Test Format.
In the past two decades, important steps have been taken towards the development of a functional-analytic model of analogical reasoning (Barnes, Hegarty, & Smeets, 1997; Stewart, Barnes-Holmes, Roche, & Smeets, 2001, 2002) based on the experimental paradigm of equivalence relations (Sidman, 1971). Mostly inspired by Relational Frame Theory (RFT; Hayes, Barnes-Holmes, & Roche, 2001), this line of research has analyzed the conditions under which participants are able to relate relations (Stewart, Barnes-Holmes, Hayes, & Lipkens, 2001). For example, we can say that "apple is to manzana as orange is to naranja," following the classical structure of analogies, if we realize that apple and manzana share a common meaning in English and Spanish (relation X), just as orange and naranja do (also relation X).
The majority of experimental studies concerned with this behavior analytic approach to analogical reasoning have used complex or multielement conditional discriminations to train arbitrary stimulus relations and test the derived relational behavior. For example, in the pioneering study by Barnes et al. (1997), participants were trained on a series of conditional discriminations and then tested for equivalence relations. Once the participants had demonstrated the formation of four classes of arbitrarily related stimuli (A1, B1, C1, D1; A2, B2, C2, D2; A3, B3, C3, D3; A4, B4, C4, D4), the authors used pairs of related (e.g., A1B1) and nonrelated stimuli (e.g., C3B4) as multielement samples and comparisons. It was observed that in the presence of a compound sample formed by equivalent stimuli, participants would choose the comparison composed of related stimuli (equivalence-equivalence). When the sample was formed by nonequivalent stimuli, participants chose the comparison composed of unrelated stimuli (nonequivalence-nonequivalence). In other words, participants treated one instance of relation X (e.g., equivalence between A1 and B1) as equivalent to another instance of relation X (e.g., equivalence between C2 and D2).
Other experiments have employed different arrangements to form stimulus classes. A training and testing structure of special relevance for the present work is the second-order conditional discrimination, in which one or more stimuli provide contextual control over a conditional discrimination (Hernandez-Pozo, 1986; Sidman, 1986). In a prototypical example, two squares presented as contextual stimuli (relation X, same shape) indicate that, in the presence of a circle sample, choosing a circle comparison (instead of, say, a triangle) will be reinforced because the two circles share the same relation (relation X). Performance on second-order conditional discriminations can be transferred to other discriminations (Perez-Gonzalez & Martinez, 2007), including equivalence-equivalence (Perez & Garcia, 2008). The participants in the latter experiment learned the prerequisite conditional discriminations to form equivalence classes. Then, some of them successfully completed a relational evaluation in either equivalence-equivalence or second-order conditional discrimination format. Finally, they successfully solved the relational task with the other format. These and other tests demonstrated that stimulus control in these experiments could be based not only on an isolated stimulus or stimulus compound, but also on the relation between stimuli.
The role of nonarbitrary relations in analogies (e.g., shape, color) turned out to be important, because analogies in everyday life involve arbitrary as well as nonarbitrary relations. Following the previous example, an apple and an orange share the label "fruits," which is an arbitrary relation, but they also share nonarbitrary properties such as being round, of comparable size, etc. In a similar vein, Stewart, Barnes-Holmes, Roche, and Smeets (2001) showed that the abstraction of nonarbitrary properties, such as color, could be the basis for equivalence and equivalence-equivalence responding, extending the previous model of analogical reasoning to include nonarbitrary relations in the RFT account.
Competing stimulus control between arbitrary and nonarbitrary relations has been found when both types of relation coexist in an analogy (e.g., Garcia, Bohorquez, Perez, Gutierrez, & Gomez, 2008; Kenny, Devlin, Barnes-Holmes, Barnes-Holmes, & Stewart, 2014). For example, Garcia, Gutierrez, Bohorquez, Gomez, and Perez (2002), using a traditional equivalence-equivalence task, found that when one comparison shared an arbitrary relation (relation X) with the sample while the other shared a nonarbitrary relation (relation Y), 75-80% of the participants made their choices following the nonarbitrary relation.
Different authors have argued that competing stimulus control may explain some failures in derived relational responding tasks, including analogies (e.g., Rehfeldt, Dixon, Hayes, & Steele, 1998; Ruiz & Luciano, 2011). For example, Garcia et al. (2008) gave 75 college students an equivalence-equivalence task where the arbitrary relation of equivalence competed with the nonarbitrary relation of physical similarity. Depending on training conditions, the percentage of participants who chose the arbitrary relation varied from 27% to 100%. This effect was found not only with arbitrary stimuli, but also with everyday categories such as tools or animals (Garcia, Perez, Gutierrez, Gomez, & Basulto, 2013), supporting the ecological validity of the findings.
The aforementioned competition among stimulus-stimulus relations may also occur in some standardized intelligence tests that rely on visual analogies. Some of these tests, such as the Progressive Matrices Test or the g-factor test, include certain items that resemble the structure of second-order conditional discriminations (although they rapidly increase in complexity). This is also the case for the Test of Nonverbal Intelligence (TONI-2; Brown, Sherbenou, & Johnsen, 1995) that was used in this study. The TONI-2 is a widely used intelligence test intended for ages 5-85 and is frequently chosen to evaluate populations with verbal impairments or low proficiency in the language spoken by the examiner. It is based on visual analogies in which the participant is required to fill a missing space in a stimulus array by choosing among several (usually six) alternatives. An informal analysis of the test suggested that in some items several stimulus-stimulus relations could be competing in each sample-comparison pair. For example, in item A3 two empty circles maintaining a relation of physical identity (relation X) functioned as contextual stimuli. The sample, another empty circle, also maintained a relation of physical identity (relation X) with the correct comparison. However, the incorrect comparisons also shared nonarbitrary relations with the sample: for example, one was an empty square (relation Y, same padding) and the other was an incomplete circle (relation Z, perceptual similarity). In addition, several nonarbitrary relations appeared in every incorrect comparison (e.g., padding, similarity, number, orientation). This situation resembled the aforementioned experimental structure in which different comparisons maintained different and potentially competing relations with the sample.
In most experimental tasks, competing stimulus control is assessed through the repeated presentation of various stimulus combinations, using experimental control to rule out the interference of extraneous variables (Fields, Garruto, & Watanabe, 2010). However, this is not possible when the objective involves analysis of a published intelligence test, and therefore eye-tracking measures were used as an alternative. Although eye-tracking measures are not equivalent to stimulus control (Hansen & Arntzen, 2015), eye movements are highly correlated with experimental measures of visual stimulus control, and some studies have tested the relation between eye-tracking measures and stimulus control in conditional discriminations (e.g., Dube et al., 1999; Perez, Endemann, Pessoa, & Tomanari, 2015). Other studies have shown that observation duration, among other parameters, was related to stimulus control and performance. For example, Dube et al. (2006) found that a longer observation duration in a 4-sample conditional discrimination task was related to higher accuracy, and that improvements in accuracy were accompanied by an increase in the duration of sample observation. Further, Pessoa, Huziwara, Perez, Endemann, and Tomanari (2009) found that the time spent on correct comparisons was higher and diminished less during training than the time spent on incorrect alternatives in a simple discrimination task. This finding was replicated by Perez et al. (2015), who also found a link between eye-tracking data and the degree of stimulus control gained by two dimensions (shape and color) of a stimulus compound in a simple discrimination.
The aim of this article is to provide the first evidence to suggest that a behavior analytic model of analogical reasoning and competing stimulus control can predict, at least to some extent, the observing behavior of participants taking an intelligence test based on visual analogies. We selected several items structured as a second-order conditional discrimination and analyzed all the possible relations between the sample and the comparisons.
A classification system based on the number of common relations ranked the similarity between each sample-comparison pair in a given item. The test was then presented to the participants following the test manual as closely as possible in order to ensure similarity with a real administration of the test. The observation duration for each comparison was recorded to estimate the stimulus control exerted, using eye-tracking data for the first time to study competing stimulus control in analogical reasoning. In particular, our hypothesis was that as the number of relations a comparison shared with the sample increased, the time spent inspecting that comparison would increase accordingly.
Thirty-three students from the University of Huelva (Spain) participated in this study (12 men and 21 women). Their age ranged from 17 to 25 years, and they all had normal or corrected vision. Informed consent was obtained from all individual participants included in the study. No extrinsic incentives were used.
Apparatus and Setting
A Tobii X2-30 Eye Tracker was used. The apparatus was attached to the bottom of the laptop screen, allowing the participant to move within a restricted but comfortable range while maintaining measurement accuracy. The rate of data acquisition was 30 Hz. The laptop screen was 15 in. wide, with a screen resolution of 1200 x 800 pixels and a 60-Hz refresh rate. The software used for data acquisition and eye-tracking calibration was Ogama V.4.4 (Vosskuhler, Nordmeier, Kuchinke, & Jacobs, 2008).
Eye-tracking data were collected in a quiet room illuminated with artificial light only. The laptop was placed on a table in front of which the participants sat in a chair with a vertical backrest. Based on previous tests, the distance between the front of the chair and the table was approximately 5 cm, and it remained constant across participants. This allowed the participant's eyes to be in the range of 45-70 cm from the laptop screen, as recommended by the eye-tracking manufacturer.
Twelve items from the TONI-2 were used. The items were selected if the matrix structure was identified as a second-order conditional discrimination (labeled as "simple pairing," "pairing," and "alteration" items in the test manual). In each selected item, two stimuli at the top provided the context (one or more relations), the third stimulus of the matrix was the sample, and six stimuli at the bottom functioned as comparisons. Only one of these shared the same relation(s) with the sample as that indicated by the contextual stimuli (see Fig. 1). The matrix had an area of approximately 8 x 9 cm and the comparisons were around 6 cm below it, with small variations among items. Each square delimiting a comparison was considered an area of interest for analysis purposes. Comparisons had an area of 4 x 4.5 cm and the distance between them was approximately 1 cm. The distance between the eyes of the participants and the screen ranged from 45 to 70 cm, so the areas of interest subtended visual angles ranging from 3.3 x 3.6 to 5.1 x 5.7 degrees.
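As an aside, the visual angles reported above can be recovered from the stimulus sizes and viewing distances with the standard formula θ = 2·arctan(size / (2·distance)). The following sketch is not part of the original procedure; it simply illustrates the calculation using the dimensions given in the Method:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by a target of a given size
    viewed from a given distance: theta = 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Areas of interest measured 4 x 4.5 cm, viewed from between 45 and 70 cm.
for d in (70, 45):  # farthest and nearest viewing distances
    w = visual_angle_deg(4.0, d)
    h = visual_angle_deg(4.5, d)
    print(f"{d} cm: {w:.1f} x {h:.1f} degrees")
# Output closely matches the 3.3 x 3.6 to 5.1 x 5.7 degree range reported above.
```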
The number and complexity of the relations in the TONI-2 items increase as the test progresses. In order to avoid excessive complexity in our category system (see below), we selected the first 12 items that followed the aforementioned structure out of the 55 slides that compose the A-form of the test. Four items (A6, A13, A14, and A15) that departed from that general structure were not included. Thus, items A1, A2, A3, A4, A5, A7, A8, A9, A10, A11, A12, and A16 were selected and digitalized in order to present them on the computer screen. In items 1, 2, 3, 4, and 7 (simple pairing items according to the test manual) the contextual stimuli, the sample, and the correct comparison were all identical. In items 5, 8, 9, 10, 11, and 12 (pairing items) the contextual stimuli were identical, but the sample (and therefore the correct comparison) was physically different from the contextual stimuli. In the last item analyzed (item 16, considered an "alteration" item) the relation between the contextual stimuli, and therefore the relation between the sample and the correct comparison, was a 90-degree clockwise turn.
Construction of the Classification System
The selected items were analyzed, identifying several relations between the sample and the comparison stimuli. In most cases, identity (or reflexivity) was the relation between the sample and the correct comparison, but the comparisons also shared other relations with the sample. These relations were based on nonarbitrary properties of the visual stimuli such as shape (e.g., triangles, squares, circles), number of figures, spatial position, padding, and rotation. Other possible relations involving additional smaller parts or features, such as the size or rotation of inner elements, were not considered. Only these six relations, plus an additional perceptual-similarity relation, were included in order to maintain a balance between the precision and the complexity of the classification system. The operational definitions of these relations can be found in Table 1. A total score or rank that graded the similarity between the sample and each comparison was calculated by simply adding the individual scores. The maximum score was 9 and the minimum 0. For example, if the correct comparison was identical to the sample (1), had the same shape (1), position (1), number of elements (1), padding (1), rotation (3), and perceptual similarity (1), it received a score of 9. An incorrect comparison with all these features but rotated 45 degrees (1) would obtain a score of 7.
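The additive rank can be illustrated with a short sketch. The exact weights below are our reading of the description above (binary scores for six relations, plus a rotation relation assumed to be graded from 0 to 3 so that the maximum is 9); the original scoring scheme may differ in detail:

```python
# Hypothetical reconstruction of the additive rank. Binary weights for six
# relations and a graded 0-3 rotation score are assumptions chosen to
# reproduce the stated maximum of 9.
BINARY_RELATIONS = ("identity", "shape", "position",
                    "number", "padding", "similarity")
ROTATION_MAX = 3

def rank(features: dict) -> int:
    """Add the individual relation scores for one sample-comparison pair."""
    score = sum(int(bool(features.get(k, 0))) for k in BINARY_RELATIONS)
    score += min(int(features.get("rotation", 0)), ROTATION_MAX)
    return score

# A comparison matching the sample in every relation receives the maximum:
correct = dict.fromkeys(BINARY_RELATIONS, 1)
correct["rotation"] = 3
print(rank(correct))  # -> 9
```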
Two independent volunteers, not related to this research in any other way, categorized the comparisons in each item according to the classification system. They were provided with the aforementioned definitions and were trained using five items from the B-form of the test. After being shown one example of item classification, the volunteers proceeded to classify the remainder under the guidance of a researcher. Once the task was clear, the observers classified the selected items of the A-form without the researcher's intervention. Disagreements were resolved by deferring to the criterion of one of the researchers.
Data Collection
The recommended standard conditions for application of the test were followed as closely as possible.
Following eye-tracker calibration the researcher first pointed at the contextual stimuli, then at the sample, and finally at the empty cell of the matrix. While pointing at the comparisons, the participants were asked (in Spanish): "Which picture should go there?" All participants responded correctly and their responses were verbally reinforced. The researcher then instructed them to do the same with the rest of the slides. No time limits were imposed. The test sequence was as follows: A white screen was presented for one second, followed by a small black cross (fixation stimulus) in the upper center of the screen in the exact place where the matrix of the next item was about to appear. The item was presented and, after the participant responded, the white screen was shown again, finishing the cycle. This procedure was repeated for all items. The test ended after item 12 was presented. Given the small number of trials, no eye-drift corrections were included.
The independent variable was the number of relations in common with the sample for each comparison in the selected items (its rank). The dependent variable was observation duration in each comparison. This study can be considered a prospective, ex post facto design, given that the items were selected and the classification system developed before data collection.
A one-factor repeated-measures ANOVA with linear trend analysis was used to compare observation duration in each response alternative. The position of the comparison was not included in the classification system because the position of the correct comparison was not evenly distributed across the items (e.g., four times in position 1 vs. 0 times in position 6). In addition, separate analyses were carried out for each item because the number of common relations and its distribution (independent variable) was different for each case. Raw measures of observation duration were transformed to proportion of time spent on each response alternative to facilitate the comparison between different items. Participants responded to all items correctly and therefore incorrect responses were not analyzed. This finding was expected since all the participants were adults and the selected items, according to the test manual, are intended for children as young as 5 years old (Brown et al., 1995).
The normality assumption was not met in most of the measures, as reflected by the Kolmogorov-Smirnov test (not reported). Because violation of the normality assumption can increase the rate of Type I errors, we used an alpha value of 0.01 on all tests. In addition, a conservative adjustment method for the alpha value (i.e., Bonferroni, which divides alpha by the number of comparisons) was used to compare pairs of measures. None of the measures met the sphericity assumption, as shown by Mauchly's test (also not reported). In order to correct this deviation, Greenhouse-Geisser adjustment of the degrees of freedom was used in all tests.
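The alpha adjustment and the proportion transform described in this and the previous paragraph reduce to simple arithmetic; a minimal sketch follows (the duration values are invented for illustration, not taken from the study):

```python
from itertools import combinations

ALPHA = 0.01        # base alpha, lowered because normality was violated
N_ALTERNATIVES = 6  # response alternatives (comparisons) per item

# Bonferroni: divide alpha by the number of pairwise tests among alternatives.
n_pairwise = len(list(combinations(range(N_ALTERNATIVES), 2)))  # C(6, 2) = 15
bonferroni_alpha = ALPHA / n_pairwise
print(n_pairwise, bonferroni_alpha)  # 15 pairwise tests, alpha ~ 0.00067

# Raw observation durations (seconds, hypothetical) are expressed as
# proportions of the total time spent on the six alternatives.
durations = [0.62, 0.21, 0.09, 0.05, 0.02, 0.01]
total = sum(durations)
proportions = [d / total for d in durations]
print([round(p, 3) for p in proportions])
```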
One participant was excluded from the analysis because her eye tracking data were not recorded due to calibration problems. No other data or participants were excluded.
Definitive Classification System
The overall percentage of interobserver agreement was 95.92%. Agreement per category was as follows: identity: 98.61%; shape: 97.22%; position: 100%; number: 97.22%; padding: 94.44%; rotation: 95.83%; perceptual similarity: 84.72%.
The rank scores for each comparison are listed in Table 2. The correct comparison always scored 9 points because it shared all nonarbitrary relations with the sample. All the incorrect comparisons shared at least one, and frequently more, nonarbitrary relations with the sample. The number of nonarbitrary relations identified in the incorrect comparisons varied substantially both across and within the items.
Observation Duration at Comparisons
Participants dedicated between approximately one and two seconds per item to inspecting the six comparisons (see Table 3). With the exception of items A2 and A4, there was a slight tendency for this time to increase as the test progressed; the largest increment occurred in item A16, which roughly doubled the average observation time of the previous items.
The mean and standard deviations of the proportion of observation duration in each comparison can be found in Table 2. Comparisons are arranged in the same order (left to right) in which they appeared on the screen. Overall, participants spent more time inspecting the correct comparison (in bold) than the incorrect alternatives, and the time dedicated to each distractor decreased as the number of common relations with the correct comparison decreased (see also Fig. 2).
The repeated measures ANOVA revealed a statistically significant effect of the number of common relations for all items (see Table 4). Thus, there were significant differences in the time that the participants devoted to observing a given comparison depending on the number of shared relations. Linear trend analysis (see also Table 4) revealed a significant linear relation between the number of common features and the time dedicated to observing a comparison for all items. Figure 2 shows that, as the number of common relations that each response alternative shared with the correct comparison increased, the proportion of time dedicated to observing it also increased. This pattern was reliably found across all items with different stimuli, relations, and slide arrangements. The effect size (the variance associated with the independent variable divided by the total variance, as measured with η²; see Table 4) ranged from medium to large according to Cohen (1992).
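The effect size here is eta squared, i.e., the sum of squares associated with the effect divided by the total sum of squares. A generic illustration (the sums of squares below are invented, not taken from the study):

```python
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Classical eta squared: the proportion of total variance
    associated with the effect."""
    return ss_effect / ss_total

# Hypothetical sums-of-squares decomposition for one item (illustrative only):
ss_effect, ss_error, ss_subjects = 42.0, 21.0, 17.0
ss_total = ss_effect + ss_error + ss_subjects
print(round(eta_squared(ss_effect, ss_total), 2))  # -> 0.53
```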
There were also significant differences in the proportion of observation duration within response alternatives. Only significant differences, p < 0.01, will be reported.
Item A1 Statistically significant differences were found between the correct comparison (M = 0.570, SD = 0.259) and the comparisons sharing seven common relations (M = 0.202, SD = 0.141), five (M = 0.084, SD = 0.920), four (M = 0.659, SD = 0.093 and M = 0.131, SD = 0.038), and three common relations (M = 0.030, SD = 0.014). Among incorrect comparisons, there were differences between the comparison sharing seven relations and those sharing five, four, and three. Further, the first comparison ranked 4 was statistically different from the second comparison ranked 4 and from the comparison ranked 3.
Item A2 Differences were found between the correct comparison (M = 0.358, SD = 0.192) and the comparison sharing five common relations (M = 0.119, SD = 0.070), with the comparisons ranked 3 and 4, respectively (M = 0.125, SD = 0.079; M = 0.157, SD = 0.129; M = 0.137, SD = 0.109), and with the distractor ranked 3 (M = 0.030, SD = 0.042). The distractors with ranks 5 and 4 were also different from the distractor ranked 3.
Item A3 The correct comparison (M = 0.650, SD = 0.240) was different from the distractor ranked 4 (M = 0.042, SD = 0.065) and from the four distractors ranked 3 (M = 0.076, SD = 0.110; M = 0.784, SD = 0.118; M = 0.065, SD = 0.118; M = 0.021, SD = 0.049).
Item A4 The correct comparison (M = 0.399, SD = 0.185) was different from the distractor ranked 6 (M = 0.143, SD = 0.167), from the three distractors ranked 5 (M = 0.159, SD = 0.110; M = 0.110, SD = 0.078; M = 0.078, SD = 0.066), and from the distractor ranked 3 (M = 0.109, SD = 0.082).
Item A5 The correct comparison (M = 0.618, SD = 0.246) was different from the distractor ranked 7 (M = 0.218, SD = 0.169), from the two distractors ranked 4 (M = 0.077, SD = 0.099; M = 0.021, SD = 0.061), and from the two distractors ranked 3 (M = 0.044, SD = 0.086; M = 0.024, SD = 0.125). The distractor ranked 7 was also different from the rest.
Item A7 The correct comparison (M = 0.449, SD = 0.182) was different from the distractor ranked 7 (M = 0.220, SD = 0.107), from the two distractors ranked 5 (M = 0.107, SD = 0.111; M = 0.052, SD = 0.075), and from the two distractors ranked 3 (M = 0.150, SD = 0.107; M = 0.024, SD = 0.055). The distractor ranked 7 was different from the first distractor ranked 5 (M = 0.107, SD = 0.111) and the first distractor ranked 3 (M = 0.150, SD = 0.107). These were in turn different from the second distractors ranked 5 and 3, respectively. Finally, there were differences between the two distractors ranked 3.
Item A8 The correct comparison (M = 0.695, SD = 0.271) was different from the three distractors ranked 7 (M = 0.069, SD = 0.111; M = 0.036, SD = 0.107; M = 0.034, SD = 0.072) and from the two distractors ranked 6 (M = 0.084, SD = 0.103; M = 0.082, SD = 0.114).
Item A9 The correct comparison (M = 0.509, SD = 0.261) was different from the distractor ranked 7 (M = 0.176, SD = 0.193) and from the four distractors ranked 6 (M = 0.152, SD = 0.132; M = 0.110, SD = 0.106; M = 0.032, SD = 0.048; M = 0.020, SD = 0.047). The distractor ranked 7 was different from the two distractors ranked 6 with lower means. There were also differences among the distractors ranked 6: the two with higher means were statistically different from the two with lower means.
Item A10 The correct comparison (M = 0.510, SD = 0.190) was different from the distractor ranked 7 (M = 0.343, SD = 0.069), from the two distractors ranked 4 (M = 0.061, SD = 0.093; M = 0.032, SD = 0.064), and from the two distractors ranked 2 (M = 0.043, SD = 0.069; M = 0.010, SD = 0.035). The distractor ranked 7 was different from the others.
Item A11 The correct comparison (M = 0.353, SD = 0.149) was different from the two distractors ranked 6 (M = 0.126, SD = 0.102; M = 0.075, SD = 0.079), from the distractor ranked 5 (M = 0.137, SD = 0.093), and from one of the two distractors ranked 3 (M = 0.220, SD = 0.223).
Item A12 The correct comparison (M = 0.356, SD = 0.266) was different from the distractors ranked 4 (M = 0.078, SD = 0.101), 3 (M = 0.116, SD = 0.099), and 2 (M = 0.058, SD = 0.076). The distractors ranked 7 (M = 0.201, SD = 0.190) and 6 (M = 0.182, SD = 0.132) were different from the distractor ranked 2.
Item A16 The correct comparison (M = 0.363, SD = 0.260) was different from the three distractors ranked 3 (M = 0.121, SD = 0.107; M = 0.084, SD = 0.094; M = 0.055, SD = 0.062). The distractor ranked 7 (M = 0.223, SD = 0.123) was different from the two distractors ranked 3 with lower means, and the distractor ranked 6 (M = 0.153, SD = 0.130) was different from the smaller of these.
Analysis of the relation between rank and observation time revealed consistent patterns across items. The correct comparison always received the greatest proportion of observation time (range 0.356-0.697), and this proportion was significantly greater than that of every incorrect comparison except in items 11, 12, and 16: in item 11 the correct comparison differed significantly from all distractors but one, and in items 12 and 16 from all but two. In items that included a distractor highly similar to the correct comparison, that distractor was observed for significantly longer than the lower-ranked distractors (see the distractors ranked 7 in items 1, 5, 7, 10, 12, and 16 in Fig. 2).
Distractors sharing five or fewer relations in common with the sample were inspected less than 25% of the time on average (range 0.02-0.160). In addition, differences were found in four cases between distractors with the same rank. As a cautionary note, it is also important to observe that the sphericity criterion was not met, probably because the standard deviation was different across categories, with higher values related to higher means (see Table 3).
Overall, our classification system of stimulus competition explained between 29% and 73% of the variance in observation time of the comparisons, and the linear relation between number of common features and observation time explained between 32% and 95% of the variance (see Table 4). However, the proportion of explained variance appeared to decrease as the test progressed, being higher in the first nine items and lower in the last three.
In the present work, an analysis based on the concept of competing stimulus relations, integrated into the behavior analytic model of analogical reasoning first developed by Barnes et al. (1997) and Stewart et al. (2002), has allowed us to achieve two objectives. First, it has permitted the behavioral analysis of certain visual analogies in a published intelligence test; second, to some extent, it has allowed for the prediction of the participants' exploratory behavior when solving the test, which is consistent with the idea that competition among stimulus relations plays a role in solving visual analogies. The results of the present study reproduced, to a certain degree, previous experimental findings on equivalence-equivalence responding and competing stimulus control, and also introduced a complementary measure that had not previously been used in this area of research, namely eye tracking. In addition, this work extends the application of the behavior analytic model of analogical reasoning to a situation far more complex than the usual experimental tasks on stimulus-stimulus competition in analogical reasoning reviewed above. Whereas those tasks typically present two to four comparisons and study only two competing relations, in this work we analyzed up to seven different nonarbitrary relations competing simultaneously across six different comparisons. Finally, by using the TONI-2 and following the test manual as closely as possible, we extended the aforementioned results to a situation similar to the conditions in which intelligence tests are usually taken, thus increasing the scope of the experimental model.
Although there have been some approaches to understanding intelligence tests in behavior analytic terms (e.g., see Cassidy, Roche, & O'Hora, 2010), the present work constitutes the first systematic behavioral analysis of part of a published intelligence test on an item-by-item basis. When the test was analyzed using the behavior analytic model of analogical reasoning, we found that competing stimulus-stimulus relations were fairly abundant. A common nonarbitrary relation between the contextual stimuli (relation X) was found in all the selected items, which was the same as the relation between the sample and the correct comparison (relation X). However, the sample simultaneously shared several nonarbitrary relations with the incorrect comparisons (relations Y, Z, etc.), as in the experiments on competing stimulus control reviewed in the introduction. These nonarbitrary relations were reliably identified using the category system developed in this work. They were also relevant for the participants and appeared to compete for their attention, to the point that a higher number of shared features resulted in longer gaze duration. An informal analysis of other tests and some unpublished data from our research group suggest that this kind of competing relation could be prevalent in other tests based on visual analogies, such as Raven's Advanced Progressive Matrices. This is also relevant because, although the role of nonarbitrary relations and competing stimulus control was rapidly acknowledged in the behavior analytic model of analogical reasoning, this aspect has not been addressed in the behavioral analysis of intelligence tests.
Regarding our hypothesis that observation time would increase with the number of common relations, Fig. 2 and the linear trend analysis presented in Table 4 show a general tendency, repeated item after item, for comparisons with more features in common with the sample to be inspected longer than comparisons sharing fewer features in common. In this regard, the same results were obtained here in the form of a second-order conditional discrimination as had been found by Pessoa et al. (2009) and Perez et al. (2015) in the form of a simple discrimination, where the S+ was inspected for longer than the S-. In our case, the correct comparison (Rank 9) was always subject to a significantly higher proportion of observation time for all items, followed by the most similar comparison (rank 7) in the items where there was one. As predicted, the rest of the distractors received less attention, but their results were less clear. Overall, these results are consistent with the predictions of the behavioral model of analogical reasoning and stimulus-stimulus competition.
Some limitations of this study should also be mentioned. First, the classification system used here could be improved considerably. Despite the fact that overall interobserver agreement was high (95.92%), the percentage obtained in the subjective category "perceptual similarity" (84.72%) was rather low in comparison. Further, the scores assigned to the different relations were purely additive and probably not sufficiently sensitive to discriminate among similar low-rank distractors. A priori knowledge of the relevance of different perceptual, abstract, or arbitrary relations acquired through experimental manipulations for a certain population could help to weight the scores of different relations and fine-tune the predictions.
Moreover, the items also differed in aspects that probably influenced the duration of observation of the comparisons. The degree of perceptual similarity between distractors varied considerably between items, as did the position of the correct alternative and other factors that we could not control with the current design. An experimental investigation that explicitly manipulates these factors (e.g., randomizing the order of the trials and the location of the response options across participants, creating tailor-made stimuli and relations to avoid potential confounds, including unsolvable items, or correlating observation parameters with selection accuracy) would be necessary to attain a more complete account of the factors involved in the participants' observation behavior. In addition, ex post facto designs are not optimal for detecting functional relations such as stimulus control, because they lack the manipulative component necessary for that purpose. Further, because our participants committed no errors, we were not able to assess stimulus control. In the future it will be necessary to combine established measures of stimulus control with eye-tracking data in the same experimental design to corroborate our findings.
Perhaps the most evident limitation of our approach is that we considered only the observation time for the comparisons. An analysis of different observation parameters such as latency, duration, frequency of fixations, or scan paths, as well as a systematic classification of the relations between the contextual stimuli and the sample based on well-established behavioral principles, could extend the scope of the present analysis to more complex items and other intelligence tests. Moreover, eye-drift corrections should be used in more extensive studies.
Despite the aforementioned limitations, some possible extensions of this work should be noted. From our point of view, these results imply that any behavior analytic attempt to explain or improve performance in intelligence tests based on visual analogies, or in related academic tasks, needs to take into account the role of competing relations, since the combination of arbitrary and nonarbitrary relations appears to be a key feature in the design of such tests. Manipulating parameters such as the number, complexity, and salience of these relations with respect to the correct response is one possible way to increase the difficulty of the test across its items, and thus its capacity to discriminate among subjects' relational abilities.
As Cassidy et al. (2010) noted, several intelligence tests include items that involve different relational abilities. A detailed analysis of their structure, including the possible presence of competing arbitrary and nonarbitrary relations in visual analogies, could be a way to better understand why some individuals fail certain items of an intelligence test and to design remedial interventions. For example, an analysis of observation parameters such as duration or exploration pattern (Dube et al., 2006) could reveal deficits in the exploration of test items that significantly limit participants' use of the available information, such as not exploring all stimuli, not exploring them for long enough, or identifying one relation (perhaps a salient nonarbitrary relation) and then not looking for others (perhaps a subtler, but correct, arbitrary relation). In addition, interventions aimed at improving exploration behavior could be designed. For instance, analysis of possible instances of competing stimulus control, together with training procedures that teach participants to ignore irrelevant relations (e.g., Kenny, Barnes-Holmes, & Stewart, 2014), could improve their performance through derived relational training. Complementing this, competing stimulus control could be introduced progressively and systematically in interventions designed to increase intelligence quotients or scholastic abilities, in order to encourage more precise stimulus control by the relation considered correct by the test designers.
Furthermore, the behavioral analysis of intelligence tests based on visual analogies also opens a new way of addressing the construction and evaluation of these instruments. We found that participants treated distractors differently, and that perceptual and other nonarbitrary characteristics were unevenly distributed among distractors in different items. Eye-tracking, combined with actual measures of stimulus control, experimental manipulations, and a molecular analysis of the different types of relations involved in a particular item could contribute towards improving item construction and selection.
In summary, in this work we used the behavior analytic model of analogical reasoning to examine the relational behaviors involved in solving some basic visual analogies. First, we identified stimulus competition in a published intelligence test, analyzed its items, and compared them with experimental situations where stimulus competition has been found; second, we provided data showing that the visual exploration of the test was related to the presence of potentially competing relations, using a novel measure (eye-tracking) in a situation where the traditional methods for assessing stimulus control were not feasible.
To conclude, although we are aware of the limitations of this work, from our point of view it extends the scope of the behavior analytic model of analogical reasoning and shows that a detailed analysis of what is traditionally called intelligence, and of intelligence tests, is possible using the conceptual tools of behavior analysis. It also opens up an interesting field for improving test design, evaluating participants' strategies, and enriching the psychological (in addition to the purely statistical) approach to test evaluation.
Funding This research was partially funded by the Department of Clinical, Social and Experimental Psychology of the University of Huelva. The financial support consisted of the purchase of the eye-tracking device that was assigned to the Laboratory of Experimental Psychology where this study was carried out.
Compliance with Ethical Standards
Conflict of Interest The authors declare that they have no conflict of interest.
Ethical Approval This article does not contain any studies with animals conducted by any of the authors. All procedures employed in studies involving human participants were carried out in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent Informed consent was obtained from all individual participants included in the study.
Barnes, D., Hegarty, N., & Smeets, P. M. (1997). Relating equivalence relations to equivalence relations: A relational framing model of complex human functioning. Analysis of Verbal Behavior, 14, 57-83.
Brown, L., Sherbenou, R. J., & Johnsen, S. K. (1995). Test de inteligencia no verbal TONI-2. Madrid, Spain: TEA.
Cassidy, S., Roche, B., & O'Hora, D. (2010). Relational frame theory and human intelligence. European Journal of Behavior Analysis, 11, 37-51.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Dube, W. V., Balsamo, L. M., Fowler, T. R., Dickson, C. A., Lombard, K. M., & Tomanari, G. Y. (2006). Observing behavior topography in delayed matching to multiple samples. Psychological Record, 56(2), 233-244.
Dube, W. V., Lombard, K. M., Farren, K. M., Flusser, D. S., Balsamo, L. M., & Fowler, T. R. (1999). Eye tracking assessment of stimulus overselectivity in individuals with mental retardation. Experimental Analysis of Human Behavior Bulletin, 17, 8-14.
Fields, L., Garruto, M., & Watanabe, M. (2010). Varieties of stimulus control in matching-to-sample: A kernel analysis. Psychological Record, 60(1), 3-26.
Garcia, A., Bohorquez, C., Perez, V., Gutierrez, M. T., & Gomez, J. (2008). Equivalence-equivalence responding: Training conditions involved in obtaining a stable baseline performance. Psychological Record, 55(4), 597-622.
Garcia, A., Gutierrez, M. T., Bohorquez, C., Gomez, J., & Perez, V. (2002). Competencia entre relaciones arbitrarias y relaciones no arbitrarias en el paradigma de equivalencia-equivalencia. Apuntes de Psicologia, 20(2), 205-224.
Garcia, A., Perez, V., Gutierrez, M. T., Gomez, J., & Basulto, E. (2013). Competencia entre criterios de equivalencia-equivalencia y semejanza usando categorias naturales. Revista Mexicana de Analisis de la Conducta, 39(1), 11-34.
Hansen, S., & Arntzen, E. (2015). Fixating, attending, and observing: A behavior analytic eye-movement analysis. European Journal of Behavior Analysis, 16(2), 229-247.
Hayes, S. C., Barnes-Holmes, D., & Roche, B. (2001). Relational frame theory: A post-Skinnerian account of human language and cognition. New York, NY: Kluwer Academic/Plenum.
Hernandez-Pozo, R. (1986). Aprendizaje condicional de relaciones en humanos: Evaluacion de dos metodos de eleccion. Revista Mexicana de Analisis de la Conducta, 12(2), 105-126.
Kenny, N., Barnes-Holmes, D., & Stewart, I. (2014). Competing arbitrary and non-arbitrary relational responding in normally developing children and children diagnosed with autism. Psychological Record, 64(4), 755-768.
Kenny, N., Devlin, S., Barnes-Holmes, D., Barnes-Holmes, Y., & Stewart, I. (2014). Competing arbitrary and non-arbitrary stimulus relations: The effect of exemplar training in adult participants. Psychological Record, 64(1), 53-61.
Perez, V., & Garcia, A. (2008). Equivalencia-equivalencia y discriminaciones condicionales de segundo grado. Revista Mexicana de Analisis de la Conducta, 34(2), 179-196.
Perez, W. F., Endemann, P., Pessoa, C. V., & Tomanari, G. Y. (2015). Assessing stimulus control in a discrimination task with compound stimuli: Evaluating testing procedures and tracking eye fixations. Psychological Record, 65(1), 83-88.
Perez-Gonzalez, L. A., & Martinez, H. (2007). Control by contextual stimuli in novel second-order conditional discriminations. Psychological Record, 57(1), 117-143.
Pessoa, C., Huziwara, E. M., Perez, W. F., Endemann, P., & Tomanari, G. Y. (2009). Eye fixations to figures in a four-choice situation with luminance balanced areas: Evaluating practice effects. Journal of Eye Movement Research, 3, 1-6.
Rehfeldt, R. A., Dixon, M. R., Hayes, L. J., & Steele, A. (1998). Stimulus equivalence and the blocking effect. Psychological Record, 48(4), 647-664.
Ruiz, F. J., & Luciano, C. (2011). Cross-domain analogies as relating derived relations among two separate relational networks. Journal of the Experimental Analysis of Behavior, 95(3), 369-385.
Sidman, M. (1971). Reading and auditory-visual equivalences. Journal of Speech & Hearing Research, 14(1), 5-13.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213-245). Hillside, NJ: Lawrence Erlbaum Associates.
Skinner, B. F. (1990). Can psychology be a science of mind? American Psychologist, 45(11), 1206-1210.
Stewart, I., Barnes-Holmes, D., Hayes, S. C., & Lipkens, R. (2001). Relations among relations: Analogies, metaphors, and stories. In S. C. Hayes, D. Barnes-Holmes, & B. Roche (Eds.), Relational frame theory: A post-Skinnerian account of human language and cognition (pp. 73-86). New York, NY: Kluwer Academic/Plenum.
Stewart, I., Barnes-Holmes, D., Roche, B., & Smeets, P. M. (2001). Generating derived relational networks via the abstraction of common physical properties: A possible model of analogical reasoning. Psychological Record, 51(3), 381-408.
Stewart, I., Barnes-Holmes, D., Roche, B., & Smeets, P. M. (2002). A functional-analytic model of analogy: A relational frame analysis. Journal of the Experimental Analysis of Behavior, 78(3), 375-396.
Vosskuhler, A., Nordmeier, V., Kuchinke, L., & Jacobs, A. M. (2008). OGAMA (open gaze and mouse analyzer): Open-source software designed to analyze eye and mouse movements in slideshow study designs. Behavior Research Methods, 40(4), 1150-1162.
Jesus Gomez Bujedo, Luis Ignacio De Amores Cabello, Jose Andres Lorca Marin, & Andres Garcia Garcia
Part of the research described in this paper was included in the final honors dissertation presented by the second author at the University of Huelva.
Jesus Gomez Bujedo
Departamento de Psicologia Clinica y Experimental, Facultad de Educacion, Psicologia y Ciencias del Deporte, Universidad de Huelva, Campus de "El Carmen", Avenida de las Fuerzas Armadas S/N, C.P. 21007 Huelva, Spain
 Universidad de Sevilla, Sevilla, Spain
Caption: Fig. 1 Schematic representation of the selected items
Caption: Fig. 2 Relation between comparison rank (number of common features) and mean proportion of observation duration for each selected item. The correct response option always had nine common features and was inspected for a longer time than the incorrect alternatives. Error bars show standard deviation. Horizontal lines indicate significant differences, p < .01
Table 1 Operationalization of sample-comparison relations

Relation               Definition                                                       Score
Identity               Complete, point-to-point sample-comparison correspondence        1 = Yes; 0 = No
Shape                  Sample and comparison belonged to the same geometric category    1 = Yes; 0 = No
Position               The comparison appeared in the same position of the area of      1 = Yes; 0 = No
                       interest as the sample
Number                 Sample and comparison had the same number of figures             1 = Yes; 0 = No
Padding                Sample and comparison had the same padding features              1 = Yes; 0 = No
Rotation               The comparison was similar or identical to the sample but        3 = no rotation (0°)
                       rotated on any axis by any number of degrees. If the figure      2 = (0, 10)°, (0, -350)°
                       rotated around one of its axes (mirror image), it scored 2       1 = (11, 85)°, (349, -276)°
                                                                                        0 = (86, -275)°
Perceptual similarity  Perceptual resemblance in other aspects according to the         1 = Yes; 0 = No
                       subjective criterion of the judge

Note. "No rotation (0°)" implied greater similarity and therefore a higher score.

Table 2 Mean and standard deviation of the proportion of observation duration for each comparison, classified by rank, in each item

Item        C1      C2      C3      C4      C5      C6
A1   Mean   0.013   0.066   0.084   0.202   0.570#  0.002
     SD     0.039   0.093   0.092   0.141   0.259#  0.014
     Rank   4       4       5       7       9#      3
A2   Mean   0.031   0.126   0.157   0.137   0.119   0.358#
     SD     0.042   0.079   0.129   0.109   0.071   0.192#
     Rank   3       5       4       4       4       9#
A3   Mean   0.021   0.065   0.651#  0.078   0.079   0.043
     SD     0.049   0.119   0.240#  0.118   0.110   0.065
     Rank   3       3       9#      3       3       4
A4   Mean   0.143   0.399#  0.109   0.111   0.160   0.078
     SD     0.167   0.185#  0.082   0.078   0.111   0.066
     Rank   6       9#      4       5       5       5
A5   Mean   0.024   0.044   0.218   0.617#  0.077   0.021
     SD     0.126   0.086   0.168   0.246#  0.098   0.061
     Rank   3       3       7       9#      4       4
A7   Mean   0.448#  0.220   0.024   0.150   0.106   0.052
     SD     0.183#  0.106   0.055   0.106   0.111   0.074
     Rank   9#      7       3       3       5       5
A8   Mean   0.034   0.082   0.696#  0.084   0.069   0.036
     SD     0.072   0.114   0.271#  0.103   0.111   0.107
     Rank   7       6       9#      6       7       7
A9   Mean   0.032   0.111   0.152   0.508#  0.176   0.020
     SD     0.048   0.106   0.133   0.261#  0.193   0.046
     Rank   6       6       6       9#      7       6
A10  Mean   0.010   0.061   0.342   0.043   0.510#  0.033
     SD     0.036   0.093   0.195   0.069   0.190#  0.065
     Rank   2       4       7       2       9#      4
A11  Mean   0.353#  0.136   0.220   0.126   0.090   0.075
     SD     0.149#  0.093   0.230   0.102   0.114   0.079
     Rank   9#      5       3       6       3       6
A12  Mean   0.356#  0.209   0.183   0.116   0.058   0.078
     SD     0.266#  0.189   0.133   0.100   0.076   0.101
     Rank   9#      7       6       3       2       4
A16  Mean   0.364#  0.223   0.120   0.084   0.153   0.056
     SD     0.260#  0.123   0.106   0.094   0.130   0.062
     Rank   9#      7       3       3       6       3

Note. C = comparison; SD = standard deviation. The correct comparison (bold in the original) is marked with #.

Table 3 Mean and standard deviation of observing time for each item, all participants (milliseconds)

Item  A1    A2    A3    A4    A5    A7    A8    A9    A10   A11   A12   A16
M     1015  1593  866   1597  1138  1223  1025  1273  1257  1326  1328  2082
SD    537   739   491   893   569   566   849   616   554   668   645   2601

Note. M = mean; SD = standard deviation.

Table 4 Results of the repeated measures ANOVA and linear trend analysis for each item

Repeated measures ANOVA
Item  ε (G-G)  df (Factor)  df (Error)  F        p        ηp²
A1    0.343    1.71         53.09       77.238   < 0.001  0.714
A2    0.447    1.89         69.38       27.710   < 0.001  0.472
A3    0.478    2.39         74.15       100.573  < 0.001  0.764
A4    0.525    2.62         81.32       24.031   < 0.001  0.437
A5    0.453    2.27         70.27       68.875   < 0.001  0.690
A7    0.521    2.60         80.79       49.701   < 0.001  0.616
A8    0.374    1.87         57.99       85.477   < 0.001  0.734
A9    0.409    2.04         63.32       36.924   < 0.001  0.544
A10   0.386    1.93         59.87       75.139   < 0.001  0.708
A11   0.488    2.44         75.59       15.350   < 0.001  0.331
A12   0.489    2.44         75.75       12.866   < 0.001  0.293
A16   0.416    2.08         64.55       16.377   < 0.001  0.346

Linear trend analysis
Item  F        p        ηp²
A1    253.351  < 0.001  0.891
A2    99.776   < 0.001  0.763
A3    160.717  < 0.001  0.838
A4    81.860   < 0.001  0.725
A5    140.312  < 0.001  0.819
A7    133.487  < 0.001  0.812
A8    92.147   < 0.001  0.748
A9    186.845  < 0.001  0.858
A10   559.188  < 0.001  0.947
A11   14.441   0.01     0.318
A12   42.248   < 0.001  0.577
A16   50.900   < 0.001  0.621

Note. The degrees of freedom for the linear trend analysis were 1 and 31. G-G = Greenhouse-Geisser; df = degrees of freedom.
Publication: The Psychological Record, December 1, 2018.