Inter-trial interval, stimulus duration and number of trials in contingency judgments.
In the normative sense, the relations exemplified above can be represented in a 2 × 2 contingency table containing the frequencies of occurrence of the various event combinations. Both events can be present in their target state (cell a: A₁B₁), only one event is in its target state (cell b: A₁B₂; cell c: A₂B₁), or both events are in their alternative states (cell d: A₂B₂). The overall contingency can be assessed using the ΔP coefficient. It consists in subtracting the conditional probability of B₁ given A₂ from the conditional probability of B₁ given A₁, yielding a value which can vary anywhere between −1 and +1. The formula is (Ward & Jenkins, 1965):
ΔP = p(B₁ | A₁) − p(B₁ | A₂). (1)
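Concretely, ΔP can be computed from the four cell frequencies in a few lines. The following Python sketch is ours, for illustration only; the cell labels follow the 2 × 2 table described above:

```python
def delta_p(a, b, c, d):
    """Equation (1): delta-P from the four cell frequencies.

    a: A1B1 (both events in their target state), b: A1B2,
    c: A2B1, d: A2B2.
    """
    return a / (a + b) - c / (c + d)

# A contingency table of 15, 5, 5, 15 yields .75 - .25 = +.5
print(delta_p(15, 5, 5, 15))  # prints 0.5
```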
The ΔP coefficient is normative in the sense that it provides a mathematically correct assessment of a contingency. But what of the cognitive system responsible for covariation detection? The computational objective (Marr, 1982) of the cognitive system is the quantification of a pattern of relation among probabilistic data. Thus, at its most abstract level, a computational theory of covariation detection must primarily specify the logic by which it achieves this quantification. At an intermediate level, the theory should specify how the inputs are represented in the system and by which algorithm they can be transformed into outputs. Finally, the computations must be implemented in a biological device, whether in people or in non-human animals. Many, though not all, theories that have been put forth to account for covariation detection have been described in such a manner that we can know their computational objective and at least one example of an algorithm that could achieve it.
For the purpose of this paper, theories of covariation detection are regrouped in two categories at the computational level: the associative models and the statistical models. The focus of the paper is on Wagner's (1976, 1981; Mazur & Wagner, 1982) standard operating procedures (SOP) as the main associative model and Cheng & Novick's (1990, 1992; Melz, Cheng, Holyoak & Waldmann, 1993) probabilistic contrasts (PC) as the main statistical model. The associative models postulate that any exposure to a given pair of target events will change the amount of associative strength accruing to these stimuli. Conversely, exposure to only one of the two events in the pair decreases the associative strength. Thus, covariations are not assessed directly; rather, sensitivity to contingencies results from contiguity-based variations in associative strength. On the other hand, statistical models postulate that frequencies of events can be monitored cumulatively and that the contingency relating one event to another can be assessed directly from the cumulated frequencies according to a prescribed formula such as ΔP or some other variant. Thus, the two types of models differ fundamentally as to the logic by which sensitivity to contingency is achieved.
Wagner (1976, 1981) and Mazur & Wagner (1982) have proposed an associative model which can be applied to contingency judgment. Initially designed to account for variations in the effectiveness of reinforcement in conditioning, the model was intended to explain variations in the strength of an associative link between a conditioned stimulus and a reinforcer (Pavlovian conditioning) or between an action and its outcome (operant conditioning). However, much research on human contingency judgment does not involve conditioned stimuli or reinforcers, although some tasks, such as the keypress used by Wasserman, Chatlosh & Neunaber (1983), bear a very close resemblance to operant conditioning. Nevertheless, the absence of a strong reinforcing stimulus in a contingency does not in itself preclude the applicability of an associative model. An obvious case in point is the well-known animal learning phenomenon of sensory preconditioning (Mercier & Baker, 1983). To apply associative models usefully to human contingency judgment, it is necessary to think of the associative link at a more abstract level. In human research, the events entering into a contingency are often an action, real or imagined, and its outcome (e.g. using a certain ingredient to make a cake rise or not, Shaklee & Mims, 1986; firing shells in an attempt to explode a tank in a video game, Shanks, 1985a, b), or a predictor stimulus and an outcome (e.g. sunny weather/skin smoothness, Arkes & Harkness, 1983; symptoms and diseases, Shanks, 1991). Beyond the differences, however, there is a strong theoretical link between animal learning studies and human contingency judgment. One of the fundamental requirements of Wagner's SOP (1976, 1981) is to predict increased or decreased associative strength when the relation among stimuli is a positive or a negative contingency respectively, and no strength when the contingency is null.
This requirement has been strongly established since Rescorla's experiments (1967, 1968) on the probability of shock in the presence or absence of a conditioned stimulus in fear conditioning. Yet even though sensitivity to contingencies is the computational goal, it can be achieved without calculating contingencies per se. This is what Rescorla & Wagner (1972) showed when they first proposed a set of equations formalizing the variations in associative strength resulting from contiguous pairings and non-pairings of stimuli.
In that model, the associative strength sums over all the stimuli present on a given trial.
V̄(n) = Σ_X V_X(n), summed over all stimuli X present on trial n. (2)
The amount of change on a given occurrence is given by:
ΔV_A(n) = α_A β₁ [λ − V̄(n−1)]. (3)
The expression ΔV_A(n) indicates that the change in the associative strength of stimulus A on trial n is a function of the salience of that stimulus (α_A) times the salience of the reinforcer (β₁) times the difference between the maximum strength possible (λ) and the strength previously accumulated by all the stimuli present on that trial (V̄(n−1)).
Associative strength cumulates over trials according to:
V_A(n) = V_A(n−1) + ΔV_A(n). (4)
Thus 'contingency' per se is not computed anywhere. Yet the resulting associative values covary with contingency. The Rescorla-Wagner model (1972) has been frequently cited in the context of human covariation detection (Baker, Mercier, Vallee-Tourangeau & Frank, 1993; Kao & Wasserman, 1993; Melz et al., 1993; Mercier, 1996; Shanks, 1991). Shanks (1985a, 1987) argued that an associative model like Rescorla-Wagner's is required to account for covariation detection because when contingency judgments are monitored continuously they can be shown to develop gradually. Using a tank video game, Shanks asked people to assess the relative efficacy of shells to explode a tank. Judgments were made every five trials. Although the actual contingency was constant, the estimates grew gradually with successive trial blocks from zero to a point close to the normative value. The gradual development of the judgments has been questioned because it is not always observed empirically (Baker, Berbrier & Vallee-Tourangeau, 1989). However, this is an algorithmic issue since the Rescorla-Wagner model can be made to predict nearly instantaneous learning if the salience of the stimuli is assumed to be very high. More importantly for the computational status of the model, simulations have shown an excellent fit between its predictions and the final contingency estimates made by the research participants (Kao & Wasserman, 1993; Shanks, 1991, 1993).
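The claim that asymptotic associative strengths track contingency, even though ΔP is never computed, can be illustrated with a minimal simulation of equations (2)-(4). The treatment of the experimental context as an always-present cue with the same salience as the target cue, and the particular α, β and repetition values, are our illustrative assumptions, not part of the model's specification:

```python
import random

def rescorla_wagner(trials, alpha=0.1, beta=0.3, n_passes=500, seed=0):
    """Trial-by-trial updating per equations (2)-(4).

    trials: list of (cue_present, outcome_present) pairs.  A constant
    context cue X accompanies every trial (a standard assumption in
    applications of the model) and, for simplicity, shares the cue's
    salience.
    """
    rng = random.Random(seed)
    v = {"A": 0.0, "X": 0.0}
    history = []
    for _ in range(n_passes):              # repeat the series to reach asymptote
        seq = trials[:]
        rng.shuffle(seq)
        for cue, outcome in seq:
            present = ["A", "X"] if cue else ["X"]
            v_bar = sum(v[s] for s in present)        # equation (2)
            lam = 1.0 if outcome else 0.0
            change = alpha * beta * (lam - v_bar)     # equation (3)
            for s in present:
                v[s] += change                        # equation (4)
        history.append(v["A"])
    return sum(history[-100:]) / 100       # average out trial-order noise

# Cells a, b, c, d = 15, 5, 5, 15, i.e. delta-P = +.5: V_A settles near +.5
table = ([(True, True)] * 15 + [(True, False)] * 5 +
         [(False, True)] * 5 + [(False, False)] * 15)
print(round(rescorla_wagner(table), 2))
```

At equilibrium the context absorbs p(outcome | no cue) and the cue's strength approximates ΔP, which is the sense in which contingency emerges without being calculated.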
Recent mathematical elaborations of the basic associationist principles have led to connectionist models of covariation detection (Gluck & Bower, 1988, 1990; Shanks, 1990, 1991). All of these models retain the original contiguity-based mechanism. They clearly assume that covariation detection is a highly efficient, low-level, parallel process. Yet they offer little specificity as to what the processing capabilities actually are and how they would be taxed by high execution speed or varying amounts of information to be processed. Wagner's (1981) SOP model is more explicit on this point.
Partly inspired by the Rescorla-Wagner model (1972) and partly borrowing concepts from studies of working memory and automatic processing (Posner & Snyder, 1975; Shiffrin & Schneider, 1977), the core of SOP assumes that immediate behaviour as well as learning depend on the course of 'active' representation in memory. Wagner views memory as a graph structure with representative nodes connected via directional associative links, a view which is certainly compatible with the connectionist approach. The nodes are conceived as a set of informational elements in the spirit of stimulus-sampling theory (Estes, 1955). Subsets of the elements can correspond to separable aspects of represented events (e.g. colour, shape or other dimensions of stimuli). In contingency judgment, a node would correspond to one of the stimuli involved in the contingency.
Accordingly, the presentation of an experimental stimulus to the sensory system will tend to activate the elements of its corresponding memory node. After this initial state transition from inactivity (I) to primary activation (A1), the stimulus elements decay gradually to a secondary activity level (A2) and then back to I. This is similar to saying that a stimulus initially gets represented or 'attended to' in a focal, short-term buffer, then drifts to a more marginal working memory, before returning to inactivity in a long-term store. Wagner postulates no practical limitations on the number of inactive nodes in the system. On the other hand, the primary activity is severely limited (only two or three nodes could be fully in their A1 state simultaneously) and the secondary activity is limited to 10-15 nodes fully in their A2 state. The A1 limitation is crucial to the priming effects discussed below.
The more, or the longer, the representation of a stimulus is actively rehearsed (state A1) in short-term memory (STM), the more associative learning will take place between that stimulus and other contemporaneous events. This statement roughly corresponds to the formal equations of the 1972 model. The primary activation is assumed necessary for the formation of associations while the secondary activation is not. A stimulus in its A2 state can trigger existing associations but it cannot enter into the formation of new ones. The presentation of a stimulus de novo is considered normally to lead to the rehearsal of its representation, while the presentation of a physical stimulus that is already active or 'primed' in short-term memory does not provoke this primary activity. This is because of the severe capacity limitations of A1 processing. According to the model, there are two types of priming. Self-generated priming occurs when stimuli are presented in close succession such that the rehearsal of the last trial has not yet terminated when the new trial begins. However, because some amount of time has elapsed, the remaining rehearsal activity is mostly of the secondary type, which does not result in new learning. Retrieval-generated priming occurs when a cue previously associated with the target stimulus is present just before the physical presentation of the target. The associative link between the cue and the target fetches the representation of the target from long-term memory and makes it active (secondary type) in short-term memory. Hence, the target representation is already active in STM at the time of its new physical occurrence; that new physical occurrence does not provoke as much rehearsal as it otherwise would, and less new learning takes place.
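The activity states and their role in self-generated priming can be caricatured as a stochastic state machine over a node's elements. The transition probabilities below are arbitrary illustrative values of our own, not parameters from Wagner (1981); the sketch captures only the qualitative effect that a re-presentation after a short interval finds most elements still active and so recruits few of them into A1:

```python
import random

def present_and_decay(states, steps, p1=0.8, pd1=0.3, pd2=0.1, rng=None):
    """One stimulus presentation followed by `steps` time steps of decay.

    states: mutable list of element states ('I', 'A1' or 'A2').
    Presentation promotes inactive elements to A1 (focal rehearsal);
    elements already in A1 or A2 are primed and unaffected.  Decay then
    moves elements A1 -> A2 -> I with the given per-step probabilities.
    Returns the number of elements newly recruited into A1.
    """
    rng = rng or random.Random(1)
    recruited = 0
    for i, s in enumerate(states):
        if s == "I" and rng.random() < p1:
            states[i] = "A1"
            recruited += 1
    for _ in range(steps):
        for i, s in enumerate(states):
            if s == "A1" and rng.random() < pd1:
                states[i] = "A2"
            elif s == "A2" and rng.random() < pd2:
                states[i] = "I"
    return recruited

rng = random.Random(1)
short, long_ = ["I"] * 100, ["I"] * 100
present_and_decay(short, steps=2, rng=rng)    # short inter-trial interval
present_and_decay(long_, steps=50, rng=rng)   # long inter-trial interval
# Second presentation: far fewer elements re-enter A1 after the short interval
print(present_and_decay(short, steps=0, rng=rng),
      present_and_decay(long_, steps=0, rng=rng))
```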
Wagner developed this model to account for a variety of learning phenomena (Pfautz & Wagner, 1976; Terry & Wagner, 1975; Wagner, Rudy & Whitlow, 1973; Wagner & Terry, 1975), one of which was the apparently paradoxical effects of short- and long-term habituation (Davis, 1979; Whitlow, 1975). In this paradigm, a stimulus such as a loud tone is presented repeatedly. Habituation is measured immediately (short-term) by the reduction in unconditioned reaction to it (e.g. less orienting, less startling) as well as during a deferred, longer term test (e.g. 24 hours later). Presentations of the to-be-habituated stimulus at short ITIs produce more short-term habituation and less long-term habituation than the same stimulus presented with longer intervals. Wagner explained the increased short-term habituation by self-generated priming and the decreased long-term habituation by a double effect of self-priming and lack of retrieval-generated priming. Thus the first effect is non-associative while the second depends on some association. When the stimuli are presented at short ITIs, they are primed from trial to trial. That is, when a physical stimulus appears for the second time, the interval since its previous presentation has been so short that not all of the elements of its representation have decayed. The shorter the ITI is, the larger is the proportion of stimulus elements still in their A1 or their A2 state. Since the processing in these states, especially in A1, is severely limited, the new presentation of the physical stimulus essentially has no effect beyond the sensory register and gains little or no access to focal rehearsal. Because of this, it causes little or no immediate responding. The immediate absence of responding (e.g. no orienting or no startle) is a strong short-term habituation. When the ITI is longer, every new physical stimulus presentation occurs after all the elements of its representation have returned to inactivity.
Thus every new physical presentation activates the representation in full, and this state transition from I to A1 is accompanied by a strong overt reaction (more orienting or startle), that is, a lack of immediate habituation. This effect is non-associative since only the habituated stimulus and its representation are involved in self-generated priming.
How is it then that 24 hours later the stimulus habituated at short ITIs, and for which a lot of immediate habituation was observed, now produces less habituation than a stimulus exposed at long ITIs? The explanation is that, contrary to the short-term effect, long-term habituation is associative. To develop, long-term habituation depends on the formation of an associative link between the habituated stimulus and another stimulus called the context. In associative learning, the context is a complex and somewhat ill-defined stimulus made up of all that constitutes the experimental situation: the experimental room with its colour, size, shape, etc. When a stimulus is exposed at short ITIs, it cannot form a strong associative link with the context in which exposure takes place because it is continually self-primed. That is, in order to become associated, the habituated stimulus must enter into focal rehearsal (A1) simultaneously with the context, but the opportunities for this state transition are reduced by the fact that physical presentations repeat faster than the traces of past trials decay. At long ITIs, the association does form. When a test for habituation is carried out later in the same context, and the association is strong (long ITIs during exposure), the context activates the representation of the habituated stimulus (retrieval-generated priming) and the physical stimulus does not provoke an overt reaction (habituation) since its effect does not get past the sensory register. When the context is weakly associated with the exposed stimulus (short ITIs during exposure), there is a lack of retrieval-generated priming. Thus the physical presentation of the test stimulus gets more focal processing and triggers an overt reaction (lack of habituation).
By Wagner's own admission, 'there is no simple isomorphism between the basic processes of SOP and the terms of the Rescorla-Wagner equations' (1981, p. 40). However, 'because it similarly assumes that conditioned stimuli combine the associative consequences of a trial' (Wagner, 1981, p. 41), SOP would make the same predictions as Rescorla-Wagner, whether applied to conditioning or to contingency judgments. In addition, when the contingency is experienced trial by trial, SOP expects that short ITIs would result in a weaker associative link between the events in the contingency because of self-generated priming. Since the degree of judged contingency is a by-product of the strength of association, its estimation should be reduced by short ITIs. In a related fashion, very short stimulus durations might also yield a weaker associative link than longer durations because short stimuli would cause the active rehearsal to be shorter and let the decay process start earlier in each trial. Thus the SOP model anticipates clear consequences of the manipulation of processing requirements in terms of speed of execution (short ITIs) and amount of information (only two or three nodes can be fully active simultaneously). The predictions of statistical models, on the other hand, are not as clear.
A recent statistical model of contingency judgment has been proposed by Cheng & Novick (1990, 1992). The model is inspired in part by Kelley's (1967, 1973) attribution theory and the tradition of social cognition. Its basic tenet is that we infer the cause of a given person's response to a certain stimulus on a certain occasion from: (a) the degree of consensus between the person's response to the stimulus and other people's responses on the same occasion, (b) the degree of distinctiveness of the person's response to the stimulus versus other stimuli on the same occasion, and (c) the degree of consistency of the person's response to the stimulus on this occasion versus other occasions (Cheng & Novick, 1990). Being a social model, it is multidimensional and people play a central role in it. The research dealing with relations between action and outcome or cue and outcome has focused on uni- and bidimensional contingencies. Consistency across occasions has been the dimension of choice in trial-by-trial unidimensional contingency assessments, and distinctiveness of stimuli has typically been the second dimension in studies of overshadowing and discounting (Baker et al., 1993; McClure, Jaspars & Lalljee, 1993; Melz et al., 1993; Shanks, 1985b, 1986, 1989, 1991, 1993; Vallee-Tourangeau, Baker & Mercier, 1994).
To assess the degrees of consensus, distinctiveness and consistency, Cheng & Novick (1992) propose main effect and interaction contrasts. For a main effect, a cause involving a single factor i is defined by the probabilistic contrast:
ΔP_i = p_i − p_~i, (5)
where the proportion of events for which the effect occurs in the absence of factor i (p_~i) is subtracted from the proportion of events for which the effect occurs in the presence of factor i (p_i). A two-way interaction contrast would be:
ΔP_ij = (p_ij − p_~ij) − (p_i~j − p_~i~j). (6)
Equation (6) would be useful for studies of overshadowing and discounting. However, this paper concerns equation (5), which, evidently, is the same as equation (1) (Ward & Jenkins, 1965). Cheng & Novick explicitly state (1990, p. 550, 1992, p. 368, footnote 4) that their model is computational in Marr's (1982) sense of the term and that it specifies what is computed, not how the computation is carried out. They add that they leave issues of processing limitations to others. This is why their model does not make clear and specific predictions regarding variations in speed of execution and amount of information to be processed. Nevertheless, irrespective of how the computation is done, its results must conform to the value of ΔP. If manipulations of processing difficulties are found to cause deviations from the ΔP norm, they must be accounted for. In the discussion section, we will analyse whether this could be done within the confines of this and other statistical models.
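Equations (5) and (6) amount to simple arithmetic on conditional proportions. In the Python sketch below, which is ours, p_i denotes the proportion of events showing the effect in the presence of factor i, and an 'n' in an argument name marks the factor's absence:

```python
def main_effect_contrast(p_i, p_not_i):
    """Equation (5): the probabilistic contrast for a single factor i,
    p_i - p_~i."""
    return p_i - p_not_i

def interaction_contrast(p_ij, p_i_nj, p_ni_j, p_ni_nj):
    """Equation (6): the two-way interaction contrast, i.e. the contrast
    for factor i computed in the presence of factor j minus the same
    contrast computed in j's absence."""
    return (p_ij - p_ni_j) - (p_i_nj - p_ni_nj)

# Equation (5) is equation (1): P(effect | factor) - P(effect | no factor)
print(main_effect_contrast(0.75, 0.25))              # prints 0.5
# Factors whose effects merely add yield a null interaction contrast
print(interaction_contrast(0.75, 0.25, 0.5, 0.0))    # prints 0.0
```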
In addition to theoretical justifications, there is at least one type of empirical evidence suggesting that contingency judgments deviate from ΔP when processing 'difficulty' increases. Some experiments on contingency judgments have presented their participants with summary data in the form of tabled frequencies or numbers embedded in a story. This is what we refer to as the off-line method. Other experiments have used a trial-by-trial or on-line method of presentation. It has been observed, both across (Arkes & Harkness, 1983; Arkes & Rothbart, 1985; Jennings, Amabile & Ross, 1982; Shaklee & Mims, 1986) and within experiments (Kao & Wasserman, 1993; Schustack, 1988; Ward & Jenkins, 1965), that on-line judgments are less consistent with ΔP than off-line judgments. The argument generally invoked to explain this difference is that on-line acquisition of information creates a heavier memory load than off-line data. This argument is more suggestive than definitive because off-line and on-line experiments differ in more ways than one. For instance, it can be argued that the presentation of tabled data off-line in itself induces a tendency to turn to a calculation like ΔP. Indeed, Hastie & Park (1986) have shown that the relation between memory and judgment on-line is very complex and simultaneously open to a number of biases of encoding and retrieval. Nevertheless, the possibility remains that some judgments are more difficult to make than others because 'the judgment operator is limited by working memory capacities constraining the complexity of elementary information processes that can be executed at any point in time' (Hastie & Park, 1986, p. 259).
Similarly, contingency judgments (Trolier & Hamilton, 1986) and contingency generation (Malmi, 1986) are more accurate with binary than with continuous data. Again the suggestive argument would be that continuous data, being a richer representation of a contingency, are more demanding to process.
Along the same line of thinking, the amount of information to be processed on-line might influence judgment accuracy. Arkes & Harkness (1983) manipulated the matrix size in Expt 2 and found no effect; but this experiment was carried out off-line so that participants simply looked at larger numbers instead of having to process more trials. Cheng & Novick (1992, p. 368, footnote 3) indicate that sample size should play a role in judging contingencies. In order for a factor to be considered causal, its contingency would be required to exceed a minimum criterion, the magnitude of which would reflect random fluctuations. This kind of role would apply equally on- and off-line, but it affects criterion magnitude, not judgment accuracy. On-line, however, sample size may have another effect. If the judgment process is statistical and event frequencies are tallied in memory, then fewer trials should be easier to keep track of and should allow better accuracy. At the other extreme, with a very large number of trials, even a very partial sampling of the information, as long as it is random, should yield representative data. If the process is associative, fewer trials might produce pre-asymptotic strength and an underestimation of the true contingency whereas many trials would ensure asymptotic associative strength.
Thus, in spite of its theoretical importance, the current empirical evidence on the effects of processing requirements on the accuracy of contingency judgments remains inconclusive. The current experiment attempts to fill this gap. The experiment used a type of video game as the contingency task. Space tanks, camouflaged or non-camouflaged, defended a planet against alien mines. The participants made 54 contingency judgments on three contingencies (one positive, one null, one negative) presented in factorial combinations of two stimulus durations, three ITIs and three numbers of trials. Processing difficulty is defined in the traditional manner of cognitive psychology. Short stimulus durations and short ITIs are expected to increase difficulty by forcing the cognitive processes involved in acquiring and maintaining information to operate faster. Associative models such as SOP predict that faster operation will reduce accuracy. Statistical models are vague on this issue. The well-known phenomenon of the speed-accuracy trade-off (Ratcliff, 1978) constitutes independent evidence that speed will eventually reduce cognitive performance accuracy. The expected effect of the number of trials manipulation is that if the judgment process is statistical and event frequencies are tallied in memory, then fewer trials should be easier to keep track of. At the same time, a very large number of trials sampled randomly should yield representative data. On the other hand, if the process is associative, fewer trials might produce pre-asymptotic strength and an underestimation of the true contingency.
Twenty-five young adults volunteered to participate (13 women and 12 men, mean age = 27 years, range 20-48; mean education level = 16 years, range 11-21). A prize of NZ$50 was awarded randomly to one of them after the experiment was completed.
The experiment was programmed in MEL (Schneider, 1989) and carried out on two IBM compatible microcomputers, each equipped with a 14 in VGA colour monitor and located in an individual testing room.
The computer initially asked three questions about the participant's gender, years of education and age. Following this, an introductory display described the experimental task as follows:
Thank you for participating in this experiment about making judgments under uncertainty. Imagine that you are defending your planet against space invaders. You have tanks to defend yourself against the special 'visual' mines that the invaders have hidden. Since these mines are visual, it may be possible to help the tanks be safer by painting them so that the alien mines do not see them as well. You will be shown a series of trials during which a tank will appear at random positions on the screen and will either be in its natural base colour or painted (name of target colour).
A SAFE tank appears UPRIGHT; an UNSAFE (exploded) tank appears UPSIDE DOWN.
Your job is to assess how much the (target colour) PAINT CHANGES THE RATE OF EXPLOSION compared to the (base colour).
This was followed by three series of practice trials. The first practice contingency had a ΔP value of 1.0 and contained eight 200 ms stimulus duration trials, each 1000 ms apart, with trial frequencies of 4, 0, 0 and 4 in cells a, b, c and d. The second practice series had a ΔP of 0.33 with trial frequencies of 8, 4, 4 and 8 respectively, each stimulus presented for 200 ms and 200 ms apart. The last practice was a null contingency made up of 10, 10, 10 and 10 trials of 50 ms duration each with a 50 ms ITI. These practice trials are not completely balanced; in particular, ΔP decreases as the ITI decreases and as the number of trials increases. However, these data are not part of the experimental results and only served an illustrative purpose for the participants. After this practice, the experimenter answered any clarification questions the participant might have, without ever suggesting any specific manner by which to arrive at an answer. The participants were told that they might sometimes feel as though they were guessing but that this sentiment was to be expected and that they should continue to give the best possible answer they could.
The tanks were bitmaps of 64 horizontal by 48 vertical pixels, in cyan or purple on a black background (colours 5 and 3 of the standard VGA palette). The designation of the target and base colours was balanced across participants. The upside-down tank was a 180-degree rotation of the upright drawing. The centres of the bitmaps were displayed at the 240th vertical pixel and at pixel 256, 320 or 384 horizontally. The horizontal position was chosen at random with the constraint that the same position could not occur twice in a row. The drawings were prepared in the background and displayed on screen by switching memory pages in synchrony with the top of the vertical retrace.
Following practice, 54 test series were presented in random order, each separated by a warning screen with bright white text on a bright red background. The 54 series were made up of all the combinations of numbers of trials (8, 32 or 40), stimulus durations (50 or 200 ms), ITIs (50, 200 or 1000 ms) and contingencies (ΔP values of +.5, 0 or −.5). When ΔP was +.5, the contingency tables were made up of three target colour safe, one target colour unsafe, one base colour safe, and three base colour unsafe trials for series of length eight; 12, 4, 4 and 12 for series of length 32; and 15, 5, 5 and 15 for length 40. The tables for ΔP = 0 were 2, 2, 2, 2 or 8, 8, 8, 8 or 10, 10, 10, 10 respectively. The tables for ΔP = −.5 were 1, 3, 3, 1 or 4, 12, 12, 4 or 5, 15, 15, 5 respectively. At the end of a trial, the screen was cleared (no mask was used) and remained black for the duration of the ITI.
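For concreteness, a series of this kind can be generated from its cell frequencies as follows. The random shuffling of trial order within a series is our assumption about the original MEL program, which the text does not specify:

```python
import random

def build_series(a, b, c, d, seed=None):
    """Randomized trial list for one test series.

    Cell a: target colour & safe, b: target colour & unsafe,
    c: base colour & safe, d: base colour & unsafe.
    Each trial is a (colour, outcome) pair.
    """
    trials = ([("target", "safe")] * a + [("target", "unsafe")] * b +
              [("base", "safe")] * c + [("base", "unsafe")] * d)
    random.Random(seed).shuffle(trials)
    return trials

# The delta-P = +.5 series of length 40 uses cells 15, 5, 5, 15
series = build_series(15, 5, 5, 15, seed=0)
print(len(series))   # prints 40
```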
The series lengths of 8, 32 and 40 were chosen because eight is about the minimum number of trials that still allows for a reasonable number of different contingency tables within each ΔP value. Also, many studies of serial recall from working memory present a similar number of items (e.g. Phillips & Christie, 1977). Thirty-two is four times eight and makes it possible to compare the processing of eight long trials (duration and ITI 200 ms) with 32 short ones (duration and ITI 50 ms). Forty trials make it possible to directly compare this study with many others previously published (Baker et al., 1993; Shanks, 1985a, 1987). A stimulus duration of 50 ms was judged to be the minimum that could be used without running into many perceptual problems (Busey & Loftus, 1994), while 200 ms is a typical value in working memory experiments (Baddeley, 1993). Finally, ITIs of 50 and 200 ms were chosen to keep in line with the similar stimulus durations, while an ITI of 1000 ms would create a trial pace similar to that used in many on-line studies (Baker et al., 1993; Shanks, 1985a, 1987).
At the end of each series, the rating scale appeared in bright white on a blue background and the participants typed their answer with the number keys. They were given the opportunity to change their answer before starting the next series but no more changes were possible once the next series had begun.
A Type I error rate of .05 is used throughout. The data were analysed using univariate randomized block factorial analysis of variance with a pooled error term (Kirk, 1968), and conventional F values are reported. To guard against possible bias resulting from heterogeneity of covariances, all significant results are accepted only if they are also consistent with multivariate significance tests (Wilks' lambda), unless otherwise noted. When an effect has more than one degree of freedom on the numerator, multiple pairwise comparisons are carried out using Bonferroni corrections and the relevant error term from the ANOVA. Frequencies are tested with the likelihood ratio χ².
The contingency estimates provided by the participants are summarized in Fig. 1. A contingency (3) by number of trials (3) by duration (2) by ITI (3) within-subjects analysis of variance showed a significant main effect of contingency (F(2,1280) = 709.22, pooled MSE = 913), confirming that the participants could discriminate the positive, the null and the negative contingencies. The three contingencies also interacted with the number of trials (F(4,1280) = 10.53), the stimulus duration (F(2,1280) = 7.2) and the ITIs (F(4,1280) = 11.9). No other effects were significant.
Inspection of Fig. 1 suggests that there may have been two sources of interaction with contingencies. First, when [Delta]P = -50, the slope of the curve for negative values artificially created an interaction for judgments which were actually parallel to their positive counterparts. To work around the artificial nature of this effect, the sign of all estimates for the negative contingency only was changed before reanalysing. Second, the estimates for [Delta]P = 0 essentially followed a flat line whereas the others did not. On the surface this looks like the usual non-parallelism for interactions. However, it is apparent from the pattern of judgments in the non-zero contingencies that the experimental manipulations degrade estimates in a manner bringing them closer to zero (more on this issue below). When the true [Delta]P is zero, it becomes impossible to distinguish between correct judgments and degraded judgments. For this reason, the new analysis excluded the zero contingency estimates.
After changing the sign and excluding the null contingency, a contingency (2) by number of trials (3) by duration (2) by ITI (3) within-subjects analysis of variance yielded significant differences for contingency (F(1,844) = 21.22, pooled MSE = 1096), number of trials (F(2,844) = 17.3), stimulus duration (F(1,844) = 11.99) and ITI (F(2,844) = 19.27). The duration by ITI interaction approached significance (F(2,844) = 2.91, p [less than] .06) but not on the multivariate test (Wilks' lambda = .82, F(2,23) = 2.56, p [less than] .11). No other effects were significant. The +50 contingency (mean = 43) was estimated significantly higher than the -50 (mean = 33). The estimates were also larger for the long stimulus duration (mean = 42) than for the short one (mean = 37). Bonferroni comparisons showed that the short ITI produced smaller estimates (mean = 30) than the intermediate ITI (mean = 38) which, in turn, produced lower average estimates than the long ITI (mean = 46). Significantly lower contingency estimates were also produced in the small number of trials condition (mean = 29), while the estimates for the intermediate (mean = 42) and larger number of trials (mean = 43) did not differ from one another. Fig. 2 depicts these main effects along with the corresponding estimates for the null contingency, even though those were not part of the statistical analysis.
In terms of accuracy, the mean absolute deviation from the normative contingency was 28 (SD = 19.7) when [Delta]P = +50, 16 (SD = 18.5) when [Delta]P = 0, and 33 (SD = 27.7) when [Delta]P = -50. These mean absolute deviations were all different from one another (F(2,1347) = 68.59, MSE = 497; Bonferroni comparisons). Also, even the most accurate of these estimates, 16, was significantly different from zero (t(449) = 18.54).
One aspect of the data which deserved further attention was the consistent decline of the estimates towards zero as the conditions became more demanding. This decline could have been caused by systematic underestimations resulting from the experimental treatment, in which case many estimates would have lower yet non-zero values in the appropriate experimental conditions. Unfortunately, it could also be that the participants used zero as an answer when they really had no answer at all, as if they were saying: 'I do not know the answer, therefore I do not want to commit myself and I give the value in the centre of the scale'. If the smaller means were obtained only because the participants used the answer zero as a form of suspended judgment, then these data have little theoretical value and the analyses inappropriately include them. Certainly, there were significantly more zeros than any other numerical answer in all the conditions where judgment was significantly degraded. For ITIs, zeros made up 18 per cent of answers at 50 ms, 11 per cent at 200 ms and 8 per cent at 1000 ms ([[Chi].sup.2](2) = 14.42). For the stimulus durations, 50 ms stimuli yielded 16 per cent zeros and 200 ms stimuli only 8 per cent ([[Chi].sup.2](1) = 11.53). For number of trials, there were 20, 9 and 7 per cent zero answers in the 8, 32 and 40 trial conditions ([[Chi].sup.2](2) = 23.56). The +50 and -50 contingencies did not differ in their frequencies of zeros (7 and 5 per cent respectively; [[Chi].sup.2](1) = 3.03). Nevertheless, when the preceding ANOVA was recalculated excluding all 109 zero answers, all the major conclusions remained the same. There were significant differences among contingency (F(1,735) = 32.69, pooled MSE = 1065), number of trials (F(2,735) = 11.34), stimulus duration (F(1,735) = 5.91) and ITI (F(2,735) = 13.57), and no other effects were significant.
Multiple comparisons for number of trials and for ITIs remained significant except for the difference between the 50 and the 200 ms ITIs for which the Bonferroni probability became marginal (p [less than] .06).
Overall, people were able to make contingency estimates which discriminated well among fairly separable contingency values (positive, null, negative). They did not do so well when the sample size of the data on which the estimates were based was very small. In this experiment, eight trials resulted in significantly less accurate judgments than 32 or 40. Under time pressure, the accuracy of the judgments also decreased markedly. Shorter stimulus presentations and shorter ITIs both had a negative effect.
These results are not just an artifact of the participants not knowing what to answer and using the zero at the centre of the scale in an uncommitted fashion since an analysis excluding all zero answers led to the same conclusions. Note that this analysis excluding all zero answers was conservative since some true zeros have been eliminated along with the artifactual ones, thus inflating the mean values more than the minimal correction called for.
In absolute terms, the accuracy of the judgments statistically deviated from the normative answer. This observation might lead one to conclude, like Malmi (1986) and Alloy & Tabachnik (1984), that people are poor judges of contingencies. However, the position of the judgments relative to one another, their sign, and even the size of the average deviation from the norm convey the opposite impression. That is, unless the pressure is extreme, the judgments are quite good. For instance, at a full second of ITI, the average judgment is nearly right on the mark for all conditions. The null contingency was perceived accurately, showing that the experimental task is reasonably free of the influence of prior beliefs (Chapman, 1967; Chapman & Chapman, 1967, 1969) and that the frequencies used did not suffer from an outcome density bias (Shanks, 1985a).
The reduced accuracy with a small data sample is consistent with the associative view of contingency judgments. This type of model assumes that judgments are based on the underlying associative strength developed gradually from the contiguous experience of the event combinations. According to this, few trials may not be enough to bring the associative strength to asymptote and, on average, people will report estimates that are lower than the expected asymptotic value. By comparison, Cheng & Novick's (1992) statistical model holds that contingency is estimated from the cumulated frequencies of the event combinations. Thus it should not matter whether the database contains few trials if the contingency coefficient is calculated properly from the available data. With many trials, there should also be little interference, if any, since a large sample is more likely to be representative.
The detrimental effect of short ITIs is also more consistent with associative models than statistical ones. It probably comes as no surprise that more mistakes are made when the pace of information acquisition is very fast. In that sense, Cheng & Novick's statistical model is not so much invalidated by the data as falling short of accounting for why the mistakes occur. Other statistical models may be seen as accounting for the detrimental effect of short ITIs. For instance, according to Shaklee and collaborators (Shaklee & Mims, 1986; Shaklee & Tucker, 1980; Shaklee & Wasserman, 1986), the mere on-line nature of the task, quite apart from the added pressure of the fast pace, is sufficient to make people shift to less sophisticated judgment strategies. That is, instead of comparing conditional probabilities ([Delta]P), people would resort to comparing diagonals ([Delta]D), specific cell frequencies (e.g. a vs. b) or even just look at cell a when required to judge under pressure. These various strategies, dubbed 'linear heuristics' by Cheng & Novick (1992, pp. 374-376), are statistical in the sense that they rely on some form or other of cumulative frequencies and/or probabilities. However, they are not normative because the values they compute sometimes differ markedly from the mathematically correct contingency estimate ([Delta]P). In the multiple strategy view, different people can use different strategies and the same people can also change strategies with circumstances. However, since this approach does not specify by what mechanism the strategy changes come about, its predictive value is severely limited.
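The contrast between the normative [Delta]P rule and the linear heuristics described above can be made concrete with a short computation over the four cell frequencies. This is a minimal illustrative sketch; the function names are ours, not taken from the cited papers:

```python
def delta_p(a, b, c, d):
    """Normative contingency (Ward & Jenkins, 1965):
    p(B1 | A1) - p(B1 | A2), from the 2 x 2 cell frequencies."""
    return a / (a + b) - c / (c + d)

def delta_d(a, b, c, d):
    """Diagonal heuristic: confirming cells (a, d) minus
    disconfirming cells (b, c), scaled by the total frequency."""
    return ((a + d) - (b + c)) / (a + b + c + d)

# A table where the heuristic diverges from the norm:
# a = 6, b = 2, c = 3, d = 1 gives Delta-P = .75 - .75 = 0,
# yet Delta-D = (7 - 5) / 12, a spurious positive contingency.
```

With unequal marginal frequencies the two rules can disagree in magnitude and even in sign, which is precisely why the linear heuristics are non-normative.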
Another statistical model, proposed by Kao & Wasserman (1993), averages and weights the information in the four cells of the contingency table according to the formula:
[R.sub.w] = ([W.sub.a] x a - [W.sub.b] x b - [W.sub.c] x c + [W.sub.d] x d) / ([W.sub.a] x a + [W.sub.b] x b + [W.sub.c] x c + [W.sub.d] x d) + error. (7)
This formula is statistical but also non-normative because of its weighting scheme, which attributes different values to different pieces of information. However, it does integrate the information of all cells in a single coefficient and, applied extensively to a rich set of null contingencies, it was shown to perform as well as, if not better than, the associative model of Rescorla and Wagner (1972). Unfortunately, because of the combination of positive signs for cells a and d and negative signs for cells b and c in the numerator of equation (7), it essentially makes the same predictions as [Delta]D over the entire contingency space. Although widely cited, [Delta]D is insufficient to account for all contingency judgments. Moreover, like the other statistical models, Kao & Wasserman's does not have a specific mechanism whereby the detrimental effect of short ITIs would be accounted for.
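Equation (7), with the error term omitted, is easy to state as a computation. The sketch below is our paraphrase rather than Kao & Wasserman's own code; it also makes explicit the point that under equal weights the rule reduces to [Delta]D:

```python
def r_w(a, b, c, d, wa=1.0, wb=1.0, wc=1.0, wd=1.0):
    """Weighted-average rule of equation (7), error term omitted.
    The weights wa..wd capture the differential importance
    attributed to the four cells."""
    numerator = wa * a - wb * b - wc * c + wd * d
    denominator = wa * a + wb * b + wc * c + wd * d
    return numerator / denominator

# With all weights equal, R_w is exactly Delta-D: the sign pattern
# in the numerator compares the two diagonals of the table.
```

Because equal weights cancel out of the ratio, any common scaling of the weights leaves the prediction unchanged; only their relative sizes matter.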
In an attempt to salvage the statistical models, one might argue that what happens under pressure is that people become more uncertain. The uncertainty might lead them to try not to commit themselves or it might lead them to 'guess' more. We have already shown above that attempts at noncommittal answers are insufficient to explain the overall pattern of results. On the other hand, guessing more should produce a more random pattern of errors; a pattern containing as many overestimations as underestimations. However, the judgment changes that do occur are clearly not random. This fact is also beyond the reach of some associative models yet it is definitely anticipated by Wagner's (1981) SOP. In this model, events are represented and 'rehearsed' in working memory for a certain period of time before their traces decay to the point of returning to complete inactivity. Furthermore, physical events do not trigger as much rehearsal when their representation is already active (self-generated priming). Although the exact duration of the rehearsal is not specified, it follows from these principles that experiencing events at a very fast pace should interfere with their processing because of self-generated priming. This interference results in the accumulation of a lesser amount of associative strength over a fixed, especially small, number of trials. If more trials are experienced, then, because priming does not block rehearsal completely, the asymptotic level of associative strength can eventually be reached.
On the surface at least, it is less obvious how the SOP model can account for the reduced accuracy brought about by the shorter stimulus duration. At long ITIs, and with stimulus durations always long enough to allow identification of the stimuli, why should there be any disturbance? A careful examination of the mechanisms described by Wagner (Fig. 1.7, p. 34) indicates that longer stimulus durations maintain a higher degree of primary activity in working memory. That is, the rehearsal responsible for augmenting associative strength lasts longer. Conversely, the representation of more punctate stimuli would decay faster. Thus the 50 ms stimulus duration used in this experiment was short enough to slow down the accumulation of associative strength because it did not allow as much rehearsal on every trial. According to this mechanism, increasing rehearsal time via longer durations should offset the interference of short ITIs to some extent and vice versa. There was some evidence for this in the experiment, especially in the eight-trial condition (Fig. 1, left panel), although the duration by ITI interaction did not quite reach statistical significance. It is possible that rehearsal changes brought about by exposure time versus ITI do not offset one another in a one-to-one correspondence, but clearly further evidence will be required on this issue.
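The qualitative predictions attributed to SOP here — slower accumulation of associative strength with short durations, short ITIs (via self-generated priming) and few trials — can be conveyed with a toy simulation. This is only an illustrative sketch under simplified assumptions (exponential trace decay with a single arbitrary time constant), not Wagner's actual SOP equations, and all parameter values are ours:

```python
import math

def simulate_sop_sketch(n_trials, duration_ms, iti_ms,
                        tau=300.0, rate=0.3):
    """Toy SOP-like accumulation of associative strength.
    tau: arbitrary decay constant (ms); rate: learning-rate parameter."""
    trace = 0.0       # residual activation of the stimulus representation
    strength = 0.0    # accumulated associative strength (asymptote 1.0)
    for _ in range(n_trials):
        # Rehearsal grows with exposure time, but is blocked in
        # proportion to the still-active trace (self-generated priming).
        rehearsal = (1.0 - trace) * (1.0 - math.exp(-duration_ms / tau))
        strength += rate * rehearsal * (1.0 - strength)
        trace = min(1.0, trace + rehearsal)
        trace *= math.exp(-iti_ms / tau)   # decay over the inter-trial interval
    return strength
```

Under these assumptions the sketch orders the conditions in the direction of the data: eight short, fast-paced trials yield less associative strength than eight long, slow-paced ones, and more trials eventually compensate because priming never blocks rehearsal completely.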
The overall support for the associative view is further enhanced by the less than neutral instructions. Asking the participants to indicate how much the target colour changes the rate of explosion compared to the base colour, if anything, should induce a tendency to calculate [Delta]P. Crocker (1981) and Beyth-Marom (1982) have argued that the phrasing of the covariation question seems to influence participants. It is in spite of this that the judgments deviated from [Delta]P.
Although Wagner's SOP was originally devised to account for a series of phenomena in animal learning, it is interesting to see how it extends in a fairly straightforward manner to human contingency judgments just like Shanks (1985a) had noted the applicability of the older Rescorla-Wagner formulation. It is also noteworthy that the building blocks of SOP, which were initially borrowed from the human cognitive literature, continue to dovetail with current cognitive models such as Busey & Loftus's (1994). Busey & Loftus did not study contingency judgments but they propose that the acquisition of visual information, in a digit recall task, is a function of (a) the magnitude by which the sensory response exceeds some threshold and (b) the proportion of still unacquired information. The first principle is not far removed from the decaying rehearsal postulated in SOP, especially when one compares it with the sensory response function and the information extraction rate (Busey & Loftus, 1994, p. 450), at stimulus durations which are in the same order of magnitude as those used in this experiment. The second principle is exactly the same as that of Rescorla and Wagner (1972) and Wagner (1981). All this suggests that further research on contingency judgments, guided by associative theories, should bring us closer to an integrated view of this and other cognitive processes.
This paper was prepared while the first author was on sabbatical leave at Victoria University of Wellington, New Zealand.
Alloy, L. B. & Tabachnik, N. (1984). Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological Review, 91, 112-149.
Arkes, H. R. & Harkness, A. R. (1983). Estimates of contingency between two dichotomous variables. Journal of Experimental Psychology: General, 112, 117-135.
Arkes, H. R. & Rothbart, M. (1985). Memory, retrieval, and contingency judgements. Journal of Personality and Social Psychology, 49, 598-606.
Baddeley, A. (1993). La Memoire Humaine: Theorie et Pratique. Grenoble: Presses Universitaires de Grenoble.
Baker, A. G., Berbrier, M. W. & Vallee-Tourangeau, F. (1989). Judgements of a 2 x 2 contingency table: Sequential processing and the learning curve. Quarterly Journal of Experimental Psychology, 41B, 65-97.
Baker, A. G., Mercier, P., Vallee-Tourangeau, F. & Frank, R. (1993). Selective associations and causality judgments: Presence of a strong causal factor may reduce judgments of a weaker one. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 414-432.
Beyth-Marom, R. (1982). Perception of correlation reexamined. Memory and Cognition, 10, 511-519.
Busey, T. A. & Loftus, G. R. (1994). Sensory and cognitive components of visual information acquisition. Psychological Review, 101, 446-469.
Chapman, L. J. (1967). Illusory correlation in observational report. Journal of Verbal Learning and Verbal Behavior, 6, 151-155.
Chapman, L. J. & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 72, 193-204.
Chapman, L. J. & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271-280.
Cheng, P. W. & Novick, L. R. (1990). A probabilistic contrast model of causal induction. Journal of Personality and Social Psychology, 58, 545-567.
Cheng, P. W. & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365-382.
Crocker, J. (1981). Judgment of covariation by social perceivers. Psychological Bulletin, 90, 272-292.
Davis, M. (1979). Effects of interstimulus interval length and variability on startle response habituation. Journal of Comparative and Physiological Psychology, 72, 177-192.
Estes, W. K. (1955). Statistical theory of distributional phenomena in learning. Psychological Review, 62, 145-154.
Gluck, M. A. & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
Gluck, M. A. & Bower, G. H. (1990). Component and pattern information in adaptive networks. Journal of Experimental Psychology: General, 119, 105-109.
Hastie, R. & Park, B. (1986). The relationship between memory and judgment depends on whether the judgment task is memory-based or on-line. Psychological Review, 93, 258-268.
Jennings, D. L., Amabile, T. M. & Ross, L. (1982). Informal covariation assessment: Data-based versus theory-based judgments. In D. Kahneman, P. Slovic & A. Tversky (Eds), Judgment under Uncertainty: Heuristics and Biases, pp. 211-230. Cambridge: Cambridge University Press.
Kao, S.-F. & Wasserman, E. A. (1993). Assessment of an information integration account of contingency judgment with examination of subjective cell importance and method of information presentation. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 1363-1386.
Kelley, H. H. (1967). Attribution theory in social psychology. In D. Levine (Ed.), Nebraska Symposium on Motivation, vol. 15, pp. 192-238. Lincoln, NB: University of Nebraska Press.
Kelley, H. H. (1973). The processes of causal attribution. American Psychologist, 28, 107-128.
Kirk, R. E. (1968). Experimental Design: Procedures for the Behavioral Sciences. Belmont, CA: Brooks/Cole.
McClure, J. L., Jaspars, J. M. F. & Lalljee, M. (1993). Discounting attributions and multiple determinants. Journal of General Psychology, 120, 99-122.
Malmi, R. A. (1986). Intuitive covariation estimation. Memory and Cognition, 14, 501-508.
Marr, D. (1982). Vision. New York: Freeman.
Mazur, J. E. & Wagner, A. R. (1982). An episodic model of associative learning. In M. L. Commons, R. J. Herrnstein & A. R. Wagner (Eds), Quantitative Analyses of Behavior: Acquisition, pp. 3-40. Cambridge, MA: Ballinger.
Melz, E. R., Cheng, P. W., Holyoak, K. J. & Waldmann, M. R. (1993). Cue competition in human categorization: Contingency or the Rescorla-Wagner learning rule? Comments on Shanks (1991). Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 1398-1410.
Mercier, P. (1996). Computer simulations of the Rescorla-Wagner and the Pearce-Hall models in conditioning and contingency judgement. Behavior Research Methods, Instruments, and Computers, 28, 55-60.
Mercier, P. & Baker, A. G. (1985). Latent inhibition, habituation, and sensory preconditioning: A test of priming in short-term memory. Journal of Experimental Psychology: Animal Behavior Processes, 11, 485-501.
Pfautz, P. L. & Wagner, A. R. (1976). Transient variations in responding to Pavlovian conditioned stimuli have implications for the mechanisms of 'priming'. Animal Learning and Behavior, 4, 107-112.
Phillips, W. A. & Christie, D. F. M. (1977). Components of visual memory. Quarterly Journal of Experimental Psychology, 29, 117-133.
Posner, M. I. & Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In P. M. A. Rabbitt & S. Dornic (Eds), Attention and Performance, vol. 5. New York: Academic Press.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71-80.
Rescorla, R. A. (1968). Probability of shock in the presence or absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1-5.
Rescorla, R. A. & Wagner, A. R. (1972). A theory of pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds), Classical Conditioning II: Current Theory and Research, pp. 64-99. New York: Appleton-Century-Crofts.
Schneider, W. (1989). Micro Experimental Laboratory (MEL). Pittsburgh, PA: Psychology Software Tools.
Schustack, M. W. (1988). Thinking about causality. In R. J. Sternberg & E. E. Smith (Eds), The Psychology of Human Thought, pp. 92-115. New York: Appleton-Century-Crofts.
Shaklee, H. (1983). Human covariations judgment: Accuracy and strategy. Learning and Motivation, 14, 433-448.
Shaklee, H. & Mims, M. (1986). Development of rule use in judgments of covariation. In H. R. Arkes & K. R. Hammond (Eds), Judgment and Decision Making: An Interdisciplinary Reader, pp. 495-606. Cambridge: Cambridge University Press.
Shaklee, H. & Tucker, D. (1980). A rule analysis of judgments of covariation between events. Memory and Cognition, 8, 459-467.
Shaklee, H. & Wasserman, E. A. (1986). Judging interevent contingencies: Being right for the wrong reasons. Bulletin of the Psychonomic Society, 24, 91-94.
Shanks, D. R. (1985a). Continuous monitoring of human contingency judgment across trials. Memory and Cognition, 13, 158-167.
Shanks, D. R. (1985b). Forward and backward blocking in human contingency judgement. Quarterly Journal of Experimental Psychology, 37B, 1-21.
Shanks, D. R. (1986). Selective attribution and the judgment of causality. Learning and Motivation, 17, 311-334.
Shanks, D. R. (1987). Acquisition functions in contingency judgment. Learning and Motivation, 18, 147-166.
Shanks, D. R. (1989). Selectional processes in causality judgment. Memory and Cognition, 17, 27-34.
Shanks, D. R. (1990). Connectionism and human learning: Critique of Gluck and Bower (1988). Journal of Experimental Psychology: General, 119, 101-104.
Shanks, D. R. (1991). Categorization by a connectionist network. Journal of Experimental Psychology: Learning, Memory and Cognition, 17, 433-443.
Shanks, D. R. (1993). Associative versus contingency accounts of category learning: Reply to Melz, Cheng, Holyoak, and Waldmann (1993). Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 1411-1423.
Shiffrin, R. M. & Schneider, W. (1977). Controlled and automatic information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Terry, W. S. & Wagner, A. R. (1975). Short-term memory for 'surprising' versus 'expected' unconditioned stimuli in Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 104, 122-133.
Trolier, T. K. & Hamilton, D. L. (1986). Variables influencing judgments of correlational relations. Journal of Personality and Social Psychology, 50, 879-888.
Vallee-Tourangeau, F., Baker, A. G. & Mercier, P. (1994). Discounting in causality and covariation judgments. Quarterly Journal of Experimental Psychology, 47B, 151-171.
Wagner, A. R. (1976). Priming in STM: An information processing mechanism for self-generated or retrieval-generated depression in performance. In T. J. Tighe & R. N. Leaton (Eds), Habituation: Perspectives from Child Development, Animal Behavior, and Neurophysiology. Hillsdale, NJ: Erlbaum.
Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds), Information Processing in Animals: Memory Mechanisms, pp. 5-47. Hillsdale, NJ: Erlbaum.
Wagner, A. R., Rudy, J. W. & Whitlow, J. W. (1973). Rehearsal in animal conditioning. Journal of Experimental Psychology (Monograph), 97, 407-426.
Wagner, A. R. & Terry, W. S. (1975). Backward conditioning to a CS following an expected vs a surprising UCS. Animal Learning and Behavior, 3, 370-374.
Ward, W. C. & Jenkins, H. M. (1965). The display of information and the judgement of contingency. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 19, 231-241.
Wasserman, E. A., Chatlosh, D. L. & Neunaber, D. J. (1983). Perception of causal relations in humans: Factors affecting judgments of response-outcome contingencies under free-operant procedures. Learning and Motivation, 14, 406-432.
Whitlow, J. W. J. (1975). Short-term memory in habituation and dishabituation. Journal of Experimental Psychology: Animal Behavior Processes, 104, 189-206.
Authors: Mercier, Pierre; Parr, Wendy
Publication: British Journal of Psychology
Date: Nov 1, 1996