Vigilance and signal detection theory: an empirical evaluation of five measures of response bias.
The theory of signal detection is a model of perceptual processing that is often used to characterize performance effectiveness in signal detection situations because it permits the derivation of independent measures of perceptual sensitivity and response bias (Green & Swets, 1966; Macmillan & Creelman, 1991). Detection theory measures can be designated as either parametric or nonparametric on the basis of their distributional assumptions, and it is common practice to use pairs of sensitivity and bias indices that have been similarly categorized. Although the validity of this classification has been questioned, it is still widely used and is referenced here for convenience (Caldeira, 1980; Macmillan & Creelman, 1991; Richardson, 1972; Simpson & Fitter, 1973). The most frequently used measures of perceptual sensitivity include the parametric index d' and its nonparametric analog A'. The most common bias measures are the parametric index [beta] and the nonparametric indices B" and [B'.sub.H] (Green & Swets, 1966; Grier, 1971; Hodos, 1970; Pollack & Norman, 1964; see Appendix A for computing formulas).
Two additional measures of response bias that have recently been developed include the parametric index c (Ingham, 1970; Macmillan & Creelman, 1990; Snodgrass & Corwin, 1988) and the nonparametric measure [B".sub.D] (Donaldson, 1992). Evaluations in the area of recognition memory have indicated that these bias indices appear to be more effective than [beta], B", or [B'.sub.H], chiefly because they maintain their effectiveness over the full range of sensitivity from chance to perfect performance and because they provide accurate estimates of bias even when computed from collapsed or group data (Donaldson, 1992; Macmillan & Creelman, 1990; Snodgrass & Corwin, 1988). Thus it was proposed that investigators consider computing either c or [B".sub.D] in place of other indices to measure bias in recognition memory experiments.
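As a rough illustration of how the indices named above are obtained from a single pair of hit (H) and false alarm (FA) proportions, the following Python sketch implements the computing formulas as they are usually stated in the cited sources. It is not a transcription of Appendix A. The function name, the dictionary keys, and the restriction to 0 < FA <= H < 1 are assumptions made for the example.

```python
from math import exp
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_indices(h, fa):
    """Return sensitivity and bias indices for hit rate h and false alarm rate fa.

    Assumes 0 < fa <= h < 1 (an illustrative restriction, not from the article).
    """
    zh, zfa = _z(h), _z(fa)
    vh, vfa = h * (1 - h), fa * (1 - fa)
    # Hodos's index is defined piecewise on either side of the negative diagonal.
    if h <= 1 - fa:
        b_hodos = 1 - vfa / vh
    else:
        b_hodos = vh / vfa - 1
    miss_cr, hit_fa = (1 - h) * (1 - fa), h * fa
    return {
        "d_prime": zh - zfa,
        "A_prime": 0.5 + (h - fa) * (1 + h - fa) / (4 * h * (1 - fa)),
        "beta": exp((zfa ** 2 - zh ** 2) / 2),
        "c": -0.5 * (zh + zfa),
        "B_grier": (vh - vfa) / (vh + vfa),                   # B"
        "B_hodos": b_hodos,                                   # B'H
        "B_donaldson": (miss_cr - hit_fa) / (miss_cr + hit_fa),  # B"D
    }


neutral = sdt_indices(0.80, 0.20)   # symmetric rates: no bias on any index
cautious = sdt_indices(0.50, 0.10)  # few "signal" responses: conservative bias
```

With these conventions, neutral bias corresponds to 1.00 for [beta] and to 0.00 for the other four indices, and conservative bias yields values above the neutral point.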
The purpose of the present investigation was to verify the applicability of this recommendation to vigilance--the study of observers' ability to remain alert to potential critical signal occurrences for prolonged periods (Davies & Parasuraman, 1982; Mackworth, 1948, 1961; Swets & Kristofferson, 1970; Warm, 1984; Warm & Jerison, 1984). A reliable detection theory measure of response bias is particularly critical in this field of study because alterations in response bias are a predominant feature of vigilance performance (Parasuraman, 1979; Parasuraman & Davies, 1977; Parasuraman, Warm, & Dember, 1987; Warm & Jerison, 1984). In particular, response bias typically becomes more conservative or cautious over the course of a vigilance session. To date, however, the relative effectiveness of alternative bias measures has not been explored in this area. All of the bias indices except [B".sub.D] have been used in vigilance, and the problems associated with selecting the most appropriate measure have already been addressed in at least one study (Matthews, Davies, & Holley, 1993).
The present study was designed to provide a comprehensive evaluation of the efficacy of the five alternative bias indices so that practical guidelines regarding their utility in vigilance could be generated. Toward that end, we evaluated the bias measures most commonly used in vigilance research ([beta], B", and [B'.sub.H]) and the new indices that are beginning to be used (c and [B".sub.D]), using data from three vigilance experiments. The data not only resembled those that a typical vigilance researcher might wish to analyze with the theory of signal detection but also could be subjected to an extensive empirical evaluation. To provide the comprehensive assessment, we used five analytical techniques (analyses of variance and estimates of [[omega].sup.2], intercorrelations between bias and sensitivity, intercorrelations between bias indices, an analysis of residualized bias measures, and comparisons of average and collapsed bias indices) in evaluating the relative effectiveness of the five bias measures in the present set of experiments.
We conducted three experiments involving factors that affect response bias in vigilance, each with a different sample of observers. All three experiments employed a 40-min task divided into four consecutive 10-min periods of watch and were administered without performance feedback. Participants were required to detect small increments in the height of a single 32 x 4 mm white line, which flashed on and off at a rate of 20 events/min in the center of a gray computer screen. Responses to critical signal lines (3 mm taller than normal, unless otherwise indicated) occurring within the 3-s interstimulus interval were recorded as hits, whereas all other responses were recorded as false alarms. Before engaging in the task, each individual received training in the form of a two-alternative forced-choice discrimination task as well as a 10-min practice session, both with feedback.
In Experiment 1 (N = 48), the aim was to examine the sensitivity of each bias measure to an experimental manipulation of signal probability, which entailed informing observers a priori of the probability of critical signal occurrence (.05, .25, .50, or .75) they would encounter during the session. On the basis of previous vigilance experiments (Baddeley & Colquhoun, 1969; Williges, 1969, 1971, 1973), we expected that bias would be most conservative when signal probability was .05 and would become progressively more lenient as the probability increased to .75.
The manipulations in Experiment 2 (N = 72)--payoff and signal salience--were included so that the five bias indices could be evaluated in a manner similar to that of Snodgrass and Corwin (1988). As in their study, two levels of signal salience were used: low (0.5 mm difference between the height of the standard nonsignal line and that of the critical signal line) and high (3 mm difference in height). The low-salience condition was designed to be sufficiently difficult to elicit chance performance, thereby making it possible to identify which bias indices continue to reflect variations in bias when sensitivity is at or near chance. In both conditions the signal probability was .25.
The payoff manipulation represents another nonperceptual manipulation for assessing the relative sensitivity of the five bias measures. As in Snodgrass and Corwin's (1988) experiment, three levels of payoff (conservative, neutral, and liberal) were developed via a point system, in which differential values and costs were placed on correct and incorrect responses, and a monetary reward ($25) was offered for the best performance. Previous vigilance studies have shown that response bias tends to be lenient when either costs are low or values are high but conservative when the converse is true (Davenport, 1968, 1969; Levine, 1966). Hence we expected that response bias would be most conservative in the conservative payoff condition (high costs and low values) of the present study and most lenient in the liberal payoff condition (low costs and high values).
Experiment 3 (N = 24) was designed to examine the relative sensitivity of each bias index to the nonperceptual manipulation of shifts in signal probability from training to test. Colquhoun and Baddeley (1964, 1967) first used this procedure to demonstrate that the signal probability during training can have a substantial impact on the response criterion during the subsequent vigil. On the basis of their findings, we expected that the 12 observers in Experiment 3 who were trained on a probability of .40 and shifted to a higher test probability of .75 would be influenced by their experience during training to adopt an overall conservative response bias that would become more lenient over the course of the watch. Conversely, we expected that the 12 observers who were switched from a training probability of .40 to a lower test probability of .05 would exhibit an overall lenient bias that would become more cautious over time.
RESULTS AND DISCUSSION
Receiver Operating Characteristic Curves and Assumption Tenability
Before evaluating the relative efficacy of the bias measures, we examined binormal receiver operating characteristic (ROC) plots of the data to assess the tenability of the detection theory assumptions of normality and equal variance, under which the ROC will be linear with a slope of 1.00. The ROC plot in Figure 1a contains four operating characteristics that correspond to the mean z-scores for overall session hits (H) and false alarms (FA) in the four signal probability conditions in Experiment 1. As can be seen in the figure, the ROC was linear; its slope of 0.97 did not differ significantly from 1.00, F(1, 3) = .22, p > .05. The ROC plot in Figure 1b contains one curve for low signal salience near the chance diagonal and a second curve for high salience, each of which contains three operating characteristics that correspond to the conservative, neutral, and liberal payoff conditions in Experiment 2. These ROCs were also linear, with slopes of 1.01 for low salience and 0.99 for high salience. Neither slope differed significantly from 1.00 (p > .05).
Although the data from the two signal probability shift groups in Experiment 3 could not be similarly assessed because at least three distinct operating characteristics are needed to trace an ROC curve, the outcomes from the first two experiments indicate that the normality and equal variance assumptions were tenable in the present set of experiments. The validity of these assumptions in the current study signifies that subsequent interpretations of the relative effectiveness of the various bias measures were made from estimates of sensitivity and bias that had not been distorted by nonnormality or unequal variances.
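The slope check described in this section can be sketched in Python as follows. The sketch fits an ordinary least-squares line to the operating points in z-coordinates. The article does not state which fitting procedure was used, so the least-squares choice, the function name, and the generated operating points are all assumptions made for illustration.

```python
from statistics import NormalDist

_norm = NormalDist()
_z = _norm.inv_cdf


def zroc_fit(points):
    """Least-squares line z(H) = intercept + slope * z(FA) through (H, FA) pairs."""
    xs = [_z(fa) for _, fa in points]
    ys = [_z(h) for h, _ in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical operating points generated from an equal-variance normal
# model with d' = 1.5 and four different criterion placements k.
points = [(1 - _norm.cdf(k - 1.5), 1 - _norm.cdf(k)) for k in (0.2, 0.8, 1.4, 2.0)]
slope, intercept = zroc_fit(points)
```

For data that satisfy the equal-variance normal model exactly, as these generated points do, the fitted slope is 1.00 and the intercept equals d'. Empirical operating points scatter around such a line, which is why a slope such as 0.97 must be tested against 1.00 rather than compared by eye.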
[FIGURE 1 OMITTED]
Data Transformations for Analyses of Sensitivity and Bias
Detection theory measures of perceptual sensitivity (d' and A') and response bias ([beta], c, B", [B.sub.'H], and [B.sub."D]) were derived from each individual's proportions of hits and false alarms in each experiment. Any proportions of 0 and 1 were first adjusted via the procedure recommended by Snodgrass and Corwin (1988) to permit the calculation of those measures of sensitivity and bias derived from the standard normal curve. Finally, in all analyses of variance (ANOVAs), Box's [epsilon] adjustment to degrees of freedom was used for F tests involving the only within-subject variable, periods of watch.
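A minimal sketch of the adjustment, assuming the commonly cited form of the Snodgrass and Corwin (1988) procedure (add 0.5 to the response count and 1 to the number of trials). The function name and the event counts are hypothetical. The counts assume a 10-min period of 200 events at a signal probability of .05, that is, 10 signals and 190 nonsignals.

```python
def corrected_rate(count, n_trials):
    """Adjusted proportion: (count + 0.5) / (n_trials + 1).

    Keeps the result strictly between 0 and 1 so that its z score is finite.
    """
    return (count + 0.5) / (n_trials + 1)


# Hypothetical period of watch: all 10 signals detected, no false alarms
# on 190 nonsignal events.
hit_rate = corrected_rate(10, 10)
false_alarm_rate = corrected_rate(0, 190)
```

Without the adjustment, these proportions would be exactly 1 and 0, and neither d', [beta], nor c could be computed for that observer.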
Measures of Sensitivity (d', A')
Although not the primary focus of the present set of experiments, both the parametric and nonparametric measures of sensitivity were examined in each experiment because they did enter directly into subsequent correlational analyses. In the present investigation, d' and A' were functionally equivalent measures of perceptual sensitivity that consistently yielded comparable outcomes. First, the two measures were strongly correlated (see Table 1) in all three experiments. Second, in each experiment, perceptual sensitivity declined significantly over the course of the watch, as indexed by separate ANOVAs of d' and A' (p < .0001 for all tests). Finally, ANOVAs revealed that although neither measure was affected by the nonperceptual manipulations of signal probability and payoff (p > .05), both d' and A' did exhibit a significant main effect at p < .0001 for the perceptual manipulation of signal salience in Experiment 2, Fs(1, 66) = 196.56 and 376.53, respectively.
Perceptual sensitivity in the condition of high signal salience (mean d' = 2.2, mean A' = 0.89) exceeded that of the low-salience condition (mean d' = 0.43, mean A' = 0.64), where performance was very close to chance. For both measures, this pattern of results conforms to the tenets of signal detection theory, which specify that perceptual sensitivity should be affected by a perceptual manipulation of signal salience but not by nonperceptual manipulations such as probability and payoff. As with the ROC curves, such outcomes further validate the application of signal detection theory in the current study.
Although these results indicate that applying signal detection theory herein is appropriate, the ROCs specifically support only the d' measure of sensitivity. Although A' is popularly regarded as a nonparametric index that does not make particular assumptions about underlying distributions, it does indeed place constraints on the form of the ROC, and these are not completely compatible with the ROC form implied by d' (Macmillan & Creelman, 1991; Richardson, 1972). The two ROCs on binormal plots are similar at low levels of sensitivity but not at higher levels (approximately, when d' >= 2.00), where the curves for A' are concave upward rather than linear. Hence an empirical ROC cannot simultaneously support both d' and A'. To the extent that d' is an appropriate measure in the present investigation, A' would be considered inappropriate by that logic.
Nevertheless, we chose to include A' for several reasons. First, the two measures did consistently exhibit a high degree of correspondence in all experiments. Second, vigilance researchers who either know or suspect that the normality and equal variance assumptions have been violated in their experiments will probably turn to a nonparametric model. The most common nonparametric alternative to d' is A', a measure that is used nearly twice as often as d' in vigilance (See, Howe, Warm, & Dember, 1995). Third, previous research in vigilance has indicated that A' is the most suitable nonparametric alternative to d' because its ROCs, while disparate at high levels of sensitivity, are more similar than those for other nonparametric measures, such as [A.sub.g] (Craig, 1979).
Ultimately, the concern is whether the use of A' in the current investigation undermines subsequent evaluations of the relative efficacy of the five bias indices. Given that the only evaluative technique that might be affected by inadequacy in A' involves the correlations between sensitivity and bias, its inclusion does not alter our final assessment. However, the correlations involving A' should be interpreted with caution for the reasons just described. We felt it important to present all correlations, given the paucity of such reports in other studies in which detection theory measures of sensitivity and bias have been used.
ANOVAs and [[omega].sup.2] Estimates for Response Bias ([beta], c, B", [B.sub.'H], [B.sub."D])
ANOVAs and estimates of [[omega].sup.2] were first used in each experiment to evaluate the sensitivity of each bias index to the nonperceptual manipulations of signal probability, payoff, and probability shifts. The ANOVAs provided an assessment of the statistical reliability of the results, whereas estimates of [[omega].sup.2] were used to determine the proportion of variance accounted for by the principal experimental manipulations.
Experiment 1. Inspection of the mean values of all five bias indices from Experiment 1, in which participants were informed a priori of the signal probability during the vigil, indicated that, in accordance with previous vigilance studies, bias was most conservative at a probability of .05 and became progressively more lenient as the probability of signal occurrence increased to .75. Furthermore, in each case, response bias tended to become more conservative over the four periods of watch when the signal probability was either .05 or .25; otherwise, it tended to remain stable. These trends can be seen in the plot of mean c scores in Figure 2. Plots of the other four measures were similar.
[FIGURE 2 OMITTED]
Separate ANOVAs of each bias index revealed a statistically significant main effect for signal probability at p < .0001 for [beta], c, B", [B.sub.'H], and [B.sub."D]; Fs(3, 44) = 10.00, 65.14, 60.95, 68.94, and 75.34, respectively. However, the more fine-grained [[omega].sup.2] assessment of the sensitivity of each bias index to this manipulation indicated that it accounted for only 24% of the variance in [beta] scores but explained 70% (B"), 72% (c and [B.sub.'H]), and 75% ([B.sub."D]) of the variance in the four remaining indices.
Furthermore, although the temporal changes in mean bias scores (see Figure 2) suggested that each index ought to exhibit both a main effect for periods of watch and an interaction between signal probability and periods of watch, the only measure to do so was c. Periods of watch, F(3, 132) = 3.32, p < .032, and the interaction, F(9, 132) = 2.25, p < .034, each accounted for 1% of the variance in c and none of the variance in the other measures.
Experiment 2. In Experiment 2 both payoff and signal salience were manipulated to determine which bias indices would be effective at chance or near-chance performance. A bias index that is able to reflect variations in criterion regardless of the level of signal salience should exhibit (a) an effect for payoff, (b) no effect for signal salience, and (c) no interaction. The presence of either a main effect or an interaction involving salience would signify an undesirable association between bias and signal salience, which theoretically should not covary. Mean values of [beta], c, B", [B.sub.'H], and [B.sub."D] during the vigil are plotted in Figure 3 for the conservative, neutral, and liberal payoff matrices in each condition of signal salience. Inspection of Figure 3 reveals that whereas response bias varied accordingly with payoff at the high level of signal salience for all five indices, similar variations in bias at the low level of signal salience were observed only for c and [B.sub."D]. With [beta], B", and [B.sub.'H], the mean bias scores in the low-salience condition lay close to neutrality (a value of 1.00 for [beta] and 0.00 for the other two indices) regardless of payoff.
Separate ANOVAs of each bias measure revealed a significant main effect for signal salience for [beta], B", and [B.sub.'H] at p < .0001, Fs(1, 66) = 13.62, 23.12, and 16.33, respectively, as well as a significant interaction between payoff and salience for [beta] and B"; F(2, 66) = 5.93, p < .004, and F(2, 66) = 3.42, p < .039, respectively. The presence of these effects confirmed that the three indices were unable to measure bias when performance was at a chance level. In contrast, the ANOVAs revealed that neither the salience effect nor the interaction between payoff and salience was significant for c and [B.sub."D] (p > .05), signifying that they alone captured the differences in bias in the conservative, neutral, and liberal payoff conditions under low salience.
The implication of these effects is that c and [B.sub."D] are independent of sensitivity, whereas [beta], B", and [B.sub.'H] are constrained in their effectiveness by the level of sensitivity. The latter three measures can reflect variations in bias when sensitivity is high, but they become increasingly less effective as it approaches chance.
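The reason for this difference can be seen directly in the computing formulas, as the following Python sketch shows. It evaluates each bias index when the hit and false alarm rates are identical (chance sensitivity). The formulas are the commonly stated ones for each index, and the function name and the two example rates are assumptions made for illustration.

```python
from math import exp
from statistics import NormalDist

_z = NormalDist().inv_cdf


def bias_at_chance(p):
    """Bias indices when hit and false alarm rates both equal p (0 < p < 1)."""
    h = fa = p
    zh, zfa = _z(h), _z(fa)
    vh, vfa = h * (1 - h), fa * (1 - fa)
    miss_cr, hit_fa = (1 - h) * (1 - fa), h * fa
    return {
        "beta": exp((zfa ** 2 - zh ** 2) / 2),  # stuck at 1 when H = FA
        "B_grier": (vh - vfa) / (vh + vfa),     # stuck at 0 when H = FA
        "B_hodos": 1 - vfa / vh,                # stuck at 0 when H = FA
        "c": -0.5 * (zh + zfa),                 # equals -z(p), still varies
        "B_donaldson": (miss_cr - hit_fa) / (miss_cr + hit_fa),  # still varies
    }


cautious = bias_at_chance(0.10)  # observer rarely responds "signal"
lenient = bias_at_chance(0.90)   # observer usually responds "signal"
```

An observer who responds "signal" on 10% of all events and one who does so on 90% clearly differ in bias, yet [beta], B", and [B.sub.'H] assign both the neutral value, whereas c and [B.sub."D] take opposite signs for the two observers.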
Estimates of [[omega].sup.2] for the key experimental manipulations in Experiment 2, which can be found in Table 2, emphasize the differential performance of the bias indices with regard to the effects of signal salience and payoff. For [beta], B", and [B.sub.'H], signal salience actually accounted for a greater proportion of variance than did the payoff manipulation, but none of the variance in either c or [B.sub."D] could be explained by signal salience. Although both c and [B.sub."D] succeeded in this respect, the payoff manipulation itself accounted for a greater proportion of the variance in c than in [B.sub."D]. In addition, whereas c and [B.sub."D] were the only measures to detect the trend for bias to become more conservative over periods of watch, F(3, 198) = 18.65, p < .0001, and F(3, 198) = 4.02, p < .021, this variable again explained more of the variance in c than in [B.sub."D]. Overall, more of the variance in c was related to relevant factors than was the case for any other bias measure, indicating its superiority in this respect.
Experiment 3. The manipulation of concern in Experiment 3 was a shift in signal probability from training to test, which occurred in either a relatively high-to-low direction (RH-L, from p = .40 to .05) or a relatively low-to-high direction (RL-H, from p = .40 to .75). An effective measure of bias should exhibit an overall difference in bias between these two groups as well as diverging trends over the course of the watch (Colquhoun & Baddeley, 1964, 1967). Mean values of [beta] and c during the test phase for the RH-L and RL-H groups are plotted in Figure 4 as a function of periods of watch (plots of the other three measures were similar to that of c). Examination of the figure reveals that the overall level of response bias during the vigil was more conservative in the RL-H group than in the RH-L group. In addition, response bias tended to become more liberal over time in the former group and more conservative in the latter.
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
Separate ANOVAs of each bias index testing for the effects of group, periods of watch, and their interaction revealed that no source of variance in the analysis of [beta] reached statistical significance (p > .05). For c, B", [B.sub.'H], and [B.sub."D], there was a significant effect for group, F(1, 22) = 12.67, p < .002; F(1, 22) = 15.83, p < .001; F(1, 22) = 17.08, p < .0001; and F(1, 22) = 15.25, p < .001, respectively, as well as an interaction between group and periods of watch, F(3, 66) = 3.41, p < .042; F(3, 66) = 9.50, p < .0001; F(3, 66) = 8.67, p < .0001; and F(3, 66) = 6.80, p < .002, respectively. Estimates of [[omega].sup.2] indicated that group accounted for 27%, 31%, 33%, and 33% of the variance in c, B", [B.sub.'H], and [B.sub."D], respectively; the interaction accounted for 2%, 5%, 4%, and 2%. Hence in this instance, all measures except [beta] were adequate.
In addition to the ANOVAs and estimates of [[omega].sup.2], we also assessed the relative effectiveness of the five indices by examining the nature of their interrelations. As a first step, we obtained scatter diagrams plotting each bias measure against the others. The patterns revealed in these scatter diagrams were remarkably consistent across the three experiments. For that reason, only three representative plots from Experiment 1 are portrayed in Figure 5. As shown in the plot of [B.sub."D] versus [beta], scatter diagrams involving [beta] were highly curvilinear. This pattern is attributable both to the exceedingly large values of [beta] associated with conservative levels of bias and to its tendency to remain essentially unchanged at less conservative levels of bias when other measures exhibit considerable variability.
The plot of [B.sub."D] versus c in Figure 5 reveals that scatter diagrams involving the new parametric measure c tended to depict an ogival pattern, indicating that c exhibited greater variability at both conservative and lenient response biases than did the other indices. Finally, relationships among the three nonparametric measures were essentially linear, as represented by the plot of [B.sub."D] against [B.sub.'H].
Next, correlational analyses were conducted to probe further the interrelations among the bias measures revealed in the scatter diagrams as well as to determine whether the bias indices are correlated with their associated sensitivity measures. Table 1 contains partial correlations among the two indices of perceptual sensitivity and the five bias measures for overall session data, controlling for either signal probability (Experiment 1), payoff and signal salience (Experiment 2), or group (Experiment 3). As noted earlier, correlations involving A' should be interpreted with caution.
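One way to compute a partial correlation that controls for a categorical factor is to subtract each condition's mean from both variables and correlate the remainders. The Python sketch below does this, treating every experimental condition as one group. Whether the authors proceeded in exactly this manner is not stated, so the approach, the function name, and the toy data are assumptions.

```python
from math import sqrt


def partial_r_given_group(x, y, groups):
    """Correlation between x and y after removing group means from both."""

    def center(values):
        sums, counts = {}, {}
        for v, g in zip(values, groups):
            sums[g] = sums.get(g, 0.0) + v
            counts[g] = counts.get(g, 0) + 1
        return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

    cx, cy = center(x), center(y)
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Hypothetical scores: the two conditions differ in mean level, but within
# each condition y rises by 2 for every unit increase in x.
x = [1, 2, 3, 10, 11, 12]
y = [2, 4, 6, 0, 2, 4]
groups = ["a", "a", "a", "b", "b", "b"]
r_partial = partial_r_given_group(x, y, groups)
```

Removing the condition means prevents differences produced by the manipulation itself from inflating or masking the relation between two measures. Any significance test of such a coefficient should use degrees of freedom reduced by the number of group means estimated.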
According to the tenets of signal detection theory, perceptual sensitivity and response bias are independent measures of performance that should therefore be uncorrelated. Although this appeared to be true for the three nonparametric bias measures, neither [beta] nor c was independent of d'. With respect to the relationship between d' and [beta], the positive partial correlations in the first two experiments indicated that enhanced sensitivity was associated with more conservative levels of [beta].
[FIGURE 5 OMITTED]
The correlations between d' and [beta] cause concern only when it is simultaneously noted that, with the exception of Experiment 3, they are generally indistinguishable in magnitude from the correlations between [beta] and other bias indices. The nature of these relationships implies that the traditional bias measure bears as strong a relationship to sensitivity as it does to bias, making it difficult to determine what type of measure [beta] represents and raising serious questions concerning its validity as a measure of response bias in vigilance.
Whereas [beta] was positively related to d', c was negatively correlated with perceptual sensitivity in Experiment 1, which signified that higher levels of perceptual sensitivity were associated with more lenient values of c. The non-independence of d' and c is somewhat unexpected, given recent assertions to the contrary (Macmillan & Creelman, 1990; Snodgrass & Corwin, 1988). Theoretically, because d' is the difference and c is the sum of two monotonically transformed variables that are themselves uncorrelated, these two measures should be independent. As revealed by the significant correlation in Experiment 1, this theoretical independence may not always hold empirically. The partial correlation between d' and c, however, though statistically significant, was nevertheless relatively small in magnitude, explaining only 13% of the variance in c.
This correlation becomes even less consequential when compared with the relatively substantial partial correlations between c and the three nonparametric bias indices. Unlike [beta], c was correlated with these alternative bias measures to a degree that was clearly distinguishable from its correlation with sensitivity. Ultimately, however, because only B", [B.sub.'H], and [B.sub."D] were not significantly correlated with sensitivity, they should be considered the most effective indices by this criterion.
One final observation regarding Table 1 concerns the partial correlations from Experiment 3, which differed in magnitude from those in the first two experiments. First, none of the correlations between sensitivity and bias was large enough to reach statistical significance. Second, the intercorrelations among all of the bias measures themselves were higher than in the previous two experiments. This occurrence is particularly noteworthy for assessments of [beta] and c. Specifically, although the correlations between [beta] and its competitors were larger in Experiment 3 than in the first two experiments, they continued to fall short of those involving the other four indices.
In addition, whereas c had been only moderately correlated with the three nonparametric measures in the first two experiments, it was now more comparable to these indices. This difference seems to stem from the fact that response bias was less extreme, with observers' criteria lying nearer the neutral bias point, in the last experiment than in the previous two (compare values of c in Figures 2 through 4). These observations, coupled with outcomes demonstrating c's strong ties to nonperceptual manipulations of probability and payoff, imply that c is similar to other bias measures in its portrayal of neutral and near-neutral biases but that it is unique in a manner that makes it much more effective when conservative and liberal biases are involved. A bias index capable of distinguishing among fine degrees of response conservatism would be particularly useful in vigilance situations, in which observers often adopt conservative criteria and become increasingly more cautious over time.
Analysis of Residualized Bias Measures
Although the partial correlations among the five bias measures were generally sizable, they never attained a value of 1.00, implying that each measure might possess some amount of unique variance not present in any alternative index. Consequently, we conducted an analysis of residualized bias measures to determine (a) if each bias index does possess some unique variance that is unrelated to all other measures considered simultaneously and (b) if any unique portion of each index would continue to reflect experimentally induced variations in response bias. Because the signal probability manipulation proved to be more powerful than either payoff or probability shifts, only the data from Experiment 1 were included in this evaluation.
To obtain the unique portion of each index when the alternative indices are considered simultaneously, each bias measure was residualized by regressing it on the four remaining indices. Values of 1 - [R.sup.2] from these analyses revealed that 47% of the variance in [beta] was unaccounted for by the four remaining bias measures, whereas 3% of the variance in c remained unexplained. Only 1% of the variance in each nonparametric index was unique. Separate one-way ANOVAs of each residualized bias index, with signal probability as the independent variable, revealed that only the unique variance in c was systematic (i.e., attributable to the probability manipulation); F(3, 44) = 2.34, p < .09. Calculation of [[omega].sup.2] revealed that signal probability accounted for 8% of the variance in residualized c scores. The F ratios for the remaining indices were all less than one and therefore not significant.
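For a one-way between-subjects design such as this one, [[omega].sup.2] can be estimated from the F ratio alone. The Python sketch below uses the standard identity for that design. The function name is hypothetical, and the formula does not apply to the mixed designs analyzed elsewhere in the article, where within-subject variance also enters the total.

```python
def omega_squared_oneway(f_ratio, df_effect, n_total):
    """Estimate omega squared for a one-way between-subjects ANOVA.

    Uses df_effect * (F - 1) / (df_effect * (F - 1) + N), which is
    algebraically equal to (SS_effect - df_effect * MS_error) /
    (SS_total + MS_error) for this design.
    """
    numerator = df_effect * (f_ratio - 1)
    return numerator / (numerator + n_total)


# Residualized c in Experiment 1: F(3, 44) = 2.34 with N = 48 observers.
estimate = omega_squared_oneway(2.34, 3, 48)
```

The estimate is approximately .08, in line with the 8% reported for residualized c. When F is less than 1, the formula returns a negative value, which is conventionally reported as zero.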
The results of this assessment are especially relevant for evaluations of the two parametric bias measures. Although only 3% of the variance in c was unrelated to the other bias measures, this unique variance continued to covary with the nonperceptual manipulation of signal probability and was therefore not merely error. Thus the unique variability evident in the scatter diagram in Figure 5 and in the present analysis can be described as advantageous because it enables c to make finer discriminations among levels of both lenient and conservative biases than is possible for any alternative measure. With respect to [beta], however, the extreme values associated with conservative levels of response bias can only be considered detrimental because the very large proportion of variance in [beta] unrelated to the other measures turned out to be error variance that did not covary with signal probability. In fact, this tendency to yield extreme values has been a leading complaint against [beta] and seems to be one factor behind its ineffectiveness (Jerison, 1967; Jerison, Pickett, & Stenson, 1965; Long & Waag, 1981).
Average and Collapsed Values of Response Bias
A final approach to evaluating the five bias measures was to assess their effectiveness when computed from collapsed or "group" data. Collapsed bias estimates can be used to conduct a detection theory analysis when a data set contains false alarm probabilities of 0 or perfect hit probabilities of 1 for one or more observers. Because such probabilities correspond to infinite z scores, their occurrence makes it difficult to compute reliable estimates of bias for each individual. When this situation prevails, collapsed estimates of bias can be computed on the premise that group means will be 0 or 1 only if every individual has no false alarms or perfect hit rates. Such techniques can be quite useful in vigilance, in which the typically low false alarm probabilities may preclude the derivation of individual estimates of bias (Davies & Parasuraman, 1982; Warm & Jerison, 1984). Before applying signal detection theory in such instances, however, one would naturally wish to know how accurate collapsed estimates of bias will be. An effective bias index would yield similar estimates when computed from either individual or collapsed data.
Following the procedure described by Macmillan and Kaplan (1985), in each experiment we computed average bias within each experimental condition in the usual manner by first calculating a bias score for each individual and then averaging the results. Collapsed bias was obtained by first computing the mean hit and false alarm scores across observers within each experimental condition and then calculating bias from the group means. Average and collapsed values of bias were computed for the practice session as well as the four periods of watch in each condition. To measure the similarity between these five pairs of average and collapsed values, a distance measure D was computed in each condition for each index: $D = \sum_i (\bar{X}_i - X_i)^2$, where $\bar{X}_i$ represents an average score, $X_i$ is the corresponding collapsed score, and the sum runs over the five pairs (Cronbach & Gleser, 1953). High values of D represent large discrepancies between the two sets of average and collapsed measures. The mean D values, obtained by averaging across conditions in each experiment, are presented in Table 3.
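The averaging and collapsing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the (hit rate, false alarm rate) pairs are hypothetical, the index c is used as the example bias measure, and z is assumed to be the normal deviate that cuts off an upper-tail area equal to the proportion, a convention inferred from the formulas in Appendix A.

```python
from statistics import NormalDist, mean

_STD = NormalDist()  # standard normal distribution


def criterion_c(h, fa):
    """Bias index c = .5(z_FA + z_H); z is the upper-tail deviate (assumed convention)."""
    return 0.5 * (_STD.inv_cdf(1 - fa) + _STD.inv_cdf(1 - h))


def average_bias(observers, index=criterion_c):
    """Compute the index for each observer, then average the individual scores."""
    return mean(index(h, fa) for h, fa in observers)


def collapsed_bias(observers, index=criterion_c):
    """Average the hit and false alarm rates first, then compute the index once."""
    mean_h = mean(h for h, _ in observers)
    mean_fa = mean(fa for _, fa in observers)
    return index(mean_h, mean_fa)


def distance_d(average_scores, collapsed_scores):
    """Sum of squared differences between paired average and collapsed scores."""
    return sum((a - k) ** 2 for a, k in zip(average_scores, collapsed_scores))


if __name__ == "__main__":
    # Hypothetical (hit rate, false alarm rate) pairs for two observers in two periods.
    periods = [
        [(0.80, 0.10), (0.60, 0.05)],
        [(0.70, 0.08), (0.50, 0.02)],
    ]
    averages = [average_bias(p) for p in periods]
    collapsed = [collapsed_bias(p) for p in periods]
    print("D =", distance_d(averages, collapsed))
```

In this sketch, an observer with a false alarm rate of exactly 0 makes the individual computation fail (the deviate is infinite), whereas the collapsed computation still succeeds as long as the group mean is neither 0 nor 1.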
In general, the index [beta] consistently yielded disproportionately large discrepancies, chiefly because of its tendency to produce extremely high values for conservative observers. Both c and B" produced the smallest discrepancies. Therefore, when large numbers of 0s and 1s are present in the data and one must use collapsed estimates, these results indicate that the estimates will be most similar to the "true" bias when either c or B" is used and least similar when [beta] is used.
The index [beta]. The overwhelming conclusion that emerges from the present empirical assessment of five alternative bias measures is that the traditional bias index [beta] is an inadequate measure of response bias in vigilance. Five techniques were used to evaluate each bias measure, and [beta] consistently made a dismal showing. This measure was least sensitive to the nonperceptual manipulations of signal probability, payoff, and probability shifts. In addition, [beta] was no longer effective at differentiating variations in response bias tendencies when sensitivity approached chance. Furthermore, intercorrelations involving [beta] revealed that it bore little resemblance to alternative bias measures and also exhibited undesirable correlations with sensitivity. The analysis of residualized bias measures revealed that the unique variability in this bias measure, generated chiefly by the extreme values of [beta] associated with conservative levels of bias, was only error variance. Finally, estimates of average and collapsed bias were most discrepant for [beta].
Although previous studies have indicated that the new index c is superior to [beta], this study is the first to demonstrate the relative ineffectiveness of the traditional bias index in comparison with other available measures that might be computed and in terms of several methods of evaluation. Consequently, we recommend that vigilance researchers avoid using [beta] to measure the response bias of observers. (Note: See Appendix B for a discussion of ln([beta]), the natural logarithm of [beta].)
The index c. In addition to revealing the inadequacy of [beta] as a measure of response bias in vigilance, the present results indicated that c was altogether superior to [beta] and at the same time relatively more effective than or comparable to B", [B'.sub.H], and [B".sub.D]. The index c was generally more sensitive than alternative indices to nonperceptual manipulations and to time-dependent changes in bias. Unlike [beta], B", and [B'.sub.H], the index c remained an effective measure of bias at chance performance levels. More important, the analysis of residualized bias measures showed that this measure possesses some unique property not found in other indices that enhances its sensitivity to nonperceptual manipulations. Namely, only c was able to differentiate among the four levels of signal probability in Experiment 1 after it had been residualized to remove any variance that could be explained by other bias measures. Finally, estimates of average and collapsed bias were most similar for c.
The sole evidence against c was its significant partial correlation with d' in Experiment 1; however, as noted earlier, the degree of correspondence was small and therefore does not provide strong evidence for an association between the two measures. Furthermore, the many other tests that were conducted indicated that c is by far the most effective bias index in vigilance, despite the correlation with sensitivity.
The index [B".sub.D]. Although c has been shown to be the superior index of response bias, it is a parametric measure that is most appropriately applied in conjunction with the d' measure of sensitivity when the assumptions governing the classic model of signal detection theory have been met. Consequently, we also examined separately the relative effectiveness of the three nonparametric bias measures to determine which should be used along with a nonparametric index of sensitivity, such as A'. On the basis of its capacity to remain effective at chance performance levels, its ability to detect time-dependent changes in bias, and its relatively greater sensitivity to nonperceptual manipulations of probability and payoff, [B".sub.D] can be considered the most effective nonparametric index in vigilance. This outcome provides initial empirical evidence that the correction to the formula for B", which led to the derivation of the new measure [B".sub.D], does indeed produce a more effective bias index (Donaldson, 1992).
Recommendations. Overall, we conclude that c is the most effective bias index in vigilance and recommend that vigilance researchers in both laboratory and practical settings use c rather than [beta] whenever a parametric model is involved. When a nonparametric model is used, we recommend that [B".sub.D] be computed in place of its nonparametric alternatives, B" and [B'.sub.H]. The consistency between the outcomes of the present study and those of earlier investigations further suggests that they may apply generally to the many other domains in human factors in which signal detection theory is frequently used.
APPENDIX A: COMPUTATIONAL FORMULAS
In all the following formulas, FA and H are proportions of false alarms and hits, respectively, whereas z represents the standardized normal deviate associated with FA or H. The ordinates for noise (N) and signal-plus-noise (SN) are calculated at z.

$d' = z_{FA} - z_{H}$

$A' = \frac{1}{2} + \frac{(H - FA)(1 + H - FA)}{4H(1 - FA)}$

$\beta = \frac{\text{ordinate}_{SN}}{\text{ordinate}_{N}}$

$c = .5\,(z_{FA} + z_{H})$

$B'_{H} = 1 - \frac{FA(1 - FA)}{H(1 - H)}$, when $H \le 1 - FA$

$B'_{H} = \frac{H(1 - H)}{FA(1 - FA)} - 1$, when $H \ge 1 - FA$

$B'' = \frac{H(1 - H) - FA(1 - FA)}{H(1 - H) + FA(1 - FA)}$, when $H \ge FA$

$B'' = \frac{FA(1 - FA) - H(1 - H)}{FA(1 - FA) + H(1 - H)}$, when $H \le FA$

$B''_{D} = \frac{(1 - H)(1 - FA) - (H)(FA)}{(1 - H)(1 - FA) + (H)(FA)}$
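For readers who want to compute these indices, the following Python sketch implements the formulas of this appendix using only the standard library. It is a minimal illustration, not the authors' software. The function and key names are hypothetical, and z is assumed to be the normal deviate that cuts off an upper-tail area equal to the proportion, which is the convention under which the d' and c formulas above give positive d' when H exceeds FA and positive c for conservative responding.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def z_upper(p):
    """Normal deviate that cuts off an upper-tail area of p (assumed convention)."""
    return _STD.inv_cdf(1.0 - p)


def bias_indices(h, fa):
    """Sensitivity and bias indices for hit rate h and false alarm rate fa.

    Requires 0 < h < 1 and 0 < fa < 1; rates of exactly 0 or 1 have infinite z.
    The A' formula is the one listed in the appendix, intended for h >= fa.
    """
    z_h, z_fa = z_upper(h), z_upper(fa)
    hv, fv = h * (1 - h), fa * (1 - fa)

    # B'H: two branches, depending on which side of the minor diagonal the point lies.
    b_prime_h = 1 - fv / hv if h <= 1 - fa else hv / fv - 1

    # B'': two branches, depending on whether performance is above or below chance.
    if h >= fa:
        b_double_prime = (hv - fv) / (hv + fv)
    else:
        b_double_prime = (fv - hv) / (fv + hv)

    miss_cr = (1 - h) * (1 - fa)  # product of miss and correct rejection rates
    return {
        "d_prime": z_fa - z_h,
        "a_prime": 0.5 + (h - fa) * (1 + h - fa) / (4 * h * (1 - fa)),
        "beta": _STD.pdf(z_h) / _STD.pdf(z_fa),  # ordinate SN / ordinate N
        "c": 0.5 * (z_fa + z_h),
        "b_prime_h": b_prime_h,
        "b_double_prime": b_double_prime,
        "b_double_prime_d": (miss_cr - h * fa) / (miss_cr + h * fa),
    }


if __name__ == "__main__":
    # Hypothetical conservative observer: few false alarms, moderate hit rate.
    for name, value in bias_indices(0.60, 0.10).items():
        print(f"{name}: {value:.3f}")
```

With H = .60 and FA = .10, all five bias indices in this sketch fall on the conservative side of their neutral points (above 1 for [beta], above 0 for the others).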
APPENDIX B: THE NATURAL LOGARITHM OF [beta]
The present results have demonstrated that [beta] is an inadequate measure of response bias. It should be noted that [beta] may be transformed before analysis by taking the natural logarithm (Macmillan & Creelman, 1990, 1991; Snodgrass & Corwin, 1988); however, the transformed measure was not considered here because it is used much more rarely than is [beta]. The index [beta] takes on values between 0 and 1 when bias is liberal and values that range from 1 to positive infinity when bias is conservative, which represents a scale that does not produce equal intervals. This in itself may be regarded as sufficient grounds for choosing c over [beta] because an equal-interval measure is preferable to one that is not.
The natural logarithm, or the power to which the base e = 2.718281828 must be raised to obtain a given value, serves to transform [beta] into an interval scale measure, ln([beta]). The value of ln([beta]) is 0 when the criterion is neutral. Liberal criteria yield negative values, and conservative criteria yield positive values of ln([beta]). The transformation also truncates the scale of the measure (that is, double-digit [beta]s become single-digit values of ln([beta])) and is often used when the data are characterized by exceedingly large [beta] values, as in the present studies.
As Macmillan and Creelman (1991) have pointed out, two measures that are related by a logarithmic transformation will have equivalent isobias functions because the transformation is monotonic. Consequently, like [beta], ln([beta]) will have isobias curves that converge toward a neutral value of bias as sensitivity decreases, rendering it unable to differentiate among levels of response bias when performance is at a chance level. To verify this empirically, [beta] was replaced with ln([beta]) in the analysis of variance in Experiment 2 of the present investigation. As with [beta], the effects of payoff, signal salience, and their interaction were all statistically significant, with salience accounting for a greater proportion of variance than payoff. These results verified that ln([beta]) too fails as a measure of bias when performance is at a chance level.
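The convergence of the isobias curves can be illustrated numerically. Under the equal-variance Gaussian model, the ratio of ordinates reduces to exp(c x d'), so the logarithm of [beta] equals the product of c and d' and must shrink toward 0 as d' approaches 0, whatever the value of c. The Python sketch below checks this; the function names are hypothetical, and z is again assumed to be the upper-tail normal deviate.

```python
from math import log
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def rates_from(d_prime, c):
    """Hit and false alarm rates implied by d' and c (equal-variance Gaussian model)."""
    z_fa = c + d_prime / 2  # follows from d' = z_FA - z_H and c = .5(z_FA + z_H)
    z_h = c - d_prime / 2
    return 1 - _STD.cdf(z_h), 1 - _STD.cdf(z_fa)


def beta_from_rates(h, fa):
    """[beta] as the ratio of the SN ordinate to the N ordinate."""
    return _STD.pdf(_STD.inv_cdf(1 - h)) / _STD.pdf(_STD.inv_cdf(1 - fa))


def ln_beta_at(d_prime, c):
    """Natural log of [beta] for an observer with the given d' and c."""
    h, fa = rates_from(d_prime, c)
    return log(beta_from_rates(h, fa))


if __name__ == "__main__":
    # Hold the criterion fixed at a conservative c = 0.5 and lower sensitivity.
    for d_prime in (2.0, 1.0, 0.5, 0.1):
        print(f"d' = {d_prime:.1f}  ln(beta) = {ln_beta_at(d_prime, 0.5):.3f}")
```

In this sketch the observer's criterion never changes, yet the log-transformed index drifts toward its neutral value as sensitivity falls, which is the behavior the isobias argument predicts.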
Several other statistical tests were conducted to determine whether ln([beta]) might be more effective than [beta] in other respects. The general evaluations of the relative effectiveness of the bias indices remained the same, regardless of whether [beta] or its transform was used. ln([beta]) was more highly correlated with other bias measures than was [beta] in all three experiments, as would be expected with the conversion to an interval scale. However, the proportions of variance in ln([beta]) accounted for by signal probability and by payoff were consistently smaller than those for its parametric alternative, c. Furthermore, as with [beta], when ln([beta]) was residualized to remove the effects of the four alternative indices, the remaining variance was unrelated to the signal probability manipulation in Experiment 1. Finally, ln([beta]) continued to produce the greatest discrepancies between average and collapsed bias estimates in all experiments.
In summary, in the context of vigilance research, [beta] is an inadequate measure of bias, and a logarithmic transformation does little or nothing to improve its effectiveness.
REFERENCES
Baddeley, A. D., & Colquhoun, W. P. (1969). Signal probability and vigilance: A reappraisal of the "signal-rate" effect. British Journal of Psychology, 60, 169-178.
Caldeira, J. D. (1980). Parametric assumptions of some "nonparametric" measures of sensory efficiency. Human Factors, 22, 119-120.
Colquhoun, W. P., & Baddeley, A. D. (1964). Role of pretest expectancy in vigilance decrement. Journal of Experimental Psychology, 68, 156-160.
Colquhoun, W. P., & Baddeley, A. D. (1967). Influence of signal probability during pretraining on vigilance decrement. Journal of Experimental Psychology, 73, 153-155.
Craig, A. (1979). Nonparametric measures of sensory efficiency for sustained monitoring tasks. Human Factors, 21, 69-78.
Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity between profiles. Psychological Bulletin, 50, 456-473.
Davenport, W. G. (1968). Auditory vigilance: The effects of costs and values on signals. Australian Journal of Psychology, 20, 213-218.
Davenport, W. G. (1969). Vibrotactile vigilance: The effects of costs and values on signals. Perception and Psychophysics, 5, 25-28.
Davies, D. R., & Parasuraman, R. (1982). The psychology of vigilance. London: Academic.
Donaldson, W. (1992). Measuring recognition memory. Journal of Experimental Psychology: General, 121, 275-277.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Grier, J. B. (1971). Nonparametric indexes for sensitivity and bias: Computing formulas. Psychological Bulletin, 75, 424-429.
Hodos, W. (1970). A nonparametric index of response bias for use in detection and recognition experiments. Psychological Bulletin, 74, 351-354.
Ingham, J. G. (1970). Individual differences in signal detection. Acta Psychologica, 34, 39-50.
Jerison, H. J. (1967). Signal detection theory in the analysis of human vigilance. Human Factors, 9, 285-288.
Jerison, H. J., Pickett, R. M., & Stenson, H. H. (1965). The elicited observing rate and decision processes in vigilance. Human Factors, 7, 107-128.
Levine, J. M. (1966). The effects of values and costs on the detection and identification of signals in auditory vigilance. Human Factors, 8, 525-537.
Long, G. M., & Waag, W. L. (1981). Limitations on the practical applicability of d' and [beta] measures. Human Factors, 23, 285-290.
Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search. Quarterly Journal of Experimental Psychology, 1, 6-21.
Mackworth, N. H. (1961). Researches on the measurement of human performance. In H. W. Sinaiko (Ed.), Selected papers on human factors in the design and use of control systems (pp. 174-331). New York: Dover. (Reprinted from Medical Research Council Special Report Series 268, 1950, London: His Majesty's Stationery Office)
Macmillan, N. A., & Creelman, C. D. (1990). Response bias: Characteristics of detection theory, threshold theory, and "nonparametric" indexes. Psychological Bulletin, 107, 401-413.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.
Macmillan, N. A., & Kaplan, H. L. (1985). Detection theory analysis of group data: Estimating sensitivity from average hit and false-alarm rates. Psychological Bulletin, 98, 185-199.
Matthews, G., Davies, D. R., & Holley, P. J. (1993). Cognitive predictors of vigilance. Human Factors, 35, 3-24.
Parasuraman, R. (1979). Memory load and event rate control sensitivity decrements in sustained attention. Science, 205, 924-927.
Parasuraman, R., & Davies, D. R. (1977). A taxonomic analysis of vigilance performance. In R. R. Mackie (Ed.), Vigilance: Theory, operational performance, and physiological correlates (pp. 559-574). New York: Plenum.
Parasuraman, R., Warm, J. S., & Dember, W. N. (1987). Vigilance: Taxonomy and utility. In L. S. Mark, J. S. Warm, & R. L. Huston (Eds.), Ergonomics and human factors: Recent research (pp. 11-23). New York: Springer-Verlag.
Pollack, I., & Norman, D. A. (1964). A nonparametric analysis of recognition experiments. Psychonomic Science, 1, 125-126.
Richardson, J. T. E. (1972). Nonparametric indexes of sensitivity and response bias. Psychological Bulletin, 78, 429-432.
See, J. E., Howe, S. R., Warm, J. S., & Dember, W. N. (1995). Meta-analysis of the sensitivity decrement in vigilance. Psychological Bulletin, 117, 230-249.
Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.
Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34-50.
Swets, J. A., & Kristofferson, A. B. (1970). Attention. Annual Review of Psychology, 21, 339-366.
Warm, J. S. (1984). An introduction to vigilance. In J. S. Warm (Ed.), Sustained attention in human performance (pp. 1-14). Chichester, England: Wiley.
Warm, J. S., & Jerison, H. (1984). The psychophysics of vigilance. In J. S. Warm (Ed.), Sustained attention in human performance (pp. 15-60). Chichester, England: Wiley.
Williges, R. C. (1969). Within-session criterion changes compared to an ideal observer criterion in a visual monitoring task. Journal of Experimental Psychology, 81, 61-66.
Williges, R. C. (1971). The role of payoffs and signal ratios in criterion changes during a monitoring task. Human Factors, 13, 261-267.
Williges, R. C. (1973). Manipulating response criterion in visual monitoring. Human Factors, 15, 179-185.
Judi E. See received her Ph.D. in experimental psychology from the University of Cincinnati in 1994. She is a human factors engineer at Logicon Technical Services, Inc., in Dayton, Ohio.
Joel S. Warm received his Ph.D. in experimental psychology in 1966 from the University of Alabama. He is a full professor in the Department of Psychology at the University of Cincinnati.
William N. Dember received his Ph.D. in experimental psychology from the University of Michigan in 1955 and is a full professor in the Psychology Department at the University of Cincinnati.
Steven R. Howe received his Ph.D. in psychology from the University of Cincinnati in 1980. He is an associate professor in the Psychology Department at the University of Cincinnati.
Date received: May 16, 1995
Date accepted: June 4, 1996
JUDI E. SEE, (1) Logicon Technical Services, Inc., Dayton, Ohio, and JOEL S. WARM, WILLIAM N. DEMBER, and STEVEN R. HOWE, University of Cincinnati, Cincinnati, Ohio
(1) Requests for reprints should be sent to Judi E. See, Logicon Technical Services, Inc., P.O. Box 317258, Dayton, OH 45437-7258.
Caption: Figure 1. Binormal ROC plots from (a) Experiment 1 and (b) Experiment 2 (C = conservative; N = neutral; L = liberal).
Caption: Figure 2. Mean values of c for each level of signal probability as a function of periods of watch.
Caption: Figure 3. Mean values of [beta], c, B", [B'.sub.H], and [B".sub.D] during the vigil for the three payoff matrices in each condition of signal salience. (Error bars represent the standard error of the mean.)
Caption: Figure 4. Mean values of [beta] and c during the test phase for the relative high-to-low (RH-L) and relative low-to-high (RL-H) groups as a function of periods of watch. (Error bars represent the standard error of the mean.)
Caption: Figure 5. Scatter diagrams for [beta] (top), c (middle), and the nonparametric indices (bottom).
TABLE 1
Partial Correlations for Indices of Sensitivity (d', A') and Bias ([beta], c, B", [B'.sub.H], [B".sub.D]) in Experiments 1, 2, and 3

| Index | Exp. | A' | [beta] | c | B" | [B'.sub.H] | [B".sub.D] |
|---|---|---|---|---|---|---|---|
| d' | 1 | 0.94 ** | 0.31 * | -0.36 * | 0.03 | -0.04 | -0.20 |
| d' | 2 | 0.80 ** | 0.26 * | -0.07 | 0.22 | 0.20 | 0.03 |
| d' | 3 | 0.97 ** | 0.10 | -0.23 | -0.20 | -0.25 | -0.27 |
| A' | 1 | | 0.19 | -0.40 ** | 0.09 | 0.05 | -0.14 |
| A' | 2 | | -0.05 | -0.21 | 0.05 | 0.09 | -0.06 |
| A' | 3 | | 0.01 | -0.27 | -0.24 | -0.28 | -0.30 |
| [beta] | 1 | | | 0.23 | 0.29 * | 0.13 | 0.04 |
| [beta] | 2 | | | 0.39 ** | 0.40 ** | 0.28 * | 0.19 |
| [beta] | 3 | | | 0.83 ** | 0.90 ** | 0.85 ** | 0.78 ** |
| c | 1 | | | | 0.73 ** | 0.71 ** | 0.80 ** |
| c | 2 | | | | 0.75 ** | 0.75 ** | 0.86 ** |
| c | 3 | | | | 0.94 ** | 0.95 ** | 0.99 ** |
| B" | 1 | | | | | 0.96 ** | 0.87 ** |
| B" | 2 | | | | | 0.97 ** | 0.81 ** |
| B" | 3 | | | | | 0.99 ** | 0.94 ** |
| [B'.sub.H] | 1 | | | | | | 0.95 ** |
| [B'.sub.H] | 2 | | | | | | 0.88 ** |
| [B'.sub.H] | 3 | | | | | | 0.97 ** |

Note: df = 45 (Experiment 1); df = 68 (Experiment 2); df = 21 (Experiment 3); *p < .05, **p < .01.

TABLE 2
Proportion of Variance in Response Bias Indices ([[omega].sup.2]) Accounted for by Salience, Payoff, Salience x Payoff, and Periods of Watch in Experiment 2

| Effect | [beta] | c | B" | [B'.sub.H] | [B".sub.D] |
|---|---|---|---|---|---|
| Salience | 0.10 | 0.00 | 0.15 | 0.11 | 0.00 |
| Payoff | 0.09 | 0.27 | 0.12 | 0.09 | 0.16 |
| Salience x Payoff | 0.08 | 0.00 | 0.03 | 0.02 | 0.00 |
| Periods of Watch | 0.00 | 0.03 | 0.00 | 0.00 | 0.01 |

TABLE 3
D Scores Representing the Similarity between Average and Collapsed Bias Values for [beta], c, B", [B'.sub.H], and [B".sub.D] in Experiments 1, 2, and 3

| Experiment | [beta] | c | B" | [B'.sub.H] | [B".sub.D] |
|---|---|---|---|---|---|
| 1 | 4.36 | 0.10 | 0.12 | 0.17 | 0.12 |
| 2 | 2.95 | 0.12 | 0.10 | 0.18 | 0.17 |
| 3 | 2.69 | 0.07 | 0.08 | 0.16 | 0.15 |
| Mean | 3.33 | 0.10 | 0.10 | 0.17 | 0.15 |
Author: See, Judi E.; Warm, Joel S.; Dember, William N.; Howe, Steven R.
Date: Mar 1, 1997