
The Dimensional Model of Driver Demand: Visual-Manual Tasks

ABSTRACT

Many metrics have been used in an attempt to predict the effects of secondary tasks on driving behavior. Such metrics often give rise to seemingly paradoxical results, with one metric suggesting increased demand and another metric suggesting decreased demand for the same task. For example, for some tasks, drivers maintain their lane well yet detect events relatively poorly. For other tasks, drivers maintain their lane relatively poorly yet detect events relatively well. These seeming paradoxes are not time-accuracy trade-offs or experimental artifacts, because for other tasks, drivers do both well. The paradoxes are resolved if driver demand is modeled in two orthogonal dimensions rather than a single "driver workload" dimension. Principal components analysis (PCA) was applied to the published data from four simulator, track, and open road studies of visual-manual secondary task effects on driving. PCA reduced the task metrics to two underlying orthogonal components (hereafter, dimensions) which were consistent across studies, herein designated as physical and cognitive demand. Physical demand is associated with lateral and longitudinal driver performance (lane crossings, standard deviation of lateral position and speed), with correlated surrogate metrics of task time, step count, total glance time, number of glances, and subjective workload. Cognitive demand is associated with event detection (RT and miss rate), with correlated surrogate metrics of mean single glance time, long single glances, speed reduction, and task errors. The Dimensional Model of Driver Demand allows for a common simplified understanding of all these measures of visual-manual secondary task effects on driver performance.

CITATION: Young, R., Seaman, S., and Hsieh, L., "The Dimensional Model of Driver Demand: Visual-Manual Tasks," SAE Int. J. Trans. Safety 4(1):2016, doi:10.4271/2016-01-1423.

INTRODUCTION

Drivers must manage multiple primary driving subtasks concurrently. These include controlling the vehicle's movement, path, and speed via steering, accelerating, and braking; making decisions about speed and lane maneuvers; monitoring and operating vehicle systems; maintaining awareness of the road and traffic situation; and detecting and responding to roadway events (such as hazards, traffic lights, brake lights on a forward vehicle, etc.). Drivers also manage various device-related secondary tasks that they may want or need to perform in the vehicle while on the way to a destination, such as radio tuning or song selection, dialing a phone number, entering a destination, etc. To minimize any possible negative effects of device-related secondary tasks on the primary driving subtasks, it is necessary to have a means of determining how, in what ways, and by how much some secondary tasks may have the potential to interfere with primary driving subtasks. There are non-device-related secondary tasks as well, such as eating, drinking, grooming, interacting with passengers and/or pets, reading signs along the roadway, gawking at stopped vehicles, etc. However, the scope of the current paper is restricted to primary driving subtasks and the effect of device-related secondary visual-manual tasks on those subtasks.

In psychology, the classical one-dimensional (1-D) attention model [1,2] is that there is an all-or-none, single-channel bottleneck which can deal with data from only one signal at a time, so that data from a signal arriving during the response time (RT) to a previous signal has to wait until the decision mechanism becomes free. Thus task A must be stopped to perform task B, and task A cannot be continued until task B stops. Other psychology researchers [3,4] extended this classical 1-D model by allowing for a sharing of the assumed single attentional resource rather than all-or-none within the single channel. This extended 1-D model suggests that a task subtracts "attention" from this pool, leaving less "attention" for concurrent tasks, thus task B produces detrimental effects on task A, but not a complete cessation of A. Some form of this extended 1-D model appears to be implicitly held by many in the field of driver attention and performance.

However, other experiments indicated some degree of independent processing of signals. Wickens [5] developed the "Multiple Resource" model, which has four "dimensions" with independent processing across the dimensions, and across the subcategories for each dimension (for review see [6]). The four dimensions and their subcategories were: "1. Codes" (spatial vs. verbal), "2. Modalities" (visual vs. auditory), "3. Stages" (perception, cognition, responding), and "4. Responses" (spatial-manual vs. vocal-verbal).

Later experiments, however, found cross-modality effects that were inconsistent with independent processing for the visual and auditory subcategories within the modality dimension [7]. For example, auditory and visual stimuli (as well as tactile) produce a similar pattern of interference with response time in dual-task situations [8] [9, Figure E.5]. Such cross-modality interference does not necessarily, however, support a 1-D central attentional resource model. Modern cognitive neuroscience has identified three central attentional resources in the brain that must be considered for a complete understanding of driver attention.

Three Attentional Networks

There is now considerable evidence that any attentional model that involves only one central attentional resource is incomplete. In particular, neuroimaging procedures have identified three brain networks that underlie different types of attention (for reviews see [10,11,12,13]). These attentional networks include:

* Orienting Attention, which selects information from sensory input;

* Executive Attention, which involves mechanisms for monitoring and resolving conflict among thoughts, feelings and behavioral responses; and

* Alerting Attention, which achieves and maintains a state of high sensitivity to incoming stimuli.

The function and processing efficiency of the three networks have been assessed with the Attention Network Test (ANT), a flanker-type task in which differences in response time (RT) between various test conditions are used to evaluate each network [14]. The results of testing with the ANT and neuroimaging procedures indicate that the three networks engage separate brain mechanisms and are functionally independent. However, there are some interactions among them in real life and in certain tasks, and a complete and coherent understanding of those interactions is still an active area of research [15,16].

Application of Three Attentional Networks to Driving

Applications of measurements of the three attentional networks to driving are described in [17,18]. Greater understanding of the three attentional processes as applied to driver performance has been further investigated in [19,20,21,22,23].

Our own studies with brain imaging have shown a direct relationship of a specific area in the orienting attention network to event detection and response during simulated driving [24,25]. We measured event detection and response using the Detection-Response Task (DRT) [9], which was recommended in 2015 for publication as an ISO standard. Using the DRT in a simulated driving environment with magnetoencephalography (MEG), our team found that the response time to a red light (with or without a concurrent speech task) is predicted by the level of activation in the right superior parietal lobe [24]. This area is part of the orienting attention network (not the executive attention network, which is commonly assumed to be the only underlying attentional mechanism for secondary task effects on driving). However, the orienting attention network including the right superior parietal lobe can be modulated by top-down influence from the executive attention network in the midline frontal areas, as shown in a functional magnetic resonance imaging (fMRI) study during simulated driving while doing secondary tasks with a DRT [25]. In less technical terms, direct brain imaging during driving-like scenarios shows that the response times to external events are determined first and foremost by bottom-up stimuli to the part of the orienting attention network whose level of activation almost perfectly predicts response times, but orienting attention can itself be modulated by top-down influences from the executive attention network.

It is intuitively obvious that alerting attention, which is involved in the sleep-wake cycle, "trumps" both orienting attention and executive attention, because these attentional networks cease to process external stimuli when someone falls asleep. Indeed, if the driver shows observable drowsiness (such as falling asleep, head nodding, closed eyelids, or the signs of an observable "behavioral" microsleep [26]), the odds of a crash increase to 63 to 1 relative to a matched baseline [27], a far higher odds ratio than any secondary task ever recorded in a naturalistic driving study [97]. The true crash odds ratio for drowsy driving may be even higher than 63 to 1, because "EEG" microsleeps [26], unlike "behavioral" microsleeps, are not outwardly observable, and so would enter the "not drowsy" category under the usual naturalistic driving study face-video analysis methods, biasing the drowsiness odds ratio downwards. Indeed, EEG microsleeps may be what have sometimes been attributed to the attentional effect of cognitive demand in causing crashes (e.g., looked-but-did-not-see), when many such crashes may instead be caused by decreased alerting attention, an entirely different attentional effect than cognitive demand. In fact, there is evidence from naturalistic driving studies that cognitive demand from some secondary tasks such as cell phone conversation may reduce crashes, likely by reducing drowsiness somewhat (i.e., improving alerting attention) [27].

In the current paper, two previous publications on the Dimensional Model for visual-manual tasks are briefly reviewed, and the data are represented in a common framework using similar metric names where they overlap. Two additional studies are then analyzed and the results examined to determine if the Dimensional Model can be validated with additional datasets. The companion paper [28] extends the Dimensional Model of Driver Demand to auditory-vocal and mixed-mode tasks (i.e., with visual-manual and auditory-vocal modalities in the same task).

GENERAL METHODS

Primary and secondary task performances were measured and the data were analyzed using Principal Component Analysis (PCA). PCA was invented in 1901 by Karl Pearson [29], and it has been widely used in many scientific fields [30]. Its mathematical properties and application examples are available in many textbooks [31,32] and on websites [33]; space does not permit the technical details to be presented here. Simply put, PCA reduces the dimensionality of a data set from the original number of variables to a few major components; in the current study, all the driver demand metrics were reducible to only two components with little loss in explanatory power in the four examples analyzed.

PCA as employed here operates on the correlation matrix between the metrics. A correlation uses standardized metrics (i.e., each metric is converted to a mean of zero by subtracting the original mean, and normalized to a standard deviation of one by dividing by its standard deviation). The PCA operates on the entire matrix of correlations between all metrics. At times in this report, however, individual correlations between just two metrics are calculated and discussed to clarify the meaning of the PCA results. The correlation coefficient by itself indicates the strength of the relationship between the two variables. However, the strength of the correlations between many of the metrics in this study is not fully conveyed by the coefficients alone. Therefore, the probability values that estimate the statistical significance of the correlation coefficients are given to full precision in this paper in engineering format. The usual publication practice of using an asterisk to indicate "p < .001" for all statistically significant correlations below that threshold does not do justice to the strength of the correlations between the metrics observed in these examples.
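
To make this reporting convention concrete, the following minimal sketch (in Python with scipy; illustrative only, not the original analysis code, and the two metric vectors are random placeholders) computes a single pairwise correlation and prints its exact p-value in engineering format:

```python
# Illustrative sketch: exact p-value for a correlation between two
# hypothetical task-level metric vectors, printed in engineering format
# rather than the usual "p < .001". Placeholder data, not study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=78)                        # e.g., mean TaskTime per task (hypothetical)
y = 0.9 * x + rng.normal(scale=0.4, size=78)   # a second, highly correlated metric

r, p = stats.pearsonr(x, y)
df = len(x) - 2                                # degrees of freedom for the test of r
print(f"r = {r:.3f}, df = {df}, p = {p:.1E}")  # engineering-format p value
```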

The initial stage of PCA finds the "loadings" or weights placed onto each of the metrics that make up each component. Coefficients can then be derived from those loadings, and the individual tasks can then be scored using those coefficients. For example, if a task originally has 100 highly inter-correlated metrics measured on it, then PCA can reduce those 100 metrics to, say, only 2 scores (these scores are effectively new metrics that are an optimum weighted composite of the original 100 metrics). The original metrics can be reproduced to a high degree of accuracy from just the 2 scores. That is, the two scores contain almost all the information that was in the original 100 metrics.
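
As a concrete illustration of this loadings-to-scores-to-reconstruction pipeline, the sketch below (Python with numpy and scikit-learn) uses synthetic data generated from two hypothetical underlying demands; none of the numbers come from the studies analyzed here:

```python
# Sketch: reduce a 78-task x 15-metric matrix to 2 scores per task,
# then reproduce the standardized metrics from those scores alone.
# Synthetic data built to be redundant, as the real metrics were.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demands = rng.normal(size=(78, 2))                   # two hypothetical underlying demands per task
mix = rng.normal(size=(2, 15))                       # how each of 15 metrics reflects the demands
X = demands @ mix + 0.3 * rng.normal(size=(78, 15))  # redundant metrics plus noise

Z = StandardScaler().fit_transform(X)                # standardize -> PCA on the correlation matrix
pca = PCA(n_components=2).fit(Z)
print(pca.explained_variance_ratio_.sum())           # most of the variance kept by 2 components

scores = Z @ pca.components_.T                       # task scores: metrics x loadings cross-product
Z_hat = scores @ pca.components_                     # reconstruct all 15 metrics from the 2 scores
lost = ((Z - Z_hat) ** 2).sum() / (Z ** 2).sum()
print(f"fraction of variance lost: {lost:.2f}")      # small, because the metrics are redundant
```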

The scores of the individual tasks on the components are guaranteed by the PCA to be orthogonal (i.e., to have a correlation of exactly 0). This result guarantees that the two sets of task scores on the two components are accounting for different sources of variance in the original dataset of all the metrics. PCA thus gives rise to a unique solution for the task scores (there is only a single solution which guarantees orthogonality in the task scores between every component with every other). Factor analysis rotation methods can be used to rotate the components after the PCA is completed, to align the final scores slightly better with original individual metric data, or in an attempt to understand better the underlying factors. But rotation will cause the component scores for the tasks to become correlated to varying degrees, depending upon the amount of rotation, so the underlying dimensions after rotation are then more properly referred to as factors. To keep it simple here, no rotation of the components was done, except in special circumstances where indicated, with the results placed into appendices.

Note that an intrinsic mathematical property of PCA is that the number of tasks evaluated must be at least one more than the number of metrics evaluated, because otherwise the covariance matrix is singular and no unique PCA solution is possible. In other words, the number of metrics that can be evaluated by a PCA is limited to one less than the number of tasks on which data was collected in the study. It is thus beneficial to have many tasks if possible, because more metrics can then be evaluated at one time. In the current paper, in the four examples given, the number of tasks ranged from 78 (Example 1) to only 5 (Example 4).
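
The rank constraint can be checked directly; in the following sketch (synthetic numbers, assuming numpy), a correlation matrix computed from more metrics than tasks minus one is necessarily singular:

```python
# Sketch: with n tasks, the metric correlation matrix has rank at most
# n - 1 (one degree of freedom is lost to centering), so evaluating
# m >= n metrics leaves it singular and PCA has no unique solution.
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_metrics = 5, 6                  # one metric too many for 5 tasks
X = rng.normal(size=(n_tasks, n_metrics))
R = np.corrcoef(X, rowvar=False)           # 6 x 6 correlation matrix between metrics

print(np.linalg.matrix_rank(R))            # 4 = n_tasks - 1: singular
```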

The PCA in the statistical package Minitab® 17.2.1 [34] was mainly used for the analyses, with results checked in Stata 13.1 [35]. Note that occasionally one statistical package may present the component loadings on the negative of an axis compared to the axis loadings in another statistical package (a "sign" flip). This is a well-known property of PCA. A sign flip of an axis in no way affects the orthogonality or the relative positions of the points in the 2-D space. The two solutions can be equated by simply taking the negative of the metric loadings on the axis that is flipped, and the negative of the task scores on that component. Such a sign flip is not a rotation as described above.
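
A small sketch (placeholder arrays, assuming numpy) shows why such a sign flip is harmless: negating both the loadings and the task scores on one component leaves the implied data unchanged:

```python
# Sketch: a "sign flip" of component 2 in one package vs. another.
# Negating that component's loadings and scores together reproduces
# exactly the same data, so nothing substantive changes.
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 15))    # loadings: 2 components x 15 metrics (placeholder)
S = rng.normal(size=(78, 2))    # task scores on the 2 components (placeholder)

L2, S2 = L.copy(), S.copy()
L2[1] *= -1                     # flip the sign of component 2's loadings...
S2[:, 1] *= -1                  # ...and of the task scores on component 2

print(np.allclose(S @ L, S2 @ L2))  # True: identical reconstruction either way
```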

For the reader who may be familiar with factor analysis, but not PCA, it may be useful to distinguish the two. Although PCA can be used as the initial stage of a factor analysis, factor analysis goes further, rotating the components or applying other linear transformations in an attempt to "simplify" the factors (i.e., to make each factor have a loading of 1 for one metric and 0 for all other metrics, and so forth, for all factors extracted). After such linear transformations, the vectors are no longer components, but factors. There is one and only one principal component solution, but there are an infinite number of factor solutions. The final factors do not necessarily have a simple relationship to the original metrics or components. The unique solution provided by PCA does not attempt to isolate one metric vs. another, but instead provides the best description of the original data with the least number of components. Therefore, the Kaiser-Meyer-Olkin measure of "factorial simplicity" [36] (available with the "estat" post-estimation command in Stata [35]) is appropriate for factor analysis, but not for PCA. Because PCA provides a unique solution that has a simple relationship to the original data, it is used throughout the main body of this paper rather than factor analysis, with rotated-component results confined to Appendix FS.

This paper provides validation examples for the Dimensional Model of Driver Demand, with closed-road, track, open road, and simulator datasets described in four published studies. A number of glance metrics were evaluated, including glances off the road as well as glances to the device. In the examples provided here, glances to the in-vehicle system under test are presented, rather than glances off the road, for reasons discussed in Example 4 under "metrics."

Since the four studies whose data are analyzed here were completed and published before the recent publication of the SAE Standard J2944 [37] providing common definitions of driving performance measures and statistics, the correspondence of a particular metric in the four studies to the definitions in that Standard is not readily determinable. For example, J2944 (Section 10.2.1.1) provides for 10 different ways to define a lane departure, depending upon which part of the vehicle (front tire, any tire, bounding box of mirror, vehicle body, etc.) encounters which part of the lane marker (middle of lane boundary, inside edge of lane boundary, outside edge of lane boundary, etc.) Although the precise definition of a lane departure likely varied somewhat between the four studies, which could affect the absolute number of lane departures recorded, the PCA method operates on the standardized metrics, which tends to ameliorate any slight differences in definitions of a lane boundary. In fact, as shown, the solutions for the underlying dimensions are similar in the four examples, despite substantial differences in the methods, protocols, or even venues (simulator, track, or open road field study). Similarly, SAE J2396 [38] provides definitions for the glance-related metrics used in the four studies, but again it is difficult to determine after the fact to what extent there were differences, if any, in the definitions of the glance-related metrics in the four studies, which did not relate them to the SAE J2396 standard [38] at the time of the study. Despite these uncertainties, the results of the dimensional analysis of the four studies were consistent, indicating that the PCA method is robustly measuring the underlying demands on the driver, despite differences in the protocols of the studies.

VALIDATION EXAMPLE 1

Dataset 1: Young and Angell (2003)

Dataset 1 was from a closed-road test of 78 visual-manual infotainment and primary driving "anchor" tasks with 82 drivers in 5 vehicles [39] (see also [40]).

Task Set 1

All tasks in the entire dataset were analyzed, but they are grouped for ease of presentation as follows:

* Address Book (enter street, name, phone, etc.)

* Anchor (primary driving tasks such as adjust mirrors, vent, set cruise control, adjust HVAC, etc.)

* Car Phone (call from tag, call from list, 10-digit dial)

* DialCell (10-digit dial on portable cell phone)

* Entertainment (e.g., tune, change band, find CD track)

* Internet (get weather, get stock quote, etc.)

* Navigation (find school on map, enter destination)

Metrics 1

All metrics were tabulated over a single task completion while driving, after training to criterion while stationary, with the training-test cycle repeated for each task, one at a time. Fifteen performance metrics were collected:

* Task metrics: task completion time (TaskTime); percent unsuccessful task completions (Unsu%).

* Vehicle metrics: number of times tire touched or crossed lane marker during task (LaneX); number of times more than 5 mph above or below instructed speed of 40 mph (SpeedX). (Vehicle metrics are often termed "driving performance" metrics; e.g., Alliance Principle 2.1B [47, p. 40].)

* Glance metrics to displays or controls of device under test: total glance time (TGT); number of glances to the in-vehicle system (Glances); mean single glance time (MSGD).

* Driver metrics: subjective workload (Workload); subjective situation unawareness (the complement of situation awareness) (Sit_UnAw).

* Event detection and response metrics: percent of total visual events missed (Miss%); percent of front visual events missed, where front event is activation of LED light on hood of vehicle (Miss%_front); percent of side visual events missed, where side event is activation of an LED light above the left outside mirror (Miss%_side); response time to all visual events (RT); response time to front visual events (RT_front); response time to side visual events (RT_side).

Results 1

An analysis of the correlation matrix [41, Table 2] shows that there were high correlations within two main subsets of the metrics. Hence, all 15 metrics reduced to only 2 using PCA. Dimension 1 accounted for 61% of the total variance in the dataset, and Dimension 2 accounted for 17%. As such, these 2 dimensions can reproduce 78% of the variance on all 15 original driver performance metrics. That is, the driver performance results for every task on the 15 metrics can be reproduced with a high degree of accuracy from only 2 scores for that task. (Note: We here use the term "driver performance" to indicate the inclusion of event detection and response metrics with the lane and speed maintenance metrics usually referred to as "driving performance" metrics.)

Following PCA, the 2-D metric loadings on the two main components are plotted in Figure 1.

Cluster analysis [35] on the original mean data for all 15 metrics (the same data used for the PCA) was used to independently solve for the metric clusters, shown in Figure 2.

The cluster analysis in Figure 2 confirms the visual impression from Figure 1 that there are two main clusters of metrics.
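
As an illustration of this confirmatory step, the sketch below (Python with scipy) clusters the metric columns hierarchically, independent of any PCA. The data are synthetic, constructed so that metrics 1-9 track one underlying demand and metrics 10-15 another; it is not the cluster analysis actually run on the study data:

```python
# Sketch: hierarchical clustering of metrics (columns) using a
# correlation distance, recovering two clusters of redundant metrics.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

rng = np.random.default_rng(0)
demands = rng.normal(size=(78, 2))                   # two hypothetical demands per task
mix = np.vstack([np.r_[np.ones(9), np.zeros(6)],     # metrics 1-9 track demand 1
                 np.r_[np.zeros(9), np.ones(6)]])    # metrics 10-15 track demand 2
X = demands @ mix + 0.3 * rng.normal(size=(78, 15))

Z = zscore(X, axis=0)
tree = linkage(Z.T, method="average", metric="correlation")  # cluster the 15 metrics
print(fcluster(tree, t=2, criterion="maxclust"))     # labels split into the two demand clusters
```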

Physical Demand Metrics 1

The blue lines in the left half of Figure 2 show that the set of metrics in the lower right quadrant of Figure 1 (blue circles) forms one cluster. These metrics load positively on Dimension 1 and negatively on Dimension 2. This cluster is composed of all the task, vehicle, and driver metrics assessed, as well as the number of glances (Glances), total glance time (TGT), and percent unsuccessful task completions (Unsu%). These metrics are all tightly correlated with each other (see [41, Table 2] for correlation matrix), and therefore are highly redundant. The vehicle metrics concerning lane and speed maintenance ("Driving Performance"), when measured during actual rather than simulated driving, are placed into the category of "ground truth" metrics in [42].

However, the Task, Glance, and Driver metrics that load into Dimension 1 are equally valid "surrogate" metrics for these "ground-truth" Vehicle metrics concerning lane and speed maintenance, because they are highly correlated with them, and are therefore equally suitable for representing Dimension 1. The PCA method provides the optimal combination of the information in all these metrics to form Dimension 1. For shorthand, we here label Dimension 1 as a physical demand dimension (even though it also contains the subjective metric of a driver's perceived workload of a secondary task), because it has metrics that can be measured with direct physical observation (counts of lane and speed deviations, number of glances, task time, etc.). However, the Dimensional Model results indicate that it is equally valid to regard Dimension 1 as a "driving performance" dimension, and treat the non-vehicle metrics loading on this dimension as "surrogate" metrics for the "ground truth" driving performance measures of lane and speed maintenance.

Cognitive Demand Metrics 1

The cluster analysis in Figure 2 (right) also identifies a second main cluster of metrics. These metrics form the cluster in the upper left quadrant of Figure 1 (red squares). This cluster also has tightly-correlated metrics that are redundant with each other (but not with metrics loading onto Dimension 1). These are mainly event detection and response metrics, but it is of interest that mean single glance duration (MSGD) clusters with these metrics, and not with the Dimension 1 metrics. Thus, the PCA results (Figure 1) and the cluster analysis results (Figure 2) independently indicate that MSGD is more closely associated with event detection and response than with the metrics in cluster 1, despite the fact that the TGT metric in cluster 1 is the product of MSGD and number of glances. That is, this result indicates that TGT contains essentially no information beyond that in the number of glances, whereas MSGD provides independent information about the second dimension of driver demand that is not predictable from the total glance time (TGT) or the number of glances to the device under test (Glances). This result may seem strange to some (that multiplying two variables could give rise to a third variable that is uncorrelated with one of the original variables), but a multiplication of two variables is a non-linear transformation, as illustrated in the sketch below. In short, this analysis suggests that any effect on driver performance of Total Eyes-Off-Road Time (TEORT) or TGT to the device arises from the number of glances and not the duration of those glances. The PCA of the data in this example indicated that MSGD, and by implication the proportion of long glances (LGP), have a separate effect on event detection performance not determinable from other glance metrics (number of glances, TGT, and TEORT). Because most of the Dimension 2 metrics are related to the detection of and response to events, we here label this dimension a "cognitive" demand dimension for shorthand, but it is equally valid to label it an "event detection and response" dimension.
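
The sketch below (synthetic values, assuming numpy; the spreads are invented purely for illustration) shows how this can happen: when glance counts vary far more across tasks than single-glance durations do, the product TGT = MSGD × Glances is nearly redundant with the glance count yet only weakly related to MSGD:

```python
# Sketch: a product of two independent variables can track one factor
# almost perfectly and the other only weakly, because multiplication
# is a non-linear transformation. Invented spreads, not study data.
import numpy as np

rng = np.random.default_rng(0)
glances = rng.uniform(2, 40, size=78)     # glance counts vary widely across tasks
msgd = rng.normal(1.2, 0.15, size=78)     # mean single glance time varies little (s)
tgt = msgd * glances                      # total glance time is their product

print(np.corrcoef(tgt, glances)[0, 1])    # near 1: TGT is redundant with glance count
print(np.corrcoef(tgt, msgd)[0, 1])       # low: MSGD carries mostly independent information
```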

The PCA and cluster analysis thus confirm in independent analyses that each metric cluster forms a separate dimension for this set of tasks under the experimental conditions of this track study [39,41]. In short, an objective grouping of the metrics using cluster analysis on the original data (Figure 2), closely corresponds to the subjective visual impression of the metric groups in the PCA loading space in the Dimensional Model (Figure 1).

The results show that two dimensions capture a high percentage of the total variance in all 15 metrics assessed. The third dimension represents a peripheral/central vision effect for event detection, and is minor, accounting for only 5% of the remaining variance [41], and so is not considered further here. The remaining components each account for such a small percentage of the variance that they are almost certainly statistical noise in the data.

Task Scores 1

The tasks can be scored on these two main components by forming the sum of the cross-product of the 15 metrics for each task with the loading coefficients for the component (for more information on this aspect of PCA, see any multivariate statistical book, for example [31,32]). The task scores are plotted in Figure 3.

Tasks Scoring High on Physical Demand

Figure 3 shows that the Address Book and Navigation tasks (blue circles and blue inverted triangles in the lower right quadrant) have relatively high S1 scores, reflecting the higher physical demand of those tasks (longer task times, more glances, more button presses, etc.). The vehicle metrics (LaneX, SpeedX) are also larger for those tasks. However, this effect may arise because of the confounding factor of TaskTime. With a longer driving time, the lane position varies more, even for baseline driving with no task [43, Figure 166] (a similar confound also occurs for the standard deviation of speed). In other words, lateral and longitudinal vehicle variation metrics (e.g., standard deviation of lane position, standard deviation of speed) do not meet the statistical requirement that the samples be drawn from a time-invariant population. (A similar result holds for the counts of lane deviations (LaneX) and speed deviations (SpeedX), but not for speed itself, which tends to load negatively on the cognitive demand dimension because of driver self-regulation [67].)

This confounding with driving time does not mean that the driving performance variability metrics are invalid as metrics for indicating driving safety. It does mean that their properties are not the same as statistical variability measures (whose values do not systematically increase with the number of samples). In other words, the commonly used driving performance variability metrics are inherently confounded by the duration of the temporal window in which they are evaluated. For example, the standard deviation applied to samples drawn from a fixed population does not systematically increase with the number of samples or the time window over which it is measured; it just produces a more accurate estimate of the true population standard deviation. The fact that SDLP tends to increase with time (and the number of samples), whether "just driving" with no task [43, Figure 133] or with the same secondary task performed repeatedly, indicates that it and other driving performance measures that are correlated with drive time do not meet the definition of a stationary process in statistics [44], and so they are not valid estimates of a population variance.
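
The nonstationarity argument is easy to demonstrate numerically. In the sketch below (assuming numpy; the random-walk model of lateral drift is a toy assumption for illustration, not a model from the cited studies), the SD of a stationary sample stabilizes as the measurement window grows, while the SD of the drifting signal keeps increasing:

```python
# Sketch: SD of a stationary process vs. a drifting (random-walk)
# "lateral position" as the measurement window grows.
import numpy as np

rng = np.random.default_rng(0)
stationary = rng.normal(size=12000)            # i.i.d. samples from a fixed population
drifting = np.cumsum(rng.normal(size=12000))   # toy random-walk lateral position

for n in (1000, 4000, 12000):                  # progressively longer windows
    print(n, round(stationary[:n].std(), 2), round(drifting[:n].std(), 2))
# the stationary SD stays near 1.0; the random-walk SD grows with the window
```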

This confounding does not mean that driving performance metrics are not useful or valid for addressing "distraction" issues. They are equally valid as any other Dimension 1 metric (TaskTime, Glances, TGT, subjective workload, etc.). If their values on a secondary task are compared to the values of a "just driving" baseline task, or some other benchmark task such as "radio tuning," it is necessary, however, that the time over which data is collected and the road geometry be the same, or there will be a confounding of driving performance variables by TaskTime. Artificially adjusting all tasks to be equal duration by repeating them over and over again for a fixed time interval is not a satisfactory solution to eliminate such confounding. This method effectively "impeaches" Dimension 1, which is the major underlying component of task demand for visual-manual tasks that was intended to be measured (see further discussion of this topic in Example 4). Artificially forcing all visual-manual tasks to have equal task time effectively equalizes all the Dimension 1 metrics, destroying task differentiation along the Dimension 1 axis, leaving only the differentiation along the Dimension 2 axis.

The fact that the driver performance metrics co-vary with task time (and number of glances, etc.) also does not imply that tasks should simply be made as short as possible. A short task can have poor or good event detection performance, depending upon the design of the task. For example, re-designing a task to be completed in as short a time as possible by placing many buttons at a single level rather than in menus can increase mean single glance duration and reduce event detection performance. Conversely, a vehicle manufacturer who "chunks" a short but cognitively-demanding visual-manual task into manageable subtasks that are intentionally strung out over a longer task time (within limits) can arguably make it less cognitively demanding for the driver to perform, with a plausible net benefit for driving safety.

For example, some tasks, such as fine adjustment of the radio treble (top symbol on y-axis in Figure 3), scored relatively high on the second dimension. These are relatively short tasks with relatively few glances, yet they have relatively high miss rates and long RTs. Conversely, the navigation and address book tasks (blue symbols) with high scores on the first component, generally have low scores on the second component. This is a paradox from a simple one-dimensional driver workload model, but there is no contradiction from the two-dimensional driver demand model presented here. Task scores in the two dimensions are orthogonal, so high scores on one dimension can be associated with either high or low scores on the other dimension; there is a correlation of zero between task scores on the two dimensions, as guaranteed by PCA.

One might also attempt to explain the apparent "paradox" of some short tasks having high cognitive demand as a simple tradeoff between concentrating on lane maintenance or on event detection, in a kind of speed-accuracy trade-off [45,46]. That is, participants attempt to maintain their lane with more accuracy, and so lose speed in their responses to the RT task. However, other tasks have both relatively good lane maintenance and relatively good event detection (tasks in the lower left quadrant of Figure 3 for example), so this paradox cannot be explained as a simple time-accuracy trade-off. It is also not a simple methodological artifact of shorter tasks, because again many shorter tasks have relatively good event detection as well as relatively good lane maintenance.

Task Scores on Cognitive Demand

Figure 3 shows that the task scores are distributed in a band from the lower left corner with "anchor" or primary driving tasks (red squares), up to certain "entertainment" tasks in the upper middle (two rightward-pointing grey triangles at the top middle). Adding the second dimension clearly differentiates the tasks from each other much better than a single dimension alone. That is, the Dimensional Model offers improvements over the common practice of investigating single metrics one at a time in a vain attempt to segment the tasks with a single metric, or even a set of single metrics all chosen from a single dimension.

Simplification to Two Original Metrics

If the Dimensional Model is valid, then all the metrics within each set are redundant with each other. Hence, the PCA method gives rise to the "simplification" hypothesis; namely, only two metrics from the original dataset should be needed to characterize each task in terms of the two underlying driver demand dimensions, with little loss of information. Although all 15 original metrics have a statistically significant correlation with the task scores (S1) on the first component, TaskTime had the highest correlation (r = 0.955, df = 76, p = 5.5E-42), with the regression fit shown in Figure 4.

Similarly, RT had the highest correlation of all the metrics with the task scores S2 on the second component (r = 0.706, df = 76, p = 5.1E-13), with the regression fit in Figure 5. The dashed lines show that the 95% prediction interval for S2 in Figure 5 is wider than for S1 in Figure 4, but the task scores on S2 in Figure 5 are still within or near the prediction interval.

Given these encouraging results, the simplification hypothesis was tested by using just TaskTime and RT to span the 2-D score space (Figure 6).

Visual comparison of Figures 3 and 6 shows similar relative positions of the tasks in the 2-D space, except for a slight counterclockwise rotation of the points in the original RT and TaskTime metrics in Figure 6 vs. Figure 3, which is composed of the scores based on all 15 metrics.

The original RT and TaskTime are weakly correlated [41, Table 2] (r = 0.281, df = 76, p = 0.012). It is preferable to have orthogonal estimates of the underlying task scores, because then the errors across the two metrics are independent, making the error ellipse axes aligned horizontally and vertically rather than tilted (see General Discussion later). Multiple regression analysis can solve for the estimated orthogonal principal component scores S1 and S2 (see Figure 3 for the original S1 and S2 scores) from the original RT and TaskTime means, as per Equations (1) and (2).

S1 = -4.763 + 0.13071 TaskTime + 0.7920 RT (1)

S2 = -3.271 - 0.05030 TaskTime + 1.6289 RT (2)

Using Equation 1, S1 is predicted with a slightly higher correlation by TaskTime and RT conjointly (r = 0.981, df = 76, p = 7.0E-56) than by TaskTime alone (r = 0.955, df = 76, p = 5.5E-42, Figure 4). Using Equation 2, S2 is predicted with a higher correlation by TaskTime and RT conjointly (r = 0.914, df = 76, p = 1.7E-31) than by RT alone (r = 0.706, df = 76, p = 5.1E-13, Figure 5). Figure 7 plots the S1 and S2 scores as predicted by RT and TaskTime conjointly, using Equations 1 and 2.

By visual inspection, Figure 7 matches even better with Figure 3 (the original task scores solved for using 15 metrics) than did the original individual TaskTime and RT metrics (Figure 6). That is, RT and TaskTime conjointly can accurately predict the orthogonal task scores in Figure 3 that originally required all 15 metrics. The estimated S1 and S2 from RT and TaskTime conjointly are now virtually orthogonal (r = -0.06, df = 76, p = 0.62), lower even than the already weak correlation between the original RT and TaskTime metrics (r = 0.281, df = 76, p = 0.012).
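
For reference, Equations (1) and (2) can be applied directly to a task's mean TaskTime and RT; the sketch below encodes the published coefficients, with a made-up input pair (units assumed to be seconds, matching the study's measures) standing in for real task means:

```python
# Sketch: estimating the orthogonal component scores S1 and S2 from
# mean TaskTime and RT using Equations (1) and (2) from the text.
# Coefficients are from the text; the example inputs are invented.
def estimate_scores(task_time: float, rt: float) -> tuple[float, float]:
    s1 = -4.763 + 0.13071 * task_time + 0.7920 * rt   # Equation (1): physical demand score
    s2 = -3.271 - 0.05030 * task_time + 1.6289 * rt   # Equation (2): cognitive demand score
    return s1, s2

# Hypothetical task: 30 s mean task time, 2.5 s mean response time
print(estimate_scores(30.0, 2.5))
```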

Rotation

The points in the 2-D plots in Figures 3 and 6 line up slightly better after Varimax rotation of the components, which is a standard orthogonal rotation that balances the variances explained by the two components (see detailed rotation results in Appendix F). Because orthogonal estimates of S1 and S2 can be successfully made from TaskTime and RT conjointly (Figure 7 and previous paragraph), rotation was not further pursued in this study.

Control for Correlation between TaskTime and S1, RT and S2 in Example 1

One may argue that the high predictive ability of the single metric TaskTime for S1 (based on 15 metrics) shown in Figure 4, arose simply from the fact that the single metric TaskTime was included in the calculation of the composite task score S1, so TaskTime is just correlating mainly with itself. The counter-hypothesis to this argument is that the other variables besides TaskTime in Metrics Cluster 1 are sufficiently redundant with TaskTime that a reduced Dimension 1 (with TaskTime left out of the PCA) should give rise to about the same task scores as if TaskTime had been included.

Likewise, other variables than RT in Metrics Cluster 2 should be sufficiently redundant with the RT metric that they give rise to about the same task scores for PCA, whether the three RT variables are included in the PCA or not. In short, the counter-hypothesis predicts that it should make little difference in the relative task score locations in the final 2-D score space, whether TaskTime and the three RT variables are included in the PCA or not. This counter-hypothesis was tested by dropping TaskTime and all 3 RT variables from the original dataset, and performing PCA on the reduced set of metrics.

The results given in Appendix G show that there is little effect whether TaskTime and the three RT metrics are included in the PCA or not; the predictions from TaskTime and RT are still good. This control confirms the redundancy of all those metrics loading within Dimension 1, and of all those loading within Dimension 2.

Discussion 1

A Principal Components Analysis of a closed-road study of the effects on driving of 78 secondary tasks of varying types found that 15 driver demand metrics can be reduced to 2 principal components (dimensions), with little loss of explanatory power. This result led to a further simplification hypothesis that the task scores as found by PCA for the 15 metrics may be reasonably approximated with a reduced set of only 2 of the original metrics (instead of 15).

This further simplification hypothesis was tested using two non-glance metrics (TaskTime and RT). These non-glance metrics sufficed to span the two-dimensional space for the complete data set of 15 metrics, which included task, vehicle, glance, driver, and 6 detection and response metrics (3 RTs and 3 miss rates to center and side event lights and their combination). In short, the original dataset for all 15 metrics is reproducible to a high degree of accuracy from only two of the original metrics (TaskTime and RT). A control analysis (Appendix G) was conducted which eliminated TaskTime and the three RT metrics from the PCA. TaskTime and RT could still predict the task scores well in this reduced PCA 2-D score space. That means that TaskTime and RT alone can reproduce the data of the 13 other metrics to a high degree of accuracy, because of the enormous redundancy (i.e., collinearity) within the subsets of metrics that load onto Dimension 1 and Dimension 2 respectively.

The Dimensional Model predicts that other pairs of metrics besides TaskTime and RT should be able to reasonably span this 2-D space as well (for example, TGT and MSGD, as per the Alliance Guidelines Principle 2.1A [47]). However, the current choice of TaskTime and RT requires fewer resources to test and analyze than glance metrics, allowing more tasks to be tested with a given amount of resources, so these other options were not explored systematically for visual-manual tasks.

The metrics in some current test methods do not span both dimensions of driver demand for visual-manual tasks. For example, the Occlusion Goggles test (e.g., NHTSA Guidelines [48]) offers no additional information beyond that contained in TaskTime, a Dimension 1 metric. For the 13 visual-manual tasks in the CAMP-DWM project [51,52], the correlation between static task time [52] and Total Shutter Open Time (TSOT) [52, Table Q-6] was nearly perfect (r = 0.993, df = 11, p = 9.6E-41; see Appendix Table E1). TSOT thus provides no information beyond that of static task time, and therefore the Occlusion Goggles test provides no information about how tasks will score on the second dimension of driver demand for visual-manual tasks. In short, the simpler and easier measurement of static task time with just a stop watch will do all that the occlusion goggles test does. NHTSA erroneously rejected the use of task time as a metric (see General Discussion below), and adopted the occlusion goggles test as a final recommended test protocol, despite the fact that the occlusion goggles test measures the same underlying physical demand as TaskTime. Perhaps the occlusion goggles test was thought to have better "face validity" because TSOT gives the appearance of being related to the TGT and number of glances metrics, which may be associated to some degree with crash risk according to some studies. But according to the evidence presented here, none of these other Dimension 1 metrics provide any information beyond that contained in TaskTime, or even the number of manual steps in the task [73], which is highly correlated with TaskTime and all the other Dimension 1 metrics [84]. As pointed out, being able to use number of manual steps as an effective surrogate metric is of "great value for phases in product development where no operational prototype may yet exist" [42]. Its relevance to analytically modelling dual-task interactions during actual driving by participants trained on the task under assessment has been well proven by the Example 1 dataset [39,40].

Likewise, the Alliance Principle 2.1B is based on the driving performance metrics of number of lane exceedances and standard deviation of headway as measured in a simulator. These are considered closer to "ground truth" metrics [49] in actual driving and may arguably be more closely related to driving safety (as are driver event detection and response metrics) than "surrogate" measures, such as task time or eye glance metrics. These driving performance measures are unfortunately both highly correlated (i.e., confounded) with driving time, even when no task is performed at all (i.e., "just driving"). One way to avoid this confounding is to compare the secondary tasks to a "just driving" baseline, where the duration of the baseline driving is the same as the task time of the secondary task. It can then be determined to what extent there is a change from baseline driving, without confounding by task time. It is also possible to statistically control for differences in task time by using partial correlation methods. Even if time matching or partial correlation methods are used, the driver performance metrics as in Alliance Principle 2.1B [47] will still capture only the Dimension 1 demand according to the Dimensional Model. The Model requires that driving performance metrics be augmented with a Dimension 2 metric (such as an event detection metric) in order to assess both dimensions of driver demand.

As mentioned in the Results 1 section above, there is a trade-off between Dimension 1 and Dimension 2. Thus a long task can be made shorter to reduce its score on Dimension 1, but it has to be carefully designed so as not to increase the mean single glance duration (or long glance proportion), or performance on Dimension 2 will get worse (event detection performance will suffer). Likewise, a vehicle manufacturer may deliberately wish to increase the time of a task in order to break it into more easily manageable "chunks," thereby reducing long glances and improving event detection performance on Dimension 2. The overall goal should be to achieve a "sweet spot," as many tasks do, with relatively low scores in the two dimensions, or with relatively lower scores than a benchmark task or tasks measured in the two dimensions.

VALIDATION EXAMPLE 2

Dataset 2: Young (2012)

Young [50] analyzed the driver performance data from 13 visual-manual tasks from the CAMP-DWM study (see [51,52] for task details) on a closed test track with 69 drivers, of which 42 (61%) had eye data scored. Data were also collected in a simulator and on public highways in addition to the track. The simulator eye data were not scored in the CAMP-DWM study because of budget constraints, and so they were not available for validation purposes (that is, they could not be used to determine if the task scores would be the same with and without eye movement metrics).

In addition, the open-road data had fewer tasks than the track or lab (e.g., some tasks, such as the destination entry task on the 2001 model vehicle tested, were judged to be too unsafe to perform on a public highway with traffic, even in the CAMP-DWM "platoon" scenario [50, Figure 2]).

Therefore, only the track data were analyzed in Example 2, using PCA as previously presented [50]. The original mean data are given in Appendix B, Table B1.

Task Set 2

Complete descriptions of the visual-manual tasks whose data were analyzed in Example 2 are given in [52, Appendix B]. A description of the HVAC, RadioEasy, and RouteTrace tasks is given in [50, Appendix].

Metrics 2

The CAMP-DWM track study tabulated over 100 driver and vehicle performance metrics, the mean data for many of which were published in Appendix Q of the CAMP-DWM report [52]. Young [50, Section 2.2.1] selected 10 metrics from the full set that were the closest match to those in the first PCA study [41] of driver demand (see Example 1). (Because of redundancy, many other subsets of metrics from the CAMP-DWM study could have been chosen for the PCA with comparable results.) The selected metrics included objective vehicle measures (number of lane and speed deviations), driver eye movements, task performance variables such as number of steps and successful task completion, and RT and miss rate to real events (forward vehicle brake or decelerate, rear vehicle turn signal as seen through rear-view mirrors). All metrics were tabulated only during a task. The glance metrics evaluated here were based on glances to the display or controls of the device under test, not all glances off the road (which would also include glances to locations other than the road or the device under test, such as mirrors, windows, instrument cluster, etc.). The specific metrics for the validation test of the Dimensional Model were chosen to be close to those in Example 1, with the limitation that at most 12 metrics could be picked because there were only 13 visual-manual tasks (as stated previously, the number of metrics can be at most one less than the number of tasks):

* Task metrics: mean task completion time (TaskTime).

* Vehicle metrics: standard deviation of lane position (SDLP) - a continuous measure of lane keeping during the task; speed difference (SpeedX) - the difference between maximum and minimum speed during the task.

* Glance metrics to displays or controls of device under test: total glance time (TGT); mean single glance time (MSGD).

* Event detection and response metrics: For the center high mount stop lamp (CHMSL) in the forward vehicle (which was illuminated without an accompanying deceleration) - percent misses (Miss%_CHMSL) and mean response time (RT_CHMSL); for the lead-vehicle deceleration (LVD) event (lead-vehicle "coast-down" deceleration without brake-light illumination) - percent misses (Miss%_LVD) and mean response time (RT_LVD). Note that these events were single surprise events, not the repeated DRT events employed in Validation Example 1.

Other subsets of the 100 or so CAMP-DWM metrics were not systematically explored, but they would be expected to give a similar result: all could be condensed into only two scores per task after PCA. Details of the data collection methods in the CAMP-DWM study are given in [51,52], and details of the PCA methods used here are given in [50].

Results 2

Principal Component Analysis (PCA) of the CAMP-DWM track data found that the 10 metrics analyzed reduced to two major components (dimensions) (Figure 8).

Only two components (as in Example 1) captured 78% of the total variance in the original 10 metrics - 53% of the variance on Dimension 1, and 25% on Dimension 2 (Figure 8) [50]. That is, all the data for all 10 metrics for all the tasks tested can be reproduced with only 2 scores per task instead of 10 metric values, with little loss of explanatory power.

The metrics fell into two major metric clusters according to an independent cluster analysis of the original data (Figure 9). Metrics falling into Cluster 1 were the Task and Vehicle metrics (left half of Figure 9; blue circles in lower right quadrant of Figure 8). Metrics falling into Cluster 2 were the event detection and response metrics and MSGD (right half of Figure 9; red squares in upper left quadrant of Figure 8). This clustering pattern is consistent with the results for Validation Example 1 (Figure 1).

These consistencies in the Dimensional Model validation results held despite the substantial differences between the testing procedures of Examples 1 and 2, as follows.

1. Fewer tasks. The CAMP-DWM study in Example 2 had only 13 visual-manual tasks, whereas Example 1 had 78 visual-manual tasks. A larger number of tasks tested in a given venue reduces error in estimating the correlation coefficients and consequently increases the accuracy of the principal component loadings.

2. Fewer glance measurements. The CAMP-DWM study measured the glance metrics on only 61% of participants, due to budget constraints, which increased the uncertainty of the glance metrics (TGT, Glances, and MSGD) in Example 2 relative to the non-glance metrics. Example 1 had all eye glances analyzed.

3. Surprise events vs. repeated events. The four event response metrics in the CAMP-DWM study in Example 2 (Miss%_CHMSL, Miss%_LVD, RT_CHMSL, and RT_LVD) were to a single "surprise" event per task trial, either a CHMSL event or a lead vehicle deceleration event. Example 1 had many repeated detection-response events per task trial. Hence, Example 2 had larger uncertainty in its detection and response values than Example 1 because of the fewer trials. Also, there is the question whether the data from the repeated events in Example 1 can serve as a surrogate for single surprise events (many doubt this based on their subjective intuition). This question can be answered empirically by determining whether the pattern of responses across tasks to the two types of event is the same or not.

4. Events in different locations. Example 2 had vehicle events (CHMSL activation or lead vehicle deceleration) occur in the forward direction only. (The CAMP-DWM study also used side events in the detection of a turn signal in the rear vehicle in the platoon, but the data were noted by the CAMP-DWM investigators as being noisy and so were not analyzed here.) Example 1 used surrogate events randomly presented in the forward direction (DRT light on hood) or side direction (DRT light on top of left outside mirror).

5. Lead vehicle vs. no lead vehicle. Example 2 required the participant driver to follow and keep a constant distance and speed to a lead vehicle. Example 1 had only a single vehicle on a closed highway, with instructions to maintain lane position, and control speed at 40 mph (plus or minus 5 mph).

Despite these five substantial methodological differences, the two studies gave comparable results, showing the robustness of the Dimensional Model to major protocol differences. Example 2 may still be considered by some as having more "face validity" for representing an event that has safety relevance in "naturalistic" driving. Nonetheless, the results clearly show that relatively surprising events and repeated events tap the same underlying event detection and response dimension (which is plausible, since the same orienting attention network in the brain is responsible for determining the detection of and response to events, whether expected or not). Therefore, the two major components (dimensions) in both studies were sufficiently similar in the clustering of their metrics that the dimensions are herein labelled "physical demand" and "cognitive demand" in both studies, for reasons further explained in Discussion 2 below.

Figure 10 plots the task scores on the two dimensions shown in Figure 8.

Note in Figure 10 that three tasks (Radio, Read, and Map) had two "difficulty" levels each, labels created by the CAMP-DWM investigators. The methods used by those investigators to rate these tasks as "easy" and "hard" were derived from an analysis associated with what we have here termed physical demand (Dimension 1). Indeed, the three tasks with two difficulty levels each had higher Dimension 1 task scores for the "Hard" version than the "Easy" version (i.e., all 3 arrows for Radio, Map, and Read in Figure 10 point to the right, in the direction of higher S1 scores). However, because of the orthogonality between Dimensions 1 and 2, a "Hard" task can be either higher or lower in its Dimension 2 score, regardless of its Dimension 1 score. For example, RadioHard and ReadHard each had a lower S2 score than their "Easy" versions (the arrows from RadioEasy to RadioHard and from ReadEasy to ReadHard point downwards in Figure 10). Yet MapHard had a higher S2 score than MapEasy (the arrow from MapEasy to MapHard points upwards in Figure 10). These results show the independence of the two dimensions. That is, whether a task was deemed "Hard" or "Easy" by the CAMP-DWM investigators appears to have been based on the single dimension that has classically been called "driver workload," which corresponds to Dimension 1 in the Dimensional Model of the data. However, a task that is relatively easy on Dimension 1 can have either relatively higher cognitive demand (e.g., MapHard vs. MapEasy) or relatively lower cognitive demand (e.g., RadioHard vs. RadioEasy, ReadHard vs. ReadEasy), because the two dimensions are orthogonal according to the Dimensional Model.

A cluster analysis of the 13 visual-manual tasks in the original mean data in Appendix Table B1 was done. This analysis is independent of the PCA that gave rise to the task scores in Figure 10. Figure 11 shows the three major task groups that were identified by this objective clustering method.

Figure 11 shows that DestEntry is by itself in Cluster 1 (blue rightmost line in Figure 11). Figure 10 shows that DestEntry had the highest S1 score of the 13 tasks tested (blue circle to right of Figure 10), but an average S2 score.

Figure 11 identifies the Cluster 2 tasks by the red center lines. Figure 10 shows that the corresponding PCA scores (red squares in upper left quadrant) were relatively high on S2, and relatively low on S1, compared to the other tasks.

Figure 11 identifies the Cluster 3 tasks by the green leftmost lines. Figure 10 shows that the corresponding PCA scores (green diamonds in lower left quadrant) were relatively low on both S2 and S1, compared to the other tasks.

In sum, an objective grouping of the tasks using cluster analysis on the original data (Figure 11) closely corresponds to the subjective visual impression of the task groups in the PCA score space in the Dimensional Model (Figure 10). In other words, the cluster analysis in Figure 11 confirms the visual groupings of the tasks in Figure 10 as scored by the Dimensional Model.

Simplification to Two Original Metrics

As in Example 1, the results of the PCA identify the metrics that are highly correlated (i.e., redundant) with each other within each cluster. Hence, there is an opportunity to simplify the set of metrics even further. Theoretically, only two metrics from the original dataset should be enough to characterize each task on the two underlying driver demand dimensions, with little loss of information, because of the high redundancy. In particular, TaskTime again had a high correlation with S1 (the task scores on the first component) (r = 0.945, df = 11, p = 1.1E-06), as it did in Example 1. RT_CHMSL had a statistically significant correlation with S2 (the task scores on the second component) (r = 0.716, df = 11, p = 0.005). (See Appendix H, Figures H.1 and H.2 for a visual confirmation of these findings using regression plots.) Given the statistically significant correlations between S1 and TaskTime, and S2 and RT_CHMSL, the Dimensional Model predicts that the RT_CHMSL vs. TaskTime plot with only two original metrics (Figure 12) should place the tasks in about the same locations and clusters as in the S2 vs. S1 score plot (Figure 10). That is, the Dimensional Model predicts that only two original metrics (e.g., as shown in Figure 12) should be able to segment the tasks as well as the task scores based on the full set of original metrics after PCA (as shown in Figure 10).
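
As a sketch of how such correlations between single original metrics and the PCA task scores can be checked, the following computes the scores by singular value decomposition and applies a Pearson test; the data matrix and its column assignments are hypothetical placeholders, not the Appendix Table B1 values:

```python
# Sketch: correlate original metrics with PCA task scores.
# The matrix and its column positions are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr, zscore

rng = np.random.default_rng(1)
data = rng.normal(size=(13, 10))       # 13 tasks x 10 metrics (placeholder)
task_time = data[:, 0]                 # hypothetical TaskTime column
rt_chmsl = data[:, 1]                  # hypothetical RT_CHMSL column

z = zscore(data, axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
scores = z @ vt.T                      # S1 = scores[:, 0], S2 = scores[:, 1]

r1, p1 = pearsonr(task_time, scores[:, 0])   # TaskTime vs. S1
r2, p2 = pearsonr(rt_chmsl, scores[:, 1])    # RT_CHMSL vs. S2
print(r1, p1, r2, p2)  # with the real data, these reproduce the values above
```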

Figure 12 plots RT (substituting for S2) vs. TaskTime (substituting for S1). The data points fall mostly in locations similar to those in Figure 10. (The legend in Figure 12 identifies the task clusters as identified by the analysis in Figure 11, which was based on all 10 metrics.) By visual inspection, the tasks in Figure 12 (based on only two original metrics) generally fall into the same clusters as they did in Figure 10 (based on all 10 original metrics). For example, MapEasy fell into Cluster 3 (green diamonds), and MapHard fell into Cluster 2 (red squares), in both Figure 12 (based on 2 metrics) and Figure 10 (based on 10 metrics).

Of the 13 tasks, the only one misclassified after simplification to 2 original metrics was RadioEasy. It was in Cluster 2 (S1 low, S2 high) in Figure 11, but fell near the tasks in Cluster 3 (S1 low, S2 low) in Figure 12 (note the oddball red square for RadioEasy mixed in with the green diamond tasks in the lower left quadrant of Figure 12). RadioEasy was relatively low in its RT_CHMSL metric compared to the other metrics that loaded onto Dimension 2. The S2 metric combines the effect of all the Dimension 2 metrics (LVD and CHMSL event detection, and MSGD) into a single score, and so represents the "gold standard" task score locations along the cognitive demand dimension. The RT_CHMSL metric by itself is generally consistent with this "gold standard" S2 dimensional score, except for the RadioEasy task. An explanation and discussion of this cluster misclassification is given in Appendix I, which cites references [53,54,55].

Discussion 2

The two main dimensions for driver demand shown in Example 2 were consistent with those found in Example 1, and were herein labelled cognitive and physical demand. This consistency occurred despite the considerable differences in test protocols between the two studies.

The consistency of the Dimensional Model results in the two studies indicates that DRT events are effective surrogates for assessing the attentional effects of secondary tasks for forward vehicle events. This consistency should not be surprising, because the response to events is handled by the same orienting attention network, whether events are single-trial or repeated. Specifically, attending to an event while viewing a driving scene is known to be subserved by an orienting network including the right superior parietal lobe [24], a key part of the orienting attention network [56], which also is influenced by a top-down executive attention network that involves midline frontal areas [25]. Thus there is scientific evidence for interpreting the second dimension as a "cognitive" attentional dimension. Hence, it is not surprising that the results for the two studies are consistent in the second dimension being associated with the detection and response to an event, as influenced by cognitive attentional mechanisms in the brain.

MSGD as a Cognitive Demand Metric

Both examples had the unexpected finding that the MSGD to the display and controls of the device was correlated with the RT and miss rate to the events in the 2-D metric loadings space, and not with any of the physical demand metrics, including other glance metrics (such as TGT and number of glances).

In addition, some (but not all) relatively short-duration tasks with relatively low scores on Dimension 1 had relatively high scores on Dimension 2. That is, these tasks had relatively long MSGD and relatively poor event detection, placing them in the upper left quadrant of Figure 10 (red squares). Overall, there was little or no correlation between MSGD and TaskTime or any of the other Dimension 1 metrics, but MSGD did correlate with the event detection metrics; that is why MSGD loads onto Dimension 2 rather than Dimension 1. It is therefore of interest why some short tasks (those with scores in the upper left quadrant of Figure 10 and Figure 3) have long single glances and also relatively poor detection of visual events. There are at least two non-competing explanations for this effect.

One explanation for the clustering of MSGD with event detection metrics rather than physical demand metrics in the visual-manual tasks analyzed in the current examples is that there is a common underlying cause of both the relatively long single glances and the relatively poor event detection; namely, the attentional effects of cognitive demand. Relatively high cognitive demand is intrinsic to those tasks which score in the upper left quadrant of the 2-D score plot, and this higher demand causes both longer single glances and relatively poor event detection.

A second non-competing explanation is that the attentional effects of the cognitive demand result in the driver making longer glances to the controls and displays of the task under test, which then in turn cause the driver to miss the visual events. That is, the missed events and longer RTs are a by-product of the long single glances in the causal chain. It is clear that a long glance off the road will cause a driver to miss that portion of a road event (or a visual DRT event) that occurs while the eyes are on the device. If the event detection measuring apparatus in the experiment (or naturalistic driving study) is time-synched to the eye glance recording times, then it can readily be determined whether the missed events occurred when the eyes were off the forward scene. Otherwise, a search of the eye glance video is required, with a marker indicating when the event occurred, allowing a determination if the missed event occurred while the eyes were on or off the road.

Alternatively, a tactile DRT (TDRT) probe that is time-synched to the eye glance recording timestamps can be used in an experimental setting, because the response times and miss rates to a tactile stimulus are not affected by whether the eyes are on or off the road. If misses to a tactile stimulus occur, they are more likely to be due to attentional effects than if a visual stimulus is used as a probe for attentional effects. Thus, the TDRT can be used to determine whether there is a common underlying attentional effect which causes both misses and long single glances. The misses and response times to the TDRT are therefore a suitable metric for assessing the attentional effects of cognitive demand that is isolated from the physical effect of eye glances per se [9]. Indeed, when the TDRT is used in the same experiment as the remote detection response task (RDRT) for the same tasks with the same participants, a similar pattern of RTs is observed for the RDRT as for the TDRT [9, Figure E.5]. However, the increase in miss rates is larger for a visual-manual task measured with the RDRT than with the TDRT [9, Figure E.7a]. For a pure auditory-vocal task with no visual or manual properties (such as the n-Back task [57]), the RDRT and TDRT show the same pattern of effect across tasks for both RT and miss rate [9, Figures E.5 and E.7a]. The companion paper [28] shows that the miss rate to a TDRT stimulus is almost perfectly correlated with the long single glance and mean single glance time metrics for visual-manual tasks, suggesting that while eyes-off-road time certainly affects miss rates to visual stimuli, misses to non-visual stimuli also increase for some tasks. In short, some tasks still load relatively high on Dimension 2 even when the TDRT is used instead of the RDRT.

This analysis indicates that the relatively poor event detection of the tasks in the upper left quadrant of the score plots is caused both by the direct attentional effects of cognitive demand on miss rates, and by the indirect effect of the attentional effects of cognitive demand causing long single glances (which in turn increase miss rates). Regardless of the relative balance between these two effects, the tasks which score in the upper left quadrant of the score plots do so because of the underlying attentional effects of cognitive demand. Therefore, we consider it reasonable to label the second dimension as a metric for the attentional effects of cognitive demand. For shorthand, we sometimes label Dimension 2 as "cognitive demand," although technically speaking, Dimension 2 arises from the "attentional effects of cognitive demand," and not cognitive demand itself, which is not directly observable except in brain imaging experiments [24,25].

Need for Two Orthogonal Dimensions to Discriminate Task Demands

Note that the "cognitive demand" metrics loading on Dimension 2 are fundamentally different in nature from the Dimension 1 metrics. The examples presented here clearly demonstrate that metrics such as RT, miss rate, and mean single glance duration are independent of the task duration (or any other Dimension 1 metric). Thus, the HVAC task in Figure 10, with a longer-than-average mean RT (associated with higher cognitive demand), is a relatively short task (associated with low scores on Dimension 1). On the other hand, the DestEntry task, with an average mean RT (associated with lower cognitive demand), has a long task time (associated with high scores on Dimension 1). For visual-manual tasks, the perceived or subjective workload is correlated with Dimension 1 metrics and scores (see Figure 1). It follows that visual-manual tasks with low physical demand (and low perceived workload) can have relatively poor, average, or relatively good event detection. That is precisely why at least one additional metric along the second dimension, as modelled here, is required to capture both aspects of driver demand and successfully discriminate visual-manual tasks in their effects on driving. Without analyzing the second dimension, many visual-manual tasks will (falsely) appear to have similar driver demand. That is, a one-dimensional model will make many tasks appear the same when they are actually easy to differentiate in the 2-D driver demand model presented here.

Conceptual Analogy for Dimension 1

Dimension 1 is here labelled "physical demand." The term "physical demand" is deliberately chosen, as the Dimension 1 metrics in Examples 1 and 2 represent a "force" moving something through distance (or time). For example, the physical demand required to lift a one-pound object a vertical distance of one foot is one foot-pound, a measure of "work" as defined in physics. The "work" in the case of driver demand represents the glances and button presses required over a time period (TaskTime) to complete a visual-manual task. For visual-manual tasks, a driver's subjective rating of the "workload" of a visual-manual task also falls into Dimension 1 (see Figure 1). (Note that this may not be the case for auditory-vocal and mixed-mode tasks [28].)

Those who prefer describing metrics in purely behavioral or "operational" terms may prefer to have a conceptual understanding of Dimension 1 as encompassing all those metrics which are correlated with vehicle performance, as measured by lateral and longitudinal standard deviation. However, it must be recognized that task time, the total number of glances (to any location), the total glance time (to any location), subjective workload, and all the other Dimension 1 metrics for visual-manual tasks are valid surrogates for the driving performance metrics of lateral and longitudinal variations, with equivalent power for predicting a task's location along the Dimension 1 axis.

Paradox Resolved

As previously described, some relatively short visual-manual tasks that are relatively low in physical demand (e.g., tasks which tend to be subjectively rated as "easy") have worse object and event detection (and longer mean single glance durations) than tasks that are relatively "hard" in physical demand. This finding was called the "task duration paradox for event detection" by the investigators in the CAMP-DWM study [51, p. 8-44]. They state [51, p. 8-44], "Object and Event Detection measures were not readily interpretable due to a paradox in which shorter duration tasks had worse OED performance than longer tasks." They believed it to be an artifact of the CAMP-DWM methods: "The task duration paradox for event detection appears to be methodological in nature" [51, p. 8-44].

In short, the result that was called "paradoxical" in the original CAMP-DWM report [51, p. 8-44] is now easily and simply resolved. For example, the HVAC task, with better lane maintenance, shorter task time, fewer glances, and lower total glance time, has worse-than-average event detection; i.e., a relatively long response time (RT) and a higher miss rate compared to the other 12 visual-manual tasks tested in the CAMP-DWM study. Conversely, destination entry (DestEntry), with poor lane maintenance (high SDLP), many short-duration glances (Glances), and long total glance time (TGT), has event detection (miss rate) and responses (RT) in the average range of the tasks tested. That is, a task can score relatively high on Dimension 2 but not Dimension 1 (e.g., the HVAC task), or vice versa (e.g., the DestEntry task, which is average on Dimension 2 and relatively high on Dimension 1). This result is paradoxical only from a one-dimensional model, because a task cannot simultaneously have both high and low scores using any one-channel driver demand model, no matter how complicated the model. However, a task can have two independent scores with a simple 2-D model that allows for separate dimensions for physical and cognitive demand, thus resolving the paradox.

In other words, the paradox is easily resolved when the tasks are evaluated and scored in two (mathematical) dimensions, because event detection and response performance is independent of physical demand - these are separate and independent intrinsic properties of a task. Therefore, a task with relatively low physical demand can have relatively high cognitive demand (e.g., tasks in Cluster 2 in Figure 10), or relatively low cognitive demand (e.g., tasks in Cluster 3 in Figure 10). The result is that in a scatter plot of many such tasks, the correlation between the task scores in the two dimensions will be zero or near zero, as demonstrated in both Examples 1 and 2. These results validate the Dimensional Model because they provide evidence that the attentional effect of cognitive demand is a separate and independent dimension from physical demand. In other words, these first two examples are consistent in finding that the two primary dimensions of overall driver demand (cognitive and physical demand) have little or no association with each other.
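
This zero correlation between the dimension scores is a property of the PCA construction itself. A minimal numerical check, on a random placeholder matrix rather than on the study data, is:

```python
# Check that PCA task scores on different components are uncorrelated
# by construction (data here are random placeholders, not study data).
import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(2)
z = zscore(rng.normal(size=(13, 10)), axis=0)    # standardized metrics
_, _, vt = np.linalg.svd(z, full_matrices=False)
scores = z @ vt.T                                # task scores per component

print(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])  # ~0 up to rounding
```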

VALIDATION EXAMPLE 3

Dataset 3: Dingus (1987)

Dingus [58] tabulated mean driver performance data for 27 visual-manual in-vehicle tasks, performed in a 1985 Cadillac Sedan de Ville in an experimental test with 32 drivers on public highways. Some of the data were also published in tables in [59,60,61,62], as identified in Appendix Table C1.

Task Set 3

Task descriptions of the 27 tested tasks are given in [59, Table 1]. One subset of 19 tasks was characterized as in-vehicle conventional tasks that are normally performed while driving. The remaining 8 tasks were performed on a navigation device.

Four of the conventional tasks were dropped from the current analysis. The "cruisecontrol" task had missing data. The "cassette tape," "turnSignal," and "Sentinel" tasks were dropped because the total glance time (TGT) did not equal the product of mean single glance time (MSGD) and number of glances (Glances) for those tasks, as it did for the other 23 tasks with non-missing data. This result does not necessarily mean that the data for the latter three tasks were bad, because the mean of a product term (TGT) across participants does not have to equal the product of the means of MSGD and Glances across participants (a product of two variables is a nonlinear transformation). However, because the TGT for all the other tasks was equal to the product of the mean MSGD and mean Glances, there is a chance the data were not correct for those three tasks, so they were dropped from the analysis.
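
A quick numerical illustration of that point, with made-up values, shows that the mean of a per-participant product (TGT) need not equal the product of the means of its factors:

```python
# Mean of a product vs. product of means (illustrative made-up values).
import numpy as np

msgd = np.array([1.2, 1.5, 2.0])      # per-participant MSGD (s)
glances = np.array([4, 6, 10])        # per-participant number of glances
tgt = msgd * glances                  # per-participant TGT (s)

print(tgt.mean())                     # mean of the product: ~11.27
print(msgd.mean() * glances.mean())   # product of the means: ~10.44
```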

The radio tuning task, which was done on a traditional analog radio, is described in [59, Table 1] as, "Locate tuning knob; Adjust tuning knob to instructed digitally displayed frequency." The number of frequency steps is not specified, so it was not possible to determine from the published information to what extent this task matched the subsequent Alliance [47] and NHTSA [48] radio tuning benchmark task specifications.

The "Head+" navigation task was not classified in terms of complexity in [63], but had been classified as high demand in [58], and so it is herein assumed it would have been classified in the "complex" category if it had been so classified in [63].

The final task set analyzed here was therefore composed of 15 conventional tasks and 8 navigation tasks for a total of 23 tasks.

Metrics 3

There were 22 metrics described as having been collected in the study [58], of which the mean data for 9 metrics were published and therefore available for independent analysis. Four of these 9 metrics were excluded from further analysis:

* "Time out of lane" (LaneDevTime), because 6 of the tasks had no data for that metric and would be automatically dropped in PCA, which drops metrics with missing data in any cell.

* "TGT to device with no task errors" because a preliminary analysis showed the results were virtually identical to "TGT to device with task errors."

* "TaskTime measured from first glance" was dropped in favor of "TaskTime measured from the time the button was pressed to start the test." Preliminary analysis found these two TaskTime metrics to be almost perfectly correlated and including both would have contributed no additional information to the PCA.

* MSGD to the road, because other metrics pertinent to glances to the road were not published: total glances to the road, total eyes-on-road time, and percent time gazing to the road (gaze concentration). It is difficult to interpret the meaning of MSGD to the road by itself, without consideration of these other road glance metrics.

No event detection and response metrics were collected in [58].

The current study therefore analyzed the following 6 metrics to validate the Dimensional Model, which fell into three groups:

* Task metrics: task completion time (TaskTime), number of errors in completing task (TaskErrors).

* Vehicle metrics: Number of lane exceedances (LaneX).

* Glance metrics to displays or controls of device under test: total glance time (TGT); number of glances (Glances); mean single glance time (MSGD).

The mean data for the final 23 tasks and 6 metrics analyzed with PCA are reproduced in Appendix Table C2.

Results 3

Figure 13 shows the metric loadings onto the first two components according to the PCA conducted here. These 2 components captured 92% of the total variance in the original 6 metrics - 73% of the variance on Dimension 1, and 19% on Dimension 2. That is, the PCA results indicate that all the data for the 23 tasks tested can be simplified to only 2 dimensions instead of 6 metrics, with little loss of explanatory power.
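
Such variance shares come from the eigenvalues of the metric correlation matrix. A sketch of that computation (on a random placeholder matrix, so the printed shares will not match the 73% and 19% above) is:

```python
# Variance explained per principal component, from the eigenvalues of
# the metric correlation matrix (random placeholder for Table C2 data).
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(23, 6))            # 23 tasks x 6 metrics (placeholder)

corr = np.corrcoef(data, rowvar=False)     # 6 x 6 metric correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
shares = eigvals / eigvals.sum()           # variance share per component
print(shares[:2], shares[:2].sum())        # Dimension 1, Dimension 2, total
```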

It appears by visual inspection that these metrics again fall into two clusters, seen in the lower right and upper left of Figure 13, as they did in Examples 1 and 2. An independent cluster analysis of the metrics on the mean data in Table C2 confirmed the impression from visual inspection that there were two major clusters of metrics in Example 3 (Figure 14).

The metrics in Figure 13 are coded according to the cluster analysis in Figure 14; metrics cluster 1 (physical demand) is blue; metrics cluster 2 (cognitive demand) is red. Consistent with Examples 1 and 2, the same types of metrics load onto Dimension 1: TaskTime, TGT, Glances, and LaneX (see the blue lines to the left in Figure 14, and the blue squares to the lower right in Figure 13). Also consistent with the first two Examples, MSGD loads onto Dimension 2 (cognitive demand). Event detection metrics were not collected in the Example 3 study, so it could not be verified that they load onto the cognitive demand dimension in Example 3 as they did in Examples 1 and 2. However, a new metric that was not tabulated in the first two Examples loads onto Dimension 2, namely TaskErrors. (Note: These are a count of task errors made by drivers in the secondary task itself, not errors made in the primary driving task.) The results in Figure 13 provide evidence that task errors made by drivers in this study [58] specifically increase the cognitive demand score of a task while driving. Contrary to conventional wisdom, task errors do not cause a task's physical demand score to be high, despite the fact that more errors would be expected to make tasks longer, increasing their physical demand score. Thus the Dimensional analysis of the Example 3 dataset (combined with the Dimensional Model in Examples 1 and 2) predicts that if drivers make more errors in a given task than in other tasks (presumably because the task has poor human factors design), they will also tend to have longer glances to the task, increased miss rates, and longer response times to visual events (that is, secondary task errors primarily affect cognitive demand, not physical demand). Example 3 was the only one of the four Examples in the current study that specifically coded for task errors, so the finding that task errors loaded onto the cognitive demand dimension rather than the physical demand dimension in study [58] warrants replication.

Figure 15 plots the task scores for Example 3 in the two dimensions.

Figure 15 shows that all 8 navigation tasks (red circles) have higher scores on the second component (cognitive demand) than all 15 conventional tasks (blue squares). This observation is clarified by examining just the task scores on the second component (Figure 16).

The navigation tasks (red circles at top of Figure 15 and red bars to the right in Figure 16) are perfectly segmented by the S2 task score from the conventional tasks (see blue squares in Figure 15 and blue bars to the left in Figure 16). In short, the Dimensional Model analysis indicates that all the navigation tasks in study [58] had higher attentional effects from cognitive demand than all the conventional tasks in that open road experimental study.

Another study by the same group [63] assigned what it defined as "complexity" levels to these tasks based on these identical data, primarily based on the number of glances and lane crossing (LaneX) metrics [63, Table 3]. Study [63] thereby assigned these tasks to "simple," "moderately complex," and "complex" levels. For example, tasks with a smaller number of glances and a smaller number of lane deviations were assigned to the "simple" level, those with medium values to the "moderately complex" level, and those with higher values to the "complex" level. These complexity levels are given in Appendix Table C2, and plotted as the data labels at the ends of the bars in Figure 16 (1 = Simple, 2 = Moderately Complex, 3 = Complex). Study [63] then associated these complexity levels with the crash/near-crash odds ratios for "simple," "moderate," and "complex" secondary tasks, which the investigators had previously published in the 100-Car study [64, Table 2.4].

The results depicted in Figure 16 do not support the "complexity" levels assigned to the tasks in study [63]. The numbers at the ends of the bars indicating the complexity levels of the tasks have no discernible relationship to their S2 scores. Figure 17 simply sorts the same bars as in Figure 16 according to the "complexity" labels assigned in study [63] rather than their S2 scores. (The complexity labels are given by the numbers outside the ends of the bars in Figure 16.)

Figure 17 illustrates that the complexity levels assigned in study [63] have no clear relationship with the cognitive demand score assigned by the Dimensional Model. Two of the navigation tasks (DestDist+ and DestDirect+), which had higher S2 scores placing them to the right of Figure 16 with the other navigation tasks, were classified as "Simple" by study [63], as shown by the two oddball red bars in the "Simple" group to the left of center in Figure 17. Likewise, the "Complex" tasks as classified in [63] occurred over the complete range of S2 scores, from the lowest score among the sample of tasks tested (PowerMirror) to the highest (Heading+). The PowerMirror task classified as "complex" is an interesting task because it had 26 lane deviations, the most of any task tested (see Appendix Table C2).

However, it had relatively low MSGD values and no task errors, so it had the lowest cognitive demand score of any task tested. This single task shows the independence of the cognitive demand and physical demand dimensions, and why tasks should not be classified as "simple" or "complex" based upon physical demand metrics.

Such contrasts in metrics from the two dimensions, and the complete mismatch between levels of "complexity" assigned by Dimension 1 metrics vs. Dimension 2 metrics, are not surprising from the viewpoint of the Dimensional Model. The assignment of "complexity" levels was explicitly made in study [63] primarily on the basis of the number of glances and LaneX metrics. These metrics are known from Example 1 (and the cluster analyses in Examples 1 and 2) to load onto Dimension 1. In other words, the number of glances and lane crossing metrics load onto the dimension that the Dimensional Model terms "physical demand." PCA guarantees that the task scores on Dimensions 1 and 2 are orthogonal (i.e., have a correlation of 0), and the metrics loading onto Dimension 1 tend to have only minimal correlations with the metrics loading onto Dimension 2. The Dimensional Model therefore suggests that the Dimension 2 metrics are measuring a separate and independent attentional effect, which we here term "cognitive demand." This attentional effect is entirely separate and independent from the physical demand metrics, such as lane crossings or the number of glances to a secondary task. In short, the Dimensional Model suggests that study [63] assigned levels of "complexity" to the tasks using metrics from the wrong dimension. This issue is not just a semantic one over the definition of "complexity." It illustrates the importance of understanding that there are separate underlying dimensions of physical and cognitive demand, not a single "driver workload" dimension.

Perhaps a simple analogy can clarify the distinction between physical and cognitive demand in the Dimensional Model. Most would agree that walking 10 steps imposes a greater physical demand than walking 1 step. However, it is implausible to maintain that walking 10 steps has a greater cognitive demand than walking 1 step. Analogously, taking 10 glances to a device while performing a secondary task while driving does not necessarily imply that the secondary task has a greater attentional effect from cognitive demand than a task that takes only 1 glance. A 10-glance task could have less or more cognitive demand than a 1-glance task; it all depends on the nature of the task. The traditional single-dimensional concept of driver workload (falsely) assumes that tasks with more glances must be inherently more "complex" and therefore have greater cognitive "workload" than tasks with fewer glances. However, complexity is a cognitive attribute, not a physical attribute.

The Dimensional Model is consistent in Examples 1-3 in showing that the metrics for Dimension 2 (cognitive demand) include mean single glance time, event response time and miss rates (whether single or repeated events), and number of errors in secondary task performance. Indeed, Figure 16 shows that the task scores on the cognitive dimension perfectly segment the navigation and conventional tasks in terms of the attentional effects of cognitive demand, despite the fact that there was no direct measurement of event detection or response in Example 3, as there was in Examples 1 and 2. The Dimensional Model maintains that the Dimension 2 metrics are the valid metrics for task complexity, not the Dimension 1 metrics of glance time, number of glances, lane crossings, task time, and so forth.

Note that Example 1 shows that for visual-manual tasks, subjective workload also loads on Dimension 1. That is, a driver's report of their perceived workload of a visual-manual task has to do with the physical demand of the task, not with the cognitive demand. In other words, the evidence in Example 1 demonstrates that a driver's subjective experience of the workload of a visual-manual task has to do with how much physical effort the driver must exert to complete the task (number of button presses, number of glances, etc.). This view is contrary to the concept of "driver workload" in much of the literature. It is possible that the assignment of tasks to groups labelled in terms of "complexity," based on metrics reflecting levels of physical effort, arises from a 1-dimensional view of driver "workload" that is contrary to the Dimensional Model.

Attempting to group tasks along a "complexity" dimension using physical demand metrics is thus mixing apples and oranges. This mismatch becomes more obvious when the complexity categorization in [63] is compared to the clustering according to the attentional effects of cognitive demand as given by the task score plot for Example 3 (Figure 16 or 17).

VALIDATION EXAMPLE 4

Dataset 4: Ranney et al. (2011)

Ranney et al. [65], in a NHTSA-sponsored study to support the preparation of the NHTSA Guidelines [48], collected driver performance data from 5 visual-manual secondary tasks in a simulator with 100 participants, ages 25-64. The test participant was instructed and trained to follow at a constant distance and speed a lead vehicle that went through a complicated waveform of speed variations.

Task Set 4

Three tasks were performed on a smart phone (both iPhone and Blackberry): DialContact, Dial10-Digit, and TextMessage. The data for the two phone types were similar, so they were combined by Ranney et al. [65] in subsequent analyses. Two tasks were performed on the in-vehicle system of a 2010 Toyota Prius with a touch-screen infotainment/navigation system: DestEntry and RadioTune. The task details are given in [65, Appendix J].

Metrics 4

A number of the metrics for which data were collected in study [65] are shown in Table 1.

The response time and miss rate of the driver were measured by the driver's time to press (or not press) a finger switch in response to one of 6 increments in luminance, presented at random intervals at different positions on the simulator screen in the forward field of view. (This is a variation of the RDRT [9], but with 6 lights instead of the standard 1.) The mean data were not all provided in Ranney et al. [65], so the data were digitized off the graphs as indicated in Table 1. MSGD was not graphed in the paper, so it was calculated as the Total Glance Time to the Device (TGT) [65, Figure 27] divided by the Total Glances to the Device (Glances) [65, Figure 28]. In addition, Long Glance Proportion was not graphed in report [65], so it was calculated herein as the Long Glance Frequency (LGF) [65, Figure 32] divided by the Total Glances to the Device (Glances) [65, Figure 28]. (Note: Because the metric values had to be estimated from the graphs, and some metric values had to be calculated from others, the findings in Example 4 should be taken as illustrative only, and not as an exact solution for the components and task scores in the Ranney et al. [65] study, unlike the previous three examples.)
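
The two derived metrics described above are simple ratios; a sketch with illustrative placeholder numbers (not the digitized values) is:

```python
# Derived Example 4 metrics, as described in the text; the inputs are
# illustrative placeholders, not values digitized from [65].
tgt = 8.4        # Total Glance Time to the device (s), per [65, Figure 27]
glances = 6.0    # Total Glances to the device, per [65, Figure 28]
lgf = 1.5        # Long Glance Frequency, per [65, Figure 32]

msgd = tgt / glances               # mean single glance duration: 1.4 s
long_glance_prop = lgf / glances   # long glance proportion: 0.25
print(msgd, long_glance_prop)
```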

Analysis of the metrics in the current study was complicated by the fact that the Ranney et al. study was designed to investigate a number of experimental conditions simultaneously, which can be represented by the 2 x 2 design in Table 2.

The rows in Table 2 show the task time conditions investigated by Ranney et al. [65], which were "1 Trial" vs. "Drive." In the "Drive" condition, the participants performed the same task repeatedly within a fixed "drive time" of 150 s (2.5 min). In the "1 Trial" condition, the metrics for the first task trial of the repeated trials were separately tabulated. The columns in Table 2 show the glance measurement conditions, which were "glances to the device" vs. "glances not to the road." The latter is the sum of glances to the device and "other" glances that were not to the road but were not to the device, such as to mirrors, windows, the speedometer, etc. The sum of all the glance times to the 3 locations (device, other, and road) is the grand total glance time, which equals the total task time if the glance data are accurate.

The "1 Trial" data were the only data used to validate the Dimensional Model in this paper, because the "Drive" condition forces the artificial situation that all tasks have the same task time. The "Drive" condition was deliberately done by Ranney et al. for reasons stated as, "Metrics derived from continuous task performance over an interval of fixed duration are independent of task duration. The mean values obtained using this approach represent the average magnitude of performance degradation for a given secondary task at any point in time" [65, p. 54]. The NHTSA Guidelines [48, p. 11210] further state in reference to the Ranney et al. method of repeating tasks, "Metrics that summarize performance over varying durations are influenced by differences in task duration. In contrast, metrics that normalize for task duration summarize task performance over equivalent time intervals and thus represent the expected magnitude of performance degradation at any point in time during which a task is performed."

However, the "Drive" task time condition with repeating tasks creates several complications, such as for predicting crash risk in naturalistic driving studies from experimental data, because tasks are not repeatedly performed in real-world driving. That is, performing the tasks repeatedly to achieve a fixed duration causes all the task times to no longer correspond to the mean task times in real-world driving. In addition, the method of equating task time for all the tasks essentially forces all the Dimension 1 metrics for all the tasks to be constant, because all the Dimension 1 metrics are highly correlated with task time. Physical demand is therefore forced to be constant by the experimental protocol, so the Dimension 1 metrics no longer have any predictive power for the physical demand of any task. Because physical demand constitutes the majority of the variance for visual-manual tasks, as shown in the first three examples, this protocol essentially eliminates that variance - which was indeed stated to be one of the main goals of the NHTSA investigation. As a result, anomalies occur such as that noted in the NHTSA Guidelines [48, p. 11210], "Destination entry was no more demanding than radio tuning when task duration effects were eliminated." This, of course, is exactly the result expected from the Dimensional Model. When TaskTime is made constant, tasks can only be discriminated on the Dimension 2 axis, and as Figure 10 in Example 2 shows, DestEntry and RadioHard have nearly equal scores on the Dimension 2 axis. For the current study, therefore, only the single trial task time condition was analyzed.

Ranney et al. [65] also investigated the use of glance measurements made to the device (e.g., total glance time or TGT, as called for in the Alliance Guidelines [47]), vs. glance measurements made off the road (e.g., Total Eyes off Road Time or TEORT, as called for in the NHTSA Guidelines [48]). These alternatives are shown in the columns of Table 2. Ranney et al. [65, p. 58] stated that they mainly used glance measurements off the road rather than to the device, because it was "...easier to determine when the driver was looking straight ahead" and "there may be some uncertainty involved in determining exactly when the driver is looking at a secondary task display while driving." Without citing evidence, they also state, "The duration of glances away from the roadway center was considered of more direct relevance to safety than the portion of the duration that was directed to the secondary task display" [65, p. 18] and "... the total EORT is perhaps more important from the perspective of overall road safety" [65, p. 59]. They acknowledge the counterargument that, "In more naturalistic settings, some percentage of total EORT is necessary to maintain situation awareness" [65, p. 58]. In fact, the 100-Car study data [64] showed that glances to mirrors and side windows reduce crash risk. Therefore mixing in "Other" glances with glances to the device of interest raises questions about whether TEORT in a laboratory setting can successfully estimate crash risk.

Furthermore, Ranney et al. [65, Table 32] show that for a visual-manual destination entry task, about 57% of the total task time is composed of glance dwell time to the forward location, about 25% to the secondary task display and controls (total glance time to the device, or TGT), and about 18% to "Other" locations (meaning all remaining possible glance locations). Thus the TEORT percentage is the sum of the "Other" and "TGT" percentages, or 43% of task time. However, the TGT percentage is only 25% of task time, or 25/43 = 58% of TEORT. Thus, for example, a 12-second TEORT, as per the current NHTSA Guidelines [48] criteria, is equivalent to about a 7-second TGT because of confounding from "Other" glances. For these reasons, only the glance data to the device were used for further analysis.
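
Laid out explicitly, the arithmetic in this paragraph is:

```python
# TEORT vs. TGT arithmetic, using the glance shares from [65, Table 32].
forward, tgt_pct, other = 0.57, 0.25, 0.18   # shares of total task time

teort_pct = tgt_pct + other                  # 0.43 of task time
tgt_share_of_teort = tgt_pct / teort_pct     # ~0.58

teort_criterion = 12.0                       # NHTSA TEORT criterion (s)
print(teort_criterion * tgt_share_of_teort)  # ~7.0 s equivalent TGT
```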

The combination of the 1-Trial task time condition and the glance data to the device means that only the data in the upper left cell (green) of Table 2 need to be analyzed here.

PCA cannot have more variables than it has data points, or a "singular matrix" error occurs, meaning that the correlation matrix cannot be inverted, so principal component scores cannot be calculated. Because Ranney et al. [65] had only 5 tasks, at most 4 metrics could be selected from Table 1 for PCA in this Example. The four metrics chosen were TaskTime, Glances, MSGD, and RT, as they were the closest approximation to the metrics used in the previous Examples. Only single trial results for each task were used, and only glances to the task, thereby being comparable to Examples 1-3, for the reasons stated in the previous paragraphs of this section. In other words, selecting only trial 1 data (without artificially repeating tasks for a fixed duration), and only glances to the device (rather than glances off the road, which include glances to locations other than the device), is consistent with the validation data used in Examples 1 to 3. The analyzed data are given in Appendix Table D1.
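
The constraint can be checked numerically: with 5 tasks, the correlation matrix of 6 metrics is necessarily rank deficient (hence singular and non-invertible), while the correlation matrix of 4 metrics is generically full rank. The data below are random placeholders:

```python
# Rank check behind the "at most 4 metrics for 5 tasks" constraint
# (random placeholder data, not the Ranney et al. values).
import numpy as np

rng = np.random.default_rng(4)
six = np.corrcoef(rng.normal(size=(5, 6)), rowvar=False)   # 6 metrics, 5 tasks
four = np.corrcoef(rng.normal(size=(5, 4)), rowvar=False)  # 4 metrics, 5 tasks

print(np.linalg.matrix_rank(six))    # 4 < 6: singular, not invertible
print(np.linalg.matrix_rank(four))   # 4 = 4: full rank, invertible
```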

Results 4

Figure 18 shows the metric loadings onto the first two components according to the PCA. These 2 components captured 97% of the total variance in the original 4 metrics - 56% of the variance on Dimension 1, and 41% on Dimension 2.

An independent cluster analysis of the data in Appendix Table D1 shows the metrics fall into two major clusters (Figure 19). MSGD and RT fall into the cognitive demand cluster as previously identified, and TaskTime and Glances fall into the physical demand cluster. The points in Figure 18 use symbols keyed to the clusters identified in Figure 19. It is clear by visual inspection that the cluster analysis in Figure 19 identified the same clusters that would be found by visual inspection of Figure 18, which is made up of the metric loadings in the PCA conducted to validate the Dimensional Model. MSGD and RT are again in the upper left quadrant (red squares with relatively high S2 loadings and relatively low S1 loadings). TaskTime and Glances are again in the lower right quadrant (blue circles with relatively high S1 loadings and relatively low S2 loadings). Thus, even with only 4 metric loadings possible with only 5 visual-manual tasks, the results are similar to those for the 15 metric loadings for 78 visual-manual tasks in Example 1, and to those in Examples 2 and 3.

Figure 20 shows the tasks scored according to the two components. TextMessage and DestEntry scored highest on Dimension 1 among the few tasks tested, but DestEntry had the lowest S2 score and TextMessage the highest, showing that the S2 dimension easily discriminated between these tasks when the S1 dimension did not.

Figure 21 shows that the metrics can be reduced from the full set of metrics (only 4 in this Example) to 2 non-glance metrics (with TaskTime representing the first dimension, and RT representing the second). The tasks fall into similar locations for Figure 21 (with only 2 original metrics) and Figure 20 (requiring four metrics to create the task scores). Other cross-combinations of two original metrics from the two dimensions gave rise to generally similar results (not shown), as long as one metric was picked from the physical demand cluster and one from the cognitive demand cluster.

Discussion 4

There were large differences in the experimental protocols and venue in Example 4 compared to the other examples (e.g. Example 4 was conducted in a simulator using a unique complicated protocol, whereas Examples 1 and 2 were on a closed road and Example 3 was on an open road). Despite these differences, the patterns of metric and score results in Example 4 were similar to those in Examples 1-3, showing the robustness of the Dimensional Model to variations in experimental protocols.

GENERAL DISCUSSION

Four examples were presented, each of which gave similar results when applied to visual-manual tasks according to the Dimensional Model analysis methods. From 78% to 97% of the variance across the many different metrics included in the analyses in the four examples could be explained by only two components (dimensions). That is, only two dimensions were sufficient to capture the variability in all the metrics in every example examined. Another way of stating this result is that two scores for each task in every Example would be sufficient to reproduce all the original metric values to a high degree of accuracy. This result occurs because of the high degree of redundancy within the particular clusters of metrics that were identified both by the PCA and independent cluster analyses in all four examples, with consistent results.

Another useful contribution of this study is that the methods in the four Examples demonstrate that eye movement data were not needed to predict the location of the secondary tasks in the 2-D score space; the scores could be reliably predicted by just TaskTime and RT for the visual-manual tasks tested in the four examples.

A consistent finding arising from validation of the Dimensional Model with these four datasets is that the second dimension captures a different dimension of driver demand than the number of glances or total glance time to a device, or measures of lateral and longitudinal variability of the vehicle. This second dimension emphasizes an aspect of visual-manual tasks that has not been previously well recognized: visual-manual tasks not only have physical demand but also have attentional effects of cognitive demand, even higher than do auditory-vocal tasks [17]. That is, visual-manual tasks tend to have even more attentional effects from cognitive demand than do auditory-vocal tasks, as evidenced by the higher miss rates and RTs of visual-manual tasks to single-trial events such as lead-vehicle CHMSL activation [67, Figure 6], or stationary light flashes as in the RDRT (Example 1). Stationary flashing lights and moving lights activate different types of neurons in the primate visual cortex [54], but the effect on the orienting attention network may be similar, given the similar results in Examples 1 and 2. To paraphrase Yogi Berra, driving, like baseball, "is 90% mental. The other half is physical."

Independence of MSGD and Lateral Deviations?

In addition to validating that only two dimensions are needed to describe driver performance during secondary tasks, all four examples in this paper were consistent in showing a minimal effect of MSGD on vehicle lateral variability metrics (whether LaneX or SDLP). MSGD loads on Dimension 2, and lateral variability measures load on Dimension 1, so this result helps to validate the Dimensional Model. An early opposite prediction, made from measurements of the SDLP while drivers performed a simulated task on a cathode-ray tube, was that the probability that a car would exceed a lane boundary was larger for longer in-car glance durations [66]. However, Table 3 shows that this early prediction in [66] is not supported by the data in Examples 2, 3, and 4 examined here, and is only weakly supported in Example 1, which had so much statistical power from its many tasks that even the relatively low r value of 0.5 between LaneX and MSGD was statistically significant.

Studies of pure auditory-vocal tasks such as cell phone conversation concur with the predictions of the Dimensional Model for visual-manual tasks in finding that variations in lateral deviations in vehicle position are either not affected or even improved by the attentional effects of cognitive demand, likely because of driver self-regulation (see [67, Section 1.3.5.2]). This counterintuitive result was predicted by the Dimensional Model in these examples because lateral variations such as SDLP and lane crossings load onto Dimension 1, MSGD (indicative of long glances) loads onto Dimension 2, and these two dimensions are uncorrelated in their task scores and minimally correlated in the metric loadings on the components. Long mean single glance durations tend to be correlated with speed reduction, which improves safety by increasing time gaps to lead vehicles. This effect, arising from driver self-regulation, tends to offset the attentional effects of cognitive demand on RT [67, Section 1.3.5.1]. Hence, while the standard deviation of speed was a Dimension 1 variable in these examples, speed difference was a Dimension 2 variable [67].

TaskTime

TaskTime Benefits

The three studies that collected event data (Examples 1, 2, and 4) were consistent in that for visual-manual tasks, the scores of the tasks on the Dimensions were well predicted by only two original metrics: TaskTime and RT. The event could be either a repeated event such as the DRT (Examples 1 and 4), or a relatively "novel" event such as a lead vehicle CHMSL light activation or deceleration (Example 2), showing the generalizability of the Model for simulator, closed road, and open highway venues for studies of visual-manual secondary task effects on driving.

Although other pairs of original metrics could have been chosen to span the complete score space, TaskTime and RT were mainly chosen for the Examples because both are time-based. Time-based metrics are particularly useful in the study of attentional processes, as physical time is the same metric for the brain (and mind) as it is for behavior. Posner [68,69] has long championed the use of "chronometrics" for the study of attentional processes in the mind and brain. For a review of mental chronometry, see [70].

The importance of TaskTime as a metric that is useful for assessing at least one dimension of driver demand for visual-manual tasks has long been recognized in the literature [71,72,73,74,75,76,77]. The continued importance of TaskTime is confirmed by the Dimensional Model Examples 1-4, and also by driving safety experts [78].

False Rejection of Static TaskTime

NHTSA [48, IV.B, p. 11208], however, rejected static task time as a metric in its Guidelines:

In the 1990s, SAE International worked to develop a recommended practice for determining whether or not a particular navigation system function should be accessible to the driver while driving. The draft recommended practice (SAE J2364) [79,80,81] asserted that if an in-vehicle task could be completed within 15 seconds by a sample of drivers in a static (e.g., vehicle parked) setting, then the function was suitable to perform while driving. NHTSA conducted a preliminary assessment of the diagnostic properties of this proposed rule. Ten subjects, aged 55 to 69 years, completed 15 tasks, including navigation system destination entry, radio tuning, manual phone dialing, and adjusting the Heating, Ventilation, and Air Conditioning (HVAC) controls in a test vehicle. Correlations between static task performance and dynamic task performance were relatively low. The results were interpreted to suggest that static measurement of task completion time could not reliably predict the acceptability of a device. Based on these results, NHTSA looked to other metrics and methods for use in assessing secondary task distraction in subsequent research.

NHTSA did not provide a citation for the "preliminary assessment" it refers to in the above quotation. However, the NHTSA study [82] matches the description in the NHTSA quotation above. That study found that "static" task time (measured by doing the task by itself with no simulated driving) had an r² value of 0.39 (r = 0.6245, df = 13, p = 0.013) with dynamic task time measured while driving on a track. The study used 15 in-vehicle secondary tasks, a number of which were on the "Clarion Eclipse[R] Voice Activated Audio Navigation (VAAN) system," which used voice recognition and auditory output. Hence, this study was not a valid test of static task time as described in SAE J2364 [79], which specifically excludes "voice-activated" controls from its scope, as it was intended for only visual-manual tasks. This may be one reason the low correlation observed in this study has not been supported by later studies of visual-manual tasks only, which find that task time is highly correlated across all venues tested [83,84,85]. For example, Appendix E in the current paper shows the correlations between the NHTSA-sponsored CAMP-DWM study visual-manual task times across venues [52]. Here, there was an r² value of 0.9898 (r = 0.9949, df = 11, p = 2.7E-12; bolded numbers in Appendix Table E.2) between static TaskTime and dynamic TaskTime measured while driving on a track. Furthermore, task time for visual-manual tasks measured in the same venue (whether simulator, closed-road, or open-road) also has high correlations with all the Dimension 1 metrics, as shown in Examples 1-4 in the current paper. In short, the NHTSA study [82] incorrectly included auditory-vocal tasks, and its results have not been supported in later studies, suggesting that NHTSA made a false negative error when it rejected static task time as a suitable metric for driver demand. Task time has high correlations across venues as shown in Appendix E, and therefore task time measured in any venue should have high predictive validity for all other Dimension 1 metrics for visual-manual tasks, per Examples 1-4 in this study. Curiously, NHTSA adopted TSOT as a suitable metric, yet rejected static task time, even though TSOT is tightly predicted by static task time (r² = 0.9870, r = 0.9935, df = 11, p = 1.0E-11; see italicized number in Table E.2). That result means that static task time can substitute for TSOT as a metric in these studies without loss of information (meaning use of the occlusion goggles is statistically unnecessary). The NHTSA TSOT criterion of 12 s is equivalent to a 24 s static task time, using a 1.5 s shutter open time and a 1.5 s shutter closed time as per the NHTSA protocol [48, OCC.5.a, p. 11242].
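
The (r, df, p) triplets quoted in this section follow from the standard two-sided t-test for a Pearson correlation; a minimal sketch that reproduces two of the reported p-values is:

```python
# Two-sided p-value for a Pearson correlation with the given df.
from math import sqrt

from scipy.stats import t

def corr_pvalue(r, df):
    """Two-sided p-value for Pearson r, via the t statistic."""
    t_stat = r * sqrt(df / (1.0 - r * r))
    return 2.0 * t.sf(abs(t_stat), df)

print(corr_pvalue(0.6245, 13))   # ~0.013, the NHTSA study [82] value
print(corr_pvalue(0.9949, 11))   # ~2.6E-12, close to the reported 2.7E-12
```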

Other Concerns with TaskTime

However, there are those who are concerned about the use of task time as a surrogate for Dimension 1, because it is not a direct "ground truth" [49] metric of driving performance, such as lane deviations, SDLP, or SD headway. It is also possible that improved secondary task designs in newer vehicles than those in these Examples do not increase SDLP with TaskTime any more than SDLP increases during "just driving" for a matched duration. TaskTime also has the disadvantage that it is not generalizable as a Dimension 1 metric for auditory-vocal tasks, because lateral measures are not correlated with the duration of an auditory-vocal task as they are for a visual-manual task. Indeed, lateral deviations tend to become fewer during auditory-vocal tasks compared to just driving (see [50, Section 1.3.4.2] for citations to 30 studies with this finding).

In answer to these concerns, the Dimensional Model allows driving performance metrics such as SDLP to be substituted for TaskTime with little loss in estimating the physical demand of visual-manual tasks. In other words, any other metric besides TaskTime that loads highly on Dimension 1 would also be satisfactory for predicting task scores on Dimension 1, because all metrics that load highly on Dimension 1 are tightly correlated. For example, the Alliance [47, Principle 2.1A] adopted Total Glance Time to the device, which has predictive validity equal to TaskTime for the Dimension 1 task scores for visual-manual tasks. The NHTSA Guidelines [48] adopted Total Eyes off Road Time, which includes off-road glances to locations other than the device under test, such as mirrors, the speedometer, or the left and right windows. If the glances to these other locations are minimal in a particular simulator test that takes steps to artificially limit such glances (which account for over 50% of glances in naturalistic driving studies [64]), then TEORT would also have predictive validity similar to TGT for Dimension 1 task scores. Of course, proper controls must be put in place to adjust for the confounding effect of TaskTime if SDLP or other Dimension 1 metrics are used in its place. For example, if comparing the SDLP for a secondary task to the SDLP in baseline driving with no secondary task, the TaskTime of the baseline driving comparison period must equal the TaskTime of the secondary task to control for the confound.

In sum, for visual-manual tasks, other Dimension 1 metrics such as TGT, TEORT, number of glances, SDLP, SD of headway, and even subjective workload ratings are all tightly correlated with TaskTime, so any of them is predicted by the Dimensional Model to do about as well as TaskTime in assessing tasks, which is almost as effective as assessing them according to their Dimension 1 scores, as shown in the Examples. However, glance metrics are expensive and time-consuming to collect and tabulate, and the driving performance metrics of lane and speed maintenance typically require a simulator, although surrogates not requiring a simulator have been proposed [86,87,88]. TaskTime is a simple, inexpensive, and easy-to-collect surrogate metric that is as valid as any of the other Dimension 1 metrics in estimating the physical demand of visual-manual tasks, at least per the evidence in the four examples given here.

Limitations of Dimensional Model Validation

Cognitive Demand Metric Limitations

A limitation on using MSGD, RT, or other metrics to indicate the cognitive demand of visual-manual secondary tasks is that it may be necessary to control for the potential confounding effects of road geometry. For example, a simulator test [89] found that drivers' MSGD to secondary tasks can vary with road geometry: on straight roads, MSGD to secondary tasks was 1.2 s, compared to 1.8 s on curves. Thus the identical task measured on a curved road would (falsely) appear from the MSGD metric to have more cognitive demand than if it were measured on a straight road. A three-hour open-road test with a wide variety of secondary tasks found that the MSGD of in-vehicle tasks performed on two-lane roads was 0.92 s, statistically smaller than the roughly 1.1 s observed in city driving or on four-lane roads [58, Figure 14]. Thus the same task conducted on a two-lane road would appear to have smaller attentional effects of cognitive demand than the identical task conducted in city driving or on a four-lane road. The correct relative rankings of tasks on the attentional effects of cognitive demand, using the Dimension 2 metrics identified here, could still be determined if tasks were tested under the same driving demand conditions for the duration of each task (e.g., a straight road with no or minimal traffic), or if task trials were conducted under carefully randomized driving conditions.

Another potential confounding variable for MSGD and other cognitive demand metrics is driving experience. For example, one open-road study with three hours of driving on city streets and rural roads [90] found that 29% of novice drivers took single glances longer than 3 s to the in-car task, but none of the experienced drivers did. Again, the confounding variable of driving experience can be controlled for by selecting participants with equivalent driving experience, or carefully balancing for that variable.

In addition, there is the confounding variable of age. In an open-road experiment [58], drivers 50 years of age or older not only had longer MSGD than younger drivers, but they also made more task errors. The increased RTs of older drivers relative to younger drivers are also well documented, and the many studies need not be cited here. Either of these effects, and especially both together, would make a task appear to have different cognitive demand effects on attentional mechanisms than it actually has across the general population of drivers, if there were no adjustment or control for age. A test attempting to measure the attentional effects of cognitive demand may control for age by stratification, for example, or by other well-established procedures in epidemiology or statistics.

Physical Demand Metric Limitations

Physical demand metrics are likewise susceptible to biases from confounding variables. For example, older drivers (50+) have longer TGT than younger drivers for the identical tasks [58, Figure 21]. Tasks performed by older drivers would appear to have greater physical demand than the identical tasks performed by younger drivers. A test purporting to measure physical demand may also control for age by using stratification, for example.

Experimental Studies Limitation

The four examples used to validate the Dimensional Model were all carefully controlled experiments, with one or more experimenters present: in the vehicle in the closed-road tests of Examples 1 and 2 and in the open-road test of Example 3, and likely near the simulator in Example 4. These conditions could have biased the metrics in unknown ways because of observer bias.

The open-road test in Example 3 may have been conducted under particular traffic conditions and specific road geometries that would not necessarily recur in other field studies, or in naturalistic driving, which could have biased the metric loadings in unknown ways. For example, under different traffic or road conditions in field or naturalistic studies, as discussed in the cognitive demand limitation above, it is possible that MSGD may load onto Dimension 1 rather than Dimension 2, and therefore be correlated with the number of eye glances and the TGT and TEORT metrics, which it was not in the four studies examined here.

Older In-Vehicle Technology Limitation

The four examples were from published studies in the literature, which used a wide range of technologies for device-related tasks that were current at the time of each study, but which do not necessarily represent current technology and design in vehicles being sold today. However, a more recent study [28] conducted with modern vehicle technology produced results consistent with those here for visual-manual tasks in older vehicles. That consistency suggests that the Dimensional Model is a general model of driver demand for visual-manual tasks, reflecting drivers' intrinsic attentional processes and the effects of physical demand, and is not specific to a particular in-vehicle or portable device hardware or software configuration.

Questions for Future Research

Prediction of Unknown Metric Values or Across Venues

The Dimensional Model hypothesizes that a simulator test conducted without glance metrics should be able to predict the glance metrics for a simulator, track, or road test. Likewise, because the dimensions were essentially the same across simulator, track, and open-road venues, the task scores in one venue should provide reasonably valid estimates of the relative task scores in another venue (potentially including naturalistic driving). (This suggestion is related to a statistical method called "multiple imputation" for estimating missing values, offered in the Stata statistical package [35].)

For the Example 2 study, the Dimensional Model makes the testable prediction across venues that the RT and TaskTime data from the CAMP-DWM simulator venue [51,52] should allow reasonable prediction of the simulator eye glance metrics. This prediction cannot be directly tested without expensive manual reduction of the eye glance data in the CAMP-DWM simulator test, which did not have its eye movement data analyzed for cost reasons. However, the prediction can be tested across venues, because at least some of the eye glance data for the track and road venues were analyzed in the CAMP-DWM study (see Example 2 methods). The Dimensional Model also offers an opportunity to predict across venues. For example, the Dimensional Model scores for the tasks conducted in the simulator (without eye glance metrics) should allow prediction of the task scores on the track or road (with glance metrics). The predictive validity of the S1 and S2 score metrics, which combine the information from many metrics in the simulator, should be considerably higher than that obtained by attempting to predict the track metric values using only one simulator metric at a time (the procedure used by the CAMP-DWM project investigators).

Likewise, for the Example 1 study, the predictive validity of the combined score metrics from the Dimensional Model should exceed what has been achieved previously. For example, the track data in Example 1 were successfully predicted by the "static load" test (now termed the "semi-static" test in the ISO DRT standard [9]), which is based on doing the task while viewing a driving scene and responding to the RDRT [39,40]. However, those predictions of the track data were made from the stationary data using only single metrics on a one-to-one prediction basis. Forming the PCA of the metrics in one venue condenses all the information in all the metrics to two scores for each task. Finding the correspondence of the scores in one venue to the scores in another venue is thus a much more powerful method than examining single metrics one at a time, which has been the norm until now in the field of driving safety metrics.
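
For readers who wish to reproduce this condensation of many metrics into two scores per task, a minimal sketch follows, using Python's scikit-learn rather than the Minitab [34] or Stata [35] packages used for the analyses in this paper; the input file name and metric layout are assumptions for illustration:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # X: tasks (rows) by metrics (columns), e.g., TaskTime, TGT, SDLP,
    # MSGD, RT, Miss% for one venue ("venue_metrics.csv" is hypothetical).
    X = np.loadtxt("venue_metrics.csv", delimiter=",", skiprows=1)

    Z = StandardScaler().fit_transform(X)   # standardize each metric
    pca = PCA(n_components=2).fit(Z)
    scores = pca.transform(Z)               # columns are the S1, S2 task scores
    loadings = pca.components_.T            # metric loadings on the two dimensions
    print(pca.explained_variance_ratio_)    # variance explained per dimension

Correlating the score columns obtained in one venue with those obtained in another venue, for the same task set, implements the cross-venue correspondence described above.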

Indeed, the multivariate statistical method of canonical correlation [91] is similar to this suggested procedure for cross-venue prediction. It could be used in future research to establish the optimum multivariate prediction from the metric set in one venue (e.g., a simulator, or even analytical metrics such as the number of manual steps [76]) to that in another (e.g., road, track, or naturalistic venues).
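
A minimal sketch of such a canonical correlation analysis follows, using scikit-learn's CCA; the matrices below are random placeholders standing in for the per-venue task-by-metric data:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    X_sim = np.random.rand(13, 6)   # placeholder: 13 tasks x 6 simulator metrics
    Y_road = np.random.rand(13, 4)  # placeholder: same 13 tasks x 4 road metrics

    cca = CCA(n_components=2).fit(X_sim, Y_road)
    X_c, Y_c = cca.transform(X_sim, Y_road)
    for k in range(2):
        # Canonical correlation between the k-th pair of variates
        r = np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1]
        print(f"canonical correlation {k + 1}: {r:.3f}")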

Estimating Error Boundaries of Task Scores

It is useful to be able to estimate and display the uncertainty of the location of the points in the final 2-D score space, to increase overall understanding of the dataset. Estimating that uncertainty is equivalent to solving for the error ellipses around the scores in the 2-D space. Such error ellipses are useful for obtaining a sense of the statistical significance of the distances between scores in the final 2-D score space. They are also useful, for example, for correctly setting a driver demand criterion or criteria based on a task's location in the 2-D score space: any such criterion needs to be set large enough to account for the uncertainty of the secondary task chosen as the benchmark for setting it [92].

One of the definitive textbooks on PCA does not mention how to estimate uncertainty levels in the final PCA scores [32]. Major statistical programs also do not offer estimates of error boundaries for PCA scores in their standard package (for example, neither the Minitab [34] nor Stata [35] programs offer this feature). A public-domain program that does offer this feature is iPCA [93,94].

Alternatively, a method developed by the lead author can be used with existing packages. After calculation of the PCA scores, the scoring coefficients used to generate those scores can be used to estimate the error boundaries of the individual scores, as follows. After the PCA is performed, estimate the loadings and the scoring coefficients (the scoring coefficients are not the loadings matrix but are derived from it), which allow prediction of scores for new tasks. Enter the upper error limit of each of the original metrics for each task as a vector. The inner product of this error vector with the scoring coefficients then projects the upper confidence limit into the same 2-space as the original task scores. The same method works for estimating the lower error limit of each task score. The upper and lower limits in 2-space define the boundary of the error ellipse, which can then be drawn in a straightforward manner from the four points solved for by this method, which define the ends of the major and minor axes of the error ellipse. Applying this method (or other methods of error estimation for 2-D task scores, such as iPCA [93,94]) offers a promising area for future research for the Dimensional Model of Driver Demand.
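
A minimal numpy sketch of the projection step of this proposed method follows (assuming metrics already standardized to zero mean and unit SD, a two-component solution, and upper/lower confidence-limit vectors expressed in the same standardized units; the function names are hypothetical):

    import numpy as np

    def pca_scores_and_coeffs(Z):
        # Z: tasks x metrics, standardized. Under one common convention,
        # the right singular vectors are the scoring coefficients
        # (eigenvectors of the correlation matrix); the loadings rescale
        # these by each component's standard deviation.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        coeffs = Vt[:2].T          # metrics x 2 scoring-coefficient matrix
        return Z @ coeffs, coeffs  # (task scores, scoring coefficients)

    def project_error_limits(z_upper, z_lower, coeffs):
        # Inner products project a task's upper and lower metric confidence
        # limits into the same 2-D space as its score; the projected points
        # bound the task's error ellipse as described in the text.
        return z_upper @ coeffs, z_lower @ coeffs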

Why Do Some Short Tasks Have Relatively Poor Event Detection and Others Don't?

Still unresolved is the question of which design characteristics cause some relatively short tasks to have relatively poor event detection and longer glances (that is, higher cognitive demand) than other relatively short tasks. This question, raised by these findings, provides an emerging research area for the human factors of driving and the design of in-vehicle secondary tasks. One plausible hypothesis advanced here (but not yet proven) is that drivers will persist in performing a short task to its end, rather than interrupt and resume it, thus saving on the "switching costs" of returning attention to the roadway and then back again to the secondary task in a multi-tasking environment. Hence, during some short tasks with certain (still unknown) design characteristics, drivers tend to make longer glances to the device, increasing their mean single glance duration and long glance proportion during the few glances that they do make, which increases their miss rates for external events.

Estimating Relative Crash Risk

It is outside the scope of this paper to estimate relative crash risk for secondary tasks based on the Dimensional Model and metrics analyzed here, which were drawn from experiments and not naturalistic driving. However, future research should theoretically be able to make more valid estimates of relative crash risk for visual-manual tasks based on the contributions of this research. In particular, the Dimensional Model implies that relative risk estimates for visual-manual tasks will need to take into account not only the physical demand, but also the attentional effects of cognitive demand, which arise from visual-manual secondary tasks performed concurrently while driving. It should therefore be theoretically possible to use the scores of the visual-manual tasks on the two dimensions as identified by the Dimensional Model to make valid estimates of the relative risks of visual-manual tasks in naturalistic driving studies, although this hypothesis remains to be convincingly tested.

Attempts to estimate crash risk (or, more properly, relative crash risk) from surrogate metrics have a long history [95]. Such predictions must be done carefully, because of the presence of unknown "intervening variables" that may not always be controlled for in experimental studies [49]. However, the methods of epidemiology were designed specifically to find and adjust for intervening variables.

Tijerina et al. [96, pp. 1-8 to 1-12] further caution that the standard Pearson correlation coefficient is not suitable for predicting the risk of a crash (or presumably its relative risk) from a secondary task metric, because a crash is a rare, binary event, and therefore has a distribution unlike the approximately normal distributions of typical driver behavior experimental metrics. This issue is also addressed by epidemiological analysis methods, which were specifically developed to handle rare events (such as rare diseases), and so are based on the binomial distribution rather than the normal distribution.

Tijerina et al. [96] also use Bayes' theorem in an attempt to show the difficulty of quantitatively establishing the safety relevance of a driver demand metric. However, the arithmetic given in their example [96, pp. 1-12 to 1-13] is incomplete, because their argument appears to rest on an implicit assumption that the prevalence of a risk factor before a crash (or safety-relevant event) is a direct estimate of its causal role in a crash. In fact, a valid estimate of the cause of a crash cannot be made solely from the prevalence (or odds) of a risk factor before a crash (a common error in the field of driving safety). A valid estimate of the causal relation between a safety-relevant event and a risk factor is made by comparing the prevalence (or odds) of the risk factor associated with a safety-relevant event to the prevalence (or odds) of the risk factor in appropriate baseline samples without a safety-relevant event. A valid estimate of the relative risk can then be made after adjustment for confounding variables. (See [97] for further discussion.) Thus, although it would be challenging, there is no known roadblock that would prevent the relative risk of a secondary task, as determined in naturalistic driving, from being estimated from its two task scores in the Dimensional Model measured in an experimental venue, once the proper correspondence was established between the experimental and naturalistic driving venues, using methods such as those described in the previous section, "Prediction of Unknown Metric Values or Across Venues."
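
As a concrete illustration of this event-versus-baseline comparison (all counts below are hypothetical):

    def odds_ratio(exposed_events, unexposed_events,
                   exposed_baseline, unexposed_baseline):
        # Odds of the risk factor among safety-critical events, divided by
        # its odds in matched baseline epochs with no event.
        return ((exposed_events / unexposed_events)
                / (exposed_baseline / unexposed_baseline))

    # Hypothetical counts: factor present in 40 of 100 events
    # and in 10 of 100 baseline epochs.
    print(odds_ratio(40, 60, 10, 90))  # -> 6.0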

Even before such new research is carried out, the Dimensional Model already sheds some light on the interpretation of the causes of safety-critical events in naturalistic driving data. For example, recent analyses of the 100-Car naturalistic driving data suggest that long glances are a larger risk factor in causing crashes than total glance time or total eyes-off-road time [98,99]. This result is supported by a recent analysis of the SHRP-2 naturalistic driving data for rear-end safety-critical events (combined crashes and near-crashes), which found that the odds of a safety-critical event rise with the duration of the last glance before a crash, and that the earlier glances during a task (which contribute to total eyes-off-road time before a crash) are not as predictive [100]. The evidence in Examples 1-4 here is that the second dimension of driver demand reflects the attentional effects of cognitive demand, and that event detection and response metrics load heavily on that dimension. Event detection and response are "ground truth" metrics for driving safety [49], perhaps as much as (if not more than) the first-dimension metrics of driving performance such as lane maintenance. Recall from Examples 1-4 that the second dimension loads with MSGD, which is an indicator of long glances (particularly if there are few glances), but does not load with total glance time to the device (or total eyes-off-road time). Putting these observations together, it is possible that the cognitive demand dimension may ultimately be found to carry a heavier weight than the physical demand dimension in estimating the relative crash risk of visual-manual tasks, despite the fact that the physical demand dimension explains at least twice the variance in the experimental data in the examples here. The results of this analysis of data according to the Dimensional Model confirm the importance of event detection and response in driving safety, which has long been emphasized [101,102,103] but not always fully appreciated. Even if the attentional effects of cognitive demand are not as large a contributor to relative crash risk for visual-manual tasks as physical demand, we hypothesize that any valid estimate of the relative crash risk for visual-manual tasks must take into account the attentional effects of cognitive demand as well as the physical demand of those tasks. This has not been true of many reports of naturalistic driving studies to date [e.g., 64], which have focused almost exclusively on the physical demand of visual-manual tasks.

Extension to Auditory-Vocal and Mixed-Mode Tasks

The scope of the current paper is limited to visual-manual tasks. The extension to auditory-vocal tasks is not straightforward because, for example, there are no glances to a device: a pure auditory-vocal task (such as a cell phone conversation) has no display screen by definition. There are off-road glances to various other locations during auditory-vocal tasks, such as glances to the mirrors, the speedometer, or out the left and right windows. But these glances reduce the relative risk (as indicated by the odds ratio) of a safety-critical event in naturalistic driving studies [e.g., 64], so including them biases TEORT upward, and relative risk estimates made from the TEORT metric may therefore be biased high for auditory-vocal tasks. TaskTime may also not be a valid metric for pure auditory-vocal tasks, because some driving performance metrics (such as lane maintenance) do not correlate with TaskTime for pure auditory-vocal tasks as they do for visual-manual tasks [67]. Likewise, mixed-mode tasks (which are usually auditory-vocal-visual tasks in today's vehicles, and are often mislabeled as "voice" tasks) bring further complications in directly applying the Dimensional Model for visual-manual tasks, as developed in this paper, to auditory-vocal and mixed-mode tasks. Despite these difficulties, the Dimensional Model has been extended in a preliminary manner to auditory-vocal and mixed-mode tasks, as described in the companion paper [28].

SUMMARY/CONCLUSION

There are two major underlying dimensions of demand on drivers when they perform visual-manual tasks. Just the two scores of a task on these two dimensions can account for most of the variance in dozens of driver performance metrics. These two dimensions of driver demand for visual-manual tasks are consistent across the four experimental studies examined, despite wide variations in methods, venues, and the full set of particular metrics used. These dimensions were herein termed physical demand, associated with lateral and longitudinal vehicle metrics and their surrogates, and cognitive demand, associated with event detection metrics and their surrogates.

The Dimensional Model resolved the paradoxical finding that for some visual-manual secondary tasks, drivers maintain their lane well yet detect events relatively poorly, whereas for certain other visual-manual tasks, the same drivers maintain their lane relatively poorly yet detect events relatively well. For yet other visual-manual tasks, drivers maintain their lane well and have relatively good event detection. This seeming paradox was resolved by modeling driver demand in two mathematical dimensions rather than one. In this study, principal components analysis (PCA) was applied to the published data from four simulator, track, and open-road field studies of visual-manual secondary task effects on driving. PCA reduced the correlated task metrics to two underlying orthogonal dimensions which were consistent across studies. The physical and cognitive demand dimensions represented not only the "ground truth" metrics of lateral and longitudinal vehicle performance and event detection, but also surrogates for those metrics such as task time, eye glances, etc. We showed how the metric pair of task time and response time could assess the two dimensions, although other metric pairs are theoretically suitable as well.

In conclusion, the Dimensional Model of Driver Demand allows for a common simplified understanding of the two dimensions underlying almost all the commonly-used measures of the effects of visual-manual secondary tasks on driver performance.

REFERENCES

[1.] Welford, A., "Single-Channel Operation in the Brain," Acta Psychologica 27:5-22, http://www.sciencedirect.com/science/article/pii/0001691867900406, 1967, doi:10.1016/0001-6918(67)90040-6.

[2.] Broadbent, D.E. Decision and Stress. Oxford, England: Academic Press, http://psycnet.apa.org/psycinfo/1972-08224-000, 1971.

[3.] Moray, N., "Where Is Attention Limited? A Survey and a Model," Acta Psychologica 27:84-92, 1967.

[4.] Kahneman, D. Attention and Effort. Englewood Cliffs, NJ: Prentice Hall, https://www.princeton.edu/~kahneman/docs/attention_and_effort/Attention_lo_quality.pdf, 1973.

[5.] Wickens, C.D., "The Effects of Divided Attention on Information Processing in Manual Tracking," Journal of Experimental Psychology. Human Perception and Performance 2(1):1-13, http://psycnet.apa.org/journals/xhp/2/1/1/, 1976, doi:10.1037/0096-1523.2.1.1.

[6.] Wickens, C.D., "Multiple Resources and Mental Workload," Human Factors 50(3):449-455, http://www.ise.ncsu.edu/nsf_itr/794B/papers/Wickens_2008_HF_MRT.pdf, 2008, doi:10.1518/001872008x288394.

[7.] Ho, C. and Spence, C. The Multisensory Driver: Implications for Ergonomic Car Interface Design. Hampshire, England: Ashgate Publishing Ltd., http://www.psy.ox.ac.uk/publications/464237, 2008.

[8.] Merat, N. and Jamson, A., "The Effect of Stimulus Modality on Signal Detection: Implications for Assessing the Safety of in-Vehicle Technology," Human Factors: The Journal of the Human Factors and Ergonomics Society 50:145-158, http://www.ncbi.nlm.nih.gov/pubmed/18354978, 2008, doi:10.1518/001872008X250656.

[9.] ISO/DIS 17488. "Road Vehicles--Transport Information and Control Systems--Detection-Response Task (DRT) for Assessing Attentional Effects of Cognitive Load in Driving," (2015): 75 pages. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59887.

[10.] Posner, M.I. and Fan, J. "Attention as an Organ System," In Topics in Integrative Neuroscience: From Cells to Cognition, edited by Pomerantz J.R.. 31-61. Cambridge, UK: Cambridge University Press, https://www.sacklerinstitute.org/cornell/people/jin.fan/publications/publications/ANT_AS_ORGAN_SYSTEM.pdf, 2008.

[11.] Petersen, S.E. and Posner, M.I., "The Attention System of the Human Brain: 20 Years After," Annual Review of Neuroscience 35:73, http://cnsweb.bu.edu/Profiles/Mingolla.html/cnsftp/cn730-2007-pdf/posner_petersen90.pdf, 2012.

[12.] Posner, M.I. Attention in a Social World. Oxford University Press, 2012.

[13.] Rothbart, M.K. and Posner, M.I., "The Developing Brain in a Multitasking World," Developmental Review 35:42-63, https://blogs.uoregon.edu/mposner/files/2015/05/Multitasking-DR-2015-1v2zjhn.pdf, 2015.

[14.] Fan, J., McCandliss, B.D., Sommer, T., Raz, A. et al., "Testing the Efficiency and Independence of Attentional Networks," Journal of Cognitive Neuroscience 14(3):340-347, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.442&rep=rep1&type=pdf, 2002.

[15.] Fan, J., Gu, X., Guise, K.G., Liu, X. et al., "Testing the Behavioral Interaction and Integration of Attentional Networks," Brain and Cognition 70(2):209-220, http://www.sciencedirect.com/science/article/pii/S027826260900027X, 2009, doi:10.1016/j.bandc.2009.02.002.

[16.] Callejas, A., Lupianez, J. and Tudela, P., "The Three Attentional Networks: On Their Independence and Interactions," Brain and Cognition 54(3):225-227, http://www.sciencedirect.com/science/article/pii/S0278262604000302, 2004, doi:10.1016/j.bandc.2004.02.012.

[17.] Foley, J., Young, R., Angell, L. and Domeyer, J., "Towards Operationalizing Driver Distraction," proceedings of 7th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Bolton Landing, New York, http://drivingassessment.uiowa.edu/sites/default/files/DA2013/Papers/010_Foley_0.pdf, June 17-20 2013.

[18.] Spence, C. and Ho, C., "Multisensory Interface Design for Drivers: Past, Present and Future," Ergonomics 51(1):65-70, http://dx.doi.org/10.1080/00140130701802759, 2008/01/01 2008, doi:10.1080/00140130701802759.

[19.] Bedard, M. and Weaver, B., "The Attention Network Test: A New Approach to Determine Safety to Drive?" The Eye and the Auto, Detroit Michigan, 2011.

[20.] Lopez-Ramon, M.F., Castro, C., Roca, J., Ledesma, R. et al., "Attentional Networks Functioning, Age, and Attentional Lapses While Driving," Traffic injury prevention 12(5):518-528, http://dx.doi.org/10.1080/15389588.2011.588295, 2011.

[21.] Lopez-Ramon, M.F., Castro, C., Roca, J., Ledesma, R. et al., "Analyzing the Relationship between Behavioral Measures of the Attentional Networks Performance & Self-Report Data of Attention-Related Driving Errors," presented at 2nd International Conference on Driver Distraction and Inattention, Gothenburg, Sweden, September 5-7 2011.

[22.] Roca, J., Lupianez, J., Lopez-Ramon, M.-F. and Castro, C., "Are Drivers' Attentional Lapses Associated with the Functioning of the Neurocognitive Attentional Networks and with Cognitive Failure in Everyday Life?", Transportation Research Part F: Traffic Psychology and Behaviour 17:98, http://www.sciencedirect.com/science/article/pii/S1369847812001039, 2013, doi:10.1016/j.trf.2012.10.005.

[23.] Roca, J., Crundall, D., Castro, C., Lopez-Ramon, M.F. et al., "Assessment of Drivers' Attentional Performance Using the ANTI-V," proceedings of 2nd International Conference on Driver Distraction and Inattention, Gothenburg, Sweden, 2011.

[24.] Bowyer, S., Hsieh, L., Moran, J., Young, R. et al., "Conversation Effects on Neural Mechanisms Underlying Reaction Time to Visual Events While Viewing a Driving Scene Using Meg," Brain Research 1251:151-161, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741688/, 2009, doi:10.1016/j.brainres.2008.10.001.

[25.] Hsieh, L., Young, R.A., Bowyer, S.M., Moran, J.E. et al., "Conversation Effects on Neural Mechanisms Underlying Reaction Time to Visual Events While Viewing a Driving Scene: FMRI Analysis and Asynchrony Model," Brain Research 1251:162-175, http://www.researchgate.net/publication/23415337_Conversation_effects_on_neural_mechanisms_underlying_reaction_time_to_visual_events_while_viewing_a_driving_scene_fMRI_analysis_and_asynchrony_model, 2009, doi:10.1016/j.brainres.2008.10.002.

[26.] Peiris, M.T.R., Jones, R.D., Davidson, P.R., Carroll, G.J. et al., "Frequent Lapses of Responsiveness During an Extended Visuomotor Tracking Task in Non-Sleep-Deprived Subjects," Journal of Sleep Research 15(3):291-300, http://dx.doi.org/10.1111/j.1365-2869.2006.00545.x, 2006, doi:10.1111/j.1365-2869.2006.00545.x.

[27.] Young, R.A., "Drowsy Driving Increases Severity of Safety-Critical Events and Is Decreased by Cell Phone Conversation," proceedings of 3rd International Conference on Driver Distraction and Inattention, Gothenburg, Sweden, http://document.chalmers.se/download?docid=19e9af22-8aec-4b5e-95d5-c24d9d286020, September 4-6 2013.

[28.] Young, R.A., Hsieh, L. and Seaman, S., "The Dimensional Model of Secondary Task Effects on Driving: Extension to Auditory-Vocal and Mixed-Mode Tasks," Proceedings of Society of Automotive Engineers, Detroit, MI, April 2016.

[29.] Pearson, K., "On Lines and Planes of Closest Fit to System of Points in Space," Philosophical Magazine 2:559-572, http://stat.smmu.edu.cn/history/pearson1901.pdf, 1901, doi:10.1080/14786440109462720.

[30.] Young, R., "Principal-Component Analysis of Macaque Lateral Geniculate Nucleus Chromatic Data," Journal of the Optical Society of America 3:1735-1742, http://www.researchgate.net/publication/19388572_Principal-component_analysis_of_macaque_lateral_geniculate_nucleus_chromatic_data, 1986.

[31.] Jackson, J.E. A User's Guide to Principal Components. Vol. 587: John Wiley & Sons, http://onlinelibrary.wiley.com/book/10.1002/0471725331, 2005.

[32.] Jolliffe, I.T. Principal Component Analysis (2nd Ed.). Springer Series in Statistics. New York: Springer-Verlag, http://wpage.unina.it/cafiero/books/pc.pdf, 2002.

[33.] https://en.wikipedia.org/wiki/Principal_component_analysis.

[34.] Minitab®, Version 17.2.1, https://www.minitab.com/en-us/, 2015.

[35.] Stata, Version 13.1. http://www.stata.com/, 2015.

[36.] Kaiser, H.F., "An Index of Factorial Simplicity," Psychometrika 39(1):31-36, http://link.springer.com/article/10.1007/BF02291575, 1974.

[37.] SAE International Surface Vehicle Recommended Practice, "Operational Definitions of Driving Performance Measures and Statistics," SAE Standard J2944, Iss. Jun. 2015.

[38.] SAE International Surface Vehicle Recommended Practice, "Definitions and Experimental Measures Related to the Specification of Driver Visual Behavior Using Video Based Techniques," SAE Standard J2396, Iss. Jul. 2000.

[39.] Angell, L., Young, R., Hankey, J., and Dingus, T., "An Evaluation of Alternative Methods for Assessing Driver Workload in the Early Development of In-Vehicle Information Systems," SAE Technical Paper 2002-01-1981, 2002, doi:10.4271/2002-01-1981.

[40.] Young, R., Aryal, B., Muresan, M., Ding, X. et al., "Road-to-Lab: Validation of the Static Load Test for Predicting On-Road Driving Performance While Using Advanced in-Vehicle Information and Communication Devices," Proceedings of the Third International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, http://drivingassessment.uiowa.edu/DA2005/PDF/35_DickYoungformat.pdf, 2005.

[41.] Young, R. and Angell, L., "The Dimensions of Driver Performance During Secondary Manual Tasks," proceedings of Driving Assessment 2003: The Second International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Park City, Utah, http://drivingassessment.uiowa.edu/DA2003/pdf/25_Youngformat.pdf, July 21-24 2003.

[42.] NHTSA, "Proposed Driver Workload Metrics and Methods Project," United States Department of Transportation, Washington, DC USA, http://wwwnrd.nhtsa.dot.gov/departments/Human%20Factors/driverdistraction/PDF/32.PDF.

[43.] Johansson, E., Carsten, O., Janssen, W., Jamson, S. et al., "HASTE Deliverable 3: Validation of the HASTE Protocol Specification," August, http://www.its.leeds.ac.uk/projects/haste/downloads/Haste_D3.pdf. 2005.

[44.] https://en.wikipedia.org/wiki/Stationary_process.

[45.] Meyer, D.E., Irwin, D.E., Osman, A.M. and Kounois, J., "The Dynamics of Cognition and Action: Mental Processes Inferred from Speed-Accuracy Decomposition," Psychological Review 95(2):183-237, http://www.researchgate.net/publication/19781733_The_Dynamics_of_Cognition_and_Action_Mental_Processes_Inferred_From_Speed-Accuracy_Decomposition, 1988, doi:10.1037//0033-295X.95.2.183.

[46.] Drury, C.G. "Managing the Speed-Accuracy Trade-Off." In The Occupational Ergonomics Handbook, edited by Karwowski W. and Marras W. S.. 677-691. Boca Raton, FL USA: CRC Press, 1999.

[47.] Alliance, "Statement of Principles, Criteria and Verification Procedures on Driver-Interactions with Advanced in-Vehicle Information and Communication Systems, June 26, 2006 Version," Alliance of Automobile Manufacturers Driver Focus -Telematics Working Group, Washington, DC, http://www.autoalliance.org/index.cfm?objectid=D6819130-B985-11E1-9E4C000C296BA163. 2006.

[48.] NHTSA, "Visual-Manual NHTSA Driver Distraction Guidelines for Portable and Aftermarket Electronic Devices (Docket NHTSA-2013-0137)," http://www.regulations.gov/#!searchResults;rpp=25;po=0;s=NHTSA-2013-0137;dct=FR%252BPR%252BN%252BO%252BPS%252BSR, 2014.

[49.] DOT, "Proposed Driver Workload Metrics and Methods Project," U.S. Department of Transportation, Washington, D.C., http://wwwnrd.nhtsa.dot.gov/departments/Human%20Factors/driver-distraction/PDF/32.PDF. 1995.

[50.] Young, R., "Event Detection: The Second Dimension of Driver Performance for Visual-Manual Tasks," SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 5(1):297-316, 2012, doi:10.4271/2012-01-0964.

[51.] Angell, L., Auflick, J., Austria, P., Kochhar, D. et al., "Driver Workload Metrics Task 2 Final Report," DOT HS 810 635, Crash Avoidance Metrics Partnership (CAMP), http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/Driver%20Workload%20Metrics%20Final%20Report.pdf, 2006.

[52.] Angell, L., Auflick, J., Austria, P., Biever, W. et al., "Driver Workload Metrics Project, Task 2 Final Report, Appendices," National Highway Traffic Safety Administration, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/2006/Driver%20Workload%20Metrics_appendices.pdf, 2006.

[53.] Young, R.A., Lesperance, R.M. and Meyer, W.W., "The Gaussian Derivative Model for Spatial-Temporal Vision: I. Cortical Model," Spatial Vision 14:261-319, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.5991&rep=rep1&type=pdf, 2001, doi:10.1163/156856801753253582.

[54.] Young, R. and Lesperance, R., "The Gaussian Derivative Model for Spatial-Temporal Vision: II. Cortical Data," Spatial Vision 14(3):321-389, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.5991&rep=rep1&type=pdf, 2001, doi:10.1163/156856801753253591.

[55.] Regan, D. and Beverley, K., "Looming Detectors in the Human Visual Pathway," Vision Research 18(4):415-421, http://www.sciencedirect.com/science/article/pii/0042698978900512, 1978, doi:10.1016/0042-6989(78)90051-2.

[56.] Bowyer, S., Hsieh, L., Moran, J., Young, R. et al., "Conversation Effects on Neural Mechanisms Underlying Reaction Time to Visual Events While Viewing a Driving Scene Using Meg," Brain Research 1251:151-161, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741688/, 2009, doi:10.1016/j.brainres.2008.10.001.

[57.] Young, R.A., Hsieh, L. and Seaman, S., "The Tactile Detection Response Task: Preliminary Validation for Measuring the Attentional Effects of Cognitive Load," Proceedings of 7th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Bolton Landing, NY, http://drivingassessment.uiowa.edu/sites/default/files/DA2013/Papers/012_Young_0.pdf, June 17-20 2013.

[58.] Dingus, T. A. "Attentional Demand Evaluation for an Automobile Moving-Map Navigation System." Ph.D., Virginia Polytechnic Institute and State University, http://trid.trb.org/view.aspx?id=261899, 1987.

[59.] Dingus, T.A., Hulse, M.C., Antin, J.F. and Wierwille, W.W., "Attentional Demand Requirements of an Automobile Moving-Map Navigation System," Transportation Research Part A: General 23(4):301-315, https://www.researchgate.net/publication/223660210_Attentional_demand_requirements_of_an_automobile_moving-map_navigation_system, 1989, doi:10.1016/0191-2607(89)90013-7.

[60.] Wierwille, W.W., Antin, J.F., Dingus, T.A. and Hulse, M.C. "Visual Attentional Demand of an in-Car Navigation Display System." In Vision in Vehicles II. Proceedings of the Second International Conference on Vision in Vehicles, edited by Gale A G, Freeman M H, Haslegrave C M, Smith P and Taylor S P. 307-316. Nottingham, United Kingdom, 1988.

[61.] Dingus, T.A., Antin, J.A., Hulse, M.C. and Wierwille, W.W., "Human Factors Test and Evaluation of an Automobile Moving-Map Navigation System. Part I: Attentional Demand Requirements," General Motors Research Laboratories, Warren, MI. 1986.

[62.] Dingus, T.A., Antin, J.A., Hulse, M.C. and Wierwille, W.W., "Human Factors Test and Evaluation of an Automobile Moving-Map Navigation System. Part I: Attentional Demand Requirements," General Motors Research Laboratories, Warren, MI. 1986.

[63.] Dingus, T. and Klauer, S., "The Relative Risks of Secondary Task Induced Driver Distraction," SAE Technical Paper 2008-21-0001, 2008.

[64.] Klauer, S.G., Dingus, T.A., Neale, V.L., Sudweeks, J.D. et al., "The Impact of Driver Inattention on near-Crash/Crash Risk: An Analysis Using the 100-Car Naturalistic Driving Study Data (Report No. DOT HS 810 594)," National Highway Traffic Safety Administration, Washington, DC, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/810594.pdf, 2006.

[65.] Ranney, T.A., Baldwin, G.H.S., Parmer, E., Martin, J. et al., "Distraction Effects of Manual Number and Text Entry While Driving," DOT HS 811 510 National Highway Traffic Safety Administration, Washington, DC, http://www.nhtsa.gov/DOT/NHTSA/NVS/Crash%20Avoidance/Technical%20Publications/2011/811510.pdf, 2011.

[66.] Zwahlen, H., Adams, C. and De Bald, D., "Safety Aspects of CRT Touch Panel Controls in Automobiles," Vision in Vehicles II. Proceedings of the Second International Conference on Vision in Vehicles, Nottingham, UK, http://trid.trb.org/view.aspx?id=927092, September 14-17 1987.

[67.] Young, R., "Self-Regulation Minimizes Crash Risk from Attentional Effects of Cognitive Load during Auditory-Vocal Tasks," SAE Int. J. Trans. Safety 2(1):67-85, 2014, doi:10.4271/2014-01-0448.

[68.] Posner, M.I., "Timing the Brain: Mental Chronometry as a Tool in Neuroscience," PLoS Biol 3(2):e51, http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0030051, 2005.

[69.] Posner, M.I. Chronometric Explorations of Mind. Lawrence Erlbaum, 1978.

[70.] Meyer, D.E., Osman, A.M., Irwin, D.E. and Yantis, S., "Modern Mental Chronometry," Biological Psychology 26(1):3-67, http://www.sciencedirect.com/science/article/pii/0301051188900130, 1988, doi:10.1016/0301-0511(88)90013-0.

[71.] Green, P., "Visual and Task Demands of Driver Information Systems," University of Michigan Transportation Research Institute (UMTRI), http://umich.edu/~driving/publications/UMTRI-98-16.pdf. 1998.

[72.] Green, P., "The 15-Second Rule for Driver Information Systems," Proceedings of the ITS America Ninth Annual Meeting, http://umich.edu/~driving/publications/ITSA-Green1999.pdf, 1999.

[73.] Green, P., "Estimating Compliance with the 15-Second Rule for Driver-Interface Usability and Safety," Proceedings of the Human Factors and Ergonomics Society Annual Meeting, http://umich.edu/~driving/publications/HFES-Green1999.pdf, 1999.

[74.] Green, P. "Why Safety and Human Factors/Ergonomics Standards Are So Difficult to Establish." In Human Factors in Transportation, Communication, Health and the Workplace, edited by de Waard D., Brookhuis K.A., Moraal J. and Toffetti A.. Maastricht, the Netherlands: Shaker Publishing, http://www.umich.edu/~driving/publications/PGreen-Turin.pdf, 2002.

[75.] Green, P. and Shah, R., "Safety Vehicles Using Adaptive Interface Technology (Task 6): Task Time and Glance Measures of the Use of Telematics: A Tabular Summary of the Literature," University of Michigan, Transportation Research Institute, Ann Arbor, MI, http://deepblue.lib.umich.edu/bitstream/handle/2027.42/92350/102882.pdf?sequence=1. 2004.

[76.] SAE International Surface Vehicle Recommended Practice, "Calculation of the Time to Complete In- Vehicle Navigation and Route Guidance Tasks," SAE Standard J2365, Iss. May 2002.

[77.] Foley, J., "Lessons Learned from the Development of J2364," SAE Technical Paper 2005-01-1847, 2005, doi:10.4271/2005-01-1847.

[78.] Burns, P., Harbluk, J., Foley, J.P. and Angell, L., "The Importance of Task Duration and Related Measures in Assessing the Distraction Potential of in-Vehicle Tasks," Proceedings of the Second International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI 2010), Pittsburgh, Pennsylvania, USA, http://www.auto-ui.org/10/proceedings/p12.pdf, November 11-12, 2010.

[79.] SAE International Surface Vehicle Recommended Practice, "Navigation and Route Guidance Function Accessibility While Driving," SAE Standard J2364, Stab. Jun. 2015.

[80.] Green, P., "Estimating Compliance with the 15-Second Rule for Driver-Interface Usability and Safety," Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, 1999.

[81.] Green, P., "The 15-second Rule for Driver Information Systems," ITS America Ninth Annual Meeting Conference Proceedings, Washington, DC, 1999.

[82.] Tijerina, L., Parmer, E. and Goodman, M.J., "Preliminary Evaluation of the Proposed SAE J2364 15-Second Rule for Accessibility of Route Navigation System Functions While Driving," Proceedings of the Human Factors and Ergonomics Society Annual Meeting, http://trid.trb.org/view.aspx?id=696051, 2000.

[83.] Angell, L., Young, R., Hankey, J., and Dingus, T., "An Evaluation of Alternative Methods for Assessing Driver Workload in the Early Development of In-Vehicle Information Systems," SAE Technical Paper 2002-01-1981, 2002, doi:10.4271/2002-01-1981.

[84.] Young, R., Aryal, B., Muresan, M., Ding, X. et al., "Road-to-Lab: Validation of the Static Load Test for Predicting On-Road Driving Performance while Using Advanced in-Vehicle Information and Communication Devices," Proceedings of the Third International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, http://drivingassessment.uiowa.edu/DA2005/PDF/35_DickYoungformat.pdf, 2005.

[85.] Young, R.A., Angell, L.S., Sullivan, J.M., Seaman, S. et al., "Validation of the Static Load Test for Event Detection During Hands-Free Conversation," Proceedings of the Fifth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Big Sky, Montana, http://drivingassessment.uiowa.edu/DA2009/037_YoungAngell.pdf, June 22-25 2009.

[86.] Iwasa, T. and Hashimoto, T., "Study of Reproducibility of Pedal Tracking and Detection Response Task to Assess Driver Distraction," SAE Int. J. Trans. Safety 3(2):110-117, 2015, doi:10.4271/2015-01-1388.

[87.] Iwasa, T., Hashimoto, T. and Nagashima, H., "A New Bench Test Method for Driver Distraction of Visual-Manual and Auditory-Vocal Tasks," Proceedings of FISITA 2014 World Automotive Congress (F2014-AHF-019), Maastricht, the Netherlands, http://www.fisita2014.com/, June 2-6. 2014.

[88.] Uno, H. and Nakamura, Y., "Study on Assessment Method Using Tracking Task as Subsidiary Probe Task to Estimate the Demand of Operating an in-Vehicle Information Device," poster at Human Factors and Ergonomics Society Europe Chapter 2014, Lisbon, Portugal, http://www.hfes-europe.org/wp-content/uploads/2014/11/UnoPoster2014.pdf, October 8-10 2014.

[89.] Tsimhoni, O. and Green, P., "Visual Demand of Driving and the Execution of Display-Intensive in-Vehicle Tasks," Proceedings of the Human Factors and Ergonomics Society Annual Meeting 45(23):1586-1590, http://www.umich.edu/~driving/publications/HFESTsimhoni2001.pdf, 2001.

[90.] Wikman, A.-S., Nieminen, T. and Summala, H., "Driving Experience and Time-Sharing During in-Car Tasks on Roads of Different Width," Ergonomics 41(3):358-372, http://www.tandfonline.com/doi/abs/10.1080/00140139818708, 1998, doi:10.1080/001401398187080.

[91.] https://en.wikipedia.org/wiki/Canonical_correlation.

[92.] Young, R.A., "Evaluation of the Total Eyes-Off-Road Time Glance Criterion in the NHTSA Visual-Manual Guidelines," Transportation Research Board, submitted.

[93.] Jeong, D.H., Ziemkiewicz, C., Fisher, B., Ribarsky, W. et al., "iPCA: An Interactive System for PCA-Based Visual Analytics," Computer Graphics Forum 28(3):767-774, http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/iPCA.pdf, June 2009.

[94.] http://ipca-interactive-principal-component-ana.software.informer.com/.

[95.] Wierwille, W.W. and Tijerina, L. "Modelling the Relationship between Driver in-Vehicle Visual Demands and Accident Occurrence." In Vision in Vehicles VI, edited by Gale A., Brown I.D., Taylor S.P. and Haslegrave C.M.. 233-243. Amsterdam: Elsevier, http://www.safetylit.org/citations/index.php?fuseaction=citations.viewdetails&citationIds%5B%5D=citjournalarticle_242937_38, 1998.

[96.] Tijerina, L., Kiger, S., Rockwell, T., and Wierwille, W., "NHTSA Heavy Vehicle Driver Workload Assessment Final Report Supplement--Task 5: Heavy Vehicle Driver Workload Assessment Protocol (DOT HS 808 467)," Washington, DC: National Highway Traffic Safety Administration, US Department of Transportation, 1996. http://ntl.bts.gov/lib/jpodocs/repts_te/2214.pdf.

[97.] Young, R., "Revised Odds Ratio Estimates of Secondary Tasks: A Re-Analysis of the 100-Car Naturalistic Driving Study Data," SAE Technical Paper 2015-01-1387, 2015, doi:10.4271/2015-01-1387.

[98.] Liang, Y. "Detecting Driver Distraction." (Ph.D. Thesis), The University of Iowa, 2009, http://ir.uiowa.edu/etd/248.

[99.] Liang, Y., Lee, J.D. and Yekhshatyan, L., "How Dangerous Is Looking Away from the Road? Algorithms Predict Crash Risk from Glance Patterns in Naturalistic Driving," Human Factors: The Journal of the Human Factors and Ergonomics Society 54(6):1104-1116, http://hfs.sagepub.com/content/54/6/1104.short, 2012, doi:10.1177/0018720812446965.

[100.] Victor, T., Bargman, J., Boda, C.-N., Dozza, M. et al., "Analysis of Naturalistic Driving Study Data: Safer Glances, Driver Inattention, and Crash Risk," Transportation Research Board, http://onlinepubs.trb.org/onlinepubs/shrp2/SHRP2prepubS08AReport.pdf, 2014.

[101.] Angell, L., "Effects of Secondary Task Demands on Drivers' Responses to Events During Driving: Surrogate Methods and Issues (Abstract)," Proceedings of the Fourth International Driving Symposium on Human Factors in Driving Assessment, Training, and Vehicle Design, Stevenson, Washington, http://drivingassessment.uiowa.edu/DA2007/PDF/005_Angell.pdf, 2007.

[102.] Rupp, G. and Angell, L., "Chapter 3: Conceptualizing Effects of Secondary Tasks on Event Detection," (Warrendale, SAE International, 2010), 49-72, doi:10.4271/R-402.

[103.] Angell, L.S., "The 'Looking but Not Seeing' Phenomenon: Could It Be Happening to Us? Are We Looking and Not Seeing?" Proceedings of International Symposium on Naturalistic Driving, https://secure.hosting.vt.edu/www.apps.vtti.vt.edu/PDFs/NDRS-presentations/Angell.pdf, August 31 2010.

[104.] http://www.toyota.com/csrc/driver-distraction-cognitivemodelvalidation.html.

CONTACT INFORMATION

richardyoung9@gmail.com

ACKNOWLEDGMENTS

We appreciate the helpful comments of Paul Green of the University of Michigan on portions of the earlier drafts. We also thank Michael Posner at the University of Oregon for comments on the attention networks in the brain. We particularly thank Huizhong Guo, Graduate Research Assistant in the Human Factors & Statistical Modeling Lab, Department of Civil and Environmental Engineering, University of Washington, for independently verifying the principal component analyses in this paper for Examples 2-4. The analyses reported in Examples 2 and 3 were supported by a research contract from the Toyota Collaborative Safety Research Center to our research team at Wayne State University from July 1, 2011 to December 31, 2014 [104].

DEFINITIONS/ABBREVIATIONS

ANT - Attention Network Test

CAMP - Crash Avoidance Metrics Partnership

CF_delay - Car following delay

CHMSL - Center High Mount Stop Lamp

Component - A set of values of linearly uncorrelated metrics

D1 - Dimension 1 (synonym for Component 1)

D2 - Dimension 2 (synonym for Component 2)

DestDirect+ - Point in direction of destination [59, Table 1]

DestDist+ - Determine distance to destination [59, Table 1]

DestEntry - Destination entry

df - degrees of freedom (for a statistical test)

DRT - Detection Response Task [9]

Dimension - As used here, the minimum number of coordinates needed to specify the location of a task within the mathematical space spanned by its driver demand metrics. This definition is a more mathematical one than the dictionary definition of dimension as simply an attribute or a feature of a task (which was that used in the CAMP-DWM study [51, Section 8.4.1.1, p. 8-23]).

Driver Performance - A more general term than driving performance, as it also includes event detection and response metrics as well as driving performance metrics.

Driving Performance - Its narrow definition refers only to "ground truth" lane and speed maintenance metrics (e.g. Alliance Principle 2.1B [47, p. 40]). Its broader definition may include surrogate metrics that are highly correlated with lane and speed maintenance metrics.

DWM - Driver Workload Metrics

EORT - Eyes-Off-Road Time

Event Detection and Response - Detection of events in any sensory modality, as measured by hits and misses, and the response times to the hits [9]. Event Detection and Response metrics are "ground truth" driver performance metrics, along with lane and speed variability metrics.

GazeCon - Gaze concentration

"ground truth" metrics - The broader definition as used in [42] refers to any metric whose values are gathered on a closed test track or an open road with traffic. The narrow definition as used herein applies only to driver performance metrics.

Heading+ - Determine the general vehicle heading [59, Table 1]

HVAC - Heating-Ventilation-Air Conditioning system in a vehicle

LaneDevTime - Time out of lane

LaneX - Number of times tire touched or crossed lane marker during task

LGF - Long Glance Frequency

LGP - Long Glance Proportion

LVD - Lead-Vehicle Deceleration

Metric - A synonym for variable or measure. Used here to mean a variable used to measure the driver demand of a secondary task.

Miss% - Percent of total visual events missed

Miss%_CHMSL - Percent of CHMSL events missed

Miss%_front - Percent of front visual events missed

Miss%_LVD - Percent of LVD events missed

Miss%_side - Percent of side visual events missed

MSGD - Mean Single Glance Duration

Mixed-Mode - Tasks with visual-manual and auditory-vocal modalities in the same task. So-called "voice" tasks that include visual images on a screen are mixed-mode tasks, not true "voice" tasks (which have only auditory-vocal content).

NHTSA - National Highway Traffic Safety Administration

n-Back - An auditory-vocal task requiring the participant to remember the previous digit that was n digits prior to a currently spoken digit.

OED - Object and Event Detection

PC1 - Principal Component 1 (the single component that explains the highest possible amount of variance in the dataset by itself)

PC2 - Principal Component 2 (the single component that explains the remaining highest possible amount of variance in the dataset by itself, after removal of PC1)

PCA - Principal Components Analysis

RDRT - Remote Detection Response Task

RT - Response time

RT_CHMSL - Response time to CHMSL events

RT_front - Response time to front visual events

RT_side - Response time to side visual events

RT_LVD - Response time to LVD events

S1 - Score on first dimension

S2 - Score on second dimension

SD headway - Standard deviation of headway

SDLP - Standard deviation of lane position

Sit_UnAw - situation unawareness (subjective)

SpeedX - Number of times more than 5 mph above or below instructed speed of 40 mph

Static - Doing a task by itself with no simulated or actual driving

Surrogate Metrics - Metrics such as glance metrics, TaskTime, driver secondary task errors, etc. which are not "ground truth" driver performance metrics, but which may correlate highly with "ground truth" driver performance metrics, and can therefore successfully predict them.

TaskTime - The duration of a task, whether measured statically by itself, or during simulated driving or actual driving on a closed or open road, or in naturalistic driving.

TDRT - Tactile Detection Response Task

TEORT - Total Eyes-Off-Road Time (includes TGT as a subset)

TGT - Total Glance Time (to device under test)

TSOT - Total Shutter Open Time

Unsu% - Percent of unsuccessful task completions

APPENDIX

APPENDIX A. CORRELATION RESULTS FOR EXAMPLE 1

Table A1 summarizes some correlation results for Example 1.

Table A1. Summary of correlation results for Validation Example 1.

Metric          Statistic   S1        S2        S1_rot     S2_rot

TaskTime        r           0.955    -0.260     0.953      0.266
                r²         91.3%      6.7%     90.9%       7.1%
                p           5.5E-42   0.022     2.54E-41   0.018
RT              r           0.578     0.706     0.135      0.903
                r²         33.4%     49.9%      1.8%      81.5%
                p           3.0E-08   5.1E-13   0.240      1.4E-29
TaskTime & RT   r           0.981     0.914     0.966      0.907
                r²         96.2%     83.5%     93.3%      82.3%
                p           7.0E-56   1.7E-31   2.1E-46    2.9E-30

Metric          Statistic   S1_reduced  S2_reduced  S1_reduced_rot  S2_reduced_rot

TaskTime        r            0.970      -0.195       0.925           0.343
                r²          94.0%        3.8%       85.6%           11.8%
                p            3.5E-48     0.087       1.2E-33         0.002
RT              r            0.475       0.469       0.161           0.648
                r²          22.6%       22.0%        2.6%           42.0%
                p            1.1E-05     1.5E-05     0.159           1.4E-10
TaskTime & RT   r            0.974       0.623       0.987           0.879
                r²          94.9%       83.5%       97.4%           77.3%
                p            9.2E-51     1.7E-31     4.2E-62         3.7E-26


Row 1 in Table A1, labelled "TaskTime," shows the correlation of TaskTime with each of the composite metrics associated with the Task Scores S1 (physical demand) and S2 (cognitive demand). TaskTime predicts all factors connected with S1 better than RT does (green cells), regardless of whether S1 is rotated, reduced (i.e., calculated without TaskTime), or both rotated and reduced. All these correlations are about equally strong.

The center row in Table A1, labelled "RT," shows the correlation of RT with each of the composite metrics associated with the Task Scores S1 (physical demand) and S2 (cognitive demand). RT predicts all factors connected with S2 better than TaskTime does (green cells), regardless of whether S2 is rotated, reduced (i.e., calculated without RT), or both rotated and reduced. RT is best correlated with the rotated S2 (dark green cell).

The bottom row in Table A1, labelled "TaskTime & RT," shows the correlation of TaskTime and RT conjointly with each of the composite metrics associated with the Task Scores S1 (physical demand) or S2 (cognitive demand) indicated in the top row. The conjoint estimate is formed by a multiple regression of the dependent metric indicated in the top row on the TaskTime and RT metrics; the predicted dependent metric is then correlated with the observed dependent metric to estimate the correlation. The results show that the conjoint effect of TaskTime and RT predicts slightly better than the higher r value of TaskTime or RT alone for all factors, which is why the row labelled "TaskTime & RT" is colored green.
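
A minimal numpy sketch of this conjoint computation follows (the variable names are placeholders; "dependent" stands for the composite score column being predicted):

    import numpy as np

    def conjoint_r(task_time, rt, dependent):
        # Regress the composite score on TaskTime and RT jointly, then
        # correlate the fitted values with the observed scores (this is
        # the multiple correlation coefficient).
        X = np.column_stack([np.ones_like(task_time), task_time, rt])
        beta, *_ = np.linalg.lstsq(X, dependent, rcond=None)
        return np.corrcoef(X @ beta, dependent)[0, 1]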

A visual comparison of the reduced PCA scores in Figure 16 (without TaskTime and RT) with the full PCA scores in Figure 2 (with TaskTime and RT) shows that the pattern of scores is nearly the same, suggesting that TaskTime and RT are good predictors of S1 and S2 measured with other metrics, even without TaskTime and RT in the metric set.

To support this visual comparison statistically, the S1 and S2 scores from the reduced metric set without TaskTime and RT were well predicted by TaskTime and RT, even though they were not included in the reduced PCA. TaskTime is still tightly correlated with the reduced S1 task scores (r = 0.970, df = 76, p = 3.5E-48, Figure not shown). TaskTime and RT conjointly predict the reduced S1 scores (formed without TaskTime or any RT metric) (r = 0.974, df = 76, p = 9.2E-51, Figure not shown). The r value for the conjoint TaskTime and RT solution for the reduced S1 (0.974) is only trivially higher than the r value for TaskTime alone (0.970), indicating that the variance in the reduced S1 factor is explained almost entirely by TaskTime, with no contribution from RT. This result is plausible, given that TaskTime and RT have little correlation with each other, and thus make independent, non-overlapping contributions to Dimensions 1 and 2.

Likewise, RT remains correlated with the reduced S2 task scores (r = 0.469, df = 76, p = 1.5E-05, Figure not shown), even though none of the three RT metrics was included in the reduced PCA that formed those scores. The conjoint TaskTime-and-RT r for the reduced S2 component (0.623) is higher than the r for RT alone (0.469), but most of the explained variance in the reduced S2 component comes from RT, with a smaller contribution from TaskTime. This result is again plausible, given that TaskTime and RT have little correlation with each other, and thus make independent, non-overlapping contributions to Dimensions 1 and 2.

If the reduced loading axes in Figure G1 are given a Varimax rotation, TaskTime is still highly correlated with the reduced and rotated S1 task scores (r = 0.925, df = 76, p = 1.2E-33, Figure not shown). RT, however, is better correlated with the reduced S2 task scores with rotation (r = 0.648, df = 76, p = 1.4E-10, Figure not shown) than without (r = 0.469). The correlation of TaskTime and RT conjointly with the reduced rotated S1 scores (r = 0.987, df = 76, p = 4.2E-62, Figure not shown) is only slightly higher than that for TaskTime alone (r = 0.925), again showing the independence of TaskTime and RT. The correlation of TaskTime and RT conjointly with the reduced rotated S2 scores (r = 0.879, df = 76, p = 3.7E-26, Figure not shown) is higher than that for RT alone (r = 0.648), again reflecting the independent, non-overlapping contributions of the two metrics.

APPENDIX B. DATA AND SCORE RESULTS FOR EXAMPLE 2

Table B1 reprints Table 3 from Young (2012) [50], which presents the CAMP-DWM data [52, Appendix Q] analyzed in that paper. Detailed task descriptions are given in [52, Appendix B]. The scores S1 and S2 from the PCA are given in the two rightmost columns.

Table B1. Metrics used for Validation Example 2. See the main text for
Example 2 and the Definitions/Abbreviations section for metric
definitions.

ID   Task        Task Time   TGT   Glances  SDLP  Speed X  MSGD

 1.  Coins        17.6       6.7     7.3    0.65    4.52   0.90
 2.  Cassette     15.9       4.7     5.7    0.61    4.34   0.85
 3.  HVAC         11.4       6.3     5.6    0.55    3.84   1.15
 5.  ManualDial   24.4      11.0    11.3    0.68    5.67   0.97
 4.  RadioEasy    11.4       5.7     4.6    0.52    3.73   1.28
14.  RadioHard    16.7       9.0     7.7    0.59    4.34   1.16
16.  CDTrack7     18.4       9.8     9.1    0.63    4.79   1.03
17.  RouteTrace   26.9      12.9    11.4    0.82    6.14   1.19
22.  DestEntry   104.5      57.6    50.3    0.84    8.47   1.15
24.  ReadEasy     21.3      10.7    10.1    0.62    5.13   1.03
25.  ReadHard     30.9      15.3    13.7    0.64    5.73   1.04
28.  MapEasy      17.4       6.6     6.7    0.65    4.53   0.96
29.  MapHard      25.2      12.7    10.7    0.74    5.63   1.10

ID   Task        Miss%_CHMSL  Miss%_LVD  RT_CHMSL  RT_LVD

 1.  Coins          26.4        51.2       2.39     5.94
 2.  Cassette       27.2        44.5       2.30     4.63
 3.  HVAC           34.8        72.3       2.69     6.24
 5.  ManualDial     30.2        24.5       2.30     5.94
 4.  RadioEasy      33.3        72.7       2.38     5.93
14.  RadioHard      34.2        38.5       2.77     5.42
16.  CDTrack7       15.8        46.3       2.19     5.49
17.  RouteTrace     33.1        41.8       2.96     6.67
22.  DestEntry      27.5        31.4       2.72     6.62
24.  ReadEasy       30.2        48.1       2.19     5.78
25.  ReadHard       18.8        26.9       2.26     5.76
28.  MapEasy        33.3        51.8       2.24     5.46
29.  MapHard        32.2        46.5       2.61     6.83

ID   Task           S1       S2

 1.  Coins       -0.9908  -0.7002
 2.  Cassette    -1.9540  -1.9013
 3.  HVAC        -1.5675   2.3859
 5.  ManualDial   0.2929  -1.1171
 4.  RadioEasy   -2.0342   2.0826
14.  RadioHard   -0.7468   1.0657
16.  CDTrack7    -0.9041  -1.8630
17.  RouteTrace   1.8745   1.9732
22.  DestEntry    6.6564  -0.2774
24.  ReadEasy    -0.7083  -0.4805
25.  ReadHard     0.4616  -1.9594
28.  MapEasy     -1.3437  -0.4140
29.  MapHard      0.9641   1.2055


APPENDIX C. DATA AND SCORE RESULTS FOR EXAMPLE 3

Table C1. Metrics examined for Validation Example 3, with their sources.
(Note: duplicated data across the tables of the five sources are
indicated.)


                                                             Table Reference
    Group    Metric                            Abbreviation  a.   b.   c.   d.   e.

1.  Task     task completion time              TaskTime      T13  T13             T12
2.  Vehicle  lane exceedances                  LaneX         T15  T15  T4   T3    T14
3.  Glance   total glance time (*)             TGT           T8   T1   T2         T7
4.           number of glances (*)             Glances       T10  T2   T3   T3    T9
5.           mean single glance duration (*)   MSGD          T10  T2   T3   T3    T9
6.           mean single glance duration (**)  MSGDroad      T12                  T11
7.  Driver   Task errors                       Errors        T17                  T16

Notes:
(*) glances to displays or controls of device under test
(**) glances to road (not used)
a. Dingus (1987) [58]
b. Wierwille et al. (1988) [60]
c. Dingus et al. (1989) [59]
d. Dingus and Klauer (2008) [63]
e. Dingus et al. (1986) [63]

Table C2. Data and Task Score Results S1 and S2 for Validation Example
3. A. The original data (from sources shown in Table C1). B. Task
Scores from PCA. C. The level of "complexity" assigned in [63]. The
definitions of the tasks are given in [59, Table 1].

A.
Task Type     Count  Task           TaskTime  LaneX    TGT  Glances

Conventional   1.    PowerMirror      10.65     21    5.71   6.64
               2.    TuneRadio        13.18     10    7.60   6.91
               3.    Temperature       4.89      8    3.50   3.18
               4.    Vent              2.13      0    1.13   1.83
               5.    InfoLights        2.24      3    1.75   2.12
               6.    Balance           3.73      2    2.23   2.59
               7.    Speed             1.07      0    0.78   1.26
               8.    FollowTraffic     1.30      0    0.98   1.31
               9.    FuelRange         3.95      5    3.00   2.54
              10.    FuelEconomy       3.82      3    2.87   2.48
              11.    Defrost           3.56      3    2.86   2.51
              12.    Time              1.36      0    1.04   1.26
              13.    Tone              2.57      1    1.59   1.73
              14.    RemainingFuel     1.99      1    1.58   1.52
              15.    Fan               2.54      1    1.95   1.78
Navigation    16.    RoadName+        15.87      8   10.63   6.52
              17.    DestDist+         1.99      0    1.83   1.73
              18.    DestDirect+       2.34      0    1.57   1.31
              19.    CrossStreet+     11.08      8    8.63   5.21
              20.    RoadDist+        13.04      9    8.84   5.78
              21.    CorrectDirec+     3.72      1    2.96   2.04
              22.    ZoomLevel+        5.27      4    4.00   2.91
              23.    Heading+          5.72      3    3.58   2.76

A.                                       B.            C.
Task Type     Count  MSGD  TaskErrors    S1      S2    Level [63]

Conventional   1.    0.86       0       2.660  -3.251  Complex
               2.    1.10       2       2.862  -1.628  Complex
               3.    1.10       0       0.149  -0.7%   Complex
               4.    0.62       0      -2.033  -0.549  Simple
               5.    0.83       0      -1.375  -0.530  Complex
               6.    0.86       0      -1.062  -0.487  Complex
               7.    0.62       0      -2.343  -0.431  Simple
               8.    0.75       0      -2.119  -0.213  Simple
               9.    1.19       0      -0.308  -0.203  Complex
              10.    1.14       0      -0.566  -0.075  Complex
              11.    1.14       0      -0.589  -0.073  Complex
              12.    0.83       0      -2.021  -0.064  Simple
              13.    0.92       1      -1.460  -0.024  Moderate
              14.    1.04       0      -1.477   0.150  Moderate
              15.    1.10       0      -1.223   0.202  Simple
Navigation    16.    1.63      10       4.397   0.217  Complex
              17.    1.06       2      -1.346   0.430  Simple
              18.    1.20       2      -1.288   0.737  Simple
              19.    1.66      14       3.450   0.950  Complex
              20.    1.53      20       4.023   1.000  Complex
              21.    1.45       5      -0.227   1.192  Complex
              22.    1.40      14       0.893   1.406  Complex
              23.    1.30      22       1.003   2.042  Complex (*)

Note: (*) Not classified in [63], but assumed here to be "Complex" based
on its rating in [58] as "High Attentional Demand."


APPENDIX D. DATA FOR EXAMPLE 4

Ranney et al. (2011) [65] did not always provide mean data for the tasks and metrics. In such cases, the values were digitized from the indicated figures.

Table D1. Data for the four metrics analyzed in Example 4 (digitized
from Figures in [65]).

Count  Task         TaskTime (a)  Glances (b)  MSGD (c)  RT (d)

1.     DialContact     34.05         8.57       1.12     0.995
2.     DestEntry       72.65        25.71       0.76     0.980
3.     Dial10Digit     41.49        11.43       1.10     1.010
4.     RadioTune       24.28         8.1        1.04     0.960
5.     TextMessage     77.21        21.43       1.24     1.030

Count   S1 (e)   S2 (f)

1.     -0.9956   0.4911
2.      1.4293  -1.8982
3.     -0.3335   0.6438
4.     -1.7608  -0.6282
5.      1.6606   1.3915

Notes:
a. TaskTime for first trial [65, Figure 21].
b. Glances to device for first trial [65, Figure 28].
c. MSGD to device for first trial [65, Figure 27].
d. RT to one of 6 lights in forward view [65, Figure 20].
e. Score on first dimension.
f. Score on second dimension.


APPENDIX E. TASKTIME CORRELATIONS ACROSS VENUES IN CAMP-DWM STUDY EXAMPLE 2

Table E1. TaskTime and TSOT mean data from the CAMP-DWM study appendices
[52].

                       Appendix:  Q5      Q6     Q8         Q23      Q20
Count  TaskNum  TaskName          static  TSOT   simulator  track    road

 1.     1.      Coins              9.68    6.98   17.57      17.57   17.68
 2.     2.      Cassette          11.15    6.37   17.42      15.92   14.71
 3.     3.      HVAC               8.00    5.55   11.55      11.44   10.94
 4.     4.      RadioEasy          5.97    4.66   11.09      11.43   10.45
 5.     5.      ManualDial        16.65   11.66   28.60      24.43   26.78
 6.    14.      RadioHard          9.70    7.3    19.04      16.70   15.39
 7.    16.      CDTrack7          14.67    8.34   18.63      18.37   16.76
 8.    17.      RouteTrace        14.14   11.55   42.66      26.93    (*)
 9.    22.      DestEntry         64.14   48.53  135.30     104.49    (*)
10.    24.      ReadEasy          14.23    9.59   23.42      21.27    (*)
11.    25.      ReadHard          19.26   13.22   35.43      30.85    (*)
12.    28.      MapEasy           10.43    8.06   19.22      17.36    (*)
13.    29.      MapHard           14.62   13.41   32.29      25.2     (*)

(*) These tasks were not tested on the road.

Table E2. Correlation matrix for Table E1, with p values and numbers of
tasks. Note in particular the high correlation between static TaskTime
(task performed with no driving) and track TaskTime, and between static
TaskTime and TSOT.

              static    TSOT      simulator  track     road

static     r   1         0.9935    0.9837     0.9949   0.8666
           p             1.0E-11   1.5E-09    2.7E-12  1.2E-02
           n  13        13        13         13        7
TSOT       r   0.9935    1         0.9927     0.9976   0.9678
           p   1.0E-11             1.5E-09    3.8E-14  3.5E-04
           n  13        13        13         13        7
simulator  r   0.9837    0.9927    1          0.9949   0.9749
           p   1.5E-09   1.5E-09              2.6E-12  1.9E-04
           n  13        13        13         13        7
track      r   0.9949    0.9976    0.9949     1        0.9821
           p   2.7E-12   3.8E-14   2.6E-12             8.2E-05
           n  13        13        13         13        7
road       r   0.8666    0.9678    0.9749     0.9821   1
           p   1.2E-02   3.5E-04   1.9E-04    8.2E-05
           n   7         7         7          7        7
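
As a concrete check, the static-road cell of Table E2 can be reproduced from the Table E1 data with a pairwise-complete correlation. The minimal Python sketch below hard-codes only the two venue columns needed for that cell, with np.nan marking the tasks not tested on the road.

import numpy as np
from scipy import stats

# Static and road TaskTime means from Table E1; np.nan marks the six tasks
# not tested on the road.
static = np.array([9.68, 11.15, 8.00, 5.97, 16.65, 9.70, 14.67,
                   14.14, 64.14, 14.23, 19.26, 10.43, 14.62])
road = np.array([17.68, 14.71, 10.94, 10.45, 26.78, 15.39, 16.76,
                 np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])

# Pairwise-complete correlation, as used for the unequal-n cells of Table E2.
mask = ~np.isnan(static) & ~np.isnan(road)
r, p = stats.pearsonr(static[mask], road[mask])
print(f"static vs. road: r = {r:.4f}, p = {p:.1e}, n = {mask.sum()}")

Running this yields r = 0.8666, p = 1.2E-02, n = 7, matching the static-road entry of Table E2.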


APPENDIX F. ROTATION OF COMPONENTS IN EXAMPLE 1

To see whether the TaskTime and RT data points (Figure 6) could be aligned more exactly with the original Task Scores in Figure 3, the PCA components in Figure 1 were subjected to a standard Varimax rotation, an orthogonal rotation that balances the variance explained by the two components. The rotated factor loadings (no longer principal components after rotation) are shown in Figure F1.
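
For readers wishing to replicate the rotation, the sketch below is a generic textbook implementation of the Kaiser Varimax criterion in numpy; it is not the specific statistical routine used for the paper's analysis, and gamma = 1.0 corresponds to Varimax proper.

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate a (metrics x components) loading matrix by the Varimax
    criterion; returns the rotated loadings and the rotation matrix R,
    so that rotated = loadings @ R.  Standard SVD-based iteration."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the Varimax criterion with respect to the rotation.
        G = loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        if s.sum() - var < tol:
            break
        var = s.sum()
    return loadings @ R, R

Task scores are rotated with the same matrix R; because R is orthogonal, the axes stay orthogonal, although the rotated scores themselves need not remain uncorrelated, as discussed below.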

The rotated scores on these factors are shown in Figure F2. Visual inspection shows that the overall pattern is better aligned with the original TaskTime and RT metrics in Figure 6.

The correlation between TaskTime and the rotated S1 task scores remains high after rotation (r = 0.953, df = 76, p = 2.5E-41). Figure F3 shows that all the rotated S1 scores are within or near the prediction interval, as was the case in Figure 3 with the unrotated S1 scores.

The correlation between RT and the Factor 2 task scores improves after rotation (r = 0.903, df = 76, p = 1.4E-29). This result is illustrated in the regression plot of Figure F4, which shows a tighter grouping of task scores around the regression line than the unrotated scores for the original S2 in Figure 5 (r = 0.706).

Generally speaking, rotated scores are no longer orthogonal; that is, they become correlated. In this particular case, however, the rotation is small enough that the scores remain orthogonal (r = 2.7E-14, effectively 0). The original raw RT and TaskTime metrics, by contrast, are not strictly orthogonal, because they are weakly correlated (r = 0.365, df = 76, p = 0.001). Hence, error ellipses around individual tasks, if plotted in the two-dimensional space of Figure 6 formed by RT and TaskTime, would be tilted rather than vertical or horizontal, because of this correlation (see General Discussion). Because RT and TaskTime are not orthogonal, their conjoint error boundaries are more complicated to evaluate: the cross-correlation between them must be taken into account.

The original RT and TaskTime means can be transformed to the rotated orthogonal scores by Equations (F1) and (F2).

S1_rot = -0.2998 + 0.05342 TaskTime - 0.3002 RT (F1)

S2_rot = -2.578 - 0.00503 TaskTime + 1.0155 RT (F2)
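
Expressed as code, assuming TaskTime and RT are in seconds as elsewhere in the paper, Equations (F1) and (F2) become:

def rotated_scores(task_time, rt):
    """Rotated orthogonal task scores from raw TaskTime and RT means,
    per Equations (F1) and (F2); both inputs in seconds."""
    s1_rot = -0.2998 + 0.05342 * task_time - 0.3002 * rt
    s2_rot = -2.578 - 0.00503 * task_time + 1.0155 * rt
    return s1_rot, s2_rot

# Example: a 20 s task with a 2.5 s event response time.
print(rotated_scores(20.0, 2.5))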

The rotated S1 scores for all 15 metrics were predicted by TaskTime and RT conjointly (r = 0.966, df = 76, p = 2.1E-46, Figure not shown) about as well as the unrotated S1 scores in Figure 11 (r = 0.981, df = 76, p = 7.0E-56). The rotated S2 scores for all 15 metrics were well predicted by RT alone (r = 0.903, Figure F4), and slightly better by RT and TaskTime conjointly (r = 0.907, df = 76, p = 1.4E-29, Figure not shown). Figure F5 therefore plots the predicted rotated S2 scores (S2_rot) against the predicted rotated S1 scores (S1_rot). By visual inspection, Figure F5 is an even better match to the plot in Figure 6 formed by just the two original TaskTime and RT metrics.

APPENDIX G. CONTROL FOR CORRELATION BETWEEN TASKTIME AND S1, RT AND S2 IN EXAMPLE 1

To control for the correlation between TaskTime and S1, and between RT and S2, TaskTime and the three RT metrics were dropped from the original dataset analyzed in Example 1. PCA was then performed on the reduced set of metrics, and it was assessed whether TaskTime and RT could accurately predict the positions of the points in a score space solved without them.
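
A minimal sketch of this control procedure follows, under the assumption of a tasks-by-metrics matrix with TaskTime in column 0 and the three RT metrics in columns 1-3; random stand-ins are used here in place of the actual task-level data.

import numpy as np
from scipy import stats

def pca_scores(X, n_components=2):
    # PCA scores via SVD of the standardized metric columns; the sign of
    # each component is arbitrary, as in any PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

# Hypothetical layout: 78 tasks x 15 metrics, TaskTime in column 0 and the
# three RT metrics in columns 1-3 (random stand-ins for the real data).
rng = np.random.default_rng(1)
metrics = rng.normal(size=(78, 15))

reduced = np.delete(metrics, [0, 1, 2, 3], axis=1)  # drop TaskTime + RT metrics
reduced_scores = pca_scores(reduced)                # 11-metric solution

# Test whether TaskTime still predicts the reduced S1, though excluded from it.
r, p = stats.pearsonr(metrics[:, 0], reduced_scores[:, 0])
print(f"TaskTime vs. reduced S1: r = {r:.3f}, p = {p:.2e}")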

This control gave rise to the loading plot in Figure G1 (now based on only 11 metrics instead of 15).

Comparing the loadings for the reduced PCA in Figure G1 (without TaskTime and RT) with the loadings for the full PCA in Figure 1 (with TaskTime and RT) shows that the pattern of metric loadings in the 2-D space is nearly the same. This result suggests that the task scores of the full and reduced PCA should also show about the same pattern. Figure G2 is the scatterplot of the task scores in the reduced PCA score space of Figure G1.

A visual comparison of the reduced PCA scores in Figure G2 (without TaskTime and RT) with the full PCA scores in Figure 2 (with TaskTime and RT) shows that the pattern of scores is about the same, supporting the counter-hypothesis: TaskTime and RT are good predictors of S1 and S2 even when the scores are solved without including TaskTime and RT in the PCA. In short, the control shows that it makes little difference whether TaskTime and the three RT metrics are included in the PCA or not, confirming the redundancy of the metrics loading on Dimension 1 and of those loading on Dimension 2. A statistical validation of this visual comparison between Figure G2 and Figure 2 is given in Appendix A, which also tabulates all the correlation results discussed in the Results 2 section and Appendices F and G, with some additional observations supporting these main results.

APPENDIX H. CONTROL FOR CORRELATION BETWEEN TASKTIME AND S1, RT AND S2 IN EXAMPLE 2

For Validation Example 2, Figure H1 shows the prediction of S1 by TaskTime. There were no tasks outside the 95% prediction interval (dashed lines).

The regression test did not flag DestEntry as an outlier (all points were within the 95% prediction interval given by the dashed lines), but the visual appearance of Figure H1 nevertheless suggests that DestEntry might be one. To control for a possible outlier effect, DestEntry was dropped; the correlation remained statistically significant (r = 0.875, df = 10, p = 0.0002).

Likewise, RT_CHMSL had a statistically significant correlation with S2, the task scores on the second component (r = 0.716, df = 11, p = 0.005). Figure H2 shows the prediction of S2 by RT_CHMSL. No task S2 scores were outside the 95% prediction interval (dashed lines).

APPENDIX I. EXPLANATION OF INCONSISTENT RADIOTUNE TASK IN EXAMPLE 2

The Example 2 results showed a slight inconsistency in the locations of the RadioHard and RadioEasy tasks when the metric set was reduced from the 10 metrics in Figure 10 to only the RT_CHMSL and TaskTime metrics in Figure 12. One explanation for this slight inconsistency comes from examining the Miss%_LVD metric vs. the TaskTime metric (Figure I1), another pair of metrics with one from each dimension.

Figure I1 shows that the LVD miss rate during RadioEasy (72.7%) was almost double the LVD miss rate during RadioHard (38.5%) in this closed-road track study. (Indeed, the Miss%_LVD for RadioEasy was also the highest of any task in the open-road CAMP-DWM data [51, Figure 4-3].) The downward direction of the arrow connecting the Easy and Hard radio tuning tasks in Figure I1 agrees with the overall downward arrow for the S2 task scores in Figure 10, but is opposite to the upward RT_CHMSL arrow in Figure 12. The much higher LVD miss rate for RadioEasy resulted in a higher combined S2 score for RadioEasy than for RadioHard in Figure 10. In short, RadioEasy had almost double the lead-vehicle-deceleration misses of RadioHard (Figure I1), which overpowered, in the composite S2 score, the slight increase in RT_CHMSL for RadioHard vs. RadioEasy shown in Figure 12.

The CAMP-DWM study investigators also noted the high miss rate for detecting lead-vehicle deceleration during the shorter, "easier" member of each task pair (RadioEasy vs. RadioHard, MapEasy vs. MapHard, ReadEasy vs. ReadHard), attributing it to an artifact of the LVD detection task [51, p. 3-3]: "It should be noted that a portion of the LVD events were not detectable within the task length of short visual-manual tasks. One possible explanation for this may be that in short duration tasks the stimulus is below the optical looming threshold for detection." The CAMP-DWM investigators further noted [51, Section 8.2.2.2, p. 8-5]: "The Lead Vehicle Deceleration (LVD) event was an appealing stimulus for detection during car following...because it had apparent relevance to rear-end crash occurrence." However, they recommended against its use for short tasks, presenting data [51, Figure 8-1] suggesting that the lead-vehicle deceleration had not built up to above-threshold looming levels during the period of short-duration task performance. It follows from the CAMP-DWM hypothesis that a shorter task such as RadioEasy should also have a longer RT to optical looming than a longer task such as RadioHard. This deduction is not supported by the CAMP-DWM data, however, because there is no statistically significant correlation between RT_LVD and TaskTime (r = 0.428, df = 11, p = 0.144). The CAMP-DWM hypothesis is also not supported by the response times for the RadioHard and RadioEasy tasks, as can be seen in a plot of RT_LVD against TaskTime (Figure I2).

Figure I2 shows that the RadioHard task had shorter RTs to the lead-vehicle deceleration than RadioEasy. The longer RTs for the RadioEasy task cannot be explained, per the CAMP-DWM hypothesis, as an inability to detect the deceleration of the lead vehicle: if there is a response, the deceleration must by definition have been detected. In other words, this result is contrary to the CAMP-DWM "artifact" hypothesis. It implies that the reduced ability to detect lead-vehicle decelerations during some shorter tasks (such as easy vs. hard radio tuning) is a real phenomenon, not an experimental artifact of the CAMP-DWM methods. Note that short task times are not always associated with long response times to external events. The two dimensions are orthogonal, so shorter task times can be associated with longer event response times (as in Cluster 2) or with shorter event response times (as in Cluster 3). The attentional effects of secondary task performance on the orienting attention network (i.e., its cognitive demand) depend upon those aspects of the task that contribute to its cognitive demand, not its physical demand.

The result in Figure I2 also indicates that the sole use of RT_CHMSL (a rapid-onset event) in Figure 12 is not sufficient to fully represent the second driver-demand dimension, because it does not capture driver responses to gradual events, such as a slow lead-vehicle deceleration, during some shorter task types (such as RadioEasy). Possible reasons for this finding and its implications are given in Discussion 2.

The results in this Appendix I indicate that the LVD gradual deceleration event (without brake lights being activated) had a higher percentage of misses for a shorter task (RadioEasy) than for a longer task (RadioHard). A possible explanation is that the visual system in the brain is primarily responsive to derivatives in space-time [53,54]. Slow changes evoke little response from the visual system, and therefore little response from the orienting attention network [10], which underlies the RT to RDRT stimuli [24]. The right superior parietal lobe [24] is part of the orienting attention network [10], and activation levels in that location are known to predict the response time to rapid-onset visual events while driving. That is, from the standpoint of the human visual system, a slow deceleration of a lead vehicle is not really an "event" for driving, defined as something with a large change in either space or time (such as brake light activation on a forward vehicle, a vehicle cut-in, etc.). "Looming detectors" are hypothesized to exist in the human visual pathway [55] for detecting such slower changes; that is, neurons responsive to objects that increase in size, and others responsive to objects that decrease in size. During driving, all objects in the forward visual field are increasing in size in the optical flow field. Hence, to detect looming of a lead vehicle, the visual system must make a difficult discrimination: it must detect that the image of the lead vehicle is expanding at a faster rate than the rest of the visual field. The visual mechanism for detecting such slight changes in the rate of expansion of the optical flow field is entirely different from the visual mechanisms for responding to rapid spatio-temporal luminance changes (such as brake light activation on a forward vehicle, or a vehicle cut-in).
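
The looming geometry behind this argument can be made explicit with a small worked example. The sketch below uses the standard visual-angle formulation, theta = 2*arctan(w/2d); the vehicle width, headway, relative deceleration, and the detection threshold of roughly 0.003 rad/s are illustrative assumptions, not values from this paper or from CAMP-DWM.

import numpy as np

# Visual angle of a lead vehicle of width w at headway d:
#   theta = 2 * arctan(w / (2 * d)),  so  d(theta)/dt = w * v / (d^2 + w^2/4)
# where v is the closing speed.  All parameter values below are assumptions.
w = 1.8         # lead vehicle width, m
d0 = 40.0       # initial headway, m
a_rel = 0.5     # gentle relative deceleration of the lead vehicle, m/s^2
thresh = 0.003  # assumed looming-detection threshold, rad/s

for t in np.arange(0.0, 10.5, 2.5):
    d = d0 - 0.5 * a_rel * t**2                # headway, closing from rest
    v = a_rel * t                              # closing speed
    theta_dot = w * v / (d**2 + w**2 / 4.0)    # expansion rate, rad/s
    flag = "above" if theta_dot > thresh else "below"
    print(f"t = {t:4.1f} s  headway = {d:5.1f} m  "
          f"theta_dot = {theta_dot:.5f} rad/s  ({flag} threshold)")

With these assumed numbers, the expansion rate stays below threshold for the first several seconds, illustrating how a short task can end before the looming signal becomes detectable, while a longer task allows it to build above threshold.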

Because lead-vehicle deceleration without brake light activation has apparent relevance to rear-end crash occurrence [51, p. 8-5], events such as brake light activation, or DRT RT and miss-rate metrics, may be limited in their ability to capture looming effects while driving. A practical implication of this result is the conjecture that reducing rear-end collisions caused by a lead vehicle decelerating for an extended time on accelerator lift-off, without brake light activation, may require either an additional rear signal indicating that deceleration is occurring without the brake pedal having been pressed, or a collision avoidance system in the following vehicle sensitive enough to detect the gradual deceleration of a lead vehicle and warn its driver when the distance and time gap to the lead vehicle is shrinking rapidly (that is, when rapid looming is occurring).

Published 04/05/2016

Richard Young, Sean Seaman, and Li Hsieh

Wayne State University

[Figure: Scores on Second Dimension by Task (Example 3). Each task is annotated with its complexity level from [63]: 1 = Simple, 2 = Moderate, 3 = Complex.]

[Figure: S2 vs. Tasks Grouped by "Complexity" (Example 3), using the same 1 = Simple, 2 = Moderate, 3 = Complex coding.]

Table 1. Metrics investigated by Ranney et al. [65].

Metric
Group       Metric                            Abbrev.     Figure Reference (a)

Task        Time to complete task             TaskTime    21
Vehicle     Number of lane crossings          LaneX       22
            Standard deviation of headway     SDheadway   24
            Standard deviation lane position  SDLP        17
Glance      Number of glances                 Glances     28
            Mean single glance duration       MSGD        calc. (b)
            Gaze concentration                GazeCon     26
            Long glance frequency             LGF         32
            Long glance proportion            LGP         calc. (c)
Event       Car following delay               CF_delay    15, 18
Detection/  Percent misses                    Miss%       14, 19
Response    Response time                     RT          20

Notes:
(a) Figure references are to Ranney et al. [65].
(b) Calculated from data in [64, Figure 27].
(c) LGF divided by Glances.

Table 2. Metric conditions investigated by Ranney et al. [65].

                         Glances
                         Device   Not Road

TaskTime   1 Trial       I.       II.
           Drive (*)     III.     IV.

Note: (*) Task was repeatedly performed during a 150 s drive.

Table 3. Correlation between lateral variability metrics and MSGD for the
Examples in this paper.

Example                        Metric 1  Metric 2  r      df  p

1. Young & Angell (2003) [41]  LaneX     MSGD      0.500  76  3.1E-06
2. Young (2012) [50]           SDLP      MSGD      0.179  11  0.56
3. Dingus (1987) [58]          LaneX     MSGD      0.280  22  0.21
4. Ranney et al. (2011) [65]   LaneX     MSGD      0.293   3  0.63