
The Dimensional Model of Driver Demand: Extension to Auditory-Vocal and Mixed-Mode Tasks

ABSTRACT

The Dimensional Model of Driver Demand is extended to include Auditory-Vocal tasks (i.e., pure "voice" tasks) and Mixed-Mode tasks (i.e., an Auditory-Vocal mode combined with a visual-only or a Visual-Manual mode). The extended model was validated with data from 24 participants using the 2014 Toyota Corolla infotainment system in a video-based surrogate driving venue. Twenty-two driver performance metrics were collected, including total eyes-off-road time (TEORT), mean single glance duration (MSGD), and proportion of long single glances (LGP). Other key metrics included response time (RT) and miss rate for a Tactile Detection Response Task (TDRT). The 22 metrics were reduced to two dimensions using Principal Component Analysis. The major dimension, explaining 60% of total variance, we interpret as the attentional effects of cognitive demand. The minor dimension, explaining 20% of total variance, we interpret as physical demand. Cluster analysis segmented the 22 metrics into three groups, which independently agreed with the metric groups visually evident in the 2-D loadings plot. One cluster was associated with cognitive demand and another with physical demand. A third cluster loaded in opposition to cognitive demand along the cognitive demand dimension, and was termed cognitive facilitation. The Dimensional Model, by eliminating the correlation between the task scores on the dimensions, also minimizes the biasing effects arising from interactions between metrics. The extended Dimensional Model allows a simplified understanding of the effects of all modes of secondary tasks on driver performance.

CITATION: Young, R., Hsieh, L., and Seaman, S., "The Dimensional Model of Driver Demand: Extension to Auditory-Vocal and Mixed-Mode Tasks," SAE Int. J. Trans. Safety 4(1):2016, doi:10.4271/2016-01-1427.

INTRODUCTION

This study extends the scope of the Dimensional Model to secondary tasks commonly performed while driving, including Visual-Manual, Auditory-Vocal, and Mixed-Mode driver-vehicle interfaces. The current study thus extends the Dimensional Model presented in the companion paper [1], which was limited to Visual-Manual tasks in previously published datasets. The extended Dimensional Model, introduced for the first time in the current paper, describes the effect of all modes of secondary tasks on primary driving subtasks in a single Two-Dimensional Model. The dimensions were derived using Principal Component Analysis, which reduced 22 driver performance metrics to 2 in an analysis of a new dataset.

The driver metrics in the Model included event detection and response metrics and their surrogates. These loaded onto the major dimension, which we interpreted as representing the attentional effects of cognitive demand. We also included surrogate metrics for lateral and longitudinal vehicle variability, including glance metrics; these formed the minor dimension in this study, which we interpreted as physical demand. The goal of the study was to determine whether all of the metrics for every task in the study could be reduced to two scores per task, with little loss in explanatory power, as was possible for Visual-Manual tasks [1].

An independent cluster analysis of the same data confirmed the assignment of the 22 metrics to the 2 dimensions. The cluster analysis identified three clusters, of which two formed opposite poles on the cognitive demand dimension. Thus, although there were three clusters of metrics, there were only two underlying dimensions.

It is not straightforward to extend the Visual-Manual Dimensional Model in [1] to Auditory-Vocal and Mixed-Mode tasks, because the metrics for such tasks often behave in distinctly different ways than they do for Visual-Manual tasks. For example, the duration of a task (herein, TaskTime) is not as correlated with vehicle metrics such as lane and speed variability for Auditory-Vocal tasks as it is for Visual-Manual tasks [2]. There are also cross-modality effects that indicate there can be interference between the visual and auditory modes [3]. For example, many studies have shown that Auditory-Vocal tasks can produce slight delays in responses to a visual event; e.g., [4, Figure E.5]. Such cross-modality interference is assumed by some to support a 1-D central attentional resource "bottleneck" model [1]. Indeed, visual, auditory, and tactile stimuli produce a similar pattern of interference with response time in dual-task situations [5], suggesting a single central attentional resource that is common between modalities. However, only a few studies in the driving safety field explicitly recognize that there is more than one central attentional resource (e.g., [6]). Indeed, modern cognitive neuroscience has identified three central attentional brain networks (Orienting, Executive, and Alerting attention; see [1] for a summary), all of which must be considered for a complete understanding of the effects of secondary tasks on driver performance. An Auditory-Vocal task, such as cell phone conversation, causes a slight delay in the processing of visual information, due to its effects on the orienting attention system [7]. However, doing a "moderately" demanding task (which included cell phone conversation in the classifications used in [8]) reduced the relative crash/near-crash risk estimate for drowsiness in naturalistic driving (compared to a baseline matched in time of day, closeness to junction, and other variables) from 55 to 1 to 24 to 1 [9, Table 5]. Although the relative risk of drowsiness remains, in aggregate, quite high at 24 to 1, the halving of the relative risk by conversation is likely due to the activating effect of the conversation on alerting attention, which by definition countervails drowsiness.

In the current paper, the data from a recent study [10] are examined to determine to what degree the Dimensional Model of Driver Demand can be coherently extended from Visual-Manual tasks to pure Auditory-Vocal and Mixed-Mode tasks. The data are represented in a common framework with the same analysis method as used in the companion paper concerning Visual-Manual tasks [1], with similar metric names where they overlap between studies.

The off-road glance metrics in the current dataset are separately analyzed on a single-metric basis (that is, metric by metric) in a second companion paper [10]. The current study analyzed these off-road glance metrics in a multivariate analysis with many other driver behavior metrics to adjust for interaction effects. These interaction effects arise from strong correlations between a third metric (say, long glance proportion) and two other metrics of interest (say, mean single glance duration and total eyes-off-road time). These interaction effects can bias the results in subtle ways when metrics are analyzed singly rather than with a multivariate approach. One advantage of the Dimensional Model is that it is a multivariate method that scores the tasks on orthogonal axes, and therefore it adjusts for biasing effects that may arise from such interactions. The interactions between off-road glance metrics are analyzed in Appendix C, using multivariate and univariate methods to illustrate the differences.

METHOD

Dataset

The dataset analyzed for this study was derived from a video-based "surrogate driving" test using a 2014 Toyota Corolla vehicle in which 24 test participants performed 15 different secondary tasks (see [10] for details of the test protocols). Unlike the Analysis of Variance statistical method, the PCA statistical method requires only the mean across participants for each metric for each task, not the data at the individual participant level. The complete dataset analyzed in this study is given in Appendix Table A1.

Tasks

All 15 secondary tasks in the dataset in Appendix Table A1 were analyzed in the main analysis reported here. See [10] for further description of individual tasks and task steps. Two of the tasks (0-Back and 1-Back) were pure Auditory-Vocal tasks with no manual steps, so they met the SAE J2988 [11] definition of Hands-Free because they did not require more than 1 button press. Five of the tasks were Mixed-Mode tasks, which had an Auditory-Vocal interface accompanied by a touch-screen visual display that updated to prompt for further voice or manual inputs and/or to provide confirmation of voice inputs already provided to the voice recognition system by the driver. Four of those five Mixed-Mode tasks were Hands-Free, as they required only a single manual button press (of a voice button in this case), and therefore they met the Hands-Free definition in SAE J2988 [11]. The Mixed-Mode Navigation Destination Entry task (NavDest) required a button press to activate voice recognition, and then a touch screen press at the end, so it did not meet the SAE J2988 definition of Hands-Free because it had more than 1 button press. There were also 7 Visual-Manual tasks with no Auditory-Vocal interface modes. Five of the Visual-Manual tasks had a matching counterpart Mixed-Mode task, so it can be determined to what extent using a voice response, instead of a manual response, affected the driver performance metrics. Appendix C gives an analysis of just these paired tasks.

As mentioned above, five of the tasks in this study had an Auditory-Vocal interface accompanied by a visual display. We are careful not to refer to such tasks as "voice" tasks, as that would be misleading. The term "voice" task we reserve for a pure Auditory-Vocal task with no visual display, and at most one button press to activate the voice system, thereby meeting the Hands-Free definition in SAE J2988. The "n-Back" tasks in this study meet this definition, but the other tasks do not. The term "Mixed-Mode" (or its synonym "Multi-Modal") should be used to refer to tasks which have an Auditory-Vocal interface accompanied by a visual display, or accompanied by both a visual display and a manual interface such as button presses, knob turns, or presses on a touch screen. The term "Visual-Manual" should be reserved for tasks with a visual display and with manual input to the system, and no auditory or vocal interface mode.

Task Trials

All metrics were averaged over three task trials while driving, after training to criterion (see [10] for further description of training protocols). The exception to the three trials was for assessment of adherence to the NHTSA Guidelines [12] Visual-Manual glance criteria in a second companion paper [10], for which only the first trial was used, in accordance with the NHTSA Guidelines [12] protocol. In the current paper, only the averages across all 3 trials were used, in accordance with the ISO DRT protocol [4] and Alliance Guidelines protocol [13], to improve the accuracy of the estimates.

Metrics

A number of driver performance metrics or their surrogates were collected, with some key metrics of interest listed below:

* Task metrics: task completion time (TaskTime); number of errors made during task performance (Errors).

* Glance metrics away from the forward view: Total Eyes-off-Road Time (TEORT_NR), where NR is an abbreviation for "Not Road"; number of glances (Glances_NR); mean single glance duration (MSGD_NR).

* Glance metrics to displays or controls of device under test: total glance time (TGT_TR), where "TR" is an abbreviation for "Task-Related"; number of glances to the in-vehicle system (Glances_TR); mean single glance time (MSGD_TR).

* Event detection and response metrics: percent of tactile detection response task (TDRT [4]) events missed (Miss%_tdrt); response time to tactile events (RT_tdrt).

The abbreviations and definitions for the complete set of 22 metrics are given in Appendix A, Table A2. The main body of this paper analyzes the metrics for glances off the road as well as glances to the in-vehicle displays and controls of the device under assessment. Appendix C analyzes just the off-road glance metrics, to investigate the effects of interaction between these glance metrics.

Statistical Methods

Principal Component Analysis

Primary and secondary task performances were measured and the data were analyzed using Principal Component Analysis (PCA). Citations for the PCA method are given in the companion paper [1]. Simply put, PCA reduces the dimensionality of the data set from the original number of variables to a few major components; in the current study, as in the companion paper [1], all of the driver demand metrics for all tasks were reducible to only two component scores, with little loss in explanatory power for the entire dataset of all metrics.

The PCA in the statistical package in Stata 13.1 [14] was used for the analysis, with results checked and plotted in Minitab® 17.2.1 [15]. Minitab changed the sign of the y-axis, but otherwise the results were identical to the Stata results. The sign of an axis is purely arbitrary in PCA; sometimes an axis will point in a positive direction to indicate an increased effect, and sometimes in a negative direction. Regardless of the direction of an axis, all points in the 2-D space have the same positions with respect to one another. To make it easier for the reader, in the results given here and in the companion paper [1], all axes (whether physical or cognitive) are oriented so that the positive direction indicates increased demand relative to the average of the tasks in the dataset, and the negative direction indicates decreased demand relative to that average.

The PCA used in this study operated on the correlation matrix. Operating on the correlation matrix means that each metric is standardized to a Z-score: the mean is subtracted so the metric has a mean of zero, and the result is divided by the standard deviation so the metric has a scale (standard deviation) of one. The vector of metric loadings on a given component then has length one (that is, the square root of the sum of the squared loadings of the metrics on that component is one). The metric loadings, although of length one, are not themselves necessarily orthogonal (that is, the correlation between them is not necessarily zero). However, the scores of the secondary tasks on each component are guaranteed by the PCA method to be orthogonal (i.e., have correlation zero) to the scores of the secondary tasks on every other component. In the most commonly used PCA method, as employed here, the ratio of the variances of the sets of task scores on the components is equal to the ratio of the variances explained by the components. For example, assume 60% of the variance in the whole dataset is explained by the first principal component, and 20% by the second component, for a ratio of 3 to 1. Then the variance of the task scores on the first component will be in a 3 to 1 ratio to the variance of the task scores on the second component. In this way, the distances in the score plot in the x and y directions are equated to the relative variance explained by each score vector.
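As an illustration of these properties, here is a minimal sketch of a correlation-matrix PCA in Python with scikit-learn (not the Stata/Minitab tooling used in the study), run on random placeholder data standing in for the task-by-metric means of Table A1; the variable names are ours:

```python
# Sketch: correlation-matrix PCA on placeholder data (15 tasks x 22 metrics).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 22))                    # stand-in for the Table A1 means

# Standardizing each metric to a Z-score makes PCA on the result equivalent
# to PCA on the correlation matrix of the raw data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pca = PCA().fit(Z)
scores = pca.transform(Z)                        # task scores on each component

# Scores on different components are orthogonal (zero correlation)...
print(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])   # ~0, up to rounding

# ...and the variances of the score sets are in the same ratio as the
# variances explained by the components (e.g., 3 to 1 for a 60%/20% split).
print(np.var(scores[:, 0], ddof=1) / np.var(scores[:, 1], ddof=1))
print(pca.explained_variance_[0] / pca.explained_variance_[1])
```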

To say that the original dataset with 22 metrics can be reproduced to a high degree of accuracy with only the 2 scores is not a trivial statement. Literally, the 2 scores can be used to predict the original metrics by back-transforming the scores to the original metric scales if desired, with the removal of all the statistical "noise" represented by the higher-order components. The back-transformed metrics reproduce 80% of the variance in the original data set, and provide potentially improved estimates of the original task values (by removing statistical noise). The new estimates of the "true" values of the original metrics are based on the information contained in all 22 metrics. Therefore, the transformed scores should theoretically be closer to the "ground truth" for every original metric, because they effectively have the statistical noise removed, compared to the original metrics, which include statistical noise. Basically, the PCA method allows the extraction of the underlying structural components in the dataset. Transformation of the PCA scores back to the scales of the original dataset to predict the original 22 metrics was not done in this study, but could be done in future research by anyone, solely based on the data given in Appendix A, Table A1. In addition, new tasks that were not in the original dataset can be scored using the same set of principal components for validity checking purposes, if all 22 metrics are available. However, if the metric set could be reduced to only 2 or 3 original metrics, collecting data and scoring new tasks with regard to previous components would be easier. Again, these additional analyses are left for future research.
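The back-transformation described above is a standard PCA reconstruction. A minimal sketch under the same assumptions as before (random placeholder data; the 80% figure quoted in the text is the study's value, which random data will not reproduce):

```python
# Sketch: reconstruct the 22 original metrics from only the first 2 scores.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 22))                    # stand-in for the Table A1 means
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sd                                # standardized metrics

pca2 = PCA(n_components=2).fit(Z)
Z_hat = pca2.inverse_transform(pca2.transform(Z))   # components 3+ ("noise") removed
X_hat = Z_hat * sd + mu                          # predictions on the original scales

# Fraction of the total (standardized) variance reproduced by the 2 components.
frac = 1.0 - ((Z - Z_hat) ** 2).sum() / (Z ** 2).sum()
print(f"fraction of variance reproduced: {frac:.2f}")
```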

There are variations of the PCA method which analyze the covariance matrix rather than the correlation matrix, or rotate the scores after extraction to "simplify" the alignment of the scores with the axes, but such alternative analyses were not systematically explored in this study. Any rotation of the score plots destroys the orthogonality in the score sets, and there is no unique mathematical solution as there is when using the PCA scores themselves, as was done here. Transforming the length of the score axes to unity rather than scaling them according to the variance explained would create a uniform score space, such that the confidence interval around each score in the final 2-D plot would be a circle rather than an ellipse; however, the issue of confidence intervals around individual points in the score plot, and alternative scales for the final score axes, are left for future research.

The PCA method typically requires the number of tasks to exceed the number of metrics, or a "singular matrix" error often results in a PCA statistical program. In the current study, the number of metrics (22) did exceed the number of tasks (15), but the PCA was completed properly without error, so all 22 metrics were retained for the main analysis.

This study was designed and the data collection was completed before the publication of the SAE Recommended Practice J2944 [16] providing common definitions of driving performance metrics and statistics. However, the definitions of driver performance metrics used in this study are given in Appendix Table A2. We note that in the companion paper [1], the same two dimensions were found after analysis of the datasets in four different studies, despite wide differences in the metric definitions and venues (simulator, closed-road, or open-road). The PCA method is robust to such differences in venues and definitions, because it operates on relative metrics (taken with respect to the average of each metric across tasks) and standardized metrics (the scale for each metric is standardized to length 1 by dividing by the standard deviation across tasks). By avoiding absolute metrics, any slight differences in definitions are ameliorated. In addition, SAE J2396 [17] provides definitions for the glance-related metrics used in the four studies, and the glance definitions as used in this study were consistent with that Recommended Practice.

Partial Correlation Analysis

The partial correlation analysis and the methods for reducing the effects of interactions between the metrics in Appendix C are based on equations and causal diagrams in [18, Sections 3.3 and 3.4]. The Stata statistical program [14] partial correlation command "pcorr" was used to check the partial correlation calculations. These methods of adjustment in statistics are conceptually similar to those used in epidemiology [19,20], and the interested reader can consult most textbooks in statistics or epidemiology to learn more about them. The reader unfamiliar with these methods can nevertheless obtain an understanding of the results of this paper without detailed knowledge of these statistical methods.
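For concreteness, a first-order partial correlation (the quantity Stata's "pcorr" reports for one controlling variable) can be computed from the pairwise correlations. A sketch with invented data and illustrative metric names:

```python
# Sketch: partial correlation of x and y controlling for z.
import numpy as np

def partial_corr(x, y, z):
    """r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))"""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Invented example: a shared factor (lgp) inflates the msgd-teort correlation.
rng = np.random.default_rng(3)
lgp = rng.normal(size=100)
msgd = lgp + rng.normal(scale=0.5, size=100)
teort = lgp + rng.normal(scale=0.5, size=100)

print(np.corrcoef(msgd, teort)[0, 1])   # high, driven by the shared lgp component
print(partial_corr(msgd, teort, lgp))   # near zero once lgp is controlled for
```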

RESULTS

The correlation table between the metrics is given in Appendix Table B1. A high degree of correlation was seen between the metrics within particular subsets, and little correlation between metrics in different subsets. This result was consistent with previous studies [21, Table 2]. PCA takes advantage of these correlations to eliminate redundancy between the metrics. Hence, in this study, all 22 metrics could be reduced to only two using PCA. Dimension 1 accounted for 60% of the total variance in the dataset, while Dimension 2 accounted for 20% of the total variance (that had not been explained by Dimension 1). The "scree" plot showing the diminishing percentage of the total variance explained by the successive components out to 22 is shown in Figure 1.

Figure 1 shows that the components beyond the first two contributed little to the total variance. They are essentially statistical noise, and so the higher-order components can be safely discarded, greatly simplifying the analysis. That is, Figure 1 shows that only 2 dimensions can reproduce 80% of the variance in all 22 original driver performance metrics. Hence, the driver performance results for all 15 tasks on all 22 metrics can be reproduced to a high degree of accuracy with only two scores for each task, one on each dimension. (Note: We here use the term "driver performance" to specify the wider set of event detection and response metrics combined with lane and speed maintenance metrics. The narrower set of only lane and speed maintenance metrics is commonly referred to as "driving performance" metrics [13].) Following PCA, the loadings of the metrics on the two main components are plotted in Figure 2.
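The values in a scree plot like Figure 1 are simply the per-component fractions of explained variance. A sketch on placeholder data (note that with only 15 tasks, at most 14 components can carry non-zero variance after centering, so the tail of the curve is exactly zero):

```python
# Sketch: compute scree values (fraction of total variance per component).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 22))                    # stand-in for the Table A1 means
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pca = PCA().fit(Z)
for k, frac in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"component {k:2d}: {100 * frac:5.1f}% of total variance")
```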

Using the "Cluster Variables" option in the Minitab program [14] on the original mean data for all 22 metrics (the same data in Appendix Table A1 that was entered into the PCA) objectively determined three major metric clusters (Figure 3).

The objective and independent cluster analysis depicted in Figure 3 confirms the visual impression from Figure 2 that there are three major metric clusters, which we here term physical demand, cognitive demand, and cognitive facilitation. (Note that these three clusters of metrics are embedded in the 2-D loading space shown in Figure 2, so there are still only two dimensions in the data despite the fact that there are three clusters of metrics.)

Physical Demand Metrics

The gray lines in the left half of Figure 3 indicate what we here term physical demand metrics, which are the metrics in the upper portion of Figure 2 (gray diamonds, surrounded by gray ellipse). These metrics load positively on Dimension 2 (higher physical demand). (Note that the axes have swapped places from those in study [1] using Visual-Manual tasks only. Physical demand is now on the y-axis, and the attentional effects of cognitive demand on the x-axis. This result indicates that the inclusion of Mixed-Mode and Auditory-Vocal tasks has made the cognitive demand dimension the major axis, and the physical demand dimension the minor axis.)

The physical demand metrics cluster is composed of surrogates for driving performance (lane and speed variability), as described in the companion paper analyzing Visual-Manual tasks only [1]. Lane and speed maintenance metrics ("Driving Performance"), when measured during actual rather than simulated driving, are regarded as "ground truth" metrics [22]. The surrogate metrics for these ground truth driving performance metrics are shown in the "physical demand" metrics cluster in Figures 2 and 3, and include TaskTime, glances to the road (Glances_RD), total glance time to the road (TGT_RD), Total Eyes-Off-Road Time (TEORT_NR**, an NHTSA Guidelines metric), number of manual steps (Steps_man), number of task-related glances (Glances_TR), and total task-related glance time (TGT_TR***, an Alliance Principle 2.1A metric). These metrics are all tightly correlated with each other (see Appendix B1 for the correlation matrix), and are therefore essentially redundant, almost to the point of collinearity. In other words, the metrics that load high on Dimension 2 are nearly-equivalent "surrogate" metrics for the "ground-truth" driver performance metrics concerning lane and speed maintenance, because of the high correlations shown in [1], and they are therefore equally suitable for representing the physical demand dimension, which in this study is Dimension 2 (it was Dimension 1 in the companion paper [1]). The PCA method provides the optimal combination of the information in all these physical demand metrics to form Dimension 2. For shorthand, we therefore here label Dimension 2 as a physical demand dimension. Note that this dimension also contains the subjective metric of a driver's perceived workload for Visual-Manual secondary tasks, as described in [1, Example 1]. However, the physical demand dimension mainly concerns metrics that can be objectively measured from direct physical observation (lane exceedance, lane and speed variability, number of glances to a device, task time, etc.).

There is tight coupling of the physical demand metrics with lateral and longitudinal variability in driving performance, as shown in the Dimensional Model results in [1] for Visual-Manual tasks. This result suggests that it may also be reasonable to regard the physical demand dimension as a "driving performance" dimension rather than a "physical demand" dimension. The non-vehicle metrics (such as TEORT, TaskTime, etc.) that also load onto this dimension might then be regarded as "surrogate" metrics for the "ground truth" driving performance measures of lane crossings and speed variability in the Alliance Guidelines Principle 2.1B.

These physical demand metrics are more tightly correlated with each other than they are with the metrics in the other clusters. This result is what is effectively shown by the cluster analysis in Figure 3. As an independent verification, consider the matrix of physical demand metric correlations given in Appendix B, Table B1, upper left corner, with the metric names colored in gray. These correlations have a mean r value of 0.73 within the physical demand cluster. The mean r value of the metrics in the physical demand cluster (gray metric names) with those in the cognitive demand cluster (red metric names) is only 0.30, and with the cognitive facilitation cluster (green metric names) is only -0.26.
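Within- and between-cluster mean correlations such as these are simple averages over the relevant cells of the correlation matrix. A sketch with invented data and invented cluster assignments (the real assignments are those color-coded in Table B1):

```python
# Sketch: mean correlation within a cluster, or between two clusters.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
X = rng.normal(size=(15, 22))
R = np.corrcoef(X, rowvar=False)

# Invented cluster assignments (indices into the 22 metrics).
physical = range(0, 7)
cognitive = range(7, 18)
facilitation = range(18, 22)

def mean_r(R, a, b=None):
    if b is None:                             # within-cluster: unique pairs only
        return np.mean([R[i, j] for i, j in combinations(a, 2)])
    return np.mean([R[i, j] for i in a for j in b])

print(mean_r(R, physical))                    # within-cluster (0.73 in the study)
print(mean_r(R, physical, cognitive))         # between-cluster (0.30 in the study)
```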

Cognitive Demand Metrics

The cluster analysis in Figure 3 also identifies a second major cluster of metrics (center red lines). These metrics are those in the lower right quadrant of Figure 2 (red squares surrounded by red ellipse). This cluster also has tightly-correlated metrics that are largely redundant with each other, as shown in Appendix B1 (metric names colored in red), but they are only weakly correlated with the metrics loading onto the physical demand dimension (metric names colored in gray). Metrics that are only weakly correlated with one another form different clusters in a cluster analysis (Figure 3), or load onto different dimensions in a PCA (Figure 2).

As an independent verification, consider the matrix of cognitive demand metric correlations given in Appendix B, Table B1, middle columns and rows, with the metric names colored in red, and the subset of cognitive demand metric correlations colored in red (center of table). These correlations have a mean r value of 0.73 within the cognitive demand cluster (red submatrix in the center of the table). The mean r value of the metrics in the cognitive demand cluster (red metric names) with those in the physical demand cluster (gray metric names) is only 0.30. The mean r value of the metrics in the cognitive demand cluster (red metric names) with those in the cognitive facilitation cluster (green metric names) is -0.69 (cells colored yellow), indicating that the cognitive facilitation metrics are an "opponent" process to the cognitive demand metrics. The two clusters are at opposite ends of the same cognitive dimension; the cognitive facilitation cluster does not form a third dimension.

The cognitive demand metrics were previously shown for Visual-Manual tasks to load onto the dimension for event detection and response in study [1], for either remote DRT events or real vehicle events such as brake light activation or deceleration of a forward vehicle. This result suggests that it may also be reasonable to regard this dimension as an "event detection and response" dimension rather than a "cognitive demand" dimension, for those who prefer behavioral rather than cognitive science terminology. The metrics besides event detection and response (such as MSGD, LGP, etc.) that also load onto this dimension might then be regarded as "surrogate" metrics for the "ground truth" driver performance metrics of event detection and response to external road events or the detection response task (DRT) [4], as suggested by the CAMP-DWM study findings [39,40] using brake light activations and lead vehicle decelerations.

It is of practical and theoretical interest that mean single glance duration (MSGD) clusters with these cognitive demand metrics, and not with the physical demand metrics. This finding is true both for task-related MSGD (metric 9 in Appendix Table B1, MSGD_TR***, Alliance Principle 2.1A) and for off-road MSGD (metric 12 in Appendix Table B1, MSGD_NR**, NHTSA Guidelines).

Likewise, the NHTSA Guidelines LGP metric (metric 13 in Appendix Table B1, LGP_NR**) clusters with the cognitive demand metrics and not the physical demand metrics. Thus, the PCA results (Figure 2) and the cluster analysis results (Figure 3) independently suggest that MSGD and LGP are more effective surrogates for the event detection and response (cognitive demand) dimension than for physical demand. The loading of MSGD onto the cognitive rather than the physical dimension occurs despite the fact that the TGT and TEORT metrics in the physical cluster are formed by the product of MSGD and the number of glances; thus MSGD adds no information to TGT or TEORT beyond the number of glances. In other words, MSGD and LGP provide independent information about the cognitive dimension of driver demand that is not predicted by TGT, TEORT, or the total number of glances to the device or off the road during a secondary task.

It may seem strange that multiplying two variables could give rise to a third variable (TEORT or TGT) that is uncorrelated with one of the original variables (MSGD). However, a multiplication of two variables is a non-linear transformation, so this result is quite possible mathematically, and it in fact occurs in driving, whether in the surrogate driving venue used in this paper or, as shown in [1], in simulator, closed-road, and open-road venues. These results suggest that any causative effect of TEORT or TGT on lateral and longitudinal vehicle variability arises solely from the number of glances, and not from the total duration of those glances, the mean single glance duration, or the long glance proportion (LGP) of those glances.
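A toy numerical illustration of this point, with all values invented: when the number of glances varies much more across observations than MSGD does, the product (TGT or TEORT) tracks the number of glances closely, while its correlation with MSGD is much weaker, approaching zero as the spread of MSGD shrinks:

```python
# Toy demo: a product of two variables can correlate strongly with one factor
# and only weakly with the other (all values invented for illustration).
import numpy as np

rng = np.random.default_rng(7)
n_glances = rng.uniform(2, 20, size=1000)     # wide spread
msgd = rng.uniform(0.8, 1.6, size=1000)       # narrow spread, independent of N
tgt = msgd * n_glances                        # e.g., TGT = MSGD x number of glances

print(np.corrcoef(tgt, n_glances)[0, 1])      # strong (about 0.9 here)
print(np.corrcoef(tgt, msgd)[0, 1])           # much weaker (about 0.3-0.4 here)
```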

The converse result is also suggested by the Dimensional Model. The PCA of the data in the current study suggests that MSGD and LGP are independent indicators of event detection and response performance (cognitive demand), and that the physical demand metrics have little or no influence on event detection and response. In other words, the event detection and response behavior of a driver is not predictable from the number of glances, TGT, TEORT, or any other physical demand metric. Because the LGP and MSGD metrics in the cognitive cluster are related to the detection of and response to events, we believe it is correct to label this dimension a "cognitive" dimension in this study as well as in the companion study [1] of Visual-Manual tasks. However, it is equally valid to label this dimension an "event detection and response" dimension, as previously mentioned, for those who prefer behavioral rather than cognitive science terminology.

The PCA and cluster analysis thus confirm in independent analyses that each metric cluster forms a separate dimension for the set of tasks under the experimental conditions of this study. An objective grouping of the metrics using cluster analysis on the original data (Figure 3) closely corresponds to the subjective visual impression of the metric groups in the PCA loading space in the Dimensional Model (Figure 2).

Cognitive Facilitation Metrics

The cluster analysis in Figure 3 identified a third group of metrics, not previously well recognized explicitly (green lines to the right of Figure 3). These metrics are identified by the green ellipse around the green circles to the left of Figure 2. These metrics lie on the negative x-axis, and so form an opponent process to the metrics in the cognitive demand cluster on the positive x-axis (red ellipse around the red squares in Figure 2). As such, this cluster of metrics does not represent a third dimension; it instead represents simply an "opposite" or "complement" to the cognitive demand cluster. Metrics in the cognitive facilitation cluster are negatively correlated with the metrics in the cognitive demand cluster. We have termed these metrics "cognitive facilitation," as they have an opposite effect from the "cognitive demand" metrics.

In other words, the cognitive demand and physical demand metrics were only weakly correlated with one another, and so formed different clusters in the cluster analysis (Figure 3) and loaded onto different dimensions in the PCA (Figure 2). However, metrics that are highly negatively correlated with metrics in another cluster can form a cluster at the opposite pole of one dimension, as did the "cognitive facilitation" metrics (Figure 2, green symbols).

As an independent verification, consider the matrix of cognitive facilitation metric correlations given in Appendix B, Table B1, with the metric names colored in green, and the subset of cognitive facilitation metric correlations colored in green (lower right corner of Table B1). These correlations have a mean r value of 0.70 within the cognitive facilitation cluster (green submatrix, lower right of the table). The mean r value of the metrics in the cognitive facilitation cluster (green metric names) with those in the physical demand cluster (gray metric names) is only -0.26. The mean r value of the metrics in the cognitive facilitation cluster (green metric names) with those in the cognitive demand cluster (red metric names) is -0.69 (cells colored yellow), indicating that the cognitive facilitation metrics are an "opponent" process to the cognitive demand metrics. The two clusters are at opposite ends of the same cognitive dimension; the cognitive facilitation cluster does not form a third dimension.

A relatively high score on the cognitive facilitation metrics indicates the opposite of cognitive demand, or what we here term "cognitive facilitation." In this study, the cognitive facilitation metrics were all glance metrics that involved glances to the road, or glances to "other" locations than the road or to the displays and controls of the device. Glances to other locations are not a new "dimension," because if participants are glancing to the device, they cannot simultaneously be glancing to the road or to "other" locations than the device or the road. Likewise, glances to the road are not a new dimension, because if participants are glancing to the road, they cannot simultaneously be glancing to the device or to "other" locations than the device or the road. The cognitive facilitation metrics therefore just represent the opposite or complement of the metrics at the positive end of the cognitive dimension, and do not represent another dimension.

Dimensions Summary

In summary, the overall results for the metrics show that the first two dimensions together captured 80% of the total variance in all 22 metrics assessed. That is, just two scores for every task can recreate the original data in Appendix A, Table A1 that required all 22 metrics to describe. In a PCA, there are as many potential dimensions as metrics (22). As shown in the scree plot (Figure 1), the higher dimensions from 3 to 22 each individually constitute such a small percentage of the remaining 20% of the variance (that is, the variance not explained by the two main dimensions) that they represent nothing more than statistical noise in the data, and so they are not considered further in this study.

An independent cluster analysis of the original mean data in Appendix A, Table A1, confirmed these findings. Cluster analysis successfully segmented the metrics into three groups in the two-dimensional loadings space (Figure 2). We termed the dimensions physical demand and cognitive demand. The cognitive dimension had "positive" cognitive demand at one end, and "negative" cognitive demand, here termed cognitive facilitation, at the other. The physical dimension had only positive physical demand, because the metrics that load on it cannot physically have negative values (e.g., there cannot be a negative TaskTime, a negative number of glances, or a negative count of manual task steps).

Task Scores

The tasks can now be scored on these two components by forming the sum of the cross-products of each task's 22 standardized metric values with the loading coefficients for each component, as per standard PCA methods. Figure 4 plots the task scores so calculated.
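A sketch of this scoring step on placeholder data: the task scores are the dot products of the standardized metric values with the loading vectors, which matches what PCA software reports:

```python
# Sketch: task scores as cross-products of standardized metrics and loadings.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(15, 22))                 # stand-in for the Table A1 means
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pca = PCA(n_components=2).fit(Z)
loadings = pca.components_                    # 2 x 22 loading coefficients

scores_manual = Z @ loadings.T                # sum of cross-products, per task
scores_pca = pca.transform(Z)                 # what the software reports
print(np.allclose(scores_manual, scores_pca))    # True
```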

Cognitive Demand Task Scores

Figure 4 shows that all Mixed-Mode tasks (green symbols in center column) scored lower on the first component (cognitive demand) than all Visual-Manual tasks (red arrowhead symbols in the right half of Figure 4). This result confirms previous research showing that the attentional effect of cognitive demand is higher for Visual-Manual tasks than for Mixed-Mode tasks [23]. This point is emphasized by the fact that all the Mixed-Mode interfaces in the current study reduced the attentional effects of cognitive demand as measured by Score 1, even compared to a single press of a radio preset button (RadioPreset_VM at bottom of Figure 4). Despite this reduction in the attentional effects of cognitive demand, the Mixed-Mode tasks still had more attentional effects of cognitive demand than baseline driving with no task (magenta triangle at left), and the pure Auditory-Vocal 0-Back task (leftmost blue circle). The hands-free Mixed-Mode tasks (green diamonds) had about the same cognitive demand as the pure Auditory-Vocal 1-Back task (blue circle near green diamonds).

Physical Demand Task Scores

On the physical demand dimension (y-axis in Figure 4) there was a relatively wide range of task scores within the Mixed-Mode types (green symbols) and Visual-Manual types (red symbols). Figure 4 shows that the hands-free Mixed-Mode types (green diamonds) had higher physical demand (as measured by Score 2) than most Visual-Manual types (red rightward triangles in the bottom right of Figure 4). The higher physical demand for Mixed-Mode tasks may seem counter-intuitive given that there was less TEORT for Mixed-Mode tasks compared to Visual-Manual tasks (see the second companion paper [10]). This apparent anomaly results from the indirect interaction effects on TEORT of the high correlation between LGP and MSGD (see Appendix C).

In other words, the Mixed-Mode tasks consistently showed reduced attentional effects of cognitive demand compared with Visual-Manual tasks, but they also generally showed relative increases in physical demand according to their PCA scores. This result is illustrated more clearly by considering the paired tasks, which have identical task start and goal states, but one member of the pair is performed with a Visual-Manual interface, and the other with a Mixed-Mode interface. Figure 5 replots the data in Figure 4 with just the paired tasks.

In Figure 5, the upward and leftward pointing red dashed arrows indicate a decrease in cognitive demand (Score 1) but an increase in physical demand (Score 2) for the same task with a Mixed-Mode interface vs. a Visual-Manual interface. The arrows in Figure 5 show that four of the five paired tasks had relatively more physical demand and relatively less cognitive demand for the Mixed-Mode condition than they did for the Visual-Manual condition (dashed red arrows pointing in the 10 o'clock direction in Figure 5). That is, Mixed-Mode tasks trade off a reduction in the attentional effects of cognitive demand (x-axis) for an increase in physical demand (y-axis), according to this dimensional analysis.

The exception was the NavDest task, which had the highest levels of physical demand (top of Figure 5); the gray dashed arrow shows there was no increase in physical demand for the Mixed-Mode task (green square) compared to the same task executed with Visual-Manual controls (red rightward triangle). That is, the non-hands-free Mixed-Mode NavDest task (green square) scored about the same in physical demand as did the Visual-Manual NavDest_VM task, yet had a relative decrease in the attentional effects of cognitive demand.

Task Score Clusters

Figure 6 is a cluster analysis of the task scores in Figure 4.

The five major clusters found in Figure 6 are identified by the elliptical overlays on the points in Figure 4, as shown in Figure 7.

Figure 7 confirms the finding shown in Figure 4 that hands-free Mixed-Mode tasks (green ellipse cluster #2) had a higher physical demand (y-axis) than the Visual-Manual tasks in cluster 4 (orange ellipse in lower right of Figure 7), but a lower attentional effect of cognitive demand (x-axis).

Figure 7 also illustrates the 2 groups of outlier clusters found in the cluster analysis in Figure 6. Cluster 5 is composed of Visual-Manual destination entry (NavDest_VM) and searching for a point of interest using Bing (BingPOI_VM). The scores of these tasks were exceptionally high in physical demand compared to the other Visual-Manual tasks, and exceptionally high in cognitive demand compared to the Auditory-Vocal and Mixed-Mode tasks. Going to a Mixed-Mode implementation of the navigation destination entry task reduced the cognitive demand (x-axis), but the physical demand stayed just as high as it was for the Visual-Manual mode of the task.

The increased physical demand for Mixed-Mode tasks shown in Figure 7 is a surprising result, given that consideration of single metrics at a time indicates that there are fewer glances to the screen and smaller TEORT values for Mixed-Mode tasks than Visual-Manual tasks [10]. The reason for this surprising result has to do with the high correlation between MSGD and LGP, which acts as a "suppressor" on TEORT (see Appendix C). When the task data are plotted on orthogonal axes as in Figure 7, the suppressor effects are removed.

Simplification to Two Original Metrics

To the extent that the Dimensional Model has internal validity, all the metrics within a given metrics cluster are essentially redundant with each other. The Dimensional Model hence gives rise to the "simplification" hypothesis: namely, only two metrics from the original dataset should be needed to characterize each task in terms of the two underlying driver demand dimensions. There will be some loss of information, because the task scores in the Dimensional Model combine the information from all the metrics in a unique and optimal way. However, all the original metrics in the physical demand cluster have a statistically-significant correlation with the task scores (S2) on the second component, and could hypothetically be used as substitutes for the task scores S2. TaskTime had the highest correlation (r = 0.96, df = 13, p = 1.5E-08), with the regression fit shown in Figure 8.

Figure 8 shows that the observed S2 scores for all tasks fell well within the 95% prediction interval of the fit by TaskTime (dashed lines), confirming the goodness of the fit. That is, TaskTime can be effectively used as a surrogate for the physical demand task score, which is based on the combined information from all 22 metrics.
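A sketch of this kind of regression check, using statsmodels on invented data (the study's actual fit is Figure 8, with r = 0.96): fit S2 on TaskTime and verify that each observed score falls inside the 95% prediction interval:

```python
# Sketch: regress a demand score on TaskTime; check 95% prediction interval.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
task_time = rng.uniform(5, 80, size=15)                 # 15 tasks (invented)
s2 = 0.05 * task_time + rng.normal(scale=0.4, size=15)  # invented S2 scores

exog = sm.add_constant(task_time)
model = sm.OLS(s2, exog).fit()
print(f"r = {np.sqrt(model.rsquared):.2f}")             # 0.96 in the study

pred = model.get_prediction(exog).summary_frame(alpha=0.05)
inside = (s2 >= pred["obs_ci_lower"]) & (s2 <= pred["obs_ci_upper"])
print(bool(inside.all()))                               # all tasks inside?
```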

Note that other metrics loading into the physical metrics cluster could have also been used to represent the physical demand task score. For this study, those other metrics would be Glances_RD, TGT_RD, TEORT_NR**, Steps_man, Glances_TR, or TGT_TR***. In addition, the companion study [1] showed that lane crossings, lateral variability, or a longitudinal variability metric, and other physical demand metrics developed for Visual-Manual tasks, could likely have been used, if they had been measured in the current study.

Likewise for Score 1, all the metrics in the cognitive demand cluster had a positive correlation with Score 1, and all the metrics in the cognitive facilitation cluster had a negative correlation with Score 1. MaxGlance_TR (the average duration of the longest task-related glance for every participant) had the highest correlation of all the metrics with the task scores S1 on the first component (r = 0.97, df = 13, p = 1.9E-09). However, the goal is to use non-glance metrics wherever possible, to save the time and expense of eye glance measurements. The highest correlation among the non-glance metrics in Clusters 2 and 3 was for Miss%_tdrt (r = 0.86, df = 13, p = 4.0E-05), with the regression fit shown in Figure 9.

The dashed lines show that the 95% prediction interval for Score 1 in Figure 9 is wider than that for Score 2 in Figure 8, but the task scores for the attentional effects of cognitive demand in Figure 9 are still all within the 95% prediction interval of Miss%_tdrt.

Note that other metrics loading into the cognitive metrics cluster could also have been used to represent the cognitive demand task score. For this study, those other metrics would be MaxGlance_TR, MSGD_TR***, Glances%_TR, RateGlances_TR, MSGD_NR**, LGP_NR**, MinGlance_TR, Miss%_tdrt, RT_tdrt, or CompletionErrors. In addition, any of the cognitive facilitation metrics could have been used with a negative coefficient: MSGD_RD, MaxGlance_RD, MinGlance_RD, or Glances%_other. In addition, the companion study [1] showed that RT and miss rate to the remote DRT, or any other metric loading on the cognitive demand dimension, could have been used to represent the attentional effects of cognitive demand, if they had been measured in the current study.

Given these encouraging results for the two chosen individual non-glance metrics, the simplification hypothesis was tested by using just TaskTime and Miss%_tdrt to span the 2-D score space (Figure 10).

Visual comparison of Figures 4 and 10 shows there are similar relative positions of the task points in the 2-D space.

The original TaskTime and Miss%_tdrt plotted in Figure 10 are virtually uncorrelated, or orthogonal (Appendix B, Table B1; r = -0.03, df = 13, p = 0.923). It is preferable to have orthogonal estimates of the underlying task scores, because then the errors across the two metrics are independent, making the error ellipse axes aligned horizontally and vertically rather than tilted (see General Discussion in [1]). Because TaskTime and Miss%_tdrt are already nearly orthogonal, further analysis to find another orthogonal pair of single metrics is not necessary to reasonably represent the physical and cognitive axes for this dataset. For those who wish to use other metrics, any other pair of metrics could be chosen and examined, as long as one metric is picked from the physical cluster and one from the cognitive cluster, according to the Dimensional Model. The individual metrics that are most strongly correlated with the task scores, however, should give slightly better predictions of the "ground truth" task scores based on all the metrics, when the individual metrics are reduced to 2, as in Figure 10.

DISCUSSION

Data were collected from 24 participants during use of the "infotainment" system in a 2014 Toyota Corolla in a video-based surrogate driving venue. There were 15 secondary tasks of varying modes of interaction (various combinations of auditory, vocal, visual, and manual modes), including 5 "paired tasks," each pair having the same start and end states but performed with different modes of interaction. Data for 22 variables (metrics) were collected, including eye movement data. A Principal Component Analysis found that the 22 metrics could be reduced to 2 principal components (dimensions), with little loss of explanatory power.

The major dimension, explaining 60% of total variance, we interpreted as the attentional effects of cognitive demand. The minor dimension, explaining 20% of total variance, we interpreted as physical demand. The 22 metrics were well segmented into three groups using cluster analysis, which independently agreed with the metric groups that were visually evident in the 2-D loadings plot. One cluster was associated with cognitive demand and another with physical demand. A third cluster loaded in opposition to cognitive demand along the cognitive demand dimension, and was termed cognitive facilitation. The Dimensional Model, by eliminating the correlation between the task scores on the dimensions, also minimized the biasing effects arising from interactions between metrics. The extended Dimensional Model allows a simplified understanding of the effects of all modes of secondary tasks on driver performance.

Simplification to Two Original Metrics

The fact that only two dimensions were needed to span the space of all 22 metrics led to a further simplification hypothesis from the Dimensional Model: that the two PCA task scores (which combined the information from all 22 metrics) could be reasonably approximated with a reduced set of only 2 of the original metrics. This simplification hypothesis was tested using two non-glance metrics (TaskTime and Miss%_tdrt). These non-glance metrics sufficed to span, to a reasonable approximation, the two-dimensional PCA space that had been derived from the complete data set of 22 metrics (including many glance metrics) measured for 15 tasks with different subsets of modes (Visual-Manual, Mixed-Mode, Auditory-Vocal, and baseline just driving). The Dimensional Model predicts that other pairs of metrics besides TaskTime and Miss%_tdrt should also be able to reasonably span this 2-D space (for example, MSGD_TR and TGT_TR, as per the Alliance Guidelines Principle 2.1A [13]). However, the current choice of TaskTime and Miss%_tdrt requires fewer resources to test and analyze than glance metrics, allowing more tasks to be tested with a given amount of resources, so these other potential options were not further explored for these tasks.

Comparison of Mixed-Mode Tasks to Visual-Manual Tasks

This study found that a hands-free Mixed-Mode interface that uses a display screen has relatively less cognitive demand but relatively greater physical demand than a Visual-Manual task with the same start and goal states. It cannot yet be determined with available data whether such a hands-free Mixed-Mode interface will increase or decrease driver performance (and safety-related critical events) compared to a Visual-Manual interface for performing the same task in naturalistic driving.

However, the results of this study contradict the assertion that the attentional effects of cognitive demand from tasks that employ voice commands are substantial, as claimed in some other studies [24,25,26]. The analysis of the results in Figure 7 showed that all five Mixed-Mode tasks investigated here, and both pure Auditory-Vocal tasks, had a reduced attentional effect of cognitive demand compared to pressing a single touch screen preset button one time to tune to a new radio station (the RadioPreset_VM task). Contrary to what is claimed in some studies [24,25,26], it was rather the physical demand from the voice command tasks studied here that showed a relative increase compared to most of the Visual-Manual tasks studied here (except for NavDest_VM and BingPOI_VM). This relative increase in physical demand was only evident with the multivariate methods used here; it was not evident when examining single glance metrics in a univariate manner [10], which showed, for example, a decrease in TEORT for Mixed-Mode tasks vs. their Visual-Manual paired tasks (see Appendix C for an explanation).

Comparison of Mixed-Mode Tasks to Pure Auditory-Vocal Tasks

Adding a display screen to an Auditory-Vocal interface is likely done by vehicle manufacturers and/or their suppliers in an attempt to assist the driver with the voice command interface, such as by providing confirmation of voice inputs and enabling users to see the menu choices instead of, or in addition to, hearing them. Because only voice commands are made in a Mixed-Mode interface, rather than manual button presses (except for pressing a "voice activation button"), such an interface qualifies as "hands-free" according to SAE J2988 [11]. Indeed, a hands-free Auditory-Vocal interface such as OnStar Hands-Free Calling is known to carry no additional crash risk beyond that of baseline driving without using the system [27]. Whether adding a visual display to an Auditory-Vocal hands-free task enhances or detracts from driver performance compared to a pure Auditory-Vocal interface was not tested in this study, nor does such a study exist in the published literature to our knowledge. For example, to our knowledge no studies have been published that test whether driver performance diminishes or improves when the screen is covered up (making the interface a pure Auditory-Vocal one) vs. when the screen is visible (a Mixed-Mode interface).

However, given the current findings that participants performed the pure Auditory-Vocal 0-Back task with less cognitive and physical demand than even the Mixed-Mode tasks, it is hypothesized that a pure Auditory-Vocal hands-free task might have reduced the attentional effects of cognitive demand compared to Mixed-Mode tasks. An Auditory-Vocal task such as cell phone conversation may even improve alerting attention and hence reduce drowsiness [9].

Drivers may also reduce speed during performance of Auditory-Vocal tasks because of self-regulation [2,28]. It has not yet been tested, to our knowledge, whether drivers performing a Mixed-Mode task might also show evidence of self-regulation such as slowed speeds.

A pure Auditory-Vocal system may have relatively poor voice recognition for some dialects, or may have a complex voice tree structure. Such negative attributes might potentially increase driver frustration and reduce customer acceptance to some degree, but the results of this study imply that pure Auditory-Vocal tasks with equivalent cognitive demand to 0-Back will have lower attentional effects from cognitive demand than even Mixed-Mode tasks.

Eye Movements Not Needed to Estimate Demands

A potentially useful contribution of this study is that the methods applied in this dataset (as in the four datasets examined in the companion paper [1]) demonstrate that eye movements are not needed to predict the attentional effects of cognitive demand, nor the physical demand, of secondary tasks (i.e., the location of the secondary tasks in the 2-D score space). The dimensional analysis employed here found that the task locations in the 2-D score space could be reliably approximated by just TaskTime and the percentage of misses to a tactile DRT, at least for the 15 tasks with multiple different types of modalities assessed here. For Visual-Manual tasks, TaskTime and RT to an event were sufficient to approximate the task locations in the 2-D score space.

This result is useful, because the eye glance metrics of TEORT, MSGD, and LGP have complex interactions between them, which cause a task's TEORT value in a univariate analysis to underestimate the physical demand of the task. Besides not using eye movement data at all, a multivariate analysis that considers the available information in all three metrics, as per the Dimensional Model, is another way to adjust for the downward bias from interactions of the glance metrics (see Appendix C).

Physical vs. Cognitive Demand Dimensions

A consistent finding arising from the Dimensional Model as applied to this dataset is that the cognitive demand dimension captures a different aspect of driver demand than the number of glances or total glance time to a device, or than measures of "driving performance" involving lateral and longitudinal variability of the vehicle. The cognitive demand dimension became the dominant dimension after Mixed-Mode and pure Auditory-Vocal tasks were added to the same dataset as Visual-Manual tasks. In the companion paper [1], which applied the Dimensional Model methods to Visual-Manual tasks only, the physical demand dimension was dominant, although the cognitive demand of Visual-Manual tasks was evident in the second dimension.

The current study found that the attentional effect of cognitive demand was actually greater for Visual-Manual tasks than for Mixed-Mode or Auditory-Vocal tasks. This result emphasizes an aspect of Visual-Manual tasks that has not previously been well recognized: namely, that Visual-Manual tasks not only have physical demand, but also have attentional effects of cognitive demand. This cognitive demand was even higher for Visual-Manual tasks than for Mixed-Mode or pure Auditory-Vocal tasks, for the tasks studied here as well as elsewhere [23]. The finding that Visual-Manual tasks tend to have even more attentional effects from cognitive demand than do pure Auditory-Vocal tasks is also evidenced by the higher miss rates and RTs of Visual-Manual tasks to single-trial events such as lead-vehicle CHMSL activation [23, Figure 6; 1, Example 1], or to remote DRT events in a stationary vehicle or in a simulator [39,40].

The finding in the current study that Mixed-Mode tasks using voice commands have a smaller cognitive demand, but a larger physical demand, than Visual-Manual tasks is the reverse of what some studies claim [24,25,26]. That is, the current study shows that Visual-Manual tasks, which are often thought to have little or no cognitive demand compared to Mixed-Mode tasks with a voice command interface, actually have more cognitive demand than such Mixed-Mode tasks. Mixed-Mode tasks with a voice command interface can actually have more physical demand than Visual-Manual tasks when all metrics are considered in a multivariate analysis, despite the fact that they have a lower TEORT in a univariate analysis [10]. This result occurs even though TEORT loads strongly on physical demand for Visual-Manual tasks [1]. Appendix C describes the possible biasing effects between glance metrics that can give rise to this result.

This study is consistent with the results of the companion paper [1] in finding that TaskTime is a useful single non-glance metric that serves as a surrogate for many traditional driving performance metrics, including Total Eyes-Off-Road Time and Total Glance Time to the device. The importance of TaskTime as a metric for assessing the physical dimension of driver demand for Visual-Manual tasks has long been recognized in the literature [29,30,31,32,33,34,35] and by driving safety experts [36]. Examples 1-4 in the companion paper [1] confirm this for Visual-Manual tasks. The current results show the importance of TaskTime in understanding the physical demand placed on drivers for Mixed-Mode tasks as well as Visual-Manual tasks.

However, TaskTime could not be used for estimating the physical demand of the pure Auditory-Vocal tasks in this study. The n-Back tasks used here had a TaskTime that was fixed by the experimental protocols at 30 seconds (that is, the n-Back tasks were machine-paced, not participant-paced, so the participant's behavior had no influence on the TaskTime for these tasks). Likewise, the Baseline (just driving) condition was fixed at 30 seconds, so the Dimensional Analysis methods described here were not suitable for assessing the physical demand of that condition either. Therefore this experiment did not have the appropriate tasks and procedures to determine whether the TaskTime metric (or any physical demand metric that is highly correlated with TaskTime, such as TEORT and the longitudinal and lateral variability metrics) is suitable for assessing the physical demand of Auditory-Vocal tasks.

Dimensional Model Contributions to Current Issues

The TaskTime Debate

There has been considerable debate about whether TaskTime or more direct measures of driving performance (e.g., lateral variability measures such as SDLP) should be used to assess physical demand [35]. However, the Dimensional Model predicts, and the results of the current study confirm, that any metric that falls into the same physical demand cluster (such as longitudinal or lateral vehicle variability) can serve as a surrogate for physical demand, at least for the tasks studied here. Hence, the results of this study and the companion paper [1] indicate that both sides in the TaskTime debate [35] were right: longitudinal and lateral variability can be used directly (as per Alliance Principle 2.1B [13]), or TaskTime can be used as a surrogate measure for those metrics, with equal validity, at least for the Visual-Manual and Mixed-Mode tasks examined in this study.

A limitation of TaskTime (and of its use in the Dimensional Model) is that it is not a suitable metric for pure Auditory-Vocal tasks. There is little if any decrement in lateral and longitudinal variability metrics with TaskTime during Auditory-Vocal tasks, and 30 studies cited in [2, Section 1.3.5.2] actually indicate a reduction (i.e., an improvement) in lateral/longitudinal variability during Auditory-Vocal tasks.

Separately, TaskTime itself must be controlled for in any use of lateral and longitudinal variability metrics, because these metrics are not time-invariant while driving: they systematically increase with driving time even if no task is performed [37]. A sketch of such an adjustment follows.
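As a minimal sketch of this adjustment with simulated data (not the study's dataset), one can regress a variability metric on elapsed driving time and then compare conditions on the time-adjusted residuals rather than on the raw values:

```python
# Minimal sketch: remove the time-on-task trend from a variability
# metric before comparing tasks. All values are simulated.
import numpy as np

rng = np.random.default_rng(1)
drive_time = rng.uniform(5, 70, size=200)   # hypothetical task durations, s
sdlp = 0.15 + 0.002 * drive_time + rng.normal(0, 0.02, size=200)  # SDLP, m

# Fit SDLP as a linear function of driving time (variability grows with
# time on task even when no secondary task is performed, per [37]).
slope, intercept = np.polyfit(drive_time, sdlp, 1)

# Time-adjusted SDLP: the residual after removing the time trend.
sdlp_adj = sdlp - (intercept + slope * drive_time)
print(f"fitted slope: {slope:.4f} m per s of driving")
print(f"SD of time-adjusted SDLP: {sdlp_adj.std():.3f} m")
```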

Glance-Based vs. Driving Performance Metrics

There is also considerable debate in the driving safety field about whether glance-based metrics (such as TGT and MSGD to a device) are better or worse than driving performance metrics (lateral and longitudinal variability) as outcome measures for Visual-Manual secondary tasks performed while driving. Alliance Principle 2.1 [13, p. 39] states "Systems with visual displays should be designed such that the driver can complete the desired task with sequential glances that are brief enough not to adversely affect driving." The Alliance used glance-based metrics in its Principle 2.1A, based on absolute criteria of 20 s for TGT to the device and 2 s for MSGD to the device. However, driving performance measures were put into Principle 2.1B, defined as the number of lane exceedances and car-following headway variability relative to a reference task. In other words, compliance method 2.1A uses the glance measures as surrogates for adverse effects on driving, and compliance method 2.1B uses adverse driving as a surrogate for glance measures. There was no known scientific basis to reject one set of metrics over the other, so the controversy could not be scientifically resolved. Hence, the Alliance used glance-based criteria for 2.1A and driving performance criteria for 2.1B. Vehicle manufacturers who signed onto the Alliance Guidelines could choose whichever option they preferred (A or B) to meet Principle 2.1.

According to the Dimensional Model, each of the Alliance Principle 2.1A and 2.1B options got some things right and some things wrong. What the two options both have right, according to the Dimensional Model, is that each has at least one metric that loads onto the physical demand dimension. Total Glance Time to the device in Principle 2.1A loads onto the physical demand dimension [1]. Similarly, in Principle 2.1B, lane exceedances and car-following headway variability also load onto the physical demand dimension [1]. So 2.1A and 2.1B each have a metric that adequately assesses physical demand.

With respect to cognitive demand, Alliance Principle 2.1A contains MSGD, which is a metric loading onto cognitive demand (i.e., event detection and response), at least for the Visual-Manual tasks that are the scope of the Alliance Guidelines. Because MSGD loads into the same cluster of metrics as do event detection and response, at least for Visual-Manual tasks [1], Alliance Principle 2.1A covers both dimensions of driver demand in the Model for Visual-Manual tasks. However, Alliance Principle 2.1B does not have a metric that loads onto the cognitive demand dimension (lane exceedances and car-following headway variability both load onto the physical demand dimension). Therefore, although Principle 2.1B more than adequately covers the physical demand dimension, with two criteria that must both be met, it does not assess the cognitive demand dimension (i.e., event detection and response) for Visual-Manual tasks. To be fair, the Alliance Guidelines have only Visual-Manual tasks in their scope, and Auditory-Vocal tasks were specifically excluded. It was not widely appreciated in the driving safety field when the Alliance Guidelines were issued in 2006 that Visual-Manual tasks also have attentional effects from cognitive demand, independently of the effects of glances to the displays and controls of the secondary task being assessed. Nor was it appreciated at the time that MSGD to a secondary task specifically measures the attentional effects of cognitive demand, not physical demand, at least for Visual-Manual tasks. It was therefore an unforeseen benefit that Principle 2.1A captures the attentional effects of cognitive demand through MSGD.

In contrast, Principle 2.1B has the benefit over Principle 2.1A that it uses relative criteria, comparing to a reference task "under standard test conditions (e.g., same drivers, driving scenario)" [13, p. 40]. Principle 2.1A, on the other hand, uses absolute criteria (20 s TGT and 2 s MSGD). It is known from the ISO DRT cross-site studies [4, Appendix E] that there is a large amount of variability between the absolute metric values collected at different sites and setups, even for sites using an identical apparatus, procedure, and driving setup (static, surrogate driving, simulator, closed road, or open road). However, all the metrics are highly correlated across all sites and setups [4, Tables E.4 and E.5], showing that the relative pattern of responses across sites and setups is almost identical, even when there are substantial and statistically significant differences in the absolute metric values. Compliance with Principle 2.1B is thus more replicable across sites and setups than compliance with Principle 2.1A, in that the determination of which tasks are compliant will be more repeatable across test sites and setups using Principle 2.1B. A sketch of this point follows.
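The following minimal sketch illustrates the replicability point with hypothetical numbers. The 20 s TGT cap is the Principle 2.1A absolute criterion; the "1.5 x reference" rule below is an invented stand-in for a relative criterion, not the actual 2.1B test.

```python
# Minimal sketch: two hypothetical sites measure the same tasks, but
# site B runs a constant 4 s "hotter" (absolute values shift while the
# relative pattern is preserved, as in the cross-site DRT data
# [4, Appendix E]).
tgt_site_a = {"reference": 12.0, "task1": 18.0, "task2": 22.0}  # TGT, s
tgt_site_b = {t: v + 4.0 for t, v in tgt_site_a.items()}

for name, site in (("A", tgt_site_a), ("B", tgt_site_b)):
    for task in ("task1", "task2"):
        absolute = site[task] <= 20.0                     # fixed 20 s cap
        relative = site[task] <= 1.5 * site["reference"]  # vs reference task
        print(f"site {name}, {task}: absolute pass={absolute}, "
              f"relative pass={relative}")
# task1 passes the absolute criterion at site A but fails it at site B,
# while the relative criterion yields the same decision at both sites.
```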

Comparison to Baseline Driving

It is not yet widely recognized that many of the metrics loading onto the physical demand dimension are highly correlated with driving time, at least for Visual-Manual tasks, as discussed in [1]. For example, as driving time increases, lane and speed variability metrics and TEORT systematically increase, even if no secondary task is performed. The consequences of this fact are many. For example, one must be careful when comparing to a baseline time period of a fixed number of seconds. Comparing lane and speed variability measures during a short secondary task to a baseline "just driving" period of 2 minutes can give the (false) appearance of reduced physical demand compared to baseline driving. Conversely, even pure Auditory-Vocal tasks such as 0-Back and 1-Back with a forced task time of a fixed duration (say 2 minutes) will show a (false) increase in physical demand (say, lane and speed variability) if they are mistakenly compared to a shorter baseline period. Thus, even if the task itself has minimal physical demand, apparent demand will falsely arise in the measurements from the mere fact that the task takes time, and time spent driving, whether actual or simulated, is itself associated with increases in the physical demand metrics. The sketch below illustrates the duration confound.
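A minimal simulation of this confound, assuming lane position drifts as a random walk (illustrative only, not the study's data):

```python
# Minimal sketch: the SD of random-walk lane position grows with the
# measurement window even when nothing else changes, so comparing a
# 30 s task window against a 120 s baseline window biases the result.
import numpy as np

rng = np.random.default_rng(7)

def mean_sdlp(window_s, hz=10, reps=500):
    # Mean SD of simulated random-walk lane position over `reps` windows.
    return np.mean([np.cumsum(rng.normal(0, 0.01, window_s * hz)).std()
                    for _ in range(reps)])

print(f"mean SDLP over  30 s windows: {mean_sdlp(30):.3f} m")
print(f"mean SDLP over 120 s windows: {mean_sdlp(120):.3f} m")
# The 120 s "baseline" shows higher variability purely because it is
# longer, so baseline windows should be duration-matched to the task.
```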

Limitations of Dimensional Model Extension

Cognitive Demand Metric Limitations

The companion paper [1] pointed out the need to control for potential confounding effects of road geometry in simulator, closed-road, or open-road studies. This limitation does not apply to the surrogate driving setup in the current study, in which the video scene was randomized and thus equated in road geometry across all tasks. In other words, the correct relative rankings of the tasks on the attentional effects of cognitive demand, using the Dimension 2 metrics identified here, were obtained by conducting the task trials under randomized driving conditions. The confounding variable of driving experience was controlled for by selecting participants with a minimum number of years of driving experience. In addition, the confounding variable of age was controlled for by selecting equal numbers of participants in four age groups.

Physical Demand Metric Limitations

Physical demand metrics are also susceptible to biases from confounding variables. For example, older drivers tend to have longer TGT to the device than younger drivers for the identical tasks. That is, tasks performed by older drivers would appear to have greater physical demand than the identical tasks performed by younger drivers. A test attempting to measure physical demand may control for age by balancing its data sample for age. It is also possible to control for age statistically after the fact, if necessary.
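A minimal sketch of such a post-hoc statistical adjustment, using simulated data and an ordinary least-squares model with age as a covariate (hypothetical values throughout):

```python
# Minimal sketch: fit TGT as a function of a task indicator and
# centered age, then read off the age-adjusted task effect.
import numpy as np

rng = np.random.default_rng(3)
n = 100
age = rng.uniform(20, 70, size=n)      # driver ages, years
task = rng.integers(0, 2, size=n)      # 0 = task A, 1 = task B
# Simulated TGT: a 3 s task effect plus a 0.08 s/year age effect.
tgt = 8.0 + 3.0 * task + 0.08 * (age - 45.0) + rng.normal(0, 1.0, size=n)

# Design matrix: intercept, task indicator, centered age.
X = np.column_stack([np.ones(n), task, age - 45.0])
beta, *_ = np.linalg.lstsq(X, tgt, rcond=None)
print(f"age-adjusted task effect: {beta[1]:.2f} s (simulated truth: 3 s)")
```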

Limitations of Extension

The main goal of the current paper was to extend the Dimensional Model of Visual-Manual tasks in [1] to Auditory-Vocal and Mixed-Mode tasks.

The extension to Auditory-Vocal tasks was not straightforward for several reasons, including that there are no glances to a device, because there is no display screen for pure Auditory-Vocal tasks (such as cell phone conversation after dialing and before disconnecting, or the n-Back task tested here). In addition, the Auditory-Vocal tasks in the current study had a fixed duration of 30 seconds, so TaskTime could not sensibly be used for these Auditory-Vocal tasks in comparison to the Mixed-Mode and Visual-Manual tasks, for which TaskTime was not constant but determined by the test participant. Nonetheless, the use of many metrics in this study within the Dimensional Model framework and analysis was sufficient to place the 0-Back and 1-Back tasks in their expected locations in the 2-D score space, showing an increment in the attentional effects of cognitive demand compared to baseline with no task, and an increase in cognitive demand for 1-Back compared to 0-Back, consistent with previous studies [4, Annex E; 38].

Difficulties were also found in extending the Dimensional Model to Mixed-Mode tasks (which are usually Auditory-Vocal-visual tasks in today's vehicles, and are often mislabeled as "voice" tasks). For example, the relationship between the NHTSA Guidelines [12] glance metrics was fundamentally different for Mixed-Mode tasks than for Visual-Manual tasks (see Appendix C).

Despite these difficulties, the Dimensional Model has been preliminarily extended in this paper to Auditory-Vocal and Mixed-Mode tasks.

Questions for Future Research

Additional Validation Datasets

This paper provides a single validation dataset for the Dimensional Model of Driver Demand, after extension to Mixed-Mode and Auditory-Vocal tasks. However, it would be fruitful to test the model on additional datasets that contain Auditory-Vocal, Mixed-Mode, and Visual-Manual tasks in the same study. Examples include the simulator, closed-road, and open-road CAMP-DWM study dataset [39,40], the MIT AgeLab open-road study using a 2010 Lincoln vehicle [41], and the ISO DRT studies including Auditory-Vocal and Visual-Manual tasks measured across 10 sites with widely varying setups (static, surrogate driving, simulator, closed-road, and open-road) [42].

Prediction of Unknown Metric Values or Across Venues

The extended Dimensional Model, as per the original Dimensional Model [1], hypothesizes that a simulator or surrogate driving test without using glance metrics should be able to predict the glance metrics for a track or road test. Likewise, because the dimensions were essentially the same whether in simulator, track, or open road venues as shown in [1], the task scores in one venue should be able to make reasonably valid estimates of the relative task scores in another venue.

Estimating Error Boundaries of Task Scores

As mentioned in [1], it would be useful to be able to estimate and display the uncertainty level of the location of the points in the final 2-D score space, to increase overall understanding of the dataset. The uncertainty level is equivalent to solving for the error ellipses around the scores in the 2-D space. Such error ellipses are useful, for example, in obtaining a sense of the statistical significance of the distances between scores in the final 2-D score space. Such error ellipses are also useful for correctly setting a driver demand criterion or criteria based on a task's location in the 2-D score space. The criterion or criteria need(s) to be set large enough to account for the inherent measurement uncertainty characterizing the benchmark task used for setting the criterion or criteria [43].
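A minimal sketch of how such an ellipse could be computed, assuming bootstrap replicates of a task's (S1, S2) scores are available (the replicates below are simulated for illustration):

```python
# Minimal sketch: a 95% error ellipse for one task's location in the
# 2-D score space, from the eigendecomposition of the covariance of
# simulated bootstrap replicates of its (S1, S2) scores.
import numpy as np

rng = np.random.default_rng(11)
scores = rng.multivariate_normal(mean=[1.2, -0.4],
                                 cov=[[0.05, 0.02], [0.02, 0.03]],
                                 size=200)

center = scores.mean(axis=0)
cov = np.cov(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 95% coverage for a bivariate normal: chi-square(df=2) quantile 5.991.
half_axes = np.sqrt(5.991 * eigvals)
angle = np.degrees(np.arctan2(eigvecs[1, 1], eigvecs[0, 1]))
print(f"center: {center.round(2)}, half-axes: {half_axes.round(2)}, "
      f"orientation: {angle:.0f} deg")
```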

Estimating Relative Crash Risk

It is outside the scope of this paper to estimate relative crash risk for secondary tasks assessed according to the extended Dimensional Model and metrics analyzed here, which were drawn from surrogate driving in experiments, and not naturalistic driving. However, future research should theoretically be able to make more valid estimates of relative crash risk for Visual-Manual, Auditory-Vocal, and Mixed-Mode tasks based on the contributions of this research. In particular, the Dimensional Model implies that relative risk estimates for Visual-Manual tasks will need to take into account not only the physical demand, but also the attentional effects of cognitive demand, which arise from all modes of secondary tasks performed concurrently while driving. It should therefore be theoretically possible to use the scores of the tasks on the two dimensions as identified by the extended Dimensional Model to make valid estimates of the relative risks of any mode of task in naturalistic driving studies, although this hypothesis remains to be convincingly tested.

Therefore, it should be noted that in the absence of this additional research, the relative measures of physical demand and the attentional effects of cognitive demand shown in this study do not necessarily indicate a concern in terms of driving safety. As noted in the previous paragraph, linking measured levels of cognitive and physical demand to real-world safety requires additional research.

It should also be noted that the naturalistic driving study data analyzed by the lead author to date indicate that the relative risk of crashes and near-crashes associated with Auditory-Vocal tasks while driving (such as cell phone conversation [44,45,46,47,48,49,50], or use of an embedded hands-free personal calling device such as OnStar [27]) is de minimis, or even protective. It therefore may be that the attentional effects of cognitive demand associated with the device-related tasks investigated here do not rise to the level of a safety-related concern.

SUMMARY/CONCLUSION

There are two primary underlying dimensions of demand on drivers when they perform Visual-Manual, Mixed-Mode, and Auditory-Vocal tasks while engaged in surrogate driving. Just two scores on these dimensions can account for most of the variance in 22 driver performance metrics for a given task. Consistent with the companion study [1], these dimensions were termed physical demand, associated with surrogates for lateral and longitudinal vehicle metrics, and the attentional effects of cognitive demand, associated with event detection metrics and their surrogates. Scores on the cognitive dimension can be either positive or negative, with some metrics indicating an increase in cognitive demand and others indicating a decrease (what we here termed "cognitive facilitation").

The physical and cognitive demand dimensions described here represented not only the "ground truth" metrics of lateral and longitudinal vehicle performance and event detection, but also surrogates for those metrics such as task time, eye glances, etc. We showed how the metrics of task time and misses of a tactile event could assess the two dimensions, although other metric pairs are suitable as well according to the extended Dimensional Model.

In conclusion, the extended Dimensional Model of Driver Demand allows for a common simplified understanding of the two dimensions underlying almost all the commonly-used measures of the effects of Visual-Manual, Auditory-Vocal, and Mixed-Mode secondary tasks on driver performance, including event detection performance on a cognitive demand dimension, and surrogate measures of lateral and longitudinal variability on a physical demand dimension.

REFERENCES

[1.] Young, R., Seaman, S., and Hsieh, L., "The Dimensional Model of Driver Demand: Visual-Manual Tasks," SAE Int. J. Trans. Safety 4(1):33-71, 2016, doi:10.4271/2016-01-1423.

[2.] Young, R., "Self-Regulation Minimizes Crash Risk from Attentional Effects of Cognitive Load during Auditory-Vocal Tasks," SAE Int. J. Trans. Safety 2(1):67-85, 2014, doi:10.4271/2014-01-0448.

[3.] Ho, C. and Spence, C. The Multisensory Driver: Implications for Ergonomic Car Interface Design. Hampshire, England: Ashgate Publishing Ltd., http://www.psy.ox.ac.uk/publications/464237, 2008.

[4.] ISO/DIS 17488, "Road Vehicles--Transport Information and Control Systems--Detection-Response Task (DRT) for Assessing Attentional Effects of Cognitive Load in Driving," 2015, 75 pages, http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59887.

[5.] Merat, N. and Jamson, A., "The Effect of Stimulus Modality on Signal Detection: Implications for Assessing the Safety of in-Vehicle Technology," Human Factors: The Journal of the Human Factors and Ergonomics Society 50(1):145-158, http://www.ncbi.nlm.nih.gov/pubmed/18354978, 2008, doi:10.1518/001872008X250656.

[6.] Foley, J., Young, R., Angell, L. and Domeyer, J., "Towards Operationalizing Driver Distraction," Proceedings of 7th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Bolton Landing, New York, http://drivingassessment.uiowa.edu/sites/default/files/DA2013/Papers/010_Foley_0.pdf, June 17-20 2013.

[7.] Bowyer, S., Hsieh, L., Moran, J., Young, R. et al., "Conversation Effects on Neural Mechanisms Underlying Reaction Time to Visual Events While Viewing a Driving Scene Using Meg," Brain Research 1251:151-161, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2741688/, 2009, doi:10.1016/j.brainres.2008.10.001.

[8.] Klauer, S.G., Guo, F., Sudweeks, J. and Dingus, T.A., "An Analysis of Driver Inattention Using a Case-Crossover Approach on 100-Car Data: Final Report," DOT HS 811 334, U.S. Department of Transportation, Washington, D.C., http://www.nhtsa.gov/DOT/NHTSA/NVS/Crash%20Avoidance/Technical%20Publications/2010/811334.pdf. 2010.

[9.] Young, R.A., "Drowsy Driving Increases Severity of Safety-Critical Events and Is Decreased by Cell Phone Conversation," Proceedings of 3rd International Conference on Driver Distraction and Inattention, Gothenburg, Sweden, http://document.chalmers.se/download?docid=19e9af22-8aec-4b5e-95d5-c24d9d286020, September 4-6, 2013.

[10.] Seaman, S., Hsieh, L., and Young, R., "Driver Demand: Eye Glance Measures," SAE Technical Paper 2016-01-1421, 2016, doi:10.4271/2016-01-1421.

[11.] SAE International Surface Vehicle Recommended Practice, "Guidelines for Speech Input and Audible Output in a Driver Vehicle Interface," SAE Standard J2988, Issued June 2015.

[12.] NHTSA, "Visual-Manual NHTSA Driver Distraction Guidelines for Portable and Aftermarket Electronic Devices (Docket NHTSA-2013-0137)," http://www.regulations.gov/#!searchResults;rpp=25;po=0;s=NHTSA-2013-0137;dct=FR%252BPR%252BN%252BO%252BPS%252BSR, 2014.

[13.] Alliance, "Statement of Principles, Criteria and Verification Procedures on Driver-Interactions with Advanced in-Vehicle Information and Communication Systems, June 26, 2006 Version," Alliance of Automobile Manufacturers Driver Focus-Telematics Working Group, Washington, DC, http://www.autoalliance.org/index.cfm?objectid=D6819130-B985-11E1-9E4C000C296BA163, 2006.

[14.] Stata, Version 13.1. http://www.stata.com/, 2015.

[15.] Minitab[R], Version 17.2.1. https://www.minitab.com/en-us/, 2015.

[16.] SAE International Surface Vehicle Recommended Practice, "Operational Definitions of Driving Performance Measures and Statistics," SAE Standard J2944, Issued June 2015.

[17.] SAE International Surface Vehicle Recommended Practice, "Definitions and Experimental Measures Related to the Specification of Driver Visual Behavior Using Video Based Techniques," SAE Standard J2396, Issued July 2000.

[18.] Cohen, J., Cohen, P., West, S.G. and Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 3rd ed. Mahwah, New Jersey, USA: Lawrence Erlbaum Associates, Inc., 2003.

[19.] Last, J.M. A Dictionary of Epidemiology. 4th Ed. New York: Oxford University Press, 2001.

[20.] Rothman, K., Greenland, S. and Lash, T. Modern Epidemiology. 3rd Ed. Philadelphia, PA: Lippincott Williams & Wilkins, 2008.

[21.] Young, R. and Angell, L., "The Dimensions of Driver Performance During Secondary Manual Tasks," proceedings of Driving Assessment 2003: The Second International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Park City, Utah, http://drivingassessment.uiowa.edu/DA2003/pdf/25_Youngformat.pdf, July 21-24 2003.

[22.] NHTSA, "Proposed Driver Workload Metrics and Methods Project," United States Department of Transportation, Washington, DC USA, http://www-nrd.nhtsa.dot.gov/departments/Human%20Factors/driverdistraction/PDF/32.PDF.

[23.] Young, R., "Event Detection: The Second Dimension of Driver Performance for Visual-Manual Tasks," SAE Int. J. Passeng. Cars -Electron. Electr. Syst. 5(1):297-316, 2012, doi:10.4271/2012-01-0964.

[24.] Strayer, D.L., Cooper, J.M., Turrill, J., Coleman, J. et al., "Measuring Cognitive Distraction in the Automobile," AAA Foundation for Traffic Safety, Washington, DC, https://www.aaafoundation.org/sites/default/files/MeasuringCognitiveDistractions.pdf. 2013.

[25.] Strayer, D.L., Turrill, J., Coleman, J.R., Ortiz, E.V. et al., "Measuring Cognitive Distraction in the Automobile II: Assessing in-Vehicle Voice-Based Interactive Technologies," AAA Foundation for Traffic Safety, Washington, DC, https://www.aaafoundation.org/sites/default/files/Cog%20Distraction%20Phase%202%20FINAL%20FTS%20FORMAT_0.pdf. 2014.

[26.] Strayer, D.L., Turrill, J., Cooper, J.M., Coleman, J.R. et al., "Assessing Cognitive Distraction in the Automobile," Human Factors: The Journal of the Human Factors and Ergonomics Society 57(8):1300-1324, http://hfs.sagepub.com/content/57/8/1300.abstract, 2015, doi:10.1177/0018720815575149.

[27.] Young, R.A. and Schreiner, C., "Real-World Personal Conversations Using a Hands-Free Embedded Wireless Device While Driving: Effect on Airbag-Deployment Crash Rates," Risk Analysis 29(2):187-204, https://www.researchgate.net/publication/23464587_Real-world_personal_conversations_using_a_hands-free_embedded_wireless_device_while_driving_effect_on_airbag-deployment_crash_rates?ev=prf_cit, 2009, doi:10.1111/j.1539-6924.2008.01146.x.

[28.] Young, R.A., "Driver Compensation: Impairment or Improvement?" Human Factors: The Journal of the Human Factors and Ergonomics Society 57(8):1334-1338, http://hfs.sagepub.com/content/57/8/1334.abstract, 2015, doi:10.1177/0018720815585053.

[29.] Green, P., "Visual and Task Demands of Driver Information Systems," University of Michigan Transportation Research Institute (UMTRI), http://umich.edu/~driving/publications/UMTRI-98-16.pdf, 1998.

[30.] Green, P., "The 15-Second Rule for Driver Information Systems," Proceedings of the ITS America Ninth Annual Meeting, http://umich.edu/~driving/publications/ITSA-Green1999.pdf, 1999.

[31.] Green, P., "Estimating Compliance with the 15-Second Rule for Driver-Interface Usability and Safety," Proceedings of the Human Factors and Ergonomics Society Annual Meeting, http://umich.edu/~driving/publications/HFES-Green1999.pdf, 1999.

[32.] Green, P. "Why Safety and Human Factors/Ergonomics Standards Are So Difficult to Establish." In Human Factors in Transportation, Communication, Health and the Workplace, edited by de Waard D., Brookhuis K.A., Moraal J. and Toffetti A. Maastricht, the Netherlands: Shaker Publishing, http://www.umich.edu/~driving/publications/PGreen-Turin.pdf, 2002.

[33.] Green, P. and Shah, R., "Safety Vehicles Using Adaptive Interface Technology (Task 6): Task Time and Glance Measures of the Use of Telematics: A Tabular Summary of the Literature," University of Michigan, Transportation Research Institute, Ann Arbor, MI, http://deepblue.lib.umich.edu/bitstream/handle/2027.42/92350/102882.pdf?sequence=1. 2004.

[34.] SAE International Surface Vehicle Recommended Practice, "Calculation of the Time to Complete in Vehicle Navigation and Route Guidance Tasks," SAE Standard J2365, Issued May 2002.

[35.] Foley, J., "Lessons Learned from the Development of J2364," SAE Technical Paper 2005-01-1847, 2005, doi:10.4271/2005-01-1847.

[36.] Burns, P., Harbluk, J., Foley, J.P. and Angell, L., "The Importance of Task Duration and Related Measures in Assessing the Distraction Potential of in-Vehicle Tasks," Proceedings of the Second International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI 2010), Pittsburgh, Pennsylvania, USA, http://www.auto-ui.org/10/proceedings/p12.pdf, November 11-12, 2010.

[37.] Johansson, E., Carsten, O., Janssen, W., Jamson, H., Jamson, S., Ostlund, J., Brouwer, R., Mouta, S. et al., "HASTE Deliverable 3: Validation of the HASTE Protocol Specification," European Commission Competitive and Sustainable Growth Programme, http://www.its.leeds.ac.uk/projects/haste/downloads/Haste_D3.pdf, 2004.

[38.] Young, R.A., Hsieh, L. and Seaman, S., "The Tactile Detection Response Task: Preliminary Validation for Measuring the Attentional Effects of Cognitive Load," Proceedings of 7th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Bolton Landing, NY, http://drivingassessment.uiowa.edu/sites/default/files/DA2013/Papers/012_Young_0.pdf, June 17-20, 2013.

[39.] Angell, L., Auflick, J., Austria, P., Kochhar, D. et al., "Driver Workload Metrics Task 2 Final Report," DOT HS 810 635, Crash Avoidance Metrics Partnership (CAMP), http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/Driver%20Distraction/Driver%20Workload%20Metrics%20Final%20Report.pdf. 2006.

[40.] Angell, L., Auflick, J., Austria, P., Biever, W. et al., "Driver Workload Metrics Project, Task 2 Final Report, Appendices," National Highway Traffic Safety Administration, http://www.nhtsa.gov/DOT/NHTSA/NRD/Multimedia/PDFs/Crash%20Avoidance/2006/Driver%20Workload%20Metrics_appendices.pdf, 2006.

[41.] Reimer, B., Mehler, B., Dobres, J. and Coughlin, J., "The Effects of a Production Level 'Voice-Command' Interface on Driver Behavior: Reported Workload, Physiology, Visual Attention, and Driving Performance (Technical Report 2013-17a)," MIT AgeLab, Cambridge, MA USA, http://www.toyota.com/csrc/printable/demands-of-invehicleinterfaces.pdf. 2013.

[42.] ISO/DIS 17488, "Road Vehicles--Transport Information and Control Systems--Detection-Response Task (DRT) for Assessing Attentional Effects of Cognitive Load in Driving," 2015, 75 pages, http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59887.

[43.] Young, R.A., "Evaluation of Total Eyes-Off-Road Time Glance Criterion in NHTSA Visual-Manual Guidelines," Proceedings of Transportation Research Board 95th Annual Meeting, http://trid.trb.org/view.aspx?id=1392377, January 2016.

[44.] Young, R.A., "Driving Consistency Errors Overestimate Crash Risk from Cellular Conversation in Two Case-Crossover Studies," Proceedings of Proceedings of the Sixth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Lake Tahoe, CA, http://drivingassessment.uiowa.edu/sites/default/files/DA2011/Papers/043_Young.pdf, June 27-30 2011.

[45.] Young, R.A., "Case-Crossover Studies of Cellular Conversation While Driving: Part-Time Driving in Control Windows Introduces Positive Bias in Relative Risk Estimates," Epidemiology, 2012.

[46.] Young, R.A., "Supplemental Digital Content: Control Analyses," Epidemiology 23(1), http://links.lww.com/EDE/A535, 2012.

[47.] Young, R., "An Unbiased Estimate of the Relative Crash Risk of Cell Phone Conversation while Driving an Automobile," SAE Int. J. Trans. Safety 2(1):46-66, 2014, doi:10.4271/2014-01-0446.

[48.] Young, R., "Revised Odds Ratio Estimates of Secondary Tasks: A Re-Analysis of the 100-Car Naturalistic Driving Study Data," SAE Technical Paper 2015-01-1387, 2015, doi:10.4271/2015-01-1387.

[49.] Young, R.A. "Cell Phone Conversation and Relative Crash Risk." In Encyclopedia of Mobile Phone Behavior, edited by Yan Zheng. 1274-1306. Hershey, PA, USA: IGI Global, http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-4666-8239-9.ch102, 2015, doi:10.4018/978-1-4666-8239-9.ch102.

[50.] Young, R.A., "Driver Compensation: Impairment or Improvement?," Human Factors: The Journal of the Human Factors and Ergonomics Society 57:1334-1338, http://hfs.sagepub.com/content/57/8/1334.full.pdf+html, 2015, doi:10.1177/0018720815585053

[51.] http://www.toyota.com/csrc/driver-distraction-cognitive-modelvalidation.html.

CONTACT INFORMATION

richardyoung9@gmail.com

ACKNOWLEDGMENTS

The analyses in this paper were supported by a research contract from the Toyota Collaborative Safety Research Center to our research team at Wayne State University from July 1, 2011 to December 31, 2014 [51]. We thank Huizhong Guo, Graduate Research Assistant in the Human Factors & Statistical Modeling Lab, Department of Civil and Environmental Engineering, University of Washington, for verifying the statistical calculations in this paper and insightful technical comments on Appendix C.

DEFINITIONS/ABBREVIATIONS

Adjustment - "A summarizing procedure for a statistical measure in which the effects of differences in composition of the populations being compared have been minimized by statistical methods" [19, p. 3].

CAMP - Crash Avoidance Metrics Partnership

Component - A set of values of linearly uncorrelated metrics

CHMSL - Center High Mount Stop Lamp

Crude - The raw means, unadjusted for confounding or interaction effects

D1 - Dimension 1 (synonym for Component 1)

D2 - Dimension 2 (synonym for Component 2)

df - degrees of freedom (for a statistical test)

Dimension - As used here, the minimum number of coordinates needed to specify the location of a task within the mathematical space spanned by its driver demand metrics. This definition is a more mathematical one than the dictionary definition of dimension as simply an attribute or a feature of a task (which was the definition used in the CAMP-DWM study [39, Section 8.4.1.1, p. 8-23]).

Driver Performance - A more general term than driving performance, as it also includes event detection and response metrics as well as driving performance metrics.

Driving Performance - Its narrow definition refers only to "ground truth" lane and speed maintenance metrics (e.g. Alliance Principle 2.1B [13, p. 40]). Its broader definition may include surrogate metrics that are predictive of (i.e., correlated with) lane and speed maintenance metrics.

DRT - Detection Response Task [4]

DWM - Driver Workload Metrics

Event Detection and Response - Detection of events in any sensory modality as measured by hits and misses, and the response times to the hits [4]. Event Detection and Response metrics are "ground truth" driver performance metrics along with lane and speed variability metrics.

"ground truth" metrics - The broader definition as used in [22] refers to any metric whose values are gathered on a closed test track or an open road with traffic. The narrow definition as used herein applies only to driver performance metrics.

ISO - International Organization for Standardization

LGF - Long Glance Frequency

LGP - Long Glance Proportion

Metric - A synonym for variable or measure. Used here to mean a variable used to measure the driver demand of a secondary task.

Miss% - Percent of total events missed

Mixed-Mode - Visual-Manual and Auditory-Vocal modalities present in the same task. Also called Multi-Modal. For example, Auditory-Vocal tasks accompanied by visual images on a screen, are "Mixed-Mode" tasks. Mixed-Mode tasks are sometimes incorrectly called "voice" tasks, a term which should be restricted to tasks with only Auditory-Vocal modalities (see "Voice Tasks").

MSGD - Mean Single Glance Duration

Multi-Modal - Synonym for Mixed-Mode

NavDest - Navigation Destination Entry task

NHTSA - National Highway Traffic Safety Administration

NR - Not Road

n-Back - An Auditory-Vocal task requiring the participant to recall the digit spoken n digits prior to the currently spoken digit.

OED - Object and Event Detection

Paired tasks - Tasks that have the same start and end points even though they have different interface modes.

PC1 - Principal Component 1 (the single component that explains the highest possible amount of variance in the dataset by itself)

PC2 - Principal Component 2 (the single component that explains the remaining highest possible amount of variance in the dataset by itself, after removal of PC1)

PCA - Principal Components Analysis

POI - Point of interest

RDRT - Remote Detection Response Task

RT - Response time

S1 - Score on first dimension

S2 - Score on second dimension

SD headway - Standard deviation of headway

SDLP - Standard deviation of lane position

Static - Doing a task by itself with no simulated or actual driving

Surrogate Driving - A setup using a video of a driving scene, wherein the participant turns the steering wheel to position a red laser pointer dot in the center of the lane. Originally developed at Wayne State University, with high predictive value for relative driver performance metrics for simulated, closed-road, and open-road driving venues [4, Annex E].

Surrogate Metrics - Metrics such as glance metrics, TaskTime, driver secondary task errors, etc. which are not "ground truth" driver performance metrics, but which may correlate highly with "ground truth" driver performance metrics, and can therefore successfully predict them.

TaskTime - The duration of a task, whether measured statically by itself, or during simulated driving or actual driving on a closed or open road, or in naturalistic driving.

TDRT - Tactile Detection Response Task

TEORT - Total Eyes-Off-Road Time (includes TGT as a subset)

TGT - Total Glance Time to region of interest. In the current study, the regions of interest were the device under test, the windshield, or "other" locations than the device or the windshield.

TR - Task-Related

VM - Visual-Manual

Voice Tasks - Tasks which have only Auditory-Vocal modalities, or at most one manual operation and thus meet the SAE J2988 [11] definition of "hands-free". Auditory-Vocal tasks which include visual images on a screen are not voice tasks, but "Mixed-Mode" tasks.

APPENDIX

APPENDIX A. ANALYSIS DATASET

Table A1 is the dataset of 24 variables and 15 tasks analyzed in this paper.

Table A1. Mean task data (see Appendix Table A2 for metric
abbreviations, definitions and notes).

                      Cluster->        Phys.      Phys.       Phys.
                      Metric->         1          2           3
TaskNum  TaskType     Task             Task Time  Glances_RD  TGT_RD

 1.      none         Baseline         30.64       8.40       28.59
 2.      AV_HF (*)    0Back (*)        25.38       8.39       22.52
 3.      AV_HF (*)    1Back (*)        26.11      11.57       24.02
 4.      mix_HF (*)   DialContact (*)  34.84      14.50       31.03
 5.      mix_HF (*)   RadioPreset (*)  18.02       7.53       16.03
 6.      mix_HF (*)   NavCancel (*)    35.35      17.53       29.88
 7.      mix_HF (*)   RadioTune (*)    19.55       8.46       17.75
 8.      mix          NavDest          72.02      34.44       61.07
 9.      VM           DialContact_VM   17.28       9.07        8.95
10.      VM           RadioPreset_VM    5.59       3.00        3.35
11.      VM           NavCancel_VM     14.09       7.86        7.11
12.      VM           RadioTune_VM     18.86       9.50        9.16
13.      VM           NavDest_VM       68.68      32.86       31.97
14.      VM           PandoraTune_VM   16.75       8.63        8.96
15.      VM           BingPOI_VM       55.27      27.33       26.44

         Phys.          Phys.        Phys.       Phys.
         4              5            6           7
TaskNum  TEORT_NR (**)  Manualsteps  Glances_TR  TGT_TR (***)

 1.       1.70           0            0.097       0.009
 2.       1.50           0            0.042       0.002
 3.       1.42           0            0.028       0.001
 4.       2.95           1            1.847       1.222
 5.       3.90           1            0.583       0.341
 6.       4.83           1            2.333       1.410
 7.       1.31           1            0.375       0.222
 8.       9.79           2            5.542       3.905
 9.       8.03           4            7.722       7.540
10.       2.72           1            1.667       1.978
11.       6.70           3            6.028       5.973
12.       9.18           5            8.014       8.189
13.      36.06          23           28.06       33.22
14.       8.04           4            7.083       6.748
15.      27.95          19           23.29       25.32

         Cog.          Cog.           Cog.         Cog.
         8             9              10           11
TaskNum  MaxGlance_TR  MSGD_TR (***)  Glances%_TR  RateGlance_TR

 1.      0.009         0.120           0.031       0.003
 2.      0.002         0.050           0.009       0.002
 3.      0.001         0.040           0.005       0.001
 4.      0.475         0.513           3.49        0.052
 5.      0.217         0.552           1.94        0.033
 6.      0.602         0.520           3.89        0.065
 7.      0.189         0.617           1.11        0.019
 8.      1.379         0.755           5.37        0.076
 9.      1.816         1.016          44.80        0.469
10.      1.589         1.343          36.15        0.318
11.      1.714         1.033          44.44        0.446
12.      2.259         1.134          46.50        0.449
13.      3.468         1.275          50.43        0.418
14.      2.010         1.058          43.37        0.436
15.      2.968         1.178          48.65        0.437

         Cog.     Cog.        Cog.          Cog.        Cog.
         12       13          14            15          16
TaskNum  MSGD_NR  LGP_NR (**)  MinGlance_TR  Miss%_tdrt  RT_tdrt

 1.      0.433    0.019       0.009          5.31       401.4
 2.      0.330    0.023       0.001          7.68       507.4
 3.      0.357    0.025       0.001         15.08       698.4
 4.      0.386    0.018       0.154         17.15       608.9
 5.      0.344    0.006       0.155         14.67       590.9
 6.      0.430    0.013       0.256         20.20       537.5
 7.      0.274    0.006       0.179         15.77       577.9
 8.      0.490    0.020       0.378         13.91       582.8
 9.      1.057    0.052       0.382         18.75       589.7
10.      1.594    0.145       1.136         21.24       582.3
11.      1.085    0.075       0.441         23.98       639.4
12.      1.308    0.146       0.403         25.36       707.2
13.      1.339    0.144       0.238         23.28       677.6
14.      1.179    0.117       0.383         20.63       679.9
15.      1.283    0.131       0.306         21.96       641.9

         Cog.              Cog F.       Cog F.   Cog F.
         17                18           19       20
TaskNum  CompletionErrors  Glances%_RD  MSGD_RD  MaxGlance_RD

 1.      0                 93.3         5.98     10.71
 2.      0                 93.0         5.22      8.81
 3.      0                 92.0         3.10      6.29
 4.      4                 88.7         3.17      6.88
 5.      3                 89.4         3.15      6.04
 6.      6                 84.6         2.50      6.52
 7.      2                 91.0         3.02      6.30
 8.      1                 84.9         2.26      8.09
 9.      4                 50.6         1.04      2.34
10.      5                 58.0         1.30      1.96
11.      5                 48.8         0.94      2.02
12.      2                 46.1         0.94      2.27
13.      9                 44.6         0.97      3.10
14.      5                 49.7         1.05      2.12
15.      5                 45.0         0.95      2.94

         Cog F.        Cog F.
         21            22
TaskNum  MinGlance_RD  Glances%_other

 1.      2.674          6.64
 2.      2.540          6.99
 3.      0.964          7.96
 4.      0.787          7.80
 5.      1.407          8.62
 6.      0.364         11.50
 7.      0.710          7.90
 8.      0.234          9.78
 9.      0.296          4.56
10.      0.784          5.85
11.      0.267          6.75
12.      0.269          7.37
13.      0.126          4.99
14.      0.334          6.93
15.      0.145          6.34

Notes:
(*) Meets SAE J2988 [11] definition of Hands-Free as not requiring more
than 1 button press (e.g., to activate the voice recognition system).
(**) NHTSA Guidelines [12] glance metric, includes time it took to look
back at the windshield as per NHTSA Guidelines.
(***) Alliance Guidelines [13] Principle 2.1 A glance metric.


Table A2 gives the metric abbreviations and definitions for the dataset.

Table A2. Metric abbreviations and definitions.

     Demand       Metric
ID   Type         Abbreviation      Full Name

 1.  Physical     TaskTime          Task time
 2.  Physical     Glances_RD        Number of Glances
 3.  Physical     TGT_RD            Total Glance Time
 4.  Physical     TEORT_NR (**)     Total Eyes-Off-Road Time
 5.  Physical     Steps_man         Manual steps
 6.  Physical     Glances_TR        Number of Glances
 7.  Physical     TGT_TR (***)      Total Glance Time
 8.  Cognitive    MaxGlance_TR      Maximum Glance Duration
 9.  Cognitive    MSGD_TR (***)     Mean Single Glance Duration
10.  Cognitive    Glances%_TR       Percentage of total glances
11.  Cognitive    RateGlances_TR    Glance Rate
12.  Cognitive    MSGD_NR (**)      Mean Single Glance Duration
13.  Cognitive    LGP_NR (**)       Long Glance Proportion
14.  Cognitive    MinGlance_TR      Minimum Glance Duration
15.  Cognitive    Miss%_tdrt        TDRT percentage of misses
16.  Cognitive    RT_tdrt           TDRT response time
17.  Cognitive    CompletionErrors  Task completion errors
18.  Cog. Facil.  Glances%_RD       Percentage of total glances
19.  Cog. Facil.  MSGD_RD           Mean Single Glance Duration
20.  Cog. Facil.  MaxGlance_RD      Maximum Glance Duration
21.  Cog. Facil.  MinGlance_RD      Minimum Glance Duration
22.  Cog. Facil.  Glances%_other    Percentage of total glances


     Glance    # Trials
ID   Target    Averaged  Units

 1.  NA        3         s
 2.  Road      3         count
 3.  Road      3         s
 4.  Not Road  3         s
 5.  NA        NA        count
 6.  Device    3         count
 7.  Device    3         s
 8.  Device    3         s
 9.  Device    3         ms
10.  Device    3         percentage
11.  Device    3         glances/s
12.  Not Road  3         ms
13.  Not Road  3         proportion
14.  Device    3         s
15.  NA        3         %
16.  NA        3         ms
17.  NA        NA        count
18.  Road      3         count
19.  Road      3         ms
20.  Road      3         s
21.  Road      3         s
22.  Other     3         percentage



ID   Metric Definition (****)

 1.  Total duration of the task in one trial,
     as measured by the start and stop signals
 2.  Number of glances made to the windshield
 3.  Summed duration of all glances to the windshield
 4.  Summed duration of all glances away from
     the windshield during trial 1 (**)
 5.  Analytic number of manual steps (*)
 6.  Number of glances to the display or controls
     of the device under test
 7.  Summed duration of all glances to display or
     controls of the device under test (***)
 8.  Maximum glance duration made to the display or
     controls of the device under test
 9.  Mean duration of all single glances to the
     display or controls of the device under test (***)
10.  Percentage of total glances during TaskTime
     which were task-related
11.  Rate of glances during TaskTime which
     were task-related
12.  Mean Single Glance Duration away from windshield
     during trial 1 (**)
13.  Proportion of glances >2 seconds away from
     windshield during trial 1 (**)
14.  Minimum single glance duration made to the display
     or controls of the device under test
15.  Mean percentage of misses to Tactile Detection
     Response Task events
16.  Mean Response Time for hits to Tactile Detection
     Response Task events
17.  Sum of all trials in which a participant failed
     to complete the task successfully and started over
18.  Percentage of all glances during TaskTime
     to windshield
19.  Mean time a participant glanced to the windshield
20.  Maximum glance duration made to the windshield
21.  Minimum glance duration made to the windshield
22.  Percentage of total glances during TaskTime to
     locations other than the road or the device

Notes:
(*) Minimum number of hard button presses, knob turns (each turn
requiring a separate grasp counts as a step), and screen touches
required to complete task according to analysis of the design.
(**) NHTSA Guidelines [12] glance metric, includes time it took to look
back at the windshield as per NHTSA Guidelines.
(***) Alliance Guidelines [13] glance metric.
(****) All metrics are during TaskTime, and are the average of the
participant's mean across three trials except where indicated.


APPENDIX B. METRIC CORRELATION RESULTS

Table B1 shows the results of the correlations between the metrics in the Appendix A, Table A1 dataset.

Table B1. Summary of correlations between metrics in Table A1. Bolded
correlations are statistically significant at p < 0.05, df = 13. Bold
italicized correlations are statistically significant negative
correlations at p < 0.05, df = 13.

                  1         2           3       4              5
                  TaskTime  Glances_RD  TGT_RD  TEORT_NR (**)  Steps_man

TaskTime           1         0.98        0.87    0.68           0.60
Glances_RD         0.98      1           0.80    0.74           0.66
TGT_RD             0.87      0.80        1       0.23           0.13
TEORT_NR (**)      0.68      0.74        0.23    1              0.99
Steps_man          0.60      0.66        0.13    0.99           1
Glances_TR         0.61      0.68        0.13    0.99           0.99
TGT_TR (***)       0.59      0.66        0.11    0.99           0.99
MaxGlance_TR       0.37      0.49       -0.09    0.85           0.84
MSGD_TR (***)      0.06      0.21       -0.32    0.59           0.58
Glances%_TR        0.00      0.13       -0.45    0.64           0.65
RateGlances_TR    -0.05      0.09       -0.47    0.57           0.58
MSGD_NR (**)      -0.05      0.07       -0.45    0.55           0.57
LGP_NR (**)        0.03      0.13       -0.39    0.61           0.64
MinGlance_TR      -0.26     -0.14       -0.38    0.05           0.04
Miss%_tdrt        -0.03      0.16       -0.37    0.49           0.49
RT_tdrt            0.04      0.19       -0.22    0.41           0.42
CompletionErrors   0.20      0.31       -0.17    0.65           0.65
Glances%_RD       -0.01     -0.15        0.43   -0.64          -0.65
MSGD_RD           -0.05     -0.24        0.30   -0.53          -0.51
MaxGlance_RD       0.25      0.07        0.62   -0.42          -0.44
MinGlance_RD      -0.24     -0.43        0.02   -0.49          -0.46
Glances%_other     0.15      0.15        0.47   -0.38          -0.45

                  6           7             8             9
                  Glances_TR  TGT_TR (***)  MaxGlance_TR  MSGD_TR (***)

TaskTime           0.61        0.59          0.37          0.06
Glances_RD         0.68        0.66          0.49          0.21
TGT_RD             0.13        0.11         -0.09         -0.32
TEORT_NR (**)      0.99        0.99          0.85          0.59
Steps_man          0.99        0.99          0.84          0.58
Glances_TR         1           1.00          0.89          0.64
TGT_TR (***)       1.00        1             0.87          0.62
MaxGlance_TR       0.89        0.87          1             0.88
MSGD_TR (***)      0.64        0.62          0.88          1
Glances%_TR        0.72        0.70          0.92          0.88
RateGlances_TR     0.66        0.64          0.89          0.87
MSGD_NR (**)       0.62        0.62          0.87          0.89
LGP_NR (**)        0.67        0.67          0.86          0.82
MinGlance_TR       0.09        0.08          0.45          0.75
Miss%_tdrt         0.54        0.53          0.75          0.83
RT_tdrt            0.45        0.44          0.57          0.52
CompletionErrors   0.66        0.67          0.70          0.72
Glances%_RD       -0.72       -0.70         -0.92         -0.89
MSGD_RD           -0.58       -0.55         -0.82         -0.90
MaxGlance_RD      -0.50       -0.49         -0.76         -0.87
MinGlance_RD      -0.53       -0.49         -0.70         -0.75
Glances%_other    -0.45       -0.48         -0.49         -0.43

                  10           11              12            13
                  Glances%_TR  RateGlances_TR  MSGD_NR (**)  LGP_NR (**)

TaskTime           0.00        -0.05           -0.05          0.03
Glances_RD         0.13         0.09            0.07          0.13
TGT_RD            -0.45        -0.47           -0.45         -0.39
TEORT_NR (**)      0.64         0.57            0.55          0.61
Steps_man          0.65         0.58            0.57          0.64
Glances_TR         0.72         0.66            0.62          0.67
TGT_TR (***)       0.70         0.64            0.62          0.67
MaxGlance_TR       0.92         0.89            0.87          0.86
MSGD_TR (***)      0.88         0.87            0.89          0.82
Glances%_TR        1            0.99            0.94          0.89
RateGlances_TR     0.99         1               0.92          0.84
MSGD_NR (**)       0.94         0.92            1             0.96
LGP_NR (**)        0.89         0.84            0.96          1
MinGlance_TR       0.54         0.54            0.73          0.61
Miss%_tdrt         0.78         0.79            0.74          0.71
RT_tdrt            0.56         0.56            0.48          0.55
CompletionErrors   0.64         0.61            0.60          0.55
Glances%_RD       -1.00        -0.99           -0.94         -0.89
MSGD_RD           -0.84        -0.85           -0.78         -0.71
MaxGlance_RD      -0.90        -0.90           -0.86         -0.79
MinGlance_RD      -0.62        -0.64           -0.53         -0.48
Glances%_other    -0.64        -0.60           -0.61         -0.56

                  14            15          16       17
                  MinGlance_TR  Miss%_tdrt  RT_tdrt  CompletionErrors

TaskTime          -0.26         -0.03        0.04     0.20
Glances_RD        -0.14          0.16        0.19     0.31
TGT_RD            -0.38         -0.37       -0.22    -0.17
TEORT_NR (***)     0.05          0.49        0.41     0.65
Steps_man          0.04          0.49        0.42     0.65
Glances_TR         0.09          0.54        0.45     0.66
TGT_TR (***)       0.08          0.53        0.44     0.67
MaxGlance_TR       0.45          0.75        0.57     0.70
MSGD_TR (***)      0.75          0.83        0.52     0.72
Glances%_TR        0.54          0.78        0.56     0.64
RateGlances_TR     0.54          0.79        0.56     0.61
MSGD_NR (**)       0.73          0.74        0.48     0.60
LGP_NR (**)        0.61          0.71        0.55     0.55
MinGlance_TR       1             0.55        0.20     0.41
Miss%_tdrt         0.55          1           0.76     0.73
RT_tdrt            0.20          0.76        1        0.37
CompletionErrors   0.41          0.73        0.37     1
Glances%_RD       -0.54         -0.81       -0.58    -0.65
MSGD_RD           -0.61         -0.94       -0.76    -0.68
MaxGlance_RD      -0.64         -0.90       -0.72    -0.65
MinGlance_RD      -0.44         -0.87       -0.73    -0.63
Glances%_other    -0.24         -0.17       -0.16    -0.24

                  18           19       20            21
                  Glances%_RD  MSGD_RD  MaxGlance_RD  MinGlance_RD

TaskTime          -0.01         -0.05     0.25         -0.24
Glances_RD        -0.15         -0.24     0.07         -0.43
TGT_RD             0.43          0.30     0.62          0.02
TEORT_NR (***)    -0.64         -0.53    -0.42         -0.49
Steps_man         -0.65         -0.51    -0.44         -0.46
Glances_TR        -0.72         -0.58    -0.50         -0.53
TGT_TR (***)      -0.70         -0.55    -0.49         -0.49
MaxGlance_TR      -0.92         -0.82    -0.76         -0.70
MSGD_TR (***)     -0.89         -0.90    -0.87         -0.75
Glances%_TR       -1.00         -0.84    -0.90         -0.62
RateGlances_TR    -0.99         -0.85    -0.90         -0.64
MSGD_NR (**)      -0.94         -0.78    -0.86         -0.53
LGP_NR (**)       -0.89         -0.71    -0.79         -0.48
MinGlance_TR      -0.54         -0.61    -0.64         -0.44
Miss%_tdrt        -0.81         -0.94    -0.90         -0.87
RT_tdrt           -0.58         -0.76    -0.72         -0.73
CompletionErrors  -0.65         -0.68    -0.65         -0.63
Glances%_RD        1             0.86     0.90          0.65
MSGD_RD            0.86          1        0.93          0.92
MaxGlance_RD       0.90          0.93     1             0.73
MinGlance_RD       0.65          0.92     0.73          1
Glances%_other     0.59          0.26     0.47         -0.01

                  22
                  Glances%_other

TaskTime           0.15
Glances_RD         0.15
TGT_RD             0.47
TEORT_NR (***)    -0.38
Steps_man         -0.45
Glances_TR        -0.45
TGT_TR (***)      -0.48
MaxGlance_TR      -0.49
MSGD_TR (***)     -0.43
Glances%_TR       -0.64
RateGlances_TR    -0.60
MSGD_NR (**)      -0.61
LGP_NR (**)       -0.56
MinGlance_TR      -0.24
Miss%_tdrt        -0.17
RT_tdrt           -0.16
CompletionErrors  -0.24
Glances%_RD        0.59
MSGD_RD            0.26
MaxGlance_RD       0.47
MinGlance_RD      -0.01
Glances%_other     1

Notes:
(**) NHTSA Guidelines [12] glance metric.
(***) Alliance Guidelines [13] glance metric.


Table B1 shows that within the three clusters identified by the cluster analysis in Figure 3 in the main body, the correlations are especially strong in the positive direction (boxed values). The metrics in the physical demand cluster (grey metric names) are tightly correlated with one another (grey colored cells, average correlation 0.73), and not as highly correlated with the metrics in the other two clusters. The metrics in the cognitive demand cluster (red metric names) are likewise tightly correlated with one another (red colored cells, average correlation 0.73), and not as highly correlated with the metrics in the physical demand cluster. The metrics in the cognitive facilitation cluster (metric names shaded green) are negatively correlated with those in the cognitive demand cluster (average correlation -0.69), providing further confirmation that the metrics in these two clusters are correctly identified as opponent processes. These two clusters form only one dimension between them, with the cognitive demand metrics loading on the positive end of the cognitive dimension, and the cognitive facilitation metrics loading on the negative end.

APPENDIX C. INTERACTIONS BETWEEN GLANCE METRICS

A second companion paper [10] analyzes the NHTSA Guidelines [12] glance metrics of TEORT, LGP, and MSGD that are also included in the dataset examined here. However, it is not well recognized that there are interactions between these glance variables that can bias the results for any one of the metrics if the data are examined one metric at a time, as is common practice in the driving safety literature. For example, tasks with higher proportions of long single glances (LGP) inherently give rise to longer mean single glance durations (MSGD), because MSGD is an average across all glances, including long glances. In other words, if a task gives rise to a relatively higher LGP, then MSGD will arithmetically increase compared to when LGP was relatively smaller. The converse is also true: if a task has a higher MSGD, then there is a higher probability that some glances will be longer than 2 s, increasing the LGP. This inherent interaction between LGP and MSGD is especially strong if there are relatively few glances. For example, a short-duration task with only two glances, each longer than two seconds, will have the maximum possible LGP of 1, and its MSGD will necessarily be greater than 2 s.
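This arithmetic coupling can be illustrated with a minimal sketch in Python. The glance_metrics helper and the glance durations below are illustrative inventions, not values from the study data; the two-glance case is the extreme example described above.

def glance_metrics(durations):
    # Compute the three NHTSA glance metrics from a list of
    # off-road glance durations, in seconds.
    msgd = sum(durations) / len(durations)                  # mean single glance duration
    lgp = sum(d > 2.0 for d in durations) / len(durations)  # proportion of glances > 2 s
    teort = sum(durations)                                  # total eyes-off-road time
    return msgd, lgp, teort

# Two glances, both longer than 2 s: LGP is at its maximum of 1.0,
# and MSGD necessarily exceeds 2 s.
print(glance_metrics([2.4, 2.8]))                      # (2.6, 1.0, 5.2)

# Six short glances with the same TEORT: LGP = 0 and MSGD < 1 s.
print(glance_metrics([0.9, 0.8, 1.0, 0.9, 0.8, 0.8]))  # (0.866..., 0.0, 5.2)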

As a result of this intrinsic interaction, there is the possibility of an indirect effect of LGP on the correlation between MSGD and TEORT. Conversely, there could be an indirect effect of MSGD on the correlation between LGP and TEORT. For example, long single glances have a potentially direct correlation with TEORT, because they are summed (along with all other glance durations) into TEORT. However, they also have a direct correlation with MSGD, as mentioned. Because MSGD is in turn correlated with TEORT, LGP has not only a direct causal effect on TEORT but possibly also an indirect causal effect, by acting on MSGD, which in turn acts on TEORT. If the correlation between LGP and MSGD is sufficiently high, it can bias the mean TEORT metric through such interactions. That is, there could be an inherent bias in the TEORT glance metric when it is considered individually. To avoid such bias, it is necessary to consider the glance metrics as a group, including their interactions. The TEORT values can then be adjusted to eliminate the bias, by controlling for the interaction effects using statistical procedures.

It is also possible that any interaction effects on TEORT might differ for Mixed-Mode and Visual-Manual tasks, a question also examined in this Appendix. As a result, any method that attempts to set criteria based on single metrics has the potential to be invalid, because it could make both false positive and false negative errors by ignoring interaction effects between metrics. PCA, like multiple regression, minimizes or eliminates the effects of such interactions. The Dimensional Model, because it is based on the PCA method, eliminates the correlations between the task scores on the dimensions, and hence the biasing effects of such interactions.

Evidence for these potential interaction effects between glance variables is examined in this Appendix.

The paradox to be explained is how raw TEORT values can be lower for the same task with Mixed-Mode vs. Visual-Manual mode (e.g., Figure C3A), when the physical demand scores are higher for Mixed-Mode tasks than Visual-Manual (e.g., Figure C3B). The hypothesis is that MSGD and LGP produce an interaction effect on TEORT which is different for Visual-Manual vs. Mixed-Mode tasks.

If there are three variables which are all correlated with one another, then an interaction effect may exist. For example, the correlation between two of the variables can become greater or less if the effect of the third variable is removed. In particular, if MSGD and LGP are highly correlated, each could produce a suppressive or enhancement effect on TEORT depending upon the intercorrelations among the three glance variables. In the usual glance data collection methods, the three glance variables are separately measured and assumed to be independent. However, if there is an indirect interaction effect of MSGD and/or LGP on TEORT, then the value of TEORT could appear to be different than it actually is, due to the interaction effect. That is, if the interaction effect were removed by partialing it out, the adjusted value of TEORT could be higher or lower than the crude observed value.

This interaction hypothesis would explain why Mixed-Mode tasks score higher on physical demand than Visual-Manual tasks under principal component analysis (which removes the effects of shared variance, or intercorrelation, between variables, just as multiple regression analysis does), even though their crude TEORT means are lower.

Note: This Appendix now turns to some technical material on interaction effects in multivariate datasets. There are well-established methods in statistics to determine whether such interactions between variables exist, and to remove them. The analysis methods follow those in the well-known multivariate statistics textbook by Cohen et al. [18], and the reader is referred to Section 3.3 of that textbook for an explanation of the methods used here.

Evidence for interaction effects is examined first for all paired Visual-Manual and Mixed-Mode tasks, and then Visual-Manual and Mixed-Mode tasks are examined separately.

ALL PAIRED TASKS

Table C1 gives the paired task data for the NHTSA Guidelines [12] glance metrics (TEORT, MSGD, and LGP) extracted from Table A1 (TEORT = TEORT_NR (***), column 4; MSGD = MSGD_NR (**), column 12; LGP = LGP_NR (**), column 13). These are the same glance data analyzed in the second companion paper [10]. The correlation matrix is given at the bottom of Table C1. The bolded green cells indicate p < 0.05.

Table C1. Summary of NHTSA glance metric data for all paired tasks,
extracted from Appendix Table A1.

ID   Mode    Task                MSGD   LGP     TEORT

 5.  Manual  Destination Cancel  1.08   0.0748   6.6956
 6.  Manual  Destination Entry   1.34   0.1440  36.0570
 7.  Manual  Contact Calling     1.06   0.0520   8.0272
 8.  Manual  Radio Preset        1.59   0.1450   2.7172
 9.  Manual  Radio Tuning        1.31   0.1460   9.1811
11.  Mixed   Destination Cancel  0.43   0.0131   4.8289
12.  Mixed   Destination Entry   0.49   0.0203   9.7872
13.  Mixed   Contact Calling     0.37   0.0177   2.9467
14.  Mixed   Radio Preset        0.34   0.0061   3.8989
15.  Mixed   Radio Tuning        0.27   0.0059   1.3094
             MSGD                1.000  0.952    0.428
             LGP                 0.952  1.000    0.516
             TEORT               0.428  0.516    1.000


Figure C1 shows the fitted line plots of the three possible pairs of TEORT, MSGD, and LGP.

It can be seen in Figures C1A and C1B that the regression line (red) rises slightly, and that there is a relatively wide 95% confidence band (dashed lines), reflecting the relatively low correlations. In Figure C1A, the correlation between TEORT and MSGD is not significant (r = 0.428, df = 8, p = 0.22). In Figure C1B, the correlation between TEORT and LGP is likewise not significant (r = 0.516, df = 8, p = 0.13). Figure C1C shows that all the Mixed-Mode tasks (blue circles) have a lower score on the x-axis (LGP) and a lower score on the y-axis (MSGD), contributing to the statistically significant correlation between MSGD and LGP (r = 0.952, df = 8, p = 0.00002).

Figure C2 shows the results of the PCA of the three glance metrics for this dataset.

Figure C2A shows that for the 10 paired tasks (5 Mixed-Mode and 5 Visual-Manual), the LGP and MSGD metrics loaded higher on Dimension 1 (attentional effects of cognitive demand, on the x-axis) and lower on Dimension 2 (physical demand, on the y-axis). The attentional effect of cognitive demand automatically plots on the primary x-axis, rather than the y-axis, because it explains the largest amount of variance. That is, the attentional effect of cognitive demand is the dominant dimension for this particular set of tasks and metrics, explaining 76% of the total variance. TEORT loaded onto the physical demand dimension (y-axis), which is here the minor dimension, explaining 22% of the total variance, which does not overlap with the 76% explained by the first dimension.
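These variance percentages can be checked directly from the printed correlation matrix in Table C1, because PCA of standardized metrics reduces to an eigendecomposition of their correlation matrix. The following is a minimal sketch of that computation in Python; the matrix values are copied from Table C1, and this is an illustration, not the authors' original analysis code.

import numpy as np

# Correlation matrix of MSGD, LGP, and TEORT for the 10 paired tasks,
# copied from the bottom of Table C1.
R = np.array([[1.000, 0.952, 0.428],
              [0.952, 1.000, 0.516],
              [0.428, 0.516, 1.000]])

eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, in descending order
print(eigvals / eigvals.sum())          # ~ [0.76, 0.22, 0.01]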

Figure C2B shows that every Visual-Manual task (red squares) scored higher on the attentional effects of cognitive demand (x-axis) than every Mixed-Mode task (blue circles). This result is consistent with the CAMP-DWM study, which found higher RTs to CHMSL activation in a forward vehicle during Visual-Manual tasks compared to Auditory-Vocal tasks, as shown in [2, Figure 6]. However, the higher RT for Visual-Manual tasks in the CAMP-DWM study could have been caused by having the eyes off the road, because the events in that study were a CHMSL activation or a deceleration of the lead vehicle. The attentional effects of cognitive demand in the current study were measured using the tactile DRT, which is not affected by eyes off the road [4]. The dimensional analysis in the current study therefore shows that the reduced RTs for Mixed-Mode vs. Visual-Manual tasks are caused by a reduced cognitive demand, not by eye glance effects. In short, this analysis indicates that for Mixed-Mode and Visual-Manual tasks, LGP, MSGD, and RT provide consistent measures of the attentional effects of cognitive demand.

Note in Figure C2B that the Mixed-Mode tasks (blue circles) tended to score higher on the second dimension (loading with TEORT) than all the Visual-Manual tasks (red squares), with the exception of manual Destination Entry (red square in upper right corner). This result appears contradictory to the TEORT mean data plotted on the y-axis in Figures C1A and C1B, which show that the Mixed-Mode tasks (blue circles) have generally lower or equal TEORT values to the Visual-Manual tasks (red squares).

The anomaly is illustrated for the Radio-Tuning task in Figure C3, which compares radio-tuning values for the crude TEORT means (Figure C1A, y-axis) vs. the scores on the second dimension (physical demand) (Figure C2B, y-axis). Figure C3A shows that the crude mean TEORT values are lower (better) for Mixed-Mode compared to Visual-Manual, suggesting an improvement for a Mixed-Mode interface. However, Figure C3B shows a lower (better) score in the second dimension (in this case, the physical demand dimension, which loads with TEORT) for Visual-Manual vs. Mixed-Mode, exactly the opposite pattern seen in Figure C3A.

Hence, the first question to be addressed is, "How can a Mixed-Mode task have a relatively lower crude TEORT mean than a Visual-Manual task, and yet have a relatively higher score on physical demand (which loads with TEORT) than a Visual-Manual task?" An example of the anomaly is shown in Figure C3 for the Radio-Tuning task.

A related second question to be addressed is, "How can the same task performed in Mixed-Mode, with virtually no manual interaction and with only TEORT loading highly on the physical demand dimension, score more highly on the physical demand dimension than the paired task in Visual-Manual mode, which has more manual steps and more TEORT?" (For example, compare the Mixed-Mode blue bar and the Visual-Manual red bar in Figure C3B.)

These questions are given a preliminary answer by a causal analysis of the glance metrics for the 5 paired tasks, which reveals that there are interaction effects of LGP and MSGD that bias the crude TEORT means.

Causal Analysis of Glance Metrics for Pooled Paired Mixed-Mode and Visual-Manual Tasks

To reiterate, the goal is to explain the apparent contradiction between: (1) the crude mean TEORT values, which tend to be lower for Mixed-Mode interfaces than for Visual-Manual interfaces for paired tasks (see locations on the y-axis in Figures C1A and C1B, and Figure C3A); and (2) the physical demand scores, on which Mixed-Mode interfaces tend to score higher than Visual-Manual interfaces for paired tasks (see locations on the y-axis in Figure C2B, and Figure C3B). (Note: paired tasks are defined here as tasks that have the same start and end point.) The hypothesis is that MSGD and LGP have a confounding effect on TEORT that is different for Mixed-Mode vs. Visual-Manual tasks. In other words, the contradiction may arise from a differential bias in the TEORT metric, arising indirectly from the strong interaction between LGP and MSGD, whereby LGP can produce a confounding effect on TEORT by acting indirectly through MSGD (and likewise MSGD by acting indirectly through LGP on TEORT).

This question is first addressed by partialing out the indirect effects using partial correlation analysis, using the methods provided by Cohen et al. [18, Equations 3.3.11, p. 74], and verified in Stata 13.1 with the pcorr command (see Figure C4) on the mean data in Table C1.
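For three variables, the first-order partial correlation of x and y controlling for z is r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)) [18, Eq. 3.3.11]. The following minimal Python sketch (our illustration, not the Stata run itself) reproduces the adjusted values in Figure C4 from the crude correlations printed in Table C1:

from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    # First-order partial correlation of x and y, controlling for z
    # (Cohen et al. [18], Equation 3.3.11).
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Crude correlations for the pooled paired tasks (Table C1):
r_ml, r_mt, r_lt = 0.9519, 0.4275, 0.5163   # MSGD-LGP, MSGD-TEORT, LGP-TEORT

# Adjusted effect of MSGD on TEORT, controlling for LGP (value e in Figure C4):
print(round(partial_r(r_mt, r_ml, r_lt), 4))   # -0.2438
# Adjusted effect of LGP on TEORT, controlling for MSGD (value c in Figure C4):
print(round(partial_r(r_lt, r_ml, r_mt), 4))   #  0.3948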

Figure C4 shows the causal glance diagram for the pooled Visual-Manual and Mixed-Mode modalities, using the data in Table C1. The crude correlations are labeled (b, d) and the adjusted correlations (c, e).

Crude Effects

The crude effect of MSGD on TEORT (0.4275, d) is positive, meaning that longer MSGD values during a task may cause longer TEORT values for that task. The crude effect of LGP on TEORT (0.5163, b) is also positive, meaning that longer LGP values during a task may cause longer TEORT values for that task. However, these crude causal effects are not necessarily valid, because indirect interaction effects from the strong positive causal relation between LGP and MSGD (0.9519, a) may bias the crude effects either too high or too low compared to the true effects.

Adjusted Effects

Partial correlation analysis is used to remove the interaction effects. The adjusted effect of MSGD on TEORT (-0.2438, e, orange box) is obtained after removal of the influence of LGP on MSGD (0.9519, a, green box), which biased the crude MSGD effect on TEORT too high. The adjusted correlation is substantially less than the crude effect of MSGD on TEORT (0.4275, d); indeed, the MSGD effect on TEORT even reverses sign from positive to negative. This result means that the apparent positive causal effect of MSGD on TEORT (0.4275, d) is entirely due to the enhancement of MSGD by LGP (0.9519, a); the enhanced MSGD in turn indirectly produces the apparent positive effect of MSGD on TEORT. Likewise, the adjusted effect of LGP on TEORT (0.3948, c), after removal of the influence of MSGD on LGP (0.9519, a, green box), is smaller than the crude effect of LGP on TEORT (0.5163, b). This result indicates a second causal bias pathway that can indirectly affect TEORT. This type of redundancy is described in Cohen et al. [18, Figure 3.4.1, Model A] as "by far the most common pattern of relationship" in the behavioral sciences when there are multiple variables that interact with each other.

These results indicate that the crude TEORT metric will differ from its true value. It is necessary to remove the biases due to interactions between glance variables to obtain a valid (unbiased) estimate of TEORT.

Note that the causal analysis analyzes the causes of the underlying effects in the observed metrics; it is not simply a mathematical manipulation of correlation coefficients. The use of correlational analysis to deduce causes is explained in [18, Section 3.4], and the current analysis follows the treatment given there. The methods for defining and analyzing the interaction effects among three variables using correlation/regression techniques are likewise described in [18, Section 3.4], as mentioned.

Figure C1C illustrated the strong correlation of 0.9519 (a) between MSGD and LGP. This effect is shown as a mutual causal relation between MSGD and LGP by the two-headed arrow in Figure C4. Thus, a task and driver-vehicle interface that results in the driver having a relatively higher proportion of long glances will also cause the MSGD to increase, just due to arithmetic. Similarly, a task and driver-vehicle interface that results in an overall increase in the MSGD (i.e., the glances become relatively longer on average) will also, statistically, tend to produce more glances exceeding a 2 s duration, thereby increasing LGP. For example, in the extreme case, if all glances are longer than 2 s, then MSGD will be high and LGP will be at its maximum value of 1.

Analysis of the causal diagram in Figure C4 indicates that this strong intrinsic mutual causative relation between MSGD and LGP causes an indirect upward bias on TEORT. The causal chain is that increases in LGP increase MSGD, which in turn increases TEORT. The bias arising from LGP is indirect, because it occurs as a result of the causative effect of LGP on MSGD, which then in turn increases TEORT. The crude effect size of MSGD on TEORT is 0.4275 (d), but when the indirect effect of LGP is removed with partial correlation methods, the adjusted effect size declines to an adjusted r value of -0.2438 (e, orange cell in Figure C4).

In a similar manner, Figure C4 indicates that there is also a slight decline in the effect of LGP on TEORT after removal of interactions, from 0.5163 (b) to 0.3948 (c), indicating an indirect effect on TEORT from the action of MSGD on LGP. This indirect effect of MSGD on the correlation of LGP with TEORT was not as strong as the reverse indirect effect of LGP on the correlation between MSGD and TEORT. It is conceivable that the removal of the interaction effect caused the Destination Entry task to move slightly to the right (increased attentional effect of cognitive demand) relative to Radio Tuning on the cognitive demand x-axis in Figure C2B, compared to its original position on the x-axis (crude LGP) in Figures C1B and C1C.

The analysis of the pooled tasks in this section is complicated by the fact that Mixed-Mode tasks and Visual-Manual tasks are combined in the same analysis, which obscures possible heterogeneity between them. In the next two sections, separate analyses are conducted on Visual-Manual and Mixed-Mode tasks, to see if they have the hypothesized differences in the causal relationships between the three glance metrics.

VISUAL-MANUAL TASKS

Table C2 gives the Visual-Manual task data from the NHTSA Guidelines [12] glance metrics (TEORT, MSGD, and LGP) extracted from Table A1.

These are the same Visual-Manual task glance data analyzed in the second companion paper [10]. The correlation matrix is given at the bottom of Table C2. The yellow cells indicate a tendency for statistical significance (p < 0.1).

Table C2. Summary of off-road glance metric data for Visual-Manual
tasks, extracted from Appendix Table A1

ID  Mode    Task                MSGD   LGP     TEORT

5.  Manual  Destination Cancel  1.08   0.0748   6.6956
6.  Manual  Destination Entry   1.34   0.1440  36.0570
7.  Manual  Contact Calling     1.06   0.0520   8.0272
8.  Manual  Radio Preset        1.59   0.1450   2.7172
9.  Manual  Radio Tuning        1.31   0.1460   9.1811
            MSGD                1.000  0.856    0.037
            LGP                 0.856  1.000    0.330
            TEORT               0.037  0.330    1.000


Figure C5 plots the three possible pairs of the three NHTSA Guidelines [12] glance metrics (TEORT, MSGD, and LGP) as in Figure C1, but just for the five paired Visual-Manual tasks (dropping the Mixed-Mode tasks).

The results in Figure C6 for Visual-Manual tasks are similar to those in Figure C2 for the Visual-Manual tasks combined with the Mixed-Mode tasks. That is, the Visual-Manual tasks apparently "dominate" the pooled mixture after the Mixed-Mode tasks are added, because the combined-task Figure C2 resembles the Visual-Manual Figure C6 more than it resembles the Mixed-Mode PCA results, as will be shown in the next section. In both Figure C6A (Visual-Manual tasks) and Figure C2A (all tasks), the TEORT axis points upward (along the y-axis), and the LGP and MSGD metrics point to the right on the x-axis. Again, the tasks Contact Calling and Destination Cancel score slightly higher on the y-axis than Radio Tuning in the PCA score plot of Figure C6B, but they are approximately equal in the crude mean plots in Figures C5A and C5B, suggestive of an interaction effect that is biasing the crude TEORT values downward on the y-axis in Figures C5A and C5B.

Causal Analysis of Glance Metrics for Visual-Manual Tasks

A contradiction again exists (as there was for all tasks in the previous section) between (1) the PCA score locations on the y-axis in Figure C6B, which are relatively higher than Visual-Manual Radio Tuning, and (2) the mean TEORT values shown on the y-axis of Figures C5A and C5B, which are relatively lower than Visual-Manual Radio Tuning. We hypothesize that this contradiction arises for the Visual-Manual tasks (as it did for the pooled tasks in the previous section) because of an indirect interaction effect, arising from the strong correlation between LGP and MSGD, on the effects of LGP and MSGD on TEORT. These effects are illustrated in Figure C7, the causal diagram for the glance metrics of the 5 tasks with Visual-Manual modalities, which helps explain and adjust for this interaction effect.

Crude Effects

The crude effect of MSGD on TEORT (0.0367, d) is near zero, indicating little or no apparent effect of MSGD on TEORT. This relative orthogonality of MSGD and TEORT for Visual-Manual tasks was also evident in the Visual-Manual task examples in the companion paper [1]. The crude effect of LGP on TEORT (0.3302, b) is positive, meaning that longer LGP values during a Visual-Manual task cause longer TEORT values. However, these crude effects on TEORT may be influenced by bias from the indirect effects of MSGD on LGP, and of LGP on MSGD.

Adjusted Effects

The adjusted effect of MSGD on TEORT (0.5052, e), after removal of the indirect influence of LGP on MSGD (0.8565, a, yellow box), is larger than the crude effect of MSGD on TEORT (0.0367, d). This result suggests that the apparent lack of correlation of MSGD with TEORT for Visual-Manual tasks (0.0367, d) may be a biased result. The correlation between LGP and MSGD (r = 0.8565, df = 3, p = 0.06, yellow box) is high. We make the plausible assumption that this influence is causal, and not simply an association, because if LGP or MSGD increases, it follows from the arithmetic alone that the other must also increase. Removal of this enhancement effect of LGP on MSGD raises the magnitude of the MSGD effect on TEORT to 0.5052 (e) from near 0 (0.0367, d). This result suggests that MSGD may influence TEORT more than is evident in the crude TEORT magnitude, where the confounding influence of LGP indirectly suppresses the MSGD effect on TEORT. Similarly, the magnitude of the adjusted effect of LGP on TEORT (0.5792, c), after removal of the influence of MSGD on LGP (0.8565, a, yellow box), is greater than its crude effect (0.3302, b). This result means that the LGP effect on TEORT is also enhanced when the confounding influence of MSGD is removed.
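The partial_r helper sketched earlier in this Appendix reproduces these adjusted Visual-Manual values from the crude correlations in Table C2. (Note that the computed partial correlation of MSGD with TEORT comes out negative; the 0.5052 cited above is its magnitude.)

# Table C2 crude correlations: MSGD-LGP = 0.8565, MSGD-TEORT = 0.0367,
# LGP-TEORT = 0.3302.
print(round(partial_r(0.0367, 0.8565, 0.3302), 4))   # -0.5052 (e, in magnitude)
print(round(partial_r(0.3302, 0.8565, 0.0367), 4))   #  0.5792 (c)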

MIXED-MODE TASKS

Table C3 gives the Mixed-Mode task data from the NHTSA Guidelines [12] glance metrics (TEORT, MSGD, and LGP) extracted from Table A1.

These are the same Mixed-Mode task glance data analyzed in the second companion paper [10]. The correlation matrix is given at the bottom of Table C3. The yellow cells indicate a tendency for statistical significance (p < 0.1) and the bolded green cells indicate p < 0.05.

Table C3. Summary of NHTSA off-road glance metric means for Mixed-Mode
tasks, extracted from Appendix Table A1.

ID   Mode   Task                MSGD   LGP     TEORT

11.  Mixed  Destination Cancel  0.43   0.0131  4.8289
12.  Mixed  Destination Entry   0.49   0.0203  9.7872
13.  Mixed  Contact Calling     0.37   0.0177  2.9467
14.  Mixed  Radio Preset        0.34   0.0061  3.8989
15.  Mixed  Radio Tuning        0.27   0.0059  1.3094
            MSGD                1.000  0.821   0.916
            LGP                 0.821  1.000   0.692
            TEORT               0.916  0.692   1.000


Figure C8 plots the three possible pairs of the three NHTSA Guidelines [12] glance metrics (TEORT, MSGD, and LGP) as in Figure C1, but just for the five paired Mixed-Mode tasks (dropping the Visual-Manual tasks).

The regression lines rise along the x-axis for all three Mixed-Mode plots, as they did for all tasks combined in Figure C1, rather than being relatively flat as they were for the Visual-Manual tasks in Figures C5A and C5B. The increase in TEORT with MSGD in Figure C8A (r = 0.916, df = 3, p = 0.03) is also not consistent with the four examples for Visual-Manual tasks in [1], nor with the Visual-Manual task plots in Figures C5A and C5B, where TEORT was independent (i.e., near zero correlation) of MSGD and LGP. In other words, unlike Visual-Manual tasks, this result for Mixed-Mode tasks indicates that TEORT is correlated with MSGD; TEORT is no longer predictable solely from the number of glances off the road, without considering MSGD. Furthermore, for Mixed-Mode tasks the crude MSGD is not orthogonal to TEORT, whereas the crude MSGD was orthogonal to TEORT for Visual-Manual tasks. For Mixed-Mode tasks, unlike Visual-Manual tasks, MSGD is thus not an independent measure of the attentional effects of cognitive demand; it appears to be a measure of physical demand, as is TEORT [1]. The PCA results in Figure C9 are consistent with this finding.

Figure C9A shows that the PCA loadings for the 5 Mixed-Mode tasks are quite different compared to the combined tasks in Figure C2A, and the Visual-Manual tasks in Figure C6A. In the loadings plot in Figure C9A, MSGD now loads closer to TEORT than LGP on the x-axis (here, physical demand), not with LGP on the y-axis as it did for all the tasks in Figure C2A, and the Visual-Manual tasks in Figure C6A.

Removal of these interaction effects by the PCA method causes the Mixed-Mode task scores in the overall analysis to rise to a slightly higher adjusted location on the y-axis (physical demand), as shown in Figure C2B, compared to the paired Visual-Manual tasks. In essence, adding a display screen to an Auditory-Vocal interface to create a Mixed-Mode interface involves a trade-off: while the attentional effects of cognitive demand are reduced, there is a slight increase in physical demand for a Mixed-Mode interface. While the reductions in the attentional effects of cognitive demand are reflected in reductions in the crude LGP values, the increases in TEORT are masked in the crude TEORT mean by the effects of MSGD on LGP. When these effects are reduced by partial correlation analysis, or by PCA, the TEORT values rise to their "true" levels, and the underlying relative increase in physical demand for Mixed-Mode tasks compared to the paired Visual-Manual tasks becomes more evident.

Exactly why Mixed-Mode tasks show an increase in physical demand compared to paired Visual-Manual tasks, when there are fewer glances to the displays and controls, and fewer button presses for the Mixed-Mode task than the Visual-Manual task, may not be objectively determinable from a consideration of just the glance metrics, and requires further research. An objective answer to this question may lie in a deeper analysis that considers other metrics, a number of which are also physical demand variables that are not glance-based, and therefore are not subject to the confounding effects of the complex interactions between the eye glance metrics. As one example, all the Mixed-Mode tasks have substantially longer task times than the paired Visual-Manual tasks. This approach is explored more fully in a later section.

Physical Demand of Mixed-Mode Tasks

In Figure C9A, the physical demand x-axis, loading with both MSGD and TEORT, now dominates the explanation of the overall variance for Mixed-Mode tasks, accounting for 87% of the total variance, compared to only 11% for the second dimension (formed by LGP). This result is the opposite of the previous results in Figure C2A for all tasks and Figure C6A for Visual-Manual tasks. There, the TEORT metric (associated with physical demand) loaded onto the y-axis, and the LGP and MSGD metrics (associated with the attentional effects of cognitive demand) loaded onto the x-axis. In other words, with Mixed-Mode tasks by themselves, as shown in Figure C9A, TEORT and MSGD load high on the x-axis and low on the y-axis, and LGP loads high on both axes. Again, this result is the reverse of what was found for Visual-Manual tasks in Figure C6A and for all tasks in Figure C2A, where MSGD paired with LGP on cognitive demand (the x-axis) and TEORT was by itself on the physical demand y-axis. In short, MSGD indicates increased physical demand for Mixed-Mode tasks, but increased cognitive demand for Visual-Manual tasks or all tasks combined.

These differences in the loading plot for Mixed-Mode tasks are reflected in the PCA score plot of Mixed-Mode tasks. In the Mixed-Mode task score plot in Figure C9B, the four other Mixed-Mode tasks are all relatively higher than the Mixed-Mode Radio Tuning task on the x-axis (physical demand, loading with TEORT and MSGD). This is consistent with the crude mean plots in Figures C8A and C8B with TEORT on the x-axis, and in Figure C8C with MSGD on the x-axis, where the other four Mixed-Mode tasks are all further to the right (relatively higher physical demand) than Mixed-Mode Radio Tuning. Again, note that for Mixed-Mode tasks both MSGD and TEORT are physical demand metrics as shown in Figure C9A. In other words, the Dimensional Model indicates, and the crude MSGD and TEORT mean values are consistent, in that all other Mixed-Mode tasks are relatively higher than Mixed-Mode Radio Tuning in TEORT and MSGD, indicating they are higher in physical demand than Mixed-Mode Radio Tuning.

Cognitive Demand of Mixed-Mode Tasks

For the cognitive demand dimension, there is an inconsistency between the Dimensional Model cognitive demand scores on the y-axis of Figure C9B and the crude values of the LGP glance metric shown in Figure C8. For example, the Mixed-Mode task score plot in Figure C9B from the Dimensional Model shows that three of the Mixed-Mode tasks (Destination Cancel, Destination Entry, and Radio Preset) all have lower scores on the second dimension (loading with LGP) than Mixed-Mode Radio Tuning. This result contradicts the crude mean plots in Figures C8B and C8C with LGP on the y-axis, where these tasks are relatively higher than Mixed-Mode Radio Tuning in the attentional effects of cognitive demand. Thus, there is an apparent contradiction between the PCA Dimensional Model result in Figure C9B and the crude LGP results in Figures C8B and C8C, where the crude LGP value is higher for the Mixed-Mode tasks of Radio Preset, Destination Cancel, and Contact Calling than for Mixed-Mode Radio Tuning. This contradiction suggests a possible underlying interaction effect from MSGD that is biasing the LGP estimates too high in the crude glance data, which is examined in Figure C10. In Figure C10, the yellow cells indicate a tendency for statistical significance (p < 0.1) and the bolded green cells indicate p < 0.05.

Causal Analysis of Glance Metrics for Mixed-Mode Tasks

Figure C10 shows the causal glance diagram for the 5 tasks with a Mixed-Mode interface.

Crude Effects

The crude effect of MSGD on TEORT (0.9161, d) is large and positive, meaning that longer MSGD values during a task may cause longer TEORT values. The crude effect of LGP on TEORT (0.6920, b) is also positive, meaning that longer LGP values during a task may cause longer TEORT values. However, these crude effects on TEORT may be influenced by bias from the causative relation between MSGD and LGP.

Adjusted Effects

The adjusted effect of MSGD on TEORT (0.8442, e), after removal of the influence of LGP on MSGD (0.8209, a, yellow box), is only slightly less than the crude effect of MSGD on TEORT (0.9161, d). This result means that the strong effect of MSGD on TEORT is a true effect for Mixed-Mode tasks. However, the adjusted effect of LGP on TEORT (-0.2622, c, blue box), after removal of the indirect influence of MSGD on LGP (0.8209, a, yellow box), is substantially less than its crude effect (0.6920, b), and even reverses sign. This result means that the apparent effect of LGP on TEORT is a biased result, not a valid effect. The apparent effect of LGP on TEORT is entirely due to an indirect bias effect of MSGD on LGP; the enhanced LGP in turn indirectly creates the (false) appearance of a positive causal LGP effect on TEORT.
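Again, the partial_r helper sketched earlier reproduces these adjusted Mixed-Mode values from the crude correlations in Table C3:

# Table C3 crude correlations: MSGD-LGP = 0.8209, MSGD-TEORT = 0.9161,
# LGP-TEORT = 0.6920.
print(round(partial_r(0.9161, 0.8209, 0.6920), 4))   #  0.8442 (e)
print(round(partial_r(0.6920, 0.8209, 0.9161), 4))   # -0.2622 (c)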

For Mixed-Mode interfaces, the net effect of these interactions between the glance variables is to change the observed TEORT, unless the biases are removed, for example by a principal components or other multivariate analysis method. From just the crude TEORT means, one would conclude that tasks with Visual-Manual interfaces tend to have a higher physical demand than Mixed-Mode interfaces, because they have a higher TEORT magnitude than the same task with a Mixed-Mode interface. That would argue that, if the goal were to reduce physical demand, one should re-design tasks with Visual-Manual interfaces to be Mixed-Mode. However, PCA removes the biasing interactions, and with a dimensional analysis, Mixed-Mode interfaces have a higher physical demand than the same task with a Visual-Manual interface. Therefore, if the criterion were based on physical demand alone, one should favor Visual-Manual interfaces over Mixed-Mode. However, Mixed-Mode interfaces sharply reduce cognitive demand compared to Visual-Manual interfaces (see Figure C2B). It is only by a careful consideration of the relative trade-offs in both physical and cognitive demand that a proper design decision can be made.

COMPARISON OF OTHER PHYSICAL DEMAND METRICS FOR MIXED-MODE VS. VISUAL-MANUAL TASKS

The analyses in this Appendix gave rise to the hypothesis that the physical demand of Mixed-Mode tasks is relatively higher than that of Visual-Manual tasks, as indicated by the extended Dimensional Model using PCA of the three NHTSA Guidelines glance metrics. The higher physical demand estimate from the extended Dimensional Model for Mixed-Mode tasks is not consistent with their crude mean TEORT values, which were lower for the Mixed-Mode tasks than for the Visual-Manual tasks. The hypothesis that the lower mean TEORT values for Mixed-Mode tasks were due to interactions between the three glance metrics was tested and verified above.

To test the surprising conclusion from the extended Dimensional Model that Mixed-Mode tasks actually have higher physical demand than Visual-Manual tasks, an examination was made of the other 19 metrics available in this test. Many of these other metrics also measure physical demand, as shown in the main body of this paper. Paired t-tests were used to compare the results, with the data and results shown in Table C4. The yellow cells indicate a tendency for statistical significance (p < 0.1), the light green cells indicate p < 0.05, and the dark green cells indicate p < 0.01.

Table C4. Summary of all metrics for paired Mixed-Mode and
Visual-Manual tasks, extracted from Appendix Table A1, with paired
t-test comparison results.

TaskNum  TaskType   Task             TaskTime  Glances_RD  TGT_RD

 4.      mixHF (*)  DialContact (*)    34.8      14.5       31.0
 5.      mixHF (*)  RadioPreset (*)    18.0       7.5       16.0
 6.      mixHF (*)  NavCancel (*)      35.4      17.5       29.9
 7.      mixHF (*)  RadioTune (*)      19.6       8.5       17.8
 8.      mix        NavDest            72.0      34.4       61.1
 9.      VM         DialContact_VM     17.3       9.1        9.0
10.      VM         RadioPreset_VM      5.6       3.0        3.3
11.      VM         NavCancel_VM       14.1       7.9        7.1
12.      VM         RadioTune_VM       18.9       9.5        9.2
13.      VM         NavDest_VM         68.7      32.9       32.0
         mean mixed                    36.0      16.5       31.2
         mean VM                       24.9      12.5       12.1
         p (paired t)                    .05       .09        .01

TaskNum  TEORT_NR (***)  ManualSteps  Glances_TR  TGT_TR (***)  MaxGlance_TR

 4.            2.9       1             1.85        1.22         0.48
 5.            3.9       1             0.58        0.34         0.22
 6.            4.8       1             2.33        1.41         0.60
 7.            1.3       1             0.38        0.22         0.19
 8.            9.8       2             5.54        3.91         1.38
 9.            8.0       4             7.72        7.54         1.82
10.            2.7       1             1.67        1.98         1.59
11.            6.7       3             6.03        5.97         1.71
12.            9.2       5             8.01        8.19         2.26
13.           36.1       2.3          28.06       33.22         3.47
mean mixed     4.6       1.2           2.14        1.42         0.57
mean VM       12.5       7.2          10.30       11.38         2.17
p (paired t)    .17       .19           .10         .11          .001

TaskNum  MSGD_TR (***)  Glances%_TR  RateGlance_TR  MSGD_NR (**)  LGP_NR (**)

 4.          0.51         3.49        0.052          0.368        0.0177
 5.          0.55         1.94        0.033          0.344        0.0061
 6.          0.52         3.89        0.065          0.430        0.0131
 7.          0.62         1.11        0.019          0.274        0.0059
 8.          0.76         5.37        0.076          0.490        0.0203
 9.          1.02        44.80        0.469          1.057        0.0521
10.          1.34        36.15        0.318          1.594        0.1447
11.          1.03        44.44        0.446          1.085        0.0748
12.          1.13        46.50        0.449          1.308        0.1458
13.          1.28        50.43        0.418          1.339        0.1436
mean mixed   0.59         3.16        0.049          0.381        0.0126
mean VM      1.16        44.46        0.420          1.276        0.1122
p (paired t)  .001        3.E-05        .0001          .001         .01

TaskNum  MinGlance_TR  Miss%_tdrt  RT_tdrt  CompletionErrors

 4.          0.154      17.2        608.9    4
 5.          0.155      14.7        590.9    3
 6.          0.256      20.2        537.5    6
 7.          0.179      15.8        577.9    2
 8.          0.378      13.9        582.8    1
 9.          0.382      18.7        589.7    4
10.          1.136      21.2        582.3    5
11.          0.441      24.0        639.4    5
12.          0.403      25.4        707.2    2
13.          0.238      23.3        677.6    9
mean mixed   0.224      16.3        579.6    3.2
mean VM      0.520      22.5        639.2    5.0
p (paired t)   .18        .02          .12    .33

TaskNum  Glances%_RD  MSGD_RD  MaxGlance_RD  MinGlance_RD

 4.          88.7      3.165    6.879         0.787
 5.          89.4      3.149    6.041         1.407
 6.          84.6      2.502    6.517         0.364
 7.          91.0      3.022    6.297         0.710
 8.          84.9      2.261    8.087         0.234
 9.          50.6      1.037    2.339         0.296
10.          58.0      1.303    1.963         0.784
11.          48.8      0.936    2.018         0.267
12.          46.1      0.945    2.265         0.269
13.          44.6      0.969    3.101         0.126
mean mixed   87.7      2.820    6.764         0.700
mean VM      49.6      1.038    2.337         0.348
p (paired t)  7.E-05     .0004   1.E-05         .029

TaskNum  Glances%_other

 4.           7.80
 5.           8.62
 6.          11.50
 7.           7.90
 8.           9.78
 9.           4.56
10.           5.85
11.           6.75
12.           7.37
13.           4.99
mean mixed    9.12
mean VM       5.90
p (paired t)   .015


Physical Demand

Table C4 shows that several physical demand metrics (grey-shaded metric names) have larger values for Mixed-Mode tasks than for Visual-Manual tasks. In particular, Mixed-Mode tasks have longer task times (TaskTime), a tendency toward more glances to the road (Glances_RD), and longer total glance time to the road (TGT_RD). Visual-Manual tasks, in turn, tend to have more task-related glances (Glances_TR) than Mixed-Mode tasks, consistent with the companion paper [10]. Thus, looking at more than just the three NHTSA Guidelines glance metrics, the conclusion of the extended Dimensional Model (after interaction effects are removed) is confirmed: there is a higher physical demand for Mixed-Mode tasks than for Visual-Manual tasks.
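As an illustration of the paired t-test computation underlying Table C4, the following sketch (Python, using scipy; the variable names are ours) reproduces the p = .05 comparison for TaskTime from the values printed in the table:

from scipy import stats

# TaskTime (s) for the five paired tasks, from Table C4, in the order
# DialContact, RadioPreset, NavCancel, RadioTune, NavDest.
mixed = [34.8, 18.0, 35.4, 19.6, 72.0]
vm    = [17.3,  5.6, 14.1, 18.9, 68.7]

t, p = stats.ttest_rel(mixed, vm)   # paired t-test, df = 4
print(round(t, 2), round(p, 3))     # t = 2.78, p = 0.05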

Cognitive Demand

The PCA of the NHTSA Guidelines glance metrics in the earlier sections of this Appendix found that the main difference between Mixed-Mode and Visual-Manual tasks was the reduced attentional effect of cognitive demand for Mixed-Mode tasks compared to Visual-Manual tasks. This conclusion was also reached in the main body of this paper, according to the PCA and the extended Dimensional Model. It is independently confirmed by an examination of the paired t-tests for the cognitive demand metrics (red-shaded metric names) and the cognitive facilitation metrics (green-shaded metric names) in Table C4. Table C4 shows that 7 of the cognitive demand metrics showed an increased attentional effect of cognitive demand for Visual-Manual tasks compared to Mixed-Mode tasks. These metrics were: MaxGlance_TR, MSGD_TR (***), Glances%_TR, RateGlance_TR, MSGD_NR (**), LGP_NR (**), and Miss%_tdrt. Table C4 further shows that all five of the cognitive facilitation metrics showed an improved cognitive facilitation for Mixed-Mode tasks compared to Visual-Manual tasks. These were: Glances%_RD, MSGD_RD, MaxGlance_RD, MinGlance_RD, and Glances%_other. Thus, looking at more than just the three NHTSA Guidelines glance metrics, the conclusion of the extended Dimensional Model (in which the interaction effects are removed) is confirmed: there is a lower attentional effect of cognitive demand (and a higher cognitive facilitation) for Mixed-Mode tasks than for Visual-Manual tasks.

INDEPENDENT CONFIRMATION WITH ALTERNATE TEST

After completing the above causal analyses, a multiple regression was run on the original participant data. The design was 5 tasks, 2 interface modes, and 24 participants, for 240 data lines. The dependent variable was TEORT, with MSGD and LGP as continuous predictors, and Task (5 task types) and Mode (Mixed vs. VM) as categorical predictors. The crude and adjusted means for Mixed-Mode tasks are shown in Figure C11.
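A minimal sketch of such a model in Python with statsmodels is given below. The data frame layout, file name, and column names are our assumptions for illustration; the original participant-level data are not reproduced here.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per task x mode x participant (240 rows),
# with columns TEORT, MSGD, LGP (continuous), Task (5 levels), and
# Mode ('Mixed' or 'VM').
df = pd.read_csv('participant_glance_data.csv')   # hypothetical file name

# TEORT adjusted for the other two glance metrics, task, and mode.
model = smf.ols('TEORT ~ MSGD + LGP + C(Task) + C(Mode)', data=df).fit()
print(model.summary())

The adjusted (least-squares) means by Mode from such a model can then be compared with the crude means, as in Figures C11 and C12.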

Figure C11 confirms that Mixed-Mode tasks have an increased TEORT after adjustment for all indirect interaction effects between the three glance metrics (red bars), compared to the original crude TEORT effects (blue bars). The crude and adjusted means for Visual-Manual tasks are shown in Figure C12.

Figure C12 confirms that Visual-Manual tasks have a decreased TEORT after adjustment for all indirect interaction effects between the three glance metrics (red bars), compared to the original crude TEORT values (blue bars). These opposite biasing effects on TEORT for Mixed-Mode vs. Visual-Manual tasks confirm the analyses of these interactions in the previous sections.

DISCUSSION OF APPENDIX C

The causal and regression analyses in this Appendix made three comparisons of the PCA-adjusted score plots to the original crude data plots (for pooled tasks, Visual-Manual tasks, and Mixed-Mode tasks). All three analyses found that the strong correlation between MSGD and LGP in the original mean data caused interaction effects that biased the relative TEORT values of the tasks when the individual crude glance metrics were considered one at a time, without regard to the bias arising from indirect interactions between these glance metrics.

A more valid relative scoring of the tasks was obtained by considering their score locations on the two Driver Demand Dimensions for the three NHTSA Guidelines glance metrics, which removes the bias arising from interactions between the original metrics. The removal of these biases was demonstrated for this simple 3-metric case: the physical demand of Mixed-Mode tasks in the PCA scores, which adjust for the bias, was found to be higher than the crude TEORT values of those tasks suggested.

To independently test this conclusion, it was hypothesized that physical demand metrics other than the glance metrics should show the increased physical demand of the Mixed-Mode tasks more directly, without requiring PCA to remove the indirect interaction effects of MSGD and LGP on TEORT, and the other interactions that affect LGP. This hypothesis was confirmed by using paired t-tests to compare all of the physical demand metrics. This independent comparison of the means found that task time and total glance time to the road were both statistically significantly higher for Mixed-Mode tasks than Visual-Manual tasks, confirming that Mixed-Mode tasks did indeed have higher physical demand than Visual-Manual tasks, despite their tendency toward fewer task-related glances to the device.

A second independent confirmation of this result was made using multiple regression analysis, which also controls for indirect and subtle interactions between variables.

The results of all three analysis methods were consistent in showing that indirect interactions between glance metrics cause Mixed-Mode tasks to have a lower observed TEORT value than their true TEORT value. Visual-Manual tasks, on the other hand, have a higher observed TEORT value than their true TEORT value. The true values can only be determined after removal of the bias from interactions between the three NHTSA Guidelines glance metrics.

Limitations of Appendix C

The observed fact that subtle interaction effects of MSGD and LGP with TEORT can bias observed TEORT values is a new finding that has not been previously reported in the literature. This finding is preliminary, and should be replicated in other glance datasets before full acceptance. The glances examined here [10] were tabulated first by machine, with every glance reviewed and corrected by human inspection. Other datasets without careful validation of glances might not find the interaction effects discovered here. In addition, the mechanism by which MSGD and LGP can alter observed TEORT values is at present unknown. It may have a simple arithmetic cause; there could be some unknown attentional effect, some physiological property of the eye movement system, or some mechanism that is completely unknown at this time. For these interactions to be accepted as a true causal effect, it would be helpful to identify the underlying mechanism of the interactions between these glance metrics. In addition, the relative contributions to crash causation of the physical and cognitive demand of secondary tasks (when these two types of demand are orthogonally separated using the dimensional analysis methods presented here) are not yet well established and merit further research.

CONCLUSION OF APPENDIX C

In conclusion, the extended Dimensional Model of the three NHTSA Guidelines glance metrics gives a more valid representation of the relative locations of the tasks in the two dimensions of Cognitive Demand and Physical Demand than the original crude mean glance metrics. It accomplishes this objective because principal component analysis removes the interaction effects arising from the strong mutual interaction of MSGD and LGP, and of each in turn with TEORT. These interaction effects can otherwise bias the crude TEORT magnitudes too high for Visual-Manual tasks, and too low for Mixed-Mode tasks, compared to their true values. The conclusion that the true TEORT of Mixed-Mode tasks was higher than the observed TEORT, and that the true TEORT of Visual-Manual tasks was lower than the observed TEORT, was confirmed by an independent multiple regression test. The higher physical demand of Mixed-Mode tasks vs. Visual-Manual tasks was confirmed by an independent paired t-test comparison of the means of the Mixed-Mode vs. Visual-Manual tasks using all 22 metrics in the original study. Future research should attempt to replicate these preliminary findings in additional datasets, and also attempt to estimate relative crash risk from the separate contributions of the physical and cognitive demand of secondary tasks.

Richard Young, Li Hsieh, and Sean Seaman, Wayne State University
COPYRIGHT 2016 SAE International
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2016 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Author:Young, Richard; Hsieh, Li; Seaman, Sean
Publication:SAE International Journal of Transportation Safety
Article Type:Report
Date:Apr 1, 2016
Words:26175
Previous Article:The Dimensional Model of Driver Demand: Visual-manual tasks.
Next Article:Forward Collision Warning: Clues to optimal timing of advisory warnings.
Topics:

Terms of use | Privacy policy | Copyright © 2019 Farlex, Inc. | Feedback | For webmasters