STATISTICAL SHORTCOMINGS IN TRAFFIC STUDIES: PART II

Introduction

Factorial design is a powerful tool that can provide information not otherwise obtainable from single-factor experiments. There are, however, limitations in its use, and these stem more from practical than from theoretical constraints. One limitation is logistical, concerning the availability of resources (e.g., time, labor, and the pool of experimental units). A second limitation is interpretive: the complexity of a factorial experiment often makes its results difficult to interpret and explain.

This article explores the nature of these limitations, their impacts on factorial design, and methods for alleviating these impacts.

Logistic Limitations

In a simple 2 by 2 factorial design, the number of treatment combinations is four. As more factors or levels are added to the design, the number of treatment combinations expands geometrically. Doubling the number of levels of a 2 by 2 design to 4 by 4 yields 16 treatment combinations, four times that of the original design. Similarly, an ambitious 3 by 3 by 3 by 3 experiment comprises a total of 81 treatment combinations. The logistic difficulties implicit in this large total are illustrated by the following example.

Suppose a researcher wants to investigate the driving behavior of people over 65 years of age. The researcher plans to use four replicates (a factorial experiment requires a minimum of two replications to estimate sampling variance). For a 3 by 3 by 3 by 3 experiment, four replicates will require 324 subjects. Even if it is possible to recruit this number of subjects, the researcher can expect to lose some before the experiment is over. This attrition may be due to illness, "no shows," or "aborts" (i.e., subjects who choose not to continue the experiment after starting it). The greater the number of subjects involved in the study, and the longer the study sessions, the greater the chance of subject attrition. Moreover, even if the researcher can recruit and retain all required subjects, scheduling problems, equipment failure, errors in recording data, or the loss of completed data sheets may still result in an unequal number of observations.
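The arithmetic behind these totals is easy to verify: the number of treatment combinations is the product of the levels of each factor, and the required subject count multiplies that product by the number of replicates. A minimal sketch (the function name is illustrative):

```python
# The number of treatment combinations in a full factorial design is the
# product of the number of levels of each factor.
from math import prod

def treatment_combinations(levels):
    """Total cells in a full factorial design, given the levels per factor."""
    return prod(levels)

print(treatment_combinations([2, 2]))            # 4
print(treatment_combinations([4, 4]))            # 16
print(treatment_combinations([3, 3, 3, 3]))      # 81
print(treatment_combinations([3, 3, 3, 3]) * 4)  # 324 subjects for 4 replicates
```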

Interpretation Limitations

The second limitation of high order factorial design is the difficulty of substantive interpretation of the data because of the complexity of the phenomenon under investigation. For example, in a three-factor experiment, there will be one three-factor interaction term (A by B by C) and three two-factor interaction terms (A by B, A by C, and B by C). If the experiment shows that none of these terms is statistically significant, the researcher can say that the three factors affect the response variable independently of each other. On the other hand, if the three-factor interaction is significant, then each of the two-factor interaction terms must be investigated separately at each level of the third factor. If the three-factor interaction term is not significant, but any or all of the two-factor interactions are significant, then each factor involved must be explained or interpreted with reference to the specific levels of the other factor in the interaction. And as more factors are introduced into the design, data interpretation grows in complexity.

For a 4-factor factorial design, there are 11 interaction terms that must be assessed. Some of the interaction terms may be statistically nonsignificant; others, marginally significant; still others, statistically significant. The researcher must evaluate the results, in light of the statistical evidence, for their practical significance or theoretical implications. However, the fact that statistical analysis indicates that a given interaction term is significant does not necessarily mean that it has substantive meaning. Rather, it could be that there is no interaction but, due to sampling variation, a large enough value was obtained to be declared significant. On the other hand, a significant interaction, while real, may run contrary to present knowledge. Such an interaction should not be ignored or dismissed as spurious, for one of the advantages of factorial design is to suggest other approaches or provide greater insight into a particular research problem. Consequently, there is no easy, formulaic way to analyze factorial experiments and interpret the results. Each experiment is unique and subject-matter oriented.
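The count of 11 interaction terms follows from simple combinatorics: a k-factor design has one interaction term for every subset of two or more factors. A quick check (the function name is illustrative):

```python
# A k-factor factorial design has C(k, m) interaction terms of order m,
# for m = 2 through k.
from math import comb

def interaction_terms(k):
    """Number of interaction terms (orders 2 through k) in a k-factor design."""
    return sum(comb(k, m) for m in range(2, k + 1))

print(interaction_terms(3))  # 4: three two-factor terms plus A by B by C
print(interaction_terms(4))  # 11, as noted above
```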

The logistic limitation usually results in an unequal number of observations among the cells, which in turn makes the statistical analysis more complex. In the extreme case, where the disparity in cell sizes is too great, the whole experiment may be useless.

Impacts of Limitations

Unequal Cell Sizes

A requirement for factorial experiments is that each cell have an equal number of observations (n's). Statistical tests based on equal n's tend to be less sensitive to distortions when certain assumptions underlying the tests are violated. Mild departure from the assumptions of normality and homogeneity of variance among the cells poses no serious problem in interpreting the outcome of an analysis of variance when the n's are equal. Furthermore, calculations are simpler with equal n's, since n can be treated as a constant. To illustrate, the variance of the mean of the cell in the ith row and jth column is sigma^2/n_ij. If n_ij = n for all cells, then the variance of the mean for any cell is sigma^2/n.

In practice even with careful planning and rigid control of a moderate-sized experiment, more often than not the researcher must work with unequal n's. Under certain conditions, however, there are quick and easy methods of analyzing data from an experiment with unequal cell frequencies. These methods are:

* Unweighted means - Applicable when the disparity in the n's is no greater than a ratio of 2 to 1 between the largest and the smallest n's, and most of them are in close agreement.

* Equal numbers within rows (columns) - Applicable when the cell frequencies within any row (column) are equal.

* Proportional subclass numbers - Applicable when the cell frequencies are proportional, i.e., in the same proportion within any row or column.

These methods provide exact statistical tests in the absence of significant interactions. Examples of these methods are available in reference [1].

Unweighted means

The method of unweighted means is used in many computer software packages to analyze data with unequal cell frequencies. Since this is a common practice, a numerical example is presented here to illustrate the rationale of the method together with the necessary calculations for the analysis of variance table.

The data in table 1 are contrived: the numbers are simple to facilitate computation of the analysis of variance table. There are four data points in each cell except cell (1,2), where there are three. Table 2 is the analysis of variance table and computations. Note that the harmonic mean (3.692), not the arithmetic mean (3.750), is used to represent the average cell size. Also note that the total sum of squares is 11.733, which in this case is not equal to the sum of its components; this discrepancy is known as nonorthogonality. Conversely, when the cell sizes are equal, orthogonality is preserved - the sum of the component sums of squares equals the total sum of squares.

Table 1 - Artificial data for a 2 by 2 factorial experiment with unequal cell sizes
```
                  cell (1,1)     cell (1,2)
data              2, 3, 3, 4     3, 3, 3
total             12             9
mean              3.00           3.00       Total of means = 6.00
# of obs          4              3

                  cell (2,1)     cell (2,2)
data              3, 4, 4, 2     4, 4, 5, 5
total             13             18
mean              3.25           4.50       Total of means = 7.75
# of obs          4              4

Total of means    6.25           7.50       Grand total = 13.75
```

[Table 2 - Analysis of variance computations: tabular data omitted]
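The harmonic mean used in the unweighted-means analysis can be reproduced directly from the cell sizes in table 1. The sketch below assumes nothing beyond the data shown above:

```python
# Unweighted-means analysis replaces the unequal cell sizes with their
# harmonic mean (data from table 1; cell (1,2) has only three observations).
cells = {
    (1, 1): [2, 3, 3, 4],
    (1, 2): [3, 3, 3],
    (2, 1): [3, 4, 4, 2],
    (2, 2): [4, 4, 5, 5],
}
sizes = [len(obs) for obs in cells.values()]
harmonic_mean = len(sizes) / sum(1 / n for n in sizes)
arithmetic_mean = sum(sizes) / len(sizes)
cell_means = {cell: sum(obs) / len(obs) for cell, obs in cells.items()}

print(round(harmonic_mean, 3))             # 3.692, the value used in table 2
print(round(arithmetic_mean, 3))           # 3.75
print(round(sum(cell_means.values()), 2))  # 13.75, the grand total of means
```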

Multiple Linear Regression

When the disparity among the cell frequencies is great, a least-squares solution, as in multiple linear regression analysis, must be used. All computerized statistical packages have multiple linear regression routines. Before a multiple regression routine is selected, it is critical that the researcher understand the outputs of such a solution. Extensive literature exists on the methods commonly used in dealing with unequal cell frequencies.[2] A discussion of these methods is beyond the scope of this article. Examples of applying linear regression techniques to a set of data involving unequal cell frequencies that yield different results are also available.[3] Finally, guidelines are available for selecting one of the four types of analysis of variance tables that are incorporated into the SAS-76 computer program.[4,5]
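To make the least-squares idea concrete, the sketch below fits a main-effects-only model to the table 1 data using effect coding (+1/-1) and solves the normal equations (X'X)b = X'y directly. This illustrates only the mechanics under those stated assumptions, not a substitute for the packaged routines just mentioned; all names are illustrative.

```python
# Least-squares fit of an additive model y = mu + row effect + column effect
# to the unequal-cell data of table 1, via the normal equations (X'X)b = X'y.

def solve(a, y):
    """Solve a small linear system a b = y by Gauss-Jordan elimination."""
    n = len(y)
    a = [row[:] + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

# (row code, column code, observations) for the four cells of table 1
cells = [
    (+1, +1, [2, 3, 3, 4]),
    (+1, -1, [3, 3, 3]),
    (-1, +1, [3, 4, 4, 2]),
    (-1, -1, [4, 4, 5, 5]),
]
# One design row [intercept, row code, column code, response] per observation
rows = [(1.0, r, c, y) for r, c, ys in cells for y in ys]
xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * row[3] for row in rows) for i in range(3)]
b = solve(xtx, xty)  # [grand mean, row effect, column effect] estimates
print([round(v, 3) for v in b])  # [3.462, -0.413, -0.337]
```

Because the cell frequencies are unequal, X'X is not diagonal (its off-diagonal entries would all be zero with equal n's), which is exactly why a full least-squares solution is needed here.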

An Example

The analysis of variance shown in table 3 was taken from the open literature to serve as an example. It illustrates the inherent difficulty of a large factorial design compounded by the problem of unequal cell frequencies. In this table, there are six factors - A with two levels, B with four, C with two, D with three, E with two, and F with two. One complete replication would thus require 2 by 4 by 2 by 3 by 2 by 2, or 192, treatment combinations. Several treatment combinations were not observed, although it is not known which ones are missing. Moreover, several two-factor interaction terms are not reported, which raises questions about the validity of the analysis. Three of these unreported terms - C by F, D by F, and E by F - involve the highly statistically significant factor F (30.07). Experience shows that when a main factor is highly significant, there is a strong possibility that an interaction term involving that factor is also statistically significant.

The two-factor interactions, B by E and B by F, were reported to be statistically significant at the 5-percent level. The B by E interaction may be accounted for by random variation, since the F-ratio was marginally significant with a calculated probability level of 0.046. Nevertheless, the highly significant B factor calls for further analysis before prematurely concluding that the B by E interaction was random variation due to sampling.

As for the B by F interaction, there are four levels of the B-factor and two levels of the F-factor yielding a total of eight B by F treatment combinations. In spite of the statistically significant B by F interaction term, the author of the report failed to conduct further analysis to determine which treatment combinations differ from the others. The flawed and incomplete analysis described above and illustrated in table 3 raises many other questions. For example, why are there so many missing observations? How many times was the experiment replicated? How valid are the results of the analyses of variance? Why are some of the two-factor interaction terms not reported? What is the interpretation of the statistically significant two-factor interaction terms? What are the practical implications of these statistically significant terms? Could the experiment be conducted on a reduced scale with four variables instead of six? Similar questions are all too frequently raised in readers' minds at the conclusion of traffic study reports.

Summary

The analysis of variance technique was developed and designed primarily to analyze experimental data in a simple, straightforward fashion. The factorial design in the analysis of variance framework is a powerful tool for analyzing several independent variables simultaneously. There are, however, several practical reasons that tend to limit the use of high order factorial experiments. For example, one of the requirements of factorial design is that each treatment combination (cell) be replicated more than once and that the frequency of observations be equal for all cells. Another limitation is the difficulty of interpreting the results of the analysis. A high order factorial design introduces more sources of variation, with a greater potential for interaction among variables. This complexity means that substantive interpretation of the results becomes difficult and, possibly, misleading.

The first of these limitations, logistics, is rather handily addressed. In real life, traffic studies using the analysis of variance technique are seldom, if ever, conducted without missing observations; these missing observations result in the problem of unequal cell frequencies. Provided the disparity among the cells is not too great, there are statistical methodologies that overcome this shortcoming while retaining a meaningful analysis.

The second limitation, interpretation, requires more thought to remedy. Based on the above, the factorial design may appear somewhat quixotic - a simple analytic tool of limited practical use because of the complexity involved in substantive interpretation of the data collected. The complexity is not in the technique, but in the phenomenon under investigation. In fact, it is rather simple to write the computational formula for any high order factorial design once the computational formula for a two-way design is understood.

Conclusions

The aim of any research effort is to seek reliable answers to questions or problems of interest. The first step is to define and identify the problem. Next, an experiment to elicit the needed information must be designed and planned. In planning the experiment, the working hypothesis is translated into a statistical hypothesis to be evaluated from the data. The planning should also include the identification of the amount and kind of data to be collected and the methodology for analyzing the data. Following conduct of the experiment, the final phase of the research effort is to interpret the data analyses and draw conclusions about the relations under study.

Given these steps in a research effort, researchers must not only pick the right method and study design that suits the analysis, but must also complete a full discussion of the interpretations.

All too frequently, authors neglect to explain the significance of their findings, instead leaving the reader to determine the meaning of statistical significance. Such an approach is unfair to the readers, especially those who have not followed the experiment from beginning to end; unfair to the data, since it leaves them essentially uninterpreted; and unfair to the study, since it neglects to expand upon the practical implications with respect to study objectives. To ensure the utility and comprehensibility of study findings, authors must carefully and thoroughly interpret their data and not just leave the figures in their "number-crunched" state.

References

[1] George W. Snedecor and William G. Cochran. Statistical Methods, 6th Ed., Iowa State University Press, Ames, IA, 1967.

[2] F.M. Speed, R.R. Hocking, and O.P. Hackney. "Methods of Analysis of Linear Models with Unbalanced Data," Journal of the American Statistical Association, Vol. 73, No. 361, March 1978, pp. 105-11.

[3] John E. Overall and Douglas K. Spiegel. "Concerning Least Square Analysis of Experimental Data," Psychological Bulletin, Vol. 72, No. 5, 1969, pp. 311-22.

[4] O.J. Pendleton, M. von Tress, and R. Bremer. "Interpretation of the Four Types of Analysis of Variance Tables in SAS," Communications in Statistics, Part A, Theory and Methods, Marcel Dekker, Inc., New York, Vol. 15, No. 9, 1986, pp. 2785-2807.

[5] A User's Guide to SAS-76, SAS Institute, Raleigh, NC, 1976.

Harry S. Lum is a mathematical statistician in the Safety Design Division, Office of Safety and Traffic Operations Research and Development, Federal Highway Administration (FHWA). He has been with the FHWA since 1969, first as a member of the Urban Traffic Control System research team. Currently, he is involved with a Nationally Coordinated Program project on special highway users.