
Developing a standard approach for testing new traffic control signs.


Each year, the Federal Highway Administration (FHWA) receives requests to test and develop highway signs. Until now, these signs were tested by the Office of Safety and Traffic Operations Research & Development using whatever means seemed most appropriate at the time. Sign testing was often incorporated into other ongoing studies; thus, the sign testing methods used have varied over the years. Moreover, the subjects participating in these tests have been volunteers drawn from a master list maintained at the Turner-Fairbank Highway Research Center (TFHRC).

While this lack of standardization has been acceptable, it may not be the most effective or efficient way to do the work. First, the lack of consistency may well affect which signs are ultimately judged functional; also, since the TFHRC subjects are not selected in a random manner, it is unclear whether they represent a true sample of the driving public. Beyond these questions about the validity of the results, the lack of a prescribed method does not make efficient use of time or equipment.

To address these concerns, a study was recently conducted to:

* Develop a standard sign testing methodology.

* Identify which pieces of equipment should become standard for sign testing.

* Examine the possible self-selection bias in the TFHRC subject pool.

It is anticipated that the results of this study will be useful to other agencies that conduct sign tests on an irregular basis. In particular, the conclusions about the testing methodology can help such agencies manage the task of sign testing.

This article summarizes the study results, including recommendations about a standard sign testing methodology and comments on the validity of the TFHRC subject pool.

Developing a Standard Sign Testing Methodology

Two principal steps were involved in developing this methodology:

1. Determining the dependent variables to be tested.

2. Identifying the critical design criteria.

Dependent variables

The purpose of sign testing is to determine how well a candidate sign meets its prescribed function. Functionality is evaluated in terms of a sign's performance with respect to one or more measures of effectiveness (MOE's).

Over the years, many MOE's--including conspicuity, legibility, certainty of meaning, and comprehension--have been employed in sign testing. However, researchers have long debated which measures most effectively lead to a successful sign evaluation.

A 1985 study, upon reviewing the results of many previous studies, concluded that conspicuity and comprehension are the most important measures with respect to sign design. Conspicuity is a measure of how well a sign is noticed on the roadway. Comprehension is a measure of how well the sign's meaning is communicated. (1)

While these measures are important, there is one other measure to consider. Once the contents of a sign have been identified, drivers do not have to be looking at that sign to comprehend it. Consequently, the ease with which the contents are identified becomes just as important as the ease with which the sign is comprehended. This measure is known as recognition; it differs from comprehension in that a sign does not have to be understood to be recognizable. (2)

Consequently, it was determined that these three MOE's--conspicuity, comprehension, and recognition--are critical to the success of almost every sign and should be incorporated into a standardized sign testing system. Operational definitions for each measure were established based on a review of the literature and of previous studies conducted by the FHWA.

Conspicuity. As noted, conspicuity is a measure of how well a sign is noticed on the roadway. The more conspicuous a sign is--or the easier it is to spot with respect to other visual stimuli--the better it will serve its function. However, conspicuity can be very difficult to measure, since a myriad of situations exist on the roadway. The success of this MOE depends critically on environmental attributes; there is no laboratory-based method that can be simply designed and implemented to measure this variable. Consequently, conspicuity was not included in the final version of the testing methodology.

Comprehension. Comprehension is a measure of how well the sign's meaning is communicated to the driver, as it relates to the function the sign is to serve. A straightforward way to measure comprehension is to assess the correctness of a subject's response--that is, compare the meaning that a subject associates with a sign to that intended by the FHWA. If these meanings match, the sign can be considered to be understood. A sign can also be considered understood if a subject knows what action to take as a driver.

Recognition. Recognition relates to the identification of a sign's contents. For signs containing only text, this measure is commonly known as legibility. Generally, the easier it is to identify a sign's contents, the better the sign. For this measure, a subject does not need to comprehend the sign, just recognize its contents. There are several dependent variables that can be used to measure recognition. Two frequently used variables are response time and recognition distance. Response time measures the amount of time it takes for a subject to identify a sign's contents. Recognition distance measures the distance at which a sign's contents can be identified.

Design criteria

Data collection system. Once it was determined which dependent variables were to be measured, the next step was to determine how data collection was to take place. Certainly it would be economical to use a system that already existed, rather than build a new one. However, there were no such systems in place that would be available upon short notice. Therefore, it was necessary to design a new system that would be:

* Compact--since the system, when idle, could not occupy valuable work space at the TFHRC.

* Portable and easy to manage--since the equipment would possibly have to be assembled and dismantled every day at alternate data collection locations.

Given these system limitations, the dependent variables had to be easy to collect. As mentioned earlier, there did not appear to be any means of easily measuring conspicuity within the limits of a simple system. Comprehension, however, could be measured through the correctness of a subject's response. A sign could be displayed to the subject, who would then provide a response. The response would be evaluated based on its correctness.

Recognition could be measured either through response time or recognition distance. One way to collect response time data is to start a timer while displaying a sign to a subject. Once the subject identifies the sign's contents, he or she presses a button to stop the timer. The subject then verbalizes his or her interpretation to ensure that the contents were recognized, and the response time is recorded.
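The timing sequence described above can be sketched in a few lines. This is an illustrative sketch only: `display_sign` and `wait_for_button` are hypothetical callbacks standing in for the projector shutter and the subject response button, not part of the actual apparatus.

```python
import time

def run_trial(display_sign, wait_for_button):
    """Collect one response-time measurement.

    display_sign    -- hypothetical callback that opens the shutter,
                       making the sign visible
    wait_for_button -- hypothetical callback that blocks until the
                       subject presses the response button
    Returns the elapsed response time in seconds.
    """
    display_sign()                      # shutter opens; timing begins
    start = time.perf_counter()
    wait_for_button()                   # subject identifies the contents
    return time.perf_counter() - start  # timer stops; record the interval
```

In the actual procedure, the experimenter would then record the subject's verbalized interpretation alongside this interval.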

Data collection apparatus. The next step in the design process was to review data collection apparatuses used in previous studies. While it was expected that other configurations had collected data for a different set of dependent variables, it was also expected that they could be modified to collect data for use in this study. From this review, the most promising apparatus was clearly the one used in a 1988 FHWA study on seat belt signs. For that study, response time data were collected through the use of a Kodak Ektagraphic slide projector with an added shutter, a timer, and controls to operate the timer and the shutter. These pieces of equipment, along with a small rear-projection screen and all associated equipment, were contained on a 0.6-m by 0.9-m (2-ft by 3-ft) cart. (3)

This apparatus could easily be used to collect the data for the present study, since both comprehension and recognition data could be collected by projecting slides onto the rear-projection screen. A shutter and timer were added to the configuration to collect the response time data. Other features were added to the setup to improve the experimental design and ease data collection. These features included adding a second slide projector to display a fixation point on which the subject could focus. A laptop computer with a detachable shelf was also added to reduce the time needed to record subjects' responses. The final testing configuration is shown in figure 1.

Test protocol. Once the subject was selected, he or she was seated in front of the screen and handed the subject response button. The testing procedure was explained, and the subject was given the opportunity to ask questions. Response time data were collected first, and then comprehension data; this order was used because collecting response time data shortly after the subject had just seen the same sign could have produced inaccurate data. However, this would not have been an issue if a sign were not tested for both comprehension and recognition for any one subject.

The experimenter sat by the side of the cart during the testing procedure, and the subject sat 1.4 m (4.5 ft) away from the cart, facing the screen that was attached to it. At this distance, and given the size of the image on the screen, this was equivalent to viewing a sign in the field at a distance of 30.5 m (100 ft). Once the introduction was completed, the experimenter ran the subject through three test signs to ensure that the directions were understood. Before displaying each sign, the experimenter would say, "next," which would act as a signal to mentally prepare the subject for the next slide (the subject would already be visually prepared since he or she was focused on the fixation point).
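The equivalence between the 1.4-m viewing distance and a 30.5-m field distance follows from simple visual-angle geometry: an image of height h viewed from distance d subtends the same angle as a sign of height H at distance D when h/d = H/D. A minimal sketch of this calculation (the 0.75-m sign height is an illustrative value, not a figure from the study):

```python
def screen_height(sign_height_m, field_distance_m=30.5, viewing_distance_m=1.4):
    """Image height that subtends, from the subject's seat, the same visual
    angle as the real sign at the equivalent field distance (h = H * d / D)."""
    return sign_height_m * viewing_distance_m / field_distance_m

# A hypothetical 0.75-m sign "seen" at 100 ft needs a projected image of
# roughly 3.4 cm to preserve the visual angle.
image_m = screen_height(0.75)
```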

Test Subjects

The sample of test subjects must be representative of the general population in order for study results to be valid. Because the TFHRC subjects are self-selected, volunteering to participate in given studies, it is not known if they represent a true sample. To determine if the TFHRC subject pool is a valid sample, these subjects were compared to another, non-self-selected set of participants. If no significant difference was found between the two pools, it would be acceptable to continue using the TFHRC subject pool.

The first task in studying the TFHRC subject pool was to find an adequate comparison group that:

* Was not self-selected.

* Represented a good cross section of the driving public (i.e., contained people of varying ages, both sexes, and different races, cultures, and socioeconomic backgrounds).

* Consisted of licensed drivers.

It was determined from a previous study that the Department of Motor Vehicles (DMV) could provide this pool. (4) Since all licensed drivers use the DMV, it offers a subject pool that is both large and easily accessible. Researchers could set up their testing equipment at the DMV and select subjects as they were leaving the facility. While this approach did not completely overcome the self-selection bias, the DMV subjects did not have to do anything beyond their planned activities to take part. This process is distinctly different from that for the TFHRC subjects, who call the TFHRC, schedule a visit, and drive out of their way, on their own time, to take part in the study.

It was questioned whether an adequate cross section of subjects would be available for testing at any one DMV location and if it would be necessary to test at different locations to ensure subject pool diversity. This concern was addressed in the previous study, in which screening tests were conducted at four distinctly different DMV stations (one in an urban area, the second in a densely populated suburb, the third in a less densely populated suburb, and the fourth in a rural area).(4) Analysis of the screening test results showed that there was no significant variation between sites. Consequently, it was determined that a local office of the Virginia DMV would provide the needed comparison pool for the present study. This site was chosen for its convenience to the TFHRC, in case study results indicated that future sign testing would need to be conducted at the DMV.

For this study, recognition data were collected on 18 signs, and comprehension data were collected on 15. The set of test signs included some taken directly from the Manual on Uniform Traffic Control Devices (MUTCD), as well as new signs that the subjects had never seen before. These new signs included variations on standard signs, samples of a seat belt sign, and a "Don't Drink and Drive" sign, among others. The three signs that were tested only for recognition were not included in the comprehension tests because they depicted a symbol that was not expected to convey any specific meaning with respect to the driving task. Figure 2 shows all 18 signs.

The recognition data were collected for each sign, and then the comprehension data were collected. The subjects tested at the TFHRC were either between the ages of 18 and 25 or over 65. This gap in ages existed because these subjects were also taking part in another study that focused on older and younger drivers. The subjects tested at the DMV were of all ages; however, data were used only for those age groups corresponding to the TFHRC sample. The age composition of the subjects was justified by the fact that drivers at these ages are overrepresented in accidents, and any study of these age groups would thus benefit the whole driving population.

Table 1 summarizes the number of subjects used in this study for each location. Although not reflected in table 1, three signs--G, H, and I--were not shown to all subjects, because it was decided that each subject should not be exposed to too many high-profile signs, such as the "Don't Drink and Drive" signs.


In summary, the MOE's selected for the standard sign testing methodology and used as a means of evaluating the signs were comprehension and recognition. The respective dependent variables for these MOE's were response correctness and response time. Comprehension was the primary focus of the study, since it is the most important MOE. The study's statistical analyses were performed using SYSTAT software.

Comprehension analysis

Response correctness data were collected by displaying a sign to a subject and having the subject voice his or her interpretation of the meaning of the sign. The experimenter then recorded the correctness of the response or, if it was noteworthy, the response itself.

The responses were compiled, reviewed, and categorized to determine correct and incorrect responses for each of the 15 signs. The results of this categorization are in table 2.

Once the correctness data were categorized, Chi-square tests were conducted to determine whether the percentage of correct responses was the same for the two subject groups. The results of the Chi-square tests revealed that, at the 0.05 level, there was no significant difference between subject groups for any of the signs.
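For a 2x2 table of correct/incorrect counts, this comparison reduces to the Pearson chi-square statistic with one degree of freedom, which can be computed by hand. The counts below are hypothetical, chosen only to illustrate the "no significant difference" outcome; they are not the study's data.

```python
def chi_square_2x2(correct_a, incorrect_a, correct_b, incorrect_b):
    """Pearson chi-square statistic for a 2x2 correct/incorrect table
    comparing two subject groups (1 degree of freedom)."""
    a, b, c, d = correct_a, incorrect_a, correct_b, incorrect_b
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

CRITICAL_05_DF1 = 3.841  # chi-square critical value at the 0.05 level, df = 1

# Hypothetical sign: 18 of 20 correct at one site, 49 of 58 at the other.
stat = chi_square_2x2(18, 2, 49, 9)
significant = stat > CRITICAL_05_DF1  # False here: the groups do not differ
```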

Response time analysis

Before any analyses could be conducted on the response time data, each data record was reviewed. If a record was unacceptable (e.g., no response was given by the subject), it was dropped from the analysis.
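This screening step amounts to a simple filter over the raw records. Representing a missing response as `None` is an assumption made for illustration; the study's actual record format is not described.

```python
def clean_response_times(records):
    """Drop unusable response-time records before analysis: trials with no
    response (represented here as None) or a non-positive timer reading."""
    return [t for t in records if t is not None and t > 0]
```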

Next, tests were run on the acceptable data. First, summary statistics, such as the mean and standard deviation, were computed. These statistics are provided in table 3. Following this, each sign was analyzed with respect to the two subject groups to see if there was a significant difference between response times. The Kolmogorov-Smirnov test was conducted, with comparisons made at the 0.05 level. This test determines whether two independent samples have been drawn from the same population; it would thus show whether there was a statistically significant difference between the response times for the two locations for each sign. (SYSTAT is able to account for differing sample sizes when running this test--an important consideration when performing the K-S test.) Cumulative distributions were plotted for each sign, providing a graphic comparison between the response times of the two subject groups. No significant difference was found in the responses of the two groups.
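The two-sample Kolmogorov-Smirnov comparison can be sketched directly from its definition--the maximum vertical gap between the two empirical cumulative distribution functions--together with the standard asymptotic 0.05-level critical value, which accommodates unequal sample sizes such as the TFHRC and DMV groups. The sample values in the test are illustrative only.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the maximum vertical distance between the
    two empirical CDFs. Works for samples of different sizes."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

def ks_critical(m, n, c_alpha=1.36):
    """Asymptotic critical value at the 0.05 level for samples of size m and n;
    the statistic is significant when it exceeds this value."""
    return c_alpha * math.sqrt((m + n) / (m * n))
```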


The study's methodology, although successful in achieving the present study goals, could be improved to enhance the research findings. Specific methodological concerns are discussed below.

Standard sign testing methodology

The most prominent concern with the methodology involved collecting the response time data. It was apparent that a few subjects tried to comprehend the sign, rather than simply identify each sign's contents. The most effective way to address this problem would be to improve the instruction and training of the subjects.

A second concern relates to the recognition MOE. The existing procedure was able to account for the temporal aspects of recognition, that is, the amount of time it takes to recognize the sign; however, it did not incorporate the distance aspects. While measuring only the temporal aspects is acceptable as a means of measuring recognition, it may also be worthwhile to measure recognition distance in order to more fully explore this MOE.

The third drawback of the methodology is the static nature of the presentation. Using the FHWA Highway Driving Simulator could provide a dynamic testing methodology; however, it would first be beneficial to compare results from both testing methodologies to see if there is a significant difference. It may be shown that the static methodology is, in fact, acceptable.

Finally, the system could be improved by adding some measure of conspicuity. As discussed above, it was not feasible to recreate all of the different roadway scenarios needed to fully measure this MOE. However, some sign-specific dependent variables, such as color preference or the ratio of background to legend, could be included to measure some aspects of conspicuity.

Testing equipment

The testing equipment used in this study performed well. Using the Ektagraphic slide projector with an added shutter to control viewing onto the rear-projection screen proved to be an effective means of collecting comprehension data. Also, having the added timer--which measured the amount of time the shutter was open-- proved successful for collecting the recognition data. Other features of the setup, including the small mobile cart and the laptop computer, also performed well, with no flaws identified.

With this setup, the response time on the timer was manually entered into the laptop. This configuration worked well and was found to be time-efficient. However, it may be worthwhile to link the timer with the laptop to enable response times to be downloaded automatically. Such a configuration would save time and eliminate the potential for transcription errors.

Test subjects

The only concern regarding test subjects had to do with the fact that participation was still voluntary at the DMV. This element of choice meant that drivers with the poorest abilities could still decline to participate in the study. Therefore, the DMV subjects were not a true cross section of the driving public. However, the self-selection bias was much weaker with the DMV subject group than with the TFHRC group; moreover, the way this group was recruited was distinctly different from the way the TFHRC subject group was recruited.


The three goals of this study were successfully met, although the study's methodology could be enhanced, as discussed above. Specifically, an efficient and consistent testing methodology was developed, using equipment that was compact, portable, and easy to manage. Furthermore, by comparing the TFHRC subjects to a distinctly different subject pool, it was determined that the difference between these two pools was not statistically significant. Consequently, the potential self-selection bias in the TFHRC subject pool does not appear to affect the outcome of FHWA studies that use these subjects.

The results of this study will be helpful for agencies that conduct sign tests on an irregular basis. The methodology developed is very straightforward, and most of the equipment is available off the shelf. Also, implementing such a system should not be time- or cost-intensive. Since the timer and response button would probably have to be custom-assembled, the enhancement linking the timer directly to the laptop could be incorporated at the time of that assembly. The only other aspect of the methodology that may require outside assistance is the development of the stimulus slides. However, some graphics software companies provide such slide preparation services.


(1) M.T. Pietrucha and R.L. Knoblauch. Motorists' Comprehension of Regulatory, Warning, and Symbol Signs, Volume I: Executive Summary, Biotechnology, Inc., Falls Church, VA, 1985.

(2) R.E. Dewar and J.G. Ells. "Methods of Evaluation of Traffic Signs." Information Design: The Design and Evaluation of Signs and Printed Material, Ronald Easterby and Harm Zwaga, eds., New York: John Wiley and Sons Ltd., 1978, pp. 77-90.

(3) E. Alicandri and K. Roberts. "An Evaluation of Symbols for a Regulatory Seat Belt Sign," unpublished internal report, Turner-Fairbank Highway Research Center, 1988.

(4) M.T. Pietrucha and R.L. Knoblauch. Motorists' Comprehension of Regulatory, Warning, and Symbol Signs, Volume II: Technical Report, Biotechnology, Inc., Falls Church, VA, 1985.

Paul A. Pisano is a highway research engineer in the Office of Safety and Traffic Operations R&D, Information and Behavioral Systems Division, Federal Highway Administration.

1 Numbers in parentheses identify the references listed at the end of this article.
Table 1.--Number of subjects in each group, by sex and age

               TFHRC subjects        DMV subjects
 Sex         18-25 yrs  65+ yrs    18-25 yrs  65+ yrs
 Male            10        10          30        27
 Female          10        10          30        21

COPYRIGHT 1992 Superintendent of Documents

Article Details
Author: Pisano, Paul A.
Publication: Public Roads
Date: Jun 1, 1992

