Using forced choice discrimination to measure the perceptual response to light of different characteristics.


Different types of light source are available with a wide variety of spectral power distributions (SPDs), which give variations in the color appearance of the light and the color rendition of illuminated surfaces. In past studies researchers have investigated whether SPD affects the perceived amount of light in an illuminated space, defined below as spatial brightness. These studies have tended to find that SPD affects spatial brightness and that this is not predicted by quantities derived from V(λ), the CIE Standard Photopic Observer, such as illuminance and luminance. Consider, for example, lighting of equal illuminance but from two sources having different spectra: some results suggest that one illumination may be considered significantly brighter than the other [Fotios, 2001]. If this is a persistent and significant effect there are several implications. First, it would show that a photometric measure of 'how much light?' based solely on V(λ) is not appropriate to characterize the brightness of an interior space under different types of light sources even when the luminance distribution is constant. Second, for the lighting designer, lamp choice offers the opportunity to increase spatial brightness and/or to reduce the energy consumed by the lighting [Fotios, 2011]. Third, for the lamp manufacturer, there is an opportunity to create lamps with spectra that are purposely designed to enhance a perceptual attribute such as spatial brightness, which may be achieved with lower power consumption [Houser and others, 2004; Houser and others, 2009].

This paper emphasizes the psychophysical measurement of spatial brightness, but the methodological techniques are applicable to other characteristics of lighting and other psychophysical dimensions (for example, sound, taste) that can be measured using forced-choice discrimination. Table 1 defines some psychophysical terms that are employed in this manuscript.

Table 1. Terms from psychophysics that are employed in this manuscript

Term                               Definition

Discrimination task  A forced-choice task in which a subject
                     is asked to choose between two (or
                     sometimes more) stimuli using a
                     perceptual criterion such as
                     brightness, colorfulness, sweetness, or
                     loudness. If three or more stimuli are
                     presented at once, it may be called a
                     ranking task.

Simultaneous         A form of spatial juxtaposition that
evaluation           allows two (or more) stimuli to be
                     evaluated at the same time. A common
                     example is the side-by-side mode,
                     although other arrangements are
                     possible, such as upper and lower.

Successive           A form of temporal juxtaposition that
evaluation           allows two (or more) stimuli to be
                     evaluated. Each stimulus is presented
                     only once, often, but not essentially,
                     at the same spatial location.

Sequential           Similar to successive evaluation,
evaluation           except each stimulus is alternated
                     back-and-forth several times before the
                     decision is recorded.

Bias                 A systematic distortion of the
                     responses that results from the
                     sampling procedure.

Interval bias        A consistent asymmetry in the response
                     frequencies whereby one interval
                     appears with greater frequency than
                     would be expected. "Interval" refers to
                     the temporal placement of a stimulus
                     condition. For example, in successive
                     evaluations, there are two intervals
                     where a stimulus condition could
                     appear, first or second.

Positional bias      A consistent asymmetry in the response
                     frequencies whereby the response to one
                     of the positions appears with greater
                     frequency than would be expected.
                     "Positional" refers to the spatial
                     placement of a stimulus condition. For
                     example, in simultaneous evaluations,
                     there could be two positions where a
                     stimulus condition could appear, left
                     or right.

Stimulus range bias  Bias associated with the range of
                     stimuli selected by the experimenter.

Stimulus frequency   A bias related to the distribution of
bias                 the magnitudes of the test stimuli
                     above (for example, brighter) and below
                     (for example, dimmer) than that of the
                     reference stimulus.

Centering bias       A bias that centers the range of
                     response on the midpoint of the range
                     of stimuli.

All possible pairs   An experimental procedure for
                     discrimination experiments where all
                     possible pairs of conditions are
                     compared. This method will eliminate
                     the centering bias that may occur when
                     a constant reference illuminant is used.

Spatial brightness emphasizes the ambient lighting of a space rather than lighting of objects or surfaces. Spatial brightness describes a visual sensation to the magnitude of the ambient lighting within an environment, such as a room or lighted street. Generally, ambient lighting creates atmosphere and facilitates larger visual tasks such as safe circulation and visual communication. This brightness perception encompasses the overall sensation based on the response of a large part of the visual field extending beyond the fovea. It may be sensed or perceived while immersed within a space or when a space is observed remotely but fills a large part of the visual field. Spatial brightness does not necessarily relate to the brightness of any individual objects or surfaces in the environment, but may be influenced by the brightness of these individual items. Many previous studies have used the term brightness, which is usually defined as the attribute of a visual sensation according to which a given visual stimulus appears to be more or less intense [Wyszecki and Stiles, 1982]. It is clear, however, from the manner in which visual judgments were made in these previous studies that the evaluation carried out is one which may be better identified as spatial brightness.

Past studies of spatial brightness under lighting of different SPD have used one of four procedures: matching, adjustment, category rating and discrimination. Recent articles have discussed best practice when using the procedures of matching [Fotios and others, 2008], adjustment [Fotios and Cheal, 2010a; Logadottir and others, 2011] and category rating [Atli and Fotios, 2011; Fotios and Atli, 2012; Fotios and Houser, 2009]. This article discusses the remaining procedure, discrimination.

In discrimination tests (known also as brightness ranking in past studies [Fotios and Cheal, 2007, 2008]) test participants are presented with two visual scenes (such as rooms, booths, or light patches) in spatial or temporal juxtaposition. The luminances of both scenes remain constant and the participant is instructed to report which stimulus is brighter. This is usually a forced choice task, in which the response 'equally bright' is not allowed. The output is the frequency of responses by which one scene is considered to be the brighter of the pair.

The objective of this article is to raise awareness of aspects of experimental procedure that may provide a step towards improving best practice when using a discrimination procedure to evaluate spatial brightness.


In simultaneous evaluations, two (or more) stimuli are compared in juxtaposed spatial locations. In brightness discrimination studies these are often side-by-side fields, commonly left-right or top-bottom fields. Position bias is the suggestion that the brightness of the visual scene in one location may be favored over that of the second location, such as a tendency for test participants to indicate that the right-hand field is brighter than the left-hand field. Previous work reveals a bias for choosing the right-hand option when presented with identical left and right route choices [Taylor and Socov, 1974; Kang, 2004]. Position bias has been demonstrated in side-by-side brightness matching experiments [Fotios and others, 2008] and it may therefore be expected to persist within side-by-side brightness discrimination judgments.

Rea, Radetsky, and Bullough [2011] carried out three brightness discrimination tasks using side-by-side booths containing identical scale models of a parking lot scene, with each booth subtending a field of 18° × 18°. Their second experiment included null condition trials, with lighting of the same SPD and illuminance in both sides. Statistical analysis revealed a significant (p < 0.05) position bias; the right side was judged brighter 78 percent of the time when the conditions on both sides were the same, despite nearly identical photometric characteristics. Counterbalancing was employed in these tests to offset the effect of position bias.

Stephens and Bolander [2005] used a side-by-side presentation with the reference source illuminating the left-hand side. They included a null condition trial in their tests with 42 of their 68 test participants, although they did not present any analyses of these data. The stimulus illuminating the left-hand field was set to 25 cd. (1) The stimulus in the right-hand field was set to either an equal luminous intensity (25 cd), a lower luminous intensity (17 cd), or a higher luminous intensity (35 cd), and was of the same SPD as the left-hand field. The results are shown in Table 2. When the right-hand field was set to 17 cd, the left-hand field at 25 cd should appear brighter if evaluations were made according to luminance differences; this is not evident in the results, with the majority of subjects voting for equal brightness. This suggests a bias in favor of the right-hand field that overcame the higher luminance of the left-hand field. When both fields were set to 25 cd, a null condition of SPD and luminous intensity, judgments should have favored same brightness, with approximately equal numbers of votes for the left- and right-hand fields. Instead there are a large number of votes for the right-hand field to be brighter, again suggesting a bias to the right-hand field. When the right-hand field was set to 35 cd, a higher luminous intensity than the left-hand field at 25 cd, the majority of judgments are for the right-hand field, which is the expected result.

Table 2. Null condition results from Stephens and Bolander (2005): number of votes (n = 42) for the field that was judged brighter. In these trials, the left-hand field was fixed at 25 cd and the right-hand field was set to one of three light levels. The cells that were shaded in the original, marked here with an asterisk, are those that should have received nearly all of the votes if judgments, in the absence of bias, were made according to differences in light levels.

Judgment of brighter       Right-hand field setting
stimulus                   17 cd    25 cd    35 cd

Right-hand side                7       18      29*
Same (equally bright)         29      23*       12
Left-hand side                6*        1        1
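The asymmetry in the 25 cd null condition can be checked with a simple sign test. The following is a minimal sketch only: Stephens and Bolander did not report such an analysis, the function name `binomial_two_sided_p` is hypothetical, and setting aside the "same" votes is an assumption made here so that only left and right votes are compared against chance.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value against chance (p = 0.5)."""
    if trials == 0:
        return 1.0
    # Probability of every possible count under the null hypothesis p = 0.5.
    probs = [comb(trials, k) / 2 ** trials for k in range(trials + 1)]
    observed = probs[successes]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

# Votes at the 25 cd null condition (Table 2).
# Assumption: the 23 "same" votes are set aside, as in a sign test.
right_votes, left_votes = 18, 1
p_value = binomial_two_sided_p(right_votes, right_votes + left_votes)
print(f"right-hand share of left/right votes: {right_votes / (right_votes + left_votes):.2f}")
print(f"two-sided p-value against chance: {p_value:.2g}")
```

With so few left-hand votes the test rejects chance comfortably, which is consistent with the right-hand bias described above, although the result depends on how the "same" votes are treated.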

Fotios and Cheal [2007] used discrimination to compare side-by-side booths at mesopic illuminances and their procedure included counterbalancing of the position of their stimuli. The results reveal a tendency for the left-hand booth to be reported as brighter more frequently than the right-hand booth, by 51 percent to 63 percent, but statistical analysis did not suggest these differences to be significant. They also included null condition trials, as did their subsequent study [Fotios and Cheal, 2011], and these data did not suggest differences between the left-hand and right-hand positions to be significant.

Thus, while the results of some studies do not reveal position bias, the results of others suggest a significant effect. We believe that a position bias should always be suspected and that it should be addressed with appropriate experimental design. Counterbalancing of the spatial locations should be employed to offset bias and null condition trials should be included to estimate the magnitude of bias.
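One way to build such a trial list is sketched below. This is an illustration under stated assumptions rather than a procedure taken from any of the cited studies: the function name `counterbalanced_trials`, the dictionary layout, and the lamp labels are all hypothetical.

```python
import random

def counterbalanced_trials(reference, tests, seed=None):
    """Build a randomized side-by-side trial list with position counterbalancing."""
    trials = []
    for test in tests:
        # Each test/reference pair appears once in each left-right arrangement.
        trials.append({"left": reference, "right": test, "null": False})
        trials.append({"left": test, "right": reference, "null": False})
    # Null condition trial: the same stimulus on both sides, used to estimate bias.
    trials.append({"left": reference, "right": reference, "null": True})
    # Shuffle with a seeded generator so the order is random but reproducible.
    random.Random(seed).shuffle(trials)
    return trials

# Hypothetical lamp labels, for illustration only.
trials = counterbalanced_trials("HPS", ["MH", "LED"], seed=1)
for trial in trials:
    print(trial)
```

Responses from the null trials estimate the size of any position bias, while the mirrored pairs ensure that such a bias does not favor one stimulus over the other.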


An alternative approach to the spatial juxtaposition of simultaneous (side-by-side) evaluation of two stimuli is the temporal juxtaposition of successive and sequential evaluations in which the two stimuli are presented in temporal intervals (one after the other) often, but not essentially, at the same spatial location. In the successive mode each stimulus is presented only once and then a judgment is made. In the sequential mode each stimulus is alternated back-and-forth several times before the decision is recorded.

Interval bias [Yeshurun and others, 2008] is a consistent asymmetry in the direction of a certain response, for example a 'brighter' response for one interval which appears with a greater frequency than is expected. In temporal juxtaposition observers have to retain in their mind their sensory impression of the preceding stimulus while waiting for and then judging the current stimulus [Jakel and Wichmann, 2006]. A possible explanation of interval bias in successive evaluations is memory limitations: the observer either cannot or does not record an accurate sensory intensity in the first stimulus when making comparison with the second stimulus [Yeshurun and others, 2008]. Thus, a potential advantage of sequential evaluations over successive is that the repeated presentation of both stimuli allows for a response that is less reliant on memory.

Mental representations of previously encountered physical stimuli tend to be lower (for example, shorter in length, or less bright) than were the original stimuli [LaBoeuf and Shafir, 2006]. This was found by Uchikawa and Ikeda [1986] in their brightness matching results where stimuli were recalled as being darker with successive evaluation than with simultaneous evaluation. Royer and Houser [2012] asked forty test participants to evaluate the brightness of pairs of lighting conditions presented sequentially with unlimited alternations. Their eight metameric stimuli (3500 K ± 20 K with D_uv ranging from 0.0002 to 0.0011) were created by systematically varying either the red or blue primary of an RGB LED mixture. Each stimulus of the pair was presented for 5 s with a dark period of 0.01 s in between and the sequence alternated for a minimum of 30 s before the participant made a judgment. Counterbalancing was employed by presenting stimulus pairs in both orders and null condition trials were recorded to evaluate the extent of interval bias. There was a slight tendency for test participants to select the second stimulus as being brighter (55 percent for their 'blue' series and 56 percent for their 'red' series), but those percentages were not statistically different from chance (p = 0.53 and p = 0.48 for the 'blue' and 'red' series, respectively).

Two-interval forced-choice tasks are not simple, are not bias free, and are potentially difficult to interpret: Yeshurun and others suggest using simultaneous evaluation to overcome some of the problems of two-interval forced choice [Yeshurun and others, 2008]. Jakel and Wichmann found that their four naive observers were still better at a spatial discrimination task than at a temporal discrimination task after 20,000 detection trials [Jakel and Wichmann, 2006].

Previous research exploring interval bias tends to have used successive evaluations, in which the stimuli are presented only once each, with judgments being made after observation of the second and without the opportunity to see the first stimulus again. Past studies of spatial brightness have tended to use sequential evaluation rather than successive. For example, Berman and others [1990] presented three alternations of the two lamps, each being presented for five seconds at a time with a dark period of 25 ms in the 100 ms changeover duration; Vrabel and others [1998] presented their stimuli for three seconds each, with a two second dark interval, and observers were able to ask for as many repeat presentations as needed. In these studies the repeated presentation of both stimuli may overcome interval bias due to memory effects as the internal brightness reference is repeatedly refreshed by observation of the external reference. The extent of any interval bias could be analyzed using null condition data, where the same SPD, illuminance and spatial distribution were used in each interval. However, there were no null condition trials within these studies. Two studies [Houser and others, 2009; Fotios and Cheal, 2010b] carried out to compare brightness judgments using simultaneous and sequential evaluations were not able to detect a difference, suggesting that interval bias was offset or was negligible in these trials.

In order to gather the most defensible data possible we recommend that the two stimuli be placed in both intervals (first or second source) for different trials when successive or sequential evaluations are employed. We suggest that sequential evaluations may alleviate the interval bias prevalent in successive evaluations through the repeated presentation of both stimuli, but further data are required to confirm this.


To enable estimation of the illuminance ratio for equal brightness the discrimination task is repeated with the illuminance of one or both of the visual scenes varied through several steps. This range of illuminances is the stimulus range. Nominally, one scene is the reference and retained at a standard illuminance, while the second scene is set by the experimenter to a range of illuminances with discrimination judgments made at each. Stimulus range bias means that the experimenter's choice of stimuli affects the outcome of the experiment: a different stimulus range could lead to a different conclusion being drawn. Two studies indicate that stimulus range bias will affect the outcome of discrimination judgments [Fotios and Cheal, 2008; Teller and others, 2003].


Stimulus range effects are those produced by the range of stimuli selected by the experimenter. Poulton [1977] demonstrated stimulus range bias in investigations of loudness. Consider judgments of loudness for two different ranges of noises, 80 to 100 dB and 70 to 90 dB, with judgments gained using a rating scale of very quiet to very noisy. In the 80 to 100 dB range, respondents tended to put the 90 dB noise, the middle of the stimulus range, in the middle of the response range, marking the transition between acceptable and noisy. In the 70 to 90 dB range there was still a tendency to place the middle of the stimulus range at the middle of the response range, but this time the middle stimulus was only 80 dB. Stimulus range bias means that the 80 dB noise from the less intense range of noises tended to receive the same loudness rating as the 90 dB noise from the more intense range of noises. In practice the observer's judgment is a compromise, being determined partly by the actual loudness of the noise and partly by the position of the noise within the range of noises. Poulton [1977] further demonstrated range bias in a pitch discrimination trial using a constant reference, where this reference was either 1 Hz throughout the trial, absent after the first block of trials, or increased by 3 Hz after the first block of trials. The resultant judgments hardly changed, whereas they would be expected to do so if stimuli were judged only against the reference.

Range bias has been demonstrated in brightness discrimination judgments. Teller, Pereverzeva, and Civan [2003] sought brightness judgments of small (mainly 2°), colored targets (red and blue in separate trials) presented on a white computer screen (42°) of constant luminance. For each target color, a range of targets varying in luminance was presented in a random order, and observers reported whether the target was brighter or dimmer than the surround. Three ranges of target luminance were used in separate series of trials; for the red target these ranges had mid-point values of -0.6, -0.3 and 0.1 log luminance relative to the white surround. Typically 11 target stimuli were used in each range, increasing in steps of 0.05 log luminance units. Note that when the log luminance of the target relative to the surround was 0.0 there was no luminance contrast between the target and surround. In this situation it would be expected to find a 50 percent probability that the target would be judged brighter than the surround if there were no differences in brightness caused by color contrast of the red or blue targets against the white background. Seven test participants were used, each making 20 discrimination judgments per condition.

Table 3 illustrates the three ranges of log luminance of the target relative to the white surround and the percentage by which the red target was reported to be brighter than the surround. In each range the higher luminances tended to be reported as brighter than the surround while the lower luminances were dimmer than the surround. The targets of highest luminance in the lower range of luminances were also the lowest luminance targets in the middle range: the stimuli located at the upper end of the low range were judged to be brighter than the surround on approximately 100 percent of trials, while the identical stimuli when used at the lower end of the middle range were considered to be dimmer than the surround on approximately 100 percent of trials. Similarly, the highest luminance targets of the middle range of luminances were also the lowest luminance targets in the higher range, and again were judged to be brighter than the surround in approximately 100 percent of trials when placed in the middle range but dimmer than the surround in approximately 100 percent of trials when placed in the high range.

Table 3. Results of experiment by Teller et al. (2003) showing the percentage of judgments in which a target was considered to be brighter than the surround. The percentages are not precise, being read from Figure 1 in Teller et al.

Log luminance of      Percentage of judgments that target
target relative to    was brighter than the surround
white surround        High range   Middle range   Low range

 0.2                         100              -           -
 0.1                          90              -           -
 0                            50              -           -
-0.1                          20            100           -
-0.2                           -             60           -
-0.3                           -             50           -
-0.4                           -             20          90
-0.5                           -              0          20
-0.6                           -              -           0
-0.7                           -              -           0
-0.8                           -              -           0

Thus a target of a particular log luminance relative to the white surround was reported to be both brighter and dimmer than the surround, depending on whether it was at the upper or lower end of a range of target stimuli. It appears that test participants were not judging the targets against the surround, the intended reference, but instead against the other target stimuli. Figure 1 summarizes how the results from different ranges would be interpreted.

Typically, the luminance of a target found in 50 percent of trials to be brighter than the surround would be used to indicate luminances for equal brightness. Each of the three luminance ranges used by Teller and others hence suggests a different luminance for equal brightness. What this reveals is that each decision (brightness judgment about a particular target) was not made independently of the others in the series.
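The dependence of the 50 percent point on the stimulus range can be illustrated with the approximate percentages in Table 3. This is a rough sketch only: linear interpolation between neighboring points is an assumption made here for simplicity (it is not the analysis used by Teller and others), and the function name `fifty_percent_point` is hypothetical.

```python
def fifty_percent_point(levels, percent_brighter):
    """Linearly interpolate the level at which 50 percent of judgments are 'brighter'."""
    points = sorted(zip(levels, percent_brighter))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 50:
            return x0
        if y0 < 50 <= y1:
            # Straight-line interpolation between the two bracketing points.
            return x0 + (50 - y0) * (x1 - x0) / (y1 - y0)
    return None  # The data never cross 50 percent.

# Approximate values read from Table 3 (log luminance relative to surround, percent).
ranges = {
    "high": ([-0.1, 0.0, 0.1, 0.2], [20, 50, 90, 100]),
    "middle": ([-0.5, -0.4, -0.3, -0.2, -0.1], [0, 20, 50, 60, 100]),
    "low": ([-0.8, -0.7, -0.6, -0.5, -0.4], [0, 0, 0, 20, 90]),
}
estimates = {name: fifty_percent_point(*data) for name, data in ranges.items()}
for name, value in estimates.items():
    print(f"{name} range: 50 percent point near {value:.2f} log units")
```

The three estimates differ by several tenths of a log unit even though the surround was unchanged, which is the pattern expected if judgments were anchored to the range of targets rather than to the surround.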

Stimulus range bias can be seen in results from the study by Teller and others because they purposefully used three different stimulus ranges: most other studies do not. Consider a study which had used only one range of luminances; in the absence of further data such as the findings from an alternative procedure, it would not be possible to tell whether the data gave a reliable estimate of luminances for equal brightness or was biased toward the center of the stimulus range.

It may be that these results can be explained by foveal vs. parafoveal perceptions: it is plausible to imagine the participant concentrating on the test target, which would have been imaged on the fovea, and paying less attention to the reference target, which would have subtended the parafoveal field. Evaluations of spatial brightness tend to use larger visual fields than were used by Teller and others [2003] and test and reference fields are of similar size (for example, side-by-side rooms). While further research is required to confirm whether the stimulus range bias revealed by Teller and others can be generalized to judgments of spatial brightness, a cautious approach to experimental design suggests that stimulus range bias should at least be considered.


Fotios and Cheal [2008] examined stimulus frequency bias. For brightness discrimination tests, stimulus frequency refers to the distribution of magnitudes of the test stimuli above (for example, brighter than) and below (for example, dimmer than) that of the reference stimulus. Consider a test stimulus which is presented at 100, 200, 300, 400, and 500 lx for comparison with a reference stimulus of 400 lx and identical SPD and spatial distribution: in this case the stimulus frequencies are biased, being one level (500 lx) above equal brightness and three levels (100, 200 and 300 lx) below equal brightness. If this distribution were instead to be fairly balanced on both sides of the equal brightness condition, then the test stimulus would tend to be identified as being brighter or dimmer on a near equal number of occasions, while in the biased distribution, one stimulus will be identified as brighter more frequently than the other. When the response frequencies are not equal (for example, the responses brighter, or left, are given more frequently than the responses dimmer, or right) observers will tend to respond as if the frequencies were more nearly equal; this may arise from a preconception of chance, leading an observer to expect that where a large number of responses are given, each of the permitted responses will be correct on an approximately equal number of occasions [Poulton, 1989; Senders and Sowards, 1952].

Stimulus frequency bias means there is potential to suggest a difference between two stimuli when none exists (and similarly to suggest no difference between two stimuli when a difference exists). This can be seen in null condition data from brightness discrimination judgments [Fotios and Cheal, 2008]. These trials used simultaneous (side-by-side) evaluations with both booths lit using identical types of high pressure sodium (HPS) lamp. One booth (reference) was set to one of three illuminances (2.0, 7.5 and 15.0 lx), these being the bottom, middle and top of the six illuminances of the S-series of lighting classes for subsidiary roads [BSI, 2003]. The second booth (test) was set to one of several steps of illuminance as shown in Table 4. These were simply the full range of S-series illuminances, that is, 2.0, 3.0, 5.0, 7.5, 10.0 and 15.0 lx [BSI, 2003], and for the 2.0 lx reference case an additional comparison illuminance of 1.0 lx was used. In Table 4 the illuminance at which the test booth was expected to be equally bright as the reference booth (that is, equal illuminance) is highlighted using bold font: the ranges of illuminances either side of this point reveal where a stimulus frequency bias would be expected and its likely direction.

Table 4. Steps of illuminance used in brightness discrimination null condition trials with balanced and biased stimulus frequencies (Fotios and Cheal, 2008). The illuminances at which the two fields should appear equally bright (that is, equal illuminance), shown in bold within shaded cells in the original, are marked here with an asterisk.

Stimulus   Reference     Test illuminance (lux)
sequence   illuminance
           (lux)

Biased      2.0          1.0    2.0*   3.0    5.0    7.5    10.0   15.0
            7.5                 2.0    3.0    5.0    7.5*   10.0   15.0
           15.0                 2.0    3.0    5.0    7.5    10.0   15.0*

Balanced    2.0          1.0    2.0*   3.0
            7.5                               5.0    7.5*   10.0
           15.0                                             10.0   15.0*  22.0

Consider when the reference illuminance was set to 2.0 lx. Of the seven test illuminances, five are of higher illuminance and thus a decision of 'brighter' would be expected, compared with one decision of 'dimmer' for the 1.0 lx test illuminance. According to stimulus frequency bias, when presented with both booths set to 2.0 lx the tendency would be to respond that the test booth was dimmer in an attempt to balance frequencies of using the response options, even though they were equally bright.
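The imbalance described above can be tallied directly from the biased sequences listed in Table 4. The snippet is a small illustrative sketch; the function name `frequency_balance` is hypothetical, and the counts say nothing about the size of any resulting bias.

```python
def frequency_balance(reference, tests):
    """Count test stimuli below, equal to, and above the reference illuminance."""
    below = sum(1 for t in tests if t < reference)
    equal = sum(1 for t in tests if t == reference)
    above = sum(1 for t in tests if t > reference)
    return {"below": below, "equal": equal, "above": above}

# Biased stimulus sequences from Table 4: reference illuminance (lx) -> test illuminances (lx).
biased = {
    2.0: [1.0, 2.0, 3.0, 5.0, 7.5, 10.0, 15.0],
    7.5: [2.0, 3.0, 5.0, 7.5, 10.0, 15.0],
    15.0: [2.0, 3.0, 5.0, 7.5, 10.0, 15.0],
}
balance = {ref: frequency_balance(ref, tests) for ref, tests in biased.items()}
for ref, counts in balance.items():
    print(f"reference {ref} lx: {counts}")
```

A sequence is balanced in this sense when the "below" and "above" counts match; for the 2.0 lx reference they are one and five.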

The results are shown in Table 5, this being the percentage frequency by which the test booth was reported to be brighter. The results shown are only those where the illuminances of the two booths were equal. When the booths were not presented at equal illuminance, there was a frequency of almost 100 percent for the booth of higher illuminance to be noted as brighter.

Table 5. Results of null-condition brightness discrimination trials with balanced and biased stimulus frequencies (Fotios and Cheal, 2008): percentage frequency with which the test booth was voted to be brighter than the reference booth when both are of identical illuminance. Differences analyzed using Dunn-Rankin variance stable rank sums.

                                        Illuminance of reference booth (lx)
                                             2.0        7.5       15.0

Biased stimulus frequency
  Percentage judgment that test              42%        69%        75%
  booth is brighter
  Difference in votes for the test          n.s.   p < 0.05   p < 0.01
  and reference booths

Balanced stimulus frequency
  Percentage judgment that test              57%        47%        56%
  booth is brighter
  Difference in votes for the test          n.s.       n.s.       n.s.
  and reference booths

For trials using the 7.5 lx and 15.0 lx reference illuminances Table 4 shows there were more test stimuli of lower illuminance than the reference than there were stimuli of higher illuminance than the reference. According to a stimulus frequency bias, judgments made when the two booths were of identical illuminance would tend to yield responses for the test booth to be brighter than the reference. The results shown in Table 5 confirm this; the test booth was reported to be brighter on a significantly greater number of occasions than was the reference booth (p < 0.05 at 7.5 lx; p < 0.01 at 15.0 lx) despite the equality of lamp type and illuminance. At the 2.0 lx null condition, the test booth was reported as brighter in 42 percent of trials, which is suggestive of a difference but was not statistically different from 50 percent.

In the balanced stimulus frequency null-condition trials, the same three reference illuminances were used (2.0 lx, 7.5 lx and 15.0 lx) but the comparison booth was presented at only three levels, these being equal illuminance to, and one S-class step above and below, the reference illuminance (this required extrapolation of the S-series to create further classes of illuminance at each end of the series). The results are shown in Table 5. Again, these are only the results of the equal illuminance trials since when presented at dissimilar illuminances the booth of higher illuminance was identified as brighter in 100 percent of the observations. The percentage frequency with which the test booth was noted as brighter is now much closer to the expected 50 percent than was found with the biased stimulus frequency. Statistical analysis did not suggest brightness differences between the test and reference booths at equal illuminance to be significant.

Taken together, these data confirm that a stimulus frequency bias may occur under biased experimental conditions. They further show that a balanced stimulus choice can remove a frequency bias.
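As a rough illustration of how a null-condition proportion might be checked against the expected 50 percent, the sketch below applies an exact two-sided binomial test. This is a simpler substitute for, and not a reproduction of, the Dunn-Rankin variance stable rank sums analysis reported for Table 5. The function name and the trial counts are hypothetical, since the number of observations per condition is not stated here.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value for k 'test brighter' votes in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed count under the null proportion p0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical example: 100 null-condition trials per condition (assumed).
n_trials = 100
for votes in (42, 50, 75):
    p_value = binomial_two_sided_p(votes, n_trials)
    print(f"{votes}/{n_trials} votes for test booth: p = {p_value:.4f}")
```

With a fixed number of trials, proportions close to 50 percent give large p-values and proportions far from 50 percent give small ones; the actual significance of any reported percentage depends on the real sample size.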


When using a brightness discrimination procedure in which a range of stimulus magnitudes is compared against a constant reference, the results of Teller and others suggest a centering bias. The results of Fotios and Cheal suggest that the stimulus magnitudes in the range should be distributed equally around the magnitude expected to be equal in brightness to the reference, in order to permit an approximately equal number of brighter and dimmer responses and thus avoid a stimulus frequency bias. That approach, however, is likely to enhance the centering bias found by Teller and others. It is therefore suggested that discrimination procedures should avoid using a constant reference and should instead employ an all-possible-pairs approach, which will counter both the stimulus frequency and centering biases. Figure 2 illustrates the constant reference and all-possible-pairs approaches to experimental design and Table 6 identifies the resultant lamp pairs.

Table 6. Illustration of lamp pairs resulting from either a single reference approach or an all-possible-pairs approach when seeking to compare five types of lamp

Approach                   Lamp set         Number of     Lamp pairs
                                            lamp pairs

Single reference source    A, B, C, D, E    4             A/E, B/E, C/E, D/E
(lamp E)

All possible pairs         A, B, C, D, E    10            A/B, A/C, A/D, A/E, B/C,
                                                          B/D, B/E, C/D, C/E, D/E
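The two pairing schemes can be enumerated in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the lamp labels A to E follow the table. It shows that a single reference yields n - 1 pairs while all possible pairs yields n(n - 1)/2.

```python
from itertools import combinations

def single_reference_pairs(lamps, reference):
    """Pair every non-reference lamp with the one fixed reference lamp."""
    return [(lamp, reference) for lamp in lamps if lamp != reference]

def all_possible_pairs(lamps):
    """Every unordered pair of distinct lamps."""
    return list(combinations(lamps, 2))

lamps = ["A", "B", "C", "D", "E"]
ref_pairs = single_reference_pairs(lamps, "E")
all_pairs = all_possible_pairs(lamps)

print(len(ref_pairs), ", ".join("/".join(p) for p in ref_pairs))
print(len(all_pairs), ", ".join("/".join(p) for p in all_pairs))
```

For five lamps this gives four pairs under the single reference approach and ten under the all-possible-pairs approach.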

An all-possible-pairs approach has been used in past studies of spatial brightness. In three studies multiple types of lamp were compared on an equal-illuminance basis: this required all six possible pairs of four types of lamp [Houser and others, 2004] or all ten possible pairs of five types of lamp [Fotios and Cheal, 2011; Vrabel and others, 1998]. Note that stimulus magnitudes include variations in illuminance in addition to lamp type. Hence Houser, Fotios, and Royer [2009] presented the six pairs available from their four light settings, these being two types of lamp each at two luminances.

A problem with the all-possible-pairs approach is that it increases the number of lamp pairs to be examined, from four to ten pairs when comparing five lamps, and this has implications for experimentation resources. There may be occasions when an all-possible-pairs approach is not practical or possible due to limitations in time or nuances of the apparatus. In these cases, where the experimenter wishes to proceed with a fixed reference, we recommend that the discrimination procedure be either avoided or complemented with alternative procedures (for example, matching and rating).


This article presents proposals for good practice when using a discrimination procedure to make visual evaluations of lit scenes. While the focus of this work is judgments of spatial brightness under lamps of different SPD, the proposals are expected to be valid for other visual evaluations and other characteristics of lighting.

One proposal concerns the placement of the (typically) two lit scenes evaluated using discrimination. In simultaneous evaluations these scenes are in juxtaposed spatial locations, typically left and right: in this case, the scene locations should be counterbalanced between the left and right fields to avoid a position bias. In sequential evaluations the two scenes are presented in juxtaposed temporal locations, one after the other at the same spatial location; in this case the scene intervals (first and second) should be counterbalanced.

The second proposal concerns the experimenters' choice of paired scenes from the range of scenes available, these being the combinations of lamp type and luminance. In some studies one scene is chosen as the reference against which all other scenes are compared; we suggest this is not good practice. Instead, an all-possible-pairs approach should be adopted in which each lit scene is compared against every other lit scene. The order in which these scene-pairs are evaluated should be randomized or counterbalanced.

An aim of this guidance is to reduce error within the estimates of mean illuminance ratio for equal spatial brightness under lamps of different SPD: review of brightness matching data reveals a potential position bias of 14.5 percent [Fotios and others, 2008], although evidence from other studies found the effect of position bias to be negligible in that instance. To quantify the magnitude of bias requires that null condition trials are included. In a two-alternative discrimination trial this would mean that light settings of identical luminance and SPD were used to light both positions (or intervals). Thus, we also recommend that experimental procedures include null condition trials.
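The proposals above (counterbalanced positions, all possible pairs, randomized order, and null condition trials) can be combined into a single trial schedule. The sketch below is one possible arrangement under stated assumptions, not a procedure taken from any of the cited studies: the function name, the scene labels, and the choice to show each pair twice to every observer (once in each position) are all hypothetical. For sequential evaluations, "left" and "right" would be read as first and second interval.

```python
import random
from collections import Counter
from itertools import combinations

def build_schedule(scenes, include_null=True, seed=None):
    """Return a shuffled list of two-alternative trials.

    Each unordered pair of scenes appears twice, once in each position,
    so that position is counterbalanced. Optional null condition trials
    present the same scene in both positions.
    """
    rng = random.Random(seed)  # seeded only so the example is repeatable
    trials = []
    for a, b in combinations(scenes, 2):
        trials.append({"left": a, "right": b})
        trials.append({"left": b, "right": a})
    if include_null:
        for scene in scenes:
            trials.append({"left": scene, "right": scene})
    rng.shuffle(trials)  # randomized presentation order
    return trials

# Hypothetical scene labels, e.g. four combinations of lamp type and luminance.
scenes = ["A", "B", "C", "D"]
schedule = build_schedule(scenes, seed=1)

left_counts = Counter(trial["left"] for trial in schedule)
right_counts = Counter(trial["right"] for trial in schedule)
print(len(schedule), "trials")
print("left:", dict(left_counts))
print("right:", dict(right_counts))
```

Each scene appears equally often in each position, and the responses to the null condition trials give a direct estimate of any residual position (or interval) bias. Counterbalancing across observers rather than within each observer is an alternative when the doubled trial count is impractical.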


Requirements for data that might be considered reasonably free of bias in the experimental procedures are evidence that position (or interval) bias is counterbalanced or demonstrated by null condition trials to be negligible, that multiple stimuli are compared using an all-possible-pairs approach rather than using a single reference stimulus, and that stimulus pairs are observed in a random order if a repeated measures design is used.

Five studies [Berman and others, 1990; Houser and others, 2004; Houser and others, 2009; Royer and Houser, 2012; Vrabel and others, 1998] using a discrimination procedure to investigate spatial brightness under lighting of different SPD at photopic levels meet these criteria and are therefore considered to provide reliable evidence of lamp spectrum effects. Of the three studies which employed sequential evaluation, two [Berman and others, 1990; Houser and others, 2009] reported that stimulus intervals were counterbalanced. Vrabel and others [1998] did not report this information and it is assumed that the repeated sequential presentation of the two stimuli in each pair countered the interval bias otherwise expected in successive evaluations.

Seven studies of lamp spectrum and spatial brightness at photopic levels are not considered to provide appropriate evidence [Bullough and others, 2007; Cockram and others, 1970; Harper, 1974; Manav, 2007; Navvab, 2001; Pracejus, 1967; Stephens and Bolander, 2005]. The reasons for these decisions included incomplete reporting of the results [Harper, 1974], position bias [Bullough and others, 2007; Stephens and Bolander, 2005], and insufficient description of the test procedure to be able to understand what took place [Navvab, 2001; Manav, 2007]. Pracejus [1967] had sought forced choice judgments of preference of two rooms lit using different types of lamp: seven types of lamp were available but only 17 of the possible 21 combinations were apparently used, the precise combinations not being reported, and it is not clear how the reported proportional preferences for each lamp were established. In their pilot study, Cockram, Collins, and Langdon [1970] asked for the lighting in four different rooms to be placed in rank order, essentially a four-alternative forced-choice discrimination task, and these were judgments of preference rather than of brightness. There are four reasons that hinder acceptance of these results. First, different types of lamps were compared on the basis of an equal number of lamps rather than equal illuminances and there is a tendency for lighting of higher illuminance to be more preferred. Second, the occupants of the building used for their field study gave the highest preference score to the WW lamp that was normally used in the building, suggesting an adaptation or expectation effect. Third, there is an apparent error in the results: the total preference scores from the town hall staff across all four stimuli should sum to 400, but the reported results sum to only 372 (night-time observations), 332 (daytime observations, tinted glazing) and 335 (daytime observations, clear glazing). Finally, there are insufficient data to test whether differences between lamps are significant.


We suggest that experiments that use a discrimination procedure to compare spatial brightness under lighting of different SPD incorporate two steps within the experimental design: stimulus positions (or intervals) are counterbalanced, and an all-possible-pairs design is used when comparing multiple stimulus magnitudes. We also suggest that multiple tests be carried out in a random order and that reports of the work present sufficient data to understand the procedure and to interpret the results. Five of the 12 past studies presenting empirical evidence for lamp spectrum and spatial brightness at photopic levels meet these criteria [Berman and others, 1990; Vrabel and others, 1998; Houser and others, 2004; Houser and others, 2009; Royer and Houser, 2012]. We hope that future work will augment these studies with defensible data gathered during experiments that have been intentionally designed to counter, minimize, and quantify potential biases.

(1) "cd" is the unit reported by Stephens and Bolander [2005].


Atli D, Fotios S. 2011. Rating Spatial Brightness: Does the number of response categories matter? Ingineria Iluminatului. 13(1):15-28.

Berman SM, Jewett DL, Fein G, Saika G, Ashford F. 1990. Photopic luminance does not always predict perceived room brightness. Lighting Res Technol. 22(1):37-41.

British Standards Institution (BSI). 2003. BS EN 13201-2:2003, Road lighting--Part 2: Performance requirements. London (UK): British Standards Institution.

Bullough JD, Yuan Z, Rea MS. 2007. Perceived brightness of incandescent and LED aviation signal lights. Aviat Space Environ Med. 78(9):893-900.

Cockram AH, Collins JB, Langdon FJ. 1970. A study of user preferences for fluorescent lamp colours for daytime and night-time lighting. Lighting Res Technol. 2(4):249-256.

Fotios SA. 2001. Lamp colour properties and apparent brightness: A review. Lighting Res Technol. 33(3):163-181.

Fotios S. 2011. Lighting in offices: Lamp spectrum and brightness. Color Technol. 127(2):114-120.

Fotios S, Atli D. 2012. Comparing judgments of visual clarity and spatial brightness using estimates of the relative effectiveness of different light spectra. Leukos. 8(4):261-281.

Fotios SA, Cheal C. 2007. Lighting for subsidiary streets: Investigation of lamps of different SPD. Part 2-Brightness. Lighting Res Technol. 39(3):233-252.

Fotios SA, Cheal C. 2008. The effect of a stimulus frequency bias in side-by-side brightness ranking tests. Lighting Res Technol. 40(1):43-54.

Fotios SA, Cheal C. 2010a. Stimulus range bias explains the outcome of preferred-illuminance adjustments. Lighting Res Technol. 42(4):433-447.

Fotios SA, Cheal C. 2010b. A comparison of simultaneous and sequential brightness judgments. Lighting Res Technol. 42(2):183-197.

Fotios SA, Cheal C. 2011. Predicting lamp spectrum effects at mesopic levels. Part 1: Spatial brightness. Lighting Res Technol. 43(2):143-157.

Fotios SA, Houser KW. 2009. Research methods to avoid bias in categorical ratings of brightness. Leukos. 5(3):167-181.

Fotios SA, Houser KW, Cheal C. 2008. Counterbalancing needed to avoid bias in side-by-side brightness matching tasks. Leukos. 4(4):207-223.

Harper WJ. 1974. On the interpretation of preference experiments in illumination. J Illum Eng Soc. 3(2):157-159.

Houser KW, Tiller DK, Hu X. 2004. Tuning the fluorescent spectrum for the trichromatic visual response: A pilot study. Leukos. 1(1):7-24.

Houser KW, Fotios SA, Royer MP. 2009. A test of the S/P ratio as a correlate for brightness perception using rapid-sequential and side-by-side experimental protocols. Leukos. 6(2):119-137.

Jakel F, Wichmann FA. 2006. Spatial four-alternative forced-choice method is the preferred psychophysical method for naive observers. J Vision. 6(11):1307-1322.

Kang J. 2004. The effect of light on the movement of people. Doctoral Thesis. Department of Interior Design. University of Minnesota, USA.

LeBoeuf RA, Shafir E. 2006. The long and short of it: Physical anchoring effects. J Behav Decis Making. 19:393-406.

Logadottir A, Christoffersen J, Fotios SA. 2011. Investigating the use of an adjustment task to set preferred illuminance in a workplace environment. Lighting Res Technol. 43(4):403-422.

Manav B. 2007. An experimental study on the appraisal of the visual environment at offices in relation to colour temperature and illuminance. Build Environ. 42(2):979-983.

Navvab M. 2001. A comparison of visual performance under high and low colour temperature fluorescent lamps. J Illum Eng Soc. 30(2):170-175.

Pracejus WG. 1967. Preliminary report on a new approach to color acceptance studies. Illuminating Engineering. 62(12):663-673.

Poulton EC. 1977. Quantitative subjective assessments are almost always biased, sometimes completely misleading. Brit J Psychol. 68:409-425.

Poulton EC. 1989. Bias in quantifying judgments. London (UK): Lawrence Erlbaum Associates Publishers. 304 p.

Rea MS, Radetsky BS, Bullough JD. 2011. Toward a model of outdoor lighting scene brightness. Lighting Res Technol. 43(1):7-30.

Royer MP, Houser KW. 2012. Spatial brightness perception of trichromatic stimuli. Leukos. 9(2):89-108.

Senders VL, Sowards A. 1952. Analysis of response sequences in the setting of a psychophysical experiment. Am J Psychol. 65(3):358-374.

Stephens N, Bolander A. 2005. Factors in the perception of brightness for LED and incandescent lamps. SAE Transactions. 114(6):908-920.

Taylor LH, Socov EW. 1974. The movement of people towards lights. J Illum Eng Soc. 3(3):237-241.

Teller DY, Pereverzeva M, Civan AL. 2003. Adult brightness vs. luminance as models of infant photometry: Variability, biasability and spectral characteristics for two age groups favour the luminance model. J Vision. 3:333-346.

Uchikawa K, Ikeda M. 1986. Accuracy of memory for brightness of colored lights measured with successive comparison method. J Opt Soc Am A. 3(1):34-39.

Vrabel PL, Bernecker CA, Mistrick RG. 1998. Visual performance and visual clarity under electric light sources: Part II - Visual Clarity. J Illum Eng Soc. 27(1):29-41.

Wyszecki G, Stiles WS. 1982. Colour Science: Concepts and methods, quantitative data and formulae, 2nd ed. New York (NY): John Wiley & Sons. 250 p.

Yeshurun Y, Carrasco M, Maloney LT. 2008. Bias and sensitivity in two-interval forced choice procedures: Tests of the difference model. Vision Res. 48(17):1837-1851.

Steve A. Fotios PhD (1) * and Kevin W. Houser PhD, PE (2)

(1) School of Architecture, The University of Sheffield, UK; (2) Department of Architectural Engineering, The Pennsylvania State University, USA.

* Corresponding author: Steve Fotios, E-mail:
COPYRIGHT 2013 Illuminating Engineering Society
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2013 Gale, Cengage Learning. All rights reserved.

Author:Fotios, Steve A.; Houser, Kevin W.
Article Type:Report
Geographic Code:1USA
Date:Apr 1, 2013