Letters to the editor.
The article by Houser and others, 'A Test of the S/P Ratio as a Correlate for Brightness Perception using Rapid-Sequential and Side-by-Side Experimental Protocols' (Leukos, 6(2):119), is reviewed here with the intent of providing some understanding of why those authors have come to conclusions so different from those of Berman and others. We discovered that the authors have made a number of subtle but important errors in their statistical analysis. There are also a number of misunderstandings in their application of trichromacy. Our re-analysis leads to the conclusion that their data raise very interesting questions regarding brightness constancy and color effects, but do not support most of their claims regarding trichromacy.
With regard to the major points: contrary to what the authors claim, their article provides evidence that the side-by-side and rapid sequential methods of judging brightness are not the same. The experiment was not properly designed to test trichromacy, and does not provide evidence for its support. Finally, the study fails to show that CCT has no effect on brightness as claimed, because it leaves open the question of the effects of color on brightness constancy under the artificial conditions examined.
We begin with a short review of the original claims for tetrachromacy in brightness perception. We then briefly describe the claims made for trichromacy at normal lighting levels in the reviewed article, and subsequently describe why we have concluded that the statistical analysis in support of these claims is incorrect. Finally we describe where the authors have misapplied trichromacy theory, and what this means to the interpretation of their results.
The basic controversy between Berman and others and Houser and others is the issue of tetrachromacy at photopic light levels. In a series of articles from 1990 on, Berman and others showed that, at typical interior light levels, brightness perception and pupil size could not be predicted from photopic luminance alone in two scenes that were identical in color [Berman and others, 1990, Berman and others, 1992]. To place those results on a quantitative basis, they were fit empirically, and successfully, by a combination of photopic and scotopic luminances. At that time the rods and cones were the only generally accepted photoreceptors of the retina, and so the effect was attributed to a rod-based scotopic contribution. Although the demonstration of trichromacy failure appeared robust, the attribution of the failure to rod contributions was generally not accepted, as rods were believed not to contribute to visual processing at photopic light levels.
However, at the end of year 2003 the prestigious journal 'Science' listed the discovery of a previously unknown mammalian retinal photo-sensing receptor, located primarily in the noncentral (nonfoveal) regions of the eye and functioning at typical interior and even higher light levels, among the 10 most important scientific breakthroughs of the year 2002 [He and others, 2003]. This new receptor is located on special retinal ganglion cells that contain the photo-pigment melanopsin and are generally referred to as the pRGC's, for photosensitive retinal ganglion cells. Supporting studies confirmed an additional retinal photoreceptor with peak spectral sensitivity in the blue at about 480 nm, with temporal and spatial properties very different from those of cones and rods (slow response time and little if any spatial resolution) [Brainard and others, 2001], [Thapan and others, 2001], [Gamlin and others, 2007].
The peak sensitivity associated with these melanopic receptors (peak wavelength at 491 nm after correction for spectral transmission of the lens and cornea of the eye) is close to the peak for scotopic sensitivity [Berman and Clear 2005, 2008]. Because the scotopic and melanopic sensitivity functions have a similar wavelength dependence, the scotopic and melanopic efficacies of most light sources are highly correlated. A re-analysis of the original work of Berman and others showed that their results could be explained in terms of a melanopic contribution instead of a scotopic contribution [Berman and Clear 2005, 2008]. This is a more substantial mechanism for supporting tetrachromacy, and removes a major objection to its acceptance.
In a number of papers, Fotios, Houser, and others have challenged the findings and conclusions of Berman and others [Houser and others, 2004, Fotios and others, 2008]. Berman has responded to these challenges by pointing out methodological problems in the papers. Chief among these are the use of side-by-side comparisons in place of sequential comparisons, and the failure to control for color effects. In the article that is critiqued here, careful attention has been paid to experimental protocol, such as counterbalancing presentations, and in particular to explicitly comparing the side-by-side and sequential comparison procedures. Our main dispute with their conclusions is based on an apparent error in their statistical analysis, and the possible confounding effects of color.
The article reviewed here also explicitly challenges the viability of extending the brightness formula $P(S/P)^{1/2}$ [Berman and others, 1990] to sources that differ in color temperature, and not solely in their scotopic (melanopic) content. We would be remiss if we did not note that if the brightness effect is due to the influence of the pRGC's instead of the rods, then the above formula is strictly an empirical correlation, and may not be accurate for all sources, and in particular may not be accurate for the narrow band sources used in the Houser and others study.
DISCUSSION OF STATISTICAL ANALYSIS
In the last paragraph of section 3.3 the reviewed article states that there is no statistically significant difference between the 2 different sources when compared at equal luminance. The claim is also repeated in the conclusion section of the article. This claim is based on the authors' use of the variance stable rank sums (VSRS) statistical procedure [Dunn-Rankin and others, 2004]. However, this claim is inconsistent with a simple binomial test applied to the data in Table 6 of the article as explained below. To resolve this inconsistency we examined the VSRS procedure in detail and determined that it was inappropriate for the analysis of the data presented.
The VSRS test is designed to test significance for situations where A is compared to B, B to C, A to C, and so on, with the comparison being a dichotomous variable such as brighter vs. darker. As such it seems ideally suited for the brightness comparisons in the Houser and others article. However, the viability of this test is based on the assumption of a fixed estimate of the rank variance. This assumption is grossly violated if some or all of the comparisons have zero or little variance, which turns out to be the case for the data of Houser and others.
To understand the consequences of violating this assumption we evaluate here a worst case situation. Assume that the categories under consideration are judged to be completely distinct. That is, if we have two categories A and B, then the subjects always rank A as less bright than B. Similarly, if we have four categories A-D, then A is always judged less bright than any of B-D, B is always less bright than C or D, and C is always less bright than D. If there are only two categories, five subjects out of five choosing A to be less bright than B has a simple one-sided binomial probability of 0.5^5, or about 3.1 percent, which is less than 5 percent. Thus in this case, where the 2 categories are completely distinct, only 5 subjects are needed to show statistical significance. For more than 2 categories there are multiple comparisons that are possible. Applying the Bonferroni correction to maintain the equivalent 5 percent significance level increases the number of subjects to 7 (7 out of 7) for 4 categories, and 10 (10 out of 10) for 8 categories. When these three cases are evaluated with the VSRS procedure, the number of subjects needed to give a 5 percent significance level was found to be 4, 22, and 111 for 2, 4, and 8 categories respectively (see appendix for calculation details). The failure of the VSRS procedure to properly judge the significance level in this situation for the larger number of categories is frankly spectacular.
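The binomial subject counts quoted above can be reproduced with a short sketch. It assumes a one-sided binomial test with a chance probability of 0.5 per judgment, and a Bonferroni correction over all K(K-1)/2 pairwise comparisons; both assumptions are inferred from the numbers in the text, and the function name is hypothetical.

```python
from math import comb


def min_unanimous_subjects(num_categories, alpha=0.05):
    """Smallest n for which n unanimous judgments out of n are significant.

    Assumptions (not stated explicitly in the letter):
    - one-sided binomial test with p = 0.5 under the null hypothesis,
      so the probability of n out of n is 0.5 ** n;
    - Bonferroni correction over every pairwise comparison.
    """
    num_comparisons = comb(num_categories, 2)  # K * (K - 1) / 2 pairs
    threshold = alpha / num_comparisons        # Bonferroni-adjusted level
    n = 1
    while 0.5 ** n >= threshold:
        n += 1
    return n


for k in (2, 4, 8):
    print(k, "categories:", min_unanimous_subjects(k), "subjects")
```

Under these assumptions the sketch returns 5, 7, and 10 subjects for 2, 4, and 8 categories, which matches the binomial counts given in the text.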
The above is a worst case example, but the test begins to fail if some of the comparisons result in very close to zero variance. Of the six side-by-side comparisons in the Houser and others study, three have nearly zero variance, while the remaining three cases show a trend, but have significant variance. A simple binomial test of the two cases with equal luminance but different CCT (Table 6 rows labeled AC and BD) shows the higher CCT source being brighter at the 1.3 percent and 0.4 percent levels (30 and 24 cd/m² respectively). Only the second comparison is significant after adjusting for multiple comparisons; however, both comparisons refer to the same hypothesis, namely that the higher CCT lamp appears brighter. Based on the binomial distribution, the probability that both comparisons would have a probability of 1.3 percent or less can be determined and computes to less than 0.2 percent, which is significant even if one assumes multiple comparisons (critical significance level of 1.25 percent).
This result affects the authors' conclusion that the side-by-side and sequential test procedures yield the same result. We surmise that this question was most likely examined because Berman claimed that the tetrachromatic contribution to brightness lacks spatial resolution, and therefore will affect sequential comparisons, but not side-by-side comparisons [Berman 2008].
The authors did find by employing McNemar's statistical test that there was a difference between the two types of comparisons in two of the six cases, but not in the other four (although McNemar's test is not "sufficient", in that it does not use all the available information, it does appear to be correct in this case). The fact that there were two cases that did show a statistically significant difference was dismissed by the authors because for any given comparison the side-by-side and rapid sequential tests appeared to result in the same conclusions based on the VSRS statistical procedure and were therefore "comparable". However, the side-by-side comparisons, and the sequential comparisons do not lead to the same conclusions when the results are analyzed by the more powerful binomial test in place of the VSRS procedure. Thus we must conclude that the two types of comparisons do not yield similar results. We do not claim that the side-by-side procedure is "faulty or invalid", but it is not appropriate to evaluate tetrachromacy, if as hypothesized, the protocol offers little or no spatial resolution.
TRICHROMACY THEORY AND BRIGHTNESS
In addition to the admittedly obscure but substantial error the authors made in their statistical analysis, they have also made a number of claims based on what appears to be some serious confusion about trichromacy and "prime color" theory. Thornton's prime colors are theoretically the most efficient set of colors under the assumption of trichromacy. They are a consequence of trichromacy, and do not change its basic properties. The basic assumption of trichromacy is that there are only three color receptors. These three receptors have different spectral sensitivity functions, but retain no information about the wavelength of the photons after they are absorbed. This means that color and brightness should be completely described by the outputs of the three cones, which in turn are described by the three color coordinates. The eye and brain appear to organize the three color inputs into an achromatic brightness channel (luminance), and two color channels. Overall brightness is a combination of the brightness channel plus input from the chromatic channels.
Berman and others [Berman and others, 1990] showed that even when the chromatic channels were matched, it was possible to produce a brightness match that was opposite of what was predicted by photopic luminance. If trichromacy is correct, that result cannot, as the authors claim, "... be explained by changes in the wavelengths of the spectral components that combined to form the composite spectra." The only "act of faith" in attributing this effect to the rods was in not knowing at the time of the study that there are actually five photoreceptors in the human eye. The melanopic receptor, discovered in 2002, has a fairly similar spectral sensitivity to that of the rods, but is more active at photopic levels, and is therefore a more likely candidate than the rods for the spectral brightness effect discovered by Berman and others.
An even more direct example of the authors' confusion regarding trichromacy and prime colors is their statement in section 4.2 of their paper that "As illustrated in Fig. 1, all SPD's employed in this experiment were comprised of the same three spectral components. Under these conditions, trichromacy and the opponent colors model suggest that luminance will predict brightness perception, at least within the region near the blackbody locus. By maintaining a single set of spectral primaries, the achromatic (luminance) and chromatic (blue-yellow, red-green) channels will be stimulated in the same general ways." These statements are in conflict with standard trichromatic color theory. Motion along the blackbody locus still corresponds to a change in the chromaticity of the source, and therefore, according to opponent color theory, should affect brightness perception.
Another example of the confusion over prime colors is the claim that "if the primary components had been changed, brightness perception can be expected to be different even at equal chromaticity and luminance." However, the experiment referenced as support for this claim [Hu and others, 2006] was based on a side-by-side comparison of a chromatically complex scene. The introduction of a chromatically complex scene means that the subject is not making judgments of brightness with equal chromaticity and luminance.
ALTERNATIVE EXPLANATIONS OF THE EXPERIMENTAL RESULTS
There are a number of other examples of the above confusions in the article, but it is important to note that they do not invalidate the fact that the primary results of the sequential comparisons appear inconsistent with the results of Berman and others. Furthermore, as explained above, the results for the side-by-side comparisons are not particularly relevant to the melanopic brightness hypothesis. As we noted in our discussion of the statistics, re-analysis of the side-by-side results indicates that there is indeed a significant CCT effect. This effect is not predicted by the hypothesis of Berman and others, but is consistent in direction with the results found by others for color channel effects, so it is not unexpected [Hunt 1991].
On the other hand, the sequential results do not show a CCT effect, even when the luminance of the two sources is matched. It is possible that one or the other study is a victim of a low probability event, wherein a further study would or would not show a result, but it seems more likely that something in the conditions of the experiments is sufficiently different to account for the difference in the results. In particular, it should be noted that Berman and others are not the only ones, and were not even the first, to claim that 'white' lighting with a higher correlated color temperature is perceived as brighter than lighting with a lower correlated color temperature at the same illuminance [Harrington 1954], [Akashi & Boyce 2006], [Weale 1951, 1953].
The most obvious protocol difference between the 2 studies is that in the Berman and others study the color differences between the sources were eliminated as much as possible, while in the Houser and others study the colors of the compared sources were not matched. The x and y chromaticity coordinates of the two sources studied can be determined from the stated source color temperatures and are approximately 0.44, 0.41 respectively for the low color temperature source and 0.30, 0.31 respectively for the high color temperature source. This means that the two sources differ significantly in hue. The fact that the two sources are on the blackbody locus does not eliminate this difference.
It should be noted that judging perceived brightness in the presence of color differences is generally considered to be a notoriously difficult and imprecise task [Wyszecki & Stiles 1982]. For example, this shows up in row AC for the rapid sequential column, where the expert and naive subjects' percentages are nearly opposite. It is therefore possible that the results are simply due to the subjects confounding brightness changes with color differences.
A stronger possibility to account for the different outcomes is that in the presence of a color difference brightness constancy comes into play. Brightness constancy is the ability to see objects as continuing to have the same brightness even though different lights may change the objects' spectral properties. Such a perception often occurs in lighting practice when different illuminants are used to light the same environment. Had the color of the 2 comparison illuminants been kept equal, as in the Berman and others study, this effect would likely be absent. If brightness constancy is the explanation of the different results, then color differences in the rapid sequential test create a protocol problem that confounds any conclusions about the fundamental issue of real spectral effects. We would also not expect this brightness constancy effect to occur for the side-by-side comparison because both illuminants are present at the same time.
DETERMINATION OF MINIMUM NUMBER OF SUBJECTS
As noted above, the VSRS procedure is used to determine the statistical significance of differences in ranks [Dunn-Rankin and others, 2004]. It is necessary to determine the minimum number of subjects needed to obtain a statistically significant result with the VSRS procedure for the worst case example described in the text. Consider an example where there are four categories (A-D) of lighting sources differing in luminance and color temperature. Each of N judges (subjects) evaluates the relative brightness of each of the categories against each of the other categories (A vs. B, A vs. C, A vs. D, B vs. C, B vs. D, and finally C vs. D). The number of times a particular category is judged brighter in these pair-wise comparisons is the rank for that category. In the worst case example where A is always less bright than B through D, B is always less bright than C or D, and C is always less bright than D, the ranks for A-D are simply 0, N, 2N, and 3N, respectively. The rank difference between the ordered pairs in this example is thus simply N.
To determine whether this difference is statistically significant it is compared to a critical range value that is a function of the number of categories, K, the number of subjects (judges), N, and the significance level, α. The critical range for a statistically significant difference is the product of the expected value of the standard deviation, E(S), and a value from the range distribution, $Q_\alpha$. E(S) is computed from the formula given in Dunn-Rankin and others: $E(S) = \sqrt{N K (K+1)/12}$.
Here $Q_\alpha$ is the number of standard deviations, S, that gives a statistically significant difference at the specified probability level, α. The value of $Q_\alpha$ as a function of the number of categories, K, and the critical probability level, α, can be found in a table of Studentized ranges (see for example http://cse.niaes.affrc.go.jp/miwa/probcalc/s-range/srng_tbl.html#fivepercent). For α = 5 percent, $Q_\alpha$ is equal to 2.772, 3.633, and 4.286 for 2, 4, and 8 categories respectively.
As noted above, in the case where A is always judged to be less bright than B, B less bright than C, and so on, the rank difference between adjacent categories is equal to the number of subjects, N. The smallest number of subjects that can be judged statistically significant is determined by setting the number of subjects equal to the critical range, $N = Q_\alpha \sqrt{N K (K+1)/12}$. Solving for N gives the result: $N = Q_\alpha^2 K (K+1)/12$. Rounding up to a whole number of subjects gives the values of 4, 22, and 111 quoted in the text for 2, 4, and 8 categories.
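The appendix calculation can be checked numerically with the sketch below. The function and variable names are hypothetical; the Q values are assumed to be the 5 percent Studentized range entries for infinite degrees of freedom (the entry for 2 categories is the square root of 2 times 1.960, about 2.772).

```python
import math

# 5 percent Studentized range values (infinite degrees of freedom),
# keyed by the number of categories K. Assumed table values.
Q_05 = {2: 2.772, 4: 3.633, 8: 4.286}


def critical_range(n_subjects, k, q):
    """Critical rank difference: Q times E(S) = sqrt(N * K * (K + 1) / 12)."""
    return q * math.sqrt(n_subjects * k * (k + 1) / 12)


def vsrs_min_subjects(k, q):
    """Smallest whole N with N >= critical range, i.e. N >= Q^2 * K * (K + 1) / 12.

    Applies only to the worst case in the text, where adjacent categories
    always differ in rank by exactly N.
    """
    return math.ceil(q ** 2 * k * (k + 1) / 12)


for k, q in Q_05.items():
    n = vsrs_min_subjects(k, q)
    print(f"K={k}: N={n}, critical range at N = {critical_range(n, k, q):.2f}")
```

With these table values the sketch gives 4, 22, and 111 subjects for 2, 4, and 8 categories, compared with 5, 7, and 10 for the Bonferroni-corrected binomial test.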
As discussed above, the Houser and others study shows a spectral response for the side-by-side protocol but not for the rapid sequential protocol. The side-by-side spectral result is consistent with a chromatic response, while the rapid sequential procedure did not show a CCT effect. However, the Houser and others study is not equivalent to studies that have shown such an effect, as in their case the two sources employed differed in chromaticity. The introduction of a chromatic difference makes brightness judgments difficult, and may have a brightness constancy effect. Hence the authors' conclusion that the study failed to show any of the hypothesized effect of spectrum or the pRGC's on brightness perception, or that the results are consistent with trichromacy, remains equivocal. We would like to again point out that Berman and others were not unique in their claims on the spectral effects on brightness perception, in that others have noted that higher color temperatures appear to be related to increases in brightness perception. This suggests that prime color theory and trichromacy, powerful as they are in explaining most situations, are not sufficient for judging brightness in a full field lighting environment utilizing lamps of different color temperature. This latter situation is just as important, and arguably more important, than judging the relative brightness of side-by-side installations.
SUBMITTED BY: SM BERMAN PHD. AND RD CLEAR PHD.
Akashi Y & Boyce PR. 2006. A field study of illuminance reduction. Energy and Buildings, 38(6):588-599.
Berman SM, et al. 1990. "Photopic luminance does not always predict perceived room brightness". Lighting Research Technology, 22(1):37-41.
Berman SM, et al. 1992. "Spectral Determinants of Steady-State Pupil Size with Full Field of View": J.IES, 21(2):3-13.
Berman, S.M. and Clear, R.D. 2005, Past vision studies can support a novel human photoreceptor. CIE Midterm Meeting and international Lighting congress, Leon, Spain. (2008) Light & Engineering Vol. 16, No. 2, pp. 88-94.
Berman SM. 2008. A new retinal photoreceptor should affect lighting practice. Lighting Research and Technology 40:373.
Brainard GC, et al. 2001. Action spectrum for melatonin regulation in humans: evidence for a novel circadian photoreceptor. Journal of Neuroscience, 21:6405-6412.
Dunn-Rankin P, Knezek GA, Wallace S, Zhang S. 2004. Scaling Methods, 2nd Ed. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Fotios SA. 2002. Experimental conditions to examine the relationship between lamp colour properties and apparent brightness. Light. Res. Technol., 34:29-38.
Fotios SA, Houser KW. 2008. Measuring lamp SPD effect on the perception of interior spaces: frequently this is misleading. Balkan Light. R 10:69-78.
Gamlin PD, et al. 2007. Human and macaque pupil responses driven by melanopsin containing retinal ganglion cells. Vision Research, 47:946-954.
Harrington RE. 1954. The effects of color temperature on apparent brightness J. Optical Soc. Am. 44:113-116.
He S, et al. 2003. Seeing more clearly: recent advances in understanding retinal circuitry. Science, 302:408-411, 17 Oct.
Houser KW, Tiller DK, Hu X. 2004. Tuning the fluorescent spectrum for the trichromatic visual response: a pilot study. Leukos. 1(1):7-23.
Hu X, Houser KW, Tiller DK. 2006. Higher color temperature lamps may not appear brighter. Leukos, 3(1):69-81.
Hunt RWG. 1991. Measuring Colour, 2nd edition. Ellis Horwood, Chichester, UK. See pages 211-212 for applications of the Ware & Cowan studies.
Thapan K, Arendt J, & Skene DJ. 2001. An action spectrum for melatonin suppression: evidence for a novel non-rod, non-cone photoreceptor system in humans. Journal of Physiology, 535:261-267.
Weale RA. 1951. Hue-discrimination in para-central parts of the human retina measured at different luminance levels. J. Physiol. 113:115-122.
Weale RA. 1951. The foveal and para-central spectral sensitivities in man. J. Physiol. 114: 435-446.
Weale RA. 1953. Spectral sensitivity and wave-length discrimination of the peripheral retina J. Physiol. 19:170-190.
Wyszecki G. & Stiles W.S. Color Science. 1982, J Wiley & Sons NY Second Edition.