A discussion of binomial interval estimators on the boundary.

ABSTRACT

Often in biomedical research, a binary outcome variable has minimal expected value, for example, mortality for aspirin users. For moderate sample sizes, it is not uncommon to observe zero successes in these instances. Confidence interval estimation, in this case, may be more informative than hypothesis testing. There is an appeal to using one-sided confidence intervals for this special case; however, this practice is arguably inappropriate unless the decision is made a priori.
The commonly used Wald intervals, as presented in most elementary textbooks, are known to perform poorly, particularly when the proportions are near zero or one. Further, for zero observed successes, the Wald interval estimate is [0,0]. A method of confidence interval construction based on the score statistic has been shown to outperform the Wald intervals. This paper will review binomial parameter confidence interval estimates for uncommon events.

Introduction

In pharmaceutical research there are numerous binary events for which the expected proportion is low. One example is the occurrence of a life-threatening adverse effect during post-marketing testing when none was observed during pre-marketing clinical trials. In the realm of scientific study, a binary outcome with x = 0 observed successes is an often ignored issue, and is sometimes incorrectly interpreted as indicating $\pi = 0$. Elementary texts typically use a normal approximation, inverted
from a Wald statistic, to estimate $\pi$ under the condition $\min\{n\pi, n(1 - \pi)\} \geq 5$. There is no guarantee this condition is met since $\pi$ is unknown. Brown et al. (2001) discussed problems even when the criterion is met. Further, if $p = x/n$ is used to estimate $\pi$, $np = 0$ for the zero success case regardless of sample size. There have been several papers written on the subject of interval estimation for the binomial in this special case, each offering possible solutions (Wilson, 1927; Louis, 1981; Hanley and Lippman-Hand, 1983; Newcombe, 1998; Agresti and Coull, 1998).

This issue is exacerbated by small sample sizes. As sample sizes increase, and thus more information is obtained from the population, observing zero successes is less likely to happen by chance. Note that $P(X = 0) = (1 - \pi)^n$, a decreasing function of sample size for fixed $\pi$, with $\pi = 0$ being a notable exception. It is thus important to treat an experiment where n = 500 differently from, say, one where n = 10.

The polar opposite of this case, x = n observed successes, is also of great importance. Yet a binomial distribution can be defined by either the number of successes or the number of failures. Interpretations made from confidence intervals estimated from x = 0 successes and x = 0 failures are equivalent; thus a discussion of the case where x = n successes has been omitted.
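The dependence of $P(X = 0) = (1 - \pi)^n$ on sample size noted above can be checked numerically. The following is a minimal sketch; the function name `prob_zero_successes` and the value $\pi = 0.01$ are illustrative assumptions, not taken from the article.

```python
def prob_zero_successes(pi, n):
    """Return P(X = 0) = (1 - pi)^n for X ~ Binomial(n, pi)."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (1.0 - pi) ** n


# pi = 0.01 is a hypothetical event probability chosen for illustration.
for n in (10, 500):
    print(f"n = {n:3d}: P(X = 0) = {prob_zero_successes(0.01, n):.4f}")
```

Under this assumed $\pi$, observing zero successes is the expected outcome for n = 10 but a rare one for n = 500, which is why the two situations call for different interpretations.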
In the following, we present some of the issues with interval estimators in general, with a specific focus on the case in which zero successes are observed.

Methods

The most common interval estimate of the binomial parameter, found in most textbooks, is computed by inverting the Wald statistic given by

$$z = \frac{p - \pi}{\sqrt{p(1 - p)/n}}$$

where $p = x/n$ is the maximum likelihood estimate (mle) of $\pi$. The resulting interval estimate is

$$p \pm Z_{\alpha/2} \sqrt{p(1 - p)/n}.$$

This interval has been shown to perform adequately outside of extreme cases; the x = 0 case is one of the two most extreme. Note that the Wald interval is only valid when $p \in (0,1)$, as the interval estimate is reduced to either of the singular values $\{0,1\}$ on the endpoints. Further, the variance estimate, using the invariance property of the mle $p$, is 0 when $p \in \{0,1\}$. Again, we discuss the case when p = 0 with similar indications when p = 1.

For the zero case, the Wald interval fails on a couple of additional counts. It is symmetric about p = 0 and produces an interval that includes negative values, yet $\pi \in [0,1]$. Further, the variance estimate implies P(success) = 0. Unfortunately, however, most elementary textbooks present the Wald intervals; Brown et al. (2001) mention only one text that offers an alternative.

Using an exact approach, Louis (1981) developed a confidence interval for the zero case. The probability of observing zero successes (or n failures) is $P(X = 0) = (1 - \pi)^n$. A one-sided $(1 - \alpha)100\%$ confidence interval, as advocated by Louis, is constructed using the solution to $\alpha = (1 - \pi)^n$, given by $[0, 1 - \alpha^{1/n}]$.
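To make the contrast concrete, the sketch below evaluates the Wald interval and the exact one-sided upper limit $1 - \alpha^{1/n}$ at x = 0. The function names and the default critical value 1.96 are assumptions made for illustration.

```python
import math


def wald_interval(x, n, z=1.96):
    """Wald interval p +/- z * sqrt(p(1 - p)/n); z = 1.96 is an assumed default."""
    p = x / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return (p - half_width, p + half_width)


def exact_upper_limit(n, alpha=0.05):
    """Upper limit 1 - alpha^(1/n), the solution of alpha = (1 - pi)^n."""
    return 1.0 - alpha ** (1.0 / n)


for n in (10, 359):
    print(n, wald_interval(0, n), round(exact_upper_limit(n), 4))
```

With x = 0 the Wald interval collapses to a single point whatever the sample size, while the exact limit shrinks as n grows.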
For large sample sizes, using $\lim_{n \to \infty} n(1 - \alpha^{1/n}) = -\ln(\alpha)$, the interval estimate is $[0, -\ln(\alpha)/n]$. When $\alpha = 0.05$, the interval is approximately $[0, 3/n]$, or the "rule of three." A similar method using a two-sided 95% confidence interval approach is $[0, 3.69/n]$, where $\alpha/2 = 0.025$ is used in lieu of $\alpha$ in the above formula. Hanley and Lippman-Hand (1983) mention several other extensions to the "rule of three." Perhaps a more conservative approach would warrant using a "rule of four" with 4/n representing an upper boundary.

Near the boundary, one-sided confidence intervals are less conservative than two-sided intervals. The coverage probabilities in this region are highly erratic (Brown et al., 2001). Therefore, the more conservative two-sided approach should be used to account for this unreliability. Further, the construction of one-sided confidence intervals should be determined before data are collected and not ad hoc. The common practice of using one-sided intervals when x = 0 and two-sided intervals when x > 0 is inconsistent. Analogous to deciding a priori whether to perform a one-sided or two-sided hypothesis test, the data should not dictate the method of interval construction.

An alternative to the Wald interval construction is based on inverting the score statistic,

$$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}.$$
This interval, first discussed by Wilson (1927), is constructed in a similar manner to the Wald interval while using the exact variance, $\pi(1 - \pi)/n$, instead of the invariant estimate, $p(1 - p)/n$. As noted by Olivier and May (2006), the inversion of the score statistic can be written as a weighted estimator

$$(1 - w)p + w\left(\tfrac{1}{2}\right) \pm Z_{\alpha/2} \sqrt{(1 - w)\frac{p(1 - p)}{n + Z_{\alpha/2}^2} + w\frac{1}{4(n + Z_{\alpha/2}^2)}}$$

where the weight is $w = Z_{\alpha/2}^2/(n + Z_{\alpha/2}^2)$. Note that the maximum binomial variance of 1/4 occurs at $\pi = 1/2$. As sample size approaches infinity, $w$ tends to zero and the Wilson intervals converge to the Wald intervals. For small samples, by contrast, the Wilson estimate depends more on the most conservative estimate of $\pi$, p = 1/2.

For the case of x = 0 successes, the Wilson interval simplifies to $[0, w]$. For those who argue for the use of one-sided intervals, the Wilson interval is interestingly constructed as two-sided, yet it is inherently one-sided in that zero is always the lower limit. Thus the need for truncating the interval is ameliorated and there is no need to apply ad hoc methods as a special case when x = 0. If one-sided confidence intervals are decided upon a priori, one would use $Z_{\alpha}$ in the definition of $w$.

Regarding the coverage probabilities for the Wilson intervals, Chebyshev's inequality
guarantees a minimum coverage of $(1 - 1/Z_{\alpha/2}^2)$ for the Wilson intervals since the population variance, not the sample variance, is used in the calculation. For 95% intervals, the minimum coverage for the Wilson intervals is about 75% without making any distributional assumptions. In fact the global minimum coverage for the binomial parameter occurs at $\pi \approx 0.176/n$ for a minimum coverage probability of at least 82.4% for the Wilson intervals (Olivier and May, 2006). There is, however, no such guarantee for the Wald intervals or their derivatives, where the variance is estimated from the sample.

Wilson (1927) commented that if using estimate $\pm\, 2 \times$ s.e. (which implies the lack of a distributional assumption) is a valid alternative to a 95% confidence interval, then Wilson's point estimate simplifies to $(1 - w)p + w(1/2) = (x + 2)/(n + 4)$. Agresti and Coull (1998) inserted Wilson's point estimate into the Wald interval, calling the resulting interval the "add two successes and two failures adjusted Wald interval." We refer to this interval as the Wilson/Agresti/Coull (WAC) interval. Wilson (1927) also discusses using $p^* = (x + 1)/(n + 2)$ as a point estimate. The WAC interval estimate can be written as

$$\frac{x + 2}{n + 4} \pm Z_{\alpha/2} \sqrt{\frac{(x + 2)(n - x + 2)}{(n + 4)^3}}$$

which is truncated to a one-sided interval as

$$\left[0,\ \frac{2}{n + 4} + Z_{\alpha/2} \sqrt{\frac{2(n + 2)}{(n + 4)^3}}\right]$$

for the zero success case.
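The Wilson, WAC, and Hanley upper limits for the zero success case can be compared with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the critical value is taken as 1.96, and the clamping to [0, 1] in the Wilson function only guards against floating-point rounding.

```python
import math

Z = 1.96  # assumed two-sided 95% critical value


def wilson_interval(x, n, z=Z):
    """Wilson interval in the weighted form, with w = z^2 / (n + z^2)."""
    p = x / n
    w = z * z / (n + z * z)
    center = (1.0 - w) * p + w * 0.5
    half_width = z * math.sqrt(
        (1.0 - w) * p * (1.0 - p) / (n + z * z) + w / (4.0 * (n + z * z))
    )
    # Clamp only to absorb rounding error; the exact limits lie in [0, 1].
    return (max(0.0, center - half_width), min(1.0, center + half_width))


def wac_interval(x, n, z=Z):
    """WAC (add two successes and two failures) interval, truncated to [0, 1]."""
    center = (x + 2) / (n + 4)
    half_width = z * math.sqrt((x + 2) * (n - x + 2) / (n + 4) ** 3)
    return (max(0.0, center - half_width), min(1.0, center + half_width))


def hanley_upper_limit(n):
    """Two-sided analogue of the rule of three: 3.69 / n."""
    return 3.69 / n


for n in (10, 359):
    print(
        n,
        round(wac_interval(0, n)[1], 4),
        round(wilson_interval(0, n)[1], 4),
        round(hanley_upper_limit(n), 4),
    )
```

For x = 0 the Wilson upper limit returned here equals the weight $w$, and the values for n = 10 and n = 359 agree with Table 1 to within rounding.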
It is worth noting that the Wilson intervals are a proper subset of the WAC intervals (Agresti and Caffo, 2000). The Wilson intervals are based on inverting the binomial test statistic and are thus theoretically justified. The WAC intervals are overly conservative, especially on the boundary. In addition, the minimum coverage probability can be guaranteed with the Wilson intervals but not with the Wald intervals. One line of reasoning for using the Wald or adjusted Wald intervals seems to center on a didactic argument that they are easier to teach. We note that with the case of x = 0, the Wilson intervals are simply $[0, w]$ and a $2\sigma$ interval comparable to WAC is 4/(n + 4), an alternative to the "rule of four."

Examples

In a recent meeting with a colleague, there was a discussion regarding two samples in which he observed zero negative effects. The sample sizes for the two studies were n = 10 and n = 359, respectively. Since larger sample sizes provide more information about the population, $\pi = 0$ becomes more plausible given x = 0 as n increases. Intuitively, then, observing zero successes is a more significant finding for n = 359 than for n = 10.
As mentioned previously, a hypothesis test might not be appropriate in studies where small proportions are expected; alternatively, a confidence interval estimate of $\pi$ would be more informative. For the zero observed successes case, the upper limit of each interval is often interpreted as an approximate representation of the maximum value of $\pi$ for a specific sample size. Inferences can be made by comparing the upper limit with the largest acceptable value of $\pi$, or maximum allowable risk. Wilson (1927) offered the interpretation that if the true parameter is outside the interval $[0, w]$, the probability of observing 0 successes is $\alpha$.

For the current example, the researcher expected a priori to find at least some negative effect, so a one-sided confidence interval would not be appropriate. The upper 95% confidence limit for each sample is presented in Table 1 for the WAC, Wilson and Hanley intervals. Note that when n = 10, both the WAC and Hanley intervals are more conservative than the Wilson interval. Yet for n = 359, all three estimates are similar. As sample size increases, all three upper endpoints converge to the same value, namely 0. Figure 1 illustrates their convergence behavior.

[FIGURE 1 OMITTED]

In practice, the concern is how large a sample needs to be to claim a treatment is a complete success (or failure). From a design perspective, if the maximum allowable risk is known, an appropriate sample size can be computed from each interval. Letting $\pi_a$ be the maximum allowable risk, the sample size is $n = Z^2(1 - \pi_a)/\pi_a$ for the Wilson interval and $n = 3.69/\pi_a$ for the Hanley interval. The WAC interval does not solve readily for n. However, the sample size is roughly $n = (4.8 - 4\pi_a)/\pi_a$ using the large sample approximation

$$\frac{2}{n + 4} + Z_{\alpha/2} \sqrt{\frac{2(n + 2)}{(n + 4)^3}} \approx \frac{4.8}{n + 4}.$$
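The three sample size formulas above can be evaluated directly. The sketch below assumes Z = 1.96 and uses hypothetical function names; it returns the real-valued n, which in practice would be rounded up to a whole number of subjects.

```python
def _check_risk(pi_a):
    """Validate the maximum allowable risk (helper name is illustrative)."""
    if not 0.0 < pi_a < 1.0:
        raise ValueError("pi_a must lie strictly between 0 and 1")


def wilson_sample_size(pi_a, z=1.96):
    """n = z^2 (1 - pi_a) / pi_a, from setting the Wilson upper limit w = pi_a."""
    _check_risk(pi_a)
    return z * z * (1.0 - pi_a) / pi_a


def hanley_sample_size(pi_a):
    """n = 3.69 / pi_a, from setting 3.69 / n = pi_a."""
    _check_risk(pi_a)
    return 3.69 / pi_a


def wac_sample_size(pi_a):
    """n = (4.8 - 4 pi_a) / pi_a, from the approximation 4.8 / (n + 4) = pi_a."""
    _check_risk(pi_a)
    return (4.8 - 4.0 * pi_a) / pi_a


for pi_a in (0.01, 0.001):
    print(
        pi_a,
        round(wac_sample_size(pi_a), 1),
        round(wilson_sample_size(pi_a), 1),
        round(hanley_sample_size(pi_a), 1),
    )
```

For $\pi_a = 0.01$ these formulas give roughly 476, 380.3, and 369 for the WAC, Wilson, and Hanley intervals respectively, so the WAC requirement is the largest of the three.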
Table 2 summarizes sample size calculations for the cases when the maximum allowable risk is 1/100 and 1/1000. The WAC sample size calculation is much more conservative than the other intervals (see Figure 2 for a more comprehensive perspective).

Discussion

Most elementary statistics texts present binomial parameter confidence intervals defined on the open interval (0,1), although $\pi \in [0,1]$. Procedures for estimates near the boundary are too often ignored. We have presented three alternatives to the Wald intervals. One-sided intervals are inappropriate unless chosen a priori and should otherwise be avoided. The WAC interval is more conservative than the Wilson interval (often unnecessarily). The Hanley and Wilson intervals perform similarly; however, the Wilson estimate is the only inherently one-sided interval and thus the need for truncating the lower limit ad hoc is ameliorated. Interestingly, both Agresti and Coull (1998) and Santner (1998) advocate the use of the Wilson intervals as a didactic tool. When estimating the binomial parameter on the boundary, we also recommend using the Wilson interval.

References

Agresti, A., Caffo, B. 2000. Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Amer. Stat. 54:280-288.
Agresti, A., Coull, B.A. 1998. Approximate is better than "exact" for interval estimation of binomial proportions. Amer. Stat. 52:119-126.
Brown, L., Cai, T., DasGupta, A. 2001. Interval estimation for a binomial proportion. Stat. Sci. 16:101-117.
Hanley, J.A., Lippman-Hand, A. 1983. If nothing goes wrong, is everything all right? J. Amer. Med. Assoc. 249:1743-1745.
Louis, T.A. 1981. Confidence intervals for a binomial parameter after observing no successes. Amer. Stat. 35:154.
Newcombe, R. 1998.
Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat. Med. 17:857-872.
Olivier, J., May, W.L. 2006. Weighted confidence interval construction for binomial parameters. Stat. Meth. Med. Res. 15:37-46.
Santner, T.J. 1998. Teaching large-sample binomial confidence intervals. Teach. Stat. 20:20-23.
Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. J. Amer. Stat. Assoc. 22:209-212.

Jake Olivier and Warren L. May
Division of Biostatistics
Department of Preventive Medicine
University of Mississippi Medical Center
2500 North State Street
Jackson, MS 39216
Corresponding Author: Jake Olivier jolivier@prevmed.umsmed.edu

Table 1. Comparison of the 95% interval upper limits when observing x = 0 successes among n = 10 and n = 359 trials.

Interval   Formula                                                  n = 10   n = 359
Wald       [0,0]                                                    0        0
WAC        $2/(n + 4) + Z_{\alpha/2} \sqrt{2(n + 2)/(n + 4)^3}$      0.326    0.013
Wilson     $w$                                                      0.277    0.011
Hanley     $3.69/n$                                                 0.369    0.010

Table 2. Comparison of sample size estimates for maximum allowable risk.

Interval   Formula                    $\pi_a$ = 0.01   $\pi_a$ = 0.001
Wald       none
WAC        $(4.8 - 4\pi_a)/\pi_a$     476              4796
Wilson     $Z^2(1 - \pi_a)/\pi_a$     381              3836
Hanley     $3.69/\pi_a$               369              3690