# The use of probability limits for process control based on geometric distribution.

Introduction

The traditional c-charts and u-charts for attributes are based on the assumption that the underlying distribution of the process being monitored is Poisson. This assumption is seldom verified when these charts are used, and where it has been checked, few alternatives have been studied, although general methodologies are presented in textbooks such as Oakland, Montgomery and Ryan, where basic control chart theory is introduced. In a recent article, Kaminsky et al. showed that if the process is best described by geometrically distributed events, wrong management decisions are commonly made if the traditional c-charts are used. Independently, Glushkovsky, Goh and Calvin had earlier used the geometric distribution for process control of near zero-defects processes.

In line with the traditional idea, Kaminsky et al. proposed new control limits for geometric distribution when it is a better description of the count data. The control limits are based on three times the standard deviation assuming the underlying distribution to be geometric. Although this is an improvement of the traditional c-chart when the underlying distribution is geometric, there are some serious problems. For example, the probability of false alarm will be higher than expected and the lower control limits will often be meaningless.

In this paper, the use of probability limits instead of the traditional limits based on the mean plus and minus three times the standard deviation is studied. The traditional k-sigma type limits and some of the problems with this approach are discussed first. Probability limits are then derived and some qualitative comparisons are made. Next, some quantitative results that demonstrate the usefulness of probability limits are presented together with some numerical examples. Finally, the use of the geometric distribution in process control is discussed. It is pointed out that it is of great use in controlling processes in a near zero-defects environment, which is supported by some other related studies.

Control limits based on geometric distribution

If the observation X is geometrically distributed, then

$$P[X = x] = p(1 - p)^x, \quad x = 0, 1, 2, \ldots \tag{1}$$

If the subgroup size is n, then the total number of counts in the subgroup, $Z = X_1 + X_2 + \cdots + X_n$, has a negative binomial distribution, or Pascal distribution, that is,

$$P[Z = z] = \binom{n + z - 1}{z} p^n (1 - p)^z, \quad z = 0, 1, 2, \ldots \tag{2}$$

Assuming that p is known and n is fixed, control limits can easily be obtained based on these results. In case of unknown p, it can easily be estimated and this problem will not be discussed here.
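As a quick numerical check of (1) and the negative binomial result, the sketch below convolves n geometric probability functions and compares the outcome with the standard closed-form negative binomial probabilities (support starting at zero). The function names and the example values n = 5 and p = 0.2 are illustrative choices, not part of the original study.

```python
from math import comb

def geometric_pmf(x, p):
    """P[X = x] = p (1 - p)^x for x = 0, 1, 2, ... as in (1)."""
    return p * (1.0 - p) ** x

def negative_binomial_pmf(z, n, p):
    """Standard closed form for the sum of n geometric counts."""
    return comb(n + z - 1, z) * p ** n * (1.0 - p) ** z

def pmf_of_sum(n, p, z_max):
    """P[Z = 0..z_max] obtained by repeated convolution of (1)."""
    single = [geometric_pmf(x, p) for x in range(z_max + 1)]
    total = single[:]
    for _ in range(n - 1):
        # Only indices up to z are needed, so truncation at z_max is exact.
        total = [sum(total[j] * single[z - j] for j in range(z + 1))
                 for z in range(z_max + 1)]
    return total

by_convolution = pmf_of_sum(n=5, p=0.2, z_max=30)
closed_form = [negative_binomial_pmf(z, 5, 0.2) for z in range(31)]
largest_gap = max(abs(a - b) for a, b in zip(by_convolution, closed_form))
print(f"largest difference over z = 0..30: {largest_gap:.2e}")
```

The two lists should agree up to floating-point rounding, which is what justifies working with the closed form rather than with the individual geometric counts.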

The k-sigma type control limits

The expected value and the variance of the total number of counts Z in a subgroup of size n are given by

$$E[Z] = \frac{n(1 - p)}{p}, \qquad \mathrm{Var}[Z] = \frac{n(1 - p)}{p^2}. \tag{3}$$

Based on the conventional idea of k-sigma, that is, control limits placed k standard deviations from the mean, the following control limits are obtained

$$UCL_k = \frac{n(1 - p)}{p} + k \sqrt{\frac{n(1 - p)}{p^2}} \tag{4}$$

and

$$LCL_k = \frac{n(1 - p)}{p} - k \sqrt{\frac{n(1 - p)}{p^2}} \tag{5}$$

with the centre line being set equal to the mean, that is,

$$CL = \frac{n(1 - p)}{p}. \tag{6}$$
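The k-sigma limits can be evaluated directly from the mean and variance of Z. The following is a minimal sketch; the function name `k_sigma_limits` and the example values are illustrative assumptions.

```python
from math import sqrt

def k_sigma_limits(n, p, k=3.0):
    """Return (LCL_k, CL, UCL_k) for the sum of n geometric counts."""
    mean = n * (1.0 - p) / p
    sigma = sqrt(n * (1.0 - p)) / p   # square root of n(1 - p)/p^2
    return mean - k * sigma, mean, mean + k * sigma

lcl, cl, ucl = k_sigma_limits(n=5, p=0.2)
print(f"LCL_k = {lcl:.2f}, CL = {cl:.2f}, UCL_k = {ucl:.2f}")
```

For n = 5 and p = 0.2 the lower limit comes out negative, which illustrates the difficulty with the lower limit discussed in this section.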

These control limits have been shown to be much better than those of the traditional c-chart when the underlying distribution is geometric. A study using both simulated data and real data is presented in Kaminsky et al. Hence, the proposed control limits improve on the traditional ones in this case. However, some problems that can arise when using this type of control chart are discussed in the following.

First of all, because the k-sigma idea is based on the normal approximation, the sample size has to be large for the probability of false alarm to match that of the traditional Shewhart chart. More precisely, it is well known that for a negative binomial distribution to be approximated by a normal distribution, np has to be large. When p is small, the sample size n has to be very large.

The problem with the LCL is the most serious one. As in the case of the traditional c-chart, the LCL in (5) can easily be negative. If this happens, and the number of nonconformities in a product is being measured, then no process improvement can ever be detected without using more complicated run rules. Moreover, by looking at expression (5), it can be concluded that the LCL can almost never be greater than 0. That is why none of the control charts presented in the paper by Kaminsky et al. can detect process improvement.

This can be shown as follows. In order for the LCL to be greater than zero, the following inequality has to be valid

$$\frac{n(1 - p)}{p} > k \sqrt{\frac{n(1 - p)}{p^2}}. \tag{7}$$

Multiplying both sides by p, dividing by $\sqrt{n(1 - p)}$ and squaring, the above inequality can be simplified to

$$n(1 - p) > k^2 \tag{8}$$

or

$$p < 1 - \frac{k^2}{n}. \tag{9}$$

Suppose that k = 3 and n = 5, which are standard values commonly used; then p < -0.8, which can never happen. Also, for continuous inspection, n = 1, so that even with k = 1 the condition becomes p < 0, which can never be valid. Hence, it can be concluded that using the LCL as in (5) is impractical and misleading.
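The condition for a positive k-sigma lower limit can also be turned around to ask how large the subgroup must be for a given p. The sketch below assumes the simplified conditions n(1 - p) > k^2 and p < 1 - k^2/n from this section; both function names are hypothetical.

```python
from math import floor

def largest_p_with_positive_lcl(n, k=3.0):
    """Upper bound on p for LCL_k > 0, i.e. p < 1 - k^2 / n."""
    return 1.0 - k ** 2 / n

def smallest_n_with_positive_lcl(p, k=3.0):
    """Smallest integer n satisfying n (1 - p) > k^2."""
    return floor(k ** 2 / (1.0 - p)) + 1

print(largest_p_with_positive_lcl(n=5, k=3))   # negative, so no valid p
print(largest_p_with_positive_lcl(n=1, k=1))   # zero, so no valid p
print(smallest_n_with_positive_lcl(p=0.2))     # subgroup size needed at p = 0.2
```

Under these assumptions, even a moderate p such as 0.2 requires a subgroup size larger than the values commonly used before the 3-sigma lower limit becomes positive.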

The exact probability limits

The use of exact probability limits is a known concept and has been mentioned in many articles and textbooks discussing charts for attributes and variables, see e.g. Calvin and Wetherill and Brown. This is also in line with the statistical process control ideas originally proposed by Shewhart. Their use is more important in the case of geometrically distributed quantities for the reasons discussed previously.

It is straightforward to derive the exact probability limits. If the acceptable risk of false alarm is $\alpha$, then $UCL_\alpha$ and $LCL_\alpha$ are given as the solutions of

[Mathematical Expression Omitted]. (10)

and

[Mathematical Expression Omitted]. (11)

Although these equations have to be solved numerically, this can be done using standard computer programs. Some discussion of the solution of the above equations is given later. Below, the simple case n = 1, which is common with continuous inspection or so-called one-at-a-time sampling, is discussed.

If n = 1, which is a common case when, for example, the inspection is carried out automatically, the control limits can easily be derived by using

[Mathematical Expression Omitted] (12)

and $LCL_\alpha$ and $UCL_\alpha$ are given by

$$LCL_\alpha = \frac{\ln(1 - \alpha/2)}{\ln(1 - p)} \tag{13}$$

and

$$UCL_\alpha = \frac{\ln(\alpha/2)}{\ln(1 - p)} \tag{14}$$

respectively.
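For n = 1 the limits in (13) and (14) are closed-form, so they can be evaluated in a few lines. The sketch below relies on the geometric tail probability P[X >= x] = (1 - p)^x, which follows from (1); the function name and the example value p = 0.01 are illustrative assumptions.

```python
from math import log

def geometric_probability_limits(p, alpha=0.0027):
    """Return (LCL_alpha, UCL_alpha) from (13) and (14) for n = 1."""
    lcl = log(1.0 - alpha / 2.0) / log(1.0 - p)
    ucl = log(alpha / 2.0) / log(1.0 - p)
    return lcl, ucl

lcl, ucl = geometric_probability_limits(p=0.01)
print(f"LCL_alpha = {lcl:.3f}, UCL_alpha = {ucl:.1f}")

# The upper limit is the point where the tail (1 - p)^x equals alpha / 2.
print(f"tail at UCL_alpha = {(1.0 - 0.01) ** ucl:.5f}")
```

The limits are real numbers in general; how they are rounded to integers for plotting is a design choice that the formulas themselves do not settle.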

Because the negative binomial distribution reduces to the geometric distribution when n = 1, these control limits are the same as those found in Calvin, Goh and Bourke. In these papers, a type of control chart based on the cumulative count of conforming items is studied. It is very effective when the fraction nonconforming becomes very low and a large number of zero-defect samples are counted. Related studies based on a similar idea can be found in Xie and Goh and in Glushkovsky. It is well known that the cumulative count of conforming items between two nonconforming ones follows a geometric distribution under very general assumptions.

It can also be pointed out here that by using the exact probability limits, the requirement for lower control limit to be greater than zero is

$$P[Z = 0] = p^n < \alpha/2. \tag{15}$$

This is easily satisfied in practice. For example, when $\alpha$ is 0.0027 and n = 5, p has to be greater than $(0.00135)^{1/5} \approx 0.267$ before this condition fails and no lower control limit is available. For p < 0.267, process changes can be detected even with the simple rule of one point outside the control limits.
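The threshold on p quoted above follows from solving p^n = alpha/2 for p. A minimal sketch, with a hypothetical function name:

```python
def largest_p_with_probability_lcl(n, alpha=0.0027):
    """Value of p at which P[Z = 0] = p^n equals alpha / 2."""
    return (alpha / 2.0) ** (1.0 / n)

for n in (1, 5, 10):
    print(f"n = {n}: p must stay below {largest_p_with_probability_lcl(n):.4f}")
```

The bound becomes less restrictive as n grows, so larger subgroups leave room for a lower probability limit over a wider range of p.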

Some remarks

Some qualitative comparison is in order here. First, the probability limits are more suitable when the normal approximation is not valid. They resolve some of the problems with the traditional k-sigma type control limits, see e.g. Xie and Goh. It has been noted in this paper that this is a particular problem when the geometric distribution is used.

One drawback of the probability limits in general is that the control limits will not be symmetric. Sometimes it may be useful to have symmetric control limits, for example, to check whether the limits are drawn correctly. However, with computerization, this is no longer a problem. Furthermore, because of the problem with $LCL_k$, the k-sigma type control limits are not symmetric either.

It can be noted that the negative binomial distribution is related to the ordinary binomial distribution as follows. If Z is negative binomial with parameters n and p, then

$$P[Z \le z] = P[Y \ge n] \tag{16}$$

where Y is a binomial variable with parameters n + z and p, see e.g. Johnson and Kotz. Hence, binomial tables can be used to determine negative binomial probabilities.
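The relation between the negative binomial and binomial distributions can be checked numerically. The sketch below uses the standard negative binomial probabilities for the sum of n geometric counts; the function names and the values n = 5, p = 0.2 are illustrative assumptions.

```python
from math import comb

def negative_binomial_cdf(z, n, p):
    """P[Z <= z] for the sum of n geometric counts."""
    return sum(comb(n + j - 1, j) * p ** n * (1.0 - p) ** j
               for j in range(z + 1))

def binomial_upper_tail(n, trials, p):
    """P[Y >= n] for Y binomial with the given number of trials and p."""
    return sum(comb(trials, y) * p ** y * (1.0 - p) ** (trials - y)
               for y in range(n, trials + 1))

n, p = 5, 0.2
gaps = [abs(negative_binomial_cdf(z, n, p) - binomial_upper_tail(n, n + z, p))
        for z in range(40)]
print(f"largest difference for z = 0..39: {max(gaps):.2e}")
```

The intuition is that Z, the number of failures before the n-th success, is at most z exactly when the first n + z trials contain at least n successes.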

Usually, $LCL_\alpha$ as determined by (10) is a small integer; hence there are only a few terms in the summation and it can be calculated easily. However, for (11), the problem can be serious. Although (11) can be written as

[Mathematical Expression Omitted] (17)

the calculation is usually difficult because $UCL_\alpha$ is usually a large number, and a numerical algorithm has to be used.
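One possible numerical approach is to accumulate the negative binomial probabilities term by term until both tail conditions are met. The sketch below is an assumption-laden illustration rather than the algorithm used in the original study: it assumes that a point signals when Z < LCL or Z > UCL, takes the LCL as the smallest z with P[Z <= z] > alpha/2 and the UCL as the smallest z with P[Z > z] <= alpha/2, and uses a hypothetical function name. This convention was chosen because it agrees with the Table I entries for p = 0.10 and p = 0.20; the omitted equations may be stated differently.

```python
def probability_limits(n, p, alpha=0.0027):
    """Return integer (LCL, UCL) for the sum of n geometric counts.

    Assumed convention: signal when Z < LCL or Z > UCL.
    """
    half = alpha / 2.0
    term = p ** n        # P[Z = 0]
    cdf = term           # P[Z <= 0]
    z = 0
    lcl = None
    while True:
        if lcl is None and cdf > half:
            lcl = z                      # P[Z < z] is still <= alpha / 2
        if 1.0 - cdf <= half:            # P[Z > z] <= alpha / 2
            return lcl, z
        # Recurrence: P[Z = z + 1] = P[Z = z] (n + z) / (z + 1) (1 - p)
        term *= (n + z) / (z + 1) * (1.0 - p)
        z += 1
        cdf += term

for p in (0.10, 0.20, 0.30):
    print(p, probability_limits(n=5, p=p))
```

Using the recurrence between successive probabilities avoids evaluating large binomial coefficients, which keeps the loop cheap even when the upper limit is in the hundreds or thousands.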

Some quantitative comparisons between different control limits

In this section, the probability limits presented in the previous section are compared with the k-sigma limits introduced in Kaminsky et al. Some general discussion was given earlier. Here some quantitative results are presented.

First, Table I shows the exact probability limits compared with the 3-sigma limits. The probability limits are obtained by using (10) and (11) with $\alpha$ = 0.0027, a standard false alarm probability level. As expected, there is a great difference between the upper control limits obtained. Because the negative binomial distribution is highly skewed, $UCL_\alpha$ is larger than the 3-sigma one. However, it should be noted that there is an $LCL_\alpha$ for p < 0.267 while the 3-sigma lower control limit does not exist in this case.
Table I. Probability limits LCL_alpha and UCL_alpha for subgroup size 5 with alpha = 0.0027 and the comparison with the 3-sigma UCL (UCL_k)
```
p                LCL_alpha           UCL_alpha               UCL_k

0.01                    76                1430                1163
0.02                    37                 710                 578
0.03                    24                 470                 382
0.04                    17                 350                 285
0.05                    13                 278                 226
0.06                    10                 230                 187
0.08                     7                 170                 138
0.10                     5                 134                 109
0.12                     4                 110                  90
0.15                     3                  86                  70
0.18                     2                  70                  57
0.20                     1                  62                  50
0.25                     1                  47                  39
0.30                     0                  37                  31

Note:

In all cases, the 3-sigma LCLs are less than zero
```

Based on the general discussion presented previously, the probability limits are expected to have a probability of false alarm very near the originally desired level. For most of the cases we have encountered, the probability of false alarm when the probability limits are used is around 0.002-0.003, as expected. However, when the 3-sigma limits are used, the probability of false alarm is about 0.009-0.010, which is 3-5 times larger.

Finally, a chart has to be sensitive to a reasonable amount of shift in the process characteristics. Table II compares the average run lengths for the case of p = 0.2, an example used in Kaminsky et al. It should be pointed out that the 3-sigma limits, although able to detect a decrease in p reasonably well, fail to detect an increase in p. This is because the lower control limit is less than zero.

It can be noted from Table II that the average run length is strictly increasing in p when the 3-sigma limits are used. This means that when the process shifts from the target in the sense that p is increased, there will be no alarm unless some run rules, such as those described in Nelson, are used. The probability limits, however, can detect any process change quickly. The ARL in Table II does not reach its maximum at p = 0.2 because of the rounding of the lower control limit.
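The average run lengths can be approximated as the reciprocal of the probability that a single point signals. The sketch below assumes that a point signals when Z < LCL or Z > UCL, with the limits designed for p = 0.2 and n = 5 (probability limits 1 and 62 from Table I, 3-sigma limits -10 and 50). The function names are hypothetical, and the signalling rule is an assumption that happens to agree with several Table II entries.

```python
def negative_binomial_cdf(z, n, p):
    """P[Z <= z] for the sum of n geometric counts; 0 for z < 0."""
    if z < 0:
        return 0.0
    term = p ** n            # P[Z = 0]
    total = term
    for j in range(z):
        # P[Z = j + 1] obtained from P[Z = j]
        term *= (n + j) / (j + 1) * (1.0 - p)
        total += term
    return total

def average_run_length(p, n, lcl, ucl):
    """ARL when a point signals if Z < lcl or Z > ucl (assumed rule)."""
    inside = (negative_binomial_cdf(ucl, n, p)
              - negative_binomial_cdf(lcl - 1, n, p))
    return 1.0 / (1.0 - inside)

for p in (0.10, 0.20, 0.30):
    arl_alpha = average_run_length(p, 5, lcl=1, ucl=62)
    arl_k = average_run_length(p, 5, lcl=-10, ucl=50)
    print(f"p = {p:.2f}: ARL_alpha = {arl_alpha:.1f}, ARL_k = {arl_k:.1f}")
```

With a negative lower limit, the only way for the 3-sigma chart to signal is through the upper limit, which is why its run length keeps growing as p increases.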

Some numerical examples

In this section, some numerical examples are shown to illustrate the use of the control limits and the use of geometric distribution in general. First, the control limits for some cases originally presented in Kaminsky et al. are compared and a simulated data set is then used.
Table II. Average run length comparison for p = 0.2
```
p                      ARL_alpha                            ARL_k

0.10                       5.329                            2.897
0.12                       11.97                            5.127
0.14                       29.95                            9.962
0.16                       81.84                            21.00
0.18                       235.9                            47.64
0.20                       635.7                            115.6
0.22                        1141                            298.6
0.24                        1119                            819.1
0.26                       824.8                             2381
0.28                       579.1                             7326
0.30                       411.4                            23855
0.32                       298.0                            82225
0.34                       220.1                           300240
0.36                       165.4                            >10^6
0.38                       126.2                            >10^6
0.40                       97.67                            >10^6
```

Kaminsky et al. presented a few examples where the geometric distribution is more suitable than the Poisson distribution in describing the process characteristic. In one example, a study of the number of orders on each truck in a central distribution centre, p = 0.025 and UCL = 465.34 while the LCL is less than zero. The exact probability limits are 29 and 567, respectively. Hence, if an observation falls below 29, the process is most likely to have changed.

For a data set provided in Kaminsky et al. (see Table III), p = 0.2 and the probability limits can be obtained from Table I as 62 and 0. For the sake of simplicity, the data in Table III are grouped and reduced by 5 because of the known shift of one in the original data [Figure 1 omitted]. The probability upper control limit is compared with the 3-sigma upper control limit of 50. Note again that the 3-sigma lower control limit is less than zero. Because p = 0.2, according to Table I, the lower control limit should be one, and any count of zero should be deemed to be caused by a change in the process characteristic. The upper control limit is 62, compared with 51 in the original paper. However, for this small data set, all points are within the control limits.

To illustrate the use of probability limits, a set of simulated data is generated based on n = 5 and p = 0.1. The probability limits are 4 and 134 respectively, while the 3-sigma upper control limit is 109. The data are given in Table IV and the control chart is given in Figure 2.

[Table III omitted]

For the data set in Table IV, there is one point which falls outside the 3-sigma limit. However, all points are within the probability limits. This case is in line with the result in a previous section indicating that the probability of false alarm is around 1 per cent instead of the 0.27 per cent originally associated with 3-sigma limits.
Table IV. A set of negative binomial data as the sum of five geometrically distributed counts simulated with p = 0.1
```

71        22       88     118    27    37     47     43     39    45
30       105       33     102    49    31     15     38     18    65
61        59       30      73    39    69     34     55     29    69
99        43       38      56    38    28     16     14    106    62
61        24       48      24    48    39     58     20     46    29
46        30       39      62    77    31     43     36     19    22
45        35       20      63    43    37     45     36     68    56
90        14       73      65    50    27     23     60     27    43
36        77       28      81    50    35     67     49     47    41
24        28       28      58    36    61     31     29     62    85
```
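The comparison of the two sets of limits on the Table IV data can be reproduced with a short script. The data are copied from Table IV; the limits 4, 134 and 109 are the values quoted in the text, and the variable names are illustrative.

```python
table_iv = [
    71, 22, 88, 118, 27, 37, 47, 43, 39, 45,
    30, 105, 33, 102, 49, 31, 15, 38, 18, 65,
    61, 59, 30, 73, 39, 69, 34, 55, 29, 69,
    99, 43, 38, 56, 38, 28, 16, 14, 106, 62,
    61, 24, 48, 24, 48, 39, 58, 20, 46, 29,
    46, 30, 39, 62, 77, 31, 43, 36, 19, 22,
    45, 35, 20, 63, 43, 37, 45, 36, 68, 56,
    90, 14, 73, 65, 50, 27, 23, 60, 27, 43,
    36, 77, 28, 81, 50, 35, 67, 49, 47, 41,
    24, 28, 28, 58, 36, 61, 31, 29, 62, 85,
]

LCL_ALPHA, UCL_ALPHA = 4, 134   # probability limits quoted in the text
UCL_3SIGMA = 109                # 3-sigma upper limit quoted in the text

above_3sigma = [z for z in table_iv if z > UCL_3SIGMA]
above_probability = [z for z in table_iv if z > UCL_ALPHA]
below_probability = [z for z in table_iv if z < LCL_ALPHA]

print("points above the 3-sigma UCL:", above_3sigma)
print("points above the probability UCL:", above_probability)
print("points below the probability LCL:", below_probability)
```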

The use of geometric distribution

The geometric-type control charts are useful when the underlying distribution of the process is not Poisson and the geometric distribution is a better description of the underlying process. This has been pointed out in  and . There are many practical situations for which this is the case.

Also, when the process is of very high yield, a large number of conforming samples are observed between two nonconforming ones and the traditional chart for attributes is not practical and, especially, further improvement cannot be detected. In this case, control chart can be developed for the cumulative count of conforming items between two nonconforming ones. The geometric control chart can be used as the distribution of cumulative count of conforming items follows geometric distribution, see for example Goh.

Note that initially it is assumed that the underlying distribution of the process is geometric. However, the methodology also applies to the negative binomial distribution. This distribution is a useful one and was used as early as Greenwood and Yule, where it was derived as a consequence of certain simple assumptions in accident proneness models. Furthermore, it can be obtained as a certain mixture of Poisson distributions (see for example Johnson and Kotz). It is also well known that the sum of negative binomial variables with the same parameter p is also negative binomial. Hence, the probability limits can be obtained in a similar manner.

In practice, when the conventional Poisson distribution is not suitable, the geometric distribution may provide a better description of the process. It is also well known that the negative binomial distribution has a mean that is smaller than its variance. If this is the case for the data, the geometric distribution can be an alternative. Note that for the Poisson distribution, it is assumed that the mean is the same as the variance, an assumption that is questionable when the mean is very low.
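A simple way to examine the mean and variance relationship in a data set is the ratio of the sample variance to the sample mean. The sketch below is only a rough diagnostic, not a formal test; the function name and the example counts are hypothetical and are not taken from the study.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values near 1 are consistent with a Poisson model; values well
    above 1 suggest that a geometric or negative binomial model
    may describe the counts better.
    """
    return variance(counts) / mean(counts)

hypothetical_counts = [0, 0, 1, 5, 0, 2, 9, 0]   # illustrative data only
print(f"dispersion index = {dispersion_index(hypothetical_counts):.2f}")
```

For the sum of n geometric counts the theoretical ratio of variance to mean is 1/p, which always exceeds 1.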

Concluding remarks

Geometric distribution is a common distribution in practice although control charts based on such a distribution have not been widely studied. When the Poisson distribution is not suitable for a particular process, geometric distribution can be a good alternative. It is important to be aware of potential problems with traditional control charts and use different alternatives when necessary.

Besides the fact that many types of measurement data follow geometric distribution, it is a distribution for the monitoring of high-yield processes based on the cumulative count of conforming items. When the process is improved, it is easy to switch to the control chart for a cumulative count of conforming items and this measurement follows geometric distribution. A control chart can easily be set up without adding more resources.

It is noted in this paper that for the geometric distribution, the previously used control limits based on k times the standard deviation cause frequent false alarms and cannot provide any reasonable lower control limit for detecting further process improvement without introducing complicated run rules. This problem, although pointed out in many standard statistical process control texts for the Poisson case, is very serious for the geometric control chart, and the k-sigma type control limits should never be used in this case, a conclusion supported by this study.

References

1. Oakland, J.S., Statistical Process Control, Heinemann, London, 1986.

2. Montgomery, D.C., Introduction to Statistical Quality Control, Wiley, New York, NY, 1991.

3. Ryan, T.P., Statistical Methods for Quality Improvement, Wiley, New York, NY, 1989.

4. Kaminsky, F.C., Benneyan, J.C., Davis, R.D. and Burke, R.J., "Statistical control charts based on a geometric distribution", Journal of Quality Technology, Vol. 24 No. 2, 1992, pp. 69-75.

5. Glushkovsky, E.A., "'On-line' G-control chart for attribute data", Quality and Reliability Engineering International, Vol. 10 No. 4, 1994, pp. 217-27.

6. Goh, T.N., "A control chart for very high yield processes", Quality Assurance, Vol. 13 No. 1, 1987, pp. 18-22.

7. Calvin, T.W., "Quality control techniques for 'zero-defects'", IEEE Trans on Components, Hybrids, and Manufacturing Technology, CHMT-6, 1983, pp. 323-28.

8. Wetherill, G.B. and Brown, D.W., Statistical Process Control - Theory and Practice, Chapman & Hall, London, 1991.

9. Shewhart, W.A., Economic Control of Quality of Manufactured Product, Van Nostrand, New York, NY, 1931.

10. Bourke, P.D., "Detecting a shift in fraction nonconforming using run-length control charts with 100% inspection", Journal of Quality Technology, Vol. 23 No. 2, 1991, pp. 225-38.

11. Xie, M. and Goh, T.N., "Improvement detection by control charts for high yield processes", International Journal of Quality & Reliability Management, Vol. 10 No. 7, 1993, pp. 24-31.

12. Xie, M. and Goh, T.N., "Some procedures for decision making in controlling high yield processes", Quality and Reliability Engineering International, Vol. 8, 1992, pp. 355-60.

13. Nelson, L.S., "The Shewhart control chart - tests for special causes", Journal of Quality Technology, Vol. 16 No. 4, 1984, pp. 237-39.

14. Goh, T.N., "A charting technique for low-defective production", International Journal of Quality & Reliability Management, Vol. 4 No. 1, 1987, pp. 18-22.

15. Greenwood, M. and Yule, G.U., "An enquiry into the nature of frequency distribution of multiple happenings", Journal of the Royal Statistical Society, Series A, Vol. 83, 1920, pp. 255-79.

16. Johnson, N.L. and Kotz, S., Discrete Distributions, Wiley, New York, NY, 1969.
COPYRIGHT 1997 Emerald Group Publishing, Ltd.