A non-parametric method for experimental analysis with censored data
1. Introduction
Engineers usually determine feasible factor/level settings by performing experiments to enhance product quality. Occasionally, however, only part of an experiment can be completed owing to uncontrollable causes such as damage to the instrument, power failure during the experiment, or time and cost limitations. Under such circumstances, the experimental results consist of "complete" and "incomplete" data; the incomplete data are referred to as censored data. Such data often arise when the response variable is time to failure, as in accelerated life testing. Censored data contain less information than complete data and make the analysis more difficult.
Conventional approaches for analysing censored data are computationally complicated and often difficult to explain to practitioners. Alternative methods, such as maximum likelihood estimation (MLE) and Taguchi's minute accumulating analysis (MAA), also have limitations, as discussed in the next section. In this work, an effective procedure is developed for analysing censored data from an experiment.
Non-parametric methods, which do not require prior knowledge of how the variables are distributed, are more easily understood and implemented than other approaches. In this work, a non-parametric method is combined with regression analysis to analyse multi-factor, multi-level experimental results involving censored data. The proposed procedure not only considers the effects of the control factors on response variability but can also analyse censored data from both replicated and unreplicated experiments.
A number of different types of censored data are described by Nelson and Hahn. This work considers only singly censored data, in which the values of the observations in one tail of the distribution are not known. Moreover, only first-order interaction effects are considered, since second- and higher-order interaction effects can generally be ignored in industrial practice. Related work is reviewed in section 2, and the proposed procedure is described in section 3. In section 4, the procedure is used to analyse a simple example with right-censored data. Concluding remarks are given in section 5.
2. Related work
Nelson and Hahn [1,2] applied linear estimation methods to regression analysis using the ordered observations of censored data. Hahn and Nelson [3] reviewed graphical, maximum likelihood and linear estimation methods for analysing censored life data to estimate relationships between stress and product life, and compared their advantages and disadvantages to guide the choice of method. Graphical methods are subjective, so different analysts may obtain different results from the same data. The maximum likelihood method is probably the most general approach and is widely used by statisticians; however, practitioners with limited statistical training find it difficult to understand. Moreover, the MLE may not exist, and the computational cost may be high because many candidate models must be fitted. Linear estimation methods are computationally simpler than MLE methods, but they apply only to Type II censoring. Krall, Uthoff and Harley [4] proposed a forward-selection procedure that uses MLE to select the variables most strongly associated with survival. Their method requires less computation than the original MLE approach, but it still demands considerable computational effort.
Schmee and Hahn [5] suggested iterative least squares (ILS) as a simple method for analysing censored data. An initial least squares fit is obtained in which the censored values are treated as failures. The initial fit is then used to estimate the expected failure time of each censored observation. These estimates are used to obtain a revised least squares fit, from which new expected failure times are estimated for the censored values, and the procedure is iterated until convergence is achieved. However, ILS treats the censored observations, once estimated, as if they were uncensored, which leads to biased estimates of the regression line. Hahn, Morgan and Schmee [6] applied ILS to analyse the results of a fractional factorial experiment involving censoring to the left. A major limitation is that the final model depends strongly on the initially selected model. Moreover, the approach considers neither interactions nor the effects of the factors on response variability.
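The ILS loop described above can be sketched as follows. This is a minimal illustration in the spirit of Schmee and Hahn, not their published algorithm verbatim: it assumes right censoring, normally distributed errors, and a residual standard deviation re-estimated from each pseudo-complete fit. The function name `iterative_least_squares` and the toy data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def iterative_least_squares(X, y, censored, max_iter=200, tol=1e-8):
    """Sketch of ILS for right-censored responses (hypothetical helper).

    X        : (n, p) design matrix including an intercept column
    y        : responses; censored entries hold their censoring times
    censored : boolean mask, True where the response is right-censored
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    n, p = X.shape
    y_work = y.copy()  # the first fit treats censoring times as failure times

    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(X, y_work, rcond=None)
        mu = X @ beta
        resid = y_work - mu
        sigma = np.sqrt(resid @ resid / (n - p))  # residual standard deviation
        # Expected failure time given survival past the censoring time c:
        # E[Y | Y > c] = mu + sigma * pdf(z) / (1 - cdf(z)), z = (c - mu) / sigma
        z = (y[censored] - mu[censored]) / sigma
        ratio = np.array([_STD_NORMAL.pdf(v) / _STD_NORMAL.cdf(-v) for v in z])
        y_new = y_work.copy()
        y_new[censored] = mu[censored] + sigma * ratio
        shift = np.max(np.abs(y_new - y_work))
        y_work = y_new
        if shift < tol:
            break

    beta, *_ = np.linalg.lstsq(X, y_work, rcond=None)  # final fit on imputed data
    return beta, y_work


# Hypothetical data: the last two units were still running when stopped at 9.0
X = np.column_stack([np.ones(6), np.arange(1.0, 7.0)])
y = [2.0, 4.1, 5.9, 8.2, 9.0, 9.0]
censored = [False, False, False, False, True, True]
beta, y_filled = iterative_least_squares(X, y, censored)
```

Each imputed value is a conditional mean above its censoring time, so it always exceeds that time; the loop then refits as if the imputed values had been observed, which is exactly the behaviour criticized in the text.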
Taguchi [7] developed the minute accumulating analysis (MAA) method for interval-censored data. The data are coded as 0 and 1: in each inspection cycle, a test piece that is still alive is coded 1 and a piece that has died is coded 0. The resulting binary data can be treated as if they came from a split-plot experiment, in which the main-plot factors are the control factors studied in the experiment and the sub-plot factor is the time factor created by the binary coding. However, Taguchi treated censoring times as actual failure times, which may lead to serious deficiencies because the unobserved failure times may differ greatly from the censoring times.
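The 0/1 coding used by MAA can be illustrated with a short sketch. It assumes that the cycle in which a piece is first found dead is coded 0 and that a piece still alive when the test ends is coded 1 in every cycle; the helper `maa_binary` and the example pieces are hypothetical.

```python
def maa_binary(failure_cycle, n_cycles):
    """Code one test piece for minute accumulating analysis (sketch).

    failure_cycle : 1-based inspection cycle in which the piece was first
                    found dead, or None if it survived every cycle
    n_cycles      : number of inspection cycles in the test
    Returns 1 (alive) or 0 (dead) for each cycle.
    """
    if failure_cycle is None:
        return [1] * n_cycles
    return [1 if cycle < failure_cycle else 0 for cycle in range(1, n_cycles + 1)]


# Each piece becomes n_cycles binary responses; the cycle index then acts
# as the sub-plot (time) factor of a split-plot layout.
pieces = {"piece 1": 3, "piece 2": None, "piece 3": 1}
rows = [
    (name, cycle, alive)
    for name, fc in pieces.items()
    for cycle, alive in enumerate(maa_binary(fc, 4), start=1)
]
```

A piece coded 1 in every cycle, such as "piece 2" above, is where the censoring problem shows: the coding records nothing about how long it would have survived beyond the last cycle.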
Hamada and Wu [8] proposed an iterative procedure for analysing censored data from highly fractionated experiments. The data are first transformed to achieve near normality. Standard methods are then used to select a tentative model based on the complete data combined with imputed values for the censored data. Next, the current model is fitted and the censored data are imputed again. This cycle continues until the selected model stops changing. Several models may be identified, and diagnostic checking can be performed to assess their adequacy. Finally, the optimal factor/level combination can be determined. A major limitation of this method is that it relies on the existence of the MLEs. Another drawback is that it does not consider the effects of the control factors on response variability.
Torres [9] presented a method based on the rank transformation of the responses for analysing unreplicated factorial experiments with possible abnormalities. The observations are replaced by their ranks, which are then analysed as if they were the original observations, and a normal plot of the effects estimated from the ranks is used to determine the significant factors. The procedure is easy to carry out with a general statistical package. However, it can be used only for unreplicated experiments with complete data.
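The rank-transform idea can be sketched for a two-level design. This assumes factors coded -1/+1 and effects defined as the mean rank at +1 minus the mean rank at -1; the helpers and the unreplicated 2^3 data are hypothetical, and the resulting effects would then be examined on a normal plot.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks in increasing order; tied values share their mean rank."""
    s = sorted(values)
    # first position of v is s.index(v) + 1; last is len(s) - (index of v in reversed s)
    return [(s.index(v) + 1 + len(s) - s[::-1].index(v)) / 2 for v in values]


def rank_effects(columns, y):
    """Effect of each -1/+1 column on the ranks: mean(+1 runs) - mean(-1 runs)."""
    r = average_ranks(y)
    return {
        name: fmean(ri for ri, s in zip(r, col) if s > 0)
        - fmean(ri for ri, s in zip(r, col) if s < 0)
        for name, col in columns.items()
    }


# Hypothetical unreplicated 2^3 factorial in standard order
columns = {
    "A": [-1, 1, -1, 1, -1, 1, -1, 1],
    "B": [-1, -1, 1, 1, -1, -1, 1, 1],
    "C": [-1, -1, -1, -1, 1, 1, 1, 1],
}
y = [12, 30, 14, 33, 11, 29, 15, 35]
effects = rank_effects(columns, y)
```

Here the ranks are [2, 6, 3, 7, 1, 5, 4, 8], giving rank effects of 4.0 for A, 2.0 for B and 0.0 for C.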
3. Proposed procedure
In this work, a non-parametric technique is used to reduce the complexity of the analysis. Replacing observations by their ranks is a standard approach in non-parametric statistics, and using ranks to analyse experimental designs has several advantages. First, ranks are easier to obtain for censored data than expected responses are. Second, ranks simplify the data analysis procedure. Third, the distribution of the ranks does not depend on the distribution of the original data, whatever that distribution is. Moreover, rank-based analysis is robust when little is known about the data distribution. Accordingly, a method based on the rank transformation of the responses is adopted in this work. For replicated experiments, the factors with significant effects on the response average and on the standard deviation are identified; for unreplicated experiments, only the factors with significant effects on the response average are identified. The proposed procedure for analysing singly censored data in a replicated experiment is as follows:
Step 1: Separate the experimental results into uncensored (complete) data and censored (incomplete) data, denoted by $Y_U$ and $Y_C$, respectively.
Let N be the sample size, i.e. the total number of responses. If the censoring point is C and there are n uncensored observations, then $Y_U = [y_1, y_2, \ldots, y_n]^T$ and $Y_C = [y_{n+1}, y_{n+2}, \ldots, y_N]^T$, where $y_i$ denotes the value of the ith observation. The uncensored data are ranked in increasing order of magnitude, giving $R_U = [r_1, r_2, \ldots, r_n]^T$, where $r_i$ is the rank of $y_i$. Tied observations are each assigned the average of the ranks they occupy.
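The ranking rule of Step 1 can be sketched as a small function; the name `average_ranks` is hypothetical. Applied to the uncensored responses of the section 4 example, it reproduces the $R_U$ vector given there.

```python
def average_ranks(values):
    """Rank values in increasing order (1 = smallest); tied values share
    the average of the ranks they occupy, as in Step 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Uncensored responses Y_U of the section 4 example (Table I)
y_u = [66, 66, 63, 63, 65, 37, 42, 38, 39, 57, 48]
r_u = average_ranks(y_u)
```

`r_u` is `[10.5, 10.5, 7.5, 7.5, 9.0, 1.0, 4.0, 2.0, 3.0, 6.0, 5.0]`. With SciPy available, `scipy.stats.rankdata(y_u)` gives the same ranks, since its default tie method is `"average"`.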
Step 2: Find the relationship between $Y_U$ and $Z_U$ by regression analysis:
[Mathematical Expression Omitted] (1)
where $Z_U$ is the matrix of factor levels for the uncensored data and [Mathematical Expression Omitted] is the matrix of regression coefficients.
Step 3: Find the estimate of $Y_C$, [Mathematical Expression Omitted].
[Mathematical Expression Omitted] is obtained by substituting $Z_C$ for $Z_U$ in equation (1), where $Z_C$ is the matrix of factor levels for the censored data.
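Steps 2 and 3 amount to an ordinary least-squares fit followed by prediction. Because the terms of equation (1) are not shown, the sketch below assumes a main-effects model with factor levels coded 1 and 2; the helper name and the two-factor data are hypothetical.

```python
import numpy as np


def fit_and_predict(z_u, y_u, z_c):
    """Least-squares fit of y_u on z_u (Step 2), then predictions at the
    factor levels z_c of the censored observations (Step 3)."""
    z_u = np.asarray(z_u, dtype=float)
    y_u = np.asarray(y_u, dtype=float)
    beta, *_ = np.linalg.lstsq(z_u, y_u, rcond=None)
    return beta, np.asarray(z_c, dtype=float) @ beta


# Hypothetical two-factor example; columns are [1, A, B] with levels 1 and 2.
# Both responses at (A2, B2) were censored, so that trial is predicted
# from the trials that have uncensored observations.
z_u = [[1, 1, 1], [1, 1, 1], [1, 1, 2], [1, 2, 1], [1, 2, 1]]
y_u = [9, 11, 14, 19, 21]
z_c = [[1, 2, 2]]
beta, y_c_hat = fit_and_predict(z_u, y_u, z_c)
```

The fit gives coefficients -4, 10 and 4 and a prediction of 24 for the censored trial. A trial with no uncensored observation can only be predicted if the chosen model is estimable from the remaining trials, which limits how many terms equation (1) can contain.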
Step 4: Rank the estimated censored data [Mathematical Expression Omitted].
The estimated censored data are ranked in increasing order of magnitude and denoted by [Mathematical Expression Omitted], where $r_i$ is the rank of the ith value in [Mathematical Expression Omitted]. With n uncensored observations in an experiment of size N, the ranks $r_i$ lie between n + 1 and N if the data are right-censored, since every right-censored value exceeds every uncensored one, and between 1 and N - n if the data are left-censored.
To reduce the influence of estimation error, the ranks of the estimated censored data, rather than the estimates themselves, are used in the subsequent analysis. Moreover, equation (1) gives the same estimated response for all censored observations at the same factor-level combination, whereas identical responses rarely occur in a real experiment. Therefore, in this work equal estimated responses are assigned different, adjacent ranks in order to reduce the variance of the analysis.
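Step 4 can be sketched as below. The paper does not say how adjacent ranks are ordered among equal estimates, so the sketch assumes order of appearance (Python's sort is stable); the helper name is hypothetical. The usage line takes the estimated censored responses of the section 4 example in Table I order.

```python
def rank_censored_estimates(estimates, n_uncensored, n_total, right_censored=True):
    """Step 4: give each estimated censored value a distinct rank.

    Right-censored data take ranks n+1 .. N; left-censored data take 1 .. N-n.
    Equal estimates receive adjacent ranks in order of appearance instead of
    a shared average rank.
    """
    m = n_total - n_uncensored
    if len(estimates) != m:
        raise ValueError("expected one estimate per censored observation")
    first = n_uncensored + 1 if right_censored else 1
    order = sorted(range(m), key=lambda i: estimates[i])  # stable sort
    ranks = [0] * m
    for offset, i in enumerate(order):
        ranks[i] = first + offset
    return ranks


# Section 4 example: trial 2 (63), trial 3 (62, 62) and trial 5 (54.5, 54.5)
ranks_c = rank_censored_estimates([63, 62, 62, 54.5, 54.5], n_uncensored=11, n_total=16)
```

This returns `[16, 14, 15, 12, 13]`, consistent with the rank averages that Table II lists for trials 2, 3 and 5 (11.75, 14.5 and 12.5).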
Step 5: Compute the average and standard deviation of the ranks in each trial, and find their regression models.
Let [Mathematical Expression Omitted]. We first use R to compute $R_j$ and $S_j$, the average and the standard deviation, respectively, of the ranks in the jth trial. The regression models of $R_j$ on Z and of $S_j$ on Z are then estimated as follows:
[Mathematical Expression Omitted] (2)
[Mathematical Expression Omitted] (3)
where [Mathematical Expression Omitted] and [Mathematical Expression Omitted] are the matrices of the regression coefficients.
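The per-trial summaries of Step 5 can be sketched as follows. The helper name is hypothetical, and using the population standard deviation (divisor equal to the number of replicates) is an assumption, since the divisor is not stated; with that choice the summaries of trials 3 and 4 match the corresponding $R_j$ and $S_j$ entries in Table II.

```python
from statistics import fmean, pstdev


def trial_summaries(ranks_by_trial):
    """Step 5: (R_j, S_j) for each trial, i.e. the average and the
    population standard deviation of that trial's ranks."""
    return [(fmean(r), pstdev(r)) for r in ranks_by_trial]


# Ranks of trials 3 and 4 in the section 4 example (from Steps 1 and 4)
summaries = trial_summaries([[14, 15], [7.5, 9]])
```

Here `summaries` is `[(14.5, 0.5), (8.25, 0.75)]`; replacing `pstdev` with `stdev` would switch to the n - 1 divisor.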
Step 6: Identify the factors having significant effects on the response average and standard deviation.
Normal probability plots of [Mathematical Expression Omitted] and [Mathematical Expression Omitted] can be used to assess the significance of the effects: coefficients that fall well away from the straight line formed by the bulk of the points are judged significant.
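The coordinates of such a normal plot can be computed as in the sketch below. It assumes the common plotting position (i - 0.5)/m for the ith smallest of m effects; the exact plotting positions used for Table III are not given. The helper name and the effect values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Pair each estimated effect with a standard normal quantile.

    Effects are sorted in increasing order; the i-th of m receives the
    quantile at plotting position (i - 0.5) / m.
    """
    items = sorted(effects.items(), key=lambda kv: kv[1])
    m = len(items)
    std_normal = NormalDist()
    return [
        (name, effect, std_normal.inv_cdf((i - 0.5) / m))
        for i, (name, effect) in enumerate(items, start=1)
    ]


points = normal_plot_points({"A": 4.0, "B": 2.0, "C": 0.0})  # hypothetical effects
```

Plotting each effect against its quantile gives the normal plot; with only a handful of effects, the judgement of which points leave the line remains partly subjective.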
Step 7: Determine the optimal factor/level combination.
Select the optimal levels of the factors having significant effects on the response average and standard deviation. If the levels chosen for the location and dispersion effects agree, they constitute the optimal factor/level settings. Otherwise, engineering experience or the normal probabilities of [Mathematical Expression Omitted] and [Mathematical Expression Omitted] can be used to assist the judgement. If these aids still do not resolve the conflict, the experimenter may have to conduct another experiment for further analysis.
The above procedure is designed for replicated experiments. For an unreplicated experiment with censored data, the procedure is the same except that only the response average is optimized, since a single observation per trial provides no standard deviation.
4. Numerical example
The data consist of 16 observations, two for each of the eight trials of an $L_8$ orthogonal array to which five two-level factors are assigned, as shown in Table I. Assume that the quality characteristic is smaller-the-better and that five of the 16 responses could not be observed. The data are right-censored at 67; that is, each of these five responses is known only to exceed 67.
Table I. Data for example 1
Trial   A  B  C  D  E   Responses       $R_U$
1       1  1  1  1  1   66      66      10.5   10.5
2       1  1  2  2  2   *(68)   63      *      7.5
3       1  2  1  2  2   *(80)   *(88)   *      *
4       1  2  2  1  1   63      65      7.5    9
5       2  1  1  1  2   *(73)   *(71)   *      *
6       2  1  2  2  1   37      42      1      4
7       2  2  1  2  1   38      39      2      3
8       2  2  2  1  2   57      48      6      5
Note: * censored data; the entries in parentheses are the values of the censored data.
Step 1: From Table I, we have
$Y_U = [66, 66, 63, 63, 65, 37, 42, 38, 39, 57, 48]^T$
and $R_U = [10.5, 10.5, 7.5, 7.5, 9, 1, 4, 2, 3, 6, 5]^T$
Step 2: Since
[Mathematical Expression Omitted]
[Mathematical Expression Omitted]. (4)
Step 3: The estimates of the censored data are obtained by substituting the factor levels of the censored observations into equation (4). This gives estimated responses of 63, 62 and 54.5 for trials 2, 3 and 5, respectively.
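Step 4: Since the data are right-censored and there are five censored observations (n = 11, N = 16), the ranks of the estimated censored data lie between 12 and 16. Ranking the estimates in increasing order, with adjacent ranks for equal estimates, gives ranks 12 and 13 to trial 5 (54.5), 14 and 15 to trial 3 (62), and 16 to trial 2 (63). Therefore, [Mathematical Expression Omitted].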
Step 5: Using [Mathematical Expression Omitted], $R_j$ and $S_j$ (j = 1, 2, ..., 8) are computed and summarized in Table II. The regression models of $R_j$ on Z and of $S_j$ on Z are estimated as follows:
[Mathematical Expression Omitted]. (5)
$\mu_{S|Z} = -3.0185 + 0.7315A - 1.574B + 3.074C + 1.0185D + 0.5185E + 0.537AB - 1.287AC$ (6)
Table II. The estimated responses and standard deviations for the ranked observations
Trial    1      2       3      4      5      6     7     8
$R_j$    10.5   11.75   14.5   8.25   12.5   3     2.5   5.5
$S_j$    0      3.324   0.5    0.75   0.5    1.5   0.5   0.5
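The coefficients of equation (6) can be checked numerically from the $S_j$ values in Table II. The sketch assumes that the factors enter at their coded levels 1 and 2 and that the AB and AC terms are products of those coded levels; the coding is not stated explicitly. With eight trials and eight terms the fit is saturated, so the coefficients follow from solving a square linear system.

```python
import numpy as np

# Factor levels (1 or 2) of the L8 trials in Table I; columns A, B, C, D, E
levels = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 2, 2, 2],
    [1, 2, 1, 2, 2],
    [1, 2, 2, 1, 1],
    [2, 1, 1, 1, 2],
    [2, 1, 2, 2, 1],
    [2, 2, 1, 2, 1],
    [2, 2, 2, 1, 2],
], dtype=float)
s_j = np.array([0, 3.324, 0.5, 0.75, 0.5, 1.5, 0.5, 0.5])  # S_j row of Table II

A, B, C, D, E = levels.T
# Columns: intercept, five main effects, and the AB and AC interaction products
Z = np.column_stack([np.ones(len(s_j)), A, B, C, D, E, A * B, A * C])
coef = np.linalg.solve(Z, s_j)  # square, full-rank system: exact (saturated) fit

for name, value in zip(["const", "A", "B", "C", "D", "E", "AB", "AC"], coef):
    print(f"{name:>5}: {value:8.4f}")
```

Under this coding the solved coefficients are -3.0185, 0.7315, -1.574, 3.074, 1.0185, 0.5185, 0.537 and -1.287, in the order of the terms of equation (6).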
Step 6: The normal probabilities of the regression coefficients in equations (5) and (6) are listed in Table III, and the corresponding normal plots are shown in Figures 1 and 2, respectively. From these figures, we conclude that factors AB, A, B and E have significant effects on the response average, and factors AC, B and C have significant effects on the standard deviation.
[TABULAR DATA FOR TABLE III OMITTED]
Step 7: From equation (5), factor E should be set at level 1, since the quality characteristic is smaller-the-better. Table IV compares the differences in [Mathematical Expression Omitted] for different level settings. Based on this table, factors A and B should be set at $A_2$ and $B_2$ so that [Mathematical Expression Omitted] is minimized. The tentative optimal factor levels for the location effect are therefore $A_2$, $B_2$ and $E_1$. Similarly, Table V compares the differences in [Mathematical Expression Omitted] for different level settings. From equation (6) and Table V, the tentative optimal factor levels for the dispersion effect are $A_2$, $B_2$ and $C_1$. Since the two sets of levels agree, the optimal factor/level settings are $A_2$, $B_2$, $C_1$ and $E_1$; factor D, which is significant for neither the average nor the standard deviation, is not constrained by this analysis.
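One way to make the dispersion comparison is to enumerate every level combination of the fitted terms and take the minimum. The sketch keeps only the terms found significant in Step 6 (B, C and AC, with A entering through AC) and drops the rest; that reduction is an assumption for illustration, since the paper's own comparison is in Table V. The helper name is hypothetical.

```python
from itertools import product
from math import prod


def best_levels(terms, factors, levels=(1, 2)):
    """Return the (setting, value) pair that minimises a fitted polynomial.

    terms   : list of (coefficient, factor names); a term's value is the
              coefficient times the product of its factors' coded levels
    factors : factor names to enumerate over
    """
    best = None
    for combo in product(levels, repeat=len(factors)):
        setting = dict(zip(factors, combo))
        value = sum(c * prod(setting[f] for f in names) for c, names in terms)
        if best is None or value < best[1]:
            best = (setting, value)
    return best


# Significant dispersion terms of equation (6); the intercept does not affect the argmin
dispersion_terms = [(-1.574, ("B",)), (3.074, ("C",)), (-1.287, ("A", "C"))]
setting, value = best_levels(dispersion_terms, ["A", "B", "C"])
```

The minimum is at A = 2, B = 2, C = 1 (value -2.648), which agrees with the dispersion settings $A_2$, $B_2$ and $C_1$ stated above.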
If the 16 original observations in Table I, including the true values of the five censored responses, are analysed by ANOVA, the optimal settings are $A_2$, $B_2$, $C_2$ and $E_1$. The two analyses agree on the levels of A, B and E and differ only in the level of C. This agreement supports the validity of the proposed procedure.
[TABULAR DATA FOR TABLE IV OMITTED]
[TABULAR DATA FOR TABLE V OMITTED]
5. Concluding remarks
An effective procedure based on the rank transformation of the responses and regression analysis is proposed in this work for analysing an experiment with singly censored data. The proposed procedure is simpler than conventional methods such as maximum likelihood estimation. To ensure the effectiveness of the procedure, we suggest that at least two-thirds of the data in an experiment be complete, so that stable (or mature) data are available for analysis. In addition, if the $R^2$ (the coefficient of determination) of the fitted regression model is too low, we recommend that the design factors be reconsidered; failing that, the proposed procedure should be abandoned in favour of alternative methods. The proposed procedure, although lacking a rigorous theoretical justification, can be easily implemented in an industrial setting. Finally, a numerical example has demonstrated the effectiveness of the proposed procedure.
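The two checks recommended above can be computed directly. The sketch only evaluates the share of complete data and the $R^2$ of a fit; no $R^2$ cut-off is given in the text, so none is assumed. The helper names are hypothetical.

```python
from statistics import fmean


def complete_fraction(n_uncensored, n_total):
    """Share of observations that are complete (uncensored)."""
    return n_uncensored / n_total


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Section 4 example: 11 of the 16 responses are uncensored
print(complete_fraction(11, 16) >= 2 / 3)  # prints True, since 11/16 = 0.6875
```

The example in section 4 therefore meets the suggested two-thirds guideline, though only narrowly.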
1. Nelson, W. and Hahn, G.J., "Linear estimation of a regression relationship from censored data, part I - simple methods and their application", Technometrics, Vol. 14, 1972, pp. 247-69.
2. Nelson, W. and Hahn, G.J., "Linear estimation of a regression relationship from censored data, part II - best linear unbiased estimation and theory", Technometrics, Vol. 15, 1973, pp. 133-50.
3. Hahn, G.J. and Nelson, W., "A comparison of methods for analysing censored life data to estimate relationships between stress and product life", IEEE Transactions on Reliability, Vol. R-23, 1974, pp. 2-11.
4. Krall, J.M., Uthoff, V.A. and Harley, J.B., "A step-up procedure for selecting variables associated with survival", Biometrics, Vol. 31, 1975, pp. 49-57.
5. Schmee, J. and Hahn, G.J., "A simple method for regression analysis with censored data", Technometrics, Vol. 21, 1979, pp. 417-34.
6. Hahn, G.J., Morgan, C.B. and Schmee, J., "The analysis of a fractional factorial experiment with censored data using iterative least squares", Technometrics, Vol. 23, 1981, pp. 33-6.
7. Taguchi, G., System of Experimental Design, UNIPUB/KRAUS International Publications, White Plains, NY, 1987.
8. Hamada, M. and Wu, C.F.J., "Analysis of censored data from highly fractionated experiments", Technometrics, Vol. 33, 1991, pp. 25-38.
9. Torres, V.A., "A simple analysis of unreplicated factorials with possible abnormalities", Journal of Quality Technology, Vol. 25, 1993, pp. 183-7.
10. Taguchi, G., Introduction to Quality Engineering, 5-day Seminar Course Manual, American Supplier Institute, Inc., 1987 (in Chinese).
Author: Tong, Lee-Ing; Su, Chao-Ton
Publication: International Journal of Quality & Reliability Management
Date: Apr 1, 1997