# Weighted Lin-Wang tests for crossing hazards.

1. Introduction

Lin and Wang have recently introduced an ingenious modification of the two-sample logrank statistic, appropriate for crossing hazards alternatives. Through a simulation study, they demonstrated that their modified test had greater power than the commonly used logrank and Wilcoxon tests for detecting differences between crossing survival curves. In this note, we propose weighted versions of the Lin-Wang (LW) test and investigate the performance of these weighted tests in a limited simulation study. Details are given in Section 2, and the simulation results are presented in Section 3. We give an example in Section 4 and concluding remarks in Section 5.

2. Methods

For consistency, we adhere to the notational conventions introduced by Lin and Wang. We have survival data from two groups of subjects, labeled I and II, and are interested in comparing the survival distributions of the two groups. Events (failures or deaths) are observed at $r$ distinct time points $t_1 < \cdots < t_r$ across the pooled groups. At time $t_j$, the number of observed failures is denoted by $d_{1j}$ for Group I and $d_{2j}$ for Group II, and the numbers at risk just before time $t_j$ are denoted by $n_{1j}$ and $n_{2j}$, respectively, for $j = 1, 2, \ldots, r$. Consequently, at time $t_j$, there are $d_j = d_{1j} + d_{2j}$ failures out of $n_j = n_{1j} + n_{2j}$ subjects. Subjects may be censored during or at the end of the period of observation. A representative 2x2 contingency table of group by status at observed failure time $t_j$ is given in Table 1.
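As an illustration of how the quantities in Table 1 might be tabulated from raw data, here is a minimal Python sketch. The function name `risk_tables` and the coding of `events` (1 = observed failure, 0 = censored) are assumptions of this sketch, not part of the paper. Subjects censored exactly at a failure time are counted as still at risk at that time, a common convention that the text does not spell out.

```python
from collections import Counter

def risk_tables(times, events, groups):
    """Return [(t_j, d1j, d2j, n1j, n2j), ...] at each distinct failure time.

    times  : observed times (failure or censoring)
    events : 1 if the time is an observed failure, 0 if censored (assumed coding)
    groups : 1 for Group I, 2 for Group II
    """
    # Distinct failure times across the pooled groups, in increasing order.
    fail_times = sorted({t for t, e in zip(times, events) if e == 1})
    tables = []
    for tj in fail_times:
        d = Counter()  # failures at t_j, by group
        n = Counter()  # numbers at risk just before t_j, by group
        for t, e, g in zip(times, events, groups):
            if t >= tj:
                n[g] += 1
            if e == 1 and t == tj:
                d[g] += 1
        tables.append((tj, d[1], d[2], n[1], n[2]))
    return tables

if __name__ == "__main__":
    # Tiny made-up data set, for illustration only.
    print(risk_tables([2, 3, 3, 5, 4, 6], [1, 1, 0, 1, 1, 0], [1, 1, 1, 1, 2, 2]))
```

The double loop is quadratic in the number of subjects, which is adequate for the sample sizes considered here; a sorted sweep would be preferable for large data sets.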

We are interested in assessing the null hypothesis

$H_0$: the survival distributions of the two groups are identical

versus the global alternative hypothesis

$H_1$: the survival distributions of the two groups are not identical.

Lin and Wang introduced the quadratic statistic

$$\Delta = \sum_{j=1}^{r} \left[d_{1j} - E(d_{1j})\right]^2 \tag{1}$$

for comparison of the two groups; they argued that $\Delta$ reflects the quadratic distance between the two underlying survival distributions and hence should be sensitive to differences in either direction. They therefore based inference relating to $H_0$ on the standardized version of $\Delta$, which they denoted as $T^*$.

Let us define a weighted version of $\Delta$ as

$$\Delta_w = \sum_{j=1}^{r} w_j \left[d_{1j} - E(d_{1j})\right]^2 \tag{2}$$

with arbitrary weights $w_j$, usually nonnegative. Our test statistic for assessing $H_0$ is the standardized version of $\Delta_w$; namely,

$$T_w = \frac{\Delta_w - E(\Delta_w)}{\sqrt{\mathrm{Var}(\Delta_w)}}, \tag{3}$$

where $E(\Delta_w)$ and $\mathrm{Var}(\Delta_w)$ are calculated from the marginal hypergeometric distribution of the $d_{1j}$. In particular, since $E\{[d_{1j} - E(d_{1j})]^2\} = \mathrm{Var}(d_{1j})$,

$$E(\Delta_w) = \sum_{j=1}^{r} w_j \, \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}, \tag{4}$$

and $\mathrm{Var}(\Delta_w)$ is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (5)
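Equation (5) is not recoverable from this copy. As a hedged sketch only, and not necessarily the form the authors displayed: if the $d_{1j}$ are treated as independent across event times (an assumption of this sketch), the variance of the weighted sum in (2) decomposes term by term, with $\mu_j = E(d_{1j})$:

```latex
\mathrm{Var}(\Delta_w)
  = \sum_{j=1}^{r} w_j^{2}\,
    \Bigl\{ E\bigl[(d_{1j}-\mu_j)^{4}\bigr] - \bigl[\mathrm{Var}(d_{1j})\bigr]^{2} \Bigr\},
\qquad
E\bigl[(d_{1j}-\mu_j)^{4}\bigr]
  = E(d_{1j}^{4}) - 4\mu_j E(d_{1j}^{3}) + 6\mu_j^{2} E(d_{1j}^{2}) - 3\mu_j^{4}.
```

Under this reading, evaluating the variance requires the raw moments of $d_{1j}$ up to order four.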

The raw moments of $d_{1j}$ can be readily calculated from the following expression for the factorial moments:

$$E\left(d_{1j}^{(r)}\right) = \frac{n_{1j}^{(r)} d_j^{(r)}}{n_j^{(r)}}, \tag{6}$$

where $n^{(r)} = n(n-1)\cdots(n-r+1)$ (here $r$ denotes the order of the moment, not the number of event times). For reference,

$$E(d_{1j}) = \frac{n_{1j} d_j}{n_j}, \tag{7}$$

$$\mathrm{Var}(d_{1j}) = \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}, \tag{8}$$

$$E(d_{1j}^2) = \mathrm{Var}(d_{1j}) + \left[E(d_{1j})\right]^2, \tag{9}$$

$$E(d_{1j}^3) = 3E(d_{1j}^2) - 2E(d_{1j}) + \frac{n_{1j}^{(3)} d_j^{(3)}}{n_j^{(3)}}, \tag{10}$$

$$E(d_{1j}^4) = 6E(d_{1j}^3) - 11E(d_{1j}^2) + 6E(d_{1j}) + \frac{n_{1j}^{(4)} d_j^{(4)}}{n_j^{(4)}}. \tag{11}$$

We note in passing that there are typographical errors in the expressions for $E(d_{ij}^3)$ and $E(d_{ij}^4)$ in Lin and Wang.
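The moment recursions above can be checked numerically. The following Python sketch computes the first four raw moments from the factorial moments and compares them with a direct enumeration of the hypergeometric distribution. The function names are ours, and the convention of returning 0 for a factorial moment whose denominator vanishes (more factors than subjects) is an assumption of the sketch. The second raw moment is obtained here from the second factorial moment, which is algebraically equivalent to using the variance formula.

```python
from math import comb

def falling(n, r):
    """Falling factorial n^(r) = n (n - 1) ... (n - r + 1)."""
    out = 1
    for k in range(r):
        out *= n - k
    return out

def raw_moments(n1, n2, d):
    """E(d1^k), k = 1..4, built from the factorial moments of d1."""
    n = n1 + n2

    def fact_moment(r):
        denom = falling(n, r)
        # Assumed convention: the moment is 0 when n < r (numerator is 0 too).
        return 0.0 if denom == 0 else falling(n1, r) * falling(d, r) / denom

    m1 = fact_moment(1)
    m2 = fact_moment(2) + m1                       # E(d^2) = E(d^(2)) + E(d)
    m3 = 3 * m2 - 2 * m1 + fact_moment(3)
    m4 = 6 * m3 - 11 * m2 + 6 * m1 + fact_moment(4)
    return m1, m2, m3, m4

def exact_moments(n1, n2, d):
    """Same moments by summing over the hypergeometric probability mass function."""
    n = n1 + n2
    support = range(max(0, d - n2), min(n1, d) + 1)
    total = comb(n, d)
    return tuple(
        sum(x ** k * comb(n1, x) * comb(n2, d - x) for x in support) / total
        for k in (1, 2, 3, 4)
    )

if __name__ == "__main__":
    print(raw_moments(7, 5, 4))
    print(exact_moments(7, 5, 4))
```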

Lin and Wang enumerate the following assumptions: the underlying failure times are independent; the censoring distributions (if any) for Group I and Group II are independent of each other and of the respective survival distributions; and the total number of observed failures and the number of distinct failure times are large. Under these assumptions, and provided that the weights are positive and bounded, $T_w$ approximately follows a standard normal distribution under $H_0$. We are thus specifying the usual random censorship model, with further conditions to ensure approximate normality of $T_w$. For assessing the null hypothesis of equality of the underlying survival distributions of the two groups, Lin and Wang propose a two-sided test based on $T^*$, and we will follow that convention with $T_w$.
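To make the standardization concrete, here is a minimal Python sketch of a weighted statistic of this kind with a two-sided normal P value. It is an illustration under stated assumptions, not the authors' code: the function names are ours, the null moments are obtained by direct enumeration of the hypergeometric distribution rather than from closed-form expressions, and the variance is accumulated term by term as if the per-time tables were independent.

```python
from math import comb, erfc, sqrt

def central_moments(n1, n2, d):
    """Mean, variance and fourth central moment of d1 under the hypergeometric null."""
    n = n1 + n2
    support = range(max(0, d - n2), min(n1, d) + 1)
    pmf = {x: comb(n1, x) * comb(n2, d - x) / comb(n, d) for x in support}
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    mu4 = sum((x - mu) ** 4 * p for x, p in pmf.items())
    return mu, var, mu4

def weighted_lw(tables, weight=lambda n1, n2, d: 1.0):
    """tables: iterable of (d1j, d2j, n1j, n2j). Returns (T_w, two-sided P value).

    `weight` maps (n1j, n2j, dj) to w_j; the default of 1 gives the unweighted case.
    """
    delta = mean = var = 0.0
    for d1, d2, n1, n2 in tables:
        d = d1 + d2
        mu, v, mu4 = central_moments(n1, n2, d)
        w = weight(n1, n2, d)
        delta += w * (d1 - mu) ** 2          # weighted quadratic distance
        mean += w * v                        # null expectation of each term
        var += w * w * (mu4 - v * v)         # assumes independence across tables
    # Note: var is 0 if every table is degenerate; no guard is included here.
    t = (delta - mean) / sqrt(var)
    return t, erfc(abs(t) / sqrt(2.0))       # two-sided standard normal tail

if __name__ == "__main__":
    demo = [(1, 0, 4, 2), (1, 0, 3, 2), (0, 1, 1, 2), (1, 0, 1, 1)]  # made-up tables
    print(weighted_lw(demo))
```

Multiplying all weights by a positive constant leaves the standardized statistic unchanged, so only the relative sizes of the weights matter.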

3. Simulation Studies

In this section, we will investigate the empirical performance of weighted versions of the LW statistic, compared to the original (unweighted) LW statistic.

3.1. Empirical Type I Error. We first investigate achieved significance levels of the LW statistic and three weighted versions. Following LW, we generated two independent random samples from the exponential distribution with mean 4. The censoring distribution is Uniform(0, 20) in each group. The number of iterations in each simulation study is 5000. The empirical Type I error is calculated as the proportion of the 5000 simulated data sets in which we reject the null hypothesis at the $\alpha = 0.05$ significance level, under the assumption that $T^*$ and the weighted versions $T_w$ have standard normal distributions, with two-sided tests. We report on three weighted versions of the LW statistic, delineated by different sets of weights $w_j$, $1 \le j \le r$: (i) $w_j = n_j$; (ii) $w_j = \sqrt{n_j}$; (iii) $w_j = 1/\mathrm{SD}(d_{1j})$, where $\mathrm{SD}(d_{1j}) = \sqrt{\mathrm{Var}(d_{1j})}$. The empirical Type I errors are given in Table 2.
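The null data-generating mechanism and the three weight choices can be sketched in Python as follows. The names `simulate_null_sample`, `hypergeom_var`, and `WEIGHTS` are ours. Returning a weight of 0 when the hypergeometric variance is 0 (a degenerate table) is an assumption of this sketch; the text does not say how such tables are handled for weight (iii).

```python
import random
from math import sqrt

def simulate_null_sample(n, rng, mean=4.0, cens_upper=20.0):
    """One group's (time, event) pairs under the null setup.

    Failure times are exponential with the given mean; censoring times are
    Uniform(0, cens_upper). event = 1 for an observed failure, 0 if censored.
    """
    sample = []
    for _ in range(n):
        t = rng.expovariate(1.0 / mean)   # expovariate takes the rate, 1/mean
        c = rng.uniform(0.0, cens_upper)
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample

def hypergeom_var(n1, n2, d):
    """Hypergeometric variance of d1 given the table margins (0 when n < 2)."""
    n = n1 + n2
    return 0.0 if n < 2 else n1 * n2 * d * (n - d) / (n * n * (n - 1))

def _weight_iii(n1, n2, d):
    v = hypergeom_var(n1, n2, d)
    return 0.0 if v == 0.0 else 1.0 / sqrt(v)   # assumed handling of degenerate tables

# Weight sets (i), (ii), (iii) as functions of (n1j, n2j, dj).
WEIGHTS = {
    "i": lambda n1, n2, d: float(n1 + n2),
    "ii": lambda n1, n2, d: sqrt(n1 + n2),
    "iii": _weight_iii,
}

if __name__ == "__main__":
    rng = random.Random(2014)   # arbitrary seed, for reproducibility of the demo
    print(simulate_null_sample(5, rng))
```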

In this limited simulation study the empirical Type I errors are quite close to the theoretical 0.05 value, for both the LW statistic and the weighted variants. The normal distribution seems an adequate approximation for the sample sizes investigated.
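As a rough yardstick for "quite close" (our gloss, not a calculation from the paper), the Monte Carlo standard error of an estimated rejection rate based on 5000 independent replications, when the true level is 0.05, is

```latex
\mathrm{SE} = \sqrt{\frac{0.05 \times 0.95}{5000}} \approx 0.0031,
\qquad
0.05 \pm 2\,\mathrm{SE} \approx (0.044,\ 0.056).
```

All but one of the 36 entries in Table 2 (the value 0.042) lie within this band.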

3.2. Empirical Power. Following LW, we undertook simulation studies comparing the empirical powers of the unweighted LW statistic with its weighted variants, under the three following scenarios.

Scenario 1. This scenario entails crossing survival curves. The LW specification is as follows. "In Group I the survival times follow an exponential distribution with mean of 6. In Group II the survival times follow an exponential distribution with mean of 2. However, if the survival time in Group II is greater than or equal to 1.5, then the survival time is regenerated to follow an exponential distribution with mean of 40. The censoring distribution is Uniform (0,20) in Group I and Uniform (0,100) in Group II, which result in about 24% censoring rate in Group I and 18% in Group II, respectively."
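One possible reading of Scenario 1 in Python is sketched below. The quoted description does not say whether a regenerated Group II time is a fresh exponential draw or is added to the threshold of 1.5; this sketch takes the literal reading (a fresh draw replaces the original time), which is an assumption, as is the function name `scenario1`.

```python
import random

def scenario1(n1, n2, rng):
    """Return (time, event, group) triples; event = 1 for failure, 0 if censored."""
    data = []
    for _ in range(n1):
        t = rng.expovariate(1.0 / 6.0)        # Group I: exponential, mean 6
        c = rng.uniform(0.0, 20.0)            # Group I censoring: Uniform(0, 20)
        data.append((min(t, c), 1 if t <= c else 0, 1))
    for _ in range(n2):
        t = rng.expovariate(1.0 / 2.0)        # Group II: exponential, mean 2
        if t >= 1.5:
            # Literal reading (assumption): replace by a fresh mean-40 draw.
            t = rng.expovariate(1.0 / 40.0)
        c = rng.uniform(0.0, 100.0)           # Group II censoring: Uniform(0, 100)
        data.append((min(t, c), 1 if t <= c else 0, 2))
    return data

if __name__ == "__main__":
    rng = random.Random(1)                    # arbitrary seed for the demo
    print(scenario1(3, 3, rng))
```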

Scenario 2. In this situation, the two survival curves are initially close, then cross, and diverge. The LW description is as follows. "In Group I the survival times follow an exponential distribution with mean of 4. In Group II the survival times follow an exponential distribution with mean of 3. However, if the survival time in Group II is greater than or equal to 4, then the survival time is regenerated to follow an exponential distribution with mean of 20. Also, censoring is assumed to occur randomly across the two groups. For each subject in the two groups, an independent Uniform (0,1) random variable U is generated. In Group I, if U is less than 0.2, then the corresponding time point will be flagged as censored. Otherwise it is not censored. The censoring in Group II is created similarly but with a different rate. The censoring rate is 20% in Group I and 30% in Group II, respectively."
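The censoring mechanism in Scenario 2 differs from Scenario 1: the generated time is kept and merely flagged as censored with a fixed probability. A minimal Python sketch under that reading follows; the function name `scenario2` is ours, and the regenerated Group II time is again taken literally as a fresh exponential draw, which is an assumption.

```python
import random

def scenario2(n1, n2, rng, cens1=0.2, cens2=0.3):
    """Return (time, event, group) triples; event = 1 for failure, 0 if censored."""
    data = []
    for _ in range(n1):
        t = rng.expovariate(1.0 / 4.0)            # Group I: exponential, mean 4
        censored = rng.random() < cens1           # U < 0.2 flags the time as censored
        data.append((t, 0 if censored else 1, 1))
    for _ in range(n2):
        t = rng.expovariate(1.0 / 3.0)            # Group II: exponential, mean 3
        if t >= 4.0:
            # Literal reading (assumption): replace by a fresh mean-20 draw.
            t = rng.expovariate(1.0 / 20.0)
        censored = rng.random() < cens2           # Group II rate taken as 30%
        data.append((t, 0 if censored else 1, 2))
    return data

if __name__ == "__main__":
    rng = random.Random(1)                        # arbitrary seed for the demo
    sample = scenario2(2000, 2000, rng)
    for g in (1, 2):
        flags = [e for _, e, grp in sample if grp == g]
        print(g, 1 - sum(flags) / len(flags))     # observed censoring fraction
```

Scenario 3 would use the same flagging mechanism with different means and censoring rates.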

Scenario 3. Here, the proportional hazards assumption obtains. The LW specification is as follows. "The survival times follow an exponential distribution with means 2 and 5 in Groups I and II, respectively. The censoring mechanism is similar to that in Situation (Scenario 2), but this time with 20% censoring rate in Group I and 15% censoring rate in Group II, respectively."

The number of iterations in each simulation study is 5000. The estimated statistical power is calculated as the proportion of the 5000 simulated data sets in which we reject the null hypothesis at the nominal $\alpha = 0.05$ significance level, with two-sided tests. The weighted versions of the LW statistic are as above, namely, (i) $w_j = n_j$; (ii) $w_j = \sqrt{n_j}$; (iii) $w_j = 1/\mathrm{SD}(d_{1j})$, where $\mathrm{SD}(d_{1j}) = \sqrt{\mathrm{Var}(d_{1j})}$. Findings for the three scenarios are given in Tables 3, 4, and 5, respectively.

Interestingly, none of the procedures is dominant under every scenario. We might tend to favor the LW statistic under Scenario 1, the weighted version $\mathrm{LW}_{m3}$ under Scenario 2, and the weighted versions $\mathrm{LW}_{m1}$ and $\mathrm{LW}_{m2}$ under Scenario 3.

4. An Example

We will apply the various procedures to data arising from a cancer chemotherapy experiment, as explained in Koziol and in Koziol and Yuh. Briefly, sixty leukemic mice were randomly subdivided into two groups of equal size; one group (Group A) was treated with a new investigative chemotherapeutic agent, and the other group (Group B) served as controls. Survival times of the two cohorts are given in Table 6, and Kaplan-Meier survival curves for the groups are depicted in Figure 1.

Clearly, we are in a crossing hazards setting, and the logrank test and the generalized Wilcoxon test are not necessarily sensitive to this type of alternative. Indeed, with these data, the logrank chi-square statistic (with 1 d.f.) is 1.36 (P = 0.24), and the generalized Wilcoxon chi-square statistic is 1.12 (P = 0.27); we would fail to reject the hypothesis of equality of survival distributions for the two cohorts with either of these tests.
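For readers who wish to see how a logrank chi-square statistic and its P value arise from the per-time 2x2 tables, a minimal Python sketch follows. It uses the standard Mantel construction (sum of observed minus expected failures in Group I, squared, divided by the summed hypergeometric variances); the function names are ours, and the demonstration tables are made up rather than taken from Table 6.

```python
from math import erfc, sqrt

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    # For 1 d.f., P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2.0))

def logrank_chi2(tables):
    """tables: iterable of (d1j, d2j, n1j, n2j) at the distinct failure times."""
    num = 0.0   # sum of observed minus expected failures in Group I
    var = 0.0   # sum of hypergeometric variances
    for d1, d2, n1, n2 in tables:
        n, d = n1 + n2, d1 + d2
        num += d1 - n1 * d / n
        if n > 1:                      # a table with one subject has zero variance
            var += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    return num * num / var

if __name__ == "__main__":
    demo = [(1, 0, 4, 2), (1, 0, 3, 2), (0, 1, 1, 2), (1, 0, 1, 1)]  # made-up tables
    x = logrank_chi2(demo)
    print(x, chi2_1df_pvalue(x))
    print(round(chi2_1df_pvalue(1.36), 2))   # P value for a statistic of 1.36
```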

On the other hand, the LW statistic and its weighted variants all point to significantly different survival experiences in the two cohorts, with P values of $10^{-6}$ or smaller. In comparison, the omnibus Kolmogorov-Smirnov, Kuiper, and Cramer-von Mises statistics introduced by Koziol and Yuh were also indicative of significantly different survival distributions but with more modest P values of $10^{-3}$.

5. Concluding Remarks

The logrank test as described in Section 2 should be ascribed to Mantel: Mantel brilliantly intuited that the Mantel-Haenszel (MH) statistic for assessing association across independent 2x2 tables could be applied to survival data, by constructing a 2x2 table as in Table 1 at each event (death) time and then combining the resulting 2x2 tables as in the MH procedure.

Correspondingly, our incorporation of weights into the LW statistic as described in Section 2 is not new: our motivation derives from the similar introduction of weights into the Mantel formulation of the logrank statistic by Tarone and Ware and Leurgans, among others. And, anticipating the findings in Section 3, these investigators have shown that the weighted statistics can enjoy improved power properties over the unweighted MH statistic in various settings. We remark that calculation of the LW statistic is rather computationally intensive, but incorporation of weights should cause no additional computational difficulties. Optimal choice of weights remains an open issue, which we are currently pursuing.

The generalized Wilcoxon test and the logrank test are perhaps the best known and most commonly used procedures for the comparison of two survival distributions with observations subject to random censorship. Mantel and others recognized, however, that these tests may not be appropriate whenever the alternative of interest is not that one survival distribution is stochastically larger than the other but merely that the distributions are not equal. Crossing hazards are an example of nonstochastic ordering of survival distributions. For testing equality against such alternatives, Koziol proposed a two-sample Cramer-von Mises type statistic based on the product-limit estimates of the individual survival distributions, and later Koziol and Yuh introduced Kolmogorov-Smirnov and Kuiper as well as Cramer-von Mises statistics for the same omnibus two-sample testing problem. The LW statistic is more closely attuned to the logrank test than these omnibus procedures; and, as seen in the example, the LW statistics may be more sensitive to crossing hazards alternatives.

It should be noted that Mantel also proposed a modification of the Mantel logrank test, appropriate for crossing hazards: Mantel suggested that one construct a "chi-squared" statistic at each event time as in Table 1, sum these individual statistics over the event times, and then treat the resulting sum as an approximate chi-square random variable with n degrees of freedom, n being the number of tables (distinct event times). We explored this statistic in simulation studies, but regrettably we cannot recommend it, owing to its decreased power relative to the other statistics reported herein and the tenuous assumption that a chi-square distribution for this statistic is adequate (though with larger sample sizes, a normal approximation might be invoked).

http://dx.doi.org/10.1155/2014/643457

Conflict of Interests

The authors declare that they have no competing interests.

Authors' Contributions

James A. Koziol conceived of the study. Zhenyu Jia and James A. Koziol carried out the data analysis and drafted the paper. Both authors read and approved the final paper.

Acknowledgments

This work was supported by the National Cancer Institute Early Detection Research Network (EDRN) Consortium Grant no. U01 CA152738. Many years ago, James A. Koziol was mentored by Nathan Mantel at the National Cancer Institute and will be forever indebted to him.

References

 X. Lin and H. Wang, "A new testing approach for comparing the overall homogeneity of survival curves," Biometrical Journal, vol. 46, no. 5, pp. 489-496, 2004.

 J. A. Koziol, "A two sample Cramer-von Mises test for randomly censored data," Biometrical Journal, vol. 20, no. 6, pp. 603-608, 1978.

 J. A. Koziol and Y. S. Yuh, "Omnibus two-sample test procedures with randomly censored data," Biometrical Journal, vol. 24, no. 8, pp. 743-750, 1982.

 N. Mantel, "Evaluation of survival data and two new rank order statistics arising in its consideration," Cancer Chemotherapy Reports, vol. 50, no. 3, pp. 163-170, 1966.

 N. Mantel and W. Haenszel, "Statistical aspects of the analysis of data from retrospective studies of disease," Journal of the National Cancer Institute, vol. 22, no. 4, pp. 719-748, 1959.

 R. E. Tarone and J. Ware, "On distribution free tests for equality of survival distributions," Biometrika, vol. 64, no. 1, pp. 156-160, 1977.

 S. Leurgans, "Three classes of censored data rank tests: strengths and weaknesses under censoring," Biometrika, vol. 70, no. 3, pp. 651-658, 1983.

James A. Koziol (1) and Zhenyu Jia (2,3,4)

(1) College of Health, Human Services and Science, Ashford University, San Diego, CA 92128, USA

(2) Department of Statistics, University of Akron, Akron, OH 44325, USA

(3) Department of Family and Community Medicine, Northeast Ohio Medical University, Rootstown, OH 44272, USA

(4) Guizhou Provincial Key Laboratory of Computational Nano-Material Science, Guizhou Normal College, Guiyang 550018, China

Correspondence should be addressed to Zhenyu Jia; zjia@uakron.edu

Received 2 January 2014; Accepted 17 February 2014; Published 30 March 2014

```
Table 1: Survival experience of the two groups at observed failure
time t_j.

Group   Number of   Number of       Number at risk
        failures    non-failures    just before t_j

I       d_1j        n_1j - d_1j     n_1j
II      d_2j        n_2j - d_2j     n_2j

Total   d_j         n_j - d_j       n_j

Table 2: Empirical levels of the Lin-Wang test, and three
weighted variants.

Sample sizes   LW      LW_m1   LW_m2   LW_m3

(20,20)        0.044   0.045   0.044   0.042
(30,30)        0.048   0.053   0.053   0.048
(40,40)        0.046   0.048   0.047   0.045
(50,50)        0.051   0.053   0.050   0.049
(60,60)        0.053   0.045   0.051   0.053
(70,70)        0.049   0.049   0.051   0.047
(80,80)        0.056   0.047   0.053   0.053
(90,90)        0.051   0.049   0.050   0.050
(100,100)      0.048   0.050   0.049   0.048

Notes: Sample sizes are given for group 1, followed by group 2. LW
denotes the Lin-Wang test, and LW_mi (i = 1, 2, 3) denotes the
weighted version of the LW test, with weights (i), (ii), and (iii),
respectively, as described in the text. The underlying distributions
of group 1 and group 2 were identical, as described in the text. The
empirical levels of the two-sided test statistics were estimated from
5000 simulations, at nominal alpha level 0.05.

Table 3: Empirical powers of the Lin-Wang test, and three weighted
variants, under Scenario 1.

Sample sizes   LW      LW_m1   LW_m2   LW_m3

(20,20)        0.406   0.269   0.331   0.407
(30,30)        0.627   0.524   0.589   0.602
(40,40)        0.813   0.772   0.81    0.774
(50,50)        0.904   0.902   0.912   0.87
(60,60)        0.952   0.968   0.965   0.922
(70,70)        0.982   0.992   0.99    0.965
(80,80)        0.99    0.998   0.996   0.981

Notes: Sample sizes are given for group 1, followed by group 2. LW
denotes the Lin-Wang test, and LW_mi (i = 1, 2, 3) denotes the
weighted version of the LW test, with weights (i), (ii), and (iii),
respectively, as described in the text. The empirical powers of the
two-sided test statistics were estimated from 5000 simulations, at
nominal alpha level 0.05.

Table 4: Empirical powers of the Lin-Wang test, and three weighted
variants, under Scenario 2.

Sample sizes   LW      LW_m1   LW_m2   LW_m3

(20,20)        0.089   0.039   0.053   0.097
(30,30)        0.157   0.046   0.084   0.167
(40,40)        0.222   0.057   0.116   0.239
(50,50)        0.314   0.077   0.169   0.339
(60,60)        0.402   0.106   0.225   0.433
(70,70)        0.484   0.122   0.273   0.511
(80,80)        0.549   0.151   0.338   0.583
(90,90)        0.612   0.173   0.379   0.644
(100,100)      0.675   0.212   0.431   0.700

Notes: Sample sizes are given for group 1, followed by group 2. LW
denotes the Lin-Wang test, and LW_mi (i = 1, 2, 3) denotes the
weighted version of the LW test, with weights (i), (ii), and (iii),
respectively, as described in the text. The empirical powers of the
two-sided test statistics were estimated from 5000 simulations, at
nominal alpha level 0.05.

Table 5: Empirical powers of the Lin-Wang test, and three weighted
variants, under Scenario 3.

Sample sizes   LW      LW_m1   LW_m2   LW_m3

(20,20)        0.433   0.430   0.435   0.387
(30,30)        0.600   0.630   0.628   0.532
(40,40)        0.728   0.775   0.767   0.647
(50,50)        0.822   0.878   0.872   0.753
(60,60)        0.889   0.929   0.925   0.827
(70,70)        0.931   0.970   0.962   0.884
(80,80)        0.964   0.985   0.982   0.933

Notes: Sample sizes are given for group 1, followed by group 2. LW
denotes the Lin-Wang test, and LW_mi (i = 1, 2, 3) denotes the
weighted version of the LW test, with weights (i), (ii), and (iii),
respectively, as described in the text. The empirical powers of the
two-sided test statistics were estimated from 5000 simulations, at
nominal alpha level 0.05.

Table 6: The clinical data for sixty leukemic mice which were
randomly subdivided into two groups (Group A and Group B) of
equal size. "1" indicates the censored data.

Group A                       Group B

Survival (days)   Censoring   Survival (days)   Censoring

4.7                   0            15.4             0
5.4                   0            15.4             0
7.1                   0            15.7             0
7.5                   0            16.1             1
8.1                   0            16.5             1
8.3                   1            16.6             0
8.5                   0            16.9             0
8.6                   0            17.9             0
10                    0            18.4             0
10.4                  0            18.5             0
11.1                  0            18.9             0
12.1                  1             19              0
13.8                  0            19.1             0
15                    0            19.2             0
15.1                  0            19.4             0
15.3                  0            19.7             0
17.6                  0            19.8             0
21                    0            20.4             1
22.7                  0            20.8             0
23.9                  0            20.9             1
24.1                  0            21.3             0
27.4                  0            21.4             0
31.8                  0            21.4             0
33.5                  0            21.4             1
34.9                  0            21.5             0
35.5                  1            21.7             0
35.6                  1             22              0
35.9                  0            22.2             0
37.4                  0            22.5             0
38.2                  0            23.8             0
```
Research Article. Koziol, James A.; Jia, Zhenyu. Computational and Mathematical Methods in Medicine, 2014.