Quantifying the bias associated with use of discrepant analysis.
Usually, three tests are done on specimens with discrepant results: a candidate test, comparison method, and confirmatory test. Specimens with discrepant results may, however, also be retested with more than one confirmatory test. Sequential testing may continue with two or more confirmatory tests until discrepancies are resolved, by agreement, with only one agreement needing to occur between the candidate test result and any confirmatory test to declare resolution of the discrepancy [1-4]. In cases where sequential testing is done, no additional tests are run after concordance is reached.
A specimen subjected to DA can be classified correctly in two ways and classified incorrectly in two ways. A specimen can be correctly classified if both the candidate test and the comparison method correctly classify the specimen, or there is initial disagreement between the candidate test and comparison method that is correctly resolved by the confirmatory test. Similarly, a specimen can be misclassified if both the candidate test and the comparison method misclassify the specimen, or there is initial disagreement between the candidate test and comparison method that is incorrectly resolved by the confirmatory test. In all cases, misclassification occurs whenever two testing errors occur.
Since the mid-1980s, DA has increasingly been used to estimate test parameters, particularly in infectious disease testing for organisms such as Chlamydia trachomatis, Helicobacter pylori, and Mycobacterium tuberculosis. Increased use of DA has paralleled the evolution of molecular diagnostic techniques, which are often used as confirmatory adjuncts to the traditionally used, but less sensitive, comparison method testing by organism culture.
The fundamental flaw is that DA uses circular reasoning. The purpose of DA is to estimate the ability of a test to classify a specimen as normal or abnormal, but DA also uses the same test results to actually classify the specimen as normal or abnormal. This circular reasoning leads to misclassification bias and, hence, yields biased estimates.
A simple example illustrates the potential magnitude of the bias in estimates computed with use of DA. Suppose that the candidate test and the comparison method are independent fair coin flips, with the result being called abnormal if heads occurs and normal if tails occurs, and let the confirmatory test be an independent perfect test. Assume that one-half of the specimens are actually abnormal. The results of this simple example and their associated probabilities are listed in Table 1. Obviously, the true analytical sensitivity of this coin flip test is 50%, but the expected DA estimate of the sensitivity is:
\[ \Pr[\text{candidate test result is } + \mid \text{classification of the specimen is } +] = 0.375 / 0.5 = 0.75 \]
Thus, the expected DA estimate for sensitivity is 75%, well above the true value of 50%.
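The coin-flip calculation can be checked by enumerating the eight equally likely outcomes of Table 1. The following is a minimal sketch under the stated assumptions (prevalence of one-half, two independent fair coins, a perfect confirmatory test); the function name is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product


def coin_flip_da_sensitivity():
    """Expected DA sensitivity when candidate and comparison are fair coin flips."""
    prob = Fraction(1, 8)  # prevalence 1/2 times two independent fair coins
    p_classified_pos = Fraction(0)
    p_candidate_pos_and_classified_pos = Fraction(0)
    # True means abnormal (+); enumerate true status, candidate, comparison.
    for truth, candidate, comparison in product((True, False), repeat=3):
        if candidate == comparison:
            classification = candidate  # concordant results are accepted as is
        else:
            classification = truth      # perfect confirmatory test settles it
        if classification:
            p_classified_pos += prob
            if candidate:
                p_candidate_pos_and_classified_pos += prob
    return p_candidate_pos_and_classified_pos / p_classified_pos


print(coin_flip_da_sensitivity())  # 3/4
```

Exact fractions are used so that the result is 3/4 rather than a rounded decimal.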
A hypothetical example serves as a useful introduction to DA. Consider 100 abnormal and 200 normal specimens tested with a candidate test and a comparison method. Assume that the candidate test has 80% analytical sensitivity and 70% analytical specificity, whereas the comparison method has 95% analytical sensitivity and 80% analytical specificity. The true status of the specimens and expected test results are shown in Table 2, but because the true status of the samples would not be known, the 2 x 2 table of expected test results would distribute as shown in Table 3.
If the discrepancies in results for the 300 samples are resolved by using an independent perfect test, then the expected DA estimates for analytical sensitivity and specificity are, respectively:
\[ \widehat{Se}_D = \frac{88 + 4}{88 + 4 + 19} = 82.9\% \]
\[ \widehat{Sp}_D = \frac{113 + 28}{113 + 48 + 28} = 74.6\% \]
Also, the true PPV of the candidate test is 57.1%, and the true NPV is 87.5%, but DA yields an expected PPV estimate of 65.7% and an expected NPV estimate of 88.1%. Thus, all of the estimates are positively biased.
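The expected counts of Tables 2 and 3, and the true and DA predictive values quoted above, can be reproduced with a short calculation. This is a sketch only, assuming independent tests and a perfect confirmatory test as in the example; the helper names (`expected_counts`, `collapse`, `predictive_values`) are hypothetical.

```python
def expected_counts(n_abnormal, n_normal, se_x, sp_x, se_y, sp_y):
    """Expected counts keyed by (true status, candidate result, comparison result)."""
    counts = {}
    for truth, n in (("+", n_abnormal), ("-", n_normal)):
        # Probability that each test reads positive, given the true status.
        px = se_x if truth == "+" else 1 - sp_x
        py = se_y if truth == "+" else 1 - sp_y
        for x, qx in (("+", px), ("-", 1 - px)):
            for y, qy in (("+", py), ("-", 1 - py)):
                counts[(truth, x, y)] = n * qx * qy
    return counts


def collapse(counts):
    """Drop the unobservable true status, leaving the 2 x 2 table (Table 3)."""
    table = {}
    for (_, x, y), n in counts.items():
        table[(x, y)] = table.get((x, y), 0.0) + n
    return table


def predictive_values(counts, classify):
    """PPV and NPV of the candidate test under a given classification rule."""
    hit = {"+": 0.0, "-": 0.0}    # candidate result agrees with the classification
    total = {"+": 0.0, "-": 0.0}  # all specimens with that candidate result
    for (truth, x, y), n in counts.items():
        total[x] += n
        if classify(truth, x, y) == x:
            hit[x] += n
    return hit["+"] / total["+"], hit["-"] / total["-"]


table2 = expected_counts(100, 200, 0.80, 0.70, 0.95, 0.80)
table3 = collapse(table2)

# True predictive values: classify every specimen by its true status.
true_ppv, true_npv = predictive_values(table2, lambda t, x, y: t)
# DA with a perfect confirmatory test: concordant results stand, discordant
# results are settled by the true status.
da_ppv, da_npv = predictive_values(table2, lambda t, x, y: x if x == y else t)

print({cell: round(n) for cell, n in table3.items()})
print(f"true PPV {true_ppv:.1%}, DA PPV {da_ppv:.1%}")
print(f"true NPV {true_npv:.1%}, DA NPV {da_npv:.1%}")
```

Passing the classification rule as a function keeps the "true" and "DA" calculations on exactly the same counts, so any difference between them is due to the classification rule alone.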
The true sensitivity, specificity, PPV, and NPV of a laboratory test are probabilities and thus are parameters. These parametric values should not be confused with the estimates for these parameters, which are statistical values. When DA is used to estimate sensitivity, for example, a statistic is computed. This statistic is not an unbiased estimate for the true sensitivity. Nonetheless, it is still an unbiased estimate for its expected value. The difference between the expected value of the DA estimate and the true sensitivity is the expected bias of the DA estimate.
Similarly, the difference between the expected value of a DA estimate for any performance parameter and the true value of that parameter is the expected bias of the DA estimate for that parameter. Like all parametric values, this expected bias does not vary from estimate to estimate. In this report, we undertake to examine more fully the direction and magnitude of the expected biases in the estimates of analytical sensitivity, specificity, PPV, and NPV computed with DA by investigating how these biases are affected by the true sensitivities and specificities of all of the tests used in the DA process, along with the true proportion of abnormal specimens tested.
Materials and Methods
Let R = the proportion of abnormal specimens among the specimens tested (the prevalence rate)
X = candidate test results
Y = comparison method results
Z = confirmatory test results
Se_i = analytical sensitivity of test i (i = X, Y, Z)
Sp_i = analytical specificity of test i (i = X, Y, Z)
Table 4 lists all possible combinations of true specimen status, candidate test result, comparison method result, confirmatory test result, estimated specimen status, and the probability of occurrence for each combination when DA is used and all test results are independent. The outcomes that lead to misclassification of a specimen have been emphasized. The expected values of the DA estimates for the performance parameters are given below:
Analytical sensitivity:
\[ Se_D = \frac{(p_{+1} + p_{-1}) + (p_{+2} + p_{-2})}{(p_{+1} + p_{-1}) + (p_{+2} + p_{-2}) + (p_{+4} + p_{-4})} \]
Analytical specificity:
\[ Sp_D = \frac{(p_{+5} + p_{-5}) + (p_{+6} + p_{-6})}{(p_{+3} + p_{-3}) + (p_{+5} + p_{-5}) + (p_{+6} + p_{-6})} \]
PPV:
\[ PPV_D = \frac{(p_{+1} + p_{-1}) + (p_{+2} + p_{-2})}{(p_{+1} + p_{-1}) + (p_{+2} + p_{-2}) + (p_{+3} + p_{-3})} \]
NPV:
\[ NPV_D = \frac{(p_{+5} + p_{-5}) + (p_{+6} + p_{-6})}{(p_{+4} + p_{-4}) + (p_{+5} + p_{-5}) + (p_{+6} + p_{-6})} \]
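The four expected values can be evaluated directly from the Table 4 probabilities. The sketch below assumes independent test results, as the table does; the function name and the dictionary layout are illustrative choices, not part of the original method description.

```python
def da_expected_estimates(R, se_x, sp_x, se_y, sp_y, se_z, sp_z):
    """Expected DA estimates from the Table 4 probabilities (independent tests)."""
    # Abnormal specimens: outcomes 1..6 in the order listed in Table 4.
    p_plus = {
        1: R * se_x * se_y,
        2: R * se_x * (1 - se_y) * se_z,
        3: R * se_x * (1 - se_y) * (1 - se_z),
        4: R * (1 - se_x) * se_y * se_z,
        5: R * (1 - se_x) * se_y * (1 - se_z),
        6: R * (1 - se_x) * (1 - se_y),
    }
    # Normal specimens: the same six test-result patterns.
    p_minus = {
        1: (1 - R) * (1 - sp_x) * (1 - sp_y),
        2: (1 - R) * (1 - sp_x) * sp_y * (1 - sp_z),
        3: (1 - R) * (1 - sp_x) * sp_y * sp_z,
        4: (1 - R) * sp_x * (1 - sp_y) * (1 - sp_z),
        5: (1 - R) * sp_x * (1 - sp_y) * sp_z,
        6: (1 - R) * sp_x * sp_y,
    }
    # Observable probability of each test-result pattern.
    q = {k: p_plus[k] + p_minus[k] for k in range(1, 7)}
    return {
        "Se": (q[1] + q[2]) / (q[1] + q[2] + q[4]),
        "Sp": (q[5] + q[6]) / (q[3] + q[5] + q[6]),
        "PPV": (q[1] + q[2]) / (q[1] + q[2] + q[3]),
        "NPV": (q[5] + q[6]) / (q[4] + q[5] + q[6]),
    }


# Candidate 80%/80%, comparison 90%/90%, confirmatory 95%/95%, prevalence 25%.
example = da_expected_estimates(0.25, 0.80, 0.80, 0.90, 0.90, 0.95, 0.95)
print({name: round(100 * value, 1) for name, value in example.items()})
```

With these inputs the sensitivity and specificity come out near 82.8% and 82.4%, in line with the corresponding row of Table 6.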
Table 5 lists the test result frequencies. Notice that DA divides cell frequencies b and c in the traditional 2 x 2 contingency table of results (corresponding to disagreement between the candidate test and comparison method) into component parts that depend on the results of confirmatory testing. When discrepancies are resolved, the b1 component is shifted into the true positive (a) cell and c2 is shifted into the true negative (d) cell, leading to the estimates given below:
Analytical sensitivity:
\[ \widehat{Se}_D = \frac{a + b_1}{a + b_1 + c_1} \]
Analytical specificity:
\[ \widehat{Sp}_D = \frac{c_2 + d}{b_2 + c_2 + d} \]
PPV:
\[ \widehat{PPV}_D = \frac{a + b_1}{a + b_1 + b_2} \]
NPV:
\[ \widehat{NPV}_D = \frac{c_2 + d}{c_1 + c_2 + d} \]
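As a sketch of how these estimates are computed from the Table 5 cell frequencies, the function below takes the six counts directly. The function name is hypothetical, and the example counts are the expected frequencies from the hypothetical 300-specimen example (Table 2), assuming a perfect confirmatory test splits b = 52 into 4 and 48 and c = 47 into 19 and 28.

```python
def da_estimates_from_counts(a, b1, b2, c1, c2, d):
    """DA estimates from the Table 5 cell frequencies.

    a  : candidate +, comparison +
    b1 : candidate +, comparison -, confirmatory +
    b2 : candidate +, comparison -, confirmatory -
    c1 : candidate -, comparison +, confirmatory +
    c2 : candidate -, comparison +, confirmatory -
    d  : candidate -, comparison -
    """
    classified_pos_cand_pos = a + b1  # candidate agrees with a + classification
    classified_neg_cand_neg = c2 + d  # candidate agrees with a - classification
    return {
        "Se": classified_pos_cand_pos / (classified_pos_cand_pos + c1),
        "Sp": classified_neg_cand_neg / (b2 + classified_neg_cand_neg),
        "PPV": classified_pos_cand_pos / (classified_pos_cand_pos + b2),
        "NPV": classified_neg_cand_neg / (c1 + classified_neg_cand_neg),
    }


counts_example = da_estimates_from_counts(a=88, b1=4, b2=48, c1=19, c2=28, d=113)
print({name: round(100 * value, 1) for name, value in counts_example.items()})
```

Note that the candidate test result appears in both the numerator and the classification itself, which is the circularity described earlier.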
The magnitude and direction of the bias that DA imparts on the estimates for analytical sensitivity and specificity can best be shown graphically by fixing the prevalence rate and the performance parameters of the comparison method and confirmatory test while allowing the true analytical sensitivity and analytical specificity of the candidate test to vary.
Figure 1, left, displays the biases of these estimates when the prevalence rate is 25%; the comparison method has analytical sensitivity and specificity of 90%, and the confirmatory test has analytical sensitivity and specificity of 95%. Thus, if DA were used in this situation with a candidate test that actually has 70% analytical sensitivity and 80% analytical specificity, Fig. 1, left, shows that the expected biases are ~5% for analytical sensitivity and 2% for analytical specificity, yielding expected estimates of ~75% for analytical sensitivity and 82% for analytical specificity. For this scenario, the percent bias in the DA estimates for PPV and NPV is given by Eq. 1:
[FIGURE 1 OMITTED]
bias = 14.5% - 0.15 (true PPV% or NPV%) (1)
Thus, the candidate test described here that has a true PPV of 54% and a true NPV of 89% would have DA estimates with an expected bias of ~6% for PPV and 1% for NPV, yielding expected estimates of ~60% for PPV and 90% for NPV.
As Fig. 1, left, and Eq. 1 suggest, DA can be expected to yield inflated estimates of the performance parameters for most true values of the analytical sensitivity and specificity. For this example, the analytical sensitivity of the candidate test must exceed ~90%, and the analytical specificity must exceed ~98% before DA tends to yield analytical sensitivity and specificity estimates that are negatively biased. Also, as Eq. 1 shows, for the given situation, DA will tend to overestimate the PPV and NPV whenever their true values are less than ~97%.
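The linear form of Eq. 1 can be recovered from the independence assumption. The following is a sketch of that reasoning, where \alpha and \beta are labels introduced here for convenience. Given a positive candidate result, the specimen is truly abnormal with probability PPV, and it is classified as abnormal whenever the comparison method is positive or, failing that, the confirmatory test is positive.

```latex
\begin{align*}
PPV_D &= \Pr(\text{classified } + \mid X = +) = \alpha\, PPV + \beta\,(1 - PPV), \\
\alpha &= Se_Y + (1 - Se_Y)\,Se_Z = 0.90 + 0.10 \times 0.95 = 0.995, \\
\beta  &= (1 - Sp_Y) + Sp_Y\,(1 - Sp_Z) = 0.10 + 0.90 \times 0.05 = 0.145, \\
PPV_D - PPV &= \beta - (1 - \alpha + \beta)\,PPV = 0.145 - 0.15\,PPV .
\end{align*}
```

The same argument with the roles of sensitivity and specificity exchanged gives the NPV bias; because the comparison method and confirmatory test here each have equal sensitivity and specificity, the coefficients are identical. The bias changes sign where the true value equals 0.145/0.15, or about 97%.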
Figure 1, right, displays the biases in the DA estimates of analytical sensitivity and specificity when the sensitivities and specificities of the comparison method and confirmatory test are the same as in Fig. 1, left, but the prevalence rate decreases from 25% to 10%. The biases in the DA estimates of PPV and NPV are unaffected by the prevalence rate. As Fig. 1 demonstrates, even in those limited situations where DA can be expected to produce negatively biased estimates, the amount of the underestimation is usually small. The maximum negative bias occurs when the candidate test is a perfect test, a situation in which it is highly unlikely that DA would be applied because the estimated candidate test parameters are acceptably high.
Table 6 shows the effect the prevalence rate has on the DA estimates of the performance parameters. Notice how variable the DA estimate of analytical sensitivity is when the prevalence rate is low. When the prevalence rate is 1% and the candidate test is a perfect test (100% sensitive and specific), the DA estimate of analytical sensitivity is only 66.8%, whereas, paradoxically, when the candidate test is only 70% sensitive and specific, the expected DA estimate is 89.2%. When the prevalence rate is low and the candidate test is not highly sensitive and specific, the true PPV of the candidate test is quite low, but the DA estimate tends to overestimate the true value by a large amount. For example, when the prevalence rate is 1% and the true analytical sensitivity and specificity of the candidate test are 90%, then the true PPV is ~8.3%, but the expected DA estimate is 21.6%, over 2.5 times the true value. As the true analytical sensitivity and specificity decrease, the relative bias of the DA PPV estimate increases.
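The sensitivity figures quoted from Table 6 can be reproduced with a compact calculation. This sketch assumes independent tests, with the comparison method and confirmatory test fixed at the 90% and 95% values used for Table 6; the function name and default arguments are illustrative.

```python
def da_sensitivity(R, se_x, sp_x, se_y=0.90, sp_y=0.90, se_z=0.95, sp_z=0.95):
    """Expected DA estimate of the candidate test's analytical sensitivity."""
    # Candidate + and classified +: comparison +, or comparison - and confirmatory +.
    cand_pos = (R * se_x * (se_y + (1 - se_y) * se_z)
                + (1 - R) * (1 - sp_x) * ((1 - sp_y) + sp_y * (1 - sp_z)))
    # Candidate - but classified +: comparison + and confirmatory +.
    cand_neg = (R * (1 - se_x) * se_y * se_z
                + (1 - R) * sp_x * (1 - sp_y) * (1 - sp_z))
    return cand_pos / (cand_pos + cand_neg)


for prevalence in (0.25, 0.10, 0.01):
    perfect = da_sensitivity(prevalence, 1.00, 1.00)
    poor = da_sensitivity(prevalence, 0.70, 0.70)
    print(f"prevalence {prevalence:.0%}: perfect test {perfect:.1%}, 70%/70% test {poor:.1%}")
```

At low prevalence most specimens classified as abnormal are false positives of the comparison method and confirmatory test, so a perfect candidate test is penalized for correctly calling them normal.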
Using DA could actually result in an inferior candidate test supplanting a better test. Consider the following scenario: test T is new and is being proposed as a screening test for a disease with a prevalence rate of 1%. The test currently used for this purpose (test U) has 90% analytical sensitivity and 97% analytical specificity, yielding a PPV of 23.3%. The most specific test available (test V) is known to have an analytical specificity of 99% but is only 80% sensitive (PPV = 44.7%) and either too costly or time consuming to use as a screening test. Suppose that the analytical sensitivity of test T is greater than that of test V but less than that of test U, say, 85%. Then test T would have to have an analytical specificity >97.2% to have a higher PPV than test U. Now suppose that a sample of specimens from the population is tested with both test T and test U, with all discrepancies resolved by using test V. Then the analytical specificity of test T need only exceed 96.7% to yield an expected estimated PPV greater than that of test U. In reality, a test with 85% analytical sensitivity and 96.7% analytical specificity has a true PPV of only 20.6%. Thus, a candidate test that is less sensitive and specific, and hence has a lower PPV, than the present screening test could replace the present test because its DA-estimated PPV appears to be greater than the PPV of the currently used test.
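The two specificity thresholds in this scenario can be checked numerically. The sketch below assumes independent tests, takes test U's true PPV as the value to beat, and finds each threshold by bisection; all function names are hypothetical.

```python
def ppv(R, se, sp):
    """True positive predictive value at prevalence R."""
    true_pos = R * se
    false_pos = (1 - R) * (1 - sp)
    return true_pos / (true_pos + false_pos)


def da_ppv(true_ppv, se_y, sp_y, se_z, sp_z):
    """Expected DA estimate of PPV, given the candidate test's true PPV."""
    alpha = se_y + (1 - se_y) * se_z        # truly abnormal, classified +
    beta = (1 - sp_y) + sp_y * (1 - sp_z)   # truly normal, classified +
    return true_ppv * alpha + (1 - true_ppv) * beta


def min_specificity(R, se, target, score):
    """Smallest specificity at which score(true PPV) exceeds target (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if score(ppv(R, se, mid)) > target:
            hi = mid
        else:
            lo = mid
    return hi


R = 0.01
ppv_u = ppv(R, 0.90, 0.97)  # test U, the current screening test
# Specificity test T (85% sensitive) needs for its true PPV to beat test U.
true_threshold = min_specificity(R, 0.85, ppv_u, lambda p: p)
# Specificity needed for its DA-estimated PPV to beat test U, with U as the
# comparison method and test V (80% sensitive, 99% specific) as confirmatory.
da_threshold = min_specificity(
    R, 0.85, ppv_u, lambda p: da_ppv(p, 0.90, 0.97, 0.80, 0.99))

print(f"PPV of test U: {ppv_u:.1%}")
print(f"specificity needed, true PPV: {true_threshold:.1%}")
print(f"specificity needed, DA PPV:   {da_threshold:.1%}")
```

Bisection is used only for brevity; both thresholds could equally be solved in closed form because PPV increases monotonically with specificity.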
One argument often made for using DA in the absence of a perfect test is the situation where two complementary comparison methods are available, one with high analytical sensitivity and one with high analytical specificity. The candidate test is first tested against one of the comparison methods and then DA is applied, with the other comparison method used as a confirmatory test. For example, suppose that the candidate test is first tested against a comparison method with 100% analytical specificity but only 80% analytical sensitivity. All discrepancies are then resolved by using a second independent confirmatory test with 100% analytical sensitivity and 80% analytical specificity. If 25% of the samples are actually abnormal, then Fig. 2 shows the expected biases of the DA estimates of analytical sensitivity and specificity. For this scenario, the percentage of bias in the DA estimates for PPV and NPV is given by Eq. 2.
bias = 20% - 0.2 (true PPV% or NPV%) (2)
The estimates are not affected by the order of the testing.
Therefore, in this situation, if a candidate test has both analytical sensitivity and specificity of 78%, and hence a PPV of 54% and an NPV of 91%, then the expected DA estimates are ~84% for analytical sensitivity, 82% for analytical specificity, 63% for PPV, and 93% for NPV. Not only are all of the estimates positively biased, but a test that, in reality, has lower analytical sensitivity and specificity than both the comparison method and confirmatory test would probably appear to be more sensitive than the comparison method and more specific than the confirmatory test. It can be shown that when one of the comparison methods has 100% analytical sensitivity and the other has 100% analytical specificity, DA can always be expected to overestimate the performance parameters, unless the true analytical sensitivity or specificity of the candidate test is also 100%. Fig. 2 and Eq. 2 illustrate this.
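The figures for this complementary-methods scenario, and the claim that the order of testing does not matter, can be checked by enumerating every combination of true status and test results. This is a sketch assuming independent tests; the function name and the (sensitivity, specificity) tuple convention are choices made here for illustration.

```python
from itertools import product


def da_estimates(R, cand, comp, conf):
    """Expected DA estimates; cand/comp/conf are (sensitivity, specificity) pairs."""
    joint = {}  # (candidate result, DA classification) -> probability
    for truth, x, y, z in product((True, False), repeat=4):
        prob = R if truth else 1 - R
        for result, (se, sp) in ((x, cand), (y, comp), (z, conf)):
            p_pos = se if truth else 1 - sp
            prob *= p_pos if result else 1 - p_pos
        # Concordant results stand; discordant ones follow the confirmatory test.
        classified = x if x == y else z
        joint[(x, classified)] = joint.get((x, classified), 0.0) + prob
    pos_pos = joint.get((True, True), 0.0)
    pos_neg = joint.get((True, False), 0.0)   # candidate +, classified -
    neg_pos = joint.get((False, True), 0.0)   # candidate -, classified +
    neg_neg = joint.get((False, False), 0.0)
    return {
        "Se": pos_pos / (pos_pos + neg_pos),
        "Sp": neg_neg / (neg_neg + pos_neg),
        "PPV": pos_pos / (pos_pos + pos_neg),
        "NPV": neg_neg / (neg_neg + neg_pos),
    }


candidate = (0.78, 0.78)
specific_method = (0.80, 1.00)   # 80% sensitive, 100% specific
sensitive_method = (1.00, 0.80)  # 100% sensitive, 80% specific

first_order = da_estimates(0.25, candidate, specific_method, sensitive_method)
swapped_order = da_estimates(0.25, candidate, sensitive_method, specific_method)
print({name: round(100 * value) for name, value in first_order.items()})
```

The order is immaterial because a candidate-positive specimen is classified as abnormal whenever either of the other two tests is positive, and a candidate-negative specimen is classified as normal whenever either is negative; both rules are symmetric in the two methods.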
[FIGURE 2 OMITTED]
Often, DA is performed with only one type of discordance being resolved. For example, when culture is used as the comparison method, the culture-positive discrepancies are usually not resolved because culture is assumed to have 100% analytical specificity [1,4,6-8]. In such a situation, only the culture-negative discordancies are resolved. Fig. 3 shows the biases in the analytical sensitivity and specificity estimates when the prevalence rate is 25%, the comparison method has 80% analytical sensitivity and 99% analytical specificity, and a confirmatory test that has 95% analytical sensitivity and specificity is used to resolve the comparison-method-negative discordancies. When only one type of discordance is resolved, the biases are no longer the same for PPV and NPV. For this scenario, the percentage biases in the DA estimates for PPV and NPV are given by Eqs. 3 and 4.
bias in PPV estimate = 6% - 0.07 (true PPV%) (3)
bias in NPV estimate = 20% - 0.21 (true NPV%) (4)
For this example, a candidate test that actually has 85% analytical sensitivity and 90% analytical specificity, and thus has PPV of 73.9% and NPV of 94.7%, can be expected to have DA estimates of 85.4% for analytical sensitivity, 90.3% for analytical specificity, 74.6% for PPV, and 94.8% for NPV, all positively biased.
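The coefficients in Eqs. 3 and 4 can be recovered under the assumption that a positive comparison method result is always accepted and only candidate-positive, comparison-negative specimens are retested. The sketch below encodes that reading of the procedure; the function name is hypothetical, and results may differ from the quoted figures in the last decimal place because of rounding.

```python
def one_sided_bias_coefficients(se_y, sp_y, se_z, sp_z):
    """(intercept, slope) pairs such that bias = intercept - slope * true value."""
    # Candidate positive: classified + if comparison +, or else confirmatory +.
    ppv_if_abnormal = se_y + (1 - se_y) * se_z
    ppv_if_normal = (1 - sp_y) + sp_y * (1 - sp_z)
    # Candidate negative: never retested, so the comparison method alone decides.
    npv_if_normal = sp_y
    npv_if_abnormal = 1 - se_y
    return {
        "PPV": (ppv_if_normal, ppv_if_normal + 1 - ppv_if_abnormal),
        "NPV": (npv_if_abnormal, npv_if_abnormal + 1 - npv_if_normal),
    }


coefficients = one_sided_bias_coefficients(0.80, 0.99, 0.95, 0.95)
for name, true_value in (("PPV", 0.739), ("NPV", 0.947)):
    intercept, slope = coefficients[name]
    bias = intercept - slope * true_value
    print(f"{name}: bias = {intercept:.4f} - {slope:.4f} x true value, "
          f"expected estimate about {true_value + bias:.1%}")
```

The NPV coefficients depend only on the comparison method, which is why the PPV and NPV biases no longer coincide when only one type of discordance is resolved.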
[FIGURE 3 OMITTED]
In the absence of a perfect comparison method, the true status of a specimen can never be known, leading to potential misclassification bias. In the case of DA, the same test results are being used to estimate both a specimen's true status and the ability of the test to identify that status. In addition, the status of some specimens is estimated by using the results of only two tests, whereas for other samples, more than two tests are used. The systematic and subjective nature of this testing procedure leads to differential misclassification bias, which modeling suggests almost always results in overestimation of the performance parameters of the candidate test.
Although it is true that DA often yields estimates for the performance parameters that are more accurate than just comparing the candidate test to one imperfect comparison method, this does not justify using the candidate test, the performance capabilities of which are in doubt, to help classify the specimens. It would be preferable to estimate the true status of a specimen by using all of the independent comparison methods available. When only two such comparison methods are to be used, as in basic DA, it would be better to test all of the specimens with both comparison method tests, classify only the specimens with concordant results, and discard the specimens with discordant results. Doing so would greatly reduce the misclassification bias. For the example displayed (Fig. 1, left), when the analytical sensitivity and specificity of the candidate test each range between 50% and 100%, the bias in the DA analytical sensitivity estimate ranges between -0.1% and 5.0%. When only the comparison methods are used to classify the specimens, with discordancies discarded, the bias ranges from -0.2% to 0.0%, which is appreciably less. Over the same range of analytical sensitivities and specificities, the candidate test true PPV ranges from 25% to 100%. The bias in the DA PPV estimate ranges from -0.5% to 10.8%. In contrast, the bias in the PPV estimate computed with the method that uses only the concordant comparison method results to classify the specimens ranges from -0.6% to 0.3%, which is again appreciably less.
An example showing the expected biases and errors in the estimates of the performance parameters when each of the three estimation procedures is used is revealing. Suppose that one had available two independent comparison methods, tests A and B, along with 1000 abnormal and 3000 normal specimens to estimate the performance parameters of a candidate test that actually has 80% analytical sensitivity and 80% analytical specificity. Let test A have 90% analytical sensitivity and specificity, and let test B have 95% analytical sensitivity and specificity. Table 7 displays the expected estimates and 95% confidence intervals for each of the following three methods of classifying the specimens: method I, test B only; method II, DA; method III, tests A and B with discordancies discarded. Only two of the four 95% confidence intervals from method I and one of the four intervals from method II cover the true values, whereas all four of the intervals from method III cover the true values. Overall, the DA estimates are slightly less biased than using only test B to classify the specimens, but using method III, which avoids using the candidate test to classify the specimens, is vastly superior to DA.
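The expected estimates for methods I and III can be reproduced from the stated test parameters. This is a sketch assuming all tests are independent given the true status; the function name and argument layout are illustrative, and method II is omitted because it follows from the Table 4 probabilities.

```python
def expected_estimates(classified_pos, classified_neg, se_x, sp_x):
    """Expected estimates for the candidate test.

    classified_pos / classified_neg: (truly abnormal, truly normal) expected
    counts among specimens classified as abnormal / normal by the reference
    procedure, which does not involve the candidate test.
    """
    abn_pos, nor_pos = classified_pos
    abn_neg, nor_neg = classified_neg
    # Expected candidate-positive counts within each classified group.
    x_pos_in_pos = abn_pos * se_x + nor_pos * (1 - sp_x)
    x_pos_in_neg = abn_neg * se_x + nor_neg * (1 - sp_x)
    x_neg_in_pos = (abn_pos + nor_pos) - x_pos_in_pos
    x_neg_in_neg = (abn_neg + nor_neg) - x_pos_in_neg
    return {
        "Se": x_pos_in_pos / (x_pos_in_pos + x_neg_in_pos),
        "Sp": x_neg_in_neg / (x_neg_in_neg + x_pos_in_neg),
        "PPV": x_pos_in_pos / (x_pos_in_pos + x_pos_in_neg),
        "NPV": x_neg_in_neg / (x_neg_in_neg + x_neg_in_pos),
    }


n_abn, n_nor = 1000, 3000
se_a = sp_a = 0.90  # test A
se_b = sp_b = 0.95  # test B

# Method I: classify every specimen with test B alone.
method_1 = expected_estimates(
    (n_abn * se_b, n_nor * (1 - sp_b)),
    (n_abn * (1 - se_b), n_nor * sp_b),
    0.80, 0.80)

# Method III: keep only specimens on which tests A and B agree.
method_3 = expected_estimates(
    (n_abn * se_a * se_b, n_nor * (1 - sp_a) * (1 - sp_b)),
    (n_abn * (1 - se_a) * (1 - se_b), n_nor * sp_a * sp_b),
    0.80, 0.80)

for label, result in (("I", method_1), ("III", method_3)):
    print(label, {name: round(100 * value, 1) for name, value in result.items()})
```

Requiring agreement between tests A and B leaves very few misclassified specimens in either group (15 of 870 and 5 of 2570 in expectation), which is why the method III estimates sit so close to the true values.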
Up to this point, the results and discussions have assumed that all test results are independent. The impact of DA is worsened by dependence between any of the tests used in the DA process. Dependent tests here refer to those measuring disease markers that are either analytically or physiologically similar, such that their test results tend to agree; i.e., they tend to classify and misclassify in tandem. For purposes of myocardial infarction diagnosis, total creatine kinase (CK) and CK-MB fraction are dependent. On the other hand, electrocardiographic changes would be independent of both. Suppose that the discrepancies in Table 3 are resolved by using a test that is similar to the candidate test. In particular, suppose that the confirmatory test yields positive results in tandem with the candidate test with probability p (i.e., when the candidate test result is positive, the confirmatory test result is also positive with probability p).
It follows that the expected DA estimate for the analytical sensitivity of the candidate test is:
\[ Se_D = \frac{88 + 52p}{135 + 5p} \]
The more the confirmatory test mirrors the candidate test, the greater the value of p and the closer the DA estimate for analytical sensitivity is to 100%, regardless of the true analytical sensitivity of the candidate test.
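The dependence formula can be rebuilt from the Table 3 counts. The sketch below assumes, as the denominator 135 + 5p implies, that the confirmatory test agrees with the candidate test with probability p whether the candidate result is positive or negative; the function name and default arguments are illustrative.

```python
def dependent_da_sensitivity(p, a=88, b=52, c=47):
    """Expected DA sensitivity when the confirmatory test mirrors the candidate.

    a, b, c are the Table 3 counts: both tests positive, candidate + with
    comparison -, and candidate - with comparison +.
    """
    b1 = b * p        # candidate +: confirmatory also + with probability p
    c1 = c * (1 - p)  # candidate -: confirmatory + only when it disagrees
    return (a + b1) / (a + b1 + c1)


for p in (0.0, 0.5, 0.8, 1.0):
    print(f"p = {p:.1f}: expected DA sensitivity {dependent_da_sensitivity(p):.1%}")
```

The true analytical sensitivity of the candidate test in this example is 80%, yet at p = 1 every discordance is resolved in the candidate's favor and the expected estimate reaches 100%.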
The DA estimates remain biased even when a comparison method with 100% analytical sensitivity (or specificity) is complemented by a confirmatory test with 100% analytical specificity (or sensitivity). Even using a perfect comparison method to resolve discordancies does not eliminate this bias. Indeed, resolving discordancies with a perfect comparison method yields expected DA estimates for imperfect tests that are always too high.
When the prevalence rate is low, the bias associated with the DA estimate of PPV can be quite large. This overestimation of PPV can be especially problematic if one is considering using the candidate test as a screening test for a relatively rare disease, where even a small bias would result in a large underestimation of the expected false-positive rate.
Staquet et al. [10] showed that when unbiased estimates for the analytical sensitivity and specificity of the comparison method exist, then unbiased estimates for the analytical sensitivity and specificity of a candidate test can be easily computed. Thus, when one has unbiased estimates for the analytical sensitivity and specificity of a comparison method, it is difficult to justify the use of DA because one need not incur the increased costs of subjecting specimens with discrepant results to further testing. Other methods for estimating the performance parameters of a candidate test have been derived when unbiased estimates of comparison method test analytical sensitivity and specificity are not available [11,12]. When technological advances result in the gradual replacement of an old "gold standard," such as culture, with a new standard such as PCR, it is tempting to resolve discordancies by using the new technology. A fairer approach would be to report two estimates of analytical performance by using old and new technologies independently, or to treat the old and new standards as comparison methods and use method III described earlier, which discards samples whose comparison method results do not agree.
We applaud all efforts to investigate discrepancies in laboratory testing; this is nothing but good science. Using the process for estimating and reporting test performance that has come to be called DA, however, should be relied upon only as a last resort; we suggest that users scrutinize estimates derived by DA very carefully. As shown here, DA consistently overestimates laboratory performance parameters. Certainly, rare occasions may arise when the use of DA might be justified, as when comparison method testing is initially performed in Third World conditions and only a few samples can be transported for confirmation. Because alternative methods exist, however, the magnitude and direction of the bias associated with DA appear to negate its use as a valid method for estimating the performance parameters of a laboratory test, a conclusion that makes its use difficult to justify for reasons such as cost reduction or convenience.
Received July 24, 1997; revision accepted September 19, 1997.
[1.] Quinn TC, Welsh L, Lentz A, Crotchfelt K, Zenilman J, Newhall J, et al. Diagnosis by AMPLICOR PCR of Chlamydia trachomatis infection in urine samples from women and men attending sexually transmitted disease clinics. J Clin Microbiol 1996;34:1401-6.
[2.] Lee HH, Chernesky MA, Schachter J, Burczak JD, Andrews WW, Muldoon S, et al. Diagnosis of Chlamydia trachomatis genitourinary infection in women by ligase chain reaction assay of urine. Lancet 1995;345:213-6.
[3.] Smith KR, Ching S, Lee H, Ohhashi Y, Hu HY, Fisher HC III, et al. Evaluation of ligase chain reaction for use with urine for identification of Neisseria gonorrhoeae in females attending a sexually transmitted disease clinic. J Clin Microbiol 1995;33:455-7.
[4.] Schachter J, Stamm WE, Quinn TC, Andrews WW, Burczak JD, Lee HH. Ligase chain reaction to detect Chlamydia trachomatis infection of the cervix. J Clin Microbiol 1994;32:2540-3.
[5.] Hadgu A. The discrepancy in discrepant analysis. Lancet 1996; 348:592-3.
[6.] Goessens WH, Kluytmans JA, den Toom N, van Rijsoort-Vos TH, Niesters BG, Stolz E, et al. Influence of volume of sample processed on detection of Chlamydia trachomatis in urogenital samples by PCR. J Clin Microbiol 1995;33:251-3.
[7.] Hoffner SE, Cristea M, Klintz L, Petrini B, Kallenius G. RNA amplification for direct detection of Mycobacterium tuberculosis in respiratory samples. Scand J Infect Dis 1996;28:59-61.
[8.] Biro FM, Reising SF, Doughman JA, Kollar AM, Rosenthal SL. A comparison of diagnostic methods in adolescent girls with and without symptoms of Chlamydia urogenital infection. Pediatrics 1994;93:476-80.
[9.] Valenstein PN. Evaluating diagnostic tests with imperfect standards. Am J Clin Pathol 1990;93:252-8.
[10.] Staquet M, Rozencweig M, Lee YJ, Muggia FM. Methodology for assessment of new dichotomous diagnostic tests. J Chronic Dis 1981;34:599-610.
[11.] Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics 1980;36:167-71.
[12.] Smith PJ, Hadgu A. Sensitivity and specificity for correlated observations. Stat Med 1992;11:1503-9.
HARVEY B. LIPMAN * and J. REX ASTLES
Division of Laboratory Systems, Public Health Practice Program Office, Centers for Disease Control and Prevention, Atlanta, GA 30341-3714.
* Address correspondence to this author at: Centers for Disease Control and Prevention, 4770 Buford Highway NE, Mailstop G25, Atlanta, GA 30341-3714. Fax (770) 488-7667; e-mail firstname.lastname@example.org.
(1) Nonstandard abbreviations: DA, discrepant analysis; PPV, positive predictive value; and NPV, negative predictive value.
Table 1. All possible combinations of true status, test results, classification when DA is used, and their probabilities of occurrence, when the candidate and comparison tests are independent fair-coin flips, with discrepancies resolved using a perfect test.

| Outcome | True status of specimen | Candidate test | Comparison method | Confirmatory test | Classification of specimen | Probability |
| A | + | + | + |   | + | 0.125 |
| B | + | + | - | + | + | 0.125 |
| C | + | - | + | + | + | 0.125 |
| D | + | - | - |   | - | 0.125 |
| E | - | + | + |   | + | 0.125 |
| F | - | + | - | - | - | 0.125 |
| G | - | - | + | - | - | 0.125 |
| H | - | - | - |   | - | 0.125 |

Table 2. The true status of the specimens and expected test results when 100 abnormal and 200 normal specimens are tested with a candidate test having 80% analytical sensitivity and 70% analytical specificity and a comparison method having 95% analytical sensitivity and 80% analytical specificity.

| True status of specimen | Candidate test | Comparison method | No. of specimens |
| + | + | + | 76 |
| + | + | - | 4 |
| + | - | + | 19 |
| + | - | - | 1 |
| - | + | + | 12 |
| - | + | - | 48 |
| - | - | + | 28 |
| - | - | - | 112 |

Table 3. Expected test results when 100 abnormal and 200 normal specimens are tested with a candidate test having 80% analytical sensitivity and 70% analytical specificity and a comparison test having 95% analytical sensitivity and 80% analytical specificity.

| Candidate test | Comparison method: Abnormal | Comparison method: Normal | Total |
| Abnormal | 88 | 52 | 140 |
| Normal | 47 | 113 | 160 |
| Total | 135 | 165 | 300 |

Table 4. All possible combinations of true status, test results, classification when DA is used, and their probabilities of occurrence, assuming all test results are independent.

| True status of specimen | Candidate test | Comparison method | Confirmatory test | Classification of specimen | Probability |
| + | + | + |   | + | p_{+1} = R Se_X Se_Y |
| + | + | - | + | + | p_{+2} = R Se_X (1 - Se_Y) Se_Z |
| + | + | - | - | - (a) | p_{+3} = R Se_X (1 - Se_Y) (1 - Se_Z) |
| + | - | + | + | + | p_{+4} = R (1 - Se_X) Se_Y Se_Z |
| + | - | + | - | - (a) | p_{+5} = R (1 - Se_X) Se_Y (1 - Se_Z) |
| + | - | - |   | - (a) | p_{+6} = R (1 - Se_X) (1 - Se_Y) |
| - | + | + |   | + (a) | p_{-1} = (1 - R) (1 - Sp_X) (1 - Sp_Y) |
| - | + | - | + | + (a) | p_{-2} = (1 - R) (1 - Sp_X) Sp_Y (1 - Sp_Z) |
| - | + | - | - | - | p_{-3} = (1 - R) (1 - Sp_X) Sp_Y Sp_Z |
| - | - | + | + | + (a) | p_{-4} = (1 - R) Sp_X (1 - Sp_Y) (1 - Sp_Z) |
| - | - | + | - | - | p_{-5} = (1 - R) Sp_X (1 - Sp_Y) Sp_Z |
| - | - | - |   | - | p_{-6} = (1 - R) Sp_X Sp_Y |

(a) Incorrect classification.

Table 5. DA test results.

| Candidate test | Comparison method | Confirmatory test | No. of specimens |
| + | + |   | a |
| + | - | + | b1 |
| + | - | - | b2 |
| - | + | + | c1 |
| - | + | - | c2 |
| - | - |   | d |

Table 6. Expected DA estimates for analytical sensitivity, specificity, PPV, and NPV for various true sensitivities, specificities, and prevalence rates when the comparison test is 90% sensitive and specific and the confirmatory test is 95% sensitive and specific.

Expected DA analytical sensitivity, specificity (%) at prevalence rate of:

| True analytical sensitivity, specificity (%) | 25% | 10% | 1% |
| 100, 100 | 98.5, 99.8 | 95.7, 99.9 | 66.8, 100 |
| 90, 90 | 90.5, 91.2 | 89.1, 91.3 | 81.4, 91.3 |
| 80, 80 | 82.8, 82.4 | 83.6, 82.3 | 86.6, 82.3 |
| 70, 70 | 75.6, 73.4 | 79.1, 73.2 | 89.2, 73.1 |

True PPV, NPV / Expected DA PPV, NPV (%) at prevalence rate of:

| True analytical sensitivity, specificity (%) | 25% | 10% | 1% |
| 100, 100 | 100, 100 / 99.5, 99.5 | 100, 100 / 99.5, 99.5 | 100, 100 / 99.5, 99.5 |
| 90, 90 | 75.0, 96.4 / 78.2, 96.4 | 50.0, 98.8 / 57.0, 98.5 | 8.3, 99.9 / 21.6, 99.4 |
| 80, 80 | 57.1, 92.3 / 63.0, 93.0 | 30.8, 97.3 / 40.7, 97.5 | 3.9, 99.7 / 17.8, 99.3 |
| 70, 70 | 43.8, 87.5 / 51.7, 88.9 | 20.6, 95.5 / 32.0, 95.7 | 2.3, 99.6 / 16.5, 99.1 |

Table 7. Expected estimates (and 95% confidence intervals) for the performance parameters of a test with true analytical sensitivity and specificity of 80%.

|   | True value | Method I (b) | Method II (c) | Method III (d) |
| No. of tests performed |   | 8000 | 9040 | 11440 |
| Specimen status: Abnormal | 1000 | 1100 | 1066 | 870 |
| Specimen status: Normal | 3000 | 2900 | 2934 | 2570 |
| Analytical sensitivity, % | 80 | 71.8 (69.1, 74.5) | 82.8 (80.6, 85.1) | 79.0 (76.2, 81.7) |
| Analytical specificity, % | 80 | 79.0 (77.5, 80.5) | 83.3 (82.0, 84.7) | 79.9 (78.3, 81.4) |
| PPV, % | 57.1 | 56.4 (53.8, 59.0) | 63.1 (60.5, 65.6) | 57.1 (54.2, 59.9) |
| NPV, % | 92.3 | 88.1 (86.8, 89.3) | 93.0 (92.0, 94.0) | 91.8 (90.6, 93.0) |
| Total bias for all four estimates, % |   | -14.1 | 12.8 | -1.6 |

(a) Test A has analytical sensitivity and specificity of 90%; test B has analytical sensitivity and specificity of 95%.
(b) Method I estimates the status of the specimens by using test B.
(c) Method II estimates the status of the specimens by using the candidate test and test A, resolving discordancies with test B.
(d) Method III estimates the status of the specimens by using tests A and B and discards the discordancies.
Title Annotation: Test Utilization and Outcomes
Author: Lipman, Harvey B.; Astles, J. Rex
Date: Jan 1, 1998