
Stronger Together: Aggregated Z-values of Traditional Quality Control Measurements and Patient Medians Improve Detection of Biases.

Internal quality control (QC) [4] ensures analytical quality in clinical chemistry. To prevent the release of erroneous results and ultimately avert patient harm, medical laboratories allocate considerable human and financial resources to these procedures. In >60 years, QC has progressed from simple control charts (1) to statistically planned QC strategies (2). Based on quality goals for the individual laboratory test (3, 4) and actual performance (5), appropriate rules can be selected from a pool of established procedures. Groundbreaking work in this area has been contributed by Westgard, who developed rules to evaluate multiple control measurements together (6-8). Other parameters to adjust the QC strategy to individual needs include the frequency of QC events (9) and triggers (e.g., lot changes) (10, 11).

Traditionally, QC of biochemical tests relies on the measurement of well-characterized control samples. The quantity of the analyte in these samples is determined using methods of a higher metrological order. Therefore, comparison of measured and assigned values can reveal systematic and random errors of the test. However, these materials suffer from several drawbacks. QC samples often constitute a considerable amount of the total price of an analysis (12). Moreover, the central assumption that QC material exhibits the same behavior as patient samples does not hold true in many cases (13). Miller et al. compared patient and QC samples with different reagent lots and found significant differences between sample types in 40.9% of cases (14). In a similar study, Cho et al. found a significant difference in only 7.8% (15). Despite the discrepancy regarding the frequency of differences, both studies illustrate serious limitations of QC material. A lack of commutability is also frequently observed in external quality assessment schemes that use samples similar to QC materials (16,17).

The test results of patient samples are increasingly being used for QC purposes (18-22). The true value of patient samples is unknown. However, if comparable and sufficiently large patient populations are repeatedly tested, a central location parameter (e.g., the median) of the distribution of measurements should remain constant within certain bounds. Calculated parameters do not generate costs for additional tests or materials. Because no sample is tested repeatedly, precision is hard to evaluate with patient samples. Moreover, the patient population cannot be controlled by the laboratory. Changes in the location parameter that result from changes in the patient population cannot be readily distinguished from changes caused by errors in the laboratory test. A systematic preanalytic error (e.g., during transport or centrifugation (23)) can influence many patient samples and shift the location parameter without changing control measurements.

This study provides estimates of the utility of daily patient medians for QC. Moreover, we introduce several algorithms that aggregate QC measurements and daily patient medians into a single parameter. All procedures were evaluated in simulations of laboratory tests and errors.



To avoid relying on analytical performance specifications such as the allowable total error ($TE_a$), we calculated the number of patient samples required for the standard error of the median to reach the same magnitude as the standard error of a single QC measurement. For measurements in which patient values follow a normal or lognormal distribution, the required number can be estimated with Eqs. 1 and 2, respectively:

$$n = \frac{\pi}{2} \times \frac{CV_t^2 + CV_a^2}{CV_a^2} \qquad (1)$$

[mathematical expression not reproducible] (2)

$\sigma_t$, $\mu$, and $CV_t$ are the standard deviation, mean, and coefficient of variation of the distribution of patient analyte true values, respectively. $\sigma_a$ and $CV_a$ express analytical imprecision as standard deviation and coefficient of variation, respectively (see Methods and Results in the Data Supplement that accompanies the online version of this article at


In this work, daily measurements of internal QC material and daily medians are considered as simple QC parameters. Control rules aggregate these parameters into a combined measure. All control rules presented here operate on Z-values that are calculated from a simple QC parameter X according to the following formula:

$$Z(X) = \frac{X - \mu_{\mathrm{stable}}}{\sigma_{\mathrm{stable}}} \qquad (3)$$

The standard deviation $\sigma_{\mathrm{stable}}$ and the mean $\mu_{\mathrm{stable}}$ can be calculated from previous measurements that are assumed to have been conducted under stable conditions. Parameters of control materials are often also provided by the manufacturer.
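As an illustration of Eq. 3, the following minimal Python sketch computes a Z-value for a daily median from a stable phase. The function name `z_value` and all numbers are hypothetical and are not taken from the study; the sample standard deviation is used here as one possible estimate of the stable-phase spread.

```python
from statistics import mean, median, stdev

def z_value(x, stable_values):
    """Z-value of x relative to measurements from a stable phase (Eq. 3)."""
    mu_stable = mean(stable_values)
    sigma_stable = stdev(stable_values)  # sample SD; requires at least 2 values
    return (x - mu_stable) / sigma_stable

# Hypothetical daily medians observed during a stable phase
stable_medians = [3.46, 3.48, 3.47, 3.49, 3.45]

# Hypothetical patient results of a new day; their median is the simple QC parameter
todays_results = [3.1, 3.6, 3.9, 3.5, 3.7]
z_today = z_value(median(todays_results), stable_medians)
```

The same function applies unchanged to an internal QC measurement, with previous control measurements (or manufacturer-provided parameters) defining the stable phase.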


Daily Z-values of simple QC parameters (internal QC measurements or daily medians) were combined with the following algorithm:
   IF (at least n = 2 Z-values are positive) {
     pos_z = Minimum of the n = 2 largest Z-values
   } ELSE {
     pos_z = 0
   }
   IF (at least n = 2 Z-values are negative) {
     neg_z = Maximum of the n = 2 smallest Z-values
   } ELSE {
     neg_z = 0
   }
   RETURN maximum of absolute values of pos_z and neg_z

To the best of our knowledge, this algorithm has not been described before. For Z-values from 2 control measurements as input, it closely resembles "Westgard rules" of the form $2_{2S}$ or $2_{3S}$, which are often applied to detect biases. The innovation of this approach is that it avoids specifying a threshold value for the number of standard deviations away from the mean. Therefore, the general classification ability of this type of combination can be evaluated. When Z-values from 3 simple control parameters are included but n = 2, this algorithm resembles "Westgard rules" of the form 2 of $3_{2S}$ or 2 of $3_{3S}$. We refer to this type of control rule as the "Westgard-like algorithm" and abbreviate it as "west" if daily patient medians are included and as "west.QC" if not.
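The pseudocode above can be sketched in Python as follows. This is an illustrative transcription, not the implementation used in "rSimLab"; the function name `westgard_like` is hypothetical. The "minimum of the n largest" positive Z-values is simply the n-th largest positive value, and likewise for the negative side.

```python
def westgard_like(z_values, n=2):
    """Westgard-like score for one day's Z-values.

    Returns the absolute value of the n-th most extreme Z-value on the
    positive or negative side, or 0 if neither side has n values.
    """
    positive = sorted((z for z in z_values if z > 0), reverse=True)
    negative = sorted(z for z in z_values if z < 0)
    # n-th largest positive value = minimum of the n largest Z-values
    pos_z = positive[n - 1] if len(positive) >= n else 0.0
    # n-th smallest negative value = maximum of the n smallest Z-values
    neg_z = negative[n - 1] if len(negative) >= n else 0.0
    return max(abs(pos_z), abs(neg_z))
```

Comparing the returned score with a cutoff of 2 or 3 corresponds to the threshold-based rules mentioned in the text; leaving the cutoff open is what allows the score to be evaluated with a threshold-free metric.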


In addition, unweighted Z-values were aggregated using Stouffer's method (24):

$$Z_s = \frac{\sum_{i=1}^{n} Z_i}{\sqrt{n}} \qquad (4)$$

A weighted method aggregates the individual Z-values as follows:

$$Z_W = \frac{\sum_{i=1}^{n} w_i Z_i}{\sqrt{\sum_{i=1}^{n} w_i^2}} \qquad (5)$$

Different weights $w_i$, based on sample size, effect size, or estimated standard error, have been proposed (25). Here, the probability that each simple QC parameter detects an out-of-control situation is used as its weight (see Methods and Results in the online Data Supplement). We refer to these methods as weighted and unweighted aggregation of Z-values. The methods that include patient medians are abbreviated "zWAggr" and "zAggr," whereas methods that aggregate only internal QC measurements are abbreviated as "zWAggr.QC" and "zAggr.QC."
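Eqs. 4 and 5 translate directly into code. The sketch below is a minimal Python illustration; the function names are hypothetical, and the weights passed to `weighted_stouffer` are placeholders for the detection probabilities derived in the Data Supplement, which are not reproduced here.

```python
from math import sqrt

def stouffer(z_values):
    """Unweighted aggregation of Z-values (Eq. 4)."""
    return sum(z_values) / sqrt(len(z_values))

def weighted_stouffer(z_values, weights):
    """Weighted aggregation of Z-values (Eq. 5)."""
    if len(z_values) != len(weights):
        raise ValueError("z_values and weights must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_values))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator
```

With equal weights, Eq. 5 reduces to Eq. 4, which provides a quick consistency check for any implementation.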


An out-of-control situation can be caused by a bias shifting all QC measurements by the equivalent of 2 Z-values (Fig. 1A). Other biases might affect Z-values unequally (Fig. 1B).

$$\Delta Z = \left| Z(d_1) - Z(d_2) \right| \qquad (6)$$

Eq. 6 expresses the difference between the shifts at QC measurements 1 and 2, normalized to Z-values (Fig. 1C). To allow comparison of the Westgard-like algorithm and the unweighted aggregation of Z-values, simplified conditions were assumed. In this simulation, the mean of the shifts $Z(d_1)$ and $Z(d_2)$ was kept at 2 Z-values, but $\Delta Z$ was increased from 0 to 3 in steps of 0.01. In each step, 500 pairs of Z-values were generated for the stable and for the out-of-control situation. The areas under the receiver operating characteristic curves (AUC) of both algorithms were subsequently calculated.
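A simulation of this kind can be sketched in a few lines of Python. The generating model below is an assumption, not the authors' code: Z-values are drawn from unit-variance Gaussians, the two shifts are placed symmetrically around the mean shift, and the absolute value of the aggregated Z-value is used as the score. All function names are hypothetical, and the resulting AUC values need not match those reported in the article.

```python
import random
from math import sqrt

def westgard_like(z_values, n=2):
    """Westgard-like score: n-th most extreme Z-value on either side, else 0."""
    pos = sorted((z for z in z_values if z > 0), reverse=True)
    neg = sorted(z for z in z_values if z < 0)
    pos_z = pos[n - 1] if len(pos) >= n else 0.0
    neg_z = neg[n - 1] if len(neg) >= n else 0.0
    return max(abs(pos_z), abs(neg_z))

def stouffer_abs(z_values):
    """Absolute value of the unweighted aggregated Z-value (Eq. 4)."""
    return abs(sum(z_values) / sqrt(len(z_values)))

def auc(stable_scores, ooc_scores):
    """Probability that an out-of-control score exceeds a stable one (ties count 0.5)."""
    wins = 0.0
    for o in ooc_scores:
        for s in stable_scores:
            if o > s:
                wins += 1.0
            elif o == s:
                wins += 0.5
    return wins / (len(stable_scores) * len(ooc_scores))

def simulate(delta_z, pairs=500, mean_shift=2.0, rng=None):
    """AUC of both rules for one value of delta_z (assumed Gaussian Z-values)."""
    rng = rng or random.Random(0)
    d1 = mean_shift + delta_z / 2  # shift at QC measurement 1
    d2 = mean_shift - delta_z / 2  # shift at QC measurement 2
    stable = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(pairs)]
    ooc = [(rng.gauss(d1, 1), rng.gauss(d2, 1)) for _ in range(pairs)]
    rules = {"west.QC": westgard_like, "zAggr.QC": stouffer_abs}
    return {name: auc([rule(p) for p in stable], [rule(p) for p in ooc])
            for name, rule in rules.items()}

if __name__ == "__main__":
    for delta_z in (0.0, 1.5, 3.0):
        print(delta_z, simulate(delta_z))
```

Under these assumptions the sum of the two shifts, and therefore the distribution of the aggregated Z-value, does not depend on $\Delta Z$, whereas the Westgard-like score is driven by the less affected measurement.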


We have created a dedicated software package "rSimLab" implemented in the "R" programming language (26) to simulate measurements in laboratory medicine. "rSimLab" is available free of charge at https:// The design principle was to initially generate patient and QC samples with known true values as described in the following paragraph. Measurements of these samples were then simulated using characteristics of the respective assays (Fig. 2).

Daily measurements were simulated for 5 analytes: albumin, hemoglobin A1c (HbA1c), testosterone, troponin I, and vitamin D3 (Table 1). Together, these parameters represent a broad spectrum of analytical properties relevant for QC. The daily number of measurements followed a normal distribution. For each measurement, a true value was drawn from a normal, lognormal, or bimodal distribution (27). The center and spread of each analyte were modeled to resemble measurements stored in our laboratory information system at the university hospital of the Technische Universität München. Seasonal trends were simulated by changing the center of the distribution. At least 2 QC measurements with a fixed true value were simulated for each day.

$$\sigma_c = \sqrt{\alpha^2 + (c\beta)^2} \qquad (7)$$

Precision was specified using the so-called characteristic function (Eq. 7), with $c$ being the concentration and $\sigma_c$ being the standard deviation at this concentration (28). This function models a constant imprecision close to the limit of detection and an imprecision relative to the measured value at higher values (see Methods and Results and Fig. 1 in the online Data Supplement).
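The behavior of Eq. 7 at both ends of the measurement range can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the $\alpha$ and $\beta$ rows of Table 1 are the parameters of this function, using the albumin column as the example.

```python
from math import sqrt

def characteristic_sd(c, alpha, beta):
    """Standard deviation at concentration c according to Eq. 7."""
    return sqrt(alpha ** 2 + (c * beta) ** 2)

# Albumin settings from Table 1 (assumed to be the Eq. 7 parameters)
alpha, beta = 0.067, 0.031

sd_near_zero = characteristic_sd(0.0, alpha, beta)   # dominated by alpha
sd_at_center = characteristic_sd(3.474, alpha, beta)  # alpha and beta both contribute
cv_at_center = sd_at_center / 3.474
```

Close to zero the standard deviation equals $\alpha$, while at high concentrations the coefficient of variation approaches $\beta$.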


Levels of QC measurements were chosen similar to those in our laboratory with at least 1 normal and 1 pathological control. Controls were measured only once daily. No bracketing was used, as is common practice in German laboratories. Medians were calculated daily from a varying number of samples.


Bias in clinical chemistry can have various causes, such as operator errors or unstable reagent lots (13, 29, 30). In this simulation, relative and absolute biases were modeled. They increased steadily over time or occurred abruptly. The size of biases was adjusted to exceed out-of-control specifications on approximately 5% of all days (see Fig. 3; see Methods and Results in the online Data Supplement).


To test the robustness of QC procedures, several disturbances were simulated that affect the simple QC parameters but not the patient sample measurements. Internal QC measurements were disturbed by a constant bias, by a relative bias, or by increased imprecision to simulate a lack of commutability (14, 15, 31). Patient medians were disturbed by a constant increase in all patient true values or by a removal of one-third of all samples to model changes in the patient population (see Methods and Results in the online Data Supplement).


Each run of a simulation started with a stable phase that was used to derive mean and standard deviation for the calculation of Z-values in control rules. Decision thresholds were determined in a first error-prone phase with simulated biases. The performance of simple QC parameters, of the Westgard-like algorithm, of the weighted, and of the unweighted aggregation of Z-values, was evaluated in an additional independent error-prone phase. All control rules were investigated using internal QC measurements alone and combining QC measurements and patient medians. Finally, the same control rules were tested in an error-prone phase with disturbances.

In each phase, 730 days were simulated to accommodate all simulated biases. For each analyte, 200 independent simulation runs were conducted to detect a difference of 0.01 in AUC values with a power of at least 95%.

To make this simulation reproducible (32), the full code is provided in a Code file in the online Data Supplement.


The German RiliBAEK guideline specifies a "permissible relative deviation" and the "applicable concentration intervals" for the most important analytes (33). The permissible relative deviations were treated as $TE_a$. Values outside the applicable concentration interval were not considered for performance evaluation. A day was regarded as out-of-control if >5% of patient measurements exceeded this quality requirement. Other ratios were tested in a sensitivity analysis (see Methods and Results and Fig. 2 in the online Data Supplement).
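The classification of a simulated day can be sketched as follows. This is a hedged illustration, not the study code: it assumes that the deviation is computed relative to the known true value and that the concentration interval is applied to the true value. The function name and arguments are hypothetical.

```python
def is_out_of_control(true_values, measured_values, te_a, interval, max_fraction=0.05):
    """Return True if more than max_fraction of considered results exceed te_a.

    te_a is the allowable total error as a fraction (e.g., 0.125 for 12.5%);
    interval is the applicable concentration interval (low, high).
    """
    low, high = interval
    # Assumption: results whose true value lies outside the interval are ignored
    considered = [(t, m) for t, m in zip(true_values, measured_values)
                  if low <= t <= high]
    if not considered:
        return False
    exceeding = sum(1 for t, m in considered if abs(m - t) / t > te_a)
    return exceeding / len(considered) > max_fraction

# Hypothetical albumin day: 20 samples, 2 of them measured 25% too high
true_values = [4.0] * 20
measured_values = [4.0] * 18 + [5.0, 5.0]
day_flag = is_out_of_control(true_values, measured_values, te_a=0.125, interval=(2, 7))
```

Changing `max_fraction` corresponds to the other ratios explored in the sensitivity analysis.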

The AUC metric was used to express the ability to correctly classify out-of-control days regardless of thresholds (34). Concrete decision limits were determined based on maximum Youden Index and on 90% sensitivity (often called "probability of error detection") (8) in an undisturbed, error-prone phase (35). For both thresholds, sensitivity, specificity, and balanced accuracy were calculated in independent error-prone simulation phases with and without disturbances.
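For readers unfamiliar with these metrics, the sketch below shows one simple way to derive a decision limit from the maximum Youden Index and to compute balanced accuracy at that limit. It is a plain-Python stand-in for illustration and not the R tooling cited in the text; function names and example data are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels[i] is True for an out-of-control day; a day is flagged
    when its score is >= threshold.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both stable and out-of-control days")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens = sum(1 for s, l in zip(scores, labels) if l and s >= t) / positives
        spec = sum(1 for s, l in zip(scores, labels) if not l and s < t) / negatives
        j = sens + spec - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j

def balanced_accuracy(scores, labels, threshold):
    """Mean of sensitivity and specificity at the given threshold."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    sens = sum(1 for s, l in zip(scores, labels) if l and s >= threshold) / positives
    spec = sum(1 for s, l in zip(scores, labels) if not l and s < threshold) / negatives
    return (sens + spec) / 2.0

# Hypothetical control-rule scores for 8 days; the last-but-one three are out-of-control
scores = [0.2, 0.5, 0.9, 1.1, 2.4, 2.8, 3.1, 0.4]
labels = [False, False, False, False, True, True, True, False]
threshold, j_index = youden_threshold(scores, labels)
```

As in the study design, a threshold found this way should be applied to an independent phase when sensitivity, specificity, and balanced accuracy are reported.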

The Wilcoxon rank sum test was used to compare AUC values from different control rules. Differences with a P-value of <0.01 were considered statistically significant.



Of the simulated analytes, only albumin follows a normal distribution. The analytical $CV_a$ at the mean is 0.037. The ratio of the spread to the center of sample values, $\sigma_t/\mu_t = CV_t$, is 0.255. In total, 79 patient samples are sufficient for the standard error of the median to reach the same magnitude as the standard error of a QC measurement at the same analyte value.
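The albumin figure can be retraced from Eq. 1 and the Table 1 settings. The sketch below is a worked example under two assumptions that are ours, not the article's: $CV_a$ is obtained from the characteristic function (Eq. 7) at the center of the distribution, and the result is rounded up to the next whole sample. Function and variable names are hypothetical.

```python
from math import ceil, pi, sqrt

def required_samples_normal(cv_t, cv_a):
    """Number of patient samples according to Eq. 1 (normal distribution)."""
    return pi / 2 * (cv_t ** 2 + cv_a ** 2) / cv_a ** 2

# Albumin settings from Table 1
center, spread = 3.474, 0.886
alpha, beta = 0.067, 0.031

sigma_a = sqrt(alpha ** 2 + (center * beta) ** 2)  # Eq. 7 at the center
cv_a = sigma_a / center                            # about 0.037
cv_t = spread / center                             # about 0.255
n_albumin = required_samples_normal(cv_t, cv_a)
```

With unrounded inputs, `ceil(n_albumin)` gives 79; inserting the rounded values 0.037 and 0.255 yields a slightly smaller number, which shows how sensitive the estimate is to $CV_a$.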

HbA1c, troponin I, and vitamin D3 follow a lognormal distribution. Their $CV_a$, $CV_t$, and required number of samples are 0.034, 0.162, and 36 samples for HbA1c; 0.038, 4.1, and 178 samples for troponin I; and 0.056, 0.581, and 110 samples for vitamin D3.

Because of the bimodal distribution, the number of samples needed for testosterone could not be readily calculated.


Under simplified conditions, the Westgard-like algorithm and the unweighted aggregation of Z-values performed comparably when Z-values of QC measurements were affected equally (Fig. 1D). The AUC of the unweighted aggregation of Z-values and of the Westgard-like algorithm reaches 0.998 and 0.996, respectively.

When an out-of-control condition affected QC measurements unequally, their Z-values differed. With increasing difference $\Delta Z$ between these Z-values, the performance of the Westgard-like algorithm decreased, but that of the unweighted aggregation of Z-values did not. When the difference $\Delta Z$ equaled 3 Z-values, AUCs of the aggregation of Z-values and of the Westgard-like algorithm were 0.998 and 0.943, respectively.


The R package "rSimLab" is flexible enough to simulate normal, lognormal, or bimodal distributions, and seasonal variations of true values. The precision of the analytical methods was modeled using the characteristic function. The $TE_a$ was specified as a proportion of the concentration and did not change in line with imprecision. Consequently, the probability of impermissible errors varied over the measurement range.


After 200 runs for each simulation, mean AUC (mAUC) for patient medians considerably varied and ranged from 0.64-0.96 (Fig. 4, Table 1 in the online Data Supplement). The highest mAUC was reached in the simulation of albumin, characterized by a large number of samples and a low ratio between spread and center. For vitamin D3, seasonal variations increased the error of the median, but 200 samples, on average, were sufficient to reach an mAUC of 0.84. In the simulation of HbA1c measurements, the mAUC of medians of only 70 patient samples was higher than that of the poorly performing control measurements. The median of the testosterone simulation was located between the distributions of "male" and "female" patients and was not effective in detecting a bias (mAUC 0.64).

The Westgard-like algorithm and the aggregations of Z-values tended to have a higher mAUC than the simple QC parameters they combined. Only the median of albumin and the control measurement 1 of testosterone had a slightly higher (<0.01) mAUC than an aggregation of Z-values. The Westgard-like algorithm performed worse than the simple QC parameters it combined in simulations of albumin, testosterone, and troponin I. The inclusion of patient medians improved all control rules in the simulations of albumin, HbA1c, and vitamin D3, but only the Westgard-like algorithm in the simulations of troponin I. When control rules were combined in an additional experiment using sex-specific medians of testosterone, mAUCs were not markedly improved (see Methods and Results in the online Data Supplement, see Fig. 3 in the online Data Supplement). Aggregations of Z-values had higher mAUCs than the respective Westgard-like algorithm in all simulations. Weighting nominally enhanced the aggregation of Z-values in the simulation of HbA1c and troponin I. However, mAUC of the weighted and unweighted aggregation of Z-values differed by <0.02 in all simulations. The differences between AUCs of the Westgard-like algorithm and of the corresponding weighted or unweighted Z-value aggregations were statistically significant.

Performance evaluations using thresholds based on Youden Index or 90% sensitivity followed the pattern of AUC values (see Tables 2-7 in the online Data Supplement). The inclusion of medians improved balanced accuracies and specificities in simulations of albumin, HbA1c, and vitamin D3. Mean balanced accuracies and mean specificities of Z-value aggregations were always higher than those of the respective Westgard-like algorithms.


When internal QC measurements were disturbed (by increased imprecision, by a relative bias, or by a constant bias), the mAUC of single control measurements decreased by >20% (e.g., vitamin D3). In most simulations, a control rule that included medians had a higher mAUC than the same rule without this parameter. Inclusions of medians led to a slightly lower mAUC (decrease of mAUC < 0.01) only in simulations of testosterone and troponin I when disturbed by an increase in imprecision. In simulations of albumin and vitamin D3, the disturbance caused a much larger performance loss in rules without medians.

When medians were disturbed (by removal of one-third of all samples or by a constant increase of all patient true values), the mAUC of medians also decreased by up to 20% (e.g., albumin). A rule that included medians still exhibited a higher or an only slightly lower (decrease of mAUC < 0.03) mAUC than the same rule without medians (Fig. 5, see Table 1 in the online Data Supplement).

In the simulation of HbA1c measurements, the removal of one-third of all patients decreased all mAUC values. The reduced number of patients increased randomness, in that days were regarded as "stable" or "out-of-control" that would have been classified differently with more samples. This classification served as the basis for AUC calculations. As such, mAUC values decreased although QC measurements were not affected by the fewer samples directly (see Methods and Results in the online Data Supplement; Fig. 1 in the online Data Supplement).


QC in clinical laboratories has become increasingly sophisticated. Instead of "one-size-fits-all" approaches, individual QC strategies are developed on the basis of knowledge of the analytical methods and clinical needs. We investigated the circumstances under which patient medians may be useful and how these can be incorporated into control rules for the detection of biases.

We did not address the construction of a comprehensive QC strategy combining several control rules.

However, this work suggests that the aggregation of Z-values can be a new element of such "multirules." Thresholds for aggregated Z-values need to be defined on the basis of individual needs. A Z-value threshold corresponds to a significance level (e.g., a threshold of $\pm 1.96$ Z equals a significance level of $\alpha = 0.05$) (25). The inclusion of medians is particularly suited for retrograde QC strategies, in which patient results are held until a QC event has passed. Here, QC and patient samples measured under the same conditions can be evaluated together. Bracketing a fixed number of patient results for a QC event removes a source of uncertainty for patient medians. In anterograde strategies, in which a QC event has to be passed before measurements are started, patient medians are only available from the previous run and only persistent deviations can be detected.
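The correspondence between a Z-value threshold and a two-sided significance level can be verified with the standard normal distribution. The sketch below uses Python's standard library; the function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_alpha(z_threshold):
    """Two-sided significance level for a symmetric Z-value threshold."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_threshold)))

def z_threshold_for(alpha):
    """Symmetric Z-value threshold for a two-sided significance level."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

alpha_at_196 = two_sided_alpha(1.96)   # approximately 0.05
z_at_005 = z_threshold_for(0.05)       # approximately 1.96
```

This mapping presupposes that the aggregated Z-value follows a standard normal distribution under stable conditions, which holds for Eqs. 4 and 5 only if the individual Z-values are independent and standard normal.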

In this study, QC procedures were compared in an extensive, carefully parameterized simulation. Precision was modeled with characteristic functions to create higher imprecision close to the limits of detection. $TE_a$ was specified as a proportion of the concentration and did not change in line with imprecision. Consequently, the probability of impermissible errors was not constant over the range of measurements. This fact is often simplified in mathematical calculations of control rules. The AUC metric was used to evaluate the performance of QC procedures regardless of their decision thresholds.

The median of patient measurements can offer valuable information for QC if the spread of patient measurements in relation to their center is small. We have provided more exact estimates for the required numbers than the previously recommended 200 patient samples (18, 36) and confirmed the importance of the spread (37). Other location parameters besides the median have been proposed, including the mean after outlier removal with Tukey method, the average of normal values, and the mean of log-transformed values (19). All parameters calculated from patient measurements are always prone to be influenced by changes in the patient population. The more traditional use of QC materials also suffers from hard-to-control failures such as noncommutability. Patient medians should therefore supplement, but not replace, traditional QC measurements.

The proposed aggregation of Z-values has a superior discriminative ability compared with the algorithm resembling traditional Westgard single rules of the form $2_{xS}$ (x = 2, 2.5, ...). However, although a variety of errors and disturbances was simulated, these settings may not reflect the situation of an individual assay. When trying to select an algorithm for the detection of biases, their different mechanisms offer guidance. The Westgard-like algorithm combines individual Z-values with a "minimum threshold" strategy. For the $2_{2.5S}$ rule, 2 Z-values need to exceed the minimum boundary of 2.5 for the run to be flagged. If 1 Z-value is higher than the boundary by far but the other Z-value misses the boundary slightly, no signal is given. In contrast, the proposed aggregation of Z-values averages the results of simple QC parameters. As shown by our simulation under simplified conditions, this is especially beneficial when the deviation causing the out-of-control situation does not affect all simple QC parameters equally. The aggregation of Z-values can be nominally improved when the individual Z-values are weighted. Linnet compared a "mean rule," closely related to Stouffer's method, with traditional Westgard rules. Given the same type-I error, the mean rule was more powerful for detection of shifts of location than Westgard rules (38).
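The difference between the two mechanisms can be made concrete with a pair of hypothetical Z-values, one far above and one slightly below the boundary of 2.5. The sketch below is illustrative; the function names and the example values are not taken from the study.

```python
from math import sqrt

def rule_2_of_x_s(z_values, x=2.5):
    """Flag when at least 2 Z-values exceed x on the same side of the mean."""
    above = sum(1 for z in z_values if z > x)
    below = sum(1 for z in z_values if z < -x)
    return above >= 2 or below >= 2

def stouffer(z_values):
    """Unweighted aggregation of Z-values (Eq. 4)."""
    return sum(z_values) / sqrt(len(z_values))

z_pair = [4.0, 2.4]                 # hypothetical Z-values of 2 control measurements
flagged = rule_2_of_x_s(z_pair)     # False: 2.4 misses the boundary slightly
aggregated = stouffer(z_pair)       # about 4.53, far from 0
```

The threshold rule discards the strong signal from the first measurement, whereas the aggregated Z-value retains it.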

The robustness of control rules was tested when internal QCs were disturbed with a constant or relative bias or an increase in imprecision. Patient medians were influenced by a removal of one-third of all patients' values or a constant increase in patients' true values. When patient values remained unchanged, patient medians could mitigate the effect of failing QC measurements and vice versa. A confounding factor, such as inappropriate storage conditions or a change in the patient population, is much more likely to influence all QC samples or all patient measurements, but not both. Therefore, any rule that includes information from both sources provides a superior detection of biases.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contribution to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures or Potential Conflicts of Interest: No authors declared any potential conflicts of interest.

Acknowledgment: We would like to thank Christof Winter for fruitful discussions.


(1.) Levey S, Jennings ER. The use of control charts in the clinical laboratory. Am J Clin Pathol 1950; 20:1059-66.

(2.) Kinns H, Pitkin S, Housley D, Freedman DB. Internal quality control: best practice. J Clin Pathol 2013; 66: 1027-32.

(3.) Klee GG. Establishment of outcome-related analytic performance goals. Clin Chem 2010; 56:714-22.

(4.) Sandberg S, Fraser CG, Horvath AR, Jansen R, Jones G, Oosterhuis W, et al. Defining analytical performance specifications: consensus statement from the first strategic conference of the European Federation of Clinical Chemistry and Laboratory Medicine. Clin Chem Lab Med 2015; 53:833-5.

(5.) Linnet K, Boyd JC. Selection and analytical evaluation of methods-with statistical techniques. Tietz Textbook of Clinical Chemistry and Molecular Diagnostics 5th ed. St. Louis (MO): Elsevier Sci. Intl. 2012:7-48.

(6.) Westgard JO, Groth T, Aronsson T, Falk H, deVerdier CH. Performance characteristics of rules for internal quality control: probabilities for false rejection and error detection. Clin Chem 1977; 23:1857-67.

(7.) Westgard JO. Internal quality control: planning and implementation strategies. Ann Clin Biochem 2003; 40: 593-611.

(8.) Westgard JO. Statistical quality control procedures. Clin Lab Med 2013; 33:111-24.

(9.) Parvin CA. Assessing the impact of the frequency of quality control testing on the quality of reported patient results. Clin Chem 2008; 54:2049-54.

(10.) Cooper G, DeJonge N, Ehrmeyer S, Yundt-Pacheco J, Jansen R, Ricos C, Plebani M. Collective opinion paper on findings of the 2010 convocation of experts on laboratory quality. Clin Chem Lab Med 2011; 49:793-802.

(11.) Parvin CA, Gronowski AM. Effect of analytical run length on quality-control (QC) performance and the QC planning process. Clin Chem 1997; 43:2149-54.

(12.) Howanitz PJ, Tetrault GA, Steindel SJ. Clinical laboratory quality control: a costly process now out of control. Clin Chim Acta 1997; 260:163-74.

(13.) Thaler MA, Iakoubov R, Bietenbeck A, Luppa PB. Clinically relevant lot-to-lot reagent difference in a commercial immunoturbidimetric assay for glycated hemoglobin A1c. Clin Biochem 2015; 48:1167-70.

(14.) Miller WG, Erek A, Cunningham TD, Oladipo O, Scott MG, Johnson RE. Commutability limitations influence quality control results with different reagent lots. Clin Chem 2011; 57:76-83.

(15.) Cho MC, Kim SY, Jeong TD, Lee W, Chun S, Min WK. Statistical validation of reagent lot change in the clinical chemistry laboratory can confer insights on good clinical laboratory practice. Ann Clin Biochem 2014; 51: 688-94.

(16.) Steele BW, Wang E, Klee GG, Thienpont LM, Soldin SJ, Sokoll LJ, et al. Analytic bias of thyroid function tests: analysis of a College of American Pathologists fresh frozen serum pool by 3900 clinical laboratories. Arch Pathol Lab Med 2005; 129:310-7.

(17.) Kristensen GB, Rustad P, Berg JP, Aakre KM. Analytical bias exceeding desirable quality goal in 4 out of 5 common immunoassays: results of a native single serum sample external quality assessment program for cobalamin, folate, ferritin, thyroid-stimulating hormone, and free T4 analyses. Clin Chem 2016; 62:1255-63.

(18.) Wilson A, Roberts WL, Pavlov I, Fontenot J, Jackson B. Patient result median monitoring for clinical laboratory quality control. Clin Chim Acta 2011; 412:1441-6.

(19.) Fleming JK, Katayev A. Changing the paradigm of laboratory quality control through implementation of real-time test results monitoring: for patients by patients. Clin Biochem 2015; 48:508-13.

(20.) Jorgensen LM, Hansen SI, Petersen PH, Soletormos G. Median of patient results as a tool for assessment of analytical stability. Clin Chim Acta 2015; 446:186-91.

(21.) Housley D, Kearney E, English E, Smith N, Teal T, Mazurkiewicz J, Freedman DB. Audit of internal quality control practice and processes in the south-east of England and suggested regional standards. Ann Clin Biochem 2008; 45:135-9.

(22.) Goossens K, Van Uytfanghe K, Twomey PJ, Thienpont LM, Participating Laboratories. Monitoring laboratory data across manufacturers and laboratories--a prerequisite to make "big data" work. Clin Chim Acta 2015; 445:12-8.

(23.) Orton DJ, Gifford JL, Seiden-Long I, Khan A, de Koning L. Critically high plasma ammonia in an adolescent girl. Clin Chem 2016; 62:1565-8.

(24.) Stouffer SA, Suchman EA, DeVinney LC, Star SA, Williams Jr RM. The American Soldier. Vol. 1, Adjustment During Army Life. Princeton (NJ): Princeton University Press; 1949.

(25.) Zaykin DV. Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J Evol Biol 2011; 24:1836-41.

(26.) R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2016.

(27.) Haeckel R, Wosniok W. Observed, unknown distributions of clinical chemical quantities should be considered to be log-normal: a proposal. Clin Chem Lab Med 2010; 48:1393.

(28.) Thompson M. The characteristic function, a method-specific alternative to the Horwitz function. J AOAC Int 2012; 95:1803-6.

(29.) Algeciras-Schimnich A, Bruns DE, Boyd JC, Bryant SC, La Fortune KA, Grebe SK. Failure of current laboratory protocols to detect lot-to-lot reagent differences: findings and possible solutions. Clin Chem 2013; 59:1187-94.

(30.) Theodorsson E, Magnusson B, Leito I. Bias in clinical chemistry. Bioanalysis 2014; 6:2855-75.

(31.) Candas-Estebanez B, Cano-Corres R, Dot-Bach D, Valero-Politi J. Lack of commutability between a quality control material and plasma samples in a troponin I measurement system. Clin Chem Lab Med 2012; 50: 2237.

(32.) Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med 2006; 25:4279-92.

(33.) German Medical Association. Revision of the "guideline of the German Medical Association on quality assurance in medical laboratory examinations-RILIBAEK". J Lab Med 2015; 39:26-69.

(34.) Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: Visualizing classifier performance in R. Bioinformatics 2005; 21:3940-1.

(35.) Lopez-Raton M, Cadarso-Suarez C, Rodriguez-Alvarez MX, Gude-Sampedro F. Optimal Cutpoints: an R package for selecting optimal cutpoints in diagnostic tests. J Stat Software 2014; 61:1-36.

(36.) Lott JA, Smith DA, Mitchell LC, Moeschberger ML. Use of medians and "average of normal" of patients' data for assessment of long-term analytical stability. Clin Chem 1996; 42:888-92.

(37.) Cembrowski GS, Chandler EP, Westgard JO. Assessment of "average of normals" quality control procedures and guidelines for implementation. Am J Clin Pathol 1984; 81:492-9.

(38.) Linnet K. Mean and variance rules are more powerful or selective than quality-control rules based on individual values. Eur J Clin Chem Clin Biochem 1991; 29:417-24.

Andreas Bietenbeck, [1] * Markus A. Thaler, [1] Peter B. Luppa, [1] and Frank Klawonn [2,3]

[1] Institut für Klinische Chemie und Pathobiochemie, Klinikum rechts der Isar der Technischen Universität München, München, Germany; [2] Department of Biostatistics, Helmholtz Centre for Infection Research, Braunschweig, Germany; [3] Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, Germany.

* Address correspondence to this author at: Institut für Klinische Chemie und Pathobiochemie, Klinikum rechts der Isar der Technischen Universität München, Ismaninger Strasse 22, 81675 München, Germany. Fax +49-89-4140-4875; e-mail

Received December 12, 2016; accepted April 11, 2017.

Previously published online at DOI: 10.1373/clinchem.2016.269845

[4] Nonstandard abbreviations: QC, quality control; HbA1c, Hemoglobin A1c; TEa, allowable total error; CV, coefficient of variation; AUC, area under the receiver operating characteristic curve; mAUC, mean AUC; OOC, out-of-control.

Caption: Fig. 1. The Westgard-like algorithm and the aggregation of Z-values under simplified conditions.

Caption: Fig. 2. Steps of simulation.

Caption: Fig. 3. Example of biases in 1 simulation of albumin.

Caption: Fig. 4. AUC values of quality control rules and simple quality control parameters.

Caption: Fig. 5. AUC values of quality rules and simple quality control parameters from disturbed simulations.

Table 1. Settings for simulation of five analytes. (a)

                        Albumin                    HbA1c

Description             Many measurements per      Low sigma value; four
                        day; normal distribution   QC measurements
                        of true values

Measurements per        500 (50)                   70 (6)
day (SD)

Distribution of         Normal;                    Lognormal;
values                  center (b): 3.474          center: 5.504
                        spread: 0.886              spread: 0.892
                        (unit: g/dL)               (unit: %)

Allowable total error   12.5% [2-7]                10% [4.89-14.96]
[on interval]

  [alpha] (c)           0.067                      0.1
  [beta] (d)            0.031                      0.029

Quality controls
  QC_1                  3                          5
  QC_2                  6                          6
  QC_3                                             8
  QC_4                                             10

                        Testosterone               Troponin I

Description             Bimodal distribution of    Many measurements
                        true values based on       close to limit of
                        sex                        detection

Measurements per        Male: 50 (5)               120 (20)
day (SD)                Female: 30 (4)

Distribution of         Male: lognormal;           Lognormal;
values                  center: 3.505              center: 0.160
                        spread: 2.402              spread: 0.656
                        Female: lognormal;         (unit: ng/mL)
                        center: 0.366
                        spread: 0.532
                        (unit: ng/mL)

Allowable total error   20.5% [0.2-20]             20% [0.1-35]
[on interval]

  [alpha] (c)           0.008                      0.003
  [beta] (d)            0.034                      0.033

Quality controls
  QC_1                  3                          0.05
  QC_2                  8                          0.95

                        Vitamin D3

Description             Seasonal variation of
                        true values

Measurements per        200 (20)
day (SD)

Distribution of         Lognormal;
values                  center: 19.585 [+ or -]
                        seasonal variation;
                        spread: 11.382
                        (unit: ng/mL)

Allowable total error   25% [5-50]
[on interval]

  [alpha] (c)           0.5
  [beta] (d)            0.05

Quality controls
  QC_1                  10
  QC_2                  30

(a) [alpha] and [beta] are parameters of the characteristic function.

(b) Center and spread denote mean and standard deviation for normal
distribution. For lognormal distribution, respective values are also
provided on a non-logarithmized scale.

(c) [alpha] expresses constant imprecision at low concentration.

(d) [beta] expresses relative imprecision similar to the
coefficient of variation.
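
As an illustration of how the Table 1 settings could feed a simulation, the following minimal Python sketch draws one day of true HbA1c values. It rests on two assumptions that the table does not confirm: that the center and spread listed for lognormal analytes are the arithmetic mean and standard deviation on the non-logarithmized scale, and that the number of measurements per day is itself normally distributed. The function names lognormal_params and simulate_day are hypothetical and do not come from the article.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Convert an arithmetic mean and SD into (mu, sigma) of the underlying normal.

    Assumes `mean` and `sd` are given on the non-logarithmized scale.
    """
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def simulate_day(mean, sd, n_mean, n_sd, rng):
    """Draw one day of true values from a lognormal distribution.

    The daily count is drawn from a normal distribution (an assumption)
    and is rounded and forced to be at least 1.
    """
    mu, sigma = lognormal_params(mean, sd)
    n = max(1, round(rng.gauss(n_mean, n_sd)))
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
# HbA1c column of Table 1: center 5.504, spread 0.892, 70 (6) measurements per day
values = simulate_day(5.504, 0.892, 70, 6, rng)
print(len(values), round(statistics.median(values), 2))
```

The conversion uses the standard lognormal identities, so the simulated values reproduce the tabulated mean and standard deviation only in expectation; the median of a single simulated day will vary from run to run.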
COPYRIGHT 2017 American Association for Clinical Chemistry, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Title Annotation:Laboratory Management
Author:Bietenbeck, Andreas; Thaler, Markus A.; Luppa, Peter B.; Klawonn, Frank
Publication:Clinical Chemistry
Article Type:Report
Date:Aug 1, 2017