Time to pay attention to reagent and calibrator lots for proficiency testing.
PT samples are intended to mimic patient samples and thus to provide information that relates to medical decisions for patient samples. There is consensus that PT using commutable samples reflects performance for patient samples. When noncommutable samples are used, artifacts may occur that typically limit assessment of an individual laboratory's result to comparison with a peer group of similar measurement procedures, and that prevent using the mean or median results from different measurement procedures to assess agreement among results for patient samples (3).
In this issue of Clinical Chemistry, Stavelin et al. (4) demonstrate that different performance by different reagent lots or batches represented in a PT event can lead to erroneous decisions regarding the performance of individual laboratories or of measurement procedures. Furthermore, reagent lot-specific effects can influence interpretation when commutable or noncommutable samples are used. The Stavelin et al. report was based on point-of-care devices for which each reagent lot was calibrated by the manufacturer. However, the observations are generalizable to measurement procedures that use separate reagents and calibrators.
Consider the situation when commutable PT samples are used. The target value may be established by a reference measurement procedure, in which case a miscalibrated measurement procedure can cause participant failure. Subsequent follow-up by the producer of the measurement procedure can improve the calibration process to prevent recurrence. However, if multiple lots of the reagent or calibrator are in use, it may be impossible to determine from the aggregate results for a measurement procedure whether a specific reagent and/or calibrator lot was responsible unless the specific lots used were reported along with the PT sample results.
When a target value for a commutable PT sample is established as an all-results mean, additional complications can occur. An all-results mean will be influenced by erroneous calibrations. As an example, consider that one-half of participants' measurement procedures were miscalibrated with a 30% positive bias. The all-results mean will fall midway between the 2 groups, approximately 15% away from the results of both the correctly and the incorrectly calibrated participants, possibly causing a large failure rate but obscuring whether a given participant was producing correct or incorrect results. A real situation will not be so simple, but a similar mechanism could be responsible for inappropriate participant grading and for failure to identify a measurement procedure that had a calibration problem, because lot-specific errors would be confounded with a larger number of other results that could mask the problem. If reagent and calibrator lot information were available, the PT provider could assess whether the differences in results were associated with a particular reagent or calibrator lot, grade participants appropriately, and inform the producer of a problem measurement procedure that calibration issues were observed and corrective action was needed.
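The arithmetic of this example can be checked with a minimal sketch. The true value of 100, the function name, and the dictionary keys are illustrative assumptions, not part of the original discussion. The sketch shows that the mean sits 15% above the true value, and that, expressed relative to the mean itself, each group deviates by about 13% in opposite directions.

```python
def all_results_mean_summary(true_value, bias_fraction, miscalibrated_share):
    """Summarize an all-results mean when some participants share a calibration bias.

    All names and inputs here are hypothetical, chosen only to illustrate the example.
    """
    correct = true_value                        # result from a correctly calibrated procedure
    biased = true_value * (1 + bias_fraction)   # result from a miscalibrated procedure
    mean = (1 - miscalibrated_share) * correct + miscalibrated_share * biased
    return {
        "mean": mean,
        # Position of the mean relative to the true value, in percent.
        "mean_vs_true_pct": 100 * (mean - true_value) / true_value,
        # Deviation of each group from the all-results mean, in percent of the mean.
        "correct_vs_mean_pct": 100 * (correct - mean) / mean,
        "biased_vs_mean_pct": 100 * (biased - mean) / mean,
    }


# One-half of participants miscalibrated by a 30% positive bias.
summary = all_results_mean_summary(true_value=100.0, bias_fraction=0.30,
                                   miscalibrated_share=0.5)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Because the 2 groups are equally distant from the mean, the deviation alone cannot indicate which group is correct; only lot information or a reference value can resolve that.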
When noncommutable PT samples are used, the target value is established as the all-results mean for a peer group of similar measurement procedures. Peer groups typically use the same reagent and calibrator formulations from the same in vitro diagnostics manufacturer, as well as the same instrument or instrument family from 1 manufacturer. Reagent lots may be calibrated by the manufacturer or calibrated using several different lots of calibrator. Stavelin et al. (4) clearly show that PT samples can be noncommutable with different reagent lots within a homogeneous peer group. The common assumption that PT materials are commutable within a given peer group can be incorrect unless only a single reagent lot is used by all participants in that peer group. Peer groups that include laboratory-developed measurement procedures may be too diverse and complex to distinguish lot-specific deficiencies from PT results.
Miller et al. (5) demonstrated that 41% of serum-based QC materials had noncommutability bias with different reagent lots for a large number of analytes and measurement procedures. Because noncommutable serum-based PT samples are prepared very similarly to QC materials, PT samples are likely to have a fairly high occurrence of reagent lot-specific noncommutability bias. Reagent lot-specific noncommutability bias may manifest itself as a large SD not reflecting the interlaboratory performance for real patient samples.
Erroneous conclusions regarding an individual participant's performance in a PT event can be made when several reagent lots are used by participants and there are different noncommutability biases present for different reagent lots. In this situation, the all-results mean used as the target value will bear no valid relationship to the expected values that each participant should recover. Consequently, some participants may receive unacceptable scores when results for patient samples are correct, whereas other participants may receive acceptable scores when results for patient samples are incorrect. If the reagent lots were known to the PT provider, the participant results could be evaluated against reagent lot-specific target values and a more appropriate assessment of performance could be made. If no reagent lot-specific biases were observed, then the participant results could be aggregated into a single peer group for evaluation. Because accreditation requirements are frequently based in part on performance in PT programs, it is important that laboratories are graded using criteria that take into consideration the limitations of noncommutable PT samples, including those caused by different reagent lots.
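The difference between pooled and reagent lot-specific evaluation can be illustrated with a small sketch. Everything in it is hypothetical: the participant names, lot labels, result values, the 10% acceptance limit, and the use of a simple mean as the target value are assumptions made for illustration and do not describe any PT provider's actual grading scheme.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results for one noncommutable PT sample: (participant, reagent lot, value).
# Lot B is assumed to carry a noncommutability bias of roughly +20% on this sample.
results = [
    ("lab1", "lotA", 100.0),
    ("lab2", "lotA", 102.0),
    ("lab3", "lotA", 98.0),
    ("lab4", "lotB", 120.0),
    ("lab5", "lotB", 122.0),
    ("lab6", "lotB", 118.0),
    ("lab7", "lotB", 121.0),
    ("lab8", "lotB", 104.0),  # deviates from the other lot B participants
]

LIMIT_PCT = 10.0  # assumed acceptance limit: percent deviation from the target value


def grade(results, by_lot):
    """Return {participant: acceptable?} using a pooled or lot-specific mean as target."""
    groups = defaultdict(list)
    for _, lot, value in results:
        groups[lot if by_lot else "all"].append(value)
    targets = {key: mean(values) for key, values in groups.items()}

    grades = {}
    for lab, lot, value in results:
        target = targets[lot if by_lot else "all"]
        deviation_pct = 100 * (value - target) / target
        grades[lab] = abs(deviation_pct) <= LIMIT_PCT
    return grades


pooled = grade(results, by_lot=False)
lot_specific = grade(results, by_lot=True)
for lab, _, _ in results:
    print(lab, "pooled:", pooled[lab], "lot-specific:", lot_specific[lab])
```

With these made-up numbers, the pooled target flags lab3 and lab5, which agree well with others using the same lot, and accepts lab8, which is the only participant that departs from its own lot group. Grading against lot-specific targets reverses all 3 outcomes. If the lot means did not differ, the same function with pooled grouping would be the natural choice.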
The practice of aggregating peer groups to obtain adequate numbers for grading is not scientifically defensible unless there is evidence that the results are not confounded by noncommutability, which can be different for every PT event because reagent lots are likely to be different. Aggregating peer groups may give an acceptable grade to laboratories that are in fact poor performers and give unacceptable grades to laboratories that are good performers.
Because noncommutability is a reagent lot-specific limitation, calibrator lots do not influence noncommutability bias. Faulty calibrator lots will affect results for noncommutable PT samples in the same manner as results for commutable PT samples. However, it is not possible to identify PT results that are influenced by defective calibrators unless the noncommutability bias can be parsed from the data, further emphasizing the importance of reagent and calibrator lot reporting in PT programs.
Although the limitations of noncommutable PT samples have been recognized for >2 decades (3, 6, 7), erroneous recommendations continue to be made to use PT results from noncommutable samples to estimate the bias of a measurement procedure and to make judgments about the agreement of results among different measurement procedures. We now have evidence that noncommutability of PT materials can be different for different reagent lots, further emphasizing that this insidious artifact needs to be recognized to prevent incorrect inferences about measurement procedure performance based on flawed PT data from noncommutable samples. PT providers should strive to use commutable samples whenever possible. When commutable PT samples are not possible, the limitations of the data should be recognized, incorrect inferences should be avoided, and participant grading should be based on scientifically defensible principles.
PT providers can improve the value of their programs by collecting reagent and calibrator lot information to assist in more effectively verifying acceptable performance and influencing improvement in laboratory testing. In particular, participant grading will better reflect actual performance when artifacts from reagent or calibrator lot influences are identified and removed. Introducing tools to more appropriately assess calibration lot-to-lot differences in results will contribute to recognizing the magnitude of this poorly examined source of variability and lead to improvements in calibration consistency, which remains a challenge for laboratory medicine.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:
Employment or Leadership: W.G. Miller, Clinical Chemistry, AACC.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: None declared.
Expert Testimony: None declared.
Patents: None declared.
(1.) Miller WG, Myers GL, Rej R. Why commutability matters. Clin Chem 2006;52:553-4.
(2.) Miller WG, Myers GL. Commutability still matters. Clin Chem 2013;59:1291-3.
(3.) Miller WG, Jones GRD, Horowitz GL, Weykamp C. Proficiency testing/external quality assessment: current challenges and future directions. Clin Chem 2011;57:1670-80.
(4.) Stavelin A, Riksheim BO, Christensen NG, Sandberg S. The importance of reagent lot registration in external quality assurance/proficiency testing schemes. Clin Chem 2016;62:708-15.
(5.) Miller WG, Erek A, Cunningham TD, Oladipo O, Scott MG, Johnson RE. Commutability limitations influence quality control results with different reagent lots. Clin Chem 2011;57:76-83.
(6.) Miller WG, Kaufman H, McLendon WW, eds. College of American Pathologists Conference XXIII: Matrix effects and accuracy assessment in clinical chemistry, June 1992. Arch Pathol Lab Med 1993;117:343-436.
(7.) Ross JW, Miller WG, Myers GL, Praestgaard J. The accuracy of laboratory measurements in clinical chemistry: a study of eleven routine analytes in the College of American Pathologists Chemistry Survey with fresh frozen serum, definitive methods and reference methods. Arch Pathol Lab Med 1998;122:587-608.
W. Greg Miller *
Department of Pathology, Virginia Commonwealth University, Richmond, VA.
* Address correspondence to this author at: 403 N 13th St, Room 501, Richmond, VA 23298. Fax 804-828-0353; e-mail email@example.com.
Received February 26, 2016; accepted March 1, 2016.
Previously published online at DOI: 10.1373/clinchem.2016.255802
Author: Miller, W. Greg
Date: May 1, 2016