
Approach to maintaining comparability of biochemical data during long-term clinical trials.

In long-term medical trials it is essential that measurements of clinical and biochemical variables collected throughout the study are comparable. Measurements in long-term studies may be affected when analytical methods are changed, conventions on specimen storage revised, calibration procedures updated, and quality-control (QC) systems replaced. Secular changes may also occur in the characteristics of the underlying population [1]. Procedures are needed that can monitor the accuracy and performance of different analytical methods, decide when a substantial change in measurement has occurred, determine whether an adjustment to the data is required, choose which, if any, mathematical transformation should be applied, and verify the adequacy of these procedures.

The procedures we have developed are illustrated with data for glycohemoglobin and plasma triglyceride obtained over 15 years from the UK Prospective Diabetes Study (UKPDS) (2) [2]. The UKPDS is a randomized intervention trial designed to investigate whether intensive, as opposed to conventional, glycemic control therapy can reduce morbidity and mortality in patients with type 2 diabetes. This trial was approved by the Central Oxford Research Ethics Committee and fulfills the criteria of the Helsinki Declaration, 1975 and 1983.


The UKPDS recruited 5102 newly diagnosed type 2 diabetic patients in 23 UK centers over 15 years from 1977 to 1991 [2] and is ongoing. Blood and urine samples were obtained regularly for biochemical analysis in a central laboratory [3] where sample dates, analytical methods, and assay results were recorded.

Biochemistry methodology for 15 UKPDS analytes has previously been described. Glycohemoglobin was measured in three ways. From 1978 to 1984 it was measured as hemoglobin A1c (HbA1c) by isoelectric focusing (IEF). From 1984 to 1989 it was measured as HbA1 by electroendosmosis (EEO) [4]. From 1989 until 1996 it was measured as HbA1c by HPLC with a Bio-Rad Diamat Automated Glycosylated Hemoglobin Analyzer (Bio-Rad Laboratories) [3]. Fasting triglyceride was measured between 1980 and 1985 with an enzymatic UV kit on a Pye UNICAM AURA spectrophotometer (ATI Unicam). From 1988 until the present it has been measured with an enzymatic colorimetric kit (GPO-PAP, Boehringer Mannheim) on a Cobas FARA analyzer (Roche Diagnostica), with no correction for free glycerol.


Accepted internal laboratory quality-control and external quality-assurance procedures have been used throughout the UKPDS trial. To maintain comparability of data when improved assays were introduced, formal laboratory comparisons of biochemical methods were undertaken, and statistical overview techniques were used to detect unforeseen shifts (Table 1).


For each laboratory assay run, commercially available QC sera were measured at low, medium, and high concentrations. From 1986, these results were entered into an in-house computer program (QSTAT), which determined whether the run was acceptable by modified Westgard rules [5]. These rules used the deviation of sequential QC sera results from the mean of 30 run-in measurements for each QC concentration. Reassay was necessary if a single QC value lay more than 3 SD from the mean, or if two QC values lay more than 2 SD from the mean (Fig. 1).
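The accept/reject decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QSTAT program. The function names `run_in_limits` and `reassay_needed` are hypothetical, and the run-in SD is taken to be the sample SD. The rule "two QC values more than 2 SD from the mean" is read literally as any two results in the same run, on either side of the mean. By contrast, Westgard's classical 2-2s rule also requires both values to fall on the same side of the mean.

```python
from statistics import mean, stdev

def run_in_limits(run_in_values):
    """Return (mean, SD) from the run-in measurements of one QC level."""
    if len(run_in_values) < 2:
        raise ValueError("need at least two run-in values to estimate an SD")
    return mean(run_in_values), stdev(run_in_values)

def reassay_needed(run_results, limits):
    """Apply the two rules from the text to one assay run (hypothetical helper).

    run_results: {qc_level: value measured in this run}
    limits:      {qc_level: (run-in mean, run-in SD)}
    Returns True if the run should be reassayed.
    """
    z_scores = [(value - limits[level][0]) / limits[level][1]
                for level, value in run_results.items()]
    # Rule 1: any single QC value more than 3 SD from its run-in mean.
    if any(abs(z) > 3 for z in z_scores):
        return True
    # Rule 2: two or more QC values more than 2 SD from their run-in means
    # (either side; the classical 2-2s rule would require the same side).
    return sum(abs(z) > 2 for z in z_scores) >= 2

# Hypothetical run-in limits for low, medium and high QC sera.
limits = {"low": (5.0, 0.1), "medium": (7.0, 0.1), "high": (10.0, 0.2)}
print(reassay_needed({"low": 5.05, "medium": 7.1, "high": 10.1}, limits))   # False
print(reassay_needed({"low": 5.35, "medium": 7.0, "high": 10.0}, limits))   # True (rule 1)
print(reassay_needed({"low": 5.25, "medium": 6.75, "high": 10.0}, limits))  # True (rule 2)
```

In practice, the limits for each QC level would come from `run_in_limits` applied to that level's 30 run-in results.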

The central laboratory participated in appropriate external QA schemes. These were available for most analytes in the later stages of the study. Reports from these schemes were inspected to compare the performance of the methods used by the UKPDS laboratory with those of other laboratories and with available reference methods.



Analytical methods were updated during the study as improved technologies became available. Formal laboratory comparisons of any new method with the previous one involved the assay of at least 200 samples in parallel on several days and over a representative range of values.

Descriptive statistics were prepared from the data for each method. The results for different methods were compared with appropriate statistical tests: paired t-tests, the Mann-Whitney U-test, or the Wilcoxon signed rank test for differences in central tendency, and the Kolmogorov-Smirnov test for differences between distributions. Scattergrams (Fig. 2A) and difference plots (Fig. 2B), as outlined by Bland and Altman [6], were inspected for two purposes: to identify differences between methods, and to determine whether any differences reflected a constant offset or depended on the concentration of the analyte. An appropriate equation relating the two methods was then calculated, e.g., a linear or quadratic regression model, with a logarithmic or square-root transformation of the data if required. For further statistical analyses, this equation was used to realign the previous data to the current method (Fig. 3).
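The comparison steps above can be sketched for paired results from a parallel assay. This is a minimal illustration, assuming a straight-line relationship fitted by ordinary least squares. The quadratic models and log or square-root transformations mentioned in the text are not shown. The Bland-Altman limits of agreement are the conventional bias ± 1.96 SD of the paired differences. All function names and data values are hypothetical.

```python
from statistics import mean, stdev

def ols(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def bland_altman(old, new):
    """Bias (mean of new - old) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [n - o for o, n in zip(old, new)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def difference_trend(old, new):
    """Slope of the difference against the pair mean, as read from a difference plot.

    A slope near zero suggests a constant offset; a clearly nonzero slope
    suggests the difference depends on analyte concentration.
    """
    means = [(o + n) / 2 for o, n in zip(old, new)]
    diffs = [n - o for o, n in zip(old, new)]
    return ols(means, diffs)[0]

def realign(values, slope, intercept):
    """Map results from the previous method onto the current method's scale."""
    return [slope * v + intercept for v in values]

# Hypothetical paired results (previous method, current method).
old = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
new = [0.9 * v + 0.2 for v in old]
slope, intercept = ols(old, new)
print(bland_altman(old, new), difference_trend(old, new), slope, intercept)
```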


Representative data from a suitable population can be used to confirm the comparability of measurements over time. This approach assumes that the distribution of biochemical variables will not vary substantially between random samples drawn at different times from a large representative population. Population demographics may nevertheless vary with time (e.g., the change in population cholesterol that occurred during the MRFIT trial). The reference population we used consisted of the newly diagnosed patients entering the study each year. We now consider it preferable to recruit a control population alongside the study population, even though this has financial, workload, and recruitment implications. The relevant clinical and biometric data were analyzed to ensure that there were no major potentially confounding changes in population characteristics, such as body weight. For each analytical method used during the study, data from this population were compared with data from the current analytical method. The aim was to determine whether there were any systematic differences between a previous assay method and the current method. Box-and-whisker plots (Fig. 2C) were examined for each analytical method, showing the median and interquartile range (box) and the 10th and 90th centiles (whiskers). These were compared with the median and 95% confidence interval of the current method. Appropriate statistical tests were applied to determine whether each method differed significantly from the current method.
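The centiles shown in the box-and-whisker plots can be computed per assay method with Python's standard library. This is a minimal sketch, assuming the "inclusive" centile definition of `statistics.quantiles`. The paper does not state which centile definition was used. The function names and example values are hypothetical. The significance testing is omitted; a rank test such as `scipy.stats.mannwhitneyu` could supply the P value for each method against the current one.

```python
from statistics import quantiles

def box_summary(values):
    """10th, 25th, 50th, 75th and 90th centiles for a box-and-whisker plot."""
    deciles = quantiles(values, n=10, method="inclusive")
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"p10": deciles[0], "p25": q1, "median": q2, "p75": q3, "p90": deciles[-1]}

def summarize_by_method(results_by_method):
    """Box summaries of reference-population results, one per assay method."""
    return {method: box_summary(vals) for method, vals in results_by_method.items()}

# Hypothetical entry-visit results measured by two methods.
print(summarize_by_method({
    "EEO": [7.1, 8.0, 8.4, 9.2, 10.5],
    "HPLC": [6.0, 6.5, 7.1, 7.8, 9.0],
}))
```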



In large trials, it is necessary to decide whether small but statistically significant differences are of clinical importance [7]. For HbA1c and triglycerides, we used the following criteria: realignment is not necessary for differences that are not statistically significant (P >0.01) or for differences between medians of <5% (Fig. 4). Such criteria are based on differences between normal and pathological populations. Assay performance can also help decide whether data from structured laboratory comparisons of analytical methods need to be realigned.
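The two-part criterion above can be expressed as a small decision function. This is a sketch under stated assumptions. The P value is assumed to come from whichever statistical test was applied. The percentage difference is assumed to be taken relative to the current method's median, because the text does not say which median is the denominator. The names `should_realign` and `median_percent_difference` are hypothetical.

```python
from statistics import median

ALPHA = 0.01           # significance threshold from the text
MIN_MEDIAN_DIFF = 5.0  # smallest median difference (%) worth realigning

def median_percent_difference(previous, current):
    """Difference between medians, as % of the current method's median (assumed denominator)."""
    ref = median(current)
    return 100.0 * (median(previous) - ref) / ref

def should_realign(p_value, previous, current):
    """Realign only if the difference is significant AND the medians differ by >= 5%."""
    pct = median_percent_difference(previous, current)
    return p_value <= ALPHA and abs(pct) >= MIN_MEDIAN_DIFF

# Hypothetical data: a 10% median difference with P = 0.001.
print(should_realign(0.001, [2.1, 2.2, 2.3], [1.0, 2.0, 3.0]))
```

Requiring both conditions means that a large study cannot trigger realignment on statistical significance alone.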

In short-term trials, Cusum charts [8] may be sufficient to detect changes over time within an analytical method. A Cusum chart is a plot against time of the daily cumulative sum of the differences between each measurement and the established mean, with the sign of each difference retained. The quangle, or quality-control angle, chart [9] is more suitable for examining data over a longer period (Fig. 5). Consider a series $X_1, \ldots, X_n$ with target (neutral) value $T$. Take $a$ as the length of each segment and $\theta$ as the angle corresponding to 1 unit (the angular scale). The quangle after $r$ steps is then calculated such that

$$x_r = a \sum_{i=1}^{r} \cos[\theta (X_i - T)]$$

$$y_r = a \sum_{i=1}^{r} \sin[\theta (X_i - T)] \qquad (1)$$

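These coordinates are approximately the same as those of the Cusum chart under two conditions: $\theta(X_i - T)$ is small, and $\theta$ is chosen to equal $b/a$. Here $b$ is the vertical scale of the Cusum chart, i.e., the plotted length per unit of cumulative deviation [9].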
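Equation (1) and the Cusum can be computed side by side, as sketched below. This is a minimal illustration, assuming $\theta$ is expressed in radians per unit of $X_i - T$; the paper does not give units. The function names and the QC series are hypothetical. `segment_directions` returns the heading of each quangle segment, so a step change in the mean appears as a change in direction. For the example series, the headings are 0 for the first six segments and θ × 0.2 = 0.1 rad thereafter. Changing `target` rotates every segment by the same angle, so the turn between the two periods does not depend on the target chosen.

```python
import math

def cusum(series, target):
    """Cumulative sum of (X_i - T), keeping the sign of each difference."""
    total, out = 0.0, []
    for value in series:
        total += value - target
        out.append(total)
    return out

def quangle(series, target, a=1.0, theta=0.1):
    """Quangle coordinates (x_r, y_r) after each step r, as in equation (1).

    a     -- length of each segment
    theta -- angle per unit of (X_i - T), assumed to be in radians
    """
    x = y = 0.0
    points = []
    for value in series:
        angle = theta * (value - target)
        x += a * math.cos(angle)
        y += a * math.sin(angle)
        points.append((x, y))
    return points

def segment_directions(points):
    """Heading (radians) of each segment; a change in heading marks an inflexion point."""
    directions, prev = [], (0.0, 0.0)
    for p in points:
        directions.append(math.atan2(p[1] - prev[1], p[0] - prev[0]))
        prev = p
    return directions

# Hypothetical QC series with a step change in mean after six runs.
series = [1.50] * 6 + [1.70] * 6
print([round(d, 3) for d in segment_directions(quangle(series, 1.50, theta=0.5))])
```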


Changes in the direction of the line (inflexion points) on a quangle plot indicate a change in the mean value of the accumulating data. If the mean increases continually, the quangle continues to change direction. The angle between two segments, $\theta(X_j - X_i)$, does not depend on the target value; changing $T$ only rotates the whole plot. Time periods of different behavior are therefore easier to separate with a quangle than with Cusum methods. The shape of the quangle plot is a useful visual aid where more than one change has occurred over the time period. A quangle plot was prepared for each variable, and the dates of changes of analytical methods were marked on the plot. Laboratory records of QC data on either side of any additional obvious inflexion points were checked for discontinuities. When an unexpected inflexion point was confirmed by a simultaneous change in QC data, the data were divided into two groups at this point and treated as different analytical methods for purposes of adjustment.



Fig. 1 shows a sequential plot of QC data from QSTAT for HbA1c measured by HPLC in the central laboratory. The performance of analytical methods was assessed by monitoring the CV for 30 sequential QC samples at each of three concentrations: low, medium, and high. Where available, the mean concentration of each QC was compared with the value assigned externally for that QC. Interassay CVs of <2% were obtained for both HbA1c and plasma triglyceride. Two external schemes were used to assess laboratory performance and QA and to compare results with other laboratories and analytical methods. These were the UK National External Quality Assessment Schemes for HbA1c and triglyceride, and the Murex Diagnostics Ltd. Clinical Chemistry Quality Assessment Program for triglyceride.
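The CV monitoring described above can be sketched as follows. This is an illustration, not the laboratory's software. It assumes a moving window of 30 consecutive results per QC level, since the text does not say whether successive blocks of 30 overlap. The CV is computed as 100 × SD / mean using the sample SD. The name `rolling_cv` and the QC values are hypothetical.

```python
from statistics import mean, stdev

def rolling_cv(values, window=30):
    """CV (%) for each run of `window` consecutive QC results at one QC level."""
    if len(values) < window:
        raise ValueError("not enough QC results for one window")
    return [100.0 * stdev(values[i:i + window]) / mean(values[i:i + window])
            for i in range(len(values) - window + 1)]

# Hypothetical medium-level QC results from 32 runs.
qc = [9.9, 10.1] * 16
cvs = rolling_cv(qc)
print(len(cvs), all(cv < 2.0 for cv in cvs))  # compare against a <2% target
```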


INTRODUCTION OF IMPROVED ANALYTICAL METHODS

Glycohemoglobin was measured in 296 samples by both the EEO method (HbA1) and the HPLC method on the Bio-Rad Diamat analyzer (HbA1c). The samples spanned the range from apparently healthy subjects to diabetic patients (Fig. 2, A and B). This comparison showed that the previous EEO data had to be realigned to the HPLC method by the linear regression equation:

$$\mathrm{HbA}_{1c} = 0.83\,\mathrm{HbA}_{1} - 0.54 \qquad (2)$$
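Applying equation (2) to historical EEO results is a one-line transformation. The helper below is hypothetical and assumes both values are expressed in % of total hemoglobin. The equation should only be trusted within the range of values covered by the 296-sample comparison.

```python
def eeo_to_hplc(hba1_percent):
    """Hypothetical helper: map an EEO HbA1 result onto the HPLC HbA1c scale.

    Implements equation (2): HbA1c = 0.83 * HbA1 - 0.54 (units assumed to be %).
    """
    return 0.83 * hba1_percent - 0.54

eeo_results = [8.0, 10.0]  # hypothetical EEO HbA1 values (%)
print([round(eeo_to_hplc(v), 2) for v in eeo_results])  # [6.1, 7.76]
```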

Comparisons were also made between the IEF and EEO methods for measuring HbA1.


Data were obtained from newly diagnosed patients at the recruitment and randomization visits between 1977 and 1991. Weight, height, age, and initial fasting plasma glucose were inspected and showed no systematic changes over time. Data from these patients were then used to confirm that other longitudinal data remained comparable.

Data from this reference population for the three different assay methods for measuring glycohemoglobin (IEF, EEO, and HPLC) are illustrated in Fig. 2C with the shaded areas showing data before realignment to the HPLC method [the median (95% confidence interval) of the HPLC method is indicated for reference]. These data confirmed that the formulas obtained from the direct laboratory comparisons were robust.

A quangle plot of the plasma triglyceride data identified a discontinuity during 1982 (Fig. 5). Laboratory QC data confirmed a step change in all three QC concentrations during 1982. This change had previously gone undetected by visual inspection of plots of consecutive QC values (Levey-Jennings charts). The data from the reference population were divided at this inflexion point and adjusted as if they came from different analytical methods. In total, five inflexion points were identified on the quangle plots for the 15 UKPDS analytes, only two of which were confirmed by QC records. For these two analytes, the decision whether to realign the data was made by the predetermined rules for statistical significance vs clinical difference. The relevant mathematical transformation was calculated. A graph of centiles for the adjusted data was then plotted against those for the current method to check that the transformation was valid across the measured range (Fig. 3).
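The centile-versus-centile check can be sketched numerically. This minimal illustration pairs the centiles of the adjusted data with those of the current method. These are the points that would be plotted against each other. It also reports the largest gap between paired centiles as a single summary. The choice of 5% steps (`n=20`), the summary statistic, and the function names are assumptions, not details from the paper.

```python
from statistics import quantiles

def centile_pairs(adjusted, current, n=20):
    """Pair the centiles (5% steps by default) of adjusted vs current-method data."""
    return list(zip(quantiles(adjusted, n=n), quantiles(current, n=n)))

def max_centile_gap(adjusted, current, n=20):
    """Largest absolute difference between paired centiles."""
    return max(abs(a - c) for a, c in centile_pairs(adjusted, current, n))

# Hypothetical data: adjusted results that still sit one unit above the current method.
current = list(range(100))
adjusted = [v + 1 for v in current]
print(max_centile_gap(adjusted, current))
```

If the transformation is valid across the measured range, the paired centiles lie close to the line of identity and the gap stays small at all concentrations.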


The need for comparability of measurements in long-term trials is self-evident, although formal procedures for checking data over many years are rarely specified. Where only clinical end points are of interest, careful randomization can protect the major trial outcomes from systematic bias. Long-term data may, however, also be used to monitor progression or deterioration, or to identify new risk factors or markers of disease. In that case, monitoring and maintaining the comparability of biochemical and clinical data across changing analytical methods is essential.

In clinical trials, analytical methods need to be updated when improved techniques become available. The alternative would be to constrain a trial to obsolete technology, which would ultimately render the data irrelevant to modern practice and the results unpublishable. Adherence to older methods is not an option because the accuracy and precision of laboratory techniques used in the 1970s are not acceptable in the 1990s. For example, measurement of glycohemoglobin has changed from HbA1 to HbA1c. This change brought a reduction in the interassay CV from 10% to 2% and different normal ranges for the analytes [10]. Storage of some samples at -70 °C may allow retrospective comparisons, but in large trials not all samples can be reassayed years later. Direct laboratory comparison of samples across the appropriate range is therefore essential when laboratory methods are changed. For the UKPDS, the statistical evaluation procedures described here verified that these comparisons were reliable.

Internal QC and external QA schemes in the laboratory are necessary to monitor the performance of an analytical method. However, currently available external QA schemes do not provide adequate long-term monitoring. QA schemes report the mean ± SD of results from all participating laboratories, as well as more specific method means. This norm-referencing is not necessarily stable in the long term. Laboratories join and leave the schemes or change their methodologies, so the reported norms and the group sizes for each method vary. Existing QA schemes rely on lyophilized nonhuman sera, which may not behave in the same way as human samples when assayed. However, new external QA schemes [11] aim to compare individual laboratory performance with reference methods for native human blood or serum. These schemes, which are being organized internationally, should allow results from different laboratories and methods to be compared. Networks of reference laboratories are being set up in Europe, with guidelines from external assessment, as part of the European Reference System for the Medical Laboratory [12]. The aim is an accuracy-based uniform measurement scheme with traceability to the true value.

An IFCC working group on the calibration of HbA1c is developing a primary reference material, consisting of a mixture of pure HbA1c and HbA0. The working group is also developing a reference method that specifically measures HbA1c [13]. This reference system will in time be used for the primary calibration of routine tests; a common calibrator for HbA1c has been shown to reduce interlaboratory variation markedly [14]. While this system is being developed, the method from the Diabetes Control and Complications Trial [15] will be used as a reference method for international standardization and comparison of results from different methods.

Assurance is needed that measurements carried out 10 years ago are comparable with those carried out today. In the absence of a recognized reference method, one solution is the use of data from a reference population. These data can be taken from measurements in an external population of apparently healthy subjects, or, as described in this paper, an internal population where the distributions of the biochemical characteristics of a large population at entry to the study were assumed to be stable when anthropometric measurements, such as body weight and height, had not changed. One should be aware that this is nevertheless no guarantee of complete stability: Populations change their characteristics with time and age [16]. However, a judiciously chosen population will allow comparisons between the study and background demographic change. This procedure allows a longitudinal check that values determined in the early years of the study are not significantly different from those in subsequent years and confirms that no previously undetected anomalies exist in the data.

The quangle plot provides another method for monitoring changes in data, one that is probably more useful over long periods than the more commonly used Cusum [8]. When examining longitudinal data in a reference population, the quangle can be used to assess comparability of data. This technique is applicable to either internal or external reference populations. Of the possible discontinuities identified by quangle plots in the UKPDS data, only two were confirmed by reanalysis of previous QC data. Both occurred early in the study, when laboratory QC involved only visual inspection of data (before the advent of computerized Westgard rules). It is possible that the change in triglyceride measurement in 1982 was caused by a change of reagents.

As the number of samples in a comparison increases, statistical tests can identify ever smaller differences as conventionally statistically significant. In a large study, differences can therefore be statistically significant even though they may not be of biological or clinical relevance [17]. It is therefore necessary to determine whether any discontinuities detected by quangle in large data sets and confirmed by QC records are of clinical importance to the study. The decision to realign the data should be based on the biological and analytical characteristics of the variable and on the nature of the clinical study. Petersen et al. [18] have already suggested this interpretation of clinical laboratory data as the clinical usefulness approach. They combine it with the assessment of analytical performance and of biological within- and between-subject variation.
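The effect of sample size on statistical significance can be illustrated with a short calculation. This sketch holds a hypothetical difference of 0.1 SD fixed and computes the two-sided P value of a two-sample z-test for increasing group sizes. It assumes normally distributed data with a known, common SD. The same small difference moves from clearly nonsignificant to highly significant purely because n grows. The function name and numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided P value of a two-sample z-test for a fixed mean difference.

    Assumes normal data with a known SD common to both groups.
    """
    z = abs(diff) / (sd * sqrt(2.0 / n_per_group))
    return 2.0 * (1.0 - NormalDist().cdf(z))

# A fixed difference of 0.1 SD, tested with increasing sample sizes.
for n in (50, 500, 5000):
    print(n, two_sided_p(0.1, 1.0, n))
```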

It is important that laboratory data from clinical trials can be compared with data from methods used in other laboratories. Steps should be taken to use the procedures described in this paper for comparison with national and international standards where available. The main comparative outcomes of long-term trials such as the UKPDS are protected from systematic bias by appropriate experimental design and random allocation to different therapies. However, if the comparability of data from different analytical methods is not ensured, substantial and important longitudinal descriptive results may remain undetected, or trends over time may be identified incorrectly. Trial reports should therefore specify what procedures have been used to ensure that clinical and biochemical measurements made over extended periods of time are comparable. In future, they should also detail comparisons of laboratory methods with international references.

We thank the DRL Biochemistry Laboratory; Martin Payne and Ted Bown from the Clinical Chemistry Section; Robin Carter, Sue Brownlee, Rachel Eddy, Karen Fisher, Dick Jelfs, Rajeev Mair, Rachel Mullins, and Robert Powrie from the Immunochemistry Section; Pauline Sutton, James Brown, Christopher Groves, and Natasha Lawrence from the Immunoassay Laboratory; and Margaret Evans and Lesley Stowell from Specimen Reception. We also thank Caroline Wood and Carol Hill for typing the manuscript. We are grateful for the cooperation of the patients and staff at the centers. We are also grateful for grants from the Medical Research Council; the British Diabetic Association; the National Institute of Diabetes and Digestive and Kidney Diseases and the National Eye Institute of the National Institutes of Health (US); the Department of Health; the British Heart Foundation; the Health Promotion Research Trust; Becton Dickinson; Boehringer Mannheim; Bristol Myers Squibb; Hoechst; Lilly; Lipha; Novo Nordisk; and the Clothworkers Foundation.


[1.] Vartiainen E, Puska P, Pekkanen J, Tuomilehto J, Jousilahti P. Changes in risk factors explain changes in mortality from ischaemic heart disease in Finland. BMJ 1994;309:23-7.

[2.] UKPDS Group. UK Prospective Diabetes Study VIII: study design, progress and performance. Diabetologia 1991;34:877-90.

[3.] UKPDS Group. UK Prospective Diabetes Study XI: biochemical risk factors in type 2 diabetic patients at diagnosis compared with age-matched normal subjects. Diabetic Medicine 1994;11:534-44.

[4.] Moore JC, Bown E, Outlaw MC, Jelfs R, Holman RR, Turner RC. Glycosylated haemoglobin: comparison of five different methods, including measurement on capillary blood samples. Ann Clin Biochem 1985;23:85-91.

[5.] Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493-501.

[6.] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

[7.] Porter MA. Practical significance? R Stat Soc News 1995;22:5.

[8.] Whitby LG, Mitchell FL, Moss DW. Quality control in routine clinical chemistry. Adv Clin Chem 1967;10:65-76.

[9.] North WRS. The Quangle--a modification of the Cusum chart. Appl Statistics 1982;31:155-8.

[10.] Standing SJ, Taylor RP. Glycated haemoglobin: an assessment of high capacity liquid chromatographic and immunoassay methods. Ann Clin Biochem 1992;29:494-505.

[11.] Packard CJ, Bell MA, Eaton RH, Dagen MM, Cassidy M, Shepherd J. A pilot scheme for improving the accuracy of serum cholesterol measurement in Scotland and Northern Ireland. Ann Clin Biochem 1993;30:387-93.

[12.] Thienpont L, Franzini C, Kratochvila J, Middle J, Ricos C, Siekmann L, et al. Analytical quality specifications for reference methods and operating specifications for networks of reference laboratories. Eur J Clin Chem Clin Biochem 1995;33:949-57.

[13.] Holzel W, Miedema K, Finke A, Goldstein D, Goodall I, Jeppsson J, et al. Development of a reference system for the international standardisation of HbA1c/glycated haemoglobin determinations [Abstract]. Proc Int Congr Clin Chem 1996;16:374.

[14.] Weykamp CW, Penders TJ, Miedema K, Muskiet FA, van der Slik W. Standardisation of glycohemoglobin results and reference values in whole blood studied in 103 laboratories using 20 methods. Clin Chem 1995;41:82-6.

[15.] The Diabetes Control and Complications Trial Research Group. The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N Engl J Med 1993;329:977-86.

[16.] Petersen PH, Blaabjerg O. Reference intervals for plasma proteins. Upsala J Med Sci 1994;99:307-14.

[17.] Charlton B. Does the size count for the clinical survey? Hosp Doctor 1995;6:26.

[18.] Petersen PH, Groth T, de Verdier C-H. Principles for assessing analytical quality specifications ("AQSpecs") and their use in design of control systems. Upsala J Med Sci 1993;98:195-214.


UK Prospective Diabetes Study (UKPDS) Group, Diabetes Research Laboratories, Radcliffe Infirmary, Woodstock Road, Oxford OX2 6HE, UK.

(1) Department of Clinical Biochemistry, Medical School Buildings, Aberdeen Royal Infirmary, Foresterhill, Aberdeen, AB9 2ZB, UK.

(2) Nonstandard abbreviations: UKPDS, UK Prospective Diabetes Study; Hb, hemoglobin; IEF, isoelectric focusing; EEO, electroendosmosis; QA, quality assurance.

* Author for correspondence. Fax 00-44-1865-723884; e-mail

Received October 28, 1996; revision accepted June 5, 1997.
Table 1. Statistical overview techniques.

QC and QA
* Continuously monitor assay performance
* Review internal QC results for all runs
* Apply accept/reject rules for multilevel QC sera
* Monitor using external QA schemes
* Traceability of assay measurements to national or international standards
* Storage of samples at -70 °C for future analysis

Introduction of improved analytical methods
* Perform formal parallel laboratory assays to compare old and new analytical methods
* Compare descriptive statistics for both methods
* Examine scattergram and difference plot
* Apply statistical tests of differences
* If appropriate, realign previous data for statistical analysis

Maintaining comparability of data by using a reference population
* Check for long-term comparability of data
* Analyze results for each assay method with respect to the current method
* Check for unexpected changes with quangle plots
* Evaluate inflexion points with significance vs difference plots
* If appropriate, realign previous data for statistical analysis
COPYRIGHT 1997 American Association for Clinical Chemistry, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.

Article Details
Title Annotation: Laboratory Management
Author: Cull, Carole A.; Manley, Susan E.; Stratton, Irene M.; Neil, H. Andrew W.; Ross, Iain S.; Holman, Ru
Publication: Clinical Chemistry
Article Type: Clinical report
Date: Oct 1, 1997