# Comparison of eight computer programs for receiver-operating characteristic analysis.

ROC analysis is now a standard tool to assess, define, and compare
the diagnostic validity of laboratory tests or diagnostic measures (1).
Medline searches have shown that the number of publications using ROC
curves has increased from ~300 studies in the 1980s to >5000 studies
since 1990. Several computer programs have been developed to generate
ROC curves, and some of the early programs were briefly described in
1993 (2). However, all of these early programs had limitations for easy
and accessible practical use. Within the last several years, commercial
and public domain programs have become available for complex ROC
analysis and ROC plotting. To our knowledge, an overview and comparison
of these newly available ROC programs has not been performed.

The aims of this study were (a) to survey currently available ROC programs, (b) to compare these ROC programs for their ease of use, and (c) to evaluate their relative utility in ROC analysis.

Material and Methods

ROC SOFTWARE STUDIED

Eight currently available ROC programs were evaluated (Table 1). All programs run on IBM-compatible computers. We performed all our evaluation studies on computers running Microsoft Windows 2000 with at least 128 MB of RAM, a Pentium processor, and 250 MB of free hard-drive space. The general features of these programs are summarized in Table 1. The software Stata 7.0 was not included in this comparative study: we were unable to evaluate it completely because not all necessary calculations could be performed, even though we repeatedly discussed the issues with company representatives in Germany via the company hotline.

DATASETS FOR ROC ANALYSIS

To compare the programs, we used a previously described dataset of 928 men with prostate cancer (n = 606) or benign prostatic hyperplasia (n = 322) and subgroups of this population (3). ROC analyses of total prostate-specific antigen (tPSA), free PSA (fPSA), the ratio of fPSA to tPSA (fPSA/tPSA), and other values calculated by an artificial neural network approach with this dataset (3) were carried out to assess the advantages and disadvantages of each program.

EVALUATION CRITERIA

To evaluate the programs, five simple criteria were chosen to encompass the ease of learning program operations, use of the software, and data handling and to characterize the usefulness of each program (Table 2). A maximum percentage value was assigned to each criterion. The sum of all percentage values gives the final score. The criteria are described briefly below:

Data input. It is important to import or copy data into the program easily without any intermediate storage or special format, to be able to edit the data in the program (e.g., in a spreadsheet), and to save more than one dataset. The tendency of each program to crash was also taken into consideration.

Data output. Presentation of the results and processing of the exported data were assessed. The program should be able to structure the results comprehensively. Processing of data refers to the capability of the program to export and save the results, including the calculated graphs, and to draw more than one curve in one graph. The latter capability is very important for comparing several tests with each other.

Analysis results. This criterion was the most important one and covered the correctness and completeness of the results. Correctness of results is mandatory: incorrect results were treated as grounds for excluding the respective software from any recommendation for ROC analysis.

There are several approaches to calculating the area under the ROC curve (AUC) for the comparison of ROC curves. Table 3 lists the main characteristics and limitations of three commonly used methods. It is crucial to know whether the curves result from independent or dependent (correlated) data. In laboratory diagnostics, the values of interest are in most cases measured on the same patients. We therefore considered only methods for correlated data. A second distinction can be made between nonparametric and parametric methods. Parametric methods are efficient under certain assumptions. These assumptions are often not fulfilled in practice, in which case the results are biased. Nonparametric methods should be used if the variables follow an ordinal or skewed distribution or if the sample sizes are small. A parametric approach should be preferred for large sample sizes and continuous measurements.

The subcriterion completeness assessed the capability of a program to calculate all necessary ROC data for a reasonable decision regarding a diagnostic test. This included the AUC with its confidence intervals (CIs), the sensitivities and specificities at certain cutoffs with their CIs, the presentation of the graph, and the ability to compare the AUCs showing the respective statistical significance values.
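As an illustration of what "sensitivities and specificities at certain cutoffs with their CIs" involves, the following minimal Python sketch computes both values at one cutoff and attaches a Wilson score interval to each. The choice of the Wilson interval, the function names, and the marker values are assumptions made for this example; the evaluated programs may use other interval methods.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """Treat values >= cutoff as test-positive; return (value, CI) for sensitivity and specificity."""
    tp = sum(1 for v in diseased if v >= cutoff)   # true positives
    tn = sum(1 for v in healthy if v < cutoff)     # true negatives
    sens = tp / len(diseased)
    spec = tn / len(healthy)
    return (sens, wilson_ci(tp, len(diseased))), (spec, wilson_ci(tn, len(healthy)))

# Hypothetical marker values, for illustration only.
diseased = [4.2, 6.1, 7.5, 9.8, 12.0]
healthy = [1.1, 2.4, 3.9, 4.5, 5.0]
(sens, sens_ci), (spec, spec_ci) = sens_spec_at_cutoff(diseased, healthy, cutoff=4.0)
print(f"sensitivity {sens:.2f} CI {sens_ci}, specificity {spec:.2f} CI {spec_ci}")
```

The sketch assumes that higher values indicate disease; for a marker with the opposite direction the comparison operators would have to be reversed.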

Program comfort. This point of the comparison dealt with the compatibility of the program with standard calculation, text, and presentation programs, e.g., Microsoft Excel, Word, or PowerPoint. Programs were also evaluated based on the availability of help functions, tutorials, and demonstration versions and ease of obtaining information regarding program updates.

User manual. This criterion assessed the structure and comprehensibility of the user manual and whether the manufacturer provides an online manual, a homepage, or an e-mail address to solve current problems.

Results

The ROC programs were tested with a previously described dataset and various subsets (3). The assessment ratings for the five evaluation criteria are given in Table 2 for each program. As shown in one representative example (Table 4), the AUCs calculated by the various programs, which in some cases used different calculation methods as described below, differed only marginally. In addition, equivalent statistical differences between the AUCs of the various markers were obtained. Thus, the essential requirement of correct results appeared to be fulfilled by all programs compared. The other criteria were therefore helpful in ranking the software for usefulness. The individual programs are described below.

AccuROC

AccuROC uses the method of DeLong et al. (4). To our knowledge, at this stage it is the only program that uses this method. The layout of the program is very well structured, and because of the comprehensive manual and the up-to-date homepage, the program is easy to learn. Up to three curves can be drawn into one graph, and the coordinates of each curve can be saved, which makes it possible to put more than three curves in one graph with use of a calculating program such as Excel. Furthermore, AccuROC can calculate the CIs and SD with a bootstrap method.
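The nonparametric comparison of DeLong et al. (4) can be sketched in a few lines of Python. This is a minimal illustration of the published approach for two markers measured on the same subjects, not AccuROC's implementation; the function names and the marker values are hypothetical.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, otherwise 0."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def _components(diseased, healthy):
    """Return the AUC plus the structural components V10 (per diseased) and V01 (per healthy)."""
    m, n = len(diseased), len(healthy)
    v10 = [sum(_psi(x, y) for y in healthy) / n for x in diseased]
    v01 = [sum(_psi(x, y) for x in diseased) / m for y in healthy]
    return sum(v10) / m, v10, v01

def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def delong_compare(dis_a, hea_a, dis_b, hea_b):
    """Compare two correlated AUCs; markers A and B must list the same subjects in the same order."""
    m, n = len(dis_a), len(hea_a)
    auc_a, v10_a, v01_a = _components(dis_a, hea_a)
    auc_b, v10_b, v01_b = _components(dis_b, hea_b)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    z = (auc_a - auc_b) / math.sqrt(var_a + var_b - 2 * cov_ab)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return auc_a, auc_b, z, p

# Hypothetical paired measurements of two markers, for illustration only.
dis_a = [5.0, 6.2, 7.1, 8.4, 9.0, 4.1]
hea_a = [3.0, 4.5, 2.2, 5.5, 3.8]
dis_b = [0.10, 0.22, 0.15, 0.30, 0.18, 0.12]
hea_b = [0.14, 0.20, 0.09, 0.25, 0.11]
auc_a, auc_b, z, p = delong_compare(dis_a, hea_a, dis_b, hea_b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, z = {z:.3f}, p = {p:.3f}")
```

Because the covariance term uses the per-subject components of both markers, the test accounts for the correlation that arises when both markers are measured on the same patients.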

A serious drawback of this program is that except for the graph and its coordinates, none of the other results can be saved or exported; they can only be printed. If a diagnostic marker shows that lower values are associated with a higher risk of disease, all the test values have to be transformed by rendering them negative, manually or using a spreadsheet. This procedure makes the data input quite complicated.
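The effect of the sign transformation described above can be checked with a short Python sketch. The pairwise AUC estimate and the marker values are assumptions for illustration; the point is only that negating all values mirrors the AUC around 0.5 when there are no ties between groups.

```python
def auc_pairwise(diseased, healthy):
    """Fraction of (diseased, healthy) pairs in which the diseased value is higher; ties count 0.5."""
    wins = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))

# Hypothetical marker for which LOWER values point towards disease.
marker_diseased = [0.08, 0.11, 0.14, 0.19]
marker_healthy = [0.12, 0.21, 0.25, 0.30]

auc_raw = auc_pairwise(marker_diseased, marker_healthy)
auc_negated = auc_pairwise([-v for v in marker_diseased],
                           [-v for v in marker_healthy])
print(auc_raw, auc_negated)
```

A program that always assumes "higher means diseased" would report the first value, which understates the diagnostic information in the marker; the negated data yield the complementary value.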

Analyse-It

This software was published in 2001. The ROC analysis is performed according to the method of Hanley and McNeil (5, 6). According to the information of the software developers, an update was planned for the end of 2002. This update should use the method of DeLong et al. (4). It is an add-in program for Microsoft Excel. Like the software MedCalc, it is a program that implements several statistical procedures, including ROC analysis. It is simple to use and provides a very good online manual, help function, and tutorial. An advantage of its integration into Excel is that the interplay with other programs is excellent. Data input is easy, and the layout is clearly arranged. All necessary results are calculated in one step, and up to three curves can be displayed in one graph.

Unfortunately, AUCs cannot be compared if any AUC is <0.7, but this will also be changed in the updated version. Another drawback of this program is that it does not calculate CIs for the sensitivities and specificities.
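For orientation, the Hanley and McNeil approach (5, 6) can be sketched as follows: a standard error for a single AUC, and a z statistic for two AUCs derived from the same cases. The group sizes are those of the dataset described above, whereas the two AUC values and the correlation r between the AUC estimates are hypothetical inputs; in the published method r is obtained from the within-group correlations of the marker values, a step this sketch does not reproduce.

```python
import math

def hanley_mcneil_se(auc, n_diseased, n_healthy):
    """Standard error of an AUC according to Hanley and McNeil (1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_diseased - 1) * (q1 - auc * auc)
                + (n_healthy - 1) * (q2 - auc * auc)) / (n_diseased * n_healthy)
    return math.sqrt(variance)

def compare_correlated_aucs(auc1, auc2, n_diseased, n_healthy, r):
    """z statistic and two-sided p-value for two AUCs from the same cases (Hanley and McNeil, 1983)."""
    se1 = hanley_mcneil_se(auc1, n_diseased, n_healthy)
    se2 = hanley_mcneil_se(auc2, n_diseased, n_healthy)
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Group sizes from the study dataset; AUCs and r are assumed values for illustration.
se = hanley_mcneil_se(0.80, n_diseased=606, n_healthy=322)
z, p = compare_correlated_aucs(0.80, 0.75, 606, 322, r=0.5)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```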

CMDT

CMDT is a freeware program and can be downloaded from the internet (Table 1). An estimate of the AUC is given by the Wilcoxon rank-sum statistic. For comparison of ROC curves, it uses a permutation test suggested by Venkatraman and Begg (7).
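The relation between the Wilcoxon rank-sum statistic and the AUC can be made concrete with a small Python sketch. The helper names and the data are hypothetical and do not come from CMDT; tied values receive midranks.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def auc_rank_sum(diseased, healthy):
    """AUC from the rank sum of the diseased group in the pooled sample."""
    m, n = len(diseased), len(healthy)
    ranks = midranks(list(diseased) + list(healthy))
    rank_sum_diseased = sum(ranks[:m])
    u = rank_sum_diseased - m * (m + 1) / 2   # Mann-Whitney U statistic
    return u / (m * n)

# Hypothetical values with one tie between the groups.
diseased = [4.2, 6.1, 6.1, 9.8]
healthy = [1.1, 2.4, 6.1, 4.5]
auc = auc_rank_sum(diseased, healthy)
print(auc)
```

The result equals the proportion of diseased/healthy pairs in which the diseased subject has the higher value, with ties counted as one half.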

The drawbacks of this program are that it is prone to crashing, the graph can barely be edited in the program, and only one curve can be displayed. This makes it impossible to compare curves visually. Furthermore, the graph is not of publication quality and has to be saved as an extended metafile to be processed in another graphics program.

The advantages of the program are that it uses a bootstrap method to calculate the CIs and that the data can be edited in the program.
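The text does not state which bootstrap variant the programs use. As one common possibility, a percentile bootstrap interval for the AUC (17) can be sketched as below; the resampling scheme (separately within each group), the number of resamples, the seed, and the data are all assumptions for illustration.

```python
import random

def auc_pairwise(diseased, healthy):
    """Proportion of diseased/healthy pairs with the higher value in the diseased subject."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in diseased for y in healthy)
    return wins / (len(diseased) * len(healthy))

def bootstrap_auc_ci(diseased, healthy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; each group is resampled with replacement on its own."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        d = [rng.choice(diseased) for _ in diseased]
        h = [rng.choice(healthy) for _ in healthy]
        estimates.append(auc_pairwise(d, h))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical marker values, for illustration only.
diseased = [4.2, 6.1, 7.5, 9.8, 12.0, 5.3]
healthy = [1.1, 2.4, 3.9, 4.5, 5.0, 6.5]
point = auc_pairwise(diseased, healthy)
lower, upper = bootstrap_auc_ci(diseased, healthy)
print(f"AUC = {point:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With samples as small as in this toy example the interval is wide and often touches 1.0, which is one reason why CIs matter when comparing markers.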

GraphROC

The program GraphROC uses the method of Hanley and McNeil (5, 6) to calculate the ROC curve. It is one of the first commercially available programs on the Windows platform and is still in use (8). GraphROC is long-winded to operate. Creating an input file is complicated, and it is not possible to edit the data after loading them into the program. Every result has to be copied via the clipboard to save it. To edit the graph, it has to be copied via the clipboard into another graphics program. In addition, the program is susceptible to crashing.

The advantages of GraphROC are the ability to draw several curves in one graph and the opportunity to compare paired and unpaired datasets. It is also possible to compare curves at a certain sensitivity or specificity cutoff, which is, as far as we know, a feature that only GraphROC provides. A demonstration version of GraphROC can be downloaded.

MedCalc

MedCalc also works with the method of Hanley and McNeil (5,6). This program is very interesting for those users who wish to do more than just ROC analysis because it provides a wide range of other special biomedical statistics, e.g., Bland-Altman plots, Passing-Bablok regression, and logistic regression. The data import is very easy and is possible from Excel, SPSS, dbase, Lotus, and as a text file. The layout is clearly arranged, it is possible to export data, and the graph can be edited in the program. MedCalc provides an online manual, and a 30-day demonstration version can be downloaded from the company homepage.

A clear disadvantage of this program is that only two curves can be presented in one graph.

mROC

mROC is a computer program that implements an approach of combining the ROC curves of several tumor markers or test values by the best linear combination, which maximizes the AUC under the hypothesis of a multivariate gaussian distribution (9). Methods for estimating CIs for the AUC are also provided (10). Furthermore, conventional ROC analysis is possible. Learning to work with the program is easy, the layout is well structured, and the provided manual is intelligible. However, the data input is quite complicated, and the data cannot be edited in the program. Numerical and graphic results can be exported. Unfortunately, only one curve can be displayed in a graph, and a comparison of different ROC curves is not possible.

By combining several markers or tests into one ROC curve, thus creating a "virtual marker", this program brings interesting additional new aspects to ROC analysis. Nevertheless, it cannot be recommended for a convenient ROC analysis.
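The "best linear combination" underlying this virtual marker can be written out explicitly. The following is a sketch of the result of Su and Liu (9) under the multivariate gaussian assumption; the symbols are introduced here for illustration and are not taken from the mROC documentation. Let the marker vector have mean \mu_D and covariance \Sigma_D in the diseased group and mean \mu_N and covariance \Sigma_N in the nondiseased group, and let \Phi denote the standard normal distribution function.

```latex
% AUC of the linear combination a^T X
\mathrm{AUC}(a) = \Phi\!\left(
  \frac{a^{\mathsf T}(\mu_D - \mu_N)}
       {\sqrt{a^{\mathsf T}(\Sigma_D + \Sigma_N)\,a}}
\right)

% Coefficients that maximize the AUC (unique up to a positive scale factor)
a^{*} \propto (\Sigma_D + \Sigma_N)^{-1}(\mu_D - \mu_N)

% Resulting maximal AUC
\mathrm{AUC}(a^{*}) = \Phi\!\left(
  \sqrt{(\mu_D - \mu_N)^{\mathsf T}(\Sigma_D + \Sigma_N)^{-1}(\mu_D - \mu_N)}
\right)
```

With a single marker the expression reduces to the familiar binormal AUC, \Phi((\mu_D - \mu_N)/\sqrt{\sigma_D^2 + \sigma_N^2}).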

ROCKIT

ROCKIT is a free program developed by C.E. Metz et al. (11-13). Although it is mathematically a very well thought-out program, we would not recommend this program unless the user has a statistical background. It is uncomfortable to create an input file, the layout is somewhat confusing, the interplay with other programs is not optimized, it does not have a help function, and it frequently crashed when we used it.

Apart from these disadvantages, it calculates all necessary results, and with the included software PLOTROC (a program in Excel), several curves can be displayed in one graph.

SPSS

Although SPSS is a widely used statistical program, the ROC analysis within this package is not yet fully developed. In SPSS it is not possible to compare ROC curves. More than one curve in a graph can be displayed only if either higher or lower values of a marker are associated with a higher risk of disease. Despite the advantage of this program to show a wide range of other statistics, a valid ROC analysis cannot be performed with this software.

As can be seen in Table 2, we did not find any software that fulfilled all our expectations perfectly. Every program had advantages and disadvantages. More detailed characteristics of each program are summarized in Table 5.

Discussion

Since the original paper by Metz (14) describing ROC analysis and its use in optimizing diagnostic strategies, many enhancements have been made to further improve its use (2, 5). ROC analysis has recently been included in the checklist for reporting studies concerning diagnostic accuracy of medical tests (1). Other studies have focused on preconditions and their influence on diagnostic performance (15). Most studies comparing tumor markers (e.g., PSA and its molecular forms) are already using ROC comparisons (3). To perform these ROC comparisons, many commercially available programs have been introduced (4, 7-14, 16); however, to our knowledge, a comparison study of the available programs regarding their technical and mathematical aspects has not been published. With this study, we analyzed the advantages and drawbacks of eight ROC programs to find the best-optimized program for ROC analysis for clinicians. The programs Analyse-it, AccuROC, MedCalc, and to a certain extent GraphROC show good performance, but each program has different limitations.

The results of the comparison show that three of the eight programs can make ROC analysis easier and more economical. The leading program is Analyse-it with a final score of 91%. Although this program received maximum scores for the criteria data input, software comfort, and user manual, it is not acceptable that only three curves can be displayed and that the CIs for the sensitivities and specificities are not calculated. However, add-in software for a program, such as Excel, that is already widely used is potentially valuable, and if the drawbacks can be removed in a future version, this software could make ROC analysis much easier. Except for SPSS, none of the other programs provides as good a help function and tutorial. Questions concerning the program are answered quickly via e-mail. Therefore, the price is acceptable considering such good service. Additionally, a full demonstration version can be downloaded at www.analyseit.com.

In second place is AccuROC with a total score of 85%. Its use of the totally nonparametric method of DeLong et al. (4) and bootstrap methods (17) and its well-structured layout are the strong points of this program. On the other hand, complicated data input and the fact that data output (except the graph) can only be printed and not be saved or copied are disadvantages. Another drawback is the limited license for 2 years and the limited use of this program for only one computer. If one attaches great importance to highly accurate results and accepts the mentioned drawbacks, we can recommend AccuROC.

The third software that we recommend is MedCalc, with a total score of 84%. Although the ROC analysis is only one tool of this program, all necessary parameters are calculated. Data and results are clearly arranged, and the general handling is easy. Unfortunately, only two curves can be presented in one graph, which limits the relevant use of this program. If it were not for this drawback, MedCalc would fulfill most of our expectations of efficient ROC analysis. Even the price is reasonable, considering the additional statistical methods included. For those who do not need a multicurve presentation and are interested in a wide range of other statistics, MedCalc is a reasonable software.

GraphROC achieved a score of 78%. The completeness of the results cannot be criticized. All the main parameters can be calculated with this software. It even has a feature that shows every possible cutoff point with its sensitivity and specificity in a separate diagram, with automatic updating of the clinical sensitivity and specificity values by simple mouse clicks. The main drawbacks are the user-unfriendly data input and the long-winded processing of results and graphs. The user-friendliness of the program would be improved if there were a way to export the points of the ROC curve to either a text file or a spreadsheet. This would give the user more flexibility in terms of graphic capability. GraphROC can still compete with the other programs for ROC analysis, although the software has not been further developed since 1996.
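The kind of export suggested here is simple to describe. The following Python sketch, which is independent of GraphROC and uses hypothetical function names and data, lists the sensitivity and specificity at every observed cutoff and writes the points as comma-separated text that a spreadsheet can read.

```python
import csv
import io

def roc_points(diseased, healthy):
    """(cutoff, sensitivity, specificity) for each observed value; values >= cutoff count as positive."""
    points = []
    for cutoff in sorted(set(diseased) | set(healthy)):
        sens = sum(v >= cutoff for v in diseased) / len(diseased)
        spec = sum(v < cutoff for v in healthy) / len(healthy)
        points.append((cutoff, sens, spec))
    return points

def write_roc_csv(points, handle):
    """Write the ROC points with a header row to any text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["cutoff", "sensitivity", "specificity"])
    writer.writerows(points)

# Hypothetical marker values, for illustration only.
diseased = [4.2, 6.1, 7.5, 9.8]
healthy = [1.1, 2.4, 3.9, 4.5]
points = roc_points(diseased, healthy)

buffer = io.StringIO()          # stands in for open("roc_points.csv", "w", newline="")
write_roc_csv(points, buffer)
print(buffer.getvalue())
```

Plotting sensitivity against 1 - specificity from such a file allows any number of curves to be overlaid in an external graphics program.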

The shortcomings of the other four programs outlined above make it difficult to recommend these programs for regular ROC analysis.

In summary, it is surprising that valid ROC analysis with all necessary data and a good plotting function is not offered in a single program. It should not be necessary to use more than one software to perform a valid ROC analysis. Therefore, the programs Analyse-it, AccuROC, or MedCalc should be enhanced as described above to provide all necessary functions.

We gratefully acknowledge Prof. Wernecke for helpful suggestions and Silke Klotzek for helpful technical assistance. The study contains parts of the thesis of S.W.

Received September 12, 2002; accepted December 11, 2002.

References

(1.) Bruns DE, Huth EJ, Magid E, Young DS. Toward a checklist for reporting of studies of diagnostic accuracy of medical tests. Clin Chem 2000;46:893-5.

(2.) Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine [Review]. Clin Chem 1993;39:561-77.

(3.) Stephan C, Jung K, Cammann H, Vogel B, Brux B, Kristiansen G, et al. An artificial neural network considerably improves the diagnostic power of percent free prostate-specific antigen in prostate cancer diagnosis-results of a five year investigation. Int J Cancer 2002;99:466-73.

(4.) DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.

(5.) Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.

(6.) Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristics curves derived from the same cases. Radiology 1983;148:839-43.

(7.) Venkatraman ES, Begg CB. A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996;83:835-48.

(8.) Kairisto V, Poola A. Software for illustrative presentation of basic clinical characteristics of laboratory tests-GraphROC for Windows. Scand J Clin Lab Invest 1995;55(Suppl 222):43-60.

(9.) Su JQ, Liu LS. Linear combinations of multiple diagnostic markers. J Am Stat Assoc 1993;88:1350-5.

(10.) Reiser B, Faraggi D. Confidence intervals for the generalized ROC criterion. Biometrics 1997;53:644-52.

(11.) Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27:723-31.

(12.) Metz CE, Wang PL, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck F, ed. Information processing in medical imaging. The Hague: Nijhoff, 1984:432-45.

(13.) Metz CE. Statistical analysis of ROC data in evaluating diagnostic performance. In: Herbert D, Myers R, eds. Multiple regression analysis: applications in the health sciences. New York: American Institute of Physics, 1986:365-84.

(14.) Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-98.

(15.) Jung K, Stephan C, Lein M, Brux B, Sinha P, Schnorr D, et al. Receiver-operating characteristic as a tool for evaluating the diagnostic performance of prostate-specific antigen and its molecular forms-what has to be considered? Prostate 2001;46:307-10.

(16.) Kramar A, Faraggi D, Fortune A, Reiser B. mROC: a computer program for combining tumour markers in predicting disease states. Comput Methods Programs Biomed 2001;66:199-207.

(17.) Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall, 1993:436pp.

CARSTEN STEPHAN, [1] * SEBASTIAN WESSELING, [1] * TANIA SCHINK, [2] and KLAUS JUNG [1] [dagger]

Departments of [1] Urology and [2] Medical Biometry, University Hospital Charite, Humboldt University, D-10098 Berlin, Germany.

* Both authors contributed equally to this article. [dagger] Address correspondence to this author at: Department of Urology, University Hospital Charite, Humboldt University Berlin, Schumannstrasse 20/21, D-10098 Berlin, Germany. Fax 49-30-450-515904; e-mail Klaus.jung@charite.de.

[3] Nonstandard abbreviations: tPSA and fPSA, total PSA and free prostate-specific antigen, respectively; AUC, area under the ROC curve; and CI, confidence interval.

The aims of this study were (a) to survey currently available ROC programs, (b) to compare these ROC programs for their ease of use, and (c) to evaluate their relative utility in ROC analysis.

Material and Methods

ROC SOFTWARE STUDIED

Eight currently available ROC programs were evaluated (Table 1). All programs run on IBM-compatible computers. We performed all our evaluation studies on computers running under Microsoft Windows 2000 with at least 128 MB of RAM, a Pentium processor, and 250 MB of space on the hard drive. The general features of these programs are summarized in Table 1. The software Stata 7.0 was not included in this comparative study. We were unable to make a complete evaluation of this software because all necessary calculations could not be performed although we repeatedly discussed the issues with company representatives in Germany via the company hotline.

DATASETS FOR ROC ANALYSIS

To compare the programs, we used a previously described dataset of 928 men with prostate cancer (n = 606) and benign prostatic hyperplasia (n = 322) and subgroups of this population (3). ROC analyses of total prostate-specific antigen (tPSA) s free PSA (fPSA), the ratio of fPSA to tPSA (fPSA/tPSA), and of other values calculated by an artificial neural network approach with the mentioned dataset (3) were carried out to estimate the advantages and disadvantages of each program.

EVALUATION CRITERIA

To evaluate the programs, five simple criteria were chosen to encompass the ease of learning program operations, use of the software, and data handling and to characterize the usefulness of a each program (Table 2). A maximum percentage value was assigned to each criterion. The sum of all percentage values gives the final score. The criteria are described briefly below:

Data input. It is important to import or copy data into the program easily without any intermediate storage or special format, to be able to edit the data in the program (e.g., in a spreadsheet), and to save more than one dataset. The tendency of each program to crash was also taken into consideration.

Data output. Presentation of the results and processing of the exported data were assessed. The program should be able structure the results comprehensively. Processing of data characterizes the capability of the program to export and save the results, including the calculated graphs, as well as to draw more than one curve in one graph. This facility is very important for comparing several tests with each other.

Analysis results. This criterion was the most important one and included correctness and completeness of the results. It is obvious that correctness of results is mandatory. Incorrect results had to be considered as an exclusion criterion to recommend the respective software for ROC analysis.

There are several approaches to calculate the area under the ROC curve (AUC) for the comparison of ROC curves. Table 3 lists the main characteristics and limitations of three commonly used methods. It is crucial to know whether the curves result from independent or dependent (correlated) data. In laboratory diagnostics, the values of interest are in most cases measured on the same patients. We therefore considered only methods for correlated data. A second distinction can be made between nonparametric and parametric methods. Parametric methods are efficient under certain assumptions. These assumptions are often not fulfilled in practice, and their results are biased. Nonparametric methods should be used if the variables follow an ordinal or skewed distribution or if there are small sample sizes. A parametric approach should be preferred in case of a large sample size and continuous measurements.

The subcriterion completeness assessed the capability of a program to calculate all necessary ROC data for a reasonable decision regarding a diagnostic test. This included the AUC with its confidence intervals (CIs), the sensitivities and specificities at certain cutoffs with their CIs, the presentation of the graph, and the ability to compare the RUCs showing the respective statistical significance values.

Program comfort. This point of the comparison dealt with the compatibility of the program with standard calculation, text, and presentation programs, e.g., Microsoft Excel, Word, or PowerPoint. Programs were also evaluated based on the availability of help functions, tutorials, and demonstration versions and ease of obtaining information regarding program updates.

User manual. This criterion assessed the structure and comprehensibility of the user manual and whether the manufacturer provides an online manual, a homepage, or an e-mail address to solve current problems.

Results

The ROC programs were tested with a previously described dataset and various subsets (3). The assessment ratings for the five evaluation criteria are given in Table 2 for each program. As shown in one representative example (Table 4), AUC calculated by the various programs, which in some cases used different calculation methods as described below, differed only marginally. In addition, equivalent statistical differences between the RUCs of the various markers were obtained. Thus, the essential demand concerning the correctness of results seemed to be fulfilled by all programs compared. Moreover, the other criteria were helpful to assist in ranking the software for usefulness. The individual programs are described below.

AccuROC

AccuROC uses the method of DeLong et al. (4). To our knowledge, at this stage it is the only program that uses this method. The layout of the program is very well structured, and because of the comprehensive manual and the up-to-date homepage, the program is easy to learn. Up to three curves can be drawn into one graph, and the coordinates of each curve can be saved, which makes it possible to put more than three curves in one graph with use of a calculating program such as Excel. Furthermore, AccuROC can calculate the CIs and SD with a bootstrap method.

A serious drawback of this program is that except for the graph and its coordinates, none of the other results can be saved or exported; they can only be printed. If a diagnostic marker shows that lower values are associated with a higher risk of disease, all the test values have to be transformed by rendering them negative, manually or using a spreadsheet. This procedure makes the data input quite complicated.

Analyse-It

This software was published in 2001. The ROC analysis is performed according to the method of Hanley and McNeil (5, 6). According to the information of the software developers, an update was planned for the end of 2002. This update should use the method of DeLong et al. (4). It is an add-in program for Microsoft Excel. Like the software MedCalc, it is a program that implements several statistical procedures, including ROC analysis. It is simple to use and provides a very good online manual, help function, and tutorial. An advantage of its integration into Excel is that the interplay with other programs is excellent. Data input is easy, and the layout is clearly arranged. All necessary results are calculated in one step, and up to three curves can be displayed in one graph.

Unfortunately, RUCs can not be compared if any AUC is <0.7, but this will also be changed in the update version. Another drawback of this program is that it does not calculate CIs for the sensitivities and specificities.

CMDT

CMDT is a freeware program and can be downloaded from the internet (Table 1). An estimate of the AUC is given by the Wilcoxon rank-sum statistic. For comparison of ROC curves, it uses a permutation test suggested by Venkatraman and Begg (7).

The drawbacks of this program are that it is prone to crashing, the graph can barely be edited in the program, and only one curve can be displayed. This makes it impossible to compare curves visually. Furthermore, the graph is not of publication quality and has to be saved as an extended metafile to be processed in another graphics program.

The advantages of the program are that it uses a bootstrap method to calculate the CIs and that the data can be edited in the program.

GraphROC

The program GraphROC uses the method of Hanley and McNeil (5, 6) to calculate the ROC curve. It is one of the first commercially available programs on the Windows platform and is still in use (8). GraphROC is a longwinded program. Creating an input file is complicated, and it is not possible to edit the data after loading them into the program. Every result has to be copied via clipboard to save it. To edit the graph, it has to be copied via clipboard into another graphics program. In addition, the program is susceptible to crashing.

The advantages of GraphROC are the ability to draw several curves in one graph and the opportunity to compare paired and unpaired datasets. It is also possible to compare curves at a certain sensitivity or specificity cutoff, which is, as far as we know, a feature that only GraphROC provides. A demonstration version of GraphROC can be downloaded.

MedCalc

MedCalc also works with the method of Hanley and McNeil (5,6). This program is very interesting for those users who wish to do more than just ROC analysis because it provides a wide range of other special biomedical statistics, e.g., Bland-Altman plots, Passing-Bablok regression, and logistic regression. The data import is very easy and is possible from Excel, SPSS, dbase, Lotus, and as a text file. The layout is clearly arranged, it is possible to export data, and the graph can be edited in the program. MedCalc provides an online manual, and a 30-day demonstration version can be downloaded from the company homepage.

A clear disadvantage of this program is that only two curves can be presented in one graph.

mROC

mROC is a computer program that implements an approach of combining the ROC curves of several tumor markers or test values by the best linear combination, which maximizes the AUC under the hypothesis of a multivariate gaussian distribution (9). Methods for estimating CIs for the AUC are also provided (10). Furthermore, conventional ROC analysis is possible. Learning to work with the program is easy, the layout is well structured, and the provided manual is intelligible. However, the data input is quite complicated, and the data cannot be edited in the program. Numerical and graphic results can be exported. Unfortunately, only one curve can be displayed in a graph, and a comparison of different ROC curves is not possible.

By combining several markers or tests into one ROC curve, thus creating a "virtual marker", this program brings interesting additional new aspects to ROC analysis. Nevertheless, it cannot be recommended for a convenient ROC analysis.
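Under the multivariate gaussian assumption, the linear combination of Su and Liu (9) has a closed form: the weights are proportional to the inverse of the summed covariance matrices applied to the difference of the group means, and the AUC of the combined score is the standard normal distribution function evaluated at the square root of the corresponding quadratic form. The following is a minimal sketch for the two-marker case only; it is not the mROC code, the function name is an assumption, and the means and covariances are hypothetical values that would in practice be estimated from the data.

```python
import math


def best_linear_combination(mu_d, mu_n, cov_d, cov_n):
    """Two-marker case: weights proportional to (S_d + S_n)^-1 (mu_d - mu_n)
    and the AUC of the combined score under the gaussian model."""
    # Sum of the two 2x2 covariance matrices.
    a = cov_d[0][0] + cov_n[0][0]
    b = cov_d[0][1] + cov_n[0][1]
    c = cov_d[1][0] + cov_n[1][0]
    d = cov_d[1][1] + cov_n[1][1]
    det = a * d - b * c
    delta = (mu_d[0] - mu_n[0], mu_d[1] - mu_n[1])
    # Closed-form inverse of a 2x2 matrix applied to the mean difference.
    w = ((d * delta[0] - b * delta[1]) / det,
         (-c * delta[0] + a * delta[1]) / det)
    z = math.sqrt(w[0] * delta[0] + w[1] * delta[1])
    auc = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return w, auc


# Hypothetical parameters: two uncorrelated markers with unit variances.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, combined_auc = best_linear_combination(
    (0.6, 0.8), (0.0, 0.0), identity, identity)
print(weights, round(combined_auc, 3))
```

In this toy example the combined score separates the groups better than either marker alone, which is the "virtual marker" idea described above.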

ROCKIT

ROCKIT is a free program developed by C.E. Metz et al. (11-13). Although it is mathematically a very well thought-out program, we would not recommend it unless the user has a statistical background. Creating an input file is awkward, the layout is somewhat confusing, the interplay with other programs is not optimized, there is no help function, and the program frequently crashed when we used it.

Apart from these disadvantages, it calculates all necessary results, and with the included software PLOTROC (a program in Excel), several curves can be displayed in one graph.

SPSS

Although SPSS is a widely used statistical program, the ROC analysis within this package is not yet fully developed. In SPSS it is not possible to compare ROC curves. More than one curve can be displayed in a graph only if the markers share the same direction, i.e., either higher or lower values are associated with a higher risk of disease. Although the program offers a wide range of other statistics, a valid ROC analysis cannot be performed with this software.

As can be seen in Table 2, we did not find any software that fulfilled all our expectations perfectly. Every program had advantages and disadvantages. More detailed characteristics of each program are summarized in Table 5.
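The footnote to Table 2 states that the final score was calculated from the results of all criteria. Assuming that this means a plain sum of the six criterion values (an assumption, although the reported totals are consistent with it), the scores and the resulting ranking can be reproduced as follows; the variable names are illustrative.

```python
# Per-criterion scores from Table 2, in the order: data input (max 10),
# data output (15), correctness (40), completeness (20),
# software comfort (10), manual (5).
scores = {
    "AccuROC": [7, 8, 40, 20, 5, 5],
    "Analyse-It": [10, 11, 40, 15, 10, 5],
    "CMDT": [6, 6, 40, 12, 3, 3],
    "GraphROC": [4, 8, 40, 17, 6, 3],
    "MedCalc": [10, 7, 40, 14, 9, 4],
    "mROC": [4, 6, 40, 10, 5, 3],
    "ROCKIT": [3, 5, 40, 15, 4, 3],
    "SPSS": [10, 6, 40, 11, 5, 2],
}

# Final scores as reported in Table 2.
reported = {
    "AccuROC": 85, "Analyse-It": 91, "CMDT": 70, "GraphROC": 78,
    "MedCalc": 84, "mROC": 68, "ROCKIT": 70, "SPSS": 74,
}

# Assumption: the final score is the unweighted sum of the six criteria.
final = {name: sum(values) for name, values in scores.items()}
ranking = sorted(final, key=final.get, reverse=True)
print(final == reported, ranking)
```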

Discussion

Since the original paper by Metz (14) describing ROC analysis and its use in optimizing diagnostic strategies, many enhancements have been made to further improve its use (2,5). ROC analysis has recently been included in the checklist for reporting studies concerning diagnostic accuracy of medical tests (1). Other studies have focused on preconditions and their influence on diagnostic performance (15). Most studies comparing tumor markers (e.g., PSA and its molecular forms) are already using ROC comparisons (3). To perform these ROC comparisons, many commercially available programs have been introduced (4,7-14,16); however, to our knowledge, a comparison study of the available programs regarding their technical and mathematical aspects has not been published. With this study, we analyzed the advantages and drawbacks of eight ROC programs to find the best optimized program for ROC analysis for clinicians. The programs Analyse-it, AccuROC, MedCalc, and to a certain extent GraphROC show good performance, but each program has different limitations.

The results of the comparison show that three of the eight programs can make ROC analysis easier and more economical. The leading program is Analyse-it with a final score of 91%. Although this program received maximum scores for the criteria data input, software comfort, and user manual, it is not acceptable that only three curves can be displayed and that the CIs for the sensitivities and specificities are not calculated. However, add-in software for a program, such as Excel, that is already widely used is potentially valuable, and if the drawbacks can be removed in a future version, this software could make ROC analysis much easier. Except for SPSS, none of the other programs provides as good a help function and tutorial. Questions concerning the program are answered quickly via e-mail. Therefore, the price is acceptable considering such good service. Additionally, a full demonstration version can be downloaded at www.analyse-it.com.

In second place is AccuROC with a total score of 85%. Its use of the completely nonparametric method of DeLong et al. (4) and bootstrap methods (17) and its well-structured layout are the strong points of this program. On the other hand, complicated data input and the fact that data output (except the graph) can only be printed and cannot be saved or copied are disadvantages. Another drawback is that the license is limited to 2 years and to use on a single computer. If one attaches great importance to highly accurate results and accepts the mentioned drawbacks, we can recommend AccuROC.
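To illustrate the kind of bootstrap interval referred to above (17), the following is a minimal sketch of a percentile bootstrap CI for the AUC in which each group is resampled separately. It is not AccuROC's algorithm; the function names, the number of resamples, the seed, and the marker values are assumptions made only for this example.

```python
import random


def auc(diseased, nondiseased):
    """Mann-Whitney estimate of the AUC; ties count as one half."""
    wins = sum((d > n) + 0.5 * (d == n) for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))


def bootstrap_auc_ci(diseased, nondiseased, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; each group is resampled separately."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        d_star = [rng.choice(diseased) for _ in diseased]
        n_star = [rng.choice(nondiseased) for _ in nondiseased]
        estimates.append(auc(d_star, n_star))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical marker values, not study data.
cancer = [3.2, 4.8, 5.1, 6.0, 7.4, 8.9, 10.2, 12.5]
benign = [1.9, 2.7, 3.5, 4.0, 4.9, 5.5, 6.8, 7.0]
point = auc(cancer, benign)
lower, upper = bootstrap_auc_ci(cancer, benign)
print(point, lower, upper)
```

The percentile method makes no distributional assumption, at the cost of many repeated AUC calculations; with groups as small as in this toy example the interval is wide.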

The third program that we recommend is MedCalc, with a total score of 84%. Although ROC analysis is only one tool of this program, all necessary parameters are calculated. Data and results are clearly arranged, and the general handling is easy. Unfortunately, only two curves can be presented in one graph, which limits the relevant use of this program. If it were not for this drawback, MedCalc would fulfill most of our expectations of efficient ROC analysis. Even the price is reasonable, considering the additional statistical methods included. For those who do not need a multicurve presentation and are interested in a wide range of other statistics, MedCalc is a reasonable choice.

GraphROC achieved a score of 78%. The completeness of the results cannot be criticized. All the main parameters can be calculated with this software. It even has a feature that shows every possible cutoff point with its sensitivity and specificity in a separate diagram, with automatic updating of clinical sensitivity and specificity values by use of simple mouse clicks. The main drawbacks are the user-unfriendly data input and the laborious processing of results and graphs. The user-friendliness of the program would be improved if there were a way to export the points of the ROC curve to either a text file or a spreadsheet. This would give the user more flexibility in terms of graphic capability. GraphROC can still compete with the other programs for ROC analysis, although the software has not been further developed since 1996.
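The kind of export suggested above can be sketched as follows: sensitivity and specificity are computed at every observed cutoff and written as comma-separated text that a spreadsheet or plotting program can read. This is an illustration only and does not describe GraphROC; the function names, the column names, the convention that values at or above the cutoff count as positive, and the marker values are assumptions.

```python
import csv
import io


def roc_points(diseased, nondiseased):
    """Sensitivity and specificity at every observed cutoff, treating a
    value >= cutoff as a positive test result."""
    cutoffs = sorted(set(diseased) | set(nondiseased))
    points = []
    for c in cutoffs:
        sens = sum(v >= c for v in diseased) / len(diseased)
        spec = sum(v < c for v in nondiseased) / len(nondiseased)
        points.append((c, sens, spec))
    return points


def to_csv(points):
    """Return the ROC points as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cutoff", "sensitivity", "specificity"])
    writer.writerows(points)
    return buffer.getvalue()


# Hypothetical marker values, not study data.
points = roc_points([4.1, 6.3, 8.0], [2.2, 4.1])
print(to_csv(points))
```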

The shortcomings of the other four programs outlined above make it difficult to recommend these programs for regular ROC analysis.

In summary, it is surprising that valid ROC analysis with all necessary data and a good plotting function is not offered in a single program. It should not be necessary to use more than one program to perform a valid ROC analysis. Therefore, the programs Analyse-it, AccuROC, or MedCalc should be enhanced as described above to provide all necessary functions.

We gratefully acknowledge Prof. Wernecke for helpful suggestions and Silke Klotzek for helpful technical assistance. The study contains parts of the thesis of S.W.

Received September 12, 2002; accepted December 11, 2002.

References

(1.) Bruns DE, Huth EJ, Magid E, Young DS. Toward a checklist for reporting of studies of diagnostic accuracy of medical tests. Clin Chem 2000;46:893-5.

(2.) Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine [Review]. Clin Chem 1993;39:561-77.

(3.) Stephan C, Jung K, Cammann H, Vogel B, Brux B, Kristiansen G, et al. An artificial neural network considerably improves the diagnostic power of percent free prostate-specific antigen in prostate cancer diagnosis-results of a five year investigation. Int J Cancer 2002;99:466-73.

(4.) DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.

(5.) Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.

(6.) Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristics curves derived from the same cases. Radiology 1983;148:839-43.

(7.) Venkatraman ES, Begg CB. A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996;83:835-48.

(8.) Kairisto V, Poola A. Software for illustrative presentation of basic clinical characteristics of laboratory tests-GraphROC for Windows. Scand J Clin Lab Invest 1995;55(Suppl 222):43-60.

(9.) Su JQ, Liu LS. Linear combinations of multiple diagnostic markers. J Am Stat Assoc 1993;88:1350-5.

(10.) Reiser B, Faraggi D. Confidence intervals for the generalized ROC criterion. Biometrics 1997;53:644-52.

(11.) Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27:723-31.

(12.) Metz CE, Wang PL, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: Deconinck F, ed. Information processing in medical imaging. The Hague: Nijhoff, 1984:432-45.

(13.) Metz CE. Statistical analysis of ROC data in evaluating diagnostic performance. In: Herbert D, Myers R, eds. Multiple regression analysis: applications in the health sciences. New York: American Institute of Physics, 1986:365-84.

(14.) Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-98.

(15.) Jung K, Stephan C, Lein M, Brux B, Sinha P, Schnorr D, et al. Receiver-operating characteristic as a tool for evaluating the diagnostic performance of prostate-specific antigen and its molecular forms-what has to be considered? Prostate 2001;46:307-10.

(16.) Kramar A, Faraggi D, Fortune A, Reiser B. mROC: a computer program for combining tumour markers in predicting disease states. Comput Methods Programs Biomed 2001;66:199-207.

(17.) Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall, 1993:436pp.

CARSTEN STEPHAN, [1] * SEBASTIAN WESSELING, [1] * TANIA SCHINK, [2] and KLAUS JUNG [1] †

Departments of [1] Urology and [2] Medical Biometry, University Hospital Charite, Humboldt University, D-10098 Berlin, Germany.

* Both authors contributed equally to this article. † Address correspondence to this author at: Department of Urology, University Hospital Charite, Humboldt University Berlin, Schumannstrasse 20/21, D-10098 Berlin, Germany. Fax 49-30-450-515904; e-mail Klaus.jung@charite.de.

[3] Nonstandard abbreviations: tPSA and fPSA, total PSA and free prostate-specific antigen, respectively; AUC, area under the ROC curve; and CI, confidence interval.

Table 1. Survey of software studied.

| Program | Developed by | Reference | Homepage and information | System requirements (a) | Help function | Tutorial | Demonstration version | Price |
|---|---|---|---|---|---|---|---|---|
| AccuROC 2.5 | Accumetric Corp. | | www.accumetric.com | Windows 95 or higher | Yes | No | Yes (b) | Can $150 (e) |
| Analyse-It | Analyse-It Software | | www.analyse-it.com | Windows 95 or higher; 16 MB RAM; 6 MB hard drive; Pentium 100 MHz processor; Excel 95 or higher | Yes | Yes | Yes (c) | ~£76-100 (f) |
| CMDT | Briesofsky | | http://city.vetmed.fu-berlin.de/~mgreiner/CMDT/cmdt.htm | Windows 95 or higher; 4 MB RAM; 1.2 MB hard drive; i486 or higher | Yes | No | -- (d) | Free |
| GraphROC 2.1 | Kairisto and Poola | (8) | www.netti.fi/~maxiw | Windows 3.1 or higher; 4 MB RAM; i486 or higher | Yes | No | Yes (c) | ~€61-297 (f) |
| MedCalc | Schoonjans | | www.medcalc.be | Windows 95 or higher; 8 MB RAM; 4 MB hard drive; i486 or higher | Yes | No | Yes (c) | US $199 |
| mROC | Kramar | (16) | E-mail akramar@valdorel.fnclcc.fr | Windows 95 or higher; 8 MB RAM; 5 MB hard drive; i486 or higher | Yes | No | No | 350 |
| ROCKIT | Metz | (11-13) | www.xray.bsd.uchicago.edu/krl/toppage11.htm | Windows 3.1 or higher; 4 MB RAM | No | No | -- (d) | Free |
| SPSS | SPSS Inc. | | www.spss.com | Windows 95 or higher; 16 MB RAM; 160 MB hard drive; Pentium 90 MHz | Yes | Yes | Yes | €1280 |

(a) According to the description of the software, not always tested.
(b) Limited version.
(c) Full version for 30 days.
(d) Program can be downloaded free of charge.
(e) For a 2-year license, including all updates during this time.
(f) Depending on delivery mode (e-mail or disc), country from where it is ordered, and individual or institutional use.

Table 2. Evaluation of ROC software. (a)

| Evaluation criteria | AccuROC | Analyse-It | CMDT | GraphROC | MedCalc | mROC | ROCKIT | SPSS |
|---|---|---|---|---|---|---|---|---|
| Data input (10), % | 7 | 10 | 6 | 4 | 10 | 4 | 3 | 10 |
| Data output (15), % | 8 | 11 | 6 | 8 | 7 | 6 | 5 | 6 |
| Analysis results: correctness (40), % | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| Analysis results: completeness (20), % | 20 | 15 | 12 | 17 | 14 | 10 | 15 | 11 |
| Software comfort (10), % | 5 | 10 | 3 | 6 | 9 | 5 | 4 | 5 |
| Manual (5), % | 5 | 5 | 3 | 3 | 4 | 3 | 3 | 2 |
| Final score, % | 85 | 91 | 70 | 78 | 84 | 68 | 70 | 74 |

(a) Values are related to the respective evaluation criterion with the maximum values shown in parentheses. The final score was calculated from the results of all criteria.

Table 3. Main mathematical methods used in ROC software and their characteristics.

| DeLong et al. (4) | Hanley and McNeil (6) | Metz et al. (12) |
|---|---|---|
| Completely nonparametric | Nonparametric estimation of the AUC | Bivariate binormal model |
| No ties: estimator of the true area under the ROC curve is unbiased; area corresponds to the Wilcoxon rank-sum statistic | No ties: estimator of the true area under the ROC curve is unbiased; area corresponds to the Wilcoxon rank-sum statistic | Maximum likelihood estimation of the parameters of the ROC curves (a_x, b_x; a_y, b_y) |
| Ties: true area under the ROC curve is underestimated when the number of distinct values is small (the greater the number of scores, the smaller the bias); area corresponds to the Mann-Whitney version of the Wilcoxon rank-sum statistic with average ranks | Ties: true area under the ROC curve is underestimated when the number of distinct values is small (the greater the number of scores, the smaller the bias); area corresponds to the Mann-Whitney version of the Wilcoxon rank-sum statistic with average ranks | Maximum likelihood estimation of the variances of and covariances between those parameters (method of scoring) |
| Use of the theory on generalized U-statistics to generate an estimated covariance matrix | Calculate for both the normal and the abnormal population the correlation between the values of the original measures; the average of the correlation and the average of the areas are used to estimate the covariance matrix | Three different tests on this basis: bivariate χ² parameter test, true positive fraction test, and area index test |
| Estimation uses the method of structural components | Underlying gaussian distributions (binormal) are assumed | Tests perform better for lower ROC curves and larger numbers |
| Consistent estimates of the covariance matrix | Estimations may be biased unless the assumptions are satisfied | Inference based on Taylor series expansion to obtain an estimation of the covariance matrix has better statistical properties than inferences based on estimation of the parameters a and b |
| Test statistic is asymptotically χ² distributed | Resulting test statistic is asymptotically normally distributed | |

Table 4. Area under the ROC curves calculated by the various ROC software. (a)

| Software | Value | Ratio of fPSA/tPSA | Output value of artificial neural network |
|---|---|---|---|
| AccuROC | Area (SE) | 0.702 (0.049) | 0.841 (0.039) |
| | 95% CI | 0.606-0.798 | 0.764-0.918 |
| Analyse-It | Area (SE) | 0.702 (0.049) | 0.841 (0.039) |
| | 95% CI | 0.607-0.797 | 0.764-0.917 |
| CMDT | Area (SE) | 0.702 (0.049) | 0.841 (0.038) |
| | 95% CI | 0.607-0.798 | 0.764-0.914 |
| GraphROC | Area (SE) | 0.702 (0.048) | 0.841 (0.039) |
| MedCalc | Area (SE) | 0.702 (0.048) | 0.841 (0.038) |
| | 95% CI | 0.610-0.783 | 0.762-0.902 |
| mROC | Area | 0.702 | 0.841 |
| | 95% CI | 0.608-0.787 | 0.762-0.912 |
| ROCKIT | Area (SE) | 0.702 (0.049) | 0.841 (0.038) |
| SPSS | Area (SE) | 0.702 (0.049) | 0.841 (0.039) |
| | 95% CI | 0.607-0.797 | 0.764-0.917 |

(a) A subset of 53 patients with prostate cancer and 64 patients with benign prostatic hyperplasia of a total group of 924 patients was analyzed to characterize the diagnostic power of the ratio of fPSA/tPSA and the artificial neural network output value regarding the differentiation between the two groups of patients (3). Data [mean of the area (SE) and/or 95% CI] are given as results calculated by the respective programs.

Table 5. Characteristics of the ROC software.

| Characteristics | AccuROC | Analyse-It | CMDT | GraphROC | MedCalc | mROC | ROCKIT | SPSS |
|---|---|---|---|---|---|---|---|---|
| Maximum number of curves on graph | 3 (a) | 3 | 1 | 3 | 2 | 1 | 3 (b) | 3 (c) |
| Area with SE and/or CI | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Comparison of curves | Yes | Yes | Yes | Yes (d) | Yes | No | Yes | No |
| Sensitivity/specificity with CI | Yes | No | No | Yes | Yes | No | No | No |
| Processing of the graph (e) | + | ++ | - | - | ++ | - | - | ++ |
| Data processing (e) | - | ++ | + | - | ++ | - | - | ++ |

(a) Only by use of other graphics packages.
(b) In PLOTROC, an add-in program for Excel.
(c) Only possible if either higher or lower values are associated with a higher risk of disease.
(d) It is also possible to compare curves at a certain sensitivity or specificity cutoff.
(e) -, no processing possible; +, processing possible; ++, good processing.

Title Annotation: | Evidence-based Laboratory Medicine and Test Utilization |
---|---|

Author: | Stephan, Carsten; Wesseling, Sebastian; Schink, Tania; Jung, Klaus |

Publication: | Clinical Chemistry |

Date: | Mar 1, 2003 |
