Important considerations in using indicators to profile providers.
Demand for information about the clinical performance of providers is accelerating. In the more competitive and value-sensitive marketplace that is already developing, purchasers of health care services (consumers, employers, and insurers) will require more information to better assess the relative value of the services of providers (professionals and hospitals). The cornerstone of a wise, value-based strategy in selecting health care services is careful assessment of each provider's performance based on detailed quantitative data in the form of clinical indicators. Using indicators to profile providers allows purchasers both to compare and to influence provider performance.
An indicator is a quantitative measure of a process or outcome of care and usually is expressed as a rate. Although providers have used indicators for some time in their own research and quality assessment efforts, purchasers' use of indicators to assess and compare providers is growing in importance. Providers and purchasers should consider the following important principles when using indicators to profile providers.
* Indicators should summarize the care
Valid comparative performance assessment does not require an indicator for every facet of care. It does, however, require indicators that summarize clinically important processes or outcomes of care. Indicators that provide clinically important summary information will be referred to here as macro indicators. Postsurgery mortality and achievement of desired technical result (clinical outcome measures), population-based rates of utilization (reflections of relative appropriateness and availability), hospital length of stay (an efficiency measure), and patient satisfaction with care are examples of macro indicators. Because the relationship between the indicator and the qualities of concern may be complex, meaningful and constructive use of indicators depends on their proper interpretation. For example, an unusually low utilization rate for a procedure may reflect a lack of availability, local patients' preferences for alternative methods of care, or providers' exceptionally judicious use of the procedure.
Both purchasers and providers need macro indicators in order to understand the general processes and outcomes of care provided by an organization of interest. Providers also need micro indicators to help them understand why they get the results they do. Because these indicators are designed to guide and assess internal processes of care, they may be quite specific and detailed in order to reflect small, discrete processes or outcomes.
Micro indicators are essential to providers to explain and improve performance as measured by macro indicators and to identify opportunities for improvement. However, they are not useful for comparative performance assessment, because they do not provide a general overview of processes and outcomes of care. For example, in comparative, or external, monitoring of patients undergoing coronary artery bypass surgery, macro indicators such as mortality rate and postoperative infection rate are useful, while micro indicators such as percentage of patients treated with penicillin or percentage of patients receiving perioperative prophylactic antibiotics are not. Purchasers should leave the use of micro indicators to providers' internal improvement efforts and instead should select macro indicators that paint in broad brush a clinically valid profile of care.
The Joint Commission on Accreditation of Healthcare Organizations' (JCAHO's) "phase I" indicators of hospital quality that recently have completed beta testing and that now are being used voluntarily by hospitals nationwide include a micro indicator that is being used for comparative performance measurement. It is a hospital's postoperative mortality for all patients within two days of all procedures involving nonlocal anesthesia. Although this indicator may be useful to an internal quality improvement program, it is of little value to purchasers who are interested not in how many coronary artery bypass patients survive to the third day, but in how many leave the hospital with a sustained improved functional status.
Determining whether a performance monitoring project uses the right kind of indicators requires an understanding of the goal of the project. The Maryland Quality Indicator Project (MQIP), for example, is a voluntary cooperative effort through which hospitals exchange information about performance on specified indicators (15 of them thus far). The MQIP indicators are both macro (inpatient mortality, unplanned readmission rate, cesarean-section rate) and micro (cases where discrepancy between initial and final x-ray reports required an adjustment in patient management, patients in the emergency department for more than six hours). Inclusion of both macro and micro indicators is appropriate to this project's goal of obtaining information that can help providers improve their care.
* Indicators should be valid reflections of the aspect of care they measure.
Every indicator must reflect one of the basic measurable aspects of care: effectiveness (clinical outcomes), appropriateness, availability, efficiency, and satisfactoriness. The aspect of care that an indicator is intended to reflect must be clearly understood by both its developers and its users. The importance of this understanding can be illustrated by the problem of in-hospital falls, a source of potentially serious injury. While the rate of in-hospital falls after heart bypass surgery, for example, might be a useful indicator of general postoperative hospital care, such a rate may be a poor reflection of the quality of the bypass surgery itself.
Consider the use of hospital average length of stay (ALOS) as a measure of effectiveness. Although many patients with unsuccessful therapy or complications from care are hospitalized longer than they would have been otherwise, many patients with complications that are not life-threatening are discharged earlier than in past years to more vigilant home health care. Consequently, average or shorter lengths of stay do not always reflect lower complication rates. Moreover, physician practice patterns and efficiency of discharge planning have major influences on lengths of stay, independently of the physical condition of patients. Thus, ALOS is a biased indicator of clinical effectiveness.
Average length of stay can be improved as an indicator of effectiveness by risk adjustment, by standardization to a particular hospital's pattern of practice, and perhaps by limitation of its use to recovery after elective surgery. Even so, the data are still potentially biased by the omission of patients who are discharged relatively early but who experience the same unsatisfactory outcome of care. Risk-adjusted ALOS may nevertheless be a useful measure of a hospital's efficiency rather than of its effectiveness, and it is used in this way in some projects (the Cleveland Health Quality Choice Project, for example).
The outcome measured by an outcome indicator should be related to and influenced by the provider's care. Consider low birthweight, an outcome that reflects genetic predisposition, maternal behavior and environment, and prenatal care. Although a certain rate of low-birthweight infants appears unavoidable, it is thought that many cases can be avoided by altering maternal behavior and by improving the availability and effectiveness of prenatal care, especially early in pregnancy. Thus, by the time a woman enters the hospital for a delivery, the infant's birthweight is already established. The ability of a provider to delay birth in a woman who presents in premature labor might be useful to assess, but it is not clear that an infant's birthweight is a valid reflection of the quality of care of the hospital at which the infant's mother chose to deliver. Consequently, it is puzzling that a hospital's rate of low-birthweight infants is an indicator of hospital quality selected by the JCAHO as part of its set of "phase I" indicators of hospital quality.
Bias for or against a particular pattern of care may compromise the validity of an indicator. For example, unplanned readmission is another purported indicator of the quality of hospital care. The rationale behind its use is that patients discharged after receiving poor hospital care are more likely to be quickly readmitted than are patients who received better care. Early readmission, however, is the result of many factors, including patient age, patient compliance with treatment, home support, access to home health care, type of insurance, and physicians' posthospitalization care and practice patterns. Within a group of patients with the same postdischarge problem, some may be readmitted and some may not be, depending on these factors. In fact, some patients who received "good" care might be readmitted, while some who received "poor" care might not be. Consequently, the rate of early unplanned readmission may be a biased indicator of the quality of hospital care.
Sometimes, the only option is to measure an outcome of interest indirectly. For example, incomplete medical records may make it impossible to assess retrospectively the occurrence of in-hospital infections. Moreover, discharge ICD-9-CM diagnosis codes are notoriously unreliable as to the occurrence of a complication of care. This is due both to upcoding (i.e., false positives) and to the absence of chart documentation (i.e., false negatives). Those wishing to assess this aspect of care may choose a different indicator that will provide surrogate information. In such a situation, under defined circumstances, institution of antibiotic therapy may be useful as a proxy for the occurrence of in-hospital infections. Similarly, transfusion of blood may be a useful indicator of perioperative blood loss.
Indirect measures must be used carefully, however. Provider behavior (e.g., deciding to start an antibiotic or give a transfusion) can be used as a proxy for underlying pathophysiologic change (e.g., infections or blood loss) only when provider behavior in response to that change is relatively uniform, so that differences in the rates among providers represent differences in patient condition and not in practice style. Furthermore, the use of indirect measures as indicators may sometimes be counterproductive. For example, if ALOS is used as an indirect measure of resource consumption in comparative profiling, providers might successfully reduce ALOS by actually increasing resource consumption (for example, by indiscriminately ordering a battery of tests for certain types of patients on their admission). Therefore, indirect measures should be used only if the relationship between the outcome of interest and the proxy is understood and accounted for in the analysis.
* Indicators should reflect occurrences that can be observed reliably.
An indicator's usefulness will be compromised if the occurrence of any event necessary to its calculation is not observed or recorded in a consistent manner. This is true especially if the inconsistency is frequent and is not randomly distributed throughout the population of providers being evaluated.
For example, the appearance of angina after coronary artery bypass surgery is considered a reflection of therapeutic failure. However, if one hospital puts nitroglycerin at the bedside routinely, while another requires patients to ask for it when experiencing pain, some postoperative anginal episodes in the former hospital will not come to the attention of the nurses and physicians and, therefore, will not be recorded in the record. Consequently, a comparison of the rates of postoperative angina in these two hospitals would be quite misleading.
Differences among providers in labeling and reporting are particularly noticeable with respect to complications of medical care. Providers often are reluctant, for professional and legal reasons, to identify adverse occurrences. Also, clinical labeling can be highly subjective; one surgeon's wound dehiscence may be another's stitch abscess. In addition, medical record keeping can be fragmentary, especially when providers are occupied with caring for unstable patients. Consequently, it is hazardous to assume that providers have recorded complications of care reliably.
The use of ICD-9-CM codes to identify complications of care is notoriously problematic, because choices of diagnostic codes are likely to be affected by an interest in obtaining maximum reimbursement for services rendered. Moreover, examinations of medical records often reveal conditions for which no code was recorded. Recent research at Duke University on the reliability of ICD-9-CM coding supports these observations. Researchers compiled a database of 12,937 patients hospitalized consecutively between 1985 and 1990 for cardiac catheterization because of suspected ischemic heart disease. They studied the relationship between each patient's discharge ICD-9-CM diagnosis codes and data gathered clinically. Among their findings was that "twenty-four percent of clinically identified acute myocardial infarctions were not coded...." Generally, they concluded that "claims data [discharge ICD-9-CM diagnosis codes] failed to identify more than one half of the patients with prognostically important conditions...." Therefore, provider documentation or ICD-9-CM codes should be used only with great caution to identify complications of care.
Despite their limitations in identifying complications, ICD-9-CM codes are used for this purpose in some quality measuring projects. For example, the state of California has begun a program to measure and publicly release information on outcomes of hospital care at all state hospitals. In 1993, a preliminary report was issued privately to hospitals for two conditions: acute myocardial infarction and intervertebral disc excision. For the latter condition, one indicator of comparative hospital performance was inpatient complications as measured by specific codes in each patient's list of discharge ICD-9-CM codes. Because of widespread problems like those described above in coding complications, public release of comparative data based on this indicator is likely to lead to erroneous conclusions about the true relative performances of the hospitals.
* All data elements used to obtain indicators should be measured uniformly.
Providers should ensure that data elements used in comparative performance monitoring are obtained in a uniform manner, because uniformity of measurement is essential for accurate and meaningful comparisons of providers' performances. Uniform measurement depends on clear definitions and reliable collection of data elements. Each data element should be defined in terms that will be understood readily by all providers participating in the project. Whenever possible, data elements should incorporate objectively measured values. Definitions should be designed to avoid any need for subjective determinations of a condition's extent or degree of change. Methods of data collection should be specified in appropriate detail, and they should take into account the particularities and capabilities of each provider in the program. Data elements that cannot be clearly defined and reliably collected should be omitted.
Because uniformity of measurement is so critical to the validity of comparative performance analyses, indicator values obtained from different sources using different data definitions and collection methodologies should be compared only with great caution and with sensitivity to the limitations of the data. Facile comparisons of the face values of indicators should be avoided, because they may easily (however unintentionally) distort the truth and thereby create inaccurate impressions of relative performance.
When indicators are described similarly but are measured in different ways by different providers, comparisons can be misleading. For example, a table accompanying a recent Wall Street Journal article juxtaposed, for each of a variety of indicators, values recently released by three HMOs. Only footnotes in small print explained the substantial differences in measurement techniques that made direct comparison of most of the values extremely misleading. Misinterpretation can be avoided by limiting comparisons to indicators that were measured in the same manner.
* Indicator data should be adjusted to
account for differences in the condition
of patients before care is
Quantitative assessment of clinical processes and outcomes now is established as indispensable to a proper understanding of health care delivery. However, providers with an unsatisfactory outcome or process rate may claim that their patients are inherently more at risk than are the patients of other providers, or that their process rates are necessitated by the special characteristics of their patient populations. To compare process and outcome rates fairly and credibly among providers, the rates must be adjusted to account for inherent differences in the patient populations.
Consider the primary cesarean-section (C-section) rate, a reflection of appropriateness of obstetrical care. Some cesarean sections are unavoidable, but too many or too few may indicate inappropriate care. Determination of the need for a C-section at the time of delivery usually is not the result of random and arbitrary decision making. Some pregnant women have conditions that make them more likely than others to require a C-section, regardless of the provider who manages the delivery.
For example, suppose that the unadjusted overall C-section rates of three different providers are compared in a state with a rate of 19 percent (table 1, below). Imagine that the unadjusted C-section rates of Providers A, B, and C are 30, 22, and 13 percent, respectively. Provider A's rate is 8 percentage points higher than B's and 11 percentage points higher than the state average. Provider B's rate is 3 percentage points higher than the state average. Provider C's rate, however, is 9 percentage points lower than Provider B's rate, 17 points lower than Provider A's rate, and 6 percentage points lower than the state average. Is Provider C the most judicious in the use of cesarean sections? If no other information is provided, many would reach that conclusion.
To truly determine the best performance, one must assess quantitatively each patient's inherent risk of needing a C-section. In our work, we have found some of the risk factors for a primary C-section to be: patient older than age 38, multiple infants, genital herpes, and malpresentation of the fetus. Using logistic regression, all identified risk factors can be weighted and a mathematical predictive model can be created to determine the expected C-section rate for a group of patients based on their inherent risk factors.
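The weighting of risk factors described above can be sketched in code. This is a minimal illustration only: the intercept and weights below are hypothetical values chosen for the example, not coefficients from the authors' model, and a real model would be fitted to patient data by logistic regression.

```python
import math

# Hypothetical log-odds weights for illustration only; NOT taken from the article.
INTERCEPT = -2.0
WEIGHTS = {
    "age_over_38": 0.7,
    "multiple_infants": 1.5,
    "genital_herpes": 1.2,
    "malpresentation": 2.5,
}

def predicted_probability(patient):
    """Logistic model: probability that one patient needs a primary C-section."""
    log_odds = INTERCEPT + sum(
        weight for factor, weight in WEIGHTS.items() if patient.get(factor)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

def expected_rate(patients):
    """Expected C-section rate for a group: the mean of the patients' probabilities."""
    return sum(predicted_probability(p) for p in patients) / len(patients)

# Two hypothetical patient groups of ten: one with no risk factors,
# one in which two patients carry risk factors.
low_risk_group = [{} for _ in range(10)]
mixed_group = [{} for _ in range(8)] + [
    {"malpresentation": True},
    {"age_over_38": True, "multiple_infants": True},
]

print(f"Expected rate, low-risk group: {expected_rate(low_risk_group):.1%}")
print(f"Expected rate, mixed group:    {expected_rate(mixed_group):.1%}")
```

The group with more risk factors receives a higher expected rate, which is the value against which that provider's observed rate would then be judged.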
Once predicted values are created, the relationship can be determined between the observed (actual) C-section rate and the rate that is predicted from the patients' characteristics prior to care; this is known as the observed-to-predicted ratio (O/P). A ratio larger than 1.0 reflects a C-section rate that is higher than predicted (expected) for the provider's patient population.
Suppose that risk-adjusted data for Providers A, B, and C reveal that patients of Providers A and B are more likely to require a C-section than either the patients of Provider C or those of the state as a whole, regardless of provider. Providers A and B then may be expected to have rates that are higher than both the state average and the rate of Provider C. The data in table 1 demonstrate that Providers A and C have O/P ratios that are identical, 1.2, meaning that their C-section rates were 20 percent higher than predicted. On the other hand, Provider B has an O/P ratio of 0.95, indicating an observed rate 5 percent lower than predicted on the basis of characteristics of the patient population. Which provider now appears more judicious in performing C-sections, Provider A, B, or C? It should be noted that risk adjustment affects a provider's performance data only in relation to the data of other providers in the comparison group; it does not determine the optimum indicator (in this scenario, the optimum C-section rate).
[TABULAR DATA OMITTED]
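The arithmetic behind the hypothetical comparison can be sketched as follows. Because the table itself is not reproduced here, the predicted rates in this sketch are back-calculated from the observed rates and O/P ratios quoted in the text; they are an assumption of the example, not values read from table 1.

```python
def op_ratio(observed_rate, predicted_rate):
    """Observed-to-predicted ratio; above 1.0 means a higher rate than predicted."""
    return observed_rate / predicted_rate

def implied_predicted(observed_rate, ratio):
    """Predicted rate implied by an observed rate and its O/P ratio."""
    return observed_rate / ratio

# Observed C-section rates (%) and O/P ratios as quoted in the text.
providers = {"A": (30.0, 1.2), "B": (22.0, 0.95), "C": (13.0, 1.2)}

for name, (observed, ratio) in providers.items():
    predicted = implied_predicted(observed, ratio)
    print(f"Provider {name}: observed {observed:.1f}%, "
          f"implied predicted {predicted:.1f}%, O/P {ratio:.2f}")

# Rank providers by O/P ratio rather than by raw rate (lowest ratio first).
ranked = sorted(providers, key=lambda name: providers[name][1])
print("Ranked by O/P ratio:", ranked)
```

Ranked by raw rate, Provider C looks best; ranked by O/P ratio, Provider B comes first and Providers A and C tie.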
This hypothetical scenario about C-section rates illustrates how failure to adjust indicator values for risk makes comparisons among providers difficult and potentially misleading. Moreover, without proper risk adjustment, providers whose performances appear unsatisfactory can claim persuasively that their patients are at higher risk. Consequently, they may neglect to review their care carefully to identify opportunities to improve quality.
Many quality measuring projects risk adjust the comparative performance indicators they report. Among these are the Cleveland Health Quality Choice Program and the state-sponsored outcomes monitoring programs in New York, Pennsylvania, and California. However, some comparative performance indicators, such as the HEDIS indicators of the National Committee for Quality Assurance, are not risk-adjusted. Failure to risk adjust performance indicators compromises their usefulness in comparative performance monitoring and has the potential to sour providers on quantitative performance assessment if they believe that truly good care and truly bad care are not clearly differentiated.
Current methods of risk adjustment differ in such features as the method of quantifying risk and the type of data used. Generally, more precise risk-adjustment methods cost more. In part, this is because clinical data are more costly to collect than are billing data, and also because greater effort is required to tailor risk-adjustment methodology to the specific project in order to maximize its predictive power. Purchasers and providers who want risk-adjusted indicators for private, internal use as a general estimate of provider performance may settle for a less expensive, less accurate approach, while those who wish to make indicator measurements public to guide purchasing decisions will be best served by choosing the most accurate and clinically defensible risk-adjustment method available.
The usefulness of risk-adjusted comparative performance monitoring depends in great part on how good the predictive models are. A model's technical worth rests primarily on its predictive power and on the uniformity of its performance throughout subpopulations of patients (i.e., the model is not biased against particular types of patients). All purchasers and providers who examine risk-adjusted comparative performance data should be familiar with the strengths and weaknesses of the risk-adjustment method used, with whether it is appropriate to the specific goals of the monitoring program, and with the power of the predictive model itself.
* Indicators should measure relatively common events.
Because survival after major surgery depends, in part, on the surgeon's skill, it can be a useful indicator of the quality of surgery. On the other hand, death may occur with major surgery even when the best possible care is rendered. Comparative risk-adjusted outcome rates can be used, not to assess quality in individual cases, but to determine whether, on average, a specific provider's performance plays an important role in raising or lowering mortality rates.
To evaluate postoperative mortality, death among patients of various providers must occur frequently enough so that statistically valid comparisons can be made from realistically achievable sample sizes. Therefore, postoperative mortality could be a useful indicator of the quality of coronary artery bypass surgery, for which mortality rates for some providers may reach double digits. However, it would not be a useful indicator in comparing the quality of hysterectomies, for which mortality rates are likely to be far less than 1 percent. During the period in which data are collected, many good gynecologists might have one patient death, while some marginal ones might have none. Because meaningful, statistically valid inferences are difficult to make from small numbers, indicators should measure relatively common events.
Table 2 illustrates the effect of the frequency of an outcome on the number of patients necessary to reach a statistically valid conclusion. Consider a provider whose mortality rate for a surgical procedure is 25 percent higher (i.e., an O/P ratio of 1.25) than the mortality rate predicted on the basis of "average" care for the population of patients in the database. If the predicted mortality rate is 0.64 percent and if the provider treated 1,000 patients in the period examined, the difference between the observed and predicted rates would not be statistically significant at the 5 percent level (that is, there would be a greater than 5 percent probability that the difference was due to chance alone). If this provider treated 10,000 patients or more and had the same O/P ratio, the difference would be statistically significant at the 5 percent level.
Table 2. The Effect of Cases on Statistical Significance of Differences in Provider Performance Data(*)

Number      Observed    Predicted   Observed/Predicted
of Cases    Rate (%)    Rate (%)    Ratio                 Z-Statistic
1,000       0.8         0.64        1.25                  0.63
10,000      0.8         0.64        1.25                  2.01
1,000       8           6.4         1.25                  2.01
2,100       8           6.4         1.25                  3

(*) When outcome rates are low, sizable differences in percentage terms between observed and predicted outcome rates may not be statistically significant in commonly achievable sample sizes over reasonable lengths of time (e.g., one year). The z-statistic indicates the number of standard deviations the observed rate is from the predicted rate.
To continue from the last illustration, if the predicted mortality rate were 6.4 percent (ten times greater) and if the provider treated only 1,000 patients with the same O/P mortality ratio of 1.25, the probability would be less than 5 percent that the difference between observed and predicted values was due to chance alone. Moreover, if 2,100 or more patients of that provider were treated with the same O/P ratio of 1.25, the probability would be less than 1 percent that the difference between observed and predicted values was due to chance alone. Even if a relatively common condition such as hospital care of acute myocardial infarction is studied, few hospitals have more than 1,000 cases a year, and most have even fewer cases of such popularly studied conditions as pneumonia. Given the likelihood of such limits on sample size, selecting relatively common measures of outcome or process is crucial.
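The z-statistics in Table 2 can be approximated with a short calculation. The sketch below assumes a one-sample normal approximation in which the standard error is based on the predicted rate; the article does not state the formula used, so this is an assumption. Under it, the first, second, and fourth rows are reproduced to rounding, while the third row comes out slightly higher than the printed value (about 2.07).

```python
import math

def z_statistic(n_cases, observed_rate, predicted_rate):
    """Standard deviations separating the observed rate from the predicted rate.

    Assumes a normal approximation to the binomial, with the standard error
    computed from the predicted rate (an assumption of this sketch).
    """
    standard_error = math.sqrt(predicted_rate * (1.0 - predicted_rate) / n_cases)
    return (observed_rate - predicted_rate) / standard_error

def two_sided_p(z):
    """Two-sided probability of a standard normal deviate at least this extreme."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# (number of cases, observed rate, predicted rate), rates as proportions.
rows = [
    (1_000, 0.008, 0.0064),
    (10_000, 0.008, 0.0064),
    (1_000, 0.08, 0.064),
    (2_100, 0.08, 0.064),
]

for n_cases, observed, predicted in rows:
    z = z_statistic(n_cases, observed, predicted)
    print(f"n={n_cases:>6}  O/P={observed / predicted:.2f}  "
          f"z={z:.2f}  two-sided p={two_sided_p(z):.4f}")
```

With the O/P ratio held at 1.25, the z-statistic grows with both the number of cases and the underlying event rate, which is the point the table makes.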
* The likelihood that chance alone accounts for differences in providers' performances should be determined.
Differences among providers in indicator values should be analyzed statistically to determine the likelihood that those differences appear by chance alone. This is especially important in examining databases with relatively small sample sizes, because, under these circumstances, even apparently large differences in performance among providers (a 30 percent range of values, for example) may lack statistical significance at the 5 percent level (i.e., differences may very likely be due to chance alone). Particular care must be taken in the presentation of statistical analyses when comparative performance data are released to the public, because many people may seize upon differences in performance even when they are more likely to be due to chance than to actual differences in quality of care.
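One way to see why small samples hide real differences is to ask how many cases a provider would need before a given O/P ratio reaches significance at the 5 percent level. The sketch below assumes a two-sided test under a normal approximation, with the standard error based on the predicted rate; the function name and the default critical value of 1.96 are choices made for this illustration.

```python
import math

def min_cases_for_significance(predicted_rate, op_ratio, z_critical=1.96):
    """Smallest number of cases at which the observed-versus-predicted
    difference reaches the critical z value (normal approximation)."""
    observed_rate = predicted_rate * op_ratio
    difference = observed_rate - predicted_rate
    variance = predicted_rate * (1.0 - predicted_rate)
    return math.ceil(z_critical ** 2 * variance / difference ** 2)

# Same O/P ratio of 1.25, evaluated for a rare and a more common outcome.
for predicted_rate in (0.0064, 0.064):
    n = min_cases_for_significance(predicted_rate, 1.25)
    print(f"Predicted rate {predicted_rate:.2%}: about {n:,} cases needed")
```

Under these assumptions the rare outcome requires roughly ten times as many cases as the more common one to show the same relative difference as statistically significant.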
* Process indicators must be interpreted in light of desired outcomes.
In assessments of providers' performances, the fundamental interest is in the results of a provider's care. However, because some results may not be known for years, it is impractical to follow patients until all outcomes of care can be assessed. For example, it is not feasible to follow a provider's pediatric patients for 20 years to assess the rates of measles, rubella, and other communicable diseases for which there are vaccines. Instead, compliance with a set of immunization recommendations is measured, on the assumption that better outcomes will result when those recommendations are followed. So, when outcomes will not be apparent for a long time after care is given, the most practical indicator may be one that compares a process of care to standards set by experts. However, process variables should be used in comparative performance assessment only when appropriate outcome monitoring is not feasible, and the reason for using them should be understood and acknowledged.
The danger of relying on process measures alone to obtain information that can come only from outcomes data is illustrated by the use of the cesarean-section rate as an indicator of the quality of care. The C-section rate is a process indicator that conveys how often one decision was made about a particular method of caring for a group of patients. It conveys nothing about the outcome of peripartum care. Most clinicians would say that the goal of peripartum care is delivering the healthiest infant while minimizing problems for the mother. The quality of peripartum care must be measured against this standard.
It is unfortunate that most concerns about the quality of peripartum care have centered on the C-section rate, often with no acknowledgement that this rate does not indicate whether the goals of peripartum care are being met. The goal of peripartum care is not to avoid a C-section, since C-section is a valuable option in obstetrics. A C-section rate of zero would be a public health catastrophe. Nevertheless, analysts are drawn to the C-section rate because few data are available about infant outcomes related to peripartum care, because C-sections are easy to count (usually easily identifiable on a billing record that is generated in a hospital's usual course of business), and because replacing some C-sections with less-expensive vaginal deliveries saves money.
However, comparing providers by C-section rates is potentially misleading and unfair to providers. Risk-adjusted data can be obtained, but they would indicate only what the rate for each provider's population of patients would be if those patients received "average" care, as defined by the providers whose patients are in the database. Using risk-adjusted C-section rates rather than unadjusted rates makes the comparison of rates fair but does not identify the optimum C-section rate. The optimum C-section rate is the lowest rate associated with optimum outcomes for infants.
The figure below illustrates the postulated relationship between C-section rate and infant adverse outcomes. Infants' adverse outcomes related to delivery can result from a number of factors, not all of which are avoided by C-sections. As C-sections are introduced to a population, its adverse outcome rate for infants drops, because adverse outcomes related to the process of vaginal births are avoided. However, there comes a point at which all adverse outcomes from vaginal births are avoided and additional C-sections do not diminish infant adverse outcomes any further.
The widespread perception that many C-sections are performed unnecessarily in this country has engendered many efforts to significantly reduce the rate. Unfortunately, many of these efforts have been undertaken without careful monitoring of infant outcomes. Efforts to reduce the C-section rate to an arbitrary level that is chosen without consideration of infant outcomes might have tragic consequences if doctors avoid necessary C-sections in the attempt to comply with the arbitrary standard. This example underscores the need to understand the link between process and outcome indicators and to avoid relying on the former alone as a guide to the quality of care.
Few indicators are perfect, either in their design or in the data used in their calculation. Compromise is necessary in their development and use. In the end, indicators that are carefully selected and defined, reliably measured, accurately risk-adjusted, and analyzed statistically can provide meaningful information only to those who understand the limitations as well as the strengths of the data and the nature of the relationship between processes and outcomes of care.
Comparative Surveys Show Substantial Managed Care Participation by Physicians
From 1991 to 1993, there was substantial growth in doctor participation in HMOs and PPOs, according to Medical Economics. Nonparticipation dropped from 45 percent to 38 percent in both managed care categories, and fewer physicians participated in just one HMO or PPO. For participation in multiple plans, the figures are uniformly up between the two surveys. While there was only modest growth in the numbers of physicians who participated in six or more plans, much greater growth occurred in the 2-6 plan categories.
[1.] Nadzam, D., and others. "Data-Driven Performance Indicator Measurement System (IM System)." Journal on Quality Improvement 19(11):492-500, Nov. 1993.
[2.] Jollis, J., and others. "Discordance of Databases Designed for Claims Payment versus Clinical Information Systems. Implications for Outcomes Research." Annals of Internal Medicine 119(8):844-50, Oct. 15, 1993.
[3.] Anders, G. "Three HMOs Evaluate Themselves." Wall Street Journal, Nov. 16, 1993, p. B1.
Richard R. Balsamo, MD, JD, is Medical Director, CIGNA Healthcare of Illinois, Des Plaines, Ill., and Michael Pine, MD, MBA, is Research Associate, Division of Biological Sciences, Section of Cardiology, University of Chicago, and President, Michael Pine and Associates, Inc., Chicago, Ill.