
The relationship between small-area variations in the use of health care services and inappropriate use: a commentary.

Leape, Park, Solomon, et al. (1990) used data from 23 counties within one state to study the relationship between variations in use rates of coronary angiography, carotid endarterectomy, and upper gastrointestinal tract endoscopy and variations in the percent of these uses deemed inappropriate. Based on these and previous results (Chassin, Kosecoff, Park, et al. 1987), they conclude that "little of the variation in the rates of use of these procedures can be explained by inappropriate use." Given the limitations of the available data, the analysis of Leape et al. is in many respects very well done, with the actual data and plots presented to help readers evaluate the analysis. However, Davidson (1993) questions whether the conclusion drawn by Leape et al. is justified. Davidson's critique can be summarized as follows: (1) the sample size is too small to have adequate power; (2) the percent of reviewed cases that were inappropriate is not really a good measure of the degree of inappropriate usage; (3) important covariates are not included in the model; and (4) the bias due to "measurement error" in use rates is ignored.

While we do not agree with all of Davidson's conclusions, the concerns he raises are important. In this commentary we discuss these general issues, and illustrate some of the points with additional analyses of the data presented in Table 1 of Leape et al.


Davidson's first point is that nonsignificant results should not be interpreted too strongly when the sample size is small, as in the Leape et al. analysis of 21 to 22 counties. Davidson's power analysis makes this point effectively. His power calculations actually overestimate power since they do not account for the fact that a weighted analysis is being done. Since some counties have a relatively small impact on the analysis, the effective sample size is really less than 21. An alternative approach would calculate confidence intervals on the estimated correlations. For example, Leape et al. estimate a correlation of .18 between the use rate of carotid endarterectomy and the percent of cases reviewed that are inappropriate. This is not significantly different from zero. However, a 95 percent confidence interval for the estimate goes from -.33 to .73, the upper bound representing a rather high potential correlation. This confidence interval is based on the bootstrap (Efron 1982), which is appropriate when the data may not be normally distributed.
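To make the confidence-interval idea concrete, the following is a minimal sketch in Python (not the authors' code) of a bootstrap percentile interval for a correlation estimated from roughly 22 counties. The county values used here are randomly generated placeholders, not the Leape et al. data; resampling counties with replacement makes no normality assumption about the county-level values.

    # A minimal sketch (not the authors' code) of a bootstrap percentile
    # confidence interval for a correlation based on roughly 22 counties.
    # The county values below are randomly generated placeholders, not the
    # Leape et al. data.
    import numpy as np

    rng = np.random.default_rng(0)
    use_rate = rng.uniform(10, 40, size=22)           # hypothetical uses per 10,000 population
    pct_inappropriate = rng.uniform(5, 50, size=22)   # hypothetical percent of reviewed cases inappropriate

    n = len(use_rate)
    boot = []
    for _ in range(5000):
        idx = rng.integers(0, n, size=n)              # resample counties with replacement
        boot.append(np.corrcoef(use_rate[idx], pct_inappropriate[idx])[0, 1])

    estimate = np.corrcoef(use_rate, pct_inappropriate)[0, 1]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"estimate = {estimate:.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")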

Davidson also points out that in a simple linear regression analysis, measurement error in the independent variable can bias the estimated correlation toward zero. In fact, measurement error in the dependent variable will have the same effect. This follows from the formula defining correlation, namely Correlation(X, Y) = Covariance(X, Y) / [Variance(X) Variance(Y)]^(1/2). Adding random noise to either X or Y will inflate the denominator but not change the numerator. In the current context, "measurement error" can be thought of as the random binomial variation in the observed use rates and percent inappropriate, as indicated in the columns labeled "SE" in Table 1 of Leape et al. These standard errors are large enough to induce a substantial bias in the estimated correlations. For example, a simulation we performed based on the data for carotid endarterectomy shows that when the true correlation is 1.00, the observed correlation will be on average about .70 (95 percent prediction interval .47 to .88).
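The attenuation effect can be illustrated with a small simulation, sketched below in Python. This is not a replication of our carotid endarterectomy simulation; the county populations, use rates, and numbers of charts reviewed are arbitrary assumptions chosen only to show how binomial sampling error in both variables pulls an observed correlation below a true correlation of 1.00.

    # Illustrative simulation of the attenuation effect (not a replication of
    # the carotid endarterectomy simulation described above).  A perfect
    # underlying relationship (true correlation 1.00) is posited between each
    # county's use rate and its inappropriate fraction; populations, rates,
    # and numbers of charts reviewed are arbitrary assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n_counties, n_reps = 22, 2000
    observed_r = []

    for _ in range(n_reps):
        true_rate = rng.uniform(0.001, 0.004, size=n_counties)        # true uses per capita
        scaled = (true_rate - true_rate.min()) / (true_rate.max() - true_rate.min())
        true_frac = 0.10 + 0.30 * scaled                               # perfectly correlated inappropriate fraction

        pop = rng.integers(20_000, 200_000, size=n_counties)          # county populations
        cases = rng.binomial(pop, true_rate)                          # observed number of uses
        reviewed = np.clip(cases, 1, 60)                              # charts sampled for review
        inapprop = rng.binomial(reviewed, true_frac)                  # observed inappropriate count

        obs_rate = cases / pop
        obs_frac = inapprop / reviewed
        observed_r.append(np.corrcoef(obs_rate, obs_frac)[0, 1])

    print(f"mean observed correlation: {np.mean(observed_r):.2f} "
          f"(true correlation is 1.00 by construction)")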


Davidson's second point is that the percent of cases that are inappropriate is not a good measure of the extent of inappropriate use, and that a better measure is the population-based rate of inappropriate utilizations. Davidson's reason for preferring the latter measure is his claim that "policy is interested in identifying high absolute levels of inappropriate use per age/sex-adjusted population, which may be quite imperfectly measured by relative or percentage levels of inappropriate use." This claim may or may not be true, depending on the types of policy decisions being considered. The remainder of this commentary deals with the more general issue of how the analysis approach chosen should depend on the specific question being asked.

The question posed in the title of the article by Leape et al. is: "Does inappropriate use explain small-area variations in the use of health care services?" We now consider two different interpretations of this question, plus two alternative questions that might in fact be of more direct policy interest.

1. "Is there a statistically significant correlation between overall use rate and inappropriate use 'rate'?"

This is the question addressed by both Leape et al. and Davidson, with the main disagreement being over how one measures the inappropriate use "rate." Leape et al. define this inappropriate "rate" as the number of inappropriate uses of a procedure in a county divided by the total number of uses in that county. Let us denote this by USE^I/USE^TOT. Davidson defines the "rate" as the number of inappropriate uses in the county divided by the population of the county (perhaps age/sex-stratified). Let us denote this by USE^I/POP. This notation emphasizes that the difference between the two measures is the choice of denominator. The overall use rate per capita is denoted USE^TOT/POP.

The conclusions drawn from analyzing the data presented in Leape et al. differ drastically, depending on which measure of inappropriate use is used. For coronary angiography, the correlation (weighted, as per Leape et al.) between USE^I/USE^TOT and USE^TOT/POP is .53, the square of which is interpreted by Leape et al. to mean that 28 percent of the variance in USE^TOT/POP is explained by inappropriate use. In contrast, the correlation between USE^I/POP and USE^TOT/POP is .92, the square of which might be interpreted as 85 percent of the variance in USE^TOT/POP being explained by inappropriate use. This latter result seems quite impressive, until one notes that the correlation between USE^TOT/POP and the appropriate use rate per capita (denoted USE^A/POP) is .98, meaning that 96 percent of the variance in USE^TOT/POP is explained by variations in appropriate use! (We use the term "appropriate" to include both appropriate and equivocal uses, as defined by Leape et al.)
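For readers who wish to reproduce this style of calculation, the sketch below computes a weighted Pearson correlation in Python. The weighting scheme (weights proportional to the number of charts reviewed in each county) and the five-county values are assumptions for illustration, not the weights or the data used by Leape et al.

    # A minimal sketch of a weighted Pearson correlation of the kind used in
    # the county-level comparisons above.  The weights (number of charts
    # reviewed per county) and the five-county values are assumptions for
    # illustration, not the Leape et al. data or weighting scheme.
    import numpy as np

    def weighted_corr(x, y, w):
        """Weighted Pearson correlation of x and y with nonnegative weights w."""
        x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
        mx, my = np.average(x, weights=w), np.average(y, weights=w)
        cov_xy = np.average((x - mx) * (y - my), weights=w)
        var_x = np.average((x - mx) ** 2, weights=w)
        var_y = np.average((y - my) ** 2, weights=w)
        return cov_xy / np.sqrt(var_x * var_y)

    use_per_10000   = [22.0, 31.5, 18.2, 27.9, 24.4]   # USE^TOT/POP
    pct_inapprop    = [14.0, 29.0, 11.0, 22.0, 18.0]   # 100 * USE^I/USE^TOT
    charts_reviewed = [40, 85, 25, 60, 50]             # weights
    print(round(weighted_corr(use_per_10000, pct_inapprop, charts_reviewed), 2))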

This can be better understood by considering the hypothetical situation in which USE^I/USE^TOT varies only slightly across counties, but USE^TOT/POP varies considerably. Then both USE^I/POP and USE^A/POP will be highly correlated with USE^TOT/POP, since each is approximately equal to a constant times USE^TOT/POP. It does not make sense in this situation to claim that close to 100 percent of the variation in use rates is due to inappropriate use.

As an aside, it should be noted that this hypothetical situation may in fact be true for the data used by Leape et al. We used a chi-square test to test the null hypothesis that the expected value of USE^I/USE^TOT is identical in all counties. The resulting p-values are nonsignificant for all three procedures, indicating that the observed variations in percent inappropriate may be entirely due to the binomial sampling variance.
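The chi-square test of homogeneity referred to here can be set up as in the brief Python sketch below; the county-level counts of inappropriate and other reviewed cases are hypothetical placeholders, not the Table 1 data. A nonsignificant p-value from such a test is consistent with the observed spread in percent inappropriate arising from binomial sampling variance alone.

    # Sketch of the chi-square test of homogeneity referred to above: under
    # the null hypothesis, the expected percent inappropriate is the same in
    # every county.  The counts are hypothetical placeholders, not Table 1.
    import numpy as np
    from scipy.stats import chi2_contingency

    # One row per county: [inappropriate cases, other reviewed cases]
    table = np.array([
        [ 8, 42],
        [12, 48],
        [ 5, 45],
        [ 9, 51],
        [ 7, 33],
    ])

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")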

The confusion about interpretation is partially due to the difference between the statistical meaning of "explain" and the more commonly used, causal interpretation. This latter interpretation can be expressed by rephrasing the original question.

2. "If all inappropriate use were discontinued, how much of the small-area variation would disappear?"

In the hypothetical situation of the previous section, only a fraction of the variation would disappear, since USE^A/POP would still vary across counties. This new phrasing of the question makes it clear that the only way in which inappropriate use can account for most of the variation in USE^TOT/POP is if USE^A/POP is fairly constant across counties, while USE^I/POP varies.

There is no commonly used statistical measure to address this rephrased question directly. One possible measure might be the quantity

1 - Variance(USE^A) / Variance(USE^TOT),

which is 1 minus the fraction of the variance in overall utilization rates that would remain if all inappropriate utilization were discontinued. Calculating this measure for coronary angiography, carotid endarterectomy, and upper gastrointestinal tract endoscopy gives values of .54, .39, and .16, respectively. As with the preceding correlations, these estimates will be biased by the measurement error in the data, although the direction of bias is not so clear.
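As a small worked example, this measure can be computed as follows in Python; the county rates used here are hypothetical placeholders, not the study data.

    # Worked example (with hypothetical county rates, not the study data) of
    # the measure 1 - Variance(USE^A) / Variance(USE^TOT).
    import numpy as np

    total_rate  = np.array([20.0, 35.0, 28.0, 42.0, 25.0])   # USE^TOT/POP per 10,000
    approp_rate = np.array([17.0, 26.0, 24.0, 30.0, 22.0])   # USE^A/POP per 10,000

    measure = 1 - np.var(approp_rate, ddof=1) / np.var(total_rate, ddof=1)
    print(round(measure, 2))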

We now consider the more practical issue of using analysis results to guide policy decisions. Assume that policymakers have some kind of action in mind whose goal is to reduce inappropriate use. The action could consist of an educational program to change physician behavior, chart reviews to identify cases of inappropriate use, or sanctions against providers suspected of having high inappropriate use rates. Two of the possible questions of interest to policymakers in this situation are discussed next. These two questions have been addressed by Davidson, both explicitly (question 3) and implicitly (question 4).

3. "Can we improve the cost-effectiveness of an action by targeting it to only those counties with a high |USE.sup.TOT~/POP?"

Presumably it would be more cost-effective to target the action to counties with high inappropriate use "rates" (with an appropriate definition of "rate"). However, data on USE^TOT/POP are often available, while it is not known how many of these uses are inappropriate (the data used by Leape et al. were collected specifically for the purpose of research). Thus, the question here is whether high USE^TOT/POP is a good proxy for high inappropriate utilization "rate."

If cost-effectiveness is the question of interest, correlation is probably not the best measure of strength of association. Rather, we should look at differences between those counties that would be targeted for action, compared to those that would not. For example, for coronary angiography, among the five counties with the highest USE^TOT/POP, 28 percent of the reviewed cases were inappropriate, compared to 16 percent among the remaining 17 counties. If the action being considered is a review of charts in order to deny payment for inappropriate use, then the yield of inappropriate cases detected per 100 cases reviewed would be considerably increased by focusing on only these five counties.
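The yield comparison behind this example can be sketched as follows; the per-county counts are hypothetical, and the cutoff of five counties simply mirrors the example above.

    # Sketch of the "yield per 100 charts reviewed" comparison described above.
    # All county-level figures are hypothetical placeholders.
    import numpy as np

    use_per_10000 = np.array([45.0, 41.0, 38.0, 36.0, 35.0, 22.0, 20.0, 18.0])  # USE^TOT/POP
    reviewed      = np.array([  60,   55,   50,   48,   52,   40,   35,   30])  # charts reviewed
    inappropriate = np.array([  18,   15,   13,   14,   15,    6,    5,    4])  # inappropriate among reviewed

    k = 5                                            # target the k counties with the highest use rate
    order = np.argsort(use_per_10000)[::-1]
    top, rest = order[:k], order[k:]

    def yield_per_100(idx):
        return 100 * inappropriate[idx].sum() / reviewed[idx].sum()

    print(f"targeted counties: {yield_per_100(top):.0f} inappropriate cases per 100 charts reviewed")
    print(f"other counties:    {yield_per_100(rest):.0f} per 100 charts reviewed")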

We now return to the issue of what denominator to use when calculating the inappropriate use rate. We argue that the answer should depend on the nature of the action. Costs and benefits should be measured in comparable units, and the natural units for measuring costs will depend on how the action is delivered.

Let us first consider the example discussed by Davidson, in which the action consists of reviewing charts to determine whether payment should be denied. The cost of the action is measured in dollars per 100 charts reviewed. The benefit of the action should then be expressed as dollars saved per 100 charts reviewed, which will be proportional to the number of inappropriate cases detected per 100 charts reviewed. In this case the action would be aimed at counties with high USE^I/USE^TOT.

This example can be understood further by realizing that the limiting factor for this action probably will be the personnel available to do chart reviews. For example, there may be resources available to do 2,000 chart reviews. The policy question is clearly which 2,000 charts to review in order to maximize the total number of inappropriate cases detected. Contrary to the claims of Davidson, it is not clear that dollars saved per capita is a meaningful way to measure the benefit of this action.

As another example, suppose the action consists of a one-on-one physician educational session, so that cost is measured in dollars per physician. The benefit should then be measured as the number of inappropriate cases prevented, per physician. The action should be targeted at physicians who do a large absolute number of inappropriate procedures. If the analysis is being conducted on county-level data, as in Leape et al., then the appropriate countywide measure is inappropriate procedures per physician.

From a cost-effectiveness point of view, the per-capita measure of inappropriate use advocated by Davidson, USE^I/POP, is appropriate only if the cost of the action is naturally measured as dollars per capita. Examples might be a direct mail campaign to change patients' behavior or brochures for physicians to hand out to patients.

4. "Is it fair to providers to target an action only at those counties with high |USE.sup.TOT~/POP?"

This is an important question if the action involves potential sanctions or excessive costs to be borne by providers. As suggested by Davidson, the most appropriate denominator here may be the number of patients who present themselves at doctors' offices with inappropriate indications (denoted PRESENT^I), giving an inappropriate use rate of USE^I/PRESENT^I. A physician should not be punished because a large number of patients present themselves with inappropriate indications, so long as he orders the procedure for only a small fraction of such patients.

Unfortunately, data on USE^I/PRESENT^I will rarely, if ever, be available for analysis. The question then becomes which other measures might be used as proxies for USE^I/PRESENT^I. The answer depends on what assumptions one is willing to make. For example, if we assume that PRESENT^I/POP is fairly constant across counties, then USE^I/POP will be highly correlated with USE^I/PRESENT^I and can be used as a proxy for it. A somewhat weaker assumption would be that PRESENT^I/POP is fairly constant across counties after adjusting for differences in county-level covariates such as racial composition and median income. This assumption seems rather reasonable, and appears to be the basis for the analysis approach advocated by Davidson, including his recommendation to adjust for covariates that might be predictive of PRESENT^I/POP.

Alternatively, let us assume that the ratio of inappropriate to appropriate presentations is fairly constant across counties, but the total rate of presentations per capita varies. In this situation, if a county has high USE^I/USE^TOT, then either USE^I/PRESENT^I is higher than average or USE^A/PRESENT^A is lower than average. Either of these suggests a potential problem with quality of care, so that targeting counties with high USE^I/USE^TOT is reasonable from the point of view of fairness.


Determining the impact of inappropriate use on variations in total use rates is not straightforward. The choice of analysis approach should depend on the specific goal of the analysis, which needs to be stated explicitly. Davidson advocates using USE^I/POP as the measure of inappropriate use, and adjusting for covariates such as the socioeconomic makeup of the counties. This may be reasonable if fairness to providers being targeted is the main criterion for judging an analysis approach, and if certain assumptions are met. However, different analysis approaches may be preferred in other circumstances, such as when cost-effectiveness of an action is the question of interest. Adjusting for covariates may not be appropriate in this case, if assigning blame for inappropriate use is not of interest.

Ideally, a large number of areas should be used in order to increase power. Nonsignificant results, in particular, should be accompanied by either a discussion of power or confidence intervals on the estimated measures of association. If the areas are so small that within-area measurement error (i.e., the binomial sampling variance) is significant, then the bias caused by this measurement error needs to be considered, with either a formal attempt to correct the bias, or at least a discussion of its potential impact.

Let us close by reiterating a concern raised by Leape et al., namely that an emphasis on reducing inappropriate use should not cause us to ignore the problem of underuse among those for whom the procedure would in fact be beneficial. Understanding the variations in appropriate use rates (USE^A/POP) is just as important as examining variations in inappropriate use. The underlying goal of any analysis of use rate variations should be to improve the appropriateness of care, not just to save money by reducing use rates.


Chassin, M. R., J. Kosecoff, R. E. Park, C. M. Winslow, K. L. Kahn, N. J. Merrick, J. Keesey, A. Fink, D. H. Solomon, and R. H. Brook. "Does Inappropriate Use Explain Geographic Variations in the Use of Health Care Services? A Study of Three Procedures." Journal of the American Medical Association 258, no. 18 (13 November 1987): 2533-37.

Davidson, G. "'Does Inappropriate Use Explain Small-Area Variations in the Use of Health Care Services?' A Critique." Health Services Research 28, no. 4 (October 1993): 389-400.

Efron, B. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics, 1982.

Leape, L. L., R. E. Park, D. H. Solomon, M. R. Chassin, J. Kosecoff, and R. H. Brook. "Does Inappropriate Use Explain Small-Area Variations in the Use of Health Care Services?" Journal of the American Medical Association 263, no. 5 (2 February 1990): 669-72.

Dr. Cain is Research Scientist, Department of Biostatistics and Office of Nursing Research and Practice, University of Washington; Paula Diehr, Ph.D., is Professor of Biostatistics, University of Washington.
