# Critical reaction to "A Study of the Relationship Between Severity of Illness and Hospital Cost in New Jersey Hospitals." (response to article by Richard F. Averill et al. in this issue, p. 587) (includes authors' response) (Research Dialogue)

Averill et al. show that splitting hospital DRGs into subgroups based on severity of illness results in decreased variance of cost for a subset of high-volume DRGs. They argue from this result that a severity adjustment to this subset of DRGs is necessary to achieve a more equitable prospective payment system.

This study was large, costly, and competent. The research question is of significant interest to health services researchers and policymakers. Nevertheless, I do not recommend publication in its present form. My reason is methodological and has to do with the built-in bias of the Averill et al. "data-splitting" technique. In general, data-splitting is a very aggressive technique that can result in an "improvement" (i.e., a smaller weighted variance or coefficient of variation) even though the split makes no medical sense whatsoever. This is a troublesome problem, because it calls into question the baseline from which we compute improvements in the data.

For example, consider splitting the admissions in a DRG into subgroups containing only one admission each. It is obvious that the variance in each subgroup is zero and, therefore, that the weighted variance is zero. What is not so obvious is the following: when a DRG is split into any two subgroups, the weighted variance of the subgroups cannot be greater than the total variance for that DRG. To my knowledge, this statistical fact has never been pointed out, although it is not hard to prove. Furthermore, what is true for splitting a DRG into two groups must also be true for any additional splitting of these subgroups. Therefore, the weighted variance can never increase at any successive split (it must weakly decrease), and it must fall all the way to zero under complete splitting.
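The inequality asserted above is the usual within-group and between-group decomposition of a sum of squares. As a sketch, in notation that does not appear in the critique (costs $x_i$, overall mean $\bar{x}$, subgroup sizes $n_g$, subgroup means $\bar{x}_g$, and variances computed by dividing by the number of admissions):

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 \;=\; \sum_{g}\sum_{i\in g}(x_i-\bar{x}_g)^2 \;+\; \sum_{g} n_g(\bar{x}_g-\bar{x})^2$$

$$\sigma^2 \;=\; \sum_{g}\frac{n_g}{n}\,\sigma_g^2 \;+\; \sum_{g}\frac{n_g}{n}\,(\bar{x}_g-\bar{x})^2 \;\ge\; \sum_{g}\frac{n_g}{n}\,\sigma_g^2$$

The second term is a sum of squares and so cannot be negative. Equality holds only when every subgroup mean equals the overall mean, which is why almost any split, medically meaningful or not, lowers the weighted variance at least slightly.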

Because data-splitting contains a built-in bias toward reduced variance, it is impossible to interpret the Averill et al. results, in terms of both the number of DRGs that experienced qualitatively large variance reductions and the overall average reduction achieved by their method. It's not clear what they should do. My recommendation would be to do a Monte Carlo analysis to establish the average variance reduction and, possibly, the sampling distribution achieved by random data-splitting. They could then compare their results against this baseline.
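The Monte Carlo baseline recommended above could be sketched roughly as follows. This is an illustration only, not the procedure used by either party: the function names `riv` and `random_split_baseline` are hypothetical, the cost data are synthetic lognormal draws, and shuffling the labels (which keeps the subgroup sizes fixed) is one assumed way of generating "random" splits.

```python
import random
from statistics import fmean


def riv(costs, labels):
    """Reduction in variance: 1 - (within-subgroup SS / total SS)."""
    overall = fmean(costs)
    total_ss = sum((c - overall) ** 2 for c in costs)
    groups = {}
    for cost, label in zip(costs, labels):
        groups.setdefault(label, []).append(cost)
    within_ss = 0.0
    for members in groups.values():
        group_mean = fmean(members)
        within_ss += sum((c - group_mean) ** 2 for c in members)
    return 1.0 - within_ss / total_ss


def random_split_baseline(costs, labels, n_draws=2000, seed=0):
    """Compare the observed RIV with RIVs from randomly shuffled labels."""
    rng = random.Random(seed)
    observed = riv(costs, labels)
    shuffled = list(labels)
    draws = []
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # random split with the same subgroup sizes
        draws.append(riv(costs, shuffled))
    mean_random = fmean(draws)
    # Share of random splits doing at least as well as the observed split.
    p_value = (1 + sum(d >= observed for d in draws)) / (n_draws + 1)
    return observed, mean_random, p_value


# Synthetic example (assumption): 200 admissions, four severity labels,
# with cost rising by severity label.
data_rng = random.Random(1)
labels = [i % 4 for i in range(200)]
costs = [data_rng.lognormvariate(8.0 + 0.25 * g, 0.6) for g in labels]

observed, mean_random, p_value = random_split_baseline(costs, labels)
print(f"observed RIV={observed:.3f}  random mean={mean_random:.3f}  p={p_value:.4f}")
```

The quantity of interest is the gap between the observed reduction and the random-split average, rather than the observed reduction on its own.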

Even supposing the Averill et al. variance reduction to be statistically better than random, it is not clear that it is qualitatively important. They found that, by spending $1.5 million, the State of New Jersey could redirect about $3.66 million in hospital revenues in a budget-neutral payment system. This is a ratio of about 40 cents in administrative costs per dollar of revenue redirected. Given the already high level of administrative expenses in the U.S. health care system, I doubt that federal policymakers would agree that the proposed severity adjustment would significantly improve the Medicare DRG system.

Seventy-six DRGs satisfied criteria for judging that severity had a qualitatively important effect on patient cost. Although these tended to be high-cost DRGs, they nevertheless represented only 41.4 percent of the hospitals' total direct patient care costs. When adjustments were made for severity differences, it was found that 17 of the 25 study hospitals would experience payment changes of less than 3 percent for these 76 DRGs. Again, these do not seem to be large differences from my perspective.

It is also questionable to argue that the objective of the DRG system is to provide equitable hospital payment. This is one objective of the DRG system but it is not the primary one -- DRGs were introduced primarily to promote efficiency and control hospital costs. The system of cost reimbursement that they replaced was equitable in the sense that there were no losers. DRGs try to be equitable in a somewhat different sense, that is, by adjusting payments for hospital cost differences not related to efficiency.

On the positive side, the nonlinear effect of Medicaid patients on the severity adjustment for teaching hospitals is interesting. Averill et al. suggest that hospitals with high Medicaid patient share resemble extended primary care facilities. I did not see a note as to whether their calculations included the "disproportionate share" adjustment already built into Medicare DRG payments. They noted that the New Jersey system adjusts hospital payments for wage rate variation and the indirect cost of teaching, but they did not refer to Medicare's disproportionate share policy. It would be interesting to see how their results change after making this further adjustment. (See Medicare's Disproportionate Share Adjustment for Hospitals, Congressional Budget Office, May 1990.)

## The Authors Respond

Feldman states that the splitting of the data into subgroups should cause a reduction in variance even if there were no medical rationale for the splits. In other words, randomly dividing the data into four subgroups will cause a reduction in variance purely by chance. As discussed below, the formation of random subgroups will produce a reduction in variance (RIV) only in extreme situations that do not apply to the data presented in this article. The formula for RIV of cost can be written:

$$\mathrm{RIV} = \frac{\sum_{i}(C_i - M)^2 \;-\; \sum_{g}\sum_{i \in g}(C_i - M_g)^2}{\sum_{i}(C_i - M)^2}$$

where $C_i$ is the cost for the $i$th patient in the data, $M$ is the mean cost across all patients in the data, and $M_g$ is the mean cost of the patients assigned to the $g$th subgroup. If patients are assigned randomly to each of the subgroups, then the expected value of the mean cost ($M_g$) of each subgroup would be equal to the mean of the original population ($M$). If $M_g$ is set equal to $M$ in the above formula, the expected RIV value for a random partitioning of data into four subgroups is zero. Thus, for sample sizes large enough to obtain a reliable estimate of the mean of each subgroup, the expected RIV for a random splitting of data into subgroups is zero. However, as Feldman points out, if the sample size is small (i.e., the sample size approaches the number of subgroups), then a reliable estimate of the mean of each subgroup cannot be obtained and the RIV will be greater than zero. In the extreme case, when the number of patients is equal to the number of subgroups, the RIV will be 100 percent. Therefore, the question he is raising is this: For cost distributions of the type observed in these data, how large a population is necessary to have the RIV approach zero?

To answer this question we selected a typical DRG (148) that contained 734 patients. We randomly selected 25 of these patients and then randomly assigned each of the 25 patients to one of four subgroups; we then computed the RIV. We repeated the random assignment to the subgroups and the RIV computation ten times. Then we increased the number of patients in the sample in increments of 25 up to 200.

The results of the simulation are pictured in the graph shown. The horizontal axis is the number of patients in the sample and the vertical axis is the RIV. The box in the graph represents the interquartile range (i.e., the 25th percentile to the 75th percentile) of the distribution of the RIVs in the ten random simulations. The line in the box is the median value of the RIV across the ten random simulations. The bars in the graph (called whiskers) extend to the most extreme RIV value that lies within one and one-half interquartile ranges above the 75th percentile or below the 25th percentile. The asterisk and circle are individual RIV values outside the range of the box and whiskers.
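The simulation described above can be approximated along these lines. This is a hedged sketch, not the authors' program: the DRG 148 cost data are not available here, so a synthetic lognormal population of 734 costs stands in for them, the names `riv` and `simulate` are hypothetical, and drawing a fresh sample at each sample size is an assumption the text does not settle.

```python
import random
from statistics import fmean, quantiles


def riv(costs, labels):
    """Reduction in variance: 1 - (within-subgroup SS / total SS)."""
    overall = fmean(costs)
    total_ss = sum((c - overall) ** 2 for c in costs)
    groups = {}
    for cost, label in zip(costs, labels):
        groups.setdefault(label, []).append(cost)
    within_ss = sum(
        sum((c - fmean(members)) ** 2 for c in members)
        for members in groups.values()
    )
    return 1.0 - within_ss / total_ss


def simulate(population, sizes=range(25, 201, 25), n_groups=4, n_reps=10, seed=0):
    """For each sample size, repeat the random four-way assignment n_reps times."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        sample = rng.sample(population, n)  # one random sample of n patients
        rivs = []
        for _ in range(n_reps):
            # Each patient is assigned independently to one of the subgroups.
            labels = [rng.randrange(n_groups) for _ in sample]
            rivs.append(riv(sample, labels))
        q1, median, q3 = quantiles(rivs, n=4)  # box-plot quartiles
        summary[n] = {"mean": fmean(rivs), "q1": q1, "median": median, "q3": q3}
    return summary


# Synthetic stand-in (assumption) for the 734 patient costs in DRG 148.
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(8.0, 0.7) for _ in range(734)]

summary = simulate(population)
for n, stats in summary.items():
    print(n, {name: round(value, 3) for name, value in stats.items()})
```

With only ten repetitions per sample size the quartiles are noisy, which is consistent with the spread of boxes the graph is described as showing.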

With only 25 patients the average RIV that results from four randomly assigned subgroups is 13.2 percent. However, the average RIV quickly drops. With 100 patients the average RIV from four randomly assigned subgroups is 3.1 percent, and for 200 patients it is 1.5 percent. The number of patients in the 76 study DRGs ranged from 104 to 2,152. DRG 269 had the fewest patients at 104, but had an RIV of 46.8 percent, more than 15-fold greater than would be expected randomly (i.e., 3.1 percent). The DRG with an RIV closest to the value that would be expected randomly is DRG 78, with an RIV of 15.6 percent and 135 patients. For 125 patients the RIV expected randomly is 2.3 percent, and thus the actual RIV is nearly sevenfold greater. Clearly, the RIVs reported could not be caused by chance.
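These simulated averages sit close to what a standard permutation argument would predict. The expression below is not part of the authors' response; it is a sketch that assumes $n$ patients are assigned at random to $k$ nonempty subgroups and that RIV is the between-subgroup share of the total sum of squares.

$$E[\mathrm{RIV}] \;=\; \frac{k-1}{n-1}$$

$$k=4:\qquad \frac{3}{24}=12.5\%\ (n=25),\qquad \frac{3}{99}\approx 3.0\%\ (n=100),\qquad \frac{3}{124}\approx 2.4\%\ (n=125),\qquad \frac{3}{199}\approx 1.5\%\ (n=200)$$

The corresponding simulated values reported above are 13.2, 3.1, 2.3, and 1.5 percent.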

In the article we reported that the F-statistic was statistically significant at the .001 level for all of the 76 DRGs. The F-statistic was reported in order to assure the reader that the RIVs reported could not have occurred by chance. The analysis described in this response was performed solely for the purpose of demonstrating to the reviewer a second method of reaching the same conclusion as the F-statistic. Performing a complete simulation for each DRG would be a significant amount of work and would really constitute a paper in itself. The F-statistic is a much more traditional way of demonstrating that the RIVs could not have occurred by chance, and one with which the readers will be more familiar. It is difficult in the article to discuss a simulation analysis without going into a great deal of detail. We do not believe that the simulation analysis would add anything more than is already reported with the F-statistic.
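The link between the F-statistic and the RIV can be made explicit. The following is a sketch under assumptions not stated in the response: a one-way analysis of variance with $k = 4$ severity subgroups in every DRG, and RIV equal to the between-subgroup share of the total sum of squares.

$$F \;=\; \frac{\mathrm{RIV}/(k-1)}{(1-\mathrm{RIV})/(n-k)}$$

$$\text{DRG 78: } \frac{0.156/3}{0.844/131} \approx 8.1 \qquad\qquad \text{DRG 269: } \frac{0.468/3}{0.532/100} \approx 29.3$$

Under these assumptions the two values would be referred to F distributions with (3, 131) and (3, 100) degrees of freedom, respectively, and can be checked against tabulated critical values.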

The state of New Jersey spent $1,535,195 to collect 76,798 records. However, part of the analysis was to determine which DRGs needed to have severity data collected. The 76 DRGs selected in the analysis included 28,021 patients and, at $20 per patient, would cost $560,420 to collect. The amount redirected was $7.3 million (the sum of the absolute values of the dollar amounts in Table 4). Under a Medicare-type payment system, the dollar amount would be 39 percent higher, or $10.2 million. Thus, the ratio is 5.5 percent in administrative dollars per dollar of revenue redirected. Even this estimate of the ratio of administrative costs to dollars redirected is high, since with a severity-specific definition of outliers and a less homogeneous collection of hospitals the amount of money redistributed would be expected to be higher. Thus, based on the above discussion, we do not agree with Feldman's comments regarding the relative administrative costs of such a system.
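The two competing cost ratios can be reproduced from the figures quoted in the critique and in this response. The snippet below is only an arithmetic check; the variable names are illustrative, and the $10.2 million figure is taken as the authors report it (multiplying the rounded $7.3 million by 1.39 gives roughly $10.1 million, so the reported figure presumably rests on unrounded amounts).

```python
# Figures quoted in the critique (US dollars).
reviewer_admin_cost = 1_500_000
reviewer_redirected = 3_660_000
reviewer_ratio = reviewer_admin_cost / reviewer_redirected

# Figures quoted in the authors' response (US dollars).
total_collection_cost = 1_535_195
total_records = 76_798
cost_per_record = total_collection_cost / total_records

study_patients = 28_021
cost_per_patient = 20
study_collection_cost = study_patients * cost_per_patient

medicare_redirected = 10_200_000  # as reported by the authors
authors_ratio = study_collection_cost / medicare_redirected

print(f"reviewer: {reviewer_ratio:.2f} dollars of administration per dollar redirected")
print(f"cost per record collected: ${cost_per_record:.2f}")
print(f"collection cost for the 76 DRGs: ${study_collection_cost:,}")
print(f"authors: {authors_ratio:.3f} dollars of administration per dollar redirected")
```

The gap between the two ratios comes from both sides of the fraction: the authors count only the records in the 76 selected DRGs in the numerator and use a larger redirected amount in the denominator.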

Feldman does not feel that 2-3 percent shifts in payment were significant. As noted in our article the shift in aggregate payments to an individual hospital could be as high as $1.3 million. Many hospitals in New Jersey have operating margins of less than 1 percent. Rate appeals are often instituted for lower dollar amounts than those reported in the study. To hospital administrators, 2-3 percent payment shifts are considered quite significant.

It should really be left up to the reader to decide if the administrative cost and effect on payment make implementation of a severity adjustment feasible. The important point is that this article provides, for the first time, an accurate assessment of the administrative costs and payment impact.

Although Feldman feels that the objective of an equitable payment system is questionable, equity was one of the primary objectives of the New Jersey Department of Health when it implemented the DRG prospective payment system.

The New Jersey Department of Health does not include in its payment system methodology any adjustment that is equivalent to the Medicare disproportionate share adjustment. It was beyond the scope of the study to attempt to incorporate a disproportionate share adjustment into the New Jersey methodology.


Author: Feldman, Roger

Publication: Health Services Research

Date: Dec 1, 1992
