
Adjusting Medicare capitation payments using prior hospitalization data.

The diagnostic cost group approach to a reimbursement model for health maintenance organizations is presented. Diagnostic information about previous hospitalizations is used to create empirically determined risk groups, using only diagnoses involving little or no discretion in the decision to hospitalize. Diagnostic cost group and other models (including Medicare's current formula and other prior-use models) are tested for their ability to predict future costs, using R² values and new measures of predictive performance. The diagnostic cost group models perform relatively well with respect to a range of criteria, including administrative feasibility, resistance to provider manipulation, and statistical accuracy.


In 1982, the Tax Equity and Fiscal Responsibility Act permitted risk contracts with full prospective payment to health maintenance organizations (HMO's) for enrollment of Medicare beneficiaries. Such contracts are intended to be priced on the basis of 95 percent of what HMO enrollees would have cost Medicare had they remained in the fee-for-service (FFS) sector. This requires a methodology for estimating the hypothetical cost to Medicare. The current HMO payment model, known as the adjusted average per capita cost (AAPCC), starts with projected Medicare FFS reimbursements per capita in the counties of residence of HMO enrollees as the basis of payment. The average cost projections are then adjusted for differences in the distribution of HMO enrollees by age, sex, welfare status, and institutional status relative to the distribution of beneficiaries in the same geographic area who receive their care in the FFS sector.

Although individuals exhibit a great deal of variation in yearly medical costs, purely random variability will tend to average out in the aggregate even in a moderate-sized HMO. However, a major shortcoming of the existing AAPCC formula is its inability to adequately adjust capitation levels for systematic differences in the health status of enrolled groups (Trapnell, McKusick, and Genuardi, 1982; Thomas, Lichtenstein, and Wyszewianski, 1986; Gruenberg, Wallack, and Tompkins, 1986; Newhouse, 1986). To the extent that the AAPCC underestimates the expected costs for certain individuals (such as the chronically ill), HMO's with disproportionately many such enrollees are unfairly penalized through adverse selection. Conversely, an HMO with favorable selection is the beneficiary of undeserved profits.

In several studies, it has been shown that substantial selection bias does occur in Medicare HMO's (Eggers and Prihoda, 1982; Brown, 1988; U.S. General Accounting Office, 1986). However, healthy people can change only by getting sicker, and the heaviest health utilizers die soonest. Therefore, the health status of an HMO's continuing enrolled population will tend, over time, to regress toward the mean. Although this tendency may lessen the urgency for a health status adjuster in the Health Care Financing Administration (HCFA) payment formulas (Welch, 1985), the effects of continuing selection bias among new recruits and of nonrandom disenrollment patterns can still lead to a perpetual situation of enrollment bias.

Although the Government is worried about HMO's attracting healthier enrollees and discouraging the sicker ones through targeted marketing and selective program development, some HMO's remain convinced that they experience adverse selection. HMO's may also be concerned by the finding that more recent enrollees tend to be sicker than their continuing HMO enrollee counterparts (Halvorson and Stix, 1988). However, as long as pricing is fair, enrollment bias is not a problem; many HMO's would welcome the opportunity to enroll sicker patients without fear of economic disaster. Thus, it is important to be able to modify the current AAPCC classifications to better reflect differences in health status risks among Medicare beneficiaries (Trapnell, McKusick, and Genuardi, 1982; Lubitz, Beebe, and Riley, 1985; Gruenberg, Wallack, and Tompkins, 1986).

In the work reported here, we studied the future cost implications of current hospital utilization, distinguishing among different reasons for prior hospitalizations through principal diagnoses, and developed a diagnostic cost classification that could serve as the basis for health status rate adjusters in an HMO payment formula.

Incorporating health status adjustments into a capitation payment formula through diagnostic information from hospitalizations has the potential to produce a payment model that is easier to administer, more sensitive, and less easily manipulated than models based on volume of previously used services or costs. In this article, we compare the predictive performance of payment models developed using diagnostic hospitalization information with several previously proposed prior-use models. The use of common data (one file for estimating each model and a second for testing) and a wide array of measures of predictive accuracy provides for direct and comprehensive comparisons of the predictive performance of a variety of alternative models.

Prior use and diagnostic refinements

The superior predictive performance of prior-use models relative to the AAPCC risk classifications is well established (McCall and Wai, 1983; Anderson and Knickman, 1984; Beebe, Lubitz, and Eggers, 1985; Anderson et al., 1982; Anderson, Resnick, and Gertman, 1983; Thomas and Lichtenstein, 1986; Lubitz, Beebe, and Riley, 1985). By paying more for sicker enrollees than for healthy ones, HCFA can defuse the debate about the amount and direction of biased selection and can encourage the enrollment of Medicare's most costly beneficiaries into the apparently more efficient setting of managed care. How should we choose among prior-use models?

Thomas et al. (1983) proposed the following criteria for alternative payment models: predictive accuracy, invulnerability to provider manipulation, and administrative simplicity. These are useful for making comparisons. However, a lack of agreement exists about their relative importance, and even for a single criterion it may be unclear how to achieve a definitive ranking.

A prior-use model developed by Beebe, Lubitz, and Eggers (1985) was tested and evaluated in an HMO demonstration project at Senior Health Plan in Minnesota. In addition to age, sex, and welfare status, the model uses an individual's number of inpatient hospital days and an indication as to whether Part B deductibles were met in the 2 years prior to HMO enrollment. This model yields more accurate predictions of future costs than the current AAPCC does. Further, it is simpler to administer than many other prior-use models because it relies only on data that are readily available for all Medicare beneficiaries in the FFS system. However, the model can be used to set payments only for individuals with such Medicare FFS experience just prior to their HMO enrollment. In particular, data would not be available for newly eligible Medicare beneficiaries (65-year-olds) and beneficiaries previously enrolled in an HMO. Also, the prior-use measures required by the formula (namely, the number of days hospitalized and whether Part B deductibles were met) are coarse indicators of ongoing health needs. The formula distinguishes neither self-limiting from continuing illnesses nor hospitalizations that an efficient provider might avoid from those that any provider would consider necessary.

A problem with any model that bases its payments on a person's utilization in the Medicare FFS sector prior to HMO enrollment is the issue of updating. Over time, the health status of individuals will change, and the model will become less accurate as the prior experience becomes more remote. Updating risk classification on the basis of utilization within the HMO, however, raises questions. One question concerns the nature of the data that it is appropriate to ask HMO's to collect. Another concerns perverse incentives in the payment model that would pay more to an HMO that is a higher utilizer, irrespective of its enrollees' actual level of need.

The use of condition-specific prior-use information to classify enrollees with higher risks or significant future medical expenses was proposed quite early as a potential refinement to the AAPCC (Trapnell, McKusick, and Genuardi, 1982). In the works of Anderson et al. (1982) and Anderson, Resnick, and Gertman (1983), it was demonstrated that certain types of hospitalizations, irrespective of their current costs, could serve as predictors of high future costs. Using a priori clinical judgment, hospitalizations were classified as HCN (high cost next year) if the associated principal diagnosis suggested that the hospitalized individual had entered a state of chronic or long-term frailty requiring a continuing high level of care. It was clear from empirical analysis (even with the crudely coded clinical data available in the 1970's) that the subsequent costs of care for beneficiaries incurring an HCN hospitalization were, in fact, higher than the costs for other hospitalized beneficiaries (Ash et al., 1986).

Another important research contribution toward improving simple prior-use models by use of hospital diagnostic information was the more recent work of Anderson et al. (1986). They suggested that HMO payments should be influenced only by medically necessary prior hospital use, that is, use likely to have been judged necessary in any delivery system. The value of this approach is that actual HMO hospitalization experience could be used in rate adjustments without penalizing providers who reduce hospitalization rates by avoiding unnecessary episodes. A limitation of their methodology is that no pricing distinctions were made among hospitalizations based on the expected level of future need that was suggested by the reason for the hospital episode.

Both of these diagnostic payment models use a priori clinical judgment in defining diagnostic classifications for prior hospitalizations. In terms of explained variance, each approach yielded a model far superior to the current AAPCC payment model. In this article, we develop an alternative diagnostic approach, based on risk categories that we call diagnostic cost groups (DCG's). DCG's are formed using empirically determined similarities in the future costs of individuals hospitalized for different reasons. However, diagnoses thought to involve too much discretion in the decision to hospitalize are ignored in setting payments with the recommended model, irrespective of their future cost implications.

Although our primary objective was to extend and improve the HCN concept by empirically identifying diagnoses associated with high future costs, clinical judgment was incorporated in the study in two ways. First, it was used to aggregate individual three-digit codes of the International Classification of Diseases, 9th Revision, Clinical Modification, or ICD-9-CM (Public Health Service and Health Care Financing Administration, 1980) into a manageable number of clinically meaningful subgroups for empirical analysis. Second, it was used to reclassify a number of diagnoses involving substantial ambiguity or discretion either in the definition of the diagnosis or in the medical necessity of hospitalizations for individuals with that diagnosis. Thus, although we did not consider physician discretion with the rigor reflected in the work of Anderson et al. (1986), clinical judgments about physician discretion were used in the model development.

Data

The data for this study were drawn from HCFA's Continuous Medicare History Sample (CMHS), a file that contains demographic and Medicare usage information derived from claims forms for a systematic 5-percent sample of all Medicare beneficiaries. In turn, a 5-percent sample of the CMHS was randomly selected, yielding 38,705 beneficiaries, or 0.25 percent of the Medicare population. For each of the years 1974 through 1980, beneficiary records were obtained containing the following utilization data: total Part A (hospital) and Part B (physician) costs, number of hospital admissions and days in hospital, and a single code for principal diagnosis for each of up to five hospital admissions in each year.

An estimation file, based on 1975-77 data, and a test file, using the 3 years 1978-80, were extracted from this sample file for the purposes of model fitting and validation, respectively. In each of these two files, the first 2 years serve as a base period, and the final year serves as a target for prediction. In each file, eligible individuals were defined as those whose original Medicare entitlement was not for renal disease, who were at least 65 years of age on January 1 of the first base year, who were alive on January 1 of the target year, and who were eligible for both Parts A and B Medicare reimbursements during the entire 3-year period or until their death in the third year. There were 18,677 eligible individuals in the estimation file and 20,263 eligible individuals in the test file.
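The eligibility rules just listed can be expressed as a single filter applied to each file. The sketch below is illustrative only. The record layout (the hypothetical Beneficiary class and its field names) is assumed, because the CMHS file format is not described here. It also assumes that a death on January 1 of the target year still counts as being alive on that date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Beneficiary:
    # Hypothetical record layout; field names are illustrative, not CMHS names.
    birth_date: date
    entitled_for_renal_disease: bool   # original entitlement was for renal disease
    death_date: Optional[date]         # None if alive throughout the window
    covered_a_and_b: bool              # Parts A and B for all 3 years (or until death)


def age_on(birth: date, day: date) -> int:
    """Age in completed years on a given day."""
    years = day.year - birth.year
    if (day.month, day.day) < (birth.month, birth.day):
        years -= 1
    return years


def is_eligible(b: Beneficiary, first_base_year: int) -> bool:
    """Apply the eligibility rules to a 3-year window (2 base years + 1 target year)."""
    target_year = first_base_year + 2
    if b.entitled_for_renal_disease:
        return False
    if age_on(b.birth_date, date(first_base_year, 1, 1)) < 65:
        return False
    # Assumption: a death on or after January 1 of the target year counts as alive then.
    if b.death_date is not None and b.death_date < date(target_year, 1, 1):
        return False
    return b.covered_a_and_b
```

For the estimation file the first base year would be 1975, and for the test file it would be 1978.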

A third file, drawn directly from the CMHS (rather than from our 5-percent sample of it), contains data on a 1-percent sample of the eligible individuals who were hospitalized in 1979. (Here, "eligible" is defined as previously outlined, except that the base period is now the single year 1979 and the target year is 1980.) The file consists of one observation for each of the more than 71,000 hospitalizations experienced by the eligible enrollees in 1979, the principal diagnosis of that hospitalization, and the total 1980 cost for the individual who experienced it. This episode file was used in establishing a classification of diagnoses into cost groups (the DCG's) based on different levels of 1980 costs for people who were hospitalized with different diagnoses in 1979.

The CMHS data have two shortcomings for the purposes of our study. The first is that the International Classification of Diseases system changed in 1978 from the Eighth Revision to the Ninth Revision, which is currently in use. Thus diagnostic groupings formed on 1979 hospital data can be only approximately replicated in the earlier time period. Also, a classification based on the current codes cannot be applied to more than 1 year of subsequent data (that is, 1980). Because the coding systems have a substantial amount of overlap, these problems were not considered crucial.

More importantly, however, major changes in coding and in other aspects of hospital behavior have been introduced since 1983 as a result of Medicare's change to prospective payments for hospital care under the diagnosis-related group system. This means that the current work needs to be validated with more recent data, both to see if the ideas are still applicable and, if they are, to find the proper prices in the new environment.

Development of diagnostic cost groups

The nine diagnostic cost groups used in this article were developed in two steps. First we aggregated approximately 800 three-digit ICD-9-CM diagnostic classifications into 78 diagnostic subgroups. This was done as follows. When an approximate minimum sample size of 100 was not met by an individual three-digit code, it was pooled with another code chosen on the basis of clinically judged similarity. Some codes with fewer than 100 episodes were retained as distinct if no clinically meaningful grouping could be identified; some codes with slightly more than 100 cases were pooled if a clinically close category was available. Otherwise we did not combine codes. The pooling was done to provide reliable estimates of expected 1980 costs for each group and yielded the 78 diagnostic subgroups shown in Table 1. Also shown are the number of individuals who incurred at least one hospitalization in 1979 with a principal diagnosis in each subgroup as well as the 1980 mean costs per person-year for the subgroups. In computing these means, the 1980 costs for a person who was hospitalized twice in 1979 with widely different principal diagnoses would contribute to establishing the expected future costs in two distinct subgroups. Costs were treated as costs per person-month of survival in the target year. Thus, for example, a person who died at midyear with $10,000 in costs was counted as one-half of a person-year of experience, at a cost of $20,000 per year.
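The annualization rule and the subgroup means described above can be sketched as follows. This is a minimal sketch with two assumptions. First, a subgroup mean is taken as total target-year cost divided by total person-years of survival, which is equivalent to weighting each person's annualized cost by his or her exposure. Second, the subgroup labels in the usage line are hypothetical.

```python
from collections import defaultdict


def annualize(cost: float, months_alive: float) -> tuple[float, float]:
    """Return (person-years of exposure, cost per person-year) for one target year."""
    exposure = months_alive / 12.0
    return exposure, cost / exposure


def subgroup_means(people):
    """Mean target-year cost per person-year for each diagnostic subgroup.

    `people` yields (base_year_subgroups, target_year_cost, months_alive).
    A person contributes once to every distinct subgroup in which he or she
    had a base-year hospitalization, so one person can count in several subgroups.
    """
    total_cost = defaultdict(float)
    person_years = defaultdict(float)
    for subgroups, cost, months in people:
        exposure = months / 12.0
        for g in set(subgroups):  # two stays in the same subgroup count once
            total_cost[g] += cost
            person_years[g] += exposure
    return {g: total_cost[g] / person_years[g] for g in total_cost}
```

With these definitions, annualize(10_000, 6) returns (0.5, 20000.0). This is the person in the text who died at midyear with $10,000 in costs.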

In the second step of the analysis, the 78 diagnostic subgroups (based on 1979 utilization) were aggregated into nine diagnostic cost groups. The major criterion for inclusion in the same cost group was similar average total costs in 1980. A histogram of these costs is shown in Figure 1. Two clusters of lower cost diagnostic subgroups and four with higher future-cost implications than the middle group (with costs of $2,100-$3,400) can be seen. These clusters eventually became DCG's 1 and 2 (lower cost) and 6-9 (higher cost). The middle cost range, which did not produce distinct clusters, was split into three groups. Each of these three (DCG's 3, 4, and 5) has an expected cost range of approximately $400, which is somewhat larger than the standard error for mean 1980 cost within most of the 78 subgroups. It is simpler to work with 9 groups than with 78, and little loss in predictive accuracy resulted from the aggregation.
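Once cut points between cost groups have been chosen, assigning a subgroup to a group is a simple lookup on its mean future cost. The sketch below uses hypothetical cut points. The text reports only the $2,100-$3,400 middle range and its split into three bands of roughly $400, so the other boundaries are invented for illustration. The sketch also leaves out the clinical downgrading described next.

```python
from bisect import bisect_right

# Hypothetical cut points in 1980 dollars per person-year. Only the middle range
# ($2,100-$3,400, split into three bands of roughly $400) follows the text.
CUT_POINTS = [1_300, 2_100, 2_500, 2_950, 3_400, 4_500, 6_000, 9_000]


def cost_group(mean_cost: float, cuts=CUT_POINTS) -> int:
    """Map a subgroup's mean future cost to a cost group numbered 1..len(cuts) + 1."""
    # bisect_right counts the cut points at or below mean_cost, so a cost
    # exactly on a cut point falls into the higher group.
    return bisect_right(cuts, mean_cost) + 1
```

With eight cut points the function yields nine groups. A subgroup averaging $3,000 lands in group 5, and one averaging $800 lands in group 1.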

A diagnosis at the low end of a break point in terms of cost was classified in the next lower DCG if, in the opinion of our clinicians, substantial ambiguity or discretion existed in either the definition of the diagnosis or the appropriateness of hospitalization for a person with that diagnosis. The reclassification eliminated the bonus an HMO would otherwise receive for using relatively well-paid, nonspecific categories of diagnosis or for hospitalizing individuals who might be properly treated otherwise. For example, the diagnostic category that comprises symptoms involving head and neck (code 784) was downgraded from DCG 6 to DCG 5 because of concerns about vagueness. In the stub of Table 1, we indicate the DCG for each of the 78 diagnostic subgroups. On a second pass, a more drastic downward reclassification was undertaken, placing many diagnoses in DCG 0, which is equivalent to no hospitalization. Footnotes in Table 1 indicate that all (footnote 2) or some (footnote 3) of the diagnoses that were originally classified in the DCG shown in the stub were eventually reclassified as DCG 0. Pneumonia and influenza (codes 480-487), for example, were downgraded to DCG 0 because the decision to hospitalize many individuals with these conditions is highly discretionary.

This completed the classification of all hospitalizations into 1 of 10 categories, from DCG 0 to DCG 9. Individuals were then classified on the basis of the highest numbered DCG into which any of their hospitalizations fell. Those with no hospital experience in the base year were included in DCG 0. The DCG approach is based on the following presumptions: first, that the rates of occurrence of the high-cost DCG's in a population are a reasonable proxy for the health status of the population; second, that a subsequent year's health care costs can be reasonably estimated through this proxy measure of need.
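The person-level rule (highest DCG among base-year hospitalizations, with DCG 0 for no hospitalization), together with the downgrading of discretionary diagnoses to DCG 0, might be coded as follows. The mapping is hypothetical. Code 784 appears at DCG 5, its level after the downgrade described above. The pre-revision level shown for code 486 (pneumonia) and the entry SUBGROUP_X are invented. Treating diagnoses that are missing from the mapping as DCG 0 is also an assumption.

```python
from typing import AbstractSet, Iterable, Mapping

# Hypothetical subgroup-to-DCG mapping for illustration only.
DCG_OF = {"784": 5, "486": 3, "SUBGROUP_X": 8}


def individual_dcg(diagnoses: Iterable[str],
                   dcg_of: Mapping[str, int] = DCG_OF,
                   moved_to_zero: AbstractSet[str] = frozenset()) -> int:
    """Highest DCG among a person's base-year hospitalizations.

    Diagnoses in `moved_to_zero` (the revised, discretionary categories) and
    diagnoses missing from the mapping count as DCG 0; no hospitalization is DCG 0.
    """
    levels = [0 if dx in moved_to_zero else dcg_of.get(dx, 0) for dx in diagnoses]
    return max(levels, default=0)
```

A person hospitalized for codes 784 and 486 is placed in DCG 5 once 486 is moved to DCG 0. Without that revision the person is still placed in DCG 5, because 5 exceeds the invented pre-revision level of 3 for pneumonia.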

The 1980 costs of people with different 1979 utilization experiences span a wide range. These differences in costs are summarized using relative costliness indexes (RCI's). The RCI for a subgroup of enrollees is the ratio of the group's average cost to the average cost of the total Medicare study population. Although actual dollar costs change markedly from year to year, the relative costliness of individuals within DCG groups (as reflected in their RCI) should be quite stable. The 79.1 percent of Medicare enrollees who had not been hospitalized at all in 1979 experienced a 1980 RCI of 0.75 (computed as $840, their 1980 average cost, divided by $1,113, the mean 1980 cost for all Medicare enrollees). When the 7.0 percent classified in DCG 1, 2, or 3 were pooled with those never hospitalized, the RCI for this group (now 86.1 percent of the population) rose only to 0.77. Finally, when an additional 5 percent of the population was reclassified into DCG 0 in the revisions indicated by footnotes in Table 1, the RCI for this so-called BASE group became 0.83. Thus, in the final model, DCG 0 is a mix of those not hospitalized and those whose hospitalizations either had relatively low expected future costs or were highly discretionary. The actuarially correct price for this mixed population is only about 10 percent higher than the price for the nonhospitalized people alone (0.83 versus 0.75).
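The RCI arithmetic in this paragraph can be checked directly. The sketch reproduces the 0.75 index for the never-hospitalized group from the dollar figures above. It also reproduces the roughly 10-percent premium of the mixed BASE group (0.83) over that group.

```python
def rci(group_mean_cost: float, population_mean_cost: float) -> float:
    """Relative costliness index: group mean cost over population mean cost."""
    return group_mean_cost / population_mean_cost


never_hospitalized = rci(840, 1113)   # about 0.755, reported as 0.75
base_premium = 0.83 / 0.75 - 1        # BASE group vs. never hospitalized: about 0.107
```

Because an RCI is a ratio to the population mean, it can be converted back to dollars for any year by multiplying by that year's mean cost.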

Relative group sizes and their 1980 RCI's based on the 1979 DCG classification are shown in Table 2. There, we can see that RCI's for the more expensive DCG's range from more than 2 to as high as 11.50.

The DCG classifications were checked against the list of HCN diagnoses reported in the work of Anderson et al. (1982) and Anderson, Resnick, and Gertman (1983). Reassuringly, there was substantial agreement between the clinically judged high cost next year diagnoses and those that actually were followed by high costs in the following year.

A second check on the validity of the DCG classification as a way of identifying individuals who will experience a continuing need for high levels of health services in the future was to extend the prediction period from 1 to 3 years. Using 1976 as the base year, future costs for 1977 alone and for the 3-year period 1977-79 were regressed on age, sex, welfare status, and DCG classification. The reimbursement per year for each DCG was similar in both analyses. This suggests that the DCG designation captures some element of long-term chronicity.

Testing models

Our methodology for comparing models is similar to that of Beebe, Lubitz, and Eggers (1985). Each of several candidate models (six of which are described in detail here) was fitted to the estimation file of 1975-77 data described in the "Data" section. Least-squares regressions with predictor variables relating to usage in 1975 and 1976 (the base years) were constructed to predict annualized total costs for eligible persons in 1977 (the target year). Observations were weighted by the fraction of the target year during which the person remained alive and eligible for FFS reimbursement. Models were then compared by studying their behavior on the test file of data relating to the period 1978-80. This procedure simulates the way in which the methodology might actually be applied, because some gap will certainly exist between the availability of data for model fitting and the time at which prospective payment levels must be set.
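The weighted least-squares fit described above can be sketched without external libraries. Each observation is the annualized target-year cost, and its weight is the fraction of the target year the person was alive and eligible. This is a minimal sketch that solves the normal equations directly. A production analysis would more likely use a numerical library, and the function name is hypothetical.

```python
def weighted_least_squares(X, y, w):
    """Coefficients b minimising sum_i w_i * (y_i - X_i . b)**2.

    X: list of rows (include a leading 1 for the intercept); y: annualized
    target-year costs; w: fraction of the target year alive and eligible.
    Solves the normal equations (X'WX) b = X'Wy by Gauss-Jordan elimination.
    """
    p = len(X[0])
    # Augmented matrix [X'WX | X'Wy]
    M = [[sum(wi * xi[r] * xi[c] for xi, wi in zip(X, w)) for c in range(p)]
         + [sum(wi * xi[r] * yi for xi, yi, wi in zip(X, y, w))]
         for r in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][p] / M[r][r] for r in range(p)]
```

With an intercept alone, the fit is the exposure-weighted mean. Two people with annualized costs of $0 and $12,000 and weights 1 and 0.5 give a coefficient of $4,000, not the unweighted $6,000.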

The fitted models, with coefficients for the base year variables derived from the estimation file, were applied to the new base years, 1978 and 1979, in order to predict costs in the new target year, 1980. To adjust for inflation, each predicted value was multiplied by the ratio of total actual 1980 costs to total predicted costs. This had the effect of making the sum of all the new predictions for 1980 add to the total 1980 actual costs without changing the relative size of cost predictions for individuals.
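The inflation adjustment is a single proportional rescaling, sketched below. A common factor changes every prediction by the same percentage, so the ratio between any two individuals' predictions is unchanged.

```python
def rescale_to_total(predicted, actual_total):
    """Scale predictions so they sum to the actual total, preserving their ratios."""
    factor = actual_total / sum(predicted)
    return [p * factor for p in predicted]
```

For example, rescale_to_total([100.0, 300.0], 800.0) returns [200.0, 600.0]. The second prediction remains three times the first.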

Each model was thus determined on the basis of information obtained from an earlier period and used to predict 1980 costs for anyone for whom 1978 and 1979 usage data were available. Models were then evaluated on the basis of how well their predictions fit the actual 1980 costs of the people in the test set, as described next.

Measures of goodness of fit

We have found it useful to compare the performance of models for predicting future costs using several different measures of a model's goodness of fit. Brief descriptions of these follow. Explicit definitions can be found in the "Technical note."

The first measure is the conventional R² calculated by regressing 1977 costs on 1975-76 information. We call it the 1977 R². A cross-validating, or 1980, R² is also reported. This is a measure of how well the 1980 predictions made using the estimation file (as just described) match actual 1980 costs.
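The cross-validating R² compares fixed predictions with realized costs. The sketch uses the familiar form 1 - SSE/SST with optional weights, which is assumed here; the exact definitions are given in the "Technical note." Unlike the in-sample statistic, an out-of-sample R² computed this way can be negative. That happens when the predictions fit worse than simply using the observed mean.

```python
def r_squared(actual, predicted, weights=None):
    """1 - weighted SSE / weighted total SS about the weighted mean of actual costs."""
    if weights is None:
        weights = [1.0] * len(actual)
    wsum = sum(weights)
    mean = sum(w * a for w, a in zip(weights, actual)) / wsum
    sse = sum(w * (a - p) ** 2 for w, a, p in zip(weights, actual, predicted))
    sst = sum(w * (a - mean) ** 2 for w, a in zip(weights, actual))
    return 1.0 - sse / sst
```

For actual costs of $0 and $2,000, predicting $1,000 for both gives exactly 0. Predicting $1,100 for both gives a slightly negative value.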

The R² is a measure of how well a formula predicts actual costs for individuals. For our purposes, a formula need only be able to make good predictions for the expected costs of groups of people who, if not priced correctly, might be discriminated against in HMO's. (Newhouse, 1986, discusses this point.) The current AAPCC can make accurate predictions for a good-sized group of randomly selected individuals. However, models that are able to successfully distinguish future costs for groups with atypical health histories are required so that the right payments can be made when an HMO experiences biased enrollment.

Four measures of prediction bias, called predictive ratios (PR's), were produced to evaluate how closely each model predicts the average 1980 cost for four different subgroups of the test file, which were selected on the basis of information available during the base years, 1978 and 1979. The predictive ratio for a particular group and a given model is formed by dividing the total costs that the model predicts for the group by the total costs actually incurred by the group in 1980. PR's greater than 1 indicate groups for which the model will lead to overpayment; PR's less than 1 reflect groups whose costs are higher than the model predicts. The best models will have all PR's for a wide selection of subgroups quite close to 1.
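A predictive ratio is a ratio of two subgroup totals, as the short sketch below shows. The argument names are illustrative, and group membership is passed as a list of booleans fixed from base-year information.

```python
def predictive_ratio(predicted, actual, in_group):
    """Total predicted cost over total actual cost for members of one subgroup."""
    pred_total = sum(p for p, g in zip(predicted, in_group) if g)
    actual_total = sum(a for a, g in zip(actual, in_group) if g)
    return pred_total / actual_total
```

Suppose a flat prediction of $900 is made for three people with actual costs of $300, $2,500, and $100. The two low-cost people then have a PR of 4.5 (overpayment), and the high-cost person has a PR of 0.36 (underpayment).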

For this article, predictive ratios have been computed for four groups, constructed to address a range of questions about our ability to correctly predict costs for people with specific characteristics. Information about the number of people in these groups and their average 1980 cost experience is given in Table 3. The four groups are named on the left and defined as follows:
65YRF: Women 65-69 years of age (in the first base year) who were not receiving welfare and whose original reason for Medicare entitlement was becoming 65 years old (rather than a disability identified at an earlier age).
NOCOST: A subset of those in group 65YRF defined by having no Medicare reimbursement in the prior year.
CVD/CA: Anyone hospitalized in the prior year for cerebrovascular disease and/or cancer (Eighth Revision International Classification of Diseases codes 140-201 or 430-438).
2HOSPS: Anyone with at least two hospitalizations in the prior year.

The first group was selected by demographic characteristics alone and should otherwise exhibit a representative distribution of prior and future health utilization needs. The other groups all had atypical prior health utilization. The expected Medicare costs of these four groups differ considerably. The relative costliness index (ratio of the mean cost of a subgroup to the total population mean cost) for these groups varied from a low of .35 (NOCOST group) to 2.87 (2HOSPS group). Models were compared on their ability to yield PR's near 1 for each of the four groups.

In summary, we have six different measures of performance for each of the models under consideration: its R² for both the estimation and the test data sets and its predictive ratio for the four subgroups.

Comparing regression models

In this article, six regression models that could be used in setting HMO payments are compared for predictive power. The first four regressions, which have appeared in the literature, establish benchmarks against which the performance of the two new diagnostic models may be judged. All models except Model 4 use the age, sex, and welfare status information that makes up the current AAPCC. (Institutional status, the final factor in the current AAPCC, could not be included because the data were unavailable.) Model 1 uses only this demographic information. Models 2 and 3, originally proposed by Beebe, Lubitz, and Eggers (1985), also use administrative information relating to utilization, which is maintained for all beneficiaries in HCFA's Health Insurance Master Accretions (HIMA) file. Neither model distinguishes among hospital episodes on the basis of diagnostic information. Model 2 uses the presence or absence of any hospital admission in the 1 year immediately preceding the target (or prediction) year; thus, we can evaluate the additional benefit of diagnostic information by comparing its performance with that of the DCG regressions (Models 5 and 6). Model 3, implemented in the Senior Health Plan demonstration, uses 2 years of data and counts total days of hospital stay as well as whether Part B (physician expense) deductibles were met. The last of the benchmark models, Model 4, bases its predictions on a single variable that cannot be obtained from the HIMA file, that is, the dollar amount of Part B expenses.

Models 5 and 6 employ the DCG methodology. The first of these differentiates among five different groups of people classified using the original DCG designations given in Table 1. Model 6 is coarser grained, using fewer groups (DCG's 0-3, 4 and 5, and 6-8), and is based on the revised DCG categories. (Categories footnoted in Table 1 were downgraded into the DCG 0 group.) In developing Model 6, we assumed that individuals in DCG 9, who were hospitalized for treatment related to renal failure, would be reimbursed by a separate, noncapitated mechanism; these individuals were dropped from the analysis.
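For Model 6, a person's revised DCG enters the regression as indicator variables alongside the demographic terms. The sketch below assumes DCG's 0-3 form the reference group, so only the 4-5 and 6-8 groups get indicators, and DCG 9 cases are excluded as described above. The variable names are hypothetical.

```python
from typing import Dict, Optional


def model6_indicators(dcg: int) -> Optional[Dict[str, int]]:
    """Indicator variables for Model 6's grouping of revised DCGs.

    DCGs 0-3 form the reference group; DCG 9 (renal failure) returns None
    because those individuals were dropped from the analysis.
    """
    if not 0 <= dcg <= 9:
        raise ValueError(f"DCG must be 0-9, got {dcg}")
    if dcg == 9:
        return None
    return {"dcg_4_5": int(dcg in (4, 5)), "dcg_6_8": int(6 <= dcg <= 8)}
```

The age, sex, and welfare terms shared by the other models would be appended to these two indicators to form each person's row of predictors.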


In this section, we examine the relative empirical performance of the models just described. Table 4 contains the estimated regression coefficients for each of the models. All variables were statistically significant at the 0.05 level (two-sided). Table 5 displays the six goodness-of-fit measures for each model.

The best models cannot predict individual annual costs well. Even with the most complete and complex models that we have ever specified with these data, the R² values obtained have not exceeded 10 percent for the conventional 1977 R² and 7 percent for the cross-validated 1980 R². Most of the variability in annual costs remains unexplained. Even if we knew a person's expected cost exactly, actual expenditures in any given year would vary widely around this average. For example, Welch (1985) shows that certain plausible assumptions about the functional form of the autocorrelation of costs between 2 years lead to the conclusion that the maximum achievable R² is .20. In light of this, the R² values achieved by the prior-use models of this article may be seen as explaining a substantial part of the explainable variance.

Although each model has a smaller cross-validating 1980 R² than its conventional 1977 R², the relative rankings of the models by these two measures are the same. The current AAPCC formula is the least predictive of the models. It has a 1977 R² of only 0.5 percent, and its 1980 R² is even slightly negative. Model 4 is the best predictive model tested. It has a 1977 R² of 8.5 percent, and its 1980 R² is 4.6 percent. Models 5 and 6 (the two DCG models) and Model 3, which uses information from the HIMA file on hospital days and Part B deductibles, all have similar explanatory power. Somewhat lower R² values are obtained for Model 2, which does not distinguish among hospitalizations through diagnostic information. Only a small loss in explanatory power results from the revisions concerning discretionary admissions and the coarser classification of DCG's in Model 6 relative to Model 5.

All models yielded predictive ratios close to unity for the group 65YRF (defined only on the basis of AAPCC demographic attributes). Thus, the AAPCC, despite its low R², appears to be fully adequate for paying HMO's that enroll a representative mix of enrollees. On the other hand, its predictive ratios were considerably more variable for the other three subgroups than the predictive ratios for the other models were. The current AAPCC formula, for example, dramatically overpays the NOCOST subgroup and underpays those in the two hospitalized groups (CVD/CA and 2HOSPS). Like the R² results, the subgroup predictive ratios were best (that is, closest to unity) for Model 4 (the Part B reimbursement model) and by far the worst for Model 1. Among the remaining models, the HIMA file models (2 and 3) have better predictive ratios than the DCG models (5 and 6) for the NOCOST subgroup, but the DCG models have better PR's for the CVD/CA subgroup. There is little to recommend one DCG model over the other on the evidence of these four predictive ratios.

That the ability to capture future cost differences increases with each model as one moves from Model 1 to Model 4 could have been predicted on the basis of previous research and logic. Demographics alone (Model 1) are known to have low predictive power. Classification by whether a person was hospitalized (Model 2) only splits off the 20 percent who are hospitalized from a heterogeneous residual group. Of the residual group, about one-half have no Medicare expenses, and the rest incur ambulatory care costs ranging from minimal to extremely high. Model 3 has more power because it distinguishes the healthy subset of individuals with no costs to Medicare. Model 4 has the most power of all because even better distinctions among those who use different amounts of ambulatory care can be made by using Part B dollars.

The promising feature of the DCG models is that, although they make no distinctions among the 80 percent who are not hospitalized, they perform only slightly worse than Model 3, which does make such distinctions. For Model 6 in particular, this holds even after the deletions for discretion, which means that fewer than 10 percent of beneficiaries are singled out for special payments on the basis of past utilization.

As noted by Thomas et al. (1983), predictive performance is not the sole criterion for evaluating alternative payment models for HMO's. Administrative feasibility and the resistance of the risk classification to manipulation by providers are also crucial to a successful payment system. Despite its strong predictive performance, the Part B reimbursement model has significant drawbacks: it is administratively difficult to implement, and its payments are sensitive to provider manipulation. Part B reimbursement data are not contained in readily accessible HCFA files, and payment rates in a subsequent year would increase with each dollar that providers spend on Part B services. Nevertheless, the strong predictive results of the Part B reimbursement model indicate the value of research aimed at further improving the AAPCC by using health status information obtained during ambulatory encounters.

Although the HIMA file models are quite attractive on grounds of administrative feasibility, they are more likely than the DCG models to be subject to provider manipulation. For example, higher payment rates for enrollees who meet Part B deductibles may encourage providers to incur just enough costs for enrollees to reach that threshold. The DCG models did not surpass all others in predictive performance, but the relatively modest loss in predictive accuracy associated with their use may reasonably be traded off against other factors in choosing among alternative HMO payment models.

Discussion and conclusions

The diagnostic models developed here could be used to adjust HMO payments for differences in health status composition between HMO enrollees and local FFS populations in the following way. The first two components of the current AAPCC formula, namely, the projected U.S. per capita Medicare cost and the county geographic adjustment, would be unaffected by the use of the new model. However, the third component of the AAPCC formula, which is the ratio of the average risk of the HMO's enrollees to the average risk of FFS beneficiaries in the counties from which the HMO's enrollees are drawn, would be estimated using a DCG model.

HMO payments would change under a DCG payment model to the extent that the average risk factor of HMO enrollees, relative to that of their FFS counterparts, differs between the current AAPCC and DCG risk classifications. For example, an HMO may enroll beneficiaries who are quite similar to local FFS beneficiaries in their distribution across current AAPCC risk categories; this would yield a ratio near unity for the third component of the current AAPCC formula. Suppose, however, that significantly greater fractions of the HMO's enrollees than of local FFS beneficiaries have experienced hospitalizations with DCG primary diagnoses. Then the DCG-based ratio would be greater than unity, and HMO payments under the DCG model would be higher than payments under the current AAPCC.
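The comparison above can be illustrated numerically. The following is a minimal sketch, assuming that the third component is the ratio of head-count-weighted mean risk factors and that the first two components can be collapsed into one hypothetical base rate. The function names (`mean_risk`, `risk_ratio`), the risk classes, the relative cost factors, the enrollment counts, and the base rate are all invented for illustration and are not estimates from this study.

```python
# Hypothetical illustration of the third (risk) component of the AAPCC.
# All classes, factors, counts, and the base rate are invented for this sketch.


def mean_risk(risk_factors, counts):
    """Head-count-weighted average risk factor of a population.

    risk_factors: dict mapping risk class -> relative cost factor
    counts: dict mapping risk class -> number of people in that class
    """
    total = sum(counts.values())
    return sum(risk_factors[c] * n for c, n in counts.items()) / total


def risk_ratio(risk_factors, hmo_counts, ffs_counts):
    """Average risk of HMO enrollees divided by that of local FFS beneficiaries."""
    return mean_risk(risk_factors, hmo_counts) / mean_risk(risk_factors, ffs_counts)


# Demographic (AAPCC-style) cells: the HMO and FFS populations look alike.
aapcc_factors = {"65-69": 0.8, "70+": 1.1}
aapcc_hmo = {"65-69": 50, "70+": 50}
aapcc_ffs = {"65-69": 500, "70+": 500}

# DCG-style classes: the HMO has relatively more enrollees with prior DCG stays.
dcg_factors = {"no DCG stay": 0.9, "DCG stay": 2.5}
dcg_hmo = {"no DCG stay": 85, "DCG stay": 15}
dcg_ffs = {"no DCG stay": 950, "DCG stay": 50}

# Hypothetical dollars per enrollee standing in for components 1 and 2 combined.
BASE_RATE = 200.0

aapcc_ratio = risk_ratio(aapcc_factors, aapcc_hmo, aapcc_ffs)
dcg_ratio = risk_ratio(dcg_factors, dcg_hmo, dcg_ffs)

print(f"AAPCC-style ratio {aapcc_ratio:.3f}, payment {BASE_RATE * aapcc_ratio:.2f}")
print(f"DCG-style ratio   {dcg_ratio:.3f}, payment {BASE_RATE * dcg_ratio:.2f}")
```

With these invented numbers the demographic ratio is essentially 1.0, while the DCG ratio is 1.14/0.98, or about 1.16, so the same HMO would be paid more under the DCG classification.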

The data needs for implementing such DCG models are far more demanding than those of the current AAPCC; they are also more demanding than those of prior-use models that require only HIMA file data. However, DCG information is much easier to collect than prior-use expenditure information or any data relating to outpatient utilization would be. This is an important factor affecting administrative feasibility, because the current regulations of the Tax Equity and Fiscal Responsibility Act require HMO payments to be based on expected FFS experience. Thus, any risk classification employed for HMO enrollees must also be used to classify FFS beneficiaries. Because Medicare already requires hospitals to collect diagnostic information for reimbursement purposes under its prospective payment system, the additional data burden would not be as great as it could be with other types of utilization information that might be used as proxies for health status.

An attractive feature of risk classification based on hospital diagnostic information is that it makes it possible to set capitation rates for newly eligible beneficiaries, for enrollees switching from other HMO's, and for continuing HMO enrollees whose health status should be updated. The required diagnostic information for such people should be retrievable from hospital records that can be verified by Medicare audits. Implementing a model of the kind discussed here would require further work to ensure consistent diagnostic recording across physicians, as well as testing to ensure that manipulation in the form of diagnostic coding upgrading ("creep") does not occur.

The value of other data or a longer base period should continue to be explored. Other data could include more detailed information regarding the hospital stay, such as surgical or other procedures, comorbidities, and severity measures. Ambulatory care diagnoses or risk factor information not associated with a particular hospital stay, such as laboratory test results or documentation of functional impairments, might also be used. If predictions can be made substantially more accurate with such data, then the benefit of ensuring that people are properly identified and priced will have to be weighed against the cost of additional data collection and auditing.

When a payment model is used to update risk classifications for HMO enrollees on the basis of HMO utilization, whether the diagnostic classifications reward inefficient provider behavior becomes a key issue. In recently completed research, Ellis and Ash (1988) developed DCG classifications using the better data reported under Medicare's prospective payment system for hospitals and addressed the issue of physician discretion more rigorously than the research reported here. In that work, the level of physician discretion is defined on the basis of the clinical judgments of a physician panel. Approaches for empirically testing the reliability of less discretionary codes (for example, studies of recording consistency or comparisons of HMO and FFS usage) still need to be developed. For the present, however, incorporating health status adjustments into an HMO payment model through diagnostic information holds significant promise for improving the current AAPCC method of paying Medicare HMO's.


We are grateful to Michael Shwartz, Randall Ellis, Susan Payne, John Lamperti, Joseph Restuccia, and three careful anonymous reviewers for helpful comments.

Technical note: Goodness-of-fit measures

Conventional and cross-validated R²

When a model is fit to data, a convenient measure of how well its predicted values match the actual ones is the R². For each model fitted to 1977 outcomes based on 1975-76 data, an R² for the 1977 predictions is computed in the usual way as:

$$R^2 = 1 - \frac{\mathrm{SS}(\text{predicted} - \text{actual})}{\mathrm{SS}(\text{total})},$$

where SS(predicted - actual) is the sum, over all observations in the estimation file, of the squared differences between predicted and actual costs in the target year 1977. With weighted regressions, all sums are weighted sums. For each model, the predicted costs are computed by applying the regression formula to the base-year data, the same data that were used to fit the formula. SS(total) is the sum of the squared deviations of each actual cost in the estimation sample from the average such cost.
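The conventional R² above can be written as a short function. This is a minimal sketch, assuming person-level lists of predicted and actual costs and optional nonnegative weights. When weights are supplied, the average in SS(total) is taken to be the weighted mean of actual costs, which is our reading of "all sums are weighted sums" rather than a detail stated in the study. The name `r_squared` and the cost figures are hypothetical.

```python
def r_squared(predicted, actual, weights=None):
    """R^2 = 1 - SS(predicted - actual) / SS(total), optionally weighted.

    Assumption for this sketch: with weights, both sums are weighted and the
    mean in SS(total) is the weighted mean of the actual costs.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if weights is None:
        weights = [1.0] * len(actual)
    if len(weights) != len(actual):
        raise ValueError("weights must match the number of observations")
    total_weight = sum(weights)
    mean_actual = sum(w * a for w, a in zip(weights, actual)) / total_weight
    ss_resid = sum(w * (p - a) ** 2 for w, p, a in zip(weights, predicted, actual))
    ss_total = sum(w * (a - mean_actual) ** 2 for w, a in zip(weights, actual))
    return 1.0 - ss_resid / ss_total


# Invented example: four people's actual 1977 costs and a model's predictions.
actual_1977 = [0.0, 0.0, 1000.0, 3000.0]
predicted_1977 = [500.0, 500.0, 1000.0, 2000.0]
print(r_squared(predicted_1977, actual_1977))  # 0.75
```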

Because the models are fit using one set of data and will be used to make predictions at a later time, these 1977 R² values overstate our ability to predict costs. We therefore put more faith in a second, more realistic measure of goodness of fit, the 1980 R². It has the same form as the equation given above. In this case, however, the predicted values are computed by applying the parameters estimated from the 1975-77 sample to the test-set base-year information (1978-79) and multiplying the result by an inflation factor. Actual costs are from 1980.
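The estimation/test procedure described above can be sketched as follows. For illustration only, we assume a model built from mutually exclusive risk cells with no other covariates; for such a model, the unweighted least-squares coefficients are simply the mean target-year cost in each cell. The helper names, cell labels, costs, and the 1.3 inflation factor are all hypothetical.

```python
from collections import defaultdict


def fit_cell_means(cells, costs):
    """Least-squares fit of a cell-dummy model: each coefficient is its cell's mean cost."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, cost in zip(cells, costs):
        sums[cell] += cost
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def predict(coefficients, cells, inflation=1.0):
    """Apply fitted coefficients to base-year cells, scaled by an inflation factor."""
    return [coefficients[cell] * inflation for cell in cells]


def r_squared(predicted, actual):
    """Unweighted R^2 = 1 - SS(predicted - actual) / SS(total)."""
    mean_actual = sum(actual) / len(actual)
    ss_resid = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_resid / ss_total


# Estimation sample: cells from 1975-76 base years, costs from 1977 (invented).
est_cells = ["no DCG", "no DCG", "DCG", "DCG"]
est_costs_1977 = [400.0, 600.0, 3000.0, 5000.0]

# Test sample: cells from 1978-79 base years, costs from 1980 (invented).
test_cells = ["no DCG", "no DCG", "DCG", "DCG"]
test_costs_1980 = [300.0, 1000.0, 4000.0, 7000.0]
INFLATION_1977_TO_1980 = 1.3  # hypothetical

coefs = fit_cell_means(est_cells, est_costs_1977)
r2_1977 = r_squared(predict(coefs, est_cells), est_costs_1977)
r2_1980 = r_squared(predict(coefs, test_cells, INFLATION_1977_TO_1980), test_costs_1980)
print(f"1977 R^2 = {r2_1977:.3f}, 1980 R^2 = {r2_1980:.3f}")
```

In this invented example the cross-validated 1980 value (about 0.83) is lower than the in-sample 1977 value (about 0.86), the same direction of shrinkage reported for every model in the study.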

With this methodology, there is no guarantee that a model's 1980 R² will be positive: because the parameters are not fitted to the 1980 data, SS(predicted - actual) can exceed SS(total), which makes the R² negative. From the equation, we can also see that the 1980 R² would achieve its maximum value of 1.00 only if predicted and actual costs were identical for every person.
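Why an out-of-sample R² can fall below zero is easy to see with a few invented numbers. In this sketch, predicting every person's cost exactly gives 1.0, predicting the test-year mean for everyone gives exactly 0, and a flat prediction that misses the test-year mean (as might happen, for example, if an inflation factor overshoots) gives a negative value. The costs are hypothetical.

```python
def r_squared(predicted, actual):
    """Unweighted R^2 = 1 - SS(predicted - actual) / SS(total)."""
    mean_actual = sum(actual) / len(actual)
    ss_resid = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_resid / ss_total


actual_1980 = [0.0, 0.0, 0.0, 4000.0]  # invented test-year costs; mean is 1000

perfect = r_squared(actual_1980, actual_1980)       # exact predictions
mean_only = r_squared([1000.0] * 4, actual_1980)    # everyone gets the test-year mean
overshoot = r_squared([1200.0] * 4, actual_1980)    # flat prediction above the mean
print(perfect, mean_only, round(overshoot, 4))  # 1.0 0.0 -0.0133
```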

In computing these R² values, each person's actual costs are compared with the predicted annual rate of expenditure multiplied by the fraction of the year for which the individual remained alive. This reflects the fact that when a person dies at midyear, the HMO receives only one-half of the yearly premium.
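A minimal sketch of this survival adjustment follows, assuming (our assumption, not stated in the study) that the fraction of the year alive is measured in days. The function names and figures are hypothetical. Note that 1980, the test year, was a leap year.

```python
import calendar


def fraction_of_year_alive(days_alive, year):
    """Share of the calendar year a person was alive, measured in days (assumption)."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return min(days_alive, days_in_year) / days_in_year


def exposure_adjusted_prediction(annual_rate, days_alive, year):
    """Predicted annual expenditure scaled by the fraction of the year alive."""
    return annual_rate * fraction_of_year_alive(days_alive, year)


# A person predicted to cost $3,000 per year who dies at midyear 1980 (day 183 of 366).
print(exposure_adjusted_prediction(3000.0, 183, 1980))  # 1500.0
```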

Predictive ratios

The predictive ratio for a given subgroup and a given model is defined as

$$\mathrm{PR} = \frac{\text{average of the model's predictions}}{\text{average of actual costs}},$$

where each of the two averages is taken over the individuals in the subgroup.

[Tables 1 to 5 and Figure 1 omitted]
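The predictive ratio defined above can be computed for each subgroup with a few lines. This sketch uses unweighted averages, and the function name and numbers are hypothetical. A PR above 1 means the model pays more than the subgroup's average actual cost (overpayment), and a PR below 1 means it pays less (underpayment).

```python
def predictive_ratio(predicted, actual):
    """PR = average of the model's predictions / average of actual costs over a subgroup."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need equal-length, nonempty lists")
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ValueError("PR is undefined when the subgroup's average actual cost is zero")
    return (sum(predicted) / len(predicted)) / mean_actual


# Invented subgroups: a low-cost group and a previously hospitalized group.
low_cost_pr = predictive_ratio([900.0, 900.0, 900.0], [200.0, 0.0, 400.0])
hospitalized_pr = predictive_ratio([2500.0, 3500.0], [6000.0, 9000.0])
print(low_cost_pr, hospitalized_pr)  # 4.5 0.4
```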

This research was supported by Grant No. 99-C-98526/1 from the Health Care Financing Administration. The views and opinions expressed are the authors', and no endorsement by the Health Care Financing Administration or the Department of Health and Human Services is intended or should be inferred.
COPYRIGHT 1989 U.S. Department of Health and Human Services

Article Details
Author: Ash, Arlene; Porell, Frank; Gruenberg, Leonard; Sawitz, Eric; Beiser, Alexa
Publication: Health Care Financing Review
Date: Jun 22, 1989