# Comparing the agreement among alternative models in evaluating HMO efficiency.

In this article we analyze several methods for estimating firm
efficiency and apply each approach to a nationwide sample of HMOs.
Numerous ways exist to measure the technical efficiency of a firm, based
on the firm's use of resource inputs relative to its production of
outputs. Here we present three of the more common approaches used by
researchers: data envelopment analysis (DEA), stochastic production
frontiers (SPF), and fixed-effects regression (FER). As this article
also shows, the approaches all assume that output is homogeneous;
researchers therefore face the decision not only of choosing a way to
model efficiency, but also of choosing a way to adjust for differences
in output quality and case mix.

Previous work has discussed the theoretical merits of choosing one approach for efficiency analysis over another, but in practice there appears to be no real consensus. All of the models enable researchers to identify the best-practicing firms (or persons) in an industry and to provide benchmarks for other, less productive firms. The use of these models in health services research has increased as a result of the growing interest in the efficiency of health care organizations and personnel (Chilingerian 1997; Nyman, Bricker, and Link 1990; Zuckerman, Hadley, and Iezzoni 1994). Newhouse (1994), however, warns against taking efficiency estimates too literally and using them for report card or reimbursement purposes, since efficiency analyses are unlikely to account completely for heterogeneity and quality differences within an entire industry.

If efficiency analysis cannot be used to determine reimbursement to health care organizations, then what insights can it provide? Moreover, how do researchers choose the most appropriate way to model efficiency, and does the approach matter? We explore these questions by constructing three efficiency models in parallel and comparing their results for a sample of HMOs operating from 1985 through 1994. The HMOs are heterogeneous in terms of size, age, organizational structure, and market environment, all of which potentially affect efficiency. We compare both individual ratings and industry-wide trends, and examine whether, in fact, our conclusions change according to the model used.

DEFINITION AND MEASUREMENT OF EFFICIENCY

For many industries, efficiency is a topic of interest because it reflects a firm's "degree of preparedness" (Aigner, Lovell, and Schmidt 1977) and suggests that production decisions made by some organizations are better than those made by others. Here, we estimate productive technical efficiency, which considers the quantity of resources used relative to the amount of output produced and avoids the need for information on input prices. Technical efficiency is synonymous with managerial efficiency (Charnes, Cooper, and Rhodes 1978) in its assumption that the use of resources is a choice variable over which firms ("managers") have control. It is also similar to Leibenstein's X-efficiency (1966), which predicts that firms use resources more productively when faced with pressures such as market competition or resource scarcity.

The interest in efficiency has led to the development of a class of models known as frontier estimation techniques, to be distinguished from conventional, regression-based models. Regression is designed to depict average behavior; for example, a production function estimates the average output, $\bar{Y}$, generated from a set of resources, $\bar{X}$. In contrast, a production frontier estimates the maximal output, $Y^*$, that a firm should generate from $\bar{X}$, and this estimate, at least intuitively, provides the appropriate benchmark for assessing efficiency.

In choosing a frontier model, researchers must consider a variety of factors (Fare, Grosskopf, and Lovell 1985). For example, some models examine overall efficiency while others focus on allocative, scale, or technical components. The model choice affects the specification of the production technology, as some approaches restrict the technology to a single output (or index of output) while others allow for multiple outputs. Efficiency estimates also can reflect different organizational goals (e.g., profit maximization or cost minimization) by virtue of the primal and dual formulations of the models. Model comparisons can be organized around any of these points, but Schmidt (1985) and Lovell (1993) distinguish among the approaches in terms of two essential features. First, frontiers may be modeled as either parametric or nonparametric functions, and, second, the interpretation of a firm's deviation from the frontier--its error term--depends on whether the model is stochastic or deterministic.

Functional Form

Parametric techniques impose a functional form on the production technology to describe the transformation of inputs into outputs, introducing a risk associated with misspecification. The sensitivity of efficiency estimates to misspecification has been demonstrated using Monte Carlo simulations, where both the true functional form of the technology and the distribution of efficiency across observations are known (Gong and Sickles 1992; Banker, Gadh, and Gorr 1993). From these studies, researchers conclude that parametric approaches are best applied to industries with well-defined technologies to minimize the risk of misspecification. On the other hand, for industries with imprecise technologies (such as the service sector), non-parametric approaches are flexible and may be more desirable to use (Charnes, Cooper, and Rhodes 1978).

Error Term

Frontier techniques also differ with respect to the error term, which may be deterministic or stochastic. In either case, the error is computed as the difference between the optimal quantity of goods or services predicted by the frontier and the actual level of output. Both types of models assume that inefficiency is derived from this error term. In the deterministic case the error term is inefficiency, and the entire observed deviation is attributed to poor decision making by the firm. This interpretation implies that a firm's level of production is totally within its control and that it is always feasible for the firm to perform optimally. Although computationally convenient, the assumption is unrealistic and makes no allowance for the possibility of random noise or measurement error. A stochastic framework, on the other hand, provides a less restrictive interpretation, and the error term (called a "composed error") is assumed to contain both randomness and technical inefficiency. Stochastic models allow production to be affected by measurement error, omitted variables, and simple good or bad luck (Schmidt 1985).

Theoretically, the most desirable frontier approaches appear to be those that allow for flexibility with respect to both functional form and the error term. Although some success has been achieved in developing models that are non-parametric and stochastic (Banker 1986; Land, Lovell, and Thore 1993), such models are computationally difficult and remain inaccessible to most researchers. By far, the most popular and widely applied frontier models are intermediate methods, either non-parametric, deterministic models or parametric, stochastic ones. With readily available software, these approaches are the logical starting point for researchers who want to conduct applied efficiency studies without developing their own programs. In the sections that follow, we briefly review the models for this article.

DATA ENVELOPMENT ANALYSIS

Data Envelopment Analysis (DEA) uses mathematical programming to compute non-parametric, deterministic measures of technical efficiency based on the notion of minimal resource usage relative to production (Koopmans 1951). DEA treats efficiency as a weighted productivity ratio, comparing the quantity of outputs generated to that of inputs consumed. Weights are chosen in such a way that the ratio, or efficiency score, is maximized for each firm in the interval [0, 1]. The best performers in the sample trace out the production frontier and receive scores of 1. This frontier strictly bounds ("envelopes") the remaining data points from above, and the interior observations receive non-negative scores less than 1 based on their proximity to the frontier.

Because DEA is deterministic, all of the distance between an interior point and the frontier is attributed to inefficiency, but there may be other drawbacks. DEA uses extreme observations to trace out the frontier; if any of these contain measurement error, then interior firms may be compared to the "wrong" frontier. Likewise, if the set of inputs and outputs is misspecified, either by omitting pertinent variables or including irrelevant ones, the frontier can be displaced, potentially affecting efficiency ratios (Caulkins and Duncan 1993). Finally, some DEA models impose a restrictive assumption of constant return-to-scale on the entire sample. For this article we use a variant of DEA introduced by Banker, Charnes, and Cooper (1984) that is fairly common and allows for variable returns-to-scale. (See the appendix for details.)

STOCHASTIC PRODUCTION FRONTIERS

A second popular frontier method, referred to as a Stochastic Production Frontier (SPF) (Aigner, Lovell, and Schmidt 1977; Meeusen and van den Broeck 1977), allows production to be affected by randomness as well as inefficiency. Incorporating a stochastic error term is useful, but researchers observe only the net effect of randomness and inefficiency. Consequently, the composed error must be separated into a two-sided error term and a one-sided measure of inefficiency. The decomposition requires three types of assumptions: a functional form to describe the production technology, a distribution for the one-sided inefficiency term, and an estimator for the model. (See the appendix for details.)

Our example, included in the appendix, uses a translog production function, assumes a half-normal distribution for the inefficiency term, (1) and uses maximum likelihood estimation. Finally, the error term is decomposed as $\varepsilon_{it} = v_{it} - u_{it}$, where $v_{it}$ is the usual two-sided measure of randomness and $u_{it}$ is a one-sided estimate of firm inefficiency--in other words, output that is expected but not realized. (2) We use this in place of the standard decomposition, $\varepsilon_{it} = v_{it} - u_i$, which provides a firm-specific measure of inefficiency that is constant over time and precludes the testing of whether efficiency changes or improves during the time period. (3)

FIXED-EFFECTS REGRESSION

In addition to frontier models, organizational efficiency can be estimated using traditional fixed-effects regression. Like SPF, the FER model is parametric and has a random error term, but it differs from SPF in that it does not derive estimates of efficiency from this error term. Instead, the FER model includes binary (dummy) regressor variables to measure efficiency in terms of "firm effects" and "time effects" (Greene 1993). Since efficiency is included as a regressor variable and is measured directly, it is unnecessary to impose any distributional assumptions.

Because FER is not a frontier approach, firms are not (necessarily) evaluated in terms of best practice. Instead, any arbitrary observation in the data set may be used as the standard against which the remainder of the sample is compared; this may include average production, median production, maximal production--whatever best suits the researcher's objectives. In other words, FER estimates $\bar{Y}$, which need not be the maximal value ($Y^*$) used in SPF.

The ordinary two-way fixed-effects regression model uses dummy variables to estimate the effect of each firm and each time period on efficiency (Kumbhakar 1990), and the model's error term ($v_{it}$) is the usual two-sided measure of randomness. The limitation of the two-way model is that the change in efficiency over time is highly specialized: the coefficients on the time dummies impute the same average change in efficiency to all firms. As a result, the numerical estimates vary by firm and by time, but the rank ordering of firms remains constant in all years.

For firm-specific changes over time, the two-way FER model can be modified to include a linear, firm-specific time trend. The time trend generates more flexible measures while maintaining the stochastic framework. (4) Greene (1993) describes the model as an "enhanced fixed effects model"; we provide an example in the appendix. Each observation is still characterized in terms of a firm effect ($\delta_i$). A linear trend, denoted $t$ (where $t = 1, 2, \ldots, T$ for each time period), now replaces the separate dummy variables for time, and the time effect is the product of a firm-specific coefficient and the linear trend ($\omega_i t$), thereby allowing each firm to experience separate changes in efficiency for each time period.

From the preceding overview, it should be apparent that, on balance, no single model dominates the other two and that researchers have several options for estimating technical efficiency. Table 1 summarizes the properties for all three models. In some applications, researchers may be able to choose one of the models on the basis of a specific characteristic, but in many cases researchers have competing objectives that cannot be reconciled with a single approach. Moreover, we do not know if these differences lead to significantly different estimates. To examine the practical implications, we turn to our sample of HMOs.

EXAMPLE: THE HMO INDUSTRY

Data Sources

The data set contains annual, firm-level information for health maintenance organizations in the United States from 1985 through 1994. Financial information, enrollment figures, and utilization statistics have been obtained from Health Care Investment Analysts (HCIA). HMOs file reports with state regulators each year, and HCIA compiles this information into a standard format. This database is then matched with the InterStudy HMO Census, which describes model type, profit status, and geographic location. Together with the GHAA Directory of Health Maintenance Organizations, it records changes in ownership and plan terminations and can be used to construct measures of market competition and saturation.

State-level mandates defining the regulatory environment in which HMOs operate are reported by Aspen Systems Corporation. General demographic information and community characteristics such as population statistics, per capita income, physician supply, and average hospital occupancy are summarized in the Area Resource File (U.S. Department of Commerce).

Matching information across these sources provides information for 585 HMOs during the ten-year period. Because of new entry, mergers, and failures, the panel is unbalanced; further, more than 500 observations have been removed because of missing information. The final sample contains 2,739 observations. (5) (See Table 2.)

We specify a production technology consisting of one output and four inputs. The output is defined as total member-years of coverage provided by an HMO during the year. Outputs are goods or services that generate revenue for the firm, and in the case of HMOs, enrolling members (selling coverage) constitutes the primary source of revenue. A single output is admittedly a simplification, but it is necessary in order to develop all of the models in parallel. Multiple-output specifications, which estimate coverage for private, Medicare, and Medicaid members separately, have been used in the context of cost functions (Wholey et al. 1996) and in other extensions of the current study.

The inputs used by HMOs consist of hospital days, ambulatory visits, administrative expenditures, and other expenditures. This conceptualization of resource inputs differs from previous work, where HMOs have been assigned the same production technology as hospitals, and both inpatient days and outpatient visits count as outputs rather than inputs (Bothwell and Cooley 1980). We disagree with this specification because days and visits are used in providing enrollee coverage; both impose costs onto the HMO (Bryce 1996). Total hospital days and ambulatory visits are reported as quantities; they are price-free measures of inpatient and outpatient services, respectively, and they are highly correlated with the corresponding expenditure information (r > 0.90). (6) The third input, administrative expenditures, is reported in real dollars and estimates labor resources for administrative personnel not directly associated with the delivery of services to HMO enrollees. Likewise, the last input (other expenditures) is reported in real dollars; it refers to interest expenditures and medical supplies (such as pharmaceuticals or crutches).

The data set also includes organizational attributes (model type, profit status, federal qualification); measures of case mix (percent of Medicare enrollees, percent of Medicaid enrollees); and state regulations (whether the state requires reserves, guarantees for payment, rate approval, or consumer representative membership on the HMO board)--any of which may be related to an HMO's use of resources and, hence, the model estimates of technical efficiency. These variables are not included in the production function specification but are retained in the data set for later comparisons.

Model Estimates

We compute three estimates of technical efficiency for each observation in our sample: a DEA estimate that allows for variable returns-to-scale, a time-varying SPF estimate, and an enhanced FER estimate using a linear trend.

However, the models themselves do nothing to control for differences in quality of output (quality of coverage) or the case mix of HMO enrollees. Although it might be reasonable to assume output homogeneity in some industries, this assumption is unlikely to hold for the HMO industry. Moreover, differences in case mix and quality offer legitimate explanations about why some firms may use resources more or less intensively than their competitors. Inferences regarding technical efficiency are therefore confounded, and we need to adjust the model estimates accordingly.

In general, examples of possible ways to adjust for HMO differences include parsing the sample into homogeneous subgroups and estimating efficiency separately for each group; weighting inputs before computing the efficiency scores to reflect differences in quality or case mix; and using the estimated efficiency score as a dependent variable in a second-stage determinants regression against possible covariates.

We present the last of these adjustments and estimate a separate determinants regression for each model. Organizational differences are measured by defining the HMO's model type as either independent practice associations (IPAs) or groups (by collapsing group, staff, network, and mixed HMOs). We also include a variable to denote whether the HMO is a for-profit or a non-profit organization. The data set has only limited case mix information; we include the percentage of enrollees covered by Medicare and Medicaid separately. Likewise, federal qualification and state regulations are, at best, rough proxies for quality, and we again use dummy variables to denote the presence or absence of these mandates. (7)

After controlling for these covariates, an improved measure of efficiency is left in the observed error term. The correspondence between the unadjusted and adjusted efficiency estimates is shown in Table 3 for each of the models. We do not compare the actual numerical estimates, because adjusted and unadjusted values are scaled differently; instead, we present the Spearman correlation coefficients to demonstrate the ordinal ranking of HMOs relative to one another. In all three models, HMO ranks change, albeit only slightly.
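The adjustment can be sketched numerically. In the example below, the scores, covariates, and coefficients are all simulated for illustration (they are not the study's data): we regress a raw efficiency score on the covariates, take the residual as the adjusted measure, and compare the two rankings with a Spearman correlation.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Hypothetical raw efficiency scores and the covariates named in the
# text (all simulated; effect sizes are illustrative, not Table 4's).
n = 300
ipa = rng.integers(0, 2, n).astype(float)     # 1 = IPA, 0 = group model
profit = rng.integers(0, 2, n).astype(float)  # 1 = for-profit
medicare = rng.uniform(0, 0.3, n)             # Medicare enrollment share
medicaid = rng.uniform(0, 0.3, n)             # Medicaid enrollment share
score = (0.8 - 0.05 * ipa - 0.2 * medicare - 0.15 * medicaid
         + rng.normal(0, 0.05, n))

# Second-stage determinants regression; the residual (plus intercept)
# serves as the case-mix/quality-adjusted efficiency measure.
Z = np.c_[np.ones(n), ipa, profit, medicare, medicaid]
coef, *_ = np.linalg.lstsq(Z, score, rcond=None)
adjusted = score - Z[:, 1:] @ coef[1:]        # keep the intercept level

# How much do the ordinal rankings move once we adjust?
rho, p = spearmanr(score, adjusted)
```

Because the adjusted and unadjusted values are scaled differently, only the rank correlation (not the raw values) is compared, mirroring Table 3.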

The usefulness of adjusting for case mix is perhaps more apparent in the coefficient estimates of the regression models, presented for each model in Table 4. We see that HMOs with a higher proportion of Medicare or Medicaid enrollees (more intensive case mix) tend to receive lower efficiency estimates from all three models. The models provide higher estimates to federally qualified HMOs, which is surprising since qualification is presumably costly for the HMO. The other proxies for quality (state mandates) show almost no significance, which may be explained by the fact that there is little variation in these variables (the mandates are applicable to most of the HMOs in our sample). In addition to case mix and quality, we can also see the effect of model type, suggesting that group models tend to receive higher efficiency estimates. The negative relationship between profit status and efficiency is counterintuitive, since we expect for-profit organizations to eliminate any excess use of resource inputs. Overall, the results suggest that the three approaches to modeling efficiency are affected in similar ways by the covariates that we included. In other words, we identify similar industry-wide trends (e.g., group HMOs are generally rated as more efficient than IPA organizations) irrespective of the model employed to estimate efficiency.

We also can display industry trends graphically. Figures 1 through 3 show the distribution (interquartile range) of efficiency estimates for each model over time. Based on these figures, we see similar changes (in terms of direction) during the ten-year period, especially for DEA and SPF. The FER model also shows the same pattern in the median estimates through 1989, after which the FER model suggests continued improvement in efficiency while DEA and SPF suggest more variable fluctuations or downturns.

[FIGURES 1-3 OMITTED]

Finally, it is important to consider whether the three models agree in their assessment of efficiency for individual HMOs since, in practice, efficiency estimates are often used to rate a firm relative to its competitors. Whether we use DEA, SPF, or FER, we want to reliably identify more efficient, best-practicing firms and ensure stable decision-making behavior--for example, either in an employer's decision to contract with HMOs or in a policymaker's decision to develop incentives on the basis of efficient industry practices. If the models do not agree in their assessments but identify different HMOs as efficient, then our conclusions (and our decisions) are sensitive to choice of model. On the other hand, if the models assign similar ratings, then the theoretical differences described earlier (parametric versus nonparametric, etc.) may be less important in practice.

Table 5 presents the Spearman correlation coefficients for DEA, SPF, and FER efficiency estimates controlling for case mix and quality. The two stochastic models (SPF and FER) demonstrate the strongest agreement, but all of the correlation coefficients are reasonably high. However, even though the correlations are significantly different from zero, in every case, they are also significantly different from 1.0 (p < .0001). This implies that the overall rankings of the HMOs in our sample differ across the three models, and--depending on the model we use--we would identify different firms as efficient. These results serve as a caution to researchers: the models may be useful in understanding overall trends, but they are not informative in ranking individual performance.

As we saw previously in Table 1, each model has desirable properties as well as tradeoffs associated with it. One finding that may surprise some readers is that the FER estimates compare reasonably well with the two frontier approaches. Not only does FER allow for flexible changes in efficiency over time, it also uses the information provided by the panel data set to identify observations for each firm. There appear to be only two potential shortcomings with FER: it requires defining a functional form (which is also true of SPF), and it does not compare firms to the production frontier but may use other benchmarks instead. If, however, the researcher's goal is to distinguish more efficient firms from less efficient ones, then it is not obvious that an estimate of the frontier is required. Average or median functions still allow researchers to rank firms, suggesting that a large part of "efficiency analysis" can take place by using regression methods that are available through virtually any statistical package.

DISCUSSION

There are two important caveats to the work presented here. First, because it was our intent to compare the three models, we constrained them in ways that ordinarily might not be necessary. For example, estimating productive efficiency limited us to a single output in the stochastic models (SPF, FER); hence, DEA also used the same definition of output even though it is capable of defining multiple outputs. On the other hand, DEA is more sensitive to missing values than either SPF or FER, so our sample was restricted to observations with complete information. Thus, if allowed to operate independently and not for reasons of comparison, the models might each be able to "do more" than we have shown here. Nevertheless, the comparison seems useful for demonstrating that alternative models intended to address the same notion and based on the same information can still produce important differences. Second, several variants of all three models are available. We did not choose the simplest (and, in most cases, the most restrictive) version, nor did we choose the most advanced. Instead, we chose well-known variants of each model that researchers are likely to encounter when they examine the alternatives for modeling efficiency. There are other, more specialized models, and we encourage researchers to investigate the variants that may be more appropriate for their specific application.

It is worth re-emphasizing that, in the strictest sense, a production function (or a production frontier) makes no allowance for differences in service quality or enrollee case mix. Both are treated as homogeneous across the sample by all three models. It is the responsibility of the researcher to decide whether or not such assumptions are accurate for any particular example. If not, then the researcher can make the appropriate adjustments in any of several ways (e.g., by parsing, weighting, or running secondary analyses on the sample). Such adjustments are necessary in order to make reasonable inferences regarding efficiency; yet, as we have shown here, even with adjustments the techniques may not provide consistent information on individual performance.

Choosing an appropriate technique for evaluating efficiency in any given application is difficult. While some researchers may prefer an approach on the basis of its theoretical properties, each choice requires tradeoffs that may be undesirable. Few researchers, however, have demonstrated the practical implications that model choice carries regarding efficiency estimates and the inferences that follow (Banker, Conrad, and Strauss 1985). This article illustrates that model selection can influence which firms are rated as most efficient. We therefore cannot simply dismiss the decision as arbitrary.

This article introduces the reader to the most common techniques for conducting efficiency analyses. Using a multiplicity of approaches and comparing the sensitivity of our findings to the methods is useful, whether or not the methods reinforce one another. Our example shows that model choice can matter a great deal, which should be of concern to researchers looking to benchmark individual performance. However, the models may be useful, too, in providing us with overall insights about industry-wide trends. We hope that the example presented here does not discourage researchers from conducting studies in efficiency, but that, instead, it enables researchers to approach the efficiency analyses thoughtfully and to test the robustness (and the reasonableness) of their assumptions and their conclusions.

APPENDIX: MODELS FOR ESTIMATING TECHNICAL EFFICIENCY

Data Envelopment Analysis (DEA)

The DEA estimate of efficiency is non-parametric and treats efficiency as a weighted productivity ratio comparing outputs to inputs. There are several versions of the DEA model; here we present the Banker, Charnes, and Cooper (1984) (BCC) algorithm that allows for variable returns-to-scale. The following optimization problem is solved separately for each observation in the sample:

$$\max_{u,\,v_j,\,w} \; h_o = \frac{u Y_o + w}{\sum_j v_j X_{jo}} \quad \text{subject to} \quad \frac{u Y_q + w}{\sum_j v_j X_{jq}} \le 1 \;\; \text{for all } q; \qquad u \ge 0, \; v_j \ge 0, \; w \text{ unrestricted}$$

In this weighted ratio of output ($Y$) to inputs ($X_j$), DEA chooses optimal weights $u$ and $v_j$ for the observation under consideration (denoted as firm $o$, above). The input and output combinations used by the remaining observations (denoted above as $q$) serve as constraints in the problem. The variable $w$ serves as a convexity constraint that allows the frontier to envelope the observations more tightly than algorithms that impose constant returns-to-scale. Solving for $u$, $v_j$, and $w$ maximizes $h_o$ and gives us the BCC efficiency score.
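In practice the ratio problem is solved as an equivalent linear program. The sketch below (our own illustration, not code from the study) solves the input-oriented envelopment form of the BCC model with SciPy; the multiplier weights $u$, $v_j$, and $w$ described above are the dual variables of this program.

```python
import numpy as np
from scipy.optimize import linprog

def bcc_input_efficiency(X, Y):
    """Input-oriented BCC (variable returns-to-scale) DEA scores.

    X: (n, m) array of inputs; Y: (n,) array of a single output.
    For each firm o, solve
        min theta  s.t.  sum_q lam_q X_q <= theta * X_o,
                         sum_q lam_q Y_q >= Y_o,
                         sum_q lam_q = 1,  lam >= 0,
    the envelopment (dual) form of the BCC ratio problem.
    """
    n, m = X.shape
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                 # minimize theta
        # input rows: sum_q lam_q x_jq - theta * x_jo <= 0
        A_ub = np.hstack([-X[o].reshape(m, 1), X.T])
        b_ub = np.zeros(m)
        # output row: -sum_q lam_q y_q <= -y_o
        A_ub = np.vstack([A_ub, np.r_[0.0, -Y]])
        b_ub = np.r_[b_ub, -Y[o]]
        # convexity constraint sum lam = 1 (the VRS/BCC restriction)
        A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(None, None)] + [(0, None)] * n)
        scores[o] = res.fun
    return scores

# Toy check: three firms each produce one unit of output using 2, 4,
# and 8 units of a single input; only the first is on the frontier.
scores = bcc_input_efficiency(np.array([[2.0], [4.0], [8.0]]),
                              np.array([1.0, 1.0, 1.0]))
```

Interior firms receive scores strictly below 1 in proportion to the input contraction needed to reach the frontier.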

Stochastic Production Frontiers (SPF)

SPF models are parametric and must specify a production technology for transforming inputs into outputs. The model includes a "composed error" term that estimates the net effect of inefficiency and randomness. To decompose the error term, researchers must specify a functional form for the production technology, choose a distribution for the inefficiency term, and choose a model estimator.

To alleviate problems associated with the first assumption, researchers increasingly use the transcendental logarithmic ("translog") function because it is a flexible form that includes more restrictive forms such as Cobb-Douglas and Constant Elasticity of Substitution (CES) as special cases. For the second assumption, various one-sided distributions have been tested, including the half-normal (Aigner, Lovell, and Schmidt 1977), exponential (Greene 1990), and gamma distributions (Meeusen and van den Broeck 1977), and there are no strict rules for selecting a distribution. Finally, in choosing an estimator, SPF models appear to be fairly robust across maximum likelihood, corrected OLS, generalized least squares and the within estimator (Cornwell, Schmidt, and Sickles 1990). Gong and Sickles (1992) prefer the within estimator on the basis of computational ease, but the current availability of software packages makes most of these techniques practical to use.
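To make the nesting concrete, with a single output $Y$ and inputs $X_j$ the translog is

$$\ln Y = \alpha_0 + \sum_j \beta_j \ln X_j + \tfrac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_j \ln X_k,$$

and imposing the restriction $\beta_{jk} = 0$ for all $j, k$ recovers the Cobb-Douglas form $\ln Y = \alpha_0 + \sum_j \beta_j \ln X_j$, so the more restrictive technology can be tested directly against the flexible one.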

For example, the following SPF model uses a translog production function and estimates efficiency for firm i in year t:

$$\ln Y_{it} = \alpha_0 + \sum_j \beta_j \ln X_{jit} + \tfrac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + \varepsilon_{it}$$

where $Y$ and $X$ still denote output and inputs, and $\varepsilon_{it}$ is the composed error term.

In a standard SPF model, $\varepsilon_{it} = v_{it} - u_i$, where $v_{it}$ is the usual two-sided error term and $u_i$ is a one-sided estimate of firm inefficiency--in other words, output that is expected but not realized. Based on the work of Jondrow et al. (1982) and Battese and Coelli (1988), $u_i$ is derived from firm $i$'s error terms over time:

$$u_i = E\left[\, u_i \mid \varepsilon_{i1}, \ldots, \varepsilon_{iT} \,\right]$$

If researchers are interested in whether or not firm $i$ changes or improves its efficiency during the time period, then this approach is unsatisfactory. Instead, SPF may define the error term as $\varepsilon_{it} = v_{it} - u_{it}$ and estimate $u_{it}$ instead of $u_i$, computing inefficiency measures that vary by firm and by year. However, in so doing, we treat the sample as a cross-section rather than a panel and no longer can use the time series for firm $i$ to estimate its inefficiency.
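The decomposition can be sketched numerically. The example below uses simulated data and a Cobb-Douglas frontier in place of the translog to keep it short; it fits the normal/half-normal model by maximum likelihood and then computes the Jondrow et al. conditional mean for each observation, treating the panel as a cross-section as just described.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated frontier data: log-output = 1 + 0.6*x1 + 0.3*x2 + v - u
n = 500
x = rng.uniform(1, 3, size=(n, 2))          # log-inputs
v = rng.normal(0, 0.1, n)                   # two-sided noise
u = np.abs(rng.normal(0, 0.3, n))           # half-normal inefficiency
y = 1.0 + x @ np.array([0.6, 0.3]) + v - u
Z = np.c_[np.ones(n), x]

def negll(theta):
    """Negative log-likelihood of the normal/half-normal composed error."""
    b, sv, su = theta[:3], np.exp(theta[3]), np.exp(theta[4])
    sigma = np.hypot(sv, su)
    eps = y - Z @ b
    ll = (np.log(2) - np.log(sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * (su / sv) / sigma))
    return -ll.sum()

res = minimize(negll, np.r_[np.zeros(3), -1.0, -1.0], method="BFGS")
b, sv, su = res.x[:3], np.exp(res.x[3]), np.exp(res.x[4])

# JLMS decomposition: u_hat = E[u_it | eps_it], the mean of a normal
# distribution truncated below at zero.
eps = y - Z @ b
s2 = sv**2 + su**2
mu_star = -su**2 * eps / s2
s_star = sv * su / np.sqrt(s2)
u_hat = mu_star + s_star * norm.pdf(mu_star / s_star) / norm.cdf(mu_star / s_star)
```

On the output scale, firm-year efficiency is then exp(-u_hat); a larger u_hat means more output expected but not realized.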

Fixed-Effects Regression (FER)

The ordinary two-way fixed-effects regression model uses dummy variables to estimate efficiency directly (Kumbhakar 1990). Again using a translog specification, the basic two-way FER model is specified as:

\ln Y_{it} = \alpha_0 + \sum_i \delta_i a_i + \sum_t \gamma_t c_t + \sum_j \alpha_j \ln X_{jit} + \tfrac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + v_{it}

The variables [a.sub.i] and [c.sub.t] identify firms and time periods, respectively. To avoid problems of perfect collinearity, dummies for one firm and one year are omitted and captured in the intercept, [[alpha].sub.0]. The error term ([v.sub.it]) is the two-sided measure of randomness. The function can be estimated using least squares with dummy variables (LSDV) or the within estimator.

Coefficients associated with the [a.sub.i] and [c.sub.t] denote the firm effect ([[delta].sub.i]) and time effect ([[gamma].sub.t]), respectively, but change in efficiency over time is highly specialized. The estimates of [[gamma].sub.t] impute the same average change in efficiency on all firms in the sample for a given time period. As a result, the numerical estimates vary by firm and by time, but the rank ordering of firms remains constant in all years.

To allow for flexible, firm-specific changes over time, the basic two-way FER model can be modified to include a linear, firm-specific time trend. The time trend generates more flexible measures while maintaining the stochastic framework. Greene (1993) describes the model as an "enhanced fixed effects model":

\ln Y_{it} = \alpha_0 + \delta_i + \omega_i t + \sum_j \alpha_j \ln X_{jit} + \tfrac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + v_{it}

Each observation is still characterized in terms of a firm effect ([[delta].sub.i]). A linear trend, denoted t (where t = 1, 2, ..., T), now replaces the separate dummy variables. The time effect is the product of a firm-specific coefficient and the linear trend ([[omega].sub.i]t), thereby allowing each firm to experience individual changes in efficiency for each time period.
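
A minimal sketch of this enhanced model follows, assuming a single logged input and least squares on firm dummies and firm-specific trends; variable names are illustrative, and a full application would include the translog terms described above:

```python
import numpy as np

def enhanced_fer_efficiency(firm, year, logY, logX):
    """Enhanced fixed-effects sketch: firm dummies plus a firm-specific
    linear time trend, estimated by least squares (LSDV). Efficiency is
    measured relative to the best firm effect in each year."""
    firms = np.unique(firm)
    t = year - year.min() + 1                              # t = 1, 2, ..., T
    D = (firm[:, None] == firms[None, :]).astype(float)    # firm dummies a_i
    X = np.column_stack([D, D * t[:, None], logX])         # delta_i, omega_i*t, inputs
    coef, *_ = np.linalg.lstsq(X, logY, rcond=None)
    n = len(firms)
    delta, omega = coef[:n], coef[n:2 * n]
    # firm effect delta_i + omega_i*t in each year, relative to that year's best
    effect = delta[None, :] + np.outer(np.arange(1, t.max() + 1), omega)
    return np.exp(effect - effect.max(axis=1, keepdims=True))   # (T, n) in (0, 1]
```

Because each firm carries its own trend coefficient, rank orderings can change from year to year, unlike in the ordinary two-way model.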

The work presented here is part of a larger study supported by the Agency for Health Care Policy and Research (now Agency for Healthcare Research and Quality) to evaluate changes in the HMO industry (R01 HS09200-01).

NOTES

(1.) To test the distributional assumption in this study, we compared SPF measures for half-normal, truncated normal, and exponential distributions. The measures of [u.sub.it] demonstrated near-perfect correlation ([r.sub.s] > 0.99); hence, we include only estimates for the half-normal.

(2.) Of the approaches presented here, only SPF produces a measure of inefficiency (shortfall from the frontier) rather than efficiency (proximity to the frontier).

(3.) However, this decomposition has the advantage of using the panel nature of the data set and linking all observations for firm i together.

(4.) Cornwell, Schmidt, and Sickles (1990) present an extended version, using both linear and quadratic trend variables.

(5.) Steps to impute missing values do not change the basic results of this article.

(6.) Although a recent paper by Rosenman, Siddharthan, and Ahern (1997) defines expenditures as inputs, we use input quantities instead of expenditures whenever possible. Both geographic variation and price breaks for larger firms affect factor prices; this confounds the results because the efficiency estimate consists of both technical and allocative components.

(7.) HEDIS-type measures are not consistently available during the 1985-1994 time period.

REFERENCES

Aigner, D., C. A. K. Lovell, and P. Schmidt. 1977. "Formulation and Estimation of Stochastic Frontier Production Function Models." Journal of Econometrics 6 (July): 21-37.

Aspen Systems Corporation. 1988-1991. A Report to the Governor on State Regulation of Health Maintenance Organizations. Aspen Systems Corporation, Rockville, MD.

Banker, R. D. 1986. "Stochastic Data Envelopment Analysis." Working Paper 86-8, Carnegie Mellon University, School of Urban and Public Affairs, Pittsburgh, PA.

Banker, R. D., A. Charnes, and W. W. Cooper. 1984. "Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis." Management Science 30 (9): 1078-92.

Banker, R. D., R. F. Conrad, and R. P. Strauss. 1985. "A Comparative Application of DEA and Translog Methods: An Illustrative Study of Hospital Production." Management Science 32 (January): 30-44.

Banker, R. D., V. M. Gadh, and W. L. Gorr. 1993. "A Monte Carlo Comparison of Two Production Frontier Estimation Methods: Corrected Ordinary Least Squares and Data Envelopment Analysis." European Journal of Operational Research 67 (3): 332-43.

Battese, G. E., and T.J. Coelli. 1988. "Prediction of Firm-Level Technical Efficiencies with a Generalized Frontier Production Function and Panel Data." Journal of Econometrics 38 (3): 387-99.

Bothwell, J. L., and T. F. Cooley. 1982. "Efficiency in the Provision of Health Care: An Analysis of Health Maintenance Organizations." Southern Economic Journal 48 (4): 970-84.

Bryce, C. L. 1996. "Alternative Approaches to Estimating the Efficiency of Health Maintenance Organizations." Ph.D. dissertation, Carnegie Mellon University, Heinz School of Public Policy and Management.

Caulkins, J. P., and G. T. Duncan. 1993. "Robustness of Data Envelopment Analysis with Respect to Misspecification of Input and Output Variables." Working Paper 93-76, Carnegie Mellon University, Heinz School of Public Policy and Management, Pittsburgh, PA.

Charnes, A., W. W. Cooper, and E. Rhodes. 1978. "Measuring Efficiency of Decision Making Units." European Journal of Operational Research 2 (6): 429-44.

Chilingerian, J. A. 1997. "DEA and Primary Care Physician Report Cards: Deriving Preferred Practice Cones from Managed Care Service Concepts and Operating Strategies." Annals of Operations Research 73 (1): 35-66.

Christianson, J. B., S. M. Sanchez, D. R. Wholey, and M. Shadle. 1991. "The HMO Industry: Evolution in Population Demographics and Market Structure." Medical Care Review 48 (1): 3-46.

Cornwell, C., P. Schmidt, and R. C. Sickles. 1990. "Production Frontiers with Cross-Sectional and Time-Series Variation in Efficiency Levels." Journal of Econometrics 46 (October/November): 185-200.

Färe, R., S. Grosskopf, and C. A. K. Lovell. 1985. The Measurement of Efficiency of Production. Hingham, MA: Kluwer Academic Publishers.

Gong, B., and R. C. Sickles. 1992. "Finite Sample Evidence on the Performance of Stochastic Frontiers and Data Envelopment Analysis Using Panel Data." Journal of Econometrics 51 (January/February): 259-84.

Greene, W. H. 1993. Econometric Analysis, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, Inc.

--. 1990. "A Gamma-Distributed Stochastic Frontier Model." Journal of Econometrics 46 (October/November): 141-63.

Health Care Investment Analysts, Inc. 1988-1991. HMO Data Base, Diskette Series, User's Manual. Baltimore, MD: HCIA.

InterStudy. 1988-1994. The InterStudy Competitive Edge. Excelsior, MN: InterStudy.

Jondrow, J., C. A. K. Lovell, I. S. Materov, and P. Schmidt. 1982. "On the Estimation of Technical Inefficiency in the Stochastic Frontier Production Function Model." Journal of Econometrics 19 (August): 233-38.

Koopmans, T. C. 1951. "An Analysis of Production as an Efficient Combination of Activities." In Activity Analysis of Production and Allocation, edited by T. C. Koopmans. Cowles Commission for Research in Economics, Monograph No. 13. New York: John Wiley & Sons, Inc.

Kumbhakar, S. C. 1990. "Production Frontiers, Panel Data, and Time-Varying Technical Inefficiency." Journal of Econometrics 46 (October/November): 201-22.

Land, K. C., C. A. K. Lovell, and S. Thore. 1993. "Chance-Constrained Data Envelopment Analysis." Managerial and Decision Economics 14 (6): 541-54.

Leibenstein, H. 1966. "Allocative Efficiency vs. 'X-Efficiency.'" The American Economic Review 56 (3): 392-415.

Lovell, C. A. K. 1993. "Production Frontiers and Productive Efficiency." In The Measurement of Productive Efficiency, edited by H. O. Fried, C. A. K. Lovell, and S. S. Schmidt. New York: Oxford University Press.

Meeusen, W., and J. van den Broeck. 1977. "Efficiency Estimation from Cobb-Douglas Production Functions with Composed Error." International Economic Review 18 (2): 435-45.

Newhouse, J. P. 1994. "Frontier Estimation: How Useful a Tool for Health Economics?" Journal of Health Economics 13 (3): 317-22.

Nyman, J. A., D. L. Bricker, and D. Link. 1990. "Technical Efficiency of Nursing Homes." Medical Care 28 (6): 541-51.

Rosenman, R., K. Siddharthan, and M. Ahern. 1997. "Output Efficiency of Health Maintenance Organizations in Florida." Health Economics 6 (3): 295-302.

Schmidt, P. 1985. "Frontier Production Functions." Econometric Review 4 (2): 289-328.

Wholey, D. R., R. Feldman, J. Christianson, and J. Engberg. 1996. "Scale and Scope Economies Among HMOs." Journal of Health Economics 15 (6): 657-84.

Zuckerman, S., J. Hadley, and L. Iezzoni. 1994. "Measuring Hospital Efficiency with Frontier Cost Functions." Journal of Health Economics 13 (3): 255-80.

Address correspondence to Cindy L. Bryce, Ph.D., Research Assistant Professor of Medicine, Center for Research on Health Care, University of Pittsburgh, 200 Lothrop Street, Pittsburgh PA 15213. John B. Engberg, Ph.D. is Associate Professor of Economics, H. John Heinz III School of Public Policy and Management, Carnegie Mellon University; and Douglas R. Wholey, Ph.D. is Professor, Division of Health Services Research, University of Minnesota. This article, submitted to Health Services Research on September 15, 1998, was revised and accepted for publication on July 7, 1999.

DEFINITION AND MEASUREMENT OF EFFICIENCY

For many industries, efficiency is a topic of interest because it reflects a firm's "degree of preparedness" (Aigner, Lovell, and Schmidt 1977) and suggests that production decisions made by some organizations are better than those made by others. Here, we estimate productive technical efficiency, which considers the quantity of resources used relative to the amount of output produced and avoids the need for information on input prices. Technical efficiency is synonymous with managerial efficiency (Charnes, Cooper, and Rhodes 1978) in its assumption that the use of resources is a choice variable over which firms ("managers") have control. It is also similar to Leibenstein's X-efficiency (1966), which predicts that firms use resources more productively when faced with pressures such as market competition or resource scarcity.

The interest in efficiency has led to the development of a class of models known as frontier estimation techniques, to be distinguished from conventional, regression-based models. Regression is designed to depict average behavior; for example, a production function estimates the average output, [bar.Y], generated from a set of resources, [bar.X]. In contrast, a production frontier estimates maximal output, [Y.sup.*], that a firm should generate from [bar.X], and this estimate, at least intuitively, provides the appropriate benchmark for assessing efficiency.

In choosing a frontier model, researchers must consider a variety of factors (Färe, Grosskopf, and Lovell 1985). For example, some models examine overall efficiency while others focus on allocative, scale, or technical components. The model choice affects the specification of the production technology, as some approaches restrict the technology to a single output (or index of output) while others allow for multiple outputs. Efficiency estimates also can reflect different organizational goals (e.g., profit maximization or cost minimization) by virtue of the primal and dual formulations of the models. Model comparisons can be organized around any of these points, but Schmidt (1985) and Lovell (1993) distinguish among the approaches in terms of two essential features. First, frontiers may be modeled as either parametric or nonparametric functions, and, second, the interpretation of a firm's deviation from the frontier--its error term--depends on whether the model is stochastic or deterministic.

Functional Form

Parametric techniques impose a functional form on the production technology to describe the transformation of inputs into outputs, introducing a risk associated with misspecification. The sensitivity of efficiency estimates to misspecification has been demonstrated using Monte Carlo simulations, where both the true functional form of the technology and the distribution of efficiency across observations are known (Gong and Sickles 1992; Banker, Gadh, and Gorr 1993). From these studies, researchers conclude that parametric approaches are best applied to industries with well-defined technologies to minimize the risk of misspecification. On the other hand, for industries with imprecise technologies (such as the service sector), non-parametric approaches are flexible and may be more desirable to use (Charnes, Cooper, and Rhodes 1978).

Error Term

Frontier techniques also differ with respect to the error term, which may be deterministic or stochastic. In either case, the error is computed as the difference between the optimal quantity of goods or services predicted by the frontier and the actual level of output. Both types of models assume that inefficiency is derived from this error term. In the deterministic case the error term is inefficiency, and the entire observed deviation is attributed to poor decision making by the firm. This interpretation implies that a firm's level of production is totally within its control and that it is always feasible for the firm to perform optimally. Although computationally convenient, the assumption is unrealistic and makes no allowance for the possibility of random noise or measurement error. A stochastic framework, on the other hand, provides a less restrictive interpretation, and the error term (called a "composed error") is assumed to contain both randomness and technical inefficiency. Stochastic models allow production to be affected by measurement error, omitted variables, and simple good or bad luck (Schmidt 1985).

Theoretically, the most desirable frontier approaches appear to be those that allow for flexibility with respect to both functional form and the error term. Although some success has been achieved in developing models that are non-parametric and stochastic (Banker 1986; Land, Lovell, and Thore 1993), such models are computationally difficult and remain inaccessible to most researchers. By far, the most popular and widely applied frontier models are intermediate methods, either non-parametric, deterministic models or parametric, stochastic ones. With readily available software, these approaches are the logical starting point for researchers who want to conduct applied efficiency studies without developing their own programs. In the sections that follow, we briefly review the models for this article.

DATA ENVELOPMENT ANALYSIS

Data Envelopment Analysis (DEA) uses mathematical programming to compute non-parametric, deterministic measures of technical efficiency based on the notion of minimal resource usage relative to production (Koopmans 1951). DEA treats efficiency as a weighted productivity ratio, comparing the quantity of outputs generated to that of inputs consumed. Weights are chosen in such a way that the ratio, or efficiency score, is maximized for each firm in the interval [0, 1]. The best performers in the sample trace out the production frontier and receive scores of 1. This frontier strictly bounds ("envelopes") the remaining data points from above, and the interior observations receive non-negative scores less than 1 based on their proximity to the frontier.

Because DEA is deterministic, all of the distance between an interior point and the frontier is attributed to inefficiency, but there may be other drawbacks. DEA uses extreme observations to trace out the frontier; if any of these contain measurement error, then interior firms may be compared to the "wrong" frontier. Likewise, if the set of inputs and outputs is misspecified, either by omitting pertinent variables or including irrelevant ones, the frontier can be displaced, potentially affecting efficiency ratios (Caulkins and Duncan 1993). Finally, some DEA models impose a restrictive assumption of constant returns-to-scale on the entire sample. For this article we use a variant of DEA introduced by Banker, Charnes, and Cooper (1984) that is fairly common and allows for variable returns-to-scale. (See the appendix for details.)
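
For concreteness, the input-oriented, variable returns-to-scale ("BCC") envelopment problem can be sketched with an off-the-shelf linear programming routine. This is a simplified illustration of the general technique, not the exact formulation in the appendix:

```python
import numpy as np
from scipy.optimize import linprog

def dea_bcc_scores(X, Y):
    """Input-oriented DEA with variable returns-to-scale (Banker, Charnes,
    and Cooper 1984). X is (n, m) input quantities, Y is (n, s) outputs;
    returns a score in (0, 1] per firm, with frontier firms scoring 1."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):                                   # one LP per firm
        c = np.r_[1.0, np.zeros(n)]                      # minimize theta
        A_in = np.hstack([-X[o][:, None], X.T])          # sum_j lam_j x_j <= theta * x_o
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])      # sum_j lam_j y_j >= y_o
        A_eq = np.r_[0.0, np.ones(n)][None, :]           # sum_j lam_j = 1 (VRS)
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(m), -Y[o]],
                      A_eq=A_eq, b_eq=[1.0],
                      bounds=[(None, None)] + [(0, None)] * n, method="highs")
        scores[o] = res.x[0]
    return scores
```

With one input and one output, for example, a firm that uses 3 units to produce 2 units of output, when a peer produces the same output from 2 units, receives a score of 2/3.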

STOCHASTIC PRODUCTION FRONTIERS

A second popular frontier method, referred to as a Stochastic Production Frontier (SPF) (Aigner, Lovell, and Schmidt 1977; Meeusen and van den Broeck 1977), allows production to be affected by randomness as well as inefficiency. Incorporating a stochastic error term is useful, but researchers observe only the net effect of randomness and inefficiency. Consequently, the composed error must be separated into a two-sided error term and a one-sided measure of inefficiency. The decomposition requires three types of assumptions: a functional form to describe the production technology, a distribution for the one-sided inefficiency term, and an estimator for the model. (See the appendix for details.)

Our example, included in the appendix, uses a translog production function, assumes a half-normal distribution for the inefficiency term, (1) and uses maximum likelihood estimation. Finally, the error term is decomposed as [[epsilon].sub.it] = [[upsilon].sub.it] - [u.sub.it] where [v.sub.it] is the usual two-sided measure of randomness and [u.sub.it] is a one-sided estimate of firm inefficiency--in other words, output that is expected but not realized. (2) We use this in place of the standard decomposition, [[epsilon].sub.it] = [[upsilon].sub.it] - [u.sub.i], which provides a firm-specific measure of inefficiency that is constant over time and precludes the testing of whether efficiency changes or improves during the time period. (3)
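
As a rough illustration of the estimation step, the sketch below fits a cross-sectional half-normal frontier by maximum likelihood; the linear-in-logs technology and all names are simplifications of the translog panel model actually used:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_half_normal_spf(logY, logX):
    """Cross-sectional half-normal frontier, ln Y = Z*beta + v - u, with
    v ~ N(0, sigma_v^2) and u ~ |N(0, sigma_u^2)|, fit by maximum
    likelihood. Returns (beta, lambda, sigma), where lambda = sigma_u /
    sigma_v and sigma^2 = sigma_u^2 + sigma_v^2."""
    Z = np.column_stack([np.ones(len(logY)), logX])

    def neg_loglik(p):
        beta, lam, sigma = p[:-2], np.exp(p[-2]), np.exp(p[-1])
        eps = logY - Z @ beta
        # Aigner-Lovell-Schmidt density: f(eps) = (2/sigma) phi(eps/sigma) Phi(-eps*lam/sigma)
        ll = (np.log(2.0 / sigma) + norm.logpdf(eps / sigma)
              + norm.logcdf(-eps * lam / sigma))
        return -ll.sum()

    ols = np.linalg.lstsq(Z, logY, rcond=None)[0]        # OLS starting values
    resid = logY - Z @ ols
    start = np.r_[ols, 0.0, np.log(resid.std())]
    res = minimize(neg_loglik, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "maxfev": 10000})
    return res.x[:-2], np.exp(res.x[-2]), np.exp(res.x[-1])
```

The fitted residuals from such a model would then be decomposed into the two-sided and one-sided components discussed in the text.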

FIXED-EFFECTS REGRESSION

In addition to frontier models, organizational efficiency can be estimated using traditional fixed-effects regression. Like SPF, the FER model is parametric and has a random error term, but it differs from SPF in that it does not derive estimates of efficiency from this error term. Instead, the FER model includes binary (dummy) regressor variables to measure efficiency in terms of "firm effects" and "time effects" (Greene 1993). Since efficiency is included as a regressor variable and is measured directly, it is unnecessary to impose any distributional assumptions.

Because FER is not a frontier approach, firms are not (necessarily) evaluated in terms of best practice. Instead, any arbitrary observation in the data set may be used as the standard against which the remainder of the sample is compared; this may include average production, median production, maximal production--whatever best suits the researcher's objectives. In other words, FER estimates [bar.Y], which need not be the maximal value ([Y.sup.*]) used in SPF.

The ordinary two-way fixed-effects regression model uses dummy variables to estimate the effect of each firm and each time period on efficiency (Kumbhakar 1990), and the model's error term ([v.sub.it]) is the usual two-sided measure of randomness. The limitation of using the two-way model is that the change in efficiency over time is highly specialized. The coefficients associated with dummy variables for time impute the same average change in efficiency on all firms. As a result, the numerical estimates vary by firm and by time, but the rank ordering of firms remains constant in all years.

For firm-specific changes over time, the two-way FER model can be modified to include a linear, firm-specific time trend. The time trend generates more flexible measures while maintaining the stochastic framework. (4) Greene (1993) describes the model as an "enhanced fixed effects model"; we provide an example in the appendix. Each observation is still characterized in terms of a firm effect ([[delta].sub.i]). A linear trend, denoted t (where t = 1, 2, ..., T for each time period), now replaces the separate dummy variables for time, and the time effect is the product of a firm-specific coefficient and the linear trend ([[omega].sub.i]t), thereby allowing each firm to experience separate changes in efficiency for each time period.

From the preceding overview, it should be apparent that, on balance, no single model dominates the other two and that researchers have several options for estimating technical efficiency. Table 1 summarizes the properties for all three models. In some applications, researchers may be able to choose one of the models on the basis of a specific characteristic, but in many cases researchers have competing objectives that cannot be reconciled with a single approach. Moreover, we do not know if these differences lead to significantly different estimates. To examine the practical implications, we turn to our sample of HMOs.

EXAMPLE: THE HMO INDUSTRY

Data Sources

The data set contains annual, firm-level information for health maintenance organizations in the United States from 1985 through 1994. Financial information, enrollment figures, and utilization statistics have been obtained from Health Care Investment Analysts (HCIA). HMOs file reports with state regulators each year, and HCIA compiles this information into a standard format. This database is then matched with the InterStudy HMO Census, which describes model type, profit status, and geographic location. Together with the GHAA Directory of Health Maintenance Organizations, it records changes in ownership and plan terminations and can be used to construct measures of market competition and saturation.

State-level mandates defining the regulatory environment in which HMOs operate are reported by Aspen Systems Corporation. General demographic information and community characteristics such as population statistics, per capita income, physician supply, and average hospital occupancy are summarized in the Area Resource File (U.S. Department of Commerce).

Matching information across these sources provides information for 585 HMOs during the ten-year period. Because of new entry, mergers, and failures, the panel is unbalanced; further, more than 500 observations have been removed because of missing information. The final sample contains 2,739 observations. (5) (See Table 2.)

We specify a production technology consisting of one output and four inputs. The output is defined as total member-years of coverage provided by an HMO during the year. Outputs are goods or services that generate revenue for the firm, and in the case of HMOs, enrolling members (selling coverage) constitutes the primary source of revenue. A single output is admittedly a simplification, but it is necessary in order to develop all of the models in parallel. Multiple outputs, which estimate coverage separately for private, Medicare, and Medicaid members, have been used in the context of cost functions (Wholey et al. 1996) and in other extensions of the current study.

The inputs used by HMOs consist of hospital days, ambulatory visits, administrative expenditures, and other expenditures. This conceptualization of resource inputs differs from previous work, where HMOs have been assigned the same production technology as hospitals, and both inpatient days and outpatient visits count as outputs rather than inputs (Bothwell and Cooley 1982). We disagree with this specification because days and visits are used in providing enrollee coverage; both impose costs on the HMO (Bryce 1996). Total hospital days and ambulatory visits are reported as quantities; they are price-free measures of inpatient and outpatient services, respectively, and they are highly correlated with the corresponding expenditure information (r > 0.90). (6) The third input, administrative expenditures, is reported in real dollars and estimates labor resources for administrative personnel not directly associated with the delivery of services to HMO enrollees. Likewise, the last input (other expenditures) is reported in real dollars; it refers to interest expenditures and medical supplies (such as pharmaceuticals or crutches).

The data set also includes organizational attributes (model type, profit status, federal qualification); measures of case mix (percent of Medicare enrollees, percent of Medicaid enrollees); and state regulations (whether the state requires reserves, guarantees for payment, rate approval, or consumer representative membership on the HMO board)--any of which may be related to an HMO's use of resources and, hence, the model estimates of technical efficiency. These variables are not included in the production function specification but are retained in the data set for later comparisons.

Model Estimates

We compute three estimates of technical efficiency for each observation in our sample: a DEA estimate that allows for variable returns-to-scale, a time-varying SPF estimate, and an enhanced FER estimate using a linear trend.

However, the models themselves do nothing to control for differences in quality of output (quality of coverage) or the case mix of HMO enrollees. Although it might be reasonable to assume output homogeneity in some industries, this assumption is unlikely to hold for the HMO industry. Moreover, differences in case mix and quality offer legitimate explanations about why some firms may use resources more or less intensively than their competitors. Inferences regarding technical efficiency are therefore confounded, and we need to adjust the model estimates accordingly.

In general, examples of possible ways to adjust for HMO differences include parsing the sample into homogeneous subgroups and estimating efficiency separately for each group; weighting inputs before computing the efficiency scores to reflect differences in quality or case mix; and using the estimated efficiency score as a dependent variable in a second-stage determinants regression against possible covariates.

We present the last of these adjustments and estimate a separate determinants regression for each model. Organizational differences are measured by defining the HMO's model type as either independent practice associations (IPAs) or groups (by collapsing group, staff, network, and mixed HMOs). We also include a variable to denote whether the HMO is a for-profit or a non-profit organization. The data set has only limited case mix information; we include the percentage of enrollees covered by Medicare and Medicaid separately. Likewise, federal qualification and state regulations are, at best, rough proxies for quality, and we again use dummy variables to denote the presence or absence of these mandates. (7)

After controlling for these covariates, an improved measure of efficiency is left in the observed error term. The correspondence between the unadjusted and adjusted efficiency estimates is shown in Table 3 for each of the models. We do not compare the actual numerical estimates, because adjusted and unadjusted values are scaled differently; instead, we present the Spearman correlation coefficients to demonstrate the ordinal ranking of HMOs relative to one another. In all three models, HMO ranks change, albeit only slightly.
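
The second-stage adjustment amounts to regressing each model's efficiency score on the covariates, retaining the residual as the adjusted measure, and comparing rankings with a Spearman correlation. A minimal sketch on simulated data follows; the covariates and all names are illustrative, not our actual estimates:

```python
import numpy as np
from scipy.stats import spearmanr

def adjusted_scores(scores, covariates):
    """Second-stage determinants sketch: regress the efficiency score on
    case-mix/quality covariates (OLS with an intercept) and keep the
    residual as the adjusted measure."""
    Z = np.column_stack([np.ones(len(scores)), covariates])
    beta, *_ = np.linalg.lstsq(Z, scores, rcond=None)
    return scores - Z @ beta

# illustrative comparison of unadjusted vs. adjusted rankings on fake data
rng = np.random.default_rng(1)
covariates = rng.uniform(size=(200, 2))       # e.g., % Medicare, % Medicaid
scores = 0.8 - 0.3 * covariates[:, 0] + rng.normal(scale=0.05, size=200)
rho, _ = spearmanr(scores, adjusted_scores(scores, covariates))
```

When case mix explains much of the variation in raw scores, as in this simulation, the rank correlation between adjusted and unadjusted measures falls well below 1, mirroring the modest rank changes reported in Table 3.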

The usefulness of adjusting for case mix is perhaps more apparent in the coefficient estimates of the regression models, presented for each model in Table 4. We see that HMOs with a higher proportion of Medicare or Medicaid enrollees (more intensive case mix) tend to receive lower efficiency estimates from all three models. The models provide higher estimates to federally qualified HMOs, which is surprising since qualification is presumably costly for the HMO. The other proxies for quality (state mandates) show almost no significance, which may be explained by the fact that there is little variation in these variables (the mandates are applicable to most of the HMOs in our sample). In addition to case mix and quality, we can also see the effect of model type, suggesting that group models tend to receive higher efficiency estimates. The negative relationship between profit status and efficiency is counterintuitive, since we expect for-profit organizations to eliminate any excess use of resource inputs. Overall, the results suggest that the three approaches to modeling efficiency are affected in similar ways by the covariates that we included. In other words, we identify similar industry-wide trends (e.g., group HMOs are generally rated as more efficient than IPA organizations) irrespective of the model employed to estimate efficiency.

We also can display industry trends graphically. Figures 1 through 3 show the distribution (interquartile range) of efficiency estimates for each model over time. Based on these figures, we see similar changes (in terms of direction) during the ten-year period, especially for DEA and SPF. The FER model also shows the same pattern in the median estimates through 1989, after which the FER model suggests continued improvement in efficiency while DEA and SPF suggest more variable fluctuations or downturns.

[FIGURES 1-3 OMITTED]

Finally, it is important to consider whether the three models agree in their assessment of efficiency for individual HMOs since, in practice, efficiency estimates are often used to rate a firm relative to its competitors. Whether we use DEA, SPF, or FER, we want to reliably identify more efficient, best-practicing firms and ensure stable decision-making behavior--for example, either in an employer's decision to contract with HMOs or in a policymaker's decision to develop incentives on the basis of efficient industry practices. If the models do not agree in their assessments but identify different HMOs as efficient, then our conclusions (and our decisions) are sensitive to choice of model. On the other hand, if the models assign similar ratings, then the theoretical differences described earlier (parametric versus nonparametric, etc.) may be less important in practice.

Table 5 presents the Spearman correlation coefficients for DEA, SPF, and FER efficiency estimates controlling for case mix and quality. The two stochastic models (SPF and FER) demonstrate the strongest agreement, but all of the correlation coefficients are reasonably high. However, even though the correlations are significantly different from zero, in every case, they are also significantly different from 1.0 (p < .0001). This implies that the overall rankings of the HMOs in our sample differ across the three models, and--depending on the model we use--we would identify different firms as efficient. These results serve as a caution to researchers: the models may be useful in understanding overall trends, but they are not informative in ranking individual performance.

As we saw previously in Table 1, each model has desirable properties as well as tradeoffs associated with it. One finding that may surprise some readers is that the FER estimates compare reasonably well with the two frontier approaches. Not only does FER allow for flexible changes in efficiency over time, it also uses the information provided by the panel data set to identify observations for each firm. There appear to be only two potential shortcomings with FER: it requires defining a functional form (which is also true of SPF), and it does not compare firms to the production frontier but may use other benchmarks instead. If, however, the researcher's goal is to distinguish more efficient firms from less efficient ones, then it is not obvious that an estimate of the frontier is required. Average or median functions still allow researchers to rank firms, suggesting that a large part of "efficiency analysis" can take place by using regression methods that are available through virtually any statistical package.

DISCUSSION

There are two important caveats to the work presented here. First, because it was our intent to compare the three models, we constrained them in ways that ordinarily might not be necessary. For example, estimating productive efficiency limited us to a single output in the stochastic models (SPF, FER); hence, DEA also used the same definition of output even though it is capable of defining multiple outputs. On the other hand, DEA is more sensitive to missing values than either SPF or FER, so our sample was restricted to observations with complete information. Thus, if allowed to operate independently and not for reasons of comparison, the models might each be able to "do more" than we have shown here. Nevertheless, the comparison seems useful for demonstrating that alternative models intended to address the same notion and based on the same information can still produce important differences. Second, several variants of all three models are available. We did not choose the simplest (and, in most cases, the most restrictive) version, nor did we choose the most advanced. Instead, we chose well-known variants of each model that researchers are likely to encounter when they examine the alternatives for modeling efficiency. There are other, more specialized models, and we encourage researchers to investigate the variants that may be more appropriate for their specific application.

It is worth re-emphasizing that, in the strictest sense, a production function (or a production frontier) makes no allowance for differences in service quality or enrollee case mix. Both are treated as homogeneous across the sample by all three models. It is the responsibility of the researcher to decide whether or not such assumptions are accurate for any particular example. If not, then the researcher can make the appropriate adjustments in any of several ways (e.g., by parsing, weighting, or running secondary analyses on the sample). Such adjustments are necessary in order to make reasonable inferences regarding efficiency; yet, as we have shown here, even with adjustments the techniques may not provide consistent information on individual performance.

Choosing an appropriate technique for evaluating efficiency in any given application is difficult. While some researchers may prefer an approach on the basis of its theoretical properties, each choice requires tradeoffs that may be undesirable. Few researchers, however, have demonstrated the practical implications that model choice carries regarding efficiency estimates and the inferences that follow (Banker, Conrad, and Strauss 1985). This article illustrates that model selection can influence which firms are rated as most efficient. We therefore cannot simply dismiss the decision as arbitrary.

This article introduces the reader to the most common techniques for conducting efficiency analyses. Using a multiplicity of approaches and comparing the sensitivity of our findings to the methods is useful, whether or not the methods reinforce one another. Our example shows that model choice can matter a great deal, which should be of concern to researchers looking to benchmark individual performance. However, the models may be useful, too, in providing us with overall insights about industry-wide trends. We hope that the example presented here does not discourage researchers from conducting studies in efficiency, but that, instead, it enables researchers to approach the efficiency analyses thoughtfully and to test the robustness (and the reasonableness) of their assumptions and their conclusions.

APPENDIX: MODELS FOR ESTIMATING TECHNICAL EFFICIENCY

Data Envelopment Analysis (DEA)

The DEA estimate of efficiency is nonparametric and treats efficiency as a weighted productivity ratio comparing outputs to inputs. There are several versions of the DEA model; here we present the Banker, Charnes, and Cooper (1984) (BCC) algorithm that allows for variable returns-to-scale. The following optimization problem is solved separately for each observation in the sample:

$$\max_{u,\,v_j,\,w}\; h_o = \frac{u\,Y_o + w}{\sum_j v_j X_{jo}}
\quad \text{subject to} \quad
\frac{u\,Y_q + w}{\sum_j v_j X_{jq}} \le 1 \;\; \text{for all } q,
\qquad u \ge 0,\; v_j \ge 0,\; w \text{ unrestricted}$$

In this weighted ratio of outputs (Y) to inputs ([X.sub.j]), DEA chooses optimal weights u and [v.sub.j] for the observation under consideration (denoted as firm o, above). The input and output combinations used by the remaining observations (denoted above as q) serve as constraints in the problem. The variable w serves as a convexity constraint that allows the frontier to envelope the observations more tightly than algorithms that impose constant returns-to-scale. Solving for u, [v.sub.j] and w maximizes [h.sub.o] and gives us the BCC efficiency score.
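The ratio problem above is solved in practice by linearizing it into the standard multiplier-form linear program. The sketch below does this with `scipy.optimize.linprog` on synthetic data; the function name and data are illustrative assumptions, not the article's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def bcc_score(Y, X, o):
    """Linearized (multiplier-form) BCC problem for observation o.
    Y: (n,) single output; X: (n, J) inputs. Decision vars: [u, v_1..v_J, w]."""
    n, J = X.shape
    c = np.zeros(J + 2)
    c[0], c[-1] = -Y[o], -1.0            # maximize u*Y_o + w
    A_eq = np.zeros((1, J + 2))
    A_eq[0, 1:J + 1] = X[o]              # normalization: sum_j v_j X_jo = 1
    A_ub = np.zeros((n, J + 2))          # u*Y_q - sum_j v_j X_jq + w <= 0, all q
    A_ub[:, 0] = Y
    A_ub[:, 1:J + 1] = -X
    A_ub[:, -1] = 1.0
    bounds = [(0, None)] * (J + 1) + [(None, None)]  # w free (convexity term)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun                      # h_o in (0, 1]; 1 = on the frontier

# Hypothetical data: 20 firms, 2 inputs, 1 output.
rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=(20, 2))
Y = X.sum(axis=1) * rng.uniform(0.5, 1.0, size=20)
scores = [bcc_score(Y, X, o) for o in range(20)]
```

Because every observation appears in its own constraint set, each score is at most 1, and at least one firm in any sample attains the frontier score of 1.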

Stochastic Production Frontiers (SPF)

SPF models are parametric and must specify a production technology for transforming inputs into outputs. The model includes a "composed error" term that estimates the net effect of inefficiency and randomness. To decompose the error term, researchers must specify a functional form for the production technology, choose a distribution for the inefficiency term, and choose a model estimator.

To alleviate problems associated with the first assumption, researchers increasingly use the transcendental logarithmic ("translog") function because it is a flexible form that includes more restrictive forms, such as the Cobb-Douglas and Constant Elasticity of Substitution (CES) functions, as special cases. For the second assumption, various one-sided distributions have been tested, including the half-normal (Aigner, Lovell, and Schmidt 1977), exponential (Meeusen and van den Broeck 1977), and gamma (Greene 1990) distributions, and there are no strict rules for selecting a distribution. Finally, in choosing an estimator, SPF models appear to be fairly robust across maximum likelihood, corrected OLS, generalized least squares, and the within estimator (Cornwell, Schmidt, and Sickles 1990). Gong and Sickles (1992) prefer the within estimator on the basis of computational ease, but the current availability of software packages makes most of these techniques practical to use.
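To make the three choices concrete, the sketch below fits a normal/half-normal frontier by maximum likelihood on synthetic data. For brevity it uses a single input and the Cobb-Douglas special case of the translog; all data and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic single-input example (a Cobb-Douglas special case of the translog).
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0, 2, size=n)                   # log input
u = np.abs(rng.normal(0, 0.3, size=n))          # half-normal inefficiency
v = rng.normal(0, 0.2, size=n)                  # two-sided noise
y = 1.0 + 0.7 * x + v - u                       # log output

def negll(theta):
    """Negative log-likelihood of the normal/half-normal composed error
    (Aigner, Lovell, and Schmidt 1977)."""
    b0, b1, ln_sv, ln_su = theta
    s_v, s_u = np.exp(ln_sv), np.exp(ln_su)
    s = np.hypot(s_v, s_u)                      # sigma = sqrt(s_v^2 + s_u^2)
    lam = s_u / s_v                             # lambda = s_u / s_v
    eps = y - b0 - b1 * x
    ll = (np.log(2.0) - np.log(s)
          + norm.logpdf(eps / s)
          + norm.logcdf(-eps * lam / s))
    return -ll.sum()

start = [y.mean(), 0.5, np.log(0.3), np.log(0.3)]
res = minimize(negll, start, method="Nelder-Mead",
               options={"maxiter": 10000, "fatol": 1e-9, "xatol": 1e-9})
b0_hat, b1_hat = res.x[0], res.x[1]
```

Unlike OLS, the maximum-likelihood intercept is not biased downward by the one-sided inefficiency term, which is the point of decomposing the error.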

For example, the following SPF model uses a translog production function and estimates efficiency for firm i in year t:

$$\ln Y_{it} = \alpha_0 + \sum_j \beta_j \ln X_{jit} + \frac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + \varepsilon_{it}$$

where Y and X still denote output and inputs, and [[epsilon].sub.it] is the composed error term.

In a standard SPF model, [[epsilon].sub.it] = [v.sub.it] - [u.sub.i], where [v.sub.it] is the usual two-sided error term and [u.sub.i] is a one-sided estimate of firm inefficiency--in other words, output that is expected but not realized. Based on the work of Jondrow et al. (1982) and Battese and Coelli (1988), [u.sub.i] is derived from firm i's error terms over time:

$$\hat{u}_i = E\left[u_i \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT}\right]$$

If researchers are interested in whether or not firm i changes or improves its efficiency during the time period, then this approach is unsatisfactory. Instead, SPF may define the error term as [[epsilon].sub.it] = [v.sub.it] - [u.sub.it] and estimate [u.sub.it] instead of [u.sub.i], computing inefficiency measures that vary by firm and by year. However, in so doing, we treat the sample as a cross-section rather than a panel and no longer can use the time series for firm i to estimate its inefficiency.
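The cross-sectional version of this decomposition, due to Jondrow et al. (1982), has a closed form under the normal/half-normal assumptions. A minimal sketch, treating the variance parameters as known for illustration:

```python
import numpy as np
from scipy.stats import norm

def jondrow_u(eps, s_u, s_v):
    """Point estimate E[u | eps] under normal/half-normal assumptions
    (Jondrow et al. 1982); eps is the composed residual v - u."""
    s2 = s_u**2 + s_v**2
    mu_star = -eps * s_u**2 / s2
    s_star = s_u * s_v / np.sqrt(s2)
    z = mu_star / s_star
    return mu_star + s_star * norm.pdf(z) / norm.cdf(z)

# A more negative composed residual implies more estimated inefficiency.
eps = np.array([-0.5, 0.0, 0.5])
u_hat = jondrow_u(eps, s_u=0.3, s_v=0.2)
```

The estimate is always positive and decreases as the residual shortfall shrinks, which is what lets the residual be read as an inefficiency measure.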

Fixed-Effects Regression (FER)

The ordinary two-way fixed-effects regression model uses dummy variables to estimate efficiency directly (Kumbhakar 1990). Again using a translog specification, the basic two-way FER model is specified as:

$$\ln Y_{it} = \alpha_0 + \sum_j \beta_j \ln X_{jit} + \frac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + \delta_i a_i + \gamma_t c_t + v_{it}$$

The variables [a.sub.i] and [c.sub.t] identify firms and time periods, respectively. To avoid problems of perfect collinearity, dummies for one firm and one year are omitted and captured in the intercept, [[alpha].sub.0]. The error term ([v.sub.it]) is the two-sided measure of randomness. The function can be estimated using least squares with dummy variables (LSDV) or the within estimator.

Coefficients associated with the [a.sub.i] and [c.sub.t] denote the firm effect ([[delta].sub.i]) and time effect ([[gamma].sub.t]), respectively, but the model's characterization of change in efficiency over time is highly restrictive. The estimates of [[gamma].sub.t] impute the same average change in efficiency to all firms in the sample for a given time period. As a result, the numerical estimates vary by firm and by time, but the rank ordering of firms remains constant in all years.
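The LSDV estimator described above amounts to an ordinary regression with dummy columns. The sketch below runs it on synthetic panel data (all names and values are illustrative) and ranks firms by their estimated fixed effects:

```python
import numpy as np

# Synthetic panel: 30 firms observed over 10 years (all values illustrative).
rng = np.random.default_rng(4)
n_firms, n_years = 30, 10
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
x = rng.uniform(0, 2, size=n_firms * n_years)            # single log input
delta = rng.normal(0, 0.3, size=n_firms)                 # true firm effects
gamma = np.cumsum(rng.normal(0.02, 0.01, size=n_years))  # true time effects
y = (0.7 * x + delta[firm] + gamma[year]
     + rng.normal(0, 0.05, size=n_firms * n_years))

# LSDV: intercept plus dummies for all but one firm and one year.
D_firm = (firm[:, None] == np.arange(1, n_firms)).astype(float)
D_year = (year[:, None] == np.arange(1, n_years)).astype(float)
Z = np.column_stack([np.ones_like(x), x, D_firm, D_year])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
slope = beta[1]
firm_effects = np.concatenate([[0.0], beta[2:2 + n_firms - 1]])  # vs. firm 0
ranking = np.argsort(-firm_effects)   # higher fixed effect = rated more efficient
```

This is the sense in which "a large part of efficiency analysis" needs only a standard regression routine: the ranking comes directly from the recovered firm effects.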

To allow for flexible, firm-specific changes over time, the basic two-way FER model can be modified to include a linear, firm-specific time trend. The time trend generates more flexible measures while maintaining the stochastic framework. Greene (1993) describes the model as an "enhanced fixed effects model":
$$\ln Y_{it} = \alpha_0 + \sum_j \beta_j \ln X_{jit} + \frac{1}{2} \sum_j \sum_k \beta_{jk} \ln X_{jit} \ln X_{kit} + \delta_i + \omega_i t + v_{it}$$

Each observation is still characterized in terms of a firm effect ([[delta].sub.i]). A linear trend, denoted t (where t = 1, 2, ..., T), now replaces the separate dummy variables. The time effect is the product of a firm-specific coefficient and the linear trend ([[omega].sub.i]t), thereby allowing each firm to experience individual changes in efficiency in each time period.
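The enhanced model is estimated by interacting the firm dummies with the trend. A minimal sketch on synthetic data (all values illustrative), recovering the firm-specific trend coefficients:

```python
import numpy as np

# Synthetic panel with firm-specific linear efficiency trends (illustrative).
rng = np.random.default_rng(5)
n_firms, n_years = 20, 10
firm = np.repeat(np.arange(n_firms), n_years)
t = np.tile(np.arange(1, n_years + 1), n_firms).astype(float)
x = rng.uniform(0, 2, size=n_firms * n_years)
delta = rng.normal(0, 0.2, size=n_firms)        # firm effects
omega = rng.normal(0.03, 0.03, size=n_firms)    # firm-specific trend coefficients
y = 0.6 * x + delta[firm] + omega[firm] * t + rng.normal(0, 0.05, size=len(x))

# Design: common input slope, full firm dummies, firm-dummy x trend interactions.
D = (firm[:, None] == np.arange(n_firms)).astype(float)
Z = np.column_stack([x, D, D * t[:, None]])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
omega_hat = beta[1 + n_firms:]                  # estimated firm-specific trends
```

Because each firm gets its own [[omega].sub.i], the rank ordering of firms can now change from year to year, unlike in the basic two-way model.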

Table 1: Attributes of Technical Efficiency Models

                                                 DEA    SPF    FER
Imposes functional form                           N      Y      Y
Allows flexible changes over time                 Y      Y†     Y‡
Maintains link between i (firm) and t (time)      N*     N†     Y
Defines best-practice frontier                    Y      Y      N
Treats error term as stochastic                   N      Y      Y
Imposes distribution on error term                N      Y      N

Note: Although this table generally includes all variants of each of the models, there are some exceptions: * "Window analysis," an extension of DEA discussed in the literature, is intended for use with panel data sets. † The standard SPF model described in this article does not provide flexible changes in efficiency over time, but it does link all observations for firm i. ‡ The two-way FER model provides only limited information on efficiency over time; enhanced FER allows for more flexible changes.

Table 2: Sample Size by Year

Year    Number of HMOs
1985    151
1986    224
1987    333
1988    301
1989    323
1990    325
1991    277
1992    264
1993    260
1994    281

Table 3: Spearman Correlation Coefficients of HMO Rank (Unadjusted Versus Adjusted for Case Mix)

                          Correlation    p-Value
r_s(DEA, adj DEA)         0.97           .001
r_s(SPF, adj SPF)         0.90           .001
r_s(FER, adj FER)         0.91           .001

Table 4: Coefficient Estimates for Determinants of Resource Utilization

Dependent Variable: Estimate of "Efficiency" from Each Model

Independent Variables                        DEA         SPF        FER
Intercept                                    0.466       0.780     -2.152
Model type (1 = group)                       0.052**     0.023**    0.045**
Profit status (1 = for-profit)              -0.016*     -0.003     -0.047**
Percent of enrollees who have Medicare      -0.242**     0.530**   -1.366**
Percent of enrollees who have Medicaid       0.001      -0.032     -0.082
Federal qualification (1 = qualified)        0.029**     0.026**    0.090**
Capital requirements? (1 = yes)              0.031*     -0.000     -0.026
Reserve requirements? (1 = yes)             -0.042*     -0.017     -0.006
Rate approval? (1 = yes)                     0.002       0.006      0.008
Consumer representative? (1 = yes)           0.001       0.006      0.043*
Adjusted R-squared                           0.037       0.136      0.099

* p < .05. ** p < .001.

Table 5: Spearman Correlation Coefficients Between Models

        DEA     SPF
SPF     0.77    --
FER     0.67    0.79

The work presented here is part of a larger study supported by the Agency for Health Care Policy and Research (now Agency for Healthcare Research and Quality) to evaluate changes in the HMO industry (R01 HS09200-01).

NOTES

(1.) To test the distributional assumption in this study, we compared SPF measures for half-normal, truncated normal, and exponential distributions. The measures of [u.sub.it] demonstrated near-perfect correlation ([r.sub.s] > 0.99); hence, we include only estimates for the half-normal.

(2.) Of the approaches presented here, only SPF produces a measure of inefficiency (shortfall from the frontier) rather than efficiency (proximity to the frontier).

(3.) However, this decomposition has the advantage of using the panel nature of the data set and linking all observations for firm i together.

(4.) Cornwell, Schmidt, and Sickles (1990) present an extended version, using both linear and quadratic trend variables.

(5.) Steps to impute missing values do not change the basic results of this article.

(6.) Although a recent paper by Rosenman, Siddharthan, and Ahern (1997) defines expenditures as inputs, we use input quantities instead of expenditures whenever possible. Both geographic variation and price breaks for larger firms affect factor prices; this confounds the results because the efficiency estimate consists of both technical and allocative components.

(7.) HEDIS-type measures are not consistently available during the 1985-1994 time period.

REFERENCES

Aigner, D., C. A. K. Lovell, and P. Schmidt. 1977. "Formulation and Estimation of Stochastic Frontier Production Function Models." Journal of Econometrics 6 (July): 21-37.

Aspen Systems Corporation. 1988-1991. A Report to the Governor on State Regulation of Health Maintenance Organizations. Aspen Systems Corporation, Rockville, MD.

Banker, R. D. 1986. "Stochastic Data Envelopment Analysis." Working Paper 86-8, Carnegie Mellon University, School of Urban and Public Affairs, Pittsburgh, PA.

Banker, R. D., A. Charnes, and W. W. Cooper. 1984. "Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis." Management Science 30 (9): 1078-92.

Banker, R. D., R. F. Conrad, and R. P. Strauss. 1985. "A Comparative Application of DEA and Translog Methods: An Illustrative Study of Hospital Production." Management Science 32 (January): 30-44.

Banker, R. D., V. M. Gadh, and W. L. Gorr. 1993. "A Monte Carlo Comparison of Two Production Frontier Estimation Methods: Corrected Ordinary Least Squares and Data Envelopment Analysis." European Journal of Operational Research 67 (3): 332-43.

Battese, G. E., and T.J. Coelli. 1988. "Prediction of Firm-Level Technical Efficiencies with a Generalized Frontier Production Function and Panel Data." Journal of Econometrics 38 (3): 387-99.

Bothwell, J. L., and T. F. Cooley. 1982. "Efficiency in the Provision of Health Care: An Analysis of Health Maintenance Organizations." Southern Economic Journal 48 (4): 970-84.

Bryce, C. L. 1996. "Alternative Approaches to Estimating the Efficiency of Health Maintenance Organizations." Ph.D. dissertation, Carnegie Mellon University, Heinz School of Public Policy and Management.

Caulkins, J. P., and G. T. Duncan. 1993. "Robustness of Data Envelopment Analysis with Respect to Misspecification of Input and Output Variables." Working Paper 93-76, Carnegie Mellon University, Heinz School of Public Policy and Management, Pittsburgh, PA.

Charnes, A., W. W. Cooper, and E. Rhodes. 1978. "Measuring Efficiency of Decision Making Units." European Journal of Operational Research 2 (6): 429-44.

Chilingerian, J. A. 1997. "DEA and Primary Care Physician Report Cards: Deriving Preferred Practice Cones from Managed Care Service Concepts and Operating Strategies." Annals of Operations Research 73 (1): 35-66.

Christianson, J. B., S. M. Sanchez, D. R. Wholey, and M. Shadle. 1991. "The HMO Industry: Evolution in Population Demographics and Market Structure." Medical Care Review 48 (1): 3-46.

Cornwell, C., P. Schmidt, and R. C. Sickles. 1990. "Production Frontiers with Cross-Sectional and Time-Series Variation in Efficiency Levels." Journal of Econometrics 46 (October/November): 185-200.

Fare, R., S. Grosskopf, and C. A. K. Lovell. 1985. The Measurement of Efficiency of Production. Hingham, MA: Kluwer Academic Publishers.

Gong, B., and R. C. Sickles. 1992. "Finite Sample Evidence on the Performance of Stochastic Frontiers and Data Envelopment Analysis Using Panel Data." Journal of Econometrics 51 (January/February): 259-84.

Greene, W. H. 1993. Econometric Analysis, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, Inc.

--. 1990. "A Gamma-Distributed Stochastic Frontier Model." Journal of Econometrics 46 (October/November): 141-63.

Health Care Investment Analysts, Inc. 1988-1991. HMO Data Base, Diskette Series, User's Manual. Baltimore, MD: HCIA.

InterStudy. 1988-1994. The InterStudy Competitive Edge. Excelsior, MN: InterStudy.

Jondrow, J., C. A. K. Lovell, I. S. Materov, and P. Schmidt. 1982. "On the Estimation of Technical Inefficiency in the Stochastic Frontier Production Function Model." Journal of Econometrics 19 (August): 233-38.

Koopmans, T. C. 1951. "An Analysis of Production as an Efficient Combination of Activities." In Activity Analysis of Production and Allocation, edited by T. C. Koopmans. Cowles Commission for Research in Economics, Monograph No. 13. New York: John Wiley & Sons, Inc.

Kumbhakar, S. C. 1990. "Production Frontiers, Panel Data, and Time-Varying Technical Inefficiency." Journal of Econometrics 46 (October/November): 201-22.

Land, K. C., C. A. K. Lovell, and S. Thore. 1993. "Chance-Constrained Data Envelopment Analysis." Managerial and Decision Economics 14 (6): 541-54.

Leibenstein, H. 1966. "Allocative Efficiency vs. 'X-Efficiency.'" The American Economic Review 56 (3): 392-415.

Lovell, C. A. K. 1993. "Production Frontiers and Productive Efficiency." In The Measurement of Productive Efficiency, edited by H. O. Fried, C. A. K. Lovell, and S. S. Schmidt. New York: Oxford University Press.

Meeusen, W., and J. van den Broeck. 1977. "Efficiency Estimation from Cobb-Douglas Production Functions with Composed Error." International Economic Review 18 (2): 435-45.

Newhouse, J. P. 1994. "Frontier Estimation: How Useful a Tool for Health Economics?" Journal of Health Economics 13 (3): 317-22.

Nyman, J. A., D. L. Bricker, and D. Link. 1990. "Technical Efficiency of Nursing Homes." Medical Care 28 (6): 541-51.

Rosenman, R., K. Siddharthan, and M. Ahern. 1997. "Output Efficiency of Health Maintenance Organizations in Florida." Health Economics 6 (3): 295-302.

Schmidt, P. 1985. "Frontier Production Functions." Econometric Review 4 (2): 289-328.

Wholey, D. R., R. Feldman, J. Christianson, and J. Engberg. 1996. "Scale and Scope Economies Among HMOs." Journal of Health Economics 15 (6): 657-84.

Zuckerman, S., J. Hadley, and L. Iezzoni. 1994. "Measuring Hospital Efficiency with Frontier Cost Functions." Journal of Health Economics 13 (3): 255-80.

Address correspondence to Cindy L. Bryce, Ph.D., Research Assistant Professor of Medicine, Center for Research on Health Care, University of Pittsburgh, 200 Lothrop Street, Pittsburgh PA 15213. John B. Engberg, Ph.D. is Associate Professor of Economics, H. John Heinz III School of Public Policy and Management, Carnegie Mellon University; and Douglas R. Wholey, Ph.D. is Professor, Division of Health Services Research, University of Minnesota. This article, submitted to Health Services Research on September 15, 1998, was revised and accepted for publication on July 7, 1999.

Published in Health Services Research, June 2000.