
Indexing performance measures.

In this article...

Find out how to include multiple measures in a balanced scorecard to better analyze performance.

Health care organizations possess hundreds of performance measures consisting of administrative, financial, process and clinical outcome measures. The question has always been how these measures should be displayed to allow for an accurate and comprehensive interpretation of the organization's performance over time.

In 1992, Robert Kaplan and David Norton published a landmark article that has allowed organizations, both health care and non health care, to format and interpret their performance using the concept of a balanced scorecard. (1) They believed that it was important for companies to incorporate the measurement of the organization's intangible assets, rather than just its tangible assets, into an integrated measurement approach.

For the first time, this approach based measurement not only on financial metrics but also on the alignment of multiple other metrics, such as human resource, process and customer data.

Since then, Kaplan and Norton have stressed the importance of using the scorecard as not only a way to present an organization's performance metrics, but also to use the alignment of the metrics on the scorecard to evaluate how well the organization is fulfilling its strategy. (2)

They suggested that most organizational performance metrics should reflect four basic perspectives or strategic focus areas:

1. Financial

2. Internal business processes (including quality)

3. Learning and growth

4. Customer

They believed that these four focus areas are the drivers for creating long-term shareholder value. Each of these perspectives is generally populated with five or six performance metrics that have been carefully selected by the organization's senior leaders to reflect the organization's strategy.

Some organizations have added a fifth perspective--people--to specifically delineate the human resource metrics from the learning and growth measures, even though the people measures are arguably incorporated into the learning/growth perspective.

Organizations have separated out the people measures on their balanced scorecard because of the importance of the work force in driving performance in the organization. Since the introduction of the scorecard by Kaplan and Norton, it has evolved into a more comprehensive tool that consists of three components:

1. A measurement tool

2. A strategic management tool

3. A communication tool (3)

Strategy consultant Paul Niven emphasized the importance of viewing the scorecard not only as a reporting tool (short-term, e.g., quarterly report of performance), but also as a long-term tool to follow whether senior leaders are achieving the goals established in the organization's strategic plan.

In addition, the scorecard allows for short-term and long-term communication of performance. (3) It allows an organization to bind its short-term performance with its long-term strategy and communicate the results to its key stakeholders.

The genius of the scorecard is that it forces senior leaders to carefully select only a few measures for each perspective that truly reflect performance (short-term) aligned with its longer-term strategy. Unfortunately, for health care organizations there are a large number of process measures, both clinical and administrative, that require review and reporting, particularly to external regulatory agencies.

So the question arises, how do you incorporate multiple measures into a scorecard and still maintain a manageable and interpretable document? Adding all of an organization's process measures to a scorecard would defeat its purpose and create unnecessary complexity.

For example, if an organization desired to place the 2011-12 Centers for Medicare and Medicaid Services Outcome Measures (#24) (4) and the Joint Commission's National Hospital Inpatient Quality Measures for acute myocardial infarction (#10), heart failure (#3), venous thromboembolism (#6) and stroke prevention (#8) (5) on its scorecard, the total number of measures would add up to 51.

Clearly this approach would create a scorecard that would be too cumbersome in size, considering all of the other performance measures that would have to be added to it. The question then, is how can an organization trend, evaluate and determine the reliability of potentially hundreds of process measures without increasing the size and complexity of its scorecard, and still achieve a high-level comprehensive measurement and meet strategic management and communication objectives?

The answer is: aggregating multiple process measures into a single index measure that is capable of reflecting change. This aggregation can be either a simple arithmetic mean or a more complex mathematical bundling using specific weighting, depending on the metrics involved.
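The two aggregation options mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names, the compliance rates and the weights are hypothetical, and an organization would choose its own weighting scheme.

```python
def simple_index(rates):
    """Arithmetic mean of the individual measure rates."""
    return sum(rates) / len(rates)


def weighted_index(rates, weights):
    """Weighted mean; a larger weight gives a measure more influence."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(rates, weights)) / total_weight


# Hypothetical compliance rates for four process measures.
rates = [0.90, 0.70, 0.80, 0.60]
# Hypothetical weights: the first measure counts twice as much as the others.
weights = [2, 1, 1, 1]

print(round(simple_index(rates), 2))             # 0.75
print(round(weighted_index(rates, weights), 2))  # 0.78
```

With equal weights the two functions give the same result, so the simple mean can be seen as a special case of the weighted one.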

Bundling happens

In everyday life, indexing (bundling of individual measures into one number) is frequently used. For example, in the stock market the Dow Jones Industrial Average and the Standard and Poor's 500 Index are used to track activity.

In health care, examples of bundling of metrics can be seen in measures such as the bispectral index in the operating room (a single number that reflects the bifrontal cortical electrical activity from a processed electroencephalogram) (6) or the Short Form Health Status Survey (SF-12), which contains 12 questions about an individual's health and results in two indexed or composite scores. (7)

These efforts at developing a reference or index number to reflect multiple outcome or process measures are a rational approach to simplifying the large amount of data that confronts us daily.

In health care management, the quandary is to find the best way to index process or outcomes measures so that the indexed or bundled value reflects a high-level metric that can provide senior leaders with insight into ongoing performance. With the development of electronic records, decision support systems, and automated reporting, the ability to create indexed numbers like the Dow Jones Industrial Average for health care may become more common.

In 2004, James Benneyan, PhD, addressed the issue of evaluating multiple process measures from the standpoint of process reliability. (8) He defined process reliability in health care as failure-free performance over time.

The types of failure seen in health care can be due to the production process itself, the environment, or the post-production process. He suggested that one could approach this issue of measurement using one of three strategies:

1. Measuring each individual metric, e.g., percent of patients meeting a particular measure and reporting that measure

2. Using a composite metric, e.g., total number of measures met divided by the total number of opportunities (number of measures times the number of patients) across all patients

3. Using an "all-or-nothing" (aggregate bundle) approach, i.e., fraction of patients meeting all measures (total bundle met)

Although Benneyan approached this problem from a reliability perspective, it is instructive that he provided insight into methods of indexing using either the composite or all-or-nothing methods that could be used to trend and review multiple measures for a scorecard.

For the composite method, Benneyan defined the numerator as the total number of opportunities met for all patients in the sample and the denominator as the total number of opportunities in the sample. He provided the following example:

* Four measures of performance and 20 patients

* Measured failure rate for each measure, 2/20, 3/20, 4/20, and 11/20

* Total number of failures equals 2+3+4+11 = 20 failures in 4 measures in 20 patients

* 20 patients times 4 measures of performance = 80 opportunities

* Composite failure rate = 20/80 or 25 percent

* The 25 percent metric could then be used for the scorecard reflecting the four measures. This method could be expanded to as many individual process measures as one would deem reasonable for the balanced scorecard.

* Another way to display these data would be to create a composite reliability measure by subtracting the 25 percent from 100 percent, which would equal 75 percent, thus reflecting the organization's reliability in achieving the four process measures in the 20 patients.
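The arithmetic in the example above can be checked with a short Python sketch. The function name is illustrative and not taken from Benneyan's paper; the failure counts are the ones listed in the example.

```python
from fractions import Fraction


def composite_failure_rate(failures_per_measure, n_patients):
    """Total failures divided by total opportunities (measures x patients)."""
    opportunities = len(failures_per_measure) * n_patients
    return Fraction(sum(failures_per_measure), opportunities)


failures = [2, 3, 4, 11]  # failures for each of the four measures
rate = composite_failure_rate(failures, n_patients=20)
reliability = 1 - rate

print(f"composite failure rate: {float(rate):.0%}")         # 25%
print(f"composite reliability:  {float(reliability):.0%}")  # 75%
```

Using exact fractions avoids rounding surprises when the same calculation is extended to many more measures.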

In the "all-or-nothing" approach, Benneyan defined the numerator as the number of patients for whom all measures were met and the denominator as the number of patients in the sample.

The ultimate question using this method is: In what percentage of patients did the organization achieve 100 percent performance for all measures in a specific bundle of measures? Benneyan used four congestive heart failure (CHF) measures in 10 patients as an example:

The organization achieved:

* 90 percent compliance with left ventricular ejection fraction assessment (nine out of 10 patients received the assessment)

* 70 percent compliance with discharge instructions (seven out of 10 patients received discharge instructions)

* 80 percent compliance with ACE inhibitor for left ventricular systolic dysfunction (eight out of 10 patients received ACE inhibitors)

* 60 percent compliance with smoking cessation (six out of 10 patients received instructions)

* The calculation of this indexed measure would be as follows:

Composite measure = (9+7+8+6) (# met) divided by (10 times 4) (# of opportunities) = 30/40 = 0.75

However, in the example cited only four of the 10 patients (table not reproduced) actually received all four activities, for an all-or-nothing rate of 0.4. In other words, the organization achieved a 100 percent reliability rate for this CHF bundle in only 40 percent of its patients. The organization could then choose to place the composite metric of 0.75 on its scorecard, reflecting the number of measures met in the 10 patients (or however many patients there were in a given reporting period), or it could elevate the rigor of this approach by reporting on its scorecard the reliability number of 0.40. That is, the percent of patients in a given reporting period that actually achieved all of the measures.
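The difference between the two rates can be made concrete with a small Python sketch. Because the original table is not reproduced, the patient-level rows below are hypothetical: they were constructed only so that the column totals (9, 7, 8 and 6) and the number of patients meeting all four measures (4) match the figures quoted in the text.

```python
# Hypothetical patient-level results (1 = measure met, 0 = not met).
# Columns: LVEF assessment, discharge instructions, ACE inhibitor,
# smoking cessation. Rows are invented to match the article's totals.
records = [
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 0),
    (1, 1, 0, 1),
    (1, 0, 1, 1),
    (1, 0, 1, 0),
    (0, 0, 0, 0),
]


def composite_rate(records):
    """Measures met divided by total opportunities across all patients."""
    met = sum(sum(row) for row in records)
    opportunities = sum(len(row) for row in records)
    return met / opportunities


def all_or_nothing_rate(records):
    """Fraction of patients for whom every measure in the bundle was met."""
    return sum(1 for row in records if all(row)) / len(records)


print(composite_rate(records))       # 0.75
print(all_or_nothing_rate(records))  # 0.4
```

The all-or-nothing rate can never exceed the composite rate, which is why it is the more demanding of the two numbers to report.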

A variation of the above concept (developed by author Eugene Fibuch for his institution) uses the Six Sigma concept of common cause variation and special cause variation. It is well-recognized that all processes have common cause variation, normal variation over time, which is part of the intrinsic nature of a process.

Statistically, common cause variation in the process management world falls within +/- 3 standard deviations (SD) of the mean. Any outcome or process result that falls within +/- 3 SD, that is, between the upper control limit (UCL) and the lower control limit (LCL), over time would be considered normal variation and statistically not different from the mean value.

Special cause variation is assigned to values that fall outside the UCL/LCL and would be considered statistically significantly different from the mean value. The rationale for using this approach in reviewing and analyzing performance is that senior leaders should not be concerned with performance that is changing over time yet still within the 99 percent confidence bands; rather, they should be spending their time addressing performance that is statistically different from the mean. So, how should health care organizations approach this issue?

If an organization measured a given process such as providing discharge instructions over time (e.g., monthly) using a run chart, the graphic analysis would provide not only the mean value for the process, but also the UCL and LCL.

This information can be used to create a binary scoring system for each process measure. If the process measure is in control, between the UCL and the LCL, it would be assigned a value of 0, and if the measure is outside of the control limits it would be assigned a value of 1.
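A minimal Python sketch of this binary scoring follows. It assumes the control limits are simply the baseline mean plus or minus three sample standard deviations, as described in the text; statistical process control practitioners often estimate the limits differently, so this is an illustration rather than a recommended chart calculation. The monthly values and function names are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (LCL, center, UCL) as mean -/+ k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def binary_score(value, lcl, ucl):
    """0 if the value lies within the control limits, otherwise 1."""
    return 0 if lcl <= value <= ucl else 1


# Hypothetical monthly compliance (percent) for one process measure.
baseline = [88, 90, 91, 89, 92, 90, 87, 91, 90, 92]
lcl, center, ucl = control_limits(baseline)

print(binary_score(89, lcl, ucl))  # 0 (in control)
print(binary_score(70, lcl, ucl))  # 1 (outside the control limits)
```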

50 measures

Now, let us assume that an organization has 50 process measures that it wants to include on its scorecard, but lacks the space to display them. How can one use the run chart information for each individual process measure to create a single indexed number to reflect the organization's performance for the 50 different processes?

If each process was plotted on a run chart over time and the UCL and LCL were calculated and displayed, then it would be simple to note which of the 50 measures were in control and which were out of control (special cause variations that needed to be addressed by the organization) at a specific point in time.

Let us further assume that in a given month the 50 measures were reviewed by the data analysts in the organization's quality department and they noted that five of the 50 measures were out of control. Each of those five measures would be assigned a value of 1 and the other 45 measures a value of 0, the sum of which would equal 5.

Therefore, the index for these 50 measures is 5. The interpretation of this information is that 45 of the 50 performance measures were in control and, therefore, any change in the individual numerical values of each of these measures would be considered common cause variation and would not have to be addressed.

The other five measures would require detailed review and analysis to determine why those measures were outside the control limits. The organization can now use a single indexed number, i.e., 5, on its scorecard to represent an overarching view of its performance relative to the 50 process measures.
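The index itself is just the sum of the binary flags, as the following Python sketch shows. Which of the 50 measures are out of control is not stated in the text, so the positions used here are hypothetical placeholders.

```python
def binary_index(flags):
    """Sum of 0/1 flags: the number of measures outside their control limits."""
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    return sum(flags)


n_measures = 50
# Hypothetical positions of the five out-of-control measures.
out_of_control = {3, 11, 27, 34, 48}
flags = [1 if i in out_of_control else 0 for i in range(n_measures)]

index = binary_index(flags)
to_review = [i for i, flag in enumerate(flags) if flag == 1]

print(index)      # 5
print(to_review)  # [3, 11, 27, 34, 48]
```

Keeping the list of flagged positions alongside the index preserves the link back to the individual measures that need detailed review.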

Some organizations have designed their scorecards to include a range of scoring criteria or boxes for each metric. An example would be scoring criteria that would encompass nine or 10 scoring possibilities, ranging from 1 (very low performance) to 9 or 10 (benchmark performance). This physical representation of the organization's performance for each metric on the scorecard allows for an easy and comprehensive view of all of the metrics in terms of poor to outstanding performance.

One might be asking at this point how the binary methodology can be used if the organization has scoring criteria built into the scorecard?

The answer is that one would subtract the indexed number of 5 (using the example noted earlier) from the top scoring criteria on the scorecard (assume 9 as the maximum scoring criteria); the resulting measure would be reported on the scorecard as 4.

If this process is used, then the number 4 reflects a change from the maximum number of 9 (usually designated as benchmark performance) and corresponds to the binary index number of 5.

What if the binary index was calculated to be 15? How should this be noted on a scorecard with only 9 scoring boxes? Subtracting 15 from 9 would result in a negative number. The answer is that the indexing method is qualitative in its structure and a negative number would reflect very poor performance in the measures that make up the index.

The practical approach would be to place the negative number in the scoring box on the scorecard that is noted as a 1 (scoring criteria of 1 usually is set up to reflect performance outside the 99 percent confidence band).
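The mapping from binary index to scoring box can be sketched in Python as follows. The function name and the defaults (top box 9, bottom box 1) are illustrative, and the handling of a result of exactly 0 is an assumption: the text only describes negative results, so this sketch places anything below the bottom box into the bottom box.

```python
def scorecard_box(index, top_box=9, bottom_box=1):
    """Return (raw, box): the raw result of top_box - index, and the
    scoring box it is placed in (never lower than bottom_box)."""
    raw = top_box - index
    box = max(raw, bottom_box)  # assumption: results below bottom_box go there
    return raw, box


print(scorecard_box(5))   # (4, 4)
print(scorecard_box(15))  # (-6, 1)
```

Returning the raw value together with the box keeps the negative number visible, which matches the suggestion to note it in the lowest scoring box rather than discard it.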

A potential criticism of using run charts to calculate means and the UCL/LCL over time is that the moving average of the data is likely to change with time. It is important to recalculate the UCL/LCL at agreed-upon time intervals, recognizing that this indexing approach is not quantitative but rather more qualitative, providing senior leaders time-sensitive insight into those process measures that require further evaluation.

This methodology can be applied not only to clinical measures, but also to administrative measures of performance. Clearly, at the process level (departmental) in an organization one would not want to use an indexing methodology because it is not specific enough.

But at a higher organizational level the indexing of multiple performance metrics provides sufficient information to warn senior leaders that action needs to be taken. The indexed number can also be trended over time to provide additional insight into organizational performance.



1. Kaplan RS, Norton DP. The balanced scorecard: Measures that drive performance. Harvard Business Review, Jan-Feb, 1992, 71-79.

2. Kaplan RS, Norton DP. The strategy-focused organization: How balanced scorecard companies thrive in the new business environment. Harvard Business School Press, Boston Massachusetts, 2001.

3. Niven PR. Balanced Scorecard: Step-by-step. Maximizing performance and maintaining results. John Wiley and Sons. New York, New York, 2002.

4. CMS Outcome Measures (2012): http://

5. The Joint Commission. Specifications Manual for National Hospital Inpatient Quality Measures (v4.21) (,12-31-2012)). aspx.

6. Avidan MS, and others. Anesthesia awareness and the bispectral index. NEJM, 2008, 358: 1097-1108.

7. Ware JE, Kosinski M, and Keller SD. A 12-Item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 34 (3), 1996, 220-233.

8. Benneyan J. Measuring Process Reliability: Composite and "All-or-Nothing" Measures. November 24, 2004. Institute for Healthcare Improvement.

By Eugene Fibuch, MD, CPE, CHCQM, FAIHQ, and Arif Ahmed, BDS, PhD, MSPH

Eugene Fibuch, MD, CPE, CHCQM, FAIHQ is chair and program director in the Department of Anesthesiology at the University of Missouri, Kansas City School of Medicine.

Arif Ahmed, BDS, PhD, MSPH, is associate professor of Health Administration at the Bloch School of Management at the University of Missouri, Kansas City.
COPYRIGHT 2013 American College of Physician Executives
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2013 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Measurement
Author:Fibuch, Eugene; Ahmed, Arif
Publication:Physician Executive
Geographic Code:1USA
Date:Nov 1, 2013