Severity of illness: red herring or horse of a different color?
A multidimensional concept, quality's measurement requires definition of the essential attributes of interest to the person using the results. One can then compare the quality of different goods and services. One must weigh the value of each attribute of a service or product to derive an overall quality score. In selecting a service or product, one must compare performance on one attribute to that on others. With respect to health care, one may be interested in convenience or amenities, cost, ethics, or other dimensions, as well as health outcomes. Further, for any dimension, one must distinguish between the health system level and provider-patient interaction or provider performance. For example, access to care affects health outcomes at the system level but is irrelevant at the level of provider-patient interaction.
Quality, as defined in this three-part series of articles, excludes amenities, convenience, and interpersonal aspects of care that do not contribute to improved health outcomes and that the patient can judge. However, provider interactions with the patient regarding prognosis, therapeutic compliance, drug interactions, preventive activities, etc. that are essential to maximizing health outcomes in ways consistent with the patient's values clearly fall within the technical aspects of quality and thus must be encompassed by quality measurement.
The technical quality of care may be defined as the extent to which effectiveness is realized in practice. While in theory this determination could be made on a single case, in practice, because of inherent variability, it would need to be made on a population of patients. Thus, a physician's performance can be measured by the health outcomes he achieves for his patients compared to those he could have achieved technically. Such comparisons depend on being able to measure patient health outcomes and calculate what health outcomes should have been achieved, which depends on adjusting for patient diagnostic classification and the physician's practice circumstances. Given the complexity of the issues involved and the dearth of knowledge, this direct approach to quality assessment is not usually practical (or even possible) now.
The alternative, indirect approach is case-by-case quality assessment, with statistical analysis of results. This approach depends on accepting two critical assumptions: interventions are effective and some interventions are more effective than others. With these assumptions, one reduces the assessment problem to ascertaining whether or not the physician selected the correct intervention, which depends on arriving at the correct patient diagnostic classification, and judging whether or not interventions were implemented properly. Assessment of outcomes is an important part of assessing interventions; processes and outcomes must be assessed simultaneously. Given that one can develop acceptable practice criteria and standards, physicians can be judged on the proportion of cases treated in compliance with those practice standards.
Clearly, operationalizing even this much simpler approach to quality assessment is a formidable task. Quality of care can be judged practically only from medical records, which are at best an imperfect reflection of what was found and done. As a footprint is to a man, so the medical record is to care processes and outcomes. Thus, quality assessment must encompass judging the adequacy with which the medical record documents care.
Examination of completed individual care episodes is essential for measuring the quality of care. Assessment of care interventions prior to their use, such as may occur with diagnostic algorithms or therapeutic expert systems, or during the care episode represents an integral part of the care process and should not be regarded as quality measurement (although these activities fall within the sphere of quality assurance). So-called concurrent review is also part of medical care, a means of ensuring that all and only necessary interventions are provided and any untoward outcomes are recognized and treated quickly.
Case-by-case quality assessment depends on either reviewing all cases or screening them to identify those that merit further review. Given the multiplicity of criteria needed to account for all relevant patient diagnostic classifications and therapeutic responses, computerized systems offer the only practically valid approach to screening cases. Simple approaches, involving so-called generic screens, are likely to be quite inadequate. One problem is sensitivity. How can a few items applied to all cases be sensitive to the myriad possibilities for unacceptable care? How can a simple item be specific enough to reject cases of acceptable care? The result is likely to be poor positive and negative predictive values, causing quality problems to be missed and leaving a lot of work for peer reviewers or requiring some other means of assessing quality. Computerized screening is far more objective than unstructured peer review of medical records. However, even computerized screens cannot presently account for all possibilities. Thus, one must rely on peer review to determine if the care provided in cases failing screens was, in fact, acceptable. Over time, with feedback of peer review results, the sophistication of computerized screens should increase, reducing the number of cases requiring peer review.
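The arithmetic behind this concern can be sketched. The figures below are purely illustrative assumptions (a 5 percent rate of unacceptable care, a generic screen with 70 percent sensitivity and 80 percent specificity), not values from the article; the predictive values follow from Bayes' rule:

```python
# Illustrative sketch: why a generic screen with modest sensitivity and
# specificity yields a poor positive predictive value when unacceptable
# care is uncommon. All numbers are hypothetical.
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive, negative) predictive values for a screen."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Assume 5% of cases involve unacceptable care; screen sensitivity 70%,
# specificity 80% (invented figures).
ppv, npv = predictive_values(0.70, 0.80, 0.05)
```

Under these assumptions the positive predictive value is only about 16 percent: five of every six cases failing the screen would, on peer review, prove acceptable, which is precisely the wasted reviewer effort the text anticipates.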
Peer review is inherently subjective. Some of the subjectivity is eliminated by use of sophisticated computerized screens, because the reviewer's attention is focused on care not meeting screening criteria and standards. The reviewer's task is to determine whether or not exceptions were warranted. Further, reviewers can examine whether or not patients' preferences were appropriately taken into account in selecting and implementing treatments. Subjectivity can be reduced further by structuring the peer review process and by allowing providers whose care is criticized to respond to reviewers.
Peer review will remain the ultimate means of judgment for cases failing screens until sufficient knowledge exists to develop practical all-encompassing screens or population-based approaches are adopted. The perfection of case-by-case screens depends on their use in practice and on the feedback of results. Perfection of population-based approaches depends on the generation of solid data through research. The former is more practical and the latter more ideal, because consensus on processes and outcomes offers no guarantee of conferring health benefits.
Severity of Illness
Severity is defined by the expected amount of health status loss within a specified period from an untreated disease. A less generic, but more widely used, definition of severity is the probability of death within a given period from untreated disease. Because this definition is simpler to understand, it is used in the remainder of this article. As may be obvious from figure 2, page 14, severity is the quantification of expected natural history. Observed natural history is the basis for determining effectiveness. Severity of illness does not depend on how ill the patient looks, but on expected health outcomes. A patient with an acute gastrointestinal infection may look severely ill, at time of presentation, but may recover completely without treatment. A patient with amelanotic melanoma may look well, but will likely soon be dead. Severity may be related to diagnostic classification, in that patients within the class have the same or similar probability of death. Patients could be classified by their predicted probability of death within a specified period from untreated disease. This concept is illustrated in figure 3, page 15. Because everyone's probability of death increases over time, severity should be corrected for the risk of death inherent for a person in the same age-sex cohort as the patient. Where such adjustment is made, we refer to cohort-corrected severity (figure 1). Unadjusted severity will correspond closely to corrected severity when the period of interest is short, e.g., death within 30 days. However, where the period is long, e.g., one year, lack of adjustment may be a source of error, especially in the case of old people or where groups to be compared vary in their proportion of old people.
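One way to sketch the cohort correction described above is to treat the excess risk attributable to the untreated disease, beyond the baseline risk of the patient's age-sex cohort, as the corrected severity. The formula is an assumption for illustration; the article fixes no specific adjustment:

```python
def cohort_corrected_severity(p_untreated, p_cohort):
    """Excess probability of death attributable to untreated disease, beyond
    the baseline risk for the patient's age-sex cohort over the same period.
    One plausible adjustment among several; illustrative only."""
    return (p_untreated - p_cohort) / (1.0 - p_cohort)

# Short period (e.g., 30 days): baseline cohort risk is tiny, so the
# correction changes the raw severity very little.
short_horizon = cohort_corrected_severity(0.20, 0.001)

# Long period (e.g., one year) for an elderly patient: baseline risk is
# material, and the uncorrected figure overstates disease-specific severity.
long_horizon = cohort_corrected_severity(0.20, 0.10)
```

With these invented numbers, the 30-day correction leaves severity essentially at 0.20, while the one-year correction for the elderly patient lowers it to about 0.11, illustrating why unadjusted severity misleads most for long periods and old populations.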
Acuity of illness is the rate of change of severity. Acuity is represented by the slope of the graphs in figure 1. An average patient with an uncomplicated upper respiratory tract infection has a low severity of illness. Moreover, severity depends little on whether or not the patient is treated today or tomorrow. Thus, the patient has a low acuity of illness. (See case A in figure 1.) Contrast this situation with a man who is brought to the emergency department with a dagger in his heart. This patient is severely ill; without treatment he would almost certainly die quickly. An immediate response may change prognosis for the better, but delay would almost certainly not. Thus the patient has a high acuity of illness. (See case B in figure 1.)
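The slope interpretation of acuity can be sketched directly, with hypothetical severity-versus-delay functions standing in for the two cases in figure 1 (all numbers invented):

```python
def acuity(severity_at, t0, t1):
    """Acuity as the average slope of severity over a treatment delay
    (the slope of the graphs in figure 1)."""
    return (severity_at(t1) - severity_at(t0)) / (t1 - t0)

# Case A (hypothetical): uncomplicated upper respiratory infection.
# Severity is low and barely changes if treatment waits a day.
uri_severity = lambda days: 0.001 + 0.0001 * days

# Case B (hypothetical): dagger in the heart. Severity is high and climbs
# steeply with every minute of delay.
stab_severity = lambda minutes: min(1.0, 0.5 + 0.01 * minutes)

low_acuity = acuity(uri_severity, 0, 1)     # per day of delay
high_acuity = acuity(stab_severity, 0, 10)  # per minute of delay
```

The point of the sketch is only the contrast: case B's severity rises orders of magnitude faster per unit of delay than case A's.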
Measuring severity requires a valid severity measurement system and its correct use. The population of interest for severity measurement is usually patients presenting for treatment, although in principle such a system could estimate the probability of death for someone drawn randomly from the general population. Also of interest is the period for the prediction, potentially from one second to decades. In principle, a universal severity measurement system could provide a valid estimate as a function of any future time. Thus, any severity measurement system should state both the population and prediction period for which it is intended to be valid.
A severity measurement system would use certain symptoms, signs, and signals to generate a probability of death for a specific individual. This information may be a subset of all diagnostic information. In theory, if one were interested only in measuring severity, one would not need to elicit information specific to treatment selection, for example. Interest runs high in devising a system to measure severity independent of medical diagnosis. Such a system would allow the physician to ascertain the patient's prognosis without knowing what is wrong with the patient. Of course, it is conceivable that a severity measurement system would tell us what is wrong with the patient as a by-product of ascertaining severity.
The physician or other provider is in the best position to apply a severity measurement system because certain specific information must be elicited from the patient. The physician would be supported by an aide memoire, or, ideally, a computerized system, to ensure all necessary information was elicited. Moreover, the system supporting severity measurement would be capable of actually estimating probability of death. Given what has been presented in this paper so far, no one would be surprised to learn that no such severity measurement system exists, or is on the horizon. Further, any system that could be devised likely could not be validated empirically, because no one would be willing to withhold treatment from sick patients to compare estimated with actual deaths. Validation would therefore depend largely on experts' judgments. Given the present dearth of knowledge, even so-called expert judgments might be no more than guesses. Further, the data to build the system would likely have come from the same experts whose knowledge (or assumptions) would be needed to validate the system. Hence, the possibilities for validation are reduced to discussions among so-called experts.
The severity measurement systems that exist today operate from the patient's medical record, an inherent and possibly fatal weakness. If the physician did not elicit information needed by the severity measurement system, it will not be in the medical record. Further, all elicited information may not be recorded. Because the physician elicits information for diagnostic purposes, all existing severity measurement systems are diagnosis-dependent. They also depend on the physician's diagnostic acumen or practice style (a critical problem in using severity to measure provider performance).
Today's severity measurement systems may be differentiated by whether or not they use diagnostic coding as the point of entry into the medical record. Severity measurement systems that are explicitly diagnosis-related must specify variables particular to each diagnosis (although some may be shared) and their interrelationships (which may vary by diagnosis among shared variables) to estimate probability of death or classify patient severity levels. Systems that depend first on ascertaining medical diagnosis would be error-prone, if they relied on medical records coding, or must determine diagnosis independently (to the extent that this could be done at all from medical records, because needed data might be missing or in error). The latter type of system could be used to code medical records, thereby reducing coding errors.
Bypassing diagnostic coding avoids coding errors. However, systems that bypass medical record coding must either assign patients to a diagnosis (in which case they might be useful for coding), look for a subset of the information necessary for coding, or ignore entirely information useful for diagnosis. The extent to which a severity system is diagnosis-related can be measured by the correlation between system items and medical diagnosis. In a system with hundreds of variables, if 80 percent or more of them were used to classify each patient, the system could be regarded as not being diagnosis-related (although it would still be diagnosis-dependent). If, however, only a few variables were used to determine the severity level of each case, the system would likely be diagnosis-related (otherwise, only the few and not hundreds of variables would be sufficient to determine severity). Whether or not the system is diagnosis-related may be ascertained by seeing if the few variables used to classify each case cluster by diagnosis.
Today's severity measurement systems do not estimate probability of death. Rather, they classify patients into a limited number of severity levels; some systems use as few as five. Assigning a patient to a severity level may appropriately acknowledge the lack of precision inherent in the measurement system. However, it also detracts from the utility of a ratio scale such as probability of death. A probability of death of 0.5 means that one is twice as likely to die as one with a probability of 0.25. A patient in severity class 4 may not be twice as likely to die as one in class 2. In fact, one of the difficulties of ordinal scales is weighting one point relative to others. Without such weights, scale points cannot be averaged.
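The contrast between ratio and ordinal scales can be sketched with hypothetical class weights; the mapping from class to a representative probability is invented purely for illustration:

```python
# Ratio scale: probabilities of death support ratio statements directly
# and can be averaged across patients.
p = [0.25, 0.5]
assert p[1] / p[0] == 2.0  # "twice as likely to die"

# Ordinal scale: class labels 1..5 carry no such arithmetic. Averaging the
# class numbers presumes equal spacing. Given weights (hypothetical here)
# mapping each class to a representative probability, averaging becomes
# meaningful again.
weights = {1: 0.01, 2: 0.05, 3: 0.15, 4: 0.40, 5: 0.80}  # invented
cases = [2, 2, 4, 5]
mean_prob = sum(weights[c] for c in cases) / len(cases)
```

Note that under these invented weights a class 4 patient is eight times, not twice, as likely to die as a class 2 patient, which is exactly the kind of non-ratio behavior the text warns about.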
A system with only five severity levels, for example, would have to be capable of assigning the patient to the correct level, irrespective of diagnosis. For severity measurement systems based on medical diagnosis, this means either knowing the probability of death for each stage of disease A and for each stage of disease B or otherwise calculating weights to apply to each severity level within a diagnosis to make them commensurable across diagnoses. Given such weights, no matter what was wrong with the patient, the resultant severity level would predict probability of death. Thus, if all of a hospital's patients were assigned to weighted severity levels and analyzed by clinical diagnosis (or any other variable), the probability of death for all patients of a given level would be identical, if the system were valid.
Validating Severity Measures
No severity measurement system has been validated empirically for its ability to predict, at time of presentation (e.g., hospital admission), the probability of death from untreated disease. Indeed, empirical validation is virtually impossible. Yet without validation, any system of measurement is suspect. A possible but less useful type of validation is to examine the process by which predictions are made. This approach is limited by knowledge, of which there is a dearth; hence science's preference for empirical observation. Thus, use of a severity measurement system rests on one's faith that it produces valid predictions. Assumed validity is an obvious and inherent limitation of such systems.
Some authors have characterized patients' courses as journeys through treatment and have suggested measuring severity not only at admission but also at discharge and at the midpoint of treatment. The utility of such measurements is elusive. Nevertheless, a valid severity of illness measure would provide a perfect correlation, independent of diagnosis, between severity level at discharge and, say, probability of death in a defined period, assuming the patient did not undergo subsequent treatment that would alter outcomes. Further, the probability of death for patients discharged at level 2, regardless of their admission level, would be identical, because a valid severity measurement system would yield probabilities independent of past probabilities. This line of inquiry may be useful for validating existing severity measurement systems, especially if probabilities were corrected for age-sex cohort experience.
Because severity measurement systems can be validated only by expert judgment (of measurement methods or in comparison to expected outcomes), people have turned to correlating severity scores with probability of death after treatment. Such correlations cannot validate severity measurement systems, of course, because one is observing treated outcomes. At best, such studies are validating the difficulty of treatment.
Treatment Difficulty Measures
The outcome of treatment depends, in part, on the types of patients treated. Further, patients of any type vary in their response to a given treatment.
Difficulty of treatment may be defined as the amount of health status loss in a specified period from treated disease. Less generically but more simply, treatment difficulty is predicted probability of death in a given period from treated disease. Ideally, treatment difficulty should be expressed in terms of optimal treatment for the patient's disease. However, for such purposes as measuring provider performance, it can be expressed in terms of average treatment (practice expectation or a statistical norm). Difficulty of treatment tells us nothing about severity of illness or, without knowing severity, about treatment effectiveness. Considerations for devising and using treatment difficulty measurement systems are similar to those for measuring severity. To be most useful, such a system would be independent of diagnosis and treatment. As may be apparent by now, such measures might be difficult to devise. Nevertheless, a treatment difficulty system limited to a particular diagnosis or certain treatments may still be useful.
Validating Treatment Difficulty Measures
Difficulty of treatment measurement systems are simpler to validate than severity measurement systems because they predict probability of death after treatment, ideally after optimal treatment. For example, a five-level treatment difficulty measurement system could be validated by calculating, by level or score, the probability of death of all hospital patients. If valid, the levels would yield ratio probabilities, i.e., level 4 patients would die twice as often as level 2 patients. If this condition could not be met, at least the probability of death would rise consistently with increasing level. Further, this analysis would yield identical results if cases were divided by diagnosis or any other variable. Any system intended to measure treatment difficulty independent of diagnosis not meeting this condition would not be valid. A valid severity measurement system would also yield a perfect correlation between severity and treated outcomes under two circumstances: the treatment had no effect or the effectiveness of the same (or different) treatments varied perfectly with severity level. In the latter case, the probability of death would be uniformly lower for treated than untreated patients at all severity levels.
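A minimal version of this validation check, run against synthetic cases (the data and level structure are invented for illustration), might look like:

```python
from collections import defaultdict

def death_rates_by_level(cases):
    """cases: iterable of (difficulty_level, died) pairs for treated patients.
    Returns {level: observed death rate}."""
    tallies = defaultdict(lambda: [0, 0])  # level -> [deaths, total]
    for level, died in cases:
        tallies[level][0] += int(died)
        tallies[level][1] += 1
    return {lvl: d / n for lvl, (d, n) in sorted(tallies.items())}

def rates_monotonic(rates):
    """Weak validity condition from the text: observed probability of death
    rises consistently with increasing difficulty level."""
    vals = [rates[lvl] for lvl in sorted(rates)]
    return all(a <= b for a, b in zip(vals, vals[1:]))

# Synthetic data: (level, died) for seven treated patients.
rates = death_rates_by_level(
    [(1, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 1), (3, 1)])
```

A fuller check would repeat the same calculation within each diagnosis (or any other stratifying variable) and require the same rates to emerge, as the text demands of a diagnosis-independent system.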
Treatment survival measurement systems are a subset of treatment difficulty systems. They are used to predict who will survive treatment, e.g., intensive care. Completed treatment means treatment until the patient leaves the ICU alive or dies in it. Certain variables are compiled to yield a score or to assign a patient to a survival class. Given that the ICU delivers optimal care for all those admitted, one can correlate a survival score with the probability of leaving the ICU alive. A valid system would yield a perfect correlation whether cases were divided by diagnosis or other patient variables. The system's validity likely would be sensitive to the criteria used to select patients for the ICU and to the consistency with which the criteria were applied. With a valid system and the assurance that all cases met the ICU's selection criteria, within the system's valid range, one could evaluate the ICU's performance by comparing expected with measured outcomes. Thus, ICUs could be rated from excellent to poor based on dividing the statistical distribution into equal parts or based on standard deviations. Of course, such systems predict only who will survive the treatment. They say nothing about its effectiveness or the patient's severity of illness.
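Comparing expected with measured outcomes for an ICU can be sketched as follows; the predicted survival probabilities and outcomes are invented, and the observed-to-expected ratio is one conventional summary, not a method the article prescribes:

```python
def observed_vs_expected(cases):
    """cases: iterable of (predicted_survival_probability, survived) pairs
    for completed ICU treatments. Returns observed and expected survivor
    counts; their ratio is one way to grade the unit against its case mix."""
    observed = sum(1 for _, survived in cases if survived)
    expected = sum(p for p, _ in cases)
    return observed, expected

# Invented survival scores and outcomes for four completed ICU stays.
obs, exp = observed_vs_expected(
    [(0.9, True), (0.8, True), (0.5, False), (0.3, True)])
ratio = obs / exp  # > 1 suggests better-than-predicted performance
```

In practice one would also verify, as the text requires, that all cases met the ICU's selection criteria and fell within the scoring system's valid range before reading anything into the ratio.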
The following additional sources of information on severity of illness and quality assurance were obtained through a computerized search of databases. Copies of cited articles may be obtained from the College for a nominal charge. For further information on the citations, contact Gwen Zins, Director of Information Services, at College headquarters, 813/2872000.
Aquilina, D., and others. "Using Severity Data to Measure Quality." Business and Health 5(8):40-2, June 1988.
Aronow, D. "Severity-of-Illness Measurement: Applications in Quality Assurance and Utilization Review." Medical Care Review 45(2):339-66, Fall 1988.
Backofen, J., and others. "The Computerized Severity Index. A New Sophisticated Tool to Measure Hospital Quality of Care." Healthcare Forum 30(2):35-7, March-April 1987.
Brewster, A., and others. "MEDISGRPS: A Clinically Based Approach to Classifying Hospital Patients at Admission." Inquiry 22(4):377-87, Winter 1985.
Horn, S., and others. "Ambulatory Severity Index: Development of an Ambulatory Case Mix System." Journal of Ambulatory Care Management 11(4):53-62, Nov. 1988.
Louis, D., and Gonnella, J. "Disease Staging: Applications for Utilization Review and Quality Assurance." Quality Assurance and Utilization Review 1(1):13-8, Feb. 1986.
Rosko, M. "DRGs and Severity of Illness Measures: An Analysis of Patient Classification Systems." Journal of Medical Systems 12(4):257-74, Aug. 1988.
Wieners, W. "Quality Measurement and Severity Systems: An Overview." Computers in Healthcare 9(10):27,29,31-2, Oct. 1988.
Title annotation: part 2
Author: Peter G. Goldschmidt
Date: Sep. 1, 1989