Measuring Subjective Clinical Outcomes

The measurement of subjective clinical outcomes is one of the most perplexing areas within the arena of clinical decision making. When one mentions the word "subjective" to clinicians and researchers alike, two associations immediately come to mind: 1) subjective equals "soft" or unreliable information and 2) to be subjective is to be inherently bad. If these associations are accurate, then we should stop trying to measure subjective clinical outcomes and focus our attention on more fruitful areas. If these associations are inaccurate, we need to determine why, clarify the misunderstandings, and improve our measurements of these clinical outcomes.

The first objective of this article is to clarify the meaning of the term "subjective" as it is commonly used in clinical decision making and argue that whether the concept is pejorative depends upon the context in which it is being used. Second, I will review briefly some of the basic scientific criteria we should use in judging the adequacy of clinical measures. Finally, I will discuss some of the frequent sources of error in clinical measurements and suggest ways of eliminating or minimizing the degree of error introduced.

Let us begin by clarifying major terms. The word "subjective" is frequently misused, misunderstood, and maligned in discussions about clinical measurement, because the word being modified by the adjective "subjective" is almost never clearly indicated.
In the context of clinical measurement, I see two major objects being modified: The word "subjective" can be used to describe either 1) the phenomenon or clinical outcome being assessed or 2) the specific test or instrument being used to produce the particular measurement. For example, we can talk about measuring "pain," an important subjective clinical outcome, or we can talk about an unstandardized self-report of pain, a subjective measure of this phenomenon.

When the term "subjective" is used to describe the phenomenon or clinical outcome of interest, I use the following definition: A subjective outcome is one in which the observed entity is the subjective state of a person, in which the entity results from the feelings of the subject or person, or in which the entity is perceptible only to the person being assessed. Examples of subjective clinical outcomes of crucial clinical importance to physical therapists abound: pain, fatigue, difficulty performing activities of daily living, and shortness of breath, to name just a few. What designates these clinical outcomes as subjective is the degree to which they involve the perception of the individual being examined. Objective phenomena, in contrast, have an existence independent of the perception of the individual. Rarely, however, is a clinical outcome either wholly objective or wholly subjective. I find it most useful to think of these clinical outcomes along a continuum ranging from fully subjective to fully objective.

The term "subjective," when used to describe the measurement process or test itself, is quite different.
I define a subjective measure or subjective test as a measure that is determined or influenced by the ideas or feelings of the person doing the assessment, the test itself, or extraneous characteristics of the person being assessed. In this context, the emphasis is on the way in which the measure itself is applied. It reflects the quality of the procedure, regardless of what is being measured. Measures that are subjective are those that are influenced by bias introduced by the observer or other factors that introduce error into the measurement procedure. In measuring subjective clinical outcomes, what we strive for are measurement processes that reflect the perceptions or feelings of the person being assessed and not the feelings or judgments of the person doing the measurement procedure. Measures that are subjective yield data that are inconsistent and are of little clinical or scientific use (eg, the unstandardized self-report of pain in my earlier example). Measuring clinical outcomes that are subjective (eg, pain itself) is a necessity for clinicians and clinical scientists.

I find the matrix illustrated in the Figure very helpful in clarifying the two uses of the term "subjective" (Eugene Michels, personal communication). As the Figure illustrates, what we should be striving for are objective measures of both objective and subjective clinical outcomes of interest to physical therapy researchers and clinicians. We should avoid the use of subjective measures when assessing either type of clinical outcome.

Improving the Scientific Quality of Data on Subjective Clinical Outcomes

The processes we use to convert information frequently needed about subjective clinical outcomes into basic scientific information are receiving increasing attention by epidemiologists and other clinical investigators. The goal of much of this research is the elimination of subjective measures in clinical practice. Alvin Feinstein, Professor of Epidemiology at Yale Medical School, has used the term "clinimetrics" to describe this emerging field of scientific inquiry. [1,2] Feinstein focuses on the key challenge we face to improve clinical measurement when he argues that to advance art and science in clinical examination, the equipment a clinician most needs to improve is himself. [3,4] Although spoken from the perspective of medicine, Feinstein's challenge applies equally to clinical research in physical therapy.

Few standardized methods or procedures exist for many of the observational techniques, rating scales, and interpretive criteria used to identify the distinctly clinical phenomena of interest to the physical therapist. Measures of spasticity, balance, muscle strength, tone, and shortness of breath are frequently assessed using unstandardized methods and techniques. Even when one wishes to use a standardized approach, there exists a paucity of references to standards one can use that are analogous to more traditional "objective procedures," such as those used to convert a few drops of blood into paraclinical data such as serum glucose level. [2]

The lack of standardized clinical measures in physical therapy leads to a variety of important negative consequences. The first problem is the proliferation of basically subjective clinical measures of little real value to either the clinician or researcher. When using subjective clinical measures, it is hard to distinguish when a clinical finding can be believed from when the finding is in error because of how the measure was applied. The second consequence in much clinical research is that many important clinical phenomena are neither cited nor counted in quantifying results emerging from clinical investigations. Traditionally, most clinical treatment outcomes are determined by reference to traditional physiologic tests.
In rheumatology, for example, until recently most clinical trials of drug and other interventions were judged solely on traditional physiological tests such as the erythrocyte sedimentation rate or the rheumatoid factor and not on important clinical outcomes such as functional disability levels. The proliferation of standardized functional disability methods over the past decade, however, has totally transformed the outcomes now observed in rheumatology clinical trials. [5]

Criteria for Judging Measures of Subjective Clinical Outcomes

Clearly, a different set of processes is required for adequate measurement when the observed entity is the subjective state of a person. The more distinctly human and intangible a phenomenon, the more distinctly human will be the observational system needed to identify it. [2] Although a thorough discussion of these methods is well beyond the scope of this article, interested readers can refer to the many reference books now available on this topic in the literature on psychophysics and other recent approaches in medicine. [1,6-10]

Regardless of the nature of the phenomenon, be it objective or subjective, the clinician needs clear and precise scientific methods for converting clinical observations into data useful in clinical decision making and clinical research. The criteria we use to judge the adequacy of these observational methods are the same as those used in judging measures of traditional objective phenomena [6]:

1. Reliability--the consistency of scores or equivalence of scores from different users or over time.
2. Validity--the degree to which a scale is measuring what it is intended to assess and to which it is suited to its intended purpose.
3. Efficacy--how well the measure accomplishes its stated purpose.
4. Sensibility--the degree to which a procedure is practical or sensible to use in a particular situation and circumstance.

This article focuses on issues surrounding the reliability of data on subjective clinical outcomes because this criterion is absolutely fundamental to achieving scientific adequacy of any measure. [11]

The physical therapy research community has come a long way in the past decade toward documenting the degree to which our clinical measures fulfill fundamental scientific criteria. Studies that examine the consistency of clinical measures are seen with regularity in Physical Therapy (some would say with too much regularity). Unfortunately, we too have fallen into the trap Feinstein argues is a common problem for clinical investigators in medicine. Too often, investigators who study measurement variability note the disagreements and frequently quantify them with Kappa or intraclass correlation coefficients or other indexes of concordance, but all too frequently do not go on to identify and suggest ways to remove the key sources of the observed discrepancies. This latter step is crucial if our goal is to improve the scientific quality of these clinical measures. In part, this failure is due to inadequate understanding of the key sources of observer variability in measures of subjective clinical phenomena. I will highlight a few key areas that need further investigation as we continue to work on improving the consistency of our clinical data.

Sources of Inconsistency of Clinical Measures

Inconsistency can be introduced from three distinct sources: the individual examined, the examiner, and the examination itself. [8,12] All three need to be considered in finding better methods of assessing these clinical phenomena.

The Examined

Clinical attributes such as body weight, blood pressure, and pulse rate vary from hour to hour, from day to day, and so forth. These attributes are influenced by such factors as position, diet, stress, fatigue, and exercise. Furthermore, we know that clinical measurements found to be extreme at one examination (ie, surprisingly high or low) will frequently regress toward the mean of the distribution (ie, move toward a more typical value) on reexamination. If not acknowledged and attended to, these sources of inconsistency can introduce important errors in the subsequent clinical information.
Some sources of error can easily be eliminated by standardization (eg, of time, position, preparation). Other sources of error, such as regression toward the mean, are more challenging to eliminate and may require more routine repetition of measurement or corroboration through multiple measures of the same underlying phenomenon.

The Examination

The environment in which clinical measures are administered can affect the senses and sensibilities of the examiner and the individual examined. Privacy, to take one example, is probably a prerequisite for the accurate disclosure of sensitive information about personal or family history. Noise levels in many physical therapy gyms, to take another example, make cardiac or pulmonary examinations most challenging, if not impossible.

An effective personal relationship between the clinical assessor and the patient is widely known to be an important prerequisite for effective communication. Numerous studies have shown, for example, that much of what is traditionally labeled as "patient noncompliance" is actually misunderstanding between the patient and the provider resulting from poor communication and inadequate interaction. [13] Frequent staff rotations and lack of continuity of care as patients pass through different stages of the health care system during one episode of illness all contribute to reducing the opportunity to establish personal relationships between client and clinician.

Finally, inconsistency will always result from the use of miscalibrated or faulty clinical instruments, such as a defective dynamometer or a miscalibrated isokinetic device. The ways to avoid this source of error are obvious; the difficulty is found in implementing them in busy clinical practices.

The Examiner

A frequently overlooked source of error is variation in the senses of the examiner (ie, sight, hearing, touch, and occasionally smell). Variations in acuity will often lead to inconsistency across therapists. Variation in sensory acuity within therapists, for example because of fatigue, will also lead to inconsistent information. The same phenomena can be perceived differently by two therapists or by the same therapist at two different points in time. Standardizing examiners over time and introducing redundancy into our clinical measures will help reduce this common source of error.

A second source of examiner error is inappropriate application of a clinical test or procedure by an untrained or inexperienced clinician.
Developers of clinical measures need to clearly state the level of expertise and training needed to yield consistent clinical information.

One of the most important forms of measurement error, and one that has received a good deal of research attention, is the introduction of variability as a result of using different or inadequately described methods of converting what a clinician perceives into classification schemes or clinical scales. I will illustrate this source of error with an example drawn from an article by Hutcheson et al, [14] who investigated the methodological properties of the Karnofsky scale. The Karnofsky scale, which was introduced in 1948 as a means of classifying patients with cancer, is commonly used to quantify patients' functional status. As the Table illustrates, the Karnofsky scale consists of three alphabetic groups for classifying patients' ability to work, to carry on normal activity, and to care for themselves. These alphabetic groups are further divided into 11 categories that appear to cover all possible levels of function, from completely normal to dead.

Hutcheson et al assessed the reliability of the Karnofsky scale by determining the extent of agreement between an emergency room physician and a senior medical resident on the functional status assessment of a group of patients. Both physicians assessed the same patients' functional status and circled the appropriate response on the scale sheet. The two physicians agreed on just 10 (35%) of 29 pairs of scores in the 11-category portion of the scale; 7 of their disagreements involved differences of at least 20% in the score assigned. A comparison of Karnofsky alphabetic scores yielded better results, with the two renal physicians achieving 71% agreement and a Kappa reliability score of 46%.
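
As a rough illustration of the Kappa statistic just cited, the following Python sketch computes Cohen's kappa for two raters: the observed agreement corrected for the agreement expected by chance from each rater's category frequencies. The function name `cohen_kappa` and the ten ratings are hypothetical, invented only for illustration; they are not the Hutcheson et al data, and the sketch assumes the unweighted two-rater form of the statistic.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters who scored the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of patients placed in the same category by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical alphabetic-group ratings for ten patients (illustration only).
rater_1 = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
rater_2 = ["A", "B", "B", "B", "C", "A", "A", "B", "C", "C"]
kappa = cohen_kappa(rater_1, rater_2)
```

Under this formula, an observed agreement of 71% together with a Kappa of 46% would imply chance agreement of roughly 46%, since (0.71 - 0.46) / (1 - 0.46) is about 0.46. That back-calculation is only as good as the assumption that the reported Kappa is the unweighted statistic sketched here.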

Two probable causes for the observed inconsistency related to the use of the Karnofsky alphabetic and percentage scales can readily be seen. The first is the lack of operational criteria to define such elements of the scale as gradations in the categorization of work, normal activity, and self-care ability. This problem could easily be corrected by supplying suitable criteria to specify the elements. For example, what constitutes ability to carry on normal activity? When do "normal" activity levels become "abnormal"? This particular source of error is commonly seen in clinical outcome scales that lack a detailed protocol defining the key elements, including instructions on how they should be administered.

A second problem is the non-exhaustive aggregation of the constituent elements of the Karnofsky scale, a flaw that is obvious on inspecting the alphabetic section of the scale. Each of the three main variables of the scale is dichotomous: a patient may be able or unable to work, to engage or not engage in normal activity, to provide or not provide self-care. These three elements can be aggregated eight (2 x 2 x 2) ways; yet, the scale provides for only three aggregations.

As this example of the Karnofsky scale illustrates, using a scale that has been inadequately constructed and standardized can produce serious inconsistency. Imagine the potential magnitude of error when different clinicians use entirely different scales or classification schemes. For example, therapists frequently use different scales to assess muscle strength or physical function.
This practice can introduce considerable bias into the resultant clinical data.

Conclusion

We are facing a considerable challenge in attempting to develop more adequate clinical measures of important subjective clinical outcomes in physical therapy. We need to continue to develop and refine our methods of converting these clinical observations and assessments into scientifically meaningful data. Advancing the state of the art in clinical decision making in physical therapy requires that we be able to measure important clinical outcomes, be they subjective or objective phenomena. The criteria and methods for evaluating our success are clear and used with increasing frequency in physical therapy. Efforts are needed to better identify the important sources of error in commonly used clinical measures, and we must support the research needed to develop and test standardized methods for achieving our desired outcomes. Ultimately, our patients and the physical therapy profession will be the beneficiaries.

A Jette, PhD, PT, is Associate Professor and Director, Graduate Program in Physical Therapy, MGH Institute of Health Professions, 15 River St, Boston, MA 02108-3402 (USA).

References

[1] Feinstein AR: Clinimetrics. New Haven, CT, Yale University Press, 1987
[2] Feinstein AR: An additional basic science for clinical medicine: IV. The development of clinimetrics. Ann Intern Med 99:843-848, 1983
[3] Feinstein AR: Scientific methodology in clinical medicine: I. Introduction, principles and concepts. Ann Intern Med 61:564-579, 1964
[4] Feinstein AR: Scientific methodology in clinical medicine: IV. Acquisition of clinical data. Ann Intern Med 61:1162-1193, 1964
[5] Guyatt G, Bombardier C, Tugwell P: Measuring disease-specific quality of life in clinical trials. Can Med Assoc J 134:889-895, 1986
[6] Feinstein AR: Clinical Epidemiology: The Architecture of Clinical Research. Philadelphia, PA, W B Saunders Co, 1985
[7] Nunnally JC: Psychometric Theory. New York, NY, McGraw-Hill Book Co, 1967
[8] Sackett DL, Haynes RB, Tugwell P: Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston, MA, Little, Brown & Co Inc, 1985
[9] McDowell I, Newell C: Measuring Health: A Guide to Rating Scales and Questionnaires. New York, NY, Oxford University Press Inc, 1987
[10] Baird JC, Noma E: Fundamentals of Scaling and Psychophysics. New York, NY, John Wiley
[11] Koran LM: The reliability of clinical methods, data and judgments. N Engl J Med 293:695-701, 1975
[12] Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada: Clinical disagreement: II. How to avoid it and how to learn from one's mistakes. Can Med Assoc J 123:613-617, 1980
[13] Jette AM: Improving patient cooperation with arthritis treatment regimens. Arthritis Rheum 25:452-455, 1982
[14] Hutcheson T, Boyd N, Feinstein A, et al: Scientific problems in clinical scales, as demonstrated in the Karnofsky index of performance status. J Chronic Dis 32:661-666, 1979