Health outcome measures.


Over the past two decades increasing moves towards 'measuring' health outcome have influenced much of health care and indeed much of the physiotherapy profession. However, health outcome measures can have multiple purposes, are associated with evolving and sometimes confusing terminology, and may have perceived and actual barriers to use. As a result the level of understanding and incorporation into physiotherapy practice is variable despite increasing national and international professional guidelines. In order to appropriately understand and use outcome measures (as well as interpret the information from them), it is essential to consider three key areas covered in this paper: conceptual frameworks to place an outcome measure within, practical considerations regarding implementation, and finally identifying and describing the measurement qualities of an outcome measure, that is, its psychometric properties. Horner D, Larmer PJ (2006): Health outcome measures. New Zealand Journal of Physiotherapy 34(1): 17-24.

Keywords: Health outcome measure, Physiotherapy, Conceptual frameworks, Psychometric properties


'Measuring outcome' is a term used by a large number of industries across the world to determine how well the specific goals of any one business activity are met. Within the health care arena, the measurement of outcome has become increasingly widespread over the past two decades in response to calls to move beyond mere 'appearance of benefit' as an indicator of therapeutic impact. The tools derived for this purpose are usually referred to as health 'outcome measures' (Duckworth 1999). A health outcome measure has been described as a measure of health change, at a defined point in time, as a result of one or more health care processes (Baumberg et al 1995, Wennberg and Glittelsohn 1982). The implementation, interpretation and evaluation of outcome measures have caused much debate and controversy within the health literature.

Internationally the physiotherapy profession has been actively involved in promoting the use of outcome measures (Chartered Society of Physiotherapy 2000, Cole et al 1994, Kendall 1997). Physiotherapy practice commonly uses health outcome measures for their evaluative purpose, particularly since the advent of evidence-based practice (Huijbregts et al 2002, Klassen et al 2001, Patrick and Chiang 2000). An evaluative outcome measure is used to aid in the measurement of effectiveness (or not, as the case may be) of physiotherapeutic interventions, indicating whether there has been a change in status since the last measurement. Robust and well targeted outcome measurement is therefore integral to clinical trial methodology and to determining whether change can actually be attributed to the intervention delivered.

This paper outlines the historical development of the outcome movement, and then specifically implementation within the physiotherapy profession. The paper introduces three conceptual frameworks and practical considerations for usage of outcome measures. Finally an emphasis has been placed on how best to test the quality of an outcome measure to enable an informed interpretation of the outcome measure result. To aid clarity and practical application, examples from physiotherapy practice are supplied with further references given where depth of discussion is beyond the scope of this paper.


Relman (1988) suggests that there have been three distinct revolutions in western health care. The first revolution, the 'Era of Expansion', extended from the 1920s to the 1960s, when significant growth in hospitals and specialists occurred along with increasingly sophisticated technology. Inflationary pressures of the first revolution led to the second revolution, the 'Era of Cost Containment', from the 1960s to the 1980s. The third revolution in health care, the 'Era of Assessment and Accountability', was brought about by the need for information which would aid in the rationalisation, effectiveness and quality of health care. Relman (1988) reported that there were significant variations in practice across many sectors of health care, with differences in utilization rates, costs and the effects of interventions. These variations seemed to occur without any discernible measurable difference in patient outcomes (Gersten 1998). As a result, funders, particularly health insurance companies, had a significant role in driving the development of outcome measurements as a means of assuring that the treatment they were paying for was effective (Relman 1988). Health professionals recording that the patient reported 'feeling better' was no longer considered sufficient. Funders required objective measurements that could demonstrate that they were getting value for their money. One of the primary tools for obtaining this information was the use of outcome measures and hence this era has also been referred to as the 'Outcome Movement' (Epstein 1990).


Physiotherapy is reportedly often aligned with the traditional medical model (Nicholls and Larmer 2005). As medical interventions were put under the spotlight regarding treatment effectiveness, so this same process was applied to physiotherapy interventions. Hence, as the medical profession and those interacting with them saw outcome measures as a way of answering the critics, physiotherapists were also encouraged to incorporate standardised tests and measurements (outcome measures) in their practice (Rothstein et al 1991).

Physiotherapy professional organisations (international and national) are incorporating more overtly the use of outcome measures within core documentation. Specifically physiotherapy outcome measures have been described as '...a scale utilised and interpreted by physical therapists [physiotherapists] designed to measure a specific attribute [of interest to patient and therapist] that is expected to change owing to the intervention of a physical therapist [physiotherapist]' (Mayo et al 1993 p.81). In addition, for data from an outcome measure to be 'trustworthy', it must have been evaluated and results reported in peer reviewed literature demonstrating adequate measuring properties (Mayo et al 1993).

In 1994 the Chartered Society of Physiotherapy in the United Kingdom, as part of a quality assurance initiative, indicated the growing importance of taking accurate tests and measurements within general documentation (Chartered Society of Physiotherapy 1994). By 2000 the Chartered Society of Physiotherapy (2000) had identified 22 core standards of professional practice, one (Standard 6) requiring members to use appropriate and high quality outcome measures in their routine clinical practice. Naming outcome measures within the core standards raises their profile and reflects the increasing importance for physiotherapists to collect and utilise appropriate information to inform their practice.

Nationally, the New Zealand Physiotherapy Board published their second edition of registration requirements, a document that describes 10 competencies to enable a physiotherapist to register to practice in New Zealand (New Zealand Physiotherapy Board 1999). At least four of the 10 competencies (3, 4, 8 and 10) contain terminology indicating that outcome measures should be applied and evaluated.

Although there has been this growing understanding of outcome measures, research with physiotherapy practitioners internationally and within New Zealand suggests that there is not a clear understanding of the use and interpretation of outcome measures (Huijbregts et al 2002, Kendall 1997). The aim here therefore is to further enhance the understanding of the use and interpretation of outcome measures. The following section identifies three conceptual frameworks that health outcome measures can be aligned with to enhance the appropriate choice of which outcome measure(s) to use.


Three of the most dominant frameworks suggested for the measurement of health outcomes are: the International Classification of Functioning, Disability and Health (ICF), Health Related Quality of Life (HRQoL) and thirdly, cost (Finch et al 2002).

The first conceptual framework, the ICF, a precursor of which was known as the International Classification of Impairments, Disabilities and Handicaps (ICIDH), is a comprehensive conceptual framework of outcomes in the measurement of health (World Health Organisation 2001). The ICF assigns the term 'functioning' as encompassing the positive components of health, and 'disability' as encompassing the negative components of health. Disability is further subdivided into impairments, activity limitations and participation restrictions within the context of environmental facilitators and barriers.

The second conceptual framework for the measurement of health outcomes is HRQoL. Although the precise definition of HRQoL is debated, there is agreement that HRQoL measures include multiple dimensions and that they are important to the individual and relevant to the particular health intervention (Oldridge 1997). HRQoL is purported to include dimensions that describe a person's physical, social and psychological health (Bulpitt 1997, Oldridge 1997, Stewart et al 1987). Some HRQoL instruments not only measure specific dimensions but also place a value on each dimension (Muldoon et al 1998).

The third conceptual framework that a health outcome could be identified with is the cost of the service, both direct and indirect. Direct cost relates to the resources consumed in providing the service and indirect cost refers to the costs to the patient in undergoing an episode of care. The cost to the patient may include family, whanau and other support. This framework is set within the demand for and increases in health care services against a finite health dollar. Physiotherapists may see themselves in the role of patient advocate rather than considering the economic implications of their decisions and therefore may not always consider cost as a primary outcome. However, economic components of health care influence (either explicitly or implicitly) all levels of decision making in health (Kernick 2003, Robinson 1999).

While the three conceptual frameworks have been outlined separately, they can co-exist in practice with areas of overlap and often with no definitive borders (Finch et al 2002, Jette 1993), or may even be in tension with one another. An example of this overlap is demonstrated in cost effectiveness analysis--the comparison of two interventions that share a common outcome measure, which aids the clinical decision as to which intervention is used (Kernick 2003). In this situation an outcome indicator from within the ICF framework would be compared to the cost outcome in the delivery of the two interventions being compared. Therefore, the choice of specific health outcome measures relating to these two frameworks will depend on multiple issues, including who and what the information is being used for.

Being able to identify the conceptual framework underpinning an outcome measure is paramount for robust evaluation of health care. Once a health outcome measure has been placed within a conceptual framework, the next issue to consider is how practical (or not) it is to complete within an identified setting.


Ideally, the burden for either the physiotherapist or patient in applying or using an outcome measure should be minimal. Burden may be perceived or actual. Examples reported previously for the physiotherapist include cost, time, and limited knowledge of appropriate outcome measures (Cole et al 1994, Kendall 1997). Burden for the patients may include time, cultural barriers, perception of relevance, and understanding of the measure.

One of the most commonly used health outcome measures in current usage is the Short-Form 36 (SF-36) questionnaire which purports to measure health status (Ware and Sherbourne 1992). The SF-36 has been widely used and tested across countries, within different diseases and in different health states since it was first developed (Ware and Gandek 1998). Whilst this questionnaire has undergone extensive validation and improvements this outcome measure can be used to illustrate possible examples of burden.

The use of the SF-36 questionnaire requires obtaining user registration, purchasing a licence (approximately $US360), and incidental costs such as photocopying. Depending on the type of administration (self or interviewer-delivered) the questionnaire on average takes between 10 and 30 minutes to complete (Ware et al 2000). The analysis of the data then requires further manipulation of the numerical scores to allow interpretation of the results. Furthermore, although the SF-36 has a specific version for use within New Zealand, this version is only available in English (Sanson-Fisher and Perkins 1998). A patient must be sufficiently fluent in the English language to complete the questionnaire without consultation, as required by the standardised instructions (Ware et al 2000). Within New Zealand's multi-cultural society the language barrier may be too great a burden for some and may also be culturally unacceptable.

Having identified possible burdens to undertaking outcome measurement, each setting must assess their own needs (and restraints) to consider how practical any one outcome measure is to use. In the clinical setting time may be the greatest barrier; however, this barrier may be overcome in the research setting with adequate funding to employ staff to gather outcome measures.


Having identified conceptually where a health outcome may lie and highlighted practical issues surrounding the actual use of health outcomes, the final section of the paper describes how to assess and critique the measurement qualities or properties (psychometrics) of an outcome measure. This will in turn enable a more informed interpretation of the outcome measure result.

Psychometrics is the theory and rules of measurement (Nunnally 1978). For a measure to be used as an outcome measure certain psychometric properties need to be demonstrated. Figure 1 provides a schematic overview of psychometric properties. Two classic psychometric properties, reliability and validity, are described. Finally, a third property that is sometimes overlooked but is clearly important in evaluation of outcome is discussed--the ability to detect change (sensitivity to change and responsiveness).


All measurement involves some internal error or lack of precision, whether in measuring joint angle or blood pressure (which are sometimes referred to as 'hard' measurements) or HRQoL (sometimes called 'soft') (Fries 1983). It is therefore important for the potential user, clinician or researcher to assess the degree of validity and reliability of any measure for a specific population and for a specific purpose (Kirshner and Guyatt 1985). From a mathematical perspective, error of measurement can be in the form of either non-random error (which bears on validity) or random error (which bears on reliability).

Non-random error is a systematic bias that affects measurement (Nunnally 1978). This can be explained by considering the situation of a physiotherapist measuring joint range of motion using a goniometer. Although the goniometer measures precisely, it may have been calibrated incorrectly (measuring 13 degrees higher than it should). The extent of validity of an outcome measure is dependent on the degree of non-random error. The larger the non-random error, the less valid the measure is.

Random error can be defined as chance, unexplained fluctuations in data (Nunnally 1978). An example of random error can be demonstrated using the previous example of measuring joint range of motion with a goniometer. Random error would occur if the goniometer was accurate, but the physiotherapist had eyesight problems and misread the angle whilst taking repeated measurements, on some occasions reading slightly higher and on some occasions slightly lower. The larger the random error, the less reliable the measurement instrument is.

For measurement instruments to perform as successful outcome measures, both random and non-random error of measurement need to be minimised and this is done by ensuring the measure is valid and reliable.
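The goniometer example can be illustrated with a small simulation. The following is a minimal sketch only, not part of the original paper: the true angle, the size of the random scatter and the function names are all hypothetical, chosen to mirror the 13 degree miscalibration described above. It separates the error in repeated readings into a systematic part (mean reading minus true angle, which bears on validity) and a random part (spread of the readings, which bears on reliability).

```python
import random
import statistics

def simulate_readings(true_angle, bias, noise_sd, n, seed=0):
    """Return n simulated goniometer readings (degrees) of one joint angle."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_angle + bias + rng.gauss(0, noise_sd) for _ in range(n)]

def describe(readings, true_angle):
    """Split the error into a systematic part and a random part."""
    systematic = statistics.mean(readings) - true_angle  # non-random error
    random_spread = statistics.stdev(readings)           # random error
    return systematic, random_spread

TRUE_ANGLE = 90.0  # hypothetical true joint angle

# Miscalibrated goniometer read carefully: large bias, little scatter.
miscalibrated = simulate_readings(TRUE_ANGLE, bias=13.0, noise_sd=0.5, n=200)
# Accurate goniometer misread by the rater: no bias, more scatter.
misread = simulate_readings(TRUE_ANGLE, bias=0.0, noise_sd=4.0, n=200, seed=1)

for label, readings in [("miscalibrated", miscalibrated), ("misread", misread)]:
    systematic, spread = describe(readings, TRUE_ANGLE)
    print(f"{label}: systematic error {systematic:+.1f} deg, "
          f"random spread {spread:.1f} deg")
```

In this sketch the miscalibrated instrument gives tightly clustered readings that are all wrong by about the same amount, whereas the misread instrument is right on average but scatters widely from one reading to the next.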

Validity of an Outcome Measure

Validity--or non-random error--indicates the extent to which an instrument measures what it is intended to measure (Jette 1993). Validity is a complex concept; it is not an all or nothing property and should be considered in relation to the specific purpose for use and in the specific population of interest. The description of the population should at minimum include the following baseline characteristics: age, gender, ethnicity, diagnosis and severity, and the presence of co-morbidities (Case and Smith 2000, Juni et al 2001).

Validity has historically been divided into three basic types: content validity, criterion validity, and construct validity (Carmines and Zeller 1979, McDowell and Newell 1996, Nunnally 1978). The classic definitions of the types of validity are presented in Table 1. In the development of a measure the concepts to be measured would need to be clearly and comprehensively defined. Content validity addresses whether the measure adequately covers all of the concepts previously defined. As there is no statistical analysis to assess content validity this commonly relies on critique from health care experts and patients from within a particular field (McDowell and Newell 1996).

Unlike content validity, the estimation of criterion and construct validity includes statistical methods. Criterion validity would be demonstrated by the extent of correlation between the given instrument and an identified external criterion considered to be the gold standard. Therefore, empirical evidence is required for identification of an adequate correlation; however, theory is required to justify the selection of the external criterion.
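As an illustration of the correlation step only, the following minimal sketch computes a Pearson correlation coefficient between scores from a new instrument and scores from an external criterion. The data and variable names are hypothetical, and Pearson's r is used here as one common choice; the paper does not prescribe a particular statistic at this point, and the choice of criterion still has to be justified on theoretical grounds.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five patients on the criterion and the new measure.
gold_standard = [10, 20, 30, 40, 50]
new_instrument = [12, 19, 33, 38, 52]

r = pearson_r(gold_standard, new_instrument)
print(f"correlation with external criterion: r = {r:.2f}")
```

A high coefficient in a sample like this would support criterion validity only for the population and purpose studied, in keeping with the point that validity is not an all or nothing property.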

Criterion validity at first seems a straightforward aspect of validity; however, challenges exist. First, in some situations there is no criterion gold standard. A relevant example is the absence of a gold standard criterion theoretically linked to the outcome measure of HRQoL (Guyatt 1993, Jette 1993, Nanda and Andresen 1998). Due to this limitation some authors have argued that individuals should act as their own judge for the external criterion (Deyo and Centor 1986, Ni et al 2000). Others advise that using more than one external criterion aids in the validation process, as long as the choice of additional criterion is based on appropriate theoretical evidence (Carmines and Zeller 1979, McDowell and Newell 1996). A second challenge is that original gold standard outcome measures need to be carefully critiqued in their own right as new measures are often evaluated against them (Saltzman et al 1998).

Construct validity, the identification of theoretical concepts shared between two measures, is determined by examining their empirical relationship. If a high correlation was established this would then add to the body of knowledge supporting construct validity of a measurement. Construct validity is an aspect of validity where evidence accrues to support or refute the use of a specific instrument. Construct validity should be identified if there is no gold standard criterion or no universal content (validity) that is clearly accepted to define the measure. As the New Zealand population demonstrates considerable cultural diversity it is important to assess construct validity across differing ethnic and cultural groups. This concept has been referred to in the literature as equivalence (Bullinger et al 1993, Hahn and Cella 2003, Herdman et al 1997).

Reliability of an Outcome Measure

Reliability is the stability of a test over time when no important changes have occurred (Jette 1993). An outcome measure is considered to be reliable--to have minimal random error--when it gives the same results (or close to the same) over time when no change has occurred (Carmines and Zeller 1979, Nunnally 1978). Reliability of a measure must be identified for a specific population. If adequate reliability has not been demonstrated then it is unknown if changes over a given time are the result of a specific intervention, or the fact that the outcome measure has poor reliability.

There is no one single attribute to assess the reliability of a measure. Table 1 identifies and defines three key classical aspects of reliability: internal consistency, interrater (inter-observer), and intrarater (intra-observer, and also sometimes referred to as test-retest, repeatability or reproducibility), which all support different aspects of reliability (McDowell and Newell 1996, Nunnally 1978).

It may not be appropriate or necessary to assess all three attributes of reliability for every outcome measure. Internal consistency is an aspect of reliability that is commonly associated with an instrument (such as a questionnaire measuring HRQoL) that has multiple items (questions/statements) and would demonstrate the degree of correlation/cohesion among items within the instrument (Cronbach 1951). The assessment of internal consistency has the practical advantage that it requires only one completion of a measure, whereas the assessments of intra- and interrater reliability both require the completion of a measure twice and hence in certain contexts may be more burdensome. Of course it only provides a single piece of information--the degree to which all items of the measure are addressing a related concept.
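Internal consistency is commonly summarised with Cronbach's alpha, computed from a single administration as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch under stated assumptions: the response data are hypothetical, the function name is illustrative, and sample variances are used throughout.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*scores))  # regroup the data as one tuple per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 patients, 4 questionnaire items each scored 1 to 5.
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 3, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because the calculation needs only one completed set of questionnaires, it reflects the practical advantage noted above, but it says nothing about stability over time or agreement between raters.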

Psychometric literature can be challenging as some authors identify and define terms with subtle differences. An example of this challenge is where a fourth attribute of reliability has been described: test-retest (Bowling 2001, Ottenbacher and Tomchek 1993, Rousson et al 2002). As described previously, test-retest is a term that has been used interchangeably with intrarater reliability (McDowell and Newell 1996, Nunnally 1978). However, Ottenbacher and Tomchek (1993) describe test-retest as a term associated with reliability studies when the outcome measure did not require rater observation or judgement. An example of such a situation would be where the measurement of HRQoL was completed via a questionnaire. The completion of the questionnaire relies on the subjectivity of the patient rather than a rater observation or judgement. This therefore reserves the term intrarater reliability for situations where rater observation or judgement is required, such as the use of a goniometer. Particular attention is required to clearly define, and adequately reference, terminology used in the context of reliability.

An Outcome Measure's Ability to Detect Change

A 'third' psychometric property, responsiveness, has been proposed and described in terms of the ability of an instrument to detect a minimally clinically important change over time, when one is present (Guyatt et al 1987, Kirshner and Guyatt 1985).

Terminology applied to literature exploring the ability to detect change is confusing. Internal responsiveness, external responsiveness, sensitivity and responsiveness tend to be used interchangeably (Hocking et al 1999, Husted et al 2000). Furthermore, some authors describe the ability to detect change as another aspect of validity rather than a separate psychometric property (Hays and Hadorn 1992, Liang 2000, McDowell and Newell 1996). To add to the confusion, sensitivity has a specific but different technical meaning when used in the field of epidemiology as it refers to '... the proportion of persons with a particular disease who are correctly classified as diseased by the test' (McDowell and Newell 1996 p.31). For the particular purpose of this paper and to aid in the ability to clarify the application of statistical tests, the terms sensitivity to change and responsiveness will be used to signify the ability to detect change, with their accompanying definitions as offered by Liang (2000) as included in Table 1.

Sensitivity describes a significant statistical change over time. However, this change may not mean anything to either the patient or the health professional. An instrument, therefore, may detect statistically significant change (ie be 'sensitive') but the patient or the health professional may not consider that change to be meaningful or important. Responsiveness encompasses the notion of a statistical difference, but importantly includes a focus on clinically important change as evaluated or defined from the perspective of either (singularly or a combination of) the person, carer, society or health professional. Responsiveness is therefore reliant on a criterion, external to the instrument, whereas sensitivity is not.
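The distinction can be made concrete with a small worked sketch. Everything in it is an assumption for illustration: the before and after scores are invented, a paired t statistic is used as one possible test of statistical change, and the threshold for a clinically important change (here called MCID) is a hypothetical external criterion that in practice would have to come from patients, carers or clinicians rather than from the instrument itself.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (mean change, paired t statistic) for matched before/after scores."""
    changes = [a - b for b, a in zip(before, after)]
    mean_change = statistics.mean(changes)
    standard_error = statistics.stdev(changes) / math.sqrt(len(changes))
    return mean_change, mean_change / standard_error

# Hypothetical scores on a 0-100 scale for 10 patients.
before = [40, 55, 62, 48, 51, 45, 58, 60, 47, 53]
after = [42, 58, 64, 49, 54, 47, 60, 63, 48, 55]

T_CRITICAL_DF9 = 2.262  # two-sided 5% critical value of Student's t, 9 df
MCID = 10.0             # hypothetical externally defined important change

mean_change, t_stat = paired_t_statistic(before, after)
statistically_detectable = abs(t_stat) > T_CRITICAL_DF9  # 'sensitive'
clinically_important = abs(mean_change) >= MCID          # external criterion

print(f"mean change {mean_change:.1f}, t = {t_stat:.1f}")
print(f"statistically detectable: {statistically_detectable}")
print(f"meets assumed clinically important threshold: {clinically_important}")
```

With these invented data the change is small but very consistent, so it is statistically detectable while falling well short of the assumed threshold; the instrument would be described as sensitive to this change without the change being clinically important.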

Interpretation of the Outcome Measure Result

The next two questions to consider when interpreting outcome measures are: "How do we estimate psychometric properties?" and "What are the levels of acceptability?" There are many statistical approaches to evaluating the various psychometric properties of measures. Table 1 lists and gives examples of some common forms of statistical tests and guidelines on interpretations of findings. More information can be found in: Physical Rehabilitation Outcome Measures: A Guide to Enhancing Clinical Decision Making (Finch et al 2002), Measuring Health: A Guide to Rating Scales and Questionnaires (McDowell and Newell 1996) and Psychometric Theory (Nunnally and Bernstein 1994).

Reliability, validity (and responsiveness) are not finite concepts. Increasing confidence in the use of a measure would be achieved by evidence gained from multiple studies on differing (and adequately described) populations for clearly identified purposes.


As part of evidence-based practice and clinical decision making processes, physiotherapists are required to assess the effectiveness of their interventions and the use of appropriate and high quality outcome measures aids this process. This paper has provided a framework for understanding where outcome measures have come from and outlined the important conceptual, practical and mathematical properties that measures should have. Just because an outcome measure is commonly used does not guarantee that it is a good measure.

Unless we as physiotherapists have knowledge of the reliability, validity (and responsiveness) of an outcome measure for a particular purpose in a particular population, how can we correctly, with confidence, interpret findings? Further evaluation of established outcome measures and the development of new outcome measures for differing interventions, in differing settings and with patients demonstrating diversity, should help build confidence in the use of outcome measures. However, knowledge, and careful consideration of terminology and methods, in particular statistical analysis, are needed if the most appropriate use and interpretation of outcome measure results is to occur.

Physiotherapists and other health professionals have to balance a number of practical and professional issues in striving for excellence in clinical practice. Being able to use and interpret outcome measures wisely is one such issue, given that services are increasingly being contracted for according to their contribution to health gain. Ensuring our patients receive the very best physiotherapeutic interventions means contributing to the appropriate use (and critique) of outcome measures for research and practice. This commitment to health outcome measures is a professional responsibility.


The authors wish to thank Dr Kathryn McPherson, Professor (Rehabilitation), Division of Rehabilitation and Occupation Studies, and Dr Jane Koziol-McLain, Associate Professor, Division of Health Care Practice, AUT University for their comments and advice in the development of this paper.


Baumberg L, Long A and Jefferson J (1995): International workshop: Culture and outcomes, Clearing Houses on Health Outcomes. [accessed February 1, 2006].

Bowling A (2001): Measuring Disease: A review of disease specific quality of life measurement scales (2nd ed.). Buckingham: Open University Press.

Bullinger M, Anderson R, Cella D and Aaronson N (1993): Developing and evaluating cross-cultural instruments from minimum requirements to optimal models. Quality of Life Research 2: 451-459.

Bulpitt CJ (1997): Quality of life as an outcome measure. Postgraduate Medicine 73: 613-616.

Carmines E and Zeller R (1979): Reliability and Validity Assessment. Newbury Park: Sage Publications.

Case L and Smith T (2000): Ethnic representation in a sample of the literature of applied psychology. Journal of Consulting and Clinical Psychology 68: 1107-10.

Chartered Society of Physiotherapy (1994): Standards for Tests and Measures in Physiotherapy. London, United Kingdom.

Chartered Society of Physiotherapy (2000): Core Standards. London, United Kingdom.

Cohen J (1988): Statistical Power Analysis for the Behavioural Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cole B, Finch E, Gowland C and Mayo N (1994): Physical Rehabilitation Outcome Measures. Ontario: Canadian Physiotherapy Association.

Cronbach L (1951): Coefficient alpha and the internal structure of tests. Psychometrika 16: 297-334.

Deyo RA and Centor RM (1986): Assessing the responsiveness of functional scales to clinical change: An analogy to diagnostic test performance. Journal of Chronic Diseases 39: 897-906.

Duckworth M (1999): Outcome measurement selection and typology. Physiotherapy 85: 21-27.

Epstein A (1990): The outcome movement: Will it get us where we want to go? New England Journal of Medicine 323: 266-270.

Finch E, Brooks D, Stratford P and Mayo N (2002): Physical Rehabilitation Outcome Measures: A Guide to Enhance Decision Making (2nd ed.). Hamilton: Lippincott, Williams & Wilkins.

Fries JF (1983): Towards an understanding of patient outcome measurement. Arthritis and Rheumatism 26: 697-704.

Gersten P (1998): Outcome research: A review. Neurosurgery 43: 1146-1156.

Guyatt G (1993): Measurement of health-related quality of life in heart failure. Journal of the American College of Cardiology 22: 185-191.

Guyatt G, Walters S and Norman G (1987): Measuring change over time: assessing the usefulness of evaluative instruments. Journal of Chronic Diseases 40: 171-178.

Hahn E and Cella D (2003): Health outcomes assessment in vulnerable populations: Measurement challenges and recommendations. Archives of Physical Medicine and Rehabilitation 84: S35-S42.

Hanley JA and McNeil BJ (1982): The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143: 29-36.

Hays R and Hadorn D (1992): Responsiveness to change: an aspect of validity, not a separate dimension. Quality of Life Research 1: 73-75.

Herdman M, Fox-Rushby J and Badia X (1997): Equivalence and the translation and adaptation of health-related quality of life questionnaires. Quality of Life Research 6: 237-247.

Hocking C, Williams M, Broad J and Baskett J (1999): Sensitivity of Shah, Vanclay and Cooper's Modified Barthel Index. Clinical Rehabilitation 13: 141-147.

Huijbregts M, Myer A, Kay T and Gavin T (2002): Systematic outcome measurement in clinical practice: Challenges experienced by physiotherapists. Physiotherapy Canada 54: 25-31.

Husted J, Cook R, Farewell V and Gladman D (2000): Methods for assessing responsiveness: A critical review and recommendations. Journal of Clinical Epidemiology 53: 459-468.

Jette AM (1993): Using health-related quality of life measures in physical therapy outcome research. Physical Therapy 73: 528-537.

Juni P, Altman D and Egger M (2001): Assessing the quality of controlled clinical trials. British Medical Journal 323: 42-46.

Kendall N (1997): Developing outcome assessments: A step-by-step approach. New Zealand Journal of Physiotherapy Dec: 11-17.

Kernick D (2003): Introduction to health economics for the medical practitioner. Postgraduate Medicine 79: 147-150.

Kirshner B and Guyatt G (1985): A methodological framework for assessing health indices. Journal of Chronic Diseases 38: 27-36.

Klassen L, Grzybowski W and Rosser B (2001): Trends in physical therapy research and scholarly activity. Physiotherapy Canada 53: 40-47.

Landis J and Koch G (1977): The measurement of observer agreement for categorical data. Biometrics 33: 159-174.

Liang MH (2000): Longitudinal construct validity: Establishment of clinical meaning in patient evaluation instruments. Medical Care 38: 84-90.

Mayo N, Cole B, Dowler J, Gowland C and Finch E (1993): Use of outcomes in physiotherapy: A survey of current practice. Canadian Journal of Rehabilitation 7: 81-82.

McDowell I and Newell C (1996): Measuring Health: A Guide to Rating Scales and Questionnaires (2nd ed.). New York: Oxford University Press.

Muldoon M, Barger S, Flory J and Manuck S (1998): What are quality of life measurements measuring? British Medical Journal 316: 542-545.

Nanda U and Andresen EM (1998): Health-related quality of life: A guide for the health professional. Evaluation & The Health Professions 21: 197-215.

New Zealand Physiotherapy Board (1999): Registrations Requirements: Competencies and Learning Objectives (2nd ed.). Wellington, New Zealand.

Ni H, Toy W, Burgess D, Wise K, Nauman DJ, Crispell K and Hershberger RE (2000): Comparative responsiveness of Short-Form 12 and Minnesota Living with Heart Failure Questionnaire in patients with heart failure. Journal of Cardiac Failure 6: 83-91.

Nicholls D and Larmer P (2005): Possible futures for physiotherapy: An exploration of the New Zealand context. New Zealand Journal of Physiotherapy 33: 55-60.

Nunnally J (1978): Psychometric Theory (2nd ed.). New York: McGraw-Hill.

Nunnally J and Bernstein I (1994): Psychometric Theory (3rd ed.). New York: McGraw-Hill.

Oldridge N (1997): Outcome assessment in cardiac rehabilitation: Health related quality of life and economic evaluation. Journal of Cardiopulmonary Rehabilitation 17: 179-194.

Ottenbacher K and Tomchek S (1993): Reliability analysis in therapeutic research: Practice and procedures. The American Journal of Occupational Therapy 47: 10-16.

Patrick D and Chiang Y (2000): Measurement of health outcomes in treatment effectiveness evaluation: Conceptual and methodological challenges. Medical Care 38: S14-S25.

Relman A (1988): Assessment and accountability: The third revolution in medical care. New England Journal of Medicine 319: 1220-1222.

Robinson R (1999): Limits to rationality: Economics, economists and priority setting. Health Policy 49: 13-26.

Rothstein J, Campbell S, Echternach J, Jette A, Knecht H and Rose S (1991): Standards for tests and measurements in physical therapy practice. Physical Therapy 71: 589-622.

Rousson V, Gasser T and Burkhardt S (2002): Assessing intrarater, interrater and test-retest reliability of continuous measurements. Statistics in Medicine 21: 3431-3446.

Saltzman C, Mueller C, Zwior-Maron K and Hoffman R (1998): A primer on lower extremity outcome measurement instruments. Iowa Orthopaedic Journal 18: 101-111.

Sanson-Fisher RW and Perkins JJ (1998): Adaptation and validation of the SF-36 health survey for use in Australia. Journal of Clinical Epidemiology 51: 961-967.

Stewart A, Greenfield S, Hays R, Wells K, Rodgers W, Berry S, McGlynn E and Ware J (1987): Functional status, well-being of patients with chronic conditions. Journal of American Medical Association 267: 907-913.

Ware J and Gandek B (1998): Overview of the SF-36 health survey and the international quality of life assessment (IQOLA) project. Journal of Clinical Epidemiology 51: 903-912.

Ware J, Kosinski M and Gandek B (2000): SF-36 health survey: Manual and interpretation guide. Lincoln: Qualitymetric Incorporated.

Ware J and Sherbourne C (1992): The MOS 36-item short-form health survey (SF-36). Medical Care 30: 473-483.

Wennberg J and Glittelsohn A (1982): Variations in medical care among small areas. Scientific American 246: 120-134.

World Health Organisation (2001): International Classification of Functioning, Disability and Health. http:/ [Accessed November 20, 2002].


* Health outcome measures are being embedded into physiotherapy practice.

* Not all health outcome measures are good health outcome measures.

* Health outcome measures should be placed within conceptual frameworks and be practical.

* Health outcome measures should be reliable, valid and responsive for a particular purpose in a particular population.


Diana Horner MHSc(Hons), BSc(Hons) Physiotherapy, Senior Lecturer, School of Physiotherapy, Auckland University of Technology, Private Bag 92006, Auckland, New Zealand. Tel. 00 64 9 921 9999 ext 7083. Fax. 00 64 9 921 9620

Diana Horner
Senior Lecturer, School of Physiotherapy, Auckland University of Technology

Peter J Larmer
Senior Lecturer, Division of Rehabilitation and Occupation Studies, Auckland University of Technology

Table 1. Psychometric Properties: Definitions, Common Statistical
Tests and Guidelines on Interpretation.

Properties       Definitions                      Statistical Tests

Validity

Content          The adequacy with which an       Not applicable
                 instrument addresses/samples
                 all relevant aspects that were
                 defined in the conceptual
                 definition of the instrument
                 (Nunnally and Bernstein 1994).

Criterion        The ability of an instrument     Correlation
                 to estimate some important       statistics:
                 feature or behaviour that is     Pearson's Product-
                 external to the actual           Moment Correlation
                 measuring tool itself, the       Coefficient (r);
                 feature or behaviour being       Spearman's Rank
                 known as the criterion           Correlation
                 (Nunnally 1978). If the          Coefficient (r_s)
                 criterion exists in the
                 present this is referred to as
                 concurrent validity, whereas
                 the term predictive validity
                 applies to an external
                 criterion that is to be
                 measured in the future.

Construct        The '... extent to which a
                 particular measure relates to
                 other measures consistent with
                 theoretically derived
                 hypotheses concerning the
                 concepts (or constructs) that
                 are being measured' (Carmines
                 and Zeller 1979 p.23).
                 Convergent validity tests for
                 correlations with other
                 instruments intending to
                 measure the same or similar
                 concepts; divergent validity
                 tests for a lack of
                 correlations with instruments
                 that assess concepts that are
                 opposite (McDowell and Newell
                 1996).

Reliability

Internal         Consistency or homogeneity of    Cronbach's Alpha (α);
Consistency      a particular instrument or       Kuder-Richardson
                 measurement across its items     Formula 20 (KR-20)
                 (Cronbach 1951).

Interrater       Indicates the consistency/       Agreement statistics:
                 level of agreement of results    Intraclass
                 when two or more raters/         Correlation
                 assessors complete the same      Coefficients (ICC);
                 measurement on the same          Kappa statistics (κ)
                 patient(s) where there is no
                 evidence of change (McDowell
                 and Newell 1996).

Intrarater       Indicates the consistency/       As for interrater
                 level of agreement of results
                 when repeatedly completing the
                 same measurement by the same
                 rater/assessor where there is
                 no evidence of change
                 (McDowell and Newell 1996).

Ability to detect change

Sensitivity      '... the ability of an           Effect sizes (ES)
                 instrument to measure change
                 in a state regardless of
                 whether it is relevant or
                 meaningful to the decision
                 maker' (Liang 2000 p.85).

Responsiveness   '... the ability of an           Receiver Operating
                 instrument to measure a          Characteristic (ROC)
                 meaningful or clinically         Curve;
                 important change in a clinical   Correlation
                 state' (Liang 2000 p.85).

Table 1 (continued).

Properties       Guidelines to Interpretation

Validity

Content          Not applicable

Criterion        r or r_s           .10 = small
                                    .30 = medium
                                    .50 = large (Cohen 1988).
                                    NB. Correlation coefficients range
                                    from -1 to +1. Negative
                                    coefficients indicate a negative
                                    correlation. Positive coefficients
                                    indicate a positive correlation.

Reliability

Internal         α or KR-20         < .70 = inadequate
Consistency                         ≥ .70 =
                                    ≥ .80 =
                                    (Nunnally 1978).
                                    NB. Coefficients range from 0-1.

Interrater,      ICC                < .70 = inadequate
Intrarater                          ≥ .70 =
                                    ≥ .80 = excellent (Nunnally 1978).
                 κ                  < .40 = poor
                                    .41-.60 = moderate
                                    .61-.80 = substantial
                                    > .80 = almost perfect
                                    (Landis and Koch 1977).
                                    NB. Coefficients range from 0-1.

Ability to detect change

Sensitivity      ES                 .20 = small size
                                    .50 = moderate size
                                    .80 = large size (Cohen 1988).
                                    NB. ES have no upper limit.

Responsiveness   ROC auc            ≤ .50 = inadequate discrimination
                                    > .60 = adequate discrimination
                                    ≥ .80 = good discrimination
                                    (Hanley and McNeil 1982).
                                    NB. Coefficients range from 0-1.
                 r or r_s           As above (see validity).
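
To show how an agreement statistic from Table 1 might be calculated and read against its guideline, the following is a minimal sketch of Cohen's kappa for two raters classifying the same patients, labelled with the Landis and Koch (1977) bands as summarised in the table. The function names and the ratings are hypothetical. The treatment of values falling between the printed bands (for example, exactly .40) is an assumption of this sketch, not something the table specifies.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same patients."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of patients")
    n = len(rater_a)

    # Observed proportion of patients on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is total")

    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Label kappa using the bands in Table 1 (boundary handling is assumed)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    return "poor"


# Hypothetical ratings: two assessors grade 8 patients as "independent" (I)
# or "dependent" (D) on the same occasion.
assessor_1 = ["I", "I", "D", "D", "I", "D", "I", "D"]
assessor_2 = ["I", "I", "D", "I", "I", "D", "I", "D"]
kappa = cohen_kappa(assessor_1, assessor_2)
print(round(kappa, 2), landis_koch_label(kappa))
```

As with the other coefficients in Table 1, a label of this kind describes agreement only for the raters, patients and purpose studied.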
COPYRIGHT 2006 New Zealand Society of Physiotherapists
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2006 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Invited Clinical Commentary
Author:Horner, Diana; Larmer, Peter J.
Publication:New Zealand Journal of Physiotherapy
Geographic Code:8NEWZ
Date:Mar 1, 2006