On "Test-retest reliability and minimal detectable change on balance ..." Steffen T, Seney M. Phys Ther. 2008;88:733-746.


Translating reliability coefficients into clinically meaningful representations of measurement error is a necessary and important step when the goal is to link clinical research to clinical practice. The study by Steffen and Seney (1) investigates the reliability of several balance and ambulation tests and converts the obtained coefficients into minimal detectable change (MDC) estimates. The authors apply Shrout and Fleiss (2) type 3,k intraclass correlation coefficients (ICC) to quantify relative reliability and, from these estimates, they calculate the standard error of measurement (SEM) to quantify measurement error in the same units as the original measurement. For some of the balance and ambulation tests, 2 trials were performed on each of 2 occasions (eg, Timed "Up & Go" Test [TUG]); for other tests (eg, Six-Minute Walk Test [6MWT]), a single measurement was performed on each of 2 occasions. In the former case, the authors reported a type 3,2 ICC; in the latter case, they presented a type 3,1 ICC.
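As a rough illustration of the conversion described above, the sketch below turns an ICC into an SEM and a 95% MDC using the conventional formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. These formulas are assumed here as the usual convention and are not quoted from Steffen and Seney; the function names and the input values are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, in the units of the original score."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level.

    The sqrt(2) factor reflects that a change score carries the
    measurement error of two separate measurements.
    """
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs (not taken from the study): SD of 10 units, ICC of .96.
example_sem = sem_from_icc(sd=10.0, icc=0.96)
example_mdc = mdc95(example_sem)
print(round(example_sem, 2), round(example_mdc, 2))
```

Because the ICC enters through sqrt(1 - ICC), a drop of a few hundredths in a high ICC can shift the MDC noticeably, which is why the choice of ICC type matters for the MDC.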

The authors' rationale for applying the type 3,k ICC was "The ICC(3,k) was used instead of the Pearson correlation coefficient (r) for test retest reliability because it assesses rating reliability by comparing the variability of different ratings of the same subject with the total variation across all ratings and all subjects." (1)(pp740-741) In fact, the type 3,1 ICC provides an estimate of reliability similar to the Pearson r because neither coefficient accounts for a systematic difference in scores between the replicate measures (eg, either trials or occasions in Steffen and Seney's study). Presumably, in a test-retest reliability study, one is interested in both systematic and random errors, and, if this is true, the type 2,k ICC is the better choice because it includes both sources of variance in the reliability coefficient calculation. When the systematic error is zero, the type 2,k and 3,k ICCs provide identical estimates of reliability. However, when systematic error is present, as in the case of Steffen and Seney's 6MWT data, the type 2,k ICC will be less than the type 3,k ICC.
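For reference, the single-measure forms of the two coefficients, as they are commonly written from the two-way ANOVA mean squares of Shrout and Fleiss, make the difference explicit. The symbols are not used in the letter and are introduced here only for illustration: BMS is the between-persons mean square, JMS the between-replicates (trials or occasions) mean square, EMS the residual mean square, n the number of persons, and k the number of replicates.

```latex
\mathrm{ICC}(2,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS + \dfrac{k\,(JMS - EMS)}{n}},
\qquad
\mathrm{ICC}(3,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS}
```

The two expressions differ only by the term k(JMS - EMS)/n, which carries the systematic difference between replicates. When that term is zero the coefficients coincide; when it is positive, the type 2,1 value falls below the type 3,1 value.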

My second reflection addresses the use of the Shrout and Fleiss classification system in situations where 2 or more facets exist, such as for the TUG data. Here, the facets are trials and occasions. A dilemma occurs when attempting to interpret the meaning of the type 3,2 ICC reported by Steffen and Seney. It is not clear if the second digit (2) refers to 2 trials, 2 occasions, or 2 trials performed on each of 2 occasions (ie, a total of 4 measurements). I propose that a generalizability (3) approach to the analysis has the potential to provide a clearer picture of the sources of variance, their magnitude, and the relative merits of averaging over either trials or occasions, or both.

To illustrate the points raised above, I have generated synthetic data for the TUG. Paralleling the design of Steffen and Seney, the synthetic data represent 2 TUG trials performed on each of 2 occasions for 10 persons. The data presented in Table 1 were contrived to illustrate a systematic difference between occasions, but no systematic difference between trials.

Table 2 reports the mean scores for trials and occasions. Of interest is that the trial means averaged over occasions are almost identical; however, the occasion means differ. Stated another way, a systematic difference exists between occasions, but not between trials averaged over occasions.

Table 3 displays Shrout and Fleiss type 2,1 and type 3,1 ICCs obtained by performing randomized block analysis of variance (ANOVA). Negative variance estimates were set to zero for all analyses. Pearson r values also are reported in this table. That the inter-trial type 2,1 and 3,1 ICCs are identical to 2 decimal places reflects the similarity of trial means shown in Table 2. By contrast, the inter-occasion means shown in Table 2 differed, and this systematic difference is not reflected in the type 3,1 ICC or in the Pearson r. Accordingly, the type 3,1 ICC is greater than the type 2,1 ICC because the variance due to occasion is greater than zero.
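The inter-occasion comparison for Trial 1 can be checked with a short sketch that runs a two-way (persons by occasions) ANOVA on the Trial 1 columns of Table 1 and applies the commonly cited single-measure Shrout and Fleiss formulas. The function and variable names are hypothetical, and this is not the software the author used; setting a negative between-occasion variance estimate to zero follows the rule stated in the letter.

```python
from typing import List, Tuple

# Trial 1 scores from Table 1, one row per person: (Occasion 1, Occasion 2).
trial_1_by_occasion: List[List[float]] = [
    [26.7, 27.6], [4.6, 7.6], [8.7, 12.5], [18.1, 26.1], [11.1, 16.6],
    [20.7, 20.4], [16.4, 15.4], [4.3, 16.0], [13.8, 16.0], [25.7, 34.5],
]


def two_way_mean_squares(scores: List[List[float]]) -> Tuple[float, float, float]:
    """Return (BMS, JMS, EMS) for an n-persons x k-replicates table."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1))


def icc_3_1(scores: List[List[float]]) -> float:
    """Type 3,1 ICC: ignores systematic differences between replicates."""
    bms, _, ems = two_way_mean_squares(scores)
    k = len(scores[0])
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_2_1(scores: List[List[float]]) -> float:
    """Type 2,1 ICC: adds the between-replicates variance to the denominator."""
    bms, jms, ems = two_way_mean_squares(scores)
    n, k = len(scores), len(scores[0])
    replicate_term = max(jms - ems, 0.0)  # negative estimate set to zero
    return (bms - ems) / (bms + (k - 1) * ems + k * replicate_term / n)


print(round(icc_2_1(trial_1_by_occasion), 2), round(icc_3_1(trial_1_by_occasion), 2))
```

For these data the sketch gives the Trial 1 inter-occasion values shown in Table 3 (.76 for type 2,1 and .86 for type 3,1); the gap comes entirely from the between-occasion term in the type 2,1 denominator.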

The following section illustrates a generalizability analysis that includes both trials and occasions in a single analysis. I applied a 3-way random effects ANOVA. The rationale for applying a random effects model was that I wished to generalize beyond the persons, trials, and occasions composing the study sample. The ANOVA and variance components were calculated using MINITAB statistical software,* and the results appear in Table 4. Once again, negative variance estimates were set to zero.
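The variance components in Table 4 can be recovered from the mean squares with the usual expected-mean-square equations for a fully crossed persons x trials x occasions random effects design with one observation per cell. The sketch below assumes that design (10 persons, 2 trials, 2 occasions) and is not the MINITAB procedure itself; the function name and dictionary keys are hypothetical.

```python
from typing import Dict


def variance_components(
    ms: Dict[str, float], n_p: int, n_t: int, n_o: int
) -> Dict[str, float]:
    """Solve the expected-mean-square equations of a crossed p x t x o random model."""
    raw = {
        "e": ms["e"],  # residual (p x t x o interaction confounded with error)
        "pt": (ms["pt"] - ms["e"]) / n_o,
        "po": (ms["po"] - ms["e"]) / n_t,
        "to": (ms["to"] - ms["e"]) / n_p,
        "p": (ms["p"] - ms["pt"] - ms["po"] + ms["e"]) / (n_t * n_o),
        "t": (ms["t"] - ms["pt"] - ms["to"] + ms["e"]) / (n_p * n_o),
        "o": (ms["o"] - ms["po"] - ms["to"] + ms["e"]) / (n_p * n_t),
    }
    # Negative estimates are set to zero, as stated in the letter.
    return {source: max(value, 0.0) for source, value in raw.items()}


# Mean squares as printed in Table 4.
TABLE_4_MEAN_SQUARES = {
    "p": 234.99, "t": 1.30, "o": 215.30,
    "pt": 2.58, "po": 15.92, "to": 1.44, "e": 2.16,
}

components = variance_components(TABLE_4_MEAN_SQUARES, n_p=10, n_t=2, n_o=2)
for source, value in components.items():
    print(f"{source}: {value:.2f}")
```

Because the inputs are the rounded mean squares from the table, the results agree with the printed variance components only to within rounding.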

Inspection of the variance components reveals the following important findings: (1) there is a large variance among persons, and this is desirable, (2) the variance between trials averaged over occasions is zero (this reflects the near identical means reported in Table 2), (3) there is a relatively large variance due to occasions (this reflects the difference in occasion means reported in Table 2), (4) the person by occasion (P x O) variance is substantially greater than the person by trial (P x T) variance (this suggests that averaging over occasion will have a greater effect than averaging over trials), and (5) the residual error is relatively small compared with the person variance.

Equation 1:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Equation 2:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The variance components reported in Table 4 can be applied to calculate generalizability coefficients that represent inter-trial and inter-occasion reliability. They also can be used to examine the distinct effect of averaging over trials, occasions, or both.

The theoretical inter-trial reliability (generalizability) for a single trial is obtained by substituting the variance components into Equation 1 and by setting [n.sub.t] and [n.sub.o] to 1. The obtained value is .97, and this is analogous to the Shrout and Fleiss type 2,1 inter-trial ICCs of .96 reported in Table 3. The inter-trial reliability for an average of 2 trials can be obtained by setting [n.sub.t] to 2 and [n.sub.o] to 1. This yields an inter-trial reliability of .98, which is analogous to a Shrout and Fleiss type 2,2 ICC.

When the goal is to draw inferences about the change status of a person, as is the case when MDC is applied, the inter-occasion reliability (generalizability) coefficient is of interest. It is calculated by applying Equation 2. The theoretical inter-occasion reliability for a single trial is obtained by substituting the variance components into Equation 2 and by setting [n.sub.t] and [n.sub.o] to 1. This gives an inter-occasion reliability of .74, which is the average of the 2 type 2,1 inter-occasion reliability estimates reported in Table 3. The inter-occasion reliability for a single trial performed on each of 2 occasions is obtained by setting [n.sub.t] to 1 and [n.sub.o] to 2. This yields an inter-occasion reliability of .85.

Finally, one can examine the inter-occasion reliability for the average of 2 trials on each of 2 occasions. This is accomplished by setting [n.sub.t] to 2 and [n.sub.o] to 2 in Equation 2. A value of .86 is obtained, and, to my knowledge, there is no equivalent Shrout and Fleiss coding scheme to represent this combination.
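Equations 1 and 2 are not reproduced in this copy of the letter, so the sketch below is only one reconstruction, built by symmetry from the Table 4 variance components, that happens to reproduce all five coefficients quoted in the text. It should not be read as the author's published equations: the trial and trial-by-occasion components are zero in these data, so the placement of those terms cannot be checked against the reported numbers. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    p: float   # persons
    t: float   # trials
    o: float   # occasions
    pt: float  # person x trial
    po: float  # person x occasion
    to: float  # trial x occasion
    e: float   # residual


# Values as printed in Table 4.
TABLE_4 = VarianceComponents(p=54.66, t=0.0, o=10.00, pt=0.21, po=6.88, to=0.0, e=2.16)


def inter_trial(vc: VarianceComponents, n_t: int, n_o: int) -> float:
    """Assumed form of Equation 1: occasion-linked variance is kept, trial-linked variance is error."""
    kept = vc.p + (vc.o + vc.po) / n_o
    error = (vc.t + vc.pt) / n_t + (vc.to + vc.e) / (n_t * n_o)
    return kept / (kept + error)


def inter_occasion(vc: VarianceComponents, n_t: int, n_o: int) -> float:
    """Assumed form of Equation 2: trial-linked variance is kept, occasion-linked variance is error."""
    kept = vc.p + (vc.t + vc.pt) / n_t
    error = (vc.o + vc.po) / n_o + (vc.to + vc.e) / (n_t * n_o)
    return kept / (kept + error)


for label, value in [
    ("inter-trial, 1 trial", inter_trial(TABLE_4, 1, 1)),
    ("inter-trial, mean of 2 trials", inter_trial(TABLE_4, 2, 1)),
    ("inter-occasion, 1 trial, 1 occasion", inter_occasion(TABLE_4, 1, 1)),
    ("inter-occasion, 1 trial, 2 occasions", inter_occasion(TABLE_4, 1, 2)),
    ("inter-occasion, 2 trials, 2 occasions", inter_occasion(TABLE_4, 2, 2)),
]:
    print(f"{label}: {value:.2f}")
```

Under these assumed forms, doubling the number of occasions raises the inter-occasion coefficient far more than doubling the number of trials, in line with the observation that the P x O variance dominates the P x T variance.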

* Minitab Inc, Quality Plaza, 1829 Pine Hall Rd, State College, PA 16801-3008.

Paul W Stratford

PW Stratford, PT, MSc, is Professor, School of Rehabilitation Science, McMaster University, Hamilton, Ontario, Canada.

This letter was posted as a Rapid Response on June 3, 2008, at www.ptjournal.org.

References

(1) Steffen T, Seney M. Test-retest reliability and minimal detectable change on balance and ambulation tests, the 36-Item Short-Form Health Survey, and the Unified Parkinson Disease Rating Scale in people with parkinsonism. Phys Ther. 2008;88:733-746.

(2) Shrout PE, Fleiss JL. Intraclass correlation: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.

(3) Brennan RL. Elements of Generalizability Theory. Iowa City, Iowa: ACT Publications; 1983.

[DOI: 10.2522/ptj.2008.88.7.888]

Author Response

Using a type 2,1 intraclass correlation coefficient (ICC) rather than a type 3,1 ICC changed the ICCs for 13 of the 24 tests by less than one hundredth of a point. An ICC(2,1) increased the reliability coefficients for the Berg Balance Scale and the Sharpened Romberg Test with eyes open, reducing the minimal detectable change (MDC) scores by 1 point each. ICC(2,1) decreased the remaining ICCs by one hundredth of a point, which increased the MDC scores of 6 tests by 1 point; 2 showed no change, and the Six-Minute Walk Test (6MWT) increased to 86 meters. Dr Stratford was sent the gait speed data to utilize his suggested ICC(2,2) formula for tests that incorporated averaged scores. This ICC formula is not available in the SPSS software we utilized. The analysis did not change the ICC values or the MDCs for the gait speed tests. Our article states that gait speed is the strongest gait outcome variable in the population with parkinsonism, and Stratford's analysis supports this.

We understand Stratford's suggestion that test-retest reliability should always use the ICC(2,k) formula. However, the article by Shrout and Fleiss (1) did not suggest an ICC formula for test-retest reliability, and changing the ICC formula had little effect on our study. Because the same rater performed the same test at each session, the formula for intrarater reliability, ICC(3,k), was used. We appreciate Stratford's correction that arose from our report of the 6MWT being the only test to demonstrate a small learning effect.

The incorrect use of the ICC formula can affect test-retest reliability when a systematic error occurs. The Table reports ICC(3,k), ICC(2,k), and minimal detectable change values using a 95% confidence interval (MDC95) for all the tests. Eleven MDC95 values had no change, 6 decreased, and 7 increased utilizing ICC(2,k) rather than ICC(3,k).

Teresa M Steffen and Megan Seney

TM Steffen, PT, PhD, is Professor in Physical Therapy at Concordia University Wisconsin, Mequon, WI.

This letter was posted as a Rapid Response on June 3, 2008, at www.ptjournal.org.

Reference

(1) Shrout PE, Fleiss JL. Intraclass correlation: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.

[DOI: 10.2522/ptj.2008.88.7.890]

Table 1.
Synthetic Timed "Up & Go" Data

Person      Occasion 1            Occasion 2
            Trial 1     Trial 2   Trial 1    Trial 2

Person 1    26.7         25.2      27.6       25.8
Person 2     4.6          6.9       7.6        7.1
Person 3     8.7          6.1      12.5       15.9
Person 4    18.1         19.1      26.1       28.5
Person 5    11.1          8.0      16.6       14.7
Person 6    20.7         24.0      20.4       22.6
Person 7    16.4         16.8      15.4       18.9
Person 8     4.3          6.4      16.0       14.2
Person 9    13.8         12.6      16.0       17.8
Person 10   25.7         24.8      34.5       34.6
Mean        15.0         15.0      19.3       20.0

Table 2.
Trial and Occasion Means

            Order
            1       2

Trial       17.1    17.4
Occasion    15.0    19.6

Table 3.
Type 2, 1 and 3, 1 Inter-trial and Inter-occasion Intraclass
Correlation Coefficients (ICC)

                             Occasion 1   Occasion 2

Inter-trial reliability
  Type 2, 1 ICC              .96          .96
  Type 3, 1 ICC              .96          .96
  Pearson r                  .96          .96
                             Trial 1      Trial 2
Inter-occasion reliability
  Type 2, 1 ICC              .76          .72
  Type 3, 1 ICC              .86          .85
  Pearson r                  .86          .85

Table 4.
Analysis of Variance and Variance Components

Source           Sum of Squares   Degrees of   Mean Square   Variance
                                  Freedom                    Components
                                                             ([sigma.sup.2])

Person (P)       2114.88          9            234.99        54.66
Trials (T)          1.30          1              1.30         0
Occasion (O)      215.30          1            215.30        10.00
P x T (pt)         23.17          9              2.58         0.21
P x O (po)        143.23          9             15.92         6.88
T x O (to)          1.44          1              1.44         0
Error (e)          19.40          9              2.16         2.16

Table.
Intraclass Correlation Coefficients (ICC) for Test-Retest Reliability
and Minimal Detectable Change Scores Utilizing a 95% Confidence
Interval (MDC95) for Functional Tests, a Quality-of-Life Measure, and
Disease Severity Rating Scale in People With Parkinsonism (a)

Test Performed                           ICC(3,k)   [MDC.sub.95]

Balance tests
  Berg Balance Scale (b)                      .94              5
  (0-56 points)
  Activities-specific Balance                 .94             13
  Confidence Scale (b) (%)
  Functional Reach Test (c) (cm)
    Forward                                   .73              9
    Backward                                  .67              7
  Romberg Test (b) (s)
    Eyes open                                 .86             10
    Eyes closed                               .84             19
  Sharpened Romberg Test (b) (s)
    Eyes open                                 .70             39
    Eyes closed                               .91             19
Mobility tests
  Six-Minute Walk Test (b) (m)                .96             82
  Timed "Up & Go" Test (c) (s)                .85             11
  Gait speed (c) (m/s)
    Comfortable                               .96            .18
    Fast                                      .97            .25
SF-36 (b) (0-100 points)
  Physical Functioning                        .80             28
  Role-Physical                               .85             45
  Bodily Pain                                 .89             25
  General Health                              .85             28
  Vitality                                    .88             19
  Social Functioning                          .71             29
  Role-Emotional                              .84             45
  Mental Health                               .83             19
UPDRS (b) (points)
  Mentation, Behavior, and Mood (0-16)        .89              2
  Activities of Daily Living (0-52)           .93              4
  Motor Examination (0-108)                   .89             11
  Total Score (0-176)                         .91             13

Test Performed                           ICC(2,k)   [MDC.sub.95]

Balance tests
  Berg Balance Scale (b)                      .95              4
  (0-56 points)
  Activities-specific Balance                 .94             13
  Confidence Scale (b) (%)
  Functional Reach Test (c) (cm)
    Forward                                   .72              9
    Backward                                  .67              7
  Romberg Test (b) (s)
    Eyes open                                 .86             10
    Eyes closed                               .85             19
  Sharpened Romberg Test (b) (s)
    Eyes open                                 .71             38
    Eyes closed                               .90             19
Mobility tests
  Six-Minute Walk Test (b) (m)                .95             86
  Timed "Up & Go" Test (c) (s)                .85             11
  Gait speed (c) (m/s)
    Comfortable                               .96            .18
    Fast                                      .97            .25
SF-36 (b) (0-100 points)
  Physical Functioning                        .80             29
  Role-Physical                               .85             44
  Bodily Pain                                 .89             24
  General Health                              .84             29
  Vitality                                    .87             20
  Social Functioning                          .70             30
  Role-Emotional                              .83             46
  Mental Health                               .83             18
UPDRS (b) (points)
  Mentation, Behavior, and Mood (0-16)        .89              2
  Activities of Daily Living (0-52)           .93              4
  Motor Examination (0-108)                   .89             10
  Total Score (0-176)                         .90             14

(a) SF-36=36-Item Short Form Health Survey, UPDRS=Unified Parkinson
Disease Rating Scale.

(b) ICC: 3,1 and 2,1.

(c) ICC: 3,2 and 2,2.
COPYRIGHT 2008 American Physical Therapy Association, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2008 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Stratford, Paul W.; Steffen, Teresa M.; Seney, Megan
Publication:Physical Therapy
Article Type:Letter to the editor
Date:Jul 1, 2008