Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example.


A common design for assessing the reliability of joint angle measurements in physical therapy research involves collecting repeated measurements made by two or more raters in a sample of patients. For example, Boone et al[1] carried out a study on 12 subjects' upper and lower extremities, where each extremity was measured three times by each of 4 therapists to determine the reliability among therapists and within therapists. All measurements were made using the same goniometer system. In another study, Clapper et al[2] collected data on 20 subjects whose lower-extremity joints were measured three times by each of two goniometers (an Orthoranger(*) and a standard full-circle goniometer) to assess reliability between goniometers and within goniometers. All measurements in this study were taken by one therapist. Subsequent data analysis of both studies consisted of separate analyses to calculate intraclass correlation coefficients (ICCs) as estimates of interrater (between-rater) reliability[dagger] and intrarater (within-rater) reliability.[double dagger]

As illustrated by the two examples, the word "rater" can be used synonymously to characterize either therapists or goniometer systems. Specific terms, however, have been suggested in the literature[3] to differentiate between a therapist's and an instrument's sources of variability (error). A reliability study of therapists (eg, Boone et al[1]) predominantly measures the variability among the therapists; any other source of variability, such as from patients or instruments, is kept to a minimum. The term "intertester" often identifies both this study design and the associated reliability coefficient. On the other hand, a study assessing the reliability among several goniometers, using one therapist (eg, Clapper et al[2]), is primarily indicative of only the variability in the instruments themselves. This type of study and the corresponding reliability coefficient are often referred to as "parallel forms." Similarly, it has been suggested to differentiate the reliability of a single therapist's measurements from the reliability of a particular instrument (eg, goniometer) with the terms "intratester" and "test-retest," respectively. For convenience in presenting the statistical methods in this article, the term "rater" will be used throughout.

Although other methods for evaluating measurement reliability have been described in the literature,[4] the acceptance of the ICC, over such methods as the Pearson product-moment correlation or the comparison of means for interval- and ratio-scaled measurements, has finally, and appropriately, permeated the physical therapy research community. Within the context of an interrater reliability study characterized by each rater making only one measurement or in an intrarater reliability study, the methods of Bartko and Carpenter[5] and of Shrout and Fleiss[6] may be used for estimation and hypothesis testing. On the other hand, in a reliability study distinguished by each rater making repeated measurements, their methods are no longer directly applicable. Some researchers,[1,7] appearing to recognize this, have used the mean value of each rater's repeated measurements to carry out the ICC calculations. Although this is a statistically valid approach, difficulty arises when comparing interrater reliabilities across studies. It is generally known[8] that the ICC based on the mean of several measurements will necessarily be greater than the ICC of a single measurement. Thus, an estimate of interrater reliability based, for example, on the mean of three repeated measurements cannot be compared in a straightforward manner with an estimate from another study based on the mean of two repeated measurements or based on single measurements. Other investigators[9,10] have used the first of two goniometric measurements from each pair of raters to calculate the ICC. The disadvantage of this maneuver is the loss of information from the selection of only one measurement per rater.
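The effect of averaging on the ICC can be illustrated numerically. The sketch below assumes the Spearman-Brown relation, a standard psychometric result that is not stated in this article; the function name `icc_of_mean` is hypothetical and used only for illustration.

```python
def icc_of_mean(icc_single, k):
    """Reliability of the mean of k measurements, given the single-measurement ICC.

    Assumption (not from the article): the Spearman-Brown relation
        icc_k = k * icc / (1 + (k - 1) * icc)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * icc_single / (1 + (k - 1) * icc_single)


# The same single-measurement reliability yields different coefficients
# depending on how many repeated measurements are averaged.
for k in (1, 2, 3):
    print(k, round(icc_of_mean(0.6, k), 3))
```

Under this assumption, a single-measurement ICC of 0.6 corresponds to 0.75 for the mean of two measurements and about 0.82 for the mean of three, which is why coefficients based on different numbers of averaged measurements are not directly comparable.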

The purpose of this article is to present appropriate statistical analyses for the concurrent assessment of interrater and intrarater reliability where, by design, two or more raters take repeated measurements on a series of patients. The proposed methods differ from common practice in that both sources of error (interrater and intrarater) are simultaneously incorporated into the resulting statistical analysis rather than being delineated in two separate analyses, and each individual measurement contributes to the estimation of both interrater and intrarater reliability coefficients. Consequently, both coefficients are derived from multiple observations. This effectively increases the sample size available for estimating each coefficient and thereby improves its precision.

Special attention is given to the statistical methods of hypothesis testing, confidence interval construction, and calculation of the standard error of measurement (SEM) because their use in physical therapy research remains rare. A review of all articles reporting reliability studies published between January 1991 and December 1992 in Physical Therapy revealed that of the 21 articles, only 4 applied tests of statistical significance to the estimated reliability coefficients, 2 provided confidence intervals, and 7 furnished SEMs. It is noted that a reliability coefficient is just a point estimate based on one selected sample. Without carrying out appropriate inferences, it is impossible to credibly establish the true level of reliability in the population from a point estimate. In addition, the SEM is important for its clinical application. It can be used in deciding whether a real clinical change has occurred in a patient (ie, a change in excess of measurement error). The intent of focusing on these statistical methods is to increase physical therapy researchers' familiarity with statistical methods in reliability studies.

The first section of this article describes the study design and data layout used to collect the data. A statistical model for analyzing such data is also introduced. The next section provides formulas for computing the coefficients of reliability in the case where the raters are randomly selected from a larger population, and in the case where they are considered to be the only ones of interest (fixed raters). Subsequently, hypothesis testing is described in terms of establishing an acceptable lower-bound level of reliability. Methods for confidence interval construction are also presented. The next section shows how to determine the sample size required to yield ample statistical power for the inference procedures. In the following section, definition and clinical interpretation of the SEM are presented. The concluding two sections include an example from a clinical study in which one therapist repeatedly measured subjects' knee joint angles using two different goniometers, and a summary of recommendations. Issues pertaining to study execution, such as observer bias, passive versus active joint angle measurements, end-digit preference (ie, observer tendency to read values that end with a particular digit),[4] and so forth are not addressed in this article, as they are discussed by other authors.[4,11-13]

Study Design and Model

If it is assumed that m repeated measurements are made by each of t raters on a random sample of n subjects, then the collected data conform to a repeated-measures design.[14,15] This design is not to be confused with a randomized block factorial design,[16] because a rater makes all m measurements on a subject before the next rater begins, nor with a split-plot design,[16] because only one group of n patients is appraised in the study.

Let [x.sub.ijk] denote the kth measurement taken by the jth rater on the ith patient, and let the data be summarized as in Table 1, where i = 1, 2, ..., n; j = 1, 2, ..., t; and k = 1, 2, ..., m. It is noted that the measurements [x.sub.ijk] arise from either a continuous interval or ratio scale. The repeated-measures model for reliability studies with equal numbers in the subclasses (ie, no missing data) will be defined as

[x.sub.ijk] = [mu] + [[pi].sub.i] + [[gamma].sub.j] + [([pi] [gamma]).sub.ij] + [[epsilon].sub.ijk]

where [mu] is the overall population mean of the measurements and [[pi].sub.i] and [[gamma].sub.j] are the subject and rater effects, respectively. The terms [([pi] [gamma]).sub.ij] and [[epsilon].sub.ijk] represent the interrater and intrarater random errors. The components [[pi].sub.i] and [[epsilon].sub.ijk] are assumed to vary normally with means of zero and variances of [[sigma].sub.S.sup.2] and [[sigma].sub.e.sup.2], respectively; they are independent of each other and of all other components in the model. If the rater effects are random, then the components [[gamma].sub.j] and [([pi] [gamma]).sub.ij] are assumed to vary normally with means of zero and variances of [[sigma].sub.R.sup.2] and [[sigma].sub.SR.sup.2], respectively; they are also independent of each other and of all other components.
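To make the data layout concrete, the following sketch simulates measurements from the random rater effects version of the model described above. The function name, argument names, and the numeric values in the usage lines are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import random


def simulate_measurements(n, t, m, mu, sd_S, sd_R, sd_SR, sd_e, seed=0):
    """Simulate x[i][j][k]: subject i, rater j, repeated measurement k.

    Each value is mu + subject effect + rater effect
    + subject-by-rater (interrater) error + within-rater (intrarater) error,
    with every random component drawn from a zero-mean normal distribution.
    """
    rng = random.Random(seed)
    subject = [rng.gauss(0.0, sd_S) for _ in range(n)]
    rater = [rng.gauss(0.0, sd_R) for _ in range(t)]
    data = []
    for i in range(n):
        row = []
        for j in range(t):
            interaction = rng.gauss(0.0, sd_SR)  # one draw per subject-rater pair
            row.append([
                mu + subject[i] + rater[j] + interaction + rng.gauss(0.0, sd_e)
                for _ in range(m)
            ])
        data.append(row)
    return data


# Hypothetical design: 20 subjects, 2 raters, 3 repeated measurements each.
x = simulate_measurements(n=20, t=2, m=3, mu=90.0,
                          sd_S=8.0, sd_R=2.0, sd_SR=1.5, sd_e=2.5)
print(len(x), len(x[0]), len(x[0][0]))
```

The nested-list layout `x[i][j][k]` mirrors the subscripts of [x.sub.ijk], so the same structure can be passed to any routine that computes the ANOVA quantities for a balanced design.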

However, if the rater effects are fixed, then the components [[gamma].sub.j] and [([pi] [gamma]).sub.ij] are further constrained to satisfy

[Mathematical Expression Omitted]

thereby inducing a negative covariance between interrater error effects on the same subject: cov{[([pi] [gamma]).sub.ij], [([pi] [gamma]).sub.ij']} = -[[sigma].sub.SR.sup.2]/t. Also, the component [([pi] [gamma]).sub.ij] is assumed to vary normally across subjects with mean zero and variance (t-1)[[sigma].sub.SR.sup.2]/t.

The analysis of variance (ANOVA) tables and estimates of variance components corresponding to the repeated-measures model for reliability studies are shown in Tables 2 and 3 for the cases of random and fixed rater effects, respectively. These tables are identical to those arising from two-way random and mixed effects models.[17] All ANOVA calculations for the observed mean squares are exactly the same in both the random and fixed rater effects cases. The difference arises in the expressions for the expected mean squares. Because estimates of the variance components are derived by equating the expected mean squares to the observed mean squares, the estimates of variance components are also different in the two tables. This difference will lead, subsequently, to different estimates of the reliability coefficients.
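Because Tables 2 and 3 are omitted from this copy, the calculation can only be sketched under stated assumptions. The code below computes the observed mean squares for a balanced layout and then equates them to the textbook expected mean squares of a two-way random effects model with replication, which the text says the random rater effects table matches. The function names and dictionary keys are hypothetical, and the fixed rater effects case is deliberately not covered.

```python
def anova_mean_squares(x):
    """Observed mean squares for a balanced layout x[i][j][k]
    (n subjects, t raters, m repeated measurements per subject-rater cell)."""
    n, t, m = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / m for j in range(t)] for i in range(n)]
    subj = [sum(cell[i]) / t for i in range(n)]
    rater = [sum(cell[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(subj) / n

    ss_s = t * m * sum((s - grand) ** 2 for s in subj)
    ss_r = n * m * sum((r - grand) ** 2 for r in rater)
    ss_sr = m * sum((cell[i][j] - subj[i] - rater[j] + grand) ** 2
                    for i in range(n) for j in range(t))
    ss_e = sum((x[i][j][k] - cell[i][j]) ** 2
               for i in range(n) for j in range(t) for k in range(m))
    return {
        "MS_S": ss_s / (n - 1),                 # subjects
        "MS_R": ss_r / (t - 1),                 # raters
        "MS_SR": ss_sr / ((n - 1) * (t - 1)),   # subject-by-rater interaction
        "MS_E": ss_e / (n * t * (m - 1)),       # residual (within cell)
        "n": n, "t": t, "m": m,
    }


def random_effects_components(ms):
    """Variance component estimates for RANDOM rater effects.

    Assumption: standard two-way random model expected mean squares,
      E[MS_S]  = var_e + m*var_SR + t*m*var_S
      E[MS_R]  = var_e + m*var_SR + n*m*var_R
      E[MS_SR] = var_e + m*var_SR
      E[MS_E]  = var_e
    Estimates can come out negative in small samples.
    """
    n, t, m = ms["n"], ms["t"], ms["m"]
    return {
        "var_e": ms["MS_E"],
        "var_SR": (ms["MS_SR"] - ms["MS_E"]) / m,
        "var_S": (ms["MS_S"] - ms["MS_SR"]) / (t * m),
        "var_R": (ms["MS_R"] - ms["MS_SR"]) / (n * m),
    }


# Tiny hypothetical data set: 2 subjects, 2 raters, 2 measurements each.
toy = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]
print(random_effects_components(anova_mean_squares(toy)))
```

The observed mean squares are the same whichever rater model is adopted; only the second function, which encodes the expected mean squares, would change for fixed rater effects.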

As discussed by Shrout and Fleiss,[6] the choice between ANOVA tables depends on whether the raters are representative of a larger population of raters or are considered to be the only ones of interest. If the results from the reliability study are to be generalized to other raters, then the random rater effects ANOVA table (Tab. 2) is appropriate for carrying out estimation and inferences. If the investigator applies the results only to the fixed set of t raters in the study, however, then a fixed rater effects ANOVA table (Tab. 3) is appropriate. It is noted that in the case of single measurements (m = 1), these ANOVA tables reduce to Shrout and Fleiss' cases 2 and 3, respectively.

[TABULAR DATA OMITTED]

Estimating the Interrater Reliability Coefficient

The coefficient of interrater reliability indicates the consistency of measurements and the extent to which the raters are interchangeable. It is an ICC defined in the strictest sense (ie, defined as the covariance between two measurements made by different raters on the same subject divided by the total variance). Adopting this rigorous definition, the coefficient will sometimes equal the "popularized" version of the ICC, regarded as the variance component due to subjects divided by the total variance, and sometimes not, as in the fixed rater effects case for interrater reliability. The formulas for the estimates corresponding to the two types of rater effects follow:

[Mathematical Expression Omitted]

Estimates of the variance components appearing in [[rho].sub.inter, random] and [[rho].sub.inter, fixed] are given in Tables 2 and 3, respectively. It is imperative to use the appropriate ANOVA table when computing a coefficient. Specifically, it is erroneous to substitute estimates of variance components from the random rater effects ANOVA table into the expression for a fixed rater effects reliability coefficient, and vice versa.
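The printed expressions for the two coefficients are omitted in this copy. The sketch below reconstructs them from the definition given in this section (covariance between two measurements made by different raters on the same subject, divided by the total variance) together with the model assumptions stated earlier. It is one reading of the text, not the authors' printed formulas, and the function and argument names are hypothetical.

```python
def icc_inter_random(var_S, var_R, var_SR, var_e):
    """Interrater ICC, random rater effects (reconstructed from the definition).

    Two measurements by different raters on the same subject share only the
    subject effect, so the covariance is var_S; the total variance is the sum
    of all four components.
    """
    return var_S / (var_S + var_R + var_SR + var_e)


def icc_inter_fixed(var_S, var_SR, var_e, t):
    """Interrater ICC, fixed rater effects (reconstructed from the definition).

    The text gives cov of the interaction terms as -var_SR / t and their
    variance as (t - 1) * var_SR / t, so
      covariance     = var_S - var_SR / t
      total variance = var_S + (t - 1) * var_SR / t + var_e
    """
    covariance = var_S - var_SR / t
    total = var_S + (t - 1) * var_SR / t + var_e
    return covariance / total


# Hypothetical variance components, for illustration only.
print(icc_inter_random(var_S=6.0, var_R=2.0, var_SR=1.0, var_e=1.0))
print(icc_inter_fixed(var_S=6.0, var_SR=2.0, var_e=1.0, t=2))
```

The fixed rater effects version illustrates the remark above: its numerator is not simply the subject variance component, so it differs from the "popularized" ICC. Each function should be fed variance component estimates from its own ANOVA table.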

Estimating the Intrarater Reliability Coefficient

The coefficient of intrarater reliability characterizes the consistency and reproducibility of the measurements and, like other correlation coefficients, is also defined as a covariance-variance ratio. The coefficients, corresponding to the two types of rater effects, are given as

[Mathematical Expression Omitted]

where the variance components are defined as before (Tabs. 2 and 3). Computed in this manner, the coefficients are regarded as estimates of the overall ("average") intrarater reliability across all raters. Although computable in both cases (random rater effects and fixed rater effects), the overall estimate is perhaps best suited for studies in which the raters are considered to be a representative sample from a larger population. It is unlikely that a researcher, in the fixed case, would wish to average together the reliability coefficients from two or more raters that are of interest individually.

In the case where the raters are of interest individually, it is preferable to compute separate estimates for each rater. Intrarater reliability coefficients for each rater are computed by partitioning the residual error [[sigma].sub.e.sup.2] into t components, [[sigma].sub.e1.sup.2], [[sigma].sub.e2.sup.2], ..., [[sigma].sub.et.sup.2] (representing the within-rater variability for each rater) and then constructing the coefficients in the usual manner:

[Mathematical Expression Omitted]
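The partitioning of the residual error into rater-specific components can be sketched as follows. The estimator used here, the pooled within-cell variance for each rater on n(m - 1) degrees of freedom, is an assumption chosen to agree with the degrees of freedom quoted later for the individual intrarater tests; the function name is hypothetical, and the coefficient formula itself is not reproduced because it is omitted in this copy.

```python
def within_rater_variances(x):
    """Estimate one within-rater error variance per rater from x[i][j][k].

    For rater j, squared deviations of each measurement from its own
    subject-rater cell mean are pooled over the n subjects and divided by
    n * (m - 1). In a balanced design the average of the t values equals
    the overall residual mean square.
    """
    n, t, m = len(x), len(x[0]), len(x[0][0])
    variances = []
    for j in range(t):
        ss = 0.0
        for i in range(n):
            cell_mean = sum(x[i][j]) / m
            ss += sum((value - cell_mean) ** 2 for value in x[i][j])
        variances.append(ss / (n * (m - 1)))
    return variances


# Hypothetical data: the second rater is visibly less repeatable than the first.
toy = [[[1, 3], [5, 9]], [[2, 4], [10, 14]]]
print(within_rater_variances(toy))
```

Comparing the per-rater values shows which rater contributes most to the pooled residual error, which is the information an overall coefficient averages away.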

Hypothesis Testing and Confidence Interval Construction

As indicated by Shrout and Fleiss,[6] tests of significance regarding the ICC can be carried out by calculating the F statistic and corresponding probability (P) value from an ANOVA table. Although statistically valid, this practice is not directly relevant in reliability studies because a significant test result simply provides reassurance, at a specified significance level, of some consistency or reliability. That is, rejection of the null hypothesis leads to the conclusion that the true population coefficient is greater than zero. This fact can usually be taken for granted in physical therapy research because many of the estimated coefficients reported in the literature are greater than 0.6.

A more pertinent test can be constructed by considering hypotheses of the form [H.sub.0]:[rho] [less than or equal to] [[rho].sub.0] versus [H.sub.1]:[rho] > [[rho].sub.0], where the criterion value [[rho].sub.0] is the minimum level of reliability that an investigator considers acceptable. Landis and Koch[18] have characterized values of reliability coefficients as follows: slight (0.0-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1.00). These divisions (benchmarks) may be of assistance in selecting [[rho].sub.0]. For example, an investigator who wishes to demonstrate that the measurements can be characterized by a "substantial" level of reliability, according to these guidelines, would test [H.sub.0]:[rho] [less than or equal to] 0.6 versus [H.sub.1]:[rho] > 0.6. Rejection of the null hypothesis would, in this case, result in a qualitative statement regarding the level of reliability.
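The benchmarks quoted above can be encoded as a small lookup, which is convenient when choosing a criterion value or labeling a confidence limit. This is only a sketch: the function name is hypothetical, and values falling between the printed bands (for example 0.205) are assigned to the higher band as a simplifying assumption.

```python
def landis_koch_label(coefficient):
    """Return the qualitative band quoted in the text for a coefficient in [0, 1]."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("only coefficients between 0 and 1 are covered by the quoted bands")
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"


# A criterion value of 0.6 is the upper edge of "moderate", so rejecting
# H0: rho <= 0.6 supports at least "substantial" reliability.
print(landis_koch_label(0.60), "|", landis_koch_label(0.61))
```

The usage line shows why 0.6 is the natural criterion value for demonstrating "substantial" reliability: it is the largest value that still falls in the band below.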

Construction of confidence intervals is recommended as a complement or alternative to hypothesis testing. Confidence intervals not only test all possible criterion values simultaneously, but also summarize the uncertainty (precision) in an estimated reliability coefficient by providing a range of values that is likely to cover the true population value. Reporting confidence limits is particularly important if the null hypothesis is not rejected, because it allows a more careful interpretation of the study results.

When using confidence intervals as an alternative to hypothesis testing, the null hypothesis (as specified earlier) is rejected at the significance level [alpha] (one-sided) whenever the criterion value is less than the lower limit of a 100(1 - [alpha])% one-sided lower-limit confidence interval (eg, a 95% interval when [alpha] = 0.05). It is noted that the one-sided lower-limit confidence interval will not extend above 1, because the reliability coefficient itself does not exceed 1.

Interrater Reliability - Hypothesis Testing [H.sub.0]:[rho] [less than or equal to] [[rho].sub.0] Versus [H.sub.1]:[rho] > [[rho].sub.0]

For both random and fixed rater effects cases, the interrater reliability test statistic is computed and tested for significance in an identical manner[19]:

[Mathematical Expression Omitted]

is compared to a critical value from the F distribution on (n - 1) and (n - 1) (t - 1) degrees of freedom at a specified level of significance.

Interrater Reliability - Confidence Interval Construction

Confidence interval construction is carried out differently in the two cases.

(i) Random rater effects. Following the work of Satterthwaite[20] and of Fleiss and Shrout,[21] the approximate 100(1-[alpha])% one-sided lower-limit confidence interval is given by

[Mathematical Expression Omitted]

where [F.sub.1] is the 100(1 - [alpha])th percentile point of the F distribution on (n - 1) and [[upsilon].sub.1] degrees of freedom. Letting [[rho].sub.r] = [[rho].sub.inter, random], then

[Mathematical Expression Omitted]

(ii) Fixed rater effects. The approximate 100(1 - [alpha])% one-sided lower-limit confidence interval is given by

[Mathematical Expression Omitted]

where [F.sub.2] is the 100(1 - [alpha])th percentile point of the F distribution on (n - 1) and [[upsilon].sub.2] degrees of freedom. Letting [[rho].sub.f] = [[rho].sub.inter, fixed],

[Mathematical Expression Omitted]

Intrarater Reliability

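Two test statistics and confidence intervals are provided for intrarater reliability, overall and individual:

(i) Overall - hypothesis testing [H.sub.0]:[rho] [less than or equal to] [[rho].sub.0] versus [H.sub.1]:[rho] > [[rho].sub.0]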

[Mathematical Expression Omitted]

The critical values are obtained from the F distribution on (n - 1) and n(m - 1) degrees of freedom.

(ii) Overall - confidence interval construction. The corresponding 100(1 - [alpha])% one-sided lower-limit confidence interval is given by

[Mathematical Expression Omitted]

where [F.sub.3] is the 100(1 - [alpha])th percentile point of the F distribution on (n - 1) and n(m - 1) degrees of freedom.

(iii) Individual - hypothesis testing [H.sub.0]:[rho] [less than or equal to] [[rho].sub.0] versus [H.sub.1]:[rho] > [[rho].sub.0]

[Mathematical Expression Omitted]

is tested against critical values from the F distribution on (n - 1) and n(m - 1) degrees of freedom.

(iv) Individual - confidence interval construction. The corresponding 100(1 - [alpha])% one-sided lower-limit confidence interval is given by

[Mathematical Expression Omitted]

where [F.sub.4] is the 100(1 - [alpha])th percentile point of the F distribution on (n - 1) and n(m - 1) degrees of freedom.

It is noted that the specified criterion value for hypothesis testing, [[rho].sub.0], need not be the same for testing both interrater and intrarater reliability. For example, one might choose the criterion value for interrater reliability to be 0.6, whereas for intrarater reliability, [[rho].sub.0] may be more stringently chosen to be 0.8.

Sample Size Estimation

An important consideration prior to conducting a reliability study is the number of subjects to be recruited. If too few subjects are recruited, then it may be difficult to demonstrate that the population reliability coefficient exceeds a specified criterion value (ie, it may be difficult to reject the null hypothesis), whereas too many will expend resources that could be placed elsewhere. In terms of point and interval estimation, too small a sample size will yield an imprecise estimate of the reliability coefficient, which is indicated by an excessively wide confidence interval. Therefore, it is essential to determine an appropriate sample size.

Donner and Eliasziw[22] have provided diagrams to help investigators determine the optimal number of subjects and measurements. Two graphs that are of relevance to physical therapy research have been reproduced in Figures 1 and 2. Originally developed for single reliability studies from a one-way random effects model, they can also serve as rough guides for designs assessing interrater and intrarater reliability concurrently. Because the statistical model for a concurrent design is of higher order than for a single study, the resulting sample sizes will be conservative. In other words, the contours shown in Figures 1 and 2 will slightly overestimate the number of subjects, raters, and measurements required to test a reliability hypothesis at the 5% significance level with 80% power.

Regarding concurrent designs, because it is generally recognized that interrater reliability is lower than intrarater reliability, it is suggested that the minimal acceptable levels (criterion values) for the hypothesis tests be selected to reflect this. For example, one might choose 0.6 for the interrater criterion value and 0.8 for the intrarater value. Using Landis and Koch's[18] benchmarks, these criterion values imply that one wishes to demonstrate at least "substantial" interrater and "almost perfect" intrarater reliability. If two raters are being evaluated in the interrater portion of the study, Figure 1 shows that approximately 35 subjects are required to yield 80% power for the hypothesis test when the "true" coefficient of reliability is 0.8. For the intrarater portion of the study, the number of consecutive measurements per rater to be taken on each subject is obtained from Figure 2. With 35 subjects and a "true" reliability of 0.9, the number of measurements per rater is approximately 3.

Estimating the Standard Error of Measurement

Another important aspect of a reliability study is to calculate the SEM, also known as measurement error. Estimating the SEM can aid a physical therapist in differentiating real clinical changes from irrelevant fluctuations. The SEM is expressed in the same units as the measurements and has comparable meaning to a standard deviation. In particular, the overall and individual intrarater SEMs summarize the variability inherent within the raters' own measurements. They are defined as [Mathematical Expression Omitted] and [Mathematical Expression Omitted], respectively. Corresponding to the two types of rater effects, the interrater SEMs are defined as [SEM.sub.inter, random] = [Mathematical Expression Omitted], respectively. It is noted that the interrater SEMs include both the variability among raters' measurements and the variability within raters' measurements. This form of the interrater SEM is much more clinically realistic than that obtained from a one-way random effects model[11 (p193)] because it reflects not only the disagreement among raters but also the imprecision with which the individual raters make their measurements. In contrast, an estimate of interrater SEM from the one-way model assumes a priori that the raters' measurements are made free of error.

Suppose it is of interest to determine whether an intervention has effected change upon a patient's joint. Let [x.sub.ij1] and [x.sub.ij2] be measurements taken by the same rater on a patient, preintervention and postintervention, respectively. The magnitude of change, [vertical bar][x.sub.ij1]-[x.sub.ij2][vertical bar] can be tested for significance using the test statistics

[Mathematical Expression Omitted]

depending on whether the rater is considered to be part of a larger population of raters or is specifically the only one of interest. In either case, the resulting test statistic is compared with critical values of the standard normal distribution. If the P value corresponding to the test statistic is small enough, then one can be quite certain that a real change has occurred, in excess of intrarater variability.

Instead of one rater taking measurements on both occasions, suppose a different rater now takes the postintervention measurement. Let [x.sub.i11] and [x.sub.i22] represent the measurements preintervention and postintervention, respectively, on the same patient. In this case, not only is the assessment of change affected by intrarater variability as before, but also by interrater variability. The magnitude of change, [vertical bar][x.sub.i11]-[x.sub.i22][vertical bar], is therefore tested for significance using the test statistics

[Mathematical Expression Omitted]

again depending on whether the raters represent a larger population or are the only ones of interest. Similarly, if the P value from the standard normal distribution corresponding to the test statistic is small enough, then one can be quite certain that a real change has occurred, in excess of both interrater and intrarater variability.

Equating the test statistic to a standard normal critical value leads to an alternate technique for appraising clinical change. In this case, the value [Z.sub.[alpha]/2] [Mathematical Expression Omitted] represents the minimum difference between preintervention and postintervention measurements that needs to be exceeded to be fairly certain that a real change has occurred. Thus, if [vertical bar][x.sub.i11]-[x.sub.i22][vertical bar] exceeds [Z.sub.[alpha]/2] [Mathematical Expression Omitted], then it is likely that change has occurred. The critical value, [Z.sub.[alpha]/2], is determined from the standard normal distribution and corresponds to the chance of mistakenly concluding that change has occurred when it has not (ie, chance of a Type I error). In practice, the Type I error is commonly chosen as 5%, and therefore [Z.sub.[alpha]/2] = 1.96 (corresponding to [alpha]=0.05). This alternate technique can also be applied in the first situation (ie, same rater, two or more measurements). The minimum difference to be exceeded is [Z.sub.[alpha]/2] [Mathematical Expression Omitted].
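The decision rule described above can be sketched in Python. This is a minimal sketch under a stated assumption, not the article's omitted expressions: the test statistic is taken to be the absolute difference divided by the square root of 2 times the relevant SEM (the intrarater SEM when one rater takes both measurements, the interrater SEM when two raters are involved), which is the form implied by the minimum-difference arithmetic in the Example section. The function names and the two knee angles are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def change_z_statistic(x_pre, x_post, sem):
    """Assumed form of the statistic: |x_pre - x_post| / (sqrt(2) * SEM)."""
    return abs(x_pre - x_post) / (SQRT2 * sem)

def two_sided_p_value(z):
    """Two-sided standard normal P value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / SQRT2)

def minimum_difference(sem, z_crit=1.96):
    """Difference that must be exceeded at the chosen Type I error rate."""
    return z_crit * SQRT2 * sem

if __name__ == "__main__":
    sem = 1.439                 # intergoniometer SEM (degrees) from the Example
    x_pre, x_post = 12.0, 7.0   # hypothetical measurements (degrees)
    z = change_z_statistic(x_pre, x_post, sem)
    print(f"z = {z:.2f}, P = {two_sided_p_value(z):.4f}")
    print(f"minimum difference = {minimum_difference(sem):.2f} degrees")
```

Comparing the observed difference with `minimum_difference(sem)` and comparing the P value with [alpha] are the same decision expressed two ways.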

Several authors[12,23] have also advocated the use of the SEM to determine "real" change. They proposed computing 95% confidence ranges about the measurement and pointed out that any subsequent measurement lying outside the range could be assumed to be "real" change. However, what these authors fail to take into account is that any subsequent measurement on a subject is also prone to measurement error. To propose computing an additional 95% confidence range about the second measurement and then inspecting their overlap is also erroneous. Even if the confidence ranges do overlap, there is still a statistical possibility that the preintervention and postintervention measurements could be different. It is therefore recommended that only the methods outlined be used to assess clinical change.

Example

A concurrent parallel-forms and test-retest study was conducted to assess the level of reliability between and within a large universal plastic manual goniometer and a Lamoreux-type electrogoniometer for measuring knee joint angle (in degrees) (SL Young and colleagues, unpublished research). The aim of the study was to measure the error inherent in the design of the instruments, while minimizing extraneous sources of error (eg, minimizing the error due to the therapist's ability to locate the body landmarks used to position the instruments).

Twenty-nine subjects (n=29) were measured three consecutive times (m=3) on each goniometer (t=2) at three joint positions. Data for only one of the three joint positions (ie, full passive extension) are used for this example. Inasmuch as the study is focused on goniometers rather than therapists, the terms "interrater" and "intrarater" will be supplanted by the terms "intergoniometer" and "intragoniometer" to facilitate clarity in presentation. Because the two goniometers studied are the variables of interest, they will be considered as fixed effects in the subsequent analysis.

The data, the BMDP-8V program[24] used to carry out the computations, the abridged BMDP-8V output, and the ANOVA table are presented in Tables 4 through 7, respectively, as paradigms to assist the readers in carrying out their own analyses. Although other statistical computer packages could have been used for the analyses, the BMDP-8V program has a definite advantage because it allows the user to specify which variables in the model are fixed effects and which are random effects. According to the model specifications, the BMDP-8V program systematically prints out appropriate estimates of the variance components that are then used to compute coefficients of reliability. It is noted that the partitioned residual errors are obtained from supplementary one-way ANOVAs added onto the tail end of the leading program. In this example, two goniometers are being evaluated and therefore two tailing programs are added. The computer code in Table 5 has been written so that all three programs are executed collectively. Elements used to construct the analysis of variance table (Tab. 7) and relevant variance components have been highlighted in Table 6 for ease of presentation. The following results numerically illustrate the methods presented in this article.
Table 5. BMDP-8V Program Used to Analyze Data From the Reliability Study

/PROBLEM   TITLE IS 'GONIOMETRIC STUDY - FULL PASSIVE
           EXTENSION - FIXED RATER EFFECTS'.
/INPUT     VARIABLES = 7.
           FORMAT = FREE.
           FILE = 'PHYSIO1.DAT'.
           CASES = 29.
/VARIABLE  NAMES = ID, MANUAL1, MANUAL2, MANUAL3,
                   ELECTRO1, ELECTRO2, ELECTRO3.
/DESIGN    DEPEND = MANUAL1, MANUAL2, MANUAL3,
                    ELECTRO1, ELECTRO2, ELECTRO3.
           LEVELS = 29,2,3.
           NAMES = SUBJECT, GONIOMETER, TIME.
           RANDOM = SUBJECT, TIME.
           FIXED = GONIOMETER.
           MODEL = 'S, G, T(SG)'.
/PRINT     LINESIZE = 80.
/END
/INPUT
/DESIGN    DEPEND = MANUAL1, MANUAL2, MANUAL3.
           LEVELS = 29,3.
           NAMES = SUBJECT, TIME.
           RANDOM = SUBJECT, TIME.
           MODEL = 'S, T(S)'.
/END
/INPUT
/DESIGN    DEPEND = ELECTRO1, ELECTRO2, ELECTRO3.
           LEVELS = 29,3.
           NAMES = SUBJECT, TIME.
           RANDOM = SUBJECT, TIME.
           MODEL = 'S, T(S)'.
/END
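
For readers without access to BMDP-8V, the mean squares produced by the leading program and its two tailing programs can be sketched in plain Python. This is only an illustrative sketch for a balanced design with no missing data. It is not a translation of the BMDP-8V output; the function name `reliability_anova` and the nested-list layout `x[i][j][k]` (subject, goniometer, repeated measurement) are hypothetical, and the demonstration data are simulated because the data of Table 4 are not reproduced here.

```python
import random

def reliability_anova(x):
    """Mean squares for a balanced subjects x raters design with replicates.

    x[i][j][k] is the k-th repeated measurement of subject i by rater j.
    """
    n, t, m = len(x), len(x[0]), len(x[0][0])
    grand = sum(v for s in x for r in s for v in r) / (n * t * m)
    subj_mean = [sum(sum(r) for r in s) / (t * m) for s in x]
    rater_mean = [sum(sum(x[i][j]) for i in range(n)) / (n * m) for j in range(t)]
    cell_mean = [[sum(x[i][j]) / m for j in range(t)] for i in range(n)]

    ss_s = t * m * sum((a - grand) ** 2 for a in subj_mean)
    ss_r = n * m * sum((b - grand) ** 2 for b in rater_mean)
    ss_sr = m * sum(
        (cell_mean[i][j] - subj_mean[i] - rater_mean[j] + grand) ** 2
        for i in range(n) for j in range(t)
    )
    # Within-cell residual sums of squares, kept separate for each rater
    # (the role played by the two tailing one-way programs).
    ss_e_by_rater = [
        sum((v - cell_mean[i][j]) ** 2 for i in range(n) for v in x[i][j])
        for j in range(t)
    ]
    return {
        "df": {"S": n - 1, "R": t - 1, "SR": (n - 1) * (t - 1), "e": n * t * (m - 1)},
        "MS_S": ss_s / (n - 1),
        "MS_R": ss_r / (t - 1),
        "MS_SR": ss_sr / ((n - 1) * (t - 1)),
        "MS_e": sum(ss_e_by_rater) / (n * t * (m - 1)),
        "MS_e_by_rater": [ss / (n * (m - 1)) for ss in ss_e_by_rater],
    }

if __name__ == "__main__":
    rng = random.Random(0)          # simulated data, not the study data
    n, t, m = 29, 2, 3
    sim = []
    for _ in range(n):
        true_angle = rng.gauss(0.0, 7.0)
        sim.append([[true_angle + 0.9 * j + rng.gauss(0.0, 1.0) for _ in range(m)]
                    for j in range(t)])
    result = reliability_anova(sim)
    print(result["df"])
    print(round(result["MS_S"], 3), round(result["MS_e"], 3))
```

With n=29, t=2, and m=3 the degrees of freedom come out as 28, 1, 28, and 116, with 58 for each goniometer's own residual, which matches the layout of Table 7.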


[TABULAR DATA OMITTED]
Table 7. Analysis of Variance for Full Passive Extension

Source of            Degrees of   Observed
Variation            Freedom      Mean Square

Subject                28         312.781
Rater                   1          84.144
Error (SxR)            28           4.501
Error (R)             116           0.856
  Error ([R.sub.1])    58           0.736
  Error ([R.sub.2])    58           0.977

  Estimates of Variance Components

   [[sigma].sub.S.sup.2]  = 51.987
   [[sigma].sub.R.sup.2]  =  0.915
   [[sigma].sub.SR.sup.2] =  1.215
   [[sigma].sub.e.sup.2]  =  0.856
   [[sigma].sub.e1.sup.2] =  0.736
   [[sigma].sub.e2.sup.2] =  0.977
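
The variance components in Table 7 can be recovered from the observed mean squares. The sketch below assumes method-of-moments expressions for a design with fixed raters; these expressions are an assumption rather than something quoted from the BMDP-8V output, but they reproduce the tabulated values to three decimals. The function and key names are hypothetical.

```python
def variance_components(ms_s, ms_r, ms_sr, ms_e, n, t, m):
    """Method-of-moments estimates, assuming fixed rater effects.

    n subjects, t raters, m repeated measurements per subject-rater cell.
    """
    return {
        "subject": (ms_s - ms_e) / (t * m),
        "rater": (ms_r - ms_sr) / (n * m),
        "subject_x_rater": (ms_sr - ms_e) / m,
        "error": ms_e,
    }

# Mean squares taken from Table 7 (full passive extension).
table7 = variance_components(
    ms_s=312.781, ms_r=84.144, ms_sr=4.501, ms_e=0.856, n=29, t=2, m=3
)
for name, value in table7.items():
    print(f"{name:16s} {value:7.3f}")
```

The two partitioned residual components need no further arithmetic: they are the residual mean squares of the goniometer-specific one-way analyses (0.736 and 0.977).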


Estimate of the Intergoniometer Reliability Coefficient

The estimate 0.961 is obtained by substituting appropriate estimated variance components into the fixed rater effects formula:

[Mathematical Expression Omitted]
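
Because the formula itself is omitted in this copy, the following Python sketch uses an assumed fixed-rater expression for the interrater coefficient. It is offered only because it reproduces the reported estimate of 0.961 from the Table 7 variance components; it should not be read as the article's exact formula. The function name is hypothetical.

```python
def interrater_icc_fixed(var_s, var_sr, var_e, t):
    """Assumed fixed-rater form of the interrater reliability coefficient.

    var_s: subject variance, var_sr: subject-by-rater variance,
    var_e: pooled residual variance, t: number of raters (goniometers).
    """
    shared = var_s - var_sr / t
    total = var_s + (t - 1) * var_sr / t + var_e
    return shared / total

# Variance components from Table 7.
rho_inter = interrater_icc_fixed(var_s=51.987, var_sr=1.215, var_e=0.856, t=2)
print(f"{rho_inter:.3f}")
```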

Hypothesis Test for the Intergoniometer Reliability Coefficient

The test for the hypothesis, [H.sub.0]:[rho]=0.6 versus [H.sub.1]:[rho]>0.6 results in calculating

[Mathematical Expression Omitted]

which yields a P value of <.001 from an F distribution on 28 and 28 degrees of freedom.

Ninety-five Percent One-Sided Lower-Limit Confidence Interval for the Intergoniometer Reliability Coefficient

A ninety-five percent confidence interval about the estimated coefficient is given by

[Mathematical Expression Omitted]

where [F.sub.2] = 1.697 is the 95th percentile point of the F distribution on 28 and 51 ([[upsilon].sub.2]) degrees of freedom.

Estimates of the Intragoniometer Reliability Coefficients

Estimated coefficients for the universal manual goniometer and Lamoreux-type electrogoniometer, [[rho].sub.1, fixed]=0.986 and [[rho].sub.2, fixed]=0.982, are computed as

[Mathematical Expression Omitted]
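
The omitted expressions cannot be recovered with certainty from the reported numbers alone. The Python sketch below therefore shows two candidate forms, both of which are assumptions: a fixed-rater form that adds a share of the subject-by-rater variance to the subject variance, and the simpler ratio of subject variance to subject plus residual variance. With the Table 7 components, both round to the reported 0.986 and 0.982, so this example illustrates the arithmetic rather than settling which form the article used. All function names are hypothetical.

```python
def intrarater_icc_fixed(var_s, var_sr, var_e_j, t):
    """Candidate 1 (assumed): fixed-rater form using rater j's own residual."""
    between = var_s + (t - 1) * var_sr / t
    return between / (between + var_e_j)

def intrarater_icc_simple(var_s, var_e_j):
    """Candidate 2 (assumed): subject variance over subject plus residual."""
    return var_s / (var_s + var_e_j)

# Variance components from Table 7; residuals are goniometer-specific.
residuals = {"manual": 0.736, "electro": 0.977}
for name, var_e_j in residuals.items():
    c1 = intrarater_icc_fixed(51.987, 1.215, var_e_j, t=2)
    c2 = intrarater_icc_simple(51.987, var_e_j)
    print(f"{name:8s} {c1:.3f} {c2:.3f}")
```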

Hypothesis Test for Intragoniometer Reliability Coefficients

Individual tests of significance (testing [H.sub.0]:[rho]=0.8 versus [H.sub.1]:[rho]>0.8 for [[rho].sub.1,fixed] and [[rho].sub.2,fixed]) may be carried out by comparing

[Mathematical Expression Omitted]

to critical values from an F distribution on 28 and 58 degrees of freedom, yielding P values of <.001 in both cases.

Ninety-five Percent One-Sided Lower-Limit Confidence Intervals About the Intragoniometer Reliability Coefficients

One-sided confidence intervals for the manual goniometer and the electrogoniometer are computed as

[Mathematical Expression Omitted]

where [F.sub.4] = 1.671 is the 95th percentile point of an F distribution on 28 and 58 degrees of freedom.

Standard Error of Measurement

An estimate of the intergoniometer SEM (ie, [Mathematical Expression Omitted]) is 1.439 degrees. From this value, the minimum difference that needs to be exceeded to be reasonably certain that a real change has occurred is calculated to be 4.0 degrees (ie, (1.96)([square root of 2])(1.439)=3.99). The individual intragoniometer SEMs are computed from the partitioned residual errors. It is found that the manual goniometer has an SEM (ie, [[sigma].sub.e1]) of 0.858 degrees, whereas the electrogoniometer's SEM (ie, [[sigma].sub.e2]) is 0.988 degrees. Likewise, the minimum differences are 2.4 and 2.7 degrees for the manual goniometer and the electrogoniometer, respectively.
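
The figures in this subsection can be checked with a few lines of Python. The sketch assumes that the fixed-effects intergoniometer SEM is the square root of the sum of the subject-by-rater and residual variance components; that expression is omitted in this copy, but it reproduces the reported 1.439 degrees. The intragoniometer SEMs are the square roots of the partitioned residual variances, as the text states. Function names are hypothetical.

```python
import math

def sem_inter_fixed(var_sr, var_e):
    """Assumed fixed-effects interrater SEM: sqrt(var_SR + var_e)."""
    return math.sqrt(var_sr + var_e)

def sem_intra(var_e_j):
    """Intrarater SEM for rater j: square root of its residual variance."""
    return math.sqrt(var_e_j)

def min_detectable_difference(sem, z_crit=1.96):
    """z * sqrt(2) * SEM, the difference that must be exceeded."""
    return z_crit * math.sqrt(2.0) * sem

# Variance components from Table 7 (degrees squared).
sems = {
    "intergoniometer": sem_inter_fixed(1.215, 0.856),
    "manual": sem_intra(0.736),
    "electrogoniometer": sem_intra(0.977),
}
for name, sem in sems.items():
    print(f"{name:18s} SEM = {sem:.3f}  min diff = {min_detectable_difference(sem):.1f}")
```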

Discussion and Recommendations

In this article, statistical methods based on a repeated-measures study design are recommended when both interrater and intrarater reliability are being assessed. The chosen design and statistical model closely reflect the way in which physical therapy researchers often collect their data, and also effectively increase the sample size available for estimating each coefficient. The design allows the raters to be a random sample from a larger population of raters or a fixed set and the measurements to be recorded on either a continuous interval or ratio scale. The statistical methods assume that there are no missing data (ie, each of the n subjects is measured exactly m times by exactly t raters) and that a rater completes his or her set of measurements on a subject before the next rater begins. Although estimates of reliability based on ICCs are commonly reported in the literature, tests of statistical significance, confidence interval construction, and estimation of the SEM are rarely carried out. Moreover, the common practice of testing the null hypothesis of zero reliability is not sufficient because a significant result only implies some reliability in the data. It is, therefore, recommended that significance tests against specified criterion values and construction of confidence intervals be carried out in conjunction with point estimation. It is also recommended that the SEM be correctly reported and interpreted to enable appraisal of rater variability and clinical change.

Finally, care must be taken in the interpretation of estimated reliability coefficients from different studies, and also in their application to different clinical populations. Because the estimates rely on the variability of the measurements in the study sample, they can only be interpreted within other populations that have similar attributes to the study's population. This dependence is clearly illustrated by expressing the reliability coefficient ([rho]) in terms of the SEM and the variance of the scores [var([x.sub.ijk])], as follows:

[rho] = 1 - [SEM.sup.2]/var([x.sub.ijk])

For example, assume that a particular goniometer is accurate to within [+ or -] 2 degrees (ie, SEM=2) in a population of disabled patients and in a population of healthy subjects. Consider next a test-retest reliability study carried out on a sample of patients from each population. For the sample consisting of disabled patients, the joint measurements are expected to be fairly heterogeneous because the patients would have varying degrees of disability. Assume that the variance of the scores is equal to 20. Measurements arising from the sample of healthy subjects, however, are expected to be more homogeneous because their range of motion is not likely to be impaired. Suppose the variance in this case is equal to 10. Using the expression relating SEM and var([x.sub.ijk]) to [rho], the intragoniometer reliability for patients with disability is computed as 0.8, whereas for the healthy subjects the intragoniometer reliability is 0.6. In the first study, one would report "substantial" reliability and praise the goniometer, yet report "moderate" reliability and denounce it in the other study, even though the goniometer's precision (SEM) has remained constant between the studies. To facilitate better interpretation of the reliability coefficients, it is recommended that the variability of the scores [var([x.sub.ijk])] be reported alongside the estimates of reliability and SEM.
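
The dependence of the reliability coefficient on sample heterogeneity can be illustrated in Python. The sketch assumes the relationship [rho] = 1 - SEM squared divided by the score variance, which is the form implied by the numbers in the example above (SEM of 2 with variances of 20 and 10 giving 0.8 and 0.6). The function name is hypothetical.

```python
def reliability_from_sem(sem, score_variance):
    """Reliability implied by a given SEM and total score variance."""
    return 1.0 - sem ** 2 / score_variance

sem = 2.0  # same instrument precision (degrees) in both populations
samples = {"disabled patients": 20.0, "healthy subjects": 10.0}
for label, variance in samples.items():
    rho = reliability_from_sem(sem, variance)
    print(f"{label:18s} var = {variance:4.1f}  rho = {rho:.1f}")
```

Holding the SEM fixed and halving the score variance lowers the coefficient from 0.8 to 0.6, which is why the score variance should accompany any reported coefficient.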

References

[1] Boone DC, Azen SP, Lin CM, et al. Reliability of goniometric measurements. Phys Ther. 1978;58:1355-1360.
[2] Clapper MP, Wolf SL. Comparison of the reliability of the Orthoranger and the standard goniometer for assessing active lower extremity range of motion. Phys Ther. 1988;68:214-218.
[3] Task Force on Standards for Measurement in Physical Therapy. Standards for tests and measurements in physical therapy practice. Phys Ther. 1991;71:589-622.
[4] Stratford P, Agostino V, Brazeau C, Gowitzke BA. Reliability of joint angle measurement: a discussion of methodology issues. Physiotherapy Canada. 1984;36:5-9.
[5] Bartko JJ, Carpenter WT. On the methods and theory of reliability. J Nerv Ment Dis. 1976;163:307-317.
[6] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
[7] Rheault W, Miller M, Nothnagel P, et al. Intertester reliability and concurrent validity of fluid-based and universal goniometers for active knee flexion. Phys Ther. 1988;68:1676-1678.
[8] Fleiss JL. The Design and Analysis of Clinical Experiments. New York, NY: John Wiley & Sons Inc; 1986:14-15.
[9] Watkins MA, Riddle DL, Lamb RL, Personius WJ. Reliability of goniometric measurements and visual estimates of knee range of motion obtained in a clinical setting. Phys Ther. 1991;71:90-97.
[10] Youdas JW, Carey JR, Garrett TR. Reliability of measurements of cervical spine range of motion: comparison of three methods. Phys Ther. 1991;71:98-106.
[11] Ghiselli EE, Campbell JP, Zedeck S. Measurement Theory for the Behavioral Sciences. San Francisco, Calif: WH Freeman and Co; 1981:183-228.
[12] Rothstein JM. Measurement in Physical Therapy. New York, NY: Churchill Livingstone Inc; 1985:1-55.
[13] Gajdosik RL, Bohannon RW. Clinical measurement of range of motion: review of goniometry emphasizing reliability and validity. Phys Ther. 1987;67:1867-1872.
[14] Myers JL, Well AD. Research Design and Statistical Analysis. New York, NY: HarperCollins Publishers; 1991:chap 8.
[15] Graybill FA. An Introduction to Linear Statistical Models, Volume 1. New York, NY: McGraw-Hill Inc; 1961:396-398.
[16] Kirk RE. Experimental Design: Procedures for the Behavioral Sciences. 2nd ed. Belmont, Calif: Brooks/Cole Publishing Co; 1982:441-443, 535-536.
[17] Neter J, Wasserman W, Kutner MH. Applied Linear Statistical Models: Regression, Analysis of Variance and Experimental Design. 2nd ed. Homewood, Ill: Richard D Irwin Inc; 1985:782-785.
[18] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
[19] Scheffe H. The Analysis of Variance. New York, NY: John Wiley & Sons Inc; 1959:227.
[20] Satterthwaite FE. An approximate distribution of estimates of variance components. Biometrics. 1946;2:110-114.
[21] Fleiss JL, Shrout PE. Approximate interval estimation for a certain intraclass correlation coefficient. Psychometrika. 1978;43:259-262.
[22] Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med. 1987;6:441-448.
[23] Harding B, Black T, Bruulsema A, et al. Reliability of a reciprocal test protocol performed on the kinetic communicator: an isokinetic test of knee extensor and flexor strength. J Orthop Sports Phys Ther. 1988;10:218-223.
[24] BMDP Statistical Software, 1990 Release. Berkeley, Calif: University of California Press; 1990.

M Eliasziw, PhD, is Assistant Professor, Department of Epidemiology and Biostatistics, The University of Western Ontario, London, Ontario, Canada N6A 5C1. Dr Eliasziw is also Research Scientist, The John P Robarts Research Institute, PO Box 5015, 100 Perth Dr, London, Ontario, Canada N6A 5K8. Address all correspondence to Dr Eliasziw.

SL Young, MA, PT, is Honorary Lecturer, Department of Physical Therapy, The University of Western Ontario. Ms Young was Assistant Director of Clinical Studies, Department of Physiotherapy, Victoria Hospital, 800 Commissioners Rd E, London, Ontario, Canada N6A 4G5, when this project was initiated.

MG Woodbury, PhD, PT, is Consultant, Research Department, Parkwood Hospital, 801 Commissioners Rd E, London, Ontario, Canada N8C 5J1, and Honorary Assistant Professor, Department of Physical Therapy, The University of Western Ontario.

K Fryday-Field, BSc, PT, is Senior Director of Corporate and Board Affairs, Victoria Hospital. Ms Fryday-Field was Assistant Director of Administration, Department of Physiotherapy, Victoria Hospital, when this project was initiated.

The data collection for the example was supported by a grant from the Ontario Ministry of Health. Dr Eliasziw's work was supported by an operating grant from the Natural Sciences and Engineering Research Council of Canada.
COPYRIGHT 1994 American Physical Therapy Association, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1994, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Fryday-Field, Karen
Publication:Physical Therapy
Date:Aug 1, 1994
Words:6330