
The reliability of hand held muscle testers with individuals with spinal cord injury.


This study was conducted to examine the intra- and interrater reliability of hand-held devices (HHDs) with individuals with spinal cord injury. The subjects in the study (N=24) were all military veterans who volunteered to participate. Each subject was tested bilaterally on three upper-body muscle groups (biceps, triceps, and wrist extension) using the Microfet (16) to ascertain muscular strength. Two different testers tested half the subjects. The strength scores obtained were analyzed to determine the reliability of single and dual examiners using a hand-held muscle tester, the Microfet (HHD). The results of the study indicate that the average reliability of each of the two testers was very high, ranging from .93 to .99. However, the interrater correlation coefficients were below acceptable limits and ranged from .21 to .84. This would indicate that using the scores of two different testers is not practical in a clinical setting. Based on this study and others, the authors suggest that utilizing a fixed lever arm instead of a tester's hand may improve both the interrater and intrarater reliability of the Microfet HHD.


The utilization of hand-held muscle testing devices (HHDs) for assessing strength in individuals with disabilities is a relatively new trend. It is an objective method of measuring muscle strength in both disabled and non-disabled populations. In the past ten years a number of studies have examined the reliability of HHDs (1,2,11,14,16).

Isometric strength testing is an important part of a clinician's assessment protocol to develop a training program from baseline functioning, while repeated testing at appropriate intervals provides information regarding the efficacy of the treatment program. Accountability is essential for clients, but is particularly important for individuals with disabilities where changes in strength can have a profound effect on performing functional tasks and the quality of life. In order to make appropriate decisions concerning treatment and the effectiveness of the training program, it is essential to determine if the assessment procedure is reliable and not subject to variability based on the tester or the test procedure.

Subjective manual muscle testing (MMT) has been used by therapists in clinical settings to assess muscle strength, using a rating scale (0-5) to note minimal or functional levels of strength. Unfortunately, this method is often unreliable and unable to detect small changes in strength, creating the need for a more quantitative procedure for testing weakened muscles (6).

With HHDs, as with manual muscle testing, the examiner selects one of two protocols to assess the muscle strength of the client: a "break test" or a "make test". In a "break test", the tester provides sufficient resistance to overcome the subject's maximal effort within one second. In contrast, a "make test" requires the tester to match the resistance provided by the client for a 3-5 second interval. While there is no agreement regarding which method is most appropriate, a study involving individuals with multiple sclerosis and spinal cord injury found greater chances of muscle soreness and injury with the "break test" (1-3). Moreover, because a "break test" requires the examiner to exert maximum pressure on the muscle being tested, the resultant strength scores are often higher than those obtained with the "make test". The increased strength scores seem to reflect a combination of the tester and the individual being tested. For these reasons, the current study, conducted at the VA Health Care System, Spinal Cord Injury Unit, La Jolla, California, employed the "make test" protocol.

Several studies have examined the reliability of HHDs with different types of individuals (1-3, 15). These studies have reported reliability coefficients ranging from -.31 to .99 for both "break" and "make" tests. The reasons for the variability include the strength of the examiner, the type of individual being tested, the number of trials administered, the use of a consistent and clearly defined testing protocol, and whether a "make" or "break" test was used (3,5,11,12). Additionally, the reliability of HHDs with different examiners has been a topic of concern. This study was undertaken to examine the intra- and interrater reliability of HHDs using the "make test" with individuals with spinal cord injury.



Twenty-five individuals with spinal cord injury (11 paraplegics; 14 tetraplegics) participated in this study. The level of injury ranged from C-2 to L-4. All subjects were able to hold the initial starting position, including tetraplegics with lesions above C-5/6. The utilization of the make test as opposed to a break test eliminated any problem with spasticity. The mean age of the subjects was 52 ± 16 years and ranged from 25 to 83 years. Average time since injury was 13 ± 10 years, with a range of 1 to 34 years. Nineteen subjects were right-handed and six were left-handed. All subjects were at least one year post-injury and were either at the VA Medical Center for their annual review or exercising in the VA Spinal Cord Injury unit. Approval for the use of human subjects was obtained from the local Institutional Review Boards.


Tester Experience. Two testers were employed for data collection. Tester 1 (university professor, PhD) had extensive experience in the use of HHDs for assessing muscular strength in individuals with physical disabilities. Tester 2 (a physical therapist at the VA Medical Center) had extensive experience with subjective manual muscle testing, but no experience with HHDs. Prior to any formal data collection, the two testers standardized a test protocol using existing research studies and pilot testing.

Prior to testing, all procedures were explained to each subject. After being allowed to ask any questions, each subject signed an informed consent agreement. Demographic information was obtained, including date of birth, age, onset of injury, hand preference, and level of exercise.

Testing Protocol. Upper-body strength was assessed using biceps, triceps, and wrist extension tests. For each test, the subject was shown the testing position and the testing procedure was explained. One practice trial and three test trials were given for each muscle group and for both preferred and nonpreferred hands. Both testers assessed half the subjects on different days, so all subjects were tested twice using identical procedures. The second testing session was conducted within two weeks of the first testing session.

Statistical Analysis

Descriptive statistics, including means and standard deviations, were calculated for all variables. A one-way repeated measures analysis of variance was conducted to determine whether any significant differences occurred across the three trials for each muscle group within each tester. Intraclass correlation coefficients (and 95% confidence intervals) obtained from this analysis of variance were used to examine the reliability of each tester for each day and across days. The root mean square error was also calculated for each reliability coefficient to provide descriptive data regarding the error associated with each measurement.
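The article does not state which intraclass correlation model or which RMSE definition was used, so the following is only a minimal sketch under stated assumptions: a subjects-by-trials analysis of variance, a consistency-type coefficient for a single trial and for the average of k trials, and RMSE taken as the square root of the error mean square. The function name and the scores are hypothetical and are not data from this study.

```python
from math import sqrt

def repeated_measures_icc(scores):
    """scores: one row per subject, one column per trial (force in lb).

    Returns (average-of-k reliability, single-trial reliability, RMSE).
    Assumes a subjects x trials ANOVA; RMSE = sqrt(error mean square).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of trials per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    trial_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, trials, and error.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_trial = n * sum((m - grand) ** 2 for m in trial_means)
    ss_error = ss_total - ss_subj - ss_trial

    ms_subj = ss_subj / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_avg = (ms_subj - ms_error) / ms_subj
    icc_single = (ms_subj - ms_error) / (ms_subj + (k - 1) * ms_error)
    return icc_avg, icc_single, sqrt(ms_error)

# Hypothetical scores for five subjects, three trials each (not study data).
hypothetical_scores = [
    [46.0, 48.5, 47.0],
    [30.5, 31.0, 29.5],
    [52.0, 55.0, 54.5],
    [25.0, 24.0, 26.5],
    [38.0, 39.5, 40.0],
]
icc_avg, icc_single, rmse = repeated_measures_icc(hypothetical_scores)
print(f"average R = {icc_avg:.3f}, single-trial R = {icc_single:.3f}, RMSE = {rmse:.2f} lb")
```

Reporting the RMSE next to the coefficient matters because a wide spread of strength between subjects can push the coefficient toward 1 even when the trial-to-trial error in pounds is not small.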


Table 1 provides the means and standard deviations for the average of three trials of HHD for the three muscle groups tested in this study. The standard deviations indicate the heterogeneous nature of the sample, which was not unexpected. One-way analysis of variance with repeated measures revealed the following. For Tester 1 there were no significant differences for left biceps, left and right triceps, and left and right wrist extension. However, there were significant differences (F(2,48) = 10.43; p = .001) for the right biceps, with the first trial being significantly lower than trials 2 and 3. For Tester 2 there were no significant differences across trials for any of the tested muscles. For this reason reliability was examined across the three trials for each tester, and the average of the trials was used to examine intertester reliability.

Intratester Reliability. Table 2 provides the reliability coefficients (and 95% CI) for the average of three trials, as well as for a single trial. Average reliability coefficients (R's) ranged from .93 to .99 for Tester 1 and .96 to .99 for Tester 2. Lower-bound 95% confidence intervals ranged from R = .87 to R = .97 for Tester 1 and R = .91 to R = .98 for Tester 2. Single-trial reliability was slightly lower, but still acceptable in the majority of cases.
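The article does not say how the average and single-trial coefficients were related. As an illustration only, the Tester 1 values in Table 2 appear consistent, to within rounding, with the Spearman-Brown formula for the mean of k = 3 trials. The sketch below applies that formula to the published single-trial values. The formula is an assumption brought in here, not a method the authors report.

```python
def spearman_brown(single_trial_r, k):
    """Project the reliability of the mean of k trials from single-trial reliability."""
    return k * single_trial_r / (1 + (k - 1) * single_trial_r)

# (muscle, single-trial R, reported average R) for Tester 1, copied from Table 2.
tester1 = [
    ("Left Biceps", 0.82, 0.93),
    ("Right Biceps", 0.96, 0.99),
    ("Left Triceps", 0.94, 0.98),
    ("Right Triceps", 0.88, 0.96),
    ("Left Wrist Extensors", 0.96, 0.98),
    ("Right Wrist Extensors", 0.94, 0.98),
]

for muscle, single_r, reported_avg in tester1:
    projected = spearman_brown(single_r, k=3)
    print(f"{muscle}: projected {projected:.3f}, reported {reported_avg:.2f}")
```

Read this way, the gap between the two columns shows how much of the reported reliability depends on averaging three trials rather than relying on one.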

Because reliability can be inflated in heterogeneous samples, the root mean square error (RMSE) was also calculated to determine if the measurement error was tolerable. Results indicated RMSE ranging from 2.97 lb. to 5.39 lb. for Tester 1 and 1.72 lb. to 3.15 lb. for Tester 2.

Intertester Reliability. In order to make meaningful comparisons between testers, only the individuals who were tested by both Tester 1 and Tester 2 were included in the analyses. Table 3 provides the means and standard deviations for all tested muscle groups. There was a significant difference between testers for left wrist extension, with Tester 1 obtaining significantly higher strength scores than Tester 2 (28.04 ± 13.39 vs. 23.26 ± 10.00). There were also very large differences between testers for the left and right biceps. While these differences were not found to be statistically significant due to the large standard deviations and resultant loss of statistical power, their magnitude could have clinical significance.

Table 4 provides the reliability coefficients and 95% CI. Average reliability coefficients (R's) ranged from .21 to .89. In all cases, when the lower bound 95% CI was considered, these coefficients were not acceptable. The unacceptability of the reliability coefficients was further exemplified by the very high RMSE, which ranged from 5.70 lb. for the left triceps to 13.91 lb. for the right biceps.
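To put these RMSE values in context, the short sketch below expresses the two reported intertester RMSEs as a percentage of the mean strength for the same muscle, using the Table 3 means. Averaging the two testers' means as the reference value is an assumption made for this illustration, and the function name is hypothetical.

```python
def relative_error_percent(rmse_lb, mean_tester1_lb, mean_tester2_lb):
    """RMSE as a percentage of the average of the two testers' mean scores."""
    reference_mean = (mean_tester1_lb + mean_tester2_lb) / 2
    return 100 * rmse_lb / reference_mean

# RMSE values from the text; means from Table 3.
left_triceps = relative_error_percent(5.70, 23.88, 26.33)
right_biceps = relative_error_percent(13.91, 43.23, 34.97)

print(f"Left triceps: {left_triceps:.1f}% of mean strength")
print(f"Right biceps: {right_biceps:.1f}% of mean strength")
```

On this reading the intertester error is roughly 23% of mean strength for the left triceps and roughly 36% for the right biceps.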


The purpose of this study was to examine the intratester and intertester reliability of HHD in selected upper-body muscles. Results indicated very high reliability for each tester across all tested muscle groups, even when the lower bound 95% CI was considered. These results are consistent with other studies using single testers (1,2,8). This study is different as correlations were also calculated between testers.

No other studies have reported the RMSE associated with the reliability coefficients, an important statistic for understanding the quality of the measurement. In this study the RMSE ranged from 2.97 lb. to 5.39 lb. for Tester 1 and 1.72 lb. to 3.15 lb. for Tester 2. For muscle groups that are less strong (e.g., triceps, wrist extension), error of this magnitude can result in exercise prescriptions that may be inappropriate for clients. Thus, it is important that researchers view reliability within the context of the variability of the sample. Variability of response will likely always be quite large in samples of individuals with disabilities, making it essential that tester training and adherence to protocol receive the utmost attention if the RMSE is to be reduced.

Intertester reliability was very low and unacceptable when the lower bound 95% CI was considered. Moreover, the RMSE was very high, ranging from 5.7 to 13.91 lb. Other studies have also found low intertester reliability (1,2). These results are quite troubling in view of the typical way data are collected in a clinical setting. It is likely that clients are tested by a number of clinicians. If the results of the current study are verified by other studies employing larger samples, this study calls into question the validity of these strength assessments and their usefulness in training programs as measures of progress.

This study was intended to provide the clinician in the field with some useful information for the use of HHDs in clinical settings. The results of this investigation and previous research conducted by Aufsesser, Bohannon, Horvat and Croce (1-11,13-15,17-21,25) all indicate that HHDs can be reliable under proper procedures. These include the tester being familiar and experienced with the device, the tester having the necessary strength for the client population, and the use of a make test protocol. In addition, the data support replicating the protocol used in this study with one major change. The variability found in this investigation in the interrater reliability was most likely caused by differences in the testers' strength and the movement of the tester's arm during data collection. The authors recommend that the HHD, instead of being held by the tester, be attached to a fixed lever arm that would not move during testing. This would ensure that the strength scores collected were due to the effort of the subject and not affected by the tester's strength or reaction to the pressure. Finally, it should be noted that the individual reliability scores of the two testers were excellent and give credence to the utilization of HHDs using single testers.


(1.) Aufsesser, P. M. Objective muscle testing of individuals with multiple sclerosis. (Graduate Division Research Grant). Unpublished manuscript, San Diego State University, 1992.

(2.) Aufsesser, P.M., and Horvat, M. The reliability of objective manual muscle testing devices. Res. Q. Exerc. Sport 65:98-99, 1994.

(3.) Aufsesser, P. M., M. Horvat, and R. Croce. A critical examination of selected hand-held dynamometers to assess isometric muscle strength. Adapted Phys Activity Q. 13:153-165, 1996.

(4.) Bohannon, R.W. Manual muscle testing scores and dynamometer test scores of knee extension strength. Arch. Phys. Med. Rehabil. 67:360-392, 1986.

(5.) Bohannon, R.W. Make and break tests of elbow flexor muscle strength. Phys. Ther. 68:193-194, 1988.

(6.) Bohannon, R.W. Testing isometric limb muscle strength with dynamometers. Crit. Rev. Phys. Med. Rehabil. 2:75-86, 1990.

(7.) Bohannon, R.W. Biomedical applications of hand held force gauges: A bibliography. Percept. Motor Skills. 77:235-242, 1993.

(8.) Bohannon, R.W., and A. W. Andrews. Inter-rater reliability of hand-held dynamometers. Phys. Ther. 67:931-933, 1987.

(9.) Bohannon, R.W., and W. Saunders. Hand held dynamometer: Single trial may be adequate for measuring muscle strength in healthy individuals. Physiother. Can. 2:6-9, 1990.

(10.) Bohannon, R.W., and J. B. Wikholm. Measurement of knee extension force obtained by two examiners of substantial different experience with a hand held dynamometer. Isokinetics Exerc. Sci. 2:5-8, 1992.

(11.) Bohannon, R.W. Internal consistency of manual muscle testing scores. Percept Motor Skills 85:736-738, 1997.

(12.) Brinkman, J. R. Comparison of hand-held dynamometer in measuring strength with patients with neuromuscular disease. J. Orthop. Sports Phys. Ther. 19:100-104, 1994.

(13.) Croce, R., and M. Horvat. Effects of reinforcement based exercise on fitness and work productivity in adults with mental retardation. Adapted Phys. Activity Q. 9:148-178, 1992.

(14.) Dawson, C., R. Croce, T. Quinn, and N. Vroman. Reliability of Nicholas Manual Muscle Tester on upper body strength in children ages 8-10. Pediatr. Exerc. Sci. 4:340-350, 1992.

(15.) Hill, C., R. Croce, F. Cleland, and J. Miller. Muscle torque relationship between hand-held dynamometry and isometric measurement in children ages 9 to 11. J. Strength Conditioning Res. 10:7-82, 1996.

(16.) Hoggan Health Industries. Force Evaluation and Testing System. Draper, UT: Hoggan Health Industries, 1986.

(17.) Horvat, M. Comparison of contraction periods to assess isometric muscle strength in elementary school girls. Isokinetics Exerc. Sci. 5:15-18, 1995.

(18.) Horvat, M., R. Croce, and G. Roswal. Magnitude and reliability of measurements of muscle strength across trials for individuals with mental retardation. Percept. Motor Skills 7:643-649, 1993.

(19.) Horvat, M., R. Croce, and G. Roswal. Intra-tester reliability of the Nicholas Manual Muscle Tester with intellectual disabilities by a tester having minimal experience. Arch. Phys. Med. Rehabil. 75:808-811, 1994.

(20.) Horvat, M., R. Croce, G. Roswal, and F. Seagraves. Utilization of a single trial versus maximal or mean values for evaluation upper body strength in individuals with mental retardation. Adapted Phys. Activity Q. 12:52-59, 1995.

(21.) Horvat, M., B. McManis, and F. Seagraves. Reliability and objectivity of the Nicholas Manual Muscle Tester with children. Isokinetics Exerc. Sci. 2:175-181, 1992.

(22.) Lovett, R.W. The treatment of infantile paralysis. 2nd ed. Philadelphia: P. Blakiston's Son, 1917.

(23.) McAndrews, J. M., and C. Lewis. Hand Held Dynamometry. Great Seminars and Books Inc., Washington, DC, 1993.

(24.) Nicholas, J. A., M. W. Marino, and G. W. Gleim. Characteristic of a strength measurement device. In: Proc. Orthop. Res. Soc. 33rd Annu. Meet. San Francisco, CA, Jan. 19-22, 1987.

(25.) Seagraves, F., and M. Horvat. Comparison of isometric test procedures to assess muscular strength in elementary school girls. Pediatr. Exerc. Sci. 7:61-68, 1995.

(26.) Stuberg, W. A., and W. K. Metcalf. Reliability of quantitative muscle testing in healthy children and in children with Duchenne Muscular dystrophy using a hand-held dynamometer. Phys. Ther. 68:977-982, 1988.

(27.) Surburg, P. R., R. Suomi, and W. K. Poppy. Validity and reliability of a hand held dynamometer applied to adults with mental retardation. Arch. Phys. Med. Rehabil. 73:535-539, 1992.

(28.) Surburg, P. R., R. Suomi, and W. K. Poppy. Validity and reliability of a hand held dynamometer with two populations. J. Orthop. Sports Phys. Ther. 16:1342-1347, 1992.

Peter M. Aufsesser (1), Michael Horvat (2), and Ruth Austin (3)

(1) San Diego State University, (2) University of Georgia, Athens, Georgia, (3) VA Health Care System, Spinal Cord Injury Unit, La Jolla, California

Contact Information:

Peter M. Aufsesser, Director

Fitness Clinic for Individuals with Disabilities

San Diego State University

San Diego, California 92182-7251

(619) 594-1917

(619) 594-6553 (fax)

Table 1. Means and standard deviations for intratester comparison

Muscle                        Tester 1          Tester 2

Left Biceps (lb)              46.79 ± 11.91     37.92 ± 8.23
Right Biceps (lb)             46.20 ± 14.70     34.97 ± 9.37
Left Triceps (lb)             26.28 ± 11.90     26.33 ± 12.51
Right Triceps (lb)            30.74 ± 9.41      27.21 ± 14.09
Left Wrist Extension (lb)     32.80 ± 13.55     23.26 ± 10.00
Right Wrist Extension (lb)    31.39 ± 11.99     23.05 ± 10.52

Note. N = 22-25 for Tester 1. N = 11-13 for Tester 2.

Table 2. Intratester reliability coefficients (R) and 95% confidence intervals

Tester 1

Muscle                    Average R (95% CI)     Single-trial R (95% CI)

Left Biceps               .93 (.87 to .97)       .82 (.69 to .91)
Right Biceps              .99 (.97 to .99)       .96 (.92 to .98)
Left Triceps              .98 (.96 to .99)       .94 (.88 to .97)
Right Triceps             .96 (.91 to .98)       .88 (.78 to .94)
Left Wrist Extensors      .98 (.97 to .99)       .96 (.91 to .98)
Right Wrist Extensors     .98 (.96 to .99)       .94 (.89 to .97)

Tester 2

Muscle                    Average R (95% CI)     Single-trial R (95% CI)

Left Biceps               .98 (.95 to .99)       .95 (.87 to .98)
Right Biceps              .96 (.91 to .99)       .90 (.76 to .96)
Left Triceps              .99 (.98 to .99)       .97 (.93 to .99)
Right Triceps             .99 (.98 to .99)       .97 (.92 to .99)
Left Wrist Extensors      .99 (.98 to .99)       .97 (.93 to .99)
Right Wrist Extensors     .99 (.97 to .99)       .97 (.92 to .99)

Table 3. Means and standard deviations for intertester comparison

Muscle                        Tester 1            Tester 2

Left Biceps (lb)              43.56 ± 12.78       37.92 ± 8.23
Right Biceps (lb)             43.23 ± 18.12       34.97 ± 9.37
Left Triceps (lb)             23.88 ± 12.99       26.33 ± 12.51
Right Triceps (lb)            31.27 ± 10.98       29.57 ± 12.36
Left Wrist Extensors (lb)     28.04 ± 13.39 *     23.26 ± 10.00
Right Wrist Extensors (lb)    27.31 ± 11.99       23.05 ± 10.52

Note. N = 13 for left and right biceps and left and right wrist extensors; N = 11 for left triceps; N = 10 for right triceps.

* Significant difference between Tester 1 and Tester 2 at p = .05

Table 4. Intertester reliability (R) coefficients and 95% confidence intervals

Muscle                    Average Reliability (95% CI)     Single-Trial Reliability (95% CI)

Left Biceps               .36 (0 to .80)                   .22 (0 to .67)
Right Biceps              .21 (0 to .75)                   .11 (0 to .61)
Left Triceps              .89 (.61 to .97)                 .80 (.44 to .94)
Right Triceps             .74 (.03 to .94)                 .59 (.02 to .88)
Left Wrist Extensors      .84 (.51 to .95)                 .73 (.34 to .91)
Right Wrist Extensors     .84 (.48 to .94)                 .72 (.32 to .90)

COPYRIGHT 2003 American Kinesiotherapy Association
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Aufsesser, Peter M.; Horvat, Michael; Austin, Ruth
Publication:Clinical Kinesiology: Journal of the American Kinesiotherapy Association
Article Type:Clinical report
Geographic Code:1USA
Date:Dec 22, 2003