Training users in the Gross Motor Function Measure: methodological and practice issues.


Development of a clinical test is a complex and time-consuming process. Researchers usually spend the majority of their effort to establish that the test is reliable (consistent) and valid (measuring what it is supposed to measure). After the test has been published, there is often little time, money, or energy left to evaluate how to teach others to use the test in an appropriate manner, let alone to assess the impact of the training on new users. Therapists should, however, know whether they are using tests in a manner that produces reliable measurements so that they can have confidence in their ability to attribute a change in score to changes in patient function rather than to measurement error.[1]

All measurements are subject to several sources of variation that can affect the reliability of the measure, including factors within the examinee (subject of the assessment), the examination (or test), the examiner (user), and the environment (context).[2] Some important variables of the examinee are age, functional activity level, and degree of disability. The length of the assessment and the clarity of the administration guidelines are factors that may vary in the examination. Factors associated with the environment include the test setting (eg, room), temperature, and time of day. Other factors thought to be less controllable, but relevant, are patient compliance, age and background experience of the examiner, the examiner's familiarity with the examinee, and the method of assessment (direct contact or analysis of videotaped activities). One major controllable source of variation is the training of potential test users in the background, concepts, and application of the test.

Rothstein[3] states that when evaluating tests for clinical use, it is important to consider population-specific reliability for the particular group being measured and for the type of people administering the measures. It is useful to know the type of patients used in the reliability studies (and whether they are similar to the subjects who will be assessed) and the level of training of the examiners. It is important to know whether the examiners were a sample of typical therapists or whether they were part of the team developing the measure and therefore probably more expert in its administration and scoring. For reliability to be generalizable, reliability testing needs to be conducted with people thought to be typical of the users of the test. When considering whether to incorporate a new test for research or clinical practice, it is important to determine whether the test manual provides advice or, preferably, evidence about the best methods for training.

The Standards for Tests and Measurements in Physical Therapy Practice require primary test purveyors or test developers to include in a test manual

... descriptions of the qualifications and competencies needed by test users. These descriptions should include statements regarding potential consequences of unqualified users administering the test.[1(p598)]

Test manuals should also "...describe how potential test users can obtain the competencies necessary to administer the tests."[1(p598)] Among the standards for clinicians using tests is the following:

Test users must be able to determine before they use a test whether they have the ability to administer that test ... based on an understanding of the test user's own skills and knowledge (competency) as compared with the competencies described by the test purveyor.[1(p614)]

Stengel[4] reviews a number of tests for assessing motor development in nonnewborn children for the tests' reliability, validity, and usefulness to clinicians. He chose the tests because they are comprehensive, familiar to investigators studying the management of children with neurologic dysfunction, and readily available. These tests include the Bayley Scales of Infant Development,[5] Bruininks-Oseretsky Test of Motor Proficiency,[6] Peabody Developmental Motor Scales,[7] Miller Assessment for Preschoolers,[8] Pediatric Evaluation of Disability Inventory,[9] and Manual of Developmental Diagnosis.[10] Stengel states that the tests are "... fairly easy to learn to administer without the need for extensive special instruction,"[4(p47)] but he presents no evidence to support this contention.

To determine how well test manuals addressed the issue of training, the manuals from the nonnewborn pediatric tests identified by Stengel[4] were reviewed by the primary author (DJR). The results of this review are presented in the Table. Most of the manuals recommend that therapists have experience with children, experience with standardized testing procedures, and practice as important factors in learning their measures, but few manuals actually explain how to obtain the necessary skills to ensure competency. The Pediatric Evaluation of Disability Inventory[9] manual has the most detailed description of procedures for training. The authors of the manual advocate attending a training workshop; however, they also suggest methods of training with an experienced examiner. Overall, the authors recommend "high agreement" with an experienced examiner, but they do not specify a particular level of reliability. Case scenarios are also provided, which can be scored by test users to evaluate their reliability compared with the authors' scoring. Although formal training may be available for some of the measures reviewed, information on training is not presented in the Table if this information was not included in the test manual. We are not aware that any authors have evaluated the effects of training.

The Gross Motor Function Measure (GMFM) was developed for use by pediatric physical therapists as an evaluative measure for assessing change over time in gross motor function of children with cerebral palsy. The GMFM is an 88-item, criterion-based observational measure that assesses motor function in five "dimensions": (1) lying and rolling; (2) sitting; (3) crawling and kneeling; (4) standing; and (5) walking, running, and jumping. Each item is scored on a four-point scale (0=does not initiate activity, 1=initiates activity, 2=partially completes activity, and 3=completes activity). Specific descriptions for how to score each item are found in the administration and scoring guidelines contained within the test manual,[11] which is available from the primary author (DJR). The results of the initial validation work on the GMFM have been published.[12]
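The four-point item scale described above can be sketched in code. The following is a minimal illustration only: the function name `percent_score` and the idea of expressing a set of item scores as a percentage of the maximum possible are assumptions made for this example, not a scoring rule stated in the text, and the item scores are hypothetical.

```python
VALID_SCORES = {0, 1, 2, 3}  # the four-point item scale described in the text
MAX_ITEM_SCORE = 3


def percent_score(item_scores):
    """Return the sum of item scores as a percentage of the maximum possible.

    Assumption for illustration: a group of items is summarized as
    (sum of scores) / (3 * number of items) * 100.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if score not in VALID_SCORES:
            raise ValueError(f"invalid item score: {score!r}")
    return 100.0 * sum(item_scores) / (MAX_ITEM_SCORE * len(item_scores))


# Hypothetical scores for five items, not data from the study.
example_scores = [3, 3, 2, 1, 0]
print(round(percent_score(example_scores), 1))  # 60.0
```

Validating each score against the four allowed values mirrors the criterion-based nature of the scale: a value outside 0 to 3 is a recording error rather than a low or high score.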

In the original GMFM validation study, reliability of administering and scoring the GMFM over two occasions was assessed. A small number of developmental pediatric physical therapists familiar with the development of the GMFM and trained in the use of the measure completed interrater (n=11) and intrarater (n=10) reliability testing on a sample of children with cerebral palsy. These children represented a spectrum of ages, diagnostic types, and severities. Using intraclass correlation coefficients (ICC[2,1]),[13] reliability estimates were calculated for each dimension as well as for total scores, and these values varied from .87 to .99.
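For readers unfamiliar with the notation, the expression below gives the usual form of ICC(2,1), assuming the label follows the Shrout and Fleiss convention (two-way random effects, single rater, absolute agreement). The symbols are introduced here for illustration: BMS is the between-subjects mean square, JMS the between-raters mean square, EMS the residual mean square, k the number of raters, and n the number of subjects.

```latex
\mathrm{ICC}(2,1) =
  \frac{\mathrm{BMS} - \mathrm{EMS}}
       {\mathrm{BMS} + (k - 1)\,\mathrm{EMS} + \dfrac{k\,(\mathrm{JMS} - \mathrm{EMS})}{n}}
```

Because the between-raters term appears in the denominator, systematic differences between raters lower this coefficient, which is why it suits questions about whether different therapists assign the same scores.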

Following minor revisions to the items and guidelines, a second reliability study using a balanced incomplete block design was completed by 16 developmental pediatric therapists using the original and the revised guidelines.[11] The therapists involved in this study were not regularly using the GMFM but had undergone some training. Although the initial reliability studies had all therapists administering and scoring the GMFM, this study required therapists to score from videotapes. The results of the study demonstrated ICCs of .75 to .97 between therapists scoring with the old and the new guidelines. Although the range of values for the reliability coefficients was greater in the study using videotapes, they were still high enough for us to conclude that trained therapists could score the modified GMFM reliably. Because all our reliability and validity data were collected using trained pediatric physical therapists who were involved in the development and validation of the measure, we needed to know whether training was generalizable to those therapists who would be typical users of the test (ie, clinicians working in children's treatment centers). Training would allow test users to determine their competency with scoring the GMFM and allow us to evaluate the value and impact of the training.

The purposes of this report are (1) to present data on the effects of training developmental pediatric clinicians in the use of the GMFM using videotapes for training and testing, and (2) to discuss some practical and methodological issues that arose and may be generalizable to other measurement training situations.

Method

A 1-day GMFM training program was developed. The workshops commenced with a description of the research background and psychometric properties of the GMFM, followed by a videotaped pretest (40 minutes including pauses between items). Four children with cerebral palsy (1 with athetosis, 2 with diplegia, and 1 with hemiplegia), varying in age from 2 to 14 years, were shown on the testing videotape. An overview of general concepts in administering and scoring the test was followed by group discussion on the scoring issues of each GMFM item using videotaped examples (4 hours). The teaching videotape showed 3 children (1 with quadriplegia, 1 with diplegia, and 1 with hemiplegia), who varied in age from 6 to 14 years. Later in the afternoon, 45 minutes was spent viewing a videotape and discussing how to calculate a total score and issues related to goal setting. The day ended with the readministration of the same videotape used at the pretest. The pretest and posttest scores were used to ascertain whether the training workshop had an impact on participants' ability to observe and score a videotaped GMFM assessment. The correct pretest and posttest scores were previously determined by three of the workshop trainers who viewed and independently scored the testing tapes using the GMFM administration and scoring guidelines. Disagreements were identified and discussed, and the videotaped activities were replayed until consensus on the correct or "criterion" score was achieved.

Before commencing the pretest, participants were given a GMFM manual and instructed to use the administration and scoring guidelines when scoring the test videotape. Before each GMFM item was shown on videotape, participants were told the item number and the number of trials they would see the child attempt for that item. The tape was stopped between items to allow participants time to score and prepare for the next item. No items were replayed. This protocol was repeated using the same videotape for the posttraining test. At the time of the pretest, participants completed a questionnaire about their previous clinical experience and their familiarity and experience with the GMFM.

Prior to initiating training workshops, the plan was to develop three separate criterion videotapes, each approximately 20 minutes in duration. Each tape was to contain one third of the total number of items randomly selected from each of the five GMFM dimensions. These items were examined to ensure that a mixture of items was included (eg, a variety of starting positions, static and dynamic items). A sample of items from various levels of function (from children who were functioning primarily in the first GMFM dimension of lying and rolling activities to those capable of independent ambulation) was shown on each tape. Items were grouped by dimension and shown in numerical order.

The first of the three videotapes was developed according to plan and was used in the first three workshops. Results using this videotape are summarized in the "Training Study A" section. Upon close examination of the individual item scores from workshop participants, it became apparent that one person could have fewer errors than another but have a lower overall estimated kappa. Because criterion videotape "A" did not sample equally all possible GMFM scores (0, 1, 2, and 3), a bias was created. Participants had one chance on videotape "A" to score 0 (indicating "does not initiate movement"), so that if they scored that item incorrectly, they were severely penalized (from a statistical point of view). When making the subsequent videotapes, this inequality was corrected by sampling more equally across GMFM scores. The results using the second videotape are summarized in the "Training Study B" section. The third videotape had not been used extensively at the time this article was written.

Data Analysis

A weighted estimated kappa statistic with a quadratic weight was used to analyze chance-corrected reliability between the rater's scoring and the criterion scoring.[14] To get a composite measure of agreement across all categories, a weighted mean of the individual item kappas was calculated as described by Fleiss.[14] A kappa of 1.00 would indicate perfect agreement with the criterion scoring, and a kappa of 0.00 would be equal to chance agreement. A kappa statistic using a quadratic weight penalizes the rater more the further away the rater is from the correct score. When a weighted kappa is calculated using quadratic weights, it yields results similar to the ICC and has a similar interpretation.[15]
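The quadratic weighting described above can be illustrated with a short sketch. This is the standard weighted kappa computed over one rater's item scores against the criterion scores, which is an assumption made for illustration: the composite described in the text (a weighted mean of item kappas after Fleiss) is not reproduced here. The function name and all scores are hypothetical.

```python
from collections import Counter


def quadratic_weighted_kappa(rater, criterion):
    """Weighted kappa with quadratic disagreement weights (i - j) ** 2."""
    if len(rater) != len(criterion) or not rater:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater)
    # Observed disagreement: mean squared distance between paired scores.
    observed = sum((r - c) ** 2 for r, c in zip(rater, criterion)) / n
    # Chance-expected disagreement from the two marginal distributions.
    r_counts, c_counts = Counter(rater), Counter(criterion)
    expected = sum(
        r_n * c_n * (i - j) ** 2
        for i, r_n in r_counts.items()
        for j, c_n in c_counts.items()
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - observed / expected


# Hypothetical tape with a single item whose criterion score is 0.
criterion = [0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
rater_a = [3, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # one error, three points off
rater_b = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]  # two errors, one point off each

print(round(quadratic_weighted_kappa(rater_a, criterion), 2))  # 0.48
print(round(quadratic_weighted_kappa(rater_b, criterion), 2))  # 0.9
```

In this invented example the rater with a single large miss on the only 0-scored item receives a much lower kappa than the rater with two small misses, which is one way a participant can make fewer errors yet obtain a lower coefficient.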

A paired t test was used to examine the statistical significance of the difference between pretraining and posttraining estimated kappa scores, and an independent sample t test was used to compare the posttest estimated kappa scores with the criterion test scores. All t tests were two-tailed. A Pearson product-moment correlation (r) was used to measure the relationship of criterion scores with previous clinical experience and previous experience with the GMFM using SPSS/PC+ version 4.0.[16] The .05 level was used to test for statistical significance.
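The two comparisons described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the kappa values are invented, the function names are hypothetical, and the comparison with the criterion is written as a one-sample t statistic against a fixed value, which is an assumption about how that comparison was carried out. P values are omitted because the Python standard library has no t distribution.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a paired t test on the post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error, len(diffs) - 1


def one_sample_t(values, reference):
    """Return (t, df) for a sample mean compared with a fixed reference."""
    if len(values) < 2:
        raise ValueError("need at least 2 values")
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - reference) / std_error, len(values) - 1


# Hypothetical estimated kappas for five participants, not study data.
pre_kappa = [0.50, 0.60, 0.55, 0.65, 0.60]
post_kappa = [0.80, 0.85, 0.75, 0.90, 0.80]

t_change, df_change = paired_t(pre_kappa, post_kappa)
t_criterion, df_criterion = one_sample_t(post_kappa, 0.70)
print(round(t_change, 2), df_change)
print(round(t_criterion, 2), df_criterion)
```

Pairing each participant's pretest with his or her own posttest removes between-participant variation from the error term, which is why the paired form is the natural choice for a pretest-posttest design.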

Results

Training Study A

The data for training study A were derived from a total of 76 participants who attended the first three workshops. Eighty-six percent of the participants were physical therapists, 13% were occupational therapists, and 1% were kinesiologists. Workshop participants had a mean of 7.7 years of neurological pediatric experience, which varied from 0 to 25 years. The criterion kappa score for this videotape was set at .70, based on experience from training with the Gross Motor Performance Measure.[17]

A paired t test comparing the pretraining and posttraining scores to determine whether scores were significantly different for the total group (n=76) showed a statistically significant improvement in reliability, from a mean estimated kappa of .58 to .82 (t=15.38, df=75, P<.001). The posttest mean estimated kappa of .82 was also significantly higher than the criterion of .70 (t=10.2, df=75, P<.001). Eight percent of the workshop participants reached the criterion level of reliability on the pretest, and 84% reached the criterion on the posttest.

Training Study B

The data for training study B came from 73 participants who attended four workshops that followed those reported in training study A. Eighty-three percent of the participants were physical therapists, 13% were occupational therapists, and 4% were early childhood educators. Workshop participants had a mean of 6.6 years of neurological pediatric experience, which varied from 0 to 20 years. Based on experience from training study A and guidelines in the literature regarding acceptable levels of reliability,[18] the criterion kappa score for this second videotape was raised from .70, indicating good reliability, to .80, indicating excellent reliability.

The results of the paired t test comparing the pretraining and posttraining scores of the total group (n=73) showed a statistically significant improvement in reliability, from a mean estimated kappa of .81 to .92 (t=10.91, df=72, P<.001). The posttest mean estimated kappa of .92 was significantly higher than the criterion of .80 (t=18.64, df=72, P<.001). Sixty-three percent of the workshop participants reached the criterion level of reliability on the pretest, and 100% reached the criterion level on the posttest. Of the 46 workshop participants who reached the criterion level on the pretest, there was still a significant improvement in criterion scores after training, from a mean estimated kappa of .86 to .93 (t=7.86, df=45, P<.001).

The number of years of pediatric neurological experience was correlated with the estimated kappa values for the entire sample (n=149) to determine whether experienced clinicians were more reliable than less experienced clinicians. This was not the case at the pretest, with years of pediatric neurological experience correlating at r=-.04 (t=0.54, df=147, P>.05). At the posttest, there was a small negative correlation of r=-.16 (t=1.92, df=147, P<.05), which was statistically significant but accounts for less than 3% of the variance. There was no significant correlation (r=-.06, t=-0.84, df=147, P>.05) between years of pediatric neurological experience and improvement in estimated kappa scores.
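The link between a correlation coefficient, its t statistic, and the proportion of variance it accounts for can be sketched with the usual textbook relationships. The function names are hypothetical, and because r is reported to only two decimals, a t value recomputed this way need not match the reported values exactly.

```python
import math


def t_from_r(r, n):
    """Return (t, df) for testing a Pearson r from n pairs against zero."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def variance_explained(r):
    """Return the proportion of variance accounted for (r squared)."""
    return r * r


# Posttest correlation reported in the text: r = -.16 with n = 149.
t_value, df = t_from_r(-0.16, 149)
print(df)
print(round(variance_explained(-0.16) * 100, 2))  # 2.56 (percent)
```

The squared correlation shows why a statistically significant result can still be of little practical importance: with a sample of 149, a coefficient accounting for under 3% of the variance is enough to approach the .05 threshold.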

Discussion

The results of these studies demonstrate that clinicians who attend a 1-day GMFM training workshop improve their scoring reliability significantly when tested using videotaped assessments. Note, however, that the methods used in this study relate to evaluating the reliability of scoring a videotape and do not take into account other sources of variability that are present when clinicians are assessing children in the clinic (eg, variability due to different testers, children, and environments). Whether reliability values obtained using videotapes are higher or lower than those obtained using real-life assessments was not addressed as part of this study. There are good reasons to believe either that real-life assessments might be easier to perform and hence more reliable (eg, because more information is available to the examiner than can be provided on a videotape) or that these assessments are more difficult to perform and hence less reliable (eg, because the assessor must simultaneously test and score), so that empirical studies of these issues are needed to address these questions appropriately. Results from reliability work with the Gross Motor Performance Measure show high levels of intrarater reliability when therapists administer and score an assessment and then rescore the videotape of the same assessment 6 weeks later. Boyce et al[17] report ICCs (2, 1) varying from .90 to .97 on individual attribute scores and .93 overall.

There was a marked difference in the pretest reliability scores and in the number of people reaching the criterion level of reliability, depending on which testing videotape was used to assess scoring. Higher pretest scores in training study B could have been due to the finding that more than twice as many participants in training study B had reported reading the administration guidelines prior to the workshop (23% as compared with 10% in training study A). We have not yet evaluated separately this particular source of variation in trainee skill (ie, how much familiarity with the material prior to the training workshop influences success in reaching a criterion level of reliability). Further work is needed to assess whether a certain amount of practice with the measure prior to testing would be sufficient to reach a criterion level of reliability, without the need of a formal workshop. It is important to note that although 63% of therapists reached the criterion score on the pretest in training study B, their scores still improved significantly following the workshop. Interestingly, our results showed that years of pediatric experience had little effect on the participants' ability to learn to administer the GMFM, and we believe that from what we currently know, years of experience should not preclude people from undergoing the training process.

The videotape used in training study B appeared to be much easier to score than the videotape used in training study A. Things we learned after preparing the first videotape were used to improve the second videotape. These improvements included a longer "lead-in" time prior to the desired movement and a more equal sampling of scores across the total number of response options. By chance, we may also have sampled some items with less contentious scoring issues on the second videotape.

An important consideration in all reliability studies is the need to sample the range of performance across the range of items. For example, if therapists were determining their agreement of scoring the GMFM with a child who was an independent ambulator, it is likely that the child would score "3" (completes independently) on most items in the first three dimensions (lying and rolling, sitting, crawling and kneeling). Therapists would have a high level of agreement strictly because there was little room for disagreement. A more credible estimate of whether therapists agree would be determined by sampling more items for which the child is likely to have a mixture of item scores (0s, 1s, 2s, and 3s), as would be the case (in this example) in the higher two GMFM dimensions (standing; walking, running, and jumping). By including samples of GMFM items from children performing across the spectrum of function, a more realistic estimate of agreement is obtained.

Primary purveyors of measures usually spend a great deal of time developing and validating a new instrument, and collecting normative data. Generally, a much smaller amount of effort is directed toward issues of training. Although clinicians have a responsibility to acquire the necessary training before using a new measure, it is often not clear what the necessary training is, or how to acquire it in a systematic and effective manner. The time and cost associated with setting up a training package have likely been deterrents to its development.

Several methodological issues were considered in planning this evaluation of impact of the GMFM training program. Precautions were taken to minimize a learning effect as a result of doing the pretest. Workshop participants were not given any feedback on their performance on the test either following the pretest or during the workshop. A separate videotape with different children was used during the training. The pretest videotape was used again at the posttest. Had we used different testing videotapes, this would have added another source of variation, and any differences in pretest-posttest scores might have been due to variability in the videotapes.

Although written feedback from participants indicated the training workshops were beneficial for them, with each new testing videotape (encompassing different items and therefore different issues), the workshop trainers learned more about problematic wording and scoring issues with the GMFM. This has allowed for revisions to the test manual[19] (available from the primary author) to highlight difficult training issues currently being dealt with in the workshop. We do not yet know whether the second edition of the manual will provide untrained users of the GMFM with a clearer set of directions for self-learning of the measure. To make training more accessible to therapists who are unable to attend a workshop, the Gross Motor Measures Group has developed a videodisc training package that contains videotape examples of children similar to those used for a workshop, along with a written commentary. This method will need to be evaluated to determine whether individuals learning by the videodisc can reach similar levels of scoring reliability as workshop participants.

There are a number of disadvantages and advantages to the use of videotapes as a medium for training and evaluating new users of a test such as the GMFM. One of the main disadvantages of using criterion videotapes to assess reliability is that this method is only testing the participant's ability to score the videotaped test reliably and is not providing an indication of the assessor's ability to administer and score the test in a clinical situation. For example, can the examiner elicit appropriate responses from the examinee as well as score them reliably? This is particularly important for a test that involves direct observation of performance rather than being scored from videotaped assessments. This aspect of learning and performing the GMFM needs to be studied further by examining the reliability of workshop participants in a clinical situation and comparing the reliability with that achieved in the workshop.

Another problem with using videotapes is the quality of videotaping, in particular, the ability to capture on videotape, from the best possible camera angle, the movement the therapist is trying to test. Experience has shown it may be more difficult to judge whether a child is "initiating" a movement from videotape than from real life. We relied on the use of expert audiovisual personnel to develop our training materials in an effort to address and overcome these problems.

There are, however, a number of advantages to using videotapes as a method of assessing reliability. First, it is possible to evaluate the effects of an intervention (such as a training workshop) in a standardized manner. Second, the use of videotapes allows an efficient means of assessing several patients of varying diagnostic and functional levels while eliminating the issue of patient compliance. This advantage is particularly appealing when dealing with children. Videotapes can be edited to ensure they are capturing different training issues and covering an appropriate spectrum of function. Third, by having a criterion testing videotape with the "correct" score, as determined by experts, the therapist can ensure that responses are not only reliable but valid. For example, if therapists in the clinical setting learn an assessment together, they may make an administration or scoring decision that is different from the intent of the test developer. When the therapists then assess interrater reliability, it may be high because everyone agrees on how to score, but the score is not the correct (valid) one. Finally, another use for criterion testing videotapes is to have an easy method of assessing ongoing levels of competency. Tests can be completed at regular intervals to ensure that high levels of reliability are maintained over time. Gross[20] and Gross and Conrad[21] offer further discussion of the advantages and disadvantages of using videotape to capture observational data.
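
The difference between agreeing with each other and agreeing with the criterion can be illustrated with a small sketch. The item scores below are hypothetical, and simple percent agreement is used only as a stand-in for whatever agreement statistic a study adopts; it is not presented as the statistic used in this work.

```python
def percent_agreement(a, b):
    """Share of items on which two score lists match exactly."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical item scores for a 10-item videotaped assessment.
criterion = [3, 2, 2, 1, 0, 3, 1, 2, 3, 0]  # expert "correct" scores
rater_a = [3, 2, 3, 2, 0, 3, 2, 2, 3, 1]    # learned the test alongside rater B
rater_b = [3, 2, 3, 2, 0, 3, 2, 2, 3, 1]    # shares rater A's scoring habits

interrater = percent_agreement(rater_a, rater_b)
vs_criterion = percent_agreement(rater_a, criterion)

print(f"rater A vs rater B:   {interrater:.0%}")    # 100%
print(f"rater A vs criterion: {vs_criterion:.0%}")  # 60%
```

In this made-up case the two raters agree perfectly with each other yet match the expert scores on only 6 of 10 items, which is the situation a criterion videotape is meant to expose.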

The issue of how reliable is reliable enough is an interesting one. Although a number of guidelines are suggested in the literature,[18,22,23] this is still an arbitrary decision. Streiner and Norman[23] suggest that an acceptable level of reliability is dependent on the size of the sample, and they point out that clinical assessments used to make decisions on individuals need to be more reliable than those using grouped data. This is because data that are grouped (as in research studies) and used as the mean of several individuals have smaller measurement error. A reliability coefficient itself does not let the therapist know how many errors were made, which is why we provided participants who did not reach our preset criterion level with feedback on individual item problems. A test does not have a single level of reliability; therefore, identifying sources of variability is useful because this information can be used to try to reduce large sources of error variance.[23] As clinicians and researchers, we want to be as reliable as possible to make valid decisions about the management of children. In our work, we chose to increase the criterion level of reliability required with the second videotape based on our experience with the first videotape and to ensure a more rigorous level of reliability.
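
The point about grouped data can be made concrete with a rough sketch. It assumes the common textbook relation SEM = SD x sqrt(1 - r) for the standard error of measurement and independent errors across individuals; the standard deviation, reliability, and group size are hypothetical values chosen for illustration, not figures from this study.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Textbook SEM for one individual's score: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def error_of_group_mean(sem, n):
    """Measurement error of a mean of n individuals, assuming independent errors."""
    return sem / math.sqrt(n)

sd = 10.0           # hypothetical between-child standard deviation of scores
reliability = 0.90  # hypothetical reliability coefficient
group_size = 25     # hypothetical number of children in a research group

sem = standard_error_of_measurement(sd, reliability)
group_error = error_of_group_mean(sem, group_size)

print(f"error for one individual: {sem:.2f} points")
print(f"error for a mean of {group_size}: {group_error:.2f} points")
```

Under these assumptions the same reliability coefficient leaves an error for a single child that is five times the error of the 25-child mean, which is why decisions about individuals call for a stricter criterion.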

As we have tried to illustrate in this communication, there are methodologic and design features that can be used to address these issues. It is clear that as much care is needed in the preparation and testing of training in the use of a test as in its creation and validation. We believe that primary purveyors have a responsibility to their clinical colleagues and that they can learn useful and important lessons about their measure while providing training in its use.

Summary

We have shown that training improves workshop participants' agreement in scoring a videotaped GMFM assessment. Although there are a number of advantages to using videotapes to train test users and assess scoring reliability, this method does not evaluate participants' ability to administer the measure. Therefore, further work is needed to determine whether reliability is maintained in a clinical situation in which it is necessary to both administer and score the GMFM.

Acknowledgments

We gratefully acknowledge the contribution of data and thoughtful comments and questions from participants of the training workshops. We also thank Jim Chen for his computer assistance and Marilyn Marshall and Gerry Karlovic for the preparation of this manuscript.

References

[1] Task Force on Standards for Measurement in Physical Therapy. Standards for tests and measurements in physical therapy practice. Phys Ther. 1991;71:589-622.
[2] Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Boston, Mass: Little, Brown and Co Inc; 1991.
[3] Rothstein JM. Measurement and clinical practice: theory and application. In: Rothstein JM, ed. Measurement in Physical Therapy. New York, NY: Churchill Livingstone Inc; 1985:1-46.
[4] Stengel TJ. Assessing motor development in children. In: Campbell SK, ed. Pediatric Neurologic Physical Therapy. New York, NY: Churchill Livingstone Inc; 1991:33-65.
[5] Bayley N. Bayley Scales of Infant Development. New York, NY: The Psychological Corporation; 1969:5.
[6] Bruininks RH. Bruininks-Oseretsky Test of Motor Proficiency. Circle Pines, Minn: American Guidance Service; 1978:42.
[7] Folio MR, Fewell RR. Peabody Developmental Motor Scales and Activity Cards. Allen, Tex: DLM-Teaching Resources; 1983:13.
[8] Miller LJ. Miller Assessment for Preschoolers. Littleton, Colo: The Foundation for Knowledge in Development; 1982:3.
[9] Haley SM, Costner WJ, Ludlow LH, et al. Pediatric Evaluation of Disability Inventory (PEDI): Development, Standardization and Administration Manual (Version 1). Boston, Mass: New England Medical Center Hospital; 1992:80-88.
[10] Knobloch H, Stevens S, Malone AF. Manual of Developmental Diagnosis. Rev ed. New York, NY: Harper & Row; 1980.
[11] Russell DJ, Rosenbaum PL, Gowland C, et al. Manual for the Gross Motor Function Measure: A Measure of Gross Motor Function in Cerebral Palsy. Hamilton, Ontario, Canada: McMaster University; 1990.
[12] Russell DJ, Rosenbaum PL, Cadman DT, et al. The gross motor function measure: a means to evaluate the effects of physical therapy. Dev Med Child Neurol. 1989;31:341-352.
[13] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
[14] Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons Inc; 1981:219.
[15] Kramer MS, Feinstein AR. Clinical Biostatistics LIV: the biostatistics of concordance. Clin Pharmacol Ther. 1981;29:111-123.
[16] Norusis MJ. SPSS/PC+ Statistics 4.0 for the IBM PC/XT/AT and PS2. Chicago, Ill: SPSS Inc; 1990.
[17] Boyce WF, Gowland C, Rosenbaum PL, et al. Gross Motor Performance Measure for children with cerebral palsy: study design and preliminary findings. Can J Public Health. 1992;83(suppl):S34-S40.
[18] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
[19] Russell D, Rosenbaum P, Gowland C, et al. Gross Motor Function Measure Manual. 2nd ed. Hamilton, Ontario, Canada: McMaster University; 1993.
[20] Gross D. Issues related to validity of videotaped observational data. West J Nurs Res. 1991;13:658-663.
[21] Gross D, Conrad B. Issues related to reliability of videotaped observational data. West J Nurs Res. 1991;13:798-803.
[22] Law M. Measurement in occupational therapy: scientific criteria for evaluation. Canadian Journal of Occupational Therapy. 1987;54:133-138.
[23] Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York, NY: Oxford University Press Inc; 1989.

DJ Russell, MSc, is Research Coordinator (NCRU), Department of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University, Bldg 74, Chedoke Campus, Hamilton, Ontario, Canada L8N 3Z5.

PL Rosenbaum, MD, FRCP(C), is Professor, Department of Pediatrics, Faculty of Health Sciences, McMaster University, and Director of Pediatrics, Chedoke Child and Family Centre, Chedoke-McMaster Hospitals, Hamilton, Ontario, Canada L8N 3Z5. Address all correspondence to Dr Rosenbaum.

M Lane, Dip-P&OT, is Physiotherapist, Clinical Consultant to Halton Parent-Infant Program, Oakville, and Gross Motor Measures Group, Hamilton, Ontario, Canada L6J 6E1.

C Gowland, MHSc, PT, is Associate Professor, School of Occupational Therapy and Physiotherapy, Faculty of Health Sciences, McMaster University, and Research Manager, Physiotherapy Department, Chedoke-McMaster Hospitals.

CH Goldsmith, PhD, is Professor, Department of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University.

WF Boyce, MSc, PT, is Assistant Professor, School of Rehabilitation Therapy, Queen's University, Kingston, Ontario, Canada K7L 5G2.

N Plews, BHSc, PT, is Research Physiotherapist, Chedoke-McMaster Hospitals.

This article was submitted February 8, 1993, and was accepted January 24, 1994.
COPYRIGHT 1994 American Physical Therapy Association, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1994, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Plews, Nancy
Publication:Physical Therapy
Date:Jul 1, 1994
Words:5268