A comparison of five low back disability questionnaires: reliability and responsiveness. (Research Report)

The restoration of normal function is considered a key outcome of physical therapy for low back problems. (1,2) Physical therapists, therefore, need measurement tools that accurately assess function and monitor change over time. Activity limitations are defined in the World Health Organization's International Classification of Functioning, Disability and Health (ICIDH-2) as "difficulties an individual may have in executing activities." (3) Impairments such as decreased range of movement and reduced straight leg raise can be observed by therapists. However, direct observation of activity limitation is impractical, and physical therapists often rely on clients' self-report to assess the impact of low back pain on daily activities. Physical therapists routinely collect information on activity limitations in the course of their assessments, but the data may not always be collected in a standardized format that yields a measurement with known reliability and validity. (4,5) Standardized self-report questionnaires provide a convenient method of collecting and synthesizing a large amount of information on activity limitation. (1,2) Many questionnaires have been developed to measure activity limitations in people with low back pain, but there is little evidence that physical therapists routinely use these tools.
One of the barriers to their widespread clinical use is the proliferation of similar questionnaires. (1,6,7) A search of the MEDLINE and CINAHL databases, the reference lists of retrieved articles, and published compilations of outcome measures located 24 low back region-specific questionnaires. There are also a number of generic health status measures available. Region-specific questionnaires for low back pain are thought to have the advantage of containing only items that are relevant to people with low back problems, whereas generic tools can be used across a wide range of conditions. In the study reported in this article, we judged a questionnaire as having potential clinical utility if it could be self-administered, was brief and easy to complete, was simple to score, and had not been shown to have serious floor or ceiling effects in a general ambulatory clinical population. We also wanted the questionnaire to have adequate content validity
(ie, relevant ICIDH-2 categories were represented) and evidence of credible construct validity and good reliability. Five questionnaires met these criteria: the modified Oswestry Disability Questionnaire, (8,9) the Quebec Back Pain Disability Scale, (10) the Roland-Morris Disability Questionnaire, (11) the Waddell
normative data are available in many countries. (15,16) In addition, we believe that if a generic questionnaire can be shown to perform as well as a condition-specific questionnaire, then it becomes redundant to use both condition-specific and generic questionnaires. The selected questionnaires have also been identified by other authors (1,2,10,17-19) as suitable for use in physical therapist practice. Scores have been shown to be correlated with related variables such as pain intensity and physical impairments and have also been demonstrated to detect change in functioning over time. (10,18,20-22) It is important that the measurement properties of questionnaires are derived from or confirmed on samples from the population in which the measurements will be used in clinical practice. (14,23) This is particularly the case for studies of reliability and responsiveness because the results of these studies provide the information required for interpreting the scores of individuals. Client groups receiving the services of other health care professionals (eg, orthopedic surgeons) are unlikely to be representative of the population receiving physical therapy. Much of the information currently available on the reliability of measurements obtained with and responsiveness of the 5 questionnaires is from studies that drew samples from clinical populations other than patients receiving physical therapy, (8,9,11,12,14,24) from only 1 or 2 physical therapy practices or hospital departments, (17,18,25) or from both physical therapy and medical treatment centers.
(10) Little information is currently available on the reliability of measurements obtained with and responsiveness of the Quebec and Waddell questionnaires, and no studies have demonstrated the reliability of measurements obtained with and responsiveness of these 5 questionnaires when concurrently administered to clients receiving treatment from physical therapists in a range of clinical settings. The aim of this study, therefore, was to compare the reliability of measurements obtained with and responsiveness of the modified Oswestry, Quebec, Roland-Morris, and Waddell questionnaires and the SF-36 physical health scales in an ambulatory clinical population seeking physical therapy for low back pain in hospital outpatient departments, community clinics, and private practices. High test-retest reliability coefficients have generally been reported for the scores obtained with the 5 questionnaires. For the original Oswestry questionnaire, values of r=.99 over 24 hours (8) and ICC=.94 over 1 to 14 days (10) are typical. Baker et al (9) reported a reliability coefficient of r=.89 for a same-day test-retest of the modified Oswestry questionnaire.
Kopec and colleagues (10) reported the test-retest reliability for measurements obtained with the Quebec scale as ICC (2,1)=.93 over 1 to 14 days. For the Roland-Morris questionnaire, reported reliability estimates include .91 for same-day administration, (11) ICC=.93 over 1 to 14 days, (10) and ICC=.86 over 3 to 6 weeks. (25) No test-retest reliability studies have been reported for the Waddell index, although one research group (12) reported interrater reliability (kappa >.60) for each of the 9 questions administered by interview. For the Physical Functioning scale of the SF-36, Kopec and colleagues (10) reported an ICC of .73 over 1 to 14 days. Patrick et al (22) reported ICCs of .89, .89, and .67 for the SF-36 Physical Functioning, Role Limitations-Physical, and Bodily Pain scales, respectively, over a period of 3 months. In those studies where test-retest reliability was evaluated over longer periods, only data from subjects who were classified as "unchanged" based on patient ratings (10,22) or patient and therapist ratings on a retrospective change scale (25) were included. The reliability coefficient reported as a value between 0 and 1 does not allow us, in our view, to judge whether the measurement has sufficient reliability for a particular purpose. To examine the effects of intervention
, a therapist needs to know when change in an observed score indicates that real change has occurred. This is called the "minimum detectable change" (MDC) and has been defined by Stratford et al (17) as the amount of change required to be 90% confident that an observed change in scores reflects real change in the underlying variable. Stratford and colleagues (17,25) have reported the MDC for the Roland-Morris questionnaire as 4 to 5 points. No authors have reported the MDC for the modified Oswestry, Quebec, and Waddell questionnaires or the SF-36 physical health scales. No improvement can be detected for an individual who has the best possible score prior to treatment, and no worsening can be detected for an individual who has the worst possible score on a particular scale. The lowest and highest possible scores are called the "floor" and "ceiling" of the scale. McHorney and Tarlov (26) suggested that health surveys with more than 15% of respondents scoring the lowest or highest possible score initially should not be used. However, because we believe an observed change in scores must be at least equal to the MDC to be 90% confident that the observed change is not simply due to measurement error, we propose that questionnaires with more than 15% of respondents scoring within the MDC at the upper or lower end of the available range of scores should not be used.
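The proposed scale-width rule comes down to a simple count: what fraction of initial scores lies less than one MDC from either end of the scale? The Python sketch below is one way to express that check. The function names and the example scores are hypothetical. The 0 to 100 range, the 15-point MDC, and the 15% threshold are the article's own illustrative values. The strict inequalities mirror the wording "less than 15 or more than 85."

```python
from typing import Sequence


def proportion_near_scale_ends(scores: Sequence[float], lo: float, hi: float,
                               mdc: float) -> float:
    """Fraction of initial scores lying less than one MDC from either end of the scale.

    A score below lo + mdc leaves less than one MDC of room toward the floor;
    a score above hi - mdc leaves less than one MDC of room toward the ceiling.
    In either case a change of at least one MDC cannot be registered in that direction.
    """
    if not scores:
        raise ValueError("need at least one initial score")
    near_end = [s for s in scores if s < lo + mdc or s > hi - mdc]
    return len(near_end) / len(scores)


def has_adequate_scale_width(scores: Sequence[float], lo: float, hi: float,
                             mdc: float, max_fraction: float = 0.15) -> bool:
    """Proposed rule: no more than max_fraction of respondents within one MDC of an end."""
    return proportion_near_scale_ends(scores, lo, hi, mdc) <= max_fraction


# Hypothetical initial scores on a 0-100 scale with an MDC of 15 points.
initial = [10, 20, 35, 50, 60, 72, 80, 88, 40, 55]
print(proportion_near_scale_ends(initial, 0, 100, 15))  # 0.2 (the scores 10 and 88)
print(has_adequate_scale_width(initial, 0, 100, 15))    # False
```

Because the rule depends on the MDC, a questionnaire with a large MDC relative to its range will fail the check more easily than one with a small MDC, even when few respondents sit exactly at the floor or ceiling.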
For example, we believe that if a questionnaire has a possible range in scores from 0 to 100 and an MDC of 15 points, then no more than 15% of subjects should score less than 15 or more than 85. In this way, the MDC can be useful not only for interpreting change in questionnaire scores but also for providing a benchmark for choosing a measurement tool that is practical for use with a particular clinical population. In this article, we use the term "scale width" to indicate the capacity of a scale to have initial scores that are far enough onto the scale to allow detection of change in scores over time.

Responsiveness refers to the ability of a measurement tool to detect meaningful change over time and is also called "sensitivity to change." (23) Many methods have been proposed to explore the responsiveness of questionnaires, (27) and all involve the administration of the questionnaire before and after a period of time (usually when the participants are receiving treatment) during which it is expected that function will improve. Methods of exploring responsiveness can be classified either as those that measure change alone (distribution-based methods) or those that measure clinically meaningful change (criterion-based methods). (27,28) Criterion-based methods require that a judgment be made as to whether clinically meaningful change has occurred over the retest period. This is often achieved by having the participants rate the overall amount of change they have experienced. (10,20,21,25) In 3 studies, (10,20,21) various combinations of questionnaires were administered to people who were receiving physical therapy, and the questionnaires' responsiveness was studied. The Oswestry and Roland-Morris questionnaires were compared by Stratford et al (21) in Ontario, Canada, and by Beurskens et al (20) in the Netherlands. Kopec et al (10) in Quebec, Canada, examined reliability of measurements from and responsiveness of the Oswestry, Quebec, and Roland-Morris questionnaires and the SF-36 Physical Functioning scale, but only 65% of the subjects were seen by physical therapists. In all 3 studies, the questionnaires were administered on 2 occasions, and a global change scale was used as the criterion for meaningful change. Direct comparison of these 3 studies is hampered by differences in subject characteristics, the use of different retest periods, differing interventions and global change scales, and the variety of strategies for classifying subjects as "changed" or "unchanged." Of these 3 studies, only Stratford and colleagues (21) tested whether there were differences in observed responsiveness between the questionnaires used in the study. The conclusions of the other groups of authors were based only on the rank order of the magnitude of the particular responsiveness index used. However, without statistical testing of the difference between questionnaires, it is not clear whether observed differences are likely to reflect genuine or chance variations. (27)

Method

This was a prospective, multi-site study with repeated measurements taken when subjects entered the study and 6 weeks later.
Over a 5-month period, consecutive eligible patients were invited by their treating therapist to participate in the study. Patients were eligible if they were aged 18 years or older, were able to read and write English, were seeking treatment for a complaint of low back pain, and provided written informed consent. We defined low back pain as pain in the lumbar region with or without referral of pain to the lower extremities. Subjects were recruited from the physical therapy outpatient departments of 3 hospitals, 3 community health services, and 4 private physical therapy practices. The 10 health care agencies from which the subjects came represented, in our view, the range of settings where physical therapy services are delivered to patients with low back pain who were ambulatory and were located in urban areas of high, middle, and low socioeconomic status. Subjects who consented to participate in the study were given a package of questionnaires at the recruitment site, with a reply-paid envelope for returning the questionnaires by mail. After 6 weeks, a second set of questionnaires was sent by mail to the subjects.
On both occasions, questionnaires were presented in random order as determined by a random numbers table. The battery of questionnaires was bundled together with a paper clip because the forms were scan-forms and could not be stapled. Because completion of questionnaires was unsupervised, there was no way of knowing whether subjects completed the questionnaires in the order in which they were presented. A reminder was mailed if the second set of questionnaires was not returned within 10 days. A 6-week retest interval was chosen for both the reliability and responsiveness studies. We agree with other authors (22,25) who contend that the variability in scores over a typical clinical retest period is more likely to reflect true variability in scores than that found with very short retest periods. We believe that 6 weeks is commonly used in practice as a time for comprehensive reassessment of patients with low back pain, particularly if they have not resumed their normal activities. (29,30) The type and frequency of treatments applied to patients in this study were not under investigation. Subjects were recruited at the first or second consultation for their current episode of back pain, and the combination of treatment and the natural history of the condition constituted the "construct for change." (11,27) We anticipated, based on the results reported by van den Hoogen et al, (31) that many subjects would experience some improvement over a 6-week period.

Materials

We administered by mail 5 questionnaires that we believed were most likely to be useful in clinical practice.
The modified version of the Oswestry Disability Questionnaire (9) does not include a reference to medications in the pain and sleeping sections and is therefore, in our view, more widely applicable, as not all patients will be taking medications. We used the original Roland-Morris Disability Questionnaire, (11) the final format of the Quebec Back Pain Disability Scale recommended by the developers, (10) the Waddell Disability Index wording from Delitto, (2) and the Australian version of the SF-36. (7) Characteristics of the 5 questionnaires are shown in Table 1. The Oswestry, Quebec, Roland-Morris, and Waddell questionnaires were all developed to measure activity limitation in people with low back problems and take only a few minutes to complete and score. Scores for the individual questions are summed to provide a single "index" score for each questionnaire, and higher scores indicate greater activity limitation. In contrast, the SF-36 is a generic health survey that is designed to assess health for any population and for any condition. (13-15) The SF-36 consists of 8 scales that provide a "profile" of scores, with higher scores indicating better health status. The 10-item Physical Functioning scale is used to measure activity limitations, as, to a lesser extent, are the 4-item Role Limitations-Physical scale and the 2-item Bodily Pain scale. The SF-36 takes about 10 minutes to complete, and a scoring algorithm is used to calculate scores. (32)

Questionnaire scores were calculated according to the developers' instructions. For the Oswestry questionnaire, the sum of the section scores was divided by the total possible score (50 if all sections are completed), and the resulting total was multiplied by 100 to yield a percentage score. The Quebec questionnaire total score was calculated by summing the 20 individual item scores. The Roland-Morris questionnaire score was a count of the chosen items, and the Waddell questionnaire score was the sum of the "yes" responses. The scoring methods prescribed by the test developers were applied to the SF-36 Physical Functioning, Role Limitations-Physical, and Bodily Pain scales. (32) In addition to the 5 questionnaires, demographic data and details of current and past medical history were also collected initially using a questionnaire designed for this study. At follow-up, a 7-level global change scale was included with the questionnaires. This scale asked subjects to rate the extent to which their back problem had changed over the past 6 weeks. The rating scale, previously used in a study by Patrick and colleagues, (22) had 7 response options: 1="completely gone," 2="much better," 3="better," 4="a little better," 5="about the same," 6="a little worse," and 7="much worse."
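The index scoring rules above are simple enough to express directly. The following Python sketch is illustrative only, and its function names are hypothetical. It assumes that each modified Oswestry section is scored 0 to 5, which is consistent with a maximum of 50 for 10 sections, and that an unanswered section is dropped from the denominator. The SF-36 scales are omitted because they require the developers' scoring algorithm. The global change helper follows the grouping this study uses for its reliability and ROC analyses. Leaving "much worse" unclassified is an assumption.

```python
from typing import Optional, Sequence


def modified_oswestry_percent(sections: Sequence[Optional[int]]) -> float:
    """Sum of answered section scores / maximum possible for those sections x 100.

    Assumes each section is scored 0-5 (maximum 50 when all 10 sections are answered);
    pass None for an unanswered section so that it is dropped from the denominator.
    """
    answered = [s for s in sections if s is not None]
    if not answered:
        raise ValueError("no sections answered")
    return 100.0 * sum(answered) / (5 * len(answered))


def quebec_total(items: Sequence[int]) -> int:
    """Quebec Back Pain Disability Scale: sum of the 20 item scores."""
    if len(items) != 20:
        raise ValueError("expected 20 item scores")
    return sum(items)


def count_endorsed(responses: Sequence[bool]) -> int:
    """Roland-Morris (items chosen) and Waddell ("yes" responses) are both simple counts."""
    return sum(1 for r in responses if r)


def classify_global_change(rating: int) -> Optional[str]:
    """Map the 7-level global change rating (1 = completely gone ... 7 = much worse)."""
    if rating in (1, 2, 3):   # completely gone, much better, better
        return "improved"
    if rating in (4, 5, 6):   # a little better, about the same, a little worse
        return "unchanged"
    if rating == 7:           # much worse: left unclassified here (assumption)
        return None
    raise ValueError("rating must be an integer from 1 to 7")


print(modified_oswestry_percent([2] * 9 + [None]))  # 40.0 (18 / 45 x 100)
print(count_endorsed([True, False, True]))          # 2
```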
Many scales have been used to rate global change, from a simple 3-level "better"/"the same"/"worse" scale (10) to a 15-level scale with 7 levels of improvement and worsening. (10,17,21) We decided to steer what we considered a middle course between a very parsimonious scale that lacked any distinction in the magnitude of
change and a complex scale that subjects may have found difficult to
interpret and complete without assistance. The selected rating scale had
4 levels for rating improvement but only 2 ratings of worsening. We
believe this rating scale was appropriate for rating overall change for
2 reasons. First, there is no opposite of "completely gone,"
yet complete resolution of the problem is the optimal patient outcome.
Second, we expected that few subjects would report a worsening of their
problem, and therefore an additional step between "a little
worse" and "much worse" was unnecessary.

Data Analysis

Unless otherwise stated, statistical analyses were performed using SPSS for Macintosh Version 6.1.* Test-retest reliability was explored for a subgroup of patients who were identified post hoc as not changed by what we believed to be a clinically meaningful amount over the 6-week retest period. That is, we classified subjects who self-reported their condition as "about the same" or only "a little better" or "a little worse" as "unchanged." A paired t test was also used to test the hypothesis that the questionnaire scores for the "unchanged" group at the 2 administrations were not different (P ≤ .05). Based on our experience and that of other authors, (20,25) we believe that patients who report only a little change are unlikely to have experienced clinically meaningful change, which we defined as the smallest change in the domain of interest that can be considered significant. To check the validity of this assumption, we used a paired t test to check that scores for the subgroup who reported they were "a little better" were not different between the start of the study and follow-up. Intraclass correlation
coefficients (2,1) (33) were then calculated for each of the questionnaires. We used parametric tests because, with the exception of the SF-36 Role Limitations-Physical scale, the data were normally distributed or approached a normal distribution, and pretest and posttest variances were equivalent. The SF-36 Role Limitations-Physical scale scores were positively skewed at pretest and posttest for the "unchanged" group. It has been demonstrated, however, that even severely abnormal distributions have little effect on the result of the t test or the F test when the samples come from the same population, and violation of the homogeneity of variance assumption has little effect on the result provided the sample sizes are the same.
(34) To check the validity of our post hoc method of identifying a stable group of subjects, we calculated ICCs for another group of subjects, those with back pain of more than 6 months' duration, who a priori could be expected to experience little change over a 6-week retest period. We defined the minimum detectable change as the 90% CI of the error associated with the repeated measurements. (17) First, the standard error of measurement (SEM) was determined by the formula:

$$SEM = SD_{av}\sqrt{1 - R} \tag{1}$$

where $SD_{av}$ was the average standard deviation of the scores initially and at follow-up for the 106 subjects who completed both sets of questionnaires and $R$ was the test-retest reliability coefficient for the 47 subjects classified as "unchanged." (35) The error associated with the repeated measurements was calculated by the formula:

$$SEM_{repeat} = \sqrt{2} \times SEM \tag{2}$$

This step recognizes that there is error associated with both the first and second measurements. (36) The 90% CI (the MDC) was calculated by multiplying the result by 1.64 (the tabled z value). This calculation can be interpreted as the magnitude of change, expressed in scale points, required to be 90% confident that the observed change reflects real change and not just measurement error.
(17) Unless subjects score far enough onto the scale to allow change by at least as much as the MDC, there is insufficient scale width to reliably detect change over time. To evaluate scale width, we calculated for each questionnaire the proportion of the 140 subjects returning the initial questionnaire whose initial score would not allow at least that amount of improvement or worsening to be registered at follow-up. Responsiveness was quantified in 3 ways. We used one distribution-based method (standardized response means [SRMs]), one criterion-based method (receiver operating characteristic [ROC] curves), and a method that counted the proportion of subjects who changed by at least as much as the MDC. The SRM was calculated by dividing the mean change by the standard deviation of change scores. (10,20,27,37) We chose the SRM because a method of testing the significance of observed differences in SRMs has been described by Liang et al. (37) Confidence intervals were constructed using the "jackknife" method detailed by Liang et al, (37) and a paired t test was used to compare the estimated population SRMs derived by this method.
(27,37) Rather than compare the SRMs for questionnaires using every possible pairwise comparison, we limited the number of comparisons by comparing the highest and lowest SRMs until nonsignificant comparisons occurred. Criterion-based methods of evaluating responsiveness require a judgment as to whether clinically meaningful change has or has not occurred. (27,28) In this study, subjects were classified as having improved by an important amount if they rated their back problem as "completely gone," "much better," or "better" at posttest and as "unchanged" if they reported being "a little better," "about the same," or "a little worse." Receiver operating characteristic curve analysis was performed using Accuroc Version 2.0.† The area under the ROC curve reflects the ability of the test to discriminate subjects who have improved from subjects who are unchanged. (23,27) A value of 1 for the area under the curve represents perfect (100%) accuracy, whereas a value of .50 represents chance alone. Accuroc uses a chi-square statistic to compare ROC curves for different questionnaires. 
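The area under an ROC curve built from change scores equals the probability that a randomly chosen improved subject has a larger change score than a randomly chosen unchanged subject, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way. It is not the algorithm used by Accuroc, the function name is hypothetical, and it assumes change scores are oriented so that larger values mean more improvement (for scales where higher scores mean more disability, the sign of the change would need to be flipped first). The example scores are invented for illustration.

```python
def roc_auc(improved, unchanged):
    """Area under the ROC curve from all improved-vs-unchanged pairs.

    Assumes change scores are oriented so that larger values mean more
    improvement. Ties count as half a correctly ordered pair.
    """
    if not improved or not unchanged:
        raise ValueError("each group needs at least one change score")
    concordant = 0.0
    for x in improved:
        for y in unchanged:
            if x > y:
                concordant += 1.0
            elif x == y:
                concordant += 0.5
    return concordant / (len(improved) * len(unchanged))


# Hypothetical change scores (scale points), not data from this study
improved = [12, 20, 8, 15, 4]
unchanged = [2, 5, -3, 8, 0]
print(roc_auc(improved, unchanged))  # 0.9
```

A value of .50 corresponds to chance discrimination and 1.0 to perfect separation of the two groups, matching the interpretation given above.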
Even without the Bonferroni adjustments for the multiple post hoc comparisons, there were no observed differences in area under the ROC curves among the different instruments. The 95% CIs of the areas under the ROC curves show the similarities among questionnaires. The third method of evaluating responsiveness relates responsiveness to reliability and has not previously been used to compare concurrently administered questionnaires. Goldie and colleagues (38) suggested that the proportion of subjects who improve by at least as much as the MDC could be used as an indicator of test responsiveness. We have termed this a reliable-change approach. We calculated the proportion of subjects who registered a change in questionnaire scores equal to or greater than the MDC. The standard error of the proportion ($SE_p$) was calculated as: (3) $SE_p = \sqrt{p(1 - p)/n}$, where $p$ is the observed proportion and $n$ is the number of subjects. The observed proportion $\pm 1.96 \times SE_p$ yields the 95% CI. (39) The Cochran Q test was used to determine whether the proportions were different among all of the questionnaires.

Results

Of 284 patients with a complaint of low back pain, 226 met the eligibility criteria to participate in the study, and 207 (92%) agreed to participate. One hundred forty participants (68%) returned the first set of questionnaires, and 106 participants (51%) returned the follow-up package 6 weeks later. Five subjects who completed both sets of questionnaires failed to complete the global change scale. The time taken to return the questionnaires at both pretest and posttest was a median of 8 days. There was no difference in age or sex between subjects who returned both sets of questionnaires and those who returned only the first set. 
The mean change in scores for subjects in each of the 7 levels of the global change scale is shown in Table 2 for the 101 subjects who completed both sets of questionnaires and the global change scale. We classified the 47 subjects who reported that their back problem was "about the same," "a little better," or "a little worse" as "unchanged" and the 52 subjects who reported that their back problem was "better," "much better," or "completely gone" as "improved." Sample characteristics for the "unchanged" and "improved" groups are shown in Table 3. The mean age of the "unchanged" group was 55 years (SD=17, range=19-83), and the mean age of the "improved" group was 49 years (SD=16, range=20-80) ($t_{(97)}=-1.87$, P=.06). Questionnaire scores obtained when the study began and at follow-up for the "unchanged" and "improved" groups are shown in Table 4. For the "unchanged" group, normal distribution of scores was confirmed by the K-S Lilliefors test for the Oswestry and Quebec questionnaires and the SF-36 Physical Functioning scale both initially and at follow-up, and for the SF-36 Bodily Pain scale initially. The K-S Lilliefors test is the Kolmogorov-Smirnov statistic with a Lilliefors significance level for testing normality. (40) Data are considered normally distributed if the significance level is greater than .05. The K-S Lilliefors test is very sensitive to departures from normal distribution, so histograms and box plots were also inspected visually for the data that did not meet the K-S Lilliefors standard. 
(40) Only the SF-36 Role Limitations--Physical scale data were extremely positively skewed, reflecting a large floor effect, with 68% of the subjects scoring the lowest (worst) possible score initially and 25% of the subjects scoring the lowest (worst) possible score at follow-up. Table 4 shows that for the 47 subjects who were classified as "unchanged," there was no difference between initial and follow-up scores on any questionnaire except the SF-36 Bodily Pain scale. Scores on this scale improved by an average of 8 points (SD=20) over the retest period ($t_{(46)}=2.88$, P=.006). For the 52 subjects classified as "improved," all questionnaire scores were different at follow-up (P<.0001). Because the SF-36 Bodily Pain scale scores initially and at follow-up for the group classified as "unchanged" were different, we examined the subgroup of 28 subjects who said their problem was "a little better." The SF-36 Bodily Pain scale scores improved by an average of 12 points (SD=22) over the retest period ($t_{(27)}=2.97$, P=.006), but there were no differences between initial and follow-up scores for any of the other questionnaires. Because the SF-36 Bodily Pain scale scores indicated that the subjects who rated themselves as "a little better" had changed, we calculated the ICC(2,1), SEM, $\mathrm{SEM}_{repeat}$, and MDC for the subjects classified as "unchanged" and for the subgroup of 16 subjects who rated their problem as "about the same" at follow-up (Tab. 5). Scores initially and at follow-up for the 16 subjects were confirmed by the K-S Lilliefors test to be normally distributed, except for the SF-36 Role Limitations--Physical scale scores, which were positively skewed. Paired t tests confirmed that for all scales, the questionnaire scores were not different between the start of the study and follow-up. 
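ICC(2,1), in Shrout and Fleiss's notation, is the two-way random-effects, absolute-agreement, single-measurement coefficient. A minimal sketch for the test-retest case follows. It assumes exactly two occasions and complete pairs, uses a hypothetical function name, and is meant to show the arithmetic rather than to reproduce the statistical software used in the study.

```python
from statistics import fmean


def icc_2_1(first, second):
    """Shrout-Fleiss ICC(2,1) for k = 2 occasions (test and retest).

    first[i] and second[i] are the same subject's scores on each occasion.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores for at least 2 subjects")
    rows = list(zip(first, second))
    n, k = len(rows), 2
    grand = fmean(x for row in rows for x in row)
    subject_means = [fmean(row) for row in rows]
    occasion_means = [fmean(row[j] for row in rows) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_subjects - ss_occasions

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_occasions / (k - 1)           # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical initial and follow-up scores for 4 subjects
print(round(icc_2_1([1, 2, 3, 4], [2, 2, 4, 4]), 3))  # 0.842
```

Because ICC(2,1) measures absolute agreement, a systematic shift between occasions (such as the Bodily Pain improvement in the "unchanged" group) lowers the coefficient through the between-occasions term.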
The ICCs exceeded .80 for the Oswestry and Quebec questionnaires and the SF-36 Physical Functioning scale for the "unchanged" group of 47 subjects, and the ICCs for these questionnaires were higher than for the Roland-Morris questionnaire or the SF-36 Role Limitations--Physical or Bodily Pain scale (there was no overlap of 95% CIs for the reliability coefficients). The 95% CI for the Waddell questionnaire overlaps with those of all the other scales. Reliability coefficients for a group of 37 subjects with back pain of more than 6 months' duration were similar or identical to the coefficients for the group that was classified as "unchanged." For the subgroup of 16 subjects who rated themselves as "about the same," the reliability coefficient for the Oswestry questionnaire was higher, based on the 95% CIs, than that obtained for the Roland-Morris questionnaire and the SF-36 Role Limitations--Physical scale. The reliability coefficient for the SF-36 Physical Functioning scale was higher than that obtained for the Roland-Morris questionnaire. The 95% CIs of the Roland-Morris questionnaire and the SF-36 Role Limitations--Physical and Bodily Pain scales were very wide (Tab. 5). Scale width was calculated on the 140 subjects who completed initial questionnaires and is shown in Table 6 for the MDC calculated for the "unchanged" group and for the subgroup classified as "about the same." The 15% criterion limit was met for the Oswestry questionnaire and the SF-36 Physical Functioning scale in both cases and for the Quebec questionnaire when the MDC for the subgroup was calculated. The SF-36 Role Limitations--Physical and Bodily Pain scales would be unable to detect worsening over time in 87% and 54% of the subjects, respectively. 
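Equations 1 and 2 and the scale-width check reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, $SD_{av}$ is assumed to be the simple mean of the initial and follow-up standard deviations (the text does not say how the two were averaged), and the caller must supply the scale limits and whether higher scores indicate better function, because the questionnaires differ in direction.

```python
import math
from statistics import stdev


def average_sd(initial, follow_up):
    """SD_av, assumed here to be the simple mean of the two occasion SDs."""
    return (stdev(initial) + stdev(follow_up)) / 2


def mdc90(sd_av, reliability):
    """Minimum detectable change at 90% confidence (equations 1 and 2)."""
    sem = sd_av * math.sqrt(1 - reliability)   # equation 1
    sem_repeat = math.sqrt(2) * sem            # equation 2: error on both occasions
    return 1.64 * sem_repeat                   # tabled z value for a 90% CI


def scale_width_limits(initial_scores, mdc, scale_min, scale_max, higher_is_better):
    """Return (proportion unable to show improvement, proportion unable to show
    worsening): subjects whose initial score lies within one MDC of a scale end."""
    n = len(initial_scores)
    near_top = sum(scale_max - s < mdc for s in initial_scores) / n
    near_bottom = sum(s - scale_min < mdc for s in initial_scores) / n
    if higher_is_better:          # eg, SF-36 scales: top of the scale = best function
        return near_top, near_bottom
    return near_bottom, near_top  # eg, disability scores: bottom = least disability


# Hypothetical values, not data from this study
mdc = mdc90(sd_av=10.0, reliability=0.84)
print(round(mdc, 1))  # 9.3
print(scale_width_limits([0, 0, 10, 50, 95], 20, 0, 100, higher_is_better=True))  # (0.2, 0.6)
```

In the second call, the three subjects starting at or near 0 on a 0 to 100 scale where higher is better could not register an MDC-sized worsening, which is the kind of floor problem reported above for the SF-36 Role Limitations--Physical scale.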
Table 7 shows the point estimates and 95% CIs for the 3 methods of quantifying responsiveness. The 95% CIs presented in Table 7 indicate that there are no differences in the estimate of the mean SRM across instruments. The method used by Liang et al (37) for comparing SRMs does not use independent t tests but rather uses paired t tests to compare multiple SRMs for each test assembled under "jackknife" procedures. Using this method, the SRM of the Waddell questionnaire was different from that of the SF-36 Bodily Pain scale ($t_{(105)}=2.92$, P=.004) and the Roland-Morris questionnaire ($t_{(105)}=2.52$, P=.013). However, if Bonferroni adjustments are made for all 21 paired comparisons, none of the effects are significant. There were no differences among the questionnaires on the ROC curves, as indicated by the overlap of all of the 95% CIs and the chi-square analysis of the highest and lowest values (Oswestry questionnaire and SF-36 Role Limitations--Physical and Bodily Pain scales). The reliable-change method based on the MDC for the group originally classified as "unchanged" and for the subgroup of 16 subjects showed no differences among the questionnaires, with overlap of all of the 95% CIs. That is, the proportion of subjects who changed by at least as much as the MDC was not different among the questionnaires.

Discussion

We chose to explore the test-retest reliability of measurements obtained for 5 questionnaires by identifying post hoc a group of subjects who were unchanged (ie, subjects who rated themselves as "about the same," "a little better," or "a little worse"). We checked the validity of measurements obtained using this strategy in 3 ways. First, we examined the mean change scores for each level of the global rating scale. 
The pattern of mean change scores across the 7 levels of the global change scale confirmed the direction and magnitude that we expected. Only 5 subjects reported any overall worsening of their condition. There were some inconsistencies. For example, on the SF-36 Role Limitations--Physical scale, the 3 subjects who rated their problem as "a little worse" had an average worsening of 17 points, whereas the 2 subjects who rated themselves as "much worse" improved by an average of 13 points. These inconsistencies were likely due to the very small numbers of subjects who selected either category; to the structure of the SF-36 Role Limitations--Physical scale, which yields only 5 possible total scores; and to the forced choice between the ratings "a little worse" and "much worse." Second, we confirmed that, with the exception of the SF-36 Bodily Pain scale, the questionnaire scores of the subjects classified as "unchanged" were not different initially and at follow-up, nor were the scores for subjects who rated themselves as "a little better" different at the 6-week follow-up. There was a difference in the SF-36 Bodily Pain scale scores between the initial and follow-up tests (8 points for the 47 subjects classified as "unchanged" and 12 points for the 28 subjects who rated themselves as "a little better"), but neither magnitude of change is necessarily clinically meaningful. Third, we identified another group of subjects, those with back pain of more than 6 months' duration, who a priori could be expected to experience little change over a 6-week retest period. Intraclass correlation coefficients for this group of 37 subjects were identical or similar to those for the group that was classified as "unchanged" using the global change scale. 
Because one of the scales (ie, the SF-36 Bodily Pain scale) showed a difference between the initial and follow-up scores in the "unchanged" group, we also calculated ICCs on questionnaire scores for the subgroup of 16 subjects who rated themselves as "about the same." For the modified Oswestry Disability Questionnaire, the ICC value of .84 (95% CI=.7-.91) that we found is comparable to the reliability coefficient reported by Baker et al (9) for same-day administration of this questionnaire (r=.89). The MDC derived from the group classified as "unchanged" was about the same (15 points) as the estimate of 16 points that we made from data published by Fairbank.
For the Quebec Back Pain Disability Scale, the ICC value of .84 (95% CI=.73-.91) that we found was a little lower than the ICC of .93 reported by Kopec et al. (10) We believe that this difference reflects either sampling differences or the greater variability in scores we would expect because we used a longer retest period. The MDC of 19 for the "unchanged" group was somewhat larger than the estimate of 14 points that we calculated from Kopec and colleagues' data. (10) Subjects in the study by Kopec et al, however, were classified as "unchanged" if they rated themselves as the same on a 3-level transitional scale ("better," "the same," "worse"); therefore, the MDC of 15 points derived from the reliability data of the subjects who said they were "about the same" in our study is comparable. A change of at least 15 points in the Quebec questionnaire score of an individual patient (and possibly as much as 19 points) would be necessary, in our view, to be 90% confident that real change had occurred. Scale width for the Quebec questionnaire when based on the MDC for the "unchanged" group was a little over the 15% criterion limit at the lower end of the scale, with 19% of subjects having an initial score too low to allow improvement to be detected. When based on the MDC for the subgroup, scale width was within the 15% criterion. For the Roland-Morris Disability Questionnaire, the ICC value of .53 (95% CI=.29-.71) that we found was markedly lower than that reported over a 3- to 6-week retest period by Stratford and colleagues. (25) They reported an ICC of .86 (95% CI=.72-.94) and an MDC of 4 to 5 points. (21) The ICC appeared lower again (ICC=.42, 95% CI=-.07 to .75) for the subgroup of 16 subjects who rated themselves as "about the same," and the lower bound of the 95% CI falls below zero. Our data showed an MDC of 8.6 or 9.5 points based on the reliability estimates for the 2 groups. 
The difference between the test-retest reliability found in other studies and that found in our study may be explained by sample differences. The subjects in the studies by Stratford and colleagues (17,25) were referred by physicians to the physical therapy outpatient department of 1 or 2 hospitals. In contrast, we drew our sample from a range of physical therapy outpatient services, and we believe that our subjects were likely to be more variable and more representative of the general clinical population in a health care system where patients may consult a physical therapist with or without referral from a physician. Compared with the samples in the studies by Stratford and colleagues, (17,25) our sample included a greater proportion of female subjects, and our subjects were on average older, had lower initial Roland-Morris questionnaire scores, and had a longer duration of back pain. If sample differences were sufficient to explain poorer test-retest reliability for the Roland-Morris questionnaire, we would expect to have seen a similar effect with the other questionnaires, but this was not the case. The use of the average of the patient's and the therapist's ratings of overall change in the studies by Stratford and colleagues may have screened out the types of subjects in our study who showed considerable variability in scores. Subjects in our study who reported no change but whose Roland-Morris questionnaire scores suggested they had changed tended to have had their low back problem for more than 6 months. Perhaps these subjects had become used to their problem and reported no overall perception of change, despite the functional improvement detected by the Roland-Morris questionnaire. 
This explanation, however, seems unlikely in the absence of similar variability in the scores of the other questionnaires. Another possibility is that the variability in scores may reflect the emphasis in the Roland-Morris questionnaire's instructions to subjects to select an item only "if you are sure that it describes you today." Low back pain can vary considerably from day to day; thus, Roland-Morris questionnaire scores will reflect diurnal variations in activity limitations. The instructions also urge that "if the sentence does not describe you, then leave the space blank"; therefore, it is possible that subjects will not select an item if they have not attempted that activity that day. The poor reliability and consequently large MDC for the Roland-Morris questionnaire severely reduce the scale width. At the time of the initial measurements, 51% of the subjects scored less than the MDC. Therefore, the Roland-Morris questionnaire would not be able to reliably detect improvement in half of the sample. Even using the previous best estimate by Stratford et al (17) of 4 points for the MDC at the scale extremes, 19% of the subjects scored less than 4 points at initial testing. On the basis of the poor test-retest reliability, the consequently large MDC, and the limited scale width, we cannot recommend the use of the Roland-Morris questionnaire as a measure of functional outcome in a general clinical population. The test-retest reliability of measurements obtained with the Waddell Disability Index has not previously been reported for a self-administered version of the questionnaire. We calculated the ICCs as .74 (95% CI=.58-.85) for the "unchanged" group and .79 (95% CI=.51-.92) for the subgroup and the MDC as around 3 points, which constitutes one third of the available range of the scale. 
The potential clinical utility of the Waddell Disability Index is diminished by the relatively large MDC and a lack of scale width, as 21% of the sample scored less than 3 points and 20% more than 6 points at the initial measurement. The ICCs of .83 (95% CI=.71-.90) and .91 (95% CI=.76-.97) that we obtained for the SF-36 Physical Functioning scale are similar to the ICC of .89 reported by Patrick et al, (22) who analyzed the data for 52 subjects with sciatica who self-rated their leg pain as unchanged over a 3-month retest period. The MDC of 22 is close to the 21 points we estimated from the data reported by Patrick et al. When based on the smaller subgroup in our study, the MDC might be as low as 16. Scale width is within the 15% criterion limit whether the MDC of 16 or 22 is applied, and the SF-36 Physical Functioning scale therefore appears to be an appropriate scale for use by physical therapists. A therapist would need to observe a change in the SF-36 Physical Functioning scale score of at least 16 points (or 22 points by the less stringent reliability analysis) to be 90% confident that real change had occurred. The ICCs for the Role Limitations--Physical and Bodily Pain scales of the SF-36 in our study were considerably lower than those reported by Patrick et al (22) (ICC=.80 and .67). 
Although the ICCs for the subgroup who rated themselves as "about the same" were somewhat stronger, they were still weak (ICC=.47 and .59), and the lower bounds of the CIs approached zero. In the study by Patrick et al, subjects rated the overall change in their leg pain rather than the change in their overall condition. In addition, the subjects had sciatica secondary to a herniated lumbar intervertebral disk and represent a different clinical population than the subjects in our study. The different results, therefore, may relate to differences in sample characteristics (eg, variance differences), but scale characteristics may also help explain the different results. The SF-36 Role Limitations--Physical scale consists of 4 questions with forced-choice (yes/no) responses, and the available total scores are therefore 0, 25, 50, 75, and 100. For any individual, a small number of changes in responses from "yes" to "no" or vice versa could have a very large effect on the score. Score distribution was very skewed, with 66% of the subjects at the initial measurement and 42% of the subjects at the follow-up measurement scoring 0, the worst possible score. Thirty subjects scored 0 at both pretest and posttest, but many others showed large improvements and worsening. The data for the SF-36 Role Limitations--Physical scale were highly skewed, and the MDC estimates of 62 and 66 points are likely to be overestimates. 
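SF-36 scale scores are conventionally obtained by summing the item responses and transforming the raw sum linearly to a 0 to 100 range. The sketch below assumes that standard transformation and the item coding given in the comments; the function name is hypothetical, it omits the item recoding that some other SF-36 scales (such as Bodily Pain) require, and it is not a substitute for the official scoring rules. It illustrates why the Role Limitations--Physical scale can take only 5 values and why a single changed answer moves the score by 25 points.

```python
from itertools import product


def transformed_score(responses, item_min, item_max):
    """Linear 0-100 transformation of a summed raw score:
    100 * (raw - lowest possible raw) / (possible raw range)."""
    raw = sum(responses)
    lowest = len(responses) * item_min
    highest = len(responses) * item_max
    return 100 * (raw - lowest) / (highest - lowest)


# Role Limitations--Physical: 4 yes/no items, coded 1 = yes (limited), 2 = no
role_physical = sorted({transformed_score(r, 1, 2) for r in product((1, 2), repeat=4)})
print(role_physical)  # [0.0, 25.0, 50.0, 75.0, 100.0]

# Physical Functioning: 10 items coded 1-3, so one item moving one step changes
# the score by 5 points, versus 25 points for one Role-Physical answer
print(transformed_score([3] * 9 + [2], 1, 3))  # 95.0
```

The coarse 25-point steps are one reason why, for a scale like this, an MDC estimated from a skewed, highly clustered distribution can be very large.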
There was a small improvement in SF-36 Bodily Pain scale scores over the retest period for subjects classified as "unchanged" and for those who rated their back condition as "a little better." The SF-36 Bodily Pain scale has only 2 items, and poor reliability is more likely in very brief scales. The MDC was 33 or 41 points, and scale width was beyond the 15% criterion limit at the lower end of the scale range. On its own, the SF-36 Bodily Pain scale, in our view, cannot be said to be an adequate measure of pain or pain-related function, composed as it is of one pain intensity item and one item regarding how much pain interferes with normal work. Because of the substantial floor effect, the poor scale width, and the variability in scores in stable subjects, the SF-36 Role Limitations--Physical and Bodily Pain scales do not appear to be useful measures of functional outcome for individual patients. Based on these data, the Physical Functioning scale is the most relevant of the SF-36 physical health scales, and it can be easily hand-scored. We see advantages, however, in administering the SF-36 in its entirety. The SF-36 provides a health status profile, rather than a single index score, and individual and aggregated data can be compared with the population norms available in many countries. (15,16,41-45) The International Quality of Life Assessment (IQOLA) Project is translating, validating, and norming the SF-36 in 14 countries: Australia, Belgium, Canada, Denmark, France, Germany, Italy, Japan, the Netherlands, Norway, Spain, Sweden, the United Kingdom (English version), and the United States (English and Spanish versions). (46) The scales in the mental health domain may provide a brief screening tool to alert the clinician to the need for appropriate referral. The main disadvantage of the SF-36 is that hand-scoring of some of the 8 scales is laborious, in our view, because of the complex scoring algorithm. However, SF-36 scores can be easily generated using a spreadsheet, and customized scoring software is also available. The results of the reliability portion of our study indicated that the modified Oswestry Disability Questionnaire, the SF-36 Physical Functioning scale, and the Quebec Back Pain Disability Scale were the most reliable and had sufficient scale width to detect improvement or worsening in most subjects. The reliability of measurements obtained with the Waddell Disability Index is moderate, but we believe the scale width is insufficient to recommend it for clinical application. The Roland-Morris Disability Questionnaire and the SF-36 Role Limitations--Physical and Bodily Pain scales lacked sufficient reliability and scale width for clinical application. Test-retest reliability results for the Roland-Morris Disability Questionnaire differed from those of earlier reports, and this highlights the importance of examining reliability in the population to which the measurement tool will be applied in practice. In the second part of our study, we explored the responsiveness of the 5 questionnaires. 
Just as measurements obtained with a test may be reliable but not valid, it is possible for a test to yield reliable measurements but to be unresponsive. There has also been some debate about whether a test can yield unreliable measurements yet be responsive. (47,48) There is currently no agreement as to the most appropriate method of evaluating the responsiveness of tests. (24,27,49,50) Therefore, we explored responsiveness using 3 methods by which point estimates and 95% CIs could be calculated and the differences among questionnaires tested. The SRM is typical of the distribution-based or overall-change approach, and the ROC curve is representative of the criterion-based or valid-change approach. The third method, which calculates the proportion of subjects who change by at least as much as the MDC, has not previously been used and can be termed a reliable-change approach. The absolute value of the SRM can be interpreted in the same way as an effect size, where .20 is regarded as small, .40 as moderate, and .80 as large. (51) The SRM point estimates for the questionnaires in our study were moderate, and the 95% CIs were very wide. We chose the SRM because it is the only distribution-based method for which a method of hypothesis testing has been described. (27,37) We believe that the repeated iterations of Liang and colleagues' complex SRM procedure (37) offer considerable opportunity for error. The "jackknife" procedure used to generate what Liang and colleagues called "pseudo-values" (37) is performed by systematically dropping one subject's data from the analysis at a time. That is, the SRM is recalculated n times with each subject removed in turn. 
This results in a population of n SRM pseudo-values around the sample SRM and provides a sampling distribution of SRMs from which to estimate a population SRM. The population SRM and variance are then estimated from the pseudo-values, and finally a t test is used to compare the tests. We found that the result was distorted unless calculations were made to 5 decimal places. The area under the ROC curve has a possible range from .50, indicating a chance finding, to 1.0, indicating perfect ability of change scores to discriminate between changed and unchanged patients. The ROC point estimates in our study fell within a narrow range from .73 to .78, and there was no difference among the scores from the questionnaires, suggesting that all of the tests were equivalent in responsiveness. The ROC values of .78 and .77 that we obtained for the Oswestry and Roland-Morris questionnaires are almost identical to those reported by Stratford and colleagues (21) (.78 and .79). Beurskens et al (20) reported a similar ROC value for the Oswestry questionnaire (.76) but a higher value for the Roland-Morris questionnaire (.93). Criterion-based methods require the sample to be dichotomized into those subjects who are unchanged and those who have improved by a certain amount. (27,28) The use of patients' self-ratings of overall change as the criterion of meaningful clinical change has several limitations: the measurements have unknown reliability and validity; recall of initial states tends to be inflated, which tends to inflate the perceived magnitude of change; and the scale is completed at the same time as the follow-up questionnaires and is therefore not independent. 
(52) In our study, subjects were asked to complete the rating of change scale before the questionnaires, and the completion of the questionnaires may have been influenced by the overall rating. However, because the questionnaires were administered by mail, we have no way of knowing the order in which the subjects completed the tasks. Patient self-ratings, or averages of patient and therapist ratings of overall change, are commonly used as the criterion of change because of the valued perspective of the rater rat·er n. 1. One that rates, especially one that establishes a rating. 2. One having an indicated rank or rating. Often used in combination: a third-rater; a first-rater. (s) and because the information can be collected easily. The reliable-change method of evaluating responsiveness counted the number of subjects who changed by at least as much as the MDC over 6 weeks. Because we had performed 2 reliability analyses, one for the group classified as "unchanged" and one for the smaller subgroup who had rated themselves as "about the same," we had 2 estimates of MDC. In neither case was the proportion different among the questionnaires. In the responsiveness portion of our study, we found that none of the questionnaires could be shown to be more or less responsive than any other. Furthermore, it appears possible for a questionnaire to yield scores with very poor reliability, but to have reasonable responsiveness. The SF-36 Bodily Pain scale's ICC was lower than .50, but the scale was comparable in responsiveness to the other questionnaires. This finding may indicate either that the questionnaires perform similarly in their ability to detect change over time or that the responsiveness methods are not able to discriminate between instruments with low and high responsiveness. 
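The jackknife comparison of SRMs described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reproduction of Liang and colleagues' procedure: it assumes change scores oriented so that positive values mean improvement, the usual leave-one-out pseudo-value definition (n times the full-sample SRM minus n - 1 times the SRM with subject i removed), and a paired t statistic on the per-subject differences between two questionnaires' pseudo-values. All function names are hypothetical. Working in full floating-point precision throughout avoids the intermediate rounding that distorted the result when fewer than 5 decimal places were kept.

```python
import math
import statistics


def srm(changes):
    """Standardized response mean: mean change score divided by the SD of change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def jackknife_pseudo_values(changes):
    """Leave-one-out pseudo-values: n * SRM(all) - (n - 1) * SRM(all but subject i)."""
    n = len(changes)
    full = srm(changes)
    pseudo = []
    for i in range(n):
        reduced = changes[:i] + changes[i + 1:]  # drop subject i only
        pseudo.append(n * full - (n - 1) * srm(reduced))
    return pseudo


def jackknife_srm(changes):
    """Jackknife estimate of the population SRM and the variance of that estimate."""
    pv = jackknife_pseudo_values(changes)
    return statistics.mean(pv), statistics.variance(pv) / len(pv)


def compare_srms(changes_a, changes_b):
    """Paired t statistic (and degrees of freedom) on the per-subject differences
    between the pseudo-values of two questionnaires completed by the same subjects."""
    if len(changes_a) != len(changes_b):
        raise ValueError("both questionnaires must be scored on the same subjects")
    diffs = [a - b for a, b in zip(jackknife_pseudo_values(changes_a),
                                   jackknife_pseudo_values(changes_b))]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Each SRM is recomputed n times per questionnaire, so the cost grows with the square of the sample size; for samples of roughly 100 subjects this is negligible.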
The proliferation of responsiveness measures and the debate concerning methods for determining responsiveness suggest that the optimal way to quantify this relatively recently conceptualized psychometric property of tests has not been described. (27,28,48,50) The validity of scores obtained with a responsiveness index could be demonstrated by testing whether the index is able to discriminate between a test that is known to be responsive and one that is known not to detect change over time in a particular clinical population. We suggest that the choice of a responsiveness index should be dictated by the purpose for which the index is being used. If the aim is to quantify the responsiveness of an outcome measure to be used in research, then we believe that a distribution-based method would be most appropriate, as this information could be used to estimate sample size and statistical power. Distribution-based methods, however, provide no information about whether change is clinically meaningful. A criterion-based method may be appropriate where the purpose is to detect meaningful change in a clinical setting. Distribution-based methods provide information analogous to a test of statistical significance, whereas criterion-based methods are analogous to a judgment of clinical significance.
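A criterion-based index such as the area under the ROC curve can be computed directly from change scores once subjects have been dichotomized into "improved" and "unchanged" groups. The sketch below uses the standard rank-based (Mann-Whitney) equivalence, in which the area equals the probability that a randomly chosen improved subject has a larger change score than a randomly chosen unchanged subject, with ties counted as one half. It is an illustration only; the function name is hypothetical, and this is not necessarily how the ROC areas in Table 7 were computed.

```python
def roc_auc(improved_changes, unchanged_changes):
    """Area under the ROC curve for separating improved from unchanged subjects
    by their change scores (larger change = more improvement).

    Mann-Whitney formulation: the probability that an improved subject's change
    exceeds an unchanged subject's change, with ties counted as one half.
    """
    if not improved_changes or not unchanged_changes:
        raise ValueError("both groups need at least one subject")
    score = 0.0
    for x in improved_changes:
        for y in unchanged_changes:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(improved_changes) * len(unchanged_changes))
```

Because SF-36 change scores are negative for improvement under the sign convention of Tables 2 and 4, they would need to be negated before being passed to `roc_auc`.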
The reliable-change method, in our opinion, provides practical information for clinical application in that it answers the question, "In what proportion of my patients is this questionnaire likely to detect change beyond the amount that can be attributed to measurement error?" The limitation of this method is that the MDC may not be known for many questionnaires and clinical tests.
We are the first authors to report on reliability and responsiveness for these 5 questionnaires in a sample drawn from the range of settings in which patients with low back pain seek physical therapy interventions. Our sample was drawn from hospitals, private practices, and community-based services, whereas previous studies have used samples obtained from patients seeking physical therapy at 1 or 2 hospitals or practices (17,18,25) or from both physical therapy and medical treatment centers. (10) Although our sampling strategy was designed to obtain a representative sample, a number of factors tend to reduce generalizability. Consecutive sampling may have been compromised if therapists did not record instances when they failed to approach a potential subject. Only 7 such instances were recorded, and underreporting may have occurred in the course of busy daily practice or because therapists were eager to appear cooperative. In addition, 67 subjects (32%) who initially agreed to participate failed to return the first set of questionnaires, and it is not known whether this group differed from those who actually participated in the study. Because the recruitment sites were all located in urban areas, the sample also may not reflect differences in the profile of clients seeking physical therapy in rural locations. For practical reasons, people who could not read or write English were excluded, and the results therefore may not be generalizable to people from non-English-speaking backgrounds.
Another limitation of our study is the use of the global rating of change scale as the sole criterion of meaningful change. Whether the single-item global change scale used in this study yields reliable measurements is unknown, and it is likely that the rating was not independent of the activity limitation questionnaire responses. That is, a subject's response to the global rating of change may have influenced the subsequent responses to the questionnaires at follow-up. Norman and colleagues (52) identified one study of quality of life in childhood asthma (53) in which the criterion of change was determined by an independent evaluation of all patient data. Whether an independent evaluation of change based on such data would be a better criterion of change in patients with low back pain remains to be established. (52,53) In the questionnaires that we studied, subjects were asked to report activity limitation during different time periods (Tab. 1), which could have influenced their responses. The Roland-Morris and Quebec questionnaires ask about activity limitation "today," the SF-36 Physical Functioning scale asks about activity limitation "now," the SF-36 Role Limitations--Physical and Bodily Pain scales ask about activity limitation during "the past 4 weeks," and the Oswestry questionnaire gives no specific time reference.
We are unaware of any studies that have explored this issue, although Fairbank and Pynsent (54) recently reported that patients prefer a format such as that of the Oswestry questionnaire in which the time frame "now" is made explicit. A surprising result in our study was that although 49% of the subjects said their condition was "better," "much better," or "completely gone" after 6 weeks, none of the questionnaires reliably detected change in more than 30% of the subjects (Tab. 7). This result suggests that the amount of change in questionnaire scores that clients perceive as meaningful may be smaller than the amount of change required to be 90% confident that a score change is not just measurement error (the MDC). More reliable and responsive methods need to be developed for measuring activity limitation in people with low back pain. Perhaps we are currently overestimating the SEM (and therefore the MDC) derived from small samples. However, the consequences of wrongly concluding, on the basis of change in questionnaire scores, that a patient with low back pain either has or has not changed by a measurable amount are unlikely, in our opinion, to be substantially adverse. If a patient's status does not change by at least as much as the current MDC within an expected time frame, the therapist may decide to alter some component of the treatment regimen, to refer the patient to another health care professional, or to cease therapy. A clinician interpreting a change in an individual patient's questionnaire scores would be well advised to use a range of outcome indicators to build a picture of overall change.
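The reliable-change count takes only a few lines of code. In this sketch (function names hypothetical), change scores are assumed to be oriented so that positive values indicate improvement, a subject counts as reliably improved when the change is at least one MDC, and the 95% CI is a normal-approximation (Wald) interval. The text does not state which interval method was used for Table 7; the Wald interval is an assumption, although it gives bounds close to several of the reported ones (for example, 30% of 106 subjects yields roughly 21% to 39%).

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def proportion_beyond_mdc(changes, mdc):
    """Share of subjects whose improvement is at least one MDC, with its Wald 95% CI.

    `changes` must be oriented so that positive values mean improvement
    (negate SF-36 change scores first under the sign convention of Tables 2 and 4).
    """
    n = len(changes)
    k = sum(1 for c in changes if c >= mdc)
    p = k / n
    return p, wald_ci(p, n)
```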
Although we contend that the modified Oswestry Disability Questionnaire, the SF-36 Physical Functioning scale, and the Quebec Back Pain Disability Scale appear to be the most useful measures of functional outcome for people with low back pain, practical considerations also influence the choice of questionnaire. If a clinician sees few patients with low back problems and fast processing of results is the primary consideration, then the Waddell Disability Index may be appropriate. Therapists in multidisciplinary clinics may decide that the SF-36 provides the more comprehensive assessment required for their purposes. Scale content also provides a point of differentiation. For example, the SF-36 does not ask about difficulty sustaining body positions such as sitting and standing, and the Oswestry questionnaire does not include difficulty moving between postures, such as from sit to stand. The Quebec questionnaire has more content relating to upper-limb activities (pulling/pushing, throwing/catching, reaching) than the other scales. Notwithstanding a careful choice of scale, there will always be some individuals whose initial score leaves too little room for change to be reliably detected over time. Clinicians, therefore, should have alternative or multiple strategies for measuring functional outcome, and they should be aware of the limitations of each method.
Conclusion
Our data indicate that the Oswestry Disability Questionnaire, the SF-36 Physical Functioning scale, and the Quebec Back Pain Disability Scale have sufficient reliability and scale width to be applied in an ambulatory clinical population with low back problems. The Waddell Disability Index has insufficient scale width for clinical utility.
The Roland-Morris Disability Questionnaire and the SF-36 Role Limitations--Physical and Bodily Pain scales did not have sufficient reliability to be recommended as clinical outcome measures for individual patients. This study showed that the responsiveness of the questionnaires was similar, and we conclude that one questionnaire cannot be preferred over another based on the magnitude of the absolute values of responsiveness indexes.
Table 1.
Characteristics of the Oswestry Disability Questionnaire (8,9), Quebec Back Pain Disability Scale (10), Roland-Morris Disability Questionnaire (11), Waddell Disability Index (12), and Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) Physical Functioning, Role Limitations--Physical, and Bodily Pain Scales (13,14)
Questionnaire                             Reference Period (a)        No. of Items  No. of Response  Score   Better Function
                                                                      in Scale      Options          Range   Indicated by
Oswestry Disability Questionnaire         Not specified               10            6                0-100   Lower scores
Quebec Back Pain Disability Scale         Today                       20            6                0-100   Lower scores
Roland-Morris Disability Questionnaire    Today                       24            1                0-24    Lower scores
Waddell Disability Index                  Since onset of back pain    9             2                0-9     Lower scores
SF-36 Physical Functioning scale          Now                         10            3                0-100   Higher scores
SF-36 Role Limitations--Physical scale    Past 4 wk                   4             2                0-100   Higher scores
SF-36 Bodily Pain scale                   Past 4 wk                   2             5 and 6          0-100   Higher scores
(a) Activity limitations experienced during this period or at this point in time.
Table 2.
Self-Rated Change in Questionnaire Scores (n=101) (a)
                          Oswestry Disability         Quebec Back Pain
                          Questionnaire (8,9)         Disability Scale (10)
Global Change Scale       Mean    SD                  Mean    SD
Completely gone (n=6)     33      30                  38      25
Much better (n=26)        16      16                  19      25
Better (n=20)             9       13                  11      14
A little better (n=28)    3       10                  2       11
About the same (n=16)     0       7                   0.3     11
A little worse (n=3)      -6      6                   -6      8
Much worse (n=2)          -4      17                  -8      18
                          Roland-Morris Disability    Waddell Disability
                          Questionnaire (11)          Index (12)
Global Change Scale       Mean    SD                  Mean    SD
Completely gone (n=6)     9       7                   4       3
Much better (n=26)        7       6                   2       3
Better (n=20)             3       5                   1       2
A little better (n=28)    0       4                   0       1.6
About the same (n=16)     1       7                   -1      1.6
A little worse (n=3)      0       7                   0       1
Much worse (n=2)          -2      2                   -1      2
                          SF-36 Physical              SF-36 Role Limitations--
                          Functioning Scale           Physical Scale
Global Change Scale       Mean    SD                  Mean    SD
Completely gone (n=6)     -38     32                  -58     38
Much better (n=26)        -16     19                  -42     48
Better (n=20)             -15     25                  -26     47
A little better (n=28)    -2.5    13                  -6      41
About the same (n=16)     5       10                  1.6     28
A little worse (n=3)      7       13                  17      29
Much worse (n=2)          7.5     11                  -13     18
                          SF-36 Bodily Pain Scale
Global Change Scale       Mean    SD
Completely gone (n=6)     -46     26
Much better (n=26)        -31     27
Better (n=20)             -14     26
A little better (n=28)    -12     22
About the same (n=16)     -5      17
A little worse (n=3)      4       21
Much worse (n=2)          10      15
(a) SF-36=Medical Outcomes Study 36-Item Short-Form Health Survey. (13,14) Positive scores reflect improvement, except for the SF-36, for which negative scores indicate improvement. All scales are scored 0-100, except the Roland-Morris Disability Questionnaire (0-24) and the Waddell Disability Index (0-9).
Table 3.
Sample Characteristics of "Unchanged" and "Improved" Groups
                              "Unchanged" (n=47)   "Improved" (n=52)
Variable                      No.    %             No.    %
Age (y)
  18-30                       4      8.5           6      11.5
  31-40                       6      12.8          12     23.1
  41-50                       14     29.8          10     19.2
  51-60                       4      8.5           11     21.2
  61-70                       9      19.1          5      9.6
  ≥71                         10     21.3          8      15.4
Sex
  Male                        17     36.2          14     26.9
  Female                      30     63.8          38     73.1
Work situation
  Employed                    14     29.8          24     46.1
  Unemployed                  5      10.6          3      5.8
  Not in the labor force      28     59.6          25     48.1
Receiving compensation
  Yes                         2      4.3           7      13.5
  No                          45     95.7          45     86.5
Duration of current episode
  <1 wk                       2      4.2           9      17.3
  1-6 wk                      10     21.3          22     42.2
  6 wk to 6 mo                11     23.4          10     19.2
  >6 mo                       24     51.1          9      17.3
  Missing                                          2      4.0
Pain location
  Back only                   8      17.0          20     38.5
  Buttock, groin, or thigh    20     42.6          20     38.5
  Below knee                  19     40.4          12     23.0
Previous episodes
  None                        3      6.4           5      9.6
  1-5                         9      19.2          20     38.5
  >5                          22     46.8          21     40.4
  Continuous pain             13     27.6          5      9.6
  Missing                                          1      1.9
Table 4.
Questionnaire Initial and Follow-up Scores for Subjects Classified as "Unchanged" and "Improved" (a)
Subjects Classified as "Unchanged" (n=47)
                                            Initial          Follow-up        Difference       t Test
Questionnaire                               Mean   SD        Mean   SD        Mean   SD        P
Oswestry Disability Questionnaire (8,9)     35     15 (b)    34     15 (b)    1      9 (b)     .38
Quebec Back Pain Disability Scale (10)      41     21 (b)    40     17 (b)    1      11 (b)    .54
Roland-Morris Disability Questionnaire (11) 9      5.2       8.2    5.2       0.8    5.1       .30
Waddell Disability Index (12)               4.6    2.3       4.9    2.1       0.3    1.6       .31
SF-36 Physical Functioning scale            51     20 (b)    50     23 (b)    1      13 (b)    .77
SF-36 Role Limitations--Physical scale      20     32        22     33        -2     36        .76
SF-36 Bodily Pain scale                     32     17 (b)    40     19        -8     20 (b)    .006
Subjects Classified as "Improved" (n=52)
                                            Initial          Follow-up        Difference       t Test
Questionnaire                               Mean   SD        Mean   SD        Mean   SD        P
Oswestry Disability Questionnaire (8,9)     35     17 (b)    19     14 (b)    16     18        .000
Quebec Back Pain Disability Scale (10)      38     21 (b)    20     16        18     22        .000
Roland-Morris Disability Questionnaire (11) 9.5    5.9       3.8    4.1       5.7    6         .000
Waddell Disability Index (12)               4.4    2.2       2.6    2.1       1.9    2.5 (b)   .000
SF-36 Physical Functioning scale            52     25 (b)    70     21 (b)    -18    24 (b)    .000
SF-36 Role Limitations--Physical scale      19     31        57     42        -39    47        .000
SF-36 Bodily Pain scale                     35     24        61     21 (b)    -26    28 (b)    .000
(a) SF-36=Medical Outcomes Study 36-Item Short-Form Health Survey. (13,14) For the SF-36, a negative change score indicates improvement because of its reversed scoring direction. All questionnaires have a possible score range of 0-100, except the Roland-Morris Disability Questionnaire (0-24) and the Waddell Disability Index (0-9).
(b) Kolmogorov-Smirnov test with Lilliefors correction indicated a normal distribution of scores.
Table 5.
Test-Retest Reliability (Intraclass Correlation Coefficients [ICC (2,1)]), Standard Error of Measurement (SEM), Standard Error of Repeated Measurement (SEM_repeat), and Minimum Detectable Change (MDC) for Subjects Classified as "Unchanged" and Subjects Self-Rated as "About the Same" (a)
Subjects Classified as "Unchanged" (n=47)
Questionnaire                               ICC (95% CI)       SEM (95% CI)      SEM_repeat (95% CI)  MDC (95% CI)
Oswestry Disability Questionnaire (8,9)     .84 (.73 to .91)   6 (5 to 8)        9 (7 to 12)          15 (11 to 19)
Quebec Back Pain Disability Scale (10)      .84 (.73 to .91)   8 (6 to 10)       11 (8.5 to 15)       19 (14 to 24)
Roland-Morris Disability Questionnaire (11) .53 (.29 to .71)   3.7 (2.9 to 4.6)  5.2 (4.1 to 6.4)     8.6 (6.7 to 10.6)
Waddell Disability Index (12)               .74 (.58 to .85)   1.2 (0.9 to 1.5)  1.7 (1.3 to 2.2)     2.8 (2.1 to 3.5)
SF-36 Physical Functioning scale            .83 (.71 to .90)   10 (7 to 13)      14 (10.5 to 18)      22 (17 to 29)
SF-36 Role Limitations--Physical scale      .39 (.11 to .61)   28 (23 to 35)     40 (32 to 49)        66 (53 to 80)
SF-36 Bodily Pain scale                     .37 (.09 to .59)   18 (14 to 21.5)   25 (20 to 30)        41 (33 to 50)
Subjects Self-Rated as "About the Same" (n=16)
Questionnaire                               ICC (95% CI)       SEM (95% CI)      SEM_repeat (95% CI)  MDC (95% CI)
Oswestry Disability Questionnaire (8,9)     .92 (.79 to .97)   4.5 (3 to 7)      6 (4 to 10)          10.5 (6 to 17)
Quebec Back Pain Disability Scale (10)      .89 (.72 to .96)   7 (4 to 11)       9 (6 to 15)          15 (9 to 24)
Roland-Morris Disability Questionnaire (11) .42 (-.07 to .75)  4.1 (2.7 to 5.6)  5.8 (3.8 to 7.9)     9.5 (6.3 to 13)
Waddell Disability Index (12)               .79 (.51 to .92)   1.1 (0.7 to 1.6)  1.5 (0.9 to 2.3)     2.5 (1.5 to 3.8)
SF-36 Physical Functioning scale            .91 (.76 to .97)   7 (4 to 12)       10 (6 to 16)         16 (9 to 27)
SF-36 Role Limitations--Physical scale      .47 (-.02 to .78)  27 (17 to 37)     38 (24 to 52)        62 (40 to 86)
SF-36 Bodily Pain scale                     .59 (.15 to .83)   14 (9 to 21)      20 (13 to 20)        33 (22 to 48)
(a) SF-36=Medical Outcomes Study 36-Item Short-Form Health Survey. (13,14) SEM = SD√(1 - R), where SD is the average standard deviation for pretest and posttest for 106 subjects and R is the ICC (2,1). The MDC is expressed in the same scale units as the questionnaires and is the 90% confidence interval of the error associated with repeated measurements.
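The footnote formula for the SEM, and the way the SEM_repeat and MDC columns relate to it, can be written out as a short sketch. The SEM follows the footnote directly; for example, an average SD of 15 (Table 4) and an ICC of .84 give the Oswestry SEM of 6. The other two relations, SEM_repeat = √2 × SEM and MDC = 1.645 × SEM_repeat (1.645 being the two-sided 90% normal quantile), are our reading of the table rather than formulas it states; they reproduce, for example, the Roland-Morris values for the "unchanged" group (SEM 3.7, SEM_repeat 5.2, MDC 8.6) to one decimal place. Function names are hypothetical.

```python
import math

Z_90 = 1.645  # standard normal quantile for a two-sided 90% interval


def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - R), with R the ICC (2,1)."""
    return sd * math.sqrt(1.0 - icc)


def sem_repeat(sem_value):
    """Error of the difference between two measurements (assumed sqrt(2) * SEM)."""
    return math.sqrt(2.0) * sem_value


def mdc90(sem_value):
    """Minimum detectable change at 90% confidence (assumed 1.645 * sqrt(2) * SEM)."""
    return Z_90 * sem_repeat(sem_value)
```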
Table 6.
Scale Width of Questionnaires at Initial Measurement (a)
                                            Proportion of Subjects (n=140) With Insufficient Initial Score to Reliably Detect
                                            Based on "Unchanged" (n=47) (b) Based on "About the Same" (n=16) (c)
Questionnaire                               Improvement   Deterioration     Improvement   Deterioration
Oswestry Disability Questionnaire (8,9)     11%           0%                3%            0%
Quebec Back Pain Disability Scale (10)      19%           4%                14%           1%
Roland-Morris Disability Questionnaire (11) 51%           16%               51%           16%
Waddell Disability Index (12)               21%           20%               21%           20%
SF-36 Physical Functioning scale            13%           15%               9%            10%
SF-36 Role Limitations--Physical scale      21%           87%               21%           86%
SF-36 Bodily Pain scale                     11%           54%               6%            54%
(a) SF-36=Medical Outcomes Study 36-Item Short-Form Health Survey. (13,14)
(b) Subjects who self-rated their condition as "about the same" or "a little better/worse" and who were classified as "unchanged."
(c) Subjects who self-rated their condition as "about the same" after 6 weeks.
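Our reading of Table 6 is that a subject's initial score is insufficient when the distance between that score and the favorable end of the scale (for improvement) or the unfavorable end (for deterioration) is smaller than the MDC; the table does not spell the rule out. A minimal sketch of that check, with a hypothetical function name, is shown below. The direction of each scale follows Table 1: lower scores indicate better function for the Oswestry, Quebec, Roland-Morris, and Waddell questionnaires, and higher scores for the SF-36 scales.

```python
def insufficient_headroom(initial_scores, mdc, score_min, score_max, lower_is_better):
    """Proportions of subjects whose initial score leaves less than one MDC between it
    and the best end (improvement) or the worst end (deterioration) of the scale."""
    n = len(initial_scores)
    cannot_improve = 0
    cannot_worsen = 0
    for s in initial_scores:
        if lower_is_better:  # e.g. Oswestry, Quebec, Roland-Morris, Waddell
            room_to_best, room_to_worst = s - score_min, score_max - s
        else:  # e.g. SF-36 scales
            room_to_best, room_to_worst = score_max - s, s - score_min
        if room_to_best < mdc:
            cannot_improve += 1
        if room_to_worst < mdc:
            cannot_worsen += 1
    return cannot_improve / n, cannot_worsen / n
```

Under this rule, with the Roland-Morris MDC of 8.6 on its 0-24 range, any subject whose initial score is below 8.6 could not show a reliable improvement.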
Table 7.
Standardized Response Means (SRM), Receiver Operating Characteristic (ROC) Curves, and the Proportion of the Sample Improved at Least as Much as the Minimum Detectable Change (MDC) (a)
Questionnaire                               SRM (n=106)  95% CI           ROC (n=99)  95% CI
Oswestry Disability Questionnaire (8,9)     0.52         -0.51 to 1.56    .78         .69 to .87
Quebec Back Pain Disability Scale (10)      0.49         -0.47 to 1.44    .74         .64 to .84
Roland-Morris Disability Questionnaire (11) 0.55         -0.54 to 1.64    .77         .68 to .87
Waddell Disability Index (12)               0.35         -0.33 to 1.01    .76         .67 to .86
SF-36 Physical Functioning scale            0.44         -0.44 to 1.34    .74         .64 to .84
SF-36 Role Limitations--Physical scale      0.45         -0.47 to 1.43    .73         .64 to .83
SF-36 Bodily Pain scale                     0.67         -0.66 to 2.00    .73         .63 to .84
Proportion Improved ≥ MDC (n=106)
                                            Based on "Unchanged" (n=47) (b)   Based on "About the Same" (n=16) (c)
Questionnaire                               %       95% CI                    %       95% CI
Oswestry Disability Questionnaire (8,9)     24%     16 to 33                  30%     21 to 39
Quebec Back Pain Disability Scale (10)      23%     15 to 31                  29%     20 to 38
Roland-Morris Disability Questionnaire (11) 22%     14 to 30                  17%     10 to 24
Waddell Disability Index (12)               21%     13 to 29                  21%     13 to 29
SF-36 Physical Functioning scale            20%     12 to 28                  27%     18 to 36
SF-36 Role Limitations--Physical scale      21%     13 to 29                  21%     13 to 29
SF-36 Bodily Pain scale                     18%     11 to 25                  23%     15 to 31
(a) SF-36=Medical Outcomes Study 36-Item Short-Form Health Survey. (13,14) 95% CI=95% confidence interval.
(b) Subjects who self-rated their condition as "about the same" or "a little better/worse" and who were classified as "unchanged."
(c) Subjects who self-rated their condition as "about the same" after 6 weeks.
* SPSS Inc, 444 N Michigan Ave, Chicago, IL 60611.
† Accumetric Corp, 1650 Cedar Ave, Montreal, Quebec, Canada H3G 1A4.
References
(1) Beattie
(2) Delitto A. Are measures of function and disability important in low back care? Phys Ther. 1994;74:452-462.
(3) ICIDH-2: International Classification of Functioning, Disability, and Health--Prefinal Draft Full Version. Geneva, Switzerland: World Health Organization; 2000.
(4) Nelson MA, Allen P, Clamp SE, de Dombal FT. Reliability and reproducibility of clinical findings in low-back pain. Spine. 1979;4:97-101.
(5) Waddell G, Main CJ, Morris EW, et al. Normality and reliability in the clinical assessment of backache. BMJ. 1982;284:1519-1530.
(6) Kopec JA. Measuring functional outcomes in persons with back pain: a review of back-specific questionnaires. Spine. 2000;25:3110-3114.
(7) Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders. Spine. 2000;25:3100-3103.
(8) Fairbank JCT, Couper J, Davies JB, O'Brien JP. The Oswestry Low Back Pain Disability Questionnaire. Physiotherapy. 1980;66:271-273.
(9) Baker DJ, Pynsent PB, Fairbank JCT. The Oswestry Disability Index revisited: its reliability, repeatability, and validity, and a comparison with the St Thomas Disability Index. In: Roland M, Jenner JR, eds. Back Pain: New Approaches to Rehabilitation and Education. Manchester, United Kingdom: Manchester University Press; 1989:174-186.
(10) Kopec JA, Esdaile JM, Abrahamowicz M, et al. The Quebec Back Pain Disability Scale: measurement properties. Spine. 1995;20:341-352.
(11) Roland M, Morris R. A study of the natural history of back pain, part I: development of a reliable and sensitive measure of disability in low back pain. Spine. 1983;8:141-144.
(12) Waddell G, Main CJ. Assessment of severity in low-back disorders. Spine. 1984;9:204-208.
(13) Ware JE Jr, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36), I: conceptual framework and item selection. Med Care. 1992;30:473-483.
(14) McHorney CA, Ware JE Jr, Lu RJF, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36), III: tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32:40-66.
(15) Ware JE Jr. SF-36 Health Survey: Manual and Interpretation Guide. Boston, Mass: The Health Institute; 1993.
(16) National Health Survey: SF-36 Population Norms. Canberra, Australian Capital Territory, Australia: Australian Bureau of Statistics; 1997.
(17) Stratford PW, Binkley JM, Solomon P, et al. Defining the minimum level of detectable change for the Roland-Morris Questionnaire. Phys Ther. 1996;76:359-365.
(18) Beurskens AJHM, de Vet HCW, Koke AJA, et al. Measuring the functional status of patients with low back pain: assessment of the quality of four disease-specific questionnaires. Spine. 1995;20:1017-1028.
(19) Binkley JM. Measurement of functional status, progress, and outcome in orthopaedic clinical practice. Ortho Div Review. September/October 1998:7-17.
(20) Beurskens AJHM, de Vet HCW, Koke AJA. Responsiveness of functional status in low back pain: a comparison of different instruments. Pain. 1996;65:71-76.
(21) Stratford PW, Binkley JM, Solomon P, et al. Assessing change over time in patients with low back pain. Phys Ther. 1994;74:528-533.
(22) Patrick DL, Deyo RA, Atlas SJ, et al. Assessing health-related quality of life in patients with sciatica. Spine. 1995;20:1899-1908.
(23) Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 2nd ed. New York, NY: Oxford University Press Inc; 1995.
(24) Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol. 1995;48:1369-1378.
(25) Stratford PW, Finch E, Solomon P, et al. Using the Roland-Morris Questionnaire to make decisions about individual patients. Physiotherapy Canada. 1996;48:107-110.
(26) McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293-307.
(27) Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther. 1996;76:1109-1123.
(28) Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res. 1993;2:221-226.
(29) Fordyce WE, ed. Back Pain in the Workplace: Management of Disability in Nonspecific Conditions. Seattle, Wash: IASP Press; 1995.
(30) Waddell G, Feder G, McIntosh A, et al. Low Back Pain Evidence Review. London, United Kingdom: Royal College of General Practitioners; 1996.
(31) van den Hoogen HJM, Koes BW, van Eijk JTM, et al. On the course of low back pain in general practice: a one year follow up study. Ann Rheum Dis. 1998;57:13-19.
(32) SF-36 Health Survey Scoring Manual for English-Language Adaptations: Australia/New Zealand, Canada, United Kingdom. Boston, Mass: Medical Outcomes Trust; 1994.
(33) Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
(34) Zimmerman D. Mimicking properties of nonparametric rank tests using scores that are not ranks. J Gen Psychol. 1993;120:509-516.
(35) Jacobson NS, Follette WC, Revenstorf D. Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance. Behav Ther. 1984;15:336-352.
(36) Christensen L, Mendoza JL. A method of assessing change in a single subject: an alteration of the RC index [letter to the editor]. Behav Ther. 1986;17:305-308.
(37) Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632-642.
(38) Goldie PA, Matyas TA, Evans OM. Deficit and change in gait velocity during rehabilitation after stroke. Arch Phys Med Rehabil. 1996;77:1074-1082.
(39) Bland M. An Introduction to Medical Statistics. 2nd ed. New York, NY: Oxford University Press; 1995.
(40) Coakes SJ, Steed LG. SPSS Version 6.1 Analysis Without Anguish. Brisbane, Queensland, Australia: John Wiley.
(41) Hopman WM, Towheed T, Anastassiades T, et al. Canadian normative data for the SF-36 health survey. CMAJ. 2000;163:265-271.
(42) Scott KM, Tobias MI, Sarfati D, Haslett SJ. SF-36 health survey reliability, validity and norms for New Zealand. Aust N Z J Pub Health. 1999;23:401-406.
(43) Aaronson NK, Muller M, Cohen PD. Translation, validation, and norming of the Dutch language version of the SF-36 Health Survey in community and chronic disease populations. J Clin Epidemiol. 1998;51:1055-1068.
(44) Loge JH, Kaasa S. Short form 36 (SF-36) health survey: normative data from the general Norwegian population. Scand J Soc Med. 1998;26:250-258.
(45) Jenkinson C, Wright L, Coulter A. Quality of Life Measurement in Health Care: A Review of Measures and Population Norms for the UK SF-36. Oxford, United Kingdom: Services Research Unit; 1993.
(46) The International Quality of Life Assessment (IQOLA) Project. Available at: http://www.iqola.org.
(47) Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40:171-178.
(48) Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res. 1992;1:73-75.
(49) Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239-246.
(50) Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials. 1991;12(suppl 4):142S-158S.
(51) Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press Inc; 1977.
(52) Norman GR, Stratford PW, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997;50:869-879.
(53) Juniper EF, Guyatt GH, Feeny DH, et al. Measuring quality of life in childhood asthma. Qual Life Res. 1996;5:35-46.
(54) Fairbank JCT, Pynsent PB. The Oswestry Disability Index. Spine. 2000;25:2940-2953.

M Davidson, PT, BAppSc, is Lecturer, School of Physiotherapy, La Trobe University, Bundoora, 3053, Melbourne, Australia (M.Davidson@latrobe.edu.au). Address all correspondence to Ms Davidson.

JL Keating, PT, PhD, is Lecturer, School of Physiotherapy, La Trobe University.

Ms Davidson provided concept/research design, writing, data collection and analysis, and project management. Dr Keating provided consultation (including review of manuscript before submission).

This study was approved by the Human Ethics Committee of La Trobe University.

This article was submitted October 18, 2000, and was accepted June 15, 2001.