Reliability of ratings across studies of the BASC
Abstract
Reliability estimates of behavioral rating scale ratings are influenced by sample composition and variability. This study describes and documents reliability reporting practices in dissertation studies that have used the Behavior Assessment System for Children (BASC) since its inception until 2001. Only 9 of 106 studies examined reported reliability for the subpopulation at hand. Most cited reliability scores from the BASC Manual. The lack of reliability score estimates for subpopulations in studies of the BASC has implications for the use of the BASC to help identify culturally diverse students with emotional disorders. The lack of reliability data for behavioral ratings suggests that studies using rating scales as the primary dependent variable may be inherently flawed.
Given the diversity of today's school populations, it is imperative that researchers investigate the extent to which ratings from behavioral rating scales demonstrate reliability for the diverse subpopulations on which they are used. These scales help determine student eligibility for placement into special education services as defined by the Individuals with Disabilities Education Act (IDEA) for "seriously emotionally disturbed" (SED) students. This consideration is important since African Americans, Hispanics, Native Americans, and students with low socioeconomic status are overrepresented in special education programs for students with SED. That is, given the percentages of these students in a normal school population, they are more likely than their counterparts from the dominant culture (Whites and students with middle and high socioeconomic status) to be referred for, qualify for, and be placed into special education programs for SED (Artiles & Trent, 1994; Dooley & Voltz, 1999; Grossman, 1995).
The authors' initial purpose was to conduct a reliability generalization study (Thompson & Vacha-Haase, 2000; Vacha-Haase, 1998; Vacha-Haase, Henson, & Caruso, 2002). Vacha-Haase (1998) listed three different uses or purposes of a reliability generalization study: (a) to describe the typical degree of reliability obtained from using a particular instrument, (b) to describe the extent to which reliability estimates obtained from using that instrument vary from study to study, and (c) to identify factors in the different assessment procedures that explain variation in the reliability of the scores or ratings from one measurement situation to another. Our intent was to describe the degree of reliability typically associated with BASC ratings used in doctoral dissertations, to describe how much the reported reliability estimates varied across studies, and then to identify factors that explained this variability. However, after identifying the many dissertations that used the BASC, we discovered that it was impossible to conduct a full-scale reliability generalization study for two reasons: (a) most of the dissertations reported reliability estimates from the BASC standardization sample instead of computing reliability estimates for their own data, and (b) the few studies that did compute and report reliability estimates for their own data used different BASC scales or composites. Consequently, instead of conducting a full reliability generalization study, we conducted a descriptive study that describes and documents reliability reporting practices in dissertation studies that have used the BASC since its inception until the end of 2001. Similar studies have been conducted by Hogan, Benjamin, and Brezinski (2000), Vacha-Haase, Ness, Nilsson, and Reetz (1999), and Whittington (1998). The unique contribution of this study is its focus on doctoral dissertations that all used the BASC in one way or another.
Since doctoral dissertations are reviewed by examining committees consisting of professors, we expected that most of the dissertation authors would report reliability coefficients for their own data rather than presuming that the obtained ratings had the same degree of reliability as the ratings from the BASC standardization sample reported in the BASC Manual. Vacha-Haase, Kogan, and Thompson (2000) coined the term reliability induction to refer to this inferential process of citing reliability coefficients from a previous study as the sole warrant for the reliability of new data. To justify such an inference, researchers should present evidence and make a case to show that the composition of their sample of ratees or examinees is similar to the warranting sample.
The authors chose to investigate reliability estimates from studies of the BASC since the authors of the BASC suggest its use to facilitate the eligibility determination for special education services in the category of SED (Reynolds & Kamphaus, 1992). The BASC was developed by Cecil R. Reynolds at Texas A&M University and Randy W. Kamphaus at the University of Georgia. The BASC Manual indicates that the composites and scales may be useful in the separation of students with conduct disorders or social maladjustment from students with serious emotional/behavioral disorders. This separation has been difficult in the past but is required by IDEA, because students with conduct disorders and/or social maladjustment do not automatically qualify for special education services under the classification of emotional disturbance. The BASC was also devised to meet the requirements of IDEA to evaluate student personality, social history, and observed classroom behavior; hence, it is a multidimensional system similar to the one employed by Achenbach's Child Behavior Checklist (McConaughy & Ritter, 1995; Wilder, 1999).
Reynolds (1989), one of the authors of the BASC, emphasized the importance of reliability:
Reliability may be the single most influential of psychometric concepts because of its relationship to all other psychometric characteristics. It is the foundation of validity, and classical psychometric theory is known as reliability theory (p. 186).
Reynolds further suggested, "all neuropsychological measures must be evaluated for effects related to culture, ethnicity, gender, and other nominal variables as findings in this area do not generalize well across tests or necessarily across nominal groupings" (p. 182). For many years, scholars in educational psychology (Linn & Gronlund, 1995; Worthen, White, Fan, & Sudweeks, 1999) and special education (Taylor, 2000; Venn, 2000), the fields in which diagnosticians are trained to determine SED in students, have recognized the need for reliability of measurement (Reynolds, 1989; Obiakor & Schwenn, 1996; Wilder, Jackson, & Smith, 2001). Sue (1999) concurred that ratings from psychological measures should be cross-validated with different subpopulations, especially ethnic minority subpopulations, and that "one must question, doubt, or suspend judgment until sufficient information is available ... thoughtfully gather evidence and be persuaded by the evidence rather than by prejudice, bias, or uncritical thinking" (p. 1072).
Individuals who use and interpret tests, rating scales, and other assessment procedures often use the term reliability as if it were a property of the assessment instrument. But reliability refers to the consistency of the scores or ratings obtained from using an assessment device (e.g., a test, rating scale, or scoring rubric) rather than to the instrument itself. Several scholars have emphasized this distinction in recent years (Thompson, 2003; Thompson & Vacha-Haase, 2000; Vacha-Haase, 1998; Worthen et al., 1999), but the idea is not new. In the first edition of his text on educational measurement and evaluation, Norman Gronlund asserted:
Reliability refers to the results obtained with an evaluation instrument and not to the instrument itself. Any particular instrument may have a number of different reliabilities, depending on the group involved and the situation in which it is used. Thus it is more appropriate to speak of the reliability of 'the test scores,' or of 'the measurement,' than of 'the test,' or 'the instrument' ... (Gronlund, 1965, p. 80).
The durability of Gronlund's statement is evidenced by the fact that it appears in all eight editions of his text (1965, 1971, 1976, 1981, 1985, 1990, 1995, and 2000). The statement is slightly reworded to have even broader application in the 7th and 8th editions, but the meaning is essentially the same. Three other textbooks (Crocker & Algina, 1986; Ebel, 1965; Guilford & Fruchter, 1973) include similar statements emphasizing that the reliability of scores or ratings obtained from using a particular instrument will likely vary from one group of examinees to another or from one assessment situation to another. Thorndike (1951) and subsequent authors have attempted to classify and catalog the possible sources of measurement error that may cause inconsistencies (variability) in the scores or ratings that students obtain from a test or rating procedure (see Table 8 on p. 568 of Thorndike's chapter; his categories are reproduced in Table 13.1, p. 364 of Stanley, 1971). The various categories of error include:
1. Internal changes within the student that may affect the individual's performance (e.g., changes in health, motivation, or luck in guessing).
2. Changes in the procedures and conditions under which the instrument is administered (e.g., amount of time provided, distracting influences present).
3. Scoring errors (i.e., inaccuracies in the scoring or rating procedures from one rater to another or from one rating occasion to another).
4. Sampling error (i.e., differences in the tasks or problems included in different forms of the test or rating scale).
These factors are all part of the assessment procedure, but three of the four categories are external to the assessment instrument. Nevertheless, they all have the potential to produce inconsistencies in the scores or ratings obtained by an examinee from one assessment situation to another. To the degree that the ratings differ from one rater to another (summarized by a coefficient of interrater consistency), from one rating occasion to another (summarized by a coefficient of intrarater consistency or stability), or from one item or task to another (summarized by a coefficient of internal consistency), the ratings will include errors of measurement and will lack reliability. However, it is the scores that reflect these errors and that must be analyzed to check for degree of consistency. Since correlation coefficients are heavily influenced by the amount of variance in the scores or ratings being correlated, the same instrument administered under the same conditions by the same raters will produce quite different reliability estimates for a homogeneous group of students than for a more heterogeneous group of students.
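The dependence of reliability on group variability can be illustrated with the classical test theory definition of reliability as the ratio of true-score variance to observed-score variance. The following is a minimal sketch only; the variance values are hypothetical and are not taken from the BASC Manual or from any dissertation in the sample.

```python
def reliability(true_score_variance: float, error_variance: float) -> float:
    """Classical test theory: true-score variance divided by observed-score variance."""
    observed_variance = true_score_variance + error_variance
    return true_score_variance / observed_variance


# Hypothetical values: the measurement error is identical in both groups;
# only the spread of true scores differs.
error_variance = 25.0
heterogeneous = reliability(true_score_variance=100.0, error_variance=error_variance)
homogeneous = reliability(true_score_variance=25.0, error_variance=error_variance)

print(f"heterogeneous group: {heterogeneous:.2f}")  # 0.80
print(f"homogeneous group:   {homogeneous:.2f}")    # 0.50
```

With the same instrument and the same amount of error, the more homogeneous group yields the lower coefficient, which is why a coefficient borrowed from a standardization sample need not hold for a narrower research sample.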
Modern thinking in psychometric theory states, "Because reliability can fluctuate, researchers should always examine the reliability of their data in hand and report these results even in nonmeasurement studies" (Capraro, Capraro, & Henson, 2001, p. 374). Thompson (2003) reiterated:
It is important to evaluate score reliability in all studies, because it is the reliability of the data in hand in a given study that will drive study results, and not the reliability of the scores described in the test manual. (p. 5)
It is not only wishful thinking, but poor practice, for researchers to assume that the ratings they obtain will have the same degree of reliability as the original developers obtained when the instrument was constructed and standardized. Hence, researchers should get in the habit of computing and reporting reliability estimates for their own data.
In addition, reliability coefficients for students from the dominant culture may vary from those of groups of minority students, calling into question the appropriateness of an instrument for different groups. Believing that the process of reliability generalization would have merit as a means of investigating reliability differences across subpopulations, the authors conducted a descriptive study based on the reliability generalization of BASC ratings across studies. Much of the research and discussion about students with SED has focused on early intervention, service provision, and program efficacy (Kauffman, 2001; Zionts, Zionts, & Simpson, 2002). Less attention has been paid to the manner in which the target population is identified. The critical question in this study is whether BASC score reliability estimates are sufficiently high for diverse populations to justify the use of the BASC as an instrument to help determine eligibility of ethnically diverse students and of students with low socioeconomic status for special education services for the seriously emotionally disturbed.
The meta-analytic sample included 106 dissertations that used the BASC to collect data for their studies. We included all dissertations that met the initial criterion and were completed at institutions of higher learning in the United States. We excluded master's theses.
Description of the Behavior Assessment System for Children. Reynolds and Kamphaus developed the BASC as a way of quantifying the degree to which students in three age groups [preschool (4-5 years), elementary (6-11 years), and secondary (12-18 years)] display evidence of either a behavioral problem or emotional disturbance. The BASC consists of (a) two rating scales of observable behavior for completion separately by a parent and a teacher, (b) a self-report personality inventory, (c) a structured developmental history, and (d) a student observation system. Behaviors are reported in five composite scales: (a) Internalizing Problems, (b) Externalizing Problems, (c) School Problems, (d) Other Problems, and (e) Adaptive Skills. All scales report deficit or problem behaviors except for the Adaptive Skills scale; it identifies personal attributes. Although the BASC is multidimensional, the present reliability generalization study focuses on studies that used child (C) and adolescent (A) parent and teacher rating scale composite ratings and the student observation system. BASC norms are based on large, representative samples: the Teacher Rating Scale (TRS) on 1,259 children ages 6-11 and 809 adolescents, and the Parent Rating Scale (PRS) on 2,084 children ages 6-11 and 1,090 adolescents. Clinical norms were drawn from children and adolescents in self-contained classrooms, mental health and juvenile detention centers, residential schools, and outpatient hospital and university centers (Flanagan, 1995).
Reliability data from the BASC Manual. The BASC Manual reports internal consistency, test-retest, and interrater reliability coefficients for the standardization samples. Reliability estimates are differentiated by age, gender, and clinical status of the student but not by ethnic background or socioeconomic status of the student. The authors examined child and adolescent age levels on all scales and composites. Coefficient alpha reliability estimates for the TRS range from .48 for 6-7 year old females on the Conduct Scale to .97 for boys ages 8-11 on the Adaptive Skills and Behavioral Symptoms Index Composite. The reported test-retest reliability coefficients for the TRS range from .70 for adolescents on the Withdrawal Scale to .95 for adolescents on the Externalizing Problems Composite. Interrater reliability coefficients range from .44 for children ages 6-11 on the Depression Scale to .93 for the same age group on the Learning Problems Scale. For the Parent Rating Scale (PRS), the reported coefficient alpha reliability estimates range from .42 for girls ages 6-7 on the Atypicality Scale to .93 for boys ages 6-7 and girls ages 8-11 on the Adaptive Skills Composite. The reported test-retest reliability coefficients for the PRS range from .55 for adolescents (ages 12-18) on the Withdrawal Scale to .94 for children (ages 6-11) on the Internalizing Problems Scale. PRS interrater (mothers and fathers as raters) reliability estimates range from .46 for children on the Somatization Scale to .71 for adolescents on the Behavioral Symptoms Index Composite.
The technical information presented in the BASC Manual for the Self-Report of Personality (SRP) indicates that coefficient alpha reliability estimates range from .54 for females ages 15-18 on the Self-Reliance Scale to .95 for females ages 8-11 on the Clinical Maladjustment Composite and males ages 15-18 on the Emotional Symptoms Index Composite. SRP test-retest reliability estimates range from .68 for adolescents on the Self-Reliance Scale to .96 for adolescents on the Emotional Symptoms Index Composite. The BASC Manual does not report reliability data for the Student Observation System (SOS).
Studies published in peer-reviewed journals were not numerous enough to warrant a meta-analysis of journal articles; therefore, the authors conducted a reliability study of dissertation studies using the BASC.
Sample of Completed Dissertations Studied
A search for dissertation studies in the ERIC, Digital Dissertations (ProQuest), and PsycLIT databases using the keyword BASC identified 123 dissertations. Copies of the dissertations were (a) obtained through interlibrary loan, (b) downloaded from Digital Dissertations (ProQuest), or (c) purchased from University Microfilms International (UMI), Lansing, Michigan. Research assistants entered the dissertation author, year, institution, title, status (received or not), and then the agreed-upon coded category into an Excel file. The researchers trained two raters to read and code the studies. Raters coded dissertations as follows:
1=False hits: the BASC was not mentioned in the dissertation study.
2=No reliability coefficients reported. Researchers did use the BASC to collect data for the study.
3=Reported only reliability coefficients given in the BASC Manual or some other previous study.
4=Reported reliability coefficients from the BASC Manual plus reliability estimates computed specifically for the scores analyzed in the present study.
Two raters rated the first 81 dissertations, eventually agreeing on every rating. One rater rated the remaining dissertations, and a professor rechecked each dissertation that contained reliability estimates for the current study subpopulation. Raters then entered additional information from the studies coded as 3s and 4s into the Excel file, including the classification (i.e., codes 1-4), the specific BASC scales used (PRS or TRS, child or adolescent), the type of reliability reported (test-retest, internal consistency, or interrater), and the reliability coefficients reported.
Of the 123 hits in the literature search, five were false hits and were excluded from further analysis. In addition, one dissertation could not be obtained (because the home university library had lost it and it had not yet been processed by UMI), nine mentioned the BASC but did not use it to collect data for the study, and two were removed from further analysis because they were master's theses rather than dissertations.
Of the 106 remaining dissertation studies, seven researchers (6.6%) used the BASC to collect data but did not mention reliability estimates in their analyses, and 90 researchers (84.9%) reported reliability estimates of the BASC from the BASC Manual or from previous studies (see Table 1). Twenty-eight (26.4%) of the 106 dissertation studies on the BASC were completed at the universities where the developers of the BASC teach (i.e., Texas A&M and the University of Georgia). Hofstra University, Ball State University, and Texas Woman's University accounted for another 19 (17.9%). A complete list of the dissertations analyzed in this study is available by contacting the authors.
Only nine (8.5%) of the 106 dissertation authors reported some form of reliability estimates for the behavioral ratings collected specifically for analysis in their study. Most of these nine studies also quoted the reliability estimates for the BASC standardization sample as reported in the BASC Manual. The nine scholars who reported reliability information for their own data are listed in Table 2. Four of these nine were from Texas A&M University and two were from Texas Woman's University.
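The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the dictionary labels are shorthand introduced here for illustration and are not the wording of the coding scheme.

```python
hits = 123  # dissertations returned by the keyword search

# Dissertations removed before analysis, as described in the text.
excluded = {
    "false hits": 5,
    "could not be obtained": 1,
    "mentioned but did not use the BASC": 9,
    "master's theses": 2,
}
analyzed = hits - sum(excluded.values())

# Reliability reporting categories among the analyzed dissertations.
categories = {
    "no reliability reported": 7,
    "manual or prior-study coefficients only": 90,
    "reliability computed for own data": 9,
}
assert sum(categories.values()) == analyzed

print(f"analyzed: {analyzed}")
for label, count in categories.items():
    print(f"{label}: {count} ({100 * count / analyzed:.1f}%)")
```

The three categories account for all 106 analyzed dissertations, and the percentages reproduce the 6.6%, 84.9%, and 8.5% reported in the text.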
None of these nine studies analyzed data from all five BASC components. Some studies analyzed data from the Self-Report of Personality (SRP), some used the Teacher Rating Scale (TRS-C or TRS-A), others used the Parent Rating Scale (PRS-C or PRS-A), and one study used only the Student Observation System (SOS). Most of the authors analyzed data for multiple scales or composites within a component, but one study (Serrano, 1996) used only the Aggression scale of the BASC TRS-C. These differences among the nine studies severely limited the number and kinds of comparisons we could make.
When we began this study, we anticipated that we would find at least some dissertations that reported comparable reliability estimates, computed from their own BASC ratings, for the various BASC components. To make such comparisons, we needed to find at least two dissertations that reported the corresponding types of reliability coefficients.
Test-retest Reliability Coefficients
Not one of the dissertations we examined reported test-retest reliability coefficients for the sample of BASC ratings collected for the specific study at hand. Consequently, we have no way of estimating variability in stability coefficients from one dissertation to another. This complete lack of information may be due to the difficulty and expense of asking raters to rate each child on a second occasion.
Interrater Reliability Coefficients
The only original reliability estimates that were comparable were in the studies completed by Wootten (1998) and Neill (2001). Each of these studies reported interrater reliability coefficients showing the correlation between ratings assigned by teachers using the TRS and ratings obtained from parents using corresponding scales or composites from the PRS. Table 3 shows the interrater reliability estimates reported in these two studies for the Child Forms of the TRS and PRS. Table 4 shows the estimated interrater reliability coefficients for the Adolescent Forms of the TRS and PRS.
Inspection of Tables 3 and 4 shows that the correlation between ratings obtained from parents and ratings obtained from teachers is consistently low on all of the scales and composites. Low rates of cross-informant agreement are not unusual. In a meta-analysis of cross-informant studies, Achenbach, McConaughy, and Howell (1987) found low rates (.20-.30) of cross-informant agreement between parents/teachers and youth on behavioral rating scales reported in the literature. The point that is more pertinent to the purposes of this study is the variability in corresponding reliability coefficients across the two studies. Ideally, we would have liked to have interrater reliability estimates for the TRS scales from many other dissertations, but these two were the only ones for which this type of comparison was possible. It is obvious from the results reported in Tables 3 and 4 that the reliability of scores from each of the scales and composites varies considerably from one study to another and that reliability is not an invariant number that inheres in a particular scale.
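A cross-informant coefficient of this kind is a Pearson correlation between the two informants' ratings of the same children. The following minimal sketch shows the computation; the function name `pearson_r` and the teacher and parent scores are hypothetical illustrations and do not come from Wootten (1998), Neill (2001), or the BASC Manual.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six children, each rated by a teacher and a parent.
teacher = [52, 61, 47, 70, 58, 65]
parent = [55, 50, 60, 66, 49, 62]

print(f"cross-informant r = {pearson_r(teacher, parent):.2f}")
```

Because the coefficient depends on the particular children and raters sampled, recomputing it for each new sample is the only way to know how much agreement a given study actually obtained.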
Internal Consistency Reliability Coefficients
We understand that it is difficult and often not feasible to collect the data necessary to estimate test-retest reliability and interrater reliability. However, internal consistency reliability estimates can be readily computed from a single set of ratings of a group of children collected on a single occasion. All the researcher has to do is apply the formula for Cronbach's alpha coefficient (Worthen et al., 1999) to a single set of ratings from a single group of children. This statistic can be readily computed from data entered into a spreadsheet, but it is even easier to compute with the SPSS software package. Hence, we were surprised to discover that, out of all the dissertations we examined, we could not find at least two that reported alpha coefficients for the data that were original to a given study. Consequently, we were unable to examine the variability of internal consistency reliability coefficients across dissertations.
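To show how little is required, the following minimal sketch computes coefficient alpha from a single children-by-items matrix of ratings using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the ratings are hypothetical; they are not BASC items or data from any dissertation in the sample.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Coefficient alpha for a list of rows (one row of item ratings per child)."""
    k = len(ratings[0])                # number of items on the scale
    items = list(zip(*ratings))        # transpose: one tuple of ratings per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: five children rated on a four-item scale (0-3 per item).
ratings = [
    [2, 3, 3, 2],
    [1, 1, 2, 1],
    [3, 3, 3, 3],
    [0, 1, 1, 0],
    [2, 2, 3, 2],
]

print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

The same ratings matrix that feeds a study's substantive analysis is all the function needs, so the estimate can be reported for the sample at hand without any additional data collection.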
Since the overrepresentation of some minorities in programs for students with SED remains problematic, it is imperative that raters measure student behavior consistently, accurately, and without bias across subpopulations during initial identification procedures. Reliability estimates of behavioral rating scale ratings for different subpopulations of students are influenced by sample composition and variability, and they frequently differ across studies. Therefore, it is useful for researchers to demographically describe their sample subpopulations and to determine the reliability of scores for their specific subpopulations. All research studies in special education and behavioral disorders that rely on rating scales for their primary dependent measure and do not report reliability for the study at hand should be critically examined. Most research using rating scales is inherently flawed due to limited or unknown reliability. This has immediate implications for researchers at all levels as they plan and pursue their current research agendas and advise doctoral students in research. The lack of reliability of rating scale scores appears to have serious implications for the process by which we identify students with disabilities, including students of culturally diverse backgrounds.
A few of the dissertations examined are of interest given our original research question: Are score reliability estimates sufficiently high for diverse populations to justify the use of the BASC for ethnically diverse students and for students with low socioeconomic status? Some studies, for example, Assing's (2000) study and Fronterra-Benevenutti's (1991) study, examined diverse populations using the BASC but, unfortunately, did not report reliability estimates for their diverse populations. Assing's dissertation used the PRS-C (Child version in Spanish) and was conducted in South Florida with the parents of 402 Spanish-speaking students in grades K-5. A PRS-C differential item functioning analysis indicated that 73% of items functioned differently for Spanish-speaking parents than for the norm group. She reported no reliability data for her ethnic minority population. Fronterra-Benevenutti's study used the BASC Student Questionnaire, apparently a forerunner of the SRP, and took place in Puerto Rico, but failed to report reliability estimates for the population at hand. Serrano (1996), James (1995), and Mayfield (1996) used ethnically diverse populations from the standardization sample in their studies. Neill (2001) did report reliability estimates for the subpopulation studied, noting only that students qualified for SED and were from suburban school districts in the southern U.S., but gave no further useful demographic information on the sample.
In light of the above challenges, we recommend that researchers both (a) report reliability coefficients for the ratings obtained in their study, and (b) give detailed demographic information on study participants, so that those interested in the reliability estimates for diverse populations can determine to what extent ratings are reliable for specific diverse populations of students. Although test bias is usually associated with validity scores and not reliability scores, we believe reliability should be reported for every study and particularly noted for diverse populations to ensure that the instrument functions effectively for the population at hand.
The clear indication in this investigation was that the authors of only a very small percentage (9%) of studies on the BASC, or using the BASC as instrumentation, attended to the reliability of the ratings obtained from the subpopulation being rated. The dissertation authors who did report reliability for their rating data were so few that comparisons across socioeconomic or ethnic groups or any other subpopulations were not possible. This finding is alarming since the BASC is sometimes used to help determine special education eligibility for students with possible serious emotional disturbances from many cultures, and the reliability of its ratings for specific diverse populations is unknown. We assume that most dissertation researchers maintain the false sense of security that reliability estimates from the BASC Manual suffice for the ratings of all students.
Given the paucity of reliability data at hand for the studies we examined, graduate programs in universities apparently are not emphasizing the importance of collecting and reporting data on the reliability of behavioral rating scale ratings in dissertation studies. Further investigation of studies of other behavioral rating scale ratings is warranted to determine how serious a problem this may be. If graduate school faculty are not teaching the principle of behavioral rating score reliability estimation, then studies using the instruments cannot assist researchers in determining reliability estimates and appropriate use for diverse subpopulations. We recommend that the BASC be used cautiously with ethnically diverse students because reliability estimates for ethnically diverse students and students with low socioeconomic status have not yet been determined.
The authors began two years ago to conduct a reliability generalization (RG) study on another behavioral rating scale used to determine special education eligibility for serious emotional disturbance, the Burks' Behavior Rating Scale, published in 1977 by Western Psychological Services. Although the Burks' has been in use for 24 years, an extensive search of the ERIC, PsycLIT, and social science databases yielded 0 false hits, 5 studies in which the Burks' was mentioned but not used in data collection, 20 articles in which the Burks' was used for data collection that reported reliability estimates from the manual or other studies, and only 2 articles that reported studies using the Burks' that gave reliability estimates for the ratings from the study samples. Obviously, 2 studies were too few to conduct an RG analysis, so another behavioral rating scale, the BASC, was selected. A preliminary literature search of the BASC yielded 6 false hits, no articles in which the BASC was used reporting reliability estimates from the manual or other studies, and only 3 articles that reported studies using the BASC that gave reliability estimates of the ratings from the study samples. Of course, 3 journal articles reporting reliability estimates for BASC ratings were not sufficient data for a reliability generalization study. For this reason, the authors decided to examine dissertation studies of the BASC.
In the current study, we expected to determine whether BASC scores were reliable for specific diverse subpopulations. Certainly, well-designed dissertation studies completed under the tutelage of a committee of seasoned professors would report reliability estimates; unfortunately, Vacha-Haase, Kogan, Tani, and Woodall (2001) regret, as do we, that "the reporting of reliability coefficients for the data in hand is often the exception rather than the norm" (p. 46). Indeed, the authors of most of the studies (n = 90 of 106) examined in this reliability generalization study ignored reliability data for the particular subpopulations in their studies and simply reported reliability coefficients from the BASC Manual or from previous studies. We can do better in our efforts to combat the overrepresentation of culturally diverse students in the processes of assessing and identifying SED.
Table 1. Studied Dissertations Classified by Type of Reliability and Degree Granting Institution

| Degree Granting Institution | No reliability coefficients reported | Reported only reliability coefficients given in the BASC Manual or some other previous study | Reported reliability coefficients for the present study scores, or reported reliability coefficients from the BASC Manual plus reliability estimates computed specifically for the scores analyzed in the present study | Number of Dissertations Examined | Percent of Total |
| --- | --- | --- | --- | --- | --- |
| Texas A&M | 2 | 7 | 4 | 13 | 12.3 |
| University of Georgia | 2 | 13 | 0 | 15 | 14.2 |
| Hofstra University | 0 | 8 | 0 | 8 | 7.5 |
| Ball State University | 0 | 6 | 0 | 6 | 5.7 |
| Texas Women's University | 0 | 4 | 1 | 5 | 4.7 |
| Miami Institute of Psychology | 0 | 2 | 1 | 3 | 2.8 |
| University of South Carolina | 0 | 2 | 0 | 2 | 1.9 |
| Georgia State University | 0 | 2 | 0 | 2 | 1.9 |
| Temple University | 0 | 2 | 0 | 2 | 1.9 |
| Other Colleges & Universities | 3 | 44 | 3 | 50 | 46.7 |
| Total | 7 | 90 | 9 | 106 | 100 |
| Percent | 6.6 | 84.9 | 8.5 | | |

Table 2. Dissertations That Reported Reliability for Scores Analyzed in the Current Study Classified by BASC Component, Form, and Scales Used

| BASC Component and Form Used | Author of Dissertation and Year Completed | BASC Scales or Composites Used in Study |
| --- | --- | --- |
| Self-Report of Personality (SRP), Child Form | Neill (2001) | 8 clinical scales, 4 adaptive scales, and 4 composite scores |
| Self-Report of Personality (SRP), Adolescent Form | James (1995) | 14 scales (pre-publication, trial version) |
| | H. C. Stanton (1995) | All 10 clinical scales plus some proposed revised scales |
| | Miao-Jung Lin (1998) | 2 clinical scales, 4 adaptive scales, and the Personal Adjustment composite |
| | Neill (2001) | 10 clinical scales, 4 adaptive scales, and 4 composite scales |
| Teacher Rater Scale (TRS), Child Form | Serrano (1996) | Aggression scale only |
| | Wootten (1998) | 10 clinical scales, 4 adaptive scales, and 5 composite scores |
| | Neill (2001) | 10 clinical scales, 4 adaptive scales, and 5 composite scores |
| Teacher Rater Scale (TRS), Adolescent Form | Wootten (1998) | 10 clinical scales, 3 adaptive scales, and 5 composite scores |
| | Neill (2001) | 10 clinical scales, 3 adaptive scales, and 5 composite scores |
| Parent Rater Scale (PRS), Child Form | Serrano (1996) | Aggression scale only |
| | Mayfield (1996) | 10 scales (pre-publication, trial version) |
| | Wootten (1998) | 9 clinical scales, 3 adaptive scales, and 4 composite scores |
| | Neill (2001) | 9 clinical scales, 3 adaptive scales, and 4 composite scores |
| Parent Rater Scale (PRS), Adolescent Form | Keith (1996) | 1 adaptive scale and 4 composite scores |
| | Wootten (1998) | 9 clinical scales, 2 adaptive scales, and 4 composite scores |
| | Neill (2001) | 9 clinical scales, 2 adaptive scales, and 4 composite scores |
| Student Observation System (SOS) | Ramsay (1997) | 4 adaptive behavior categories and 9 maladaptive behavior categories |

Table 3. Correlations between PRS-C Ratings and TRS-C Ratings Reported by Wootten (1998) and Neill (2001)

| Scale/Composite | Wootten (n = 70) | Neill (n = 132) |
| --- | --- | --- |
| Scales | | |
| Adaptability | .17 | .05 |
| Aggression | .41 | .10 |
| Anxiety | .17 | -.03 |
| Attention Problems | .26 | .18 |
| Atypicality | .28 | .11 |
| Conduct Problems | .39 | .24 |
| Depression | .21 | .04 |
| Hyperactivity | .30 | .14 |
| Leadership | .24 | .05 |
| Social Skills | .33 | .10 |
| Somatization | .16 | .16 |
| Withdrawal | .24 | .10 |
| Composites | | |
| Externalizing Problems | .51 | .12 |
| Internalizing Problems | .29 | .03 |
| Adaptive Skills | .29 | -.04 |
| Behavioral Symptoms Index | .28 | .02 |

Table 4. Correlations between PRS-A Ratings and TRS-A Ratings Reported by Wootten (1998) and Neill (2001)

| Scale/Composite | Wootten (n = 70) | Neill (n = 132) |
| --- | --- | --- |
| Scales | | |
| Aggression | -.03 | .26 |
| Anxiety | .03 | .09 |
| Attention Problems | .16 | .20 |
| Atypicality | .04 | -.01 |
| Conduct Problems | .16 | .37 |
| Depression | .03 | .23 |
| Hyperactivity | .03 | .13 |
| Leadership | .19 | .04 |
| Social Skills | .15 | .07 |
| Somatization | -.07 | .25 |
| Withdrawal | .34 | .23 |
| Composites | | |
| Externalizing Problems | .04 | .30 |
| Internalizing Problems | .03 | .22 |
| Adaptive Skills | .10 | .02 |
| Behavioral Symptoms Index | .04 | .11 |
(1.) Norman Gronlund was the sole author of the first five editions. Robert Linn became second author of the 6th edition and first author of the 7th and 8th editions with Gronlund listed as second author. The title of the book was also changed slightly in the 7th and 8th editions.
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101 (2), 213-232.
Artiles, A. J., & Trent, S. C. (1994). Overrepresentation of minority students in special education: A continuing debate. Journal of Special Education, 27, 410-437.
Assing, R. (2000). Differential item functioning in the BASC PRS-C by ethnic group Spanish speaking and standardization sample. Tampa, FL: University of South Florida.
Capraro, M. M., Capraro, R. M., & Henson, R. K. (2001). Measurement error of scores on the mathematics anxiety rating scale across studies. Educational and Psychological Measurement, 61 (3), 373-386.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.
Dooley, E. A., & Voltz, D. L. (1999). Educating the African-American exceptional learner. In F. E. Obiakor, J. O. Schwenn, & A. F. Rotatori (Eds.), Advances in special education: Multicultural education for learners with exceptionalities (pp. 15-32). Stamford, CT: JAI Press.
Ebel, R.L. (1965). Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall.
Ebel, R.L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
Flanagan, R. (1995). A review of the Behavior Assessment for Children (BASC): Assessment consistent with the requirements of the Individuals with Disabilities Education Act
Fronterra-Benevenutti, R. L. (1991). A comparison of two independent Spanish translations for the student questionnaire of the Behavior Assessment System for Children (BASC). College Station, TX: Texas A&M University.
Gronlund, N.E. (1965). Measurement and evaluation in teaching. New York: Macmillan.
Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York: Macmillan.
Grossman, H. (1995). Special education in a diverse society. Needham Heights, MA: Allyn and Bacon.
Guilford, J.P. & Fruchter, B. (1973). Fundamental statistics in psychology and education (5th ed.). New York: McGraw-Hill.
Hogan, T. P., Benjamin, A., & Brezinski, K. L. (2000). Reliability methods: A note on the frequency of use on the various types. Educational and Psychological Measurement, 60, 523-531.
Kauffman, J. (2001). Characteristics of emotional and behavioral disorders in children and youth (7th ed.). Upper Saddle River
Linn, R.L. & Gronlund, N.E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Merrill.
McConaughy, S. H., & Ritter, D. R. (1995). Best practices in multidimensional assessment of emotional or behavioral disorders. In A. Thomas & J. Grimes
Obiakor, F. E., & Schwenn, J. O. (1996). Assessment of culturally diverse students with behavior disorders. In F. E. Obiakor, J. O. Schwenn, & A. F. Rotatori (Eds.), Advances in special education: Assessment and psychopathology issues in special education (pp. 37-57). Stamford, CT: JAI Press.
Reynolds, C. R. (1989). Measurement and statistical problems in neuropsychological assessment. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (pp. 180-203). New York: Plenum Press.
Reynolds, C. R., & Kamphaus, R. W. (1992). Manual: Behavior Assessment System for Children. Circle Pines, MN: American Guidance Service.
Stanley, J.C. (1971). Reliability. In R.L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 356-442). Washington, DC: American Council on Education.
Sue, S. (1999). Science, ethnicity, and bias: Where have we gone wrong? American Psychologist, 54 (12), 1070-1077.
Taylor, R. L. (2000). Assessment of Exceptional students: Educational and Psychological procedures. Boston, MA: Allyn and Bacon.
Thompson, B. (Ed.) (2003). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks, CA: Sage Publications.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60 (2), 174-195.
Thorndike, R. L. (1951). Reliability. In E. F. Lindquist (Ed.), Educational measurement (pp. 560-620). Washington, DC: American Council on Education.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58 (1), 6-20.
Vacha-Haase, T., Henson, R. K., & Caruso, J. C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562-569.
Vacha-Haase, T., Kogan, L., Tani, C. R., & Woodall, R. A. (2001). Reliability generalization: Exploring reliability coefficients of MMPI clinical scales scores. Educational and Psychological Measurement, 61 (1), 45-59.
Vacha-Haase, T., Ness, C., Nilsson, J., & Reetz, D. (1999). Practices regarding reporting of reliability coefficients: A review of three journals. Journal of Experimental Education, 67, 335-341.
Venn, J. (2000). Assessing students with special needs. New York: Merrill Publishing.
Wilder, L. K. (1999). Student versus teacher perception of student behavior for youth with emotional/ behavioral disorders: Accurate assessment. (Doctoral Dissertation) Lansing, MI: University Microfilms International.
Wilder, L. K., Jackson, A. P., & Smith, T. B. (2001). Secondary transition of multicultural learners: Lessons from the Navajo Native American experience. Preventing School Failure, 45 (3), 119-124.
Worthen, B. R., White, K. R., Fan, X., & Sudweeks, R. R. (1999). Measurement and assessment in schools (2nd ed.). New York: Addison Wesley Longman.
Whittington, D. (1998). How well do researchers report their measures?: An evaluation of measurement in published educational research. Educational and Psychological Measurement, 58, 21-37.
Zionts, P., Zionts, L., & Simpson, R. L. (2002). Emotional and behavioral problems: A handbook for understanding and handling students. Thousand Oaks, CA: Corwin Press.
List of Dissertations from Table 2
James, E. M. (1995). A test of Harrington's experimental model of ethnic bias in testing applied to a measure of emotional functioning in adolescents. College Station, TX: Texas A&M University.
Keith, L. K. (1996). Construction and concurrent and contrasted groups validation of the Clinical Assessment of Behavior Scale: Parent form. Memphis, TN: The University of Memphis.
Lin, Miao-Jung (1998). Attachment of parents and peers: Impact on adolescents psychosocial adjustment in interpersonal relationships in Taiwan. Greeley, CO: University of Northern Colorado.
Mayfield, J. W. (1996). Are ethnic differences in diagnosis of childhood psychopathology an artifact of psychometric methods? An experimental evaluation of Harrington's hypothesis using parent reported symptomatology. College Station, TX: Texas A&M University.
Neill, D. L. (2001). Emotionally disturbed (ED) profiles in children and adolescents from parents, teachers, and students. Denton, TX: Texas Women's University.
Ramsay, B. A. (1997). Standardization of the Roberts Apperception Test with Haitian children. Miami, FL: Miami Institute of Psychology of the Caribbean Center for Advanced Studies.
Serrano, C. V. (1996). Inter-rater reliability of aggression among three ethnic groups. College Station, TX: Texas A&M University.
Stanton, H. C. (1995). Expert versus statistical scales in the diagnosis of adolescent psychopathology with BASC Self-Report of Personality-Adolescent. College Station, TX: Texas A&M University.
Wootten, S. A. (1998). Attention deficit/hyperactivity disorder (ADHD) profiles in children and adolescents from parent and teacher reports on the BASC. Denton, TX: Texas Women's University.
Address: Lynn K. Wilder, Ed.D., Assistant Professor of Special Education, Department of Counseling Psychology and Special Education, Brigham Young University, 340-H MCKB, Provo, UT 84602. 801-422-1237 (w), 801-422-3961 (fax), email@example.com