The psychometric benefits of soft-linked items: a reply to Pope and Harley.


In this issue, Pope and Harley criticize our recent work with soft-linked items (Loerke, Jones, & Chow, 1999), claiming that soft-linked items are not independent and thus violate a basic assumption of classical test theory. Furthermore, they claim that our finding that soft-linked items had better point-biserial correlation coefficients (PBCCs) than hard-linked items could have been predicted by "common sense," in that it simply reflects a higher proportion correct for soft-linked items. Because an examinee's response to an initial item has no effect on the scoring of the second item in a soft-linked pair, soft-linked items clearly meet the independence assumption and cause no problem for classical test theory. Since the scoring outcomes of hard-linked items are more likely to be consistent (both correct or both incorrect) than those of soft-linked items, common sense could equally suggest that hard-linked items would produce higher PBCCs than soft-linked items. In addition, we point out that Pope and Harley's misunderstanding of the concepts of local independence and unidimensionality within the framework of item response theory may have produced nebulous logic that will confuse readers.

**********

Background

We recently presented evidence that soft-linked items have better psychometric properties than hard-linked items in achievement tests (Loerke, Jones, & Chow, 1999). Pope and Harley (in the current issue of this journal) have criticized this work, claiming that the findings are "common sense" and that differences between hard-linked items and soft-linked items are due to an increased probability that soft-linked items will be scored as correct. Pope and Harley also claimed that soft-linked items are not independent. We acknowledge that there is indeed a link artifact, but this artifact exists only for hard-linked items and not for soft-linked items. We would like to point out that soft-linked items are indeed independent (this is their major appeal) and that it is hard-linked items that are not.

Linked items are items in which the examinee uses his/her answer from one item to compute an answer for a second item, typically using the numerical response format. These items are often used in multi-step calculations, allowing hierarchical computer scoring of complex reasoning. Linked items are a computer-scorable alternative to constructed response questions.

Hard-linked items require the examinee to get the first linked item correct before any of the subsequent linked items may be answered correctly. Thus, hard-linked items have a fixed key. Conversely, soft-linked items do not require the examinee to answer the first linked item correctly to have his/her response scored as correct on the subsequent linked item; that is, soft-linked items have an adaptive key. One method of accomplishing this is by a computer algorithm that generates the appropriate keys for the soft-linked items on the basis of the response given to the previous item. Using this method, examinees are not penalized twice for a single incorrect response if the soft-linked item is answered correctly using the initial incorrect answer. Although several items may be linked together (nested linking), we will limit our discussion to the simple case of a single item linked with an initial item.
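The difference between a fixed key and an adaptive key can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring algorithm used in the original study: the function name `score_linked_pair`, the relative tolerance, and the example item pair are all hypothetical.

```python
from math import isclose

def score_linked_pair(resp1, resp2, key1, step2, soft=True, rel_tol=0.01):
    """Score a two-item linked pair and return (score1, score2) as 0/1.

    key1  : fixed key for the initial item
    step2 : function mapping an item-1 value to the matching item-2 answer
    soft  : True  -> item-2 key is rebuilt from the examinee's own item-1
                     response (adaptive key, soft-linked)
            False -> item-2 key is built from key1 (fixed key, hard-linked)
    rel_tol is an assumed numerical tolerance, not taken from the article.
    """
    score1 = int(isclose(resp1, key1, rel_tol=rel_tol))
    key2 = step2(resp1) if soft else step2(key1)
    score2 = int(isclose(resp2, key2, rel_tol=rel_tol))
    return score1, score2

# Hypothetical pair: item 1 has key 2.0; item 2 asks for three times that value.
def triple(x):
    return 3.0 * x

# The examinee errs on item 1 (answers 2.5) but carries the value forward correctly.
hard = score_linked_pair(2.5, 7.5, key1=2.0, step2=triple, soft=False)
soft = score_linked_pair(2.5, 7.5, key1=2.0, step2=triple, soft=True)
print(hard)  # (0, 0): penalized twice under the fixed key
print(soft)  # (0, 1): penalized once under the adaptive key
```

Under the adaptive key, the score on the second item depends only on whether the second step was carried out correctly, which is the property the independence argument relies on.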

Linked Items and Independence

Pope and Harley made the claim that linked items, in general, violate independence. However, they failed to make the distinction between the two types of linked items. Item independence is a fundamental assumption of classical test theory that states that item responses are randomly related when ability (θ) is held constant (Nunnally & Bernstein, 1994). Thus, all shared variance between items that is not explained by θ must be due to pure random error. Alternatively phrased, the score on an item must be a function of θ only, not a function of the score on any other item or a function of any other trait.

Hard-linked items violate independence because an incorrect response to the initial item automatically makes the second one incorrect (the items have correlated error terms). However, soft-linked items meet the independence assumption because the response given to the initial item has no effect on the scoring of the second item. Thus, soft-linked items have independent error terms, similar to any other two items on a test taken at random.

Further, Pope and Harley erroneously proposed that it is "common sense" that soft-linked items would on average have higher point-biserial correlation coefficients (PBCCs) than hard-linked items because soft-linked items are more likely to be scored as correct compared to their hard-linked counterparts. Since proportion correct plays an important role in the calculation of PBCC, Pope and Harley's intuition that soft-linked items would have higher PBCCs than hard-linked items is a possibility, but it is not the only possibility.
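To see where proportion correct enters, one standard way of writing the point-biserial coefficient for a dichotomous item is given below. The symbols are notation assumed here for illustration and do not appear in the original article: M_1 and M_0 are the mean total scores of examinees scored correct and incorrect on the item, s_X is the standard deviation of total scores (computed with divisor n), and p is the proportion correct.

```latex
r_{pb} = \frac{M_1 - M_0}{s_X}\,\sqrt{p\,(1 - p)}
```

The factor \sqrt{p(1-p)} is largest at p = 0.5, so a higher proportion correct does not by itself raise the coefficient; the separation M_1 - M_0 between the two groups of examinees matters as well.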

PBCC is basically a correlation between item score (dichotomously scored as "correct" or "incorrect") and the total test score. An item with a high PBCC indicates that examinees with high test scores tend to answer it correctly and examinees with low test scores tend to answer it incorrectly. Conversely, items with negative PBCCs tend to be answered correctly primarily by examinees with low test scores. An alternative common-sense prediction to Pope and Harley's runs as follows. The outcomes of two hard-linked items are more likely to be the same (both correct or both incorrect) than the outcomes of two soft-linked items, because for an examinee to get the second hard-linked item correct, he/she must have answered the initial item correctly. Hard-linked items would therefore be expected to correlate more highly and positively with total test score than soft-linked items. On the other hand, if an examinee was scored as correct on the second soft-linked item, he/she may or may not have answered the initial item correctly. Thus, even though more examinees tend to get soft-linked items correct, examinees who get hard-linked items correct tend to have higher test scores because they must also have the initial item correct. This results in the prediction that the PBCCs for hard-linked items should be higher than those for soft-linked items.

Point-Biserial Correlation

Pope and Harley were also critical of our interpretation of PBCC, particularly in drawing conclusions about item reliability. Their discomfort with our use of PBCC as an index of item reliability reveals their lack of breadth and depth in their literature review in this area. Although typically used as an item discrimination index, PBCC is an item-total correlation, and hence, the square of it (PBCC-squared) is a measure of the proportion of variance in total test score that is predictable by an item. Alternatively phrased, it is an index of consistency between an item and the total test score. This is why a strong relationship exists between PBCC and alpha or KR-20 (Traub, 1994, pp. 101-107). This approach has been used elsewhere (Chow, Russell, & Traub, 2000).
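As a rough illustration of the item-total correlation discussed above, the following Python sketch computes PBCC as a plain Pearson correlation between 0/1 item scores and total scores, and then squares it. The function name and the six-examinee data set are made up for illustration; the sketch also assumes the total score is used as given (an uncorrected item-total correlation), which the article does not specify.

```python
from math import sqrt

def point_biserial(item, total):
    """Pearson correlation between 0/1 item scores and total test scores."""
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(total) / n
    cov = sum((x - mean_i) * (y - mean_t) for x, y in zip(item, total))
    var_i = sum((x - mean_i) ** 2 for x in item)
    var_t = sum((y - mean_t) ** 2 for y in total)
    return cov / sqrt(var_i * var_t)

# Made-up data for six examinees (illustration only).
total = [10, 12, 15, 18, 20, 25]
item_a = [0, 0, 0, 1, 1, 1]   # answered correctly by the high scorers
item_b = [1, 1, 1, 0, 0, 0]   # answered correctly by the low scorers

r_a = point_biserial(item_a, total)   # positive PBCC
r_b = point_biserial(item_b, total)   # negative PBCC
print(round(r_a, 3), round(r_b, 3))

# PBCC squared: proportion of total-score variance predictable from the item.
print(round(r_a ** 2, 3))
```

Both items have the same proportion correct (0.5), yet their PBCCs have opposite signs, which shows that the coefficient reflects who answers the item correctly and not only how many do.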

Soft-Linked Items, Computerized Adaptive Testing, and Item Response Theory (IRT)

Another confusion in Pope and Harley's paper is how soft-linked items and computers fit together. It was fairly obvious in the original article that we were referring to computerized scoring of soft-linked items, rather than computer-administered testing or computerized adaptive testing (CAT); for example, the title of the paper included the phrase "soft-linked scoring algorithms." After all, it is the computer scoring algorithm that makes linked items soft-linked; hence the name. Since we have established that soft-linked items are independent, it would make no difference whether the test administration was paper and pencil or computerized. The only stipulation is that the scoring be done by computer. Whether or not soft-linked items might fit into a CAT framework is completely determined by the dimensionality of the test itself. We did factor analyze the data, and the assumption of unidimensionality could not be fulfilled for these chemistry tests.

Pope and Harley mentioned that "The most fundamental assumption of item response theory (IRT) is that of local independence and the related assumption of unidimensionality .... If local independence holds true then the related assumption of unidimensionality will also hold true in that all items are measuring only one latent variable or dimension (Lord, 1980; Hambleton, Swaminathan, & Rogers, 1991)." It seems that Pope and Harley simply copied these statements verbatim from textbooks without actually understanding what they put down. The concepts of unidimensionality and local independence call for some clarification here, lest the reader be led down the path of confusion.

Since 1968 most adaptive testing research has used IRT as a psychometric basis. Within the context of IRT, different models for dichotomously scored items have been proposed and implemented. These models specify the mathematical form of the regression of P_i(θ), the probability of responding correctly to item i given ability θ, on θ, the latent ability continuum (see Lord & Novick, 1968; Birnbaum, 1968). Different parameter and ability estimation procedures have also been proposed to correct for guessing (e.g., Chow, 1987).
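For concreteness, one widely used model of this kind is the three-parameter logistic form associated with Birnbaum (1968). It is shown here only as an example of what such a regression looks like; the parameter symbols are conventional notation assumed for illustration (a_i for item discrimination, b_i for item difficulty, c_i for the lower asymptote that allows for guessing), and the optional scaling constant some authors include is omitted.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Setting c_i = 0 gives the two-parameter form, and additionally fixing a_i to a common value gives the one-parameter form.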

The basic principle of IRT (also called latent trait theory) is that for any fixed values of the latent traits the observed variables are mutually statistically independent; this has come to be known as the principle of local independence (Lazarsfeld, 1950; Lazarsfeld & Henry, 1968). Local independence requires that any two items be uncorrelated when θ is fixed. However, as Lord (1980) pointed out, "local independence follows automatically from unidimensionality. It is not an additional assumption" (p. 19), contrary to the way Pope and Harley understood it.
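Stated symbolically, and using notation assumed here for illustration (U_i is the 0/1 score on item i and u_i its observed value, for a test of n items), local independence says that the joint probability of a response pattern factors into a product of item probabilities once θ is fixed:

```latex
P(U_1 = u_1, \ldots, U_n = u_n \mid \theta) = \prod_{i=1}^{n} P(U_i = u_i \mid \theta)
```

A direct consequence is the pairwise statement in the text, namely that \operatorname{Cov}(U_i, U_j \mid \theta) = 0 for any two distinct items i and j.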

In most existing research on computerized adaptive testing, it is assumed that the test is unidimensional, that is, that the item pool is designed to measure only a single common dimension of ability. Unidimensionality is the prerequisite for the use of IRT in adaptive testing. Lord (1980, p. 21) lamented that "there is no generally valid statistical test to determine whether a set of test items is strictly unidimensional." Lord proposed a "rough" procedure, in which the sizes of the latent roots of the tetrachoric item intercorrelation matrix are compared to see if there is one dominant factor. Green et al. (1984) suggested that "a single factor that accounts for 70% of the total common variance is probably strong enough evidence for unidimensionality; one that accounts for less than 50% probably signals the use of subtests ..." (p. 351). Our factor analyses of the data obtained from those chemistry tests resulted in many factors with eigenvalues larger than one; moreover, none of these factors could account for more than 50% of the variance in the data. We therefore decided to abandon the use of IRT to analyze the data and turned to good old classical test theory.
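The "dominant factor" idea can be illustrated with a small Python sketch that finds the largest latent root (eigenvalue) of an item intercorrelation matrix and expresses it as a share of the total. This is a simplification under stated assumptions and not the analysis we ran: the matrices are hypothetical, the share is taken against total variance (the trace of the matrix) and not against common variance as in Green et al. (1984), and power iteration is used only to keep the example free of external libraries.

```python
from math import sqrt

def dominant_eigenvalue(R, iters=500, tol=1e-12):
    """Largest eigenvalue of a symmetric correlation matrix via power iteration.

    Assumes the starting vector (all entries equal) is not orthogonal to the
    dominant eigenvector, which holds for the positive matrices used here.
    """
    n = len(R)
    v = [1.0 / sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' R v for the normalized vector v
        new_lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(n))
                      for i in range(n))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam

def dominant_share(R):
    """Share of total variance (trace of R) carried by the largest root."""
    return dominant_eigenvalue(R) / len(R)

# Hypothetical 4-item matrices (illustration only).
one_factor = [[1.0 if i == j else 0.6 for j in range(4)] for i in range(4)]
two_blocks = [[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]]

print(round(dominant_share(one_factor), 2))  # one strong common dimension
print(round(dominant_share(two_blocks), 2))  # two separate item clusters
```

In the first matrix every item correlates with every other and the largest root carries most of the variance; in the second, the items split into two unrelated clusters and no single root dominates, which is the kind of pattern that argues against treating a test as unidimensional.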

It is reassuring to know that Lord (1980), the founding father of modern IRT and CAT, specifically pointed out in no uncertain terms that "We can easily imagine tests that are not [unidimensional]. An achievement test in chemistry might in part require mathematical training or arithmetic skill and in part require knowledge of nonmathematical facts" (p. 20).

Conclusion

Contrary to Pope and Harley's belief, soft-linked items are independent, and thus, are not a problem for either classical test theory or IRT. Further, given the nature of hard- and soft-linked items, it is not simply "common sense" to predict the magnitudes of their PBCCs.

The main purpose of creating the soft-linked items was that they could be scored independently. This method of scoring linked items makes for an attractive item format to assess higher cognitive skills without the statistical pitfalls inherent in the conventional scoring of linked items. Compared to hard-linked items, soft-linked items are a better measure of achievement status. We stand behind the soft link approach and believe that it offers substantial benefits for assessment.

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.

Chow, P. (1987). A simulation study of respondent and parameter-induced bias in computerized adaptive testing. Unpublished doctoral dissertation, University of Toronto.

Chow, P., Russell, H., & Traub, R. E. (2000). Expertise sensitive item selection. Psychological Reports, 87, 791-801.

Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., & Reckase, M. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21(4), 347-360.

Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer et al. (Eds.), Measurement and Prediction. Princeton, NJ: Princeton University Press.

Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.

Loerke, D. R. B., Jones, M. N., & Chow, P. (1999). Psychometric benefits of soft-linked scoring algorithms in achievement testing. Education, 120, 273-280.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, N. J.: Lawrence Erlbaum Associates.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Pope, G. A., & Harley, D. (in press). A reply to Loerke, Jones, and Chow (1999) on the "psychometric benefits" of linked items. Instructional Psychology,

Traub, R. E. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.

Dr. Chow, Faculty, Nipissing University, North Bay, Ontario, Canada. Dr. Jones, Department of Psychology, Queen's University, Kingston, Ontario. Mr. Loerke, Spruce Grove Composite High School, Spruce Grove, Alberta, Canada.

Correspondence concerning this article should be addressed to Dr. Chow, 100 College Drive, Box 5002, North Bay, Ontario Canada P1B 8L7.
COPYRIGHT 2002 George Uhlig Publisher
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2002, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Loerke, Donald R.B.
Publication:Journal of Instructional Psychology
Geographic Code:1USA
Date:Sep 1, 2002