Detecting differential person functioning in emotional intelligence.
Tests and testing play a major role in today's society. It is therefore extremely important that test developers and users strive to ensure the validity of their tests and instruments for the target population and the purposes for which they were designed. Accordingly, many writers, investigators, and researchers have addressed the notion of test or item bias.
Many definitions of item bias have been offered. Rudner (1978) suggested that a biased item is one that "behaves differently for members of two different culture groups" (p. 33). Angoff (1993) explains that "statistically biased" means a tendency for an estimate to deviate in one direction or another from a true value; in other words, bias is present whenever estimation is systematically inaccurate. He believes, however, that "there should be an educational and psychological rationale for deciding that a statistically biased item is indeed biased" (p. 114). O'Neal (1991) argues that empirical and statistical indices of bias do not necessarily indicate bias in the sense of unfairness. An item may give one subpopulation (e.g., males) an unfair advantage over another (e.g., females). Such an unfair advantage exists when the two subpopulations have equal standing on the construct or trait being measured yet still perform differently on the item, which means that irrelevant sources of variation are not distributed similarly across the subpopulations (Cromwell, 2002). In other words, even when an item shows significant statistical bias, it might still be judged a fair item, depending on the purpose of the test, unless a criterion of overall standing is used to match the compared subpopulations (Angoff, 1993).
To resolve this argument, the term differential item functioning (DIF) was introduced. An item functions differentially (i.e., presents a real and clear bias) if it shows different statistical properties in different groups even when similar ability levels are controlled for while the groups are compared.
In DIF analysis, a data matrix, usually resulting from some assessment situation, consists of many persons or cases (i.e., the row variable) responding to many test items (i.e., the column variable). A person's score on each item is assumed to reflect his or her standing on the scale of the trait measured by that item. In other words, persons' abilities are being rated by the items. But is it possible to apply the same logic to the opposite case, if the same data matrix were transposed? Can we think of items being rated (by agreement, or by the proportion of persons answering correctly) by persons?
In short, an item shows DIF when it functions differentially between matched subgroups of persons. An item on a mathematics test, for example, functions differentially for two classes, A and B, if the item is easier for one class than the other when subgroups of the same ability level (i.e., the same total test score) from the two classes are compared. Such item behavior is referred to as DIF. Similarly, a person may function differently on two groups of items after an overall measure is controlled for. Such person behavior is referred to as differential person functioning, or DPF (Alsmadi, 1998; Johanson & Alsmadi, 1998).
In their study of DPF in attitude assessment, Johanson and Osborn (2000) presented an example of DPF with positively and negatively phrased items, a pattern they labeled acquiescence.
A search of the Educational Resources Information Center (ERIC) and the Psychology and Behavioral Sciences Collection electronic databases using the phrase "differential person functioning" returned only three entries. Moreover, when the term "emotional intelligence" was added to the search, it failed to turn up any entries.
Therefore, the purpose of this paper is to introduce, and encourage the use of, DPF analysis as a diagnostic approach in tests of emotional or cognitive intelligence by illustrating examples of DPF.
The Mayer, Salovey, and Caruso Emotional Intelligence Test (MSCEIT, 2002) was translated, judged, and applied in Jordan (Mawajdeh, 2004). The MSCEIT was administered as part of a master's thesis, and the researcher himself granted permission for this data analysis. The MSCEIT consists of 141 multiple-choice items distributed over eight tasks (faces, pictures, facilitation, sensations, changes, blends, emotion management, and emotional relations). The items of every two tasks form one of four areas (perceiving emotions, facilitating thought, understanding emotions, and managing emotions). The test also measures two domains, the experiential and the strategic, each consisting of the items of two areas. Total scores can be computed at the task, area, domain, and overall levels.
Mantel-Haenszel (MH) procedure
For DPF analysis, the MH chi-square procedure requires that all items be regrouped into three to five criterion groups, as suggested for Scheuneman's chi-square procedure, since recommendations about the number of groups are not widely available in the literature (O'Neal, 1991). The procedure requires a 2x2 contingency table for each person at each level of the criterion groups. Cell values within each table are the sums of the person's scores on the items of each subgroup (the focal and reference groups) at each criterion level.
Nichols (1994) proposed an SPSS macro that produces an estimator of the common odds ratio and the MH chi-square test of the null hypothesis that the odds ratio equals 1, along with the significance level of the chi-square test. This macro was used in the present study.
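The quantities the macro reports can be reproduced outside SPSS. The following is a minimal Python sketch, not the Nichols macro itself, of the MH common odds ratio and the continuity-corrected MH chi-square computed over a set of 2x2 tables, one per criterion level. The row/column semantics and the five example tables are hypothetical, for illustration only.

```python
# Mantel-Haenszel common odds ratio and chi-square over K strata.
# Each stratum is a 2x2 table ((a, b), (c, d)); here we assume
#   rows    = focal vs. reference item group,
#   columns = item answered in the keyed direction vs. not.
# The example tables below are hypothetical.

def mantel_haenszel(tables):
    """Return (common odds ratio, MH chi-square with continuity correction)."""
    or_num = or_den = 0.0          # numerator/denominator of the MH odds ratio
    a_sum = e_sum = v_sum = 0.0    # observed a's, their expectations, variances
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        r1, r2 = a + b, c + d      # row totals
        c1, c2 = a + c, b + d      # column totals
        a_sum += a
        e_sum += r1 * c1 / n       # E(a) under the null of no association
        v_sum += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    odds_ratio = or_num / or_den
    chi_square = max(0.0, abs(a_sum - e_sum) - 0.5) ** 2 / v_sum
    return odds_ratio, chi_square

# Five hypothetical strata (criterion levels) for one person:
tables = [((12, 3), (7, 8)), ((10, 5), (6, 9)), ((9, 6), (5, 10)),
          ((11, 4), (6, 9)), ((8, 7), (4, 11))]
or_mh, chi2 = mantel_haenszel(tables)
print(f"MH odds ratio = {or_mh:.2f}, MH chi-square = {chi2:.2f}")
```

A common odds ratio above 1 here would favor the focal group; the chi-square statistic is referred to a chi-square distribution with one degree of freedom.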
A total of 150 persons responded to the MSCEIT, which has 141 items with five possible responses each. Each response was given a score based on the proportion of respondents selecting it. The median of each item was then used as the criterion for recoding responses into binary form.
For DPF analysis, the data matrix was transposed so that persons became the columns (i.e., variables) and items became the rows. Two variables were then created. First, the variable "type" represented the focal group (all items of the experiential domain) and the reference group (all items of the strategic domain). Second, the variable "itemgrop" was created by recoding the items' overall scores across all persons into five interval groups, with the interval length computed by dividing the range by five.
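The preparation steps just described (median-based binary recoding, transposing the matrix, and forming the "type" and "itemgrop" variables) can be sketched as follows. This is an illustrative Python sketch only: the random scores stand in for the actual MSCEIT responses, and the split of the 141 items into 71 experiential and 70 strategic items is an assumption, not taken from the paper.

```python
import random

random.seed(0)

N_PERSONS, N_ITEMS = 150, 141
N_EXPERIENTIAL = 71  # hypothetical split of the 141 items into the two domains

# Hypothetical proportion-based item scores, persons x items.
scores = [[random.random() for _ in range(N_ITEMS)] for _ in range(N_PERSONS)]

# 1. Binary recoding: 1 if a response is at or above the item's median, else 0.
def median(values):
    s = sorted(values)
    m = len(s) // 2
    return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

item_medians = [median([scores[p][i] for p in range(N_PERSONS)])
                for i in range(N_ITEMS)]
binary = [[1 if scores[p][i] >= item_medians[i] else 0
           for i in range(N_ITEMS)] for p in range(N_PERSONS)]

# 2. Transpose: items become rows (cases), persons become columns (variables).
transposed = [[binary[p][i] for p in range(N_PERSONS)] for i in range(N_ITEMS)]

# 3. "type": focal (experiential) vs. reference (strategic) items.
item_type = ['focal' if i < N_EXPERIENTIAL else 'reference'
             for i in range(N_ITEMS)]

# 4. "itemgrop": each item's overall score recoded into 5 interval groups,
#    with interval width = range / 5 (variable name follows the paper).
totals = [sum(row) for row in transposed]
lo, hi = min(totals), max(totals)
width = (hi - lo) / 5 or 1
itemgrop = [min(int((t - lo) / width), 4) for t in totals]

print(len(transposed), len(transposed[0]), sorted(set(itemgrop)))
```

Each item then carries a group membership ("type") and a matching level ("itemgrop"), which is exactly the stratification the MH procedure needs.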
The MH common odds ratio was calculated along with the MH chi-square statistic for each person. The results showed significant DPF (p < 0.05) for 20 persons (i.e., 13%; N = 150). Of these 20 cases, 13 persons (65%) showed significant DPF favoring the experiential domain, whereas 7 persons (35%) showed DPF favoring the strategic domain.
Person 139 has means of 1.65 and 1.49 for the experiential and strategic domains, respectively, an observable difference of 0.16. The MH analysis was significant (odds ratio = 0.3; p = 0.007). Figure 1 illustrates this person's performance across the matched proportion-agreement categories of the focal (experiential) and reference (strategic) item groups. For this person, the superiority of the experiential domain is clear.
[FIGURE 1 OMITTED]
On the other hand, person 144 has means of 1.32 and 1.68 for the strategic and experiential domains, respectively, an observable difference of 0.36. The MH analysis was significant (odds ratio = 4.1; p = 0.0002). Figure 2 illustrates this person's performance. Unlike the previous case, the superiority of the strategic domain is clear.
As mentioned earlier, a simple observable difference is not always evidence of the existence of DPF and may instead be considered impact. As illustrated in Figure 3, person 14 might be a good example of impact.
This person has means of 1.60 and 1.67 for the experiential and strategic domains, respectively, an observable difference of only 0.07. The MH analysis showed significance (odds ratio = 4.1; p = 0.0002).
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
As the examples illustrate, DPF is very likely to exist in data from tests measuring different domains of a particular trait or ability. The practical implications of these findings seem clear. First, practitioners should be careful to differentiate between person impact and DPF. A simple overall difference between a person's performance on focal-group items and reference-group items, when the items are not matched, is called impact, and impact is usually expected. However, when that difference holds over most or all matched groups of items from the focal and reference groups, it is evidence of DPF. Second, the examples show that whenever scale items span different domains of a trait, DPF is possible; however, different instances of DPF may not carry the same diagnosis and should not be treated alike. Finally, just as DIF analysis is very important in test development because DIF is a serious threat to validity, DPF analysis is an important procedure that provides more precise information about a person's performance for diagnostic purposes.
Alsmadi, A. (1998). Differential person functioning. Doctoral dissertation, Ohio University.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-25). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cromwell, S. (2002). A primer on ways to explore item bias. Paper presented at the annual meeting of the Southwest Educational Research Association, Austin, TX.
Johanson, G. A., & Alsmadi, A. (1998). Differential person functioning. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Johanson, G. A., & Osborn, C. J. (2000). Acquiescence as differential person functioning. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Mawajdeh, K. (2004). Mayer, Salovey, & Caruso Emotional Intelligence Test (MSCEIT) standardization on Mu'tah University students. Unpublished master's thesis, Mu'tah University, Jordan.
Nichols, D. P. (1994). The Mantel-Haenszel statistic for 2x2xk tables. Keywords: Tips and News for Statistical Software Users, 54, 10-12.
O'Neal, M. R. (1991). Comparison of methods for detecting item bias. Paper presented at the annual meeting of the Mid-South Educational Research Association, Lexington, KY. (ERIC Document Reproduction Service No. ED 340 735)
Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problem of item bias. The Volta Review, 80, 13-40.
Yahia M. Alsmadi, Ph.D., The University of Jordan. Abdalla A. Alsmadi, Ph.D., Mu'tah University.
Authors: Yahia M. Alsmadi; Abdalla A. Alsmadi
Publication: Journal of Instructional Psychology
Date: Dec 1, 2009