GUARDING AGAINST SIMPSON'S PARADOX WHEN COMBINING DATA SETS.
The anomaly caused by lurking or confounding variables was initially described by E. H. Simpson (1951) in a paper titled "The Interpretation of Interaction in Contingency Tables" in the Journal of the Royal Statistical Society. Blyth (1972) was the first to call this difficulty "Simpson's paradox" (Pearl, 2016). Bickel, Hammel, and O'Connell (1975) described a classic case of the paradox involving graduate admissions at Berkeley and perceived gender bias. More than 40 years after the Berkeley study, Simpson's paradox remains something researchers must be alert to when combining data sets.
For example, one of the authors recently conducted a research project spanning two semesters. Five students qualified for inclusion in the study during the first semester and 15 students qualified during the following semester. The researcher intended to combine the data sets from the two semesters, but needed to be mindful of the potential for confounding or lurking variables. Lurking variables, while not part of the focus of the study, could cause the combined data to indicate a relationship opposite to the one found when the two original data sets are analyzed separately. A colleague teaching in the College of Nursing and Human Services was intrigued by this discussion and noted similar difficulties in her own and her students' research.
The nursing professor noted that Simpson's paradox also occurs within the field of nursing and patient care research (Norton & Divine, 2015). For example, 146 patients were seen for new onset of upper respiratory symptoms at a variety of clinics. In the clinics, the patients were either prescribed or not prescribed antibiotics (ATB) for their symptoms. In three of the clinics, 104 patients did not receive antibiotics; in the other two clinics, 42 patients did receive an antibiotic. In the two clinics where ATB treatments were given, the patients appeared to improve, with 120 return visits, whereas patients who did not receive ATB treatments had 200 return visits during the first week. In this example, when the individual clinics were compared, there was a positive correlation between the treatment and return visits of patients. However, when the groups were combined for treatment and return visits, the association disappeared because of a confounding factor related to the clinic settings (urban versus rural). The caution to the healthcare researcher and others is that such confounding variables may be present whenever data sets are combined, and that combined results therefore require further reflection by the researcher.
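The effect of a confounder such as clinic setting can be illustrated with a small sketch. The counts below are hypothetical and were invented only for illustration; they are not the data from the clinic example above or from any cited study, and the names `COUNTS`, `stratum_rates`, `pooled_rates`, and `is_reversed` are assumptions introduced here. The sketch compares improvement rates within each setting against the rates obtained after the settings are pooled.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration only (not data from any study).
# (setting, received_antibiotic) -> (patients_improved, patients_total)
COUNTS = {
    ("urban", True): (18, 20),
    ("urban", False): (64, 80),
    ("rural", True): (48, 80),
    ("rural", False): (10, 20),
}


def stratum_rates(counts):
    """Improvement rate for every (setting, treated) cell."""
    return {key: Fraction(imp, tot) for key, (imp, tot) in counts.items()}


def pooled_rates(counts):
    """Improvement rate by treatment after summing counts over all settings."""
    totals = {}
    for (_, treated), (imp, tot) in counts.items():
        prev_imp, prev_tot = totals.get(treated, (0, 0))
        totals[treated] = (prev_imp + imp, prev_tot + tot)
    return {treated: Fraction(imp, tot) for treated, (imp, tot) in totals.items()}


def is_reversed(counts):
    """True when every setting favours one arm but the pooled data favour the other."""
    rates = stratum_rates(counts)
    settings = {setting for setting, _ in counts}
    diffs = [rates[(s, True)] - rates[(s, False)] for s in settings]
    pooled = pooled_rates(counts)
    pooled_diff = pooled[True] - pooled[False]
    all_positive = all(d > 0 for d in diffs)
    all_negative = all(d < 0 for d in diffs)
    return (all_positive and pooled_diff < 0) or (all_negative and pooled_diff > 0)


if __name__ == "__main__":
    for (setting, treated), r in sorted(stratum_rates(COUNTS).items()):
        print(f"{setting:5s} treated={treated!s:5s} improved={float(r):.0%}")
    for treated, r in sorted(pooled_rates(COUNTS).items()):
        print(f"pooled treated={treated!s:5s} improved={float(r):.0%}")
    print("reversal:", is_reversed(COUNTS))
```

With these invented counts, treated patients do better in both the urban and the rural setting, yet the pooled figures show 66 of 100 treated patients improving against 74 of 100 untreated patients. The reversal arises only because most treated patients come from the setting with the lower improvement rate, which is the kind of imbalance a researcher would want to check before combining data sets.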
Thus, all researchers must be prepared for Simpson's paradox, which can arise during data aggregation and can have a substantial and troubling effect on data synthesis and data analysis. Confounding variables have the potential to skew or distort statistical study findings, leading to the report of erroneous results (Norton & Divine, 2015). Subgroups of data may trend in one direction while the aggregate group data trend in a different direction (Bracey, 2004; Smith & Goltz, 2012; Norton & Divine, 2015). Researchers and educators are therefore obligated to inform and instruct peers and students about appropriate statistical methods to ensure that confounding factors are not hidden or disguised, thereby altering outcomes (Smith & Goltz, 2012). It is thus crucial to be a reflective practitioner, as sometimes things are not what they seem!
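One way to see why subgroup and aggregate trends can disagree is to write the pooled rate as a weighted average of the subgroup rates. The notation below is an assumption introduced here and does not appear in the cited sources: $Y$ is the outcome, $T$ and $\bar{T}$ denote the treated and untreated groups, and $Z$ indexes the subgroups (for instance, clinic setting or semester).

```latex
% Law of total probability: the pooled rate is a weighted average of subgroup rates.
\begin{align*}
P(Y \mid T)       &= \sum_{z} P(Y \mid T, Z = z)\, P(Z = z \mid T), \\
P(Y \mid \bar{T}) &= \sum_{z} P(Y \mid \bar{T}, Z = z)\, P(Z = z \mid \bar{T}).
\end{align*}
% Simpson's paradox: the comparison holds in every subgroup but flips when pooled.
\begin{align*}
P(Y \mid T, Z = z) &> P(Y \mid \bar{T}, Z = z) \quad \text{for every } z, \\
\text{and yet} \quad P(Y \mid T) &< P(Y \mid \bar{T}).
\end{align*}
```

The two pooled rates use different weights, $P(Z = z \mid T)$ and $P(Z = z \mid \bar{T})$. When the subgroups are distributed unevenly across the groups being compared, those weights can pull the aggregate comparison in the direction opposite to every subgroup comparison; when the weights are equal, no reversal is possible.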
Bickel, P. J., Hammel, E. A., & O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398-404.
Blyth, C. R. (1972). On Simpson's paradox and the sure-thing principle. Journal of the American Statistical Association, 67(338), 364-366.
Bracey, G. W. (2004). Simpson's Paradox and other statistical mysteries. American School Board Journal, 191(2), 32-34.
Demers, S., & Rossmo, D. K. (2015). Simpson's paradox in Canadian police clearance rates. Canadian Journal of Criminology and Criminal Justice, 57(3), 424-434.
Fu, P., Panneerselvam, A., Clifford, B., Dowlati, A., Ma, P. C., Zeng, G., Halmos, B. & Leidner, R. S. (2015). Simpson's paradox--aggregating and partitioning populations in health disparities of lung cancer patients. Statistical Methods in Medical Research, 24(6), 937-948.
Galipaud, M., Bollache, L., Wattier, R., Dubreuil, C., Dechaume-Moncharmont, F., & Lagrue, C. (2015). Overestimation of the strength of size-assortative pairing in taxa with cryptic diversity: A case of Simpson's paradox. Animal Behaviour, 102, 217-221.
Norton, H. J., & Divine, G. (2015). Simpson's paradox... and how to avoid it. Significance, 12(4), 40-43.
Pearl, J. (2016). The sure-thing principle. UCLA Cognitive Systems Laboratory, Technical Report R-466. Retrieved March 14, 2016, from http://ftp.cs.ucla.edu/pub/stat_ser/r466.pdf
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B, 13(2), 238-241.
Smith, M. L., & Goltz, H. H. (2012). What is hidden in my data? Practical strategies to reveal Yule-Simpson's paradox and strengthen research quality in health education research. Health Promotion Practice, 13(5), 637-641.
Author: Cohen, Bonni S.; Moch, Peggy L.
Publication: Curriculum and Teaching Dialogue
Date: Jan 1, 2017