Psychology's replication crisis debated: analyses of a landmark reproducibility review reach conflicting conclusions.
The original investigation of 100 studies contained key errors, contend Daniel Gilbert, a Harvard University psychologist, and colleagues. After correcting for those errors, the effects reported in 85 of those studies appeared in replications conducted by different researchers. So an initial conclusion that only 35 studies generated repeatable findings was a gross underestimate, Gilbert's team reports in the March 4 Science.
"There's no evidence for a replication crisis in psychology," Gilbert says.
Psychologist Brian Nosek of the University of Virginia in Charlottesville and other members of the group that conducted the original replication study (SN: 10/3/15, p. 8) reject Gilbert's analysis. The 2015 report provides "initial, not definitive evidence" that psychology has a reproducibility problem, the group writes in a response published in the same issue of Science.
"The very best scientists cannot really agree on what the results of the most important paper in the recent history of psychology mean," says Stanford University epidemiologist John Ioannidis. Researchers' assumptions and expectations can influence their take on any results, "no matter how clear and strong they are," he says.
The details of many repeat studies in the 2015 paper differed dramatically from initial studies, stacking the deck against achieving successful replications, Gilbert says. Replications often sampled different populations, such as substituting native Italians for Americans in a study of attitudes toward black Americans.
Many studies also altered procedures. One replication effort gave older children the relatively easy task of locating items on a small computer screen, whereas the original study gave younger children a harder task of locating items on a large computer screen.
Repeat studies also generally included too few volunteers to make a statistically compelling case that a replication had succeeded or failed, Gilbert says. Another problem was that each original study was replicated only once. Multiple repeats of a study balance out differences in study procedures and increase the number of successful replications, the scientists argue.
In a replication analysis that often amounted to a comparison of apples and oranges, at least 34 replication studies should have failed by chance, assuming all 100 original studies described true effects, Gilbert and colleagues estimate. That makes the new estimate of 85 successful replications even more impressive, they say.
Nosek's group calculates that only about 22 replication attempts in the 2015 study should have failed by chance. Tellingly, Nosek says, even successful replications found weaker statistical effects than the original studies had. Published studies make statistically significant findings look unduly strong, he says. Journals usually don't publish replication failures and many researchers simply file them away.
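The dueling estimates of 34 and 22 chance failures both rest on the same simple logic: even if every original effect is real, a replication detects it only with some probability (its statistical power), so some failures are expected by luck alone. A minimal sketch of that arithmetic, with hypothetical power levels chosen only to show how the two figures could arise (not the actual calculations either group performed):

```python
# Illustrative sketch, not either team's actual method: if all n original
# effects are real and each replication detects a true effect with
# probability `power`, the expected number of chance failures is
# n * (1 - power).

def expected_chance_failures(n_studies: int, power: float) -> float:
    """Expected number of replications that fail purely by chance."""
    return n_studies * (1 - power)

# Hypothetical average power levels that would yield the two figures
# reported in the text:
print(expected_chance_failures(100, 0.78))  # ~22 failures
print(expected_chance_failures(100, 0.66))  # ~34 failures
```

The disagreement between Gilbert's and Nosek's numbers thus comes down to how much power one assumes the replications had, which in turn depends on contested judgments about sample sizes and procedural fidelity.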
A separate analysis of Nosek and his group's work suggests that initial study samples need to be beefed up before any conclusions can be drawn about the durability of psychology results. Failures to replicate in the 2015 investigation largely occurred because many original studies contained only enough participants to generate weak but statistically significant effects, two psychologists assert online February 26 in PLOS ONE. Journals' bias for publishing only positive results also contributed to replication failures, report Alexander Etz, at the University of Amsterdam at the time of the study, and Joachim Vandekerckhove of the University of California, Irvine.
Etz and Vandekerckhove statistically analyzed 72 papers and replication attempts from Nosek's project. Only 19 original studies contained enough volunteers to yield a strong, statistically significant effect. That leaves too few adequately powered studies to generalize about the state of replication in psychology, the researchers say.
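A back-of-the-envelope power calculation shows why "enough volunteers" is such a high bar. Under the standard normal approximation for a two-sample comparison (not the Bayesian analysis Etz and Vandekerckhove actually used), the sample needed to reliably detect an effect grows rapidly as the effect shrinks; the effect sizes below are illustrative placeholders:

```python
# Rough power sketch for a two-sample comparison with equal groups,
# using the normal approximation. z-values are hardcoded for a two-sided
# alpha of .05 (1.96) and 80% power (0.84); d is the effect size in
# standard-deviation units (Cohen's d). Illustrative only.

def n_per_group(d: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> float:
    """Approximate participants needed per group for 80% power."""
    return 2 * ((z_alpha + z_beta) / d) ** 2

print(round(n_per_group(0.5)))  # "medium" effect: ~63 per group
print(round(n_per_group(0.2)))  # "small" effect: ~392 per group
```

A study of a small effect thus needs several hundred participants per group; many of the original studies in the 2015 project ran with far fewer, which is consistent with the claim that most could produce only weak evidence.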
Researchers in psychology and other fields need to worry less about reproducing statistically significant results and more about developing theories that can be tested with a variety of statistical approaches, argues psychologist Gerd Gigerenzer of the Max Planck Institute for Human Development in Berlin. Statistical significance expresses the probability of observing a relationship between two variables at least as strong as the one found (say, a link between a change in the wording of a charitable appeal and an increase in donations) under the starting assumption that no such relationship exists. But researchers rarely test any proposed explanations for statistically significant results.
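The definition above can be made concrete with a permutation test: assume the null (wording makes no difference), repeatedly shuffle which donations belong to which wording, and count how often a gap at least as large as the observed one arises by chance. The donation figures below are invented purely for illustration:

```python
import random

# Minimal permutation-test sketch of the p-value idea: under the null
# hypothesis that appeal wording is unrelated to donations, group labels
# are interchangeable, so shuffling them shows how often the observed
# difference in means would arise by chance. All data are made up.

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical donation amounts (dollars) under two appeal wordings:
old_wording = [5, 8, 4, 6, 5, 7, 5, 6]
new_wording = [9, 7, 10, 8, 11, 9, 8, 10]
print(permutation_p_value(old_wording, new_wording))  # a small p-value
```

A small p-value here says only that such a gap would rarely arise under the null; as Gigerenzer notes, it says nothing about *why* the new wording works, which is the kind of explanatory question he argues researchers too rarely test.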
Pressures to publish encourage researchers to tweak what they're studying and how they measure it to ensure statistically significant results, adds Gigerenzer. Journals need to review study proposals before any experiments are run to discourage such "borderline cheating," he recommends.
Section: HUMANS & SOCIETY
Date: Apr 2, 2016