An attempt to reproduce 100 psychology studies published in leading journals failed to match the original results in almost two-thirds of cases. The findings raise serious questions about the rigor of research in the field, as well as how other areas of science would stack up under the same test.
Reproducibility is considered a core feature of science. Exceptions are tolerated for observations of fleeting natural phenomena, but these don’t apply to the 160 psychology papers published in 2008 in three leading journals: Psychological Science, the Journal of Experimental Psychology: Learning, Memory and Cognition, and the Journal of Personality and Social Psychology.
The Center for Open Science (COS) set out to see how reliable the results from these papers were. They invited psychology researchers worldwide to claim a paper and attempt to replicate the results. Dr Patrick Goodbourn was one of those taking part. Like others, Goodbourn chose a paper that suited his experience, sought details from the original authors and set out to see if the results could be repeated. Funding grants were available from COS and a philanthropic organization.
Goodbourn told IFLScience, “The year 2008 was chosen because it was recent enough that most original authors still had their dataset, but enough time has elapsed that we could see how influential these papers have been.” However, rather than picking only the most cited papers, as some projects have done, Goodbourn said all 2008 papers were included “to eliminate bias.”
“A few of the papers relied on historical events or would have taken too long to replicate, but most of them were suitable for testing,” Goodbourn said. Outcomes from the first 100 tested have been published in Science.
Rather than repeat the studies with the same sample sizes, “We looked at the effect size reported in the original paper and chose samples that should have had a 95% probability of achieving statistical significance with that effect size,” said Goodbourn.
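The sampling approach Goodbourn describes is a standard statistical power calculation. The sketch below, which is illustrative rather than the team's actual procedure, estimates the per-group sample size needed for 95% power at a two-tailed 5% significance level, using the normal approximation to a two-sample comparison of means; the effect size of 0.5 is a hypothetical value, not one from the article.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.95):
    """Approximate per-group sample size for a two-sample comparison of
    means, using the normal approximation to the t-test power formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed significance criterion
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical original effect size (Cohen's d) of 0.5 -- an assumption
# for illustration, not a figure reported in the study.
print(n_per_group(0.5))  # → 104 participants per group
```

The key point the formula makes concrete: the required sample size grows with the inverse square of the effect size, so weaker original effects demand far larger replication samples.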
Nevertheless, just 36 of the replication efforts reached the conventional benchmark of statistical significance (p < 0.05), meaning a less than 5% probability of obtaining the result by chance if no real effect exists. The authors report that three other teams got results “subjectively rated to have replicated the original result.”
Goodbourn told IFLScience, “We each got acquainted with the original dataset, and no one has raised any evidence of fraud.” Instead he thinks psychologists are setting the bar too low for what is considered rigorous enough to publish. In some cases subtle differences in the conduct of the studies have been blamed. However, Goodbourn said, “None of the differences looked likely to be important prior to attempting replication.”
Goodbourn himself chose a cognitive psychology study looking at the “repetition effect,” which did not achieve statistical significance. He told IFLScience, “I have no doubt the effect they found is real, but I think it is much weaker than the original study reported.” In general, Goodbourn noted, cognitive psychology studies held up better than those in social psychology, which he attributed to “many of the effects in social psychology being much more subtle.”
This is the largest attempt at study replication ever conducted, and COS is engaged in a similar project examining 50 cancer research papers. Goodbourn hopes scientists in other fields will follow suit.
If psychologists are feeling humbled, they can at least take comfort that they're better than climate change contrarians. On the same day as Science announced the COS paper, Theoretical and Applied Climatology published a paper investigating the 38 most prominent papers of recent years claiming to contradict the theory that greenhouse gases are warming the Earth. Not one of these papers proved replicable, most often because the original relied on an unrepresentative period of climate data or already discredited assumptions.