Policy statement aims to halt missteps in the quest for certainty.
Misuse of the P value — a common test for judging the strength of scientific evidence — is contributing to the number of research findings that cannot be reproduced, the American Statistical Association (ASA) warns in a statement released today^{1}. The group has taken the unusual step of issuing principles to guide use of the P value, which it says cannot determine whether a hypothesis is true or whether results are important.
This is the first time that the 177yearold ASA has made explicit recommendations on such a foundational matter in statistics, says executive director Ron Wasserstein. The society’s members had become increasingly concerned that the P value was being misapplied in ways that cast doubt on statistics generally, he adds.
In its statement, the ASA advises researchers to avoid drawing scientific conclusions or making policy decisions based on P values alone. Researchers should describe not only the data analyses that produced statistically significant results, the society says, but all statistical tests and choices made in calculations. Otherwise, results may seem falsely robust.
Véronique Kiermer, executive editor of the Public Library of Science journals, says that the ASA’s statement lends weight and visibility to longstanding concerns over undue reliance on the P value. “It is also very important in that it shows statisticians, as a profession, engaging with the problems in the literature outside of their field,” she adds.
Weighing the evidence
P values are commonly used to test (and dismiss) a ‘null hypothesis’, which generally states that there is no difference between two groups, or that there is no correlation between a pair of characteristics. The smaller the P value, the less likely an observed set of values would occur by chance — assuming that the null hypothesis is true. A P value of 0.05 or less is generally taken to mean that a finding is statistically significant and warrants publication. But that is not necessarily true, the ASA statement notes.
A P value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a P value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients’ blood glucose levels without having a therapeutic effect.
Giovanni Parmigiani, a biostatistician at the Dana Farber Cancer Institute in Boston, Massachusetts, says that misunderstandings about what information a P value provides often crop up in textbooks and practice manuals. A course correction is long overdue, he adds. “Surely if this happened twenty years ago, biomedical research could be in a better place now.”
Frustration abounds
Criticism of the P value is nothing new. In 2011, researchers trying to raise awareness about false positives gamed an analysis to reach a statistically significant finding: that listening to music by the Beatles makes undergraduates younger^{2}. More controversially, in 2015, a set of documentary filmmakers published conclusions from a purposely shoddy clinical trial — supported by a robust P value — to show that eating chocolate helps people to lose weight. (The article has since been retracted.)
But Simine Vazire, a psychologist at the University of California, Davis, and editor of the journal Social Psychological and Personality Science, thinks that the ASA statement could help to convince authors to disclose all of the statistical analyses that they run. “To the extent that people might be sceptical, it helps to have statisticians saying, ‘No, you can’t interpret P values without this information,” she says.
More drastic steps, such as the ban on publishing papers that contain P values instituted by at least one journal, could be counterproductive, says Andrew Vickers, a biostatistician at Memorial Sloan Kettering Cancer Center in New York City. He compares attempts to bar the use of P values to addressing the risk of automobile accidents by warning people not to drive — a message that many in the target audience would probably ignore. Instead, Vickers says that researchers should be instructed to “treat statistics as a science, and not a recipe”.
But a better understanding of the P value will not take away the human impulse to use statistics to create an impossible level of confidence, warns Andrew Gelman, a statistician at Columbia University in New York City.
“People want something that they can’t really get,” he says. “They want certainty.”

Nature 531, 151 (10 March 2016)

doi:10.1038/nature.2016.19503