Hold on to your hats, quantitative researchers.
Last week, a trio of scientists wrote a piece in the Comment section of Nature arguing that the concept of statistical significance should be retired. Over 800 researchers have added their names as signatories to the proposal, and the internet has been ablaze with debate.
At the moment, this is just a proposal; one set of voices within a wider conversation about how statistics is done. There are counter-arguments, too. But if this is a sign of things to come, we could be about to witness a major revolution in statistical research.
Statistical significance has been in use for 300 years, and in widespread use for the past hundred. For the uninitiated, here’s how it works. In a court of law (in New Zealand, at least) a defendant is considered innocent until proven guilty beyond a reasonable doubt. Statisticians use similar logic. An association between two phenomena is considered to be absent (this is the ‘null hypothesis’) until proven present to a statistically significant degree.
The ‘statistically significant’ part is a way of assessing the effect of sampling error. When proving or disproving the null hypothesis with data, it is possible to come up with the wrong answer simply because the data from that particular sample wasn’t a good representation of the whole population. Statistical significance applies ‘odds’ to that possibility.
Researchers will typically define a required significance level (α) before collecting data. That significance level represents the probability of the study falsely rejecting the null hypothesis. (In other words, the odds of finding an association where none exists.) The required significance level will vary from study to study, but common choices are 5% or 1%.
Once the data has been collected and analysed, and results calculated, researchers compute a p-value. That value represents the probability that, assuming the null hypothesis is true, you would observe a result at least as extreme as the one in your sample. If the p-value is less than or equal to the pre-defined significance level, you reject the null hypothesis.
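To make the procedure concrete, here is a minimal sketch of that logic in Python, using invented example data and a simple permutation test (one of several ways to compute a p-value; the article itself doesn't prescribe a method). We fix α in advance, compute a p-value from the data, and compare the two:

```python
import random

# Hypothetical example data: reaction times (ms) for two groups.
group_a = [310, 295, 280, 325, 300, 290]
group_b = [330, 345, 315, 360, 335, 350]

alpha = 0.05  # significance level, chosen BEFORE looking at the data

def mean(xs):
    return sum(xs) / len(xs)

# Observed difference between the group means.
observed = abs(mean(group_a) - mean(group_b))

# Permutation test: under the null hypothesis the group labels are
# exchangeable, so we repeatedly shuffle the pooled data and count how
# often a difference at least as extreme as the observed one occurs.
pooled = group_a + group_b
random.seed(0)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    a, b = pooled[:len(group_a)], pooled[len(group_a):]
    if abs(mean(a) - mean(b)) >= observed:
        count += 1

p_value = count / n_perm
print(f"p = {p_value:.4f}")

# The conventional, dichotomous decision rule the Nature piece criticises:
if p_value <= alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

Note that the final `if` statement is exactly the all-or-nothing step the article's authors object to: the p-value itself is a continuous measure of compatibility with the null hypothesis, and nothing special happens at 0.05.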
Part of the problem, though, is in that word ‘significance.’ In everyday language, it’s roughly synonymous with ‘importance.’ In statistics, it’s not.
The authors and signatories of this latest article in Nature argue that statistical significance creates misconceptions. A statistically non-significant result, they argue, is often interpreted as proof of no effect, when it may only mean the evidence is inconclusive: an effect may still exist that the study simply failed to detect.
Put it this way: I can’t be 100% confident that the six cups of coffee I drank yesterday were what kept me awake until 2am, but that doesn’t mean that they played no part. It’s this potential for misinterpretation, some argue, that could make statistical significance an unhelpful concept.
It’s important to note that this is not the end of P values, and it’s not the end of measures to test the rigour of results. While the authors of the article are calling for an end to the use of statistical significance, they still support the use of P values in other ways:
“We are not calling for a ban on P values. Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating for an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis.”
Amrhein, Greenland, & McShane (2019)
How will this affect you? Possibly not at all. It remains to be seen whether statistical significance will live on or die out. For now, nothing has changed; and the editors of Nature have noted that the processes for evaluating statistics in submitted papers are not under review.
But if you use statistics in your research, this is a great time to check in with your supervisors and reevaluate the ways in which you interpret (or plan to interpret) your results.
Reference
Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567, 305–307. doi:10.1038/d41586-019-00857-9 pmid:30894741