Time to move from significance testing to estimation

A Research Note published in the latest issue of the Journal of Physiotherapy argues the case for moving away from significance tests and hypothesis tests in health research. The central reason is that p-values and claims of statistical significance (that is, the products of null hypothesis statistical tests) have some inherent flaws and are often misused and misinterpreted. The problems are difficult to describe succinctly. The Research Note addresses each of them under the headings: p-values do not indicate the probability that a hypothesis is true (or not), p-values are not evidence, significant findings are not replicable, and the null hypothesis is false in most clinical research.

For a long time, leading statisticians have argued that the concept of statistical significance should be abandoned. However, researchers in laboratory and clinical settings have continued to use null hypothesis statistical tests, presumably because it is what they were taught, it is what many journals expect, and they are unaware of the benefits of alternative approaches to analysis. This year, however, articles in The American Statistician and Nature have argued strongly that it is time to stop using ‘statistically significant’ and related terms.

One widely recommended alternative to significance tests and hypothesis tests in randomised controlled trials is to report the size of the effect (the point estimate) and the precision of that estimate (the confidence interval). Trialists can then interpret the point estimate by asking whether it is large enough to be clinically worthwhile. The lower and upper limits of the confidence interval can be considered in the same way. For example, if both the lower and upper limits of the confidence interval are large enough to be clinically important, the trial provides a clear answer.
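The estimation approach described above can be illustrated with a minimal sketch. It is not taken from the Research Note: the function names, the large-sample normal approximation, and the idea of passing in a "smallest worthwhile effect" threshold are all assumptions made for illustration, and the sketch assumes an outcome where larger differences favour treatment.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(treatment, control, level=0.95):
    """Return (estimate, lower, upper) for the between-group mean difference.

    Assumption: a large-sample normal approximation is used. A t-based
    interval would be somewhat wider for small trials.
    """
    estimate = mean(treatment) - mean(control)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se


def interpret(lower, upper, worthwhile):
    """Compare both confidence limits with a smallest worthwhile effect.

    `worthwhile` is a hypothetical threshold that the trialist must justify
    clinically; it is not something the data can supply.
    """
    if lower >= worthwhile:
        return "clearly worthwhile"      # both limits clear the threshold
    if upper < worthwhile:
        return "clearly not worthwhile"  # both limits fall short
    return "uncertain"                   # interval spans the threshold


if __name__ == "__main__":
    # Made-up outcome scores, for illustration only.
    treated = [10, 12, 14]
    controls = [5, 7, 9]
    est, lo, hi = mean_difference_ci(treated, controls)
    print(f"difference {est:.1f} (95% CI {lo:.1f} to {hi:.1f})")
    print(interpret(lo, hi, worthwhile=1.5))
```

The interpretation depends entirely on where the interval sits relative to the threshold, not on whether it excludes zero, which is the shift in emphasis the post describes.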

The migration to confidence intervals has already begun in many journals. The proportion of physiotherapy trials that report confidence intervals instead of (or as well as) statistical significance and p-values has been increasing steadily over the past few decades. The migration from p-values to confidence intervals is more common among higher-quality trials. This increases the need for physiotherapists to understand confidence intervals if they are to keep abreast of the available evidence.

Abandoning ‘statistically significant’ and related terms has implications for many groups. These include journal editors and editorial policies, reporting checklists (eg, CONSORT Checklist), and critical appraisal tools that include a reporting component (eg, PEDro scale). A group of member journals of The International Society of Physiotherapy Journal Editors will soon be releasing their new policy on this issue. We will keep PEDro users informed of developments in this area.

Please consider reading the Research Note, which is freely available in full text via the link below, to ensure you understand the reasons for this shift in the approach to statistical analysis.

Herbert R. Research note: significance testing and hypothesis testing: meaningless, misleading and mostly unnecessary. J Physiother 2019;65(3):178-81
