Layman’s abstract for paper on P value functions: An underused method to present research results and to promote quantitative reasoning

Every few days, we will be publishing layman’s abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.

The article featured today is from Statistics in Medicine and the full article, published in issue 38.21, is available to read online here.

Infanger, D, Schmidt‐Trucksäss, A. P value functions: An underused method to present research results and to promote quantitative reasoning. Statistics in Medicine. 2019; 38: 4189–4197. doi: 10.1002/sim.8293

Null hypothesis significance testing (NHST) remains the most prevalent method for assessing the evidence of an effect in many fields of science, despite frequent criticism. In NHST, researchers stipulate a null hypothesis, typically meaning “no effect” or “no association”, and then calculate a p-value to assess the evidence against this hypothesis. The p-value is a continuous measure expressing the probability of getting a result as extreme as, or more extreme than, the result obtained in the sample, assuming that the null hypothesis is true. In other words, p-values measure the compatibility between a specified model and the data: smaller p-values indicate a lower degree of compatibility than larger p-values. In addition, researchers set a significance level (usually 0.05) that acts as a threshold for the p-value when judging the evidence for an effect. The obtained result is then deemed “significant” or “not significant”, depending on whether the p-value is below or above the significance level. Frequently, a non-significant result leaves researchers in an uncomfortable position because such results are harder to publish. Also, a result that fails to reach statistical significance is often wrongly interpreted to mean that no real effect exists, i.e. that the null hypothesis is true. In summary, there is an emerging consensus that the dichotomization of results into “significant” and “not significant” is inadequate to summarize the evidence of an effect and that better ways of presenting evidence are needed.

P-value functions are an improved way to summarize the evidence of an effect graphically. They may be described as graphs of all possible p-values or of all possible confidence limits. A p-value function is obtained by plotting the p-values for all hypotheses, null and non-null, on the y-axis against the corresponding parameter values on the x-axis. Equivalently, it is a graph of the confidence limits for all possible confidence levels. P-value functions accessibly summarize a wealth of information in a single graph: the point estimate for the parameter, one- and two-sided confidence limits at any level, and one- and two-sided p-values for any null or non-null parameter value. Hence, p-value functions avoid any dichotomization and display the degree of compatibility with the data for every parameter value, not just the null value. Additionally, the counternull, the value that is supported by the same amount of evidence as the null value, can be read off immediately from the graph. Visualizing the counternull discourages the prevalent fallacy of equating a non-significant result with “no effect”. Results from different studies or from different estimators can be compared by plotting the corresponding p-value functions together in the same graph. Because they are essentially just graphs of p-values and confidence limits, p-value functions require minimal statistical retraining and are easily interpreted. They enable researchers to think critically and holistically about the available evidence without the need to dichotomize results into a simplistic “significant” or “not significant” decision.
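
As a rough illustration of how such a curve is built, the following Python sketch computes a two-sided p-value for every value on a grid of hypothesized parameter values, together with the counternull. It assumes a normally distributed estimator with a known standard error. The numbers (estimate 1.2, standard error 0.8) and the function names are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_p(hypothesized, estimate, std_error):
    """Two-sided p-value for the hypothesis 'parameter == hypothesized'.

    Assumes the estimator is approximately normal (an assumption of this sketch).
    """
    z = abs(estimate - hypothesized) / std_error
    return 2 * (1 - STD_NORMAL.cdf(z))


def counternull(estimate, null_value=0.0):
    """Value on the far side of the estimate with the same p-value as the null.

    The mirror-image formula holds only for symmetric sampling distributions.
    """
    return 2 * estimate - null_value


# Hypothetical study result (made-up numbers for illustration)
estimate, std_error = 1.2, 0.8

# The p-value function: one (parameter value, p-value) pair per grid point.
grid = [i / 10 for i in range(-20, 45)]
curve = [(theta, two_sided_p(theta, estimate, std_error)) for theta in grid]

p_null = two_sided_p(0.0, estimate, std_error)
p_counternull = two_sided_p(counternull(estimate), estimate, std_error)
print(f"p at null = {p_null:.3f}, p at counternull = {p_counternull:.3f}")
```

With these made-up numbers, the null value 0 and the counternull 2.4 have the same p-value (about 0.13), so the data are as compatible with an effect of 2.4 as with no effect at all. Plotting the pairs in `curve` with any plotting tool gives the p-value function, which peaks at the point estimate.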

The present paper offers an accessible introduction to p-value functions. It shows how p-value functions can be created using only published information, such as a point estimate and the corresponding confidence interval or p-value. Using several examples from the recent medical literature, it also shows how p-value functions can aid the interpretation of results. To make p-value functions easy to create, the authors developed the R package “pvaluefunctions” and released it on CRAN.
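
The step from published summary statistics to a p-value function can be sketched in Python as follows. This is a minimal sketch under a normal approximation, not the interface of the “pvaluefunctions” R package. The function names and the example result (a mean difference of 1.2 with a 95% confidence interval from -0.37 to 2.77) are invented for illustration. For ratio measures such as odds ratios, the same steps would usually be applied on the log scale.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def std_error_from_ci(lower, upper, level=0.95):
    """Recover the standard error from a published symmetric confidence interval."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return (upper - lower) / (2 * z)


def confidence_limits(estimate, std_error, level):
    """Two-sided confidence limits at an arbitrary confidence level."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return estimate - z * std_error, estimate + z * std_error


def p_value(hypothesized, estimate, std_error):
    """Two-sided p-value for any hypothesized parameter value."""
    z = abs(estimate - hypothesized) / std_error
    return 2 * (1 - STD_NORMAL.cdf(z))


# Hypothetical published result: point estimate with its 95% confidence interval
estimate, lower, upper = 1.2, -0.37, 2.77

std_error = std_error_from_ci(lower, upper, level=0.95)

# Confidence limits at several levels: horizontal slices of the p-value function
limits_by_level = {
    level: confidence_limits(estimate, std_error, level)
    for level in (0.50, 0.80, 0.95, 0.99)
}

for level, (lo, hi) in limits_by_level.items():
    print(f"{level:.0%} confidence interval: {lo:.2f} to {hi:.2f}")
```

Each confidence limit at level c corresponds to a two-sided p-value of 1 - c, which is why the same graph can be read either as p-values for every parameter value or as confidence limits for every level.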