Normal Curves: Sexy Science, Serious Statistics

P-Values: Are we using a flawed statistical tool?

P-values show up in almost every scientific paper, yet they’re one of the most misunderstood ideas in statistics. In this episode, we break from our usual journal-club format to unpack what a p-value really is, why researchers have fought about it for a century, and how that famous 0.05 cutoff became enshrined in science. Along the way, we share stories from our own papers—from a Nature feature that helped reshape the debate to a statistical sleuthing project that uncovered a faulty method in sports science. The result: a behind-the-scenes look at how one statistical tool has shaped the culture of science itself.


Statistical topics

  • Bayesian statistics
  • Confidence intervals 
  • Effect size vs. statistical significance
  • Fisher’s conception of p-values
  • Frequentist perspective
  • Magnitude-Based Inference (MBI)
  • Multiple testing / multiple comparisons
  • Neyman-Pearson hypothesis testing framework
  • P-hacking
  • Posterior probabilities
  • Preregistration and registered reports
  • Prior probabilities
  • P-values
  • Researcher degrees of freedom
  • Significance thresholds (p < 0.05)
  • Simulation-based inference
  • Statistical power 
  • Statistical significance
  • Transparency in research 
  • Type I error (false positive)
  • Type II error (false negative)
  • Winner’s Curse

Methodological morals

  • “If p-values tell us the probability the null is true, then octopuses are psychic.”
  • “Statistical tools don't fool us, blind faith in them does.”
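
The first moral can be made concrete with a little arithmetic. The sketch below is illustrative only: the record of 8 correct picks out of 8, the one-in-a-million prior that an octopus is truly psychic, and the assumption that a psychic octopus never misses are all hypothetical numbers chosen for the example, not figures taken from the episode. It contrasts the p-value, P(data | null), with the quantity people often mistake it for, P(null | data).

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float = 0.5) -> float:
    """One-sided p-value: P(at least `successes` correct in `trials`) under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def posterior_null(p_data_given_null: float, p_data_given_alt: float, prior_null: float) -> float:
    """Bayes' rule: P(null | data) from the two likelihoods and a prior on the null."""
    numerator = p_data_given_null * prior_null
    return numerator / (numerator + p_data_given_alt * (1 - prior_null))


# Hypothetical record: 8 correct picks out of 8 (assumed for illustration).
p = binomial_p_value(successes=8, trials=8)

# Hypothetical prior: a one-in-a-million chance that the octopus is psychic,
# and an assumed psychic octopus that always picks correctly.
post = posterior_null(
    p_data_given_null=0.5**8,
    p_data_given_alt=1.0,
    prior_null=1 - 1e-6,
)

print(f"p-value, P(data | guessing): {p:.5f}")
print(f"P(guessing | data):          {post:.5f}")
```

Under these assumptions the p-value falls well below 0.05, yet the probability that the octopus was only guessing stays above 0.99, because the prior against psychic octopuses is so strong. A different prior would give a different posterior; the p-value alone does not supply one.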


References

  • Nuzzo R. Scientific method: statistical errors. Nature. 2014 Feb 13;506(7487):150-2. doi: 10.1038/506150a. 
  • Nuzzo R. Scientists perturbed by loss of stat tools to sift research fudge from fact. Scientific American. 2015:16-18.
  • Nuzzo RL. The inverse fallacy and interpreting P values. PM&R. 2015 Mar;7(3):311-4. doi: 10.1016/j.pmrj.2015.02.011. Epub 2015 Feb 25. 
  • Nuzzo R. Probability wars. New Scientist. 2015;225(3012):38-41.
  • Sainani KL. Putting P values in perspective. PM&R. 2009 Sep;1(9):873-7. doi: 10.1016/j.pmrj.2009.07.003.
  • Sainani KL. Clinical versus statistical significance. PM&R. 2012 Jun;4(6):442-5. doi: 10.1016/j.pmrj.2012.04.014.
  • McLaughlin MJ, Sainani KL. Bonferroni, Holm, and Hochberg corrections: fun names, serious changes to p values. PM&R. 2014 Jun;6(6):544-6. doi: 10.1016/j.pmrj.2014.04.006. Epub 2014 Apr 22. 
  • Sainani KL. The Problem with "Magnitude-based Inference". Med Sci Sports Exerc. 2018 Oct;50(10):2166-2176. doi: 10.1249/MSS.0000000000001645. 
  • Sainani KL, Lohse KR, Jones PR, Vickers A. Magnitude-based Inference is not Bayesian and is not a valid method of inference. Scand J Med Sci Sports. 2019 Sep;29(9):1428-1436. doi: 10.1111/sms.13491. 
  • Lohse KR, Sainani KL, Taylor JA, Butson ML, Knight EJ, Vickers AJ. Systematic review of the use of "magnitude-based inference" in sports science and medicine. PLoS One. 2020 Jun 26;15(6):e0235318. doi: 10.1371/journal.pone.0235318. 
  • Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process, and purpose. The American Statistician. 2016;70(2):129-133.


Kristin and Regina’s online courses: 

Demystifying Data: A Modern Approach to Statistical Understanding  

Clinical Trials: Design, Strategy, and Analysis 

Medical Statistics Certificate Program  

Writing in the Sciences 

Epidemiology and Clinical Research Graduate Certificate Program 

Programs that we teach in:

Epidemiology and Clinical Research Graduate Certificate Program 

Find us on:

Kristin - LinkedIn & Twitter/X

Regina - LinkedIn & ReginaNuzzo.com

  • (00:00) - Intro & claim of the episode
  • (01:00) - Why p-values matter in science
  • (02:44) - What is a p-value? (ESP guessing game)
  • (06:47) - Big vs. small p-values (psychic octopus example)
  • (08:29) - Significance thresholds and the 0.05 rule
  • (09:00) - Regina’s Nature paper on p-values
  • (11:32) - Misconceptions about p-values
  • (13:18) - Fisher vs. Neyman-Pearson (history & feud)
  • (16:26) - Botox analogy and type I vs. type II errors
  • (19:41) - Dating app analogies for false positives/negatives
  • (22:02) - How the 0.05 cutoff got enshrined
  • (23:46) - Misinterpretations: statistical vs. practical significance
  • (25:22) - Effect size, sample size, and “statistically discernible”
  • (25:51) - P-hacking and researcher degrees of freedom
  • (28:52) - Transparency, preregistration, and open science
  • (29:58) - The 0.05 cutoff trap (p = 0.049 vs 0.051)
  • (30:24) - The biggest misinterpretation: what p-values actually mean
  • (32:35) - Paul the psychic octopus (worked example)
  • (35:05) - Why Bayesian statistics differ
  • (38:55) - Why aren’t we all Bayesian? (probability wars)
  • (40:11) - The ASA p-value statement (behind the scenes)
  • (42:22) - Key principles from the ASA white paper
  • (43:21) - Wrapping up Regina’s paper
  • (44:39) - Kristin’s paper on sports science (MBI)
  • (47:16) - What MBI is and how it spread
  • (49:49) - How Kristin got pulled in (Christie Aschwanden & FiveThirtyEight)
  • (53:11) - Critiques of MBI and “Bayesian monster” rebuttal
  • (55:20) - Spreadsheet autopsies (Welsh & Knight)
  • (57:11) - Cherry juice example (why MBI misleads)
  • (59:28) - Rebuttals and smoke & mirrors from MBI advocates
  • (01:02:01) - Winner’s Curse and small samples
  • (01:02:44) - Twitter fights & “establishment statistician”
  • (01:05:02) - Cult-like following & Matrix red pill analogy
  • (01:07:12) - Wrap-up