In scientific research, the p-value is a number that researchers use to decide whether their results are "statistically significant." By convention, a p-value below 0.05 (p < .05) is treated as the gold standard for claiming a discovery. Roughly speaking, it means that if there were truly no effect, a result at least this extreme would turn up by chance less than 5% of the time.
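
To make that concrete, here's a minimal sketch (assuming Python with NumPy and SciPy available; the group sizes are arbitrary) that computes a p-value for two groups drawn from the same distribution, where any apparent difference is pure noise:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups drawn from the SAME distribution, so the true effect is zero.
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.0, scale=1.0, size=30)

# A two-sample t-test asks: if there really were no difference,
# how often would a gap at least this big appear by chance alone?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p = {p_value:.3f}")  # usually > .05; occasionally < .05 by pure luck
```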

However, this single number has created a powerful incentive to find ways to "get" a significant result, leading to a practice known as **p-hacking** (or data dredging).

What is p-Hacking?

P-hacking is the practice of re-analyzing a dataset in many different ways until a statistically significant result turns up. It isn't data fabrication (which is outright fraud); rather, it's a flexible, undisclosed analysis process that dramatically inflates the odds of finding a false positive.

Imagine a researcher wants to see if eating jelly beans causes acne. They run a study and find no effect. But then they think...

"Maybe it's not all jelly beans, just a specific color. Let's test green jelly beans vs. acne. No? How about red jelly beans? No? ...How about blue jelly beans?"

If you test 20 different colors, and each test has a 5% chance of a false alarm, the odds that at least *one* of them comes back "significant" at p < .05 are about 64% (1 − 0.95^20). The researcher then publishes a paper titled "Blue Jelly Beans Linked to Acne!" without mentioning the 19 other colors they tested (the "failed" results).
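
As a rough check on that arithmetic, here's a small simulation (a sketch in Python with NumPy and SciPy; the group size of 30 per color is made up for illustration) that re-runs the 20-color study thousands of times with no real effect anywhere:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 2_000  # repeat the whole 20-color study many times
n_colors = 20          # 20 independent tests, all with zero true effect
runs_with_a_hit = 0

for _ in range(n_experiments):
    for _ in range(n_colors):
        # Acne scores for one jelly-bean color vs. a control group;
        # both come from the same distribution, so any "effect" is noise.
        beans = rng.normal(size=30)
        control = rng.normal(size=30)
        _, p = stats.ttest_ind(beans, control)
        if p < 0.05:
            runs_with_a_hit += 1
            break

print(f"Studies with at least one 'discovery': {runs_with_a_hit / n_experiments:.0%}")
# Prints roughly 64%, matching 1 - 0.95**20.
```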

Common Forms of p-Hacking

P-hacking can be subtle. It's often rationalized as "exploratory analysis." Common forms include:

  • Testing many variables: Measuring 10 different personality traits and only reporting the one that correlated.
  • Changing statistical models: Running a t-test, then an ANOVA, then a regression, and only reporting the one that "worked."
  • Dropping "outliers": Removing particular data points (or whole subjects) until the p-value dips below .05.
  • Optional stopping: Collecting data, checking the p-value, and if it's not significant, collecting *more* data until it is (simulated in the sketch below).
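
To see why optional stopping inflates false positives, here's a hedged sketch (again Python with NumPy and SciPy; the batch size of 10 and the cap of 200 subjects per group are arbitrary choices) of a researcher who peeks after every batch and stops as soon as p dips below .05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_studies = 2_000   # how many times we re-run the whole "study"
batch = 10          # subjects added to each group before every peek
max_n = 200         # give up once each group has 200 subjects
false_positives = 0

for _ in range(n_studies):
    treatment, control = [], []
    while len(treatment) < max_n:
        # Both groups come from the same distribution: there is no real effect.
        treatment.extend(rng.normal(size=batch))
        control.extend(rng.normal(size=batch))
        _, p = stats.ttest_ind(treatment, control)
        if p < 0.05:  # peek, and "publish" as soon as it looks significant
            false_positives += 1
            break

print(f"'Significant' findings: {false_positives / n_studies:.0%}")
# Comes out well above the nominal 5%, even though nothing is going on.
```

Because the repeated peeks reuse overlapping data, the inflation is smaller than with 20 independent tests, but it still lands far above the advertised 5%.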

Why It's a Problem

P-hacking is a direct cause of the Replication Crisis. It fills the scientific literature with exciting, "significant" findings that are actually just noise. When other researchers try to replicate the study (e.g., by testing *only* blue jelly beans), they find no effect.

This wastes time and money, and it erodes public trust in science. It's a key reason why practices like pre-registration (committing to an analysis plan *before* seeing the data) are so vital to fixing science.