regression to the mean | Definition

Regression to the mean is the tendency for extreme scores to move closer to the average on repeated measurement. When it is not accounted for, it threatens internal validity.

What Is Regression to the Mean?

In social science research, regression to the mean describes a statistical phenomenon where unusually high or low scores tend to move closer to the average when measured again. This shift is not due to any real change in the underlying variable but is simply a result of random variation or measurement error.

This concept becomes especially important when researchers select participants based on extreme scores. For example, if a study selects only students with very low test scores to receive a new teaching method, those students are likely to show higher scores the next time they are tested—even if the intervention had no effect. Their improvement may simply be a natural movement toward the average.

Regression to the mean is not a flaw in the data itself, but a threat to internal validity—the confidence that a study’s outcomes are caused by the treatment or condition being tested rather than other factors. If not accounted for, this phenomenon can lead researchers to wrongly believe that an intervention caused a change when it actually did not.

Why Regression to the Mean Happens

To understand why regression to the mean occurs, it’s important to think about how data behaves in the presence of variation.

In any measurement, there is a combination of:

  • True score: The actual level of the trait or variable.
  • Random error: Small influences that affect a measurement but are not consistent.

When researchers select participants who scored extremely high or low, some of that extremity is often due to random error. On a second measurement, that error is unlikely to occur in the same way, so the new score tends to fall closer to the group average.

This is particularly true when:

  • The correlation between two measures is less than perfect.
  • Selection is based on extreme scores.
  • Measurement tools are less reliable.

In other words, the more error there is in measurement, the more strongly regression to the mean will affect results.
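The true score plus random error idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the normal distributions, the mean of 100, and the equal standard deviations for true scores and error are all hypothetical choices, not values from any real study. Nobody's true score changes between the two tests, yet the lowest scorers on the first test still score higher on the second.

```python
import random
import statistics


def simulate_retest(n=20000, mean=100.0, true_sd=10.0, error_sd=10.0, seed=0):
    """Simulate two tests where observed score = true score + random error.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(mean, true_sd) for _ in range(n)]
    # Each test adds fresh, independent error to the same true score.
    test1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    return test1, test2


def bottom_group_means(test1, test2, fraction=0.10):
    """Mean test-1 and test-2 scores for the lowest scorers on test 1."""
    k = int(len(test1) * fraction)
    lowest = sorted(range(len(test1)), key=lambda i: test1[i])[:k]
    return (
        statistics.mean(test1[i] for i in lowest),
        statistics.mean(test2[i] for i in lowest),
    )


test1, test2 = simulate_retest()
first, second = bottom_group_means(test1, test2)
print(f"lowest 10% on test 1: mean {first:.1f}")
print(f"same people on test 2: mean {second:.1f}")
```

Because error accounts for half of the score variance in this setup, the selected group should close roughly half of its gap to the overall mean on the second test, with no intervention at all.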

Regression to the Mean as a Threat to Internal Validity

In experimental and quasi-experimental studies, researchers try to determine whether a treatment or intervention caused a change in behavior, attitude, performance, or other outcomes. Internal validity refers to how confidently we can make that claim.

Regression to the mean can threaten internal validity when:

  • Participants are chosen based on extreme scores (e.g., lowest performers, highest risk).
  • A treatment group shows improvement that may be due to natural score movement rather than the treatment itself.
  • The design does not include a proper comparison or control group.

In these cases, researchers might conclude that the treatment “worked,” but in reality, the change might have happened even without any intervention.

Real-World Examples

Education

Imagine a school introduces a new tutoring program for students who scored the lowest on a math test. After tutoring, many of those students perform better. It may be tempting to conclude that the tutoring helped. But it’s also possible that the students’ original low scores were unusually bad due to chance (being tired, nervous, or guessing). On the second test, their scores might improve naturally—even without tutoring.

Without a control group of similarly low-performing students who did not receive tutoring, it is hard to tell whether the tutoring really caused the improvement.

Psychology

A psychologist might treat clients with very high anxiety levels using a new therapy. If anxiety levels drop afterward, it might seem like the therapy was successful. But because the clients started with extreme scores, part of that drop could be a result of regression to the mean rather than the therapy itself.

Criminology

In a study of juvenile offenders, researchers might offer an intervention to those with the highest rates of offenses. If their behavior improves over time, it may seem like the program was effective. However, since the selection was based on extreme behavior, some improvement may occur simply due to regression to the mean.

Political Science

A political campaign might target voters with extremely low approval of a candidate, hoping to shift their views through messaging. If the next survey shows slightly higher approval, the campaign might claim success. But again, part of that shift could be due to a natural return toward the average opinion, especially if the initial rating was unusually low due to strong emotions or specific events.

When Is Regression to the Mean Most Likely?

Regression to the mean is most likely to appear under certain research conditions. Researchers should be especially careful when the following are true:

  • Selection is based on extreme scores.
  • There is only one group being tested, so regression cannot be separated from a treatment effect.
  • Measurements have low reliability.
  • Follow-up scores correlate only weakly with the first measurement, whatever the interval between the two.

How to Reduce the Impact

Social science researchers can take several steps to minimize the risk that regression to the mean will threaten the validity of their findings.

Use a Control Group

The most effective way to deal with regression to the mean is to include a control group that does not receive the intervention. Both groups should be drawn from the same pool using the same selection rule, ideally with random assignment to treatment or control, and measured at the same times.

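If both groups show similar changes over time, the change is likely due to regression to the mean or to other influences the groups share, not to the treatment. If the treatment group changes more than the control group, that difference may reflect a real treatment effect.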
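The logic of the control-group comparison can be sketched in a short simulation. Everything here is an assumption made for illustration: the function name `run_study`, the score distributions, and the `effect` parameter (the true treatment boost in points) are hypothetical. The lowest scorers on a pretest are randomly split into treated and control halves, and the gain of each half is compared.

```python
import random
import statistics


def run_study(effect, n=20000, seed=1):
    """Return (treated_gain, control_gain) for the lowest 10% on a pretest.

    `effect` is a hypothetical true treatment boost, in score points.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    pretest = [t + rng.gauss(0, 10) for t in true_scores]

    # Select on extreme pretest scores, then assign at random.
    k = n // 10
    selected = sorted(range(n), key=lambda i: pretest[i])[:k]
    rng.shuffle(selected)
    treated, control = selected[: k // 2], selected[k // 2:]

    def mean_gain(group, boost):
        # Posttest = true score + boost + fresh error; gain = posttest - pretest.
        return statistics.mean(
            true_scores[i] + boost + rng.gauss(0, 10) - pretest[i] for i in group
        )

    return mean_gain(treated, effect), mean_gain(control, 0.0)


for effect in (0.0, 5.0):
    treated_gain, control_gain = run_study(effect)
    print(
        f"true effect {effect}: treated gain {treated_gain:.1f}, "
        f"control gain {control_gain:.1f}, "
        f"difference {treated_gain - control_gain:.1f}"
    )
```

In this sketch both groups gain even when the true effect is zero, which is the regression component. The difference between the two gains, not the treated group's raw gain, is what estimates the treatment effect.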

Avoid Selecting Only Extreme Scores

When possible, researchers should avoid selecting participants based only on very high or very low scores. If the full range of scores is used, the risk of regression to the mean affecting the results goes down.

Measure Multiple Times Before Intervention

Taking multiple baseline measurements before starting an intervention can help identify whether an extreme score is a one-time occurrence or part of a consistent pattern. If the extreme score is stable across baselines, regression to the mean is less likely to explain a later change.
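One way to see why multiple baselines help is to select participants on the average of several baseline scores instead of a single one. The sketch below is illustrative only: the function name `regression_gain` and all distribution parameters are assumptions. It reports how far the lowest 10% move on an untreated follow-up, for one baseline versus four.

```python
import random
import statistics


def regression_gain(n_baselines, n=20000, seed=2):
    """Mean follow-up gain (no treatment) for the lowest 10% at baseline.

    Participants are selected on the average of `n_baselines` measurements.
    All numeric settings are illustrative assumptions.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        true = rng.gauss(100, 10)
        # Averaging several noisy baselines cancels part of the random error.
        baseline = sum(true + rng.gauss(0, 10) for _ in range(n_baselines)) / n_baselines
        followup = true + rng.gauss(0, 10)
        people.append((baseline, followup))

    people.sort(key=lambda p: p[0])
    lowest = people[: n // 10]
    return statistics.mean(followup - baseline for baseline, followup in lowest)


for k in (1, 4):
    print(f"{k} baseline(s): untreated gain of lowest 10% = {regression_gain(k):.1f}")
```

Under these assumptions the spurious gain should be noticeably smaller with four baselines than with one, because the averaged baseline contains less random error to regress away.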

Use Reliable Measurement Tools

High-quality, reliable tests and surveys reduce measurement error, which shrinks the amount of regression to the mean. When tools are consistent and accurate, scores are more likely to reflect true values rather than random variation.
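The link between reliability and regression can be made concrete with a standard approximation: if the two measurements have the same mean and spread and are linearly related, the expected retest score is the group mean plus the test-retest reliability times the distance from the mean. The helper below is a hypothetical sketch of that formula, and the example numbers are made up for illustration.

```python
def expected_retest(score, group_mean, reliability):
    """Expected retest score under a simple linear-regression assumption.

    `reliability` is the test-retest correlation, between 0 and 1.
    Assumes both measurements share the same mean and standard deviation.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (score - group_mean)


# A score of 60 in a group averaging 100, on two hypothetical tests.
print(f"{expected_retest(60, 100, 0.5):.1f}")  # 80.0 with reliability 0.5
print(f"{expected_retest(60, 100, 0.9):.1f}")  # 64.0 with reliability 0.9
```

With a reliability of 1.0 the expected retest score equals the original score, which is why more reliable tools leave less room for regression to be mistaken for change.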

Report and Interpret Findings Carefully

Researchers should be aware of regression to the mean when interpreting their results, especially when reporting changes among groups with extreme baseline scores. Being transparent about design limitations and considering alternative explanations strengthens the credibility of a study.

Regression to the Mean vs. True Change

One of the biggest challenges in social science research is figuring out whether a change in scores or outcomes is real or simply statistical noise. Here’s how researchers can start to tell the difference:

  • True change is linked to the treatment, intervention, or condition being tested. It often occurs alongside evidence such as theory, mechanisms, or repeated results.
  • Regression to the mean is a natural statistical tendency. It is the likelier explanation when participants were selected for extreme scores and no comparison group is available.

Distinguishing between the two requires careful research design, appropriate comparisons, and honest reporting.

Summary of Key Points

  • Regression to the mean is the tendency for extreme scores to move closer to the average upon repeated measurement.
  • It can mislead researchers into thinking an intervention caused improvement when the change may be due to natural score movement.
  • This is especially dangerous when participants are chosen for a study based on unusually high or low scores.
  • Including control groups, using reliable measurements, and avoiding selection based solely on extremes can reduce this risk.
  • Regression to the mean is a threat to internal validity and should be addressed in both research design and interpretation.

Conclusion

Regression to the mean is a well-known but often overlooked issue in social science research. It happens when extreme scores naturally move closer to the average over time, especially when influenced by measurement error or random factors. Although it is a normal part of statistical behavior, it can lead researchers to draw false conclusions if not properly accounted for.

By understanding how regression to the mean works and designing studies to guard against it, researchers can improve the internal validity of their work. Using control groups, avoiding selection based on extremes, and applying careful analysis all help ensure that observed changes are real and meaningful, not just statistical illusions.

Last Modified: 03/23/2025
