What is spurious correlation?

Spurious Correlation

In statistics, a spurious correlation, also known as a spurious relationship, is a statistical association between two or more events or variables that are not causally related to each other. It can arise by coincidence or through a third, unseen factor, variously called a "common response variable," "confounding factor," or "lurking variable."

Characteristics of Spurious Correlation

  • The events or variables move together, so a measurable association exists, but neither one causes the other.
  • The association arises either by coincidence or because a third, unseen factor influences both variables.
  • Spurious correlations can show up as non-zero correlation coefficients and as matching patterns in a graph, giving the appearance of a genuine causal relationship.
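
To illustrate how coincidence alone can produce a large correlation coefficient, here is a minimal Python sketch. It rests on a made-up setup rather than any real dataset: one short random "target" series is compared against many unrelated random candidates, and the strongest match is kept. The names `pearson`, `target`, and `best_r`, as well as the sizes chosen, are assumptions for illustration only.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)   # fixed seed so the run is repeatable
n_points = 10            # assumed: a short series, e.g. ten yearly values
n_candidates = 1000      # assumed: number of unrelated series screened

target = [rng.gauss(0, 1) for _ in range(n_points)]

# Every candidate is generated independently of the target, so any
# association found below is coincidental by construction.
best_r = 0.0
for _ in range(n_candidates):
    candidate = [rng.gauss(0, 1) for _ in range(n_points)]
    r = pearson(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation among unrelated series: {best_r:+.2f}")
```

When each series is short and enough unrelated series are screened, a strong match is very likely to turn up, even though nothing connects the data.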

Examples of Spurious Correlation

  • In the time-series literature, a spurious regression is one that provides misleading statistical evidence of a linear relationship between independent non-stationary variables. The non-stationarity may be due to the presence of a unit root in both variables, as with two independent random walks.
  • Another example is the correlation between U.S. crude oil imports from Norway and drivers killed in a collision with a railway train, which has a correlation coefficient of +0.95, a very strong positive association. The two series have no plausible causal link, so the correlation is spurious and does not imply causation.
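
The spurious-regression case can be sketched with a small simulation. The code below is an illustration under stated assumptions, not a formal unit-root test: it generates pairs of independent Gaussian random walks (each has a unit root) and counts how often the correlation exceeds an arbitrary threshold of 0.5 in absolute value, first on the levels and then on the first differences. The names `random_walk`, `diff`, `share_levels`, and `share_diffs` are hypothetical, and first differencing is brought in here as a common remedy rather than something the text above prescribes.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def random_walk(rng, length):
    """Cumulative sum of independent standard-normal steps (a unit-root process)."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def diff(series):
    """First differences: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)                      # fixed seed for repeatability
trials, length, threshold = 300, 200, 0.5    # assumed, arbitrary settings

big_in_levels = 0
big_in_diffs = 0
for _ in range(trials):
    a = random_walk(rng, length)   # the two walks share no common input
    b = random_walk(rng, length)
    if abs(pearson(a, b)) > threshold:
        big_in_levels += 1
    if abs(pearson(diff(a), diff(b))) > threshold:
        big_in_diffs += 1

share_levels = big_in_levels / trials
share_diffs = big_in_diffs / trials
print(f"|r| > {threshold} on levels:      {share_levels:.0%} of pairs")
print(f"|r| > {threshold} on differences: {share_diffs:.0%} of pairs")
```

A sizeable share of the level series looks strongly correlated even though the walks are independent, while the differenced series almost never does. That contrast is the signature of a spurious regression.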

Detecting Spurious Relationships

  • Spurious relationships can be identified by applying common sense and by examining the data more closely to determine whether the aligned movements are coincidental or caused by a third factor that affects both variables.
  • Research conducted with small sample sizes or arbitrary endpoints is particularly susceptible to spurious results.
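
One form the closer statistical examination can take, when a candidate third factor has actually been measured, is a partial correlation: the association between two variables after the linear influence of the third is removed. The sketch below is a hedged illustration on simulated data, where `third` plays the lurking variable and drives both `x` and `y`. The function name `partial_correlation` and the data-generating setup are assumptions, and the approach only helps if the confounder is known and observed.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_correlation(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(1)   # fixed seed for repeatability
n = 5000                 # assumed sample size

# Simulated data: x and y never influence each other; both depend on `third`.
third = [rng.gauss(0, 1) for _ in range(n)]
x = [t + rng.gauss(0, 1) for t in third]
y = [t + rng.gauss(0, 1) for t in third]

raw_r = pearson(x, y)
adjusted_r = partial_correlation(x, y, third)
print(f"raw correlation of x and y:      {raw_r:+.2f}")
print(f"partial correlation given third: {adjusted_r:+.2f}")
```

A clear raw correlation that shrinks toward zero once the third factor is controlled for suggests the original association was driven by that factor rather than by a direct link between the two variables.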

In conclusion, spurious correlation is an important concept in statistics: it highlights the need for careful analysis and interpretation of relationships between variables to avoid drawing incorrect causal inferences.