From Tyler Vigen's site, Spurious Correlations
"Correlation does not imply causation" is a phrase that seemingly refutes most casual, non-causal statistical observations. From 2000 to 2009, the divorce rate in Maine had a 99.26% correlation with per capita consumption of margarine. Surely eating margarine doesn't cause divorces.
But is there more specific reasoning than "correlation does not imply causation"? Here are a few reasons why we might observe two correlated variables even though the relationship we have in mind is not causal.
- Reverse causation. We observe worse weather when Uber prices increase, yet Uber prices do not cause bad weather. The causation runs the other way: bad weather raises demand for rides, which raises prices.
- A third, confounding variable. Sunburns are correlated with ice cream eating, but neither causes the other; hot, sunny weather drives both.
- Selection bias. We sample data in a way that overrepresents a particular trait or group.
- The relationship is purely coincidental.
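To make the confounding case concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the variable names, the coefficients, and the noise levels are assumptions, not real data. Temperature drives both ice cream eating and sunburns, so the two are strongly correlated in the raw data. Once we remove the part of each variable that temperature explains (by taking residuals from a simple line fit), the correlation should mostly disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: temperature is the confounder behind both variables.
temperature = [rng.gauss(25, 5) for _ in range(5000)]
ice_cream = [0.5 * t + rng.gauss(0, 1) for t in temperature]
sunburns = [0.3 * t + rng.gauss(0, 1) for t in temperature]

# Raw correlation: strong, even though neither variable affects the other.
raw = pearson(ice_cream, sunburns)

# Correlation after controlling for temperature: close to zero.
adjusted = pearson(
    residuals(ice_cream, temperature), residuals(sunburns, temperature)
)

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

This only works here because we know the confounder and measured it. With real data, the hard part is usually figuring out which third variable to control for in the first place.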
How do you establish causality, then? There's no hard and fast rule; causal inference is hard. Hill's criteria for causation provide a decent starting point. Here are a few of them.
- Strength – How large is the effect? A small effect can still be causal, but the larger the effect, the more likely it is causal.
- Temporality – The effect should occur after the cause.
- Biological gradient – Often, higher exposure leads to more of the observed effect. The obvious example is medicine, where a larger dose typically produces a stronger response.
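As a rough sketch of how you might check for a biological gradient, the snippet below groups outcomes by exposure level and asks whether the average outcome rises with the dose. The data is synthetic, and the helper names (`mean_outcome_by_dose`, `is_monotonic_increasing`) are invented for this example. A rising pattern is consistent with causation, but it does not prove it on its own.

```python
import random


def mean_outcome_by_dose(doses, outcomes):
    """Average outcome for each distinct dose, ordered by dose."""
    groups = {}
    for dose, outcome in zip(doses, outcomes):
        groups.setdefault(dose, []).append(outcome)
    return {dose: sum(ys) / len(ys) for dose, ys in sorted(groups.items())}


def is_monotonic_increasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic data: four dose levels, outcome grows with dose plus noise.
doses = [rng.choice([0, 1, 2, 3]) for _ in range(4000)]
outcomes = [2.0 * d + rng.gauss(0, 1) for d in doses]

means = mean_outcome_by_dose(doses, outcomes)
gradient = is_monotonic_increasing(list(means.values()))

for dose, mean in means.items():
    print(f"dose {dose}: mean outcome {mean:.2f}")
print("dose-response gradient present:", gradient)
```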