Data Is Not Evidence

The terms data and evidence are often used interchangeably, but they are not the same thing. Data is just a collection of facts. Evidence is a collection of facts that points to one particular conclusion. Where do we most often hear the word evidence? In a criminal trial. There, evidence is evidence only if it points to one particular suspect and rules out the others. If the facts suggest that Tom or Dick or Harry could have killed Sarah, they are not evidence against any one of them, because they are consistent with multiple explanations. And so even the world’s largest, most rigorous dataset might not be evidence if there are multiple different conclusions that we can draw from it.

Correlation does not imply causation.

There are studies that suggest that babies who were breastfed have higher IQ later in life. Now it’s true that one interpretation is that breastfeeding causes the higher IQ. But it’s very important to understand that correlation does not imply causation, because there could be alternative explanations for the correlations that we see in the data. One alternative explanation lies in which babies end up getting breastfed. Breastfeeding is tough, and it’s hard to do without family support. So for the babies who are breastfed, it may well be that the mother has a supportive partner at home, and maybe help from outside the house. So rather than breastfeeding causing higher IQ, it could be that there’s a common cause, like family background, that both leads a mother to breastfeed and independently leads to the higher IQ.
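To make the common-cause idea concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration, not data from any study: the hypothetical `support` score, the effect sizes, and the function names `simulate_families` and `iq_gap`. In this toy world breastfeeding has no effect on IQ at all. Family support drives both.

```python
import random

def simulate_families(n=50000, seed=0):
    """Return (support, breastfed, iq) rows where breastfeeding has NO effect on IQ."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        support = rng.gauss(0, 1)                  # hypothetical family-support score
        breastfed = support + rng.gauss(0, 1) > 0  # more support -> more likely to breastfeed
        iq = 100 + 5 * support + rng.gauss(0, 10)  # IQ depends on support only (assumed numbers)
        rows.append((support, breastfed, iq))
    return rows

def iq_gap(rows):
    """Mean IQ of breastfed babies minus mean IQ of the rest."""
    fed = [iq for _, breastfed, iq in rows if breastfed]
    not_fed = [iq for _, breastfed, iq in rows if not breastfed]
    return sum(fed) / len(fed) - sum(not_fed) / len(not_fed)

rows = simulate_families()
# Keep only families whose support score is almost identical.
similar_homes = [r for r in rows if abs(r[0]) < 0.1]

print("IQ gap, all families:      ", round(iq_gap(rows), 2))
print("IQ gap, similar homes only:", round(iq_gap(similar_homes), 2))
```

Under these assumptions, the gap across all families should come out clearly positive, even though breast milk does nothing in the model. Once we compare only families with a similar home environment, the gap should shrink towards zero. The correlation was real, but the common cause was doing the work.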

A second reason why correlation does not imply causation is that there could be reverse causality. It could be that the tail wags the dog. There is a strong correlation between stopping smoking and death: if a smoker stops smoking, they are more likely to die rather than less. That might seem crazy. Is it that stopping smoking is causing the death? No. The causation could run in the opposite direction. Who is it that chooses to stop smoking? You choose to stop smoking when your doctor tells you that you really need to give up the habit because you are at imminent risk of dying. So it’s that the likelihood of death causes you to stop smoking, rather than stopping smoking increasing the likelihood of death.
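The same kind of toy simulation can illustrate the smoking example. Again, this is only a sketch under made-up assumptions: the probabilities, the hypothetical `warned` flag (the doctor has told the smoker their risk of death is high), and the function names are all invented for illustration. In this model quitting actually lowers the risk of death, but a high risk of death is what pushes people to quit.

```python
import random

def simulate_smokers(n=50000, seed=1):
    """Return (warned, quits, dies) rows; quitting REDUCES risk in this toy model."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        warned = rng.random() < 0.2                       # doctor warns that death risk is high
        quits = rng.random() < (0.7 if warned else 0.1)   # the warning drives the decision to quit
        base_risk = 0.40 if warned else 0.05              # assumed risk of death
        risk = base_risk * (0.8 if quits else 1.0)        # quitting cuts risk by 20% (assumed)
        dies = rng.random() < risk
        people.append((warned, quits, dies))
    return people

def death_rate(people, quits):
    """Share who die among those who quit (quits=True) or keep smoking (quits=False)."""
    group = [dies for _, q, dies in people if q == quits]
    return sum(group) / len(group)

people = simulate_smokers()
warned_only = [p for p in people if p[0]]  # compare like with like: warned smokers only

print("Death rate, quitters vs. continuers (everyone):",
      round(death_rate(people, True), 3), round(death_rate(people, False), 3))
print("Death rate, quitters vs. continuers (warned only):",
      round(death_rate(warned_only, True), 3), round(death_rate(warned_only, False), 3))
```

With these assumed numbers, quitters as a whole should die at a higher rate than people who keep smoking, purely because the people who quit are mostly the ones who were already in danger. Among the warned smokers alone, the quitters should do better, which is the effect the model actually built in.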

Alternative explanations may exist.

Now everybody knows that correlation is not causation in the cold light of day. But because of our biases, we might forget this if we like the cause or story being paraded. We often like to think that breast milk must be good. Something natural must be better than something man-made, a formula concocted by some giant corporation. So if we see this data, we will latch on to the interpretation that it’s breast milk which is causing the higher IQ, when it could be something else, such as parental background, which is actually doing the work.

So how can we ask ourselves whether a correlation reflects causation, or whether there are alternative explanations like common causes or reverse causality? If the result is something we want to be true, we can imagine that we saw the opposite result, one that we would want to knock down, and think about how we might knock it down. For example, if a study claimed that breastfeeding leads to lower IQ, that is something which just jars with us. It just sounds wrong. And how would we try to knock it down? We would appeal to common causes. We might say, well, who are the mothers who breastfeed? They might be poor because they’re not able to afford formula, and it could be the poverty which is leading to the lower IQ, not the breast milk itself.

So now that we’ve alerted ourselves to the possibility of common causes, in particular the role of family background, we can ask ourselves whether those common causes could also explain the correlation that we actually see, even though it’s in the direction that we want to be true. The actual correlation is that breastfeeding is linked to higher IQ. But is it family affluence or family background that is leading to that result, rather than breast milk being what’s causing the higher IQ?