Simpson's paradox is a paradox from statistics. It is named after Edward H. Simpson, a British statistician who first described it in 1951.[1] The statistician Karl Pearson described a very similar effect in 1899.[2] Udny Yule's description dates from 1903.[3] Sometimes, it is called the Yule–Simpson effect. The paradox is that a trend seen in each of several groups can disappear or reverse when the groups are combined into one larger group. This often occurs in social sciences and medical statistics.[4] It may confuse people if frequency data is used to explain a causal relationship.[5] Other names for the paradox include reversal paradox and amalgamation paradox.[6]
Example: Kidney stone treatment
This is a real-life example from a medical study[7] comparing the success rates of two treatments for kidney stones.[8]
The table shows the numbers of successful and failed treatments, and the matching rates, for small and large kidney stones. Treatment A includes all open procedures and Treatment B is percutaneous nephrolithotomy:
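| Stone size | Group | Treatment | Successes (patients) | Failures (patients) | Success rate | Failure rate |
|---|---|---|---|---|---|---|
| Small stones | Group 1 | A | 81 | 6 | 93% | 7% |
| Small stones | Group 2 | B | 234 | 36 | 87% | 13% |
| Large stones | Group 3 | A | 192 | 71 | 73% | 27% |
| Large stones | Group 4 | B | 55 | 25 | 69% | 31% |
| Both | Group 1+3 | A | 273 | 77 | 78% | 22% |
| Both | Group 2+4 | B | 289 | 61 | 83% | 17% |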
The paradoxical conclusion is that Treatment A is more effective when used on small stones, and also when used on large stones, yet Treatment B appears more effective when both sizes are considered together. In this example, it was not known that the size of the kidney stone influenced the result. Such a variable is called a hidden variable (or lurking variable) in statistics.
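The reversal can be checked directly from the counts in the table. The following is a minimal sketch in Python; the names `counts`, `success_rate` and `pooled_rate` are made up for this illustration and do not come from the study.

```python
from fractions import Fraction

# (successes, failures) for each treatment and stone size, from the table
counts = {
    ("A", "small"): (81, 6),
    ("A", "large"): (192, 71),
    ("B", "small"): (234, 36),
    ("B", "large"): (55, 25),
}

def success_rate(successes, failures):
    """Exact success rate: successes divided by all patients."""
    return Fraction(successes, successes + failures)

def pooled_rate(treatment):
    """Success rate for one treatment when stone size is ignored."""
    successes = sum(c[0] for (t, _), c in counts.items() if t == treatment)
    failures = sum(c[1] for (t, _), c in counts.items() if t == treatment)
    return success_rate(successes, failures)

# Compare the treatments within each stone size.
for size in ("small", "large"):
    a = success_rate(*counts[("A", size)])
    b = success_rate(*counts[("B", size)])
    print(f"{size} stones: A = {float(a):.1%}, B = {float(b):.1%}, A better: {a > b}")

# Compare the treatments again after combining both stone sizes.
a_all, b_all = pooled_rate("A"), pooled_rate("B")
print(f"both sizes: A = {float(a_all):.1%}, B = {float(b_all):.1%}, A better: {a_all > b_all}")
```

Treatment A comes out ahead in both separate comparisons, but behind in the combined one. Exact fractions are used here so that rounding cannot affect the comparison.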
Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:
First, the groups that are combined when the lurking variable is ignored have very different sizes. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by Groups 3 and 2, and not by the two much smaller Groups 1 and 4.
Second, the lurking variable has a large effect on the ratios: the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the patients with large stones who got Treatment A (Group 3) do worse than the patients with small stones, even though the latter got the inferior Treatment B (Group 2).
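Both effects can be seen by writing each combined success rate as a weighted average of the two per-size rates, where the weights are the shares of small and large stones among the patients given that treatment. The sketch below assumes only the counts from the table; the function name `pooled_as_weighted_average` and the 50/50 case mix at the end are hypothetical and are used only for illustration.

```python
from fractions import Fraction

# (successes, total patients) for each group, from the kidney stone table
groups = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def pooled_as_weighted_average(treatment, weights=None):
    """Combined success rate as a weighted average of the per-size rates.

    By default the weights are the observed shares of each stone size
    among the patients who received this treatment.
    """
    sizes = groups[treatment]
    total = sum(n for _, n in sizes.values())
    if weights is None:
        weights = {size: Fraction(n, total) for size, (_, n) in sizes.items()}
    return sum(weights[size] * Fraction(s, n) for size, (s, n) in sizes.items())

# Effect 1: the two treatments were given to very different mixes of cases.
for t in ("A", "B"):
    total = sum(n for _, n in groups[t].values())
    share_large = Fraction(groups[t]["large"][1], total)
    print(f"Treatment {t}: {float(share_large):.0%} large stones, "
          f"combined rate {float(pooled_as_weighted_average(t)):.1%}")

# Effect 2: stone size matters more than the treatment, so Group 3
# (large stones, Treatment A) does worse than Group 2 (small stones, Treatment B).
print(Fraction(192, 263) < Fraction(234, 270))

# Hypothetical check: give both treatments the same 50/50 mix of stone sizes.
equal_mix = {"small": Fraction(1, 2), "large": Fraction(1, 2)}
a_equal = pooled_as_weighted_average("A", equal_mix)
b_equal = pooled_as_weighted_average("B", equal_mix)
print(f"Equal case mix: A = {float(a_equal):.1%}, B = {float(b_equal):.1%}")
```

With the observed weights, the function gives back the combined rates from the table. With the same case mix for both treatments, Treatment A is ahead again, which suggests that the reversal comes from the unequal group sizes and not from the treatments themselves.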
References
[1] Simpson, Edward H. (1951). "The Interpretation of Interaction in Contingency Tables". Journal of the Royal Statistical Society, Ser. B. 13: 238–241.
[2] Pearson, Karl; Lee, A.; Bramley-Moore, L. (1899). "Genetic (reproductive) selection: Inheritance of fertility in man". Philosophical Transactions of the Royal Society, Ser. A. 173: 534–539.