Tuesday, May 12, 2009

Not all causal relationships are created equal




via Hacker News on 5/11/09


You might have already seen this chart relating obesity to time spent eating in The New York Times:


The commentary accompanying the chart goes like so:

On Monday, in posting some of the data from the Organization for Economic Cooperation and Development's Society at a Glance report, I noted that the French spent the most time per day eating, but had one of the lowest obesity rates among developed nations.

Coincidence? Maybe, maybe not.

Jim Manzi dug deeper into the data and found something very interesting:

I recreated the original analysis (minus the inclusion of the OECD average as a data point in the regression, for what I assume are obvious reasons). I get pretty much the same picture, and using a log regression form, get what looks to be the same trend line. The R-Squared on the regression (not noted in the original post, as far as I could see) is 26%. Without the U.S. and Mexico, it goes to about 6%, and becomes statistically insignificant.

But what was really interesting is that there are five other time categorizations provided at the source website. Here's the same data plot, but using "Time Spent Doing Unpaid Work" instead of "Time Spent Eating and Drinking":


Huh. This relationship, produced from the same data source, is about twice as strong (R-Squared = 52%) as the one that was reported. It took me literally five minutes of work to discover it. Why do you think that one was reported but not the other? This appears to be a textbook example of the human tendency to accept correlations as "not definitive, but part of the overall picture of evidence for causality" when such data serves to confirm pre-existing beliefs, and to ignore it otherwise.
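The check Manzi describes can be sketched in a few lines. This is only an illustration under stated assumptions: the numbers below are made up (they are not the OECD figures), the helper name `r_squared_log_fit` is hypothetical, and the "log regression form" is assumed to mean regressing Y on ln(X). The sketch fits the line by least squares and computes R-squared with and without two influential points.

```python
import numpy as np

def r_squared_log_fit(x, y):
    """Fit y = a + b*ln(x) by least squares and return R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    log_x = np.log(x)
    slope, intercept = np.polyfit(log_x, y, 1)  # highest power first
    residuals = y - (intercept + slope * log_x)
    ss_res = np.sum(residuals ** 2)             # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation
    return 1.0 - ss_res / ss_tot

# Made-up placeholder data, NOT the OECD numbers.
# Core points: every x value appears once with a high y and once with a
# low y, so x carries no information about y by construction.
core_minutes = np.repeat([80.0, 90.0, 100.0, 110.0, 120.0, 130.0], 2)
core_obesity = np.tile([18.0, 12.0], 6)

# Two hypothetical influential points: little eating time, high obesity.
all_minutes = np.append(core_minutes, [60.0, 65.0])
all_obesity = np.append(core_obesity, [30.0, 28.0])

r2_all = r_squared_log_fit(all_minutes, all_obesity)
r2_trimmed = r_squared_log_fit(core_minutes, core_obesity)

print(f"R-squared with the two points:    {r2_all:.2f}")
print(f"R-squared without the two points: {r2_trimmed:.2f}")
```

In this constructed example the entire fit is carried by the two added points, which is a more extreme version of the drop Manzi reports when the U.S. and Mexico are removed.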

R-squared here refers to the proportion of the variation in Y that is explained by the variation in X. The problem with dredging through data is that you can selectively pick out the relationships that interest you and dismiss the ones you would rather not highlight as less interesting, or as oversimplifying the "underlying complexities." More generally, it is always an interesting verbal experience to deal with someone who is the king of nuance and subtle shading when making a negative case against a hypothesis, but who becomes a forceful advocate of black-and-white inferences when making a positive argument.
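To see why dredging matters, here is a rough simulation sketch. It assumes six candidate time categories (the one reported plus the five others Manzi mentions) and, purely for illustration, 18 countries; the outcome and every predictor are random noise, so any relationship found is spurious. The function names `r_squared` and `dredge` are hypothetical. It compares the R-squared of a single predictor chosen in advance with the best R-squared among all six.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared Pearson r)."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

def dredge(n_countries=18, n_predictors=6, n_trials=2000, seed=0):
    """Simulate pure-noise data and record, per trial, the R-squared of
    the first predictor and the best R-squared over all predictors."""
    rng = np.random.default_rng(seed)
    first = np.empty(n_trials)
    best = np.empty(n_trials)
    for t in range(n_trials):
        y = rng.normal(size=n_countries)                   # noise outcome
        X = rng.normal(size=(n_predictors, n_countries))   # noise predictors
        scores = np.array([r_squared(x, y) for x in X])
        first[t] = scores[0]     # the predictor you committed to up front
        best[t] = scores.max()   # the one you would report after looking
    return first, best

first, best = dredge()
threshold = 0.26  # the R-squared Manzi reports for the original chart

print(f"Pre-chosen predictor above {threshold}: {(first > threshold).mean():.1%} of trials")
print(f"Best of six above {threshold}:          {(best > threshold).mean():.1%} of trials")
```

Because the best of six is never smaller than any single one, picking the winner after the fact can only inflate the apparent strength of the relationship; the simulation gives a feel for by how much when nothing real is going on.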




