Visualizations to Show Causality

Recently I have been reading about the best way to design a graph, visualization, or infographic to express a particular message. Some cases are easier than others. To show how a total is divided into parts, one chart that works well is the pie chart. Take a look at this example:

Source: http://www.apple.com/pr/library/2010/10/18results.html

It works well because the pie represents a total, and all the parts add up to that total. The area of each slice is proportional to the fraction of the total that it represents. This graph shows how the total revenue is distributed among its parts, but it doesn't explain, for instance, how this revenue changes over time. To show changes over time, a chart that works well is a line chart. This chart shows the years in increasing order from left to right. In Western cultures we normally associate the passage of time with movement from left to right because it's the same direction in which we read.

Source: http://www.google.com/finance?q=NASDAQ:AAPL&fstype=ii

Notice that in both cases the title indicates what data is shown in the graph and the time period to which it applies. We also have units for all the numbers in the chart (Million / M). In the second chart the axes are correctly labeled. Finally, the source of the data is indicated for both charts.

A more difficult problem is how to show causality. That is, how to show in a graph that a certain condition results in a corresponding effect, or that a change in one variable results in a change in a different variable.

Usually causality is expressed with a scatter plot or a line chart. However, in those cases it's hard to distinguish real causality from mere correlation. If we plot, for example, the speed of a runner against the total distance traveled after 1 hour, it's obvious that each number completely determines the other, and the correlation is 1. However, causality is generally unidirectional, whereas correlation is bidirectional. What I mean by this is that when we talk about causality we normally say that one condition implies another. When we talk about correlation, we usually mean that two measures are tied together, but it's not clear whether there is a causal relationship from one to the other, or vice versa.
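To make the symmetry of correlation concrete, here is a minimal sketch in Python. The runner speeds are made-up illustrative values, not measured data, and the helper `pearson` is a hypothetical name for a plain implementation of the Pearson correlation coefficient. Swapping the two variables gives exactly the same number, so the coefficient alone cannot tell us which variable drives the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical runner speeds in km/h (illustrative values only).
speeds_kmh = [8.0, 9.5, 11.0, 12.5, 14.0]

# Distance covered after 1 hour at a constant speed: d = v * t, with t = 1 h.
distances_km = [v * 1.0 for v in speeds_kmh]

# Correlation computed in both directions.
r_speed_distance = pearson(speeds_kmh, distances_km)
r_distance_speed = pearson(distances_km, speeds_kmh)

print(r_speed_distance, r_distance_speed)
```

Both calls return the same value, which is why a scatter plot of these two series looks identical in meaning whichever variable we call the cause.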

In a static graph it's generally hard to distinguish correlation from causality. The most common convention is to put the independent variable (the cause in the causal relationship) on the horizontal axis and the dependent variable on the vertical axis. However, this may not be as effective as intended.

For example, this chart shows the population and GDP of 6 European countries: Germany, France, the UK, Italy, Spain and the Netherlands. From the chart it's not clear whether a higher population implies a higher GDP, or the other way around. Most likely there is no causal relationship at all. India, for example, has a larger population than all of them combined, but its nominal GDP is comparable to Spain's. The reverse implication does not hold either: Australia has a higher nominal GDP than Mexico, but less than 20% of its population.

Source: http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)

I think one tool that is very useful for distinguishing causality from correlation is a dynamic chart. In this kind of chart we can not only put the causal variable on the horizontal axis, but also give the reader the freedom to alter that variable manually. This triggers a change in the dependent variable. And because of the technical implementation of the chart, it's impossible to modify it the other way around: there is no way to modify the dependent variable and see the resulting value of the independent variable. This asymmetry in the visualization is what reinforces the message of causality, as opposed to mere correlation. Try it out in the example below!

Source: v = ds/dt http://en.wikipedia.org/wiki/Speed
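The one-way dependency behind such a dynamic chart can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation of the interactive example: the class name `RunnerChart` and its attributes are hypothetical, and it assumes a constant speed so that distance = speed × time.

```python
class RunnerChart:
    """Model behind a hypothetical interactive chart: speed is the only input."""

    def __init__(self, speed_kmh, hours=1.0):
        # Independent variable: the reader is allowed to change it.
        self.speed_kmh = speed_kmh
        self._hours = hours

    @property
    def distance_km(self):
        # Dependent variable: always derived from speed (d = v * t).
        # There is deliberately no setter, so it cannot be edited directly.
        return self.speed_kmh * self._hours


chart = RunnerChart(speed_kmh=10.0)

# Changing the cause updates the effect.
chart.speed_kmh = 12.0
distance_after_change = chart.distance_km

# Trying to change the effect directly is rejected.
try:
    chart.distance_km = 20.0
    distance_is_writable = True
except AttributeError:
    distance_is_writable = False

print(distance_after_change, distance_is_writable)
```

The reader can move the speed and watch the distance follow, but never the reverse, which is the same asymmetry the chart relies on to suggest causality.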