(Alberto Cairo’s paper ‘Graphics Lies, Misleading Visuals: Reflections on the Challenges and Pitfalls of Evidence-Driven Visual Communication’ guided the analysis below.)
Humans love visual representations of data. A computer can look at long rows of data, or even unstructured data, and draw insights from it. We humans, though, need that information presented as graphics we can understand, often with shapes and colors added to drive home a key point. While I’m all for making information and trends visually accessible, we must proceed with caution, because such representations are often misleading or downright dishonest. I highly recommend reading Cairo’s paper to gain a deeper understanding of this problem.
Here, I’d like to provide a quick analysis of a graph I saw in a Medium article titled ‘Why We Need to Recognize and Consider Organic Foods’.
I’m leaving aside the statistical information at the top of the graph (including debates on the relevance of p-values and R² goodness-of-fit values, and even the fact that correlation doesn’t imply causation) to focus solely on the visual deception in the graphic.
The deceptive tricks used fall into two categories:
- Too much data is represented to obscure reality
- Using graphic forms in inappropriate ways
Too much data is represented to obscure reality
The graph claims to plot two different correlations:
- between glyphosate usage and death rates from end-stage renal disease
- between the percentage of US corn and soy crops that are GE and death rates from end-stage renal disease.
What does it actually show, though? Three time series superimposed on one another.
Note that the x-axis is time. The graph therefore doesn’t show the correlation between any two of the series; it simply shows how each of three different series is correlated with time!
Need I point out that the series all start at different points in time? For example, death rates from renal disease are plotted from 1985 to 1991, even though nothing is plotted for those years about the supposedly causal glyphosate usage or the percentage of soy and corn crops that are GE.
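If you actually wanted to test whether two yearly series move together, you would pair them year by year and use only the years where both have a value; the years where only one series exists contribute nothing. Here is a minimal sketch of that idea. The helper names (`pearson`, `paired_correlation`) and every number in it are hypothetical placeholders I made up for illustration, not values read off the original chart.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def paired_correlation(series_a, series_b):
    """Correlate two {year: value} series using only the years present in both."""
    shared_years = sorted(series_a.keys() & series_b.keys())
    if len(shared_years) < 3:
        raise ValueError("need at least three shared years")
    xs = [series_a[year] for year in shared_years]
    ys = [series_b[year] for year in shared_years]
    return shared_years, pearson(xs, ys)


# Hypothetical data: the "outcome" series starts years before the "cause".
outcome = {1985: 2.0, 1987: 2.4, 1989: 2.5, 1991: 3.1, 1993: 3.3, 1995: 3.9}
cause = {1991: 10.0, 1993: 12.0, 1995: 15.0}

years, r = paired_correlation(outcome, cause)
print(years)  # only the overlapping years can be compared
print(round(r, 3))
```

The 1985 to 1989 values simply drop out: there is nothing to pair them with, which is exactly why plotting them next to the other series says nothing about a relationship.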
Using graphic forms in inappropriate ways
Now look at the y-axes.
For one, both are truncated. Also, why are there two axes? Is there a third axis for the % GE soy and corn series? (And, by the way, how can a single percentage apply to both soy and corn?)
Truncating the y-axis magnifies, and hence distorts, the apparent magnitude of change in a series.
For example, take the series (40, 50). If the y-axis is truncated at 40, the first point sits on the baseline, and the point with value 50 looks like infinite growth from it, even though the true increase is only 25%!
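The arithmetic behind this is simple enough to sketch. Assuming a value's drawn height is its distance above the axis baseline (as on a bar chart), the apparent ratio between two points depends entirely on where the axis starts. The function name `apparent_ratio` is just an illustrative helper, not something from a plotting library.

```python
def apparent_ratio(before, after, baseline):
    """Ratio of drawn heights for two values on an axis starting at `baseline`."""
    if baseline > before:
        raise ValueError("baseline above the first value clips it off the chart")
    if before == baseline:
        # The first point is drawn with zero height, so any rise looks unbounded.
        return float("inf")
    return (after - baseline) / (before - baseline)


true_ratio = apparent_ratio(40, 50, baseline=0)   # 1.25, an honest 25% increase
halfway = apparent_ratio(40, 50, baseline=30)     # 2.0, looks like a doubling
truncated = apparent_ratio(40, 50, baseline=40)   # inf, "infinite growth"
print(true_ratio, halfway, truncated)
```

Moving the baseline from 0 to 30 to 40 turns the same 25% increase into an apparent doubling and then into an apparently infinite jump.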
Adding multiple y-axes to a chart is a way to suggest correlations or overlaps in values that don’t really exist. If I’m allowed to change the scale and origin of each y-axis, I can make almost any two series look as though they correlate.
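Here is a rough sketch of why this works. Each y-axis maps its own [lo, hi] range onto the same plot height, so choosing lo and hi separately for each series lets me stretch one line to sit on top of the other. The two series below are made-up numbers of very different magnitude, and `to_plot_height` and `max_gap` are hypothetical helpers for illustration.

```python
def to_plot_height(value, lo, hi):
    """Map a data value to a 0..1 vertical position on an axis spanning lo..hi."""
    return (value - lo) / (hi - lo)


def max_gap(a, b):
    """Largest vertical distance between two lines drawn on the same plot area."""
    return max(abs(x - y) for x, y in zip(a, b))


small = [2.0, 2.5, 3.0, 3.5]              # one quantity (made-up values)
large = [1000.0, 1250.0, 1500.0, 1750.0]  # an unrelated quantity (made-up values)

# Honest version: one shared axis from 0 to the largest value.
hi = max(max(small), max(large))
shared_gap = max_gap([to_plot_height(v, 0, hi) for v in small],
                     [to_plot_height(v, 0, hi) for v in large])

# Dual-axis version: each axis range is fitted to its own series.
independent_gap = max_gap(
    [to_plot_height(v, min(small), max(small)) for v in small],
    [to_plot_height(v, min(large), max(large)) for v in large],
)
print(shared_gap, independent_gap)
```

On a shared axis the small series is a flat line hugging the bottom, far from the large one; with independently fitted axes the two lines are drawn exactly on top of each other.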
To illustrate, I constructed two series of numbers, Random1 and Random2, each with one data point per year from 1991 to 2009. Both series are the sum of a random number and a linear time trend.
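A minimal sketch of how such series can be generated is below. The original doesn’t say which random distribution or trend parameters were used, so the Gaussian noise, the intercepts and slopes, and the fixed seed are all my assumptions; `trending_series` is a hypothetical helper. The Pearson coefficient is written out directly so the snippet has no dependencies.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trending_series(years, intercept, slope, noise_sd, rng):
    """Linear time trend plus independent Gaussian noise, one value per year."""
    start = years[0]
    return [intercept + slope * (year - start) + rng.gauss(0, noise_sd)
            for year in years]


rng = random.Random(42)          # assumed seed, for a repeatable run
years = list(range(1991, 2010))  # 1991 to 2009 inclusive: 19 points
random1 = trending_series(years, intercept=10, slope=1.0, noise_sd=1.5, rng=rng)
random2 = trending_series(years, intercept=200, slope=3.0, noise_sd=4.5, rng=rng)

# The noise terms are drawn independently, yet the shared upward trend
# alone is enough to make the two series strongly correlated.
print(round(pearson(random1, random2), 2))
```

Nothing connects the two series except that both grow with time, which is the same trap as the time-based x-axis in the original graph.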
In the figure above, the two series are plotted against time on a common y-axis starting at 0.
In the figure above, I’ve used two y-axes, each with a truncated origin.
In the figure above, I’ve also hidden some of the values of Random1. At first glance, the chart now suggests that the sudden appearance of the blue line caused the changes in the orange line.
So, in conclusion: graphs are great, but they are worth pondering beyond the initial aha moment they might create in us.