Data visualizations, part 3
Graphs without self-esteem issues
April 11, 2019
There are a lot of exciting new technical developments that enable visualization in ways that could not be anticipated even just five years ago. But it is important to keep the fundamentals in mind. Visualization is about telling a story. And we can only really do that effectively if we work hard to get rid of all sources of confusion.
From: United Nations, 2009. Making Data Meaningful (https://www.unece.org/stats/documents/writing/).
I. All the rules of table architecture apply (see the previous post) –
- The title needs to be simple and well-worded.
- Include data sources.
- Add notes to provide answers to questions you can anticipate from your audience.
- Reduce colors and extras as much as possible. This is a special problem with graphs because we have so many choices. Take the two examples below. The first one may draw your eyes more readily, but the message is lost entirely. The reader really has to dig for it.
II. These additional guidelines make sure the data are presented accurately –
- Label both axes. The ‘dependent’ variable traditionally is shown on the y-axis (remember, this is the field or variable that you are studying; the things explaining the changes in that variable are on the x-axis).
- Use a legend when you have more than one series of data (or note the name of the series in the graph).
- The origin of the chart (where the x- and y-axes meet) should always be zero. If this rule is broken:
  - It should be noted with a double slash mark on the y-axis between the origin and the first tick mark.
  - Results of statistical testing should be added to the notes. The reason for this is that the purpose of changing the origin is to emphasize a change that cannot be seen when the origin is zero. To warrant bending this time-honored rule, that difference had better be statistically significant, or significant in a way your audience will quickly agree with. Otherwise, you can blow up any difference and make it look ‘noteworthy’ when it is just noise in the data. (In the example shown, statistical significance is not relevant because the data represent the whole population of adults in Sweden. We will get to statistical significance in a future post.)
  - You should point out a change in the origin verbally during your presentation. The potential for misleading the audience is very high, so the burden is on you to make sure that does not happen.
- When you are showing more than one graph, the y-axis should have the same scaling on each graph. This ensures that the audience can make a precise comparison across all charts. You will need to change the default in Excel for each graph because Excel has no discipline about this.
- When in doubt, include data labels.
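If you build your charts in code rather than in Excel, the guidelines above can be applied in one place. The following is only a minimal sketch, assuming Python with the matplotlib library installed; the data, the series and region names, and the helper function plot_panels are all made up for illustration and do not come from the UN guide.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration.
years = [2015, 2016, 2017, 2018]
region_a = {"Series 1": [12, 15, 14, 18], "Series 2": [8, 9, 11, 10]}
region_b = {"Series 1": [30, 28, 33, 35], "Series 2": [20, 22, 21, 25]}


def plot_panels(years, panels, x_label, y_label):
    """Draw one line chart per panel on a shared y-axis that starts at zero."""
    fig, axes = plt.subplots(
        1, len(panels), sharey=True, squeeze=False,
        figsize=(5 * len(panels), 4),
    )
    row = axes[0]
    for ax, (title, series) in zip(row, panels.items()):
        for name, values in series.items():
            ax.plot(years, values, marker="o", label=name)
            # When in doubt, include data labels.
            for x, y in zip(years, values):
                ax.annotate(str(y), (x, y), textcoords="offset points",
                            xytext=(0, 6), ha="center")
        ax.set_title(title)          # simple, well-worded title
        ax.set_xlabel(x_label)       # label the x-axis
        ax.set_xticks(years)
        ax.legend()                  # more than one series, so add a legend
    row[0].set_ylabel(y_label)       # the shared y-axis is labeled once, on the left
    row[0].set_ylim(bottom=0)        # origin at zero; sharey applies it to every panel
    fig.tight_layout()
    return fig, row


fig, panel_axes = plot_panels(
    years,
    {"Region A": region_a, "Region B": region_b},
    x_label="Year",
    y_label="Value (units)",
)
```

Because the panels are created with sharey=True, setting the lower limit on one panel applies the same scale to all of them, which is the discipline Excel does not enforce by default. Calling fig.savefig with a file name would write the figure to disk; data sources and notes would still need to be added as text.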
So there you have it, some basic guidelines for making yourself understood. If you are excited to jump into the world of visualization, there are some great resources online to help you. Ann K. Emery has published a terrific chart chooser on her website; she also provides lots and lots of good examples and advice. Also, FlowingData offers inspiration for those who really want to get creative.