Edward Tufte's Graphical Heuristics

08 Aug 2019

Edward Tufte’s book The Visual Display of Quantitative Information introduces several heuristics related to graphical displays.

Data-Ink Ratio

Data-Ink is the non-redundant, non-erasable core of a graphic which directly correlates to the underlying data and which is essential to the sense-making process for a given variable. The Data-Ink Ratio is the amount of data-ink divided by the total ink required to print the graphic.

The ideal visual will have the highest possible data-ink ratio, meaning that the chart elements that don’t add new information to the graphic have all been removed. Specific items that should be removed include:

Background imagery and coloration,
Any redundant labeling, including the removal of legends if chart items can be labeled directly,
Verbiage in the title that is redundant with verbiage already in the labels on the axes,
Lines forming borders around the graphic,
Non-value-add coloration, with the possible exception of a single colored item which may be used to highlight a notable datapoint,
Three-dimensional bars, drop shadows, and bolding,
Non-value-add gridlines,
Vertical axes entirely, including labels and tick marks, if bars in a bar chart can be numerically labeled directly.
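
As a rough illustration of the removals listed above, here is a minimal matplotlib sketch. The data, the `minimal_bar_chart` helper, and the choice of colors are all invented for this example, so treat it as one possible way to raise the data-ink ratio rather than a prescribed recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, made up purely for illustration.
categories = ["A", "B", "C", "D", "E"]
values = [56, 39, 34, 33, 29]


def minimal_bar_chart(labels, heights, highlight=0):
    """Draw a bar chart with most non-data ink stripped away (hypothetical helper)."""
    fig, ax = plt.subplots()

    # Muted grey everywhere, with a single colored bar to highlight one datapoint.
    colors = ["lightslategrey"] * len(heights)
    colors[highlight] = "tab:blue"
    bars = ax.bar(labels, heights, color=colors)

    # Remove the border lines around the plot area.
    for spine in ax.spines.values():
        spine.set_visible(False)

    # Remove gridlines and the vertical axis (labels and tick marks included).
    ax.grid(False)
    ax.yaxis.set_visible(False)
    ax.tick_params(axis="x", length=0)  # keep category labels, drop the tick marks

    # Label each bar directly with its value instead of relying on the axis.
    for bar, height in zip(bars, heights):
        ax.text(bar.get_x() + bar.get_width() / 2, height + 0.5,
                str(height), ha="center", va="bottom")

    return fig, ax


fig, ax = minimal_bar_chart(categories, values)
```

Calling `fig.savefig(...)` on the returned figure would write the chart to disk. No legend is needed here because every bar is labeled directly.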

Chart Junk

Chart Junk is an especially distracting type of non-data-ink. It includes all kinds of artistic decoration, but Tufte highlights three types in particular:

Unintended Optical Art, such as excessive shading or patterning of chart features. These patterns cause visual fatigue because they make the eye jump around involuntarily.
The Grid, which competes with the actual data being shown. Thinning, removing, or desaturating the grid are all possible mitigation strategies.
All non-data creative graphic embellishments, including line art or photographs. Newspapers and magazines commonly take this sort of creative license.
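
The grid mitigation strategies mentioned above (thinning and desaturating) could look something like the following in matplotlib. The data and the specific color and line width are assumptions chosen for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical series, made up for illustration.
values = [3, 7, 4, 9, 6]

fig, ax = plt.subplots()
ax.plot(values, color="black", linewidth=2)

# Keep only horizontal gridlines, drawn thin and in a light grey,
# so that the grid recedes behind the data instead of competing with it.
ax.grid(True, axis="y", color="lightgrey", linewidth=0.5)
ax.set_axisbelow(True)  # draw the grid underneath the plotted line
```

Replacing the `ax.grid(True, ...)` call with `ax.grid(False)` would remove the grid altogether, which is the third strategy.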

Sparklines

Tufte pioneered the concept of sparklines, which are now included in Microsoft Excel and other spreadsheet software. Sparklines are small, minimalist graphs, usually embedded in tables alongside the data they describe. They are useful for showing the overall trend in the data at a glance.
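
A sparkline can be approximated in matplotlib by drawing a thin line on a very small set of axes with everything else switched off. The sketch below is one way to do that; the `sparkline` helper and the three series are hypothetical, and marking the final value in red is just a stylistic assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical rows of a table, each with a short series of values.
series = {
    "north": [4, 5, 7, 6, 9, 11],
    "south": [8, 7, 7, 5, 4, 3],
    "west": [5, 6, 5, 6, 5, 6],
}


def sparkline(ax, values):
    """Draw a bare trend line on ax (hypothetical helper)."""
    ax.plot(values, color="grey", linewidth=1)
    # Mark the most recent value so the endpoint stands out.
    ax.plot(len(values) - 1, values[-1], marker="o", markersize=3, color="red")
    ax.axis("off")  # no axes, ticks, labels, or border
    return ax


# One tiny row per series, stacked like the rows of a table.
fig, axes = plt.subplots(nrows=len(series), figsize=(2, 0.4 * len(series)))
for ax, values in zip(axes, series.values()):
    sparkline(ax, values)
```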

Lie Factor

The Lie Factor is the size of an effect shown in a graphic divided by the size of the effect actually in the data. A value close to 1 means the graphic represents the data faithfully. Two common ways a visual can introduce a lie factor are perspective effects and the use of areas or volumes, rather than lengths, to represent quantities.
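
The arithmetic can be sketched in a few lines of Python. This assumes that the "size of an effect" is measured as the relative change between two values, and it uses an invented example in which a value that doubles is drawn as a circle whose radius doubles, so the drawn area grows much faster than the data. The function names are hypothetical.

```python
import math


def effect_size(first, second):
    """Relative change from first to second (assumed definition of effect size)."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Hypothetical example: the data doubles from 10 to 20, and each value is
# drawn as a circle whose radius equals the value, so readers compare areas.
data_first, data_second = 10.0, 20.0
area_first = math.pi * data_first ** 2
area_second = math.pi * data_second ** 2

factor = lie_factor(data_first, data_second, area_first, area_second)
print(round(factor, 2))
```

Here the data grows by 100% while the drawn area grows by 300%, so the lie factor comes out at roughly 3. Scaling the radius by the square root of the value instead would bring it back to about 1.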

These notes were taken from the Coursera course Applied Plotting, Charting & Data Representation in Python. Some concepts were taken from Edward Tufte's book The Visual Display of Quantitative Information.