More Visualization Guidelines
1: Know Your Audience
Scientific visualization can be defined as a graphical interface between people and data. Given this definition, problems arise when the way a viewer perceives a visual differs significantly from what its creator intended to convey.
If the intended audience is a “general audience,” then much less can be assumed about the knowledge they will bring to bear on the visual. Consequently, the visual must be more precise and self-contained than for audiences that are familiar with the subject matter to which the visual pertains.
2: Identify Your Message
A visual is meant to express an idea, introduce facts, or present a result that would be too long (or nearly impossible) to explain with words or numbers alone. Just as an author must first put her thoughts in order before writing a text, a data analyst must identify his message before developing his visual.
3: Adapt the Figure to the Support Medium
The support medium could be a poster, a computer monitor, a printed page, a projection screen, or a cell phone. The choice of medium affects how long the visual will be viewed, how much detail can be shown, and whether long accompanying captions are likely to be read. It can also affect color choices, because contrast differs from one medium to another (for example, between a projector and a printed page).
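As a rough illustration of adapting one figure to different media, the Matplotlib sketch below draws the same plot with a different size, resolution, and font size per medium. The `MEDIA_PRESETS` dictionary and its numbers are hypothetical placeholders, not recommendations from the article; real values should come from the journal, poster template, or slide deck in question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical presets: figure size in inches, resolution in dots per inch,
# and a base font size in points. Adjust to your venue's actual requirements.
MEDIA_PRESETS = {
    "print_column": {"figsize": (3.5, 2.5), "dpi": 300, "fontsize": 8},
    "slide": {"figsize": (10.0, 5.6), "dpi": 100, "fontsize": 18},
    "poster": {"figsize": (12.0, 8.0), "dpi": 150, "fontsize": 24},
}


def make_figure(medium):
    """Draw the same simple plot, sized and lettered for the given medium."""
    preset = MEDIA_PRESETS[medium]
    fig, ax = plt.subplots(figsize=preset["figsize"], dpi=preset["dpi"])
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9], linewidth=2)
    ax.set_xlabel("x", fontsize=preset["fontsize"])
    ax.set_ylabel("x squared", fontsize=preset["fontsize"])
    ax.tick_params(labelsize=preset["fontsize"])  # tick labels scale too
    return fig


paper_fig = make_figure("print_column")
slide_fig = make_figure("slide")
```

Keeping the per-medium choices in one place makes it easy to regenerate a print figure as a slide figure instead of stretching an image that was drawn for a different medium.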
4: Captions Are Not Optional
Captions explain how to read the figure and provide additional precision. Like the verbal explanation a speaker gives of a graph during a technical talk, they should not be skipped.
5: Do Not Trust the Defaults
Every visualization tool comes with default settings, whether it is a widely available commercial product such as Excel, PowerPoint, or Tableau, or a more precise, specialized library such as Matplotlib. Defaults are chosen because they work acceptably for many visuals, which also means they are the best choice for none. Tailor the settings to the visual.
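In Matplotlib, one way to move away from the defaults is to override specific runtime settings (`rcParams`). The sketch below uses `plt.rc_context` so the overrides apply only to figures created inside the `with` block. The particular values are illustrative assumptions; the point is that each one is chosen deliberately for this figure rather than inherited.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative overrides; which values are "right" depends on the figure.
overrides = {
    "figure.figsize": (4.0, 3.0),  # inches
    "lines.linewidth": 2.0,        # points
    "axes.labelsize": 10,          # points
    "xtick.labelsize": 9,
    "ytick.labelsize": 9,
}

# rc_context applies the overrides only inside the with-block, so other
# figures in the same session keep their own settings.
with plt.rc_context(overrides):
    fig, ax = plt.subplots()
    (line,) = ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_xlabel("Time (s)")       # hypothetical axis labels and units
    ax.set_ylabel("Distance (m)")
```

Scoping the changes with a context manager, rather than editing the global defaults, keeps one figure's tailoring from silently leaking into the next.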
6: Use Color Effectively
When color does not convey additional information, do not use it. If you must use color, choose colorblind-friendly colors (i.e., high in contrast, preferably blues and oranges).
Three types of colormaps are commonly used. Choose the one that best matches the data:
- Sequential: gradations of a single color, used for quantitative data varying from low to high.
- Diverging: variation from one color to another, used to highlight deviation from a median value.
- Qualitative: rapid variation of colors, used mainly for discrete or categorical data.
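The three colormap types above map onto concrete Matplotlib choices. The sketch below is one possible pairing, assuming synthetic random data: `viridis` for sequential magnitudes, `RdBu_r` with a `TwoSlopeNorm` centered on the median for diverging data, and the discrete `tab10` palette for categories. These specific colormap names are example choices, not ones prescribed by the article.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(0)
field = rng.normal(size=(20, 20))  # synthetic data for illustration only
categories = ["A", "B", "C", "D"]  # hypothetical categories

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10, 3))

# Sequential: non-negative magnitudes that run from low to high.
im1 = ax1.imshow(np.abs(field), cmap="viridis")
fig.colorbar(im1, ax=ax1)
ax1.set_title("Sequential")

# Diverging: deviation from a central value, here the median.
center = np.median(field)
norm = TwoSlopeNorm(vmin=field.min(), vcenter=center, vmax=field.max())
im2 = ax2.imshow(field, cmap="RdBu_r", norm=norm)
fig.colorbar(im2, ax=ax2)
ax2.set_title("Diverging")

# Qualitative: one distinct color per discrete category.
colors = matplotlib.colormaps["tab10"].colors[: len(categories)]
ax3.bar(categories, [3, 5, 2, 4], color=colors)
ax3.set_title("Qualitative")
```

The `TwoSlopeNorm` maps the median to the midpoint of the colormap, so the neutral middle color marks "no deviation" even when the data are not symmetric around it.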
7: Do Not Mislead the Reader
What distinguishes a scientific figure from other graphical artwork is the presence of data that needs to be shown as objectively as possible. A scientific figure is, by definition, tied to the data. As a rule of thumb, always use the simplest type of plot that can convey your message, and include labels, ticks, a title, and the full range of values when relevant.
Encode values with lengths rather than areas or angles (i.e., avoid area plots and pie charts), and start bar charts at zero rather than at some arbitrary intermediate value, because readers compare bars by their length.
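A small worked example shows why a truncated bar axis misleads. With hypothetical values of 92 and 96, bars drawn from a baseline of 90 have lengths 2 and 6, so the second bar looks three times as large, even though the values differ by only about 4%. The sketch draws both versions side by side and computes the apparent ratio with a hypothetical helper, `apparent_ratio`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def apparent_ratio(a, b, baseline):
    """Ratio of the drawn bar lengths when both bars start at `baseline`."""
    return (b - baseline) / (a - baseline)


labels = ["Control", "Treatment"]  # hypothetical groups
values = [92, 96]                  # hypothetical measurements

fig, (bad, good) = plt.subplots(1, 2, figsize=(6, 3))

# Misleading: a truncated axis makes a ~4% difference look like a 3x one.
bad.bar(labels, values)
bad.set_ylim(90, 97)
bad.set_title("Axis starts at 90")

# Honest: bars start at zero, so drawn length is proportional to value.
good.bar(labels, values)
good.set_ylim(bottom=0)
good.set_title("Axis starts at 0")

print(apparent_ratio(92, 96, baseline=90))  # 3.0
print(apparent_ratio(92, 96, baseline=0))   # roughly 1.04
```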
8: Avoid “Chartjunk”
Chartjunk refers to all the unnecessary or confusing visual elements in a figure that, in the best case, do not improve the message and, in the worst case, add confusion. For example, chartjunk may include too many colors, too many labels, gratuitously colored backgrounds, useless grid lines, etc. The term was coined by Edward Tufte, a visualization expert and author of The Visual Display of Quantitative Information. In that work, he argues that any decoration that does not tell the viewer something new must be avoided.
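As one possible interpretation of this advice in Matplotlib (an assumption, not a procedure from the article), the sketch below draws a single line in one color on a plain white background, leaves grid lines off, and hides the top and right frame edges, which carry no data. Whether grid lines or a full frame count as "useless" depends on the figure; keep them when readers need to read off exact values.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)
y = np.sqrt(x)  # synthetic data for illustration only

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, color="black", linewidth=1.5)  # one series, so one color suffices

ax.set_facecolor("white")      # no decorative background color
ax.grid(False)                 # no grid lines unless readers need exact values
for side in ("top", "right"):  # these frame edges carry no data
    ax.spines[side].set_visible(False)

ax.set_xlabel("Time (s)")      # hypothetical labels and units
ax.set_ylabel("Response (a.u.)")
```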
9: Message Trumps Beauty
In science, the message and readability of a figure are the most important aspects, while beauty is optional. Today, the lines between data visualization, infographics, design, and art are becoming thinner and thinner. When in doubt, favor precision and accuracy over aesthetics.
10: Get the Right Tool
The article lists several open-source tools that are sufficiently precise for scientific data visualization: Matplotlib, R, Inkscape, TikZ and PGF, GIMP, ImageMagick, D3.js, Cytoscape, and Circos. For details on the use cases of each, see the article on which these notes are based, linked below.