Practical Decluttering of Matplotlib Visuals

These notes walk through the practical steps for applying Tufte’s recommendations on data-ink ratio and chartjunk to a Matplotlib bar chart.

Specifically:

  1. Begin with a default bar chart.
  2. Remove the tick marks and the y-axis tick labels.
  3. Remove the frame.
  4. Change the bar colors to a neutral grey and highlight the Python bar.
  5. Soften the text labels.
  6. Add direct labels for the y-axis values.
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

The following settings improve the appearance of the inline figures.

  • First, the DPI is raised to produce a higher-quality image. Raising the DPI also increases the displayed size of the figures.
  • Second, the inline backend is configured to work with high-pixel-density displays.
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 150
%config InlineBackend.figure_format = 'retina'
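
To make the link between DPI and figure size concrete, the sketch below computes a figure's pixel dimensions as its size in inches multiplied by its DPI. The 6 by 4 inch size is an arbitrary value chosen for illustration, not a setting from these notes.

```python
import matplotlib.pyplot as plt

# Illustrative figure size in inches (an assumption, not from the notes).
size_inches = (6, 4)

pixel_sizes = {}
for dpi in (100, 150):
    fig = plt.figure(figsize=size_inches, dpi=dpi)
    # Pixel dimensions are the size in inches times the dots per inch.
    width_px, height_px = fig.get_size_inches() * fig.dpi
    pixel_sizes[dpi] = (round(width_px), round(height_px))
    plt.close(fig)

print(pixel_sizes)
```

At the same size in inches, the 150 DPI figure has 1.5 times as many pixels along each side as the 100 DPI one, which is why it appears larger when displayed inline.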

A default Matplotlib bar chart visual is shown below.

languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
pos = np.arange(len(languages))
popularity = [56, 39, 34, 34, 29]
def setup_original():
    barlist = plt.bar(pos, popularity, align='center')
    plt.xticks(pos, languages)
    plt.ylabel('% Popularity')
    plt.title('Top 5 Languages for Math & Data \nby % popularity on Stack Overflow', alpha=0.8)
    
    return barlist
plt.figure()
setup_original();


This figure can be improved by removing the tick marks from both axes and the tick labels from the y-axis. The y-axis title ('% Popularity') is kept.

def remove_ticks_and_labels():
    plt.tick_params(
        axis='x',
        bottom=False)
    plt.tick_params(
        axis='y',
        left=False,
        labelleft=False)
plt.figure()
setup_original()
remove_ticks_and_labels()

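The same cleanup can be sketched with Matplotlib's object-oriented interface, where a single `tick_params` call with `axis='both'` covers both axes. This is an alternative formulation rather than the approach used in these notes. The sample data is copied from above so the block runs on its own, and the final lines only inspect the tick objects to confirm what was hidden.

```python
import matplotlib.pyplot as plt
import numpy as np

languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
popularity = [56, 39, 34, 34, 29]
pos = np.arange(len(languages))

fig, ax = plt.subplots()
ax.bar(pos, popularity, align='center')
ax.set_xticks(pos)
ax.set_xticklabels(languages)

# Hide tick marks on both axes and tick labels on the y-axis only.
# 'bottom' applies to the x-axis; 'left' and 'labelleft' to the y-axis.
ax.tick_params(axis='both', bottom=False, left=False, labelleft=False)

# Inspect the first major tick on each axis.
x_tick = ax.xaxis.get_major_ticks()[0]
y_tick = ax.yaxis.get_major_ticks()[0]
x_marks_visible = x_tick.tick1line.get_visible()
x_labels_visible = x_tick.label1.get_visible()
y_marks_visible = y_tick.tick1line.get_visible()
y_labels_visible = y_tick.label1.get_visible()
```

Only the language names on the x-axis remain visible; the tick marks on both axes and the numbers on the y-axis are hidden.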

The figure can be further improved by removing the frame, that is, the four spines that border the plot area.

def remove_frame():
    for spine in plt.gca().spines.values():
        spine.set_visible(False)
plt.figure()
setup_original()
remove_ticks_and_labels()
remove_frame()

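Spines can also be addressed individually by name, which helps if a variation of the design keeps one of them. The sketch below hides three sides and keeps the bottom spine as a baseline. Keeping the baseline is a hypothetical variation for illustration, not what these notes do, and the bar data is a placeholder.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [3, 2, 1])  # placeholder data for illustration

# The spines mapping is keyed by side: 'left', 'right', 'bottom', 'top'.
for side in ('left', 'right', 'top'):
    ax.spines[side].set_visible(False)

# Collect the sides that are still drawn.
visible_sides = sorted(side for side, spine in ax.spines.items()
                       if spine.get_visible())
```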

Change the bar colors to be more neutral and to highlight the Python bar.

def change_bar_colors(barlist):
    for bar in barlist:
        bar.set_color('grey')
    barlist[0].set_color('navy')
plt.figure()
barlist = setup_original()
remove_ticks_and_labels()
remove_frame()
change_bar_colors(barlist)

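Indexing `barlist[0]` works because Python happens to be the first entry. A sketch of an alternative selects the highlighted bar by name instead, by passing a list of colors to `plt.bar`. The `highlight` variable is a name introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
popularity = [56, 39, 34, 34, 29]
pos = np.arange(len(languages))

highlight = 'Python'  # hypothetical name for the language to emphasize
colors = ['navy' if lang == highlight else 'grey' for lang in languages]

plt.figure()
# One color per bar, in the same order as the data.
barlist = plt.bar(pos, popularity, align='center', color=colors)
plt.xticks(pos, languages)

face_colors = [bar.get_facecolor() for bar in barlist]
```

With this form the highlight follows the language even if the data is later re-sorted.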

Soften the labels by lowering their opacity (alpha of 0.8), so the black text appears as a dark grey.

def change_label_color():
    ax = plt.gca()
    # Apply the same alpha to the y-axis label, x tick labels, and title.
    ax.get_yaxis().get_label().set_alpha(0.8)
    for tick_label in ax.get_xticklabels():
        tick_label.set_alpha(0.8)
    ax.set_title(ax.get_title(), alpha=0.8)
plt.figure()
barlist = setup_original()
remove_ticks_and_labels()
remove_frame()
change_bar_colors(barlist)
change_label_color()

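To see why an alpha of 0.8 reads as grey, the arithmetic below blends black text over a white background using standard alpha compositing. It assumes the default black text color and a white figure background; no Matplotlib calls are needed.

```python
def blend_over(fg, bg, alpha):
    """Composite a foreground RGB color over an opaque background."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


black = (0.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)

blended = blend_over(black, white, alpha=0.8)
# Convert each channel to 8 bits and format as a hex color string.
hex_color = '#' + ''.join(format(round(c * 255), '02x') for c in blended)
print(hex_color)
```

Each channel comes out at roughly 0.2 on a 0 to 1 scale, which corresponds to the dark grey `#333333`.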

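Add direct labels to the bars to replace the y-axis tick labels removed earlier. Each label is centered horizontally on its bar and placed just inside the top edge.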

def add_direct_labels():
    rects = plt.gca().patches
    
    for rect, label in zip(rects, popularity):
        height = rect.get_height()
        plt.gca().text(rect.get_x() + rect.get_width() / 2, 
                       height - 5, 
                       str(label)+'%',
                       ha='center', 
                       va='bottom',
                       color='white')
plt.figure()
barlist = setup_original()
remove_ticks_and_labels()
remove_frame()
change_bar_colors(barlist)
change_label_color()
add_direct_labels()

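Matplotlib 3.4 and later also provide `Axes.bar_label`, which positions labels relative to each bar without manual coordinate arithmetic. The sketch below assumes such a version is installed. The padding of -14 points is an illustrative guess for pulling the labels inside the bars and may need tuning for other font sizes.

```python
import matplotlib.pyplot as plt
import numpy as np

languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
popularity = [56, 39, 34, 34, 29]
pos = np.arange(len(languages))

fig, ax = plt.subplots()
bars = ax.bar(pos, popularity, align='center', color='grey')
ax.set_xticks(pos)
ax.set_xticklabels(languages)

# Requires Matplotlib >= 3.4. Padding is in points; a negative value
# shifts each label down from the bar's top edge into the bar.
texts = ax.bar_label(bars,
                     labels=[f'{value}%' for value in popularity],
                     padding=-14,
                     color='white')

label_strings = [t.get_text() for t in texts]
```

Because the offset is given in points rather than data units, the label position does not depend on the scale of the y-axis.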

A single script to produce this visual from scratch, without the iterative improvements, is shown below.

languages = ['Python', 'SQL', 'Java', 'C++', 'JavaScript']
pos = np.arange(len(languages))
popularity = [56, 39, 34, 34, 29]

barlist = plt.bar(pos, 
                  popularity, 
                  align='center',
                  color='grey')
barlist[0].set_color('navy')

plt.xticks(pos, 
           languages, 
           alpha=0.8)
plt.ylabel('% Popularity', 
           alpha=0.8)
plt.title('Top 5 Languages for Math & Data \nby % popularity on Stack Overflow', 
          alpha=0.8)

plt.tick_params(
    axis='x',
    bottom=False)
plt.tick_params(
    axis='y',
    left=False,
    labelleft=False)

for spine in plt.gca().spines.values():
    spine.set_visible(False)

rects = plt.gca().patches
for rect, label in zip(rects, popularity):
    height = rect.get_height()
    plt.gca().text(rect.get_x() + rect.get_width() / 2, 
                   height - 5, 
                   str(label)+'%',
                   ha='center', 
                   va='bottom',
                   color='white')

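For reuse, the script can be wrapped in a function that draws onto a given Axes. This is a sketch, not part of the original notes: `plot_decluttered_bars` and its parameters are hypothetical names, and it uses the object-oriented interface rather than `pyplot` state.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_decluttered_bars(ax, labels, values, highlight_index=0):
    """Draw a decluttered bar chart on ax (hypothetical helper)."""
    pos = np.arange(len(labels))
    bars = ax.bar(pos, values, align='center', color='grey')
    bars[highlight_index].set_color('navy')

    ax.set_xticks(pos)
    ax.set_xticklabels(labels, alpha=0.8)

    # Remove tick marks, y tick labels, and the frame.
    ax.tick_params(axis='both', bottom=False, left=False, labelleft=False)
    for spine in ax.spines.values():
        spine.set_visible(False)

    # Label each bar directly, just below its top edge.
    for bar, value in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2,
                bar.get_height() - 5,
                f'{value}%',
                ha='center', va='bottom', color='white')
    return bars


fig, ax = plt.subplots()
bars = plot_decluttered_bars(
    ax,
    ['Python', 'SQL', 'Java', 'C++', 'JavaScript'],
    [56, 39, 34, 34, 29])
ax.set_ylabel('% Popularity', alpha=0.8)
ax.set_title('Top 5 Languages for Math & Data \n'
             'by % popularity on Stack Overflow', alpha=0.8)
```

Passing the Axes in explicitly keeps the helper usable in multi-panel figures, where relying on `plt.gca()` can target the wrong subplot.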