Plotting Directly from Pandas

There are a few advantages to plotting directly from Pandas instead of using Matplotlib:

  • Faster, fewer lines of code
  • Easy incorporation of Pandas' built-in data manipulations like differencing and rolling means
  • Pandas is often easier to use than Matplotlib. For example, the chart type is just a parameter, and subplot manipulation is similarly straightforward.

That said, remember that Pandas plotting uses Matplotlib under the hood. What Pandas adds is a convenient, direct interface between the data and the visual.
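As a minimal sketch of both points, the snippet below checks that the accessor form and the kind= form give the same kind of object, and that this object is a Matplotlib Axes that can be customized further. The name demo, the title and label text, and the Agg backend line are assumptions made for illustration; in a notebook the backend line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.axes
import numpy as np
import pandas as pd

# Hypothetical sample data, same shape as the frame used in this post
demo = pd.DataFrame(np.random.rand(10, 4), columns=["A", "B", "C", "D"])

# The chart type can be chosen with an accessor method or with kind=
ax = demo.plot.bar()
ax_kind = demo.plot(kind="bar")

# Both calls return a Matplotlib Axes, so Matplotlib methods still apply
is_axes = isinstance(ax, matplotlib.axes.Axes)
ax.set_title("Random values by row")
ax.set_ylabel("value")
```

Because the return value is an ordinary Axes, anything Pandas does not expose as a parameter can still be adjusted afterwards.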

Imports

Note that Matplotlib is not imported below.

import pandas as pd
import numpy as np

Generate Sample Data

df = pd.DataFrame(np.random.rand(10, 4), columns=['A','B','C','D'])
df

A B C D
0 0.338320 0.918672 0.545048 0.183447
1 0.094920 0.939942 0.728564 0.917459
2 0.519461 0.305833 0.979634 0.598790
3 0.006291 0.225203 0.962938 0.845943
4 0.720794 0.386856 0.546380 0.826647
5 0.141325 0.907937 0.803052 0.346366
6 0.165777 0.833604 0.211596 0.088015
7 0.445472 0.010162 0.321679 0.067631
8 0.268215 0.353037 0.917589 0.777908
9 0.876085 0.175703 0.044765 0.529395

Example Plots

df.plot.bar();

png

df.plot.bar(stacked=True);

png

df.plot.barh(stacked=True);

png

Use seaborn to change the color palette.

import seaborn as sns
sns.set_palette('seismic')

df.plot.barh(stacked=True);

png

df.plot.area();

png

Parameters can be adjusted as they normally would be in Matplotlib or seaborn.

df.plot.area(stacked=False,
               alpha=0.25);

png

df.diff() takes the difference between each row and the row before it, which is helpful when working with time series.

df.diff().plot.box(vert=False,
                   color={'medians':'lightblue',
                          'boxes':'blue',
                          'caps':'darkblue'});

png
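To make the differencing step concrete, here is a small sketch on a hand-checkable Series. The values and the name s are made up for illustration.

```python
import pandas as pd

# Hypothetical values chosen so the differences are easy to verify by hand
s = pd.Series([10, 13, 12, 18])

# Each element minus the one before it; the first element has no predecessor
d = s.diff()
print(d.tolist())  # [nan, 3.0, -1.0, 6.0]
```

The leading NaN is why a differenced frame has one fewer usable row than the original.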

.rolling().mean() computes a rolling (moving) average over a fixed-size window.

df = (pd.DataFrame(np.random.rand(100, 1), 
                   columns=['value'])
      .reset_index())
df['smoothed'] = df['value'].rolling(3).mean()
df.head()

index value smoothed
0 0 0.868018 NaN
1 1 0.044417 NaN
2 2 0.542460 0.484965
3 3 0.467757 0.351545
4 4 0.437014 0.482410

sns.set_palette('tab10')
df['value'].plot()
df['value'].rolling(10).mean().plot();

png
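The NaN values at the start of a smoothed column come from incomplete windows. A minimal sketch with made-up numbers shows this, along with the min_periods parameter of rolling() as one way to fill those first rows; the names s, r, and r_partial are illustrative only.

```python
import pandas as pd

# Hypothetical values chosen so the window means are easy to verify
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3: the first two positions lack a full window and become NaN
r = s.rolling(3).mean()
print(r.tolist())  # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1 averages whatever is available in the partial windows
r_partial = s.rolling(3, min_periods=1).mean()
print(r_partial.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```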

All plots accept a figsize=(width, height) argument, measured in inches.

df['value'].plot(figsize=(9,6))
df['value'].rolling(10).mean().plot(figsize=(9,6));

png
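One way to confirm what figsize did is to read the size back from the underlying Matplotlib figure. The sketch below also passes ax=ax to the second call so the smoothed line is drawn on the same axes explicitly rather than relying on the current-axes default. The name demo and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Hypothetical data mirroring the 'value' column used above
demo = pd.DataFrame({"value": np.random.rand(100)})

# figsize is (width, height) in inches
ax = demo["value"].plot(figsize=(9, 6))

# Draw the rolling mean on the same axes explicitly
demo["value"].rolling(10).mean().plot(ax=ax)

# Read the size back from the Matplotlib figure behind the axes
width, height = ax.figure.get_size_inches()
print(width, height)
```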

Other Examples

df = pd.DataFrame(np.random.rand(100, 4), columns=['A','B','C','D'])

df.plot.kde();                  # distribution plot
df.plot.scatter(x='A',y='B',    # scatterplot x and y
                c='C',          # color of data points
                s=df['C']*200); # size of data points
df.plot.hexbin(x='C',y='D',     # hexbin x and y
               gridsize=18);    # number of hexagons along the x-axis

png

png

png

subplots=True draws a separate subplot for each column.

Other possible parameters for pie charts include:

  • labels=['label1','label2']
  • colors=['red','green']
  • autopct='%.2f'
  • fontsize=20

df = pd.DataFrame(np.random.rand(5, 2),
                  index=list("ABCDE"), 
                  columns=list("XY"))
df

X Y
A 0.439267 0.282618
B 0.201808 0.522568
C 0.143325 0.742457
D 0.836806 0.108267
E 0.613775 0.384959

df.plot.pie(subplots=True,
            figsize=(9, 6));

png
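The pie chart parameters listed earlier can be combined in a single call. This is a sketch on a single Series rather than the two-column frame above; the data, the label names, and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import pandas as pd

# Hypothetical shares that sum to 10, so the percentages are easy to read
shares = pd.Series([4, 3, 2, 1], index=list("ABCD"), name="share")

ax = shares.plot.pie(
    labels=["north", "south", "east", "west"],  # replaces the index labels
    colors=["red", "green", "blue", "orange"],  # one color per wedge
    autopct="%.2f",                             # percentage text, two decimals
    fontsize=20,                                # size of label and percentage text
    figsize=(6, 6),
)

# Collect the text drawn on the chart (wedge labels and percentages)
texts = [t.get_text() for t in ax.texts]
```

A square figsize keeps the pie circular on the page.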

Line plots are the default with .plot()

df = pd.DataFrame(np.random.rand(100, 4), 
                  columns=['A','B','C','D'])
df.plot(subplots=True,
        figsize=(16,8));

png

layout=(2,2) tells Pandas to arrange the subplots in a grid of 2 rows and 2 columns.

df.plot(subplots=True,
        layout=(2, 2),
        figsize=(16,8));

png
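With subplots=True and a layout, the call returns an array of Matplotlib Axes shaped like the grid, which makes it possible to style one panel at a time. The sketch below also uses -1 for one dimension, which asks Pandas to work out that dimension from the number of columns. The names demo and wide, the panel title, and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import numpy as np
import pandas as pd

# Hypothetical four-column frame, as in the example above
demo = pd.DataFrame(np.random.rand(100, 4), columns=["A", "B", "C", "D"])

# The returned array of Axes has the same shape as the layout
axes = demo.plot(subplots=True, layout=(2, 2), figsize=(16, 8))
print(axes.shape)

# Index into the grid to customize a single panel
axes[0, 1].set_title("Column B")

# Hypothetical six-column frame: -1 lets Pandas infer the number of columns
wide = pd.DataFrame(np.random.rand(100, 6), columns=list("ABCDEF"))
axes_auto = wide.plot(subplots=True, layout=(2, -1), figsize=(16, 8))
print(axes_auto.shape)
```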