Plotting Directly from Pandas

07 Jul 2020

There are a few advantages to plotting directly from Pandas instead of using Matplotlib:

It is faster to write, with fewer lines of code.
Pandas' built-in data manipulations, such as differencing and rolling means, are easy to incorporate.
Pandas is often easier to use than Matplotlib: chart types are parameters, for example, and subplot manipulation is similarly straightforward.

Despite this, remember that pandas plotting uses Matplotlib under the hood. What pandas adds is a convenient, direct interface that connects the data to the visual.
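To make the "under the hood" point concrete, here is a minimal sketch with made-up data (the name `sample` and the title and label text are illustrative, not from the original post). Matplotlib is imported here only to check the type of the returned object.

```python
import numpy as np
import pandas as pd
from matplotlib.axes import Axes  # imported only for the type check below

# Hypothetical sample data, for illustration only
sample = pd.DataFrame(np.random.rand(5, 2), columns=['A', 'B'])

# The chart type is a parameter: kind='bar' is equivalent to sample.plot.bar()
ax = sample.plot(kind='bar')

# The return value is an ordinary Matplotlib Axes object ...
print(isinstance(ax, Axes))

# ... so the usual Matplotlib methods can be used to fine-tune the chart
ax.set_title('Sample values')
ax.set_ylabel('value')
```

Anything pandas does not expose as a plotting argument can usually be adjusted this way, through the returned Axes.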

Imports

Note that Matplotlib is not imported below.

import pandas as pd
import numpy as np

Generate Sample Data

df = pd.DataFrame(np.random.rand(10, 4), columns=['A','B','C','D'])
df

	A	B	C	D
0	0.338320	0.918672	0.545048	0.183447
1	0.094920	0.939942	0.728564	0.917459
2	0.519461	0.305833	0.979634	0.598790
3	0.006291	0.225203	0.962938	0.845943
4	0.720794	0.386856	0.546380	0.826647
5	0.141325	0.907937	0.803052	0.346366
6	0.165777	0.833604	0.211596	0.088015
7	0.445472	0.010162	0.321679	0.067631
8	0.268215	0.353037	0.917589	0.777908
9	0.876085	0.175703	0.044765	0.529395

Example Plots

df.plot.bar();


df.plot.bar(stacked=True);


df.plot.barh(stacked=True);


Use seaborn to change the color palette.

import seaborn as sns
sns.set_palette('seismic')

df.plot.barh(stacked=True);
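If seaborn is not installed, the pandas plotting methods also accept a `colormap` argument that names a Matplotlib colormap. The sketch below uses made-up data (the name `sample` is illustrative). Unlike `sns.set_palette`, which changes the default colors for later plots as well, this affects only the one chart.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D'])

# Colors for the four columns are drawn from the 'seismic' colormap
ax = sample.plot.barh(stacked=True, colormap='seismic')
```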


df.plot.area();


Parameters can be adjusted as they normally would be in Matplotlib or seaborn.

df.plot.area(stacked=False,
             alpha=0.25);


`df.diff()` takes the difference between each row and the row before it, which is helpful when working with time series. The first row has no predecessor, so it becomes NaN.

df.diff().plot.box(vert=False,
                   color={'medians':'lightblue',
                          'boxes':'blue',
                          'caps':'darkblue'});
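To see what `diff()` produces before it is plotted, here is a small sketch on made-up numbers (the `prices` frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical values, chosen so the differences are easy to check by hand
prices = pd.DataFrame({'price': [10, 12, 11, 15]})

# Each row minus the previous row; the first row has nothing to subtract
changes = prices.diff()
print(changes)
#    price
# 0    NaN
# 1    2.0
# 2   -1.0
# 3    4.0
```

The box plot above therefore summarizes row-to-row changes rather than the raw values.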


`.rolling(n).mean()` computes a rolling (moving) average over a window of `n` rows. The first `n-1` rows are NaN because the window is not yet full.

df = (pd.DataFrame(np.random.rand(100, 1), 
                   columns=['value'])
      .reset_index())
df['smoothed'] = df['value'].rolling(3).mean()
df.head()

	index	value	smoothed
0	0	0.868018	NaN
1	1	0.044417	NaN
2	2	0.542460	0.484965
3	3	0.467757	0.351545
4	4	0.437014	0.482410
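The NaN values at the top of the `smoothed` column come from incomplete windows. A minimal sketch on made-up numbers (the series `s` is hypothetical) shows the default behavior and the `min_periods` argument, which relaxes it:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])  # hypothetical values

# Window of 3: the first two positions have fewer than 3 observations
full_window = s.rolling(3).mean()
print(full_window.tolist())   # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1 averages whatever observations are available so far
partial = s.rolling(3, min_periods=1).mean()
print(partial.tolist())       # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Whether partial windows are acceptable depends on the analysis; the early averages are based on fewer points and are therefore noisier.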

sns.set_palette('tab10')
df['value'].plot()
df['value'].rolling(10).mean().plot();


All plots accept a `figsize=(width, height)` argument, measured in inches.

df['value'].plot(figsize=(9,6))
df['value'].rolling(10).mean().plot(figsize=(9,6));


Other Examples

df = pd.DataFrame(np.random.rand(100, 4), columns=['A','B','C','D'])

df.plot.kde();                  # density plot (requires SciPy)
df.plot.scatter(x='A',y='B',    # scatterplot x and y
                c='C',          # color of data points
                s=df['C']*200); # size of data points
df.plot.hexbin(x='C',y='D',     # hexbin x and y
               gridsize=18);    # hexagon dimensions


`subplots=True` results in one subplot per column. For a pie chart, that means one pie per column.

Other possible parameters for pie charts include:

labels=['label1','label2']
colors=['red','green']
autopct='%.2f'
fontsize=20

df = pd.DataFrame(np.random.rand(5, 2),
                  index=list("ABCDE"), 
                  columns=list("XY"))
df

	X	Y
A	0.439267	0.282618
B	0.201808	0.522568
C	0.143325	0.742457
D	0.836806	0.108267
E	0.613775	0.384959

df.plot.pie(subplots=True, 
            figsize=(9, 6));
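The pie chart parameters listed above can be combined in one call. This is a minimal sketch on a made-up Series (the name `shares`, the labels, and the colors are illustrative), not the chart from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three non-negative values, one per slice
shares = pd.Series(np.random.rand(3), index=['a', 'b', 'c'], name='share')

ax = shares.plot.pie(labels=['first', 'second', 'third'],  # replaces the index as slice labels
                     colors=['red', 'green', 'blue'],      # one color per slice
                     autopct='%.2f',                       # print each share with 2 decimals
                     fontsize=20,
                     figsize=(6, 6))
```

The `labels` and `colors` lists need one entry per slice, so their length should match the number of rows being plotted.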


Line plots are the default with `.plot`.

df = pd.DataFrame(np.random.rand(100, 4), 
                  columns=['A','B','C','D'])
df.plot(subplots=True,
        figsize=(16,8));


`layout=(2,2)` tells pandas to arrange the subplots in a grid with that many rows and columns.

df.plot(subplots=True,
        layout=(2, 2),
        figsize=(16,8));
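With `subplots=True`, the call returns an array of Matplotlib Axes rather than a single one, and `layout` determines its shape. A minimal sketch with made-up data (the name `sample` and the title text are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame(np.random.rand(100, 4), columns=['A', 'B', 'C', 'D'])

axes = sample.plot(subplots=True, layout=(2, 2), figsize=(16, 8))
print(axes.shape)  # (2, 2)

# Individual panels can be customized by row and column position
axes[0, 1].set_title('Top-right panel')
```

Indexing into the array is a convenient way to adjust one panel without touching the others.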


This content was summarized from a post on Medium created by Andre Ye.
