Plotting Directly from Pandas
There are a few advantages to plotting directly from Pandas instead of using Matplotlib:
- Faster, fewer lines of code
- Easy incorporation of pandas' built-in data manipulations like differencing and rolling means
- Pandas is often easier to use than Matplotlib. For example, chart types are parameters, and subplot manipulation is similarly straightforward.
Despite this, remember that pandas' plotting uses Matplotlib under the hood. What pandas provides is a convenient, direct interface connecting the data to the visual.
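Because pandas delegates the drawing to Matplotlib, the object returned by a plotting call is an ordinary Matplotlib Axes that can be customized further. A minimal sketch, assuming a small made-up DataFrame (the name demo, its values, and the labels are illustrative only):

```python
import matplotlib.axes
import numpy as np
import pandas as pd

# Illustrative data; the values are arbitrary
demo = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["A", "B", "C"])

# The pandas plotting call returns a Matplotlib Axes object
ax = demo.plot.bar()
is_mpl_axes = isinstance(ax, matplotlib.axes.Axes)

# Regular Matplotlib methods can then be used on the result
ax.set_title("Bar chart drawn through pandas")
ax.set_ylabel("value")
```

Matplotlib is imported here only to make the type check explicit; the plotting call itself needs nothing beyond pandas.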
Imports
Note that Matplotlib is not imported below.
import pandas as pd
import numpy as np
Generate Sample Data
df = pd.DataFrame(np.random.rand(10, 4), columns=['A','B','C','D'])
df
 | A | B | C | D |
---|---|---|---|---|
0 | 0.338320 | 0.918672 | 0.545048 | 0.183447 |
1 | 0.094920 | 0.939942 | 0.728564 | 0.917459 |
2 | 0.519461 | 0.305833 | 0.979634 | 0.598790 |
3 | 0.006291 | 0.225203 | 0.962938 | 0.845943 |
4 | 0.720794 | 0.386856 | 0.546380 | 0.826647 |
5 | 0.141325 | 0.907937 | 0.803052 | 0.346366 |
6 | 0.165777 | 0.833604 | 0.211596 | 0.088015 |
7 | 0.445472 | 0.010162 | 0.321679 | 0.067631 |
8 | 0.268215 | 0.353037 | 0.917589 | 0.777908 |
9 | 0.876085 | 0.175703 | 0.044765 | 0.529395 |
Example Plots
df.plot.bar();
df.plot.bar(stacked=True);
df.plot.barh(stacked=True);
Use seaborn to change the color palette.
import seaborn as sns
sns.set_palette('seismic')
df.plot.barh(stacked=True);
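If adding seaborn just for colors is not desirable, pandas plotting methods also accept a colormap argument that takes a Matplotlib colormap name. A sketch of the same stacked bar chart under that assumption (the variable name demo is illustrative):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(10, 4), columns=["A", "B", "C", "D"])

# colormap applies to this plot only; it does not change global defaults
ax = demo.plot.barh(stacked=True, colormap="seismic")
```

Unlike sns.set_palette, which changes the default color cycle for every later plot, the colormap argument only affects the call it is passed to.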
df.plot.area();
Parameters can be adjusted just as they would be in Matplotlib or seaborn.
df.plot.area(stacked=False,
alpha=0.25);
df.diff() takes the difference between each row and the row before it, which is helpful when working with time series.
df.diff().plot.box(vert=False,
color={'medians':'lightblue',
'boxes':'blue',
'caps':'darkblue'});
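To make the effect of diff() concrete, here is a tiny worked example on hand-picked numbers (the prices frame is hypothetical and not part of the data above):

```python
import pandas as pd

# Hypothetical values chosen so the differences are easy to check by hand
prices = pd.DataFrame({"price": [10.0, 12.0, 11.0, 15.0]})

# Each row minus the previous row; the first row has no predecessor
changes = prices.diff()
# price column becomes: NaN, 2.0, -1.0, 4.0
```

The first row is NaN because there is nothing to subtract from it, so a differenced frame always has one fewer usable row per column.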
.rolling().mean() computes the mean over a moving window of rows, which smooths out short-term fluctuations.
df = (pd.DataFrame(np.random.rand(100, 1),
columns=['value'])
.reset_index())
df['smoothed'] = df['value'].rolling(3).mean()
df.head()
 | index | value | smoothed |
---|---|---|---|
0 | 0 | 0.868018 | NaN |
1 | 1 | 0.044417 | NaN |
2 | 2 | 0.542460 | 0.484965 |
3 | 3 | 0.467757 | 0.351545 |
4 | 4 | 0.437014 | 0.482410 |
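The NaN values in the first rows appear because a window of 3 needs three observations before it can produce a mean. A small sketch on made-up numbers (values is a hypothetical Series) shows this, along with the min_periods argument for cases where partial windows are acceptable:

```python
import pandas as pd

# Hypothetical values chosen so the means are easy to verify
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Full windows only: the first two entries have fewer than 3 observations
smoothed = values.rolling(3).mean()
# NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1 allows partial windows at the start
smoothed_partial = values.rolling(3, min_periods=1).mean()
# 1.0, 1.5, 2.0, 3.0, 4.0
```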
sns.set_palette('tab10')
df['value'].plot()
df['value'].rolling(10).mean().plot();
All plots accept a figsize=(width, height) argument, with both values in inches.
df['value'].plot(figsize=(9,6))
df['value'].rolling(10).mean().plot(figsize=(9,6));
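When two lines share one chart, an alternative is to set figsize once and pass the returned Axes to the second call through the ax argument. A sketch, assuming a standalone Series named series (a hypothetical stand-in for the 'value' column):

```python
import numpy as np
import pandas as pd

series = pd.Series(np.random.rand(100), name="value")

# figsize is (width, height) in inches and is applied to the figure
ax = series.plot(figsize=(9, 6))

# Passing ax= draws the rolling mean on the same Axes explicitly
series.rolling(10).mean().plot(ax=ax)

# The figure size can be read back from the Axes' parent figure
width, height = ax.figure.get_size_inches()
```

Being explicit about ax avoids relying on which figure happens to be current, which matters more in scripts than in a single notebook cell.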
Other Examples
df = pd.DataFrame(np.random.rand(100, 4), columns=['A','B','C','D'])
df.plot.kde(); # distribution plot
df.plot.scatter(x='A',y='B', # scatterplot x and y
c='C', # color of data points
s=df['C']*200); # size of data points
df.plot.hexbin(x='C',y='D', # hexbin x and y
gridsize=18); # number of hexagons in the x-direction
subplots=True draws each column in its own subplot.
Other possible parameters for pie charts include:
labels=['label1','label2']
colors=['red','green']
autopct='%.2f'
fontsize=20
df = pd.DataFrame(np.random.rand(5, 2),
index=list("ABCDE"),
columns=list("XY"))
df
 | X | Y |
---|---|---|
A | 0.439267 | 0.282618 |
B | 0.201808 | 0.522568 |
C | 0.143325 | 0.742457 |
D | 0.836806 | 0.108267 |
E | 0.613775 | 0.384959 |
df.plot.pie(subplots=True,
figsize=(9, 6));
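The pie chart parameters listed earlier can be combined in a single call. A minimal sketch on a single Series, assuming made-up values (the name shares and the labels and colors are illustrative); note that labels and colors need one entry per wedge:

```python
import pandas as pd

# Hypothetical values that sum to 100 so the percentages are easy to read
shares = pd.Series([40, 35, 25], index=["A", "B", "C"], name="share")

ax = shares.plot.pie(
    labels=["label1", "label2", "label3"],  # one label per wedge
    colors=["red", "green", "blue"],        # one color per wedge
    autopct="%.2f",                         # print each percentage with 2 decimals
    fontsize=12,
    figsize=(6, 6),
)
```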
Line plots are the default with .plot
df = pd.DataFrame(np.random.rand(100, 4),
columns=['A','B','C','D'])
df.plot(subplots=True,
figsize=(16,8));
layout=(2,2) arranges the subplots in a grid of 2 rows and 2 columns.
df.plot(subplots=True,
layout=(2, 2),
figsize=(16,8));
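With subplots=True and a layout, the call returns a NumPy array of Axes shaped like the layout, so individual panels can be addressed by row and column. A sketch, assuming a fresh DataFrame named demo (the titles are illustrative):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(100, 4), columns=["A", "B", "C", "D"])

axes = demo.plot(subplots=True, layout=(2, 2), figsize=(16, 8))

# axes is a 2x2 array of Matplotlib Axes; index by row and column
axes[0, 0].set_title("top left")
axes[1, 1].set_title("bottom right")
```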