Average Bitcoin Returns Analysis
Objective of this anlysis is to determine the distribution of Bitcoin returns over 1, 2, and 4 year intervals.
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
import numpy as np
import matplotlib as mpl
Open Raw Data
The data is taken from a tradingview.com export of Bitcoin pricing from the Bitstamp exchange. The data includes BTC pricing info (Open
, High
, Low
, and Close
) and volume (daily, Volume
, and 20-day simple moving average, Volume MA
) for over 3600 days since August 18, 2011.
d = pd.read_csv('average-bitcoin-returns-analysis/BITSTAMP_BTCUSD, 1D.csv')
print('Earliest date: {}\nLatest date: {}\n'.format(d['time'].min(),d['time'].max()))
d.info()
d.head()
Earliest date: 2011-08-18T00:00:00Z
Latest date: 2021-08-02T00:00:00Z
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3605 entries, 0 to 3604
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 3605 non-null object
1 open 3605 non-null float64
2 high 3605 non-null float64
3 low 3605 non-null float64
4 close 3605 non-null float64
5 EMA 502 non-null float64
6 MA 502 non-null float64
7 Volume 3605 non-null float64
8 Volume MA 3586 non-null float64
dtypes: float64(8), object(1)
memory usage: 253.6+ KB
time | open | high | low | close | EMA | MA | Volume | Volume MA | |
---|---|---|---|---|---|---|---|---|---|
0 | 2011-08-18T00:00:00Z | 10.90 | 10.90 | 10.90 | 10.90 | NaN | NaN | 0.489908 | NaN |
1 | 2011-08-19T00:00:00Z | 11.85 | 11.85 | 11.15 | 11.69 | NaN | NaN | 1.926578 | NaN |
2 | 2011-08-20T00:00:00Z | 11.70 | 11.70 | 11.70 | 11.70 | NaN | NaN | 0.085470 | NaN |
3 | 2011-08-21T00:00:00Z | 11.70 | 11.70 | 11.70 | 11.70 | NaN | NaN | 0.085470 | NaN |
4 | 2011-08-22T00:00:00Z | 11.70 | 11.70 | 11.70 | 11.70 | NaN | NaN | 0.028632 | NaN |
Basic Cleansing
- Convert “time” to “date”
- Identify any missing days by merging into a complete list of dates from the start to end date
- Subset to just the minimally necessary columns
d['date'] = pd.to_datetime(d['time'])
complete_date_list = pd.DataFrame(
pd.date_range(d['date'].min(),
d['date'].max()),
columns=['date'])
print(d.shape)
d = (pd.merge(complete_date_list, d,
on=['date'],
how='left')
.reset_index(drop=True))
print(d.shape)
(3605, 10)
(3638, 10)
d = (d[['date','close']]
.reset_index(drop=True))
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3638 entries, 0 to 3637
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 3638 non-null datetime64[ns, UTC]
1 close 3605 non-null float64
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 57.0 KB
Exploration
- A total of 33 days are missing pricing information. This is less than 1% of the total values in the dataset, and 30⁄33 were in 2011.
d[d['close'].isna()]['date'].dt.year.value_counts()
2011 30
2015 3
Name: date, dtype: int64
Calculate Returns
Use a 364-day year. Subtract 1 so that a return of 0.1 means a 10% return, and a return of -0.1 means a 10% loss.
d['1-year prior close'] = d.shift(364)['close']
d['2-year prior close'] = d.shift(364 * 2)['close']
d['4-year prior close'] = d.shift(364 * 4)['close']
d['1-year return'] = (d['close']/d['1-year prior close'] - 1) * 100.
d['2-year return'] = (d['close']/d['2-year prior close'] - 1) * 100.
d['4-year return'] = (d['close']/d['4-year prior close'] - 1) * 100.
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3638 entries, 0 to 3637
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 3638 non-null datetime64[ns, UTC]
1 close 3605 non-null float64
2 1-year prior close 3241 non-null float64
3 2-year prior close 2877 non-null float64
4 4-year prior close 2149 non-null float64
5 1-year return 3238 non-null float64
6 2-year return 2874 non-null float64
7 4-year return 2149 non-null float64
dtypes: datetime64[ns, UTC](1), float64(7)
memory usage: 227.5 KB
Visualize
np.clip
in the call to the histogram function, below, clips the extreme values to be in the smallest/largest bins.
d[~d['4-year prior close'].isna()].head()
date | close | 1-year prior close | 2-year prior close | 4-year prior close | 1-year return | 2-year return | 4-year return | |
---|---|---|---|---|---|---|---|---|
1456 | 2015-08-13 00:00:00+00:00 | 264.48 | 507.99 | 98.08 | 10.90 | -47.935983 | 169.657423 | 2326.422018 |
1457 | 2015-08-14 00:00:00+00:00 | 265.40 | 500.25 | 98.25 | 11.69 | -46.946527 | 170.127226 | 2170.316510 |
1458 | 2015-08-15 00:00:00+00:00 | 261.46 | 521.97 | 99.71 | 11.70 | -49.908999 | 162.220439 | 2134.700855 |
1459 | 2015-08-16 00:00:00+00:00 | 256.11 | 496.86 | 99.30 | 11.70 | -48.454293 | 157.915408 | 2088.974359 |
1460 | 2015-08-17 00:00:00+00:00 | 256.47 | 475.22 | 102.85 | 11.70 | -46.031312 | 149.363150 | 2092.051282 |
d[~d['4-year prior close'].isna()].tail()
date | close | 1-year prior close | 2-year prior close | 4-year prior close | 1-year return | 2-year return | 4-year return | |
---|---|---|---|---|---|---|---|---|
3633 | 2021-07-29 00:00:00+00:00 | 40037.51 | 11112.12 | 10407.71 | 2793.37 | 260.304874 | 284.690869 | 1333.304933 |
3634 | 2021-07-30 00:00:00+00:00 | 42234.89 | 11356.74 | 10529.51 | 2855.00 | 271.892726 | 301.109738 | 1379.330648 |
3635 | 2021-07-31 00:00:00+00:00 | 41490.13 | 11817.49 | 10821.52 | 3263.62 | 251.090883 | 283.403903 | 1171.291694 |
3636 | 2021-08-01 00:00:00+00:00 | 39839.00 | 11071.66 | 10986.96 | 3222.75 | 259.828607 | 262.602576 | 1136.180281 |
3637 | 2021-08-02 00:00:00+00:00 | 39335.62 | 11236.39 | 11800.00 | 3387.55 | 250.073467 | 233.352712 | 1061.181975 |
which_period = '1-year return'
plt.figure(figsize=(16,9))
bins=np.arange(-100,1100,50)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 1-Year Holding Period
{} Days with available 1-year return data, Median Return = {}%'''
.format(d[~d[which_period].isna()].shape[0],
int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
height = r.get_height()
plt.gca().text(r.get_x() + r.get_width() / 2,
height + 10,
str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
.round(1)) + '%',
ha='center',
va='center')
plt.axvline(d[which_period].median(),
c='C1');
which_period = '2-year return'
plt.figure(figsize=(16,9))
bins=np.arange(-100,3200,100)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 2-Year Holding Period
{} Days with available 2-year return data, Median Return = {}%'''
.format(d[~d[which_period].isna()].shape[0],
int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
height = r.get_height()
plt.gca().text(r.get_x() + r.get_width() / 2,
height + 15,
str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
.round(1)) + '%',
ha='center',
va='center')
plt.axvline(d[which_period].median(),
c='C1');
which_period = '4-year return'
plt.figure(figsize=(16,9))
bins=np.arange(0,11000,500)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 4-Year Holding Period
{} Days with available 4-year return data, Median Return = {}%'''
.format(d[~d[which_period].isna()].shape[0],
int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
height = r.get_height()
plt.gca().text(r.get_x() + r.get_width() / 2,
height + 15,
str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
.round(1)) + '%',
ha='center',
va='center')
plt.axvline(d[which_period].median(),
c='C1');