Disclaimer: Any information found on this page is not to be considered as
financial advice. You should do your own research before making any decisions.

Historical Bitcoin Returns Analysis

Objective of this analysis is to determine the distribution of Bitcoin returns over 1, 2, and 4 year holding intervals.

The outcome of the analysis is the set of three visuals, shown below. pngpngpng

Open Raw Data

Data from August 18, 2011 until August 2, 2021 was taken from a tradingview.com export of Bitcoin pricing from the Bitstamp exchange. Data after August 2, 2021 was taken from an API pull of pricing data from CoinAPI.

import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
d = pd.read_csv('historical-bitcoin-returns-analysis/btc_daily_closing_price.csv')
print('Earliest date: {}\nLatest date: {}\n'.format(d['time'].min(),d['time'].max()))
d.info()
d.head()
Earliest date: 2011-08-18T00:00:00Z
Latest date: 2022-09-24T00:00:00.0000000Z

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4023 entries, 0 to 4022
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   time    4023 non-null   object 
 1   close   4023 non-null   float64
dtypes: float64(1), object(1)
memory usage: 63.0+ KB

time close
0 2011-08-18T00:00:00Z 10.90
1 2011-08-19T00:00:00Z 11.69
2 2011-08-20T00:00:00Z 11.70
3 2011-08-21T00:00:00Z 11.70
4 2011-08-22T00:00:00Z 11.70

Basic Cleansing

  • Convert “time” to “date”
  • Identify any missing days by merging into a complete list of dates from the start to end date
  • Subset to just the minimally necessary columns
d['date'] = pd.to_datetime(d['time'])
complete_date_list = pd.DataFrame(
    pd.date_range(d['date'].min(), 
                  d['date'].max()),
    columns=['date'])
print(d.shape)
d = (pd.merge(complete_date_list, d,
              on=['date'],
              how='left')
     .reset_index(drop=True))
print(d.shape)
(4023, 3)
(4056, 3)
d = d.drop(columns=['time'])

Exploration

  • A total of 33 days are missing pricing information. This is less than 1% of the total values in the dataset, and 30/33 were in 2011. Clean by filling the values forward.
d[d['close'].isna()]['date'].dt.year.value_counts()
2011    30
2015     3
Name: date, dtype: int64
d['close'] = d['close'].fillna(method='ffill')
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4056 entries, 0 to 4055
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype              
---  ------  --------------  -----              
 0   date    4056 non-null   datetime64[ns, UTC]
 1   close   4056 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 63.5 KB

Calculate Returns

Use a 364-day year. Subtract 1 so that a return of 0.1 means a 10% return, and a return of -0.1 means a 10% loss.

d['1-year prior close'] = d.shift(364)['close']
d['2-year prior close'] = d.shift(364 * 2)['close']
d['4-year prior close'] = d.shift(364 * 4)['close']
d['1-year return'] = (d['close']/d['1-year prior close'] - 1) * 100.
d['2-year return'] = (d['close']/d['2-year prior close'] - 1) * 100.
d['4-year return'] = (d['close']/d['4-year prior close'] - 1) * 100.
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4056 entries, 0 to 4055
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   date                4056 non-null   datetime64[ns, UTC]
 1   close               4056 non-null   float64            
 2   1-year prior close  3692 non-null   float64            
 3   2-year prior close  3328 non-null   float64            
 4   4-year prior close  2600 non-null   float64            
 5   1-year return       3692 non-null   float64            
 6   2-year return       3328 non-null   float64            
 7   4-year return       2600 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(7)
memory usage: 253.6 KB

Visualize

np.clip in the call to the histogram function, below, clips the extreme values to be in the smallest/largest bins.

d[~d['4-year prior close'].isna()].head()

date close 1-year prior close 2-year prior close 4-year prior close 1-year return 2-year return 4-year return
1456 2015-08-13 00:00:00+00:00 264.48 507.99 98.08 10.90 -47.935983 169.657423 2326.422018
1457 2015-08-14 00:00:00+00:00 265.40 500.25 98.25 11.69 -46.946527 170.127226 2170.316510
1458 2015-08-15 00:00:00+00:00 261.46 521.97 99.71 11.70 -49.908999 162.220439 2134.700855
1459 2015-08-16 00:00:00+00:00 256.11 496.86 99.30 11.70 -48.454293 157.915408 2088.974359
1460 2015-08-17 00:00:00+00:00 256.47 475.22 102.85 11.70 -46.031312 149.363150 2092.051282
d[~d['4-year prior close'].isna()].tail()

date close 1-year prior close 2-year prior close 4-year prior close 1-year return 2-year return 4-year return
4051 2022-09-20 00:00:00+00:00 19539.040269 42820.865901 10534.31 6428.99 -54.370282 85.480020 203.920838
4052 2022-09-21 00:00:00+00:00 18874.158632 40720.098608 10233.42 6461.51 -53.649035 84.436470 192.101361
4053 2022-09-22 00:00:00+00:00 18500.246285 43568.796718 10744.02 6681.62 -57.537854 72.191101 176.882646
4054 2022-09-23 00:00:00+00:00 19402.480290 44892.151371 10689.81 6610.76 -56.779794 81.504445 193.498483
4055 2022-09-24 00:00:00+00:00 19400.584671 42836.736645 10729.09 6579.38 -54.710405 80.822275 194.869496
which_period = '1-year return'
plt.figure(figsize=(16,9), facecolor='white')
bins=np.arange(-100,1100,50)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 1-Year Holding Period
{} Days with available 1-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 10,
                   (str(int(height)) + '\n' 
                    + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                          .round(1)) + '%'),
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png

which_period = '2-year return'
plt.figure(figsize=(16,9), facecolor='white')
bins=np.arange(-100,3200,100)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 2-Year Holding Period
{} Days with available 2-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 15,
                   (str(int(height)) + '\n' 
                    + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                          .round(1)) + '%'),
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png

which_period = '4-year return'
plt.figure(figsize=(16,9), facecolor='white')
bins=np.arange(0,11000,500)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 4-Year Holding Period
{} Days with available 4-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 15,
                   (str(int(height)) + '\n' 
                    + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                          .round(1)) + '%'),
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png