Average Bitcoin Returns Analysis

Objective of this anlysis is to determine the distribution of Bitcoin returns over 1, 2, and 4 year intervals.

import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import mplfinance as mpf
import numpy as np
import matplotlib as mpl

Open Raw Data

The data is taken from a tradingview.com export of Bitcoin pricing from the Bitstamp exchange. The data includes BTC pricing info (Open, High, Low, and Close) and volume (daily, Volume, and 20-day simple moving average, Volume MA) for over 3600 days since August 18, 2011.

d = pd.read_csv('average-bitcoin-returns-analysis/BITSTAMP_BTCUSD, 1D.csv')
print('Earliest date: {}\nLatest date: {}\n'.format(d['time'].min(),d['time'].max()))
d.info()
d.head()
Earliest date: 2011-08-18T00:00:00Z
Latest date: 2021-08-02T00:00:00Z

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3605 entries, 0 to 3604
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   time       3605 non-null   object 
 1   open       3605 non-null   float64
 2   high       3605 non-null   float64
 3   low        3605 non-null   float64
 4   close      3605 non-null   float64
 5   EMA        502 non-null    float64
 6   MA         502 non-null    float64
 7   Volume     3605 non-null   float64
 8   Volume MA  3586 non-null   float64
dtypes: float64(8), object(1)
memory usage: 253.6+ KB
time open high low close EMA MA Volume Volume MA
0 2011-08-18T00:00:00Z 10.90 10.90 10.90 10.90 NaN NaN 0.489908 NaN
1 2011-08-19T00:00:00Z 11.85 11.85 11.15 11.69 NaN NaN 1.926578 NaN
2 2011-08-20T00:00:00Z 11.70 11.70 11.70 11.70 NaN NaN 0.085470 NaN
3 2011-08-21T00:00:00Z 11.70 11.70 11.70 11.70 NaN NaN 0.085470 NaN
4 2011-08-22T00:00:00Z 11.70 11.70 11.70 11.70 NaN NaN 0.028632 NaN

Basic Cleansing

  • Convert “time” to “date”
  • Identify any missing days by merging into a complete list of dates from the start to end date
  • Subset to just the minimally necessary columns
d['date'] = pd.to_datetime(d['time'])
complete_date_list = pd.DataFrame(
    pd.date_range(d['date'].min(), 
                  d['date'].max()),
    columns=['date'])
print(d.shape)
d = (pd.merge(complete_date_list, d,
              on=['date'],
              how='left')
     .reset_index(drop=True))
print(d.shape)
(3605, 10)
(3638, 10)
d = (d[['date','close']]
     .reset_index(drop=True))
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3638 entries, 0 to 3637
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype              
---  ------  --------------  -----              
 0   date    3638 non-null   datetime64[ns, UTC]
 1   close   3605 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 57.0 KB

Exploration

  • A total of 33 days are missing pricing information. This is less than 1% of the total values in the dataset, and 3033 were in 2011.
d[d['close'].isna()]['date'].dt.year.value_counts()
2011    30
2015     3
Name: date, dtype: int64

Calculate Returns

Use a 364-day year. Subtract 1 so that a return of 0.1 means a 10% return, and a return of -0.1 means a 10% loss.

d['1-year prior close'] = d.shift(364)['close']
d['2-year prior close'] = d.shift(364 * 2)['close']
d['4-year prior close'] = d.shift(364 * 4)['close']
d['1-year return'] = (d['close']/d['1-year prior close'] - 1) * 100.
d['2-year return'] = (d['close']/d['2-year prior close'] - 1) * 100.
d['4-year return'] = (d['close']/d['4-year prior close'] - 1) * 100.
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3638 entries, 0 to 3637
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   date                3638 non-null   datetime64[ns, UTC]
 1   close               3605 non-null   float64            
 2   1-year prior close  3241 non-null   float64            
 3   2-year prior close  2877 non-null   float64            
 4   4-year prior close  2149 non-null   float64            
 5   1-year return       3238 non-null   float64            
 6   2-year return       2874 non-null   float64            
 7   4-year return       2149 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(7)
memory usage: 227.5 KB

Visualize

np.clip in the call to the histogram function, below, clips the extreme values to be in the smallest/largest bins.

d[~d['4-year prior close'].isna()].head()
date close 1-year prior close 2-year prior close 4-year prior close 1-year return 2-year return 4-year return
1456 2015-08-13 00:00:00+00:00 264.48 507.99 98.08 10.90 -47.935983 169.657423 2326.422018
1457 2015-08-14 00:00:00+00:00 265.40 500.25 98.25 11.69 -46.946527 170.127226 2170.316510
1458 2015-08-15 00:00:00+00:00 261.46 521.97 99.71 11.70 -49.908999 162.220439 2134.700855
1459 2015-08-16 00:00:00+00:00 256.11 496.86 99.30 11.70 -48.454293 157.915408 2088.974359
1460 2015-08-17 00:00:00+00:00 256.47 475.22 102.85 11.70 -46.031312 149.363150 2092.051282
d[~d['4-year prior close'].isna()].tail()
date close 1-year prior close 2-year prior close 4-year prior close 1-year return 2-year return 4-year return
3633 2021-07-29 00:00:00+00:00 40037.51 11112.12 10407.71 2793.37 260.304874 284.690869 1333.304933
3634 2021-07-30 00:00:00+00:00 42234.89 11356.74 10529.51 2855.00 271.892726 301.109738 1379.330648
3635 2021-07-31 00:00:00+00:00 41490.13 11817.49 10821.52 3263.62 251.090883 283.403903 1171.291694
3636 2021-08-01 00:00:00+00:00 39839.00 11071.66 10986.96 3222.75 259.828607 262.602576 1136.180281
3637 2021-08-02 00:00:00+00:00 39335.62 11236.39 11800.00 3387.55 250.073467 233.352712 1061.181975
which_period = '1-year return'
plt.figure(figsize=(16,9))
bins=np.arange(-100,1100,50)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 1-Year Holding Period
{} Days with available 1-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 10,
                   str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                                                 .round(1)) + '%',
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png

which_period = '2-year return'
plt.figure(figsize=(16,9))
bins=np.arange(-100,3200,100)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 2-Year Holding Period
{} Days with available 2-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 15,
                   str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                                                 .round(1)) + '%',
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png

which_period = '4-year return'
plt.figure(figsize=(16,9))
bins=np.arange(0,11000,500)
plt.hist(np.clip(d[which_period], bins[0], bins[-1]),
         bins=bins)
plt.grid()
for spine in plt.gca().spines.values():
    spine.set_visible(False)
plt.ylabel('Days')
plt.xlabel('Returns (%)')
plt.title('''Distribution of BTC Returns over a 4-Year Holding Period
{} Days with available 4-year return data, Median Return = {}%'''
          .format(d[~d[which_period].isna()].shape[0],
                  int(round(d[which_period].median(),0))))
rects = plt.gca().patches
for n, r in enumerate(rects):
    height = r.get_height()
    plt.gca().text(r.get_x() + r.get_width() / 2,
                   height + 15,
                   str(int(height)) + '\n' + str(((height/d[~d[which_period].isna()].shape[0])*100.)
                                                 .round(1)) + '%',
                   ha='center',
                   va='center')
plt.axvline(d[which_period].median(),
            c='C1');

png