# Processing GPS Data

The GPS data manipulation steps below require the GPX Converter Python package and are inspired by this blog post and this GitHub repo by Jarrett Retz.

The GPS data itself is from my backpacking trip in Big Sur from 8/20/21 to 8/22/21. It was tracked using the Gaia GPS app on my iPhone 12 Pro and then downloaded as GPX files from gaiagps.com.

import pandas as pd
from gpx_converter import Converter


The trip comprised 6 different “tracks”, or hikes, over 3 days and 2 nights. Each file contains one track.

files = [
    '../backpacking-trips/big-sur-2021-08-20-thru-22/salmon-creek-trailhead-to-spruce-camp-82021-91230am.gpx',
    '../backpacking-trips/big-sur-2021-08-20-thru-22/spruce-camp-to-estrella-camp-82021-112028am.gpx',
    '../backpacking-trips/big-sur-2021-08-20-thru-22/estrella-camp-to-lions-den-camp.gpx',
    '../backpacking-trips/big-sur-2021-08-20-thru-22/lions-den-camp-to-cruikshank-camp.gpx',
    '../backpacking-trips/big-sur-2021-08-20-thru-22/to-north-buckeye-camp.gpx',
    '../backpacking-trips/big-sur-2021-08-20-thru-22/track-82221-81723am.gpx'
]


I also use a Python module called Haversine to calculate the distance between two geolocations (a geolocation is a latitude/longitude pair). The haversine distance is the great-circle distance between two locations on the Earth’s surface, computed from the angular separation between them. I discovered it through this article on Towards Data Science.

from haversine import haversine, Unit


Example usage:

loc1=(35.815768,-121.358696)
loc2=(35.815689,-121.358611)
print('{:2.1f} feet'.format(haversine(loc1,loc2,unit=Unit.FEET)))

38.2 feet
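To make the calculation less of a black box, here is a minimal sketch of the haversine formula using only the standard library. The function name `haversine_feet` and the two constants are my own choices for illustration (the mean Earth radius is an assumed value), so treat this as an approximation of what the package computes rather than its actual source.

```python
import math

EARTH_RADIUS_M = 6371008.8    # assumed mean Earth radius in meters
FEET_PER_METER = 3.280839895  # same conversion factor used later for altitude


def haversine_feet(p1, p2):
    """Great-circle distance in feet between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(h))
    return EARTH_RADIUS_M * central_angle * FEET_PER_METER


loc1 = (35.815768, -121.358696)
loc2 = (35.815689, -121.358611)
print('{:2.1f} feet'.format(haversine_feet(loc1, loc2)))
```

For these two points the result should land within a fraction of a foot of the value printed above.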


## Examine raw data from GPX file

raw = (Converter(input_file=files[0])
.gpx_to_dataframe())
raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 4 columns):
#   Column     Non-Null Count  Dtype
---  ------     --------------  -----
0   time       510 non-null    datetime64[ns, SimpleTZ("Z")]
1   latitude   510 non-null    float64
2   longitude  510 non-null    float64
3   altitude   510 non-null    float64
dtypes: datetime64[ns, SimpleTZ("Z")](1), float64(3)
memory usage: 16.1 KB
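The four columns above correspond to the fields of a GPX track point. As a rough illustration, the sketch below parses a small GPX 1.1 snippet with the standard library. The snippet is hypothetical: its values mirror the first two rows of this track, but the XML layout is assumed from the GPX 1.1 schema rather than copied from the Gaia GPS export, and this is not necessarily how GPX Converter reads files internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-point GPX 1.1 snippet (layout assumed, not taken from the trip files)
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="35.815768" lon="-121.358696"><ele>94.4</ele><time>2021-08-20T16:12:31Z</time></trkpt>
    <trkpt lat="35.815689" lon="-121.358611"><ele>93.0</ele><time>2021-08-20T16:12:41Z</time></trkpt>
  </trkseg></trk>
</gpx>"""

NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}


def parse_track_points(gpx_text):
    """Return one dict per <trkpt> with time, latitude, longitude, and altitude."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind('.//gpx:trkpt', NS):
        points.append({
            'time': pt.findtext('gpx:time', namespaces=NS),
            'latitude': float(pt.get('lat')),
            'longitude': float(pt.get('lon')),
            'altitude': float(pt.findtext('gpx:ele', namespaces=NS)),  # meters
        })
    return points


points = parse_track_points(SAMPLE_GPX)
for p in points:
    print(p)
```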


|   | time | latitude | longitude | altitude |
|---|---|---|---|---|
| 0 | 2021-08-20 16:12:31+00:00 | 35.815768 | -121.358696 | 94.4 |
| 1 | 2021-08-20 16:12:41+00:00 | 35.815689 | -121.358611 | 93.0 |
| 2 | 2021-08-20 16:12:47+00:00 | 35.815626 | -121.358562 | 94.8 |
| 3 | 2021-08-20 16:12:55+00:00 | 35.815519 | -121.358497 | 92.0 |
| 4 | 2021-08-20 16:13:08+00:00 | 35.815450 | -121.358477 | 88.0 |

# Time elapsed until the next measurement; the last row has no successor, so it gets 0
raw['time_delta'] = raw['time'].shift(-1) - raw['time']
raw['time_delta_seconds'] = (raw['time_delta']
    .fillna(pd.Timedelta(seconds=0))
    .dt.total_seconds()
    .astype(int))
raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 6 columns):
#   Column              Non-Null Count  Dtype
---  ------              --------------  -----
0   time                510 non-null    datetime64[ns, SimpleTZ("Z")]
1   latitude            510 non-null    float64
2   longitude           510 non-null    float64
3   altitude            510 non-null    float64
4   time_delta          509 non-null    timedelta64[ns]
5   time_delta_seconds  510 non-null    int64
dtypes: datetime64[ns, SimpleTZ("Z")](1), float64(3), int64(1), timedelta64[ns](1)
memory usage: 24.0 KB


|   | time | latitude | longitude | altitude | time_delta | time_delta_seconds |
|---|---|---|---|---|---|---|
| 0 | 2021-08-20 16:12:31+00:00 | 35.815768 | -121.358696 | 94.4 | 0 days 00:00:10 | 10 |
| 1 | 2021-08-20 16:12:41+00:00 | 35.815689 | -121.358611 | 93.0 | 0 days 00:00:06 | 6 |
| 2 | 2021-08-20 16:12:47+00:00 | 35.815626 | -121.358562 | 94.8 | 0 days 00:00:08 | 8 |
| 3 | 2021-08-20 16:12:55+00:00 | 35.815519 | -121.358497 | 92.0 | 0 days 00:00:13 | 13 |
| 4 | 2021-08-20 16:13:08+00:00 | 35.815450 | -121.358477 | 88.0 | 0 days 00:00:17 | 17 |

print('Average time delta between GPS measurements: {} seconds'.format(raw['time_delta_seconds'].mean()))
raw['time_delta_seconds'].hist(bins=raw['time_delta_seconds'].max());

Average time delta between GPS measurements: 9.4 seconds
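The same "time until the next fix" idea can be checked by hand without pandas. This is a small sketch using the first five timestamps from the table above; appending 0 for the last point mirrors filling the missing final delta with zero. With only five points the last delta is 0, whereas row 4 of the full track shows 17 because it has a successor.

```python
from datetime import datetime

# First five timestamps of the track (UTC), as shown in the table above
timestamps = [datetime.fromisoformat(t) for t in [
    '2021-08-20 16:12:31+00:00',
    '2021-08-20 16:12:41+00:00',
    '2021-08-20 16:12:47+00:00',
    '2021-08-20 16:12:55+00:00',
    '2021-08-20 16:13:08+00:00',
]]

# Pair each timestamp with its successor and take the difference in whole seconds
deltas = [int((nxt - cur).total_seconds())
          for cur, nxt in zip(timestamps, timestamps[1:])]
deltas.append(0)  # the last point has no successor

print(deltas)  # [10, 6, 8, 13, 0]
```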


## Develop Data Manipulation Steps

test = Converter(input_file=files[0]).gpx_to_dataframe()

# Convert UTC timestamps to US Pacific time and add formatted date and time columns
test['time'] = test['time'].apply(lambda x: x.tz_convert('US/Pacific'))
test['seconds_delta'] = ((test['time'].shift(-1) - test['time'])
    .fillna(pd.Timedelta(seconds=0))
    .dt.total_seconds()
    .astype(int))
test['human_date'] = test['time'].dt.strftime('%Y-%m-%d')
test['human_time'] = test['time'].dt.strftime('%I:%M:%S %p')


|   | time | human_date | human_time | seconds_delta |
|---|---|---|---|---|
| 0 | 2021-08-20 09:12:31-07:00 | 2021-08-20 | 09:12:31 AM | 10 |
| 1 | 2021-08-20 09:12:41-07:00 | 2021-08-20 | 09:12:41 AM | 6 |
| 2 | 2021-08-20 09:12:47-07:00 | 2021-08-20 | 09:12:47 AM | 8 |
| 3 | 2021-08-20 09:12:55-07:00 | 2021-08-20 | 09:12:55 AM | 13 |
| 4 | 2021-08-20 09:13:08-07:00 | 2021-08-20 | 09:13:08 AM | 17 |

# Convert altitude from meters to feet
test['altitude_feet'] = (test['altitude'] * 3.280839895).round().astype('int')


|   | time | altitude | altitude_feet |
|---|---|---|---|
| 0 | 2021-08-20 09:12:31-07:00 | 94.4 | 310 |
| 1 | 2021-08-20 09:12:41-07:00 | 93.0 | 305 |
| 2 | 2021-08-20 09:12:47-07:00 | 94.8 | 311 |
| 3 | 2021-08-20 09:12:55-07:00 | 92.0 | 302 |
| 4 | 2021-08-20 09:13:08-07:00 | 88.0 | 289 |

# Calculate distance and altitude change from one measurement to the next
for i in range(test.shape[0] - 1):
    start = test.at[i, 'latitude'], test.at[i, 'longitude']
    end = test.at[i + 1, 'latitude'], test.at[i + 1, 'longitude']
    test.at[i, 'distance_feet'] = round(haversine(start, end, unit=Unit.FEET), 1)
    test.at[i, 'altitude_change'] = test.at[i + 1, 'altitude_feet'] - test.at[i, 'altitude_feet']

# Convert feet per second to miles per hour
test['speed_mph'] = ((test['distance_feet'] / test['seconds_delta']) * (3600 / 5280)).round(1)


|   | time | distance_feet | seconds_delta | speed_mph | altitude_feet | altitude_change |
|---|---|---|---|---|---|---|
| 0 | 2021-08-20 09:12:31-07:00 | 38.2 | 10 | 2.6 | 310 | -5.0 |
| 1 | 2021-08-20 09:12:41-07:00 | 27.2 | 6 | 3.1 | 305 | 6.0 |
| 2 | 2021-08-20 09:12:47-07:00 | 43.5 | 8 | 3.7 | 311 | -9.0 |
| 3 | 2021-08-20 09:12:55-07:00 | 25.9 | 13 | 1.4 | 302 | -13.0 |
| 4 | 2021-08-20 09:13:08-07:00 | 20.4 | 17 | 0.8 | 289 | 6.0 |

print('Average speed: {} mph'.format(round(test['speed_mph'].mean(), 2)))
test['speed_mph'].hist(bins=20);

Average speed: 2.09 mph
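As a sanity check on the unit conversion, the per-segment speeds can be reproduced by hand from the distance and time columns above. The helper name `speed_mph` and the guard for a zero time delta are my additions for this sketch; the pandas version above does not include such a guard.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def speed_mph(distance_feet, seconds):
    """Convert a distance in feet covered in `seconds` to miles per hour."""
    if seconds == 0:
        return None  # no elapsed time, so speed is undefined (guard added for this sketch)
    return round(distance_feet / seconds * SECONDS_PER_HOUR / FEET_PER_MILE, 1)


# (distance_feet, seconds_delta) for the first five segments of the track
segments = [(38.2, 10), (27.2, 6), (43.5, 8), (25.9, 13), (20.4, 17)]
speeds = [speed_mph(d, s) for d, s in segments]
print(speeds)  # [2.6, 3.1, 3.7, 1.4, 0.8]
```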


## Define Function to Quickly Perform this Manipulation

def transform_gpx_data(filename):
    """Load a GPX track and add local time, distance, speed, and altitude columns."""
    df = Converter(input_file=filename).gpx_to_dataframe()

    # Convert UTC timestamps to US Pacific time and add formatted date and time columns
    df['time'] = df['time'].apply(lambda x: x.tz_convert('US/Pacific'))
    df['seconds_delta'] = ((df['time'].shift(-1) - df['time'])
        .fillna(pd.Timedelta(seconds=0))
        .dt.total_seconds()
        .astype(int))
    df['human_date'] = df['time'].dt.strftime('%Y-%m-%d')
    df['human_time'] = df['time'].dt.strftime('%I:%M:%S %p')

    # Convert altitude from meters to feet
    df['altitude_feet'] = (df['altitude'] * 3.280839895).round().astype('int')

    # Distance and altitude change from each measurement to the next
    for i in range(df.shape[0] - 1):
        start = df.at[i, 'latitude'], df.at[i, 'longitude']
        end = df.at[i + 1, 'latitude'], df.at[i + 1, 'longitude']
        df.at[i, 'distance_feet'] = round(haversine(start, end, unit=Unit.FEET), 1)
        df.at[i, 'altitude_change'] = df.at[i + 1, 'altitude_feet'] - df.at[i, 'altitude_feet']

    # Convert feet per second to miles per hour
    df['speed_mph'] = ((df['distance_feet'] / df['seconds_delta']) * (3600 / 5280)).round(1)

    df = df[['time', 'human_date', 'human_time', 'seconds_delta',
             'latitude', 'longitude', 'altitude', 'altitude_feet',
             'distance_feet', 'speed_mph', 'altitude_change']].copy()
    return df

d = transform_gpx_data(files[0])
d.iloc[:3].T


|   | 0 | 1 | 2 |
|---|---|---|---|
| time | 2021-08-20 09:12:31-07:00 | 2021-08-20 09:12:41-07:00 | 2021-08-20 09:12:47-07:00 |
| human_date | 2021-08-20 | 2021-08-20 | 2021-08-20 |
| human_time | 09:12:31 AM | 09:12:41 AM | 09:12:47 AM |
| seconds_delta | 10 | 6 | 8 |
| latitude | 35.815768 | 35.815689 | 35.815626 |
| longitude | -121.358696 | -121.358611 | -121.358562 |
| altitude | 94.4 | 93.0 | 94.8 |
| altitude_feet | 310 | 305 | 311 |
| distance_feet | 38.2 | 27.2 | 43.5 |
| speed_mph | 2.6 | 3.1 | 3.7 |
| altitude_change | -5.0 | 6.0 | -9.0 |

d.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 11 columns):
#   Column           Non-Null Count  Dtype
---  ------           --------------  -----
0   time             510 non-null    datetime64[ns, US/Pacific]
1   human_date       510 non-null    object
2   human_time       510 non-null    object
3   seconds_delta    510 non-null    int64
4   latitude         510 non-null    float64
5   longitude        510 non-null    float64
6   altitude         510 non-null    float64
7   altitude_feet    510 non-null    int64
8   distance_feet    509 non-null    float64
9   speed_mph        509 non-null    float64
10  altitude_change  509 non-null    float64
dtypes: datetime64[ns, US/Pacific](1), float64(6), int64(2), object(2)
memory usage: 44.0+ KB
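The loop inside `transform_gpx_data` pairs each point with its successor by index. In plain Python the same pairing can be written with `zip`, which may make the pattern easier to see. The sketch below applies it to the first five altitude readings from this track; the variable names are mine and it is only an illustration of the idea, not a replacement for the function above.

```python
FEET_PER_METER = 3.280839895

# First five altitude readings of the track, in meters
altitudes_m = [94.4, 93.0, 94.8, 92.0, 88.0]

# Convert to whole feet, then difference each reading against the next one
altitudes_ft = [round(a * FEET_PER_METER) for a in altitudes_m]
changes = [nxt - cur for cur, nxt in zip(altitudes_ft, altitudes_ft[1:])]

print(altitudes_ft)  # [310, 305, 311, 302, 289]
print(changes)       # [-5, 6, -9, -13]
```

Because `zip` stops at the shorter sequence, the last point simply produces no change value, which matches the 509 non-null entries in a 510-row track.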


### Call Function Over All 6 Tracks

Run the function on each track and print a summary.

for file in files:
    print(file)
    d = transform_gpx_data(file)
    first, last = d.iloc[0], d.iloc[-1]
    # A Timedelta prints as '0 days HH:MM:SS'; characters 7 through 11 are HH:MM
    duration = str(d['time'].max() - d['time'].min())[7:12]
    string = '\n'.join([
        '{}, {} - {}'.format(d['human_date'].min(), first['human_time'], last['human_time']),
        'Start: ({}, {}),   End: ({}, {})'.format(first['latitude'], first['longitude'],
                                                   last['latitude'], last['longitude']),
        '{} GPS datapoints'.format(d.shape[0]),
        '{} duration'.format(duration),
        '{:3.2f} miles @ {:3.1f} avg MPH'.format(d['distance_feet'].sum() / 5280,
                                                  d['speed_mph'].mean()),
        '{:3.0f}/{:3.0f} feet total/net elevation change'.format(d['altitude_change'].abs().sum(),
                                                                 d['altitude_change'].sum()),
    ])
    print(string + '\n')

../backpacking-trips/big-sur-2021-08-20-thru-22/salmon-creek-trailhead-to-spruce-camp-82021-91230am.gpx
2021-08-20, 09:12:31 AM - 10:32:25 AM
Start: (35.815768, -121.358696),   End: (35.82598, -121.344968)
510 GPS datapoints
01:19 duration
2.36 miles @ 2.1 avg MPH
2931/637 feet total/net elevation change

../backpacking-trips/big-sur-2021-08-20-thru-22/spruce-camp-to-estrella-camp-82021-112028am.gpx
2021-08-20, 11:20:29 AM - 12:04:52 PM
Start: (35.826085, -121.344949),   End: (35.836589, -121.338499)
311 GPS datapoints
00:44 duration
1.36 miles @ 2.1 avg MPH
1827/593 feet total/net elevation change

../backpacking-trips/big-sur-2021-08-20-thru-22/estrella-camp-to-lions-den-camp.gpx
2021-08-20, 01:46:40 PM - 06:06:54 PM
Start: (35.836574, -121.338406),   End: (35.858015, -121.338537)
594 GPS datapoints
04:20 duration
3.51 miles @ 2.0 avg MPH
4191/1457 feet total/net elevation change

../backpacking-trips/big-sur-2021-08-20-thru-22/lions-den-camp-to-cruikshank-camp.gpx
2021-08-21, 08:53:39 AM - 11:54:47 AM
Start: (35.858349, -121.338616),   End: (35.856667, -121.38415)
921 GPS datapoints
03:01 duration
5.03 miles @ 2.1 avg MPH
5722/-1690 feet total/net elevation change

../backpacking-trips/big-sur-2021-08-20-thru-22/to-north-buckeye-camp.gpx
2021-08-21, 01:04:18 PM - 02:52:32 PM
Start: (35.85661, -121.384024),   End: (35.840788, -121.378364)
523 GPS datapoints
01:48 duration
2.62 miles @ 1.9 avg MPH
3401/641 feet total/net elevation change

../backpacking-trips/big-sur-2021-08-20-thru-22/track-82221-81723am.gpx
2021-08-22, 08:17:24 AM - 10:26:54 AM
Start: (35.840822, -121.378279),   End: (35.81578, -121.358846)
725 GPS datapoints
02:09 duration
4.04 miles @ 2.2 avg MPH
4348/-1794 feet total/net elevation change
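A few of the summary figures are easier to interpret with a worked example. The sketch below uses the first track's start and end times and its 2.36 mile distance from the output above, plus the first four altitude changes from earlier. It shows one way to format a duration without slicing a string, how total and net elevation change differ, and what distance divided by elapsed time gives. The helper name `format_hh_mm` is hypothetical.

```python
from datetime import datetime


def format_hh_mm(start, end):
    """Format the elapsed time between two datetimes as HH:MM (seconds dropped)."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    return '{:02d}:{:02d}'.format(hours, remainder // 60)


# Start and end of the first track (local time), taken from the summary above
start = datetime(2021, 8, 20, 9, 12, 31)
end = datetime(2021, 8, 20, 10, 32, 25)
print(format_hh_mm(start, end))  # 01:19

# Total change sums the magnitude of every step; net change lets ups and downs cancel
changes = [-5, 6, -9, -13]
total_change = sum(abs(c) for c in changes)
net_change = sum(changes)
print(total_change, net_change)  # 33 -21

# Distance over elapsed time, as opposed to the mean of per-segment speeds
elapsed_hours = (end - start).total_seconds() / 3600
overall_mph = round(2.36 / elapsed_hours, 2)
print(overall_mph)  # 1.77
```

The "avg MPH" in the summaries is the mean of the per-segment speeds, which weights every GPS fix equally, so it need not match distance divided by elapsed time (about 1.77 mph here versus 2.1).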