Processing GPS Data
The GPS data manipulation steps below require the GPX Converter Python package and are inspired by this blog post and this GitHub repo by Jarrett Retz.
The GPS data itself is from my backpacking trip in Big Sur from 8/20/21 to 8/22/21. It was tracked using the Gaia GPS app on my iPhone 12 Pro and then downloaded as GPX files from gaiagps.com.
import pandas as pd
from gpx_converter import Converter
I recorded 6 different “tracks,” or hikes, over the 3-day, 2-night trip. Each file contains one track.
files = [
'../backpacking-trips/big-sur-2021-08-20-thru-22/salmon-creek-trailhead-to-spruce-camp-82021-91230am.gpx',
'../backpacking-trips/big-sur-2021-08-20-thru-22/spruce-camp-to-estrella-camp-82021-112028am.gpx',
'../backpacking-trips/big-sur-2021-08-20-thru-22/estrella-camp-to-lions-den-camp.gpx',
'../backpacking-trips/big-sur-2021-08-20-thru-22/lions-den-camp-to-cruikshank-camp.gpx',
'../backpacking-trips/big-sur-2021-08-20-thru-22/to-north-buckeye-camp.gpx',
'../backpacking-trips/big-sur-2021-08-20-thru-22/track-82221-81723am.gpx'
]
I also use a Python module called Haversine to calculate the distance between two geolocations (a geolocation is a latitude/longitude pair). Haversine distance is the great-circle distance between two locations on the Earth’s surface, computed from the angle they subtend at the Earth’s center. I discovered it through this article on Towards Data Science.
from haversine import haversine, Unit
Example usage:
loc1=(35.815768,-121.358696)
loc2=(35.815689,-121.358611)
print('{:2.1f} feet'.format(haversine(loc1,loc2,unit=Unit.FEET)))
38.2 feet
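For intuition about what the package computes, here is a minimal sketch of the haversine formula in plain Python. The function name `haversine_feet` and the constants are my own choices for illustration; in particular, the mean Earth radius of 6371.0088 km is an assumption and may not match the package's value exactly.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # assumed mean Earth radius
FEET_PER_KM = 3280.839895     # 1 km expressed in feet


def haversine_feet(loc1, loc2):
    """Great-circle distance in feet between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, loc1)
    lat2, lon2 = map(math.radians, loc2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_KM * central_angle * FEET_PER_KM


loc1 = (35.815768, -121.358696)
loc2 = (35.815689, -121.358611)
print('{:2.1f} feet'.format(haversine_feet(loc1, loc2)))
```

Under these assumptions the result should agree closely with the package output above.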
Examine raw data from GPX file
raw = (Converter(input_file=files[0])
.gpx_to_dataframe())
raw.info()
raw.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 510 non-null datetime64[ns, SimpleTZ("Z")]
1 latitude 510 non-null float64
2 longitude 510 non-null float64
3 altitude 510 non-null float64
dtypes: datetime64[ns, SimpleTZ("Z")](1), float64(3)
memory usage: 16.1 KB
| | time | latitude | longitude | altitude |
---|---|---|---|---|
0 | 2021-08-20 16:12:31+00:00 | 35.815768 | -121.358696 | 94.4 |
1 | 2021-08-20 16:12:41+00:00 | 35.815689 | -121.358611 | 93.0 |
2 | 2021-08-20 16:12:47+00:00 | 35.815626 | -121.358562 | 94.8 |
3 | 2021-08-20 16:12:55+00:00 | 35.815519 | -121.358497 | 92.0 |
4 | 2021-08-20 16:13:08+00:00 | 35.815450 | -121.358477 | 88.0 |
raw['time_delta'] = raw['time'].shift(-1)-raw['time']
raw['time_delta_seconds'] = (raw['time_delta']
                             .fillna(pd.Timedelta(seconds=0))
                             .dt.total_seconds()
                             .astype(int))
raw.info()
raw.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 510 non-null datetime64[ns, SimpleTZ("Z")]
1 latitude 510 non-null float64
2 longitude 510 non-null float64
3 altitude 510 non-null float64
4 time_delta 509 non-null timedelta64[ns]
5 time_delta_seconds 510 non-null int64
dtypes: datetime64[ns, SimpleTZ("Z")](1), float64(3), int64(1), timedelta64[ns](1)
memory usage: 24.0 KB
| | time | latitude | longitude | altitude | time_delta | time_delta_seconds |
---|---|---|---|---|---|---|
0 | 2021-08-20 16:12:31+00:00 | 35.815768 | -121.358696 | 94.4 | 0 days 00:00:10 | 10 |
1 | 2021-08-20 16:12:41+00:00 | 35.815689 | -121.358611 | 93.0 | 0 days 00:00:06 | 6 |
2 | 2021-08-20 16:12:47+00:00 | 35.815626 | -121.358562 | 94.8 | 0 days 00:00:08 | 8 |
3 | 2021-08-20 16:12:55+00:00 | 35.815519 | -121.358497 | 92.0 | 0 days 00:00:13 | 13 |
4 | 2021-08-20 16:13:08+00:00 | 35.815450 | -121.358477 | 88.0 | 0 days 00:00:17 | 17 |
print('Average time delta between GPS measurements: {} seconds'.format(raw['time_delta_seconds'].mean()))
raw['time_delta_seconds'].hist(bins=raw['time_delta_seconds'].max());
Average time delta between GPS measurements: 9.4 seconds
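The `shift(-1)` subtraction above pairs each timestamp with the one that follows it. As a rough sketch of the same idea without pandas, using only the first five timestamps from the table above (the helper name `seconds_to_next` is hypothetical):

```python
from datetime import datetime

# First five timestamps from the raw track (UTC)
times = [datetime.fromisoformat(t) for t in (
    '2021-08-20 16:12:31+00:00',
    '2021-08-20 16:12:41+00:00',
    '2021-08-20 16:12:47+00:00',
    '2021-08-20 16:12:55+00:00',
    '2021-08-20 16:13:08+00:00',
)]


def seconds_to_next(timestamps):
    """Seconds from each timestamp to the next one; the last entry gets 0."""
    if not timestamps:
        return []
    deltas = [int((later - earlier).total_seconds())
              for earlier, later in zip(timestamps, timestamps[1:])]
    # The final point has no successor, mirroring the fillna(0) step
    return deltas + [0]


print(seconds_to_next(times))
```

The last value is 0 here only because this sample stops after five points; in the full track the fifth row has a successor, which is why the table shows 17 for it.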
Develop Data Manipulation Steps
test = (Converter(input_file=files[0])
.gpx_to_dataframe())
# Convert UTC to US Pacific time (PDT in August) and format
test['time'] = test['time'].apply(lambda x: x.tz_convert('US/Pacific'))
test['seconds_delta'] = ((test['time'].shift(-1)-test['time'])
                         .fillna(pd.Timedelta(seconds=0))
                         .dt.total_seconds()
                         .astype(int))
test['human_date'] = test['time'].dt.strftime('%Y-%m-%d')
test['human_time'] = test['time'].dt.strftime('%I:%M:%S %p')
test[['time','human_date','human_time','seconds_delta']].head()
| | time | human_date | human_time | seconds_delta |
---|---|---|---|---|
0 | 2021-08-20 09:12:31-07:00 | 2021-08-20 | 09:12:31 AM | 10 |
1 | 2021-08-20 09:12:41-07:00 | 2021-08-20 | 09:12:41 AM | 6 |
2 | 2021-08-20 09:12:47-07:00 | 2021-08-20 | 09:12:47 AM | 8 |
3 | 2021-08-20 09:12:55-07:00 | 2021-08-20 | 09:12:55 AM | 13 |
4 | 2021-08-20 09:13:08-07:00 | 2021-08-20 | 09:13:08 AM | 17 |
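The `-07:00` offset in the table shows that these August timestamps land in Pacific Daylight Time. The sketch below reproduces the conversion for the first row using only the standard library. It assumes a fixed UTC-7 offset as a stand-in for `US/Pacific`, which holds only while daylight time is in effect.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-7 offset, valid only while daylight time is in effect
PDT = timezone(timedelta(hours=-7), 'PDT')

utc_time = datetime(2021, 8, 20, 16, 12, 31, tzinfo=timezone.utc)
local_time = utc_time.astimezone(PDT)

# Same format strings as the human_date and human_time columns
human_date = local_time.strftime('%Y-%m-%d')
human_time = local_time.strftime('%I:%M:%S %p')
print(local_time.isoformat(sep=' '), human_date, human_time)
```

A named zone such as `US/Pacific` is the better choice for real data because it switches between standard and daylight time automatically.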
# Convert altitude from meters to feet
test['altitude_feet'] = round(test['altitude'] * 3.280839895).astype('int')
test[['time','altitude','altitude_feet']].head()
| | time | altitude | altitude_feet |
---|---|---|---|
0 | 2021-08-20 09:12:31-07:00 | 94.4 | 310 |
1 | 2021-08-20 09:12:41-07:00 | 93.0 | 305 |
2 | 2021-08-20 09:12:47-07:00 | 94.8 | 311 |
3 | 2021-08-20 09:12:55-07:00 | 92.0 | 302 |
4 | 2021-08-20 09:13:08-07:00 | 88.0 | 289 |
# Calculate speed and altitude change from one measurement to the next
for i in range(test.shape[0]-1):
    start = test.at[i, 'latitude'], test.at[i, 'longitude']
    end = test.at[i+1, 'latitude'], test.at[i+1, 'longitude']
    distance = round(haversine(start,
                               end,
                               unit=Unit.FEET),1)
    test.at[i, 'distance_feet'] = distance
    altitude_change = test.at[i+1, 'altitude_feet'] - test.at[i, 'altitude_feet']
    test.at[i, 'altitude_change'] = altitude_change
test['speed_mph'] = ((test['distance_feet'] / test['seconds_delta']) * (3600/5280)).round(1)
test[['time','distance_feet','seconds_delta','speed_mph','altitude_feet','altitude_change']].head()
| | time | distance_feet | seconds_delta | speed_mph | altitude_feet | altitude_change |
---|---|---|---|---|---|---|
0 | 2021-08-20 09:12:31-07:00 | 38.2 | 10 | 2.6 | 310 | -5.0 |
1 | 2021-08-20 09:12:41-07:00 | 27.2 | 6 | 3.1 | 305 | 6.0 |
2 | 2021-08-20 09:12:47-07:00 | 43.5 | 8 | 3.7 | 311 | -9.0 |
3 | 2021-08-20 09:12:55-07:00 | 25.9 | 13 | 1.4 | 302 | -13.0 |
4 | 2021-08-20 09:13:08-07:00 | 20.4 | 17 | 0.8 | 289 | 6.0 |
print('Average speed: {} mph'.format(round(test['speed_mph'].mean(),2)))
test['speed_mph'].hist(bins=20);
Average speed: 2.09 mph
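Note that the average above is an unweighted mean of the per-measurement speeds, so a 6-second interval counts as much as a 17-second one. The sketch below works through the feet-per-second to mph conversion on the first five rows and compares that mean with total distance over total time. The helper `speed_mph` and its zero-interval guard are my additions, not part of the steps above.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def speed_mph(distance_feet, seconds):
    """Speed in miles per hour, or None when no time has elapsed."""
    if seconds <= 0:
        return None  # avoid dividing by zero on repeated timestamps
    return distance_feet / seconds * SECONDS_PER_HOUR / FEET_PER_MILE


# First five rows of the table above
distances = [38.2, 27.2, 43.5, 25.9, 20.4]
seconds = [10, 6, 8, 13, 17]

per_sample = [speed_mph(d, s) for d, s in zip(distances, seconds)]
mean_of_samples = sum(per_sample) / len(per_sample)
overall = speed_mph(sum(distances), sum(seconds))

print([round(v, 1) for v in per_sample])
print('mean of samples: {:.2f} mph, distance over time: {:.2f} mph'.format(
    mean_of_samples, overall))
```

On these five rows the two averages differ noticeably, since the slowest intervals are also the longest. Which one is appropriate depends on whether you want a typical instantaneous pace or the pace over the whole stretch.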
Define Function to Quickly Perform this Manipulation
def transform_gpx_data(filename):
    """Load a GPX file and add local time, distance, speed, and altitude columns."""
    df = (Converter(input_file=filename)
          .gpx_to_dataframe())
    # Convert UTC timestamps to US Pacific time
    df['time'] = df['time'].apply(lambda x: x.tz_convert('US/Pacific'))
    # Seconds until the next measurement (0 for the last row)
    df['seconds_delta'] = ((df['time'].shift(-1)-df['time'])
                           .fillna(pd.Timedelta(seconds=0))
                           .dt.total_seconds()
                           .astype(int))
    df['human_date'] = df['time'].dt.strftime('%Y-%m-%d')
    df['human_time'] = df['time'].dt.strftime('%I:%M:%S %p')
    # Convert altitude from meters to feet
    df['altitude_feet'] = round(df['altitude'] * 3.280839895).astype('int')
    # Distance and altitude change from each measurement to the next
    for i in range(df.shape[0]-1):
        start = df.at[i, 'latitude'], df.at[i, 'longitude']
        end = df.at[i+1, 'latitude'], df.at[i+1, 'longitude']
        distance = round(haversine(start,
                                   end,
                                   unit=Unit.FEET),1)
        df.at[i, 'distance_feet'] = distance
        altitude_change = df.at[i+1, 'altitude_feet'] - df.at[i, 'altitude_feet']
        df.at[i, 'altitude_change'] = altitude_change
    df['speed_mph'] = ((df['distance_feet'] / df['seconds_delta']) * (3600/5280)).round(1)
    # 'altitude_feet' is listed twice here, so it appears twice in the output below
    df = df[['time', 'human_date', 'human_time', 'seconds_delta',
             'latitude', 'longitude', 'altitude', 'altitude_feet',
             'distance_feet', 'speed_mph',
             'altitude_feet','altitude_change']].copy()
    return df
d = transform_gpx_data(files[0])
pd.DataFrame(d.iloc[:3].T)
| | 0 | 1 | 2 |
---|---|---|---|
time | 2021-08-20 09:12:31-07:00 | 2021-08-20 09:12:41-07:00 | 2021-08-20 09:12:47-07:00 |
human_date | 2021-08-20 | 2021-08-20 | 2021-08-20 |
human_time | 09:12:31 AM | 09:12:41 AM | 09:12:47 AM |
seconds_delta | 10 | 6 | 8 |
latitude | 35.815768 | 35.815689 | 35.815626 |
longitude | -121.358696 | -121.358611 | -121.358562 |
altitude | 94.4 | 93.0 | 94.8 |
altitude_feet | 310 | 305 | 311 |
distance_feet | 38.2 | 27.2 | 43.5 |
speed_mph | 2.6 | 3.1 | 3.7 |
altitude_feet | 310 | 305 | 311 |
altitude_change | -5.0 | 6.0 | -9.0 |
d.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 510 entries, 0 to 509
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 510 non-null datetime64[ns, US/Pacific]
1 human_date 510 non-null object
2 human_time 510 non-null object
3 seconds_delta 510 non-null int64
4 latitude 510 non-null float64
5 longitude 510 non-null float64
6 altitude 510 non-null float64
7 altitude_feet 510 non-null int64
8 distance_feet 509 non-null float64
9 speed_mph 509 non-null float64
10 altitude_feet 510 non-null int64
11 altitude_change 509 non-null float64
dtypes: datetime64[ns, US/Pacific](1), float64(6), int64(3), object(2)
memory usage: 47.9+ KB
Call Function Over All 6 Tracks
Then print a summary of each track.
for file in files:
    print(file)
    d = transform_gpx_data(file)
    string = '''{}, {} - {}
Start: ({}, {}), End: ({}, {})
{} GPS datapoints
{} duration
{:3.2f} miles @ {:3.1f} avg MPH
{:3.0f}/{:3.0f} feet total/net elevation change'''.format(d['human_date'].min(),
                                                          d.iloc[0]['human_time'],
                                                          d.iloc[-1]['human_time'],
                                                          d.iloc[0]['latitude'],
                                                          d.iloc[0]['longitude'],
                                                          d.iloc[-1]['latitude'],
                                                          d.iloc[-1]['longitude'],
                                                          d.shape[0],
                                                          str(d['time'].max()-d['time'].min())[7:12],
                                                          d['distance_feet'].sum()/5280,
                                                          d['speed_mph'].mean(),
                                                          d['altitude_change'].abs().sum(),
                                                          d['altitude_change'].sum())
    print(string + '\n')
../backpacking-trips/big-sur-2021-08-20-thru-22/salmon-creek-trailhead-to-spruce-camp-82021-91230am.gpx
2021-08-20, 09:12:31 AM - 10:32:25 AM
Start: (35.815768, -121.358696), End: (35.82598, -121.344968)
510 GPS datapoints
01:19 duration
2.36 miles @ 2.1 avg MPH
2931/637 feet total/net elevation change
../backpacking-trips/big-sur-2021-08-20-thru-22/spruce-camp-to-estrella-camp-82021-112028am.gpx
2021-08-20, 11:20:29 AM - 12:04:52 PM
Start: (35.826085, -121.344949), End: (35.836589, -121.338499)
311 GPS datapoints
00:44 duration
1.36 miles @ 2.1 avg MPH
1827/593 feet total/net elevation change
../backpacking-trips/big-sur-2021-08-20-thru-22/estrella-camp-to-lions-den-camp.gpx
2021-08-20, 01:46:40 PM - 06:06:54 PM
Start: (35.836574, -121.338406), End: (35.858015, -121.338537)
594 GPS datapoints
04:20 duration
3.51 miles @ 2.0 avg MPH
4191/1457 feet total/net elevation change
../backpacking-trips/big-sur-2021-08-20-thru-22/lions-den-camp-to-cruikshank-camp.gpx
2021-08-21, 08:53:39 AM - 11:54:47 AM
Start: (35.858349, -121.338616), End: (35.856667, -121.38415)
921 GPS datapoints
03:01 duration
5.03 miles @ 2.1 avg MPH
5722/-1690 feet total/net elevation change
../backpacking-trips/big-sur-2021-08-20-thru-22/to-north-buckeye-camp.gpx
2021-08-21, 01:04:18 PM - 02:52:32 PM
Start: (35.85661, -121.384024), End: (35.840788, -121.378364)
523 GPS datapoints
01:48 duration
2.62 miles @ 1.9 avg MPH
3401/641 feet total/net elevation change
../backpacking-trips/big-sur-2021-08-20-thru-22/track-82221-81723am.gpx
2021-08-22, 08:17:24 AM - 10:26:54 AM
Start: (35.840822, -121.378279), End: (35.81578, -121.358846)
725 GPS datapoints
02:09 duration
4.04 miles @ 2.2 avg MPH
4348/-1794 feet total/net elevation change
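To make the total/net distinction in these summaries concrete, here is a small sketch using the first five `altitude_feet` values from the first track. The helper `elevation_summary` is hypothetical, and the gain/loss split is an extra that the summaries above do not report.

```python
def elevation_summary(altitudes):
    """Total, net, gain, and loss (in the input units) across consecutive altitudes."""
    changes = [b - a for a, b in zip(altitudes, altitudes[1:])]
    return {
        'total': sum(abs(c) for c in changes),     # every up and down counts
        'net': sum(changes),                       # equals last minus first
        'gain': sum(c for c in changes if c > 0),
        'loss': -sum(c for c in changes if c < 0),
    }


# First five altitude_feet values from the first track
altitudes_feet = [310, 305, 311, 302, 289]
summary = elevation_summary(altitudes_feet)
print('{total}/{net} feet total/net elevation change '
      '({gain} up, {loss} down)'.format(**summary))
```

Because the total adds up the absolute change between every pair of consecutive readings, small fluctuations in the recorded altitude accumulate in it, so it may overstate the actual climbing and descending. The net figure depends only on the first and last readings.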