SQL Window Functions

The tables queried this blog post are the publicly-available “tutorial.dc_bikeshare_q1_2012,” table that is available at modeanalytics.com.

Window Functions are one of the most powerful concepts in SQL databases. They allow users to compare one row to another without doing any JOINs.

Running Total

The following query calculates the cumulative seconds spent riding a certain bike in the DC bikeshare ride history.

There are multiple keywords used in this query. Note that GROUP BY is not one of them, even though the query includes a sum. OVER specifies the query as a window function. ORDER BY functions in the same way it typically does: it provides an ordering for the rows, which directly impacts the values of the running_ride_duration column that is calculated.

SELECT start_time,
       duration_seconds,
       SUM(duration_seconds) OVER (ORDER BY start_time) AS running_ride_duration
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE bike_number = 'W01412'
start_time duration_seconds running_ride_duration
2012-01-01 00:04:00 475 475
2012-01-02 18:45:00 563 1038
2012-01-02 20:15:00 410 1448
2012-01-02 20:45:00 261 1709
2012-01-03 07:09:00 348 2057

PARTITION_BY Function

The PARTITION BY function enables users to specify time periods (or any other grouping of data) over which to aggregate. As an example, the following query partitions the running_ride_duration by day. As shown in the resulting rows, the running_ride_duration is reset to zero for each new partition.

SELECT DATE_TRUNC('day', start_time) AS day,
       duration_seconds,
       SUM(duration_seconds) OVER (PARTITION BY DATE_TRUNC('day', start_time) ORDER BY start_time) AS daily_running_ride_duration
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE bike_number = 'W01412'
day duration_seconds daily_running_ride_duration
2012-01-01 00:00:00 475 475
2012-01-02 00:00:00 563 563
2012-01-02 00:00:00 410 973
2012-01-02 00:00:00 261 1234
2012-01-03 00:00:00 348 348

Windows

The order and partition define what is called the “window.” A “window” is an ordered subset of data over which all these calculations are made.

If the ORDER BY is omitted from the query, as in the following example, then the result is the daily_ride_duration without the “running” component.

SELECT DATE_TRUNC('day', start_time) AS day,
       duration_seconds,
       SUM(duration_seconds) OVER (PARTITION BY DATE_TRUNC('day', start_time)) AS daily_running_ride_duration
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE bike_number = 'W01412'
day duration_seconds daily_running_ride_duration
2012-01-01 00:00:00 475 475
2012-01-02 00:00:00 563 1234
2012-01-02 00:00:00 410 1234
2012-01-02 00:00:00 261 1234
2012-01-03 00:00:00 348 1332

Window Aliases

It is possible to define an alias for partitions that will be reused multiple times. These are defined in the WINDOW clause, which usually is placed between the WHERE and GROUP BY clauses. The following query produces the same results as shown under the “PARTITION_BY Function” heading.

SELECT DATE_TRUNC('day', start_time) AS day,
       duration_seconds,
       SUM(duration_seconds) OVER main_window AS daily_running_ride_duration
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE bike_number = 'W01412'
WINDOW main_window AS (PARTITION BY DATE_TRUNC('day', start_time) ORDER BY start_time)

Content for this note is taken from information I learned while pursuing the Udacity Data Analyst Nanodegree. Udacity is an online tech education service that offers both free and paid, tech-focused training. Learn more.