Window Functions
Window Functions are one of the most powerful concepts in SQL databases. They allow users to compare one row to another without doing any JOINs.
Running Total
The following query calculates the cumulative seconds spent riding a certain bike in the DC bikeshare ride history.
There are multiple keywords used in this query. Note that GROUP BY
is not one of them, even though the query includes a sum. OVER
specifies the query as a window function. ORDER BY
functions in the same way it typically does: it provides an ordering for the rows, which directly impacts the values of the running_ride_duration column that is calculated.
SELECT start_time,
duration_seconds,
SUM(duration_seconds) OVER (ORDER BY start_time) AS running_ride_duration
FROM tutorial.dc_bikeshare_q1_2012
WHERE bike_number = 'W01412'
start_time | duration_seconds | running_ride_duration |
---|---|---|
2012-01-01 00:04:00 | 475 | 475 |
2012-01-02 18:45:00 | 563 | 1038 |
2012-01-02 20:15:00 | 410 | 1448 |
2012-01-02 20:45:00 | 261 | 1709 |
2012-01-03 07:09:00 | 348 | 2057 |
PARTITION_BY Function
The PARTITION BY
function enables users to specify time periods (or any other grouping of data) over which to aggregate. As an example, the following query partitions the running_ride_duration by day. As shown in the resulting rows, the running_ride_duration is reset to zero for each new partition.
SELECT DATE_TRUNC('day', start_time) AS day,
duration_seconds,
SUM(duration_seconds) OVER (PARTITION BY DATE_TRUNC('day', start_time) ORDER BY start_time) AS daily_running_ride_duration
FROM tutorial.dc_bikeshare_q1_2012
WHERE bike_number = 'W01412'
day | duration_seconds | daily_running_ride_duration |
---|---|---|
2012-01-01 00:00:00 | 475 | 475 |
2012-01-02 00:00:00 | 563 | 563 |
2012-01-02 00:00:00 | 410 | 973 |
2012-01-02 00:00:00 | 261 | 1234 |
2012-01-03 00:00:00 | 348 | 348 |
Windows
The order and partition define what is called the “window.” A “window” is an ordered subset of data over which all these calculations are made.
If the ORDER BY
is omitted from the query, as in the following example, then the result is the daily_ride_duration without the “running” component.
SELECT DATE_TRUNC('day', start_time) AS day,
duration_seconds,
SUM(duration_seconds) OVER (PARTITION BY DATE_TRUNC('day', start_time)) AS daily_running_ride_duration
FROM tutorial.dc_bikeshare_q1_2012
WHERE bike_number = 'W01412'
day | duration_seconds | daily_running_ride_duration |
---|---|---|
2012-01-01 00:00:00 | 475 | 475 |
2012-01-02 00:00:00 | 563 | 1234 |
2012-01-02 00:00:00 | 410 | 1234 |
2012-01-02 00:00:00 | 261 | 1234 |
2012-01-03 00:00:00 | 348 | 1332 |
Window Aliases
It is possible to define an alias for partitions that will be reused multiple times. These are defined in the WINDOW
clause, which usually is placed between the WHERE
and GROUP BY
clauses. The following query produces the same results as shown under the “PARTITION_BY Function” heading.
SELECT DATE_TRUNC('day', start_time) AS day,
duration_seconds,
SUM(duration_seconds) OVER main_window AS daily_running_ride_duration
FROM tutorial.dc_bikeshare_q1_2012
WHERE bike_number = 'W01412'
WINDOW main_window AS (PARTITION BY DATE_TRUNC('day', start_time) ORDER BY start_time)
ROW_NUMBER and RANK
ROW_NUMBER
displays the number of a given row within a window you define. With each new partition, the ROW_NUMBER
is reset to zero.
RANK
is similar, except for items with the same value in the ORDER BY
clause, for which the same RANK
will be assigned.