SQL GROUP BY Statements

Aggregate Functions

Aggregate functions take multiple rows of data and aggregate, or combine, it into a single value. There are several aggregate functions that will be demonstrated: AVG, MIN, MAX, and SUM.

SELECT AVG(amount)
  FROM payment;

This will return a precise value with many decimal places (“4.2006056453822965”). ROUND allows users to round this down to a less precise, more comprehensible value (“4.20”).

SELECT ROUND(AVG(amount), 2)
  FROM payment;
SELECT MIN(amount),
       MAX(amount),
       SUM(amount)
  FROM payment;

GROUP BY Statements

GROUP BY divides the rows returned by the SELECT statement into groups. For each group, an aggregate function can be applied. As examples, users can COUNT the number of items in the groups, or SUM the items.

SELECT *
  FROM payment
 LIMIT 3;

Using GROUP BY without an aggregate function is the same as calling DISTINCT on the column. PostgreSQL does not require an aggregate function to be included in the SELECT clause when using GROUP BY. Other SQL engines do.

SELECT customer_id
  FROM payment
 GROUP BY customer_id;

PostgreSQL does not require users to include the column used in the GROUP BY in the SELECT clause. Other SQL engines do require customer_id, for example, to be included alongside the aggregate function in SELECT.

SELECT customer_id,
       SUM(amount)
  FROM payment
 GROUP BY customer_id;

Another example.

SELECT rating,
       COUNT(rating)
  FROM film
 GROUP BY rating;

Challenge: GROUP BY

We have two staff members with Staff IDs 1 and 2. We want to give a bonus to the staff member that handled the most payments. How many payments did each staff member handle? And how much was the total amount processed by each staff member?

SELECT staff_id,
       COUNT(amount),
       SUM(amount)
  FROM payment
 GROUP BY staff_id;

Corporate headquarters is auditing the store. They want to know the average replacement cost of movies by rating. For example, R rated movies have an average replacement cost of $20.23.

SELECT rating,
       ROUND(AVG(replacement_cost),2)
  FROM film
 GROUP BY rating;

We want to send coupons to the 5 customers who have spent the most money. We need the customer ids of the top 5 spenders.

SELECT customer_id,
       SUM(amount)
  FROM payment
 GROUP BY customer_id
 ORDER BY SUM(amount) DESC
 LIMIT 5;

HAVING

HAVING is used in conjunction with GROUP BY to filter group rows that do not satisfy a certain condition.

HAVING is analogous to WHERE, except it is used with GROUP BY.

  • The HAVING clause sets the condition for groups of rows created by the GROUP BY clause after the GROUP BY clause applies.
  • The WHERE clause sets the condition for individual rows before the GROUP BY clause applies.
SELECT column_1,
       aggregate_function(column_2)
  FROM table_name
 GROUP BY column_1
HAVING condition;

Examples.

SELECT customer_id, SUM(amount)
  FROM payment
 GROUP BY customer_id
HAVING SUM(amount) > 190;

Example using WHERE and HAVING.

SELECT rating,
       AVG(rental_rate)
  FROM film
 WHERE rating IN ('R', 'G', 'PG')
 GROUP BY rating
HAVING AVG(rental_rate) < 3;

Challenge: HAVING

We want to know what customers are eligible for our platinum credit card. The requirements are that the customer has at least a total of 40 transaction payments. What customers are eligible for the credit card?

SELECT customer_id,
       COUNT(payment_id)
  FROM payment
 GROUP BY customer_id
HAVING COUNT(payment_id) >= 40

When grouped by rating, what movie ratings have an average rental duration of more than 5 days?

SELECT rating,
       AVG(rental_duration)
  FROM film
 GROUP BY rating
HAVING AVG(rental_duration) > 5

Examples

Determine the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2.

SELECT customer_id,
       SUM(amount)
  FROM payment
 WHERE staff_id = 2
 GROUP BY customer_id
HAVING SUM(amount) > 110;

How many films have titles that begin with the letter “J”?

SELECT COUNT(title)
  FROM film
 WHERE title ILIKE 'J%';

What is the name of the customer who has the highest customer ID number, and whose name starts with an ‘E’, and who has an address ID that is less than 500?

SELECT first_name,
       last_name,
       customer_id
  FROM customer
 WHERE first_name ILIKE 'E%'
   AND address_id < 500
 ORDER BY customer_id DESC
 LIMIT 1

Content for this note is taken from information in “The Complete SQL Bootcamp” course on Udemy. I highly recommend it.