Subqueries and Derived Tables


Subqueries and Derived Tables

Subqueries are used because:

  1. they may be the most logical way to retrieve the desired information,
  2. they can be used to isolate each part of a statement, and
  3. they may run more quickly than joins.

Syntax rules:

  • Enclose subqueries in parentheses.
  • Cannot use ORDER BY in subqueries, but can use them in queries that contain subqueries.
  • Subqueries in SELECT or WHERE clauses must use operators that handle multiple values, such as IN.
  • For readability, it is best to indent subqueries.

Subqueries allow users to retrieve information dynamically, as shown below, rather than hard coding in specific items.

SELECT *
  FROM exam_answers
 WHERE TIMESTAMPDIFF(MINUTE, start_time, end_time) >
       (SELECT AVG(TIMESTAMPDIFF(MINUTE, start_time, end_time)) AS AvgDuration
          FROM exam_answers
         WHERE TIMESTAMPDIFF(MINUTE, start_time, end_time) > 0);

Subqueries also allow for membership testing. That is, testing certain rows to see if they are members of other groups of rows. EXISTS and NOT EXISTS are used for this purpose. These keywords are conceptually similar to IN and NOT IN, except that they are logical statements that return true or false, rather than testing specific values of a given table entry.

The following query selects all the users in the users table who were also in the dogs table. This is

SELECT DISTINCT u.user_guid AS uUserID
  FROM users u
 WHERE EXISTS (SELECT *
                 FROM dogs d
                WHERE u.user_guid = d.user_guid);

Subqueries can also be used in the FROM clause, as shown below. When structured in this manner, SQL actually creates a temporary table, called a derived table, that is then incorporated into the rest of the query.

SELECT DistinctUUsersID.user_guid AS uUserID, d.user_guid AS dUserID, count(*) AS numrows
  FROM (SELECT DISTINCT u.user_guid
          FROM users u) AS DistinctUUsersID
  LEFT JOIN dogs d
    ON DistinctUUsersID.user_guid=d.user_guid
 GROUP BY DistinctUUsersID.user_guid
 ORDER BY numrows DESC

When using subqueries in the FROM clause, there are a few things to remember:

  1. Any alias is required for any derived tables created in subqueries within FROM statements.
  2. This alias must be used every time we want to execute a function that uses the derived table.
  3. Inner subqueries can refer to table aliases assigned outside the subqueries, but outer queries cannot refer to aliases created within subqueries unless they are included in the subquery SELECT statement.
  4. Subqueries in FROM statements can be very computationally intensive.
SELECT DistinctUUsersID.user_guid AS uUserID,
       DistictDUsersID.user_guid AS dUserID,
       DistictDUsersID.dog_guid AS DogID,
       DistictDUsersID.breed AS breed
  FROM (SELECT DISTINCT u.user_guid
          FROM users u
         LIMIT 100) AS DistinctUUsersID
  LEFT JOIN (SELECT DISTINCT d.user_guid,
                             d.dog_guid,
                             d.breed
               FROM dogs d) AS DistictDUsersID
    ON DistinctUUsersID.user_guid=DistictDUsersID.user_guid
 GROUP BY DistinctUUsersID.user_guid
 LIMIT 3;
uUserID dUserID DogID breed
ce134a78-7144-11e5-ba71-058fbc01cf0b ce134a78-7144-11e5-ba71-058fbc01cf0b fd3d1b44-7144-11e5-ba71-058fbc01cf0b Shih Tzu
ce134be0-7144-11e5-ba71-058fbc01cf0b ce134be0-7144-11e5-ba71-058fbc01cf0b fd27c956-7144-11e5-ba71-058fbc01cf0b German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix

Some content in this note are taken from the Jupyter Notebook below. They are accessible as part of the “Managing Big Data with MySQL” course on coursera.org, and licensed by Jana Schaich Borg under CC BY-NC 4.0.

  • MySQL Exercise 09 Subqueries and Derived Tables

Other content for this note is taken from the Coursera course “Managing Big Data with MySQL.”