Subqueries
For each of the example queries, the first few rows of the result are included.
Subquery Basics
Subqueries are queries nested within other queries. They are means of creating a new table, and then querying that table of results.
All subqueries are required to have aliases. In the following simple example, the alias is “sub.”
SELECT *
FROM (
SELECT *
FROM tutorial.billboard_top_100_year_end
) sub
The order of operations of subqueries is as one would expect: the inner queries execute before outer queries. Once the inner query has run, the outer query will run using the result set generated by the inner query.
Note that most SQL editors allow users to highlight a portion of a large query, and run just that portion. This is convenient when working with inner queries, as the user can ensure the inner query executes as expected without muddying the waters by executing the outer query as well.
Subqueries can be used anywhere that a tablename, column, or individual value would be used.
WITH Command
The WITH
command is a means of separating subqueries into their own “Common Table Expressions” (CTEs). In addition to improving legibility of the query structure, this also enables users to run time-consuming CTEs once, and then to iterate on other queries without running the complex queries over again.
The query above is refactored as a Common Table Expression in the following example.
WITH sub AS (
SELECT *
FROM tutorial.billboard_top_100_year_end
)
SELECT *
FROM sub
It is also possible to chain multiple CTEs together, as shown below.
WITH sub1 AS (
SELECT *
FROM tutorial.billboard_top_100_year_end
),
sub2 AS (
SELECT *
FROM sub1
)
SELECT *
FROM sub2
Example 1
The following is an example of a slightly more complex subquery. The subquery below queries the number of top 100 hits each artist has each year. The outer query then queries that result for the number of years each artist had at least one top 100 hit. The results gives some indication as to these artists' career “staying power.” Only artists that have consistent, long-term mass appeal could rank high in these results.
SELECT artist,
COUNT("Top 100 Hits") "Years with 1+ Song in Top 100"
FROM (
SELECT year,
artist,
COUNT(song_name) "Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
GROUP BY artist, year
ORDER BY "Top 100 Hits" DESC
) top_100_hits_per_year
GROUP BY artist
ORDER BY "Years with 1+ Song in Top 100" DESC
The results of the top_100_hits_per_year subquery are as follows:
year | artist | Top 100 Hits |
---|---|---|
1964 | Beatles | 9 |
2008 | Lil Wayne | 8 |
2008 | T-Pain | 8 |
2007 | Justin Timberlake | 7 |
2007 | T-Pain | 7 |
The results of the complete query are as follows:
artist | Years with 1+ Song in Top 100 |
---|---|
Mariah Carey | 18 |
Madonna | 18 |
Elton John | 18 |
Stevie Wonder | 16 |
Chicago | 13 |
Example 2
T-Pain and Lil Wayne are outputs of the subquery in the previous example, and the likes of Mariah Carey and Elton John are part of the output of the complete query. It is difficult for me to consider T-Pain and Madonna in the same category when it comes to musical legacy.
This leads me to wonder: how many artists had a single productive year where they had at least one song in the top 100, but then never appeared in the top 100 again?
SELECT artist
FROM (
SELECT year,
artist,
COUNT(song_name) "Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
GROUP BY artist, year
) top_100_hits_per_year
GROUP BY artist
HAVING COUNT("Top 100 Hits") = 1
ORDER BY "Years with 1+ Song in Top 100"
This first five rows (out of 1678 total) of this query are the following:
Artists with a Single Year of Top 100 Hit(s) |
---|
10,000 Maniacs |
100 Proof Aged In Soul |
20 Fingers |
2 Pistols |
.38 Special |
The inner query returns 4948 rows. Here’s a subset of the results. Note that ‘10cc’ and ‘112’ did not make it into the results of the outer query, as expected.
year | artist | Top 100 Hits |
---|---|---|
1994 | 10,000 Maniacs | 1 |
1970 | 100 Proof Aged In Soul | 1 |
1977 | 10cc | 1 |
1975 | 10cc | 1 |
1997 | 112 | 2 |
1996 | 112 | 1 |
This means that there are 4948 distinct artist/year combinations accounting for the top 100 hits between 1956 and 2013. This is out of a conceivable maximum of 5700, if every artist only had a single top 100 hit per year.
This implies that a large majority of artists only ever have a single song in the top 100 for any given year, which is somewhat surprising. To confirm this:
SELECT year,
artist,
COUNT(song_name) "Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
GROUP BY artist, year
HAVING COUNT(song_name) = 1
ORDER BY artist
The foregoing returns 3899 rows. The conclusion is that it is rare for an artist to have more than one song in the top 100 in any given year. The following subquery will verify this more conclusively.
Example 3
SELECT "Top 100 Hits",
COUNT(artist) "Number of Artists with Indicated Number of Top 100 Hits in a Given Year"
FROM (
SELECT year,
artist,
COUNT(song_name) "Top 100 Hits in a Given Year"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
AND song_name IS NOT NULL
GROUP BY artist, year
ORDER BY artist
) top_100_hits_per_year
GROUP BY "Top 100 Hits in a Given Year"
ORDER BY "Top 100 Hits in a Given Year"
Top 100 Hits in a Given Year | Number of Artists with Indicated Number of Top 100 Hits in a Given Year |
---|---|
1 | 3899 |
2 | 724 |
3 | 225 |
4 | 62 |
5 | 22 |
6 | 7 |
7 | 2 |
8 | 2 |
9 | 1 |
As an example, consider the complete Top 100 hit count of 2Pac:
year | artist | song_name |
---|---|---|
1993 | 2Pac | I Get Around |
1994 | 2Pac | Keep Ya Head Up |
1995 | 2Pac | Dear Mama / Old School |
1996 | 2Pac | How Do U Want It / California Love |
1998 | 2Pac | Do For Love |
2003 | 2Pac | Thugz Mansion |
I would have expected many more top 100 hits from an artist with the enduring impact of 2Pac. It appears that highly-prolific single years are rare, just as are highly-prolific careers.
Example 4
Back to the initial question: what is the largest number of top 100 hits, for an artist that only had a single year with songs in the top 100? IE, which artist was the biggest “flash in the pan” of musical history?
WITH hit_count_per_year_per_artist AS (
SELECT year,
artist,
COUNT(song_name) "Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
GROUP BY artist, year
ORDER BY year
),
years_w_hits_per_artist AS (
SELECT artist,
COUNT(year) "Years with Hits"
FROM hit_count_per_year_per_artist
GROUP BY artist
)
SELECT h.artist,
y."Years with Hits",
h.year,
h."Top 100 Hits"
FROM years_w_hits_per_artist y
JOIN hit_count_per_year_per_artist h
ON y.artist = h.artist
AND "Years with Hits" = 1
ORDER BY "Top 100 Hits" DESC
The results are not as exciting as I might have hoped, but 10 artists had a single year with three top 100 hits and no further songs in the top 100. Note that the database has data through 2013. It is likely that the artists with 2013 hits have had top 100 hits since 2013.
artist | Years with Hits | year | Top 100 Hits |
---|---|---|---|
Teresa Brewer | 1 | 1956 | 3 |
Harry Belafonte | 1 | 1957 | 3 |
Buckinghams | 1 | 1967 | 3 |
Jim Stafford | 1 | 1974 | 3 |
C+C Music Factory | 1 | 1991 | 3 |
Cathy Dennis | 1 | 1991 | 3 |
CeCe Peniston | 1 | 1992 | 3 |
Future | 1 | 2013 | 3 |
Macklemore | 1 | 2013 | 3 |
Ryan Lewis | 1 | 2013 | 3 |
Example 5
SELECT artist,
count(song_name) "Total Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
GROUP BY artist
ORDER BY "Total Top 100 Hits" DESC
artist | Total Top 100 Hits |
---|---|
Elvis Presley | 36 |
Madonna | 36 |
Mariah Carey | 33 |
Rihanna | 33 |
Elton John | 28 |
A final way to analyze this data is determine which artists had the highest ranked songs. An even more involved composite measure might take into account the number of years during which the artist had songs in the top 100, in addition to the actual ranking of the songs.
For this simple analysis, assign “song_scores” between 1 and 100 for each song. Higher is better. This can be accomplished by subtracting the year_rank from 101. The results are shown below, with the (lengthy) query that generated it shown below.
artist | Total Top 100 Hits | Years with Top 100 Hit(s) | Total Song Score |
---|---|---|---|
Elvis Presley | 36 | 13 | 2247 |
Mariah Carey | 33 | 18 | 2205 |
Rihanna | 33 | 9 | 2002 |
Madonna | 36 | 18 | 1888 |
Usher | 25 | 11 | 1677 |
Elton John | 28 | 18 | 1578 |
Beatles | 27 | 8 | 1572 |
Whitney Houston | 21 | 10 | 1432 |
Michael Jackson | 26 | 12 | 1430 |
Janet Jackson | 23 | 10 | 1415 |
WITH total_hits AS (
SELECT artist,
COUNT(song_name) "Total Top 100 Hits"
FROM tutorial.billboard_top_100_year_end
GROUP BY artist
ORDER BY "Total Top 100 Hits" DESC
),
unique_year_artists AS (
SELECT artist,
COUNT(year) "Years with Top 100 Hit(s)"
FROM (
SELECT DISTINCT year,
artist
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
ORDER BY year
) sub
GROUP BY artist
ORDER BY COUNT(year) DESC
),
song_scores AS (
SELECT artist,
SUM(101-year_rank) "Total Song Score"
FROM tutorial.billboard_top_100_year_end
GROUP BY artist
)
SELECT s.artist,
t."Total Top 100 Hits",
u."Years with Top 100 Hit(s)",
s."Total Song Score"
FROM song_scores s
JOIN total_hits t
ON s.artist = t.artist
JOIN unique_year_artists u
ON s.artist = u.artist
ORDER BY "Total Song Score" DESC
Another brief analysis examines which artists with 10 or more appearances in the top 100 database had the highest average ranking.
SELECT artist,
AVG(year_rank) "Average Hit Ranking",
COUNT(year_rank) "Total Hits"
FROM tutorial.billboard_top_100_year_end
WHERE artist IS NOT NULL
GROUP BY artist
HAVING COUNT(year_rank) > 10
ORDER BY "Average Hit Ranking"
The resulting cohort of megahit artists is quite different from the groups seen thus far.
artist | Average Hit Ranking | Total Hits |
---|---|---|
Bruno Mars | 20 | 12 |
Katy Perry | 23.214285714285715 | 14 |
Maroon 5 | 30.5 | 12 |
Celine Dion | 30.636363636363637 | 11 |
Lady Gaga | 30.75 | 12 |
50 Cent | 30.857142857142858 | 14 |
Bee Gees | 32.15384615384615 | 13 |
Alicia Keys | 32.5 | 12 |
Whitney Houston | 32.80952380952381 | 21 |
Boyz II Men | 33.357142857142854 | 14 |
Finally, which artists had the most appearances in the top 10?
SELECT artist,
COUNT(year_rank) "Total Top 10 Hits"
FROM tutorial.billboard_top_100_year_end
WHERE year_rank <= 10
GROUP BY artist
ORDER BY "Total Top 10 Hits" DESC
Familiar names from previous analyses.
artist | Total Top 10 Hits |
---|---|
Elvis Presley | 9 |
Mariah Carey | 7 |
Usher | 6 |
Elton John | 6 |
Beatles | 5 |