The tables queried this blog post are the publicly-available “tutorial.billboard_top_100_year_end” and “tutorial.sat_scores” tables that are available at modeanalytics.com.
Subqueries are queries nested within other queries. They are means of creating a new table, and then querying that table of results.
All subqueries are required to have aliases. In the following simple example, the alias is “sub.”
The order of operations of subqueries is as one would expect: the inner queries execute before outer queries. Once the inner query has run, the outer query will run using the result set generated by the inner query.
Note that most SQL editors allow users to highlight a portion of a large query, and run just that portion. This is convenient when working with inner queries, as the user can ensure the inner query executes as expected without muddying the waters by executing the outer query as well.
Subqueries can be used anywhere that a tablename, column, or individual value would be used.
WITH command is a means of separating subqueries into their own “Common Table Expressions” (CTEs). In addition to improving legibility of the query structure, this also enables users to run time-consuming CTEs once, and then to iterate on other queries without running the complex queries over again.
The query above is refactored as a Common Table Expression in the following example.
It is also possible to chain multiple CTEs together, as shown below.
The following is an example of a slightly more complex subquery. The subquery below queries the number of top 100 hits each artist has each year. The outer query then queries that result for the number of years each artist had at least one top 100 hit. The results gives some indication as to these artists’ career “staying power.” Only artists that have consistent, long-term mass appeal could rank high in these results.
The results of the top_100_hits_per_year subquery are as follows:
|year||artist||Top 100 Hits|
The results of the complete query are as follows:
|artist||Years with 1+ Song in Top 100|
T-Pain and Lil Wayne are outputs of the subquery in the previous example, and the likes of Mariah Carey and Elton John are part of the output of the complete query. It is difficult for me to consider T-Pain and Madonna in the same category when it comes to musical legacy.
This leads me to wonder: how many artists had a single productive year where they had at least one song in the top 100, but then never appeared in the top 100 again?
This first five rows (out of 1678 total) of this query are the following:
|Artists with a Single Year of Top 100 Hit(s)|
|100 Proof Aged In Soul|
The inner query returns 4948 rows. Here’s a subset of the results. Note that ‘10cc’ and ‘112’ did not make it into the results of the outer query, as expected.
|year||artist||Top 100 Hits|
|1970||100 Proof Aged In Soul||1|
This means that there are 4948 distinct artist/year combinations accounting for the top 100 hits between 1956 and 2013. This is out of a conceivable maximum of 5700, if every artist only had a single top 100 hit per year.
This implies that a large majority of artists only ever have a single song in the top 100 for any given year, which is somewhat surprising. To confirm this:
The foregoing returns 3899 rows. The conclusion is that it is rare for an artist to have more than one song in the top 100 in any given year. The following subquery will verify this more conclusively.
|Top 100 Hits in a Given Year||Number of Artists with Indicated Number of Top 100 Hits in a Given Year|
As an example, consider the complete Top 100 hit count of 2Pac:
|1993||2Pac||I Get Around|
|1994||2Pac||Keep Ya Head Up|
|1995||2Pac||Dear Mama / Old School|
|1996||2Pac||How Do U Want It / California Love|
|1998||2Pac||Do For Love|
I would have expected many more top 100 hits from an artist with the enduring impact of 2Pac. It appears that highly-prolific single years are rare, just as are highly-prolific careers.
Back to the initial question: what is the largest number of top 100 hits, for an artist that only had a single year with songs in the top 100? IE, which artist was the biggest “flash in the pan” of musical history?
The results are not as exciting as I might have hoped, but 10 artists had a single year with three top 100 hits and no further songs in the top 100. Note that the database has data through 2013. It is likely that the artists with 2013 hits have had top 100 hits since 2013.
|artist||Years with Hits||year||Top 100 Hits|
|C+C Music Factory||1||1991||3|
|artist||Total Top 100 Hits|
A final way to analyze this data is determine which artists had the highest ranked songs. An even more involved composite measure might take into account the number of years during which the artist had songs in the top 100, in addition to the actual ranking of the songs.
For this simple analysis, assign “song_scores” between 1 and 100 for each song. Higher is better. This can be accomplished by subtracting the year_rank from 101. The results are shown below, with the (lengthy) query that generated it shown below.
|artist||Total Top 100 Hits||Years with Top 100 Hit(s)||Total Song Score|
Another brief analysis examines which artists with 10 or more appearances in the top 100 database had the highest average ranking.
The resulting cohort of megahit artists is quite different from the groups seen thus far.
|artist||Average Hit Ranking||Total Hits|
|Boyz II Men||33.357142857142854||14|
Finally, which artists had the most appearances in the top 10?
Familiar names from previous analyses.
|artist||Total Top 10 Hits|