This week’s content enables analysts to manipulate SQL data into whatever segments or categories they are interested in.
- SQL aggregations are different from Excel aggregations as a result of the table format that queries must return. The aggregations must be a single level.
- When a column is included in a count function, null values are ignored in the count. When an asterisk is included in a count function, nulls are included in the count.
SUM can be combined with
ISNULL to find the number of null values in a given column.
AVG, MIN, MAX
|Memory versus Pointing||3.5584||0||9|
How would you query how much time it took to complete each test provided in the exam_answers table, in minutes?
|345139||2013-02-05 03:58:13||2013-10-02 20:18:06|
|345139||2013-02-05 03:58:31||2013-10-02 20:18:06|
|345139||2013-02-05 03:59:03||2013-10-02 20:18:06|
What is the average amount of time it took customers to complete all of the tests in the exam_answers table when the negative durations are excluded from your calculation?
MySQL allows you to use aliases in a GROUP BY clause, but some database systems do not.
Note that the following query does not return the correct ordering, because the alias “Dogs” is the name of one of the tables. If instead of double-quotes, the alias is wrapped in back-ticks, then the query works as expected.
Whereas the expression that follows a
WHERE clause has to be applicable to each row of data in a column, the expression that follows a
HAVING clause has to be applicable or computable using a group of data.
Write a query that outputs the average number of tests completed and average mean inter-test-interval for every breed type, sorted by the average number of completed tests in descending order.
|breed_type||Tests Completed||Avg ITI|
|Mixed Breed/ Other/ I Don’t Know||9.54250850170034||3023.0711302156274|
Write a query that outputs the average amount of time it took customers to complete each type of test where any individual reaction times over 6000 hours are excluded and only average reaction times that are greater than 0 seconds are included.
Write a query that outputs the total number of unique User_Guids in each combination of State and ZIP code (postal code) in the United States, sorted first by state name in ascending alphabetical order, and second by total number of unique User_Guids in descending order.
Write a query that outputs the total number of unique User_Guids in each combination of State and ZIP code in the United States that have at least 5 users, sorted first by state name in ascending alphabetical order, and second by total number of unique User_Guids in descending order.
Pitfalls of GROUP BY
The following query executes without an error, but the query is nonsensical. It includes both aggregate and non-aggregate functions in the
|Mixed Breed/ Other/ I Don’t Know||9499||50|
MySQL resolves this by choosing its own way to “summarize” the unaggregated field, “weight.” Other database systems may through an error, such as the following.
Column 'X' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The order in which SQL queries are written and actually executed are tabulated below.
|ORDER BY||ORDER BY|
Some content in this note are taken from the Jupyter Notebooks enumerated below. They are accessible as part of the “Managing Big Data with MySQL” course on coursera.org, and licensed by Jana Schaich Borg under CC BY-NC 4.0.
- MySQL Exercise 04 Summarizing Your Data
- MySQL Exercise 05 Breaking Your Summaries into Groups
- MySQL Exercise 06 Common Pitfalls of GROUP BY
Other content for this note is taken from the Coursera course “Managing Big Data with MySQL.”