# Complex Teradata Queries

## Exercises

### How many distinct dates are there in the saledate column of the transaction table for each month/year combination in the database?

T_MONTH T_YEAR DISTINCT_SALEDATES
8 2004 31
9 2004 30
10 2004 31
11 2004 29
12 2004 30
1 2005 31
2 2005 28
3 2005 30
4 2005 30
5 2005 31
6 2005 30
7 2005 31
8 2005 27

Note there are 27 days recorded in August 2005, and 31 recorded for August 2004. August 2005 transactions will be excluded. Thanksgiving (Nov 25), Christmas (Dec 25), and March 27 do not have sales data.

### Use a CASE statement within an aggregate function to determine which sku had the greatest total sales during the combined summer months of June, July, and August.

SKU SUMMER_SALES
4108011 1646017.38
3524026 1464189.00
5528349 1315121.00

### How many distinct dates are there in the saledate column of the transaction table for each month/year/store combination in the database? Sort your results by the number of days per combination in ascending order.

T_MONTH T_YEAR STORE DISTINCT_SALEDATES
3 2005 8304 1
7 2005 7604 1
8 2004 8304 1

There are multiple rows with only one saledate for a given month year and store. This is most likely a result of some data being removed before the dataset was donated. This will need to be taken into account when examining sales trends for subsets of stores.

### What is the average daily revenue for each store/month/year combination in the database? Calculate this by dividing the total revenue for a group by the number of sales days available in the transaction table for that group.

T_MONTH T_YEAR STORE DISTINCT_SALEDATES TOTAL_SALES AVG_DAILY_REVENUE
12 2004 8402 30 3152448.34 105081.61
12 2004 504 30 2724376.37 90812.55
12 2004 2707 30 2711980.93 90399.36

Now build a query to remove all data from August, 2005, because the data for this month is incomplete. This is accomplished by creating a string including both the year and month for each row, and then using a `WHERE` clause to exclude rows having `200508` in that column.

Also, remove all combinations of year_month and store having fewer than 20 transactions, using a `HAVING` clause.

T_YEAR T_MONTH STORE YEAR_MONTH DISTINCT_SALEDATES
2005 4 1704 200504 21
2004 9 3802 200409 21
2004 9 5502 200409 22

This filtering query can be used as a subquery by joining into it on `YEAR`, `MONTH`, and `STORE`. The subquery included below is a shorted version of the query above, but has identical functionality.

Note that it is necessary to specify the purchase type in the outer query as well as the inner query, because the inner query only excludes transactions on the basis of year, month, and store.

T_YEAR T_MONTH T_STORE AVG_DAILY_REVENUE
2004 12 8402 105081.61
2004 12 504 90812.55
2004 12 2707 90399.36

The subquery above will be used to clean (IE, exclude some of) the data for the majority of the queries in the remainder of this section.

### What is the average daily revenue brought in by Dillard’s stores in areas of high, medium, or low levels of high school education?

HS_ATTAINMENT AVG_DAILY_REVENUE
low 34159.76
medium 25037.89
high 20937.31

### Compare the average daily revenues of the stores with the highest median msa_income and the lowest median msa_income. In what city and state were these stores, and which store had a higher average daily revenue?

Top 3 msa_income:

STORE AVG_DAILY_REVENUE MSA_INCOME CITY STATE
3902 17884.08 56099 SPANISH FORT AL
3403 10072.60 46140 SALINA KS
9609 14644.10 45944 LONGMONT CO

Bottom 3 msa_income:

STORE AVG_DAILY_REVENUE MSA_INCOME CITY STATE
2707 56601.99 16022 MCALLEN TX
2907 34590.64 17374 BROWNSVILLE TX
2807 16479.60 17374 HARLINGEN TX

Both stores (highest and lowest median msa_income) can be retrieved in a single query, as well, as follows.

STORE MSA_INCOME CITY STATE AVG_DAILY_REVENUE
3902 56099 SPANISH FORT AL 17884.08
2707 16022 MCALLEN TX 56601.99

### What is the brand of the sku with the greatest standard deviation in sprice? Only examine skus that have been part of over 100 transactions.

SKU BRAND STDDEV_SPRICE AVG_SPRICE
3733090 CABERNET 178.6 11.49
2762683 HART SCH 175.9 335.7
5453849 POLO FAS 170.5 286.7

### Examine all the transactions for the sku with the greatest standard deviation in sprice, but only consider skus that are part of more than 100 transactions.

SKU STORE STYPE ORGPRICE AMT
3733090 1007 R 6.00 5005.00
3733090 1007 P 6.00 5005.00
3733090 9804 R 6.00 6.40

The large standard deviation for this SKU is explained by a single mistaken purchase. A \$6 item was mistakenly sold for \$5005, and then immediately returned.

### What was the average daily revenue Dillard’s brought in during each month of the year?

SALE_MONTH AVG_DAILY_REVENUE SALE_MONTH AVG_DAILY_REVENUE
December 12528288.13 June 7339800.24
February 8122385.46 November 6932939.04
July 8058432.50 October 6721241.89
April 7703249.70 January 6708819.58
May 7478860.06 August 6167751.73
March 7435026.42 September 6157985.19

### Which department, in which city and state of what store, had the greatest % increase in average daily sales revenue from November to December?

PERC_INC N_SALES D_SALES N_DATES D_DATES DEPT STORE CITY STATE
4182.00 5.99 513.00 1 2 GOTTEX 404 PINE BLUFF AR
3550.00 10.00 365.00 1 1 ENVIRON 507 FORT WORTH TX
2751.00 2.99 170.50 1 2 CATALIN 502 ST PETERSBURG FL

### What is the city and state of the store that had the greatest decrease in average daily revenue from August to September?

T_STORE S_M_CITY S_M_STATE A_SALES S_SALES A_DATES S_DATES SPEND_DEC
4003 WEST DES MOINES IA 591489.31 378020.92 31 30 6479.60
9103 LOUISVILLE KY 1385203.66 1183525.98 31 30 5233.12
2707 MCALLEN TX 1526547.16 1324019.74 31 30 5109.47

### Determine the month of maximum total revenue for each store. Count the number of stores whose month of maximum total revenue was in each of the twelve months. Then determine the month of maximum average daily revenue. Count the number of stores whose month of maximum average daily revenue was in each of the twelve months. How do they compare?

Maximum Total Revenue

 MO 12 3 7 9 TOP_SALES_COUNT 321 3 3 1

Maximum Average Daily Revenue

 MO 12 3 7 2 9 5 TOP_SALES_COUNT 317 4 3 2 1 1

Content for this note is taken from the Coursera course “Managing Big Data with MySQL.”

Tags:

Categories:

Updated: