Basics
For each of the example queries, the first few rows of the result are included.
Statements and Queries
The function of a “SQL statement” is to communicate to the database what you’d like to do with the data. Types of statements include:
- CREATE - Creates a new table in the database.
- DROP TABLE - Removes a table from the database.
- SELECT - Allows you to read data and display it. Select statements are commonly referred to as queries.
There are a few other types of statements related to the creation and removal of data. Most analytical work involves reading and manipulating data (using SELECT
) without altering the underlying source.
Query Formatting
SQL is generally not case sensitive, but common formatting convention is to capitalize keywords (IE, SELECT
and FROM
) while leaving the names of columns and tables lowercase (account
). This aids in the legibility of commands, which only becomes more important as the queries become more complex.
Column names should use underscores instead of spaces (web_events
, not "web events"
). In Postgres, if there is a space in a column or table name, that name needs to be referred to with double quotes around them.
SELECT account
FROM web_events;
Different SQL environments may or may not require a semicolon at the end of the query. Best practice is to include it whether or not it is required.
SELECT & FROM
Every query contains the SELECT
and FROM
keywords. The SELECT
keyword is always followed by the columns the user wants returned. The FROM
keyword is always is always followed by the table from which those columns should be selected.
Unlike many other types of programming languages, the order of SQL statements cannot vary. SELECT
is always followed by FROM
, and not the other way around, as shown in the following examples. The order of keywords will likewise be critical for the other keywords that will be introduced in this post.
SELECT flight_date,
origin,
dest,
actual_elapsed_time
FROM tutorial.us_flights
flight_date | origin | dest | actual_elapsed_time |
---|---|---|---|
2015-01-02 00:00:00 | JFK | LAX | 381 |
2015-01-03 00:00:00 | JFK | LAX | 358 |
2015-01-04 00:00:00 | JFK | LAX | 385 |
2015-01-05 00:00:00 | JFK | LAX | 389 |
2015-01-06 00:00:00 | JFK | LAX | 424 |
LIMIT
The keyword LIMIT
places an upper limit on the number of rows returned by a given query. LIMIT
is always the very last keyword included in a query.
The primary usage of this command is to speed up execution of queries that would otherwise take a very long time, due to there being millions of rows. It would be inconvenient to wait several minutes for a query to return especially if the programmer simply wants to quickly get an idea of what the fields contain, for example.
An *
(asterisk) is used here to select all available columns. In SQL, *
means ‘all.’
SELECT *
FROM tutorial.city_populations
LIMIT 3
city | state | population_estimate_2012 | id |
---|---|---|---|
New York | NY | 8336697 | 1 |
Los Angeles | CA | 3857799 | 2 |
Chicago | IL | 2714856 | 3 |
ORDER BY
The keyword ORDER BY
specifies which field the results of the query will be ordered by.
By default, the query returns results from A to Z, lowest to highest, and earliest to latest. This is called “Ascending Order.” In order to reverse this ordering, add the keyword DESC
following the column name.
SELECT school,
student_id,
sat_writing,
sat_verbal,
sat_math
FROM tutorial.sat_scores
ORDER BY sat_math DESC
school | student_id | sat_writing | sat_verbal | sat_math |
---|---|---|---|---|
St. John’s | 118 | 578 | 492 | 797 |
Washington HS | 21 | 624 | 331 | 796 |
Petersville HS | 40 | 513 | 614 | 780 |
Washington HS | 6 | 213 | 500 | 771 |
St. John’s | 87 | 697 | 798 | 767 |
It is also possible to ORDER BY
multiple fields. This will result in a query being ordered by the first field, first.
As an example, the following query orders by flight_date, then by actual_elapsed_time from smallest to largest. The latter ordering will be “nested within” the former.
SELECT flight_date,
origin,
dest,
actual_elapsed_time
FROM tutorial.us_flights
ORDER BY flight_date,
actual_elapsed_time
flight_date | origin | dest | actual_elapsed_time |
---|---|---|---|
2015-01-02 00:00:00 | WRG | PSG | 24 |
2015-01-02 00:00:00 | PSG | WRG | 27 |
2015-01-02 00:00:00 | DEN | COS | 28 |
2015-01-02 00:00:00 | MEI | PIB | 29 |
2015-01-02 00:00:00 | DEN | COS | 30 |
WHERE
A WHERE
clause allows you to filter a set of results on the basis of a given criteria. This functionality is similar to Excel’s filter capability.
The following would produce results where all the values in the origin column are equal to ‘LAX.’ In SQL, each row is one datapoint or observation. If a given row has an origin other than ‘LAX,’ it will be excluded in its entirety from the query results.
Whenever using WHERE
with non-numeric data, the non-numeric data is placed in single-quotes, as shown below.
SELECT flight_date,
origin,
dest,
actual_elapsed_time
FROM tutorial.us_flights
WHERE origin = 'LAX'
ORDER BY actual_elapsed_time
flight_date | origin | dest | actual_elapsed_time |
---|---|---|---|
2015-01-06 00:00:00 | LAX | SBA | 32 |
2015-01-14 00:00:00 | LAX | SBA | 34 |
2015-01-07 00:00:00 | LAX | CLD | 35 |
2015-01-04 00:00:00 | LAX | SBA | 36 |
2015-01-05 00:00:00 | LAX | SBA | 36 |
Comparison operators such as <, >, and the not equal symbol can also be used with WHERE
.
SELECT *
FROM tutorial.city_populations
WHERE population_estimate_2012 < 500000
city | state | population_estimate_2012 | id |
---|---|---|---|
Sacramento | CA | 475516 | 35 |
Long Beach | CA | 467892 | 36 |
Kansas City | MI | 464310 | 37 |
Mesa | AZ | 452084 | 38 |
Virginia Beach | VA | 447021 | 39 |
SELECT *
FROM tutorial.city_populations
WHERE state = 'CA'
city | state | population_estimate_2012 | id |
---|---|---|---|
Los Angeles | CA | 3857799 | 2 |
San Diego | CA | 1338348 | 8 |
San Jose | CA | 982765 | 10 |
San Francisco | CA | 825863 | 14 |
Fresno | CA | 505882 | 34 |
Derived Columns
Derived columns are new columns the results of which come from a manipulation of the existing columns in the database.
The following code snippet demonstrates how it is possible to create new columns from the contents of other columns. In this case, the new column is named ‘sat_total’ using the AS
keyword.
SELECT school,
student_id,
sat_writing,
sat_verbal,
sat_math,
sat_writing + sat_verbal + sat_math AS sat_total
FROM tutorial.sat_scores
ORDER BY sat_total DESC
school | student_id | sat_writing | sat_verbal | sat_math | sat_total |
---|---|---|---|---|---|
St. John’s | 87 | 697 | 798 | 767 | 2262 |
Petersville HS | 38 | 779 | 656 | 724 | 2159 |
St. John’s | 125 | 741 | 644 | 702 | 2087 |
St. John’s | 132 | 619 | 742 | 718 | 2079 |
Petersville HS | 32 | 635 | 743 | 695 | 2073 |
LIKE
LIKE
is a keyword used with the WHERE
keyword. It enables more flexible searching. The following example returns rows with the word “Life” in the song_name, where “Life” can be followed by any number of characters. This is useful whenever you have lots of similar but slightly different values in a given column.
LIKE
requires the use of wildcards. ‘%’ is a wild card that represents a character or any number of characters.
SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE song_name LIKE 'Life%'
year | year_rank | group | artist | song_name | id |
---|---|---|---|---|---|
1978 | 87 | Joe Walsh | Joe Walsh | Life’s Been Good | 2325 |
1986 | 78 | Dream Academy | Dream Academy | Life In A Northern Town | 3161 |
1992 | 18 | Tom Cochrane | Tom Cochrane | Life Is A Highway | 3718 |
2006 | 76 | Rascal Flatts | Rascal Flatts | Life is a Highway | 5461 |
2010 | 96 | Daughtry | Daughtry | Life After You | 6051 |
IN
IN
is a keyword that, similar to LIKE
, is used in combination with WHERE
. The purpose of IN
is to allow you to filter data based on several possible values. For example, the following allows you to search for cities in ‘CA,’ ‘TX,’ or ‘NY.’
Any non-numeric data must be placed in single-quotes, as shown, as with comparison operators. Place a comma in between each pair of distinct values.
SELECT *
FROM tutorial.city_populations
WHERE state IN ('CA', 'TX', 'NY')
city | state | population_estimate_2012 | id |
---|---|---|---|
New York | NY | 8336697 | 1 |
Los Angeles | CA | 3857799 | 2 |
Houston | TX | 2160821 | 4 |
San Antonio | TX | 1382951 | 7 |
San Diego | CA | 1338348 | 8 |
SELECT flight_date,
origin,
dest,
distance,
actual_elapsed_time
FROM tutorial.us_flights
WHERE actual_elapsed_time IN (30, 45)
flight_date | origin | dest | distance | actual_elapsed_time |
---|---|---|---|---|
2015-01-06 00:00:00 | DFW | OKC | 175 | 45 |
2015-01-06 00:00:00 | WRG | KTN | 82 | 30 |
2015-01-06 00:00:00 | WRG | PSG | 31 | 30 |
2015-01-03 00:00:00 | YAK | CDV | 213 | 45 |
2015-01-04 00:00:00 | CDV | ANC | 160 | 45 |
NOT
NOT
provides the inverse results for IN
, LIKE
, and similar operators.
SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE song_name NOT LIKE 'The%'
year | year_rank | group | artist | song_name | id |
---|---|---|---|---|---|
1956 | 1 | Elvis Presley | Elvis Presley | Heartbreak Hotel | 1 |
1956 | 2 | Elvis Presley | Elvis Presley | Don’t Be Cruel | 2 |
1956 | 3 | Nelson Riddle | Nelson Riddle | Lisbon Antigua | 3 |
1956 | 4 | Platters | Platters | My Prayer | 4 |
1956 | 7 | Doris Day | Doris Day | Whatever Will Be Will Be (Que Sera Sera) | 7 |
SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE artist NOT IN ('Elvis Presley', 'Nelson Riddle')
year | year_rank | group | artist | song_name | id |
---|---|---|---|---|---|
1956 | 4 | Platters | Platters | My Prayer | 4 |
1956 | 5 | Gogi Grant | Gogi Grant | The Wayward Wind | 5 |
1956 | 6 | Les Baxter | Les Baxter | The Poor People Of Paris | 6 |
1956 | 7 | Doris Day | Doris Day | Whatever Will Be Will Be (Que Sera Sera) | 7 |
1956 | 9 | Dean Martin | Dean Martin | Memories Are Made Of This | 9 |
AND & BETWEEN
The following example filters for populations less than 500,000 and greater than 450,000. The AND
operator allows you to run two complete logical statements; the word ‘population_estimate_2012’ could not be omitted from the second half of the WHERE
clause shown below.
SELECT *
FROM tutorial.city_populations
WHERE population_estimate_2012 <= 500000
AND population_estimate_2012 >= 450000
The keyword BETWEEN
provides an alternative way of expressing the identical query, above.
SELECT *
FROM tutorial.city_populations
WHERE population_estimate_2012 BETWEEN 450000 AND 500000
city | state | population_estimate_2012 | id |
---|---|---|---|
Sacramento | CA | 475516 | 35 |
Long Beach | CA | 467892 | 36 |
Kansas City | MI | 464310 | 37 |
Mesa | AZ | 452084 | 38 |
The commands can be chained, as shown in the examples below.
SELECT school,
student_id,
sat_writing,
sat_verbal,
sat_math
FROM tutorial.sat_scores
WHERE sat_math BETWEEN 700 AND 750
AND sat_verbal < 300
school | student_id | sat_writing | sat_verbal | sat_math |
---|---|---|---|---|
Washington HS | 25 | 502 | 291 | 716 |
Petersville HS | 84 | 747 | 290 | 729 |
St. John’s | 91 | 394 | 214 | 735 |
SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE artist = 'Elvis Presley'
AND year_rank BETWEEN 1 AND 5
year | year_rank | group | artist | song_name | id |
---|---|---|---|---|---|
1956 | 1 | Elvis Presley | Elvis Presley | Heartbreak Hotel | 1 |
1956 | 2 | Elvis Presley | Elvis Presley | Don’t Be Cruel | 2 |
1957 | 1 | Elvis Presley | Elvis Presley | All Shook Up | 104 |
1958 | 3 | Elvis Presley | Elvis Presley | Don’t / I Beg Of You | 206 |
OR
OR
performs the expected functionality, similar to AND
.
SELECT flight_date,
origin,
dest,
distance,
actual_elapsed_time
FROM tutorial.us_flights
WHERE origin = 'LAX' OR origin = 'SFO'
ORDER BY actual_elapsed_time
flight_date | origin | dest | distance | actual_elapsed_time |
---|---|---|---|---|
2015-01-06 00:00:00 | LAX | SBA | 89 | 32 |
2015-01-11 00:00:00 | SFO | MRY | 77 | 33 |
2015-01-14 00:00:00 | LAX | SBA | 89 | 34 |
2015-01-07 00:00:00 | LAX | CLD | 86 | 35 |
2015-01-06 00:00:00 | SFO | SMF | 86 | 35 |
AND
and OR
can also be combined to form more complex queries. Parentheses can be used to group statements as appropriate. See the following code example.
SELECT school,
student_id,
sat_writing,
sat_verbal,
sat_math
FROM tutorial.sat_scores
WHERE sat_math BETWEEN 775 AND 800
OR sat_verbal BETWEEN 775 AND 800
OR sat_writing BETWEEN 775 AND 800
school | student_id | sat_writing | sat_verbal | sat_math |
---|---|---|---|---|
Washington HS | 2 | 401 | 791 | 248 |
Washington HS | 19 | 799 | 406 | 247 |
Washington HS | 21 | 624 | 331 | 796 |
Washington HS | 27 | 661 | 782 | 393 |
Petersville HS | 35 | 304 | 780 | 516 |
HAVING
HAVING
is used in conjunction with GROUP BY
to filter group rows that do not satisfy a certain condition.
HAVING
is analogous to WHERE
, except it is used with GROUP BY
.
- The
HAVING
clause sets the condition for groups of rows created by theGROUP BY
clause after theGROUP BY
clause applies. - The
WHERE
clause sets the condition for individual rows before theGROUP BY
clause applies.
SELECT column_1,
aggregate_function(column_2)
FROM table_name
GROUP BY column_1
HAVING condition;
SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
HAVING SUM(amount) > 190;
customer_id | count |
---|---|
144 | 40 |
526 | 42 |
148 | 45 |
Example using WHERE
and HAVING
.
SELECT rating,
AVG(rental_rate)
FROM film
WHERE rating IN ('R', 'G', 'PG')
GROUP BY rating
HAVING AVG(rental_rate) < 3;
rating | avg |
---|---|
NC-17 | 5.142 |
PG | 5.082 |
PG-13 | 5.053 |