SQL Aggregation
Previously, we have been dealing with queries that process one row at a time. When we join, we make pairwise combinations of all of the rows. When we use WHERE
, we filter out certain rows based on the condition. Alternatively, applying an aggregate function such as MAX(column)
combines the values in multiple rows.
CREATE TABLE flights AS
SELECT "AUH" AS departure, 932 AS price UNION
SELECT "LAS", 50 UNION
SELECT "LAX", 89 UNION
SELECT "LAX", 99 UNION
SELECT "SEA", 32 UNION
SELECT "SFO", 50 UNION
SELECT "SFO", 40 UNION
SELECT "SFO", 60 UNION
SELECT "SLC", 49 UNION
SELECT "SLC", 42;
By default, we combine the values of the entire table. For example, if we wanted to count the number of flights from our flights
table, we could use:
sqlite> SELECT COUNT(*) from flights;
10
GROUP BY
What if we wanted to group together the values in similar rows and perform the aggregation operations within those groups? We use a GROUP BY
clause.
Here's another example. For each unique departure, collect all the rows having the same departure airport into a group. Then, select the price
column and apply the MIN
aggregation to recover the price of the cheapest departure from that group. The end result is a table of departure airports and the cheapest departing flight.
sqlite> SELECT departure, MIN(price) FROM flights GROUP BY departure;
AUH|932
LAS|50
LAX|89
SEA|32
SFO|40
SLC|42
HAVING
Just like how we can filter out rows with WHERE
, we can also filter out groups with HAVING
. Typically, a HAVING
clause should use an aggregation function. Suppose we want to see all airports with at least two departures:
sqlite> SELECT departure FROM flights GROUP BY departure HAVING COUNT(*) >= 2;
LAX
SFO
SLC
DISTINCT
Note that the COUNT(*)
aggregate just counts the number of rows in each group. Say we want to count the number of distinct airports instead. Then, we could use the following query:
sqlite> SELECT COUNT(DISTINCT departure) FROM flights;
6
This enumerates all the different departure airports available in our flights
table (in this case: SFO, LAX, AUH, SLC, SEA, and LAS).