GROUP BY SQL Statement
Introducing the GROUP BY SQL Statement in PostgreSQL ‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: MIN, MAX, AVG and SUM. Today we’re looking at the GROUP BY statement. We’ll learn about this function in PostgreSQL and walk through […]
Introducing the GROUP BY SQL Statement in PostgreSQL
‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: MIN, MAX, AVG and SUM. Today we’re looking at the GROUP BY statement. We’ll learn about this function in PostgreSQL and walk through usage of this handy SQL statement.
About the GROUP BY SQL Statement/Clause
From my simple understanding, GROUP BY functions like a hybrid of the following:
- SELECT DISTINCT keyword (If used without an aggregate function like SUM), and,
- An Excel pivot table, rolling up aggregate figures (Count, Sum, Average, etc.) into unique rows
If you’re familiar with Excel Pivot tables, then you’ll recognize here the power of this function.
Let’s take a look at some examples to clarify.
First Look at Using the GROUP BY Function
To better illustrate the power of GROUP BY, we’ll first show its usage without aggregate functions. Consider the following:
If we query the address table of our sample database with a generic SELECT * FROM address; we get back an atrocious 605 rows of data. Aggregate and useless!
In contrast, if we call the GROUP BY function, we’ll get back a cleaner output, with fewer rows – only unique values returned. While this is an incremental improvement to analyzing data, there’s much left to be desired.
What’s missing here? How about that pivot table-esque functionality? This is where the power of using GROUP BY with aggregate functions gets awesome.
Using the GROUP BY SQL Statement with Aggregate Functions
As with most analysis, a single data point or data series rarely holds significant insight value on its own. Let’s drive home that point by leveraging the GROUP BY statement with the SUM aggregate function. Below, we compare the ratings of films in our sample database in aggregate by replacement cost. Perhaps this could serve in-store strategies for loss prevention.
SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating;
If we extend this functionality to more real world examples, we could use the following for GROUP BY:
- Grouping page-level / URL data to roll up clickstream analytics data
- Large scale analysis of CRM data for customer segmentation analysis
- Analyzing returns for financial data
The list could (and I’m sure it does) go on.
Let’s take this one step further and reduce potential future workload, by building sort functionality into our query. Below, we add a line to get most expensive ratings to least.
SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating
ORDER BY SUM(replacement_cost) DESC;
By the looks of this, no need to guard Land Before Time 8. 🙂
Extending our lesson: you can use the COUNT, AVG, and other aggregate functions to analyze as desired.
Wrap Up
Alright, this was a relatively gentle introduction into more advanced SQL functions. GROUP BY is a rather critical function, so in our next article, we’ll be doing yet another skills challenge. Joy!
Feel free to catch up on our other articles that help you learn PostgreSQL. Also, check out some how-to’s on developing Amazon Alexa skills, and a new series on getting started with Machine Learning. As always, please share with your colleagues and share thoughts in the comments below. Cheers.