## GROUP BY SQL Statement

Introducing the GROUP BY SQL Statement in PostgreSQL

‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: **MIN**, **MAX**, **AVG** and **SUM**. Today we’re looking at the **GROUP BY** statement. We’ll learn about this function in PostgreSQL and walk through usage of this handy SQL statement.

## About the GROUP BY SQL Statement/Clause

From my simple understanding, GROUP BY functions like a hybrid of the following:

**SELECT****DISTINCT**keyword (If used without an aggregate function like**SUM**), and,- An Excel pivot table, rolling up aggregate figures (Count, Sum, Average, etc.) into unique rows

If you’re familiar with Excel Pivot tables, then you’ll recognize here the power of this function.

Let’s take a look at some examples to clarify.

## First Look at Using the GROUP BY Function

To better illustrate the power of **GROUP BY**, we’ll first show its usage without aggregate functions. Consider the following:

If we query the address table of our sample database with a generic **SELECT * FROM **address; we get back an atrocious 605 rows of data. Aggregate and useless!

In contrast, if we call the **GROUP BY** function, we’ll get back a cleaner output, with fewer rows – only unique values returned. While this is an incremental improvement to analyzing data, there’s much left to be desired.

What’s missing here? How about that pivot table-esque functionality? This is where the power of using **GROUP BY** with aggregate functions gets awesome.

## Using the GROUP BY SQL Statement with Aggregate Functions

As with most analysis, a single data point or data series rarely holds significant insight value on its own. Let’s drive home that point by leveraging the **GROUP** **BY **statement with the **SUM** aggregate function. Below, we compare the ratings of films in our sample database in aggregate by replacement cost. Perhaps this could serve in-store strategies for loss prevention.

**SELECT** rating, **SUM**(replacement_cost)

**FROM** film

**GROUP** **BY** rating;

If we extend this functionality to more real world examples, we could use the following for **GROUP BY**:

- Grouping page-level / URL data to roll up clickstream analytics data
- Large scale analysis of CRM data for customer segmentation analysis
- Analyzing returns for financial data

The list could (and I’m sure it does) go on.

Let’s take this one step further and reduce potential future workload, by building sort functionality into our query. Below, we add a line to get most expensive ratings to least.

**SELECT** rating, **SUM**(replacement_cost)

**FROM** film

**GROUP** **BY** rating

**ORDER** **BY** **SUM**(replacement_cost) **DESC**;

By the looks of this, no need to guard Land Before Time 8. 🙂

Extending our lesson: you can use the COUNT, AVG, and other aggregate functions to analyze as desired.

## Wrap Up

Alright, this was a relatively gentle introduction into more advanced SQL functions. GROUP BY is a rather critical function, so in our next article, we’ll be doing yet another skills challenge. Joy!

Feel free to catch up on our other articles that help you learn PostgreSQL. Also, check out some how-to’s on developing Amazon Alexa skills, and a new series on getting started with Machine Learning. As always, please share with your colleagues and share thoughts in the comments below. Cheers.