The HAVING Clause in PostgreSQL
Howdy SQL-ites! Welcome back to our next step in learning SQL. After a long hiatus, we recently stumbled our way through some intermediate SQL statement challenges, namely on GROUP BY. Today, we’re back in the flow, and learning about the HAVING clause in PostgreSQL. About the HAVING Clause in PostgreSQL The HAVING clause is most […]
Howdy SQL-ites! Welcome back to our next step in learning SQL. After a long hiatus, we recently stumbled our way through some intermediate SQL statement challenges, namely on GROUP BY.
Today, we’re back in the flow, and learning about the HAVING clause in PostgreSQL.
About the HAVING Clause in PostgreSQL
The HAVING clause is most often seen as a helper with the GROUP BY statement. GROUP BY was pretty awesome, right? How might we improve upon it?
Let’s consider two concepts:
- Our occasional parallels of PostgreSQL to Excel, and,
- Our previous intermediate challenge example.
Relating PostgreSQL’s GROUP BY to Excel
If we hearken back to our first encounter with GROUP BY, we compared GROUP BY to pivot tables. Specifically, if we have a big data table that records data of recurring themes, e.g., you customer base, it can be helpful to manipulate as aggregated and assembled, vs. raw.
However, GROUP BY is only an intermediate step in data analysis. If we think about our final challenge in the last post, we had to limit the results to 5. Even if we group data, it’s neither segmented nor useful toward analysis.
Thus, we need an additional method for winnowing down our GROUP BY results.
Meet the HAVING Clause
So, about the HAVING clause. It’s most often used in tandem with GROUP BY to filter out data rows that don’t meet a certain condition. Think of it as similar to the WHERE function, just an accompaniment to GROUP BY.
Let’s take a look at basic syntax:
SELECT column, aggregate(column_2)
GROUP BY column
Off the bat, we should this is extremely familiar if we’ve covered GROUP BY. The differentiation is the additional HAVING condition at the end. This condition could be something like, HAVING sum(column_2) less than 50.
The Difference Between WHERE and HAVING
Some of you sharp folks may want to know, “what’s the difference between WHERE and HAVING?” That would be an excellent question. Here’s the difference:
The WHERE clause sets conditions on individual rows, before the GROUP BY clause has been applied. The HAVING clause specifies conditions on grouped rows, created by the GROUP BY clause.
Let’s run some examples.
Using the HAVING Clause in PostgreSQL
Here’s our first example, very similar to our previous skills challenge:
SELECT customer_id, SUM(amount)
GROUP BY customer_id
HAVING SUM(amount) > 150;
Above, we have added another segmentation layer with the HAVING clause.
You can see we’re pulling information about how much our customers have paid us. Further, we specify that we only want to see customers with a lifetime purchase amount of greater than $150.
Let’s look at another example of the HAVING clause. Say for example, we want to know which store has served the most customers. Below, we’ll execute the following code:
SELECT store_id, COUNT(customer_id)
GROUP BY store_id
HAVING COUNT(customer_id) >275;
Above, we’ve selected both the store and customer ID columns from the customer table. Further, we group by the store ID, because we want store-level data, but we only want to see the stores which have served more than 275 customers. Below, we can see only store has done so. 🙂
Combining Usage of the WHERE & HAVING Clauses
We mentioned earlier the WHERE and HAVING clauses are different, somewhat in function, but mostly in order of execution. Here’s what we didn’t say: you can actually use them in tandem. There is a great theoretical use case for this, unfortunately our sample database is a bit small, but here goes.
Let’s think about the film table (used in previous examples.) Perhaps we want to analyze the films, by rental rate, but only films with certain ratings. For example, perhaps we’re no longer interested in carrying NC-17 films, but still want to get an aggregated view of on average, how much each films rents for, by rating. Additionally, we want to see which ratings, if any, have an average rental rate of less than $3.
Here’s the code we would use:
SELECT rating, ROUND(AVG(rental_rate),2)
WHERE rating IN (‘R’,’G’,’PG’,’PG-13′)
GROUP BY rating
Alright, that concludes our section for today. It feels good to be back. 🙂
Hopefully you found this section on the HAVING PostgreSQL statement useful. If you need to backtrack or further explore, here are some useful/recent links:
- Here’s the series page on “How to Learn PostgreSQL”
- Here’s our previous post, “Intermediate SQL Skills Challenge”
- Other recent PostgreSQL posts: