Welcome back, SQL-ites! Today, we’re going to take a significant leap forward in our journey of learning PostgreSQL.
A quick recap on where we’ve been recently:
- The AS PostgreSQL statement
- An Intermediate PostgreSQL Skills Challenge, and,
- The HAVING clause in PostgreSQL
In more broad terms, most of our PostgreSQL training so far has focused on how to best select and segment our data. However, there is much more to be accomplished in big data manipulation. Enter JOINS.
About JOINS & PostgreSQL
JOINS helps unlock the foundation of most SQL databases, that are known as relational databases. Specifically, when we’re able to triangulate multiple data sources together, combined with sorting, segmentation and some intuition, we can get some really powerful stuff.
So what does JOINS do? It allows you to relate data in one table to the data in other tables.
Think about some of our practice runs in PostgreSQL so far. We’ve looked at DVD rentals, customers, transactions, ID’s, and so on. What if we wanted to start linking in more detail customer names with ID’s, or transaction ID’s with film names, or all the above? Weaving data into an interactive story could allow us to see really cool things, such as which customers prefer certain genres, or perhaps unveil renting habits for a timely promotional offer to smooth out our top line revenue.
BASIC JOINS Syntax in PostgreSQL
As with many of our learning adventures, there is complexity and there are layers to our education. 🙂
First, it’s important to note that there are actually several types of JOINS, so this isn’t just an amorphous, blob-lump function. Here are some of the important kinds:
- INNER JOIN
- OUTER JOIN
To start things out, we’re going to look at INNER JOIN first.
INNER JOINS Concepts for PostgreSQL
We’re going to jump headlong into this, at the expense of immediate clarity. Joy! Let’s say we have two tables which we’ll relate data between.
Above, we’ve pulled out some key points of interest between our tables, unique columns, and more importantly, a shared key (column) of customer_id between the two tables. We’ll explain more about keys in a minute. Ish.
We’ll use the shared column of (example) customer_id to further mash up the data. Let’s take a look at syntax now.
INNER JOINS Basic Syntax
If things aren’t 100% connected for you conceptually, I completely understand. Please do bear with me, as seeing some examples may help “close the loop”. Here’s what an INNER JOINS query might look like for a PostgerSQL usage:
SELECT A.pka, A.c1, B.pkb, B.c2
INNER JOIN B ON A.pka = B.fka;
Let’s break this down a bit more.
The first line is extremely similar to what we’ve been doing all along. Calling in data. The subtle difference here is that we’re first including the respective table names. So, it’s tableA.column, tableB.column, tableB.column2.
Next, we specify the primary table we’re first selecting from, per usual. Then we specify the “donor” table that will be combined with the primary table via INNER JOIN. Further, we make this work by using the ON keyword which specifies the shared key columns. 🙂
More Explanation on the Inner Workings of JOINS (Pun Not Intended)
This has been a bit of a tricky concept for me to learn so far, so we’re going to walk through the nuts and bolts of JOINS as operated by PostgreSQL.
When JOINS is executed, PostgreSQL takes every row in our “Table A” and goes through Table B, looking for rows that match our specified condition of A.pka = B.fka. So combining our example and laymen’s terms, the JOINS function looks for common occurrences of the customer_id so that additional, previously disparate data from separate tables may be awesomely fused together. 🙂
If a match is found between the two tables, it combines columns of both A and B rows into one row, and adds that newly combined row to the data set you get back.
Pardon the Interruption: PostgreSQL “life” hacks for JOINS
A couple of quick items that may prove useful to you as we get more familiar with joins.
- Sometimes your tables may share column names (Analytics data, anyone? Can I get an amen?), so you may need to use the syntax table_name.column_name for clarity.
- If you or someone you love has created excessively long table names, you can introduce a shorter alias (via the AS statement) for the table, and use accordingly.
Hey, I Thought We Were Going to Learn About INNER JOIN?
We are! And what’s better, we’re going to use a Venn diagram to explain it. 🙂 #DataScience.
Okay, the INNER JOIN PostgreSQL clause returns rows in Table A that have corresponding rows in Table B. Super scientific visualization below.
The overlapped area above are the results that will be returned to us.
INNER JOINS PostgreSQL Code Examples
Putting all the pieces together, here’s our first INNER JOINS code sample:
INNER JOIN payment ON payment.customer_id = customer.customer_id;
And below, what we get back from pgAdmin.
Okay, a few notes. We went out of our way to say that overlapping column names need table names for distinction. Here’s what we didn’t say, you don’t need to specify table names for unique columns. Above, you can see that we’ve nicely meshed together customer ID with both customer information with payment information. Awesome!
I know this is a bit of a rough stop- but I’m really trying to work on getting more sleep these days. So it’s 12:30am, and I’ve got to head to bed. We’ll be back with updates on this post and more progress forward. Cheers!
Update – 9/16/2017 – Add’l INNER JOINS Examples
Welcome back again, SQL-ites! We’re back with a weekend update from our first look at INNER JOINS in PostgreSQL. Above, we left off with a basic example of INNER JOINS. To quickly recap, we might think of JOINS as basic data composition. Often in marketing and business useful data may begin in silos, thus, combination into useful form must occur.
Our next example of INNER JOINS is a slight modification to the first full example. We have the same basin INNER JOINS syntax, but have added ORDER BY customer.customer_id at the end, to clean the presentation of our data a bit.
INNER JOIN payment ON payment.customer_id = customer.customer_id
ORDER BY customer.customer_id ASC;
Using WHERE with INNER JOINS
Let’s continue to add modifications to our basic INNER JOINS statement. Below, we harness the power of both combination and selection to specify joined results, but for a specific customer ID. A use case for this might involve isolating or routing specific data after it’s been joined, for reporting purposes.
INNER JOIN payment ON payment.customer_id = customer.customer_id
WHERE customer.customer_id = 2;