Posts

DSC_0045 Zach Doty Cover Photo Group By SQL Statement Function

GROUP BY SQL Statement

Introducing the GROUP BY SQL Statement in PostgreSQL

‘Ello SQL geeks! Welcome back to our SQL learning journey. We left off with a beginner SQL skills challenge and the aggregate SQL functions: MIN, MAX, AVG and SUM. Today we’re looking at the GROUP BY statement. We’ll learn about this function in PostgreSQL and walk through usage of this handy SQL statement.

 

About the GROUP BY SQL Statement/Clause

From my simple understanding, GROUP BY functions like a hybrid of the following:

  • SELECT DISTINCT keyword (If used without an aggregate function like SUM), and,
  • An Excel pivot table, rolling up aggregate figures (Count, Sum, Average, etc.) into unique rows

If you’re familiar with Excel Pivot tables, then you’ll recognize here the power of this function.

 

Let’s take a look at some examples to clarify.

 

First Look at Using the GROUP BY Function

To better illustrate the power of GROUP BY, we’ll first show its usage without aggregate functions. Consider the following:

If we query the address table of our sample database with a generic SELECT * FROM address; we get back an atrocious 605 rows of data. Aggregate and useless!

2017-04-27-001-GROUB-BY-Generic-SELECT-Query-Pitfall

In contrast, if we call the GROUP BY function, we’ll get back a cleaner output, with fewer rows – only unique values returned. While this is an incremental improvement to analyzing data, there’s much left to be desired.

2017-04-27-002-GROUP-BY-Without-Aggregate-Function

 

What’s missing here? How about that pivot table-esque functionality? This is where the power of using GROUP BY with aggregate functions gets awesome.

Using the GROUP BY SQL Statement with Aggregate Functions

As with most analysis, a single data point or data series rarely holds significant insight value on its own. Let’s drive home that point by leveraging the GROUP BY statement with the SUM aggregate function. Below, we compare the ratings of films in our sample database in aggregate by replacement cost. Perhaps this could serve in-store strategies for loss prevention.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating;

2017-04-27-003-GROUP-BY-SUM-Aggregate-Function

If we extend this functionality to more real world examples, we could use the following for GROUP BY:

  • Grouping page-level / URL data to roll up clickstream analytics data
  • Large scale analysis of CRM data for customer segmentation analysis
  • Analyzing returns for financial data

The list could (and I’m sure it does) go on.

Let’s take this one step further and reduce potential future workload, by building sort functionality into our query. Below, we add a line to get most expensive ratings to least.

SELECT rating, SUM(replacement_cost)
FROM film
GROUP BY rating
ORDER BY SUM(replacement_cost) DESC;

2017-04-27-004-GROUP-BY-SUM-ORDER-BY-SUM

By the looks of this, no need to guard Land Before Time 8. 🙂

Extending our lesson: you can use the COUNT, AVG, and other aggregate functions to analyze as desired.

 

Wrap Up

Alright, this was a relatively gentle introduction into more advanced SQL functions. GROUP BY is a rather critical function, so in our next article, we’ll be doing yet another skills challenge. Joy!

Feel free to catch up on our other articles that help you learn PostgreSQL. Also, check out some how-to’s on developing Amazon Alexa skills, and a new series on getting started with Machine Learning. As always, please share with your colleagues and share thoughts in the comments below. Cheers.

SQL Aggregate Functions: Min, Max, Avg and Sum

Welcome back, SQL nerds! We’re back in action on the journey of learning SQL, after a beginner PostgreSQL skills challenge.

We’re reaching the end of basic functions and queries with this article. Based on what we’ve learned so far, we can do basic counting, filtering, sorting and pattern matching against PostgreSQL databases.

If you’re just tuning in, here’s the page on how to learn SQL, and the previous SQL article on the LIKE Statement.

Okay, enough jabber. Let’s jump in. The aggregate functions of MIN, MAX, AVG and SUM are our turning point into more complex SQL queries that involve concepts such as GROUP BY, among others.

At the same time, the functions on their own aren’t super complicated. Because we’re ass-u-ming you’re familiar with the general concepts of minimum, maximum, etc., we’re going to forgo conceptual and syntax explanations for demos.

 

AVG Aggregate Function

As we level up in SQL, we’re going to do less and less explaining / handholding / screenshots. That said, we’re going to explore our DVD rental data set for a table with a nice numerical component that would make good use of the functions.

We did a SELECT * FROM film LIMIT 15; to get a peek at the columns. For the purposes of this exercise, the replacement_cost column will do nicely.

To get the AVG:

SELECT AVG(replacement_cost) FROM film;

2017-04-17-001-AVG-Aggregate-PostgreSQL-Function

Using ROUND for Decimal Place Control

You’ll notice in the example above that we got 3 decimal places on what’s supposed to be a dollar amount. How do we fix that? Glad you asked. Meet the ROUND function.

We pass in the target value (average of replacement_cost) and mandate the number of decimal places we’d like returned. Below:

SELECT ROUND( AVG(replacement_cost), 2) FROM film;

2017-04-17-002-ROUND-for-AVG-Aggregate-PostgreSQL-Function

 

MIN Aggregate Function

To find the minimum value in a given column:

SELECT MIN(replacement_cost) FROM film;

2017-04-17-004-MIN-Aggregate-PostgreSQL-Function

 

MAX Aggregate Function

To find the maximum value in a given column:

SELECT MAX(replacement_cost) FROM film;

2017-04-17-003-MAX-Aggregate-PostgreSQL-Function

 

SUM Aggregate Function

To find the maximum value in a given column:

SELECT SUM(replacement_cost) FROM film;

2017-04-17-005-SUM-Aggregate-PostgreSQL-Function

 

Wrap-Up

Alright, that was a relatively quick article! Hopefully this was a reprieve from more involved sections in the past. We should find that as we continue to strengthen our core PostgreSQL  capabilities, these articles and our SQL queries should be easier and easier. If you found this article interesting, you might enjoy a new section on how to get started in machine learning. Cheers.

 

 

 

DSC_0007 Zach Doty Cover Photo for Beginner PostgreSQL Skills Challenge

Beginner SQL Skills Challenge!

Howdy, SQL geeks. Hope this post finds you swell!

Over the past few months, we’ve gained a ton of ground in learning SQL, or at least I have. 🙂

Let’s take a moment to:

  1. Test our knowledge of SQL skills learned thus far
  2. Start seeing SQL queries less as statements of code, and more as real world business challenges

In this article, we’ll have a recap plus three sections:

  • Recap of the training database we’ve been working with
  • General statements of each business problem
  • Hints and thoughts about how to approach each problem
  • Solutions to each problem

 

Recap: Our Training Database

Our training database is a best/old faithful. We’ve been using the surprisingly popular DVD rental database in a .tar file for our practice database.

Contained within this databases are various tables with fictitious information, including: customer information, film production information, business/pricing information and so forth.

In our challenges, we’ll execute various SQL queries to extract pieces of insight for business tasks. It’s assumed in this article that you’ve installed PostgreSQL via pgAdmin and have followed this article series so far, using the DVD rental training database.

Without further ado, let’s begin.

 

The SQL Challenges

Alright, here we go:

  1. How rentals were returned after July 17, 2005?
  2. How many actors have a last name that starts with the letter A?
  3. How many unique districts are represented in the customer database?
  4. Can you return the actual list of districts from challenge #3?
  5.  How many films have a rating of R and a replacement cost between $5 and $15?
  6. How many films have the word Truman somewhere in the title?

 

How to Approach the Challenges

Right, then. In this section, we’ll add some color commentary (read: hints) to our challenges. This should help you understand the mechanics of the solution, while ensuring you can’t see the answers all in one screen. 😉

 

Challenge 1: Rentals Returned After July 17, 2005

As with all challenges, a problem well stated is half (or more) solved. So let’s look at the high levels of the ask, and work our way down. We need information on rentals, so this means we’ll probably be querying the rentals table.

We would want to first examine the rentals table in a concise manner by doing:

SELECT * FROM rental LIMIT 5;

Once you’ve surveyed the table, we really only need one column returned (pun not intended) and we only want the sum figure of returns, where (HINT!!) the return date was after (think a logical operator here) July 17, 2005.

 

Challenge 2: Actors That Have a Last Name Beginning with “A”

Like our first challenge, let’s work from the “top down”. We need actor information, so querying the actor table would be a great place to start. Similar to last time, we need a count of values matching a condition. The difference versus challenge #1  is we need to find match a pattern like or such as an actor’s last name that begins with the letter A.

 

Challenge 3: Number of Unique Districts in the Customer Database

The title and description could cause some confusion here. You may need to do some basic SELECT * FROM table_name LIMIT X; queries to make sure you’ve got the right table. Once you do, we’re looking for an amount of distinct values in the database. Order of operations matters.

 

Challenge 4: Returning the Actual Lists of Districts from Challenge #3

Not much to hint at here – getting challenge #3 is the key here. You’ll really only be simplifying the correct query in challenge #3 to get the correct answer here.

 

Challenge 5: Cheap, (Mildly) Naughty Films

This one might be the longest query yet in this challenge. So we’re looking up film information, thus should know which table to query. We’re returning a value where a certain rating must be returned, and (HINT) we need to layer in one more lens of qualification. That lens dictates we specify a range between (cough, hint!) two values.

 

Challenge 6: Where in the World are Films Containing “Truman”?

This challenge is more of a recency test than retention of older concepts. You’ll need to employ pattern matching again for this business challenge/query to find titles that have some match like Truman in the title.

 

 

Challenge Solutions

Is that your final answer? Below are the queries, with screenshots of what I did.

 

Solution 1: Rentals Returned After July 17, 2005

SELECT COUNT(return_date) FROM rental
WHERE return_date > ‘2005-07-17’;

2017-04-15-001-Challenge-1-Sol-Count-From-Where

 

Solution 2: Actors That Have a Last Name Beginning with “A”

SELECT COUNT(*) FROM actor
WHERE last_name LIKE ‘A%’;

2017-04-15-002-Challenge-2-Sol-Count-From-Where-Like

 

Solution 3: Number of Unique Districts in the Customer Database

SELECT COUNT(DISTINCT(district)) FROM address;

2017-04-15-003-Challenge-3-Sol-Count-Distinct

 

Solution 4: Returning the Actual Lists of Districts from Challenge #3

SELECT DISTINCT(district) FROM address;

2017-04-15-004-Challenge-4-Sol-Select-Distinct

 

Solution 5: Cheap, (Mildly) Naughty Films

SELECT COUNT(*) FROM film
WHERE rating = ‘R’
AND replacement_cost BETWEEN 5 AND 15;

2017-04-15-005-Challenge-4-Sol-Count-From-Where-Between-And

 

Solution 6: Where in the World are Films Containing “Truman”?

SELECT COUNT(*) FROM film
WHERE title LIKE ‘%Truman%’;

2017-04-15-006-Challenge-4-Sol-Count-Where-Like-Wildcard

 

Wrap Up

Well done for completing these challenges! You shall indeed pass. 🙂

Soon, we’ll be covering aggregate SQL functions, such as MIN, MAX, AVG and SUM.

If you’re just joining us, here’s a running list of our articles so to date (4/16/2017):

 

DSC_0104 Zach Doty Cover Photo for Learning LIKE SQL Statement

The LIKE Statement: SQL Statement Fundamentals

Howdy folks! We are overdue for another installation of SQL learning. I’ve slept a few times since the past couple of articles on how to learn SQL. Previously, we talked about the IN Statement, BETWEEN statement, and ORDER BY clause.

In this article, we’ll learn how to execute the LIKE statement in SQL queries. Let’s jump in.

 

About the LIKE Statement & Why it’s Important

Have you ever worked with a data set that’s overwhelmingly large  or complex? Or overwhelmingly large and complex? Sometimes, you need to find data, but can’t recall the exact string or values for a lookup. Or, perhaps, you’re working with a messy data output from say, Google Keyword Planner that groups a range of close variants into one value?

Say you’re looking for values related to designer clothing and designer clothes. Without a better solution, the most probable solution for you is to do a bunch of sorting, filtering, classifying and other data sleuthing at great expense to time and sanity.

The LIKE statement exists to help with the debacle of only having / knowing  part of the lookup criteria you need, courtesy of pattern matching.

PostgreSQL and many other SQL engines/platforms support the LIKE statement, which functions a bit like the below. (Pun not intended.)

 

SELECT keyword, search volume

FROM table

WHERE keyword LIKE ‘cloth%’

 

The above tells pgAdmin / PostgreSQL to get the keyword and search volume columns from table, where the keyword values match values that begin with ‘cloth’ and are followed by anything else, the percent sign. The combination of calling out a text string with an operate is known as a pattern.

When you execute the LIKE statement in a SQL query, pgAdmin will begin reading through the table rows to see if the pattern you’ve specified returns any matches. For the season marketing technology folks, this functionality sure does resemble regex in some ways.

However, there are some differences.

  • Here, instead of the * character being wildcard, the % sign serves as a wildcard matching all characters.
  • If you want to match a single character, the underscore character is used.

 

LIKE Statement Syntax & Examples

Let’s try some examples. Below, we’ll call on the faithful DVD rental practice database, and run a query for customers that have first names like Jen. (Jennifer, Jenny, etc.) Our below code produces the following result.

SELECT first_name,last_name

FROM customer

WHERE first_name LIKE ‘Jen%’;

2017-03-30-001-LIKE-SQL-Statement-Example

 

That being considered, there are other ways we can use the like statement. Above, we used a wildcard to match any endings to a particular string.

Conversely, we could execute a SQL query that specifies a certain ending value, with the wildcard preceding. If we extend that example to such a query below, we should see the following result:

SELECT first_name,last_name

FROM customer

WHERE first_name LIKE ‘%y’;

2017-03-30-002-LIKE-SQL-Statement-Example

 

Above, we’ve flipped the tables so we capture every possible beginning condition under this pattern. Also important to note, the patterns aren’t limited beginnings or ends. You can use this wildcard in the middle, etc. Consider the following:

SELECT first_name,last_name

FROM customer

WHERE first_name LIKE ‘%er%’;

2017-03-30-003-LIKE-SQL-Statement-Example

 

Now, on top of all this awesomeness, realize we’ve been using it as a matching filter. We can also employ the NOT LIKE syntax to exclude values meeting the pattern we’ve specified.

Let’s mix things up a bit. We mentioned a second type of pattern matching character that we haven’t used yet: the underscore.

SELECT first_name,last_name

FROM customer

WHERE first_name LIKE ‘_her%’;

2017-03-30-004-LIKE-SQL-Statement-Example

 

Above, we’ve made a similar ask. However, instead of requesting all possible matches, we’ve mandated SQL only return the ‘er’ string that begins with ‘h’.

To throw you a quick curveball, it’s worth noting the LIKE statement is case sensitive in its matching. Could be bad, could be good, could be neither. However, there’s a way for you to force case insensitivy on the queries. The difference in your statement appears relatively minor. Instead of using a function that calls LIKE or NOT LIKE, you’ll use ILIKE.

 

Wrap-Up

Hope you found this useful! Stay tuned for more SQL learnings and application. If you’re new here, visit the page on how to learn SQL. If you’re interested in more educational material, check out our ongoing series of how to develop Amazon Alexa voice search skills, and getting started with algorithmic trading. Cheers!

The IN Statement: SQL Fundamentals

Howdy, budding SQL masters! It’s been a minute since the last SQL post! I’ve taken some time to devote to Alexa Skills, some other personal housekeeping, and adopting a dog with my wife!!

Last time we talked SQL, we learned the BETWEEN statement to hone in on the exact values we wanted for a SQL query. Prior to that, we looked at the LIMIT statement to get a small return data sample and also working with ORDER BY to take sorting and filtering into our own hands.

Today we’ll be working with the IN statement.

About the IN Statement

The IN SQL operator is a companion to the WHERE clause to see if a value matches any value in a list of values. Sounds something kind of like VLOOKUP’s, yeah? The syntax of an IN statement might look like:

value IN (value1, value2)

The expression in your SQL query will return true if the value(s) you’ve specified in the query match any value in the list you’re referencing.

Sub-queries: Using the IN Statement with SELECT

The list of values is not limited to a static number or list of strings. That is, you can be a bit more dynamic and free-flowing with your query by getting the value via a SELECT statement.

This is also known as a sub-query. To illustrate, the syntax might look like:

value IN (SELECT value FROM table)

NOT IN: Not Making Fetch Happen

The IN statement has a similar corollary as the BETWEEN statement- a NOT modifier. So, we can take the inverse of a statement. (Drawing on some experience in data analysis, this can just as, if not more useful than the original statement!)

value NOT IN (SELECT value FROM table)

 

Sample SQL Queries for the NOT IN Statement

Alright, let’s run some sample queries using the IN statement! Here’s our first go, working again with the ever-present dvdrental sample database. Below, we are selecting rental information for customers matching only a certain ID cohort (think perhaps loyalty group?), with a few columns and sorting by descending order for the return date:

SELECT customer_id,rental_id,return_date

FROM rental

WHERE customer_id IN (1,2)

ORDER BY return_date DESC;

Below, we see our results returned as intended!

2017-03-13-001-IN-SQL-Statement-Example 

How about our corollary? If we want to add the NOT IN modifier, it’s as straightforward as you would imagine.

FROM rental

WHERE customer_id NOT IN (1,2)

ORDER BY return_date DESC;

2017-03-13-002-IN-SQL-Statement-Example-2

Above, we again receive the desired result. I believe we also get a lesson in the importance of clean data here, as blank values are presented first. Let’s try one more sample query, just for kicks. We’ll switch it up ever so slightly, now working with the payment table.

SELECT city_id,city,last_update

FROM city

WHERE country_id IN (44,82,60)

ORDER BY city ASC;

2017-03-13-002-NOT-IN-SQL-Statement-Example

Above, we’ve demonstrated that we don’t necessarily need to display a column we use in the query (country_id) and have sorted by city name in ascending order.

 

Wrap Up

So why the IN statement? At face value, it seems rather simplistic among SQL clauses we’ve explored thus far. Let’s consider the following.

The IN statement allows us to avoid a messy swath of values courtesy of BETWEEN (if we’re working with a proper large data set), or daisy chain list of a bunch of equals OR statements. Here’s another plus, it’s thought by some that pgAdmin / PostgreSQL will execute the IN query faster than the list of OR statements.

Alright, that’s it for today’s article. If you’re like me and need a quick refresh, revisit our page on Learning SQL that contains all the topics we’ve covered so far.

 

The BETWEEN Statement: SQL Statement Fundamentals

Hey there, folks! Welcome back to our journey in learning SQL. Our last few of posts covered the ORDER BY clause, LIMIT statement and the COUNT function. (See full list of SQL tutorials here.)

Today, we’re going to cover the BETWEEN statement. This is the start of some deeper material. In addition to the BETWEEN statement, we’ll also soon be covering IN and LIKE statements.

 

About the BETWEEN Statement

The BETWEEN statement (rather, operator) is used to match a value against a specified range of values. Maybe we want to get transactions between a certain dollar amount.

For example, value BETWEEN low AND high;

More about the BETWEEN statement. If the value is greater than or equal to the low value and  less than or equal to the high value, the expression returns true, or vice versa. Also, the BETWEEN operator can be rewritten by using the greater than or equal to ( >=) or less than or equal to ( <=) operators as the following statement.

value >= low and value <= high;

One other way to think about the BETWEEN statement is that takes two WHERE statements and lumps them into one. Think back to the WHERE statement, as we would say first:

SELECT column1 FROM table WHERE column1 >= 2 AND column1 <= 7;

 

To BETWEEN or NOT BETWEEN

Conversely, you can extend the usage of the original BETWEEN operator to NOT BETWEEN. This is similar in concept, except working toward exclusion, instead of inclusion. So, if we want to check if a given value is outside of a range, we can use the NOT BETWEEN operator as below.

value NOT BETWEEN low AND high;

Again, similarly to the double WHERE statements, NOT BETWEEN simplifies the burden into a single statement. For sanity, we’ll skip the SELECT WHERE example.

 

Using BETWEEN in pgAdmin / PostgreSQL

Below, we’ve taken the BETWEEN statement for a road test on our address table.

SELECT address_id,address FROM address

WHERE address_id BETWEEN 10 AND 20;

2017-02-18-001-BETWEEN-SQL-Statement-Example

How sweet was that? Syonara, comparison operators. Conversely, if we convert the above query into a NOT BETWEEN operator, we should see the below statement return the following results.

SELECT address_id,address FROM address

WHERE address_id NOT BETWEEN 10 AND 20;

2017-02-18-002-NOT-BETWEEN-SQL-Statement-Example

Okay, that was cool. Let’s hold up for a quick second. What about non-integer columns, such as data? Caveat: there’s a lot more to data types than our humble example below. For the purposes of this article, we can take a YYYY-MM-DD date and place the values into strings.

SELECT rental_id,rental_date,inventory_id FROM rental

WHERE rental_date BETWEEN ‘2005-05-24’ AND ‘2005-05-27’;

2017-02-18-003-BETWEEN-Non-Integer-SQL-Statement-Example

Voila!

Wrap-Up

Alright, for those of you reading this later, it’s 2 a.m. on a Friday night / Saturday morning, and I’m about ready to call it a night. Sorry I’m not providing a better conclusion for you today. I’m excited to delve further into more complex SQL queries. Keep track of my shared journey of a beginner learning SQL. Cheers!

 

 

DSC_0163 Zach Doty Cover Photo for ORDER BY SQL Clause

ORDER BY Clause: SQL Statement Fundamentals

Welcome back to our SQL learning journey! It’s been a week and a half since my last post on using LIMIT. Work got crazy and I got the flu! 🙁 But we’re back in action. Today, we’ll delving deeper into SQL statements in PostgreSQL: ORDER BY.

Let’s jump in. Why would you need a statement that orders data? Whenever you query data from a table, PostgreSQL will by default return the rows in the order they were inserted into the table. (Read: not the order you want.)

To sort the result set from your SQL query, you can use the ORDER BY clause in the SELECT statement and specify a certain ascending or descending order.

Sample ORDER BY Clause Syntax

Here’s an example of the ORDER BY clause would look like within a SELECT statement.

SELECT column_1, column_2

FROM table_name

ORDER BY column_1 ASC / DESC;

Some important notes for the ORDER BY clause:

  • Specify the column you want to sort by with the ORDER BY clause
    • If you sort the results by multiple columns, use a comma to separate between the two columns
  • Use ASC to sort the results in ascending order
  • Use DESC to sort the results in descending order
  • Should you leave the ORDER BY clause blank, it will use ASC (ascending) by default

ORDER BY Clause Examples in SQL Statements

Alright, first up. We’ll do a basic SELECT statement, adding both ORDER BY and LIMIT clauses. Code and screen shot below. The query should select the first and last name columns from the customer table, ordered by last name ascending (A > Z), returning only the first 10 rows.

SELECT first_name,last_name  FROM customer

ORDER BY last_name ASC

LIMIT 10;

2017-02-16-001-ORDER-BY-SQL-Statement-Example

 

Let’s try another example. What if we want to do an advanced / multiple sort? Let’s try it out, and change it up ever so slightly from the first sample query.

SELECT first_name,last_name

FROM customer

ORDER BY first_name ASC,

last_name DESC;

2017-02-16-002-ORDER-BY-SQL-Statement-Example-Multiple

If we scroll down to the first set of duplicate first names, we’ll see the last name has been presented in descending order (Z > A).

More Details About the ORDER BY Clause

Did you know? In PostgreSQL, you can ORDER BY columns that aren’t explicitly selected within the SELECT statement? E.g. We could only select first and last name, but order by their address_id value. Interesting, yes? Important caveat, other SQL programs, such as MySQL, may not let you do this.

Below:

SELECT first_name FROM customer

ORDER BY last_name ASC;

2017-02-16-003-ORDER-BY-SQL-Statement-Example-Other-Sort

Wrap-Up

Thanks for joining me and hopefully you’ve learned the essentials for the ORDER BY clause. Feel free to check out the entire journey on how to learn SQL, thanks!

bluebonnets Zach Doty cover photo for SQL COUNT Function

SQL Statement Fundamentals: The COUNT Function

 

Hello, hello, hello SQL fans! (Or gracious friends and family perusing the site. 🙂 ) The journey into learning SQL continues, and today we’ll cover the COUNT function. Jumping right into it, here’s our working definition of the SQL COUNT function:

The COUNT function should return the number of input rows that match a specific condition of a query.

Rather, this would appear to work similarly in concept to the COUNTIF(s) formula(s) in Excel.

COUNT Statement Syntax Examples

Here’s what a simple COUNT SQL statement might look like:

1. Basic COUNT (*) FROM SQL Statement

SELECT COUNT (*) FROM table;

Breaking it down a bit, the COUNT () function returns the number of rows returned by a SELECT clause. When you apply the COUNT () statement to the entire table, pgAdmin/PostgreSQL will scan the entire table in a sequential manner.

Additionally, you can specify a certain column count in your COUNT statement for better readability:

2. COUNT (column) FROM SQL Statement

SELECT COUNT(column) FROM table;

Similar to the COUNT(*) function, the COUNT(column) function returns the number of rows returned by a SELECT clause. However, if you have empty or NULL values, the COUNT function will not take those into account.

3. COUNT (DISTINCT column) FROM SQL Statement

If we do a bit of application from our past learnings, we can make a COUNT with DISTINCT SQL statement:

SELECT COUNT(DISTINCT column) FROM table;

Applying the COUNT SQL Function

Alright, so let’s work our way toward applying what we’ve learned. To start, let’s do the traditional probe of the table before diving in, to familiarize ourselves. Below, we’ve done a basic,

SELECT * FROM address;

2017-02-01-001-SELECT-ALL-Starting-SELECT-COUNT

As we get familiar with the table, we can scroll down and see this particular table has 605 rows in it. This will be a reference point as we continue.

Moving forward, we’ll execute a basic SELECT COUNT (*) FROM address; SQL query. Below, you’ll see a slightly different result was returned. Inst3ead of 605 rows, 603 was returned. At this point, kindly reference our note about empty / NULL  values being excluded from the COUNT function.

2017-02-01-002-SELECT-ALL-COUNT-1

We’ve established a general proof of concept for the SELECT COUNT statement, considering the reduced load on the server and yourself, for a quick count. Let’s now try calling specific columns. In our first exploration of the address table, we saw a number of columns, including the district. Let’s say we want to get a count for how many districts/states our customer base covers.

2017-02-01-003-SELECT-COUNT-DISTINCT-1

Above, a count of 378 distinct district values has been returned. Wow, what coverage!

A quick aside for future usage, you can also nest the column reference in its own set of parentheses, as shown and returned below.

2017-02-01-004-SELECT-COUNT-DISTINCT-Nested

Wrap-Up

There you have it! We’ve learned a bit about the COUNT function, what it does and how to use it. It will likely come in handy for future articles, particularly when we delve into group-by excercises. If you missed it, here’s the previous article on learning how to use SELECT WHERE and another recent article on learning how to use SELECT DISTINCT. Also, to start from the beginning, here’s my running list of articles on how to learn SQL. Cheers!

DSC_0030 copy Zach Doty Cover Photo For SQL Statement SELECT WHERE

SQL Query Fundamentals: SELECT WHERE

Welcome back, travelers! The journey continues in learning SQL. In case you missed it, the past couple of posts were learning to use SELECT DISTINCT and Restoring an SQL database with table schema only.

Today we’re going to learn about using the SELECT clause with the WHERE statement for essential SQL queries.

A Quick Recap, or, Why SELECT WHERE is Important

As previously mentioned, we’ve covered a range of beginning SQL topics. Mainly, we’ve learned about using the SELECT statement to query all (*) or query specific columns of data from a table. Because we’ve been working with a pared down table with only a few hundred rows, it’s not a problem in this “academic” setting to return all the rows. But what if we’re working in a larger database? A recent keyword research dataset 75,000 rows long comes to mind. (Though I would imagine that too, would be a small dataset in the grand scheme of things, but I digress.)

When we start working with larger databases, granularity will become vitally important. This is where (pun not intended) the SELECT WHERE statement comes in.

Sample Syntax of the SELECT WHERE Statement

With proof of concept in mind, let’s jump into it headlong. Below is a syntax example, demonstrating what a SELECT WHERE SQL query might look like.

SELECT column_1,column_2

                FROM table_name

                WHERE condition1;

The SELECT statement is old news to you adventurers that have been following along. The WHERE portion of this query will be the power in this article.

More About the SELECT WHERE SQL Statement

The WHERE clause appears right after the FROM clause of the SELECT statement. The conditions listed within the WHERE clause are used to filter the rows returned from the SELECT statement. Because we’re working in pgAdmin / PostgreSQL, we’ll have available standard operators to construct the conditions. Better still, some (or most) of the operators we’ll look at are fairly universal, so these operators should work in MySQL, Microsoft SQL, Amazon Redshift, etc.

List of SELECT WHERE Operators

Below is a list of common SELECT WHERE operators. Again, most of these should be fairly universal, regardless of the SQL database management program you’re using.

Operator Description
= Equal to
Greater than
Less than
>= Greater than or equal to
<= Less than or equal to
<> or != Not equal to
AND Logical operator AND
OR Logical operator OR

 

Plugging this back into SQL statements, we’ll be using the operators on the left to filter down and return only specific rows in our queries.

Sample SELECT WHERE Statements

Let’s  cover some guiding examples that will help us apply the SELECT WHERE operators. To start, we’ll kill two birds with one stone: jogging the memory by utilizing a previous query and exploring the table we’ll be querying. Below, we’ll be a bit naughty by calling on all columns from the table.

SELECT * FROM customer;

2017-01-30-004-SELECT-WHERE-CustomerID-OR

Above, we’ll see our results and some candidate columns to query the heck out of. Let’s keep moving.

 

Example 1: The Basic SELECT WHERE Statement with 1 Condition, Returning Only Customers with a Certain Name

Okay, let’s say that we want to only return customers of a certain name, say, “James”. The SELECT WHERE statement will help us make quick work of this database need.

SELECT last_name,first_name

                FROM customer

                WHERE first_name = ‘James’;

If all goes correctly, we should only get back a customer with the first name James. ‘Ello James!

2017-01-30-002-SELECT-WHERE-First-Name-Equals

Side note 1: you don’t have to return the columns you’re filtering against. For example, we could return the email column and still filter by name.

 

Example 2: A SELECT WHERE Statement Using 2 Conditions, Returning Customers with a Certain First AND Last Name

Perhaps you’ll want to do something more targeted with your data. I know this is a narrow and frankly creepy example of calling out one name, but think maybe of a City/State or Source/Medium pairing? Anyway, with the sample dataset we have, below we use the AND logical operator to combine two conditions into one query.

SELECT last_name,first_name

                FROM customer

                WHERE first_name = ‘Jared’ AND last_name = ‘Ely’;

If executed properly against this particular sample dataset, we should be returned only the values for one fictional Mr. Jared Ely.

2017-01-30-003-SELECT-WHERE-First-Name-and-Last-Name

Quick side note 2: that we should have mentioned sooner: we’re using single quotes here because the values we’re querying against are string values. As such, the single quotes help us match format, et cetera.

 

Example 3: Another SELECT WHERE Statement Using 2 Conditions, Returning Customer ID’s where payment was in certain dollar amount ranges

Let’s say we are trying to identify a range of customers in our database. In this third example, we want to query Customer ID’s and names in a certain range. We could think of it as a feeble attempt to get our first customers or our most recent customers for a special email flight. Below, we’ll exercise the OR operator to accomplish our desired output.

SELECT customer_id,first_name,last_name

                FROM customer

                WHERE customer_id <= 2 OR customer_id >=20 ;

2017-01-30-004-SELECT-WHERE-CustomerID-OR

Understanding the Subtle Differences Between AND / OR

Quick note about differences between the AND / OR operators. If you’re trying to filter data from two different columns, then AND is your filter. If you’re trying to get distinct values within a single column, then the OR operator will be best suited for the job.

 

Wrap-Up

What a world of possibilities we’ve opened! I found myself needing to slow down and pay more attention to detail in this area of learning. The nuances of selecting certain columns but filtering by others when practicing threw me for a loop once or twice.

Thanks for joining, in the next article, I’ll be covering some introductory material around the COUNT function. In the meantime, check  out our running list of posts on how to learn SQL.