Posts

DSC_0050 Zach Doty AS PostgreSQL Statement Cover Photo

JOINS Foundations: The AS PostgreSQL Statement

Intro to JOINS: the AS PostgreSQL Statement

What’s up SQL people? We’re back, and better than ever, in our foray to learn PostgreSQL. Since we’ve completed some intermediate skills challenges & learned GROUP BY, it’s time to examine JOINS.

Before we examine JOINS, there’s a key foundation piece we must cover first: the AS statement in PostgreSQL. Let’s jump in.

 

About the AS Statement

The AS statement in PostgreSQL enables us to rename sections of a table or table columns with a temporary alias, or almost a variable, for manipulation.

It’s a simple statement, so let’s see it in action.

 

1. Basic AS Statement Example

Our most basic example is a basic query where perhaps a column wasn’t named to our liking. Consider the following.

SELECT rental_rate AS film_cost

FROM film

LIMIT 10;

2017-08-30-001-AS-Statement-Syntax-Example-1

Great for an introductory example, but not inherently useful. Read on as we apply the AS statement more deeply.

2. Semi-Intermediate AS Statement Example

Let’s provide an example that’s a bit more engaged. Example, if we use aggregate functions, the column output doesn’t have a clean name attached to it. But no longer! The AS statement allows us to have the summation output in a GROUP BY statement to something we’ll recognize.

 

SELECT customer_id, SUM(amount) AS customer_ltv

FROM payment

GROUP BY customer_id

ORDER BY customer_ltv DESC

LIMIT 7;

2017-08-30-002-AS-Statement-Syntax-aggregate-example-2

 

This is something more useful for intermediate PostgreSQL work!

 

Wrap Up

We aren’t spending much further time here since this is a simple application and the JOINS statement is the function we’re truly after. If you’re just joining this series, check out our home page on how to learn PostgreSQL.

DSC_0012 Zach Doty PostgreSQL cover photo

PostgreSQL Intermediate Skills Test

Hey there, SQL-ites. Time for another (yes, another!) PostgreSQL skills challenge. It’s our last stop before moving into more intermediate and advanced PostgreSQL skills, such as JOINS.

If you’re just joining this series on SQL, we previously covered the HAVING statement & GROUP BY. Here’s the home page of our journey in learning PostgreSQL. Today, we do a more rounded knowledge check of older PostgreSQL statements, such as SELECT WHERE, COUNT & ORDER BY.

We have 3 problems, and provide the winning code for each, based on our sample database. In the past, I’ve given more explanation to the problems, but because I’m trying to get some traction myself in moving forward, we’ll only have problem & solution laid out here.

Alright, let’s go!

1. Return the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2.

The answer should be customers 187 and 148.

SELECT customer_id, SUM(amount)
FROM payment
WHERE staff_id=2
GROUP BY customer_id
HAVING SUM(amount) >110;

 

2. How many films begin with the letter J?

The answer should be 20.

SELECT COUNT(*) FROM film

WHERE title LIKE ‘J%’;

 

3. What customer has the highest customer ID number whose name starts with an ‘E’ and has an address ID lower than 500?

The answer is Eddie Tomlin.

SELECT customer_id, first_name, last_name, address_id
FROM customer
WHERE address_id <500 AND first_name LIKE ‘E%’
ORDER BY customer_id DESC;

 

We’ll see you on the other side soon, for some JOINS awesomeness. 🙂

DSC_0300 Zach Doty Cover Photo for HAVING PostgreSQL Clause

The HAVING Clause in PostgreSQL

Howdy SQL-ites! Welcome back to our next step in learning SQL. After a long hiatus, we recently stumbled our way through some intermediate SQL statement challenges, namely on GROUP BY.

Today, we’re back in the flow, and learning about the HAVING clause in PostgreSQL.

 

About the HAVING Clause in PostgreSQL

The HAVING clause is most often seen as a helper with the GROUP BY statementGROUP BY was pretty awesome, right? How might we improve upon it?

Let’s consider two concepts:

  1. Our occasional parallels of PostgreSQL to Excel, and,
  2. Our previous intermediate challenge example.

Relating PostgreSQL’s GROUP BY to Excel

If we hearken back to our first encounter with GROUP BY, we compared GROUP BY to pivot tables. Specifically, if we have a big data table that records data of recurring themes, e.g., you customer base, it can be helpful to manipulate as aggregated and assembled, vs. raw.

However, GROUP BY is only an intermediate step in data analysis. If we think about our final challenge in the last post, we had to limit the results to 5. Even if we group data, it’s neither segmented nor useful toward analysis.

Thus, we need an additional method for winnowing down our GROUP BY results.

 

Meet the HAVING Clause

So, about the HAVING clause. It’s most often used in tandem with GROUP BY to filter out data rows that don’t meet a certain condition. Think of it as similar to the WHERE function, just an accompaniment to GROUP BY.

 

Let’s take a look at basic syntax:

SELECT column, aggregate(column_2)

FROM table

GROUP BY column

HAVING condition;

 

Off the bat, we should this is extremely familiar if we’ve covered GROUP BY. The differentiation is the additional HAVING condition at the end. This condition could be something like, HAVING sum(column_2) less than 50.

The Difference Between WHERE and HAVING

Some of you sharp folks may want to know, “what’s the difference between WHERE and HAVING?” That would be an excellent question. Here’s the difference:

The WHERE clause sets conditions on individual rows, before the GROUP BY clause has been applied. The HAVING clause specifies conditions on grouped rows, created by the GROUP BY clause.

Let’s run some examples.

 

Using the HAVING Clause in PostgreSQL

Here’s our first example, very similar to our previous skills challenge:

SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
HAVING SUM(amount) > 150;

Above, we have added another segmentation layer with the HAVING clause.

You can see we’re pulling information about how much our customers have paid us. Further, we specify that we only want to see customers with a lifetime purchase amount of greater than $150.

 

Let’s look at another example of the HAVING clause. Say for example, we want to know which store has served the most customers. Below, we’ll execute the following code:

SELECT store_id, COUNT(customer_id)
FROM customer
GROUP BY store_id
HAVING COUNT(customer_id) >275;

Above, we’ve selected both the store and customer ID columns from the customer table. Further, we group by the store ID, because we want store-level data, but we only want to see the stores which have served more than 275 customers. Below, we can see only store has done so. 🙂

 

Combining Usage of the WHERE & HAVING Clauses

We mentioned earlier the WHERE and HAVING clauses are different, somewhat in function, but mostly in order of execution. Here’s what we didn’t say: you can actually use them in tandem. There is a great theoretical use case for this, unfortunately our sample database is a bit small, but here goes.

Let’s think about the film table (used in previous examples.) Perhaps we want to analyze the films, by rental rate, but only films with certain ratings. For example, perhaps we’re no longer interested in carrying NC-17 films, but still want to get an aggregated view of on average, how much each films rents for, by rating. Additionally, we want to see which ratings, if any, have an average rental rate of less than $3.

Here’s the code we would use:

SELECT rating, ROUND(AVG(rental_rate),2)
FROM film
WHERE rating IN (‘R’,’G’,’PG’,’PG-13′)
GROUP BY rating
HAVING AVG(rental_rate)<3;

 

 

Wrap Up

Alright, that concludes our section for today. It feels good to be back. 🙂

Hopefully you found this section on the HAVING PostgreSQL statement useful. If you need to backtrack or further explore, here are some useful/recent links:

DSC_0069 Zach Doty Cover Photo for What is data science

What is Data Science?

Welcome back, campers! It’s been a minute (read:months) since I’ve last posted, and we’re back! (For now.)

Today’s topic, data science: supposedly the latest exploding field, critical to every enterprise.

Why is it important and relevant? The rise of big data has created a relatively untapped treasure trove of insight. However, it’s undeveloped! Further, the tapping of this insight requires a blended skill set which is currently in short supply in the market: the data scientist.

Who and what is a data scientist?

A data scientist is someone who finds new discoveries in data. They investigate hypotheses and look for meaning and knowledge within the data. They visualize the data by creating reports and looking for patterns. What distinguishes a data scientist from a traditional business analyst is the use of algorithms. Algorithms are one of the fundamental tools for data scientists. This requires mathematics knowledge, computer science savvy and domain knowledge.

What does it mean to be a data scientist?

A data scientist may handle open-ended questions such as, “Which customers are more likely to churn?” The data scientist would gather all the data, and run algorithms to find dependable patterns to improve the situation. Seems straightforward, yes? However, there are a range of misconceptions about data science and data scientist For example,  a data scientist may not necessarily be a developer-only or business intelligence analyst-only.

A data scientist will be able to combine both technical know-how and business domain knowledge into mathematics and statistics for maximum effect. That being said, true data scientists are extremely difficult to find and train. However, it may be possible to become a data scientist without expensive and time-consuming degrees, via focused tools and application training.

An oversimplified Venn diagram showing the makeup and value of a data scientist

An oversimplified Venn diagram showing the makeup and value of a data scientist

More Notes on Data Science

When considering data science from a managerial perspective, it’s important to understand the current broad allocation of the average data scientist’s time. An estimated 60-705 of a data scientist’s time is spent assembling and cleaning data, tasks which could be delegated to technical specialists, data integration specialists and so forth. (For example, text mining, SQL queries and so forth.)

If you’ve followed my site and blog lately, you’ll noticed I’ve lapsed a bit on posting. I’m trying to get back into sharing my education again, so stay tuned. Things have just been busy lately. 🙂