Marketing Data Lakes & Warehouses

My most recent post covered strategy & planning for digital marketing agencies, specifically from an SEO perspective.

In my mind, one of the most compelling questions was this: why aren’t there more success stories with SEO agencies? From my admittedly young perspective, I called out 5 things successful entrants can do for the best shot at success.

We may also find that these pillars of success (ex: depth of product vision, operational efficiency) rarely exist in isolation.

That brings us to marketing data lakes and warehouses! Perhaps popularized by Informatica, data lakes are essentially all your data tossed into one space- hooray!

In their ebook, Informatica notes that data lakes are well-suited for large enterprises at $100MM+ in revenue. I would largely agree (for many reasons) and would also assert this leaves ample learning opportunity for brands below that threshold.

For the large enterprise above, there are often prerequisites to reaching that level. Specifically, you’ll often need to achieve certain sophistication before advancing to a level of ad hoc ML analyses of unstructured data in the cloud.

If we’re being honest, I’m probably bending and abusing some terminology here, but bear with me! For the vast majority of businesses (which also sit below that revenue or capitalization threshold), there’s a lot of power in data lakes as the basis for building small, focused data warehouses.

To the points of my earlier posts, you need cohesive data that can paint a clear, end-to-end picture of your digital business. To that end, consider the following:

With the proper platforms, API scripts/pulls and storage, you can amass all of your key marketing data into one, or a few areas. Depending on your tactical execution, SQL joins or other BI/dashboarding software can be used to unite your data by URL.

From this picture, you can easily pivot, filter, splice, etc. to see what has been performing, how, and where the impact is going to your business.

Ex: in one picture, you can immediately juxtapose log file activity on your site with Search Console data to see where Google got hung up in your site, quickly identify how much it’s impacting your visibility, and quickly see which keywords are most affecting your business.
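That union-by-URL can be sketched in a few lines. This is a hypothetical illustration (the column names and the left-join shape are my assumptions, not any tool’s schema); in practice, this is the job of your SQL joins or BI software:

```python
# Hypothetical sketch: uniting two marketing data pulls by URL.
# Column names (url, googlebot_hits, clicks, impressions) are invented
# for illustration, not any particular tool's schema.

def join_by_url(log_rows, gsc_rows):
    """Left-join Search Console-style rows onto log-file rows by URL."""
    gsc_by_url = {row["url"]: row for row in gsc_rows}
    joined = []
    for row in log_rows:
        match = gsc_by_url.get(row["url"], {})
        joined.append({
            "url": row["url"],
            "googlebot_hits": row.get("googlebot_hits", 0),
            "clicks": match.get("clicks", 0),
            "impressions": match.get("impressions", 0),
        })
    return joined

logs = [{"url": "/products/", "googlebot_hits": 120},
        {"url": "/blog/", "googlebot_hits": 45}]
gsc = [{"url": "/products/", "clicks": 300, "impressions": 9000}]

for row in join_by_url(logs, gsc):
    print(row)
```

Pages Googlebot hits but that earn no clicks (like /blog/ above) fall straight out of the joined picture, which is exactly the kind of juxtaposition described.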

To take it a step further, this can be scaled across multiple clients, where you may write the client name or vertical into separate columns in all of your respective data pulls. This also facilitates large-scale rollups of your data. It’s my estimation that a compelling example of this may be Merkle’s quarterly digital marketing reports.

As with previous posts, I hope to continue peeling back this onion of excellence in marketing. However, my next immediate posts may pivot slightly to operational efficiency in the context of providing excellent client service.

Digital Marketing Agency Strategy and Planning

Or, What Marketing Could be, Part 2

Recently I broke my unintentional blogging hiatus to write about what marketing could be, and things to think about if your organic search traffic from Google goes down.

I fully intend to return to technical education and blogging, but perhaps the dawn of a new decade is bringing out the introspection and retrospection in me.

Today, let’s continue the conversation about what marketing could be. I’m taking the lens of marketing from an SEO perspective, because it’s most relevant to me, but also because SEO is one of the most challenging marketing disciplines to monetize and scale within the agency and consulting realm.

Have you ever noticed there aren’t really any big, pure-play SEO firms out there? The largest search marketing agencies are just that- search. Paid media spend (thus services) still rules the marketing roost, and probably will for ages to come.

Why aren’t there big, pure-play SEO firms? SEO is tricky to practice, much less manage. Our practices try to reflect our disciplines. Paid media, for all its foibles of opaque procurement and pricing, still reflects a largely ordered economy, organized by purchasing power.

SEO, on the other hand, confronts near-superhuman complexity at global scale- courtesy of Google’s 20+ years of work in computing, information retrieval and, now, artificial intelligence. That complexity is not easily mirrored by ragtag gangs of consultants and agencies.

As a result, the most frequent successful outcomes in SEO are often agency acquisitions or series C/D venture funding rounds for technology platforms.

Outrageously successful outcomes are so rare in SEO because so few can consistently master all 5 pillars of success at once.

There are 5 key behaviors that consultants and agencies (all disciplines, but most especially SEO) must master on a consistent basis to achieve, and more importantly maintain, success:

  • Depth of product vision
    • Can you master SEO strategies and tactics to such a degree as to actually achieve results?
  • Client service proficiency
    • Can you keep the clients happy and paying?
  • Operational efficiency
    • Can you scale internally the first two items to a point beyond your own finite time?
  • Marketplace momentum
    • Can you project externally your abilities to do the first three behaviors?
  • Service capability breadth
    • Can you, with SEO as a cornerstone offering, expand from it, into other services? (Ex: analytics, conversion optimization, etc.)

I can’t even say I’m doing all five above behaviors consistently, and our current landscape is evidence that few others are doing so either.

Examining, and perhaps quantifying these pillars will be a fun exercise in the coming days.

Google Traffic Down!

The Hitchhiker’s Guide to Fielding and Surviving Frantic Client Requests

*Fair warning, this is a live draft / work in progress post, as of 12/7/2019! Polishing pending.*

“Our Google traffic is down!” If you’ve done any time in the SEO consulting or digital marketing business, you’ve seen this client (or manager) email at the most inconvenient of times.

From my corner of the industry, it seems like digital marketing and specifically SEO talent comes and goes in waves. For me, it feels like there have been 3 waves, or “generations” of practitioners coming online in the past 5 years.

Again, from my corner of the industry, it seems like marketing agencies are having an increasingly difficult time finding future leaders from in-industry. Thus, if you polled junior (and even perhaps mid-level) SEO practitioners, I fear many wouldn’t be properly equipped to handle the above situation.

For many junior SEO agency personnel, getting the “Google traffic down!” email is an exercise in confusion that mirrors that of the client/manager. Random tactical suggestions emerge, instead of a thorough approach to troubleshooting a potential Google search traffic drop.

Running through a short list of questions can easily cover “gotchas” and a rather large set of common problems.

  • Is the site up?
  • Can our content be found by search engines?
  • Did we set up the site (analytics) properly?
  • Is there search demand for our target terms?
  • Does the content match with the search opportunity?
  • Does the site have a good experience?
  • Is the site properly engaging visitors?
  • Is the brand delighting and retaining users?

This still leaves us with a “broken window” problem- we only look at these metrics once something is already wrong. What if we could automate some of this?

  • Is the site up?
    • What if we had consistent uptime monitoring?
  • Can our content be found by search engines?
    • What if we had automated downloads and data warehousing of Googlebot or other search engine log files?
    • What if we were able to submit XML sitemaps on demand with a few button clicks via the API?
    • What if we knew, each day or week, the precise page speed figures via the PageSpeed Insights API?
  • Did we set up the site (analytics) properly?
    • What if we set up cURL, Beautiful Soup or another method to check rendered page code for tags?
  • Is there search demand for our target terms?
    • What if we set up automated Google Search Console API calls to see daily search data? (Knowing full data may come at a delay)
  • Does the content match with the search opportunity?
  • Does the site have a good experience?
  • Is the site properly engaging visitors?
  • Is the brand delighting and retaining users?
    • What if we rigged up alerts and API data calls for the above data?

What if we were able to warehouse, aggregate and analyze all of this data, benchmarked, at scale? We’d really have something. 🙂
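To make one of those what-ifs concrete, here’s a toy sketch of an automated traffic-drop alert: flag any day whose clicks fall well below the trailing average. The window and threshold are arbitrary assumptions for illustration, not recommendations:

```python
# Toy sketch of an automated alert: flag a day whose clicks fall well
# below the trailing average. Window and threshold are illustrative
# assumptions - tune them to your own data.

def flag_drops(daily_clicks, window=7, threshold=0.5):
    """Return indices of days whose clicks fall below threshold * trailing mean."""
    alerts = []
    for i in range(window, len(daily_clicks)):
        trailing = daily_clicks[i - window:i]
        baseline = sum(trailing) / window
        if daily_clicks[i] < threshold * baseline:
            alerts.append(i)
    return alerts

clicks = [100, 98, 105, 102, 99, 101, 97, 100, 45, 103]
print(flag_drops(clicks))  # → [8]
```

Wire something like this to daily Search Console pulls and the “Google traffic down!” email arrives from your own systems first, before the client’s.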


Automated Screaming Frog Reports With Command Line & Batch Files

Did you know that you can automate Screaming Frog crawl reports using Command Line and Windows batch files?

First, the credit goes to Tom Gregan and Open Source SEO for cutting the trail. 🙂

Starting with version 10, Screaming Frog SEO Spider has offered command line functionality– how cool is that? What’s more, the command line functionality is quite customizable, capable of both pulling in configuration files and exporting crawl data with flexibility.

Here’s some sample code for Windows Command Line you can run. Again, credit to Tom and Open Source SEO here. You can drop this into Notepad, Sublime Text, etc. and save as a batch file.

set crawlResults=C:\Users\you\Documents\client\Weekly-SFSEOS-Results
:: Creates variable %crawlResults% for where crawl results will be saved

set sf=C:\Program Files (x86)\Screaming Frog SEO Spider\
:: Creates another variable telling Command Line where it can find SF

set configFile=C:\Users\you\Documents\client\Weekly-SFSEOS-Results\sample-config.seospiderconfig
:: Creates another variable telling CLI and SF where to find crawl configuration instructions - may be needed for more complex crawls or API settings

set domain=
:: Sets a variable telling CLI and SF which domain or URL to crawl

chdir /d "%sf%"
:: Directs CLI to the directory you specified earlier

ScreamingFrogSEOSpiderCli.exe --config "%configFile%" --crawl "%domain%"  --save-crawl --headless --output-folder "%crawlResults%" --export-format "xlsx" --export-tabs "Internal:All" --timestamped-output
:: Runs the SF CLI exe file, which performs the crawl with the variables specified earlier 

The fun part (as if this wasn’t cool enough!) is that you can make a ton of these batch files, call them from a separate “leader” batch file, and schedule them to run on a recurring basis.

start cmd /k Call website-a.bat
start cmd /k Call website-b.bat

Important to note- the above code sample assumes that you place the leader and the “follower” Screaming Frog crawl batch files in the same directory on your computer or server.

How nice is that? From here, you could conceivably have lots of options to pull into a business intelligence tool like Tableau, Power BI, Data Studio or pull into data frames in R or Python.
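As one hypothetical hand-off to those tools, the snippet below locates the newest timestamped export file using only Python’s standard library. The folder layout is an assumption based on the --timestamped-output flag above, and the directory names here are placeholders:

```python
# Hypothetical helper: find the most recent export file that the CLI's
# --timestamped-output flag leaves under the results directory.
from pathlib import Path
import tempfile
import time

def newest_export(results_dir, pattern="*.xlsx"):
    """Return the most recently modified export file under results_dir, or None."""
    files = sorted(Path(results_dir).rglob(pattern), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None

# Example with a throwaway directory standing in for the crawl results.
with tempfile.TemporaryDirectory() as d:
    for name in ("2019.12.01.06.00.00", "2019.12.08.06.00.00"):
        folder = Path(d) / name
        folder.mkdir()
        (folder / "internal_all.xlsx").write_bytes(b"")
        time.sleep(0.01)  # ensure distinct modification times
    print(newest_export(d))  # the most recent crawl's export
```

From there, hand the path to your BI tool or data frame of choice.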

This could mean the end of blind spots for your SEO efforts! Think about monthly, weekly, or even daily crawls of your sites, competitors, publishers, etc. Happy coding!


What Marketing Could Be, Part 1

The rambling preamble

For those imaginary regular readers out there, I find myself again starting a post with, “…I know it’s been awhile since I last posted…” Yet it’s truly been, say, 2 years since my last post?

Anyway, this year marked my 7th year in industry after college. I’m looking to reflect and document some thoughts for future reference, out in the public domain for accountability to myself.

I wish I could say I’ve seen it all in marketing and advertising, but I can only say that I’ve seen a few good and bad things out there in my few years of experience. Put plainly, I now feel like the marketing industry can do better. In every way.

That last sentence conveys a wide range of conversation, but this post more pertains to the operations and setup of agency life as we know it today.

Why is marketing so hard?

For the initiated, the marketing and advertising industry may often feel awash in chaos, churn and turmoil. If you don’t feel that way, I’d love to talk to you. But assuming that chaos reigns, a first glance could say it starts at the top:

Ouch, churn seems to flow down the chain, no?

Do clients (brands) cause marketing churn?

A frustration I’ve long held, but rarely shared, is that many brands and clients have large knowledge gaps. Let me explain:

eMarketer tells us that today, digital takes 50% of marketing spend worldwide. By 2023, we expect that almost 2 out of every 3 marketing dollars will be digital.

Why is this significant?

Digital marketing is predominantly technical marketing, rendering many traditional creative disciplines as the price of admission.

Technical and analytical chops now often set the bar for excellence. So much so that millions of dollars and business returns depend on it. Or risk failure, in some instances, such as Accenture’s shortcomings with Hertz, to the tune of an ugly lawsuit.

To put it more plainly, many clients and brands lack the expertise to execute technical marketing. I would further assert that many clients and brands lack the expertise to manage technical marketing.

But what does that mean? That means you have digital marketing and SEO managers at S&P 500 retailers that don’t understand the basics of optimizing a web page or how search engines crawl sites. That means the head of media for a top 50 Internet Retailer doesn’t know what CPC stands for.

The technical marketing knowledge gaps at brands often prohibit them from achieving marketing performance, much less growth.

The salient question: do clients cause marketing churn?

I would offer a qualified yes, because this post started on, and should end with, a focus on marketing agencies.

Are agencies helping marketing churn?

It’s a loaded question. No, most agencies aren’t helping themselves. Many agencies do have a basic burden to educate their clients. Yet a minimum effective dose of education only extends to ensure the safety of a retainer.

Here’s the fun part:

The technical marketing knowledge gaps at brands often prohibit them from achieving marketing performance, much less growth.

By nature, and the bias of some (almost) funny examples, this could lead us to say that education is the big problem in this equation. Here’s where qualification and nuance mean a lot.

Technical marketing knowledge gaps are easy to pin on uneducated clients. But what about unequipped clients? Therein lies the importance of nuanced views. Do clients need to be proficient tacticians in technical marketing to succeed? Honestly, no.

Do clients need to be competent managers of a technical team to succeed? I would say yes. From this perspective, agencies can thrive when they equip their clients to make smart decisions.

The demise of agency engagements

Stop me if you’ve heard this one before.

  • Client engages agency
  • Agency sends recommendations
  • Client doesn’t implement
  • Nothing gets done
  • Results don’t change
  • Agency gets cut
  • Client engages another agency
  • Agency sends recommendations…

And so on.

What goes on in that ridiculous cycle of a bulleted list? Agency tells client, “You need to fix these render blocking resources to fix your page speed.”

Client asks, “Okay, can you tell me why this is important so I can get it at the top of our dev team’s JIRA queue?”

Agency, “…because a Lighthouse audit told us so!”

And this conversation happens for everything, from hreflang sitemaps to migrating to HTTPS, to putting body copy on product category pages, to….the list goes on.

Most agencies don’t have compelling data or evidence to properly answer client questions – rather, they don’t equip the clients!

Why aren’t agencies better equipped?

When you look at marketing agencies, you can generally place players into one of several buckets:

  • Freelancers or small consultancies
  • Mid-market challengers
  • Big-box holding company conglomerates

Before you @ me, yes, there are exceptions to the above. I could think of 3 agencies offhand that break this mold. But if we look at the aggregate landscape, we could safely say that 80%+ of the market fits these buckets.

Across these buckets, there are a few ironically shared challenges that keep these types of players from delivering outstanding marketing.

  • Lack of strategic acumen
  • Lack of technical expertise
  • Lack of time/staffing

There are no easy answers to lacking strategic acumen. If you’re working with any party lacking this- consider changing/improving your situation as needed. (Ex: your PPC partner doesn’t know the right questions to ask about PPC.)

However, technical expertise and lack of time/staffing may be offset to some degree- through automation.

Building Smarter Marketing

We (marketers) are un- or under-equipped because of these challenges, which render us unable to provide necessary answers to routine questions as mentioned earlier.

Automation (in varying forms) can be a powerful tool to equip both agencies and brands for smarter marketing.

Think about an SEO engagement. What if:

  • Server logs were routinely and automatically retrieved, stored and analyzed?
  • Market data such as Google Trends were routinely captured on brands, competitors, and industries of interest?
  • Google Search Console queries, crawling activity, links, etc. were captured daily or weekly?
  • Automated crawls of brand and competitor sites took place, for action/alert against new or broken content?
  • Analytics and 3rd party rankings and audit data were routinely, automatically pulled?

What if all this data could be stored together, routinely and automatically retrieved, stored, analyzed and actioned upon?

You would certainly be able to answer many questions effectively. Better still, you could learn many more of which to ask.
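As a minimal sketch of “stored together,” Python’s built-in sqlite3 can stand in for the warehouse. The table and column names here are invented for illustration, not a prescribed schema:

```python
# Toy sketch of "stored together, routinely retrieved": land rows from
# different pulls into one SQLite table keyed by date, source and URL.
# Table and column names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE marketing_data (
    pulled_at TEXT, source TEXT, url TEXT, metric TEXT, value REAL)""")

rows = [
    ("2019-12-01", "server_logs", "/products/", "googlebot_hits", 120),
    ("2019-12-01", "search_console", "/products/", "clicks", 300),
    ("2019-12-01", "crawler", "/products/", "status_code", 200),
]
conn.executemany("INSERT INTO marketing_data VALUES (?,?,?,?,?)", rows)

# One URL, every source, one picture.
for row in conn.execute(
        "SELECT source, metric, value FROM marketing_data WHERE url = '/products/'"):
    print(row)
```

Swap the in-memory database for a real warehouse and the manual inserts for scheduled API pulls, and you have the skeleton of the engagement described above.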

It’s my hope to answer those questions above and more in the coming days.

JOINS in PostgreSQL

Welcome back, SQL-ites! Today, we’re going to take a significant leap forward in our journey of learning PostgreSQL.

A quick recap on where we’ve been recently:

In broader terms, most of our PostgreSQL training so far has focused on how best to select and segment our data. However, there is much more to be accomplished in big data manipulation. Enter JOINS.


About JOINS & PostgreSQL

JOINS help unlock the foundation of most SQL databases, known as relational databases. Specifically, when we’re able to triangulate multiple data sources together, combined with sorting, segmentation and some intuition, we can get some really powerful stuff.

So what does JOINS do? It allows you to relate data in one table to the data in other tables.

Think about some of our practice runs in PostgreSQL so far. We’ve looked at DVD rentals, customers, transactions, ID’s, and so on. What if we wanted to start linking in more detail customer names with ID’s, or transaction ID’s with film names, or all the above? Weaving data into an interactive story could allow us to see really cool things, such as which customers prefer certain genres, or perhaps unveil renting habits for a timely promotional offer to smooth out our top line revenue.


BASIC JOINS Syntax in PostgreSQL

As with many of our learning adventures, there is complexity and there are layers to our education. 🙂

First, it’s important to note that there are actually several types of JOINS, so this isn’t just an amorphous, blob-lump function. Here are some of the important kinds:

  1. INNER JOIN
  2. OUTER JOIN
  3. self-join

To start things out, we’re going to look at INNER JOIN first.

INNER JOINS Concepts for PostgreSQL

We’re going to jump headlong into this, at the expense of immediate clarity. Joy! Let’s say we have two tables which we’ll relate data between.



Above, we’ve pulled out some key points of interest between our tables, unique columns, and more importantly, a shared key (column) of customer_id between the two tables. We’ll explain more about keys in a minute. Ish.

We’ll use the shared column of (example) customer_id to further mash up the data. Let’s take a look at syntax now.


INNER JOINS Basic Syntax

If things aren’t 100% connected for you conceptually, I completely understand. Please do bear with me, as seeing some examples may help “close the loop”. Here’s what an INNER JOINS query might look like in PostgreSQL:


SELECT A.pka, A.c1, B.pkb, B.c2
FROM A
INNER JOIN B ON A.pka = B.fka;


Let’s break this down a bit more.

The first line is extremely similar to what we’ve been doing all along: calling in data. The subtle difference here is that we’re first including the respective table names. So it’s tableA.column1, tableA.column2, tableB.column1, tableB.column2.

Next, we specify the primary table we’re first selecting from, per usual. Then we specify the “donor” table that will be combined with the primary table via INNER JOIN. Further, we make this work by using the ON keyword which specifies the shared key columns. 🙂


More Explanation on the Inner Workings of JOINS (Pun Not Intended)

This has been a bit of a tricky concept for me to learn so far, so we’re going to walk through the nuts and bolts of JOINS as operated by PostgreSQL.

When a JOINS query is executed, PostgreSQL takes every row in our “Table A” and goes through Table B, looking for rows that match our specified condition of A.pka = B.fka. So, combining our example with layman’s terms, the JOINS function looks for common occurrences of the customer_id so that additional, previously disparate data from separate tables may be awesomely fused together. 🙂

If a match is found between the two tables, it combines columns of both A and B rows into one row, and adds that newly combined row to the data set you get back.
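That row-matching behavior can be watched end-to-end. The sketch below uses Python’s built-in sqlite3 purely for a self-contained demo (the INNER JOIN syntax is the same as PostgreSQL’s for this query), with toy tables mirroring our A/B example:

```python
# Self-contained demo of INNER JOIN row matching, using sqlite3 so it
# runs anywhere; the join syntax matches PostgreSQL for this query.
# Tables and values mirror the pka/fka example from the post.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (pka INTEGER, c1 TEXT);
    CREATE TABLE B (pkb INTEGER, fka INTEGER, c2 TEXT);
    INSERT INTO A VALUES (1, 'left-1'), (2, 'left-2'), (3, 'left-3');
    INSERT INTO B VALUES (10, 1, 'right-1'), (11, 3, 'right-3');
""")

rows = conn.execute(
    "SELECT A.pka, A.c1, B.pkb, B.c2 FROM A INNER JOIN B ON A.pka = B.fka"
).fetchall()
print(rows)  # only pka 1 and 3 have matches in B; pka 2 is dropped
```

Rows of A with no match in B simply never appear in the result- which is the defining behavior of an inner join.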


Pardon the Interruption: PostgreSQL “life” hacks for JOINS

A couple of quick items that may prove useful to you as we get more familiar with joins.

  1. Sometimes your tables may share column names (Analytics data, anyone? Can I get an amen?), so you may need to use the syntax table_name.column_name for clarity.
  2. If you or someone you love has created excessively long table names, you can introduce a shorter alias (via the AS statement) for the table, and use accordingly.


Hey, I Thought We Were Going to Learn About INNER JOIN?

We are! And what’s better, we’re going to use a Venn diagram to explain it. 🙂 #DataScience.

Okay, the INNER JOIN PostgreSQL clause returns rows in Table A that have corresponding rows in Table B. Super scientific visualization below.


The overlapping area above represents the results that will be returned to us.


INNER JOINS PostgreSQL Code Examples

Putting all the pieces together, here’s our first INNER JOINS code sample:


SELECT customer.customer_id, first_name, last_name, amount, payment_date
FROM customer
INNER JOIN payment ON payment.customer_id = customer.customer_id;


And below, what we get back from pgAdmin.



Okay, a few notes. We went out of our way to say that overlapping column names need table names for distinction. Here’s what we didn’t say: you don’t need to specify table names for unique columns. Above, you can see that we’ve nicely meshed customer ID together with both customer information and payment information. Awesome!

I know this is a bit of a rough stop- but I’m really trying to work on getting more sleep these days. So it’s 12:30am, and I’ve got to head to bed. We’ll be back with updates on this post and more progress forward. Cheers!

Update – 9/16/2017 – Add’l INNER JOINS Examples

Welcome back again, SQL-ites! We’re back with a weekend update from our first look at INNER JOINS in PostgreSQL. Above, we left off with a basic example of INNER JOINS. To quickly recap, we might think of JOINS as basic data composition. Often in marketing and business useful data may begin in silos, thus, combination into useful form must occur.

Our next example of INNER JOINS is a slight modification to the first full example. We have the same basic INNER JOINS syntax, but have added ORDER BY customer.customer_id at the end, to clean up the presentation of our data a bit.

SELECT customer.customer_id, first_name, last_name, amount, payment_date
FROM customer
INNER JOIN payment ON payment.customer_id = customer.customer_id
ORDER BY customer.customer_id ASC;




Let’s continue to add modifications to our basic INNER JOINS statement. Below, we harness the power of both combination and selection to specify joined results, but for a specific customer ID. A use case for this might involve isolating or routing specific data after it’s been joined, for reporting purposes.

SELECT customer.customer_id, first_name, last_name, amount, payment_date
FROM customer
INNER JOIN payment ON payment.customer_id = customer.customer_id
WHERE customer.customer_id = 2;



JOINS Foundations: The AS PostgreSQL Statement

Intro to JOINS: the AS PostgreSQL Statement

What’s up SQL people? We’re back, and better than ever, in our foray to learn PostgreSQL. Since we’ve completed some intermediate skills challenges & learned GROUP BY, it’s time to examine JOINS.

Before we examine JOINS, there’s a key foundation piece we must cover first: the AS statement in PostgreSQL. Let’s jump in.


About the AS Statement

The AS statement in PostgreSQL enables us to rename a table or its columns with a temporary alias- almost a variable- for manipulation.

It’s a simple statement, so let’s see it in action.


1. Basic AS Statement Example

Our most basic example is a basic query where perhaps a column wasn’t named to our liking. Consider the following.

SELECT rental_rate AS film_cost
FROM film;



Great for an introductory example, but not inherently useful. Read on as we apply the AS statement more deeply.

2. Semi-Intermediate AS Statement Example

Let’s provide an example that’s a bit more engaged. For example, if we use aggregate functions, the column output doesn’t have a clean name attached to it. But no longer! The AS statement allows us to rename the summation output of a GROUP BY statement to something we’ll recognize.


SELECT customer_id, SUM(amount) AS customer_ltv
FROM payment
GROUP BY customer_id
ORDER BY customer_ltv DESC;




This is something more useful for intermediate PostgreSQL work!


Wrap Up

We aren’t spending much further time here since this is a simple application and the JOINS statement is the function we’re truly after. If you’re just joining this series, check out our home page on how to learn PostgreSQL.


PostgreSQL Intermediate Skills Test

Hey there, SQL-ites. Time for another (yes, another!) PostgreSQL skills challenge. It’s our last stop before moving into more intermediate and advanced PostgreSQL skills, such as JOINS.

If you’re just joining this series on SQL, we previously covered the HAVING statement & GROUP BY. Here’s the home page of our journey in learning PostgreSQL. Today, we do a more rounded knowledge check of older PostgreSQL statements, such as SELECT WHERE, COUNT & ORDER BY.

We have 3 problems, and provide the winning code for each, based on our sample database. In the past, I’ve given more explanation to the problems, but because I’m trying to get some traction myself in moving forward, we’ll only have problem & solution laid out here.

Alright, let’s go!

1. Return the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2.

The answer should be customers 187 and 148.

SELECT customer_id, SUM(amount)
FROM payment
WHERE staff_id=2
GROUP BY customer_id
HAVING SUM(amount) >110;


2. How many films begin with the letter J?

The answer should be 20.


SELECT COUNT(title)
FROM film
WHERE title LIKE 'J%';


3. What customer has the highest customer ID number whose name starts with an ‘E’ and has an address ID lower than 500?

The answer is Eddie Tomlin.

SELECT customer_id, first_name, last_name, address_id
FROM customer
WHERE address_id <500 AND first_name LIKE 'E%'
ORDER BY customer_id DESC;


We’ll see you on the other side soon, for some JOINS awesomeness. 🙂


The HAVING Clause in PostgreSQL

Howdy SQL-ites! Welcome back to our next step in learning SQL. After a long hiatus, we recently stumbled our way through some intermediate SQL statement challenges, namely on GROUP BY.

Today, we’re back in the flow, and learning about the HAVING clause in PostgreSQL.


About the HAVING Clause in PostgreSQL

The HAVING clause is most often seen as a helper to the GROUP BY statement. GROUP BY was pretty awesome, right? How might we improve upon it?

Let’s consider two concepts:

  1. Our occasional parallels of PostgreSQL to Excel, and,
  2. Our previous intermediate challenge example.

Relating PostgreSQL’s GROUP BY to Excel

If we hearken back to our first encounter with GROUP BY, we compared GROUP BY to pivot tables. Specifically, if we have a big data table that records data of recurring themes, e.g., your customer base, it can be helpful to manipulate it aggregated and assembled, vs. raw.

However, GROUP BY is only an intermediate step in data analysis. If we think about our final challenge in the last post, we had to limit the results to 5. Even if we group data, it’s neither segmented nor useful toward analysis.

Thus, we need an additional method for winnowing down our GROUP BY results.


Meet the HAVING Clause

So, about the HAVING clause. It’s most often used in tandem with GROUP BY to filter out grouped rows that don’t meet a certain condition. Think of it as similar to the WHERE clause, just an accompaniment to GROUP BY.


Let’s take a look at basic syntax:

SELECT column, aggregate(column_2)

FROM table

GROUP BY column

HAVING condition;


Off the bat, we should note this is extremely familiar if we’ve covered GROUP BY. The differentiation is the additional HAVING condition at the end. This condition could be something like HAVING SUM(column_2) < 50.

The Difference Between WHERE and HAVING

Some of you sharp folks may want to know, “what’s the difference between WHERE and HAVING?” That would be an excellent question. Here’s the difference:

The WHERE clause sets conditions on individual rows, before the GROUP BY clause has been applied. The HAVING clause specifies conditions on grouped rows, created by the GROUP BY clause.
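To make that ordering concrete, here’s a self-contained sketch (using Python’s built-in sqlite3 purely for portability- the SQL itself matches PostgreSQL) where WHERE removes rows before grouping and HAVING then filters the grouped totals. The toy payment rows are invented for illustration:

```python
# Demo of WHERE (pre-grouping) vs HAVING (post-grouping), run via sqlite3
# so it executes anywhere; the SQL is the same in PostgreSQL.
# The payment rows below are invented toy data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (customer_id INTEGER, staff_id INTEGER, amount REAL);
    INSERT INTO payment VALUES
        (1, 1, 100), (1, 2, 80), (2, 1, 40), (2, 2, 30), (3, 1, 200);
""")

# WHERE drops staff 2's rows first; HAVING then drops small customer totals.
rows = conn.execute("""
    SELECT customer_id, SUM(amount)
    FROM payment
    WHERE staff_id = 1
    GROUP BY customer_id
    HAVING SUM(amount) > 50
""").fetchall()
print(rows)
```

Customer 2 makes it past the WHERE filter but is removed by HAVING, because their staff-1 total of 40 falls under the 50 threshold- exactly the two-stage filtering described above.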

Let’s run some examples.


Using the HAVING Clause in PostgreSQL

Here’s our first example, very similar to our previous skills challenge:

SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
HAVING SUM(amount) > 150;

Above, we have added another segmentation layer with the HAVING clause.

You can see we’re pulling information about how much our customers have paid us. Further, we specify that we only want to see customers with a lifetime purchase amount of greater than $150.


Let’s look at another example of the HAVING clause. Say for example, we want to know which store has served the most customers. Below, we’ll execute the following code:

SELECT store_id, COUNT(customer_id)
FROM customer
GROUP BY store_id
HAVING COUNT(customer_id) >275;

Above, we’ve selected both the store and customer ID columns from the customer table. Further, we group by the store ID, because we want store-level data, but we only want to see the stores which have served more than 275 customers. Below, we can see only one store has done so. 🙂


Combining Usage of the WHERE & HAVING Clauses

We mentioned earlier that the WHERE and HAVING clauses differ somewhat in function, but mostly in order of execution. Here’s what we didn’t say: you can actually use them in tandem. There’s a great theoretical use case for this; unfortunately our sample database is a bit small, but here goes.

Let’s think about the film table (used in previous examples). Perhaps we want to analyze the films by rental rate, but only films with certain ratings. For example, perhaps we’re no longer interested in carrying NC-17 films, but still want an aggregated view of how much, on average, each film rents for, by rating. Additionally, we want to see which ratings, if any, have an average rental rate of less than $3.

Here’s the code we would use:

SELECT rating, ROUND(AVG(rental_rate),2)
FROM film
WHERE rating IN ('R','G','PG','PG-13')
GROUP BY rating
HAVING AVG(rental_rate)<3;



Wrap Up

Alright, that concludes our section for today. It feels good to be back. 🙂

Hopefully you found this section on the HAVING PostgreSQL statement useful. If you need to backtrack or further explore, here are some useful/recent links:


Intermediate SQL Skills Challenge: GROUP BY

Hey there SQL-ites! Wow, it’s been awhile since I’ve last posted. Work has been crazy busy again, and just life in general. Sure feels good to be back, learning again! I’m daringly dropping straight back into my learnings from where we left off…in April!? Crazy.

Anyway, let’s get back to brass tacks. Before life and work got really crazy for me, we covered:

Another quick recap note, we’ve been using the famous DVD rental training database for our work. On to the good stuff.


GROUP BY SQL Skills Challenge #1

Let’s say it’s time for quarterly reviews- who doesn’t love evaluations? ¯\_(ツ)_/¯ Implement your knowledge of GROUP BY against the following problem:

We need to figure out how much revenue our employees have realized from DVD rentals, and how many transactions each handled.


GROUP BY SQL Skills Answer #1

Let’s talk through the problem and dissect it before presenting code. “A problem well-stated is a problem-half solved” – paraphrase of some smart person.

  1. We’re talking about revenue, so we’ll need to be dealing with the payment table.
  2. We’re evaluating employees (staff), SUM of revenue, and COUNT of transactions.
  3. If we’re aggregating this data, we’re GROUPing BY employee.
  4. We’re also ORDERing the employees BY who handled the most transactions.

That said, here’s the code:

SELECT staff_id, SUM(amount), COUNT(amount)
FROM payment
GROUP BY staff_id
ORDER BY COUNT(amount) DESC;

…with our results!


GROUP BY SQL Skills Challenge #2

Well done on your challenge! Here’s the second:

It’s time to do inventory, flashbacks of retail and restaurant wonder for all of us. 🙂

In the name of forecasting and planning, we need to figure out the average replacement cost of our movies, by rating.


GROUP BY SQL Skills Answer #2

Ok, let’s walk through our problem.

  1. We need to use the film table here, since we’re gathering information on the movies.
  2. We’re GROUPing our films BY rating
  3. We’re using an aggregate function to determine the average replacement cost of each rating.

Drum roll, here’s a winning code snippet:

SELECT rating, AVG(replacement_cost)
FROM film
GROUP BY rating
ORDER BY AVG(replacement_cost) DESC;

With the output:


Are there more challenges we should be covering? Yes. However, I’m trying to do better about getting more sleep these days. Unlike past SQL articles, it’s still (barely) before midnight. So we’ll take a quick breather, possibly update this article, but definitely keep moving forward. Cheers!


Update- 8/20/2017 —

GROUP BY SQL Skills Challenge #3!

Alright SQL-ites. After getting some rest, I’ve regrouped a few days later to cover the last challenge:

From our database, we want to get the customer ID’s of the top 5 customers, by money spent, so we can send them a thank you for their business!


GROUP BY SQL Skills Answer #3

Let’s diagnose the problem.

  1. If we’re gathering revenue information, we’ll need to use the payment table.
  2. If we’re getting the top spending customers, we’ll need to GROUP all transactions BY customer ID
  3. To see the top 5 paying customers, we’ll want to ORDER the results BY the SUM of payment amount.

Considering the above, here’s our code:

SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
ORDER BY SUM(amount) DESC
LIMIT 5;