Google Search Console: Download & Export All Website Data

Another day, another blog post? This is a rarity for sure. At the moment, I’m starting to hit my stride on getting a few old items done in R. 🙂 If I keep writing more blog posts, I’ll need to start taking more photos again for the feature covers!

Quickly sharing how I was able to use R and Google Search Console to download & export all of my website’s search analytics data.

Why Use R to Download & Export all Search Console Data?

Even for the novice SEO, this is likely a well-worn frustration. Valuable data for making critical business decisions is sitting in Google Search Console, but you can’t access it at scale!

It’s either hidden behind the abysmal UI export limit, or you have to do so much filtering that it’s impossible to scale the data extraction.

SearchConsoleR: Easy Google Search Console API for Unlimited Data Access

Similar to my previous post (How to Bulk Submit Sitemap URLs in Google Search Console), this article looks at the splendid SearchConsoleR package in R. See the link above for prerequisites.

The code to get all your data out of Google Search Console is really quite brief. I’ve listed it below, along with a link to the GitHub gist.

Note that if it’s your first time using RStudio, you’ll need to use install.packages() to install the necessary packages before loading them with library().
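
For example, a one-time install along these lines should cover the packages used below:

# One-time setup: install the packages used in this post
install.packages(c("searchConsoleR", "dplyr", "ggplot2", "writexl"))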

# Load packages
library(searchConsoleR)
library(dplyr)
library(ggplot2)
library(writexl)

# Authorize & choose Google profile
scr_auth()

# Specify website --- CLIENT
website <- "https://www.sample.com"
dimensions <- c("page", "query")

ttl_queries <- 
  search_analytics(siteURL = website, 
                   startDate = "2019-08-01", endDate = "2019-08-31", 
                   dimensions = dimensions, 
                   searchType = "web", rowLimit = 100000)

# Write the data frame to an XLSX file
write_xlsx(ttl_queries, "ttl_queries.xlsx")
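
One note on scale: a single Search Analytics API request caps out well below 100,000 rows, so if you need truly exhaustive data, search_analytics() can page through the date range for you. Here is a minimal sketch, assuming the walk_data argument available in recent versions of SearchConsoleR (check ?search_analytics in your installed version):

# Hedged sketch: walk the date range day-by-day for a more complete data set
ttl_queries_full <- 
  search_analytics(siteURL = website,
                   startDate = "2019-08-01", endDate = "2019-08-31",
                   dimensions = dimensions,
                   searchType = "web",
                   walk_data = "byDate")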

If the code has run correctly, you should have a (potentially) massive Excel file at your disposal! The possibilities are endless – data crunching in Excel, piping into a database, or feeding some other visualization software. Enjoy!

How to Bulk Submit Sitemap URLs to Google Search Console

Well, howdy! As with many of my posts lately, I find myself saying that it’s been a while since I’ve posted something. It’s true!

Today, I’d like to walk you through how to bulk submit sitemap URLs to Google Search Console using R and RStudio. Before we begin the process of bulk submitting sitemap URLs, you’ll need a few prerequisites:

Prerequisites to Bulk Sitemap URL Submission: R Setup

  • Download and install the appropriate version of R for your system: https://cran.r-project.org/
  • For Windows users, search for “rtools”, download and install from the official cran.r-project.org site.
    • Mac/other OS users may need an equivalent toolchain for their respective OS. That said, my primary machine is Windows (described above), and I’ve been able to use RStudio on a Mac without installing Rtools and without issue. That doesn’t mean you won’t need it. ¯\_(ツ)_/¯
  • Download and install RStudio from the official site: https://rstudio.com/products/rstudio/download/

SearchConsoleR & Bulk Sitemap Submission in Google Search Console

Bulk submitting XML sitemap URLs to Google Search Console is made possible by SearchConsoleR, the excellent R package made by Mark Edmondson. Why R and SearchConsoleR? Approachability.

Many readers may find this method much more approachable than trying to get API projects, API keys, and other confusing configuration details just right in Python/Jupyter notebooks. You’ll see what I mean in a minute.

Code for Bulk Sitemap URL Submission in Google Search Console

Below is sample R code for bulk sitemap URL submission in Google Search Console, using R and SearchConsoleR. You can also view my Github gist here.

# Load packages
library(searchConsoleR)
library(dplyr)
library(ggplot2)

# Authorize & choose Google profile
scr_auth()

# Specify website --- FULL URL
website <- "https://www.sample.com"

# Submit all manually found sitemaps
add_sitemap(website, "https://www.sample.com/sitemap_index.xml")
Sys.sleep(1)

add_sitemap(website, "https://www.sample.com/page-sitemap.xml")
Sys.sleep(1)

add_sitemap(website, "https://www.sample.com/post-sitemap.xml")
Sys.sleep(1)

The code is fairly straightforward, and sadly, not automated. I’ll be working on an automated, scalable way of submitting all sitemap URLs for all websites in a given Search Console account; a rough sketch of the idea follows below.

The one custom touch I’d note here is the one-second system sleep, which proactively avoids running afoul of rate limits if you’re submitting a lot of URLs.
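
As a rough starting point toward that automation, SearchConsoleR’s list_websites() returns the properties in your account, so you could loop over them and submit a guessed sitemap path for each. The /sitemap_index.xml path below is an assumption (every site names its sitemaps differently), so treat this as scaffolding rather than a finished tool:

# Rough sketch (untested): submit a guessed sitemap path for every site in the account
sites <- list_websites()

for (site in sites$siteUrl) {
  # Assumes each property exposes /sitemap_index.xml; adjust per site as needed
  sitemap_url <- paste0(sub("/+$", "", site), "/sitemap_index.xml")
  add_sitemap(site, sitemap_url)
  Sys.sleep(1)  # stay friendly with rate limits
}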

The Simple SEO Strategy & Plan Template

A quick Google search tells me that a lot of folks have already put together some fine SEO strategy plans & templates.

Here’s why I’m offering a different take on an SEO strategy plan template:

Websites are offering SEO launch plans, not ongoing search engine optimization management templates.

The Problem with a “Launch-Only” SEO Strategy / Plan

A study of millions of search results found that only 5% of pages will rank in the top 10 positions within 1 year. (Ahrefs) This data point suggests a deep disconnect between learning and executing SEO.

Moreover: this means many practitioners don’t understand how to manage SEO strategy on an ongoing basis.

It’s an understandable problem. I told a colleague this week that teaching tactics is easy, but teaching strategy is the hard part. (Like boiling the ocean?) Put another way:

Successful marketers don’t teach others how to do, they teach them how to think.

Viewing SEO Strategies & Plans Through an Ongoing Lens

If we’re teaching SEOs how to think, we won’t concern ourselves with an audit tool list. Rather, we’ll look at how we can apply a simple framework to any website and draw from an action plan template. Let’s groove.

There are distinct phases of SEO campaigns:

  • Launch/takeover
  • Growth
  • Maintenance
  • Recovery

We view SEO strategy purely through the lens of performance, with month-over-month (m/m) and year-over-year (Y/Y) time frames.
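
To make that lens concrete, here’s a tiny, hedged R sketch that classifies a month into the modes described below using nothing but m/m and Y/Y comparisons. The click counts are placeholders; swap in whatever performance metric you actually manage to.

# Hedged sketch: growth = up on both, maintenance = up on one, recovery = up on neither
classify_mode <- function(this_month, last_month, same_month_last_year) {
  up_mm <- this_month > last_month
  up_yy <- this_month > same_month_last_year
  if (up_mm && up_yy) return("growth")
  if (up_mm || up_yy) return("maintenance")
  "recovery"
}

classify_mode(12000, 11000, 13000)  # up m/m, down Y/Y, so "maintenance"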

Launch / Takeover

Not included in our quadrants above, this phase is pretty self-explanatory. You can look at SEO audits, and other launch-focused plans & templates.

Growth Mode for SEO

If performance is up m/m and Y/Y, we’re in growth mode! We’ll approach our tasks with an emphasis on creating new content and expanding our search footprint. Troubleshooting is not something we’ll be concerned with often.

Growth mode as a constant state is rare! I’m fortunate to have some sites that spend quite a bit of time in this mode, but it doesn’t last forever. You’ll see the same for many top agencies. That’s why we have more than one mode. 🙂

In growth mode, you’re only concerned about maintaining your growth trajectory. This typically means you’re focused on three things:

  • Keyword/marketplace research
  • Content creation
  • Internal linking from new content

Maintenance Mode in SEO

Maintenance mode occupying half of the quadrant above is a decent showing for reality. Many campaigns spend most of their time in this area; how you spend maintenance time dictates success or failure.

Maintenance mode, by our definition above, means that you’re up either m/m or Y/Y, but not both at the same time. Specifically, there are two reasons to be in maintenance mode: seasonality, or breaking out of a plateau of results.

Maintenance Mode: Seasonality

You’ll find that if you’re trending up Y/Y but not m/m, seasonality is often in play. Some examples:

  • A luxury retailer in January, after Christmas sales & gift-giving season
  • A travel company in April, after Spring Break
  • A DWI law firm in August, after July 4th festivities
  • A kids clothing company in September, after back-to-school

How will you know if seasonality is really a factor? You can maintain a database of key branded & non-branded terms and their search interest, using the pseudo-API to automate Google Trends analysis with Colab notebooks.

Maintenance Mode: Breaking out of a Plateau

Plateaus happen for so many reasons. If you’re breaking out of a plateau, you may be flat/down Y/Y, but up m/m.

In these instances, you may continue most of your growth mode activities, but you’ll also be looking to enhance existing content:

  • Determining if CTR engagement in-SERP is satisfactory, via custom CTR curves
  • Boosting existing content by enhanced internal linking from other pages, via internal linking analysis
  • Populating or enhancing metadata, from bread-and-butter descriptions to image alt tags or structured data / schema markup
  • Issuing redirects for decommissioned content, or eliminating redirect chains due to faulty internal linking

Recovery Mode in SEO

Let’s be real: no one wants to preside over a campaign that dips into recovery mode. If you’re in-house, the CMO wants answers last week; if you’re agency-side, the AD won’t back you up until the ship’s right; if you’re a consultant…well, the client is living in your voicemails.

It’s tough, but it also happens. Here’s the set list you’re looking at:

  • Troubleshooting & competitive analyses
  • Rewriting content or overhauling content
  • Filing spam or malware reports

Next Steps

This post deserves a better ending, but it’s 10 minutes past my son’s last feed of the night, and I need to go warm his bottle and feed him! This post desperately needs (and may yet get) some touch-ups to make it more detailed, insightful, and action-oriented.

Bulk Google Trends Data Automation With a Pseudo-API

This one is built for Google Colab notebooks. It expects a simple CSV upload with a single keyword column (the code below expects the column header to be Topic) and up to 50 topics. Adjust the timeframe as needed.

!pip install pytrends

import pytrends
import io
from pytrends.request import TrendReq
import pandas as pd
from google.colab import files

# Upload the CSV of topics from your local machine
uploaded = files.upload()

pytrend = TrendReq()

# Grab the filename of the uploaded CSV and read it into a data frame
for keys in uploaded:
  name = keys
df2 = pd.read_csv(io.BytesIO(uploaded[name]))

# Expects a column named "Topic" in the uploaded CSV
topics = df2.Topic
dataframe = pd.DataFrame()
i = 0
for topic in topics:
  if isinstance(topic, str):
    # Pull 12 months of interest-over-time data for this topic
    pytrend.build_payload(kw_list=[topic], timeframe='today 12-m')
    data = pytrend.interest_over_time()
    df = pd.DataFrame(data)
    if i == 0:
      # First topic: use its date index as the base frame
      dataframe = df.copy()
      del dataframe['isPartial']
      i = i + 1
    # Add this topic's column to the combined frame
    dataframe[topic] = df[topic]

print(dataframe)

# Save the combined results and download the CSV from Colab
dataframe.to_csv("keywords_trends.csv", encoding='utf-8-sig')
files.download("keywords_trends.csv")

Special thanks to Fozan again for assistance.

Automating Search Interest & Trends Data Access with Python

!pip install pytrends

import pytrends
from pytrends.request import TrendReq
import pandas as pd # Sets up the Pandas data analysis library
from google.colab import files # Enables the CSV download at the end

pytrend = TrendReq()

# Compare search interest for four related terms, US only, over the past 5 years
pytrend.build_payload(kw_list=['digital marketing', 'digital media', 'interactive marketing', 'digital advertising'], timeframe='today 5-y', geo='US')

data = pytrend.interest_over_time()
df = pd.DataFrame(data)

# Save the results and download the CSV from Colab
df.to_csv('pytrends_data.csv', encoding='utf-8-sig')
files.download('pytrends_data.csv')

This is a Google Colab notebook for automating the download of Trends search interest data, inspired by this Search Engine Land writeup. Special thanks to Fozan for his help with populating the CSV download portion. 🙂

Account Service & Management for Marketing & Advertising

Recently I wrote about digital marketing agency strategy & planning – 5 pillars or behaviors top performers should exhibit.

Today, I want to look at account service & management for marketing / advertising agencies.

From an agency perspective, client management is at best a continually perplexing challenge. For agency owners and executives – the joy of growing relationships is often clouded by financial pressure.

In fact, I would readily wager that most pain & suffering in any given agency comes from mismatched expectations with clients. More specifically, a selective application of the Pareto Principle (the 80/20 rule) tells us 80% of our headaches at a digital marketing agency come from 20% of the clients.

There’s doubtless plenty of discussion to be had about managing expectations, or about revenue diversification. But let’s look at a couple of frameworks, A) financial and B) operational, that can help simplify client service.

First: are you managing your client portfolio by profitability and touch?

There are two easy squares in the field above: top left and bottom right.

First, if you have a high touch, low profitability client – you can either A) raise retainer rates/prices, or B) fire them. You can try to reset expectations, but many high touch clients that cause pain do not respond well to this. One way you can automatically bake this logic in is by testing a recurring percentage increase to your retainer with these and/or other client types.

Second, if you have a low touch, high profitability client – keep and protect them! They’re rare birds.

Generally, it’s desirable to have high profitability clients over low; yet strangely, for both of the remaining quadrants above (L/L, H/H), a potential solve is automation (with different purposes).

Low touch, low profitability clients aren’t inherently bad – there are a lot of niche, perhaps even respectable agencies built on these kinds of clients. High touch, high profitability clients are only going to be sustainable if you become the likes of a big box agency.

Either way, you need to either A) scale your capabilities to get more similar low touch clients (L/L), or B) find ways to get your time back against high touch clients (H/H). Especially so you can excel at the 5 pillars of agency strategy!

One small step forward you can take in automation is prioritizing work to become proactive, vs. blindly following retainers & SOWs until your teeth fall out from stress-clenching.

If you can automate portfolio dashboards of your clients – you can always have a real-time picture of what needs to happen in your client’s businesses. (Ex: a digital / performance marketing engagement that could include organic or paid media is shown below.)

Via Data Studio, Power BI, Tableau, etc., you can avoid overplaying winning accounts and quickly service trouble accounts before the damage becomes significant.

This kind of system can start with high-level traffic and engagement figures. But the method also holds promise for automating wins and insights for clients: drilling down to page and keyword insights that could be automated or outsourced, so you can proactively send victories to clients and maintain a healthy temperature and momentum in your engagement.

One other way you can try to reduce service burdens is to move to a JIRA-style ticketing system with your clients for requests, so that personal interaction comes from you, on your terms.

I’m sure this won’t make any Pulitzer nomination roundups (rightly so), but this has been enjoyable to write, as I explore more ideas about excellence in marketing and account service.

Marketing Data Lakes & Warehouses

My most recent post covered strategy & planning for digital marketing agencies, specifically from an SEO perspective.

In my mind, one of the most compelling questions was this: why aren’t there more success stories with SEO agencies? From my young perspective, I call out 5 things successful entrants can do for the best shot at success.

We may also find that these pillars of success (ex: depth of product vision, operational efficiency) rarely exist in isolation.

That brings us to marketing data lakes and warehouses! Perhaps popularized by Informatica, data lakes are essentially all your data tossed into one space. Hooray!

In their ebook, Informatica notes that data lakes are well-suited for large enterprises at $100MM+ in revenue. I would largely agree (for many reasons) and would also assert this leaves ample learning opportunity for brands below that threshold.

For the large enterprise above, there are often prerequisites to reaching that level. Specifically, you’ll often need to achieve a certain sophistication before advancing to ad hoc ML analyses of unstructured data in the cloud.

If we’re being honest, I’m probably bending and abusing some terminology here, but bear with me! For the vast majority of businesses (which also sit below that revenue or capitalization threshold), there’s a lot of power in data lakes for the purpose of building focused, small-data warehouses.

To the points of my earlier posts, you need cohesive data that can paint a clear, end-to-end picture of your digital business. To that end, consider the following:

With the proper platforms, API scripts/pulls, and storage, you can amass all of your key marketing data into one place (or a few). Depending on your tactical execution, SQL joins or other BI/dashboarding software can be used to unite your data by URL.

From this picture, you can easily pivot, filter, splice, etc. to see what has been performing, how, and where the impact is going to your business.

Ex: in one picture, you can immediately juxtapose log file activity on your site with Search Console data to see where Google got hung up in your site, quickly identify how much it’s impacting your visibility, and quickly see which keywords are most affecting your business.
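
As a small illustration of the “unite by URL” idea, here’s a hedged R sketch using dplyr. The file names and column names (url, googlebot_hits, clicks, impressions) are assumptions; match them to however your own data pulls are structured.

# Hedged sketch: unite log file activity and Search Console data by URL
library(dplyr)

log_hits <- read.csv("log_file_summary.csv")       # assumed columns: url, googlebot_hits
gsc_data <- read.csv("search_console_export.csv")  # assumed columns: url, clicks, impressions

combined <- gsc_data %>%
  left_join(log_hits, by = "url") %>%
  arrange(desc(impressions))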

To take it a step further, this can be scaled across multiple clients, where you write the client name or vertical into separate columns in all of your respective data pulls. This also facilitates large-scale rollups of your data. It’s my estimation that a compelling example of this may be Merkle’s quarterly digital marketing reports.

As with previous posts, I hope to continue peeling back this onion of excellence in marketing. However, my next planned immediate posts may pivot slightly to operational efficiency in the context of providing excellent client service.

Digital Marketing Agency Strategy and Planning

Or, What Marketing Could be, Part 2

Recently I broke my unintentional blogging hiatus to write about what marketing could be, and things to think about if your organic search traffic from Google goes down.

I fully intend to return to technical education and blogging, but perhaps the dawn of a new decade is bringing out the introspection and retrospection in me.

Today, let’s continue the conversation about what marketing could be. I’m taking the lens of marketing from an SEO perspective, because it’s most relevant to me. But also because SEO is one of the most challenging marketing disciplines to monetize & scale within the agency and consulting realm.

Have you ever noticed there aren’t really any big, pure-play SEO firms out there? The largest search marketing agencies are just that: search. Paid media spend (and thus services) still rules the marketing roost, and probably will for ages to come.

Why aren’t there big, pure-play SEO firms? SEO is tricky to practice, much less manage. Our practices try to reflect our disciplines. Paid media, for all its foibles of opaque procurement and pricing, still reflects a largely ordered economy, organized by purchasing power.

SEO, on the other hand, is near-superhuman complexity at global scale, courtesy of Google’s 20+ years of work in computing, information retrieval, and now, artificial intelligence. It’s not something easily mirrored by ragtag gangs of consultants and agencies.

As a result, the most frequent successful outcomes in SEO are often agency acquisitions or series C/D venture funding rounds for technology platforms.

Outrageously successful outcomes are so rare in SEO, because so few can consistently master all 5 pillars of success at once:

There are 5 key behaviors that consultants and agencies (all disciplines, but most specifically SEO) must master on a consistent basis to achieve, and more importantly, maintain success.

  • Depth of product vision
    • Can you master SEO strategies and tactics to such a degree as to actually achieve results?
  • Client service proficiency
    • Can you keep the clients happy and paying?
  • Operational efficiency
    • Can you scale internally the first two items to a point beyond your own finite time?
  • Marketplace momentum
    • Can you project externally your abilities to do the first three behaviors?
  • Service capability breadth
    • Can you, with SEO as a cornerstone offering, expand from it, into other services? (Ex: analytics, conversion optimization, etc.)

I can’t even say I’m doing all five behaviors above consistently, and our current landscape is evidence that few others are doing so either.

Examining, and perhaps quantifying these pillars will be a fun exercise in the coming days.

Google Traffic Down!

The Hitchhiker’s Guide to Fielding and Surviving Frantic Client Requests

*Fair warning, this is a live draft / work in progress post, as of 12/7/2019! Polishing pending.*

“Our Google traffic is down!” If you’ve done any time in the SEO consulting or digital marketing business, you’ve seen this client (or manager) email at the most inconvenient of times.

From my corner of the industry, it seems like digital marketing and specifically SEO talent comes and goes in waves. For me, it feels like there have been 3 waves, or “generations” of practitioners coming online in the past 5 years.

Again, from my corner of the industry, it seems like marketing agencies are having an increasingly difficult time finding future leaders from in-industry. Thus, if you polled junior (and even perhaps mid-level) SEO practitioners, I fear many wouldn’t be properly equipped to handle the above situation.

For many junior SEO agency personnel, getting the “Google traffic down!” email is an exercise in confusion that mirrors that of the client/manager. Random tactical suggestions emerge, instead of a thorough approach to troubleshooting a potential Google search traffic drop.

Running through a short list of questions can easily cover “gotchas” and a rather large set of common problems.

  • Is the site up?
  • Can our content be found by search engines?
  • Did we set up the site (analytics) properly?
  • Is there search demand for our target terms?
  • Does the content match with the search opportunity?
  • Does the site have a good experience?
  • Is the site properly engaging visitors?
  • Is the brand delighting and retaining users?

This still leaves us with a “broken window” problem: we don’t look at these metrics until something is wrong. What if we could automate some of this?

  • Is the site up?
    • What if we had consistent uptime monitoring?
  • Can our content be found by search engines?
    • What if we had automated downloads and data warehousing of Googlebot or other search engine log files?
    • What if we were able to submit XML sitemaps on demand with a few button clicks via the API?
    • What if we knew, each day or week, the precise page speed figures via the PageSpeed API?
  • Did we set up the site (analytics) properly?
    • What if we set up a cURL, Beautiful Soup, or other method for checking rendered page code for tags? (A rough R-flavored sketch follows this list.)
  • Is there search demand for our target terms?
    • What if we set up automated Google Search Console API calls to see daily search data? (Knowing full data may come at a delay)
  • Does the content match with the search opportunity?
  • Does the site have a good experience?
  • Is the site properly engaging visitors?
  • Is the brand delighting and retaining users?
    • What if we rigged up alerts and API data calls for the above data?
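
On the tag-checking point above, here’s a hedged, R-flavored sketch (most of this site leans on R rather than Python’s Beautiful Soup). It fetches a page with httr and greps the returned HTML for analytics tags. Note that this only sees the server-rendered source, not JavaScript-rendered output, and the URL and tag patterns are placeholders.

# Hedged sketch: fetch a page and check the returned HTML for analytics tags
library(httr)

url <- "https://www.sample.com/"
res <- GET(url)

# A 200 status also answers the "is the site up?" question
status_code(res)

page_source <- content(res, as = "text", encoding = "UTF-8")

# Simple pattern checks; adjust to the tags you actually deploy
has_gtag <- grepl("googletagmanager\\.com/gtag/js", page_source)
has_ga4  <- grepl("G-[A-Z0-9]+", page_source)

c(gtag_snippet = has_gtag, ga4_measurement_id = has_ga4)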

What if we were able to warehouse, aggregate and analyze all of this data, benchmarked, at scale? We’d really have something. 🙂


Automated Screaming Frog Reports With Command Line & Batch Files

Did you know that you can automate Screaming Frog crawl reports using Command Line and Windows batch files?

First, the credit goes to Tom Gregan and Open Source SEO for cutting the trail. 🙂

Starting with version 10, Screaming Frog SEO Spider has offered command line functionality– how cool is that? What’s more, the command line functionality is quite customizable, capable of both pulling in configuration files and exporting crawl data with flexibility.

Here’s some sample code for Windows Command Line you can run. Again, credit to Tom and Open Source SEO here. You can drop this into Notepad, Sublime Text, etc. and save as a batch file.

set crawlResults=C:\Users\you\Documents\client\Weekly-SFSEOS-Results
:: Creates variable %crawlResults% for where crawl exports will be saved

set sf=C:\Program Files (x86)\Screaming Frog SEO Spider\
:: Creates another variable telling Command Line where it can find SF

set configFile=C:\Users\you\Documents\client\Weekly-SFSEOS-Results\sample-config.seospiderconfig
:: Creates another variable telling CLI and SF where to find crawl configuration instructions - may be needed for more complex crawls or API settings

set domain=https://www.nike.com/
:: Sets a variable telling CLI and SF which domain or URL to crawl

chdir /d "%sf%"
:: Directs CLI to the directory you specified earlier

ScreamingFrogSEOSpiderCli.exe --config "%configFile%" --crawl "%domain%"  --save-crawl --headless --output-folder "%crawlResults%" --export-format "xlsx" --export-tabs "Internal:All" --timestamped-output
:: Runs the SF CLI exe file, which performs the crawl with the variables specified earlier 

The fun part (as if this wasn’t cool enough!) is that you can make a ton of these batch files, call them all from a separate “leader” batch file, and schedule them to run on a recurring basis.

start cmd /k Call website-a.bat
start cmd /k Call website-b.bat

Important to note: the above code sample assumes that you place the “leader” and “follower” Screaming Frog crawl batch files in the same directory on your computer or server.

How nice is that? From here, you have plenty of options: pull the exports into a business intelligence tool like Tableau, Power BI, or Data Studio, or pull them into data frames in R or Python.
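
For the R route, here’s a hedged sketch that gathers the timestamped exports into one data frame. The folder path and the internal_all.xlsx file name are assumptions; match them to the output folder and export tab you configured in your batch file.

# Hedged sketch: read timestamped Screaming Frog exports into one R data frame
library(readxl)
library(dplyr)

crawl_dir <- "C:/Users/you/Documents/client/Weekly-SFSEOS-Results"
files <- list.files(crawl_dir, pattern = "internal_all\\.xlsx$",
                    recursive = TRUE, full.names = TRUE)

all_crawls <- lapply(files, function(f) {
  read_excel(f) %>%
    mutate(crawl_folder = basename(dirname(f)))  # keep the timestamped folder name as a label
}) %>%
  bind_rows()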

This could mean the end of blind spots for your SEO efforts! Think about monthly, weekly, or even daily crawls of your sites, competitors, publishers, etc. Happy coding!