Google Search Console: Download & Export All Website Data

Another day, another blog post? This is a rarity for sure. At the moment, I’m starting to hit my stride on getting a few old items done in R. 🙂 If I keep writing more blog posts, I’ll need to start taking more photos again for the feature covers!

Quickly sharing how I was able to use R and Google Search Console to download & export all of my website’s search analytics data.

Why Use R to Download & Export all Search Console Data?

Even for the novice SEO, this is likely a well-worn frustration. Valuable data for critical business decisions is sitting in Google Search Console. But you can’t access it at scale!

It’s either hidden behind the abysmal UI export limit, or you have to do so much filtering that it’s impossible to scale the data extraction.

SearchConsoleR: Easy Google Search Console API for Unlimited Data Access

Similar to my previous post (How to Bulk Submit Sitemap URLs in Google Search Console), this article looks at the splendid SearchConsoleR package in R. See the link above for prerequisites.

The code to get all your data out of Google Search Console is really quite brief. I’ve listed it below, as well as a link to the GitHub gist.

Note that if it’s your first time using RStudio, you’ll need install.packages() to install the necessary packages before library() can load them:
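install.packages(c("searchConsoleR", "dplyr", "ggplot2", "writexl"))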

# Load packages
library(searchConsoleR)
library(dplyr)
library(ggplot2)
library(writexl)

# Authorize & choose Google profile
scr_auth()

# Specify website --- CLIENT
website <- "https://www.sample.com"
dimensions <- c("page", "query")

# Pull query & page data for the date range
ttl_queries <- 
  search_analytics(website, 
                   "2019-08-01", "2019-08-31", 
                   dimensions, 
                   searchType = "web", rowLimit = 100000)

# Write the data frame to an XLSX file
write_xlsx(ttl_queries, "ttl_queries.xlsx")

If the code has run correctly, you should have a (potentially) massive Excel file at your disposal! The possibilities are endless – data crunching in Excel, piping into a database, or feeding your favorite visualization software. Enjoy!
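One extra note: a single API request only returns a limited number of rows, and searchConsoleR can page through larger result sets for you via the walk_data argument of search_analytics. As a hedged sketch (same variables as above; I mostly use the default), "byBatch" fetches rows in successive batches:

# Walk the API in batches to collect more rows than one request allows
ttl_queries <- 
  search_analytics(website, 
                   "2019-08-01", "2019-08-31", 
                   dimensions, 
                   searchType = "web", rowLimit = 100000, 
                   walk_data = "byBatch")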

How to Bulk Submit Sitemap URLs to Google Search Console

Well, howdy! As with many of my posts lately, I find myself saying that it’s been a while since I’ve posted something. It’s true!

Today, I’d like to walk you through how to bulk submit sitemap URLs to Google Search Console using R and RStudio. Before we begin the process of bulk submitting sitemap URLs, you’ll need a few prerequisites:

Prerequisites to Bulk Sitemap URL Submission: R Setup

  • Download and install the appropriate version of R for your system: https://cran.r-project.org/
  • For Windows users, search for “rtools”, then download and install it from the official cran.r-project.org site.
    • Mac/other OS users may need an equivalent for their respective OS. However, my primary machine is Windows (described above), and I’ve been able to use RStudio on a Mac without issue and without installing Rtools. That doesn’t mean you won’t need it. ¯\_(ツ)_/¯
  • Download and install RStudio from the official site: https://rstudio.com/products/rstudio/download/

SearchConsoleR & Bulk Sitemap Submission in Google Search Console

Bulk submitting XML sitemap URLs to Google Search Console is made possible by the excellent R package from Mark Edmondson. Why R and SearchConsoleR? Approachability.

Many readers may find this method much more approachable than trying to get API projects, API keys, and other confusing configuration details just right in Python/Jupyter notebooks. You’ll see what I mean in a minute.

Code for Bulk Sitemap URL Submission in Google Search Console

Below is sample R code for bulk sitemap URL submission in Google Search Console, using R and SearchConsoleR. You can also view my Github gist here.

# Load packages
library(searchConsoleR)
library(dplyr)
library(ggplot2)

# Authorize & choose Google profile
scr_auth()

# Specify website --- FULL URL
website <- "https://www.sample.com"

# Submit all manually found sitemaps, pausing between calls
add_sitemap(website, "https://www.sample.com/sitemap_index.xml")
Sys.sleep(1)

add_sitemap(website, "https://www.sample.com/page-sitemap.xml")
Sys.sleep(1)

add_sitemap(website, "https://www.sample.com/post-sitemap.xml")
Sys.sleep(1)

The code is fairly straightforward, and sadly, not automated. I’ll be working on an automated, scalable way of submitting all Sitemap URLs for all websites in a given Search Console account.

The one custom touch I’d note here is the one-second system sleep between calls, to proactively avoid running afoul of rate limits if you’re submitting a lot of URLs.
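In the meantime, here’s a minimal sketch of how you could loop over a vector of sitemap URLs rather than repeating yourself (same functions as above; the sitemap URLs are placeholders):

# Submit each sitemap for the property, pausing between submissions
sitemaps <- c("https://www.sample.com/sitemap_index.xml",
              "https://www.sample.com/page-sitemap.xml",
              "https://www.sample.com/post-sitemap.xml")

for (sitemap in sitemaps) {
  add_sitemap(website, sitemap)
  Sys.sleep(1)
}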

The Simple SEO Strategy & Plan Template

A quick Google search tells me that a lot of folks have already put together some fine SEO strategy plans & templates.

Here’s why I’m offering a different take on an SEO strategy plan template:

Websites are offering SEO launch plans, not ongoing search engine optimization management templates.

The Problem with a “Launch-Only” SEO Strategy / Plan

A study of millions of search results found that only 5% of pages will rank in the top 10 positions within 1 year. (Ahrefs) This data point suggests a deep disconnect between learning and executing SEO.

Moreover: this means many practitioners don’t understand how to manage SEO strategy on an ongoing basis.

It’s an understandable problem. I told a colleague this week that teaching tactics is easy, but teaching strategy is the hard part. (Like boiling the ocean?) Put another way:

Successful marketers don’t teach others how to do, they teach them how to think.

Viewing SEO Strategies & Plans Through an Ongoing Lens

If we’re teaching SEOs how to think, we won’t concern ourselves with an audit tool list. Rather, we’ll look at how we can apply a simple framework to any website and draw from an action plan template. Let’s groove.

There are distinct phases of SEO campaigns:

  • Launch/takeover
  • Growth
  • Maintenance
  • Recovery

We view SEO strategy purely through the lens of performance, with month-over-month (m/m) and year-over-year (Y/Y) time frames.
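To make the quadrant logic concrete, here’s a toy sketch in R. Treating “up” as any positive change is my simplification, and classing “down on both” as recovery is my inference from the sections below rather than a formal definition:

# Classify the SEO mode from m/m and Y/Y performance changes
seo_mode <- function(mom_change, yoy_change) {
  if (mom_change > 0 && yoy_change > 0) {
    "Growth"        # up both m/m and Y/Y
  } else if (mom_change > 0 || yoy_change > 0) {
    "Maintenance"   # up one, but not both
  } else {
    "Recovery"      # down both (launch/takeover sits outside the quadrants)
  }
}

seo_mode(0.05, 0.12)   # "Growth"
seo_mode(0.03, -0.02)  # "Maintenance" – e.g., breaking out of a plateau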

Launch / Takeover

Not included in our quadrants above, this phase is pretty self-explanatory. You can look at SEO audits and other launch-focused plans & templates.

Growth Mode for SEO

If performance is up m/m and Y/Y, we’re in growth mode! We’ll focus our tasks on creating new content and expanding our search footprint. Troubleshooting is not something we’ll be concerned with often.

Growth mode as a constant state is rare! I’m fortunate to have some sites that spend quite a bit of time in this mode, but it doesn’t last forever. You’ll see the same for many top agencies. That’s why we have more than one mode. 🙂

In growth mode, you’re only concerned with maintaining your growth trajectory. This typically means you’re focused on three things:

  • Keyword/marketplace research
  • Content creation
  • Internal linking from new content

Maintenance Mode in SEO

Maintenance mode’s 50% share of the quadrant is a decent showing for reality. Many campaigns spend most of their time in this area; how you spend maintenance time dictates success or failure.

Maintenance mode, by our definition above, means you’re up either m/m or Y/Y, but not both at the same time. Specifically, there are two common reasons to be in maintenance mode: seasonality, or breaking out of a plateau of results.

Maintenance Mode: Seasonality

You’ll find that if you’re trending up Y/Y but not m/m, seasonality is often in play. Some examples:

  • A luxury retailer in January, after Christmas sales & gift-giving season
  • A travel company in April, after Spring Break
  • A DWI law firm in August, after July 4th festivities
  • A kids clothing company in September, after back-to-school

How will you know if seasonality is really a factor? You can maintain a database of key branded & non-branded terms and their search interest, using the pseudo-API to automate Google Trends analysis with Colab notebooks.

Maintenance Mode: Breaking out of a Plateau

Plateaus happen for so many reasons. If you’re breaking out of a plateau, you may be flat/down Y/Y, but up m/m.

In these instances, you may continue most of your growth mode activities, but you’ll also be looking to enhance existing content:

  • Determining if CTR engagement in-SERP is satisfactory, via custom CTR curves (a rough sketch follows this list)
  • Boosting existing content by enhanced internal linking from other pages, via internal linking analysis
  • Populating or enhancing metadata, from bread-and-butter descriptions to image alt tags or structured data / schema markup
  • Issuing redirects for decommissioned content, or eliminating redirect chains due to faulty internal linking
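On the CTR curve point, here’s a minimal sketch of a homegrown curve, reusing a data frame like ttl_queries from the SearchConsoleR export earlier (search_analytics returns clicks, impressions, and position columns; rounding positions into whole-number buckets is a simplification):

library(dplyr)

# Average CTR by rounded SERP position, top 10 positions only
ctr_curve <- ttl_queries %>%
  mutate(position = round(position)) %>%
  group_by(position) %>%
  summarise(avg_ctr = sum(clicks) / sum(impressions)) %>%
  filter(position <= 10)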

Recovery Mode in SEO

Let’s be real, no one wants to preside over a campaign that dips into recovery mode. If you’re in-house, the CMO wants answers last week; if you’re agency-side, the AD won’t back you up until the ship’s right; if you’re a consultant…well, the client is living in your voicemails.

It’s tough, but it happens – here’s the set list you’re looking at:

  • Troubleshooting & competitive analyses
  • Rewriting content or overhauling content
  • Filing spam or malware reports

Next Steps

This post deserves a better ending, but it’s 10 minutes past my son’s last feed of the night, and I need to go warm his bottle and feed him! This post desperately needs (and may yet get) some touchups to make it more detailed, insightful and action-oriented.

Bulk Google Trends Data Automation With a Pseudo-API

This is built for Google Colab notebooks and a simple CSV with a column header of Topic, holding up to 50 topics. Adjust the timeframe as needed.

!pip install pytrends

import io
from pytrends.request import TrendReq
import pandas as pd
from google.colab import files

# Upload the CSV of topics (expects a "Topic" column header)
uploaded = files.upload()

pytrend = TrendReq()

# Grab the uploaded file's name and read the CSV into a data frame
for keys in uploaded:
  name = keys
df2 = pd.read_csv(io.BytesIO(uploaded[name]))

# Pull a 12-month interest-over-time series for each topic and
# assemble the series into one data frame, keyed on date
topics = df2.Topic
dataframe = pd.DataFrame()
first = True
for topic in topics:
  if isinstance(topic, str):
    pytrend.build_payload(kw_list=[topic], timeframe='today 12-m')
    data = pytrend.interest_over_time()
    if data.empty:
      continue  # no Trends data for this topic
    if first:
      dataframe = pd.DataFrame(data)
      del dataframe['isPartial']  # drop the partial-period flag
      first = False
    else:
      dataframe[topic] = data[topic]

print(dataframe)
dataframe.to_csv("keywords_trends.csv", encoding='utf-8-sig')
files.download("keywords_trends.csv")

Special thanks to Fozan again for assistance.

Automating Search Interest & Trends Data Access with Python

!pip install pytrends

from pytrends.request import TrendReq
import pandas as pd  # sets up the pandas data analysis library
from google.colab import files

pytrend = TrendReq()

# Compare five years of US search interest across four related terms
pytrend.build_payload(kw_list=['digital marketing', 'digital media', 'interactive marketing', 'digital advertising'], timeframe='today 5-y', geo='US')

# Save the interest-over-time table and download it as a CSV
data = pytrend.interest_over_time()
df = pd.DataFrame(data)
df.to_csv('pytrends_data.csv', encoding='utf-8-sig')
files.download('pytrends_data.csv')

Google Colab notebook for automating downloads of Trends search interest data, inspired by this Search Engine Land writeup. Special thanks to Fozan for his help with the CSV download portion. 🙂

Account Service & Management for Marketing & Advertising

Recently I wrote about digital marketing agency strategy & planning – 5 pillars or behaviors top performers should exhibit.

Today, I want to look at account service & management for marketing / advertising agencies.

From an agency perspective, client management is, at best, a continually perplexing challenge. For agency owners and executives, the joy of growing relationships is often clouded by financial pressure.

In fact, I would readily wager that most pain & suffering in any given agency comes from mismatched expectations with clients. More specifically, a selective application of the Pareto Principle (the 80/20 rule) tells us that 80% of our headaches at a digital marketing agency come from 20% of the clients.

There’s doubtless plenty of discussion to be had about managing expectations, or about revenue diversification. But let’s look at a couple of frameworks, A) financial and B) operational, that can help simplify client service.

First: are you managing your client portfolio by profitability and touch?

There are two easy squares in the field above: top left and bottom right.

First, if you have a high touch, low profitability client, you can either A) raise retainer rates/prices, or B) fire them. You can try to reset expectations, but many high touch clients that cause pain do not respond well to this. One way to bake this logic in automatically is to test a recurring percentage increase to your retainer with these and/or other client types; even a modest 5% bump at each annual renewal compounds to roughly 28% after five renewals.

Second, if you have a low touch, high profitability client – keep and protect them! They’re rare birds.

Generally, it’s desirable to have high profitability clients over low – yet strangely, for both of the remaining quadrants (L/L, H/H), a potential solve is automation (with different purposes).

Low touch, low profitability clients aren’t inherently bad – there are a lot of niche, perhaps even respectable agencies built on these kinds of clients. High touch, high profitability clients are only going to be sustainable if you become the likes of a big box agency.

Either way, you need to either A) scale your capabilities to win more similar low touch clients (L/L), or B) find ways to get your time back on high touch clients (H/H). Especially so you can excel at the 5 pillars of agency strategy!

One small step forward you can take in automation is prioritizing work to become proactive, vs. blindly following retainers & SOWs until your teeth fall out from stress-clenching.

If you can automate portfolio dashboards of your clients – you can always have a real-time picture of what needs to happen in your client’s businesses. (Ex: a digital / performance marketing engagement that could include organic or paid media is shown below.)

Via Data Studio, Power BI, Tableau, etc., you can avoid overplaying winning accounts and quickly service trouble accounts before the damage becomes significant.

This kind of system can start with high-level traffic and engagement figures. However, it also holds promise for automating wins and insights for clients: drill down to page- and keyword-level findings (automated or outsourced) so you can proactively send victories to clients and maintain a healthy temperature and momentum in the engagement.

One other way to reduce service burdens is to move to a JIRA-style ticketing system with your clients for requests – then personal interaction comes from you, on your terms.

I’m sure this won’t make any Pulitzer nomination roundups (rightly so), but it has been enjoyable to write, as I explore more ideas about excellence in marketing and account service.

Marketing Data Lakes & Warehouses

My most recent post covered strategy & planning for digital marketing agencies, specifically from an SEO perspective.

In my mind, one of the most compelling questions was this: why aren’t there more success stories with SEO agencies? From my young perspective, I call out 5 things successful entrants can do for the best shot at success.

We may also find that these pillars of success (ex: depth of product vision, operational efficiency) rarely exist in isolation.

That brings us to marketing data lakes and warehouses! Perhaps popularized by Informatica, data lakes are essentially all your data tossed into one space – hooray!

In their ebook, Informatica notes that data lakes are well-suited for large enterprises at $100MM and above. I would largely agree (for many reasons) and would also assert this leaves ample learning opportunity for brands below that threshold.

For the large enterprise above, there are often prerequisites to reaching that level. Specifically, you’ll often need to achieve a certain sophistication before advancing to ad hoc ML analyses of unstructured data in the cloud.

If we’re being honest, I’m probably bending and abusing some terminology here, but bear with me! For the vast majority of businesses (which also sit below that revenue or capitalization threshold), there’s a lot of power in data lakes for the purpose of building small-data-focused data warehouses.

To the points of my earlier posts, you need cohesive data that can paint a clear, end-to-end picture of your digital business. To that end, consider the following:

With the proper platforms, API scripts/pulls and storage, you can amass all of your key marketing data into one (or a few) places. Depending on your tactical execution, SQL joins or BI/dashboarding software can then unite your data by URL.
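To illustrate the join, here’s a minimal sketch in R with dplyr; the column names and values are hypothetical stand-ins for a Search Console pull and parsed server logs:

library(dplyr)

# Hypothetical Search Console and log file extracts, both keyed by URL
gsc <- data.frame(page = c("/a", "/b"),
                  clicks = c(120, 40),
                  impressions = c(3000, 900),
                  stringsAsFactors = FALSE)
logs <- data.frame(page = c("/a", "/b"),
                   googlebot_hits = c(55, 2),
                   status_5xx = c(0, 14),
                   stringsAsFactors = FALSE)

# One row per URL, uniting search performance with crawl activity
unified <- inner_join(gsc, logs, by = "page")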

From this picture, you can easily pivot, filter, splice, etc. to see what has been performing, how, and where the impact lands on your business.

Ex: in one picture, you can immediately juxtapose log file activity with Search Console data to see where Google got hung up on your site, quickly identify how much it’s impacting your visibility, and quickly see which keywords are most affecting your business.

To take it a step further, this can be scaled across multiple clients: write the client name or vertical into separate columns in all of your respective data pulls, as sketched below. This also facilitates large-scale rollups of your data. It’s my estimation that a compelling example of this may be Merkle’s quarterly digital marketing reports.
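A minimal sketch of that tagging-and-rollup idea, again with hypothetical names and values standing in for real per-client pulls:

library(dplyr)

# Tag each client's pull, then stack them for portfolio-level rollups
client_a <- data.frame(page = c("/a", "/b"), clicks = c(120, 40))
client_b <- data.frame(page = c("/x"), clicks = c(75))

portfolio <- bind_rows(
  mutate(client_a, client = "Client A", vertical = "Retail"),
  mutate(client_b, client = "Client B", vertical = "Travel")
)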

As with previous posts, I hope to continue peeling back this onion of excellence in marketing. However, my next planned immediate posts may pivot slightly to operational efficiency in the context of providing excellent client service.