Google Search Console: Download & Export All Website Data

Another update! I’ve tested this head-to-head with some out-of-the-box solutions and found some interesting results. With a Data Refinery-type product, once you add dimensions (such as URL), the completeness of the data decreases, due to data loss the GSC developer documentation notes is unavoidable. However, out-of-the-box API connector products may see 30% or more data loss, whereas this script may only see 15-20%. Note this is a limited sample and shouldn’t be considered an authoritative view. (Data loss is measured by checking whether summed or pivoted KPIs such as clicks from the export match the summary scorecard figures in the UI.)
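
For reference, here’s a rough version of that check, assuming the ttl_queries data frame produced by the first script below and a scorecard total you read off the UI by hand:

# Hypothetical QA check: compare summed clicks in the export against the
# summary scorecard figure shown in the Search Console UI for the same date range
ui_clicks <- 12345   # assumption: total clicks read manually from the GSC UI

export_clicks <- sum(ttl_queries$clicks)

# Rough share of clicks "lost" once dimensions are added to the export
1 - export_clicks / ui_clicks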

Update! I’ve done a bit more work recently and have included an R script at the end of this post that will let you mechanically pull up to 16 months of data, month by month, and also get a combined Excel file. It’s mechanical in the sense that I haven’t yet worked out automating date lookups or expanding this beyond one website at a time. However, if you need a lot of data in a hurry, with minimal work…this should help you!
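
If you do want to automate the date lookups, a rough base-R sketch like the one below would generate 16 consecutive month boundaries; the 2020-04-01 anchor date is just an example, not part of the script further down.

# Sketch: generate 16 consecutive month starts and ends automatically
starts <- seq(as.Date("2020-04-01"), by = "month", length.out = 16)
ends   <- seq(as.Date("2020-05-01"), by = "month", length.out = 16) - 1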

Another day, another blog post? This is a rarity for sure. At the moment, I’m starting to hit my stride on getting a few old items done in R. 🙂 If I keep writing more blog posts, I’ll need to start taking more photos again for the feature covers!

Quickly sharing how I was able to use R and Google Search Console to download & export all of my website’s search analytics data.

Why Use R to Download & Export all Search Console Data?

Even for the novice SEO, this is likely a well-worn frustration. Valuable data for making critical business decisions is sitting in Google Search Console. But you can’t access it at scale!

It’s either hidden behind the abysmal UI export limit, or you have to do so much filtering that it’s impossible to scale the data extraction.

SearchConsoleR: Easy Google Search Console API for Unlimited Data Access

Similar to my previous post (How to Bulk Submit Sitemap URLs in Google Search Console), this article looks at the splendid searchConsoleR package in R. See the link above for prerequisites.

The code to get all your data out of Google Search Console is really quite brief. I’ve listed it below, as well as a link to the GitHub gist.

Note that if it’s your first time using RStudio, you’ll need to use install.packages() to install the necessary packages before loading them with library().
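
If so, a one-liner along these lines (covering every package used in this post’s scripts) should take care of it:

# One-time setup: install the packages used in the scripts below
install.packages(c("searchConsoleR", "dplyr", "ggplot2", "writexl", "stringr"))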

# Load packages
library(searchConsoleR)
library(dplyr)
library(ggplot2)
library(writexl)

# Authorize & choose Google profile
scr_auth()

# Specify website & dimensions --- CLIENT
website <- "https://www.sample.com"
dimensions <- c("query", "page")

# Pull up to 100,000 rows of query/page data for August 2019
ttl_queries <- 
  search_analytics(siteURL = website, 
                   startDate = "2019-08-01", endDate = "2019-08-31", 
                   dimensions = dimensions, 
                   searchType = "web", rowLimit = 100000)

# Write the data frame to an XLSX file
write_xlsx(ttl_queries, "ttl_queries.xlsx")

If the code has run correctly, you should have a (potentially) massive Excel file at your disposal! The possibilities are endless: data crunching in Excel, piping it into a database, or feeding some other visualization software. Enjoy!
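
Since dplyr and ggplot2 are already loaded, here’s a small optional example of slicing the export before it ever leaves R (the column names are the ones search_analytics returns):

# Top 20 pages by total clicks in the export
top_pages <- ttl_queries %>%
  group_by(page) %>%
  summarise(clicks = sum(clicks), impressions = sum(impressions)) %>%
  arrange(desc(clicks)) %>%
  slice_head(n = 20)

# Quick horizontal bar chart of those pages
ggplot(top_pages, aes(x = reorder(page, clicks), y = clicks)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Clicks", title = "Top 20 pages by clicks")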

Update: A Mechanical Download of All Google Search Console Data for up to 16 months

You can also find this script in a GitHub Gist here.

# Load packages #
library(searchConsoleR)
library(dplyr)
library(ggplot2)
library(writexl)
library(stringr)

# Authorize & choose Google profile #
scr_auth()

# Specify website --- Client Code or Line of Business Code Here #
client <- "CLCO"
website <- "https://www.twosigma.com"

############ SPECIFYING THE MONTHS ############
start1 <- "2020-04-01"
end1 <- "2020-04-30"

start2 <- "2020-05-01"
end2 <- "2020-05-31"

start3 <- "2020-06-01"
end3 <- "2020-06-30"

start4 <- "2020-07-01"
end4 <- "2020-07-31"

start5 <- "2020-08-01"
end5 <- "2020-08-31"

start6 <- "2020-09-01"
end6 <- "2020-09-30"

start7 <- "2020-10-01"
end7 <- "2020-10-31"

start8 <- "2020-11-01"
end8 <- "2020-11-30"

start9 <- "2020-12-01"
end9 <- "2020-12-31"

start10 <- "2021-01-01"
end10 <- "2021-01-31"

start11 <- "2021-02-01"
end11 <- "2021-02-28"

start12 <- "2021-03-01"
end12 <- "2021-03-31"

start13 <- "2021-04-01"
end13 <- "2021-04-30"

start14 <- "2021-05-01"
end14 <- "2021-05-31"

start15 <- "2021-06-01"
end15 <- "2021-06-30"

start16 <- "2021-07-01"
end16 <- "2021-07-31"


# # # Start mo 1

mo1_queries <- search_analytics(
  siteURL = website, 
  startDate = start1,
  endDate = end1,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

# Dynamically get the number of rows from the GSC export so the created vectors match number of rows
rows <- nrow(mo1_queries)

# Make a vector that populates the client code
clientName <- rep(client,rows)

# Make a vector that populates the start date for Month filtering, segmenting, pivoting
date <- rep(start1,rows)

# Make a data frame from the created vectors
mo1_queries_merged_1 <- data.frame(clientName,date,mo1_queries)

############### QA CHECKPOINT ######################
# Write the data frame to an XLSX file
write_xlsx(mo1_queries_merged_1, "mo1_queries_1.xlsx")

# # # END mo 1


# # # Start mo 2

mo2_queries <- search_analytics(
  siteURL = website, 
  startDate = start2,
  endDate = end2,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo2_queries)
clientName <- rep(client,rows)
date <- rep(start2,rows)
mo2_queries_merged_1 <- data.frame(clientName,date,mo2_queries)
write_xlsx(mo2_queries_merged_1, "mo2_queries_1.xlsx")

# # # END mo 2



# # # Start mo 3

mo3_queries <- search_analytics(
  siteURL = website, 
  startDate = start3,
  endDate = end3,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo3_queries)
clientName <- rep(client,rows)
date <- rep(start3,rows)
mo3_queries_merged_1 <- data.frame(clientName,date,mo3_queries)
write_xlsx(mo3_queries_merged_1, "mo3_queries_1.xlsx")

# # # END mo 3


# # # Start mo 4
mo4_queries <- search_analytics(
  siteURL = website, 
  startDate = start4,
  endDate = end4,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo4_queries)
clientName <- rep(client,rows)
date <- rep(start4,rows)
mo4_queries_merged_1 <- data.frame(clientName,date,mo4_queries)
write_xlsx(mo4_queries_merged_1, "mo4_queries_1.xlsx")

# # # END mo 4


# # # Start mo 5

mo5_queries <- search_analytics(
  siteURL = website, 
  startDate = start5,
  endDate = end5,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo5_queries)
clientName <- rep(client,rows)
date <- rep(start5,rows)
mo5_queries_merged_1 <- data.frame(clientName,date,mo5_queries)
write_xlsx(mo5_queries_merged_1, "mo5_queries_1.xlsx")

# # # END mo 5


# # # Start mo 6

mo6_queries <- search_analytics(
  siteURL = website, 
  startDate = start6,
  endDate = end6,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo6_queries)
clientName <- rep(client,rows)
date <- rep(start6,rows)
mo6_queries_merged_1 <- data.frame(clientName,date,mo6_queries)
write_xlsx(mo6_queries_merged_1, "mo6_queries_1.xlsx")

# # # END mo 6


# # # Start mo 7

mo7_queries <- search_analytics(
  siteURL = website, 
  startDate = start7,
  endDate = end7,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo7_queries)
clientName <- rep(client,rows)
date <- rep(start7,rows)
mo7_queries_merged_1 <- data.frame(clientName,date,mo7_queries)
write_xlsx(mo7_queries_merged_1, "mo7_queries_1.xlsx")

# # # END mo 7


# # # Start mo 8

mo8_queries <- search_analytics(
  siteURL = website, 
  startDate = start8,
  endDate = end8,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo8_queries)
clientName <- rep(client,rows)
date <- rep(start8,rows)
mo8_queries_merged_1 <- data.frame(clientName,date,mo8_queries)
write_xlsx(mo8_queries_merged_1, "mo8_queries_1.xlsx")

# # # END mo 8


# # # Start mo 9

mo9_queries <- search_analytics(
  siteURL = website, 
  startDate = start9,
  endDate = end9,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo9_queries)
clientName <- rep(client,rows)
date <- rep(start9,rows)
mo9_queries_merged_1 <- data.frame(clientName,date,mo9_queries)
write_xlsx(mo9_queries_merged_1, "mo9_queries_1.xlsx")

# # # END mo 9


# # # Start mo 10

mo10_queries <- search_analytics(
  siteURL = website, 
  startDate = start10,
  endDate = end10,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo10_queries)
clientName <- rep(client,rows)
date <- rep(start10,rows)
mo10_queries_merged_1 <- data.frame(clientName,date,mo10_queries)
write_xlsx(mo10_queries_merged_1, "mo10_queries_1.xlsx")

# # # END mo10


# # # Start mo 11

mo11_queries <- search_analytics(
  siteURL = website, 
  startDate = start11,
  endDate = end11,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo11_queries)
clientName <- rep(client,rows)
date <- rep(start11,rows)
mo11_queries_merged_1 <- data.frame(clientName,date,mo11_queries)
write_xlsx(mo11_queries_merged_1, "mo11_queries_1.xlsx")

# # # END mo 11


# # # Start mo 12

mo12_queries <- search_analytics(
  siteURL = website, 
  startDate = start12,
  endDate = end12,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo12_queries)
clientName <- rep(client,rows)
date <- rep(start12,rows)
mo12_queries_merged_1 <- data.frame(clientName,date,mo12_queries)
write_xlsx(mo12_queries_merged_1, "mo12_queries_1.xlsx")

# # # END mo 12


# # # Start mo 13

mo13_queries <- search_analytics(
  siteURL = website, 
  startDate = start13,
  endDate = end13,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo13_queries)
clientName <- rep(client,rows)
date <- rep(start13,rows)
mo13_queries_merged_1 <- data.frame(clientName,date,mo13_queries)
write_xlsx(mo13_queries_merged_1, "mo13_queries_1.xlsx")

# # # END mo 13


# # # Start mo 14

mo14_queries <- search_analytics(
  siteURL = website, 
  startDate = start14,
  endDate = end14,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo14_queries)
clientName <- rep(client,rows)
date <- rep(start14,rows)
mo14_queries_merged_1 <- data.frame(clientName,date,mo14_queries)
write_xlsx(mo14_queries_merged_1, "mo14_queries_1.xlsx")

# # # END mo 14


# # # Start mo 15

mo15_queries <- search_analytics(
  siteURL = website, 
  startDate = start15,
  endDate = end15,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo15_queries)
clientName <- rep(client,rows)
date <- rep(start15,rows)
mo15_queries_merged_1 <- data.frame(clientName,date,mo15_queries)
write_xlsx(mo15_queries_merged_1, "mo15_queries_1.xlsx")

# # # END mo 15


# # # Start mo 16

mo16_queries <- search_analytics(
  siteURL = website, 
  startDate = start16,
  endDate = end16,
  dimensions = c("query", "page"),
  searchType="web",
  rowLimit = 1000000
)

rows <- nrow(mo16_queries)
clientName <- rep(client,rows)
date <- rep(start16,rows)
mo16_queries_merged_1 <- data.frame(clientName,date,mo16_queries)
write_xlsx(mo16_queries_merged_1, "mo16_queries_1.xlsx")

# # # END mo 16


# Combine all 16 monthly exports into one data frame
total_queries_by_page <- rbind(
  mo1_queries_merged_1,  mo2_queries_merged_1,  mo3_queries_merged_1,  mo4_queries_merged_1,
  mo5_queries_merged_1,  mo6_queries_merged_1,  mo7_queries_merged_1,  mo8_queries_merged_1,
  mo9_queries_merged_1,  mo10_queries_merged_1, mo11_queries_merged_1, mo12_queries_merged_1,
  mo13_queries_merged_1, mo14_queries_merged_1, mo15_queries_merged_1, mo16_queries_merged_1
)

write_xlsx(total_queries_by_page, "total_queries_by_page.xlsx")
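
And for the adventurous, here’s a hedged, loop-based sketch of the same 16-month pull, so the month-by-month blocks don’t have to be maintained by hand. The website, client code, and anchor date are the same example values used above; adjust them to your own property.

# Loop-based alternative to the month-by-month blocks above
library(searchConsoleR)
library(dplyr)
library(writexl)

scr_auth()

client  <- "CLCO"
website <- "https://www.twosigma.com"

# Generate 16 consecutive month starts/ends (anchor date is an example)
starts <- seq(as.Date("2020-04-01"), by = "month", length.out = 16)
ends   <- seq(as.Date("2020-05-01"), by = "month", length.out = 16) - 1

monthly <- lapply(seq_along(starts), function(i) {
  df <- search_analytics(
    siteURL    = website,
    startDate  = starts[i],
    endDate    = ends[i],
    dimensions = c("query", "page"),
    searchType = "web",
    rowLimit   = 1000000
  )
  if (is.null(df) || nrow(df) == 0) return(NULL)   # skip months with no data
  # Tag each row with the client code and month start for later pivoting
  data.frame(clientName = client, date = as.character(starts[i]), df)
})

# Combine the monthly pulls and write a single workbook
total_queries_by_page <- bind_rows(monthly)
write_xlsx(total_queries_by_page, "total_queries_by_page.xlsx")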