Actually Running Google Lighthouse Programmatically

Are you ready to run Google Lighthouse programmatically? This challenge has weighed on me for years, but with the help of some resources and friends, I’d love to show you a start-to-finish guide to truly running Google Lighthouse programmatically.

Programmatic Means Prep Work

First, my personal machine runs Windows 10, so my apologies to Mac & Linux users out there. Here’s what you need to get started, and huge shout-out to Upbuild for cutting the first part of the path here.

  • Install Node: https://nodejs.org/en/download/
  • After installation, search your start menu for “Command Prompt” or CMD
  • Assuming Node installed correctly and your command line is configured properly, the following line in Command Prompt should install Lighthouse Batch, a splendid bit of programming by Mike Stead:
npm install lighthouse-batch -g

With this first round of prep, you can go get a list of URLs! For example, you might use Screaming Frog Command Line (CLI) automation for some pages.

JSON Files: Step 1 of Running Google Lighthouse Programmatically

The fastest way to run Lighthouse Batch programmatically is to use the -s option, shown below. Essentially, -s lets you pass in a comma-separated list of URLs to run.

Whether you’re working in Screaming Frog, or getting Google Search Console export data, you’ll want to use Convert.Town to transform column data into comma-separated values. (Hat tip again to Upbuild for this find.)
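If you’d rather not round-trip through a web tool, the same column-to-CSV transformation is a few lines in any scripting language. Here’s an illustrative Python sketch (the example URLs are placeholders):

```python
# A column of URLs, as exported from Screaming Frog or Search Console.
column = """https://www.domain.com/page-1/
https://www.domain.com/page-2/
https://www.domain.com/page-3/"""

# Collapse the column into the comma-separated form lighthouse-batch -s expects.
urls = [line.strip() for line in column.splitlines() if line.strip()]
sites = ",".join(urls)
print(sites)
```

Paste the printed string straight after the -s flag.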

When you’re ready, open up command line, and let’s get Google Lighthouse running programmatically!

Here’s what your script should look like (note there are no spaces after the commas; Command Prompt treats a space as an argument separator):

lighthouse-batch -s https://www.domain.com/page-1/,https://www.domain.com/page-2/,https://www.domain.com/page-3/,https://www.domain.com/page-4/,https://www.domain.com/page-5/

Hit Enter, and watch it go! Command Prompt may take a few seconds to spool everything up before it starts running. After a few weeks of field-testing, I’ve been able to run up to 110 URLs at a time consistently without tripping an error.
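To stay under that ceiling on larger crawls, you can split your URL list into batches before building each command. A minimal, illustrative Python sketch (the 100-URL batch size and example domain are assumptions, not lighthouse-batch requirements):

```python
# Split a long URL list into batches so each lighthouse-batch run
# stays below the ~100-URL ceiling observed in practice.
def chunk(urls, size=100):
    return [urls[i:i + size] for i in range(0, len(urls), size)]

all_urls = [f"https://www.domain.com/page-{n}/" for n in range(1, 251)]
batches = chunk(all_urls)

# Each batch becomes one lighthouse-batch -s invocation.
for batch in batches:
    print("lighthouse-batch -s " + ",".join(batch))
```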

You’ll see lots of wonderful code scrolling through the window, and after it finishes, you should have some JSON files in C:\Users\[YOUR USERNAME]\report\lighthouse
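Before transforming anything, it helps to know roughly what those JSON files hold. This Python sketch pulls the overall performance score out of a trimmed-down sample; the categories > performance > score path reflects the Lighthouse report format, but the sample data itself is fabricated for illustration:

```python
import json

# A trimmed sample of what one report JSON contains (real files hold far more).
sample = json.loads("""
{
  "finalUrl": "https://www.domain.com/page-1/",
  "categories": {"performance": {"score": 0.87}},
  "audits": {
    "uses-rel-preload": {
      "details": {"items": [{"url": "https://www.domain.com/font.woff2", "wastedMs": 330}]}
    }
  }
}
""")

# The overall performance score lives at categories > performance > score (0-1 scale).
score = sample["categories"]["performance"]["score"]
print(f"{sample['finalUrl']}: performance {score * 100:.0f}")
```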

The Tricky Part of Running Google Lighthouse Programmatically

So if I ended this article here, it would be a poor summary of an Upbuild blog post. However! I found that while Google Sheets was indeed a brilliant solve, it didn’t meet my needs.

Here’s what I encountered in Google Sheets:

  • I may have broken some quotas for running scripts 🙂
  • The sheet took a long time to refresh queries
  • Because Google Sheets is less stable for large amounts of interactive data, it was slow to work with
  • Because the queries auto-refreshed, it wasn’t super handy for having a snapshot in time for later manipulation
  • Also, not being able to work with data locally (vs an FTP) didn’t go with my personal workflow very well

Again, huge ups to Upbuild for putting way more effort into this than I ever did! I just found that my needs diverged from their use case. Here’s what I did differently.

Larger Lighthouse Scale? Back to JSON

Here’s my bone to pick with trying to run Lighthouse programmatically: JSON files make no sense to me as an SEO strategist and practitioner. (I get that JSON is great for developers for many reasons, but I’m not one of them.)

Sadly, Lighthouse only offers JSON or HTML reports, as far as I know. (If you know otherwise, please reach out! Would love to buy you a coffee. 🙂 ) So, that means you have to embrace the JSON.

I need tabular data, specifically, Excel exports to share with marketing and development teams for fixes from the report. So here’s where we are: we have a bunch of JSON files sitting on the hard drive.

This means a command-line batch script might be a good fit. Is it possible? How do we do it?

Command Line Batch Scripting to the Rescue!

In very simple terms:

  • We have information in File A (JSON) that needs to be extracted, and
  • Transformed into tabular (CSV) form, and
  • Both executed in a programmatic fashion

I unsuccessfully tried a number of manual batch script options, but if we go looking, there’s a solution to get us on our way! Meet JQ, a command-line processor for JSON. In particular, JQ can extract data from a JSON file and transform it into a CSV!

If we mesh JQ with some traditional batch scripting, we can achieve the requirements above.

A Little More Prep Work to Get Programmatic

You can download JQ here or at the link above (same URL). One detail deserves attention: in order to make JQ work properly in Command Line / Batch Script, you’ll need to copy the address of where you installed JQ and add it to your Path. Here’s a quick video showing you how to do it on Windows.

Once you have JQ successfully downloaded, installed and tested, it’s time to begin some scripting.

Programmatic Magic With Batch Script

Here’s both the fun and challenging part! In order to achieve a successful result, we need to lay out how our script will work in pseudocode. (I’m using “pseudocode” very loosely here, as plain-language instructions for how the program should function.) Pseudocode follows.

Pseudocode for Programmatic Lighthouse Batch Script

For each JSON file (sample JSON files attached) in directory C:\Users\yourusername\report\lighthouse, perform the following:

  1. Extract the keys and values from this sub-section of each JSON file: audits > uses-rel-preload > details > items. (We’re looking for url and wastedMs, and the values underneath.)
  2. Convert the extracted JSON data to a CSV, and place it in a newly created folder named “uses-rel-preload” (without quotation marks) within C:\Users\yourusername\report\lighthouse. The CSV file should share the JSON file’s name (without the .json file extension).
  3. Within the newly created CSV, insert two new columns to the left of “url” and “wastedMs”. Title the new column A “auditItem” and the new column B “sourceURL”.
  4. In column A, populate “uses-rel-preload” (without quotation marks) into each row where there are values in columns C & D.
  5. In column B, populate the CSV file name for each row where there are values in columns C & D.
  6. If no values came out of Step 1 for a file, create a new CSV with the JSON file name (no .json file extension) in the folder from Step 2, with first column header “url”, second column header “wastedMs”, and a single row below with both cells containing “N/A”.
  7. Within that CSV, likewise insert two new columns to the left of “url” and “wastedMs”, titled “auditItem” (column A) and “sourceURL” (column B).
  8. In column A, populate “uses-rel-preload” into the first blank row.
  9. In column B, populate the CSV file name into the first blank row.
  10. Repeat/loop this for every JSON file in the starting directory, but do not create a new folder (Step 2) on each pass; continue to use the same “uses-rel-preload” folder.
  11. After the loop has completed for all files in the starting directory, change directories to the uses-rel-preload folder from Step 2 (C:\Users\yourusername\report\lighthouse\uses-rel-preload).
  12. In that directory, combine all the CSV files into a new file titled uses-rel-preload.csv, keeping the header row from the first file only and skipping it in the second and subsequent files.
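If you’d like a reference point before diving into batch script, here’s the same pseudocode sketched in Python (an illustrative alternative, not part of the batch pipeline; folder and column names follow the steps above):

```python
import csv
import json
from pathlib import Path

def extract_audit(report_dir, audit="uses-rel-preload"):
    """Illustrative Python version of the pseudocode above."""
    out_dir = Path(report_dir) / audit          # Step 2: per-audit folder
    out_dir.mkdir(exist_ok=True)

    for json_path in sorted(Path(report_dir).glob("*.json")):
        data = json.loads(json_path.read_text(encoding="utf-8"))
        # Step 1: audits > uses-rel-preload > details > items
        items = (data.get("audits", {})
                     .get(audit, {})
                     .get("details", {})
                     .get("items", []))

        csv_path = out_dir / (json_path.stem + ".csv")
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["auditItem", "sourceURL", "url", "wastedMs"])
            if not items:
                # Steps 6-9: N/A fallback row when the audit had no findings
                writer.writerow([audit, csv_path.name, "N/A", "N/A"])
            for item in items:
                writer.writerow([audit, csv_path.name,
                                 item.get("url"), item.get("wastedMs")])

    # Steps 11-12: merge every per-page CSV into one audit-level file,
    # keeping only the first header row.
    merged = out_dir / (audit + ".csv")
    wrote_header = False
    with open(merged, "w", newline="") as out:
        for part in sorted(out_dir.glob("*.csv")):
            if part.name == merged.name:
                continue
            lines = part.read_text().splitlines()
            out.write("\n".join(lines if not wrote_header else lines[1:]) + "\n")
            wrote_header = True
    return merged
```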

Batch Script for Programmatic Lighthouse Data Extraction

Alright, fire up your favorite text or code editor, and customize this to your needs! Personally, I call the file via Command Prompt. Also, special thanks to Albin for his help with the JQ and CSV transformation.

:: Housekeeping and settings
@echo off
TITLE Lighthouse JSON to CSV
:: Enable replacing variables with values in the for loop
setlocal enabledelayedexpansion
:: Clear the screen
cls
:: Specify the working directory where the loop needs to perform
cd "C:\Users\yourusername\report\lighthouse"
:: For all the items (declare the working variable) in the set: active directory previously specified, all files matching *wildcard - all json files
for /f "tokens=*" %%A in ('dir /b /a-d *.json') do (

	echo %%A
	set inputfile=%%A
	echo !inputfile!
	set inputfilename=%%~nA
	:: Delete leftovers from earlier runs (2>nul suppresses "file not found")
	del !inputfilename!.csv_temp 2>nul
	del !inputfilename!.csv 2>nul

:: Parse the JSON with jq, extracting the specific details from the audit items and recommendations

:: Call JQ and a filter file that transforms the data into a temporary CSV file 
	type !inputfile! | jq .audits.\"uses-rel-preload\".details.items | jq -r -f filter.jq>>!inputfilename!.csv_temp

:: Create the first row of the CSV
	echo auditItem,sourceURL,url,wastedMs>!inputfilename!.csv

:: Check for empty files and provide fallback values if there are no recommendations
	for /f %%x in ("!inputfilename!.csv_temp") do set size=%%~zx
	if not !size! gtr 0 echo uses-rel-preload,!inputfile!,N/A,N/A>>!inputfilename!.csv

:: For all lines, make csv and add additional column
	for /F "skip=1 tokens=1,2 delims=," %%i in (!inputfilename!.csv_temp) do ( 			
			set url=%%i
			set wastedms=%%j
			set url=!url:"=!
			set wastedms=!wastedms:"=!
			echo uses-rel-preload,!inputfile!,!url!,!wastedms!>>!inputfilename!.csv
		)

:: Clear the temporary CSV files out of the working directory
	del !inputfilename!.csv_temp

	)

:: Make a new folder to house the files
if not exist uses-rel-preload mkdir uses-rel-preload


:: Move all CSV files into the audit type folder
move *.csv "uses-rel-preload"

:: Change the working directory
cd "C:\Users\yourusername\report\lighthouse\uses-rel-preload"

:: Start merging the individual files into a single CSV for the Lighthouse audit
:: Set a counter to 1
set cnt=1

:: For each file that matches *.csv, do the following loop
for %%i in (*.csv) do (
:: If the counter is 1, it's the first time running
  if !cnt!==1 (
:: Push the entire file complete with header into uses-rel-preload.csv - this will also create uses-rel-preload.csv
    for /f "delims=" %%j in ('type "%%i"') do >>uses-rel-preload.csv echo %%j
:: Otherwise, make sure we're not working with the uses-rel-preload.csv file and
  ) else if %%i NEQ uses-rel-preload.csv (
:: push the file without the header into uses-rel-preload.csv
    for /f "skip=1 delims=" %%j in ('type "%%i"') do >>uses-rel-preload.csv echo %%j
  )
:: Increment the counter by 1
  set /a cnt+=1
)

Additionally, here’s filter.jq, the filter file referenced in the code (save it in the working directory so the jq -f call can find it).

def tocsv:
    (map(keys)
        |add
        |unique
        |sort
    ) as $cols
    |map(. as $row
        |$cols
        |map($row[.]|tostring)
    ) as $rows
    |$cols,$rows[]
    | @csv;

tocsv
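If jq syntax reads as opaque, here’s roughly what that filter does, sketched in Python for illustration (it mirrors the union-of-keys, sorted-columns, CSV-row logic; it isn’t part of the pipeline):

```python
import csv
import io

def tocsv(rows):
    # Mirror the jq filter: the union of all keys, sorted, becomes the
    # header; each object is then emitted in that column order.
    cols = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(cols)
    for row in rows:
        # jq renders missing keys as "null"; substitute the same default.
        writer.writerow([str(row.get(c, "null")) for c in cols])
    return buf.getvalue()

print(tocsv([{"url": "https://www.domain.com/style.css", "wastedMs": 230}]))
```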

Ok! There’s a lot going on. Let’s make sure a few important details aren’t missed:

  • This assumes that you left the JSON files in the default export location, C:\Users\yourusername\report\lighthouse
  • This also assumes that you know how to run a batch script.
  • It’s not necessary per se, but for simplicity, I highly recommend you run and save the batch script files and resources all in the same directory as cited above
    • You’ll just have to be thoughtful about file path references if you spread resources/calls out.

The Output

Here’s what you should have:

  • The JSON files in the original directory
  • A new folder with the CSV files that contain URLs referenced in the Lighthouse uses-rel-preload report
    • The rows will contain N/A if the audit was empty, or had no results
  • Also within the folder should be uses-rel-preload.csv, a report-level file combining the preload recommendations for every page you ran into one sitewide CSV! Hooray!

Limitations on this Programmatic Lighthouse Approach

The astute will notice that this approach only covers one audit! Therein lies the downside: you’ll need to customize and write additional code/files for the other audits.

Assuming that you program out all the audits, you’ll want to create an audit summary folder within the first batch script file. This can be done with the mkdir command in the script.

Additionally, for each audit, you’ll want to copy the single overarching audit file into the audit summary folder. Using a script similar to the last half of the one above, you can combine all the audits into a single CSV that can easily be manipulated or ingested.
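As an illustrative sketch of that final consolidation (in Python rather than batch script; the audit-summary folder and output file name here are assumptions):

```python
from pathlib import Path

def combine_audits(summary_dir, output_name="all-audits.csv"):
    # Concatenate every audit-level CSV in the summary folder,
    # keeping only the first file's header row.
    out_path = Path(summary_dir) / output_name
    header_written = False
    with open(out_path, "w", newline="") as out:
        for part in sorted(Path(summary_dir).glob("*.csv")):
            if part.name == output_name:
                continue  # don't re-ingest the output file itself
            lines = part.read_text().splitlines()
            if not lines:
                continue
            start = 1 if header_written else 0
            for line in lines[start:]:
                out.write(line + "\n")
            header_written = True
    return out_path
```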