Scraping the SERPs to Determine the Timing of Journalists’ Topic Coverage
You researched the right websites and the right contacts, wrote an exciting email, and now you’re waiting for the story to be picked up. This is the moment when a lot of us get anxious, because the outcome is out of our control. Some journalists will get in touch right away, some will open your email over and over and do nothing, and some will publish a story days or weeks later. If the story isn’t picked up quickly, we start wondering whether the campaign will fail.
Sometimes you can have a good story, but the timing is wrong. Have you ever wondered when journalists are likely to publish about a certain topic? Well, now you can have that answer.
Using a crawler and a few simple XPath rules, you can scrape Google News and find the specific dates on which a topic has hit the news in the past. When does the Christmas season start in the press? When is a new Game of Thrones season likely to start trending? Keep reading to find out!
Guide and template for SERP scraping
We’ll show you how to do this using Screaming Frog, but I imagine other crawlers could do the same. The first step is to learn some XPath basics, which I picked up from this guide published by BuiltVisible.
The goal is to scrape the top 100 results (or more, if you like) for a given topic and extract the date each article was published. This gives you a view of recurring events (such as holidays) and of the general coverage pattern for a topic.
I highly recommend learning XPath properly, because there’s much more you can do with it, but for this exercise you only need to configure Screaming Frog with a few rules:
- Untick all boxes in the Basic configuration
Configuration > Custom Extraction:
- Add //h3 and //div/span as separate extractors, each set to “Extract Text”
- Add //h3/a and choose “Extract Inner HTML”
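If you want to see what those three rules return before pointing a crawler at Google, here’s a small sketch that applies equivalent paths using Python’s standard library. It’s only an illustration: the markup is a hypothetical, simplified stand-in for a results page (Google’s real HTML is different and changes often), and ElementTree supports just a subset of XPath, so each path starts with `.//` instead of `//`. The helper names `inner_html` and `extract` are made up for this example.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL markup: a simplified, well-formed stand-in for a results page.
SAMPLE = """
<html><body>
  <div class="result">
    <h3><a href="https://example.com/parade">Parade <b>route</b> announced</a></h3>
    <div><span>14 Mar 2017</span></div>
  </div>
  <div class="result">
    <h3><a href="https://example.org/pubs">Best pubs for St Paddy's</a></h3>
    <div><span>16 Mar 2017</span></div>
  </div>
</body></html>
"""


def inner_html(el):
    """Markup inside an element, excluding the element's own tags."""
    # ET.tostring includes each child's tail text, so nothing is lost.
    return (el.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in el)


def extract(html):
    root = ET.fromstring(html)
    # //h3 and //div/span with "Extract Text": all text inside the element.
    titles = ["".join(h.itertext()) for h in root.iterfind(".//h3")]
    dates = ["".join(s.itertext()) for s in root.iterfind(".//div/span")]
    # //h3/a with "Extract Inner HTML": the link's contents, tags included.
    links = [inner_html(a) for a in root.iterfind(".//h3/a")]
    return titles, dates, links


if __name__ == "__main__":
    for row in zip(*extract(SAMPLE)):
        print(row)
```

“Extract Text” behaves like joining all the text inside an element, while “Extract Inner HTML” keeps any tags nested inside the link, such as the bold tag in the first title.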
If you’d rather skip all of the above, here are ready-made configuration files for Google Universal search and Google News.
Now we need to find the exact URL we want to scrape. This is where we narrow down our target. Since I work for Wolfgang Digital, an Irish agency, we picked St Paddy’s Day as a test.
The event happens every year in March, so I filtered Google News for anything published between 1st February and 31st March. You can set this date range directly in Google’s search tools.
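For reference, here’s a minimal sketch of what a date-filtered Google News URL can look like when built by hand. The parameter names are an assumption: tbm=nws (the News tab) and tbs=cdr:1 with cd_min and cd_max (a custom date range in US-style month/day/year) are unofficial parameters seen in Google URLs, not a documented API, so copying the URL from your own filtered search is the safer option. The function name `news_search_url` is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def news_search_url(query, start, end):
    """Build a Google News search URL limited to a custom date range.

    ASSUMPTION: tbm=nws and tbs=cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY are
    unofficial parameters seen in Google URLs; they may change at any time.
    """
    def mdy(d):
        return f"{d.month}/{d.day}/{d.year}"  # US-style, no zero padding

    params = {
        "q": query,
        "tbm": "nws",
        "tbs": f"cdr:1,cd_min:{mdy(start)},cd_max:{mdy(end)}",
    }
    # urlencode percent-encodes the commas, colons and slashes in tbs.
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Example year chosen for illustration only.
    print(news_search_url("St Patrick's Day", date(2017, 2, 1), date(2017, 3, 31)))
```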
Shorter periods usually work better: you only get the top 100 pieces of coverage for the topic, so a wider date range spreads those 100 results more thinly and leaves many articles out.
Once you have the URL, just add &num=100 to the end so you can see the top 100 results on one page.
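Tacking &num=100 onto the end works for most URLs, but it goes wrong if the URL already has a num value or ends in a #fragment. A slightly safer sketch, using Python’s urllib.parse, sets the parameter rather than appending it (`with_num` is a hypothetical helper name). It only edits the URL; whether Google honours num is up to Google.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num(url, num=100):
    """Return url with its num= query parameter set to `num`."""
    parts = urlsplit(url)
    # Drop any existing num value, keep everything else in order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "num"]
    query.append(("num", str(num)))
    # Rebuilding via urlsplit keeps a #fragment after the query string.
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_num("https://www.google.com/search?q=st+paddys&tbm=nws&num=10"))
```

Because the query string is decoded and re-encoded, other parameters may come back escaped slightly differently (for example, commas as %2C), which means the same thing to the server.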
All configured? Then hit Start on Screaming Frog and go to the Custom tab (the very last one), where you’ll find the extracted data you just requested. The view is not …
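Once the dates are out of Screaming Frog, answering the timing question is mostly a matter of counting articles per week. Here’s a minimal sketch, assuming you’ve exported the date column as strings like “14 Mar 2017”; that format, the `weekly_counts` name and the sample dates are all assumptions for illustration. Relative dates such as “3 days ago”, which Google may show for recent articles, are simply skipped.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_counts(date_strings, fmt="%d %b %Y"):
    """Count articles per week, keyed by the Monday that starts each week."""
    counts = Counter()
    for raw in date_strings:
        try:
            day = datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # e.g. "3 days ago" or any other format: skipped
        counts[day - timedelta(days=day.weekday())] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample, not real coverage data.
    sample = ["1 Mar 2017", "2 Mar 2017", "13 Mar 2017", "17 Mar 2017", "3 days ago"]
    for week, n in weekly_counts(sample).items():
        print(week.isoformat(), "#" * n)
```

On the sample list this prints two weeks (starting 2017-02-27 and 2017-03-13) with two articles each. With real data, the weeks where the bars start to grow suggest when journalists begin covering the topic.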
Read more here: buzzstream.com