Sean Butcher, Head of SEO at Blue Array, talks about crawl budget, how you can make the most of it, and why having less, higher-quality content is better both for SEO and for your website’s users.
More recently, however, attitudes have shifted away from creating content for the sake of it, particularly following a post written by Gary Illyes on Google’s official Webmaster Central Blog earlier this year, which focused on the importance of crawl budget.
This article focuses on what crawl budget is, and why it is something you need to take into account if you want to improve your SEO performance.
What exactly is crawl budget?
It's a fairly common misconception that your whole website will be crawled every time Google’s spiders pay a visit. In truth, they will only ever crawl a limited number of pages, and this is your crawl budget.
It’s therefore important to refine what Googlebot sees when it comes to your website. Doing this ensures your most important pages are crawled on a frequent basis.
Crawl budget is a combination of two things:
1) Crawl rate limit
This is the maximum rate at which Googlebot will fetch pages from your site without overloading your server. Factors that can affect a website’s crawl rate limit include:
How quickly your website and pages respond to requests to the server
Whether other factors are blocking Googlebot, such as limits set within the website’s Search Console account
Other technical factors (which we’ll look at in more detail below)
2) Crawl demand
Even when the crawl rate limit isn’t reached, Googlebot will only crawl as much as there is demand for. Factors that affect crawl demand include:
How popular and authoritative your website is on the web
Whether pages have gone ‘stale’ - i.e. they haven’t been visited or updated for a while
Ultimately, however, the size of your website (its number of pages) will determine whether you need to worry about crawl budget. As a general rule, refining a crawl matters for websites with many thousands of pages; if you run a blog or even a small eCommerce website, crawl budget isn’t something you’re likely to need to worry about.
How to find what Google is crawling
There are two main ways to see what Googlebot is crawling on your website.
The most straightforward way is to look at your Google Search Console account, under the Crawl Stats area. This will show you an average number of pages crawled per day, alongside the highest and lowest numbers in the last 90 days.
The other, more detailed yet slightly more complicated way is to analyse your server log files. If you aren’t particularly technical, it’s likely you’ll need to request your log files from your developer.
Once you have your log files to hand, you can run them through an analyser tool. Screaming Frog, for example, offer a dedicated Log File Analyser alongside their crawling software, allowing you to see how many times your pages have been requested by Googlebot and other search engines.
Access to this data allows you to assess which URLs on your website are currently deemed low value, and therefore whether the important pages on your website are being crawled as often as you would like.
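As a minimal sketch of this kind of log analysis, the script below counts how often each URL path was requested by Googlebot. It assumes the common Apache/Nginx “combined” log format; the sample lines and paths are illustrative, so adjust the pattern to match your own server’s configuration.

```python
import re
from collections import Counter

# Matches the request path and user agent in an Apache/Nginx
# "combined" log format line (a common default; adjust for your server).
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+".*"(?P<agent>[^"]*)"$'
)

def googlebot_hits(lines):
    """Count how often each URL path was requested by Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Illustrative log lines — two Googlebot requests, one regular browser.
sample = [
    '66.249.66.1 - - [10/May/2017:06:25:24 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/May/2017:06:25:30 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/May/2017:06:26:01 +0000] "GET /blog/post HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for path, count in googlebot_hits(sample).most_common():
    print(path, count)
```

Sorting the resulting counts shows at a glance which pages consume most of your crawl budget, and which important pages are barely visited.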
How to optimise your crawl budget
Once you’ve taken a look at your website’s log files and crawl stats, you can start to determine whether your crawl budget can be improved and/or focused more towards your most important pages.
It’s important to have a full understanding of your website when deciding how to refine your site. Carrying out a detailed content audit can help here - this usually involves combining your crawl with traffic data from your Google Analytics account.
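One way to sketch that audit join is below: cross-referencing the URLs found in a site crawl with session counts exported from Google Analytics, to surface pages that attract little or no traffic. The data is purely illustrative; in practice you would load two CSV exports (one from your crawler, one from GA).

```python
# Illustrative crawl output: every URL the crawler discovered.
crawled_urls = [
    "/products/widget",
    "/blog/old-announcement-2014",
    "/about",
    "/tag/misc?page=7",
]

# Illustrative GA export: url -> sessions over the audit period.
analytics_sessions = {
    "/products/widget": 4210,
    "/about": 380,
}

def low_value_candidates(urls, sessions, threshold=10):
    """Return crawled URLs whose traffic falls below the threshold."""
    return sorted(u for u in urls if sessions.get(u, 0) < threshold)

# Candidates to consolidate, improve, or remove.
print(low_value_candidates(crawled_urls, analytics_sessions))
```

The output lists pages Googlebot is spending budget on that deliver little value to users, which are the first candidates for consolidation or removal.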
According to Google, the biggest factors that can affect your crawl budget include:
1) URLs with faceted navigation or session IDs
2) Duplicate content/pages on your site (e.g. caused by parameters)
3) Soft error pages
4) Hacked pages
5) Infinite spaces
6) Low quality content
It’s important that URLs falling into any of the above categories are not crawled by Googlebot or other spiders. In some instances this may mean:
Blocking parameterised URLs using Search Console’s URL Parameter tool
Adding a noindex meta tag to the pages, or disallowing the URLs in the site’s robots.txt file
Removing the URLs from the website completely, serving a 404 or 410 status in their place, and updating any internal links that pointed to them
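As an illustration, a robots.txt rule set blocking parameterised and faceted URLs might look like the following. The paths are hypothetical, and note that Googlebot supports the * wildcard in Disallow patterns:

```
# robots.txt — block crawling of parameterised and faceted URLs
# (paths are illustrative; adjust to your own site structure)
User-agent: *
Disallow: /*?sessionid=
Disallow: /*?sort=
Disallow: /search/
```

Bear in mind that a robots.txt disallow stops crawling (which is what saves crawl budget), whereas a noindex tag only stops indexing and still requires the page to be crawled.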
More tips for optimising your crawl budget include:
Keeping your XML sitemaps fully up to date, ensuring they only include live, canonical versions of your pages
Minimising unnecessary redirects (update internal links to point to their final destination, and tidy up any long redirect chains)
Improving your site speed so that server requests are answered as quickly as possible
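The redirect tidy-up mentioned above can be sketched as a small script: given a map of redirects (as you might extract from a crawl or your server configuration; the URLs here are illustrative), resolve each chain to its final destination so internal links can be updated to point there directly.

```python
# Illustrative redirect map: source URL -> target URL.
redirects = {
    "/legacy": "/old-page",
    "/old-page": "/interim-page",
    "/interim-page": "/new-page",
}

def final_destination(url, redirect_map, max_hops=10):
    """Follow a redirect chain to its end, guarding against loops."""
    seen = set()
    while url in redirect_map:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"Redirect loop or overly long chain at {url}")
        seen.add(url)
        url = redirect_map[url]
    return url

# /legacy -> /old-page -> /interim-page -> /new-page (a 3-hop chain)
print(final_destination("/legacy", redirects))
```

Any URL that takes more than one hop to resolve is a chain worth collapsing: point the original internal links straight at the final destination, then remove the intermediate redirects.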
Time to shift your focus?
The days of piling more and more low-value content onto websites are now over. Instead, attitudes have shifted to a quality-over-quantity approach.
With so much content available online, it’s now all about how you can make your content stand out from an increasingly busy crowd. Before hitting that publish button, really consider who you are going to help, and how you are going to help them by having that content on your website.
Not only is creating higher-quality content in lower volume important from an SEO perspective, it’s also important for usability. Surely it’s better to offer true value to every single user coming to your website than to chase traffic that ultimately won’t find the experience of being on your site a particularly good one?
ABOUT BLUE ARRAY
Blue Array was founded in 2015 as a specialist rather than generalist agency, focused completely on the discipline of SEO. So different is the model that they had to make up a whole new word to describe themselves: part agency, part consultancy, a ‘Consulgency’. In just a couple of short years the team has grown from two to fifteen, with an enviable list of clients ranging from big brands such as Time Inc and Mumsnet to smaller startups such as Lexoo and RiseArt. The team is led by founder Simon Schnieders and Head of SEO Sean Butcher.