Technical SEO Audit with Google Webmaster Tools

by Jason Acidre on September 24, 2012


There are so many tools these days that can make the process of site auditing and optimization much easier, and I’m betting that several are already running through your head. But sometimes, the best ones are the free ones.

Google’s Webmaster Tools is certainly at the top of my list. This browser-based web application from Google has a ton of features that can help you determine your site’s condition from one end to the other.

It’s particularly useful in areas that really matter for search optimization, such as site structure and performance, as well as content-level issues that the site should be fixing or improving.

So in this post, I’ll share a few of its features that you can use to easily analyze and optimize your site for search.

Finding Duplicates and Thin/Poor Pages

Webmaster Tools offers lots of features that can help you identify poor content pages that could be affecting how your site performs in search results.

Nowadays, it’s really important to weed out pages that may not be very useful to searchers. Allowing thin and duplicate pages to be accessed and indexed by search engines can harm the ability of all your other pages to rank (Panda), because these pages mostly serve irrelevant and unusable content to search users.

When looking for possible duplicate and thin pages within a site, I usually start by comparing the number of pages in the sitemap vs. the number of pages already indexed by Google.

On Webmaster Tools, go to “Optimization”, then to “Sitemap”:

There are two ways to compare the pages from your sitemap to the indexed pages on Google. The first one is by searching all the site’s pages on Google search:

The second method is through Google Webmaster Tools’ Index Status. Go to “Health”, and then to “Index Status”:

By doing this, you’ll get a rough estimate of how many thin/duplicate pages from the site have already been indexed by Google.

This will then make it easier for you to know how many pages you’ll want removed from Google’s index (by tagging these pages with “noindex” or by blocking access through your robots.txt).
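
If you’d rather script the comparison, here’s a minimal sketch. The sitemap URL and the indexed count (the number you noted from Index Status or a “site:” search) are placeholders:

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder - your sitemap
GOOGLE_INDEXED = 1450  # placeholder - the count you noted from Index Status

with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

# Standard sitemaps list every URL in a <loc> element under this namespace.
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
sitemap_urls = [loc.text for loc in tree.getroot().iter(SM_NS + "loc")]

print("URLs in sitemap:", len(sitemap_urls))
print("Pages indexed by Google:", GOOGLE_INDEXED)
print("Gap (possible thin/duplicate pages):", GOOGLE_INDEXED - len(sitemap_urls))
```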

There are several ways to find thin and possible duplicate pages within the site, but the best place to start is Google Webmaster Tools’ HTML Improvements feature. You can start off by going to “Optimization”, and then choosing “HTML Improvements”:

From there, you can instantly get clues about possible issues that are causing duplication within your site and easily identify pages (URL parameters, session IDs and/or pagination problems) that you should be blocking search engines from indexing.

Check whether each URL parameter is being indexed by Google, and take note of the count for each to assess whether there are still more possible duplicate/poor content pages within the site. You can use the “site:” and “inurl:” search operators for this task, as in the short sketch below.
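
As a quick illustration, this tiny sketch just prints the check queries to run in Google; the domain and parameter names are examples only, so swap in the parameters you actually found:

```python
# Print a "site:" + "inurl:" check query for each URL parameter you found.
domain = "example.com"
parameters = ["sessionid", "sort", "replytocom"]  # placeholders

for param in parameters:
    # Run each query in Google and note roughly how many results are indexed.
    print("site:{} inurl:{}=".format(domain, param))
```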

You can also get clues from the site’s crawl error data. Go to “Health”, and choose “Crawl Errors”. Look at the URLs being crawled by Google, particularly the extended URL strings:

Bonus: Check your site’s “tag” and “search” folders too, and see if they are being indexed by Google. These commonly serve poor and irrelevant content to search users and can hurt your important pages’ ability to earn better search rankings.

Once you have identified the pages that could be hurting your site’s overall rankings due to duplication and improper indexation, you can start removing them from Google’s index by tagging them with noindex or by blocking bots from accessing them via robots.txt.
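
One caveat worth keeping in mind: a page that is blocked in robots.txt won’t be recrawled, so Google can’t see a noindex tag on it – pick one treatment per page. Below is a minimal sketch for double-checking which treatment a given page currently has (the URLs are placeholders, and the meta-tag check is deliberately rough):

```python
import urllib.request
import urllib.robotparser

PAGE_URL = "https://www.example.com/tag/sample-tag/"   # placeholder
ROBOTS_URL = "https://www.example.com/robots.txt"      # placeholder

# 1) Is the page disallowed for Googlebot in robots.txt?
rp = urllib.robotparser.RobotFileParser(ROBOTS_URL)
rp.read()
blocked = not rp.can_fetch("Googlebot", PAGE_URL)

# 2) Does the page carry a robots noindex meta tag? (rough string check)
html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", errors="ignore").lower()
has_noindex = 'name="robots"' in html and "noindex" in html

print("Blocked by robots.txt:", blocked)
print("Has a noindex meta tag:", has_noindex)
```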

Crawl Errors

The next one is pretty basic, but definitely as important as the first one shared in this post. Ensuring that search crawlers have no issues accessing the site’s internal pages is necessary, as this aspect of site optimization improves both user experience and the crawling process.

It is also used as a ranking factor by search engines: a site’s crawl errors and overall crawl status send signals about whether the site (or its content) is ready to be served to their users.

Identifying the pages that cause crawl errors (which may return various response codes) is easy with Google Webmaster Tools. You can get this data through the “Health” > “Crawl Errors” feature of the toolset.

The next step is to gauge how important each page causing crawl errors is to the site, as distinguishing their importance will point you to the fix each one needs (you can download the list of all the pages with errors in Excel format).
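
If you’d rather re-check that exported list with a script, here’s a minimal sketch; it assumes you saved the download as a CSV named crawl_errors.csv with a “URL” column – both names are placeholders you should adjust to match your export:

```python
import csv
import urllib.request
import urllib.error

def current_status(url):
    """Return the response code the URL gives right now."""
    try:
        return urllib.request.urlopen(url, timeout=10).getcode()
    except urllib.error.HTTPError as e:
        return e.code          # e.g. 404, 410, 500
    except urllib.error.URLError:
        return "unreachable"

with open("crawl_errors.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(current_status(row["URL"]), row["URL"])
```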

After prioritizing the pages with errors, manually check the pages that link to them (as these are how search crawlers reach the problem pages on your site). This will help you decide what solution to take for each issue.

The most common fixes for crawl errors on a site:

  • Reviving the page on a new or its old URL (if the non-existent page is important), then 301 redirecting the old URL to the new one.
  • 301 redirecting the page to another relevant page/category (if the page is linked to from external websites) – see the redirect check after this list.
  • Removing internal links pointing to the 404 page (if the page is not that important).
  • Fixing the page (if the issue was caused by server-side or coding errors).
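
If you go the 301 route, it’s worth confirming that the old URL really returns a 301 and points at the page you intended. A minimal check, with both URLs as placeholders:

```python
import http.client
from urllib.parse import urlparse

OLD_URL = "http://www.example.com/old-page/"           # placeholder
EXPECTED_TARGET = "http://www.example.com/new-page/"   # placeholder

parsed = urlparse(OLD_URL)
Conn = http.client.HTTPSConnection if parsed.scheme == "https" else http.client.HTTPConnection
conn = Conn(parsed.netloc)
conn.request("HEAD", parsed.path or "/")
resp = conn.getresponse()

print("Status:  ", resp.status)
print("Location:", resp.getheader("Location"))
if resp.status == 301 and resp.getheader("Location") == EXPECTED_TARGET:
    print("Redirect looks good.")
else:
    print("Check this redirect - it isn't a 301 to the expected page.")
```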

HTML Improvements

Another feature of WMT that I think is often overlooked by its users is HTML Improvements, which can be found under the “Optimization” tab.

This feature allows webmasters to see pages of their site that may cause problems for both user experience and search performance. This includes pages that have:

  • Duplicate meta descriptions
  • Long meta descriptions
  • Short meta descriptions
  • Missing title tags
  • Duplicate title tags
  • Long title tags
  • Short title tags
  • Non-informative title tags
  • Non-indexable content

The list for each potential page-level issue can guide you on what changes/improvements to implement for the pages that search crawlers may have flagged as causing indexation problems for your site.
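
To spot-check a handful of pages yourself between Webmaster Tools refreshes, a rough sketch like the one below can flag missing, duplicate, or oddly sized title tags. The URL list and the length thresholds are just assumptions for illustration, not Google’s own rules:

```python
import re
import urllib.request
from collections import defaultdict

urls = [
    "https://www.example.com/",        # placeholders - list the pages to check
    "https://www.example.com/about/",
    "https://www.example.com/blog/",
]

pages_by_title = defaultdict(list)
for url in urls:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else ""
    pages_by_title[title].append(url)

    if not title:
        print("Missing title:", url)
    elif len(title) > 70:          # assumed "too long" threshold
        print("Long title ({} chars): {}".format(len(title), url))
    elif len(title) < 10:          # assumed "too short" threshold
        print("Short title ({} chars): {}".format(len(title), url))

for title, pages in pages_by_title.items():
    if title and len(pages) > 1:
        print("Duplicate title used on {} pages: {!r}".format(len(pages), title))
```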

Site Speed

Google Webmaster Tools also gives you insight into how your site is performing in terms of its pages’ loading time. Just go to “Labs”, and then choose “Site Performance”.

The performance overview that this feature provides will give you a better understanding of whether you need to optimize this aspect of your own or your client’s website.

Site speed has been a very important ranking factor for quite some time now, so using other tools like Google’s Page Speed or Pingdom is a good option to flesh out your client recommendations further. With those, you can include the specific areas/elements of the site that are affecting its loading time.
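
For a quick back-of-the-envelope check of a few key URLs, something like the sketch below is enough. The URLs are placeholders, and it only times the raw HTML response, not images, CSS, or scripts:

```python
import time
import urllib.request

for url in ["https://www.example.com/", "https://www.example.com/blog/"]:
    start = time.time()
    urllib.request.urlopen(url).read()   # fetch the HTML only
    print("{}: {:.2f}s".format(url, time.time() - start))
```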

Search Queries

The “Search Queries” feature – which can be found under the “Traffic” tab – is also a great way to track the progress of your SEO campaign or to determine if the site has been hit by a penalty/algorithmic update.

The search queries graph in the image above is from a website that was affected by the first Penguin update (April 2012). With this feature, we’re able to see the progress of our campaign in regaining the site’s search visibility.

Another great way to make use of the data from this feature is to download the table of search queries (with each query’s performance in the SERPs, such as CTR, impressions, average position and number of clicks) in Excel format.

This list can help you improve your campaign in terms of optimizing, targeting and discovering high-performing keywords (based on average search positions, number of impressions and click-through rate).
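
For example, once you’ve saved the export as a CSV, a short pandas sketch can surface queries that get plenty of impressions but few clicks. The file name, column names and thresholds below are assumptions about how the export was saved, so adjust them to match your download:

```python
import pandas as pd

df = pd.read_csv("search_queries.csv")

# Recompute CTR so sorting doesn't depend on how the export formats it.
df["CTR"] = df["Clicks"] / df["Impressions"]

# High impressions + low CTR = candidates for better titles, descriptions,
# or stronger keyword targeting. The thresholds are arbitrary starting points.
opportunities = df[(df["Impressions"] > 100) & (df["CTR"] < 0.02)]
print(opportunities.sort_values("Impressions", ascending=False).head(20))
```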

Structured Data

This new function in GWMT is also a great addition to your campaigns, especially when giving site recommendations to your clients. You can find this feature under the “Optimization” tab – just choose “Structured Data”:

The Structured Data feature will also tell you if the site hasn’t used any type of schema/microdata or authorship markup on any of its pages. You can then suggest adding it to your clients to improve their website’s performance in search.

If the site has already implemented schema/microdata markup, clicking on each type listed in the “Structured Data” table will show you all the pages that use that particular type of markup.

You can then test some of these pages using the Structured Data Testing Tool to see if their markup is working properly, as well as how their snippets will most likely appear in the search results.
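
If you want a quick local check before pasting pages into the testing tool, a rough string scan like this will at least tell you whether any schema.org microdata or authorship markup is present (the URL is a placeholder, and this is not a substitute for the testing tool itself):

```python
import urllib.request

url = "https://www.example.com/sample-post/"   # placeholder
html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore").lower()

checks = {
    "schema.org microdata (itemscope/itemtype)": "itemscope" in html or "schema.org" in html,
    "authorship markup (rel=author)": 'rel="author"' in html,
}
for label, found in checks.items():
    print("{}: {}".format(label, "found" if found else "not found"))
```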

Link Profile Analysis

The thing I love most about Google Webmaster Tools is the amount of site data available for export. This includes a site’s full link data, which makes an efficient link profile analysis very doable.

What I usually do when using Google Webmaster Tools for link profile analysis is to download the entire list of external domains linking to the site.

You can start by going to “Traffic”, and then to “Links to your site”:

Check the full list of domains “who linked the most” to your site by clicking on “More”. Then download the entire list by choosing “download this table”:

You’ll now have the full list of domains linking to your site in Excel format:

Download Niels Bosma’s SEO Tools for Excel (unzip the file after downloading it, and drag the SEO Tools XLL add-in file into the spreadsheet that you’ve just downloaded from Webmaster Tools):

I use this tool to add more metrics for each domain listed in the Excel sheet, which helps me better understand the site’s entire link profile.

The next step is to add the Alexa Reach score for each listed domain (I chose Alexa Reach so I can easily classify the listed domains later in this audit process – and the Alexa Popularity function doesn’t seem to work these days).

Start by clicking on the fourth cell of the domain’s row (D2), select “Offpage” from the “SEOTools” tab, and then choose “AlexaReach”:

After choosing “AlexaReach”, a pop-up window will appear. Next, simply click on the name of the domain (in cell A2) and hit “Enter”.

The chosen cell will now show the current Alexa Reach score of the first listed domain. Copy the formula in that cell (press Ctrl+C on D2) and paste it down to the last cell in that column (to automatically pull in the Alexa Reach score for every listed domain).

Note: an Alexa Reach score of 0 means the domain hasn’t been ranked by Alexa (N/A). This metric works much like Alexa Popularity – the lower the number, the better (e.g. Google is ranked #1 and Facebook #2).

With this upgraded list, you can analyze many areas of a site’s link profile. For instance, you can easily see if your site is getting sitewide links from low-quality domains just by sorting the “number of links” column from largest to smallest (and checking the Alexa Reach of the domains with the most links):

After sorting the second column of the spreadsheet, you’ll be able to spot low-quality domains that may be carrying sitewide links to your site:

Another way to utilize this list when auditing links is to determine the ratio of low-quality vs. high-quality domains linking to your site.

What I usually do at this stage of the audit is sort the list by each domain’s Alexa Reach, from largest to smallest.

From there, I copy the entire D column (where all the Alexa Reach numbers are) and paste it into a new Excel worksheet. After pasting the numbers, I segment them into four groups:

  • High Alexa Rank (1,000,000+)
  • Decent Alexa Rank (100,000 – 999,999)
  • Low Alexa Rank (1 – 99,999)
  • No Alexa Rank (0)

Then do a quick count of each group and list the totals for each type of domain (preferably on a new tab of the workbook):

That way, I can create a chart that depicts the types of domains linking to the site (you can easily create the chart from the “Insert” tab, then “Column”):
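
If you’d rather do the bucketing and the chart outside of Excel, here’s a pandas sketch that mirrors the same four groups. It assumes the export was saved as linking_domains.csv and that the Alexa Reach scores live in a column named “AlexaReach” – both names are placeholders:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("linking_domains.csv")

def bucket(rank):
    # Same four groups as above; 0 means Alexa hasn't ranked the domain.
    if rank == 0:
        return "No Alexa Rank (0)"
    if rank < 100000:
        return "Low Alexa Rank (1 - 99,999)"
    if rank < 1000000:
        return "Decent Alexa Rank (100,000 - 999,999)"
    return "High Alexa Rank (1,000,000+)"

counts = df["AlexaReach"].apply(bucket).value_counts()
print(counts)

# The same column chart as Excel's "Insert" > "Column" step.
counts.plot(kind="bar")
plt.tight_layout()
plt.show()
```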

The list you’ve created with the help of Webmaster Tools can also be used to prune links that might be passing little to no value to your site (or could be damaging its ranking performance).

Also, if you have created your own chart, you can easily assess whether the site has participated in low-quality link schemes in the past – based on the ratio of low-quality vs. high-quality domains linking to it.

Lastly, the data you can extract from Webmaster Tools’ “most linked content” can also help you evaluate whether the site has been over-optimized, or is even being hit by a negative SEO campaign (which actually happened to my blog months ago).

There is so much you can do and so much data you can explore with Google Webmaster Tools. And the best thing about it is that Google is continuously enhancing the toolset – so take advantage of it!

If you liked this post, you can subscribe to my feed and follow me on Twitter @jasonacidre.

Jason Acidre

Jason Acidre is Co-Founder and CEO of Xight Interactive, marketing consultant for Affilorama and Traffic Travis, and also the sole author of this SEO blog. You can follow him on Twitter @jasonacidre and on Google+.
