The Google Webmaster Tools (GWT) Sitemaps page can be an extremely useful feature for webmasters trying to get better insight into how Google is crawling their XML sitemaps. However, its figures can mislead you. It is easy to inflate the number of URLs submitted and indexed by accident, because of how Google handles sitemaps submitted on their own versus sitemaps submitted within a sitemap index. Let's explore the problem and how webmasters can avoid it.
Double-Counting of URLs Submitted and Indexed
The problem arises when a site submits a particular XML sitemap through this tool both by itself and within a sitemap index. Instead of recognizing that this sitemap is duplicated, GWT counts its URLs once for each submission in both the submitted and indexed figures. This means that if an XML sitemap with 1,500 URLs, 1,000 of which are indexed, is submitted both by itself and within an index, Google will report 3,000 URLs submitted and 2,000 indexed.
Unfortunately, the same issue carries over to individual URLs within a sitemap. If you submit an XML sitemap with 1,500 entries and one URL is repeated 20 times, GWT will still report 1,500 URLs submitted, even though the sitemap contains only 1,481 unique URLs. Furthermore, if that URL is indexed, it will be counted 20 times in the indexed figure. Lastly, if a particular URL is repeated across different XML sitemaps that have all been submitted to GWT, it will be counted once for each sitemap in both the submitted and indexed figures.
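To make the arithmetic concrete, here is a minimal Python sketch that models the tallying behavior described above. It is an illustration, not Google's implementation: the hypothetical `reported_counts` function counts every entry in every submission, which reproduces the inflated figures, while the hypothetical `unique_counts` counts each distinct URL once. The example.com URLs are placeholders.

```python
def reported_counts(submissions, indexed):
    """Inflated tally: every entry in every submission is counted,
    mirroring the double-counting described above."""
    submitted = sum(len(entries) for entries in submissions)
    indexed_hits = sum(
        1 for entries in submissions for url in entries if url in indexed
    )
    return submitted, indexed_hits


def unique_counts(submissions, indexed):
    """Deduplicated tally: each distinct URL is counted once."""
    urls = {url for entries in submissions for url in entries}
    return len(urls), len(urls & indexed)


# Case 1: a 1,500-URL sitemap (1,000 indexed) submitted by itself
# and again within a sitemap index.
sitemap = [f"https://example.com/page-{i}" for i in range(1500)]
indexed = set(sitemap[:1000])
submissions = [sitemap, sitemap]
print(reported_counts(submissions, indexed))  # (3000, 2000)
print(unique_counts(submissions, indexed))    # (1500, 1000)

# Case 2: 1,500 entries in which one indexed URL appears 20 times.
popular = "https://example.com/popular"
entries = [popular] * 20 + [f"https://example.com/page-{i}" for i in range(1480)]
print(reported_counts([entries], {popular}))  # (1500, 20)
print(unique_counts([entries], {popular}))    # (1481, 1)
```

The gap between the two functions' results is exactly the inflation a webmaster would see in the Sitemaps report.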
Preventing Inaccurate URL Counts in GWT
As you can see, this issue can cause not only confusion but also inaccurate data for webmasters. To keep this duplication from distorting your sitemaps data, we recommend the following:
- If a particular XML sitemap is contained within your sitemap index, do NOT submit it by itself outside of this sitemap index
- If you have multiple XML sitemaps, ensure that no URLs are duplicated between sitemaps
- URLs submitted in an XML sitemap should:
  - Return a 200 status code
  - Have self-referencing canonical tags, or at least not have canonical tags pointing to other pages
  - Not be blocked by your robots.txt file
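The first two recommendations can be checked before you submit anything. The following Python sketch assumes you already have the XML text of your sitemap index and of each sitemap it lists. It reports sitemaps that are submitted both on their own and inside the index, plus URLs that appear more than once within or across sitemaps. It uses only the standard library's `xml.etree.ElementTree` and the sitemaps.org namespace; the function names, the `standalone` list, and the example.com files are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> in a <urlset> sitemap, repeats included."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def index_children(xml_text):
    """Return the sitemap URLs listed in a <sitemapindex>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def audit(standalone, index_xml, sitemaps):
    """Find double-submitted sitemaps and duplicated URLs.

    standalone: URLs of sitemaps submitted on their own (hypothetical input)
    index_xml:  XML text of the submitted sitemap index
    sitemaps:   dict mapping each sitemap URL to its XML text
    """
    in_index = set(index_children(index_xml))
    double_submitted = sorted(set(standalone) & in_index)

    # Read each sitemap file once, so a double submission does not
    # make its URLs look duplicated; only real repeats are counted.
    counts = Counter()
    for name in in_index | set(standalone):
        counts.update(sitemap_urls(sitemaps[name]))
    duplicate_urls = sorted(url for url, n in counts.items() if n > 1)
    return double_submitted, duplicate_urls


# Hypothetical example data.
def urlset(*locs):
    body = "".join(f"<url><loc>{u}</loc></url>" for u in locs)
    return f'<urlset xmlns="{NS["sm"]}">{body}</urlset>'


site = "https://example.com"
index_xml = (
    f'<sitemapindex xmlns="{NS["sm"]}">'
    f"<sitemap><loc>{site}/sitemap-a.xml</loc></sitemap>"
    f"<sitemap><loc>{site}/sitemap-b.xml</loc></sitemap>"
    "</sitemapindex>"
)
sitemaps = {
    f"{site}/sitemap-a.xml": urlset(f"{site}/page-1", f"{site}/page-2"),
    f"{site}/sitemap-b.xml": urlset(f"{site}/page-2", f"{site}/page-3", f"{site}/page-3"),
}
standalone = [f"{site}/sitemap-a.xml"]

print(audit(standalone, index_xml, sitemaps))
```

Here sitemap-a.xml is flagged as double-submitted, and page-2 (repeated across sitemaps) and page-3 (repeated within one sitemap) are flagged as duplicates.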
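The per-URL criteria can be sketched in a similar way as a check over data you have already fetched. The hypothetical `sitemap_problems` function below assumes you pass in each URL's HTTP status code, its HTML, and your robots.txt text. It uses the standard library's `html.parser` and `urllib.robotparser`, and by default tests robots.txt rules for a `Googlebot` user agent. Fetching pages is left out, and comparing canonical URLs as exact strings (after resolving relative hrefs) is a simplification; real sites may need to normalize trailing slashes or letter case.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def sitemap_problems(url, status, html, robots_txt, agent="Googlebot"):
    """Return reasons why `url` may not belong in an XML sitemap."""
    problems = []
    if status != 200:
        problems.append(f"status {status}")

    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative hrefs; a page with no canonical tag is accepted.
    if any(urljoin(url, href) != url for href in finder.canonicals):
        problems.append("canonical points elsewhere")

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        problems.append("blocked by robots.txt")
    return problems


# Hypothetical example data.
robots_txt = "User-agent: *\nDisallow: /private/\n"
good = sitemap_problems(
    "https://example.com/a", 200,
    '<head><link rel="canonical" href="https://example.com/a"></head>',
    robots_txt,
)
bad = sitemap_problems(
    "https://example.com/private/b", 301,
    '<head><link rel="canonical" href="/c"></head>',
    robots_txt,
)
print(good)
print(bad)
```

A URL that comes back with an empty list meets all three criteria; anything else is a candidate for removal from the sitemap.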