Creating an XML sitemap file for SEO

What is a sitemap?

A sitemap is a file that contains the exhaustive list of a website's internal URLs. The file is written in XML. It gives the major search engines information about URLs that they could not otherwise discover on their own.

To understand what a sitemap.xml file is, you first need to understand how search engine crawlers (also called web spiders, web crawlers or bots) work, whether they belong to Google, Yahoo or Bing. Here is what they do when analyzing a site such as www.mysite.com:

  • They start by downloading and parsing the robots.txt file (http://www.mysite.com/robots.txt).
  • They analyze the rules defined in that file to see which URLs they are allowed to download.
  • If robots.txt allows it, they download the website's root, i.e. http://www.mysite.com/
  • They analyze the content of this page and extract, first and foremost, the list of internal links.
  • All these internal links are downloaded in turn (unless the rules in robots.txt disallow it), and their own internal links are extracted.
  • The process of downloading and analyzing links continues until there are no new links left to extract.

We can therefore make the following observation: if a URL cannot be reached from the root, either directly or through a chain of links on the pages below it, then a robot cannot magically find it. This is the reason XML sitemaps were originally created: so that the webmaster of a website can point robots to the URLs they could not find on their own.
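The crawl described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link graph, the page paths and the single disallowed prefix are all hypothetical, and real crawlers are far more elaborate. It shows why a page that no other page links to is never discovered.

```python
from collections import deque

# Hypothetical link graph: page path -> internal links found on that page.
SITE_LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/private/admin"],
    "/products/a": ["/"],
    "/about": [],
    "/private/admin": ["/private/logs"],
    "/private/logs": [],
    "/landing/old-campaign": ["/"],  # no page links to this one
}

# Hypothetical robots.txt rule: nothing under this prefix may be downloaded.
DISALLOWED_PREFIXES = ("/private/",)


def is_allowed(url):
    """Return True if the (simplified) robots.txt rules permit this path."""
    return not url.startswith(DISALLOWED_PREFIXES)


def crawl(root="/"):
    """Breadth-first walk from the root, following only allowed links."""
    seen = set()
    queue = deque([root])
    while queue:
        url = queue.popleft()
        if url in seen or not is_allowed(url):
            continue
        seen.add(url)
        queue.extend(SITE_LINKS.get(url, []))
    return seen


discovered = crawl()
# Allowed pages that exist on the site but were never reached by the crawl.
orphans = {u for u in SITE_LINKS if is_allowed(u) and u not in discovered}
print(sorted(discovered))
print(sorted(orphans))
```

In this toy graph the only orphan is /landing/old-campaign, which is exactly the kind of URL a sitemap is meant to expose.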

The 3 types of entries of a sitemap: links, videos and images

A sitemap can contain "classic" URLs (HTML, PDFs ...), URLs of images and even video URLs.

The internal URLs of a sitemap

In addition to enabling the indexing of URLs that are hard to find, the sitemap file allows the webmaster to associate each URL with metadata such as:

  • the last date the URL resource was modified
  • an estimation of how frequently this resource is modified
  • the relative importance of this URL on the website

Search engines use the information listed in the sitemap to optimize the indexing of your website. As always, though, it is unclear exactly how they use sitemaps and the information in them. They do recommend not 'cheating' on these values (e.g. declaring a refresh rate in the sitemap that does not match reality).
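In the sitemaps.org protocol, the three pieces of metadata listed above correspond to the lastmod, changefreq and priority tags. The sketch below builds a small sitemap with Python's standard library. The function name build_sitemap, the page URLs, the dates and the values are hypothetical examples, not recommendations.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from a list of dicts (hypothetical helper).

    Each dict needs "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Sample values, for illustration only.
xml_text = build_sitemap([
    {"loc": "http://www.mysite.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.mysite.com/archive/2010.html", "changefreq": "never"},
])
print(xml_text)
```

The protocol expects changefreq to be one of always, hourly, daily, weekly, monthly, yearly or never, and priority to be a number between 0.0 and 1.0.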

Video XML sitemap

Creating and submitting a video sitemap.xml allows you to control the information you send to search engines and also to make sure that all the videos on your website can be indexed by search engines.

Here is the information a video sitemap should contain:

Tag                    Importance  Description
<video:title>          Mandatory   Video title
<video:description>    Mandatory   Video description
<video:player_loc>     Mandatory   URL of the video player
<video:thumbnail_loc>  Mandatory   URL of the thumbnail
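As a rough illustration, the four tags in the table can be written out with Python's standard library as follows. The video tags live in Google's video sitemap namespace and are nested inside a video:video element for each page. The helper name build_video_sitemap and all URLs and texts are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# The four tags from the table above.
VIDEO_TAGS = ("thumbnail_loc", "title", "description", "player_loc")


def build_video_sitemap(pages):
    """pages: list of (page_url, video_dict) pairs (hypothetical helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:video": VIDEO_NS})
    for page_url, video in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        video_el = ET.SubElement(url, "video:video")
        for tag in VIDEO_TAGS:
            # A missing mandatory field raises KeyError instead of being skipped.
            ET.SubElement(video_el, f"video:{tag}").text = video[tag]
    return ET.tostring(urlset, encoding="unicode")


# Sample values, for illustration only.
video_xml = build_video_sitemap([
    ("http://www.mysite.com/videos/demo.html", {
        "thumbnail_loc": "http://www.mysite.com/thumbs/demo.jpg",
        "title": "Product demo",
        "description": "A short walkthrough of the product.",
        "player_loc": "http://www.mysite.com/player?video=demo",
    }),
])
print(video_xml)
```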

If you want the video sitemap to fully contribute to search engine optimization (SEO), avoid:

  • Listing more than 50,000 URLs in a single sitemap. If you reach this limit, you have to create multiple sitemap files.
  • Disallowing videos in robots.txt.
  • Distributing videos across multiple directories.

Images sitemap.xml

Submitting an images sitemap makes your images easier to index and increases their chances of being ranked in search engines. An images sitemap file allows you to:

  • provide additional information about your images
  • enable the indexing of all your images
  • identify the most important images that you want to index
  • index the images that would be accessible only through dynamic forms

Here is the information an images sitemap.xml should contain:

Tag                   Importance  Description
<image:image>         Mandatory   Tag that encloses all the information about an image
<image:loc>           Mandatory   URL of the image
<image:caption>       Optional    Image description
<image:title>         Optional    Image title
<image:license>       Optional    URL of the image's license
<image:geo_location>  Optional    Geographical location of the image
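The same approach works for images. The sketch below, again only an illustration, writes one image:image element for each image on a page, using Google's image sitemap namespace. The helper name build_image_sitemap and the URLs and captions are hypothetical. Only image:loc is treated as required here, as in the table above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Optional tags from the table above.
OPTIONAL_IMAGE_TAGS = ("caption", "title", "license", "geo_location")


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to a list of image dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:image": IMAGE_NS})
    for page_url, images in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        for image in images:
            image_el = ET.SubElement(url, "image:image")
            ET.SubElement(image_el, "image:loc").text = image["loc"]  # mandatory
            for tag in OPTIONAL_IMAGE_TAGS:
                if tag in image:
                    ET.SubElement(image_el, f"image:{tag}").text = image[tag]
    return ET.tostring(urlset, encoding="unicode")


# Sample values, for illustration only.
image_xml = build_image_sitemap({
    "http://www.mysite.com/gallery.html": [
        {"loc": "http://www.mysite.com/img/harbour.jpg", "caption": "The harbour at dawn"},
        {"loc": "http://www.mysite.com/img/market.jpg"},
    ],
})
print(image_xml)
```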

If you want the images sitemap to fully contribute to your images' SEO, avoid:

  • Exceeding 50,000 URLs per sitemap. If you reach this limit, submit multiple images sitemaps.
  • Blocking images in robots.txt.
  • Placing images in multiple directories.

Sitemaps also have a second advantage, and this one concerns all internal URLs of a site, not only those which are difficult to access: the sitemap file is used to enrich the URL list with metadata about those URLs, such as:

  • the date when the resource corresponding to a URL was last modified
  • an estimation of how frequently this resource is modified
  • the relative importance of this URL on the website

Does my website need an XML sitemap file?

The question now is whether your website needs a sitemap. Maybe your SEO consultant has advised you to create one. For simpler websites, a sitemap is not really necessary. We recommend that you do not spend energy on creating a good sitemap in the following cases:

  • if all the links on your site are accessible from the root or its descendants (recursively). This is actually the case for most sites!
  • if you have relatively few technical resources for your website. A good sitemap is very well organized, respects the XML standard and has very precise metadata values. Otherwise there is no point in having one. There are tools that generate sitemaps automatically, but it is often necessary to optimize the result manually or with code (particularly when choosing the metadata values).
  • if your website does not have content that might be interesting from an SEO point of view, or if its content is refreshed very often.

In other cases, especially for sophisticated and complex websites, an XML sitemap can be very useful for search engine optimization (SEO).

How to generate a sitemap.xml?

There are 3 ways to generate sitemaps:

  • manually, by creating an XML file. This method is usually not recommended, unless you really know what you are doing! You should at least use an XML editor to create the file.
  • with development work specific to your website: this method is very powerful and can take into account all of your website's problems and particularities. Pay attention to the maximum number of URLs that can go into a sitemap. Sometimes you need to split them across multiple files.
  • with software that generates sitemaps automatically. This method is very tempting: it involves minimal work and the results are very professional. However, make sure you choose a good tool. Yakaferci, for example!
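When a custom script reaches the 50,000-URL limit mentioned earlier, one common approach is to split the URL list into several files and reference them from a sitemap index, a sitemapindex document defined by the sitemaps.org protocol. The sketch below assumes hypothetical file names of the form sitemap-1.xml, sitemap-2.xml and so on, and a made-up list of page URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, chunk_count):
    """Build a <sitemapindex> pointing at sitemap-1.xml .. sitemap-N.xml.

    The file naming scheme is an assumption made for this example.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 pages.
all_urls = [f"http://www.mysite.com/page-{i}.html" for i in range(120000)]
chunks = chunk_urls(all_urls)
index_xml = build_sitemap_index("http://www.mysite.com", len(chunks))
print([len(c) for c in chunks])
print(index_xml)
```

Each chunk would then be written to its own sitemap file, and only the index needs to be declared to the search engines.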




How to inform Google about your website?

Unlike the robots.txt file, a sitemap is not found by robots simply because you put it on your site. There is in fact no fixed naming convention for this file (several names are possible, the file can be compressed, etc.). You must therefore explicitly declare the sitemap to the search engines you are particularly interested in. In Google's case, for example, you submit the sitemap in Webmaster Tools.
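Besides submission through each engine's webmaster tools, the sitemaps.org protocol also allows a Sitemap: line in robots.txt as a way to declare where the file lives. The sketch below uses Python's standard urllib.robotparser module (its site_maps() method requires Python 3.8 or later) to read such a line. The robots.txt content and the sitemap file name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for www.mysite.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.mysite.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network access is needed here.
parser.parse(ROBOTS_TXT.splitlines())

declared_sitemaps = parser.site_maps()
root_allowed = parser.can_fetch("*", "http://www.mysite.com/")
admin_allowed = parser.can_fetch("*", "http://www.mysite.com/private/admin")

print(declared_sitemaps)
print(root_allowed, admin_allowed)
```

A crawler that reads robots.txt this way learns the sitemap's location whatever the file is called.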

Read more

If this article made you want to know more about sitemaps, here are some useful links: