The meta robots tag
Presentation of the meta robots tag
The meta robots tag is an HTML tag that controls how search engines crawl and index a web page.
In practice, it lets a webmaster prevent a search engine such as Google or Bing from indexing a page or following the links it contains.
Usually, webmasters are delighted when every page of their website is indexed by the crawlers of the major search engines. The more often the website appears in the search results, the more "natural" traffic it receives.
However, sometimes the webmaster doesn’t want certain pages of the website to appear in search results (back-office URLs, premium content reserved for registered visitors, pages under construction, etc.). The meta robots tag is one way of blocking this indexing.
To do this, the webmaster only needs to place the following line between the <head>...</head> tags of the page:
<head> ... <meta name="robots" content="none" /> ... </head>
Why use the robots meta tag rather than the robots.txt file?
The most common way to block all or part of a website from an indexing robot (Googlebot, for example) is to use a robots.txt file.
However, in certain cases the meta robots tag is preferable:
- if a webmaster doesn’t have control over the robots.txt file at the root of the site (for example, on a blog platform shared by thousands of webmasters under a common domain name). In this case, the webmaster can place a meta robots tag on their own pages to block indexing.
- most websites have a "back-office", that is, a private, secured area used, for example, to manage the website content. Crawlers should never index these areas. But if we block them in the robots.txt file, which is publicly readable, an attacker could easily discover the "secret" URL of the back-office. In this case, we use the none meta robots tag to make sure that the secret URL never appears in search engine result pages.
- even if a URL is blocked by the robots.txt file, it may still show up in search engine result pages (without any useful associated content), for example if backlinks point to the blocked page. In this case, we use a noindex meta robots tag, and leave the page crawlable so that the tag can be read, to make sure that the URL never appears in search results.
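As a minimal sketch of the back-office case above, the page below keeps itself out of the index without being listed in robots.txt. The page title and body are hypothetical placeholders, not taken from any real site; only the meta robots line matters.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- "none" is equivalent to "noindex, nofollow":
       do not index this page and do not follow its links -->
  <meta name="robots" content="none">
  <title>Back-office login (hypothetical example)</title>
</head>
<body>
  <!-- Placeholder content for the example -->
  <p>Administration area</p>
</body>
</html>
```

The tag only speaks to search engines. It does not protect the page itself, so the back-office still needs its own access control.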
The possible values of the meta robots tag
The possible values of the "content" attribute of the meta robots tag are:
|Value||Effect when used in the meta robots tag|
|noindex||This meta robots tag blocks the indexing of the page that contains it. Note that the page will still be crawled, but it won’t be indexed. To prevent a page from being crawled at all, use the robots.txt file instead.|
|nofollow||The nofollow meta robots tag prevents crawlers such as Googlebot from following the links contained in the page.|
|none||The same as noindex and nofollow.|
|nosnippet||The nosnippet meta robots tag blocks the display of a content extract (snippet) in the search results.|
|noodp||The noodp meta robots tag blocks the use of a replacement description taken from the DMOZ directory (Open Directory Project). DMOZ closed in 2017, so this value is now obsolete.|
|noarchive||The noarchive meta robots tag prevents search engines from displaying a link to the cached copy of the page.|
|unavailable_after: [date]||This meta robots tag specifies the exact date and time after which the crawling and indexing of the page must stop.|
|noimageindex||This meta robots tag indicates that the page must not appear as the referring page for an image shown in search results.|
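To illustrate a few values from the table, here are some sketches of how they might be written. Each tag is an independent alternative, not a set meant to be used together. The date is an arbitrary example, and the ISO 8601 format shown is an assumption; check which date formats the search engine you target accepts.

```html
<!-- Index the page, but show no text extract in the results -->
<meta name="robots" content="nosnippet">

<!-- Index the page, but offer no link to a cached copy -->
<meta name="robots" content="noarchive">

<!-- Stop crawling and indexing the page after this (arbitrary) date -->
<meta name="robots" content="unavailable_after: 2030-12-31">
```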
Optimal use of the meta robots tag
Here are the rules to know for making the best use of meta robots tags:
Case sensitivity of the robots meta tags
The names and values of robots meta tags are not case-sensitive: Googlebot reads uppercase and lowercase characters alike.
Thus, the following tags are interpreted in exactly the same way:
<meta name="ROBOTS" content="NOODP">
<meta name="robots" content="noodp">
<meta name="Robots" content="NoOdp">
However, to keep the code readable, we advise you to pick one style and use it consistently.
Need to insert multiple values?
If you need multiple values, it is advisable to put them all in a single robots meta tag. This makes the tags easier to read and avoids confusion or conflicts between the different pages of your website.
When you use multiple values in one tag, separate them with commas; a space after each comma is optional.
Example of integration:
<meta name="robots" content="noindex, nofollow">
Using conflicting meta robots values
When a page carries conflicting meta robots values, search engines apply the most restrictive one by default.
For example, if the page contains these two meta robots tags:
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="INDEX">
Google will only take into account "NOINDEX", which in this example is the most restrictive value.
The conflict between the meta robots tag and the robots.txt file
When the robots.txt file and the robots meta tags give contradictory instructions, Google follows the most restrictive instruction.
Therefore, when:
- a page is blocked by the robots.txt file, it will never be crawled by search engines, so they will never be able to read its robots meta tags.
- a page is not blocked by the robots.txt file but its indexing is blocked by a robots meta tag, the page will be accessed and crawled, but not indexed.
The special meta robots tag for Googlebot
If you want to give instructions to all search engines without distinction, use the "robots" meta name.
To address Google exclusively, use the "googlebot" meta name instead.
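As an illustrative sketch, the two names can be combined on one page to give Google a different instruction from the other engines. The values chosen here are an arbitrary assumption made for the example.

```html
<head>
  <!-- Read by every search engine that honors the robots meta tag -->
  <meta name="robots" content="nofollow">
  <!-- Read by Google's crawler only -->
  <meta name="googlebot" content="noindex">
</head>
```

With these two tags, other engines only see nofollow, while Google should apply both restrictions and treat the page as noindex and nofollow.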