Using canonical URLs to avoid duplicate content

What is a link rel=canonical tag?

The link rel=canonical tag is an HTML element that tells search engines which URL is the canonical URL for a given page. The next section explains what a canonical URL is.

This tag must be placed in the <head> section of the HTML file:

<link rel="canonical" href="" />

Several English terms are used to designate the link canonical tag: «canonical tag», «rel=canonical», «rel canonical tag», «canonical URL tag», «link canonical tag».
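To check that a page really declares the tag in its <head>, the HTML can be parsed and the href read back. The sketch below is only an illustration: the class and function names are hypothetical, and the sample page and its www.example.com address are assumptions, not taken from a real site.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonicals.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html_text):
    """Return the first canonical URL declared in <head>, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical page used only for illustration.
page = """<html><head>
<title>All products</title>
<link rel="canonical" href="https://www.example.com/all-products.html" />
</head><body><p>Product list</p></body></html>"""

print(find_canonical(page))
```

A page without the tag simply yields None, which makes the helper easy to run over a list of pages.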

What is a canonical URL?

A canonical URL is the preferred version among a set of pages whose similar content can cause duplicate content issues. The webmaster in charge of the website should choose it carefully.

The canonical tag lets site owners identify sets of identical content and tell search engines such as Google: "Among all the pages with very similar content, please only take this canonical URL into account in the search results".

In practice, a URL whose HTML code points to another canonical URL through this tag will normally not appear in Google search results; search engines use the canonical URL instead. Note that search engines treat the tag as a strong hint rather than a strict directive.

The canonical URL therefore helps avoid duplicate content issues.

Read the article on duplicate content for more information on the subject.

Read the article on URLs (Uniform Resource Locators) for more information on URLs.

Goals of the «link rel=canonical» tag

The «link rel=canonical» element serves 3 very important SEO goals:

  • The canonical URL helps search engines quickly identify which pages to include in or exclude from indexing, and which URL version to show for search queries.
  • The «link rel=canonical» tag helps avoid possible SEO penalties related to duplicate content, which is often generated automatically by the CMS.
  • The «link rel=canonical» tag may increase content popularity by making all backlinks count towards a single URL. Several backlinks often point to the same content, but their URLs may vary depending on the sorting parameters they contain.
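The third goal can be pictured as a simple tally: backlinks that target sorted variants are counted for the canonical URL instead. This is a rough sketch of the idea, not a description of how any search engine actually computes popularity; the `declared_canonical` mapping, the function name and the example URLs are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping: URL variant -> canonical URL declared in its <head>.
declared_canonical = {
    "https://www.example.com/all-products.html?sort=Z-A":
        "https://www.example.com/all-products.html",
    "https://www.example.com/all-products.html?sort=A-Z":
        "https://www.example.com/all-products.html",
}


def consolidate_backlinks(backlink_targets, canonical_map):
    """Count backlinks per canonical URL instead of per raw target URL."""
    return Counter(canonical_map.get(url, url) for url in backlink_targets)


# Three backlinks that point to the same content under different URLs.
backlinks = [
    "https://www.example.com/all-products.html",
    "https://www.example.com/all-products.html?sort=Z-A",
    "https://www.example.com/all-products.html?sort=A-Z",
]

totals = consolidate_backlinks(backlinks, declared_canonical)
print(totals["https://www.example.com/all-products.html"])  # 3
```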

How to set up the «link rel=canonical» URLs?

Marking pages with «link rel=canonical»

When you have two or more very similar pages, it is advisable to indicate which one the search engines should crawl and index.

To do so, use the «link rel=canonical» element: add it to the <head> of each similar or secondary page.

<link rel="canonical" href="" />

Indicating the favorite URLs in a sitemap alongside the «link rel=canonical» tag

To speed up the crawling of your site by search engines and to make sure your canonical URLs are properly taken into account, it is often advisable to create a sitemap. Select the canonical URL for each of your pages and list it in the sitemap that you submit to search engines.

Of course, the secondary URLs must not be listed in the sitemap.
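A sitemap restricted to canonical URLs could be generated along these lines. This is a minimal sketch that assumes you already hold a mapping from every URL to its canonical URL; the `pages` dictionary, the function name and the www.example.com addresses are hypothetical. Only the required <urlset>, <url> and <loc> elements of the sitemap protocol are written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML that lists only URLs which are their own canonical."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, canonical in pages.items():
        if url != canonical:
            continue  # secondary URL: leave it out of the sitemap
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical mapping: page URL -> canonical URL chosen for it.
pages = {
    "https://www.example.com/all-products.html":
        "https://www.example.com/all-products.html",
    "https://www.example.com/all-products.html?sort=Z-A":
        "https://www.example.com/all-products.html",
    "https://www.example.com/contact.html":
        "https://www.example.com/contact.html",
}

print(build_sitemap(pages))
```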

How do I know whether to use canonical URLs?

It is recommended to use the link canonical tag if your site contains two or more different URLs presenting identical content.

If the content of two pages is similar but not identical, then, from both a marketing and an SEO point of view, it is advisable to rethink the pages and make their content unique.

To find out whether two or more pages are similar, there are two simple verification methods:

  • Place the two pages side by side and check whether the same sentences are found on both pages. If so, you are dealing with duplicate content.
  • Analyze the pages with our Yakaferci tool. In the chapter content / keyword density, verify whether you have the same keyword list with the same variations (lemmatization) and the same number of occurrences.
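The first check (same sentences on both pages) can be roughly automated. The sketch below is a naive illustration under stated assumptions: sentences are split on ".", "!" and "?", the comparison is exact after lowercasing, and the function names and sample texts are hypothetical. It does not reproduce what the Yakaferci tool does.

```python
import re


def sentences(text):
    """Split text into normalized sentences (lowercased, whitespace collapsed)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return {" ".join(part.lower().split()) for part in parts if part.strip()}


def shared_sentence_ratio(text_a, text_b):
    """Share of the shorter page's sentences that also appear on the other page."""
    set_a, set_b = sentences(text_a), sentences(text_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Hypothetical page texts used only for illustration.
page_a = "Red shoes for running. Free delivery in 48 hours. Available in all sizes."
page_b = "Free delivery in 48 hours. Red shoes for running. New colours this season."

print(round(shared_sentence_ratio(page_a, page_b), 2))  # 0.67
```

A ratio close to 1 suggests the two pages are candidates for a canonical URL; a low ratio suggests the pages are already distinct.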

Using the link canonical tag

The link canonical tag is a quick and clean way to resolve the cases of duplicate content found on your website.

Case 1: Dynamic URLs generated by the product sorting function

To illustrate the canonical URL concept, imagine that the /all-products.html URL of an e-commerce website contains the complete list of its products in alphabetical order. When the user sorts the products in descending order, the URL used by the web developer is /all-products.html?sort=Z-A. The two URLs have the same content, only in a different order. This is a typical case where it is advisable to indicate to search engines a "canonical URL" that should be indexed, typically the unsorted /all-products.html, through the link canonical tag.
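For the sorting case, the canonical tag of a sorted page can be derived by dropping the sort parameter from its URL. This is a sketch under assumptions: `sort` is taken to be the only parameter that merely reorders the list, the www.example.com domain is hypothetical (the example above only gives paths), and the function names are illustrative.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "sort" is the only parameter that just reorders the content.
SORT_PARAMETERS = {"sort"}


def canonical_url(url):
    """Remove sorting parameters; keep every other query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMETERS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Render the tag to place in the <head> of the page served at `url`."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


print(canonical_tag("https://www.example.com/all-products.html?sort=Z-A"))
# <link rel="canonical" href="https://www.example.com/all-products.html" />
```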

Case 2: The CMS generates different URLs

Your content management system (CMS) generates a different URL each time you place a product in several categories or subcategories.

In this case, to avoid duplicate content issues, it is advisable to define a favorite URL.

URL with identical content

Solution: define your favorite URL: …égory-red/01-product

What errors should be avoided?

Here is a partial list of implementation errors of the «link rel=canonical» tag found on various websites:

  • The «canonical» tag is placed in the body instead of the <head> of the HTML code (on the canonical page or on the pages with duplicate content). For the rule to be taken into account quickly and reliably, place the «link rel=canonical» tag in the <head>, so that search engines read it as soon as the page is loaded.
  • The URL in the link canonical tag is written as a relative path. It is advisable to use absolute URLs rather than relative paths (/01-product). This accurately identifies the pages affected by the rule and avoids any confusion between pages.
  • The canonical tag is used although the page content is different. If a category holds 200 different articles but shows only 50 per page, pagination (page 2, page 3, page 4) is typically used to present them all. You should not use the link rel=canonical tag to declare page 1 as your favorite, because the other pages (page 2, page 3 and page 4), which contain different articles, will then not be indexed, which is bad for SEO.
  • Different canonical URLs are specified for the same page. This creates confusion between contradictory rules. A page can declare only one canonical URL.
  • The canonical URL returns a 404 error. If it does, and the secondary pages are «forbidden» from indexing by the link rel=canonical tag, your content will completely disappear from the search results.
  • The chosen page contains the meta robots «noindex» tag or is forbidden in the robots.txt file. The robots.txt file prevails over the meta robots «noindex» tag, which prevails over the link rel=canonical tag. If the robots.txt file or the meta robots «noindex» tag forbids indexing of the canonical page while all its secondary pages are excluded by the link rel=canonical tag, the content will disappear completely from the search engines' index.
  • The chosen page or the secondary pages contain other, conflicting link canonical tags. If several pages with link canonical tags send mixed messages, this will not help the search engines and might invalidate the effect of the canonical tags.
  • The wrong page is marked as the canonical page.
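Some of the errors above can be detected from the HTML alone. The following sketch checks three of them (tag outside <head>, several different canonical URLs, relative URL); it does not fetch anything, so 404 responses, «noindex» tags on the target and robots.txt rules are not covered. The class name, function name, messages and sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalAudit(HTMLParser):
    """Records where each <link rel="canonical"> appears and what it points to."""

    def __init__(self):
        super().__init__()
        self.section = None  # "head" or "body" while inside those elements
        self.found = []      # (section, href) for each canonical tag

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "link":
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                self.found.append((self.section, attributes.get("href") or ""))

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None


def audit_canonical(html_text):
    """Return a list of problems found in the page's canonical tags."""
    audit = CanonicalAudit()
    audit.feed(html_text)
    problems = []
    if any(section != "head" for section, _ in audit.found):
        problems.append("canonical tag outside <head>")
    if len({href for _, href in audit.found}) > 1:
        problems.append("several different canonical URLs")
    for _, href in audit.found:
        parts = urlsplit(href)
        if not (parts.scheme and parts.netloc):
            problems.append("relative canonical URL: " + href)
    return problems


# Hypothetical faulty page: relative URL in <head>, second tag in <body>.
faulty_page = """<html><head>
<link rel="canonical" href="/01-product" />
</head><body>
<link rel="canonical" href="https://www.example.com/01-product" />
</body></html>"""

for problem in audit_canonical(faulty_page):
    print(problem)
```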

The difference between a canonical URL and a 301 redirect

From an SEO point of view, the canonical URL and the 301 redirect send a very similar message to search engines: «my website contains several pages with very similar content which can be regrouped under a single page».

However, there are important differences between a canonical URL and a 301 redirect:

The 301 redirect sends 100% of the traffic, from both users and search engines, to a single URL, while the canonical URL is used only by the search engines. The secondary URLs improve website usability for users (for example through the sorting function) but are not useful for the search engines.
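The difference shows up in what the server sends back. The sketch below models the two responses as plain (status, headers, body) tuples, without relying on any web framework; the function names and the example URLs are hypothetical.

```python
from html import escape


def respond_with_redirect(target_url):
    """301: every visitor and every crawler is sent on to the target URL."""
    return 301, {"Location": target_url}, ""


def respond_with_canonical(canonical_url, page_body):
    """200: visitors keep the page; the tag is a signal for search engines only."""
    html = (
        "<html><head>"
        '<link rel="canonical" href="%s" />'
        "</head><body>%s</body></html>"
    ) % (escape(canonical_url, quote=True), page_body)
    return 200, {"Content-Type": "text/html; charset=utf-8"}, html


# A retired URL is redirected: nobody sees the old page any more.
status, headers, body = respond_with_redirect("https://www.example.com/all-products.html")
print(status, headers["Location"])

# A sorted listing stays usable for visitors but names its canonical URL.
status, headers, body = respond_with_canonical(
    "https://www.example.com/all-products.html", "<p>Products, Z to A</p>"
)
print(status, "canonical" in body)
```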

Read the article on 301 redirects for more information on the subject.