How to protect your site against duplicate content?

What is duplicate content?

Duplicate content is text that appears on two or more pages, whether on the same website or on different websites.

When the same text appears on several different websites, it is often called plagiarism, or content theft.

Duplicate content and SEO

Plagiarism can be sanctioned legally when you can prove ownership of the content.

In addition, to protect the quality of organic (natural) search results, search engines try to identify websites that host duplicate content and may penalize them heavily.

Websites that duplicate content may:

  • Decline in search engine results pages (SERPs). In other words, the copying website may see its rankings drop.
  • Disappear completely from organic search results. In other words, the site may no longer be indexed by the search engines.

Here are some cases of duplicate content on the same website:

  • The same content (text, images, links...) appearing on two different URLs of the same website. This usually happens in a CMS (content management system) when two identical pages are created with different URLs, through either human or technical error. From an SEO point of view, this is not the most serious case of duplicate content, unless you have a very large number of duplicate pages.
  • Old and new URLs coexisting after a restructuring. If you have recently restructured your website and its URLs have changed as a result, make sure the old and new URLs do not both remain accessible, since that would create duplicate content.

Here are some examples of duplicate content on different websites:

  • Identical or near-identical descriptions of the same item on different e-commerce websites.
  • The reuse of text extracts that already appear on other web pages or websites, which is a much more common case. It happens when several sites covering the same subject use the same content provider. It can be problematic for SEO: because your content is not considered original, your website may be kept out of the top positions in search results.
  • RSS feeds, which are retrieved in XML format and displayed on many websites across the web.
  • Intentional duplication, which is plain copying and therefore plagiarism. This is the case search engines primarily try to identify, and the websites that copy content in this way are the ones most severely penalized.

Note that this list is not exhaustive; it only includes the most common cases of duplicate content on the web and their impact on SEO.

How to avoid duplicate content?

Constantly check the pages on your website

This lets you catch unintentional content duplication caused by your CMS.
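As a rough illustration of such a check, the sketch below groups URLs whose text is identical once case and whitespace are normalized. It assumes you can export each page's visible text from your CMS; the `pages` dictionary and its URLs are hypothetical, and exact hashing only catches identical text, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_pages(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text is identical.

    `pages` maps a URL to the visible text of that page (hypothetical
    input: how the text is exported from the CMS is up to you).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only the fingerprints shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: two URLs serving the same text.
pages = {
    "/about": "Our company sells handmade furniture.",
    "/about-us": "Our company   sells handmade FURNITURE.",
    "/contact": "Write to us.",
}
duplicates = find_duplicate_pages(pages)
```

Each group returned is a set of URLs that should probably be merged, redirected, or given a canonical URL.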

If you restructure your website, use 301 redirects ("Moved Permanently")

This redirects users and search engines to the new URLs and prevents the old and new addresses from being treated as duplicate content.
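In practice, redirects are usually configured in the web server or the CMS. Purely to show the mechanism, here is a minimal sketch using Python's standard `http.server` module; the `REDIRECTS` mapping and its paths are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical mapping from old paths to their new locations.
REDIRECTS = {
    "/products.php": "/shop/",
    "/old-blog/first-post": "/blog/first-post",
}


def new_location(path: str) -> Optional[str]:
    """Return the new URL for a moved path, or None if it has not moved."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer moved paths with a 301 and a Location header."""

    def do_GET(self):
        target = new_location(self.path)
        if target is None:
            # This sketch serves no real pages, so anything else is a 404.
            self.send_error(404)
            return
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally, the handler could be served with `HTTPServer(("", 8000), RedirectHandler).serve_forever()` from the same module. Pointing each old URL directly at its final destination avoids chains of redirects.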

Regularly check Google Search Console

It can flag pages on your website that Google treats as duplicates of one another, which helps you assess the impact on your SEO.

Use canonical URLs

There are legitimate cases where several URLs serve the same content. For example, imagine a list of products on an e-commerce website sorted by ascending price, then the same list sorted by descending price. The content is the same, but technically there may be two different URLs. In this case, the developer can tell Google which URL is the reference for the others.

Canonical URLs are an easy way to prevent duplicate content within a website: they let you specify which page is the main (canonical) one among several duplicate or very similar pages. This way, you can offer all your content to your visitors without risking an SEO penalty. The tag that declares a canonical URL is:

<link rel="canonical" href="url"/>

Place it in the HTML header of your page, between <head> and </head>.

For websites built with turnkey platforms, options usually exist to set canonical URLs and thus guard against duplicate content.
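For a hand-built site, the sorted product list described above could be handled along these lines. This is only a sketch: it assumes sorting is controlled by query parameters named `sort` and `order` (hypothetical names), and the shop URL is made up.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change presentation but not content.
PRESENTATION_PARAMS = {"sort", "order"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Render the <link> tag to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'


# Both sort orders of the same list end up with the same canonical tag.
ascending = canonical_tag("https://shop.example/products?category=chairs&sort=price_asc")
descending = canonical_tag("https://shop.example/products?category=chairs&sort=price_desc")
```

Which parameters count as presentation-only depends on the site; a parameter that filters the list (such as `category` here) changes the content and is kept.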

Check if other sites have duplicated your content

You can do a manual check: take random excerpts of your text and search for them on Google. If another website has used the same text, it will appear in the search engine results pages (SERPs).
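To make the manual check less tedious, a small script can pick the excerpts and prepare the search links for you to open in a browser. The sketch below assumes eight-word excerpts are distinctive enough (an arbitrary choice) and wraps each one in quotation marks so the search looks for the exact phrase; the function names are illustrative.

```python
import random
from urllib.parse import quote_plus


def sample_excerpts(text: str, count: int = 3, words: int = 8, seed=None) -> list[str]:
    """Pick up to `count` random runs of `words` consecutive words."""
    tokens = text.split()
    if len(tokens) <= words:
        return [" ".join(tokens)]
    rng = random.Random(seed)
    positions = range(len(tokens) - words + 1)
    starts = rng.sample(positions, k=min(count, len(positions)))
    return [" ".join(tokens[s:s + words]) for s in sorted(starts)]


def search_url(excerpt: str) -> str:
    """Build a Google search URL for the exact (quoted) phrase."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{excerpt}"')


# Hypothetical page text; replace it with the text of one of your pages.
page_text = (
    "Our workshop builds solid oak tables by hand, using timber from "
    "sustainably managed forests and finishes that are safe for children."
)
for excerpt in sample_excerpts(page_text, count=2, seed=1):
    print(search_url(excerpt))
```

The script only builds the links; running the searches and reading the results remains a manual step.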

Take precautions before buying content from a third party

If you purchase content from a supplier, stipulate in the contract that the content must be original and exclusive; otherwise, you may end up with duplicate content. If this is not possible, you can choose to write the content yourself.

If you must buy content that has already been sold to other sites, rework it so that it differs from your competitors' versions. The changes should genuinely differentiate your content. If you only change a few words, the result is "retouched" content, which is still considered duplicate content and is therefore penalized.
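To get a rough idea of whether a reworked text has really changed, you can compare it with the purchased version. The sketch below uses the Jaccard similarity of three-word sequences ("shingles"); this is a common illustrative measure, not the one search engines use, and their actual methods and thresholds are not public. The sample texts are made up.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of `size`-word sequences found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical texts: the purchased version, a light retouch, a real rewrite.
purchased = "This oak table seats six people and comes with a ten year warranty"
retouched = "This oak table seats eight people and comes with a ten year warranty"
rewritten = "A solid dining table for large families, guaranteed for a decade"

retouched_score = similarity(purchased, retouched)
rewritten_score = similarity(purchased, rewritten)
```

Here the retouched text, with a single word changed, still shares most of its shingles with the purchased version, while the rewritten text shares none.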

How to protect your copyright?

Here are the options Google offers for reporting a copyright infringement or for contesting an SEO penalty caused by duplicate content.

Request re-examination of your website

If your website is no longer displayed in the search results, or if you notice a drop in your rankings, Google may consider that you have duplicate content. In this case, you can ask Google to reconsider your website so that it returns to the search engine results pages (SERPs).

To do this, go to Google Search Console.

Contest a removal

Did your website disappear from the search results following a copyright notice? You can contest this decision at the following link: http://support.google.com/legal

Submit a copyright infringement notification

You can file a complaint with Google when someone infringes your copyright.

However, do not take this step lightly: if you make a false statement, you may have to pay damages to the site you reported.

To do this, go to http://support.google.com/legal and follow the instructions.