Duplicate content issues
How to check for duplicate content
Duplicate content is one of the most common SEO problems and, interestingly, one of the most neglected. We often talk about how to optimize a page or how to get backlinks, but very little is said about annoying duplicate content. The problem is that if your site is flooded with duplicate pages, it is difficult for search engines to give your pages the importance they deserve. In this post I will explain everything you need to know about duplicate content: what it is, where you should look for it and how to get rid of it.
What is duplicate content
Duplicate content is any text repeated on more than one web page, either on your own site or elsewhere. This is what happens when the same page is reachable through different URLs, but also when a spammer copies text from your page, modifies it and posts it on their own website.
At first glance it may appear that duplicate content is not so important, but the truth is that it is a very serious problem. Search engine users expect different results, not the same result repeated. To avoid this, search engines filter duplicate content out of their results.
The consequences of duplicate content
Now that you know why it is so important to avoid duplicate content, you should know the problems it can cause on your site. Some of the most important are:
Incorrect page – When different pages have the same content, the search engine is left to choose which one to show. This is not a good situation, because it can choose a version that we do not want.
Poor visibility – As a result, the search engine may show a version with worse optimization and therefore a lower ranking.
Indexing issues – Indexing of your pages may suffer because search engines spend time crawling duplicate pages instead of the pages that really matter. In many cases duplicate content ends up being a significant portion of the indexed pages.
Lost links – Duplicate pages can each attract links, so link power is diluted across them instead of concentrating on a single page.
Moreover, you should know that Google does not penalize duplicate content; it just filters it out, and this is punishment enough to make it worth avoiding.
Causes of duplicate content
The main source of duplicate content on your site is the site itself, no matter how well you have optimized it for SEO. As you will see, there are plenty of reasons why you can have a lot of duplicates without knowing it.
These are the main on-site causes:
Noncanonical links – Your website can be reachable both on the subdomain that begins with the prefix “www” and on the main domain without this prefix. The canonical version is the one you prefer, and if it is not set correctly your content appears in both variants, generating duplicates.
HTTPS pages – Similar to the canonical URLs above, if you use SSL encryption on a site you can end up with one exact copy of your site on the secure version (https) and another on the non-secure version (http).
Dynamic content – Some sites use URL parameters, such as session IDs, to control the content. Search engines can interpret each parameter variation as a separate, duplicate page.
Archives – A typical problem with blogs is that they show the same content on different pages, such as category and tag archives.
Paging – Any site that uses paging may have this problem, especially if the paginated pages share the same title and description.
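To make the first three causes more concrete, here is a minimal Python sketch of how several URL variants can be reduced to a single preferred form. The domain example.com, the helper name canonical_url and the list of ignored parameters are all assumptions made up for illustration; which parameters are safe to drop depends on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url, use_www=True):
    """Return one preferred form of a URL: https, one host variant, no tracking parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip an existing "www." so the prefix can be applied consistently.
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare
    # Keep only parameters that affect content, sorted for a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Always use https and drop the fragment.
    return urlunsplit(("https", host, path, urlencode(query), ""))

variants = [
    "http://example.com/shoes?sessionid=abc123",
    "https://www.example.com/shoes",
    "http://www.example.com/shoes?utm_source=newsletter",
]
# All three variants collapse to the same address.
print({canonical_url(u) for u in variants})
```

If the three variants above were separate indexed pages, a search engine would see three copies of the same content. Mapping them to one address shows which URL the others should point to.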
Off-site duplicate content:
Syndication – Sending your content to other websites, for example via RSS, to generate traffic. The problem occurs when these sites publish a full copy of the content instead of an excerpt.
Location – To target several countries, the same (or almost the same) content may be used on several domains, such as a .com and localized domains.
Scraping – Scrapers are people who use software to copy some or all of your content and publish it on other sites.
Plagiarism – Anyone who copies some text and publishes it on their website as their own. Sometimes it happens intentionally.
How can we detect duplicate content
Google identifies duplicate content primarily through pages with identical or very similar titles, descriptions and content. Therefore, if you want to find duplicate content on your site, you should start here.
Here are the most effective methods to find duplicate content:
Google Webmaster Tools – If you have registered the site in Google Webmaster Tools, this is definitely the best place to start. Open your site, go to Search Appearance -> HTML Improvements and pay attention to duplicate title tags and meta descriptions. The report shows the number of duplicates so you can review them.
“site” command in search – This is an effective method, but it requires some work. It consists of searching within your website for particular words or phrases, such as product names if it is an online store (e.g. site:sample.com “product in the store”). In the results you can see whether the titles and descriptions are duplicated.
Screaming Frog – A powerful tool that lets you crawl your site for duplicate content, among other things. What matters here are the Page Title, Meta Description and H1 reports with the Duplicate filter.
Google Analytics – You can also get an idea of the proportion of duplicate pages in Content -> Site Content -> Landing Pages. The key is to look at the URLs and at pages that receive less traffic than they should.
When the duplicate content is outside your site, you can use the “site” command to detect it, but there are also dedicated tools such as Copyscape. Other SEO tools that help detect duplicate content are Duplichecker, Plagiarism and Plagium.
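If you prefer to check a handful of pages yourself, the title and description comparison these tools perform can be approximated with a short script. This is only a sketch using Python's standard library: the class HeadParser, the function find_duplicates and the sample pages are invented for illustration, and a real crawl would need to download the HTML first.

```python
from collections import defaultdict
from html.parser import HTMLParser

class HeadParser(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def find_duplicates(pages):
    """Group URLs whose title and description are identical (ignoring case)."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        key = (parser.title.strip().lower(), parser.description.lower())
        groups[key].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

# Made-up sample pages: the first two share the same head section.
shoes = '<head><title>Red shoes</title><meta name="description" content="Buy red shoes"></head>'
hats = '<head><title>Blue hats</title><meta name="description" content="Buy blue hats"></head>'
pages = {"/shoes": shoes, "/shoes?sort=price": shoes, "/hats": hats}

print(find_duplicates(pages))
```

The script only catches exact matches on title and description. Pages with very similar but not identical text would need a fuzzier comparison.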
Eliminate duplicate content
Clearly, search engines do not like duplicate content because it leads to a poor user experience. So if your site has duplicate content, you need to do everything possible to eliminate it.
These are the main options for solving the duplicate content problem:
Use rel=canonical – The “rel=canonical” tag was designed precisely to address this problem, so it is the best solution. It consists of a line of code in the <head> section of your HTML page that points to the preferred URL.
301 redirect – The best option when you cannot use the canonical tag, or when you move content from one page to another.
Deny access to robots – To keep search engines from crawling duplicate pages, you can give robots instructions through the robots.txt file.
In the case of off-site duplicate content, it is best to email the offenders and ask them to remove the content. If this does not work, ask that they at least link back to the page it was copied from, which helps search engines identify the original.
As a last resort, you can ask Google to remove the page from its search results through a request based on the US copyright protection law (DMCA). By detecting duplicate content and reporting your case, you also help improve search engine results.
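As a rough illustration of the first and third options, the sketch below builds the canonical line for a page and checks which URLs a set of robots.txt rules would block. The function names canonical_tag and is_blocked, the example.com URLs and the /print/ rule are assumptions chosen for the example, not recommendations for your site. A 301 redirect is configured on the web server, so it is not shown here.

```python
from html import escape
from urllib.robotparser import RobotFileParser

def canonical_tag(preferred_url):
    """Build the line that goes into the <head> of every duplicate variant."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'

# Hypothetical robots.txt that keeps crawlers away from printer-friendly copies.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

def is_blocked(robots_txt, url, agent="*"):
    """Return True if the given rules forbid this crawler from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)

print(canonical_tag("https://www.example.com/shoes"))
print(is_blocked(ROBOTS_TXT, "https://www.example.com/print/shoes"))
print(is_blocked(ROBOTS_TXT, "https://www.example.com/shoes"))
```

Checking the rules this way before publishing them is a cheap safeguard, since a Disallow line that is too broad can hide the pages you actually want indexed.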
Conclusions and some tips:
Never use the same title or description on more than one page
The text of each page must be unique
Do not forget to use canonical tags
When you copy a quote from another place, always include a link to the original
If you copy an entire page, ask permission first and include a link to the source.