Duplicate content is all over the internet. No matter how much you optimize your content, duplicate content sometimes occurs unintentionally.
It could be a product description on your e-commerce site that you borrowed from the original manufacturer (seller). Or some boilerplate text on your blog. Or a quote you extracted from another blog to add value to your own post.
It happens in several ways. Google has acknowledged that roughly 25-30% of the web, or even more, is duplicate content.
And it's among the top 5 SEO issues that most websites face when it comes to optimization.
Defining duplicate content
According to Google's definition, duplicate content refers to:
"[…] blocks of content within or across domains that either completely match other content or are appreciably similar."
Google understands the issues around duplicate content. Matt Cutts has stated that duplicate content occurs all over the internet, across blog posts, web pages, and social media.
“[…] you don’t have to worry about it. Google doesn’t treat duplicate content as spam. It is true that Google only wants to show one of those pages in their search results, which may feel like a penalty if your content is not chosen — but it is not.”
Most SEOs do worry about duplicate content on their sites. Can it affect your SEO strategy? Certainly, but your site doesn't get penalized.
There has never been and there is no such thing as “Google’s Duplicate Content Penalty.”
As noted, Google doesn't penalize websites for duplicate content, but it does discourage it.
From Google's definition above, there are two types of duplicate content: duplicates that occur within your own domain and duplicates that occur across multiple domains.
How duplicate content occurs within a site
It could be anything you use to interact with your users: a blog post, e-commerce products, and so on. Here are some instances that cause duplicate content to occur within your domain.
Inconsistent URL structures
Consider variations such as http://yoursite.com, http://www.yoursite.com, https://yoursite.com, and http://yoursite.com/index.html. To a human, they all mean the same thing: yoursite.com. To search engine bots, however, each one is a distinct URL and is interpreted differently. When the bots crawl them, they treat the resulting pages as duplicate content.
Your site may also use URL parameters generated for tracking purposes, which cause the same duplicate content issues.
Boilerplate content is another culprit: global navigation at the top or bottom of every page (Home, About Us, etc.), or content your CMS places in a shared area such as a sidebar or navbar, including its links.
How duplicate content occurs across different domains
Content curation
Curation is a method of looking for ideas and stories on other sites and turning them into blog posts for your own. You may borrow excerpts and quotes from those original sites. To stay safe, make your post as unique as possible and explain the ideas from your own perspective, having done thorough research on the topic.
Content scraping
Scraping is copying content from other sites without permission, which poses a risk of being penalized by Google. Your site is especially at risk if your domain is still young and offers low-quality content.
Many sites perform content syndication: Buffer, WordStream, Kissmetrics, Help Scout, Content Marketing Institute, LearnVest, and more. Syndication multiplies copies of the same content, yet surprisingly Google doesn't penalize them.
Even Google does something similar in its own SERPs: it fetches content from a top result and shows it directly on the results page, which, by Wikipedia's definition, is content scraping.
SEO Perspective: Problems caused by duplicate content
As stated by Google, duplicate content does not earn your site a penalty, but it affects how your site ranks in SERPs, along with your other SEO strategies. Let's look at some of the ways duplicate content affects your site.
Link popularity dilution (ranking power)
This problem is caused by inconsistent link structure. When different versions of a product or blog post URL exist, each URL attracts its own backlinks and traffic even though the destination is the same, so no single page accumulates the authority it otherwise would.
Wasted search engine crawler resources
Google's crawlers visit your site at a rate based on how frequently you publish fresh content. The problem comes when those crawlers find the same content at various URLs: you lose crawl cycles, and the crawlers may even skip your newly published content. Losing crawl cycles this way hurts your SEO strategy and delays the indexing of new content.
There are other problems caused by duplicate content, but one more is worth covering before we move on.
Confuses search engines
Which URL of the duplicated content should Google rank in SERPs? It's a hard call. When Google encounters duplicates across the web, it has to decide which version is the original and most relevant to the user's search query.
That's because Google avoids displaying the same content twice in its SERPs, so it must pick one version of the duplicated page to show, and the pick isn't always the best one for the user or for your site.
How to Identify Duplicate Content
Since duplicate content hurts your rankings, you need a way to find and resolve the issue on your site.
The good news is that duplicates are identifiable and easily removed from within your own domain.
Here are the best ways to detect duplicate content on your site.
Google site search
Perhaps the easiest, and free, way to identify duplicates. Perform a Google search combining the site: operator with a keyword from articles you suspect have duplicate URLs.
E.g. site:mysite.com "content keyword"
Google will search within your site and display the content containing your keyword, giving you an opportunity to spot duplicates.
Google Webmaster Tools
Within your Google Webmaster Tools account, go to Optimization > HTML Improvements to see a list of issues affecting your site's user experience, performance, and likely its search visibility; duplicate title tags and meta descriptions there often point to duplicate pages. You can also check the crawl stats in your dashboard.
Using site crawlers
Tools like Screaming Frog and Xenu do a lot of the work of detecting duplicate content. Use them to analyze the various kinds of duplicate content and URL parameter issues present on your site.
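If you export a crawl (URL plus page body), even a small script can flag exact duplicates. Here is a minimal Python sketch, using hypothetical URLs, that fingerprints each page's normalized text and groups URLs whose bodies match:

```python
import hashlib
import re

def content_fingerprint(html_text):
    """Hash a page's normalized text so identical bodies collide."""
    text = re.sub(r"<[^>]+>", " ", html_text)         # strip tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_duplicates(pages):
    """Group URLs whose body text produces the same fingerprint."""
    groups = {}
    for url, body in pages.items():
        groups.setdefault(content_fingerprint(body), []).append(url)
    # Keep only fingerprints shared by two or more URLs.
    return [urls for urls in groups.values() if len(urls) > 1]

# Hypothetical crawl export: URL -> page HTML
pages = {
    "http://example.com/shirt": "<p>Classic cotton shirt</p>",
    "http://example.com/shirt?utm_source=mail": "<p>Classic  cotton shirt</p>",
    "http://example.com/about": "<p>About our store</p>",
}
print(find_duplicates(pages))
# [['http://example.com/shirt', 'http://example.com/shirt?utm_source=mail']]
```

This only catches exact matches after normalization; dedicated crawlers also detect near-duplicates, so treat the script as a first pass.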
How to solve Duplicate Content Issues
301 redirects
A 301 redirect permanently sends one URL to another, typically from a duplicate to the original. It is most helpful when you find there are several paths users can take to reach the same content on your site. For instance:
http://example.com/home or http://home.example.com
It's also the way to consolidate authority for your content and send traffic from the alternate URLs to your preferred URL.
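As a sketch, assuming an Apache server with mod_rewrite enabled and the hypothetical URLs above, the redirects could look like this in .htaccess:

```apache
# Hypothetical .htaccess rules (Apache + mod_rewrite assumed)
RewriteEngine On

# Send the /home path to the canonical homepage
RewriteRule ^home/?$ https://example.com/ [R=301,L]

# Send the home.example.com subdomain to the main domain
RewriteCond %{HTTP_HOST} ^home\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```

Other servers have equivalents; on nginx, for example, a `return 301` directive in the server block does the same job.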
Rel=canonical
This tag is implemented in the <head> section of your webpage's HTML. It tells Google that the current page is a copy of original content found at another URL.
Here is an example of duplicate page B referencing page A:
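Assuming the original page A lives at the hypothetical URL https://example.com/page-a/, duplicate page B would carry this tag:

```html
<!-- In the <head> of duplicate page B, pointing at original page A -->
<link rel="canonical" href="https://example.com/page-a/" />
```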
Link juice is transferred to the original page, boosting its ranking power. Unlike a 301 redirect, which takes the duplicate URL out of use entirely, rel=canonical keeps the duplicate page live and accessible.
Meta Robots Tag
This method prevents duplicate pages from being indexed by search engines. You simply add a meta robots tag with the "noindex" parameter to each duplicate page.
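A minimal example, added to the <head> of each duplicate page you want kept out of the index (the "follow" value is optional and lets crawlers still follow the page's links):

```html
<!-- Keep this duplicate page out of the index, but let its links be followed -->
<meta name="robots" content="noindex, follow">
```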
Consistent URL structure
URL inconsistency is a major source of duplicate content. The best solution is to standardize on a preferred URL structure and make proper use of canonical tags. Choose either the www or the non-www version of your domain, and likewise commit to either the HTTP or HTTPS version of your web pages.
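The standardization above can be sketched in a few lines. This is an illustrative Python sketch under one assumed policy (HTTPS, non-www host, no trailing slash, tracking parameters stripped); adapt the rules to whichever versions you standardized on:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed policy: these tracking parameters never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonicalize(url):
    """Map every variant of a URL to one preferred form."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]          # non-www policy
    path = path.rstrip("/") or "/"             # no trailing slash
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit(("https", netloc, path, urlencode(kept), ""))

print(canonicalize("http://WWW.Example.com/blog/?utm_source=mail"))
# https://example.com/blog
```

Running every variant through one function like this, on the server or in your crawl analysis, collapses the URL variations that would otherwise register as duplicates.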
How Google handles duplicate content
At the end of the day, Google must display only one copy of duplicated content in its SERPs.
As such, Google takes identical content seriously and, in manipulative cases, can keep your site out of the top of the SERPs, or out of the index altogether.
Google’s policy on how it handles duplicate content is:
“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results”
Per this policy, Google doesn't pass a penalty in the ordinary case, but duplicate content still hurts your SEO. Make it a habit to check that your site's pages aren't duplicated.