Duplicate content – what is it, when does rel=canonical apply and how does it solve the problem?
In 2011, Google launched the Panda algorithm, which analyzes sites and demotes them in the rankings when it finds poor-quality or duplicate content. In 2016, Google integrated Panda into its core ranking system, and since then duplicate content has become a major issue for website owners, especially because it is now applied as part of regular ranking rather than in occasional updates. Experts estimate that from 2016 to date at least 30% of websites worldwide have been affected: about 5% had to be rebuilt from scratch, and 25% had their visibility in search results restricted.
Beyond this, duplicate content has always been a main concern of SEO agencies, and it carries even more weight in on-page and off-page practice. Newcomers may assume at first glance that it is not necessarily a problem, but SEO specialists disagree: it should not be neglected, because duplicates make it difficult for search engines to give each page the importance it deserves.
What does “duplicate content” mean?
As the name suggests, duplicate content is content on our pages that is similar or identical to content on other sites, or to other pages of our own site. It can therefore mean the same content on multiple pages of one site, the same content as other sites, or both. One important point is often forgotten here: before our pages can reach users, search engines must index them page by page.
Since the launch of the Panda algorithm, websites that deliberately duplicate content can lose rankings very quickly, and the practice is regarded as a black hat optimization technique.
The problem occurs most often in online stores, where the wide variety of products makes it almost impossible to write unique, optimized content for every page, followed closely by news sites, whose content is uploaded daily.
Texts copied and posted on other pages also count as duplicate content. Unfortunately, the discussion here is broader, because search engines do not look for the source page; they display as the trusted result the page with the highest relevance. Relevance can take into account parameters such as the age of the site, the bounce rate or the traffic the site generates.
How does duplicate content appear?
No matter how well a website is optimized, if it contains duplicates in its texts or link structure, it first needs a detailed SEO analysis and then a strategy for resolving the situation.
In principle, the biggest duplicate content problems occur in online stores. The wide variety of products, the descriptions imposed by manufacturers (which also appear on several other sites), and the filters and sorting options are the most important aspects to consider.
In short, duplicate content typically arises from:
- non-canonical domains – the website is reachable at distinct web addresses, such as http://apptians.com, http://www.apptians.com, https://apptians.com, https://www.apptians.com, without 301 redirects to the preferred version set in Search Console
- incorrect implementation of the SSL certificate – when encryption is added, copies of the site may remain available on both the HTTP and the HTTPS version
- dynamic content – some sites use URL parameters to control content. As with session IDs, search engines interpret the resulting URLs as duplicates
- pagination – if multiple paginated pages share the same title or description, the duplicate content issue reappears
- the product page has multiple URLs – caused by applying filters or selecting certain categories on the site
- price or sorting filters (relevance, lower price, higher price) – each option generates a new URL with the same content
- URL variants carrying the AMP (Accelerated Mobile Pages) parameter are indexed alongside the standard URLs
- the platform automatically generates and saves multiple URLs when a post is assigned to multiple sections of the site.
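Several of the causes above come down to one page being reachable under many URLs. The following is a minimal Python sketch of how such variants could be collapsed to a single preferred address. It assumes the preferred version is HTTPS with the www prefix, and the parameter names in NON_CONTENT_PARAMS are hypothetical; a real site would list its own session, sorting and AMP parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: illustrative names of parameters that do not change the content.
NON_CONTENT_PARAMS = {"sessionid", "sort", "order", "view", "amp"}


def canonical_url(url: str) -> str:
    """Collapse protocol, www and parameter variants into one preferred URL."""
    parts = urlsplit(url)

    # Assumption: the preferred version is https:// with the www prefix.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    host = "www." + host

    # Keep only the query parameters that change what the page shows.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]

    path = parts.path or "/"
    return urlunsplit(("https", host, path, urlencode(kept), ""))


if __name__ == "__main__":
    for variant in (
        "http://apptians.com",
        "http://www.apptians.com",
        "https://apptians.com",
        "https://www.apptians.com",
    ):
        print(variant, "->", canonical_url(variant))
```

All four address variants from the list map to the same URL, which is the version that redirects and canonical tags should point to.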
Regardless of the type of website, SEO specialists recommend unique content. In online stores, the situation gets worse as the number of products grows. Most eCommerce platform owners simply copy product descriptions from a database they share with other distributors, and if the same product information is also added to the photos, the duplicate content ratio grows alarmingly. In this situation Google can scan the page, but it struggles to identify the subject, how the page relates to other pages and to the site as a whole, and why it should be relevant in searches.
Online marketing specialists also offer a solution to the problem of manufacturers' technical descriptions. Although this information is useful and creates a positive customer experience, Google identifies it as duplicate content too, which can lead to a penalty. The recommended approach is to feature visitors' opinions and experiences on the page and to move the technical characteristics of the products to social media.
How does duplicate content affect the SEO evolution of the website?
Now that we have covered what “duplicate” means, it is time to look at how it harms a site's progress. First of all, duplicate content (text, URL) leaves Google unsure which version to index. Second, Googlebot will not know which URL to assign metric data to or which version to rank in search results.
In other words, the main obstacles to a website's progress in search engines are:
- incorrect pages – when pages carry duplicate or only slightly different content, search engines choose for themselves which page to index, and it may not be the one you want
- low visibility – the most obvious problem with duplicate content is the low visibility of the website or page in the main search engines
- deficient indexing – a search engine may spend its time on duplicate pages rather than on those that are really important. Thus, if the proportion of duplicate content is very high, a significant part of what Google indexes is duplicates.
How to check whether a website has duplicate content
One way to check for duplicate content is to search for specific products, phrases or texts from the website, which shows whether titles and meta descriptions are duplicated. Although it requires more work, it remains the most reliable way to find duplicate content and gauge its share of the entire website.
Screaming Frog also provides a complete report on a website's duplicate content. SEO specialists frequently use this program to check links, images, meta tags, headings and other errors. It acts like a spider, and within a few minutes a complete report is ready to download.
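The manual check for duplicated titles and meta descriptions can be partly automated. The Python sketch below is not how Screaming Frog works; it is a simple illustration that assumes the pages have already been downloaded into a dictionary mapping each URL to its HTML. The names HeadExtractor and find_duplicates are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """pages maps URL -> HTML. Returns {(field, value): [urls]} for every
    title or description shared by two or more pages (case-insensitive)."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadExtractor()
        parser.feed(html)
        fields = (("title", parser.title.strip()), ("description", parser.description))
        for field, value in fields:
            if value:
                seen[(field, value.lower())].append(url)
    return {key: urls for key, urls in seen.items() if len(urls) > 1}
```

Each key in the result names a title or description and lists the URLs that share it, which gives a rough picture of how much of the site is affected.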
Is rel=canonical the best way to eliminate or minimize the effects of duplicate content?
The rel=canonical tag is a technical SEO element that does not affect the user experience on the website in any way, but once search engines identify it, it helps them prioritize the various landing pages for crawling, indexing and ranking.
In other words, the canonical tag tells Google and other search engines which URL is the source (the original page), so that this page can rise in the rankings. Search engines use the tag to combat duplicates and to let original pages outperform so-called “clones”. For example, when products can be displayed as either a grid or a list, the URL changes although the content is the same on both pages. So that search engines do not treat the two URLs as duplicates, SEO specialists use the canonical tag, which indicates that the variant pages belong to the parent category or subcategory.
This is an example of a website whose pages can be considered duplicates, due to the application of filters:
The rel=canonical tag was designed precisely to address the issue of duplicate content, so it remains the best solution. In practice, implementing it means adding one line of code to the <head> section of the page's HTML.
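As an illustration, the line in question is a link element with rel="canonical" and the preferred URL in its href attribute. The Python sketch below builds that element and reads it back from a page; the function names and the example URL path are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def canonical_tag(url: str) -> str:
    """Return the <link> element that belongs inside the page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first rel="canonical" link it encounters."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attrs.get("href")


def declared_canonical(html: str):
    """Return the canonical URL a page declares, or None if it has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    # Hypothetical category page; its filtered variants would carry the same tag.
    print(canonical_tag("https://www.apptians.com/shoes"))
```

A filtered or re-sorted version of a category page would carry the same tag as the category page itself, so every variant points search engines to one URL.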
Although the canonical tag is the best option, there is another way to reduce duplicate content on a website: the 301 redirect. This solution applies only when content is moved permanently from one page to another, and the details differ from case to case.
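To make the 301 option concrete, here is a minimal Python sketch of the decision a server would take for each request: answer normally, or reply with a permanent redirect. It assumes https://www.apptians.com is the preferred version, and the MOVED_PATHS table is a hypothetical example of content that was transferred to a new page. In practice this logic usually lives in the web server or platform configuration.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.apptians.com"  # assumption: the preferred site version

# Hypothetical table of pages whose content moved to a new address.
MOVED_PATHS = {"/old-shoes": "/shoes"}


def redirect_for(url: str):
    """Return (301, target) if the URL should redirect, otherwise None."""
    parts = urlsplit(url)
    path = parts.path or "/"
    new_path = MOVED_PATHS.get(path, path)

    target = urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, new_path, parts.query, ""))
    current = urlunsplit((parts.scheme, parts.hostname or "", path, parts.query, ""))

    # Already at the preferred address and not moved: serve the page as usual.
    if target == current:
        return None
    return 301, target
```

Unlike the canonical tag, which leaves every variant accessible, the redirect sends both visitors and search engines to the target URL.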
Canonicalization and the rel=canonical element
The rel=canonical tag and the canonicalization process are two of the measures available to keep duplicate URLs out of the index. The two work together to eliminate duplicates and protect websites from Google's penalties, such as reduced traffic, visibility and ranking.
Thus, a short differentiation is required between the two, as follows:
Canonicalization – is required when a website generates several URLs with the same content. The canonicalization process consists of choosing the best URL among the variants in the site architecture that serve the same content.
The rel=canonical tag, on the other hand, is an HTML element placed in the <head> section of a page. With this tag, Google and other search engines identify the original content and rank it above the duplicate or similar versions. In other words, once the tag is implemented, Google will return the preferred URL version in its results. When rel=canonical points from one URL to another, the search engine understands that the two links have joined forces, and the preferred page gains trust and authority.
Rel=canonical – how to use it correctly?
First of all, everything depends on choosing the original link, which should contain high-quality information that is relevant to visitors. When two or more links are considered similar, weigh the decision according to page structure, authority, the quality of the meta tags and the presence of customer reviews.
Duplicate content can have serious negative implications for a site's SEO campaign. Moreover, users will not be happy to find the same content they have already seen many times on other sites. From our point of view, it is better to have less content that is original and relevant than thousands of copied words.
Any site whose URLs display duplicate or similar content is a sure target for the search engine's core algorithm, and the losses in rank and traffic can be substantial.
So, avoid duplicate content as much as possible or, if you can't do this for various reasons, try to minimize the negative effects using the canonical tag. Set up and implemented correctly, it can provide effective results!
How do you approach the issue of duplicate content?