This will most definitely be one of the most difficult concepts to grasp when learning on-page SEO.
Many on-page SEO guides attempt to explain canonicalization, but few actually do the topic justice. To understand why you need to implement the rel=canonical tag, you first need to understand how URLs and permalinks work.
Content management systems and eCommerce frameworks such as WordPress and Shopify make life so easy, but they can be a nightmare when they show the same content on multiple pages. For instance, all of these example URLs might display the same products on an eCommerce store:
http://www.example.com/shirts/size/large
http://www.example.com/shirts/color/blue
http://www.example.com/shirts/style/tshirt
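To see how this happens in practice, here is a minimal Python sketch that groups URLs whose page bodies are identical. The page bodies are made up for illustration, and hashing only catches exact duplicates; real faceted pages usually differ slightly in markup, so treat this as a starting point rather than a complete duplicate detector.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose page bodies are byte-for-byte identical (after trimming whitespace)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        # Hash the whitespace-trimmed body so identical content lands in the same bucket.
        digest = hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only buckets holding more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page bodies; in practice you would fetch each URL.
pages = {
    "http://www.example.com/shirts/size/large": "<ul><li>Blue T-Shirt</li></ul>",
    "http://www.example.com/shirts/color/blue": "<ul><li>Blue T-Shirt</li></ul>",
    "http://www.example.com/shirts/style/tshirt": "<ul><li>Blue T-Shirt</li></ul>",
    "http://www.example.com/about": "<p>About us</p>",
}

for group in find_duplicate_groups(pages):
    print(group)
```

Each printed group is a set of URLs that needs a single canonical version.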
Now imagine a conversation between your website and Googlebot:
Your website: I sometimes create multiple versions of the same page.
Googlebot: Ok, but how am I supposed to tell which one is the original?
Your website: I’ll add a “rel=canonical” tag to the <head> of each duplicate version, pointing to the original.
Googlebot: Sounds good to me 😉
So how do we go about doing this? On a normal website, add a link element like the following to the <head> section of each duplicate page, pointing to the original version:
<link rel="canonical" href="http://example.com/the-original-version/">
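If you want to check which canonical URL a page actually declares, a minimal audit sketch using Python's standard-library html.parser could look like this. The names CanonicalFinder and find_canonicals are hypothetical helpers, not part of any SEO tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, and may be missing (None).
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str) -> list[str]:
    """Return every canonical URL declared in an HTML document, in order."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/the-original-version/">'
    "</head><body>...</body></html>"
)
print(find_canonicals(page))  # ['http://example.com/the-original-version/']
```

Returning a list rather than a single value makes it easy to spot pages that declare more than one canonical URL, which sends search engines a conflicting signal.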
If your website runs on a CMS or eCommerce platform, you most likely already have a way to implement this, such as a plugin or add-on, which makes life much easier. Some plugins will even make bulk determinations based on known issues within certain frameworks.
For instance, in WordPress, category, tag, and archive pages tend to produce duplicate content, so many canonical plugins will ask you whether you want these pages canonicalized.
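As a rough illustration of the kind of rule such a plugin might apply, here is a Python sketch that flags archive-type URLs for review. It assumes WordPress's default permalink bases ("category" and "tag") and default date archives; it is a hypothetical helper, not how any particular plugin actually works, and what you do with a flagged URL is still your call.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: default permalink bases ("category", "tag") and date archives
# such as /2023/, /2023/05/ or /2023/05/12/, optionally paginated (/page/2/).
ARCHIVE_PATTERNS = [
    ("category", re.compile(r"^/category/[^/]+(/.*)?$")),
    ("tag", re.compile(r"^/tag/[^/]+(/.*)?$")),
    ("date_archive", re.compile(r"^/\d{4}(/\d{2}){0,2}/?(page/\d+/?)?$")),
]


def classify_archive(url: str) -> Optional[str]:
    """Return the archive type a URL looks like, or None for regular posts and pages."""
    path = urlsplit(url).path or "/"
    for label, pattern in ARCHIVE_PATTERNS:
        if pattern.match(path):
            return label
    return None


# Hypothetical URLs a plugin might list for you to review.
for url in [
    "https://www.example.com/category/news/",
    "https://www.example.com/tag/seo/page/2/",
    "https://www.example.com/2023/05/",
    "https://www.example.com/my-first-post/",
]:
    print(url, "->", classify_archive(url))
```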
Just as there is on-page and off-page SEO, there is also on-site and off-site canonicalization.
Off-site canonicalization is actually a tad easier to grasp. Let’s say you have two versions of the same blog post: the first is on your website, and the other is published on the New York Times.
Since the New York Times version of the post would technically be considered duplicate content, we would ask them to add the rel=canonical tag to their copy, pointing back to our website. In essence, this tells Googlebot “Hey, the real version is actually on this site, ignore the New York Times version. Thanks!”
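When you ask a partner to canonicalize a syndicated copy, it helps to hand them the exact tag. This Python sketch builds it from your original URL; the URL and the function name are hypothetical. It insists on an absolute URL because a relative href on the partner's page would resolve against the partner's domain, not yours.

```python
from html import escape
from urllib.parse import urlsplit


def cross_domain_canonical_tag(original_url: str) -> str:
    """Build the <link> element a syndication partner pastes into the <head> of its copy."""
    parts = urlsplit(original_url)
    # A relative href would resolve against the partner's domain, so require an absolute URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {original_url!r}")
    # escape() turns & into &amp; and, with quote=True, also escapes quote characters,
    # keeping the attribute value well-formed.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


print(cross_domain_canonical_tag("https://www.example.com/blog/my-original-post/"))
# <link rel="canonical" href="https://www.example.com/blog/my-original-post/">
```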
In short, the rel=canonical tag can help you with duplicate content on your own site as well as content syndicated to other websites. There is one catch: you need control over those websites, or at least a publisher willing to add the tag for you. So if you decide to publish content on LinkedIn, you are, unfortunately, out of luck, because you can’t edit LinkedIn’s HTML <head>.
Another way to send the rel=canonical signal to Google is through an HTTP header sent by your web server. This approach is a little more difficult to implement and has a few pros and a lot of cons.
On the pro side, the rel=canonical HTTP header is great because you can canonicalize PDFs and other resources whose HTML you can’t edit.
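The header takes the form Link: <URL>; rel="canonical". Below is a minimal Python WSGI sketch, assuming a hypothetical setup where the same PDF is reachable at two paths and /downloads/white-paper.pdf is the preferred copy. In practice you would more likely set this header in your web server's configuration than in application code.

```python
# Hypothetical mapping from served paths to the preferred (canonical) URL.
CANONICAL_FOR_PATH = {
    "/files/white-paper.pdf": "https://www.example.com/downloads/white-paper.pdf",
    "/downloads/white-paper.pdf": "https://www.example.com/downloads/white-paper.pdf",
}


def canonical_link_header(url: str) -> tuple[str, str]:
    """Return a (name, value) pair for an HTTP Link header with rel="canonical"."""
    return ("Link", f'<{url}>; rel="canonical"')


def app(environ, start_response):
    """Minimal WSGI app that serves placeholder PDF bytes plus a canonical Link header."""
    path = environ.get("PATH_INFO", "/")
    if path not in CANONICAL_FOR_PATH:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(CANONICAL_FOR_PATH[path]),
    ]
    start_response("200 OK", headers)
    # Placeholder body; a real app would stream the file from disk.
    return [b"%PDF-1.4 placeholder"]
```

To try it locally, you could serve app with wsgiref.simple_server.make_server and inspect the response headers with curl -i.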
There are a few obvious cons:
But fear not, chances are this isn’t that big of a deal. Unless your website is really PDF-heavy, with a lot of PDFs scattered across your site and off-site, this shouldn’t be a problem.