When Canonical Tags Go Bad

The rel="canonical" tag has been a godsend for many of us SEOs. In case you have been having a real life instead of devoting your time to search, here is a quick breakdown of how it is used:

Situation

You have two pages (blue-widgets.html and pink-widgets.html). They are identical apart from the colour.

Problem

These two pages are duplicates as far as Googlebot is concerned. While duplicate pages don’t actually incur a penalty, it is unclear which should be indexed (and only one will be). The result is generally that they keep swapping places in the SERPs while steadily falling in rank. It’s not a penalty, but it sure feels like one if you don’t know what’s happening.

Solution

Adding a rel="canonical" tag to both pages, pointing to just one of them, tells Google which one you would prefer to have indexed. So the solution in this case is to have a tag reading:

<link rel="canonical" href="http://www.example.com/pink-widgets.html" />

and place it in the <head> section of both pages.

Now, when search engines crawl either page, they know to index only the pink-widgets.html page. The result is better ranking and no jumping about swapping places.
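If you want to confirm that both pages really carry the same tag, a few lines of script will do it. The following is only a minimal sketch using Python's standard library; the class and function names are my own invention, and the HTML is a cut-down stand-in for the blue widgets page.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Stand-in for blue-widgets.html (illustrative markup only).
page = """<html><head>
<link rel="canonical" href="http://www.example.com/pink-widgets.html" />
</head><body>Blue widgets</body></html>"""

print(find_canonicals(page))
```

A page that returns an empty list has no canonical tag, and one that returns more than one entry has conflicting tags, which is worth fixing before blaming the search engine.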

Rel="canonical" doesn’t always work

I’m not talking about improper implementation. I’m assuming that everything is as it should be.

I had a case recently where a site had products as top-level orphans – nearly. The products quite logically belonged in multiple categories. This would result in a situation like in diagram 1 below:

Duplicate content flow chart
Diag.1. Because the path to the product is different we now have duplicate content.

The solution to the diagram above is to place the product as a top level orphan as shown in diagram 2 below:

Top Level Orphans
Diag.2. Now we no longer have duplicate content (unless the category lists only contain one item of course).

The other solution is to choose a single category path and use canonical tags on the product pages to indicate that the product on that path is the one to be indexed.

Still doesn’t sound like a problem, right? The company in question had implemented the top level orphan method in diagram 2, which would be my preferred method. There would have been no problem had they not decided afterwards to add breadcrumbs.

Why were breadcrumbs a problem?

All of a sudden they needed to pass the user’s route information from cat-1 or cat-2 to prod.html. The most reliable way to do that was to add the information to the URL, so now we had a structure that looked like diagram 1 but with a URL of “home/prod1.html” on the listing from cat-1 and “home/prod2.html” on the listing from cat-2.

This requires us to use canonical tags to prevent duplicate content issues. You would implement canonical tags just like you would for diagram 1.
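To make that concrete, here is a rough sketch of how a template might choose the tag for each URL variant. It is not the code from the site in question: the lookup table, the URLs and the function name are all hypothetical, and a real shop would pull the mapping from its product database.

```python
from html import escape

# Hypothetical lookup: every URL variant of a product maps to the single
# URL we would like indexed. The chosen URL also maps to itself.
CANONICAL_FOR = {
    "http://www.example.com/home/cat-1/prod.html": "http://www.example.com/home/cat-1/prod.html",
    "http://www.example.com/home/cat-2/prod.html": "http://www.example.com/home/cat-1/prod.html",
}


def canonical_link_tag(url):
    """Build the <link> tag to print in the <head> of the page at `url`."""
    # Pages with no entry in the table are treated as their own canonical.
    target = CANONICAL_FOR.get(url, url)
    return '<link rel="canonical" href="{}" />'.format(escape(target, quote=True))


for variant in CANONICAL_FOR:
    print(variant, "->", canonical_link_tag(variant))
```

Both variants end up printing exactly the same tag, which is the whole point: whichever route the crawler takes, it is told about one URL only.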

Still no problem right?

Wrong!

Why you can’t rely on canonical tags

Canonical tags are a “serving suggestion” to search engines. They promise to try to honour them, but reserve the right to override them. This is great if you make a mess of implementing them. Google in particular should notice and then ignore the tags.

However, in this case there was no error in the implementation of the canonical tags. Google for the most part honoured them too. Sometimes, however, it didn’t.

The result was inconsistent. Pages were bouncing around all over the place. It was hard to find out what the extent of the problem was too because as one canonical fixed itself, another would be broken. The only way to find them was to check which one was indexed. With a lot of products that’s a difficult and frustrating job.
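The checking itself is manual, but the comparison afterwards can be scripted. This is a hedged sketch only: it assumes you have already noted down, by hand or otherwise, which URL the search engine is showing for each product. The function name, product keys and URLs are made up for illustration.

```python
def find_overridden(declared, observed):
    """Return products whose indexed URL differs from the declared canonical.

    declared: product id -> canonical URL set in the tag
    observed: product id -> URL seen in the search results (None if not found)
    """
    mismatches = {}
    for product, canonical in declared.items():
        seen = observed.get(product)
        # Products we could not find in the index are skipped, not flagged.
        if seen is not None and seen != canonical:
            mismatches[product] = (canonical, seen)
    return mismatches


# Illustrative data only.
declared = {
    "prod-a": "http://www.example.com/home/cat-1/prod-a.html",
    "prod-b": "http://www.example.com/home/cat-1/prod-b.html",
}
observed = {
    "prod-a": "http://www.example.com/home/cat-2/prod-a.html",  # overridden
    "prod-b": "http://www.example.com/home/cat-1/prod-b.html",  # honoured
}

print(find_overridden(declared, observed))
```

Because the situation kept changing between crawls, a check like this only gives a snapshot, so it needs repeating.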

The problem was compounded by the products fitting into not just two, but multiple categories. In fact, after crawling the site it became apparent that 75% of the site was product pages with canonical tags pointing to a different product page.
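If you have crawl data of your own, the same figure is easy to work out. The sketch below assumes the crawl has been boiled down to a simple mapping of page URL to declared canonical; the function name and the four sample URLs are invented purely for illustration and are not data from the site in question.

```python
def canonicalised_share(crawl):
    """Fraction of crawled pages whose canonical tag points at a different URL.

    crawl: page URL -> canonical URL declared on that page (None if no tag)
    """
    if not crawl:
        return 0.0
    # Plain string comparison: assumes URLs are already normalised
    # (same scheme, host, trailing slashes and so on).
    elsewhere = sum(
        1 for url, canonical in crawl.items() if canonical and canonical != url
    )
    return elsewhere / len(crawl)


# Made-up sample: one product reachable through four category paths.
sample_crawl = {
    "/home/cat-1/product.html": "/home/cat-1/product.html",
    "/home/cat-2/product.html": "/home/cat-1/product.html",
    "/home/cat-3/product.html": "/home/cat-1/product.html",
    "/home/cat-4/product.html": "/home/cat-1/product.html",
}

print("{:.0%}".format(canonicalised_share(sample_crawl)))
```

A self-referencing canonical is deliberately not counted, since only tags pointing somewhere else ask the search engine to drop a page.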

It would appear that the sheer volume of canonical tags was the problem. I haven’t found this happening on any other site so I’m not sure if it is volume or percentage of site that is the problem. Maybe a bit of both.

What it does mean is that, where possible, you are better off implementing the model in diagram 2 rather than relying on canonical tags. Canonical tags should be used as a last resort rather than a first resort as I see so often.

Solutions

There are a few possible solutions. None of them are ideal.

One possible solution would be to remove breadcrumbs from just the product pages. I’ve seen that done, but I don’t like it.

Another way would be to pass the information in a session. If a user has disabled cookies in their browser, so that the session can’t be kept, they will get nothing though.

You could use the HTTP referer. This too can be problematic. Browsers and some anti-virus software can prevent it from being passed. Also, if you have a site that is part HTTPS and part HTTP (which you really shouldn’t, by the way), then the referer information will not pass from an HTTPS page to an HTTP page. Within a single all-HTTP or all-HTTPS site it is OK, so long as you are not relying on referer data being passed across domains.

Finally you could change the type of breadcrumb to attribute crumbs instead of location crumbs. This wouldn’t have worked in this case though.

What we did

In our case using the HTTP referer was the easiest fix since the coding didn’t have to be changed radically from what we started with. We prevented the loss of a visible breadcrumb for users whose browsers prevent the passing of the referer by putting in a default crumb matching the original canonical.
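The logic is simple enough to sketch. This is not the site’s actual code: the host name, category slugs, labels and function names below are all assumptions made for the example, and it presumes the category listing URLs contain the category slug as a path segment.

```python
from urllib.parse import urlparse

# All of these values are hypothetical.
SITE_HOST = "www.example.com"
CATEGORY_NAMES = {"cat-1": "Category 1", "cat-2": "Category 2"}
DEFAULT_CATEGORY = "cat-1"  # matches the path the original canonical pointed to


def breadcrumb_category(referer):
    """Pick the category crumb from the referer, or fall back to the default."""
    if referer:
        parsed = urlparse(referer)
        # Only trust referers from our own site.
        if parsed.hostname == SITE_HOST:
            for segment in parsed.path.split("/"):
                if segment in CATEGORY_NAMES:
                    return segment
    return DEFAULT_CATEGORY


def breadcrumb(referer, product_name):
    """Return the crumb trail for a product page as a list of labels."""
    category = breadcrumb_category(referer)
    return ["Home", CATEGORY_NAMES[category], product_name]


print(breadcrumb("http://www.example.com/home/cat-2/", "Prod"))
print(breadcrumb(None, "Prod"))  # referer blocked: default crumb is shown
```

The product page itself lives at one URL whatever the referer says, so there is nothing left for a canonical tag to disambiguate.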

The result is a site that is only a quarter of the size, so it’s a good housekeeping exercise as well as making it impossible for Google to get confused about which product page to index.

Published by

Ian Wortley

Ian knows a thing or two about SEO... and Adwords... and UX... and a few more things besides that.

  • Ian

    This is in response to a question posed by David Quaid on Linkedin about this article. The answer was too long to fit it seems.

    Hi David,

    I wrote you a slightly lengthy reply yesterday evening, but it seems to have not saved so I’ll summarise it again.

    I now have some more data. Interestingly, the number of indexed pages on the site increased sharply in Webmaster Tools as a result of getting rid of the canonical tags. We knew that there was an issue with low index rates in some areas of the site but had attributed it to other reasons, assuming that the canonical tags were doing what they should.

    We also saw rank increases. This means that, while not actually penalised, the confusion was causing the concerned pages to be treated as duplicate content pages would. Still ranking, but lower than they should be.

    Why would they not work?

    Here’s one theory on what was going on:

    home/cat-4/product.html = home/motor-oil/fiat-uno/wonderoil.html

    home/cat-3/product.html = home/motor-oil/seat-leon/wonderoil.html

    home/cat-2/product.html = home/motor-oil/ford-fiesta/wonderoil.html

    home/cat-1/product.html = home/oil-companies/mobil/wonderoil.html

    The canonical tag went to home/cat-1/product.html. However, users rarely searched for motor oil by brand. They want to know what oil will go in their car. We can’t have a canonical going to the Fiesta page because a user may own a Fiat Uno instead.

    Google seemed to be “correcting” our canonical URL to go to the more popular Fiesta page. But then it would change its mind again on the next crawl. So my conclusion is that they most likely use search volume metrics to ascertain the relevance of a canonical tag. It is also possible that they use actual page visits, which would also show the cat-1 page to be least popular.

    In terms of volume, you can see how, in that little example with only a few cars you would have a massive volume / proportion of canonicalised pages pointing to other pages. In this case we are looking at hundreds of thousands.

    When to get worried / take action?

    1. Webmaster Tools shows more pages than you would expect not indexed.

    2. A high percentage of the site uses canonical tags. I think a high volume of pages with canonical tags pointing to other pages is more of an issue. I won’t even speculate as to actual numbers to get worried at.

    3. The hairs on the back of your neck stand up.

    I would not get worried unless all three points are the case. Any one of them on its own could point to something entirely different. The only way to check is to do searches and see which URLs are coming up. If somebody can come up with a better way I’ll be delighted.

    I hope that answers your questions a little.

    Regards,

    Ian.