rethink canonical links #414

@fthobe

Description

Caution

Blocked by #417

Brief definition of what a canonical link does:

A canonical link tells a search engine which page in a collection of similar or identical pages is the primary one.

What Google says:

Canonicalization is the process of selecting the representative –canonical– URL of a piece of content. Consequently, a canonical URL is the URL of a page that Google chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of the otherwise duplicate content in its search results.

The practical outcome is that if the canonical is not set correctly, none of the pages is considered really relevant: they all share the relevance among each other, and search engines usually penalize all of them.

What's the issue

Current canonical tag setup

Any product is reachable under multiple URLs:

  1. https://example.com/products/{slug}
  2. https://example.com/products/{ID}
  3. https://example.com/products/{historyslug} (if present)

One URL is current; all the others are historical, not search friendly, or kept only for legacy reasons (off-site links that still perform well). 1 could rank well, but has to share visibility with 2 (which doesn't rank well because it has no keywords in the slug) and 3 (which doesn't receive proper internal linking, as all internal links point to the current slug).
Nevertheless, the slug is set on 1, which is the intended canonical URL. 2 should either be redirected or return a 404, depending on how you want to see it, and 3 should 301 redirect to 1 to avoid losing backlinks previously created on other websites.

The commit message here contains

Generates a simple canonical tag based on the request path,…

which is exactly the opposite of what canonical tags are made for. A canonical tag should indicate the URL of the primary HTML page, not the request path, so that search engines know which page is dominant in a collection of pages derived from the primary one and duplicate content is avoided.
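To make the difference concrete, here is a minimal Ruby sketch comparing the two approaches. It is not Solidus code: `Product`, `STORE_URL`, `canonical_from_request`, and `canonical_from_resource` are hypothetical names, and the product data is made up for illustration.

```ruby
# Hypothetical stand-in for a product record; not the Solidus model.
Product = Struct.new(:id, :slug, :history_slugs)

STORE_URL = "https://example.com"

# Request-path approach: the canonical mirrors whatever URL was requested.
def canonical_from_request(request_path)
  "#{STORE_URL}#{request_path}"
end

# Resource approach: the canonical is derived from the product's current slug.
def canonical_from_resource(product)
  "#{STORE_URL}/products/#{product.slug}"
end

product = Product.new(42, "red-shirt", ["crimson-shirt"])
paths = ["/products/red-shirt", "/products/42", "/products/crimson-shirt"]

request_based  = paths.map { |path| canonical_from_request(path) }.uniq
resource_based = paths.map { canonical_from_resource(product) }.uniq

puts request_based.size   # 3: one canonical per URL variant
puts resource_based.size  # 1: a single canonical for the product
```

With the request-path approach each URL variant declares itself canonical, so the search engine gets no hint about which of the three to prefer.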

What should be done?

Throw out all current canonical logic and reduce the canonical to a sane default.

A sane default would be that the canonical always renders the correct current {storeurl}/{language}/{resourceroute}/{resource-slug}. Globalize should probably override something here in case of translation.
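The default described above could be sketched roughly as follows. This is a plain Ruby illustration, not Solidus or Globalize API: the `canonical_url` helper and its keyword arguments are assumptions, as is treating the language segment as optional for single-language stores.

```ruby
# Hypothetical helper: builds {storeurl}/{language}/{resourceroute}/{slug}.
# The caller is expected to pass the *current* slug (translated, if applicable).
def canonical_url(store_url:, resource_route:, slug:, language: nil)
  segments = [store_url.chomp("/"), language, resource_route, slug]
  # Drop the language segment when it is nil or empty (assumption: optional).
  segments.compact.reject(&:empty?).join("/")
end

puts canonical_url(
  store_url: "https://example.com/",
  language: "en",
  resource_route: "products",
  slug: "red-shirt"
)
# https://example.com/en/products/red-shirt
```

Because the URL is built from the resource rather than from the request, every URL variant of the same product renders the same canonical.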

What should also be done?

We have mitigated the problem through #413 (which you should approve :)), which redirects FriendlyId history URLs and IDs (as in example 2) to the current slug. So while the construction of the canonical is still not that great, the problem is mitigated.
We are working on getting the same behavior for taxons and for content / blog pages.
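The redirect behavior can be illustrated with a small, framework-free Ruby sketch. It is not the code from #413: `ProductRecord`, `CATALOG`, and `resolve_product_request` are hypothetical, and the lookup is a plain in-memory search rather than a FriendlyId query.

```ruby
# Hypothetical stand-in for a product with a current slug and slug history.
ProductRecord = Struct.new(:id, :slug, :history_slugs)

# Returns [status, location] for a request to /products/{param}.
def resolve_product_request(param, products)
  product = products.find do |candidate|
    candidate.slug == param ||
      candidate.id.to_s == param ||
      candidate.history_slugs.include?(param)
  end
  return [404, nil] if product.nil?
  return [200, nil] if product.slug == param # already on the current slug

  # ID or history slug: permanent redirect to the current slug.
  [301, "/products/#{product.slug}"]
end

CATALOG = [ProductRecord.new(42, "red-shirt", ["crimson-shirt"])].freeze

p resolve_product_request("red-shirt", CATALOG)      # [200, nil]
p resolve_product_request("42", CATALOG)             # [301, "/products/red-shirt"]
p resolve_product_request("crimson-shirt", CATALOG)  # [301, "/products/red-shirt"]
p resolve_product_request("missing", CATALOG)        # [404, nil]
```

Once IDs and history slugs redirect, only the current slug ever renders a page, so even a request-path based canonical ends up pointing at the right URL.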

Solidus Version:
Any

To Reproduce
Create a product and navigate to that product via:

  1. https://example.com/products/{slug}
  2. https://example.com/products/{ID}
  3. https://example.com/products/{historyslug} (if present)

Current behavior
All three URLs return distinct canonical links despite pointing to the same resource.

Expected behavior
2 and 3 return 301 redirects to 1, and 1 has a canonical link that matches the configured slug.
