Crawl issues with e-commerce websites are something we regularly come across here at Boom. These usually arise from some combination of pagination (e.g. multiple pages of products in the same category), duplication (e.g. items that appear in multiple categories) and query parameters (e.g. changing the sort order or filtering the list of products). Occasionally we get some extra fun when a developer throws some AJAX into the mix (that’s Asynchronous JavaScript And XML, the technique that lets parts of a web page update without reloading the whole page, e.g. Google Maps, Facebook etc.). Why are these such an issue for search engines, you may wonder? Several reasons:

Not getting indexed: If a search engine can’t reach a page, it can’t be indexed and nobody will find it via searching.
Duplicate content: Since Google’s first Panda update, duplicate content has been known to carry the risk of a penalty, so you really don’t want to present Google with multiple URLs for identical or very similar content.
Crawl budget: Each time a search engine visits, it only spends a limited amount of effort crawling your site; this is known as the crawl budget. You don’t want to waste it on many versions of the same page at the expense of new pages, leaving some of your important pages unavailable via search.
Landing pages: If a search engine has multiple URLs for the same page, you have no control over which one it will present in a search result. Would it be the best outcome if a user clicked through from a search result to your category page, only to see the products listed with the most expensive at the top?

Let’s dive into some of the common issues that can cause these problems…

Query Parameters

These could be in use for a number of reasons, most commonly to implement sort orders and filtering. You will usually see query parameters at the end of a URL, following a question mark, e.g. https://www.patra.com/category/womensthermals?SortBy=2. The issue with this approach is that you can’t be sure whether search engines will crawl or ignore these URLs: search engines try to work out whether a parameter affects the page content, but you can’t guarantee they will get it right. In this example, the products are ordered by lowest price, so the content is likely to be very similar to other options such as ordering by highest price (particularly if there is only one page of products). As such, we wouldn’t want every sorting option URL to be indexed.
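To see how sort parameters multiply URLs for the same content, here is a minimal Python sketch of a crawl audit. It assumes `SortBy` is the only parameter that merely re-orders products (the name comes from the example URL; `SortBy=1` is a hypothetical second sort order), and `content_key` and `find_duplicates` are hypothetical helpers rather than part of any SEO tool.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only re-order products, never change them.
# "SortBy" comes from the example URL; replace the set with your own site's names.
NON_CONTENT_PARAMS = {"SortBy"}


def content_key(url: str) -> str:
    """Strip non-content parameters (and any #fragment) so URL variants line up."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in NON_CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def find_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group crawled URLs that differ only by non-content parameters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[content_key(url)].append(url)
    # A group with more than one member means several URLs serve the same content.
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "https://www.patra.com/category/womensthermals?SortBy=1",
    "https://www.patra.com/category/womensthermals?SortBy=2",
    "https://www.patra.com/category/womensthermals?SortBy=2&Page=2",
]
for key, members in find_duplicates(crawled).items():
    print(key, "<-", members)
```

The two sort orders of the first page collapse onto the bare category URL, while the `Page=2` URL stands on its own, which is exactly the split between "don’t index" and "do index" described above.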

However, in this example, we also have more than one page of products. Clicking to the second page gives us this URL: https://www.patra.com/category/womensthermals?SortBy=2&Page=2

So now we have a query parameter controlling pagination as well, which we do want indexing. We still don’t want the different sort orders indexed though, so what’s to be done? Ideally, three things:

Canonicalisation: Use the canonical tag to tell Google that whichever version of the URL it happens to be crawling right now, there is in fact only one URL it should use, and all the other versions should be treated as the same page. Where there are multiple parameters, keep only the ones you do want indexing, such as pagination. In the example above, this would be: <link rel="canonical" href="https://www.patra.com/category/womensthermals?Page=2">
Rel next & prev tags: Google also supports tags that help it understand when a page is part of a paginated sequence and where in that sequence it belongs. In a nutshell, the first page has only a rel="next" tag, the last page has only a rel="prev" tag, and the pages in between have both, each referencing the relevant previous and next URLs in the sequence. For page 2 of our example, these would be: <link rel="prev" href="https://www.patra.com/category/womensthermals"> <link rel="next" href="https://www.patra.com/category/womensthermals?Page=3">
Webmaster Tools URL Parameters: Google provides a mechanism to tell it whether URLs containing a particular parameter should be indexed or ignored (or to let Googlebot decide if you’re not sure yourself). If misused, this tool could stop pages from being indexed when you didn’t intend it, so only use it if you are confident that you know what the parameters do and what the consequences of your Webmaster Tools settings will be.
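The canonical and rel prev/next rules above can be sketched in Python as a function that emits the head tags for any variant of a category URL. It assumes `Page` is the only parameter worth keeping, that page 1 lives at the bare category URL (as in the rel="prev" example for page 2), and that you know the total number of pages; `head_tags` and `page_url` are hypothetical names. Google said in 2019 that it no longer uses rel prev/next for indexing, so treat those two tags as optional hints.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions: "Page" is the only parameter kept in canonical URLs, and
# page 1 is served at the bare category URL with no Page parameter.
PAGE_PARAM = "Page"


def page_url(base: str, page: int) -> str:
    """Clean URL for a page number; page 1 has no parameter at all."""
    return base if page == 1 else f"{base}?{urlencode({PAGE_PARAM: page})}"


def head_tags(url: str, last_page: int) -> list[str]:
    """Canonical plus rel prev/next <link> tags for any variant of a category URL."""
    parts = urlsplit(url)
    page = int(dict(parse_qsl(parts.query)).get(PAGE_PARAM, "1"))
    # Drop every parameter (SortBy, filters...) to get back to the category URL.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    tags = [f'<link rel="canonical" href="{page_url(base, page)}">']
    if page > 1:          # every page except the first links back
        tags.append(f'<link rel="prev" href="{page_url(base, page - 1)}">')
    if page < last_page:  # every page except the last links forward
        tags.append(f'<link rel="next" href="{page_url(base, page + 1)}">')
    return tags


for tag in head_tags("https://www.patra.com/category/womensthermals?SortBy=2&Page=2", last_page=3):
    print(tag)
```

Run against the sorted page 2 URL, this prints the same canonical, prev and next tags given in the examples above, with the `SortBy` parameter removed.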

Other Paginated URL Formats

Of course, not all sites use traditional query strings to handle pagination. Developers have often been told that query parameters are a problem for SEO, so alternatives will be in place, such as automatically rewriting query strings into a series of sub-directories (e.g. https://www.jannersmugs.co.uk/product-category/mugs-and-steins/page/2/). These aren’t a problem, because they are distinct URLs and the content of each will be different. It is debatable whether search engines treat URLs that are many sub-directories deep as less important, but I have seen little evidence of this if the pages in question are relevant and well-optimised.
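A rewrite from query strings to sub-directories can be sketched in Python as below. The `?Page=N` input format is an assumption for illustration (we don’t know how Janners Mugs actually generates its URLs); the hypothetical `to_path_url` shows the outward mapping, and `page_from_path` shows the page number a server-side rewrite rule would recover from the path.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical scheme modelled on /product-category/mugs-and-steins/page/2/.
PAGE_IN_PATH = re.compile(r"/page/(\d+)/?$")


def to_path_url(url: str) -> str:
    """Rewrite ?Page=N into a /page/N/ sub-directory, keeping other parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    page = next((value for name, value in params if name == "Page"), "1")
    others = [(name, value) for name, value in params if name != "Page"]
    path = parts.path.rstrip("/") + "/"
    if page != "1":  # page 1 stays at the category URL itself
        path += f"page/{page}/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(others), ""))


def page_from_path(path: str) -> int:
    """Page number a server-side rewrite would pass to the listing (default 1)."""
    match = PAGE_IN_PATH.search(path)
    return int(match.group(1)) if match else 1


print(to_path_url("https://www.jannersmugs.co.uk/product-category/mugs-and-steins?Page=2"))
print(page_from_path("/product-category/mugs-and-steins/page/2/"))
```

Each page ends up at its own clean, distinct URL, so there is nothing for a search engine to second-guess about the parameter.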

Then you have the AJAX method of pagination – loading different products into the page without changing the URL (like https://www.petsathome.com/shop/en/pets/cat/cat-food-and-treats/dry-cat-food) or using a hash instead of a question mark (https://www.stringsdirect.co.uk/c/541/strings/electric-guitar-strings-sets/#page-6). In both of these cases, you need to be very careful – we have seen several recent examples (thankfully all now fixed) where products listed on page two and beyond were not indexed, because to a search engine, there were no pages beyond the first one.

The reason for this is that, in the case of JavaScript links such as those used by Pets At Home, there is every chance that the search engine won’t follow them: there is too great a risk that they do something the search engine can’t handle, so they get ignored. In the case of URLs containing a hash symbol, everything after the # is expected to be an in-page anchor, the kind of link that jumps you down the page. As a result, search engines ignore the part after the # symbol when indexing, treating it as a position within the same page.
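Python’s standard library splits URLs the same way, which makes it easy to check how a crawler sees hash-based pagination. This sketch (with a hypothetical `crawlable_urls` helper, and `#page-2` as an illustrative fragment alongside the real `#page-6` example) drops everything after the `#`, just as browsers do before sending a request to the server.

```python
from urllib.parse import urldefrag


def crawlable_urls(links: list[str]) -> list[str]:
    """Return the distinct URLs a crawler can request once #fragments are removed."""
    unique: list[str] = []
    for link in links:
        url, _fragment = urldefrag(link)  # the fragment never reaches the server
        if url not in unique:
            unique.append(url)
    return unique


base = "https://www.stringsdirect.co.uk/c/541/strings/electric-guitar-strings-sets/"
print(crawlable_urls([base, base + "#page-2", base + "#page-6"]))
```

All three "pages" reduce to a single URL, which is why products that only appear from the second page onwards can go unindexed.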

That is not to say you cannot use these forms of pagination if you want to get indexed; you will notice that both the examples above have products listed several pages into their categories indexed. You just need to be aware of the potential issues and consider implementing something such as HTML Snapshots, Google’s recommended solution. If you look at the actual link URL on Strings Direct (not the URL you see in the browser address bar), you will see this in action: it uses “hashbang” URLs containing “#!page=”, which is the clue Google needs to understand that AJAX is in action. (It isn’t quite that simple, though, so read Google’s guide to HTML Snapshots!)
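Under Google’s AJAX crawling scheme, a crawler that finds a `#!` URL requests an alternative URL instead, replacing `#!` with `?_escaped_fragment_=` (or `&_escaped_fragment_=` when a query string already exists) and percent-escaping certain characters in the fragment; your server answers that URL with the HTML snapshot. The Python sketch below shows that mapping, following our reading of the specification’s escaping rules; the function names are hypothetical and the Strings Direct `#!page=6` URL is illustrative. Google announced in 2015 that it was deprecating this scheme, as it can now render most JavaScript itself.

```python
# Bytes the (deprecated) AJAX crawling scheme says must be %-escaped in the fragment:
# control characters and space, '#', '%', '&', '+', and everything from 0x7F upwards.
ESCAPED_BYTES = set(range(0x00, 0x21)) | {0x23, 0x25, 0x26, 0x2B} | set(range(0x7F, 0x100))


def escape_fragment(fragment: str) -> str:
    """Percent-escape only the bytes listed above, leaving '=' and '/' intact."""
    return "".join(
        f"%{byte:02X}" if byte in ESCAPED_BYTES else chr(byte)
        for byte in fragment.encode("utf-8")
    )


def to_escaped_fragment_url(url: str) -> str:
    """Map a #! URL to the _escaped_fragment_ URL a crawler would request."""
    if "#!" not in url:
        return url  # not a hashbang URL, so it is crawled as-is
    before, fragment = url.split("#!", 1)
    joiner = "&" if "?" in before else "?"
    return f"{before}{joiner}_escaped_fragment_={escape_fragment(fragment)}"


print(to_escaped_fragment_url(
    "https://www.stringsdirect.co.uk/c/541/strings/electric-guitar-strings-sets/#!page=6"))
```

Because the `_escaped_fragment_` version is an ordinary query-string URL, the server can see which page was requested and return a static snapshot of it.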

The golden rule with any URL structure is to tell search engines not to index versions of the page where the content is not substantially different (e.g. sort orders) or is a subset of content found via another URL (e.g. filtering by brand, size, price etc.). Meanwhile, you need to ensure that search engines don’t ignore URLs of pages with genuinely different content (usually pagination).
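The golden rule can be written as a simple decision function. In this Python sketch, `SortBy` and `Page` come from the example URLs, while `brand`, `size`, `price` and `colour` are hypothetical parameter names; any parameter the function doesn’t recognise is flagged for manual review rather than guessed at.

```python
from urllib.parse import parse_qsl, urlsplit

# "Page" and "SortBy" come from the example URLs; "brand", "size" and
# "price" are hypothetical filter names standing in for your own.
PAGINATION = {"Page"}
NOT_DISTINCT = {"SortBy", "brand", "size", "price"}


def indexing_decision(url: str) -> str:
    """Index distinct content; consolidate re-orderings and subsets."""
    names = {name for name, _ in parse_qsl(urlsplit(url).query)}
    if names & NOT_DISTINCT:
        return "consolidate"  # canonicalise to the URL without these parameters
    if names - PAGINATION:
        return "review"  # unknown parameter: find out what it does first
    return "index"  # bare category URL or pagination only


for url in ("https://www.patra.com/category/womensthermals?Page=2",
            "https://www.patra.com/category/womensthermals?SortBy=2&Page=2",
            "https://www.patra.com/category/womensthermals?colour=red"):
    print(url, "->", indexing_decision(url))
```

Keeping the unknown-parameter case separate mirrors the advice on URL parameter settings: only act on a parameter once you know what it does.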

Call Us: 0115 857 7755

Pagination Pain! Dealing With Crawl Issues In E-Commerce Sites

Ian Lockwood
