You can make or break a site, just based on how you handle your search filters.
The difference between a good & bad search filter setup can be the difference between a manageable set of quality, indexable results, and millions upon millions of crawlable & indexable URLs that Google will ignore the majority of.
Let’s take a look through filtered search optimisation for programmatic SEO builds.
The two types of search filters
One thing I always try to define for a client is the two types of filters.
Those you want a pretty URL for, and those you don’t.
You shouldn’t be creating a pretty, completely indexable URL for every filter combination ever.
You need to define a set of filters that will allow you to target 90% of searches, with a small portion of the URLs.
I was going to use the 80/20 rule here, but implying it would take 20% of the URLs would be so wrong.
You would be creating millions upon millions of crawlable URLs to target the remainder.
I go further into this later in the post.
The best URL structure for search filters
When creating search filter URLs, you have to keep in mind the structure of the website.
You should be nesting the filtered content below its parent page.
So for real estate, if you have an ‘apartments for sale in Sydney’ page, the filters would be;
Property Type: Apartments
Channel: For Sale
How you structure the website will control the order in which each of the different filters is used, along with how you use them.
Some filters may be a part of the pretty URL whereas some other filters will be query parameters that are just tacked on at the end.
For my clients with the above filters, I would be recommending the following structure;
The reasons behind this choice will be documented separately, however, with that structure in place we know how we should handle this type of URL.
If we then add a pricing filter, or a bedrooms filter, the URL would change to something similar to the below;
So there is a clear separation between the ‘pretty’ portion of the URL and the ugly query parameters.
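To make that separation concrete, here’s a minimal sketch (a hypothetical helper, not tied to any particular framework) of a URL builder that keeps the pretty filters in the path, in a fixed order, and tacks everything else on as query parameters:

```python
from urllib.parse import urlencode

# Filters that earn a spot in the pretty URL, in a fixed order.
PRETTY_FILTERS = ["channel", "location", "propertyType"]

def build_srp_url(filters: dict) -> str:
    """Build an SRP URL: pretty path first, remaining filters as parameters."""
    path_parts = [filters[f] for f in PRETTY_FILTERS if f in filters]
    # Everything else becomes a query parameter, in a stable (sorted) order.
    params = {k: v for k, v in sorted(filters.items()) if k not in PRETTY_FILTERS}
    url = "/" + "/".join(path_parts) + "/"
    if params:
        url += "?" + urlencode(params)
    return url

print(build_srp_url({"channel": "buy", "location": "sydney",
                     "propertyType": "apartments"}))
# → /buy/sydney/apartments/
print(build_srp_url({"channel": "buy", "location": "sydney",
                     "propertyType": "apartments",
                     "priceBetween": "500000-1000000", "bedrooms": "3"}))
```

The sorted parameter order also conveniently pre-empts the duplicate-ordering issue covered later in the post.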
How to handle URLs for multi-select filters
Multi-select filters can lead to issues, particularly when the multi-select filter is a part of a pretty URL.
Let’s say you have a multi-select filter of property type, with apartments & houses as an option.
If a consumer selects both of them, you want to make sure that both apartments & houses don’t end up in the URL.
You don’t want to end up with /buy/sydney/apartments/houses/ or /buy/sydney/apartments-houses/.
Whilst you could prioritise one and then tack the other on as a query parameter, I prefer a simpler solution.
When 2 or more options of a ‘pretty URL’ filter are selected, use them both in a query parameter rather than the pretty URL.
This gets the parameters stripped in the canonical tag, and just ensures you don’t get any issues arising from duplicate targeting or the creation of additional pretty URLs.
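A rough sketch of that rule, assuming the `propertyType` parameter format shown later in the post (`?propertyType=type1-type2`):

```python
def property_type_url(channel: str, location: str, types: list[str]) -> str:
    """Single selection → pretty URL; 2+ selections → query parameter,
    so the canonical can strip it back to the parent page."""
    base = f"/{channel}/{location}/"
    if len(types) == 1:
        return f"{base}{types[0]}/"
    if len(types) > 1:
        # Sort so apartments+houses and houses+apartments produce one URL.
        return f"{base}?propertyType={'-'.join(sorted(types))}"
    return base

print(property_type_url("buy", "sydney", ["apartments"]))
# → /buy/sydney/apartments/
print(property_type_url("buy", "sydney", ["houses", "apartments"]))
# → /buy/sydney/?propertyType=apartments-houses
```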
How to handle internal links to filtered search results
This is one of the main causes of indexation issues, because internal links hold such weight with Google.
It’s also one of the easiest ways for a larger-scale website to make improvements that can actually move the needle in the rankings.
The URLs you should be linking to
The quick summary here is that if a page has a pretty URL, at least 1 result, and relates to the current page, it should definitely be linked to.
So you link to all the filters of the current page that have a pretty URL & at least 1 result.
I cover the two different types of internal links for programmatic sites in a little more detail here.
You should avoid actively linking to any query parameter filter.
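As a sketch, the linking rule above (pretty URL, at least 1 result, relates to the current page) could look something like this, with hypothetical field names:

```python
def seo_link_html(related_filters: list[dict]) -> str:
    """Render server-side <a> links only for related filter pages that
    have a pretty URL and at least 1 result; everything else gets no
    crawlable link at all."""
    links = []
    for f in related_filters:
        if f["pretty_url"] and f["result_count"] >= 1:
            links.append(f'<a href="{f["url"]}">{f["label"]}</a>')
    return "\n".join(links)

related = [
    {"pretty_url": True, "result_count": 3,
     "url": "/buy/sydney/houses/", "label": "Houses"},
    # Query-parameter filter: never exposed as a crawlable link.
    {"pretty_url": False, "result_count": 5,
     "url": "/buy/sydney/?bedrooms=3", "label": "3 Bedrooms"},
    # 0-result SRP: also skipped.
    {"pretty_url": True, "result_count": 0,
     "url": "/buy/sydney/farms/", "label": "Farms"},
]
print(seo_link_html(related))
# → <a href="/buy/sydney/houses/">Houses</a>
```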
Why crawlable links to non-pretty URLs, or 0-result SRPs, should be avoided
Each crawlable link acts as a vote for a URL in Google’s eyes. If you’re constantly ‘voting’ on poorly filtered URLs, or 0-result SRPs, Google is going to place more weight on these URLs than they’re worth.
This will lead to crawling and indexation issues, particularly when it comes to prioritising specific URLs above each other.
So on top of your pretty URL filters of Channel, Property Type, and Location, you could have a significant quantity of other filters available, including but not limited to;
New / Established
The list quickly builds up.
As an example, let’s say that each of the 9 filters had 10 options available to filter by.
You’re going to have 90 filterable versions of a URL, all with query parameters, that are just filtered views of the primary page.
Each of these is significantly re-using the primary results, and Google will crawl and see this on each page.
How many of those filters are actually going to have search volume?
A small handful might, for some top-level locations.
On top of this, these 90 filtered versions of the URL, would then link to the other 89 versions of that URL, with that additional filter applied.
You’d create 8,010 versions of a single URL, with just 9 filters and 10 options for each.
But then if you have 2 channels, 5,000 locations, and 5 property types, you’d have 50,000 pages, with 8,010 versions each, giving you a lovely 400,500,000 URL combinations that will be crawlable.
Google would have a field day.
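The maths above is easy to sanity-check:

```python
# 9 filters, each with 10 options, on top of one primary page.
filters, options = 9, 10
filtered_views = filters * options                    # 90 parametered views
cross_links = filtered_views * (filtered_views - 1)   # each links to the other 89
assert cross_links == 8_010

pages = 2 * 5_000 * 5   # channels × locations × property types
assert pages == 50_000
assert pages * cross_links == 400_500_000   # crawlable URL combinations
```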
Do you think “new 3 bedroom home with 2 bathrooms and 2 car spaces under $450000 with swimming pool and balcony” gets enough search volume to warrant the page being linked to?
There are ways you can optimise the pretty URLs to capture a large portion of these super long-tail queries.
Yeah, you won’t capture it all. But you also won’t need to create 400 million pages to ensure that you do.
This is obviously a worst-case scenario, should you have no limits in place. And yes, I have seen this multiple times.
How you should link to filtered URLs
It’s not just what pages you link to, but how you link to them.
You need to keep SEO and the user experience in mind when linking, as you obviously still want users to filter by price, bedrooms, and an array of other filters that will help them find the content they’re after.
Links will be broken down into your ‘SEO links’ that should be server-side and in the source HTML, and then other links/interactions that should not be crawlable, and should only be available via client-side JS / onclick event links.
You’ll probably have some sort of filtering widget, normally in the sidebar, that contains every filter available for the set of results.
Provided these filters don’t expose any links in the HTML source of the page, this widget can be left completely untouched, free for the designers to toy with as required. Don’t fight design on this; we can get more value from some slightly separate SEO links :)
The other links, the pages you actively want to link to, should be added to a separate little widget under the filters under a title like “popular locations”, or “popular property types”.
You can also throw them in a nice footer widget, but a sidebar link might have more value, so could be preferred.
They just need to be exposed in the HTML source, unlike the other non-pretty, parametered URL links.
Blocking parameter filters in the robots.txt
This is something many SEOs will do to avoid crawling & indexation of query parameters.
It’s a viable strategy for newly launched sites to prevent issues, however, existing sites need to keep a few things in mind.
Personally, I prefer to try alternative methods of patching crawling/indexation issues that can be attributed to parametered pages.
I will try and lower their value by removing links pointing in, and then hope that the canonical tag takes over and does what it’s supposed to.
1. Are the primary, clean URLs indexed?
If you’ve actively linked to parameter-filtered URLs of SRPs, then you may be blocking the only indexed version of a page.
Google might not have the primary page for those search results, indexed.
While yeah, Google will eventually index the new URL, you might be temporarily killing a chunk of your traffic.
2. Do the parametered URLs have links coming in?
If the URLs filtered with parameters have links pointing in, you could be culling any value they have, as Google will no longer look at the canonical tag and assign that weight to the parent, non-parametered URL.
Double-check this, and make sure you’re not about to potentially remove the value these links would have passed.
Google says no
All in all, there’s this.
Don't use robots.txt to block indexing of URLs with parameters. If you do that, we can't canonicalize the URLs, and you lose all of the value from links to those pages. Use rel-canonical, link cleanly internally, etc.
— 🐝 johnmu.csv (personal) weighs more than 16MB 🐝 (@JohnMu) November 8, 2019
So keep that in mind.
Handling canonical tags of search filters
Pretty URLs get included in the canonical, along with pagination.
All non-page query parameters should get stripped.
That will help pass your SEO value around, ensure minimal over-indexation, and try to keep Googlebot in check if it discovers these filter URLs.
Let’s say you have a URL of example.com/buy/sydney/apartments/?priceBetween=500000-1000000&bedrooms=3
The canonical tag I would be recommending is;
<link rel="canonical" href="https://example.com/buy/sydney/apartments/" />
Canonicals are seeming more and more like a suggestion rather than a rule though, so just keep in mind that, unfortunately, they don’t work as well as they used to for controlling indexation.
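A minimal sketch of that stripping logic, assuming pagination uses a `page` parameter (the actual parameter name will depend on your build):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def canonical_for(url: str, keep_params: tuple = ("page",)) -> str:
    """Build the canonical URL: keep the path and pagination,
    strip every other query parameter."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep_params]
    canonical = f"https://{parts.netloc}{parts.path}"
    if kept:
        canonical += "?" + urlencode(kept)
    return canonical

print(canonical_for(
    "https://example.com/buy/sydney/apartments/"
    "?priceBetween=500000-1000000&bedrooms=3"))
# → https://example.com/buy/sydney/apartments/
```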
Common mistakes in search filter handling
Here are the common mistakes I see with clients when it comes to handling search filters.
Ordering of filters not handled, leading to a duplicate page for each possible combination
Having both ?priceBetween=500000-1000000&bedrooms=3 and ?bedrooms=3&priceBetween=500000-1000000 crawlable will lead to crawling & indexation issues. Ensure these alternate orders can never be crawled, with all internal filters & links using the correct order, or have a way to detect them and 301 redirect to the primary order.
Even though you’ll have canonical tags attempting to clean this up, they don’t always work, so you should at least attempt to patch this before it becomes an issue.
If you’re 100% definitely not actively linking to any of these query parameter pages, and there is absolutely no chance of links coming through, then you’ll be fine.
But, well… bugs happen. Just keep that in mind.
Search filters are available as both a pretty URL and a query parameter
When you have both /buy/sydney/apartments/ and /buy/sydney/?propertyType=apartments crawlable and indexable, you’re duplicating and devaluing some of this content. You need to make sure the parameter redirects to the pretty URL version (unless multiple values exist).
Multi-select filters adding both selections to a pretty URL
This is when URLs are handled like /buy/sydney/apartments/houses/ or /buy/sydney/apartments-houses/. My recommended way of handling this is moving both to a query parameter version, like /<channel>/<location>/?propertyType=type1-type2.
Search filters included on listing/product links
This is one I have now seen a few times, and it caused massive indexation issues. The query parameters were included on the links to the listings. Google then not only crawled them all, but indexed many of them too, and plenty were ranking. Ranking with search filter query parameters on URLs for pages that didn’t do anything with them.
I’ve also recently learnt that this is a Shopify default… which is weird.
You can read here how to fix it, but the collections part of the URL is included in the product URL they link to. This is stripped with a canonical, but what a mess this can cause for Google!
And yes, I am looking at a site right now, that does this, and the collection’s version of a product URL is ranking.
Non ‘pretty’ search filters are included in the canonical
If you don’t want it to rank, don’t include it in the canonical tag. Many sites still include these parameters, which generate thin & heavily re-used content, in the canonical.
To avoid issues, these thin parameter filters should be stripped, passing their value/indexing back up the chain to their primary results URL.
Results not loaded server-side
There are plenty of site builds where the majority of the site is server-side rendered, but then their entire search is client-side. This dramatically impacts indexation & crawling of the search, which is critical. The entire page, including the search results, should be loaded server-side, not just the overall template.
Cleaning up over-indexation of search filters
If you’ve exposed too many search filters, and need to clean them up, then you will need to undertake over-indexation clean-up… which is a whole separate topic.
You can read more about over-indexation clean-up for programmatic builds here.
Extra: How can you target the higher-value low-tier non-pretty URL filters
Well, that’s a mouthful.
Something I will cover in a bit more detail later, but yes. There are super long tail filter variations that have value, and are worth targeting.
But you need to be careful with this, as it is extremely easy to create 100s of thousands of pages, to just target a handful of keywords.
You want to target the top 80% of each of these types of keywords.
The ideal scenario to handle these is a set of custom rules.
These rules should allow you to map out what filter combinations could be used.
A way you can select 100 key locations, out of the 5,000 you have in your data set.
A way to select just 2 property types, from the 10 in your data set.
A way for you to set the price filter to under 300,000.
The system could then spit out the combinations of those 100 key locations and 2 property types, to create pages for “<property type> for sale in <location> under 300,000”.
You’ll get 200 combo pages, which will have 80% of the total volume, rather than 50,000.
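A toy version of such a rule system, with a hypothetical URL pattern and tiny stand-in lists (in practice the location list would be your 100 key locations):

```python
from itertools import product

# Hand-picked subsets of each filter's values, instead of the full data set.
rules = {
    "location": ["sydney", "melbourne"],        # stand-in for 100 key locations
    "propertyType": ["apartments", "houses"],   # just 2 of the 10 types
    "maxPrice": ["300000"],                     # price capped at under 300,000
}

# Generate a pretty URL for every allowed combination only.
pages = [
    f"/buy/{loc}/{ptype}/under-{price}/"
    for loc, ptype, price in product(
        rules["location"], rules["propertyType"], rules["maxPrice"])
]
print(len(pages))
# → 4  (2 locations × 2 types × 1 price, rather than every combination)
```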
Top tier strat: Automatically targeting these higher value but low tier filters
Recently, I have started recommending a new approach.
Three tiers of filters with pretty URLs, though this could be expanded.
Top tier – Every value gets a pretty URL, at all levels
Mid-tier – Every value gets a pretty URL, but only for top-tier locations
Low tier – Always in a query parameter.
This middle tier is new to the way I am pushing clients to handle URLs, and offers a more “simplistic” approach to landing page automation.
Let’s go back to the bedrooms and pricing examples.
Rather than creating “Properties under <price> in <location>”, and “Properties with <bedrooms> bedrooms in <location>” for every single location in your database (5,000+), you could filter this to just a set of top locations.
For price, you might have 15 values. Instead of creating 15 x 5,000 pages, you could create 15 x 100 pages.
For bedrooms, you might have 6 values. Instead of creating 6 x 5,000 pages, you could create 6 x 100 pages.
You might want more pages than just the ‘top tier’ locations for some filters though. Bedrooms are a good example of this. It might not be worth creating 5,000 locations worth of bedrooms pages, but more than 100 locations could be warranted.
To solve this you could add an extra tier into what I discussed above, where you create a set of ‘mid-tier’ locations to complement the top-tier locations, and use this mid-tier list for some filters.
This mid-tier location list could be 500 top locations as an example.
That way, bedrooms could be 6 variations x 500 locations, rather than just the 100 top locations.
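A sketch of how those tiers could be wired up, with tiny stand-in lists for the top 100 and top 500 locations:

```python
# Hypothetical tier lists: ~100 top locations, ~500 mid-tier locations.
TOP_LOCATIONS = ["sydney", "melbourne"]       # stand-in for the top 100
MID_LOCATIONS = TOP_LOCATIONS + ["geelong"]   # stand-in for the top 500

# Which location list each filter's pretty URLs are generated against.
FILTER_TIER = {
    "price": TOP_LOCATIONS,      # e.g. 15 values × ~100 locations
    "bedrooms": MID_LOCATIONS,   # e.g. 6 values × ~500 locations
}

def locations_for(filter_name: str) -> list[str]:
    """Low-tier filters get no pretty URLs at all: query parameter only."""
    return FILTER_TIER.get(filter_name, [])

print(locations_for("price"))
# → ['sydney', 'melbourne']
print(locations_for("carSpaces"))
# → []
```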
Faceted search vs search filters
This is where people lose me a bit.
UX people/designers etc seem to say filters are basically a single selection to refine the data, and “Faceted search” is about multiple selections. So basically multi-select filters are “faceted search”?
Filters do the same though? They filter the results.
Seems weird, but oh well. They’re the same to me, just ‘multi-select’ filters…
So yeah, I’m only talking about faceted because that’s what others call it, and what you could be searching for.
This is still what you’re after, I swear.
It’s all the same for this.
Keep your filters in check
Google is dumb. You might think they’re smart, but in reality, they’re just a system that needs direction. The more you give them, the better they’ll be able to crawl, index, and pass value around your site.
Know what indexable and crawlable URLs you’re creating, keep them tamed, and you will be rewarded with that lovely SEO traffic in the long term.
If you haven’t already, make sure you check out my programmatic SEO checklist to help you tick off the build.