Market Analysis & Keyword Insights Part 1: Keyword Research

It’s time to dive into my full process of doing SEO market research & analysis.

This is something I do for some of my bigger clients when they’re either kicking off their SEO and want to understand the market, or they want to understand current positioning.

A market research & analysis piece is essentially keyword research, combined with SERP data to understand the current market & competitor performance. It can then leverage the data to make optimisation decisions.

We’ll work on an imaginary bedding e-commerce site, selling linens, pillows, and quilts (*fancy accent* duveeetttts).

In this post, the plan will be to;

  • Build out a seed list of products
  • Leverage seeds for keyword research
  • Categorise and identify modifiers
  • Identify duplicates from pluralisation, stop words, and word re-ordering
  • Highlight keyword templates

We’ll then work through SERP data for the keywords, and optimise based on that data.

Let’s get stuck in!

Mapping Out the Main Site Structure

To kick things off, I tend to do a basic mapping of what I think the primary site structure will be.

This tends to just be the top-level categories of the site.

Since we’ll be working on an e-com store, this is going to be the store’s key product categories: Linens, Pillows, and Quilts.

 

Breaking Down the Categories

Next, we’ll be delving deeper into each of the categories to map out the specific products.

Normally, you’d be told by the client what they sell, or can find it on their website, so you can leverage this.

Since this is just a fake one, I’m just going to say that they sell a big range of linens, with different types of pillows and quilts/duvets.

Whilst the client will provide a list of products, they might not provide everything you need. Therefore, I would still recommend doing some Googling, comparing their inventory to others, and seeing if anything is missing.

This can all be modified later, post-research, as you’ll probably find some product gaps. Still, it’s best to be thorough earlier so that your data is cleaner later.

For linens, we’ll have the product as the style of sheet/cover, the material, and then the sizing. For the other products, the product range will be much smaller: material for pillows, and material & size for quilts.

This won’t be about capturing every attribute possible, just the primary ones.

 

Product Line: Linens

Products: Top/flat sheets, Bottom/fitted sheet, Quilt/comforter/duvet cover, Pillowcases

Materials: Cotton, Bamboo, Polyester, Linen, Silk, Microfiber, Satin

Size: King, Queen, Double, Single

 

Product Line: Pillows

Products: Down, Memory foam, Latex, Gel

 

Product Line: Duvets/Quilts

Products: Down, Wool, Cotton

Size: King, Queen, Double, Single

 

This is what we’ll say our client ‘sells’. Quite a range!

We’ll work with the product line as a category, and the products listed as sub-categories.

 

Building the Seeds

Now it’s time to build out our ‘seed words’ for each product offering.

These seed words are the core of our keyword research, and we build them out before using keyword research tools.

The best way to explain why we want these seed terms is with one of our categories – Quilts.

You probably call them Duvets. I don’t. I know them as quilts (DON’T JUDGE ME).

Many tools won’t be able to determine the relationship between the two items so we need to put them in as different seed words to encompass the full category.

We would also add in ‘comforter’ as that is another word people search for to mean the same item, even though it may also mean something slightly different. Essentially keyword variations.

So rather than just using a term like ‘duvet’ to do our research, we would also want to list ‘quilt’ and ‘comforter’.

Another one to keep in mind here is ‘pillowcases’ versus ‘pillow cases’. Since we have no volumes, we can’t just pick one and go with it and tools like Ahrefs won’t combine the volumes. We should be including both in the research, so that we can ensure we capture the full suite of keywords.

And one last one, ‘fitted sheets’ vs ‘bottom sheets’. They’re the same thing, so we just need to account for them both in the seeds.

For our e-com site, they’re all same same but different – SSBD.

Okay, let’s break each one down, and delve into their seeds.

To do this, I just list them all out from what I know, and then Google each line to see what other words are used.

So first Google ‘flat sheets’. You’ll find ‘top sheets’ bolded, or mentioned too, so we know that’s what they’re also called, and thus they’re added as a seed.

We’re also going to include material here, as it may pull in a few extra keywords from some research tools when being a bit more specific.

We won’t include sizing, as we will be assuming that the tools will come back with sizing-based keywords.

If you’re using a data source like DataForSEO, then going deeper with the seeds to include sizing may be more beneficial, as you can return more keywords.

Since I will just use Ahrefs for quick and dirty research here, I can only enter 10 keywords at a time, so including sizing will mean 5x my seeds to run. It’ll then return pretty much everything, up to my report limits.

We end up with a seed table like below for the bottom sheets sub-category;

 

Similar to pillowcases;

And then one for quilt covers too;

And then after all the main products in the category are covered, we just want to ensure the category is fully covered in terms of keyword variations.

For this one, we will use some extremely broad terms like ‘sheets’, ‘queen sheets’, ‘cotton sheets’ etc;

I also included sizing in this one, because it’s a lot more broad. We will filter these further later on, but it will be good to see what it captures.

This gets repeated for the other product lines.

For Pillows, we end up with;

And for Duvets;

There are probably plenty of other synonyms and alternate names here, but this should have captured the main ones.

Pillows & Duvets/quilts was a harder one, as I can’t find any direct product sub-types. They are just some modifiers like material, and sizing.

I decided to run with material for the sub-category, but you could probably just leave sub-category as ‘Generic’, since all the keywords will get size/material modifiers separately anyway.

Building out the Keywords

Now we’ve got the seeds, we can build out the keyword list.

Running the Seeds

I’m going to take a shortcut here, and just throw them all into Ahrefs.

We can run 10 seeds at a time so that we can export the keywords under ‘matching terms’. Since I can only export a handful from Ahrefs at a time, we’ll just limit it as we go to try and get the most keywords exported.

So if 10,000 keywords are returned, I can limit the seeds to just 1 or 2, for the top ones, and still be able to export all the keywords.

You can work across products, but I tend to keep the seed searches within the same category as I go.

So I’ll take these;

Throw them into the Ahrefs search;

Get the phrase match, matching terms;

And then export the rows I can;

Keep an eye on the keywords as you go, as you don’t want to go too broad and have to spend significantly more time cleaning up. I almost did ‘sheets’, but then realised Google sheets, among other things, popped up.

I switched the more generic seed of ‘sheets’ to ‘bed sheets’ and was golden.

Aggregating the Data

Now we’ve got a heap of CSV files, with all the Ahrefs data;

Rather than manually combining these, we can use CMD on Windows to combine them all!

Find how to merge the CSV file in cmd here.

There’s a mac equivalent somewhere, and this looks like it might work.

Dedupe the keywords in Excel or Sheets. I just prefer to do it before moving to Sheets, as there can be quite a few duplicates to clean up.

So now we’ve got 15,000 somewhat related keywords to play with.

After removing all the zero-volume keywords, we get to 12,000 that we can import to Google Sheets and begin categorising.

 

Categorisation

I’ll use my keyword categorisation formulas from here to categorise the keywords.

I put the plural and singular formats of each seed keyword into the Find column, and then give it the associated Category and Product they’re mapped to;

‘Len’ on the end is the character length. The categorisation formula works top down, so we want the most specific words up top. The most specific can be determined by the length, as they’re the longest, so just sorting by this solves the problem.

This formula then works through and will categorise all my keywords based on this.

When it finds “cotton comforter” or “cotton duvet”, it will categorise them as Duvets, and then Cotton Duvets.

Because we have ‘Cotton Duvets’ first in categorisation, but then ‘Duvets’ at the end, anything with ‘cotton duvets’ gets categorised first. Then anything with ‘duvet’ gets left at the end.

Due to the word order requiring ‘cotton duvets’ in that specific sequence, there may be cotton duvet keywords marked just as duvet, so now we need to extend the seeds.

For example, I found ‘cotton bed quilts’ and ‘100 cotton-filled comforters’, both of which would need to be added as separate seeds to account for the variations.

You will also see in the screenshot above, that “duvet covers” is getting categorised. We just need to reorder the categorisation, so that ‘duvet covers’ is always categorised higher than duvets.

I use an extra column called ‘Flag’, where I just throw an ‘x’ in anything I think needs to be added to the rule list. This way, I can just sort them and add to it later.

Some extra rules can be included so these get categorised, but a couple of quick passes here will capture most of what I need to include anyway.

 

Additional Categories for Filters

We can add new columns for other category types, like materials and colours.

I created a base level of materials;

Whilst this all directly relates to what the client sells, sometimes creating an oversized list from your keywords, that you can refine after, might be better.

I threw a random sample of a few hundred keywords into ChatGPT, with the following prompt, and got it to help extend the list a bit;

Act as my ecommerce filter finding expert. Knowing that the following are the current materials of products:

Find Material Sub-Material
egyptian cotton Cotton Egyptian Cotton
100 cotton Cotton 100% Cotton
100% cotton Cotton 100% Cotton
cotton Cotton
down Down
feather Down
wool Wool
latex Latex
gel Gel
goose down Down Goose Down

please extend this data in a visual table, extracting additional product materials from the following list of keywords;

<paste ~300 keyword sample list>

GPT extracted the following;

So a few clear misses after my quick start were bamboo, silk, and satin. Not sure what I’ll keep from the others, but it’ll be easy enough to just clear them after.

Did a similar thing with colours, too, and it spat out quite a few;

And then once more, for sizing;

It gave me a nice little list there. Plenty to use and tweak myself.

The final one I would like to extend the categorisation to is ‘feature’.

This isn’t pre-defined, but it’s more like “bed sheets with xxxxx”, or “bed sheets for xxxx”. I’ll merge the top patterns in here too, and use it as kind of a catch-all for flagging extra long-tail keyword modifiers.

Threw together this quick list of filters;

And that will give us a good starting point.

I’ll go through how we can use these in the filters from an SEO point of view later on, but for now we just want a broad coverage of the different types.

You’ll see some “for” use cases, sizing, styles, thread counts, and some other general features like ‘pet friendly’ or ‘shallow pockets’.

 

Splitting Commercial vs Informational

I kept the SERP features on my data from the ahrefs exports.

Whilst it’s not exact, I’ve used the ‘Shopping results’ feature to determine a commercial intent, and then everything else gets informational.

Running the following formula lets me check the SERP Features, and if it contains ‘Shopping’, then flag it as Commercial, and fallback to Informational.

=if(len(F2)>0, if(ISNUMBER(SEARCH(“Shopping”,F2)),”COMMERCIAL”,”INFORMATIONAL”),)

It will only run if the cell contains text, since not all keywords will have this data.

Ir’s a nice, quick way of filtering out any keywords that have a high likelihood of not having a commercial intent, and could be reserved for supporting content on the landing pages, or for blog content.

As you can see though, whilst ‘green fairy quilts’ doesn’t have the shopping results, and isn’t flagged as commercial, this keyword might better suit a specific filtered shopping landing page if you have these sorts of products.

It gets it 80% right. Plenty good enough for this sort of quicker analysis, but if there was significant $$$ associated to this I would probably be spending at least a bit of time cleaning some of the bits up. Especially for client work.

 

Excluding Keywords via Negatives

Due to the way this keyword research process works, we get a lot of unrelated keywords.

We’ll do a quick prioritisation process a bit later, to better filter and sort the keywords, but we can do some quick negatives to completely remove keywords containing specific words from our list.

To do this, I work through the keyword list using the ‘Flag’ column and mark anything I don’t like with an x.

These can then be copied and pasted to the negative table, and then stripped to just the core word/phrase we want to flag as negative when found.

So we get rules like this;

Once the rules are added, the keywords will automatically get the negative column processed, and will be flagged when negative.

After just 5 minutes, we can already cull 600 keywords from the list via a negative list of less than 40 words.

If this was for a client, I’d be spending a couple of hours going through this initially, and another hour or so a couple more times, cleaning this up after we extend the research.

I’ll remove the negatives from the list by just deleting them.

 

De-Duping & Filtering Close-Matched Keywords

The de-duping I’ll do depends on the use case.

If you’re not using Ahrefs data, or another data set that does unique search volumes, then you should do some similar word de-duping to remove duplicate volumes.

Google AdWords, and thus any source using Adwords data, groups search volumes.

This would mean that many near-duplicates, or even keywords with just an ‘a’ or similar in the middle, will have the same search volumes as volume from all highly-similar keywords and will be aggregated;

From memory, they’ll also have the same CPC and some other data, but I can’t remember exactly.

The keywords could be plural/singular versions of each other, or even the same words in the keyword, but in a different order.

Ahrefs, and some others, give their own volumes that chop this data up based on clickstream or some sort of “finger in the air” method.

Separate from this, when sharing the keywords with clients, sometimes you don’t want different variations of the same keyword.

We shouldn’t just be listing out the top keywords by volume if they’re like;

feather down comforter
feather down comforters
best feather down comforter
feather and down comforter
down comforter feather
down comforter with feathers

They’re all the same.

A better list to provide would be something that could de-duplicate these a little, like;

down comforter
goose down comforter
lightweight down comforter
feather down comforter
alternative down comforter
cooling down comforter

They’re very close, yet still it mixes things up a bit. It shows a variety of modifiers, and the different words people use, rather than just plural/singular, same words re-ordered, or stop words like “with”.

To do this, there are a few different formulas we can use.

Stripping Stop Words

The first method is stripping some basic stop words. I’ve used one of my basic lists for this, and it could be extended depending on what you’re using the analysis for.

Some words that may also be considered a stop word like ‘down’ could mess with keywords, particularly ‘down sheets’, so each stop word list is custom for each project based on this.

The stop word list is added in a new column;

And then we use a regex formula that just strips out any word in the list;

=trim(substitute(SUBSTITUTE(REGEXREPLACE(” “&A2&” “,$BE$1,” “),” “,” “),”‘”,””))

This will then strip these stop words, and pull a more refined parent keyword, out of each keyword;

We’ve stripped “for” and “best” from these keywords, giving us the core root words of the keywords, and stripping that extra fluff.

Singularisation

The next de-duping method is singularisation. Making every word the single form, essentially bringing it back to its root form. Similar to lemmatization, just a bit more basic, and trying to maintain a full word.

We can now map “soft pillows for stomach sleepers” to “soft pillow for stomach sleepers”, along with similar, by stripping the stop words and singularising the entire keyword.

It’s done with an extremely hideous, but rather sexual formula;

=TEXTJOIN(" ", TRUE, ARRAYFORMULA(
IF(ISNUMBER(VALUE(SPLIT(H2, " "))), 
SPLIT(H2, " "),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ies$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ies$", "$1y"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*les$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)les$", "$1le"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*mes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)mes$", "$1me"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ves$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ves$", "$1ve"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*tes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)tes$", "$1te"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ues$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ues$", "$1ue"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*nes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)nes$", "$1ne"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ess$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ess$", "$1ess"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*kes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)kes$", "$1ke"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ges$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ges$", "$1ge"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*oes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)oes$", "$1oe"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*pes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)pes$", "$1pe"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*gps$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)gps$", "$1gps"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ses$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ses$", "$1se"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ces$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ces$", "$1ce"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*bes$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)bes$", "$1be"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*res$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)res$", "$1re"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*ees$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)ees$", "$1ee"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*sses$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)sses$", "$1ss"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*yres$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)yres$", "$1yre"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*vies$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)vies$", "$1vie"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*es$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)es$", "$1e"),
IF(REGEXMATCH(SPLIT(H2, " "), ".*s$"), 
REGEXREPLACE(SPLIT(H2, " "), "(.*)s$", "$1"),
SPLIT(H2, " ")
)))))))))))))))))))))))))
)

What a beast. The formula strips common plural word endings, and substitutes them for their singular alternate. It should cover the majority of words, as I have been building it out whenever I encounter words that aren’t covered with a rule.

It will run through the stop words stripped keyword, but could easily be modified to run direct off the keyword by just find/replacing “H2” for “A2”.

Letter Ordering

We’ve now managed to get “sheets for beds” down to “sheet bed”, but how can we now map this to “bed sheets”?

Well, bed sheets will be singularised to “bed sheet”, which is the same as “sheet bed”. Huh?

The letters!

They both have the exact same letters, just different ordering.

I realised if I just alphabetically sort the letters from these stripped-down keywords, we can find words that are the same, yet in a different order.

At first, I thought I’d run into an issue with many keywords being different words, yet the same letters. Surprisingly, that wasn’t the case.

As far as I have seen, the majority are completely fine. I don’t think I saw one that was same letters, different words. I know.

To do it, we use this formula;

=REPT("a",LEN(I2)-LEN(SUBSTITUTE(I2,"a","")))&REPT("b",LEN(I2)-LEN(SUBSTITUTE(I2,"b","")))&REPT("c",LEN(I2)-LEN(SUBSTITUTE(I2,"c","")))
&REPT("d",LEN(I2)-LEN(SUBSTITUTE(I2,"d","")))&REPT("e",LEN(I2)-LEN(SUBSTITUTE(I2,"e","")))&REPT("f",LEN(I2)-LEN(SUBSTITUTE(I2,"f","")))
&REPT("g",LEN(I2)-LEN(SUBSTITUTE(I2,"g","")))&REPT("h",LEN(I2)-LEN(SUBSTITUTE(I2,"h","")))&REPT("i",LEN(I2)-LEN(SUBSTITUTE(I2,"i","")))
&REPT("j",LEN(I2)-LEN(SUBSTITUTE(I2,"j","")))&REPT("k",LEN(I2)-LEN(SUBSTITUTE(I2,"k","")))&REPT("l",LEN(I2)-LEN(SUBSTITUTE(I2,"l","")))
&REPT("m",LEN(I2)-LEN(SUBSTITUTE(I2,"m","")))&REPT("n",LEN(I2)-LEN(SUBSTITUTE(I2,"n","")))&REPT("o",LEN(I2)-LEN(SUBSTITUTE(I2,"o","")))
&REPT("p",LEN(I2)-LEN(SUBSTITUTE(I2,"p","")))&REPT("q",LEN(I2)-LEN(SUBSTITUTE(I2,"q","")))&REPT("r",LEN(I2)-LEN(SUBSTITUTE(I2,"r","")))
&REPT("s",LEN(I2)-LEN(SUBSTITUTE(I2,"s","")))&REPT("t",LEN(I2)-LEN(SUBSTITUTE(I2,"t","")))&REPT("u",LEN(I2)-LEN(SUBSTITUTE(I2,"u","")))
&REPT("v",LEN(I2)-LEN(SUBSTITUTE(I2,"v","")))&REPT("w",LEN(I2)-LEN(SUBSTITUTE(I2,"w","")))&REPT("x",LEN(I2)-LEN(SUBSTITUTE(I2,"x","")))
&REPT("y",LEN(I2)-LEN(SUBSTITUTE(I2,"y","")))&REPT("z",LEN(I2)-LEN(SUBSTITUTE(I2,"z","")))&REPT("1",LEN(I2)-LEN(SUBSTITUTE(I2,"1","")))
&REPT("2",LEN(I2)-LEN(SUBSTITUTE(I2,"2","")))&REPT("3",LEN(I2)-LEN(SUBSTITUTE(I2,"3","")))&REPT("4",LEN(I2)-LEN(SUBSTITUTE(I2,"4","")))
&REPT("5",LEN(I2)-LEN(SUBSTITUTE(I2,"5","")))&REPT("6",LEN(I2)-LEN(SUBSTITUTE(I2,"6","")))&REPT("7",LEN(I2)-LEN(SUBSTITUTE(I2,"7","")))
&REPT("8",LEN(I2)-LEN(SUBSTITUTE(I2,"8","")))&REPT("9",LEN(I2)-LEN(SUBSTITUTE(I2,"9","")))&REPT("0",LEN(I2)-LEN(SUBSTITUTE(I2,"0","")))

It counts the occurrences of each letter & number, and then just repeats them in alphabetical order.

So basic.

Here is a great example, where we have ‘bamboo sheets cal king’, ‘cal king bamboo sheets’, and ‘bamboo cal king sheets’.

The letter ordering returns ‘aabbceeghiklmnoost’, allowing us to have all 3 a matching field, so that we can group them all.

Setting a Parent Keyword

Leveraging 3 de-duping practices, we can now set a parent keyword based on the ‘Lettersordered’ column.

I’ll simply be finding all matches with the same Lettersordered value, and then picking the highest volume one, and setting that as a parent.

We can do this with this formula;

=INDEX(SORT(FILTER(A$2:A$11861, K$2:K$11861 = K2), FILTER(C$2:C$11861, K$2:K$11861 = K2), FALSE), 1, 1)

Which generates;

If multiple keywords are the same with plurals switched for singular, stop words stripped, and words re-ordered, they will have a parent keyword set.

Now if the keyword = parent keyword, we know it’s a parent. If it doesn’t, it’s a duplicate.

So knowing this, we can run a formula and flag the duplicates;

If I was using AdWords data, with volumes aggregated, you can also add an extra filter here so that when the search volume for the keyword is the same as the search volume for the parent keyword, you flag as dupe.

That way you can relatively easily filter out “duplicate” keywords, with the same search volumes. By far the best way to trim down those over-estimated search volumes from the broad/near-exact matched data when you’ve got no way to break the volumes down.

You can also go a step further, and remove more words, or even modifiers if you’d like. You’d be able to transform that keyword into a proper top-level product range parent with enough rules.

Since we’ve got a heap of modifiers already laid out, we could strip them off of a guidance cell, and then set the parent based on that.

 

Prioritisation

Prioritising keywords can help us get an even spread of keywords across the board, if we want.

It will also allow us to do a bit more advanced filtering on the keywords, so that we only include the best of the best, that include specific enough text for us.

I don’t need it for this analysis, but wanted to still include it to show you how we could further filter keywords if we wanted to.

There are two types of scoring we could do.

The first score we could give is if a keyword included a specific product name that we mapped out earlier;

So if it includes those 2 words, then add 2 to the score.

More specific, or higher-priority products, could be given a higher score if you’d like.

This is purely based on the words found and their associated score. You can also use a second column, with separate words like colour or style, and then combine the scores and create separate tiers of priority.

The other score we could do is just whether it includes a modifier, and how many. Since we have all the different modifiers like material, size, colour, and feature, we can leverage those columns.

This will just look at the modifier column, and if the length of the text is 1 character or longer, add 1 to the score.

Just another way to be able to prioritise keywords, and help you filter out ones with possible value.

Everyone will do this differently, so I’ll leave it up to you to decide how to handle this one.

For this analysis, we’ll skip using this.

 

A Final Cleanse

Before we look into insights, let’s do a final cleanse.

To do this, I will sort the raw keywords table in the following order;

Volume > Product Range > Category > Type

This way, each product range is grouped together within its category, all sorted by volume, with the keywords flagged as Commercial separated from the others.

I’ll go through now and check for any missed Materials, Colours or any other filters, and also flag keywords to add to the negative list to exclude them and others similar to them.

Anything with a C is getting added or updated in the category setup.

Anything with an F will be added to the features.

Anything with an X will be deleted.

M for material, S for sizing, etc. etc.

One big one I have found, is from the whole Quilts vs Duvet thing.

More generic ‘quilt’ terms lean towards knitted & handmade-type quilts. Highly unrelated to our products.

I am going to cull these entirely, by removing the ‘quilt’ word from the categorisation, and adding more filtered words like “bed quilt” and “king size quilt” etc to make it a bit more specific.

It will cut a lot of useful keywords, unfortunately, but these will be super broad anyway. They’ll still be included if they are related enough!

Tempted to do the same with comforters, but will leave it in for now and see what it looks like.

 

Identifying Keyword Insights

We can now set up some basic pivot tables to analyse the keywords category by category.

This is a great opportunity to flag any keyword gaps that you think you might have missed, and then extend the research before we move on to looking at SERP data.

 

Filtering Prioritisation

The first thing I will analyse is whether I have the categories setup in the right way. One thing I was thinking about from the start, for something like pillows, was if I should focus on the material or the size first.

We’ve got the subtypes for linen, like flat or fitted sheets as they’re the actual product categories, but there was no real subtype I could see for ‘pillows’. So what can we use?

Looking around at pillow/duvet e-coms, the majority have a limited material selection, and more size offerings, so I went with material as the ‘parent’ sub-category, and then sizing is just a filter.

All products offered different sizing, but not all products offered different material.

But is that how people search, too? Are people looking for a type of sheet, and then just selecting a size? Or, are they looking for a size of sheet and then selecting the material?

I assume material first, but let’s look at the data.

To do this, I break down the category I want to look at for the 2 different modifiers;

So this breaks down pillows by material first, and then by size.

The biggest key I use here is the blank row. Those items are flagged as pillows, but won’t mention the modifier.

There is 106,260 search volume associated with pillow keywords that don’t mention a material.

There is 133,730 search volume associated with pillow keywords that don’t mention a size.

So by this logic, there is twice the search volume for keywords that mention a material, verse keywords mentioning a size. Thus, material-related keywords have more value within the pillows category.

How about duvets?

This is a lot closer than I would have thought. I’m happy to stick with material as the primary for this one.

This is where I realised I missed something in the research.

What about pillow firmness or thickness? Is that a thing?

That could be a key modifier.

I’ll already have many of the keywords in the data, but let’s extend the research to include seeds like “firm pillow”, “soft pillow” etc., and see what that modifier looks like.

Firmness is big, but there’s still more volume for keywords not mentioning a firmness, than those that do – phew.

Still close though!

 

Keyword Word Ordering

A great insight we can pull from a keyword research piece like this is the ordering of words in the keywords.

In particular, the ordering of the filters within the keyword.

This would let us optimise for exactly how people are searching for a product, matching the primary keywords exactly.

I’ve done a substitution in a new column, that substitutes any matched material, size, colour or feature, with a templated element of that filter.

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,lower(I2),“<material>”),lower(H2),“<material>”),lower(J2),“<size>”),lower(L2),“<colour>”),lower(K2),“<colour>”),lower(M2),“<firmness>”),lower(N2),“<feature>”)

This will then give me a keyword template, that will allow us to aggregate search volume and usage count.

The caveat with this is that it’ll only replace in the final name you’ve given the material/size etc. So if you have both “100% cotton” and “100 cotton” in keywords, being categorised as “100% Cotton”, this will only replace the “100% Cotton” mentions. It covers the majority for me, so that’ll do for a quick analysis.

If you want to extend and properly do the keyword template, you’ll need to add a new column for each modifier. This new column should categorise just off the ‘Find’ column. That way, when it finds the category ‘find’ in a keyword, it will return the rule it found, and then you can substitute that out. It’ll be more accurate, but it’ll add 5 more columns, and thus 50,000 extra cells of data so I wanted to skip that and just cover the main ones.

Let’s take a look at the duvet cover product line keywords;

We’ve got both colour & material being shown in front of the product grouping.

I had to search much further down the list to find the other two variations;

Then we have a size with very similar usage across before & after, but when it’s taken into a multi-filter keyword, we can see size predominately at the end so we will go with the end for size.

Then when both colour and material are used, we can see the order of it going colour, and then material, with “<colour> <material> duvet cover”.

Just a cool little way to determine optimisation phrasing, and go for that extra 1% optimisation.

You can also do this off the parent keywords, rather than the raw keyword, as that may help aggregate the data a bit better.

 

Keywords by Category/Product Range

We can now break down the keywords by their category, and provide these to the client.

Create some pivot tables, and apply the specific filters to them to return the keywords you want.

Suggested filters are commercial & duplicates, and then filter by category/sub-category, or in my case, just the product range to show the keywords;

From left to right, we have “bottom sheets”, “top sheets”, and “quilt cover” product ranges.

There’s an insight we can pull from here immediately. Can you spot it? Comment if you can!

I’ll run through that in a future post.

 

Extracting & Analysing SERP Data

We can now take our keywords, and extract the SERP data for them and see what insights we can pull.

Find out how to get SERP data for 1,000 keywords for only $2.50, in Part 2 here.

 

Want my data?

Copy my sheet with the link below.

Throw your email in above and you’ll get the sheet link. You might need to disable any ad blocker :)

It’s not cleaned up as a deliverable, but you’ll get all the formulas and raw data. Nothing hidden.

 

Leave a comment, too! I’d love to hear your thoughts on the process, and anything you do differently.

About The Author

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top