Stripping and Rebuilding a URL Structure for Migration Testing

Some clients have recently required a migration mapped out for 10’s to 100’s of thousands of URLs.

To do this, we needed to capture all the different URLs, build out the migration rules, and then test the redirects.

Mapping out and building the rules is one thing, but then how do we bulk generate what the URLs should look post-migration so that we can actually test the redirects?

How can we test the implemented redirect rules, to ensure they match what we originally mapped out?

No one wants to manually create and map 1,000+ URLs to compare the redirect path too, and I personally prefer to not just test the redirects then give them ‘after’ the once over.

I’d like something to compare the implementation to.

Using Google sheets (or Excel) you can strip down the existing URL, extract the parts the redirection rules will use, and then rebuild the existing URL into their new formats.

This will allow you to do a bulk migration test against the implemented redirect rules, and confirm everything redirects where it should.

There are a couple of ways of achieving this, and I will just go through the different methods I use below.

A quick note though, this has nothing to do with building the actual redirection rules themselves. Whilst it might help you build better examples of what you need to redirect, this post is purely about building the expected URLs so you can confirm the redirects match.

 

Stripping a URL down

There are a few main types of URLs we can account for here.

The standard hierarchical folder-based structure, the flat URL structure with everything to load the page in the base URL, and the ugly query parameter based structure.

They can all be stripped down relatively easily, but firstly we can strip out the URL path from the domain to simplify issue patching later.

 

Removing the domain from a URL Path

A handful of different ways to strip a domain from a URL are available, but this formula is possibly the easiest;

=RIGHT(A2,LEN(A2)FIND(“domain.com”,A2)LEN(“domain.com”))

This will strip everything up to and including ‘domain.com’ allowing us to account for all https, http, www, and non-www variations of a URL leaving us with just the URL path.

Stripping the domain from a URL

It will also cull subdomains, so just keep that in mind. If you need subdomains left, you can still use the same formula, but each subdomain will need it’s own.

I’d recommend extracting the subdomain in a separate column, and then referencing the subdomain’s cell, rather than plain text domain. That should save you a bit of time!

 

Extracting query parameter values from a URL

Let’s go over this one first, as it is slightly simpler than breaking down a ‘pretty’ URL.

There’s just one formula you need, and that’s regex extraction.

=iferror(regexextract(B2, “[?&]parent=([^&]*)”),“”)

That will extract the value for the ‘parent’ query parameter, and leave the cell blank if the parameter doesn’t exist, or is blank.

Extracting query parameter value from a URL

You then just need to repeat that formula, for each query parameter value you need to extract.

 

Stripping a hierarchical ‘pretty’ URL structure

Stripping down a folder-based, or hierarchical is the next structure that could be at play here – and generally my favourite to use.

The absolute simplest way is to use the “split text to columns” method.

Splitting url path text to columns

Highlight your text, click ‘tools’, and then click on ‘Split text to columns’.

You’ll then get asked for a separator. In this case, the separator is a ‘/’, so you’ll need to click ‘custom’ and enter the ‘/’.

After which, you’ll immediately get the following;

Select the URL separator

Each folder split across the different cells, giving you all the different folder levels all accounted for.

Works perfectly! If you don’t mind splitting any new URLs like this going forward any time you add new ones.

I personally prefer to try and work with formulas where possible, so that we can keep expanding the URL list, along with tweaking it as we go if modifications are required.

Provided you did the URL path column above, the following formula is what you’ll need;

=iferror(LEFT(B6,FIND(“/”,B6)1),“”)

Parent folder extraction from URL

That will just strip off everything after the first slash, leaving you with the parent folder.

Next, you’ll do a similar thing to get the child folder, but first, you’ll just subtract the parent folder from the URL path.

=iferror(LEFT(substitute(B6,C6&“/”,“”),FIND(“/”,substitute(B6,C6&“/”,“”))1),“”)

This will just remove the parent folder from the URL path, and leave you with the child folder, which will be the second folder in the path.

Extracting a child folder from a URL

And if you’re playing with 3 or more levels, here’s the formula you can modify;

=iferror(LEFT(substitute(B6,C6&“/”&D6&“/”,“”),FIND(“/”,substitute(B6,C6&“/”&D6&“/”,“”))1),“”)

You’ll just need to keep adding the new cells in there each time you need a new level

 

Breaking down a flat URL structure

These are extremely difficult to break down.

Unless there is consistency in how they’re formatted, this could be a very manual process.

However, I am just going to assume you’ve gotten lucky with something that resembles a consistent format for you to pull apart.

=LEFT(B15,FIND(“-for”,B15)1)

Stripping a flat URL structure

For the child folder, there are two formulas I have that basically just do the same thing.

They both look at a break point and just trim everything including and after it. For my example, my breakpoint is “-in”.

=substitute(REGEXREPLACE(B15,“(.*)-in.*”,“$1”),C15&“-“,“”)

Extract a part of the URL

=substitute(LEFT(B16,FIND(“-in”,B16)1),C16&“-“,“”)

Google sheets URL modification formula

Same same but different.

I just don’t remember how they’re different lol.

The last part here is to extract the location.

If your location is like mine, you don’t need anything fancy.

You just need to substitute all the previous folders from the structure, then remove the slash, and you’ve got your location.

=substitute(SUBSTITUTE(B15,C15&“-“&D15&“-in-“,“”),“/”,“”)

Extraction a location from a URL for SEO

You essentially rebuild the URL that came before it, based on what you had stripped out, and then you’re left with the ‘location’ that is on the end.

If your location isn’t on the end or doesn’t have something that can be clearly substituted out after it, you’ll need to modify the formula used for the child folder.

 

Rebuilding the New Post-Migration URL Structures

Once you’ve stripped down the old URL to its main elements, you can start building the new one!

If you’re merely converting query parameters to a clean structure, with no folder changes, you’re super lucky.

Many of us though, are not that lucky.

 

Converting query parameters to pretty SEO URLs in Google Sheets

Okay you lucky people, this is super easy.

=”https://www.domain.com/”&C2&”/”&D2&”/”

Converting query parameter to pretty SEO URL in Google Sheets

You just piece the URL together using the ampersand, ‘&’.

Include your domain with a slash, then the parent folder and a slash after it, and then the child folder and another slash.

Some people like to use concatenate, but you complicate things. Complications through concatenations.

Annnnnd you’re done! You’ve now turned your query parameters into a pretty URL structure, in bulk, so that you can test these out.

It’s rarely that easy though.

 

Rebuilding a URL structure where folders change names or get merged

Whilst it’s easier when things don’t change, it really isn’t that hard when they do now that you’ve stripped the URL down.

The first step is to paste all your parent, child, and any other folders into a separate column, and then remove all their duplicates.

Removing duplicates from a list of URL paths

This will leave you with a unique set of folder values.

You then just need to add their new value next to them.

If the value doesn’t change, just paste the same value in.

If the folders are getting merged, just paste the folder you’re merging into here.

If it is a completely new value, type out the replacement.

Old to new URL folder mappings

The mapping of the values might take time, as some websites could have hundreds and hundreds.

The benefit of doing it this way is that you aren’t dealing with all the possible variations & combinations.

Separating the child folders from the parent folders lets us independently map them to their new values.

Now we need to bring these new values into the mix, with a good old vlookup (if you don’t know how to do it, check out my SEO Excel formulas post).

=VLOOKUP(C2,M:N,2,0)

Vlookup the old to new parent folders

Repeat the process for the child folders, and you’ll get all the shiny new folder values.

You’ll just need a slight tweak to the child formula, due to the possibility of there being no child.

=iferror(VLOOKUP(D2,P:Q,2,0),“”)

If no child exists, the cell will remain blank.

Vlookup the new child URL folders

You should now have completely mapped the old to new folder values, so we can move on to constructing the URL.

Now you just piece it all together!

=“https://www.domain.com/”&E2&“/”&IF(len(F2)>1,F2&“/”,“”)

Rebuilding a new SEO URL for migration testing

This will add one folder to the other, and rebuild your new URL structure for you!

If there is no child available, it will ensure there is no extra slash added.

If you prefer the old flat structure, tweak the formula to be;

=“https://www.domain.com/”&E2&IF(len(F2)>1,“-“&F2&“/”,“/”)

Which will ensure there is no extra dash added, unless the child exists.

And if you need a stop word, or two, in the middle to clean her up a bit, just do the following;

=“https://www.domain.com/”&”theres-a-“&E2&“-before-the-“&F2&“/”

You’ll have to account for your own variations of the template, and whether a child may or may not exist.

 

Rebuilding a Structure that Previously Used ID’s

What if you’re just stuck with a heap of parameter IDs, and no ‘pretty’ URL to start with?

The benefit to the above vlookup style URL rebuilding is that you can extract the query parameter values, and then just map the old values to their new pretty folder name.

A quick rebuild, and you’ve gone from the old ugly URLs, to what could be thousands of crawlable new versions of the URL!

Start by getting the path out of the URL, and then extract all the parameter values, like I showed you earlier.

Converting query parameters to a pretty URL

Add your new parameter values to the parent/child vlookup lists;

Query parameters to clean folders

Then just vlookup the parameter value, to the new parent & new child values, and build your new URL!

New pretty URLs for query parameter ugly URLs

Everything was covered earlier, so just grab all the formulas from there.

That covers it though!

You’ve gone from the ugly parameter version to a shiny new & testable pretty URL.

 

Bulk testing your new URL structure

You can now run the old URLs through screaming frog (or your preferred bulk testing tool) and copy out the final URL in the redirect path.

Match it up to your expected ‘new URL’ and confirm whether it is the same or whether they’re different.

If they’re different, is it the redirect rule that doesn’t match? Is there an error in the expected URL? Are the redirects just not implemented?

This is guaranteed to save you time bulk testing your migration.

 

Testing a URL migration on staging before going live

In an ideal world, for a migration this size, you should be able to test out your redirection rules on a staging environment.

Swap out the website’s domain in the formula, for the URL of the staging environment, and do everything else the same.

You’ll get thousands of staging URLs ready to load screaming frog up with!

 

Download the sample

You can download all my sample data from here if you want to jump right in.

I’m always looking at new ways to simplify bulk analysis, so if this worked for you, or you do something completely different, let me know!

I’d love to hear about it all 🙂

 

Also, just a quick shout out to the guys at ExtendOffice.com though!

Plenty of awesome formulas, with this one definitely being one for the bookmarks: https://www.extendoffice.com/documents/excel/1783-excel-remove-text-before-character.html

Like this article?

Share on facebook
Share on Facebook
Share on twitter
Share on Twitter
Share on linkedin
Share on Linkdin
Share on pinterest
Share on Pinterest

2 Replies to “Stripping and Rebuilding a URL Structure for Migration Testing”

Leave a Reply

Your email address will not be published. Required fields are marked *

Leave a comment