How To Count Pages or Posts Moving from HubSpot to WordPress

Whether a site is moving from HubSpot, Blogger, TypePad, Drupal, HTML, Movable Type or a custom CMS into WordPress, we often have to analyze how much work is involved before we are given usernames and passwords for a potential move. This is the first in a series of posts that will look at HubSpot, Blogger and Drupal.

There is a range of tools that let you peer into a site, but we have settled on the Screaming Frog SEO Spider for our pre-move assessments. We ponied up for the Pro version. Let’s look at how to use the SEO Spider before a migration project begins.


First, launch the SEO Spider, let it crawl the site, and then export your results as a .csv or Excel spreadsheet.

Second, in Excel click on the Data tab and then click on Filter. This adds a small drop-down menu to each column header. Here is where it gets specific and powerful.


The first two columns are labeled Address and Content; these will be the focus of our attention. Each website or blog may have its own URL structure, so here is some guidance.

Below are instructions for HubSpot.  Each example also illustrates unique URL structures that could be found in any website or blog you are moving.

If you are leaving HubSpot for WordPress:

I like to expand the column width of Column A (Address) so that I can read most URLs before I begin.

  1. Click on the Content Filter drop down, in Column B. It is usually found on Row 2.
  2. Click on Select All to uncheck all boxes.
  3. Select the boxes that start with Text/html and/or Text/html; charset=UTF-8.
  4. Click OK.

Note that your configuration may vary. The point here is to select only text. This removes the images, applications, JavaScript, Blanks and other file types.
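If you would rather script this step than click through Excel, here is a minimal Python sketch of the same Content filter. It assumes the export is a .csv whose first line holds the column headers, including Address and Content as described above; if your export has a title row above the headers, skip that row first. The sample rows and the example.com URLs are made up for illustration.

```python
import csv
import io


def html_rows(csv_file):
    """Return the rows whose Content column starts with text/html."""
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        # Lowercase the value so "Text/html" and "text/html" both match.
        if (row.get("Content") or "").strip().lower().startswith("text/html")
    ]


# Hypothetical sample standing in for a real crawl export.
SAMPLE = """Address,Content
http://example.com/,text/html; charset=UTF-8
http://example.com/logo.png,image/png
http://example.com/about,text/html
http://example.com/app.js,application/javascript
http://example.com/missing,
"""

rows = html_rows(io.StringIO(SAMPLE))
print(len(rows))  # 2 rows survive the filter
for row in rows:
    print(row["Address"])
```

For a real file you would pass `open("export.csv", newline="")` in place of the `io.StringIO` sample; the file name here is only a placeholder.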

Now let's sort for just blog posts.

Leave the above settings in place. This is where we can find even more dramatic differences. You have to look at the URLs and think about what you want to exclude or include. In the HubSpot blog that I am looking at for reference I can see an alphabetical listing of the URLs.

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on the Sort A to Z option
  3. Click OK

Now scroll down the page and look for patterns.
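Scrolling works fine on a small site. On a larger one, a short script can tally the patterns for you. The sketch below groups URLs by the first two segments of their path and counts each group. The URLs are made-up examples on example.com, and the two-segment depth is just an assumption that happened to suit the blog I was looking at; adjust it to your site.

```python
from collections import Counter
from urllib.parse import urlsplit


def path_prefix(url, depth=2):
    """Return the first `depth` path segments of a URL, e.g. /blog/bid."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


# Hypothetical URLs standing in for the Address column.
URLS = [
    "http://example.com/",
    "http://example.com/blog/bid/101/first-post",
    "http://example.com/blog/bid/102/second-post",
    "http://example.com/blog/tag/news",
    "http://example.com/about",
]

counts = Counter(path_prefix(url) for url in URLS)

# Most common patterns first.
for prefix, total in counts.most_common():
    print(total, prefix)
```

Any prefix with a large count is a candidate for either your include list or your exclude list.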

I can see that I don’t want to include URLs like:

Tags =
BB Pages =
Pages =
RSS Feeds =

So we are up to at least four URL structures to exclude now. There is an easier way.

All HubSpot blog posts share a common URL structure, so we can filter on a part of the URL that is unique to posts.

Back in Excel:

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on Text Filters and then choose Custom Filter
  3. In the drop down, select “Contains” and type in /blog/bid/
  4. Click OK

Now Excel will show you the revised count of just HubSpot blog posts at the bottom left corner. Pretty handy, eh?
This excludes images, pages, tags and other non-blog URLs. HubSpot does not have Categories.
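The same count can be sketched in Python by combining the Content filter with the "Contains /blog/bid/" filter. As before, this assumes a .csv export with Address and Content headers on the first line, and the sample rows are invented for illustration. The comparison is done in lowercase so that differences in capitalization do not change the count.

```python
import csv
import io


def count_blog_posts(csv_file, marker="/blog/bid/"):
    """Count HTML rows whose Address contains the blog post marker."""
    reader = csv.DictReader(csv_file)
    count = 0
    for row in reader:
        content = (row.get("Content") or "").strip().lower()
        address = (row.get("Address") or "").lower()
        # Keep only HTML pages whose URL contains the marker.
        if content.startswith("text/html") and marker.lower() in address:
            count += 1
    return count


# Hypothetical sample standing in for a real crawl export.
SAMPLE = """Address,Content
http://example.com/blog/bid/101/first-post,text/html
http://example.com/blog/bid/102/second-post,text/html; charset=UTF-8
http://example.com/blog/tag/news,text/html
http://example.com/about,text/html
http://example.com/logo.png,image/png
"""

post_count = count_blog_posts(io.StringIO(SAMPLE))
print(post_count)  # 2 blog posts in the sample
```

Passing a different `marker` lets you reuse the function for other URL patterns you spot while scanning the list.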

Keep in mind that the SEO Spider will follow robots.txt and other similar directives. If a section of the site is excluded, the SEO Spider will not reveal it. We have found sites with pages that are not linked to from any other page and are not in the Google index, and yet are important parts of a website transfer. This is an example of where a site owner’s firsthand knowledge can be helpful when exporting a site.
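If the site owner hands you a list of pages they know about, you can check which of them a robots.txt file would hide from a crawler. Here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt rules, the URLs and the `*` user agent are all assumptions for illustration; a real check would use the site's own robots.txt and the user agent your crawler reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site's rules will differ.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked(urls, agent="*"):
    """Return the URLs that the parsed robots.txt rules disallow."""
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical pages supplied by the site owner.
KNOWN_URLS = [
    "http://example.com/blog/bid/101/first-post",
    "http://example.com/private/landing-page",
    "http://example.com/about",
]

hidden = blocked(KNOWN_URLS)
for url in hidden:
    print("Not crawlable:", url)
```

Pages that show up here need to be added to the count by hand. Pages that are simply not linked from anywhere will not be caught by this check either, so the owner's list still matters.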

How do you assess the number of pages or posts before moving?

Read the second in our series to learn more tricks and the specifics in counting pages or posts when moving from Blogger to WordPress.

Read the third in our series to learn even more tricks and the specifics of counting pages or posts when migrating from Drupal to WordPress.
