How to Count Pages or Posts Moving from Blogger to WordPress

Whether you are moving from HubSpot, Blogger, TypePad, Drupal, HTML, Movable Type or a custom CMS into WordPress, it's handy to know how much work is involved (how many pages or posts), even before you have usernames and passwords.

This is the second in a series of posts where you can learn how to use the SEO Spider and Excel to see the differences between HubSpot, Blogger and Drupal URL structures.

We tried many tools and settled on the Screaming Frog SEO Spider for our pre-move assessments. We ponied up for the Pro version and removing the 500 URL limit is well worth it. Let’s look at how to use the SEO Spider before a migration project begins.

First, launch the SEO Spider, crawl the site, and then export your results as a .csv or Excel spreadsheet.

Second, in Excel, click on the Data tab and then click on Filter. You will now see a small drop-down menu at the top of each column. Here is where it gets specific and powerful.


If you are migrating from Blogger to WordPress:

Let’s see how many blog posts we have. What sets a Blogger move apart from other CMSs is how you filter on the URL structure.

I like to expand the column width of Column A (Address) so that I can read most URLs before I begin.

  1. Click on the Content Filter drop down, in Column B. It is usually found on Row 2.
  2. Click on Select All to uncheck all boxes.
  3. Select the boxes that start with Text/html; charset=UTF-8. Generally the other two options visible relate to XML feeds.
  4. Click OK.
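
If you would rather script this step than click through Excel, the same content filter can be sketched in Python. This is only a rough sketch: it assumes the export has columns named Address and Content, and the sample rows (including the feed's application/atom+xml content type) are made up for illustration.

```python
import csv
import io

# Hypothetical sample of an exported crawl. A real export has many more
# columns; only Address and Content are assumed here.
SAMPLE_EXPORT = """Address,Content
http://blog.domain-name.com/2008/05/post-name.html,text/html; charset=UTF-8
http://blog.domain-name.com/feeds/1058267882026467620/comments/default,application/atom+xml; charset=UTF-8
http://blog.domain-name.com/2008_12_01_archive.html,text/html; charset=UTF-8
"""


def html_rows(csv_text):
    """Keep only the rows whose Content value starts with text/html."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["Content"].strip().lower().startswith("text/html")
    ]


rows = html_rows(SAMPLE_EXPORT)
for row in rows:
    print(row["Address"])
```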

Blogger uses a different URL structure than HubSpot or Drupal. For example, each comment has its own URL, which is the URL of the post it belongs to plus a showComment parameter.

Post – http://blog.domain-name.com/2008/05/post-name.html

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1210259340000

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1211221980000

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1211988840000

Thankfully, the archives use a different string to show the date.

http://blog.domain-name.com/2007_11_01_archive.html

http://blog.domain-name.com/2008_12_01_archive.html

Feed URLs are distinctive.

http://blog.domain-name.com/feeds/1058267882026467620/comments/default
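
To make the differences between these URL types concrete, here is a minimal Python sketch that sorts an address into post, comment, archive or feed. The function name classify_blogger_url and the regular expressions are my own assumptions, based only on the example URLs shown above, so a real blog may need looser patterns.

```python
import re
from urllib.parse import urlparse

# Patterns inferred from the example URLs only (an assumption, not a
# complete description of every Blogger URL).
POST_PATH = re.compile(r"^/\d{4}/\d{2}/[^/]+\.html$")
ARCHIVE_PATH = re.compile(r"^/\d{4}_\d{2}_\d{2}_archive\.html$")


def classify_blogger_url(url):
    """Return 'post', 'comment', 'archive', 'feed' or 'other'."""
    parts = urlparse(url)
    if parts.path.startswith("/feeds/"):
        return "feed"
    if ARCHIVE_PATH.match(parts.path):
        return "archive"
    if POST_PATH.match(parts.path):
        # A comment shares the post's path and adds a showComment parameter.
        return "comment" if "showComment=" in parts.query else "post"
    return "other"


print(classify_blogger_url("http://blog.domain-name.com/2008/05/post-name.html"))
```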

For Blogger we need to filter for /year/month/ and exclude Comment. Here is how to do that.

First off, it may already be obvious that we can’t filter on one year or month because it will exclude all others. So, we will avoid that issue by using a wildcard *.

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on Text Filters and then choose Custom Filter
  3. In the drop down, select “Contains” and type in /20**/**/
  4. In the second drop down, select “does not contain” and type in Comment
  5. Click OK

Now you can see just the blog posts in Blogger.
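
If you want to double check the count outside of Excel, the custom filter above can be approximated in a few lines of Python. Treat this as a sketch under assumptions: the helper name is_blogger_post is hypothetical, the address list reuses the example URLs from this post, and the fnmatch pattern */20*/*/* is my stand-in for Excel's "contains /20**/**/" rule. Addresses are lowercased first because Excel's text filters ignore case.

```python
from fnmatch import fnmatchcase


def is_blogger_post(address):
    """Mimic the filter: contains /20*/*/ and does not contain 'comment'."""
    lowered = address.lower()
    has_date_path = fnmatchcase(lowered, "*/20*/*/*")
    return has_date_path and "comment" not in lowered


# Example addresses taken from the URL patterns discussed in this post.
addresses = [
    "http://blog.domain-name.com/2008/05/post-name.html",
    "http://blog.domain-name.com/2008/05/post-name.html?showComment=1210259340000",
    "http://blog.domain-name.com/2008/05/post-name.html?showComment=1211221980000",
    "http://blog.domain-name.com/2007_11_01_archive.html",
    "http://blog.domain-name.com/feeds/1058267882026467620/comments/default",
]

posts = [a for a in addresses if is_blogger_post(a)]
print(len(posts), "post(s) found")
```

The archive and feed addresses drop out because they never have the /year/month/ pair of path segments, and the comment addresses drop out on the second condition.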

Be sure to read the two other related posts. Each one has unique lessons that will still apply regardless of which CMS you are using. The first in the series discusses moving from HubSpot to WordPress.

The other post reveals a few more tips and looks at migrating from Drupal to WordPress.