Whether you are moving from HubSpot, Blogger, TypePad, Drupal or a custom CMS into WordPress, it's handy to count how many pages or posts are involved, even before you have usernames and passwords.

This is the third in a series of posts that show how to use the SEO Spider and Excel to see the differences between HubSpot, Blogger and Drupal URL structures.

We tried many tools and settled on the Screaming Frog SEO Spider for our pre-move assessments. We ponied up for the Pro version and removing the 500 URL limit is well worth it. Let’s look at how to use the SEO Spider before a migration project begins.

First, launch the SEO Spider, crawl the site, and then export your results as a .csv or Excel spreadsheet.

Second, in Excel click on the Data tab and then click on Filter. You will now see the little drop down menu for each column. Here is where it gets specific and powerful.

If you are moving from Drupal to WordPress:

First, recognize that HubSpot and Blogger (like TypePad) are essentially hosted, closed systems, whereas Drupal (like Movable Type or Joomla) will let you customize and convolute all you like. Trust me, some people do just that, which makes those sites harder to export and migrate.

I like to expand the column width of Column A (Address) so that I can read most URLs before I begin.

  1. Click on the Content Filter drop down, in Column B. It is usually found on Row 2.
  2. Click on Select All to uncheck all boxes.
  3. Select the boxes that start with Text/html; charset=UTF-8. Generally the other two options visible relate to XML feeds.
  4. Click OK.

Filter the content for just text/html and then sort the Address column alphabetically. Now just read the URLs as you scroll down and look for patterns.
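The same filter-and-sort pass can be done outside Excel. Here is a minimal Python sketch, assuming the export is a CSV whose header row names the Address and Content columns. The sample rows and the html_addresses function are made up for illustration, and a real export may need its header adjusted first.

```python
import csv
import io

# Hypothetical sample standing in for the SEO Spider export.
# A real export may contain more columns or an extra title row.
SAMPLE_EXPORT = """Address,Content
http://example.com/about/,text/html; charset=UTF-8
http://example.com/logo.png,image/png
http://example.com/feed/,text/xml; charset=UTF-8
http://example.com/,text/html; charset=UTF-8
"""

def html_addresses(csv_text):
    """Return the sorted addresses whose Content value starts with text/html."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        row["Address"]
        for row in rows
        if row["Content"].strip().lower().startswith("text/html")
    )

for address in html_addresses(SAMPLE_EXPORT):
    print(address)
```

Reading the sorted output line by line is the scripted equivalent of scrolling down Column A and looking for patterns.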

Let's find out how many pages are on the site.

In the example that I am reviewing, the developers used a unique /category-name/ for each section of the website. Since there are more than a dozen of these, it will be easier to exclude the blog than to include all the category names and their pages.

Now here is an interesting situation. This particular domain has not implemented a canonical URL, so the SEO Spider is listing both the www and the non-www version of every page and post. We will have to keep just one version (here, the www URLs) so that the counts are not doubled up. It’s never simple.

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on Text Filters and then choose Custom Filter
  3. In the first drop down, select “Contains” and type in www
  4. Leave “And” selected, then in the second drop down select “does not contain” and type in /blog/
  5. Click OK
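The two-condition filter above can be expressed as a small test on each address. This is only a sketch: the is_site_page name and the sample addresses are hypothetical, and it assumes the list has already been limited to text/html rows.

```python
def is_site_page(address):
    """Mirror the two filter conditions: contains 'www' AND does not contain '/blog/'."""
    lowered = address.lower()  # Excel text filters ignore case, so we do too
    return "www" in lowered and "/blog/" not in lowered

# Hypothetical addresses, already limited to text/html rows.
addresses = [
    "http://www.example.com/services/",
    "http://example.com/services/",
    "http://www.example.com/blog/first-post/",
    "http://www.example.com/about/team/",
]

pages = [a for a in addresses if is_site_page(a)]
print(len(pages), "pages")
```

Like the Excel version, a plain "contains www" test is blunt. It would also keep a non-www URL that happened to have "www" somewhere in its path, so it is worth eyeballing the result.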

Now we can see a much shorter list of just the pages of the website: no images, blog posts, archives and so on. Scrolling down the list, I still see all the categories and I recognize the page names.
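If you would rather merge the www and non-www duplicates than filter one set out, the host can be normalized before counting. The sketch below is one possible approach, not part of the Excel workflow. The without_www helper and the sample addresses are hypothetical.

```python
from urllib.parse import urlsplit

def without_www(address):
    """Return host + path with any leading 'www.' removed from the host."""
    parts = urlsplit(address)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path

# Hypothetical crawl results where the same page appears under both hosts.
addresses = [
    "http://www.example.com/services/",
    "http://example.com/services/",
    "http://www.example.com/about/",
]

# A set keeps one entry per normalized address, so doubles collapse.
unique_pages = {without_www(a) for a in addresses}
print(len(unique_pages), "unique pages")
```

This also catches pages that were only crawled under one of the two hosts, which a www-only filter would miss.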

You can certainly modify the filter to count only blog posts or only images as well. A tool this powerful gives useful results only if you pay close attention to what each filter includes and excludes.

Always expect surprises when working with websites.  These same principles apply to straight HTML sites, hosted solutions like HubSpot, Blogger, Active Rain and TypePad as well as Open Source solutions such as Drupal, Movable Type or Joomla.

This post is the third in a series. Each post has unique information that is relevant no matter what CMS you are working with.

The first in the series is about moving from HubSpot to WordPress and has a nice tip regarding robots.txt and the SEO Spider.

The second in the series focuses on moving from Blogger to WordPress and illustrates counting blog posts.

Whether you are moving from HubSpot, Blogger, TypePad, Drupal, HTML, Movable Type or a custom CMS into WordPress, it's handy to know how much work is involved (how many pages or posts), even before you have usernames and passwords.

This is the second in a series of posts where you can learn how to use the SEO Spider and Excel to see the differences between HubSpot, Blogger and Drupal URL structures.

We tried many tools and settled on the Screaming Frog SEO Spider for our pre-move assessments. We ponied up for the Pro version and removing the 500 URL limit is well worth it. Let’s look at how to use the SEO Spider before a migration project begins.

First, launch the SEO Spider, crawl the site, and then export your results as a .csv or Excel spreadsheet.

Second, in Excel click on the Data tab and then click on Filter. You will now see the little drop down menu for each column. Here is where it gets specific and powerful.

If you are migrating from Blogger to WordPress:

Let’s see how many blog posts we have.  The difference between a Blogger move and other CMSs is in the URL structure filtering.

I like to expand the column width of Column A (Address) so that I can read most URLs before I begin.

  1. Click on the Content Filter drop down, in Column B. It is usually found on Row 2.
  2. Click on Select All to uncheck all boxes.
  3. Select the boxes that start with Text/html; charset=UTF-8. Generally the other two options visible relate to XML feeds.
  4. Click OK.

Blogger uses a different URL structure than HubSpot or Drupal. For example, each comment has its own URL, which is the URL of its post followed by a showComment parameter.

Post – http://blog.domain-name.com/2008/05/post-name.html

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1210259340000

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1211221980000

Comment – http://blog.domain-name.com/2008/05/post-name.html?showComment=1211988840000

Thankfully the Archives use a different string to show the date.

http://blog.domain-name.com/2007_11_01_archive.html

http://blog.domain-name.com/2008_12_01_archive.html

Feed URLs are distinctive.

http://blog.domain-name.com/feeds/1058267882026467620/comments/default
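To make those URL shapes concrete, here is a rough Python sketch that sorts addresses into posts, comments, archives and feeds and tallies each kind. The blogger_kind function and its rules are my own reading of the example URLs above, not an official Blogger specification.

```python
import re
from collections import Counter

# A post path looks like /2008/05/post-name.html (year, month, then the slug).
POST_PATH = re.compile(r"/20\d\d/\d\d/[^/?]+\.html$")

def blogger_kind(address):
    """Classify a Blogger address using the URL shapes shown above."""
    if "showComment=" in address:
        return "comment"
    if "/feeds/" in address:
        return "feed"
    if address.endswith("_archive.html"):
        return "archive"
    if POST_PATH.search(address):
        return "post"
    return "other"

addresses = [
    "http://blog.domain-name.com/2008/05/post-name.html",
    "http://blog.domain-name.com/2008/05/post-name.html?showComment=1210259340000",
    "http://blog.domain-name.com/2008/05/post-name.html?showComment=1211221980000",
    "http://blog.domain-name.com/2007_11_01_archive.html",
    "http://blog.domain-name.com/feeds/1058267882026467620/comments/default",
]

counts = Counter(blogger_kind(a) for a in addresses)
print(counts)
```

A tally like this is a quick sanity check: if the "other" bucket is large, there is a URL pattern you have not accounted for yet.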

For Blogger we need to filter for /year/month/ and exclude Comment. Here is how to do that.

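First off, it may already be obvious that we can’t filter on one year or month because it will exclude all others. So, we will avoid that issue by using the wildcard *, which in an Excel text filter stands for any run of characters (a ? stands for exactly one character).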

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on Text Filters and then choose Custom Filter
  3. In the first drop down, select “Contains” and type in /20**/**/
  4. Leave “And” selected, then in the second drop down select “does not contain” and type in Comment
  5. Click OK
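The wildcard filter in the steps above can be tried out in Python, whose fnmatch module uses the same * and ? wildcards. This is a sketch under my own assumptions: the STRICT pattern is a tighter variant that I am suggesting, not one from the steps, and the last sample URL is invented to show the difference.

```python
from fnmatch import fnmatchcase

LOOSE = "*/20**/**/*"   # the pattern from the steps, wrapped in * to mean "contains"
STRICT = "*/20??/??/*"  # assumed alternative: exactly /20YY/MM/

def is_post(address, pattern=LOOSE):
    """True when the address matches the dated-folder pattern and is not a comment."""
    lowered = address.lower()  # Excel text filters ignore case
    return fnmatchcase(lowered, pattern) and "comment" not in lowered

post = "http://blog.domain-name.com/2008/05/post-name.html"
comment = post + "?showComment=1210259340000"
archive = "http://blog.domain-name.com/2007_11_01_archive.html"
oddball = "http://blog.domain-name.com/2010-recap/photos/index.html"  # hypothetical

for address in (post, comment, archive, oddball):
    print(is_post(address), is_post(address, STRICT), address)
```

Both patterns keep the post and drop the comment and the archive. Only the strict one also rejects the invented "oddball" address, so it is the safer choice when a site has folders that merely start with 20.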

Now you can see just the blog posts in Blogger.

Be sure to read the two other related posts. Each one has unique lessons that will still apply regardless of the CMS you are using. The first in the series discusses moving from HubSpot to WordPress.

The third post reveals a few more tips and looks at migrating from Drupal to WordPress.

Whether a site is moving from HubSpot, Blogger, TypePad, Drupal, HTML, Movable Type or a custom CMS into WordPress, we often have to analyze how much work is involved before we are given usernames and passwords for a potential move. This is the first in a series of posts that will look at HubSpot, Blogger and Drupal.

There is a range of tools that allow you to peer into a site, but we have settled on the Screaming Frog SEO Spider for our pre-move assessments. We ponied up for the Pro version. Let’s look at how to use the SEO Spider before a migration project begins.

First, launch the SEO Spider, crawl the site, and then export your results as a .csv or Excel spreadsheet.

Second, in Excel click on the Data tab and then click on Filter. This will offer the little drop down menu for each column. Here is where it gets specific and powerful.

The first two columns are labeled Address and Content; these will be the focus of our attention. Each website or blog may have its own URL structure, so here is some guidance.

Below are instructions for HubSpot.  Each example also illustrates unique URL structures that could be found in any website or blog you are moving.

If you are leaving HubSpot for WordPress:

I like to expand the column width of Column A (Address) so that I can read most URLs before I begin.

  1. Click on the Content Filter drop down, in Column B. It is usually found on Row 2.
  2. Click on Select All to uncheck all boxes.
  3. Select the boxes that start with Text/html and/or Text/html; charset=UTF-8.
  4. Click OK.

Note that your configuration may vary. The point here is to select only text. This removes the images, applications, JavaScript, Blanks and other file types.

Now let's sort for just blog posts.

Leave the above settings in place. This is where we can find even more dramatic differences. You have to look at the URLs and think about what you want to exclude or include. In the HubSpot blog that I am looking at for reference I can see an alphabetical listing of the URLs.

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on the Sort A to Z option
  3. Click OK

Now scroll down the page and look for patterns.

I can see that I don’t want to include URLs like:

Tags = http://blog.domainName.com/?Tag=TagName
BB Pages = http://blog.domainName.com/?BBPpage=2
Pages = http://blog.domainName.com/PageName
RSS Feeds = http://blog.domain-name.com/CMS/UI/Modules/BizBlogger/rss.aspx?tabid=139217&moduleid=193596&maxcount=25&tag=post+name

So we are up to at least four URL structures to exclude now. There is an easier way.

All HubSpot blog posts have the following structure, and we can search for a unique part of the URL:

http://blog.domainName.com/blog/bid/49582/post-name

Back in Excel:

  1. Click on the Address Filter drop down, in Column A. It is usually found on Row 2.
  2. Click on Text Filters and then choose Custom Filter
  3. In the drop down, select “Contains” and type in /blog/bid/
  4. Click OK

Now Excel will show you the revised count of just HubSpot blog posts at the bottom left corner. Pretty handy, eh?

This excludes images, pages, tags and other non-blog URLs. HubSpot does not have Categories.
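For anyone who prefers a script to a spreadsheet, the /blog/bid/ filter can be sketched with a regular expression that also pulls out the number and the post name. The pattern is my own reading of the example URL above, and the variable names are illustrative.

```python
import re

# /blog/bid/<number>/<post-name>, based on the example post URL above.
POST_URL = re.compile(r"/blog/bid/(\d+)/([^/?#]+)")

addresses = [
    "http://blog.domainName.com/blog/bid/49582/post-name",
    "http://blog.domainName.com/?Tag=TagName",
    "http://blog.domainName.com/?BBPpage=2",
    "http://blog.domainName.com/PageName",
]

posts = []
for address in addresses:
    match = POST_URL.search(address)
    if match:
        post_number, post_name = match.groups()
        posts.append((post_number, post_name))

print(len(posts), "blog posts")
```

Keeping the number and the post name from each match is useful later, because the same pairs can feed the list of redirects.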

Keep in mind that the SEO Spider will follow robots.txt and other similar directives. If a section of the site is excluded, the SEO Spider will not reveal it. We have found sites with pages that are not linked to from any other page and are not in the Google index, yet are important parts of a website transfer. This is an example of where a site owner’s firsthand knowledge can be helpful when exporting a site.
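One way to see which sections a polite crawler will skip is to test a few URLs against the site's robots.txt rules. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the URLs and the hidden_from_crawler helper are made up for illustration, and this shows general robots.txt behavior rather than the SEO Spider's exact logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste in the real file to test a live site.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def hidden_from_crawler(url, user_agent="*"):
    """True when the robots.txt rules tell this user agent not to fetch the URL."""
    return not parser.can_fetch(user_agent, url)

for url in (
    "http://example.com/about/",
    "http://example.com/private/price-list.html",
):
    print(hidden_from_crawler(url), url)
```

Any section that comes back hidden needs to be counted some other way, for example by asking the site owner.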

How do you assess the number of pages or posts before moving?

Read the second in our series to learn more tricks and the specifics in counting pages or posts when moving from Blogger to WordPress.

Read the third in our series to learn even more tricks and the specifics of counting pages or posts when migrating from Drupal to WordPress.

Blog or Website Move Checklist

by SEO Wrangler on November 23, 2011

Thinking of moving your website or blog? This post explains the moving steps.

It doesn’t matter whether you are moving from HubSpot, Drupal, Blogger, Movable Type or a custom CMS. The process described below will prepare you and your team for a successful project.

Share the URL

If you need our help, here is a personal request. Some website move inquiries don’t include the website or blog web address. Crazy, right? Even if you are inquiring as an agency or are unlikely to move, let us give you more useful information by looking at your project directly. This way you get more accurate information and avoid surprises.

Time to call the website movers.

Confirm All Log in Information

We will need to log in to the host and any current Content Management System (CMS) to grab all of your data. (We have also performed migrations without being able to log in.) Log in information may include most or all of the following:

  • Current host
  • Current CMS – this includes database access
  • Future host
  • Future CMS
  • Domain name registration account. This is needed to update the A record (often called the “A Name” record), which points visitors to the new site
  • Third Party accounts – Analytics, Social Sharing, RSS, integration with CRM and email lists and so forth.

We suggest that you create unique accounts for vendors. This allows you to revoke access at any time. Once the vendor account is deleted or the password updated, this also removes the vendor from the list of people to contact if there is a hiccup years after the project is completed.

If you need help choosing a Future web host, we can offer suggestions based on your requirements.

Export or Full Migration Services?

We can do anything, but we generally see three major types of move projects.

  1. Export the data only.
  2. Move the data and duplicate the current design.
  3. Migrate the data and create a fresh new design.

Special Functionality

Review any features or special functionality on your current site and decide whether you want it on the new site. Each CMS has unique features.

Publishing

During a migration project, two websites exist concurrently: the current live site and the destination site.

At a specific point in time your content will be exported and moved from the live site. After that time anything that gets published on the live site, by definition, was not moved.

We have a couple of ways to make sure that nothing gets lost.

  1. Some clients may “double post”. This means posting the same thing on the live site and the destination site.
  2. High volume publishers pay a modest fee for an extra export the day of the switch over to gather any new content and include it on the destination site.
  3. Some clients opt to stop publishing. This may last for a week or two.

Whether or not you are publishing on the Internet during your project, you should keep writing content. Once the new site goes live, you can immediately post the articles that you have written. This avoids the letdown of a new site launch looking stale because it does not have any fresh content.

301 Permanent Redirects

All projects so far have included a change in the URL. The two most common reasons are:

  1. Shortening the URL, or
  2. Moving from a sub-domain to a sub-directory.

Here are examples of each:

  • A shorter URL – http://domain.com/blog/bid/3481/blog-post-title
    •  gets shortened to – http://domain.com/blog/blog-post-title
  • Switch from sub-domain to sub-directory – http://info.domain.com/blog-post-title
    •  is changed to – http://domain.com/blog/blog-post-title

The 301 Redirect points traffic from the old URL to the new improved URL instead of letting the server show an error message when the old URL is no longer available.
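Building the old-to-new list is mostly string work, so it scripts well. Here is a hedged Python sketch of the two URL changes shown above. The function names, the main_host and directory defaults, and the redirect_map dictionary are all hypothetical, and how the map gets loaded into a server or a redirect plugin is a separate step not covered here.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def shorten_bid_url(old_url):
    """Drop the /bid/<number>/ segment: /blog/bid/3481/title becomes /blog/title."""
    return re.sub(r"/bid/\d+/", "/", old_url, count=1)

def subdomain_to_subdirectory(old_url, main_host="domain.com", directory="blog"):
    """Move a sub-domain URL under a sub-directory of the main host."""
    parts = urlsplit(old_url)
    new_path = "/" + directory + parts.path
    return urlunsplit((parts.scheme, main_host, new_path, parts.query, parts.fragment))

# Map each old URL to the new URL that its 301 redirect should point to.
redirect_map = {
    "http://domain.com/blog/bid/3481/blog-post-title":
        shorten_bid_url("http://domain.com/blog/bid/3481/blog-post-title"),
    "http://info.domain.com/blog-post-title":
        subdomain_to_subdirectory("http://info.domain.com/blog-post-title"),
}

for old, new in redirect_map.items():
    print(old, "->", new)
```

Generating the map from the crawl export means every old URL gets an entry, instead of relying on someone to remember them.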

Broken Link Search

Most sites are not perfect. In fact, often more than one programmer, designer or SEO has left their fingerprints on a site. The perfect time to validate all the links on the website is after the export, as the migration wraps up. We have software that will list any links that are not working. A broken link could point to a page you no longer have or to another website that has shut down. Cleaning up broken links is a great way to enhance your search optimization and your user experience at the same time.
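To show the idea behind a broken link search (this is not the software mentioned above), here is a small Python sketch using only the standard library. The http_status and broken_links names are hypothetical. The status lookup is passed in as a parameter so the example can run offline with made-up results.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response came back."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error such as 404
    except OSError:
        return None  # DNS failure, refused connection, timeout and so on

def broken_links(urls, status_of=http_status):
    """List (url, status) pairs for links that fail or return 400 and above."""
    broken = []
    for url in urls:
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken

# Offline demonstration with made-up statuses instead of real requests.
fake_statuses = {
    "http://example.com/": 200,
    "http://example.com/old-page": 404,
    "http://closed-down.example/": None,
}
print(broken_links(fake_statuses, status_of=fake_statuses.get))
```

Some servers refuse HEAD requests, so a real checker may need to fall back to a normal GET before declaring a link broken.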

Change the Domain Name Server Settings

Once the blog is moved, the design is complete, the links are verified, the redirects are in place and everything else checks out, it is time to “go live”. Love that term. There are a couple of ways this can happen. Either way, the point is to direct the traffic to the new site files.

  1. A Record (sometimes called the A Name Record) – if you changed hosts you probably got a new IP address. Edit the A Record by removing the old IP address and entering the new one.
  2. New Home Directory – if you stay on the same server the IP address does not change. The Home directory will be redefined to point to the folder where the new site resides.
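After an A record change, it can take a while before every visitor sees the new address, so a quick check is handy. The following Python sketch compares what a hostname resolves to with the IP address you expect. The points_to helper, the hostname and the IP addresses are placeholders, and the resolver is a parameter so the demonstration runs without touching the network.

```python
import socket

def points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """True when hostname currently resolves to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        return False  # the name could not be resolved at all

# Offline demonstration with a stand-in resolver; drop the resolve argument
# to ask the real DNS instead.
def fake_resolver(hostname):
    return "203.0.113.10"

print(points_to("www.example.com", "203.0.113.10", resolve=fake_resolver))
print(points_to("www.example.com", "198.51.100.7", resolve=fake_resolver))
```

The answer depends on which DNS server your own computer uses, so a match here does not guarantee that all visitors already reach the new site.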

Go Live

These are the major components of a site move up to the point of “going live”. The project continues, but that is for another post.

Please feel free to use the comment section below to leave your questions and remarks. We want to be sure you have all of your questions answered before you “Go Live” with your migration project.

Blog Move Considerations

November 22, 2011

You have decided to export your data and move your website or blog. WordPress is a great platform to migrate in to whether you are moving from HubSpot, Typepad, Blogger or Drupal. Which project scope is right for you? Random Before and After Blog Migration Considerations RSS - There are platforms that do not offer the [...]

Move Your Blog or Website – Three Design Considerations

November 19, 2011

It’s time to move your website or blog from HubSpot, Drupal, TypePad, Blogger or Tumblr to WordPress. Three Ways to Move 1) Just export the data – posts, comments, pages, categories, tags, images, videos, meta data etc. Some projects need just the data moved  from one platform to another. For example, the client may have already started creating [...]

The Longest URL I Have Ever Seen

October 22, 2011

When moving websites and blogs we find great code and pretty poor code. When we run into the pretty poor code during a migration we always make the necessary corrections as part of the move after consulting with the client.   In this example an image URL is waaaaaaaaaay too long. There are 238 sub [...]

Battle of the Free Mobile Blogging Apps

August 23, 2011

One of the most important aspects of maintaining a blog is to post updates as often as possible. By getting valuable information posted before other blogs, a blogger will have a better chance of attracting and retaining an audience than the competition. However, some bloggers are not always near a computer, such as at an [...]

Google Search Results for Blogsmith Disappoint

May 31, 2011

Blogsmith hit my project radar recently so I started doing some elementary research on the platform. A search in Google for “Blogsmith” took me off on a tangent and revealed how imperfect Google search results are.  Here is what I found in the first 10 search results and beyond. Result #1 If you visit http://www.blogsmith.com/ you [...]

WordPress Celebrates 8 Years of Goodness

May 27, 2011

Blog Wranglers sends out a big thank you to Matt Mullenweg, the WordPress core team, the folks at Automattic, the ever-expanding eco-system of WordPress developers/programmers/designers and the many site owners that take advantage of the most useful publishing platform on the planet. It has taken many people to get here and they all deserve a [...]
