Migrating from Blogger to WordPress 2.0

Ever since I saw the new ‘import from Blogger’ functionality in WordPress 2.0, I’ve known I would eventually migrate my main blog. Blogger is a great way to start blogging, but I want categories, easy template updating (without republishing) and all the WordPress plug-in sweetness. As a dress rehearsal, I migrated my Dutch poetry blog first: Zo helpt Poezie ….


  • The site was managed with Blogger but published via FTP on one of my own domains. Because my old hosting system did not support domain mapping while serving multiple domains, I had to publish each domain in a subfolder. All of the blog’s files were stored under www.samoera.com/poezie/.
  • The individual posts (one poem per post) were saved as /poezie/[YEAR]/[MONTH]/[TITLE].html (e.g. /poezie/2004/04/02/kwijt-bart-moeyaert.html). I always used “[POEM TITLE] ([POEM AUTHOR])” as title for a post. Since Blogger removes special characters, this means that the file name typically ends with the author’s last name (something I will try to use later).
  • The monthly archives were saved as /poezie/[YEAR]_[MONTH]_01_gedichten.html (e.g. /poezie/2006_02_01_gedichten.html).


I have opened an account with Bluehost.com. For $6.95 they offer 10GB storage, 250GB bandwidth and the excellent CPanel/Fantastico combo to easily configure sites, install software and manage your DNS.
My Bluehost hosting is on www.smoothouse.com. I already use it for stuff like the podcast feed validator and other small Smoothouse development projects.
Another option is Dreamhost.com: $7.99 per month, 20GB storage, 1TB bandwidth(!) but a less handy management panel. Don’t pay more than this.


Setting up WordPress with Bluehost is quite easy: you go to the Fantastico page, select WordPress, decide on a subfolder name (in my case: “poezie”), click “Install” and all the rest is automatic. After this, the blog is installed on (in my case) www.smoothouse.com/poezie. Later I will have to map the poetry site to this folder (without the /poezie folder showing).
Even if you don’t have the Fantastico wizard, WordPress is one of the easiest programs to install. Then pick one of the standard templates.


On the new blog, go to the /wp-admin/import.php page and enter your Blogger username/password. Then select the Blogger blog you want to import and let the import wizard run. It will import ALL POSTS and ALL COMMENTS! This is friggin’ awesome! It might take 5-10 minutes if you have a large blog.


Now download your full archive (via FTP with e.g. FileZilla) to your local drive and upload it to where it should live after you have moved the domain. In my case: I uploaded the files to www.smoothouse.com/poezie/poezie, which will be mapped to www.samoera.com/poezie/ once the DNS transfer is done.
The reason for this: all your posts will have new URLs and you don’t want people who find your old URLs in Google and click on them to get an “Error 404 not found” page. So you start by copying the old files to the new hosting server. We will do some more fancy redirect stuff later.
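To keep track of which old URLs need to keep working, the local copy of the archive can be turned into a list of addresses. This is only a rough sketch: the function name `old_urls` is made up for illustration, and it assumes the local folder mirrors the layout of the old /poezie/ folder.

```python
from pathlib import Path

def old_urls(archive_root, base_url):
    """List the URL at which each .html file under archive_root should be served.

    archive_root: local folder holding the downloaded Blogger archive (assumed
                  to mirror the old /poezie/ folder).
    base_url:     address that folder is mapped to, e.g. http://www.samoera.com/poezie/
    """
    root = Path(archive_root)
    urls = []
    for path in sorted(root.rglob("*.html")):
        # Path relative to the archive folder, always with forward slashes
        rel = path.relative_to(root).as_posix()
        urls.append(base_url.rstrip("/") + "/" + rel)
    return urls
```

Once the domain points to the new host, each URL in that list can be opened (by hand or with a script) to confirm nothing returns a 404.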


Now comes the tricky stuff: you want your domain name to point to the new host. So you edit the A or CNAME record for the domain name. This will take somewhere between 1 and 24 hours to propagate.
In my case (Bluehost) this also meant I had to transfer DNS management for all subdomains to Bluehost (i.e. change the SOA records). Bluehost requires you to do this because the whole DNS management is linked to the Fantastico wizards. For me it just meant that everything took a while longer. I then mapped the www.samoera.com domain to the same /poezie folder I just created.
Once the transfer is done, all your old URLs should continue to work, since you already copied the old archive to the new server!


Change the WordPress root path to www.samoera.com instead of www.smoothouse.com/poezie/ (WordPress will adapt all links on the blog pages). Because a lot of sites link to the index.html in the archive root (www.samoera.com/poezie), I removed it and replaced it with an index.php that redirects to www.samoera.com.
OPTIONAL: you can play with Apache Redirect/Rewrite rules to take every visitor to one of the old URLs automatically to the new URL. What I tried was:

### for the archives: easy to do!
RedirectMatch permanent /poezie/([0-9][0-9][0-9][0-9])_([0-9][0-9])_01_.*html$ http://www.samoera.com/$1/$2/
### for the post pages: this would have worked if Apache didn't have a bug
#RedirectMatch permanent /poezie/([0-9][0-9][0-9][0-9])/([0-9][0-9])/.*-([a-z]*).html$ http://www.samoera.com/$1/$2/?s=$3

I tried to use the fact that the author’s last name (a fairly ‘unique’ word) is the last word before the .html to construct queries like www.samoera.com/2006/01/?s=tellegen, which shows all posts from January 2006 with the string ‘tellegen’ in the text; that almost always comes down to a single post. However, due to a bug in Apache (the ‘?’ before the query string is always translated into %3f, which results in invalid URLs) I haven’t found the right way to do it yet. I could have used

RedirectMatch permanent /poezie/([0-9][0-9][0-9][0-9])/([0-9][0-9])/.*html$ http://www.samoera.com/$1/$2/

but this maps onto a whole month, or up to 10 poems. Maybe I’ll find some other trick.
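To see what the two patterns are meant to produce before touching the server, they can be tried out in a few lines of Python. This is just a sketch of the intended mapping, not of how Apache itself behaves: the helper name `new_url` is invented for illustration, and the patterns are copied as-is from the rules above.

```python
import re

# Pattern from the archive rule (monthly archive pages)
ARCHIVE_RE = re.compile(r"/poezie/([0-9][0-9][0-9][0-9])_([0-9][0-9])_01_.*html$")
# Pattern from the commented-out post rule; captures the last lowercase word before ".html"
POST_RE = re.compile(r"/poezie/([0-9][0-9][0-9][0-9])/([0-9][0-9])/.*-([a-z]*).html$")

BASE = "http://www.samoera.com"

def new_url(old_path):
    """Return the intended new URL for an old Blogger path, or None if no rule matches."""
    m = ARCHIVE_RE.search(old_path)
    if m:
        return f"{BASE}/{m.group(1)}/{m.group(2)}/"
    m = POST_RE.search(old_path)
    if m:
        # Search within that month for the author's last name
        return f"{BASE}/{m.group(1)}/{m.group(2)}/?s={m.group(3)}"
    return None

print(new_url("/poezie/2006_02_01_gedichten.html"))
print(new_url("/poezie/2004/04/02/kwijt-bart-moeyaert.html"))
```

The second call shows the search-style URL the post rule was supposed to generate, with the author’s last name ending up in the `s` parameter.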


Change your Feedburner source to the new URL. Everyone who’s subscribed stays subscribed. Ain’t that neat? You don’t have a Feedburner feed? What, you only had the Blogger Atom feed? Shame on you. Go get one!

Christmas present: podcast feed validator!

I get a lot of “what is wrong with my podcast feed?” kind of questions because I have written a fairly popular tutorial on podcasting with Blogger and Feedburner, and a lot of people start doing podcasts that way. There are a couple of things that can go wrong:

  • Not a valid RSS feed
  • RSS feed without enclosures
  • Feed not updated when posting new article

To check some of those things, I needed to read and interpret the RSS feed by hand. That’s why I decided to make a podcast feed validator to do the checks automatically. Let’s take Adam Curry‘s DailySourcecode podcast as an example:

  • the URL of the feed is radio.weblogs.com/0001014/categories/dailySourceCode/rss.xml, so I enter it into the input field and the results are:
  • #1: feed URL exists and can be reached
  • #2: feed is a valid RSS feed (but does not contain the iTunes extensions)
  • #3: feed items have audio enclosure (but not all, as you see in the image below. The reason is that two enclosures are wrongly specified as text/html instead of audio/mpeg.)
  • #4: the audio enclosure (MP3 file) exists and can be reached

[Image: podcast feed validator]
So the enhancements for this feed would be: make sure all enclosures have the right type, and provide iTunes meta data. Better still: use Feedburner to get that and more: subscriber statistics and lots of feed tools.
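For the curious, checks #2 and #3 can be approximated in a few lines of Python. This is not the code behind the validator, only a simplified sketch: “valid RSS” here just means well-formed XML with an `rss` root and a `channel`, the function name `check_feed` is invented, and the iTunes namespace URI is the one commonly used in podcast feeds.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used for the iTunes podcast extensions (assumed here)
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

def check_feed(xml_text):
    """Run simplified versions of checks #2 and #3 on an RSS document given as text."""
    report = {"valid_rss": False, "itunes": False,
              "items": 0, "audio_items": 0, "wrong_types": []}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return report  # not even well-formed XML
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        return report
    report["valid_rss"] = True
    # Any element in the iTunes namespace counts as "uses the iTunes extensions"
    report["itunes"] = any(el.tag.startswith("{" + ITUNES_NS + "}") for el in root.iter())
    for item in channel.findall("item"):
        report["items"] += 1
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue
        mime = enclosure.get("type", "")
        if mime.startswith("audio/mp"):
            report["audio_items"] += 1
        else:
            # e.g. an MP3 wrongly declared as text/html
            report["wrong_types"].append((enclosure.get("url"), mime))
    return report
```

An enclosure declared as text/html ends up in `wrong_types`, which is the kind of problem flagged in the example above.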

Try it out for yourself:
Check your podcast RSS feed!

Some more features of the podcast feed check:

  • estimation of mean-time-between-posts (MTBP), a metric I talked about in RFM for RSS feeds
  • estimation of required bandwidth/storage per month (DailySourcecode: 600MB/month, 175-25.be podcast: 10MB/month)
  • works with MP3 audio enclosures and AAC (MPEG-4) audio/video enclosures (any audio/mp* enclosure)
  • detailed (technical) information is hidden by default and can be shown through some AJAX functionality.
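The two estimates in the list above can be illustrated with a small Python sketch. The formulas are my own simplification for this example (average gap between the oldest and newest item, and average enclosure size times posts per 30 days), the name `feed_metrics` is made up, and the code assumes every pubDate carries a time zone such as GMT.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def feed_metrics(xml_text):
    """Estimate mean days between posts (MTBP) and enclosure megabytes per 30 days."""
    root = ET.fromstring(xml_text)
    dates, sizes = [], []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub:
            dates.append(parsedate_to_datetime(pub))  # RFC 822 style dates
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("length", "").isdigit():
            sizes.append(int(enclosure.get("length")))
    if len(dates) < 2:
        return None  # need at least two dated items
    span_days = (max(dates) - min(dates)).total_seconds() / 86400
    if span_days == 0:
        return None  # all items share one timestamp, no meaningful estimate
    mtbp_days = span_days / (len(dates) - 1)
    avg_bytes = sum(sizes) / len(sizes) if sizes else 0.0
    # Posts per 30 days times average enclosure size, in megabytes
    mb_per_month = avg_bytes * (30 / mtbp_days) / 1e6
    return {"mtbp_days": mtbp_days, "mb_per_month": mb_per_month}
```

A feed that posts a 10MB file every two days would come out at an MTBP of 2 days and roughly 150MB per month.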