Migrating Blog From Ghost to Jekyll

I just finished migrating this blog from Ghost to Jekyll.

Why

Ghost served me well for a long time, but it finally reached a point where it was more effort to maintain than I was getting out of it.

I self-host Ghost on my own server, and it has become hard to keep up with Ghost upgrades. Some upgrades are very simple, but at least two major upgrades required significant work. And I worry that if I don’t keep up, I’m leaving my site open to security vulnerabilities.

In contrast, Jekyll is a static site generator. The resulting blog is just a bunch of static files served by nginx, so the risk of security issues is vastly reduced.

A static site is also much simpler to host and uses far fewer CPU and memory resources.

While I appreciated many of the features of Ghost, it’s not much more difficult for me to write blog posts on my laptop in markdown. And it’s nice being able to see the exact same blog locally and on my hosted server.

An ancillary benefit of switching is that some of my old posts look a lot better now. Their formatting was corrupted when I originally converted them to Ghost, and the Markdown generated for Jekyll is cleaner.

Migrating

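I started with jekyll_ghost_importer, which gave me a nice baseline but left a few issues: bare URLs were not turned into links, posts had no author in their front matter, and dates had no time zone.

I fixed these issues with a Ruby snippet like: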

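Dir.glob("_posts/*").each do |post|
  contents = File.read(post)
  # 1. Wrap bare URLs in markdown link syntax (\S+ stops at whitespace, so
  #    text after the URL on the same line is not swallowed into the link).
  # 2. Add an author line before the closing front matter delimiter.
  # 3. Mark the existing date as UTC.
  updated = contents
    .gsub(%r{(\s)(https?://\S+)(?=\s)}, "\\1[\\2](\\2)")
    .sub("---\n\n", "author: Paul\n---\n\n")
    .sub(/date: '(.+)'/, "date: '\\1 UTC'")
  File.write(post, updated)
end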

Next, I browsed a bunch of Jekyll themes until I found a simple one I liked: tale. I also tweaked the About page.

Then, I configured Disqus and Google Analytics, and added redirects for my old RSS feed. I also changed the permalink setting to include a trailing slash to match Ghost: permalink: /:year/:month/:day/:title/
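To illustrate how that permalink pattern maps a post to a URL, here is a minimal Ruby sketch. The helper name permalink_for and the sample slug are made up for illustration; Jekyll does this expansion internally, and this is not its actual code.

```ruby
require "date"

# Hypothetical helper: expands /:year/:month/:day/:title/ for one post.
# Jekyll performs this internally; this only mirrors the visible result.
def permalink_for(date, slug)
  format("/%04d/%02d/%02d/%s/", date.year, date.month, date.day, slug)
end

puts permalink_for(Date.new(2018, 11, 26), "foo")  # prints /2018/11/26/foo/
```

The trailing slash is what makes the generated URL match the shape Ghost used for the same post.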

I added a simple script to deploy:

JEKYLL_ENV=production jekyll build
rsync -avz --delete _site/ pgrs.net:/var/www/blog/

Finally, I checked that my top posts still worked and looked good. I spot-checked a few, but I wanted to be sure that all of my popular posts kept their existing URLs. I downloaded the URLs of my 50 most popular posts from Google Analytics and put them in a file. Then, I wrote a simple wget command to check that they all returned a successful response:

wget -q --spider -i <file of URLs>
echo $? # exit code of 0 means success
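The exit code only says whether everything passed. If it is non-zero, a script along these lines could show which URLs failed. This is a rough Ruby sketch, not what I ran: the names failing_urls and HEAD_STATUS and the file name popular_urls.txt are assumptions. Unlike wget, it does not follow redirects, so a 301 is reported as a failure.

```ruby
require "net/http"
require "uri"

# Default fetcher: issues a HEAD request and returns the numeric status code.
HEAD_STATUS = lambda do |url|
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Returns a hash of url => status for every URL that did not return 2xx.
# The fetcher is injectable so the logic can be exercised without a network.
def failing_urls(urls, fetch: HEAD_STATUS)
  urls.each_with_object({}) do |url, failures|
    status = fetch.call(url)
    failures[url] = status unless (200..299).cover?(status)
  end
end

# Usage, assuming a hypothetical file with one URL per line:
# urls = File.readlines("popular_urls.txt", chomp: true)
# failing_urls(urls).each { |url, status| puts "#{status} #{url}" }
```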

Aftermath (Updated November 27, 2018)

After I made the cutover, I watched my server logs for a few days for abnormalities. I discovered a few important things:

Missing 404s

If a page wasn’t found, my nginx setup would render the 404 error page, but as a successful 200 response instead of the proper 404 response code. It would look correct in a browser but be incorrect for search engines and other bots.

I fixed this by adding this to my server block:

error_page 404 /404/index.html;

And then this to my location block:

try_files $uri $uri.html $uri/index.html =404;

The =404 tells nginx to serve a 404 response if it can’t find the static file, and the error_page tells it which page to render.
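As a way to reason about that lookup order, here is a simplified Ruby model of the try_files line. It is only a sketch: the resolve helper and the sample file set are hypothetical, and it ignores everything else nginx does (index handling, internal redirects, other locations).

```ruby
require "set"

# Simplified model of: try_files $uri $uri.html $uri/index.html =404;
# `files` is the set of file paths that exist under the site root.
def resolve(uri, files)
  # squeeze("/") collapses the double slash produced when uri ends in "/".
  candidates = [uri, "#{uri}.html", "#{uri}/index.html"].map { |path| path.squeeze("/") }
  found = candidates.find { |path| files.include?(path) }
  return [200, found] if found

  # The =404 fallback hands off to error_page, which renders the 404 page
  # while keeping the 404 status code.
  [404, "/404/index.html"]
end

site = Set["/index.html", "/about.html", "/2018/11/26/foo/index.html", "/404/index.html"]
p resolve("/about", site)             # => [200, "/about.html"]
p resolve("/2018/11/26/foo/", site)   # => [200, "/2018/11/26/foo/index.html"]
p resolve("/nope", site)              # => [404, "/404/index.html"]
```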

Broken AMP URLs

My Ghost blog used to serve Accelerated Mobile Pages (AMP) versions of posts at a trailing /amp/ on every post URL. My new Jekyll site doesn’t do this, which was causing a lot of 404s in the logs. I’m not sure how many of these errors were shown to users versus Google just serving a stale version of the page.

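In any case, I fixed it by rewriting AMP URLs back to the main post. Because the rule uses the last flag, nginx rewrites the path internally and serves the regular post at the AMP URL rather than sending an HTTP redirect: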

rewrite ^(.*)/amp/?$ $1 last;
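The pattern in that rule can be tried out in Ruby to see which paths it touches. This is a small sketch with made-up names (AMP_SUFFIX, strip_amp) and sample paths; it mirrors the regular expression only, not nginx's rewrite processing.

```ruby
# Same pattern as the nginx rule: capture everything before a trailing
# /amp or /amp/ and keep only the captured part.
AMP_SUFFIX = %r{\A(.*)/amp/?\z}

# Returns the post path for an AMP path, or the path unchanged otherwise.
def strip_amp(path)
  path.sub(AMP_SUFFIX, '\1')
end

p strip_amp("/2018/11/26/foo/amp/")  # => "/2018/11/26/foo"
p strip_amp("/2018/11/26/foo/amp")   # => "/2018/11/26/foo"
p strip_amp("/2018/11/26/foo/")      # => "/2018/11/26/foo/"
```

Note that the rewritten path has no trailing slash, so it relies on the $uri/index.html entry in try_files to find the post.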

I verified that the non-AMP pages look good on my phone.

Google still seems to have some AMP pages cached, but hopefully these go away over time.

Incorrect Date URLs

I noticed many 404s in my logs for requests to posts with an off-by-one date. For example, a bot would request /2018/11/27/foo instead of /2018/11/26/foo.

I discovered that Ghost doesn’t seem to mind if the date is incorrect. It will properly redirect to the correct URL. Jekyll’s static site does care, however, and these will 404.

I tried to figure out where these incorrect URLs came from. My best guess is that an old Ghost bug served the post at the wrong URL for some amount of time: https://github.com/TryGhost/Ghost/issues/7655. I deployed the fixed version a long time ago, but I think the incorrect URLs made their way into search indexes.

Since the 404s in the logs seem to come only from bots, and the issue on my Ghost blog has likely been fixed for a long time, I figured it was OK to leave these 404s. I double-checked that search results in the major US search engines returned the correct URLs.
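To separate off-by-one requests from genuinely broken links in the logs, a check along these lines could help. It is only an illustrative Ruby sketch: the helper name off_by_one_match, the POST_PATH pattern, and the sample paths are hypothetical, and it assumes post URLs follow the /:year/:month/:day/:title/ permalink.

```ruby
require "date"

# Matches /YYYY/MM/DD/slug with an optional trailing slash.
POST_PATH = %r{\A/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?\z}

# If `path` misses an existing post by exactly one day, returns the path of
# the post it probably meant; otherwise returns nil.
def off_by_one_match(path, known_posts)
  match = POST_PATH.match(path)
  return nil unless match

  year, month, day = match.captures.first(3).map(&:to_i)
  return nil unless Date.valid_date?(year, month, day)

  date = Date.new(year, month, day)
  [date - 1, date + 1].each do |candidate|
    guess = candidate.strftime("/%Y/%m/%d/") + match[4] + "/"
    return guess if known_posts.include?(guess)
  end
  nil
end

posts = ["/2018/11/26/foo/"]
p off_by_one_match("/2018/11/27/foo", posts)  # => "/2018/11/26/foo/"
p off_by_one_match("/2018/11/20/foo", posts)  # => nil
```

Running the 404 paths from the log through a helper like this would show how many of them are date mismatches rather than missing posts.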