Converting my Jekyll website to Hugo

2020 Oct 09 - Brian Kloppenborg

Every decade or so some new technology comes along that vastly simplifies the maintenance and operation of my personal website. In the late 1990s I wrote pure HTML. In the early 2000s I wrote HTML with server-side includes. Later I started using custom Perl or PHP scripts. I tried out a few content management systems, like WordPress and Drupal, but found them too complex for my needs. I hopped on the static site generator bandwagon in 2012 when I started using Jekyll and GitHub Pages.

Jekyll served me well for the past eight years, but I’ve never been fond of several of its implementation details. In particular, the following issues became annoying when writing blog posts:

Forced segmentation of static assets and data from related content (big complaint). Jekyll prefers that static files (e.g. images, PDFs, and other non-rendered content) be located within a directory specified in the global _config.yml file. This can be overridden on a per-page basis by adding information to the front matter, but that's just annoying. Data files are even stricter: unlike static content, they must be placed in the _data directory or one of its subdirectories.
No segmentation between configuration data and content (minor complaint). Although blog posts are separated into the _posts directory, regular pages are mixed into the overall directory structure of the site. As a result, content ends up both in directories with an underscore prefix and in directories without one.
The use of Ruby (minor complaint). Ruby has been a powerhouse for web development since the mid-2000s, with its popularity peaking around 2012. Despite that popularity, I've never used it, and I'm not fond of installing another interpreter on my machine just to build a static website.

None of these complaints were deal breakers for me, because Jekyll was absolutely a step in the right direction and it was so easy to use with GitHub Pages; however, I kept looking for something better. When Hugo was released in 2013, it didn't offer enough compelling features to make me switch. But the introduction of page bundles in version 0.32 (late 2017) made me quite intrigued. After using Hugo for my business website for the last two years, I finally decided to convert my personal website from Jekyll to Hugo.

Converting from Jekyll to Hugo

Although there are a few tools for converting a Jekyll site to Hugo (see migrations), I elected to do it by hand this time. Here are the major steps:

Move all posts, pages, images, and data to an old-content directory. Most of my content was under _posts, but I did have a few additional pages sitting around.
Delete all Jekyll-specific content and directories. This includes the _data, _includes, _layouts, _plugins, _posts, and _site directories as well as a few random pages.
Instantiate a new Hugo website and populate the templates. This is done using the hugo new site . command (Hugo requires the --force flag when the target directory is not empty) plus a little bit of reading on the Hugo website. Hugo's template lookup order is particularly important to read and understand.
Re-package and restore content, fixing broken pages / images along the way. This will be discussed below.
Fix a few issues with my Git repository. This will also be discussed below.

Re-packaging content

Updating and standardizing the front matter

With Hugo I’m using the TOML format for my front matter. For blog posts I implemented the following archetype and updated the pages accordingly:

+++
title = "{{ replace .Name "-" " " | title }}"
date = {{ .Date }}
draft = true
author = "Brian Kloppenborg"
categories = [""]
tags = [""]
+++

Likewise, for regular pages, I’m using the following:

+++
title = "{{ replace .Name "-" " " | title }}"
date = "{{ .Date }}"
draft = true
author = "Brian Kloppenborg"
+++
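
Archetypes only apply to content created with hugo new, so existing posts still had to be converted. One way to find the stragglers is to list Markdown files whose first line is Jekyll's YAML delimiter (---) rather than TOML's (+++). This is a minimal sketch, not necessarily how the conversion was done here; the function name find_yaml_front_matter is hypothetical, and it assumes the pages are .md files under content.

```shell
# Hypothetical helper: print every Markdown file under a directory (default:
# content) whose first line is "---", i.e. front matter that is still YAML
# rather than the "+++"-delimited TOML used by the archetypes above.
find_yaml_front_matter() {
    find "${1:-content}" -type f -name '*.md' -exec awk '
        FNR == 1 {
            sub(/\r$/, "")                  # tolerate Windows line endings
            if ($0 == "---") print FILENAME
        }' {} +
}

# Example: find_yaml_front_matter content
```

Each reported file can then be edited to match the archetype fields.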

After updating the front matter, I re-packaged pages and their data into page bundles, which let me keep the content neatly organized. Within the blog, I applied this to the following posts:

Posts about my first startup attempt, Hastings Wireless
Instructions for setting up Restic on Windows and Linux
My 2017 review of Linux media center software
and seven other pages.

This really simplified the directory structure of my blog, since the images and content now live in the same place. Although I have plenty of pages with tables, I didn't use Jekyll's data feature to implement them because the content and data would have been so far apart. Perhaps in the future I'll try out Hugo's data-driven content feature, which lets templates read JSON or CSV files directly through the getJSON and getCSV functions.
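
Re-packaging a post into a leaf bundle boils down to creating a directory named after the post, moving the Markdown file into it as index.md, and moving its images or PDFs next to it. The sketch below shows that step under a few assumptions: the post and its assets are passed in explicitly, the destination is a section directory such as content/posts, and the helper name make_bundle is hypothetical.

```shell
# Hypothetical helper: turn one Markdown file plus its assets into a Hugo
# leaf bundle, i.e. DEST_DIR/<name>/index.md with the assets beside it.
# Usage: make_bundle SOURCE.md DEST_DIR [ASSET ...]
make_bundle() {
    if [ $# -lt 2 ]; then
        echo "usage: make_bundle SOURCE.md DEST_DIR [ASSET ...]" >&2
        return 2
    fi
    src=$1
    dest=$2
    shift 2
    name=$(basename "$src" .md)        # e.g. 2017-05-01-media-center
    bundle="$dest/$name"
    mkdir -p "$bundle" || return 1
    mv "$src" "$bundle/index.md" || return 1
    # Move the images, PDFs, or data files that belong to this page.
    for asset in "$@"; do
        mv "$asset" "$bundle/" || return 1
    done
    printf '%s\n' "$bundle"            # report where the bundle ended up
}
```

After the move, links inside index.md still have to be changed to bundle-relative paths (for example, from a hypothetical /images/kodi.png to kodi.png).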

Fixing broken links

As with any major website migration, some links are going to break. The first things I repaired were Jekyll's internal page references (the post_url tag). I found them using grep:

grep -I -r 'post_url' content
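
Once grep has located the tags, the rewrite is mostly mechanical: Jekyll's {% post_url NAME %} tag maps onto Hugo's ref shortcode. The sed filter below is a sketch that assumes each post kept its Jekyll file name and now lives in a section called posts (as content/posts/NAME/index.md or content/posts/NAME.md). The section name and the helper name convert_post_urls are assumptions, so review the result before committing it.

```shell
# Hypothetical filter: rewrite Jekyll {% post_url NAME %} tags read from stdin
# into Hugo {{< ref "/posts/NAME" >}} shortcodes. The "posts" section is an
# assumption; adjust it to wherever the converted posts actually live.
convert_post_urls() {
    sed -E 's|[{]%[[:space:]]*post_url[[:space:]]+/?([^[:space:]%]+)[[:space:]]*%[}]|{{< ref "/posts/\1" >}}|g'
}

# Example: rewrite every file that grep reports, in place.
# grep -rlI 'post_url' content | while IFS= read -r f; do
#     convert_post_urls < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
# done
```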

Next I started Hugo's development server and, from a second terminal, ran LinkChecker against it to find any additional broken references:

hugo server
linkchecker http://localhost:1313/

(Note that LinkChecker depends on Python 2 and, as of this writing, has not been updated for Python 3.)
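
LinkChecker needs a running server. For page bundles specifically, a cheaper offline check is to look for Markdown image links that point at files missing from the bundle directory. The sketch below assumes standard ![alt](file) syntax in leaf bundles (index.md), skips remote URLs and site-absolute paths, and will not catch images referenced through shortcodes or raw HTML; check_bundle_images is a hypothetical name.

```shell
# Hypothetical helper: for every leaf bundle (index.md) under a directory,
# report relative image links whose target file does not exist in the bundle.
check_bundle_images() {
    find "${1:-content}" -type f -name 'index.md' | while IFS= read -r page; do
        dir=$(dirname "$page")
        # Extract the target of every ![alt](target "optional title") link.
        grep -oE '!\[[^]]*]\([^)]+\)' "$page" |
            sed -E 's/^!\[[^]]*]\(([^) ]+).*$/\1/' |
            while IFS= read -r target; do
                case $target in
                    *://* | /*) continue ;;   # remote URL or site-absolute path
                esac
                [ -e "$dir/$target" ] || printf '%s: missing %s\n' "$page" "$target"
            done
    done
}

# Example: check_bundle_images content
```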

Fixing git repository issues

While doing the conversion I noticed a few issues with my git repository:

Some of my commits used a different email address.
The repository started as a clone of Jekyll from around 2015 that I never synced with upstream, so Jekyll's own commit history was superfluous.
The repository had a bunch of binary data stored in it (mostly PDF files of my posters and papers).

These issues were relatively easy to fix, but took a little time.

To fix the author email issue, I followed the instructions on Stack Overflow for changing the author and committer information of existing commits. This was quite straightforward.

Next, to remove the old Jekyll commits, I truncated my Git history so that it begins at my first commit to this repository in 2015. The only thing I had to do was find that commit, which was easy with git log --author="Brian Kloppenborg": the oldest matching commit is at the bottom of the output.
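
There are several ways to truncate Git history. One common approach is to create an orphan branch from the first commit to keep and then rebase everything after it onto that new root. The sketch below assumes a mostly linear history (the rebase flattens any merge commits after the cut) and should be run in a fresh clone, since it rewrites every later commit; the helper name truncate_history and the temporary branch name are hypothetical.

```shell
# Hypothetical helper: make FIRST the new root commit of BRANCH and drop
# everything older. Rewrites history, so run it on a fresh clone.
truncate_history() {
    first=$1
    branch=$2
    # New parentless branch whose working tree and index match FIRST.
    git checkout -q --orphan truncated-root "$first" || return 1
    # Recreate FIRST as a root commit, reusing its message and authorship.
    git commit -q -C "$first" || return 1
    # Replay every commit after FIRST on top of the new root.
    git rebase -q --onto truncated-root "$first" "$branch" || return 1
    git branch -q -D truncated-root
}

# Example: keep history from the oldest commit by a given author onward.
# first=$(git log --author="Brian Kloppenborg" --reverse --format=%H | head -n 1)
# truncate_history "$first" master
```

Publishing the rewritten branch afterwards requires a force push.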

Lastly, I decided to move the PDF files into Git Large File Storage (LFS). To do so, I used the BFG Repo-Cleaner to rewrite the Git history so that the PDFs are stored as Git LFS pointers instead of regular Git objects:

java -jar bfg-1.13.0.jar --convert-to-git-lfs "*.pdf" --no-blob-protection .

Then I expired the reflog and garbage-collected the objects that were no longer referenced:

git reflog expire --expire=now --all && git gc --prune=now --aggressive

Finally, I installed and initialized Git LFS:

sudo apt install git-lfs
cd REPOSITORY
git lfs install
git lfs track "*.pdf"
git add .gitattributes

git add content/*.pdf
git commit -m "..."
git push

After this the repository was squeaky clean.

Deployment

One of the most awesome things about static site generators is that they build static HTML pages, which means you can host your website anywhere. In the case of kloppenborg.net, I just host things locally, so my deployment process is as easy as running Hugo to produce the pages in the public directory and then rsync-ing them to my hosting provider:

hugo && rsync -avz --delete public/ ${USER}@${HOST}:${DIR}

Where USER and HOST are the username and hostname of the server that hosts my content, and DIR is the directory on that server where the content goes.
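
Because rsync --delete removes anything at the destination that is not in public/, an empty DIR is dangerous: the destination would become the remote user's home directory. A small wrapper can guard against that. This is a sketch, not the deployment script used for this site, and the name deploy_site is hypothetical. It also passes Hugo's --cleanDestinationDir flag so that files left in public/ by earlier builds are not deployed, and forwards extra arguments such as --dry-run to rsync.

```shell
# Hypothetical wrapper around the one-liner above. Aborts if USER, HOST, or
# DIR is unset or empty, then builds the site and syncs it to the server.
# Extra arguments (e.g. --dry-run) are passed through to rsync.
deploy_site() {
    : "${USER:?set USER to the remote username}"
    : "${HOST:?set HOST to the server hostname}"
    : "${DIR:?set DIR to the remote web root}"
    hugo --cleanDestinationDir &&
        rsync -avz --delete "$@" public/ "${USER}@${HOST}:${DIR}"
}

# Example: preview what would change on the server without copying anything.
# deploy_site --dry-run
```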

Categories: self-hosting