Rarely are computing problems as straightforward as “simple, or complex?”. Generally, a complex solution is one with extra engineering time put into it, designed to cover more ground in a specific domain (be more generic, be more resilient, allow easier expansion, etc.).

In the Software Engineering world, adding <Thing> support to your application can be as simple as writing a <Thing> class, but maybe you want the forward-thinking <Thing>Factory approach, maybe even implementing some <Thing>FactoryInterface.

How do you know which to choose? When should the “enterprise” solution be skipped in the name of simplicity?

The answer depends heavily on the use case, but one-off tools and small shared solutions often benefit from the simpler approach. Generally, you’ll point to predicted growth, domain knowledge, & other system factors before deciding which pattern fits best.

Downscaling the enterprise for the simple

In the software writing example, “downscaling” just means writing less code: 1 class instead of 2-3 classes. Pretty easy! All the same tools are available (libraries, built-in language constructs, etc.); you just use fewer of them.

After a handful of failed homelab setups, I wanted to apply the same complexity analysis to the sysadmin world, and take a good look at my use case. Before jumping to a cool hosting technology, what does it offer? More importantly, what does it cost me?

Downscaling in the sysadmin world is not always so straightforward. If you’re looking to avoid the complexity of a full-blown k8s instance, but still want the advantages k8s offers, there are options! minikube, k3s, and microk8s come to mind. Pick one of those, and you’re on your way to HA setups & self-healing, right?
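Each of these does make the initial install genuinely small. As a rough sketch (the exact commands may drift, so check each project’s docs before piping curl into a shell):

#+begin_src sh
# k3s: a single-binary Kubernetes distribution
curl -sfL https://get.k3s.io | sh -

# minikube: a local cluster inside a VM or container
minikube start

# microk8s: packaged as a snap
sudo snap install microk8s --classic
#+end_src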

While these will leave a smaller footprint & simplify initial setup, the interface remains the same. And the interface has a steep learning curve: there are so many terms and concepts to learn. Separate from the cost of deployment is the cost of understanding the interface. If you want to avoid learning helm charts, you can use straight kubectl create commands, but many other options exist for k8s.
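To make the contrast concrete, here’s roughly what those two ends look like (image & chart names are just examples, not a recommendation):

#+begin_src sh
# plain kubectl: imperative create/expose commands, no templating
kubectl create deployment web --image=nginx
kubectl expose deployment web --port=80

# helm: the same idea, wrapped in a chart's templates & values
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install web bitnami/nginx
#+end_src

Either way, you still end up needing the underlying concepts: deployments, services, ingresses, and so on.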

This is also something a cloud provider likely won’t help with: sure, they provide k8s as a service, but you won’t escape understanding the interface!

A reasonable “step down” from k8s is Docker Swarm, or Compose: you get service replicas & multi-node setups, without the rich storage capabilities or HA promises. However, those interfaces are completely different! We use different tools to create nodes, check on deployment status, debug deploys, etc. Soon, they won’t even share the same underlying technology: k8s is moving away from Docker as a container runtime! Some paradigms carry over (base images, containers, health checks), but the day-to-day tooling doesn’t.
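For a sense of the Swarm interface, a throwaway replicated service looks something like this (a sketch with a placeholder nginx image):

#+begin_src sh
docker swarm init                                   # turn this host into a single-node swarm
docker service create --name web --replicas 3 -p 80:80 nginx
docker service ls                                   # shows 3/3 once the replicas are healthy
#+end_src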

Every time we pull off a layer, we’re looking at a new interface: straight docker, or another containerization solution (lxc, podman), or a completely different compartmentalization solution (VMs, chroots, systemd-nspawn, jails). The interface is simplified significantly, but not a lot of skills carry over: jails are the only one of these I’m unfamiliar with, and I’m sure I’d be lost if I dug into them!
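Running “the same thing” at a few of these layers shows how little transfers (names & paths below are hypothetical):

#+begin_src sh
# docker & podman at least share most of a CLI
docker run -d --name web -p 8080:80 nginx
podman run -d --name web -p 8080:80 nginx

# systemd-nspawn boots an OS tree from a directory instead of an image
sudo systemd-nspawn -D /var/lib/machines/web --boot

# a chroot is just a filesystem jail, with none of the lifecycle tooling
sudo chroot /srv/web /bin/sh
#+end_src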

We’ll expand later!

Part of my 3rd self-hosting attempt was built to allow swapping a VM out for a NAS, if I wanted more space or bare-metal systems in the future. I didn’t think about all the infrastructure that comes along with that swap.

The best way to prepare for expanding later is to have a working example. “One node, to be turned into a two-node HA setup later” is significantly less valuable than “Two-node HA setup, to be turned into a three-node HA setup later”: the challenges of multi-node systems get addressed first-hand.
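In Docker Swarm terms, for example, growing a working two-node cluster to three is roughly one command per new node (the token & address below are placeholders); it’s the first jump from one node to two that surfaces all the hard questions.

#+begin_src sh
# on an existing manager: print the join command for a new worker
docker swarm join-token worker

# on the new node: paste what the manager printed
docker swarm join --token <token> 192.0.2.10:2377
#+end_src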

Starting from the bottom

Self-hosting can be as complex & reliable as enterprise hosting, but it certainly doesn’t have to be: a static site on 1 bare-metal host counts! My first self-hosting setups were ~apt-get install~ing services onto a RasPi.
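That whole “stack” was more or less this (package choice is illustrative):

#+begin_src sh
sudo apt-get install nginx
# drop a static site into /var/www/html and it's self-hosted
#+end_src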

Breaking down the most straightforward approach, what do we gain with complexity at each layer?

- Storage
- Backups/Recovery
- Reverse proxy
- Installation process
- Maintenance
- Availability
- # of hosts

With my next setup, my focus would be on reliability & low maintenance.

Ultimately, I decided on a hyperconverged Ceph + Docker Swarm cluster, with Traefik reverse-proxying services. This replaces the self-hosting setup I wrote about a while ago, focusing on resiliency and minimal ongoing maintenance.
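As a taste of the day-to-day interface, exposing a Swarm service through Traefik is mostly a matter of labels. A minimal sketch, assuming Traefik is already running in the cluster with its Docker/Swarm provider enabled (service, network, & domain names are placeholders):

#+begin_src sh
docker service create --name whoami \
  --network traefik-net \
  --label traefik.enable=true \
  --label 'traefik.http.routers.whoami.rule=Host(`whoami.example.com`)' \
  --label traefik.http.services.whoami.loadbalancer.server.port=80 \
  traefik/whoami
#+end_src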

My old setup

The setup to beat started with docker-compose on Debian with ZFS, using Proxmox on top for one-off VMs. This worked well enough for a couple of years (with some slight modifications), but it was very hands-on & performed poorly.

I tried to address the maintenance issue by splitting containers & data into two VMs: a “web-facing” VM and a “NAS” VM. This set the stage for a future hardware NAS, and made remote maintenance less scary, but it was still more fragile than I had hoped.

Performance issues got worse with time. During typical guest use, ZFS performance varied depending on how the VM disk was backed (zvol vs qcow2), and on how full the zpool was. Writes would slow down to tens or hundreds of kilobytes per second, seemingly for no reason.

I’m sure ZFS was configured improperly somewhere along the chain, but I don’t want to look at thousands of dials, trying to determine which one needs turning.
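If you do want to peek at a couple of those dials, the usual starting points look something like this (the dataset name is a placeholder):

#+begin_src sh
zpool list                                    # how full & fragmented is the pool?
zfs get compression,recordsize,sync tank/vms  # a few of the more common "dials"
#+end_src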

Having experimented with different setups before, I knew what criteria I wanted from a new solution:

- Resiliency & availability - how much do you need? (Kubernetes, HAProxy, Docker Swarm, Docker [Compose])
- Backups
- Complexity
- Maintenance need & maintenance urgency
- Alerting & urgency of alerts
- Learning curve, investment cost