Running etcd in memory

This is an RPM for running etcd in memory for a kubeadm kubernetes install. It addresses the problem that, in a home setup, etcd's frequent disk writes make your disks quite noisy. See also the post where this was described.

This RPM has been tested on CentOS Stream 8 with kubelet 1.24.10. It will probably work on other RHEL-like systems as well, but that has not been tested.

Disclaimer

This is provided as an example and you can use it at your own risk. As an administrator you are responsible at all times for the health of your kubernetes cluster and all its data. Make sure that you have backups and can restore from them in case of problems, or try it out on a less critical cluster first.

Requirements

The cluster must use containerd as the container runtime, and etcd must run as a pod in the kubernetes cluster, as it does in a kubeadm cluster setup.

Other container runtimes

There is a docker script in the RPM that uses nerdctl to adapt to containerd. It is possible to adapt this setup to another container runtime by replacing the docker script with another implementation. Currently, only containerd is supported.
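As a rough illustration of what such a wrapper can look like, here is a minimal sketch and not the script shipped in the RPM. It assumes that nerdctl is on the PATH and that the kubernetes containers live in the containerd namespace k8s.io. The function name docker_shim and the NERDCTL override variable are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/bin/sh
# Hypothetical docker-compatible shim: forward all arguments to nerdctl,
# selecting the containerd namespace used for kubernetes containers.
# NERDCTL can be set to point at a different binary (assumption for this sketch).
docker_shim() {
    "${NERDCTL:-nerdctl}" --namespace k8s.io "$@"
}
```

With this in place, docker_shim ps would list the kubernetes containers. A port to another runtime would replace the body of the function with the equivalent command of that runtime.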

Building the RPM

Build the RPM using maven and install it on your controller node using yum/dnf. This will provide the following:

backups of etcd at 15-minute intervals in the /var/lib/wamblee/etcd directory. A number of older backups are preserved as well.
prior to shutdown of the kubelet service, an additional backup is taken.
prior to startup of the kubelet a restore is done.
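Keeping "a number of older backups" comes down to deleting everything except the newest few files after each backup. The sketch below shows one way to do that. It is not the RPM's actual script: the function name prune_backups and the file name pattern etcd-*.db are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 backup files in directory $1.
# The etcd-*.db naming is an assumption, not the naming used by the RPM.
prune_backups() {
    dir=$1
    keep=$2
    # ls -1t lists newest first; tail skips the first $keep entries,
    # so only the older files reach the rm loop.
    ls -1t "$dir"/etcd-*.db 2>/dev/null |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

For example, prune_backups /var/lib/wamblee/etcd 10 would keep the ten most recent files matching the assumed pattern.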

In a production setup you would add a distribution management section to the pom.xml and configure the maven release plugin to deploy to a repository (e.g. nexus).

Setup

After installing the RPM, wait until the first backups appear in /var/lib/wamblee/etcd. Then drain the controller node and stop the kubelet:

kubectl drain NODENAME --ignore-daemonsets
systemctl stop kubelet

Then stop all running containers on the controller node:

/opt/wamblee/etcd/bin/docker  ps | 
awk 'NR > 1 { print $1}'  | 
xargs /opt/wamblee/etcd/bin/docker  stop

After the above steps, all services in your cluster should still be running.

Now back up the contents of the /var/lib/etcd directory and clear it:

cd /var/lib/etcd
tar cvfz ~/etcd.tar.gz . 
rm -rf /var/lib/etcd/*

Now in /etc/fstab, create an entry to mount /var/lib/etcd in memory:

tmpfs                       /var/lib/etcd        tmpfs   defaults,noatime,size=2g  0 0

Then make sure the /var/lib/etcd directory is empty and mount the ramdisk:

rm -rf /var/lib/etcd/*
mount -a
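Before starting the kubelet it is worth checking that /var/lib/etcd really is backed by memory now. The following is a small sketch of such a check. The function name is_tmpfs is hypothetical, and the optional second argument only exists so that the check can be tried against a sample file instead of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the given mount point is a tmpfs mount.
# $1 = mount point, $2 = mounts table to read (defaults to /proc/mounts).
is_tmpfs() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point and field 3 the filesystem type.
    # The last matching line wins, which is the mount that is currently visible.
    awk -v mp="$mount_point" '
        $2 == mp { fstype = $3 }
        END { if (fstype == "tmpfs") exit 0; exit 1 }
    ' "$mounts_file"
}
```

A possible use is is_tmpfs /var/lib/etcd && systemctl start kubelet, so that the kubelet is only started when the ramdisk is in place.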

Now you can start the kubelet again using systemctl start kubelet. After this, you should see all the nodes as before: kubectl get nodes.

After this, uncordon the controller node:

kubectl uncordon NODENAME

If anything goes wrong in the above steps, drain the controller node (if at all possible), stop the kubelet, and stop all containers. Then unmount /var/lib/etcd, restore the etcd data from backup, and start the kubelet again.
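The restore step of this recovery can be sketched as follows, assuming the backup is the ~/etcd.tar.gz archive created earlier with tar cvfz from inside /var/lib/etcd. The function name restore_etcd_tar is hypothetical, and the surrounding commands are shown as comments because they act on the live system.

```shell
#!/bin/sh
# Hypothetical helper: unpack an archive that was created with
# "tar cvfz ARCHIVE ." inside the etcd data directory back into $2.
restore_etcd_tar() {
    archive=$1
    target=$2
    mkdir -p "$target" && tar xzf "$archive" -C "$target"
}

# Possible recovery sequence (run as root on the controller node):
#   systemctl stop kubelet
#   umount /var/lib/etcd
#   restore_etcd_tar ~/etcd.tar.gz /var/lib/etcd
#   systemctl start kubelet
```

If the goal is to go back to etcd on disk permanently, the tmpfs line in /etc/fstab would presumably need to be removed as well, since otherwise the ramdisk is mounted again at the next boot.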
