Backing Up Etcd Database in Kubernetes

Why backing up a cluster's Etcd database is so important...

Kubernetes has grown from its humble open-source origins into the orchestration tool of choice, mainly thanks to the devotion of its community. That, along with subsequent investment by leading tech companies, has made it a champion of orchestration, automation and high availability (HA).

From preparing for my Certified Kubernetes Administrator exam, and from past experience, I can advise the following without reservation:

Unless your Kubernetes environment is immutable (e.g. AWS EKS/Azure AKS), you need to back up your etcd database in addition to your existing disaster recovery plan.

Here are the reasons why:

- The etcd database (and yes, there are many talented engineers out there who don't know what it does) is the key-value store behind the Kube-API server: every object the API server creates or changes is persisted there. Because the Kube-API server is the entry point for the cluster, this means etcd holds the state of your Kube cluster.

- If you run Kubernetes in a stacked or external etcd HA topology, the etcd database holds the latest state data. That makes a backup every x minutes by cron job worthwhile, compared with retrieving your assets from an external source (or another region if you are in the cloud), should both the primary and backup cluster go down at once into an unrecoverable state. Your time to mitigate a destroyed production cluster will be greatly reduced as a result.

- You can recreate a cluster from your original deployment.yaml file and then retrieve your etcd-backup.db from backup to restore its state. In other words, with the knowledge or an automated process in place, you can create a new cluster as it was before, then restore its state from your last etcd database backup. This is useful in a disaster recovery context for a mutable Kubernetes environment where all related clusters were destroyed in a major incident.

image of cluster etcd database backup command

Let's see what's involved in the etcd database backup and restore. Note that the image above is not production console output, and that in practice the file path for the snapshot will be your s3/file storage URL or path, as in the example below. A cron job could point to a script that creates a new backup DB named by timestamp and, as a housekeeping step, deletes the oldest database file in the storage folder; that achieves the snapshot save-and-store objective. Let's check out the following command, run from the etcd node or control plane server with etcdctl installed:

ETCDCTL_API=3 etcdctl snapshot save /filepath/you/wish/to/save/to/etcd-backup.db \
--endpoints= \
--cacert=/filepath/to/cacert/etcd-ca.pem \
--cert=/filepath/to/cert/etcd-cert.crt \
--key=/filepath/to/key/etcd-key.key
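The cron-driven rotation described earlier (a new timestamped backup each run, with the oldest files removed) could be sketched roughly as below. This is only a minimal sketch under my own assumptions: the function names `snapshot_path` and `prune_old_backups`, the `etcd-backup-<timestamp>.db` naming scheme, and the `BACKUP_DIR` and `KEEP` variables are all hypothetical and not part of etcdctl.

```shell
#!/bin/sh
# Hypothetical helpers for a cron-driven etcd backup rotation.
# The naming scheme and variables here are assumptions, not etcdctl features.

# Print a timestamped snapshot path inside the directory given as $1.
snapshot_path() {
    dir=$1
    printf '%s/etcd-backup-%s.db\n' "$dir" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Delete the oldest etcd-backup-*.db files in $1 so at most $2 remain.
prune_old_backups() {
    dir=$1
    keep=$2
    # The timestamp format sorts chronologically, so a plain sort
    # puts the oldest files first.
    ls -1 "$dir"/etcd-backup-*.db 2>/dev/null | sort | awk -v keep="$keep" '
        { files[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print files[i] }
    ' | while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example use inside the script that cron calls (left commented out,
# because it needs etcdctl and your own endpoint and certificate flags):
#   BACKUP_DIR=/filepath/you/wish/to/save/to   # assumed location
#   KEEP=10                                    # assumed retention count
#   ETCDCTL_API=3 etcdctl snapshot save "$(snapshot_path "$BACKUP_DIR")" ...
#   prune_old_backups "$BACKUP_DIR" "$KEEP"
```

Pruning only after a successful save is probably the safer order, so that a failed run never leaves you with fewer good snapshots than before.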


With one command, you have taken a snapshot of your database! Hopefully you will never have to use it, but if disaster strikes, swift mitigation of your major incident is paramount. After recreating the cluster, you can restore its state with the snapshot restore command.

image of etcd database restore command

The restore command is similar to the backup command and can also be scripted for automation as a recovery tool. The script should be based on the following command:

ETCDCTL_API=3 etcdctl snapshot restore /path/to/backup/etcd-backup.db \
--initial-cluster etcd-restore= \
--initial-advertise-peer-urls \
--name etcd-restore \
--data-dir /var/lib/etcd/
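If you script the restore, a small pre-flight check before calling etcdctl may save you from restoring a missing snapshot or writing into a data directory that still holds data. The sketch below is an assumption of mine, not part of etcdctl: `restore_preflight` is a hypothetical helper name, and the two paths are simply the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a scripted etcd restore.

# Return 0 only if the snapshot file ($1) exists and is non-empty, and the
# target data directory ($2) is missing or empty.
restore_preflight() {
    snapshot=$1
    data_dir=$2
    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi
    if [ -d "$data_dir" ] && [ -n "$(ls -A "$data_dir")" ]; then
        echo "data dir not empty: $data_dir" >&2
        return 1
    fi
    return 0
}

# Example use in a restore script (left commented out, because it needs
# etcdctl and your own cluster flags):
#   restore_preflight /path/to/backup/etcd-backup.db /var/lib/etcd/ || exit 1
#   ETCDCTL_API=3 etcdctl snapshot restore /path/to/backup/etcd-backup.db ...
```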

Note that, in a pinch, both commands can be run straight from the command line to back up and restore your etcd database and thus your cluster's state. However, I would advise automating this and sewing it into your disaster recovery process (and runbooks) for your Kubernetes cluster.

Stay tuned for more on DevOps in this blog along with articles on other areas of interest in the Infrastructure and Writing arenas. To not miss out on any updates on my availability, tips in related areas or anything of interest to all, why not sign up for one of my newsletters in the footer of any page on Maolte. I look forward to us becoming pen pals!

Best Regards


