
How to restore a Kubernetes cluster from an etcd snapshot

If an event leads to the loss of a Kubernetes cluster, it's important to be ready to restore it. Here's how to restore a cluster from an etcd snapshot, including the commands involved.

If disaster strikes and you need to restore a Kubernetes cluster, etcd snapshots are a helpful fix. Even so, restoring an etcd snapshot is not an intuitive process.

The steps involved in restoring a Kubernetes cluster from an etcd snapshot can vary depending on how the Kubernetes environment is set up, but the steps described below are intended to familiarize users with the basic process.

It is also worth noting that the process described below replaces the existing etcd database, so if an organization needs to preserve the database contents, it must create a backup copy of the database before moving forward.

Here are the steps required to restore a Kubernetes cluster from an etcd snapshot. This tutorial works under the assumption that Kubernetes uses default folder paths.

Install the etcd client

The first step to restore a Kubernetes cluster from an etcd snapshot is to install the etcd client. On Debian-based systems such as Ubuntu, that command is:

apt install etcd-client
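To confirm that the client installed correctly and is being used in API version 3 mode, print its version. The exact output varies by etcd release:

ETCDCTL_API=3 etcdctl version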

Admins can use a single command to complete the restoration process, although there is additional work required to bring the new etcd database online.

Identify appropriate IP addresses

The following command assumes that users are restoring the snapshot to the local machine, which should be a computer within the Kubernetes cluster. In that case, use the 127.0.0.1 loopback address. When restoring the snapshot to another machine, substitute the appropriate IP address.

Additional information to have is the name of the snapshot being restored and its location.

When backing up Kubernetes, one of the parameters supplied to the etcdctl command is snapshot save, followed by a path and filename.

This is the same path and filename needed during the restoration process. This tutorial assumes that the snapshot is stored in the /tmp folder and is named example.db.
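For reference, the original backup command might have looked something like the following sketch, assuming the default kubeadm certificate paths:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /tmp/example.db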

Here is the command that is used to restore the database:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--name=master \
--data-dir=/var/lib/etcd-from-backup \
--initial-cluster=master=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380 \
snapshot restore /tmp/example.db
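It can also be worth confirming that the snapshot file is intact before relying on it. The etcdctl snapshot status subcommand prints the snapshot's hash, revision, key count and size; note that in newer etcd releases this subcommand has moved to the separate etcdutl tool:

ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/example.db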

Edit a manifest file to update paths

Once the restoration process completes, users will typically need to edit a manifest file to update the paths that are contained within it. To edit this file, enter the following command:

vi /etc/kubernetes/manifests/etcd.yaml

Locate the spec section

When the YAML file opens, locate the spec section and then edit the --data-dir argument in the container command, replacing the default path (/var/lib/etcd) with the path that points to the new location.

/var/lib/etcd-from-backup is often used, but you can use whatever location makes sense.

Here is what the full line looks like:

--data-dir=/var/lib/etcd-from-backup

Add the initial cluster token to the file

The next step is to add the initial cluster token to the file. To do so, add a line after the --initial-cluster line that references k8snode. The new line will look like this:

--initial-cluster-token=etcd-cluster-1
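Taken together, the edits to the spec section leave the etcd container's command list looking something like the following sketch. The node name (k8snode, matching this tutorial's cluster), IP addresses and remaining flags vary per cluster:

spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd-from-backup
    - --initial-cluster=k8snode=https://127.0.0.1:2380
    - --initial-cluster-token=etcd-cluster-1
    # ...remaining etcd flags unchanged...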

Update the mount path

Now, scroll down and locate the volumeMounts section of the file. Here, update the mount path to reflect the path that you specified as the data directory.

Because this tutorial uses /var/lib/etcd-from-backup as the data directory, the mount path line reads:

-   mountPath: /var/lib/etcd-from-backup

Update the host path

In the volumes section, replace the path value under hostPath with the same path. It should look like this:

path: /var/lib/etcd-from-backup
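For reference, in a default kubeadm-generated etcd.yaml, the updated sections would resemble the following sketch. The volume name etcd-data is the kubeadm default and stays as-is; only the two path values change:

    volumeMounts:
    - mountPath: /var/lib/etcd-from-backup
      name: etcd-data
  volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data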

When this is complete, save your changes and wait a few seconds for the kubelet to restart the etcd static pod. Then confirm that a new etcd process has started by entering the following command:

ps -ef | grep etcd

Verify the newly restored database

Users can verify that the newly restored database is up and running by entering the following command:

kubectl get pods -A

This shows the running pods in all namespaces. The list should include a recently started etcd pod named after the node it runs on, which in this tutorial is etcd-k8snode.
