Cluster Management ¶
Scenario: etcd Backup and Restore
- Install etcdctl
- Create Deployment Before Backup
- Backup etcd
- Create Deployment After Backup
- Stop Services
- Stop etcd
- Restore etcd
- Start Services
- Verify
etcd Backup and Restore ¶
Install etcdctl ¶
Download the etcd release package from GitHub.
wget https://github.com/etcd-io/etcd/releases/download/v3.5.3/etcd-v3.5.3-linux-amd64.tar.gz
Extract the archive, copy etcdctl to /usr/local/bin, and grant execute permission.
tar -zxvf etcd-v3.5.3-linux-amd64.tar.gz
cp etcd-v3.5.3-linux-amd64/etcdctl /usr/local/bin/
sudo chmod u+x /usr/local/bin/etcdctl
Verify the installation.
etcdctl --help
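You can also confirm which release is on the PATH with the version subcommand (an optional check, not part of the original steps):
# Should report etcdctl 3.5.3 for the release downloaded above
etcdctl version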
Create Deployment Before Backup ¶
Create Deployment before backup.
kubectl create deployment app-before-backup --image=nginx
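Before taking the snapshot, it can be worth confirming that the Deployment is up (an optional check; kubectl create deployment labels the Pods with app=app-before-backup):
# Confirm the Deployment and its Pod are running before the backup
kubectl get deployment app-before-backup
kubectl get pods -l app=app-before-backup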
Backup etcd ¶
Usage:
- <CONTROL_PLANE_IP_ADDRESS> is the actual IP address of the Control Plane node.
- --endpoints: specify the etcd endpoint to connect to; 2379 is the etcd client port.
- --cert: specify the etcd server certificate, which was generated by kubeadm and saved in /etc/kubernetes/pki/etcd/.
- --key: specify the etcd server certificate key, which was generated by kubeadm and saved in /etc/kubernetes/pki/etcd/.
- --cacert: specify the etcd CA certificate, which was generated by kubeadm and saved in /etc/kubernetes/pki/etcd/.
etcdctl \
--endpoints=https://<CONTROL_PLANE_IP_ADDRESS>:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
snapshot save snapshot-$(date +"%Y%m%d%H%M%S").db
etcdctl \
--endpoints=https://<cka001_ip>:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
snapshot save snapshot-$(date +"%Y%m%d%H%M%S").db
Output:
{"level":"info","ts":"2022-07-24T18:51:21.328+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"snapshot-20220724185121.db.part"}
{"level":"info","ts":"2022-07-24T18:51:21.337+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2022-07-24T18:51:21.337+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://<cka001_ip>:2379"}
{"level":"info","ts":"2022-07-24T18:51:21.415+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2022-07-24T18:51:21.477+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://<cka001_ip>:2379","size":"3.6 MB","took":"now"}
{"level":"info","ts":"2022-07-24T18:51:21.477+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"snapshot-20220724185121.db"}
Snapshot saved at snapshot-20220724185121.db
We can find the backup file in the current directory with the ls -al command.
-rw------- 1 root root 3616800 Jul 24 18:51 snapshot-20220724185121.db
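Optionally, the snapshot can be inspected before relying on it; snapshot status reports its hash, revision, total keys, and size (an optional check not in the original steps; newer etcd releases prefer etcdutl for this):
# Inspect the snapshot file (use your actual file name)
etcdctl snapshot status snapshot-20220724185121.db -w table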
Create Deployment After Backup ¶
Create Deployment after backup.
kubectl create deployment app-after-backup --image=nginx
Delete the Deployment we created before the backup.
kubectl delete deployment app-before-backup
Check the Deployment status.
kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
app-after-backup 1/1 1 1 108s
Stop Services ¶
Move the etcd data directory aside.
mv /var/lib/etcd/ /var/lib/etcd.bak
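If you want to double-check that the data directory was moved aside before stopping the services, list the backup path (an optional check):
# The original member/ directory should now live under the .bak path
ls -al /var/lib/etcd.bak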
Stop kubelet
systemctl stop kubelet
Stop kube-apiserver
nerdctl -n k8s.io ps -a | grep apiserver
0c5e69118f1b registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0 "kube-apiserver --ad…" 32 hours ago Up k8s://kube-system/kube-apiserver-cka001/kube-apiserver
638bb602c310 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Up k8s://kube-system/kube-apiserver-cka001
Stop the containers that are in Up status.
nerdctl -n k8s.io stop <container_id>
nerdctl -n k8s.io stop 0c5e69118f1b
nerdctl -n k8s.io stop 638bb602c310
No kube-apiserver container is in Up status now.
nerdctl -n k8s.io ps -a | grep apiserver
0c5e69118f1b registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0 "kube-apiserver --ad…" 32 hours ago Created k8s://kube-system/kube-apiserver-cka001/kube-apiserver
638bb602c310 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Created k8s://kube-system/kube-apiserver-cka001
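With kube-apiserver stopped, the cluster API should be unreachable; a quick kubectl call is a simple sanity check (optional, not part of the original steps):
# Expected to fail with a "connection refused" style error
kubectl get nodes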
Stop etcd ¶
nerdctl -n k8s.io ps -a | grep etcd
0965b195f41a registry.aliyuncs.com/google_containers/etcd:3.5.1-0 "etcd --advertise-cl…" 32 hours ago Up k8s://kube-system/etcd-cka001/etcd
9e1bea9f25d1 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Up k8s://kube-system/etcd-cka001
Stop the containers that are in Up status.
nerdctl -n k8s.io stop <container_id>
nerdctl -n k8s.io stop 0965b195f41a
nerdctl -n k8s.io stop 9e1bea9f25d1
No etcd container is in Up status now.
nerdctl -n k8s.io ps -a | grep etcd
0965b195f41a registry.aliyuncs.com/google_containers/etcd:3.5.1-0 "etcd --advertise-cl…" 32 hours ago Created k8s://kube-system/etcd-cka001/etcd
9e1bea9f25d1 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Created k8s://kube-system/etcd-cka001
Restore etcd ¶
Execute the restore operation on the Control Plane node with the actual backup file, here snapshot-20220724185121.db.
etcdctl snapshot restore snapshot-20220724185121.db \
--endpoints=<cka001_ip>:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--data-dir=/var/lib/etcd
Output:
Deprecated: Use `etcdutl snapshot restore` instead.
2022-07-24T18:57:49+08:00 info snapshot/v3_snapshot.go:248 restoring snapshot {"path": "snapshot-20220724185121.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:129\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
2022-07-24T18:57:49+08:00 info membership/store.go:141 Trimming membership information from the backend...
2022-07-24T18:57:49+08:00 info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-07-24T18:57:49+08:00 info snapshot/v3_snapshot.go:269 restored snapshot {"path": "snapshot-20220724185121.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
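As the output notes, etcdctl snapshot restore is deprecated. The same restore could also be done with etcdutl, which ships in the release tarball downloaded earlier (an equivalent alternative sketch, assuming the binary is copied the same way as etcdctl):
# Optional alternative to the deprecated etcdctl restore path
cp etcd-v3.5.3-linux-amd64/etcdutl /usr/local/bin/
sudo chmod u+x /usr/local/bin/etcdutl
etcdutl snapshot restore snapshot-20220724185121.db --data-dir=/var/lib/etcd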
Check that the etcd data directory is back after the restore.
tree /var/lib/etcd
/var/lib/etcd
└── member
├── snap
│ ├── 0000000000000001-0000000000000001.snap
│ └── db
└── wal
└── 0000000000000000-0000000000000000.wal
Start Services ¶
Start kubelet. The kube-apiserver and etcd containers will be started automatically by kubelet.
systemctl start kubelet
Execute the commands below to make sure the services are all up.
systemctl status kubelet.service
nerdctl -n k8s.io ps -a | grep etcd
nerdctl -n k8s.io ps -a | grep apiserver
The current status of etcd:
0965b195f41a registry.aliyuncs.com/google_containers/etcd:3.5.1-0 "etcd --advertise-cl…" 32 hours ago Created k8s://kube-system/etcd-cka001/etcd
3b8f37c87782 registry.aliyuncs.com/google_containers/etcd:3.5.1-0 "etcd --advertise-cl…" 6 seconds ago Up k8s://kube-system/etcd-cka001/etcd
9e1bea9f25d1 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Created k8s://kube-system/etcd-cka001
fbbbb628a945 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 6 seconds ago Up k8s://kube-system/etcd-cka001
The current status of kube-apiserver:
0c5e69118f1b registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0 "kube-apiserver --ad…" 32 hours ago Created k8s://kube-system/kube-apiserver-cka001/kube-apiserver
281cf4c6670d registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0 "kube-apiserver --ad…" 14 seconds ago Up k8s://kube-system/kube-apiserver-cka001/kube-apiserver
5ed8295d92da registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 15 seconds ago Up k8s://kube-system/kube-apiserver-cka001
638bb602c310 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 32 hours ago Created k8s://kube-system/kube-apiserver-cka001
Verify ¶
Check the cluster status to confirm that the Deployment app-before-backup is back.
kubectl get deploy
Result
NAME READY UP-TO-DATE AVAILABLE AGE
app-before-backup 1/1 1 1 11m
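As an extra check, the Deployment created after the backup should be gone, since it did not exist at the time of the snapshot (this check is not part of the original steps):
# Expected: Error from server (NotFound)
kubectl get deployment app-after-backup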
Upgrade ¶
Scenario: Upgrade
- Evict Control Plane node
- Check current available version of kubeadm
- Upgrade kubeadm to new version
- Check upgrade plan
- Apply upgrade plan to upgrade to new version
- Upgrade kubelet and kubectl
- Enable Control Plane node scheduling
- Evict Worker nodes
- Upgrade kubeadm and kubelet
- Enable Worker node scheduling
Upgrade Control Plane ¶
Preparation ¶
Evict Control Plane node.
kubectl drain <control_plane_node_name> --ignore-daemonsets
kubectl drain cka001 --ignore-daemonsets
node/cka001 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-dsx76, kube-system/kube-proxy-cm4hc
evicting pod kube-system/calico-kube-controllers-5c64b68895-jr4nl
evicting pod kube-system/coredns-6d8c4cb4d-g4jxc
evicting pod kube-system/coredns-6d8c4cb4d-sqcvj
pod/calico-kube-controllers-5c64b68895-jr4nl evicted
pod/coredns-6d8c4cb4d-g4jxc evicted
pod/coredns-6d8c4cb4d-sqcvj evicted
node/cka001 drained
The Control Plane node is now in SchedulingDisabled status.
kubectl get node -owide
Result
NAME STATUS ROLES AGE VERSION OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
cka001 Ready,SchedulingDisabled control-plane,master 32h v1.24.0 Ubuntu 20.04.4 LTS 5.4.0-122-generic containerd://1.5.9
cka002 Ready <none> 32h v1.24.0 Ubuntu 20.04.4 LTS 5.4.0-122-generic containerd://1.5.9
cka003 Ready <none> 32h v1.24.0 Ubuntu 20.04.4 LTS 5.4.0-122-generic containerd://1.5.9
Check the currently available versions of kubeadm.
apt policy kubeadm
kubeadm:
  Installed: 1.24.0-00
  Candidate: 1.24.3-00
  Version table:
     1.24.3-00 500
        500 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
     1.24.2-00 500
        500 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
     1.24.1-00 500
        500 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
 *** 1.24.0-00 500
        500 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     1.23.7-00 500
        500 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
......
Upgrade kubeadm to version 1.24.2-00.
sudo apt-get -y install kubeadm=1.24.2-00 --allow-downgrades
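Before generating the upgrade plan, you can confirm the new kubeadm binary is in place (an optional check):
# Should print v1.24.2
kubeadm version -o short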
Check upgrade plan.
kubeadm upgrade plan
The command prints the upgrade guidance below.
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.24.0
[upgrade/versions] kubeadm version: v1.24.2
I0724 19:05:00.111855 1142460 version.go:255] remote version is much newer: v1.24.3; falling back to: stable-1.23
[upgrade/versions] Target version: v1.24.2
[upgrade/versions] Latest version in the v1.23 series: v1.24.2
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT TARGET
kubelet 3 x v1.24.0 v1.24.2
Upgrade to the latest version in the v1.23 series:
COMPONENT CURRENT TARGET
kube-apiserver v1.24.0 v1.24.2
kube-controller-manager v1.24.0 v1.24.2
kube-scheduler v1.24.0 v1.24.2
kube-proxy v1.24.0 v1.24.2
CoreDNS v1.8.6 v1.8.6
etcd 3.5.1-0 3.5.1-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.24.2
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
Upgrade Control Plane ¶
Following the upgrade plan, let's upgrade to v1.24.2.
kubeadm upgrade apply v1.24.2
With the option --etcd-upgrade=false, etcd can be excluded from the upgrade.
kubeadm upgrade apply v1.24.2 --etcd-upgrade=false
The upgrade is successful when you receive the message below.
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.24.2". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
Upgrade kubelet and kubectl.
sudo apt-get -y install kubelet=1.24.2-00 kubectl=1.24.2-00 --allow-downgrades
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Get current node status.
kubectl get node
NAME STATUS ROLES AGE VERSION
cka001 Ready,SchedulingDisabled control-plane,master 32h v1.24.2
cka002 Ready <none> 32h v1.24.0
cka003 Ready <none> 32h v1.24.0
After verifying that each node is in Ready status, enable node scheduling.
kubectl uncordon <control_plane_node_name>
kubectl uncordon cka001
Output:
node/cka001 uncordoned
Check node status again. Make sure all nodes are in Ready status.
kubectl get node
Output:
NAME STATUS ROLES AGE VERSION
cka001 Ready control-plane,master 32h v1.24.2
cka002 Ready <none> 32h v1.24.0
cka003 Ready <none> 32h v1.24.0
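To confirm the control plane components themselves are now at v1.24.2, you can also check the server version reported by the API server (an optional check, not in the original steps; the exact output format may vary):
# The Server Version should report v1.24.2
kubectl version --short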
Upgrade Worker ¶
Preparation for Worker ¶
Log on to cka001.
Evict the Worker nodes; explicitly specify deleting local emptyDir data if needed.
kubectl drain <worker_node_name> --ignore-daemonsets --force
kubectl drain <worker_node_name> --ignore-daemonsets --delete-emptydir-data --force
If the drain fails because of emptyDir data, use the second command.
kubectl drain cka002 --ignore-daemonsets --force
kubectl drain cka002 --ignore-daemonsets --delete-emptydir-data --force
node/cka002 cordoned
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: dev/ubuntu; ignoring DaemonSet-managed Pods: kube-system/calico-node-p5rf2, kube-system/kube-proxy-zvs68
evicting pod ns-netpol/pod-netpol-5b67b6b496-2cgnw
evicting pod dev/ubuntu
evicting pod dev/app-before-backup-66dc9d5cb-6xc8c
evicting pod dev/nfs-client-provisioner-86d7fb78b6-2f5dx
evicting pod dev/pod-netpol-2-77478d77ff-l6rzm
evicting pod ingress-nginx/ingress-nginx-admission-patch-nk9fv
evicting pod ingress-nginx/ingress-nginx-admission-create-lgtdj
evicting pod kube-system/coredns-6d8c4cb4d-l4kx4
pod/ingress-nginx-admission-create-lgtdj evicted
pod/ingress-nginx-admission-patch-nk9fv evicted
pod/nfs-client-provisioner-86d7fb78b6-2f5dx evicted
pod/app-before-backup-66dc9d5cb-6xc8c evicted
pod/coredns-6d8c4cb4d-l4kx4 evicted
pod/pod-netpol-5b67b6b496-2cgnw evicted
pod/pod-netpol-2-77478d77ff-l6rzm evicted
pod/ubuntu evicted
node/cka002 drained
Upgrade Workers ¶
Log on to cka002.
Install kubeadm version 1.24.2-00.
sudo apt-get -y install kubeadm=1.24.2-00 --allow-downgrades
Upgrade the node with kubeadm.
sudo kubeadm upgrade node
Upgrade kubelet.
sudo apt-get -y install kubelet=1.24.2-00 --allow-downgrades
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Make sure all nodes are in Ready status, then enable node scheduling.
kubectl uncordon <worker_node_name>
kubectl uncordon cka002
Verify Upgrade ¶
Check node status.
kubectl get node
Result
NAME STATUS ROLES AGE VERSION
cka001 Ready control-plane,master 32h v1.24.2
cka002 Ready <none> 32h v1.24.2
cka003 Ready <none> 32h v1.24.0
Repeat the same steps on node cka003.
Log on to cka001. If the drain fails because of emptyDir data, use the second command.
kubectl drain cka003 --ignore-daemonsets --force
kubectl drain cka003 --ignore-daemonsets --delete-emptydir-data --force
Log on to cka003 and run the commands below.
sudo apt-get -y install kubeadm=1.24.2-00 --allow-downgrades
sudo kubeadm upgrade node
sudo apt-get -y install kubelet=1.24.2-00 --allow-downgrades
sudo systemctl daemon-reload
sudo systemctl restart kubelet
kubectl get node
kubectl uncordon cka003
Get final status of all nodes.
kubectl get node
NAME STATUS ROLES AGE VERSION
cka001 Ready control-plane,master 32h v1.24.2
cka002 Ready <none> 32h v1.24.2
cka003 Ready <none> 32h v1.24.2
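Optionally, to prevent a routine apt upgrade from bumping these packages again before the next planned cluster upgrade, it is common practice to put them on hold on every node (not part of the original steps):
# Pin kubeadm, kubelet, and kubectl at their current versions
sudo apt-mark hold kubeadm kubelet kubectl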
Summary:
- Control Plane
kubectl get node -owide
kubectl drain cka001 --ignore-daemonsets
kubectl get node -owide
apt policy kubeadm
apt-get -y install kubeadm=1.24.2-00 --allow-downgrades
kubeadm upgrade plan
kubeadm upgrade apply v1.24.2
# kubeadm upgrade apply v1.24.2 --etcd-upgrade=false
apt-get -y install kubelet=1.24.2-00 kubectl=1.24.2-00 --allow-downgrades
systemctl daemon-reload
systemctl restart kubelet
kubectl get node
kubectl uncordon cka001
- Worker Node
- On Control Plane
kubectl drain cka002 --ignore-daemonsets
- On Worker Node
apt-get -y install kubeadm=1.24.2-00 --allow-downgrades
kubeadm upgrade node
apt-get -y install kubelet=1.24.2-00 --allow-downgrades
systemctl daemon-reload
systemctl restart kubelet
kubectl uncordon cka002
- Repeat for other Worker nodes