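etcd is a distributed, consistent key-value store used for shared configuration and service discovery. It is an open-source project started by CoreOS and is released under the Apache License 2.0.
etcd is an extremely important service in a k8s cluster because it stores all of the cluster's data. Consequently, losing etcd data, whether through a disaster or otherwise, directly affects whether the cluster's data can be recovered.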
Checking etcd status
Check the etcd cluster status
export ETCDCTL_API=3
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
--endpoints="https://10.1.40.61:2379,\
https://10.1.40.62:2379,\
https://10.1.40.63:2379" endpoint status --write-out=table
The output is as follows:
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.1.40.61:2379 | 93ed298870a73c23 | 3.4.13 | 8.4 MB | false | false | 21 | 1752481 | 1752481 | |
| https://10.1.40.62:2379 | 45e2bed5ef11abc6 | 3.4.13 | 8.4 MB | false | false | 21 | 1752481 | 1752481 | |
| https://10.1.40.63:2379 | 4a20c27570f92f1b | 3.4.13 | 8.4 MB | true | false | 21 | 1752481 | 1752481 | |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Get etcd version information
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
--endpoints="https://10.1.40.61:2379,\
https://10.1.40.62:2379,\
https://10.1.40.63:2379" version
The output is as follows:
etcdctl version: 3.4.13
API version: 3.4
Get all keys in etcd
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
--endpoints="https://10.1.40.61:2379,\
https://10.1.40.62:2379,\
https://10.1.40.63:2379" \
get / --prefix --keys-only
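Every etcdctl command above repeats the same three TLS flags and the same endpoint list. As a convenience, they can be wrapped in a bash function that you source into your shell. This is a minimal sketch: the function name etcdctl_tls and the variables ETCD_SSL_DIR and ETCD_ENDPOINTS are hypothetical, while the paths and IPs are copied from the commands in this article.

```shell
# Hypothetical wrapper; source this file in bash. Defaults mirror the
# certificate paths and endpoints used in this article.
ETCDCTL="${ETCDCTL:-/usr/local/etcd/bin/etcdctl}"
ETCD_SSL_DIR="${ETCD_SSL_DIR:-/usr/local/etcd/ssl}"
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-https://10.1.40.61:2379,https://10.1.40.62:2379,https://10.1.40.63:2379}"

# etcdctl_tls ARGS...: run etcdctl (API v3) with the cluster's TLS files and
# endpoints, then append whatever subcommand and flags were passed in.
etcdctl_tls() {
  ETCDCTL_API=3 "$ETCDCTL" \
    --cacert="$ETCD_SSL_DIR/etcd-ca.pem" \
    --cert="$ETCD_SSL_DIR/etcd.pem" \
    --key="$ETCD_SSL_DIR/etcd-key.pem" \
    --endpoints="$ETCD_ENDPOINTS" \
    "$@"
}
```

After sourcing it, etcdctl_tls endpoint status --write-out=table, etcdctl_tls version, and etcdctl_tls get / --prefix --keys-only should behave like the three commands above.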
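Backup and restore
Backing up etcd
Note: the etcdctl commands differ somewhat between etcd versions but are broadly similar. This article takes backups with snapshot save, and each backup only needs to be taken from one node.
Run the following command to back up the etcd cluster: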
# Make sure the backup directory exists before writing the snapshot
mkdir -p /data/etcd_backup_dir
export ETCDCTL_API=3
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
--endpoints="https://10.1.40.61:2379" \
snapshot save /data/etcd_backup_dir/etcd-snapshot-$(date +%Y%m%d).db
The output is as follows:
{"level":"info","ts":1670223148.1573944,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/data/etcd_backup_dir/etcd-snapshot-20221205.db.part"}
{"level":"info","ts":"2022-12-05T14:52:28.177+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1670223148.1780982,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.1.40.61:2379"}
{"level":"info","ts":"2022-12-05T14:52:28.355+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1670223148.3745964,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.1.40.61:2379","size":"8.4 MB","took":0.217055582}
{"level":"info","ts":1670223148.37477,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/data/etcd_backup_dir/etcd-snapshot-20221205.db"}
Snapshot saved at /data/etcd_backup_dir/etcd-snapshot-20221205.db
Check the status of the backup file
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
snapshot status /data/etcd_backup_dir/etcd-snapshot-20221205.db
The output is as follows (hash, revision, total number of keys, total size):
425c233e, 1317803, 2050, 8.4 MB
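Taken together, the backup and verification steps above can be scripted so that a scheduled job (for example from cron) fails loudly when a snapshot could not be taken or read back. This is a minimal sketch under stated assumptions: the function names, the BACKUP_DIR and KEEP_DAYS variables, and the 7-day retention policy are hypothetical, while the etcdctl flags are the ones used above.

```shell
# Hypothetical defaults; override them in the environment if needed.
ETCDCTL="${ETCDCTL:-/usr/local/etcd/bin/etcdctl}"
BACKUP_DIR="${BACKUP_DIR:-/data/etcd_backup_dir}"
KEEP_DAYS="${KEEP_DAYS:-7}"   # assumed retention period, in days

# snapshot_path [YYYYMMDD]: same naming scheme as the backup command above.
snapshot_path() {
  printf '%s/etcd-snapshot-%s.db\n' "$BACKUP_DIR" "${1:-$(date +%Y%m%d)}"
}

# backup_etcd: snapshot one member, check the file, prune old snapshots.
backup_etcd() {
  local file
  file="$(snapshot_path)"
  mkdir -p "$BACKUP_DIR" || return 1
  ETCDCTL_API=3 "$ETCDCTL" \
    --cacert=/usr/local/etcd/ssl/etcd-ca.pem \
    --cert=/usr/local/etcd/ssl/etcd.pem \
    --key=/usr/local/etcd/ssl/etcd-key.pem \
    --endpoints=https://10.1.40.61:2379 \
    snapshot save "$file" || return 1
  # snapshot status reads the local file and is expected to fail if the
  # file is missing or is not a valid snapshot
  ETCDCTL_API=3 "$ETCDCTL" snapshot status "$file" --write-out=table || return 1
  # Delete snapshots last modified more than KEEP_DAYS days ago
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'etcd-snapshot-*.db' \
    -mtime +"$KEEP_DAYS" -delete
}
```

The function returns non-zero on any failure, so a cron wrapper or monitoring check can alert on the exit code instead of parsing the log output.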
Restoring etcd
Stop the kube-apiserver service on all masters:
systemctl stop kube-apiserver
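The command above has to run on every master. A minimal sketch for running one command on all masters over SSH, assuming passwordless root SSH and assuming the masters are the three etcd hosts 10.1.40.61 to 10.1.40.63 (inferred from the member list in the restore commands). The names MASTERS and run_on_all are hypothetical.

```shell
# Assumption: masters = the etcd hosts, reachable as root with SSH keys.
MASTERS=(10.1.40.61 10.1.40.62 10.1.40.63)
SSH="${SSH:-ssh}"

# run_on_all CMD...: run CMD on each master in turn; stop at the first failure.
run_on_all() {
  local host
  for host in "${MASTERS[@]}"; do
    echo ">>> $host: $*"
    "$SSH" "root@$host" "$@" || { echo "failed on $host" >&2; return 1; }
  done
}
```

For example, run_on_all systemctl stop kube-apiserver performs this step on all three masters from a single terminal.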
Stop the etcd service on every node in the cluster:
systemctl stop etcd
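The next step moves the data directory away, which should only happen once etcd has stopped on every member. A minimal sketch of a check, assuming root SSH access to the three members; the names ETCD_NODES and assert_etcd_stopped are hypothetical. An unreachable host is treated as "not confirmed stopped".

```shell
# Assumption: root SSH access to the three etcd members in this article.
ETCD_NODES=(10.1.40.61 10.1.40.62 10.1.40.63)
SSH="${SSH:-ssh}"

# assert_etcd_stopped: return 0 only if every member reports etcd as
# inactive or failed.
assert_etcd_stopped() {
  local host state rc=0
  for host in "${ETCD_NODES[@]}"; do
    # "systemctl is-active" exits non-zero when the unit is not active,
    # so ignore its exit code and compare the printed state instead.
    state="$("$SSH" "root@$host" systemctl is-active etcd || true)"
    echo "$host: ${state:-unreachable}"
    case "$state" in
      inactive|failed) ;;
      *) rc=1 ;;
    esac
  done
  return "$rc"
}
```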
Move the existing data out of the way on every etcd node (the old data is kept as a backup):
mv /var/lib/etcd /var/lib/etcd.bak
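One caveat with mv /var/lib/etcd /var/lib/etcd.bak: if /var/lib/etcd.bak already exists as a directory (for example, left over from an earlier restore), mv moves the data into it as /var/lib/etcd.bak/etcd instead of renaming it. A minimal sketch that uses a timestamped name and refuses to overwrite anything; the function name move_data_dir and the .bak-TIMESTAMP suffix are assumptions.

```shell
# move_data_dir [DIR]: rename DIR (default /var/lib/etcd) to
# DIR.bak-YYYYmmddHHMMSS and print the new path. Hypothetical helper;
# run it on every etcd node.
move_data_dir() {
  local src="${1:-/var/lib/etcd}"
  local dst
  src="${src%/}"   # drop a trailing slash so dst is a sibling, not a child
  dst="${src}.bak-$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$src" ]; then
    echo "no data directory at $src; nothing to move" >&2
    return 0
  fi
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing $dst" >&2
    return 1
  fi
  mv "$src" "$dst" && printf '%s\n' "$dst"
}
```

GNU mv also offers -T (--no-target-directory), which makes mv fail instead of moving into an existing directory.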
Copy the etcd backup snapshot to every etcd node:
scp /data/etcd_backup_dir/etcd-snapshot-20221205.db root@k8s-sit-master2:/data/etcd_backup_dir/
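scp /data/etcd_backup_dir/etcd-snapshot-20221205.db root@k8s-sit-master3:/data/etcd_backup_dir/
On each etcd node, run the restore command for that node. The --data-dir and --wal-dir values must match the paths the etcd service is configured to use; otherwise the restarted etcd will not pick up the restored data.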
# On node k8s-sit-master01
/usr/local/etcd/bin/etcdctl \
snapshot restore /data/etcd_backup_dir/etcd-snapshot-20221205.db \
--name k8s-sit-master01 \
--initial-cluster 'k8s-sit-master01=https://10.1.40.61:2380,k8s-sit-master02=https://10.1.40.62:2380,k8s-sit-master03=https://10.1.40.63:2380' \
--initial-cluster-token 'etcd-k8s-cluster' \
--initial-advertise-peer-urls 'https://10.1.40.61:2380' \
--data-dir=/var/lib/etcd/ \
--wal-dir=/var/lib/etcd/wal
# On node k8s-sit-master02
/usr/local/etcd/bin/etcdctl \
snapshot restore /data/etcd_backup_dir/etcd-snapshot-20221205.db \
--name k8s-sit-master02 \
--initial-cluster 'k8s-sit-master01=https://10.1.40.61:2380,k8s-sit-master02=https://10.1.40.62:2380,k8s-sit-master03=https://10.1.40.63:2380' \
--initial-cluster-token 'etcd-k8s-cluster' \
--initial-advertise-peer-urls 'https://10.1.40.62:2380' \
--data-dir=/var/lib/etcd/ \
--wal-dir=/var/lib/etcd/wal
# On node k8s-sit-master03
/usr/local/etcd/bin/etcdctl \
snapshot restore /data/etcd_backup_dir/etcd-snapshot-20221205.db \
--name k8s-sit-master03 \
--initial-cluster 'k8s-sit-master01=https://10.1.40.61:2380,k8s-sit-master02=https://10.1.40.62:2380,k8s-sit-master03=https://10.1.40.63:2380' \
--initial-cluster-token 'etcd-k8s-cluster' \
--initial-advertise-peer-urls 'https://10.1.40.63:2380' \
--data-dir=/var/lib/etcd/ \
--wal-dir=/var/lib/etcd/wal
Start etcd on all etcd nodes:
systemctl start etcd
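If the etcd unit uses Type=notify (common in binary installs, but check your unit file), systemctl start on the first member may not return until enough members are up to form a quorum, so starting the nodes one by one from a single terminal can appear to hang. A minimal sketch that starts all members concurrently over SSH; the host list, root SSH access and the function name start_etcd_all are assumptions.

```shell
# Assumption: root SSH access to the three etcd members.
ETCD_NODES=(10.1.40.61 10.1.40.62 10.1.40.63)
SSH="${SSH:-ssh}"

# start_etcd_all: start etcd on every member in parallel, wait for all of
# them, and return non-zero if any start failed.
start_etcd_all() {
  local host pid rc=0
  local pids=()
  for host in "${ETCD_NODES[@]}"; do
    "$SSH" "root@$host" systemctl start etcd &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

Alternatively, systemctl start --no-block etcd returns immediately without waiting for the unit to report that it is ready.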
Check the etcd cluster status:
/usr/local/etcd/bin/etcdctl \
--cacert=/usr/local/etcd/ssl/etcd-ca.pem \
--cert=/usr/local/etcd/ssl/etcd.pem \
--key=/usr/local/etcd/ssl/etcd-key.pem \
--endpoints="https://10.1.40.61:2379,\
https://10.1.40.62:2379,\
https://10.1.40.63:2379" endpoint health --write-out=table
Start the kube-apiserver service:
systemctl start kube-apiserver
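Before checking the cluster, it can help to wait until kube-apiserver actually answers requests. A minimal sketch that polls the API server's /readyz endpoint (available since Kubernetes 1.16) through kubectl, which prints "ok" once the server is ready. The function name, the 5-second interval and the 120-second default timeout are assumptions.

```shell
# Assumption: kubectl on this host is configured with admin access.
KUBECTL="${KUBECTL:-kubectl}"

# wait_apiserver_ready [TIMEOUT_SECONDS]: poll /readyz every 5 s until it
# returns "ok"; give up after TIMEOUT_SECONDS (default 120).
wait_apiserver_ready() {
  local timeout="${1:-120}" waited=0
  until [ "$("$KUBECTL" get --raw='/readyz' 2>/dev/null)" = "ok" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "kube-apiserver not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "kube-apiserver is ready"
}
```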
Check the k8s cluster status:
# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
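etcd-1 Healthy {"health":"true"}
Backing up a Kubernetes cluster mainly means backing up its etcd cluster. When restoring, the main thing to get right is the order of operations: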
Stop kube-apiserver -> stop etcd -> restore data -> start etcd -> start kube-apiserver
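Note: when backing up the etcd cluster, it is enough to back up a single etcd member; when restoring, every member is restored from that same backup.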
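The three per-node restore commands differ only in --name and --initial-advertise-peer-urls, and, as the note says, all of them use the same snapshot. They can therefore be generated from one member table. This is a minimal sketch: the MEMBERS table and the function names are hypothetical, and the names, IPs, token and paths are copied from the restore commands in this article. Check the paths against your etcd unit file before use.

```shell
# Values copied from the restore commands in this article.
ETCDCTL="${ETCDCTL:-/usr/local/etcd/bin/etcdctl}"
SNAPSHOT="${SNAPSHOT:-/data/etcd_backup_dir/etcd-snapshot-20221205.db}"
MEMBERS=(
  "k8s-sit-master01=10.1.40.61"
  "k8s-sit-master02=10.1.40.62"
  "k8s-sit-master03=10.1.40.63"
)

# initial_cluster: build the --initial-cluster value from MEMBERS.
initial_cluster() {
  local m out=""
  for m in "${MEMBERS[@]}"; do
    out+="${out:+,}${m%%=*}=https://${m#*=}:2380"
  done
  printf '%s\n' "$out"
}

# member_ip NAME: print the IP of member NAME, or fail if it is unknown.
member_ip() {
  local m
  for m in "${MEMBERS[@]}"; do
    if [ "${m%%=*}" = "$1" ]; then
      printf '%s\n' "${m#*=}"
      return 0
    fi
  done
  return 1
}

# restore_member NAME: run on node NAME to restore the shared snapshot.
restore_member() {
  local name="$1" ip
  ip="$(member_ip "$name")" || { echo "unknown member: $name" >&2; return 1; }
  ETCDCTL_API=3 "$ETCDCTL" snapshot restore "$SNAPSHOT" \
    --name "$name" \
    --initial-cluster "$(initial_cluster)" \
    --initial-cluster-token etcd-k8s-cluster \
    --initial-advertise-peer-urls "https://$ip:2380" \
    --data-dir=/var/lib/etcd \
    --wal-dir=/var/lib/etcd/wal
}
```

For example, running restore_member k8s-sit-master02 on that node should match the second restore command above.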