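etcd is a core component of Kubernetes: all cluster data is stored in it, and an etcd failure makes the whole cluster unavailable. In production, etcd must be deployed for high availability, and its data must be backed up so that it can be restored.
> The etcd version here is 3.2.26 and Kubernetes is 1.14.2, so this guide uses the etcd v3 API.
## Backup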
```
ETCDCTL_API=3 etcdctl --endpoints=${endpoints} --cert=/usr/local/kubernetes/ssl/etcd.pem --key=/usr/local/kubernetes/ssl/etcd-key.pem --cacert=/usr/local/kubernetes/ssl/ca.pem snapshot save back.db
Snapshot saved at back.db
```
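For regular backups, it can help to give each snapshot a timestamped name and check it right after saving. The following is a minimal sketch under some assumptions: `snapshot_path`, `backup_etcd`, and the `/var/backups/etcd` directory are hypothetical names chosen for illustration, while the certificate paths are the ones used above. It passes a single endpoint because a snapshot is read from one member.

```shell
#!/usr/bin/env bash
# Sketch: timestamped etcd snapshot plus an integrity check.
# snapshot_path, backup_etcd and the backup directory are hypothetical names.

# Certificate paths are the ones used in this article.
CERT_ARGS="--cert=/usr/local/kubernetes/ssl/etcd.pem --key=/usr/local/kubernetes/ssl/etcd-key.pem --cacert=/usr/local/kubernetes/ssl/ca.pem"

# Build a file name such as /var/backups/etcd/etcd-20190601-120000.db
snapshot_path() {
  local dir="$1" stamp="$2"
  printf '%s/etcd-%s.db\n' "${dir%/}" "$stamp"
}

# Save a snapshot from one endpoint, then print its hash, revision and size.
backup_etcd() {
  local endpoint="$1" dir="$2" file
  file="$(snapshot_path "$dir" "$(date +%Y%m%d-%H%M%S)")"
  mkdir -p "$dir" || return 1
  # CERT_ARGS is left unquoted on purpose so it splits into separate flags.
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" $CERT_ARGS snapshot save "$file" || return 1
  ETCDCTL_API=3 etcdctl snapshot status "$file" --write-out=table
}

# Example (run on a host that has etcdctl and the certificates):
#   backup_etcd https://10.0.20.11:2379 /var/backups/etcd
```

Nothing runs until `backup_etcd` is called, so the functions can be sourced from a cron script and invoked with whichever endpoint and directory fit your environment.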
## Restore
1. Stop the etcd cluster (run this on every node)
```
systemctl stop etcd
```
2. Delete the etcd data directory
```
rm -rf /opt/etcd
```
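> The entire directory must be deleted; it is recreated automatically during the restore.
3. Copy the backup file to every node in the cluster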
```
scp back.db 10.0.20.12:~/
```
4. Restore the data (run this on every node with that node's own values)
```
ETCDCTL_API=3 etcdctl --endpoints=https://10.0.20.11:2379,https://10.0.20.12:2379,https://10.0.20.13:2379 --cert=/usr/local/kubernetes/ssl/etcd.pem --key=/usr/local/kubernetes/ssl/etcd-key.pem --cacert=/usr/local/kubernetes/ssl/ca.pem --initial-cluster etcd1=https://10.0.20.11:2380,etcd2=https://10.0.20.12:2380,etcd3=https://10.0.20.13:2380 --initial-advertise-peer-urls https://10.0.20.12:2380 snapshot restore back.db --data-dir=/opt/etcd/ --name etcd2
```
- `--initial-advertise-peer-urls` (different on each node) and `--initial-cluster`: fill these in from your own etcd configuration file
- `--data-dir=/opt/etcd/`: the etcd data directory
- `--name etcd2`: the name of this etcd member
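Because `--name` and `--initial-advertise-peer-urls` change from node to node, it is easy to mistype them. The sketch below prints the restore command for a given member by looking its peer URL up in the `--initial-cluster` string. `peer_url` and `restore_cmd` are hypothetical helper names, and the member list is the example cluster from this article. The generated command leaves out `--endpoints` and the certificate flags on the assumption that `snapshot restore` only reads the local snapshot file and does not contact the cluster.

```shell
#!/usr/bin/env bash
# Sketch: print the restore command for one member so that each node gets its
# own --name and --initial-advertise-peer-urls. Helper names are hypothetical.

# Member list taken from this article's example cluster.
INITIAL_CLUSTER="etcd1=https://10.0.20.11:2380,etcd2=https://10.0.20.12:2380,etcd3=https://10.0.20.13:2380"

# Look up a member's peer URL in INITIAL_CLUSTER by its name.
peer_url() {
  local name="$1" entry
  local IFS=','
  for entry in $INITIAL_CLUSTER; do
    if [ "${entry%%=*}" = "$name" ]; then
      printf '%s\n' "${entry#*=}"
      return 0
    fi
  done
  return 1
}

# Print (not execute) the restore command for the named member.
restore_cmd() {
  local name="$1" snapshot="$2" data_dir="$3" url
  url="$(peer_url "$name")" || { echo "unknown member: $name" >&2; return 1; }
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name %s --initial-cluster %s --initial-advertise-peer-urls %s --data-dir=%s\n' \
    "$snapshot" "$name" "$INITIAL_CLUSTER" "$url" "$data_dir"
}

restore_cmd etcd2 back.db /opt/etcd/
```

The script only prints the command, so you can review it before running it on the matching node.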
5. Start etcd again on every node (`systemctl start etcd`), then verify
```
ETCDCTL_API=3 etcdctl --endpoints=https://10.0.20.11:2379,https://10.0.20.12:2379,https://10.0.20.13:2379 --cert=/usr/local/kubernetes/ssl/etcd.pem --key=/usr/local/kubernetes/ssl/etcd-key.pem --cacert=/usr/local/kubernetes/ssl/ca.pem member list
b40a71b8cf44c74, started, etcd3, https://10.0.20.13:2380, https://10.0.20.13:2379
a9027edffe4ef2d2, started, etcd1, https://10.0.20.11:2380, https://10.0.20.11:2379
c1e9eb55fcf40d38, started, etcd2, https://10.0.20.12:2380, https://10.0.20.12:2379
```
```
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
```
The etcd data has been restored successfully.
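If the check needs to be scripted, one option is to count the members reported as `started`. This is a minimal sketch that assumes the default comma-separated `member list` format shown in the verification output; `count_started` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count members reported as "started" in `etcdctl member list` output.
# count_started is a hypothetical helper; it reads the default simple format.

count_started() {
  # Fields are separated by ", "; the second field is the member status.
  awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Sample output copied from the verification step in this article.
count_started <<'EOF'
b40a71b8cf44c74, started, etcd3, https://10.0.20.13:2380, https://10.0.20.13:2379
a9027edffe4ef2d2, started, etcd1, https://10.0.20.11:2380, https://10.0.20.11:2379
c1e9eb55fcf40d38, started, etcd2, https://10.0.20.12:2380, https://10.0.20.12:2379
EOF
```

On a real cluster, pipe the `member list` command into `count_started` and compare the result with the expected member count. `etcdctl endpoint health` can be used alongside it for a per-endpoint check.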