[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0321 14:19:36.524481 91940 cleanupnode.go:99] [reset] Failed to remove containers: [failed to stop running pod cdf897d2b8231f7d3fffa49017090a655b94ddb22cbc24d06f32d519fbaa2c5f: output: E0321 14:19:19.953922 92079 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"cdf897d2b8231f7d3fffa49017090a655b94ddb22cbc24d06f32d519fbaa2c5f\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.96.0.1:443: connect: connection refused" podSandboxID="cdf897d2b8231f7d3fffa49017090a655b94ddb22cbc24d06f32d519fbaa2c5f" time="2025-03-21T14:19:19+08:00" level=fatal msg="stopping the pod sandbox \"cdf897d2b8231f7d3fffa49017090a655b94ddb22cbc24d06f32d519fbaa2c5f\": rpc error: code = Unknown desc = failed to destroy network for sandbox \"cdf897d2b8231f7d3fffa49017090a655b94ddb22cbc24d06f32d519fbaa2c5f\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.96.0.1:443: connect: connection refused" : exit status 1,
failed to stop running pod e4fee45416ccc066380fe4d0f2f28ca861bf836967d88f2074ce62d6d77ec9ed: output: E0321 14:19:20.255860 92242 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"e4fee45416ccc066380fe4d0f2f28ca861bf836967d88f2074ce62d6d77ec9ed\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.96.0.1:443: connect: connection refused" podSandboxID="e4fee45416ccc066380fe4d0f2f28ca861bf836967d88f2074ce62d6d77ec9ed" time="2025-03-21T14:19:20+08:00" level=fatal msg="stopping the pod sandbox \"e4fee45416ccc06638
Time: 2025-03-21 09:14:01
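The log shows a failure while resetting a Kubernetes node with `kubeadm reset`. Unmounting the directories under `/var/lib/kubelet` reports no error. The step that fails is removing containers: two running pod sandboxes could not be stopped because the Calico CNI plugin failed to tear down their networks. The analysis and possible fixes follow: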
---
### Log analysis
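1. **Error description**
   - `[reset] Failed to remove containers`: `kubeadm reset` failed while trying to remove the node's containers.
   - The underlying error occurs while destroying a pod sandbox's network. The Calico CNI plugin's `delete` operation failed, so the container runtime could not stop the sandbox.
2. **Key points**
   - The failure happens inside the Calico plugin, which has to query the Kubernetes API during network teardown.
   - The exact error: `error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: connect: connection refused`
   - `10.96.0.1` is the ClusterIP of the `kubernetes` Service in the `default` namespace. This is the in-cluster address of the API server, namely the first address of kubeadm's default Service CIDR `10.96.0.0/12`.
   - "connection refused" means the TCP connection was actively rejected. Nothing was accepting connections on the API server endpoint behind that address.
3. **Likely root causes**
   - kube-proxy is not running or its rules are stale, so traffic to the Service IP is not forwarded to a live API server. CoreDNS is not involved, because the plugin connects to an IP address rather than a DNS name.
   - The node's CNI (Calico) configuration is corrupted or in an inconsistent state.
   - Control-plane components, in particular kube-apiserver, are unhealthy or stopped. On a control-plane node, `kubeadm reset` may already have stopped the API server before the sandbox teardown ran. In that case this error is expected.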
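Two details from the log can be checked mechanically: why `10.96.0.1` is the API server address, and which sandboxes failed to stop. The sketch below makes these assumptions:
- It runs under Bash.
- The `kubeadm reset` output was saved to a file.
- `first_service_ip` and `failed_sandbox_ids` are hypothetical helper names, not part of kubeadm.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers for a saved `kubeadm reset` log.

# Print the first usable address of a Service CIDR. The API server assigns this
# address to the default/kubernetes Service (e.g. 10.96.0.0/12 -> 10.96.0.1).
first_service_ip() {
  local cidr=$1 ip prefix a b c d n mask
  ip=${cidr%/*}
  prefix=${cidr#*/}
  IFS=. read -r a b c d <<<"$ip"
  n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Bash arithmetic is 64-bit, so trim the shifted mask back to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  n=$(( (n & mask) + 1 ))
  printf '%d.%d.%d.%d\n' $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# List the unique 64-hex-digit sandbox IDs that kubeadm failed to stop.
failed_sandbox_ids() {
  grep -oE 'failed to stop running pod [0-9a-f]{64}' "$1" | awk '{print $6}' | sort -u
}
```

For example, `first_service_ip 10.96.0.0/12` prints `10.96.0.1`, matching the address in the error. The IDs printed by `failed_sandbox_ids reset.log` can be passed to `sudo crictl inspectp` to inspect each stuck sandbox. Here `reset.log` is a hypothetical file name.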
---
### Resolution steps
#### 1. Clean up leftover resources
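Remove leftover containers with the tool that matches the node's container runtime. The `remote_runtime.go` and `level=fatal` lines in the log come from `crictl`, which suggests a CRI runtime such as containerd. The `docker` commands apply only if the node actually runs Docker. Removing pod sandboxes (`crictl rmp`) may still hit the same Calico error until the API server is reachable again.
```bash
sudo crictl rm -a -f    # CRI runtime (containerd, CRI-O): force-remove all containers
docker ps -aq | xargs -r docker rm -f    # Docker runtime only
docker images --filter=reference='*kube*' -q | xargs -r docker rmi -f
```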
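Whether Docker or `crictl` commands apply depends on which runtime socket exists on the node. This is a minimal sketch under these assumptions:
- The socket paths are the defaults for containerd, CRI-O and cri-dockerd. Adjust them if your runtime is configured differently.
- `detect_cri_endpoint` is a hypothetical helper.
- Its optional argument is only a test hook for an alternate filesystem root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the first default CRI socket that exists.
detect_cri_endpoint() {
  local root=${1:-/} sock
  root=${root%/}
  # Default socket paths (assumed): containerd, CRI-O, cri-dockerd.
  for sock in var/run/containerd/containerd.sock var/run/crio/crio.sock var/run/cri-dockerd.sock; do
    if [ -e "$root/$sock" ]; then
      echo "unix:///$sock"
      return 0
    fi
  done
  echo "no known CRI socket found" >&2
  return 1
}
```

If it prints a containerd or CRI-O endpoint, pass that endpoint to `crictl`, for example with `sudo crictl --runtime-endpoint "$(detect_cri_endpoint)" ps -a`. The Docker commands are relevant only when the cri-dockerd socket is found.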
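#### 2. Remove the existing CNI configuration
Delete the contents of the CNI configuration directory (with care) to avoid conflicts with a stale configuration. `kubeadm reset` does not clean this directory on its own: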
```bash
sudo rm -rf /etc/cni/net.d/*
sudo systemctl restart kubelet
```
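This deletion is irreversible, so moving the files aside first is a safer variant. The sketch below assumes root privileges. `backup_cni_conf` and its backup directory naming are hypothetical choices, not a Kubernetes or Calico convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move CNI config files into a timestamped backup directory
# instead of deleting them. Prints the backup directory on success.
backup_cni_conf() {
  local cni_dir=${1:-/etc/cni/net.d} backup_root=${2:-/root} dest
  dest="$backup_root/cni-net.d-backup-$(date +%Y%m%d%H%M%S)"
  mkdir -p "$dest" || return 1
  # Include dotfiles; an empty directory yields an empty array.
  shopt -s nullglob dotglob
  local files=("$cni_dir"/*)
  shopt -u nullglob dotglob
  if [ ${#files[@]} -gt 0 ]; then
    mv -- "${files[@]}" "$dest"/ || return 1
  fi
  echo "$dest"
}
```

To restore, move the files from the printed directory back into `/etc/cni/net.d`.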
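#### 3. Restart the core services manually
Restart kubelet, then check that the API server answers so the components can communicate again:
```bash
sudo systemctl daemon-reload && sudo systemctl restart kubelet
curl --insecure https://10.96.0.1:443/version  # check whether the API server is reachable
```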
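The API server can take a while to come back after kubelet restarts, so a single `curl` may fail spuriously. This is a minimal polling sketch. `wait_for_apiserver` and its defaults (30 attempts, 2 s apart) are assumptions. `-k` skips TLS verification just like `--insecure`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an API server /version URL until it answers
# or the attempt budget is exhausted.
wait_for_apiserver() {
  local url=${1:-https://10.96.0.1:443/version} attempts=${2:-30} interval=${3:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    # -s silent, -f fail on HTTP errors, -k skip TLS verification.
    if curl -ksf --max-time 5 "$url" >/dev/null; then
      echo "API server reachable at $url (attempt $i)"
      return 0
    fi
    sleep "$interval"
  done
  echo "API server not reachable at $url after $attempts attempts" >&2
  return 1
}
```

`10.96.0.1` is reachable from the node only while kube-proxy's rules for the `kubernetes` Service are in place. To test the API server itself, pass `https://<control-plane-ip>:6443/version` instead, since 6443 is kubeadm's default API server port.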
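#### 4. Reset the network configuration
If the problem persists, tear down all Kubernetes components on the node and initialize again. The `iptables` flush below removes every rule on the host, including unrelated firewall rules:
```bash
sudo kubeadm reset -f
sudo ip link delete cbr0 2>/dev/null || true          # kubenet bridge, if present
sudo ip link delete vxlan.calico 2>/dev/null || true  # Calico VXLAN device, if present
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo ipvsadm --clear 2>/dev/null || true              # only relevant if kube-proxy used IPVS mode
sudo modprobe br_netfilter                            # load first: the sysctl key below only exists afterwards
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
```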
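`modprobe` and `sysctl -w` settings do not survive a reboot. The sketch below persists them, modelled on the Kubernetes container-runtime prerequisites page, which uses the file name `k8s.conf`. Recent versions of that page list only `net.ipv4.ip_forward`; the bridge keys here match the setting above. `persist_k8s_netfilter` is a hypothetical helper, and its argument is a test hook for an alternate root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write module and sysctl drop-in files so the
# netfilter settings persist across reboots.
persist_k8s_netfilter() {
  local root=${1:-/}
  root=${root%/}
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d" || return 1
  # Kernel modules to load at boot.
  printf '%s\n' overlay br_netfilter > "$root/etc/modules-load.d/k8s.conf"
  # Sysctl keys applied at boot (and by `sysctl --system`).
  cat > "$root/etc/sysctl.d/k8s.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
}
```

After running it as root, apply the settings immediately with `sudo modprobe overlay && sudo modprobe br_netfilter && sudo sysctl --system`.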
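#### 5. Redeploy Calico
After `kubeadm init` has completed, install the current stable Calico release following the official documentation. The unversioned link below is the legacy location. Newer Calico documentation publishes version-pinned manifest URLs, so check it for the current one:
```bash
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```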
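An unversioned manifest link tracks whatever Calico currently publishes there. The sketch below pins a version instead, then waits for the `calico-node` DaemonSet to become ready. The `raw.githubusercontent.com` URL pattern is the one used by recent Calico install docs. The helper names and the 300 s timeout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build a version-pinned Calico manifest URL and install it.
calico_manifest_url() {
  printf 'https://raw.githubusercontent.com/projectcalico/calico/%s/manifests/calico.yaml\n' "$1"
}

install_calico() {
  local version=$1
  kubectl apply -f "$(calico_manifest_url "$version")" || return 1
  # The manifest runs calico-node as a DaemonSet in kube-system.
  kubectl -n kube-system rollout status daemonset/calico-node --timeout=300s
}
```

Usage: `install_calico vX.Y.Z`, substituting a release that the Calico documentation lists as compatible with your Kubernetes version.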
---
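### Notes
- Back up important data (for example `/etc/kubernetes`) before running the cleanup steps above, especially on a busy production node.
- If the problem persists, check whether host firewall rules block required traffic. Examples: TCP 6443 (kube-apiserver), TCP 10250 (kubelet), and for Calico, TCP 179 (BGP), UDP 4789 (VXLAN) or IP protocol 4 (IP-in-IP).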
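This is a quick way to test TCP reachability of the relevant ports, for example 6443 for kube-apiserver, 10250 for kubelet and 179 for Calico BGP. It assumes:
- Bash with `/dev/tcp` support.
- GNU coreutils `timeout`.

`probe_tcp_port` is a hypothetical helper. UDP ports such as VXLAN 4789 cannot be checked this way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: try a TCP connect via bash's /dev/tcp, bounded by `timeout`.
probe_tcp_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    # A refused connection and a firewall drop look the same here.
    echo "$host:$port closed or filtered"
    return 1
  fi
}
```

For example: `for p in 6443 10250 179; do probe_tcp_port <control-plane-ip> "$p"; done`.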