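By default, the etcd pod requests 100m of CPU and 100Mi of memory, and the apiserver requests 250m of CPU. As the cluster grows, these resources become insufficient and errors appear, such as: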
k8s-master121 kubelet[31020]: E0425 14:41:54.026671 31020 controller.go:187] failed to update lease, error: etcdserver: request timed out
kubelet[31020]: E0425 15:08:32.089005 31020 controller.go:187] failed to update lease, error: Put "https://k8s-master.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-master121?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
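At the same time, the Kubernetes dashboard shows health-check failure events for the apiserver pod.
All of these problems are caused by insufficient CPU resources.
Depending on your needs, you can raise the resource requests of both etcd and the apiserver to 500m of CPU (and, for etcd, 500Mi of memory), as shown here for etcd:
Edit the file /etc/kubernetes/manifests/etcd.yaml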
# cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.1.71:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.1.71:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.1.71:2380
    - --initial-cluster=k8s-master71=https://192.168.1.71:2380,k8s-master65=https://192.168.1.65:2380
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.71:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.1.71:2380
    - --name=k8s-master71
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.aliyuncs.com/google_containers/etcd:3.5.6-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 500m
        memory: 500Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
After the file is saved, the kubelet restarts the pod automatically.
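The manifest above covers only etcd. For the apiserver, the matching change could look roughly like the fragment below. This is a sketch, not a full manifest: it assumes a kubeadm-managed control plane whose apiserver manifest is at /etc/kubernetes/manifests/kube-apiserver.yaml with a container named kube-apiserver, and it shows only the fields that change. All other fields in that file stay as they are.

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (path assumed, kubeadm layout)
spec:
  containers:
  - name: kube-apiserver      # assumed container name; keep command, image, probes, etc. unchanged
    resources:
      requests:
        cpu: 500m             # raised from the 250m default mentioned above
```

As with etcd.yaml, the kubelet picks up the edited static pod manifest and recreates the pod on its own. The apiserver is briefly unavailable on that node while it restarts.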