This tutorial walks through deploying Kubernetes with kubeadm in detail. We purchased five ECS servers on Alibaba Cloud; the VPC we created uses the CIDR block 172.16.0.0/12.
I. Install Docker and Docker Compose
Installing Docker also installs the container runtime containerd by default, and Kubernetes supports containerd as its runtime.
1. Install Docker
```bash
sudo yum install -y docker-ce-24.0.6 docker-ce-cli-24.0.6 containerd.io
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": [
    "https://docker.m.daocloud.io",
    "https://hfxvwdfx.mirror.aliyuncs.com",
    "https://dockerproxy.com",
    "https://proxy.1panel.live",
    "https://dockerproxy.cn",
    "https://hub1.nat.tf",
    "https://docker.ketches.cn",
    "https://hub2.nat.tf",
    "https://docker.6252662.xyz"
  ],
  "insecure-registries": ["https://ude7leho.mirror.aliyuncs.com"],
  "log-driver": "json-file",
  "log-opts": {"max-size": "100m", "max-file": "1"}
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl enable docker
sudo systemctl start docker
sudo docker version --format '{{.Server.Version}}'
```
Because the default Docker container network overlaps with the VPC CIDR block, the Docker address ranges have to be changed. The two existing servers that already run Docker containers are configured with the following networks:
{ "bip": "192.168.0.1/24", "default-address-pools": [ {"base": "10.10.0.0/16","size": 24} ] } { "bip": "192.168.1.1/24", "default-address-pools": [ {"base": "10.11.0.0/16","size": 24} ] }
Assign node1 through node5 the bip ranges 192.168.2.1/24 through 192.168.6.1/24 respectively, and adjust each node's Docker configuration accordingly.
Add bip and default-address-pools to the Docker daemon configuration on each node; for example, on node5:
```json
"bip": "192.168.6.1/24",
"default-address-pools": [
  { "base": "10.11.0.0/16", "size": 24 }
]
```
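For illustration, a merged `/etc/docker/daemon.json` for k8s-node1 might look like the sketch below. The bip and default-address-pools values follow the per-node scheme above and are examples, not fixed values; keep your own registry-mirrors list and adjust the bip for each node.

```bash
# Sketch: merged daemon.json for k8s-node1 (illustrative values; adjust per node)
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://docker.m.daocloud.io"],
  "log-driver": "json-file",
  "log-opts": {"max-size": "100m", "max-file": "1"},
  "bip": "192.168.2.1/24",
  "default-address-pools": [
    {"base": "10.11.0.0/16", "size": 24}
  ]
}
EOF
sudo systemctl restart docker
# Confirm the docker0 bridge picked up the new bip
ip addr show docker0 | grep 192.168.2.1
```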
2. Install docker-compose
```bash
curl -SL https://github.com/docker/compose/releases/download/v2.18.1/docker-compose-$(uname -s)-$(uname -m) -o docker-compose
sudo mv docker-compose /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
```
II. Install Kubernetes
A Kubernetes cluster created by kubeadm depends on software that uses kernel features. This includes, but is not limited to, the container runtime, the kubelet, and the Container Network Interface (CNI) plugin.
Node inventory
Internal IP | Hostname | Spec | Node type |
---|---|---|---|
172.22.162.243 | k8s-node1 | 4C/16G/100G | worker |
172.22.162.245 | k8s-node2 | 4C/16G/100G | worker |
172.22.162.244 | k8s-node3 | 4C/8G/100G | master |
172.22.162.242 | k8s-node4 | 4C/8G/100G | master |
172.22.162.241 | k8s-node5 | 4C/8G/100G | master |
Configure /etc/hosts
```bash
cat >> /etc/hosts << EOF
172.22.162.243 k8s-node1
172.22.162.245 k8s-node2
172.22.162.244 k8s-node3
172.22.162.242 k8s-node4
172.22.162.241 k8s-node5
172.22.162.241 cluster-endpoint
EOF
```
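A quick sanity check (a minimal sketch) to confirm every name resolves on each node:

```bash
# Verify that every hostname in /etc/hosts resolves locally
for h in k8s-node1 k8s-node2 k8s-node3 k8s-node4 k8s-node5 cluster-endpoint; do
  getent hosts "$h"
done
```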
- The mapping `master-ip cluster-endpoint` corresponds to the `--control-plane-endpoint=cluster-endpoint` argument passed to `kubeadm init`; this argument is required when setting up a highly available cluster.
- After the HA cluster is up, the mapped IP can be changed to a load-balancer address.
- The mapped IP is the address of the first master node to be deployed.
1. Prerequisites
- Linux hosts with at least 2 GB of RAM each
- Control-plane nodes need at least 2 CPU cores
- Full network connectivity between all machines, over either a private or public network
- No duplicate hostnames, MAC addresses, or product_uuids across nodes
- Certain ports must be open on the machines
- Swap configuration: by default the kubelet fails to start if swap is detected on the node
Run the following steps on every node.
2. Verify MAC address and product_uuid uniqueness
Physical hardware has unique addresses, but some virtual machines may share duplicates. If these values are not unique on every node, the installation may fail.
```bash
# Check MAC addresses
ip link
ifconfig -a
# Check product_uuid
cat /sys/class/dmi/id/product_uuid
```
3. Check required ports
- Port 6443 must be open; it handles inbound requests to the Kubernetes API server.
- The firewall has already been disabled on these servers, so all ports are reachable on the private network.
4. Configure kernel parameters: forward IPv4 and let iptables see bridged traffic
- Enable the overlay filesystem; Kubernetes needs it to manage container images and storage.
- Make Linux bridge networking subject to iptables rules. Linux bridges (such as the virtual bridge `cni0` created by Docker or Kubernetes) connect the networks of multiple containers/Pods into a local network. By default, bridged traffic does not pass through the host's `iptables` chains.
- iptables is Linux's firewall tool; Kubernetes uses it for Service load balancing, network policies, and node traffic forwarding.
- If bridged traffic bypasses `iptables`, Kubernetes Services and network policies stop working. The two settings below force bridged traffic through the iptables/ip6tables chains.
```bash
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Enable IPv4 packet forwarding so the host can act as a router and forward traffic between interfaces
net.ipv4.ip_forward = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system
```
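To confirm the modules are loaded and the sysctl values took effect, a small verification sketch:

```bash
# Both kernel modules should be listed
lsmod | grep -e overlay -e br_netfilter
# All three values should print 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
```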
5. Configure yum and install common packages
yum install -y vim bash-completion net-tools gcc
6. Time synchronization
```bash
yum install -y chrony
# Point time sync at the National Time Service Center NTP server
sed -i 's/^pool 2.centos.pool.ntp.org iburst$/pool ntp.ntsc.ac.cn iburst/' /etc/chrony.conf
systemctl start chronyd
systemctl enable chronyd
# Check which time servers are in use
chronyc sources
```
7. Disable the firewall
```bash
systemctl stop firewalld
systemctl disable firewalld
```
8. Disable the swap partition
```bash
# Takes effect immediately (until reboot)
swapoff -a
# Permanent: edit /etc/fstab and remove (or comment out) the line below
/dev/mapper/cl-swap swap swap defaults 0 0
```
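If you prefer not to edit /etc/fstab by hand, the swap entry can be commented out with sed; this is a sketch, so review the file afterwards:

```bash
# Comment out any swap entries in /etc/fstab so swap stays off after reboot
sed -ri 's/^([^#].*\sswap\s+swap\s.*)$/# \1/' /etc/fstab
# Confirm swap is off (the Swap line should show 0)
free -h
```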
9. Disable SELinux
```bash
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
```
10. Configure passwordless SSH between nodes
`ssh-copy-id` copies the local SSH public key to a remote host (for example k8s-node1) so that passwordless login works.
```bash
# Generate an SSH key pair
ssh-keygen
# Populate authorized_keys (run cross-wise on each node, targeting the other nodes)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@k8s-node1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@k8s-node2
ssh-copy-id -i ~/.ssh/id_rsa.pub root@k8s-node3
ssh-copy-id -i ~/.ssh/id_rsa.pub root@k8s-node4
```
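To verify passwordless SSH from the current node, a quick check (the loop below assumes the node names from the hosts file):

```bash
# Each command should print the remote hostname without prompting for a password
for h in k8s-node1 k8s-node2 k8s-node3 k8s-node4 k8s-node5; do
  ssh -o BatchMode=yes root@"$h" hostname
done
```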
11. Configure containerd
Switch the runtime to containerd.
⚠️ Note: containerd is installed alongside Docker, but the two keep their images separately.
- Docker images are stored under /var/lib/docker, containerd images under /var/lib/containerd
- You can manually export Docker images and import them into containerd; they share the same (OCI) format
crictl config runtime-endpoint unix:///var/run/containerd/containerd.sock
Edit the containerd configuration file; if it does not exist, generate the default configuration first:
containerd config default > /etc/containerd/config.toml
Edit the configuration file: vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"] endpoint = ["https://docker.6252662.xyz","https://docker.m.daocloud.io","https://dockerproxy.com"] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://docker.6252662.xyz","https://docker.m.daocloud.io","https://dockerproxy.com"]
Change the sandbox (pause) image address
```bash
grep sandbox_image /etc/containerd/config.toml
# Pick the sed command that matches the output above
sed -i "s#k8s.gcr.io/pause#registry.aliyuncs.com/google_containers/pause#g" /etc/containerd/config.toml
# or
sed -i "s#registry.k8s.io/pause#registry.aliyuncs.com/google_containers/pause#g" /etc/containerd/config.toml
# Confirm the change
grep sandbox_image /etc/containerd/config.toml
# After applying all changes, restart containerd
systemctl restart containerd
# Check that the registry configuration is correct
crictl info | grep -A 20 "registry"
```
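Optionally, test that containerd can actually pull through the configured mirrors with crictl; the busybox image below is just an example:

```bash
# Pull a small test image via the CRI endpoint configured earlier
crictl pull docker.io/library/busybox:latest
# List the images known to containerd's CRI
crictl images | grep busybox
```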
12. Switch to the systemd cgroup driver
The official recommendation is the systemd cgroup driver. SystemdCgroup defaults to false, which means the cgroupfs driver; set it to true to use the systemd driver.
```bash
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
# After applying all changes, restart containerd
systemctl restart containerd
```
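A quick check that the change was applied:

```bash
# Should print: SystemdCgroup = true
grep -n 'SystemdCgroup' /etc/containerd/config.toml
```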
13. Install kubeadm, kubelet, and kubectl
- `kubeadm`: the command that bootstraps the cluster.
- `kubelet`: runs on every node in the cluster and starts Pods and containers.
- `kubectl`: the command-line tool for talking to the cluster.
Configure the Kubernetes yum repository
```bash
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
```
Install
```bash
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
```
⚠️ Note: the kubelet now restarts every few seconds. This is expected: it is stuck in a crash loop waiting for instructions from kubeadm.
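You can confirm this is the expected crash loop rather than a configuration error by looking at the service status and recent logs, for example:

```bash
# The service typically shows activating (auto-restart) while waiting for kubeadm
systemctl status kubelet --no-pager
# Recent log lines; complaints about a missing kubelet config are typical at this stage
journalctl -u kubelet --no-pager -n 20
```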
14. Create the cluster with kubeadm
Run kubeadm init on master1, i.e. the cluster-endpoint node:
```bash
# --apiserver-advertise-address  sets the address the API server on this control-plane node advertises.
# --image-repository             specifies where to pull images from.
# --control-plane-endpoint       sets a shared endpoint for all control-plane nodes. Without it, a single-control-plane kubeadm cluster cannot later be upgraded to HA.
# --upload-certs                 uploads the certificates shared between control-plane instances to the cluster, so they can be reused when joining other master nodes.
# --kubernetes-version           specifies the Kubernetes version.
# --pod-network-cidr             specifies the Pod network range. Defaults differ per CNI; Calico defaults to 192.168.0.0/16.
kubeadm init \
  --apiserver-advertise-address=172.22.162.241 \
  --image-repository registry.aliyuncs.com/google_containers \
  --control-plane-endpoint=cluster-endpoint \
  --upload-certs \
  --kubernetes-version v1.31.9 \
  --service-cidr=10.1.0.0/16 \
  --pod-network-cidr=192.168.0.0/16 \
  --v=5
```
On success the output looks like the following; save the kubeadm join commands:
```
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join cluster-endpoint:6443 --token l3o830.bnrk2y3fsi40w10o \
    --discovery-token-ca-cert-hash sha256:87219069b3e533e86dc28d62446daebe0628b76703aa740f149b032fa7690b6a \
    --control-plane --certificate-key ee9fff16e3f8a46dd31624e68df31065cfc8ecdd007ce7cb020a492e8952eedf

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join cluster-endpoint:6443 --token l3o830.bnrk2y3fsi40w10o \
    --discovery-token-ca-cert-hash sha256:ee9fff16e3f8a46dd31624e68df31065cfc8ecdd007ce7cb020a492e8952eedf
```
- cluster-endpoint:6443: the cluster API server endpoint
- token: the authentication token for joining the cluster, valid for 24 hours
- discovery-token-ca-cert-hash: the CA certificate hash, used to verify the API server's identity
- control-plane: indicates the joining node is a control-plane node
- certificate-key: used to fetch the control-plane certificates, expires after 2 hours
Regenerating the join command manually
Get the worker join command:
kubeadm token create --print-join-command
Get the master join command:
First obtain the worker join string, then append `--control-plane --certificate-key <certificate-key>` to it; the certificate key comes from re-uploading the certs (see the sketch after the commands below).
If the token has expired
```bash
kubeadm token create
# yqiri0.oxybs1qfy7trixy3
kubeadm token create --print-join-command
# kubeadm join cluster-endpoint:6443 --token 0tx28e.kskomhxaz21efutu --discovery-token-ca-cert-hash sha256:87219069b3e533e86dc28d62446daebe0628b76703aa740f149b032fa7690b6a
```
If the certificate-key has expired
```bash
kubeadm init phase upload-certs --upload-certs
kubeadm token create --print-join-command
```
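Putting the two together, a small sketch that composes a fresh control-plane join command. The `tail -1` assumes the certificate key is the last line printed by the upload-certs phase, which is the usual output format:

```bash
# Regenerate the pieces and print a ready-to-run control-plane join command
CERT_KEY=$(kubeadm init phase upload-certs --upload-certs | tail -1)
JOIN_CMD=$(kubeadm token create --print-join-command)
echo "${JOIN_CMD} --control-plane --certificate-key ${CERT_KEY}"
```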
On master1, configure kubectl (kubeconfig and environment variable)
```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Set the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
```
Check the node information: node5 is NotReady
```
[root@k8s-node5 ~]# kubectl get nodes
NAME        STATUS     ROLES           AGE    VERSION
k8s-node5   NotReady   control-plane   115s   v1.28.2
```
Check the logs for the cause:
```bash
journalctl -xeu kubelet
# Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
```
Install the CNI network plugin: Calico
You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start until a network is installed.
- Install it with kubectl apply.
- After installing the Pod network, confirm it is working by checking that the CoreDNS Pod is Running. Once the CoreDNS Pod is up and running, you can continue joining nodes.
- Our Kubernetes version is 1.31; the compatible Calico release is 3.29.5.
Download the Calico 3.29.5 manifest
```bash
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.5/manifests/calico.yaml
kubectl create -f calico.yaml
kubectl get pod -n kube-system -o wide
```
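To wait for Calico and CoreDNS to become Ready before joining the other nodes, a sketch using the label selectors from the upstream manifests:

```bash
# Wait for the calico-node DaemonSet pods and the CoreDNS pods to become Ready
kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=calico-node --timeout=300s
kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=kube-dns --timeout=300s
# The node should now report Ready
kubectl get nodes
```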
If the installation fails, delete all the resources it created, then rerun the installation command above:
```bash
# Delete the DaemonSet and Deployment
kubectl delete daemonset calico-node -n kube-system
kubectl delete deployment calico-kube-controllers -n kube-system
# Delete the ServiceAccounts
kubectl delete serviceaccount calico-kube-controllers -n kube-system
kubectl delete serviceaccount calico-node -n kube-system
kubectl delete serviceaccount calico-cni-plugin -n kube-system
# Delete the ClusterRoles and ClusterRoleBindings
kubectl delete clusterrole calico-kube-controllers
kubectl delete clusterrole calico-node
kubectl delete clusterrole calico-cni-plugin
kubectl delete clusterrole calico-tier-getter
kubectl delete clusterrolebinding calico-kube-controllers
kubectl delete clusterrolebinding calico-node
kubectl delete clusterrolebinding calico-cni-plugin
kubectl delete clusterrolebinding calico-tier-getter
# Delete the ConfigMap
kubectl delete configmap calico-config -n kube-system
# Delete the PodDisruptionBudget
kubectl delete poddisruptionbudget calico-kube-controllers -n kube-system
# Delete the Calico CRDs
kubectl delete crd \
  bgpconfigurations.crd.projectcalico.org \
  bgpfilters.crd.projectcalico.org \
  bgppeers.crd.projectcalico.org \
  blockaffinities.crd.projectcalico.org \
  caliconodestatuses.crd.projectcalico.org \
  clusterinformations.crd.projectcalico.org \
  felixconfigurations.crd.projectcalico.org \
  globalnetworkpolicies.crd.projectcalico.org \
  globalnetworksets.crd.projectcalico.org \
  hostendpoints.crd.projectcalico.org \
  ipamblocks.crd.projectcalico.org \
  ipamconfigs.crd.projectcalico.org \
  ipamhandles.crd.projectcalico.org \
  ippools.crd.projectcalico.org \
  ipreservations.crd.projectcalico.org \
  kubecontrollersconfigurations.crd.projectcalico.org \
  networkpolicies.crd.projectcalico.org \
  networksets.crd.projectcalico.org \
  tiers.crd.projectcalico.org \
  adminnetworkpolicies.policy.networking.k8s.io
```
Alternatively, Calico can be installed with the Tigera operator manifests:
```bash
kubectl apply -f tigera-operator.yaml
kubectl apply -f custom-resources.yaml
```
Join node3 and node4 to the cluster as control-plane nodes
The certificate-key may have expired (it is valid for 2 hours); if so, regenerate it on master1, which is already part of the cluster:
```bash
kubeadm join cluster-endpoint:6443 --token l3o830.bnrk2y3fsi40w10o \
  --discovery-token-ca-cert-hash sha256:87219069b3e533e86dc28d62446daebe0628b76703aa740f149b032fa7690b6a \
  --control-plane --certificate-key ee9fff16e3f8a46dd31624e68df31065cfc8ecdd007ce7cb020a492e8952eedf
```
- Calico only needs to be installed once for the whole cluster.
- However, kubectl still needs to be initialized (kubeconfig set up) on each new master.
Join node1 and node2 to the cluster as workers
```bash
kubeadm join cluster-endpoint:6443 --token l3o830.bnrk2y3fsi40w10o \
  --discovery-token-ca-cert-hash sha256:ee9fff16e3f8a46dd31624e68df31065cfc8ecdd007ce7cb020a492e8952eedf
```
- Worker nodes do not need kubectl, so the kubectl setup can be skipped; worker nodes also cannot list the cluster's nodes. Verify the join from a master node, as shown below.
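Run the check from any master node; optionally give the workers a role label so the ROLES column is populated (the label is a common convention rather than a requirement):

```bash
# On a master node: all five nodes should eventually show Ready
kubectl get nodes -o wide
# Optional: label the worker nodes so ROLES is not <none>
kubectl label node k8s-node1 node-role.kubernetes.io/worker=
kubectl label node k8s-node2 node-role.kubernetes.io/worker=
```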
Install KubeSphere
Install Helm
- Helm needs to be installed on a master node
1. Download the Helm release package from github.com/helm/helm/r…
2. Extract it and move the binary: mv linux-amd64/helm /usr/local/bin/helm
3. Run helm version to verify the installation
4. Change the chart repository; the Azure mirror is recommended:
```bash
helm repo remove stable
helm repo add stable http://mirror.azure.cn/kubernetes/charts/
helm repo update
```
Install the KubeSphere core (ks-core) with Helm
```bash
helm upgrade --install \
  -n kubesphere-system \
  --create-namespace \
  ks-core \
  https://charts.kubesphere.com.cn/main/ks-core-1.1.4.tgz \
  --debug
```
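After the Helm release is installed, you can watch the KubeSphere pods come up and locate the console service; this is a sketch, and the ks-console service name and NodePort 30880 follow the ks-core chart defaults:

```bash
# Watch the KubeSphere components start
kubectl get pods -n kubesphere-system -w
# The web console is exposed by the ks-console service (NodePort 30880 by default)
kubectl -n kubesphere-system get svc ks-console
```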