Connecting Self-Built Kubernetes on Tencent Cloud to CFS File Storage

Environment

Server: Lightweight Application Server in a private network (VPC); it must be attached to a Cloud Connect Network (CCN) for private-network interconnection

k8s: single-master cluster, v1.19.16, installed with the sealyun one-click installer

CSI plugin: kubernetes-csi-tencentcloud

Deployment Steps

Create the File System and Mount Target

  • Create a VPC. Its CIDR must not overlap with the VPC automatically created for the Lightweight Application Server, otherwise the CCN cannot interconnect them
  • Create the file system and its mount target, and note the IP that clients mount from: 172.16.0.10

[Screenshot: image-20220314134502850]

Verify Network Connectivity

  • Apply for a CCN and attach the Lightweight Application Server instance to it;
  • Attach the file system's VPC to the same CCN
  • From the Lightweight Application Server, telnet to the file storage NFS ports to verify connectivity
telnet 172.16.0.10 111
telnet 172.16.0.10 2049 
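For a scripted check, the same two ports can be probed with bash's built-in /dev/tcp instead of interactive telnet (a minimal sketch; the IP is the mount-target address recorded above):

```shell
# Probe a TCP port and report open/closed; 3-second timeout per attempt.
check_port() {
  timeout 3 bash -c "</dev/tcp/$1/$2" 2>/dev/null \
    && echo "$1:$2 open" || echo "$1:$2 closed"
}
check_port 172.16.0.10 111    # rpcbind
check_port 172.16.0.10 2049   # nfsd
```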

[Screenshot: image-20220314134410000]

Mount Test

mkdir /localfolder
sudo mount -t nfs -o vers=4.0,noresvport 172.16.0.10:/ /localfolder
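Once mounted, a quick write/read round trip confirms the share actually works (a small helper sketch; /localfolder matches the mount above, and the function works on any directory):

```shell
# Sanity-check a mounted directory: write a marker file, read it back, clean up.
check_mount() {
  dir="$1"
  if ! echo "cfs-write-test" > "$dir/.cfs_probe" 2>/dev/null; then
    echo "mount FAIL: $dir (not writable)"
    return
  fi
  content="$(cat "$dir/.cfs_probe")"
  rm -f "$dir/.cfs_probe"
  if [ "$content" = "cfs-write-test" ]; then
    echo "mount OK: $dir"
  else
    echo "mount FAIL: $dir (read mismatch)"
  fi
}
check_mount /localfolder
```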

Install the CSI Plugin

Prepare API Credentials

Create a sub-user, grant it full read/write permission on CFS, and record its API credentials

SecretId AKID4xaw2rRknZ4IFvKSJtCEzUifLksjKfkY 
SecretKey xxxxxxxxxxxxxxxxxxxxxxx
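The provisioner manifest typically stores these credentials in a Kubernetes Secret. If the yaml uses the Secret's data field (rather than stringData), the values must be base64-encoded before being pasted in — a quick sketch:

```shell
# Base64-encode the SecretId/SecretKey before pasting them into the Secret
# section of csi-provisioner-cfsplugin.yaml. The -n flag drops the trailing
# newline, which would otherwise corrupt the encoded value.
echo -n "AKID4xaw2rRknZ4IFvKSJtCEzUifLksjKfkY" | base64
echo -n "your-secret-key" | base64
```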
Install the CSI-CFS Plugin

Plugin GitHub repo: kubernetes-csi-tencentcloud/cfs

  • RBAC
kubectl apply -f  deploy/cfs/kubernetes/csi-cfs-rbac.yaml
  • Install the plugin version matching your k8s version, and fill the API credentials into the csi-provisioner-cfsplugin.yaml file
kubectl apply -f  deploy/cfs/kubernetes/csi-cfs-csidriver-old.yaml
kubectl apply -f  deploy/cfs/kubernetes/csi-nodeplugin-cfsplugin-new.yaml
kubectl apply -f  deploy/cfs/kubernetes/csi-provisioner-cfsplugin-new.yaml
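For orientation, the credential section being filled in is a Kubernetes Secret along these lines — the Secret name and key names below are illustrative assumptions, not necessarily the plugin's exact ones; keep whatever names the repo's yaml already uses:

```yaml
# Illustrative sketch only: real names come from csi-provisioner-cfsplugin.yaml.
apiVersion: v1
kind: Secret
metadata:
  name: csi-cfs-secret          # assumed name
  namespace: kube-system
data:
  SecretId: <base64-encoded SecretId>
  SecretKey: <base64-encoded SecretKey>
```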

Static Mount Test

  • Create the web directory
mkdir -p /localfolder/web
  • static-allinone
vim static-allinone.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: web-static-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 10Gi
  csi:
    driver: com.tencent.cloud.csi.cfs
    volumeHandle: web-static-pv
    volumeAttributes: 
      host: 172.16.0.10
      path: /web       # if this is not the root directory, create it in advance
  storageClassName: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-static-pvc
spec:
  storageClassName: ""
  volumeName: web-static-pv
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx 
spec:
  containers:
  - image: ccr.ccs.tencentyun.com/qcloud/nginx:1.9
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    volumeMounts:
      - mountPath: /usr/share/nginx/html
        name: data-vol
  volumes:
  - name: data-vol
    persistentVolumeClaim:
      claimName: web-static-pvc
  • Apply the manifest and test
kubectl apply -f static-allinone.yaml
echo "hello cfs" > /localfolder/web/index.html

[Screenshot: image-20220314142049299]

Dynamic Provisioning

vim dynamic-provison-allinone.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cfsauto
parameters:
  vpcid: vpc-klr8fwg5
  subnetid: subnet-63c5x24c
  resourcetags: ""
provisioner: com.tencent.cloud.csi.cfs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-cfsplugin
spec:
  storageClassName: cfsauto
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: ccr.ccs.tencentyun.com/qcloud/nginx:1.9
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    volumeMounts:
      - mountPath: /usr/share/nginx/html
        name: data-cfsplugin
  volumes:
  - name: data-cfsplugin
    persistentVolumeClaim:
      claimName: data-cfsplugin
kubectl apply -f dynamic-provison-allinone.yaml
  • A new CFS file system is created automatically; this StorageClass apparently cannot bind directly to a subpath of an existing CFS instance (to be investigated)
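On the open question above: this provisioner creates a brand-new file system per PVC. To reuse a subdirectory of an existing CFS instance, the static PV from the previous section (volumeAttributes.path) is the working approach; alternatively, standard Kubernetes subPath narrows a Pod's mount to one subdirectory of whatever volume it is given:

```yaml
# Standard Kubernetes subPath: the container sees only the volume's web/ subdirectory.
    volumeMounts:
      - mountPath: /usr/share/nginx/html
        name: data-cfsplugin
        subPath: web
```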

[Screenshot: image-20220314192025136]
