Multi-Cluster K8S Monitoring: Design and Implementation (Part 3): Onboarding Multiple K8S Clusters

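1. Overview

The previous article covered what VictoriaMetrics does and how to deploy it. This article covers onboarding multiple K8S clusters, using vmagent to collect metrics inside each cluster and push them to VictoriaMetrics.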

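1.1 What vmagent does

In this monitoring setup, vmagent collects metrics from each K8S cluster, processes them, and forwards them to VictoriaMetrics for storage. It accepts data over several widely used protocols, including the Prometheus exposition format, the InfluxDB line protocol, and the Graphite protocol, so it can work with the output of many kinds of monitoring targets. It also supports filtering, relabeling, and aggregation. Filtering drops useless or redundant metrics, which reduces both network traffic and storage cost. Relabeling adds, rewrites, or removes labels so that later queries and analysis are easier. Aggregation merges fine-grained series into coarser ones according to configured rules, which further reduces storage and speeds up queries.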
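As an illustration of the filtering and relabeling described above, here is a minimal sketch of a scrape job that uses Prometheus-compatible relabeling rules, which vmagent accepts. The job name, target address, metric prefix, and label names are hypothetical and are not part of the configuration used later in this article. Aggregation is configured separately and is not shown.

```yaml
# Hypothetical scrape job showing vmagent filtering and relabeling.
# The job name, target address, metric prefix and label names are made up.
scrape_configs:
  - job_name: 'example-app'
    static_configs:
      - targets: ['example-app.default.svc:8080']
    # Applied to targets before scraping: attach a constant label.
    relabel_configs:
      - target_label: team
        replacement: platform
    # Applied to scraped samples before they are forwarded.
    metric_relabel_configs:
      # Filtering: drop every series whose metric name starts with go_gc_.
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
      # Relabeling: remove a label that is not needed downstream.
      - action: labeldrop
        regex: 'pod_template_hash'
```

Dropping series in metric_relabel_configs happens after the scrape, so it saves remote-write traffic and storage but not the scrape itself.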

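1.2 Data collection

The scrape jobs rely on the following components to provide the different kinds of metrics:

1. kube-state-metrics: metrics about the state of resources in the K8S cluster

2. kubelet: container resource usage, total node capacity, and similar data

3. node-exporter: resource usage of each node

4. blackbox-exporter: connectivity checks such as port and TCP probes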
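The ConfigMap in section 2.5 does not include a job for blackbox-exporter. A minimal sketch of one, following the usual blackbox-exporter relabeling pattern, might look like this. The exporter address (blackbox-exporter.monitoring.svc:9115), the tcp_connect module, and the probed target are assumptions for illustration.

```yaml
# Hypothetical TCP connectivity probe through blackbox-exporter.
# The exporter address, module name and probed target are assumptions.
scrape_configs:
  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]          # module must exist in the exporter's config
    static_configs:
      - targets: ['example.internal:443']   # endpoint to probe
    relabel_configs:
      # Pass the original target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed endpoint visible as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to blackbox-exporter itself.
      - target_label: __address__
        replacement: blackbox-exporter.monitoring.svc:9115
```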

2. Deploying vmagent

2.1 Create the namespace

kubectl create ns monitoring

2.2 Create the RBAC resources

# ServiceAccount: disable automatic token mounting
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vmagent
  namespace: monitoring
automountServiceAccountToken: false
---
# ClusterRole: minimal cluster-wide permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmagent-role
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - nodes/proxy
      - pods
      - services
      - endpoints
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "ingresses/status"]
    verbs: ["get", "list", "watch"]
---
# Role: namespace-scoped access to Secrets (tighter permission isolation)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vmagent-secrets
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
# ClusterRoleBinding: binds the cluster-wide permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmagent-monitor-binding
subjects:
  - kind: ServiceAccount
    name: vmagent
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: vmagent-role
  apiGroup: rbac.authorization.k8s.io
---
# RoleBinding: binds the Secret permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vmagent-secret-binding
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: vmagent
    namespace: monitoring
roleRef:
  kind: Role
  name: vmagent-secrets
  apiGroup: rbac.authorization.k8s.io

2.3 Create the token Secret

apiVersion: v1
kind: Secret
metadata:
  name: vmagent-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: vmagent
  labels:
    app: victoriametrics
    component: vmagent
type: kubernetes.io/service-account-token

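2.4 Create the TLS Secret

Note: before running the command below, copy the vmagent client certificate files created in Part 2 (vmagent.crt, vmagent.key, and ca.crt) into the current directory.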

kubectl -n monitoring create secret generic vmagent-tls \
  --from-file=client.crt=vmagent.crt \
  --from-file=client.key=vmagent.key \
  --from-file=ca.crt=ca.crt

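2.5 Create the ConfigMap

Note: set external_labels to values that identify each cluster; they are used to filter by cluster in Grafana dashboards.

The ConfigMap mainly configures service discovery for the scrape jobs.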

---
apiVersion: v1
data:
  scrape.yml: |-
    global:
      scrape_interval: 15s
      external_labels:
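        region: "China-XXX"            # placeholder: external label that identifies the region
        klustername: "example-cluster" # placeholder: external label that identifies the cluster

    scrape_configs:
      # 1. Core infrastructure monitoring (node-exporter on port 9100 of every node)
      - job_name: 'example-cluster'    # placeholder: set per cluster to tell clusters apart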
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
          - target_label: __scheme__
            replacement: http
          - source_labels: [__meta_kubernetes_node_address_Hostname]
            target_label: node
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)

      # 2. kube-state-metrics monitoring
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node
          - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: kube-state-metrics;true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_port]
            target_label: __metrics_port__
            regex: (\d+)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__address__]
            target_label: __address__
            regex: (.+?)(\:\d+)?
            replacement: $1:8080

      # 3. cAdvisor container resource monitoring
      - job_name: 'kubernetes-cadvisor'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__meta_kubernetes_node_address_InternalIP]
            target_label: __address__  # scrape the kubelet via the node's InternalIP
            replacement: "${1}:10250"
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
          - target_label: __scheme__
            replacement: https
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true

      # 4. Kubelet monitoring
      - job_name: 'kubernetes-nodes'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__meta_kubernetes_node_address_InternalIP]
            target_label: __address__  # scrape the kubelet via the node's InternalIP
            replacement: "${1}:10250"
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node
          - target_label: __metrics_path__
            replacement: /metrics
          - target_label: __scheme__
            replacement: https
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true

      # 5. API server monitoring
      - job_name: 'kubernetes-apiservers'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true

      # 6. VictoriaMetrics Agent
      - job_name: 'vmagent'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
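          # Keep only the vmagent pods; the value must match the
          # app.kubernetes.io/name label set in the Deployment (section 2.6).
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            action: keep
            regex: victoria-metrics-vmagent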
          - source_labels: [__address__]
            target_label: __address__
            regex: (.+?)(:\d+)?
            replacement: ${1}:8429
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod

kind: ConfigMap
metadata:
  name: vmagent-scrape-config
  namespace: monitoring
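Because every cluster runs its own vmagent with its own copy of this ConfigMap, only the identifying values need to differ from cluster to cluster. A minimal sketch of the global section for a hypothetical second cluster follows; the label values are placeholders.

```yaml
# scrape.yml fragment for a hypothetical second cluster.
# Only the identifying values change; the scrape_configs stay the same
# (apart from the per-cluster job_name of the first job).
global:
  scrape_interval: 15s
  external_labels:
    region: "region-b"            # placeholder value
    klustername: "cluster-b"      # placeholder value
```

vmagent attaches these labels to every sample it forwards, so a dashboard variable or query can select one cluster with a matcher such as {klustername="cluster-b"}.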

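2.6 Create the Service and Deployment

Note: if the domain name used for the vminsert server certificate in Part 2 is not resolvable through DNS (for example, a made-up name), map it to the vminsert IP with hostAliases in the Deployment. The manifest below also contains Helm template expressions ({{ ... }}), so render it with Helm or replace those expressions with literal values before applying it.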

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: victoria-metrics-vmagent
  namespace: monitoring
  labels:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: vmagent
      app.kubernetes.io/name: victoria-metrics-vmagent
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: vmagent
        app.kubernetes.io/name: victoria-metrics-vmagent
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/vmagent-cm.yaml") . | sha256sum }}
    spec:
      imagePullSecrets:
      - name: ng-cloud
      serviceAccountName: vmagent
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app.kubernetes.io/name
                      operator: In
                      values: ["victoria-metrics-vmagent"]
                topologyKey: kubernetes.io/hostname
      containers:
        - name: vmagent
          image: "{{ .Values.registry }}{{ if .Values.registry }}/{{ end }}vmagent:{{ .Values.vmagentVersion }}"
          imagePullPolicy: {{ if .Values.registry }}Always{{ else }}IfNotPresent{{ end }}
          args:
            - '--envflag.enable'
            - '--envflag.prefix=VM_'
            - '--httpListenAddr=:8429'
            - '--loggerFormat=json'
            - '--promscrape.config=/config/scrape/scrape.yml'
            - '--remoteWrite.tmpDataPath=/tmpData'
            - '--remoteWrite.url=https://xzd.vminsert.vm:30480/insert/0/prometheus'
            - '--remoteWrite.tlsInsecureSkipVerify=true'
          ports:
            - containerPort: 8429
              name: http
          livenessProbe:
            tcpSocket:
              port: http
            initialDelaySeconds: 5
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 15
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: 250m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - mountPath: /tmpData
              name: tmpdata
            - mountPath: /config/scrape
              name: scrape-config
            - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
              name: k8s-token
              readOnly: true
            - mountPath: /etc/tls
              name: vmagent-tls
              readOnly: true
      hostAliases:
        - ip: "xxx.xxx.xxx.xxx"            # IP address used to reach vminsert
          hostnames: ["abc.vminsert.vm"]   # domain name from the server certificate; must match the host in --remoteWrite.url
      volumes:
        - emptyDir: {}
          name: tmpdata
        - name: scrape-config
          configMap:
            name: vmagent-scrape-config
        - name: k8s-token
          secret:
            secretName: vmagent-token
        - name: vmagent-tls
          secret:
            secretName: vmagent-tls
---
apiVersion: v1
kind: Service
metadata:
  name: vmagent-victoria-metrics-agent
  namespace: monitoring
  labels:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
spec:
  type: NodePort
  ports:
    - port: 8429
      targetPort: http
      name: http
  selector:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
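The Deployment mounts the vmagent-tls Secret at /etc/tls, but none of the listed arguments reference those files, and --remoteWrite.tlsInsecureSkipVerify=true disables verification of the vminsert server certificate. If the intent is mutual TLS with the certificates from Part 2, the arguments could be extended roughly as sketched below. This assumes the key names from section 2.4 (client.crt, client.key, ca.crt) and vmagent's remoteWrite TLS flags; check the flag names against the vmagent version in use.

```yaml
# Sketch: additional vmagent container args for mutual TLS on remote write.
# File names follow the keys of the vmagent-tls Secret mounted at /etc/tls.
args:
  # ... keep the existing arguments from the Deployment, but remove
  # '--remoteWrite.tlsInsecureSkipVerify=true' so the server is verified ...
  # Verify the vminsert server certificate against the CA from Part 2.
  - '--remoteWrite.tlsCAFile=/etc/tls/ca.crt'
  # Present the vmagent client certificate and key to vminsert.
  - '--remoteWrite.tlsCertFile=/etc/tls/client.crt'
  - '--remoteWrite.tlsKeyFile=/etc/tls/client.key'
```

With verification enabled, the host name in --remoteWrite.url has to be one of the names in the server certificate, which is where the hostAliases entry becomes necessary for a name that DNS cannot resolve.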

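Conclusion

To monitor multiple K8S clusters, deploy vmagent in each cluster.

The service discovery configuration in the ConfigMap above also requires node-exporter and kube-state-metrics to be deployed in each K8S cluster.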
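The scrape jobs in the ConfigMap put some requirements on how these two components are deployed: the kube-state-metrics job keeps only a Service named kube-state-metrics that is annotated with prometheus.io/scrape: "true" and scrapes it on port 8080, and the node job rewrites each node address from port 10250 to port 9100. A minimal sketch that satisfies both is shown below. The namespaces, pod labels, and image tag are assumptions, and the kube-state-metrics Deployment itself (with its own RBAC) is omitted.

```yaml
# Service that the kube-state-metrics scrape job can discover.
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics          # matched by the keep rule in the job
  namespace: kube-system            # assumption: the job does not filter by namespace
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule in the job
spec:
  selector:
    app.kubernetes.io/name: kube-state-metrics   # assumption: label on the pods
  ports:
    - name: http-metrics
      port: 8080                    # the job rewrites the target port to 8080
      targetPort: 8080
---
# node-exporter listening on port 9100 of every node's own address.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      hostNetwork: true             # makes :9100 reachable on the node IP
      hostPID: true
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:<version>   # placeholder tag
          args:
            - '--web.listen-address=:9100'
          ports:
            - containerPort: 9100
              name: metrics
```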

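That is all for this article. Later articles will cover further details: alerting rules, alert delivery, customizing Grafana dashboards, and so on.

A few screenshots of the resulting Grafana dashboards: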
