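I. Overview
The previous article covered what VictoriaMetrics does and how to deploy it. This article covers connecting multiple K8S clusters to it, using vmagent in each K8S cluster to collect metrics and push them to VictoriaMetrics.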
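1.1 What vmagent does
In this monitoring setup, vmagent collects data inside each K8S cluster, processes it, and forwards it to the VictoriaMetrics database. vmagent has strong collection capabilities: it supports several mainstream monitoring data protocols, such as the Prometheus metrics format, InfluxDB, and Graphite, so it can handle the output formats of different kinds of monitoring targets. It can also process data before forwarding it, through filtering, relabeling, and aggregation. Filtering drops useless or redundant metrics, which reduces network traffic and storage cost. Relabeling adds, modifies, or removes labels on metrics as needed, which makes later querying and analysis easier. Aggregation merges many fine-grained metrics into coarser ones according to configurable rules, further reducing storage and speeding up queries.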
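As an illustration of filtering and relabeling, the fragment below uses standard `metric_relabel_configs` rules, which vmagent applies to scraped samples before forwarding them. It is a sketch only: the job name, target address, the `go_.*` metric pattern, and the `pod_template_hash` label are hypothetical examples, not part of this article's configuration. Aggregation is configured separately through vmagent's stream aggregation feature and is not shown here.

```yaml
# Sketch only: a hypothetical scrape job showing filtering and relabeling.
# The job name, target, metric pattern and label names are examples.
- job_name: 'example-job'
  static_configs:
    - targets: ['example-host:9100']
  metric_relabel_configs:
    # Filtering: drop every scraped series whose metric name starts with go_
    - source_labels: [__name__]
      regex: 'go_.*'
      action: drop
    # Relabeling: copy the value of the instance label into a new host label
    - source_labels: [instance]
      target_label: host
    # Relabeling: delete the pod_template_hash label from every series
    - regex: 'pod_template_hash'
      action: labeldrop
```

`metric_relabel_configs` runs after the scrape, so it can act on metric names; `relabel_configs` (used throughout section 2.5) runs before the scrape and acts on targets.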
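1.2 Data collection
Metrics are collected from the following sources:
1. kube-state-metrics: metrics about the state of the various resources in the K8S cluster
2. kubelet (including its built-in cAdvisor endpoint): container resource usage, node resource capacity, and similar data
3. node-exporter: node resource usage
4. blackbox_exporter: connectivity probes for ports, TCP connections, and so on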
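The scrape configuration in section 2.5 contains no blackbox job. If blackbox_exporter is deployed, a job like the one below could be appended under `scrape_configs`; it follows the usual blackbox pattern of passing the real target to the exporter's `/probe` endpoint as the `target` URL parameter. This is a sketch under assumptions: the exporter Service address `blackbox-exporter.monitoring.svc:9115` and the probed address are hypothetical, and `tcp_connect` must be a module defined in your blackbox_exporter configuration.

```yaml
# Sketch only: append under scrape_configs in scrape.yml.
# Assumptions: blackbox_exporter is reachable at blackbox-exporter.monitoring.svc:9115
# and its configuration defines a module named tcp_connect.
- job_name: 'blackbox-tcp'
  metrics_path: /probe
  params:
    module: [tcp_connect]                  # blackbox module that opens a TCP connection
  static_configs:
    - targets:
        - 'example-svc.default.svc:80'     # hypothetical host:port to probe
  relabel_configs:
    # Pass the original target to the exporter as ?target=...
    - source_labels: [__address__]
      target_label: __param_target
    # Keep the probed address as the instance label
    - source_labels: [__param_target]
      target_label: instance
    # Send the actual scrape request to blackbox_exporter
    - target_label: __address__
      replacement: 'blackbox-exporter.monitoring.svc:9115'
```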
II. Deploying vmagent
2.1 Create the namespace
kubectl create ns monitoring
2.2 Create the RBAC resources
# ServiceAccount: disable automatic token mounting
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vmagent
  namespace: monitoring
automountServiceAccountToken: false
---
# ClusterRole: minimal cluster-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmagent-role
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - nodes/proxy
      - pods
      - services
      - endpoints
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "ingresses/status"]
    verbs: ["get", "list", "watch"]
---
# Role: namespace-scoped Secret access (tighter permission isolation)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vmagent-secrets
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
# ClusterRoleBinding: bind the cluster-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vmagent-monitor-binding
subjects:
  - kind: ServiceAccount
    name: vmagent
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: vmagent-role
  apiGroup: rbac.authorization.k8s.io
---
# RoleBinding: bind the Secret permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vmagent-secret-binding
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: vmagent
    namespace: monitoring
roleRef:
  kind: Role
  name: vmagent-secrets
  apiGroup: rbac.authorization.k8s.io
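2.3 Create the token Secret
Automatic token mounting is disabled on the ServiceAccount, and Kubernetes 1.24 and later no longer create token Secrets for ServiceAccounts automatically, so create a long-lived token Secret for vmagent manually. Kubernetes fills in its token, ca.crt, and namespace keys, and the Deployment in section 2.6 mounts it at the standard ServiceAccount path.
apiVersion: v1
kind: Secret
metadata:
  name: vmagent-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: vmagent
  labels:
    app: victoriametrics
    component: vmagent
type: kubernetes.io/service-account-token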
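2.4 Create the TLS Secret
Note: before running the command below, upload the vmagent client certificate files created in Part 2 (vmagent.crt, vmagent.key, and ca.crt) to the current directory.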
kubectl -n monitoring create secret generic vmagent-tls \
--from-file=client.crt=vmagent.crt \
--from-file=client.key=vmagent.key \
--from-file=ca.crt=ca.crt
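2.5 Create the ConfigMap
Note: set the external labels (external_labels) as appropriate to tell clusters apart; they are used for filtering in Grafana dashboards.
The ConfigMap mainly configures service discovery (kubernetes_sd_configs) for the scrape jobs.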
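For example, when the same vmagent setup is rolled out to two clusters, only the `global.external_labels` values (and, in this configuration, the node job's `job_name`) need to differ between the clusters' ConfigMaps. The values below are hypothetical; the two YAML documents stand for the `global` section of `scrape.yml` in two different clusters.

```yaml
# Cluster A: global section of scrape.yml (hypothetical values)
global:
  scrape_interval: 15s
  external_labels:
    region: "region-a"
    klustername: "cluster-a"
---
# Cluster B: same structure, different label values (hypothetical)
global:
  scrape_interval: 15s
  external_labels:
    region: "region-b"
    klustername: "cluster-b"
```

In Grafana, a dashboard variable can then list the clusters with a query such as `label_values(up, klustername)` and filter panels on it.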
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vmagent-scrape-config
  namespace: monitoring
data:
  scrape.yml: |-
    global:
      scrape_interval: 15s
      external_labels:
        region: "China-XXX"          # label used to tell clusters apart
        klustername: "XXX-cluster"   # label used to tell clusters apart
    scrape_configs:
      # 1. Core infrastructure: node-exporter on every node
      - job_name: 'XXX-cluster'      # set per cluster to tell clusters apart
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          # The node role yields <node IP>:10250 (kubelet); switch to node-exporter's port 9100
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
          - target_label: __scheme__
            replacement: http
          - source_labels: [__meta_kubernetes_node_address_Hostname]
            target_label: node
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      # 2. kube-state-metrics
      # Keeps only a Service named kube-state-metrics annotated prometheus.io/scrape: "true"
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: kube-state-metrics;true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            target_label: __scheme__
            regex: (https?)
          # Drop any port from the Service address and scrape port 8080
          - source_labels: [__address__]
            target_label: __address__
            regex: (.+?)(\:\d+)?
            replacement: $1:8080
      # 3. cAdvisor container resource metrics (served by the kubelet)
      - job_name: 'kubernetes-cadvisor'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          # Scrape the kubelet on the node's InternalIP
          - source_labels: [__meta_kubernetes_node_address_InternalIP]
            target_label: __address__
            replacement: "${1}:10250"
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
          - target_label: __scheme__
            replacement: https
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      # 4. Kubelet
      - job_name: 'kubernetes-nodes'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          # Scrape the kubelet on the node's InternalIP
          - source_labels: [__meta_kubernetes_node_address_InternalIP]
            target_label: __address__
            replacement: "${1}:10250"
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node
          - target_label: __metrics_path__
            replacement: /metrics
          - target_label: __scheme__
            replacement: https
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      # 5. API server
      - job_name: 'kubernetes-apiservers'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: keep
            regex: default;kubernetes;https
            source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_service_name
              - __meta_kubernetes_endpoint_port_name
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
      # 6. vmagent itself
      - job_name: 'vmagent'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Must match the app.kubernetes.io/name label of the vmagent Deployment (section 2.6)
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            action: keep
            regex: victoria-metrics-vmagent
          - source_labels: [__address__]
            target_label: __address__
            regex: (.+?)(:\d+)?
            replacement: ${1}:8429
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
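2.6 Create the Service and Deployment
Note: if the domain name used for the vminsert server certificate in Part 2 was made up (not resolvable by DNS), map it to the vminsert IP in the Deployment with hostAliases (a hosts-file override).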
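As an alternative to hostAliases, vmagent can connect to vminsert by IP address and be told which name to expect in the server certificate with the `-remoteWrite.tlsServerName` flag. The fragment below sketches the container `args` for that approach; it has not been tested against this setup. The IP is the same placeholder used in the Deployment's hostAliases, `abc.vminsert.vm` stands for the domain in the vminsert server certificate, and the `/etc/tls` paths assume the vmagent-tls Secret from section 2.4 is mounted there, as in the Deployment below.

```yaml
# Sketch of an alternative to hostAliases (placeholders, not a tested setup)
args:
  - '--promscrape.config=/config/scrape/scrape.yml'
  - '--remoteWrite.tmpDataPath=/tmpData'
  # Connect to vminsert by IP (placeholder)
  - '--remoteWrite.url=https://xxx.xxx.xxx.xxx:30480/insert/0/prometheus'
  # Verify the server certificate against this name instead of the IP in the URL
  - '--remoteWrite.tlsServerName=abc.vminsert.vm'
  # CA and client certificate from the vmagent-tls Secret (section 2.4)
  - '--remoteWrite.tlsCAFile=/etc/tls/ca.crt'
  - '--remoteWrite.tlsCertFile=/etc/tls/client.crt'
  - '--remoteWrite.tlsKeyFile=/etc/tls/client.key'
```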
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: victoria-metrics-vmagent
  namespace: monitoring
  labels:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: vmagent
      app.kubernetes.io/name: victoria-metrics-vmagent
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: vmagent
        app.kubernetes.io/name: victoria-metrics-vmagent
      annotations:
        # Lines containing {{ ... }} are Helm template expressions: render them with Helm,
        # or replace them with literal values before using plain kubectl apply.
        # This one restarts the pod whenever the ConfigMap template changes.
        checksum/config: {{ include (print $.Template.BasePath "/vmagent-cm.yaml") . | sha256sum }}
    spec:
      imagePullSecrets:
        - name: ng-cloud
      serviceAccountName: vmagent
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app.kubernetes.io/name
                      operator: In
                      values: ["victoria-metrics-vmagent"]
                topologyKey: kubernetes.io/hostname
      containers:
        - name: vmagent
          image: "{{ .Values.registry }}{{ if .Values.registry }}/{{ end }}vmagent:{{ .Values.vmagentVersion }}"
          imagePullPolicy: {{ if .Values.registry }}Always{{ else }}IfNotPresent{{ end }}
          args:
            - '--envflag.enable'
            - '--envflag.prefix=VM_'
            - '--httpListenAddr=:8429'
            - '--loggerFormat=json'
            - '--promscrape.config=/config/scrape/scrape.yml'
            - '--remoteWrite.tmpDataPath=/tmpData'
            # The host must match the hostAliases entry below (the vminsert server certificate domain)
            - '--remoteWrite.url=https://abc.vminsert.vm:30480/insert/0/prometheus'
            # Use the files from the vmagent-tls Secret mounted at /etc/tls; with the CA and a matching
            # domain, the server certificate verifies without --remoteWrite.tlsInsecureSkipVerify
            - '--remoteWrite.tlsCAFile=/etc/tls/ca.crt'
            - '--remoteWrite.tlsCertFile=/etc/tls/client.crt'
            - '--remoteWrite.tlsKeyFile=/etc/tls/client.key'
          ports:
            - containerPort: 8429
              name: http
          livenessProbe:
            tcpSocket:
              port: http
            initialDelaySeconds: 5
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 15
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: 250m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - mountPath: /tmpData
              name: tmpdata
            - mountPath: /config/scrape
              name: scrape-config
            - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
              name: k8s-token
              readOnly: true
            - mountPath: /etc/tls
              name: vmagent-tls
              readOnly: true
      hostAliases:
        - ip: "xxx.xxx.xxx.xxx"            # vminsert access IP
          hostnames: ["abc.vminsert.vm"]   # access domain configured in the vminsert server certificate
      volumes:
        - emptyDir: {}
          name: tmpdata
        - name: scrape-config
          configMap:
            name: vmagent-scrape-config
        - name: k8s-token
          secret:
            secretName: vmagent-token
        - name: vmagent-tls
          secret:
            secretName: vmagent-tls
---
apiVersion: v1
kind: Service
metadata:
  name: vmagent-victoria-metrics-agent
  namespace: monitoring
  labels:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
spec:
  type: NodePort
  ports:
    - port: 8429
      targetPort: http
      name: http
  selector:
    app.kubernetes.io/instance: vmagent
    app.kubernetes.io/name: victoria-metrics-vmagent
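Closing remarks
To monitor multiple K8S clusters, deploy vmagent in each K8S cluster, changing only the cluster-specific values (external_labels and the node job's job_name) in its ConfigMap.
The auto-discovery configuration in the ConfigMap above also requires node-exporter and kube-state-metrics to be deployed in each K8S cluster.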
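The scrape jobs in section 2.5 make two assumptions about these components. The node job rewrites each node's kubelet address to port 9100, so node-exporter must listen on port 9100 of every node's own IP (typically a DaemonSet with hostNetwork). The kube-state-metrics job only keeps a Service named kube-state-metrics annotated with prometheus.io/scrape: "true", and scrapes it on port 8080. The manifests below are a minimal sketch of those two requirements, not complete deployments: the image tag, namespaces, and pod labels are assumptions to adapt, and kube-state-metrics itself (its Deployment and RBAC) is not shown.

```yaml
# Sketch: node-exporter listening on port 9100 of every node (image tag is an assumption)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      hostNetwork: true          # expose :9100 on the node IP that the node job targets
      hostPID: true
      tolerations:
        - operator: Exists       # also run on tainted nodes such as control-plane nodes
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:latest   # pin a specific version in practice
          args:
            - '--path.rootfs=/host'
            - '--web.listen-address=:9100'
          ports:
            - containerPort: 9100
              name: metrics
          volumeMounts:
            - name: rootfs
              mountPath: /host
              readOnly: true
              mountPropagation: HostToContainer
      volumes:
        - name: rootfs
          hostPath:
            path: /
---
# Sketch: the Service fields that the 'kube-state-metrics' scrape job filters on
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics            # must match the keep regex
  namespace: kube-system              # role: service discovers Services in all namespaces
  annotations:
    prometheus.io/scrape: "true"      # required by the keep regex
spec:
  selector:
    app.kubernetes.io/name: kube-state-metrics   # assumption: match your kube-state-metrics pod labels
  ports:
    - name: http-metrics
      port: 8080
      targetPort: 8080
```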
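That's all for this article. Later articles will cover the remaining details: alerting rules, alert notifications, Grafana dashboard customization, and more.
A few Grafana screenshots: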