prometheus + grafana + node_exporter (CSDN blog)

Article link: https://blog.csdn.net/abcdefglouy/article/details/132737778

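I. Install Prometheus
Prometheus is an open-source service-monitoring system and time-series database. It provides a general data model together with interfaces for fast data collection, storage and querying. Its core component, the Prometheus server, periodically pulls data from targets that are either configured statically or discovered automatically through service discovery. Newly scraped samples are first buffered in memory and then persisted to the local storage device.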

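    1. Each monitored host exposes its monitoring data through a dedicated exporter program. The exporter collects metrics on the target and exposes them over an HTTP endpoint that the Prometheus server queries; Prometheus collects the data periodically by pulling over HTTP.
    2. Every target must first be registered with the monitoring system before its time series can be collected, stored, alerted on and displayed. Targets can be listed statically in the configuration, or managed dynamically by Prometheus through service discovery.
    3. Prometheus can use the Kubernetes API Server directly as a service-discovery system, and thereby dynamically discover and monitor every monitorable object in the cluster.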
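To make the pull model above concrete, here is a minimal sketch in Python using only the standard library. It is not how real exporters are written (they normally use a client library such as prometheus_client); a toy exporter serves one hypothetical gauge, `demo_up`, in the Prometheus text format at /metrics, and `scrape()` pulls it over HTTP the way the Prometheus server does once per scrape_interval:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric, rendered in the Prometheus text exposition format
METRICS_TEXT = (
    "# HELP demo_up Whether the demo service is up.\n"
    "# TYPE demo_up gauge\n"
    "demo_up 1\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves METRICS_TEXT at /metrics, as an exporter would."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_exporter(port=0):
    """Start the toy exporter in a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(host, port, path="/metrics", timeout=5):
    """Pull the exposition text once, as Prometheus does on every scrape."""
    with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    srv = start_exporter()
    print(scrape("127.0.0.1", srv.server_address[1]))
    srv.shutdown()
    srv.server_close()
```

The exporter never pushes anything; it only answers when asked, which is why every target has to be known to Prometheus in advance (point 2 above).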

Download the archive
wget http://github.com/prometheus/prometheus/releases/download/v2.44.0/prometheus-2.44.0.linux-amd64.tar.gz
 
Extract it
tar -zvxf prometheus-2.44.0.linux-amd64.tar.gz
 
Enter the prometheus directory
cd prometheus-2.44.0.linux-amd64
 
Start prometheus
./prometheus --config.file=prometheus.yml &

Open localhost:9090 or <ip>:9090 in a browser; if the Prometheus page loads, the installation succeeded.
II. Add the Locust service to Prometheus
Basic environment setup

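# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# If you keep the firewall on, open the port instead (skip this if the firewall is off).
# Open the other ports used in this guide the same way: 9090 (Prometheus), 8089 and 5557 (Locust master), 3000 (Grafana)
firewall-cmd --zone=public --add-port=9100/tcp --permanent
# Reload the firewall rules (skip this if the firewall is off)
firewall-cmd --reload

# Disable SELinux permanently (takes effect after a reboot); only the SELINUX= line is changed
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# Take effect immediately for the current boot (switches SELinux to permissive mode)
setenforce 0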
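Once the firewall is off (or the ports are open), you can check from the Prometheus machine that a target port such as 9100 is actually reachable. A minimal sketch using a plain TCP connect from Python's socket module; the function name `port_open` is hypothetical and the address in the comment is a placeholder:

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure...
        return False


# Usage (placeholder address): port_open("172.30.18.244", 9100)
```

A successful connect only proves that something is listening and the firewall lets the traffic through; it does not check that the listener is an exporter.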

# Edit prometheus.yml; fill in the Locust master's IP:port
 
 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["192.168.24.228:9090"]
        labels:
          instance: prometheus
  - job_name: locust
    metrics_path: '/export/prometheus'
    static_configs:
      - targets: ['192.168.24.228:8089']
        labels:
          instance: locust

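After editing the configuration file, restart Prometheus so that the change takes effect (rebooting the CentOS 7 machine also works if Prometheus is set up to start at boot).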
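Configuration notes:
1. global block: global settings of the Prometheus server
➢ scrape_interval: the interval between scrapes; defaults to 1 minute.
➢ evaluation_interval: how often rules are evaluated (which is when alerts are generated); defaults to 1 minute.
2. rule_files block: the rule configuration files
3. scrape_configs block: the scrape targets, i.e. what Prometheus monitors. Prometheus exposes its own runtime information over HTTP, so it can monitor itself.
➢ job_name: the name of the scrape job
➢ static_configs: static target configuration, i.e. always pull from fixed targets
➢ targets: the endpoints to scrape, i.e. where the data is pulled from. For the prometheus job, Prometheus pulls from http://ip:9090/metrics; for the locust job, metrics_path overrides the default path, so it pulls from http://ip:8089/export/prometheus.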
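Put together, Prometheus requests `<scheme>://<target><metrics_path>` for every entry in targets, with scheme defaulting to http and metrics_path to /metrics. The hypothetical helper below only illustrates that rule for the two jobs in the prometheus.yml above; it ignores params, basic auth and relabeling:

```python
def scrape_url(target, metrics_path="/metrics", scheme="http"):
    """Illustrative only: the URL Prometheus requests for one static target."""
    return f"{scheme}://{target}{metrics_path}"


# The two jobs from prometheus.yml above
print(scrape_url("192.168.24.228:9090"))                       # → http://192.168.24.228:9090/metrics
print(scrape_url("192.168.24.228:8089", "/export/prometheus"))  # → http://192.168.24.228:8089/export/prometheus
```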
Prometheus can also reload its configuration at runtime without a restart; to allow this over HTTP, start it with --web.enable-lifecycle.
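With --web.enable-lifecycle set, the reload is triggered by an HTTP POST to /-/reload (the same as `curl -X POST http://ip:9090/-/reload`); without the flag, sending SIGHUP to the Prometheus process also reloads the configuration. A minimal Python sketch of the HTTP variant; the function names and the base URL are placeholders:

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build a POST to /-/reload; only honoured when Prometheus runs with --web.enable-lifecycle."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=10):
    """Send the reload request; urlopen raises HTTPError if Prometheus refuses it."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status  # 200 when the configuration was reloaded


# Usage (placeholder address): reload_prometheus("http://192.168.24.228:9090")
```

Running `./promtool check config prometheus.yml` before reloading avoids pushing a broken file.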

To run it in the background:
nohup ./prometheus --config.file=prometheus.yml > ./prometheus.log 2>&1 &
Open http://xx.xx.xx.xx:9090/ in a browser.
➢ Click Status, then select Targets:
if the prometheus target is in the up state, Prometheus was installed and started successfully.
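The same Status → Targets check can be scripted against the Prometheus HTTP API: GET /api/v1/targets returns every active target with a `health` field ("up", "down" or "unknown"). A sketch with hypothetical function names and a placeholder address; the parsing is kept separate so it can be tried on a saved response:

```python
import json
import urllib.request


def target_health(api_response):
    """Map (job, instance) -> health from a parsed /api/v1/targets response."""
    result = {}
    for t in api_response["data"]["activeTargets"]:
        labels = t["labels"]
        result[(labels.get("job"), labels.get("instance"))] = t["health"]
    return result


def fetch_targets(base_url="http://localhost:9090", timeout=10):
    """GET /api/v1/targets and return the decoded JSON."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


# Usage (placeholder address):
# print(target_health(fetch_targets("http://192.168.24.228:9090")))
```

Because the prometheus.yml above overrides the instance label, the keys come out as ("prometheus", "prometheus") and ("locust", "locust").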
Stop Prometheus

Find the process listening on port 9090 and kill it:
netstat -tlanp | grep 9090
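In that netstat output the PID is the number before the slash in the last column (PID/Program name). A minimal Python sketch that extracts it and sends SIGTERM, the same signal a plain `kill <PID>` sends, which lets Prometheus shut down cleanly. It assumes net-tools' netstat with its default Linux column layout, and needs root to see the PIDs of other users' processes; the function names are hypothetical:

```python
import os
import signal
import subprocess


def pid_listening_on(netstat_output, port):
    """Return the PID of the process LISTENing on `port`, parsed from `netstat -tlanp` text."""
    for line in netstat_output.splitlines():
        cols = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(cols) >= 7 and cols[5] == "LISTEN" and cols[3].rsplit(":", 1)[-1] == str(port):
            pid = cols[6].split("/", 1)[0]  # "-" when netstat may not show the PID
            if pid.isdigit():
                return int(pid)
    return None


def stop_port(port=9090):
    """Find the listener on `port` and send it SIGTERM; returns the PID or None."""
    out = subprocess.run(["netstat", "-tlanp"], capture_output=True, text=True).stdout
    pid = pid_listening_on(out, port)
    if pid is not None:
        os.kill(pid, signal.SIGTERM)
    return pid
```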

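---- Run Locust ---- using prometheus_exporter.py as the locustfile

It can be used in two ways:

a. Edit the file directly and replace the placeholder user class in the script with your own load-test class. When the test starts, the service at ip:8089/export/prometheus starts automatically; the data it serves is exactly what we want to collect.

b. Start this script as the master, and start the load-test scripts as workers whose master address points at the machine running this script.

The advantage of b is that the monitoring endpoint can stay up permanently, while with a it is only available while a test is running.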

# On CentOS 7, run the monitoring file (prometheus_exporter.py, below) as the master
locust -f prometheus_exporter.py --master
 
 
 
prometheus_exporter.py
Copy the code below as-is to provide the monitoring endpoint:
 
 
# coding: utf8
 
import os
import sys
from itertools import chain
 
from flask import request, Response
from locust import stats as locust_stats, runners as locust_runners
from locust import User, task, events
from prometheus_client import Metric, REGISTRY, exposition
 
# Project-specific: put the parent directories on sys.path so the author's own modules can be imported
pathS = os.getcwd()
Project_Path = os.path.dirname(os.path.dirname(os.path.dirname(pathS)))
root_Path = os.path.dirname(pathS)
sys.path.append(Project_Path)
sys.path.append(root_Path)
 
# This locustfile adds an external web endpoint to the locust master, and makes it serve as a prometheus exporter.
# Runs it as a normal locustfile, then points prometheus to it.
# locust -f prometheus_exporter.py --master
 
# Lots of code taken from [mbolek's locust_exporter](https://github.com/mbolek/locust_exporter), thx mbolek!
 
 
class LocustCollector(object):
    registry = REGISTRY
 
    def __init__(self, environment, runner):
        self.environment = environment
        self.runner = runner
 
    def collect(self):
        # collect metrics only when locust runner is spawning or running.
        runner = self.runner
 
        if runner and runner.state in (locust_runners.STATE_SPAWNING, locust_runners.STATE_RUNNING):
            stats = []
            for s in chain(locust_stats.sort_stats(runner.stats.entries), [runner.stats.total]):
                stats.append({
                    "method": s.method,
                    "name": s.name,
                    "num_requests": s.num_requests,
                    "num_failures": s.num_failures,
                    "avg_response_time": s.avg_response_time,
                    "min_response_time": s.min_response_time or 0,
                    "max_response_time": s.max_response_time,
                    "current_rps": s.current_rps,
                    "median_response_time": s.median_response_time,
                    "ninetieth_response_time": s.get_response_time_percentile(0.9),
                    # only total stats can use current_response_time, so sad.
                    # "current_response_time_percentile_95": s.get_current_response_time_percentile(0.95),
                    "avg_content_length": s.avg_content_length,
                    "current_fail_per_sec": s.current_fail_per_sec
                })
 
            # perhaps StatsError.parse_error in e.to_dict only works with Python workers, take note!
            errors = [e.to_dict() for e in runner.stats.errors.values()]
 
            metric = Metric('locust_user_count', 'Swarmed users', 'gauge')
            metric.add_sample('locust_user_count', value=runner.user_count, labels={})
            yield metric
 
            metric = Metric('locust_errors', 'Locust requests errors', 'gauge')
            for err in errors:
                metric.add_sample('locust_errors', value=err['occurrences'],
                                  labels={'path': err['name'], 'method': err['method'],
                                          'error': err['error']})
            yield metric
 
            is_distributed = isinstance(runner, locust_runners.MasterRunner)
            if is_distributed:
                metric = Metric('locust_slave_count', 'Locust number of slaves', 'gauge')
                metric.add_sample('locust_slave_count', value=len(runner.clients.values()), labels={})
                yield metric
 
            metric = Metric('locust_fail_ratio', 'Locust failure ratio', 'gauge')
            metric.add_sample('locust_fail_ratio', value=runner.stats.total.fail_ratio, labels={})
            yield metric
 
            metric = Metric('locust_state', 'State of the locust swarm', 'gauge')
            metric.add_sample('locust_state', value=1, labels={'state': runner.state})
            yield metric
 
            stats_metrics = ['avg_content_length', 'avg_response_time', 'current_rps', 'current_fail_per_sec',
                             'max_response_time', 'ninetieth_response_time', 'median_response_time',
                             'min_response_time',
                             'num_failures', 'num_requests']
 
            for mtr in stats_metrics:
                mtype = 'gauge'
                if mtr in ['num_requests', 'num_failures']:
                    mtype = 'counter'
                metric = Metric('locust_stats_' + mtr, 'Locust stats ' + mtr, mtype)
                for stat in stats:
                    # Aggregated stat's method label is None, so name it as Aggregated
                    # locust has changed name Total to Aggregated since 0.12.1
                    if 'Aggregated' != stat['name']:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': stat['method']})
                    else:
                        metric.add_sample('locust_stats_' + mtr, value=stat[mtr],
                                          labels={'path': stat['name'], 'method': 'Aggregated'})
                yield metric
 
 
@events.init.add_listener
def locust_init(environment, runner, **kwargs):
    print("locust init event received")
    if environment.web_ui and runner:
        @environment.web_ui.app.route("/export/prometheus")
        def prometheus_exporter():
            registry = REGISTRY
            encoder, content_type = exposition.choose_encoder(request.headers.get('Accept'))
            if 'name[]' in request.args:
                # getlist() returns every requested metric name, which restricted_registry expects
                registry = REGISTRY.restricted_registry(request.args.getlist('name[]'))
            body = encoder(registry)
            return Response(body, content_type=content_type)
 
        REGISTRY.register(LocustCollector(environment, runner))
 
 
# Placeholder user class: in usage mode a, replace it with your own load-test user class
class Dummy(User):
    @task(20)
    def hello(self):
        pass

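After restarting, open 192.168.24.228:8089/export/prometheus. If the page shows Locust metrics in the Prometheus text format (lines starting with locust_), the exporter works. Note that LocustCollector only emits the locust_ metrics while a test is spawning or running.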
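To pull a single number out of that page in a script, for example the aggregated RPS, a small parser is enough for the simple lines this exporter emits. This is only a sketch: it handles `name{label="value",...} number` lines without escaped quotes or timestamps and is not a full parser of the Prometheus text format; the function names are hypothetical:

```python
import re

# metric name, optional {labels}, value (no timestamp)
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Yield (name, labels_dict, float_value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def find_value(text, name, **labels):
    """Return the first value of `name` whose labels include all of `labels`, else None."""
    for n, lbls, v in parse_metrics(text):
        if n == name and all(lbls.get(k) == val for k, val in labels.items()):
            return v
    return None


# Usage (placeholder address):
# import urllib.request
# text = urllib.request.urlopen("http://192.168.24.228:8089/export/prometheus").read().decode()
# print(find_value(text, "locust_stats_current_rps", method="Aggregated"))
```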

III. Install Grafana

# Download
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.3-1.x86_64.rpm
# Install
yum -y install grafana-enterprise-8.5.3-1.x86_64.rpm
 
# Enable the grafana-server service at boot and start it
 
systemctl daemon-reload
 
systemctl enable grafana-server.service
 
systemctl start grafana-server.service

Open IP:3000 in a browser (Grafana's initial login is admin / admin).
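To check Grafana from a script instead of the browser, Grafana provides an unauthenticated health endpoint, GET /api/health, which returns JSON whose "database" field reads "ok" when the server is healthy. A minimal sketch; the function names and the address are placeholders:

```python
import json
import urllib.request


def is_healthy(payload):
    """True if a parsed /api/health response reports a working database."""
    return payload.get("database") == "ok"


def grafana_healthy(base_url="http://localhost:3000", timeout=5):
    """Fetch /api/health; any connection, HTTP or JSON error counts as unhealthy."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/api/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False


# Usage (placeholder address): grafana_healthy("http://192.168.24.228:3000")
```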

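IV. Add the Prometheus data source
In Grafana, add a data source of type Prometheus, fill in the Prometheus server's IP and port (e.g. http://IP:9090), and save.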
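The same data source can also be created through Grafana's HTTP API with POST /api/datasources. A sketch that only builds the request, assuming basic auth with the admin account (Grafana's initial login is admin/admin) and placeholder addresses; "access": "proxy" means the Grafana server itself queries Prometheus, which is the default access mode. The function name is hypothetical:

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user="admin", password="admin",
                             name="Prometheus"):
    """Build (but do not send) a POST /api/datasources request for a Prometheus data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,  # e.g. http://<prometheus-ip>:9090
        "access": "proxy",      # Grafana's backend talks to Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )


# Usage (placeholder addresses):
# req = build_datasource_request("http://192.168.24.228:3000", "http://192.168.24.228:9090")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```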

V. Everything is ready: start the load test

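Run the load test:

1. Start the master: locust --master --web-host=<local IP> -f prometheus_exporter.py

2. Check that it is listening:

In a Windows cmd window, run netstat -ano|findstr 8089; you should see an ESTABLISHED connection on port 8089 between the Prometheus server's IP and the master's IP.

Open <master IP>:8089/export/prometheus in a browser to see the Prometheus data.

3. Start the load generator: go run test.go --master-host=<master IP> --master-port=5557

4. Open <master IP>:8089 in a browser, enter the total number of users and the ramp-up rate, and start the test.

5. Open <server IP>:3000 in a browser and check the dashboard; it should show the current Locust execution data.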
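Besides the Grafana dashboard, the Locust data that Prometheus has scraped can be queried directly through the instant-query endpoint GET /api/v1/query with a PromQL expression. A sketch with hypothetical function names; the metric and labels in the usage comment are those produced by prometheus_exporter.py under the locust job from prometheus.yml, and the address is a placeholder:

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """URL for a PromQL instant query; the expression is form-encoded into ?query=."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(base_url, promql, timeout=10):
    """Run the query and return data.result (a list of series for vector results)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


def first_value(result):
    """Instant-vector samples look like {"metric": {...}, "value": [timestamp, "12.5"]}."""
    return float(result[0]["value"][1]) if result else None


# Usage (placeholder address):
# res = instant_query("http://192.168.24.228:9090",
#                     'locust_stats_current_rps{job="locust", method="Aggregated"}')
# print(first_value(res))
```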

VI. Monitoring Linux servers with Prometheus + node_exporter
① Download the node_exporter package directly on the Linux server with wget

# Create a working directory
mkdir -p /data/prometheus/
# Enter it
cd /data/prometheus/
# Download
wget -c https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
# Extract
tar -vxzf node_exporter-1.3.1.linux-amd64.tar.gz
# Move it to the install directory (the same path the service file below uses)
mv node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter
# Enter the install directory
cd /usr/local/node_exporter

② Configure node_exporter as a system service
1. Go to the systemd directory

cd /usr/lib/systemd/system

2. Create the unit file

vim node_exporter.service
 
# Add the following content
[Unit]
Description=https://github.com/prometheus/node_exporter
After=network-online.target
 
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter
 
[Install]
WantedBy=multi-user.target

3. Reload the systemd unit files

systemctl daemon-reload

4. Enable start at boot

systemctl enable node_exporter

5. Commands to start and stop the service

# Check the status
systemctl status node_exporter
# Start
systemctl start node_exporter.service
# Stop
systemctl stop node_exporter.service
# Restart (not recommended here; it tends to cause problems)
systemctl restart node_exporter

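③ Start node_exporter manually (an alternative to the systemd service; do not run both, since they would compete for port 9100)

# Start in the background
nohup /usr/local/node_exporter/node_exporter >> /usr/local/node_exporter/node_exporter.out 2>&1 &
# To listen on a different port, add the flag, e.g.:
# /usr/local/node_exporter/node_exporter --web.listen-address=:9200

node_exporter is installed on each monitored host, and the Prometheus server fetches the host's metrics from node_exporter's default port 9100. To check it, open http://172.30.18.244:9100/metrics.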
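A typical use of these host metrics is memory utilisation, computed from two gauges node_exporter exposes on Linux, node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes, as (1 - MemAvailable / MemTotal) * 100. A minimal sketch that reads the raw /metrics text; it only looks at unlabelled lines and the function names are hypothetical:

```python
def gauge(text, name):
    """Return the value of an unlabelled sample line `name value` from exposition text."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])  # handles node_exporter's 1.23e+09 notation
    raise KeyError(name)


def memory_used_percent(metrics_text):
    """Percentage of memory in use: (1 - MemAvailable / MemTotal) * 100."""
    total = gauge(metrics_text, "node_memory_MemTotal_bytes")
    available = gauge(metrics_text, "node_memory_MemAvailable_bytes")
    return (1 - available / total) * 100


# Usage (example host from above):
# import urllib.request
# text = urllib.request.urlopen("http://172.30.18.244:9100/metrics", timeout=5).read().decode()
# print(round(memory_used_percent(text), 1))
```

In Grafana the same figure is usually a PromQL panel over these two series rather than a script; the sketch only shows where the numbers come from.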

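④ Add the Prometheus scrape configuration

# Enter the prometheus directory (adjust the path to wherever Prometheus is installed)
cd /usr/local/prometheus
# Edit the prometheus configuration file
vim prometheus.yml
 
# Append the following under scrape_configs, with the same indentation as the existing jobs
  - job_name: 'linux-node-cluster'
    static_configs:
      - targets: ['172.30.18.244:9100']
        labels:
          instance: '172.30.18.244_node'
    
# Check the configuration file
./promtool check config prometheus.yml
 
# Restart prometheus (this assumes Prometheus runs as a systemd service; if it was started by hand as in Part I, stop the process and start it again instead)
systemctl stop prometheus.service
systemctl start prometheus.service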