I. Setting Up Nightingale Monitoring v6.0

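This article walks through setting up Nightingale v6.0, an open-source monitoring system developed in China: its integration with ecosystems such as Prometheus and VictoriaMetrics, the installation and configuration steps (Prometheus, MySQL, Redis, Nginx), and how to handle problems that come up along the way.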



Introduction to Nightingale

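Official site: https://flashcat.cloud/
GitHub: https://github.com/ccfos/nightingale
# Nightingale is an open-source, cloud-native monitoring system developed in China. v1 was released on 2020-03-20 and the current release is v6. Starting with this version it integrates with the Prometheus, VictoriaMetrics, Grafana, Telegraf and Datadog ecosystems, with the goal of being the most usable open-source operations monitoring system in China. It comes from the team that developed Open-Falcon.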

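Centralized (hub-and-spoke) deployment

Architecture diagram

1. An architecture diagram I drew myself, to make the setup easier to understand.
# Enough talk: read the official documentation for the details. On to the deployment!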

[Image: architecture diagram]

1.1 Installing the packages

[root@aly ~]# mkdir -p /opt/prometheus        # create the Prometheus working directory
[root@aly ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.43.0/prometheus-2.43.0.linux-amd64.tar.gz -O prometheus-2.43.0.linux-amd64.tar.gz
# download the Prometheus binary package
[root@aly ~]# tar xf prometheus-2.43.0.linux-amd64.tar.gz    # extract
[root@aly ~]# cp -far prometheus-2.43.0.linux-amd64/* /opt/prometheus/

1.2 Starting the Prometheus time-series database

[root@aly ~]# cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
[root@aly ~]# systemctl daemon-reload # load the new unit file
[root@aly ~]# systemctl enable prometheus 
[root@aly ~]# systemctl restart prometheus
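
After the restart it can take a few seconds before Prometheus accepts requests. The sketch below is one possible way to wait for it: a small generic `retry` helper (a hypothetical name, not part of Prometheus or Nightingale) that re-runs any command until it succeeds. The usage line assumes Prometheus listens on its default port 9090 on the local machine and probes its `/-/ready` endpoint with curl.

```shell
# retry TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts are used up.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
retry() {
  _tries=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_tries" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$_i" -lt "$_tries" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}

# Usage on the monitoring host (assumes the default port 9090):
#   retry 10 2 curl -fsS http://127.0.0.1:9090/-/ready
```

The same helper can be pointed at any other check in this guide, for example a port probe for n9e after it has been started.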

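1.2 Installing the dependencies

1. MySQL installation                   # skip this if MySQL is already installed locally; MySQL 5.7 or later is sufficient
[root@aly ~]# yum -y install mariadb*
[root@aly ~]# systemctl enable --now mariadb
# Set the root password. Any password works; 123456 matches the commands and configuration used in the rest of this guide.
mysql -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('123456');"
# If MySQL (with the validate_password plugin) rejects the password, relax the policy with these statements in the mysql client:
set global validate_password_policy=LOW;   # password validation level
set global validate_password_length=6;     # minimum password length
2. Redis installation
[root@aly ~]# yum -y install redis
[root@aly ~]# systemctl enable --now redis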

1.3 Installing Nightingale

[root@aly ~]# mkdir -p /opt/n9e && cd /opt/n9e
[root@aly n9e]# wget https://download.flashcat.cloud/n9e-v6.0.0-ga.3-linux-amd64.tar.gz
#https://github.com/ccfos/nightingale/releases       ## GitHub releases page
[root@aly n9e]# tar -xvf n9e-v6.0.0-ga.3-linux-amd64.tar.gz 
[root@aly n9e]# mysql -uroot -p123456 <n9e.sql      # the SQL script creates the database itself
[root@aly n9e]# vim etc/config.toml      # adjust the Redis, DB and Pushgw.Writers sections as shown below

[Redis]
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = "127.0.0.1:6379"

[DB]
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
DSN="root:123456@tcp(127.0.0.1:3306)/n9e_v6?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
# The default DB password in this file is 1234; change it to your own database password.

[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://127.0.0.1:9090/api/v1/write"					# pushgw writes to Prometheus
# Basic auth username
BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
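
The `DSN` value in the `[DB]` section packs user, password, host, port and database name into one string. As a minimal sketch, the hypothetical helper `build_dsn` below assembles a DSN in the same format as the one shown above, which may help when the password or host differs from this guide. It does not escape special characters, so it assumes a simple password.

```shell
# build_dsn USER PASS HOST PORT DB
# Prints a MySQL DSN in the format used by the [DB] section above.
# Assumption: no character in the arguments needs escaping.
build_dsn() {
  printf '%s:%s@tcp(%s:%s)/%s?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Values taken from this guide; replace them with your own.
dsn=$(build_dsn root 123456 127.0.0.1 3306 n9e_v6)
echo "DSN=\"$dsn\""
```

The printed line can be pasted into the `[DB]` section in place of the existing `DSN` entry.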


[root@aly n9e]# cat <<EOF >/etc/systemd/system/n9e.service
[Unit]
Description="n9e.service"
After=network.target

[Service]
Type=simple
ExecStart=/opt/n9e/n9e server
WorkingDirectory=/opt/n9e
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=n9e.service
[Install]
WantedBy=multi-user.target
EOF
[root@aly n9e]# systemctl daemon-reload
[root@aly n9e]# systemctl enable --now n9e.service        
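# The figure below is the monitoring diagram of the cluster version, drawn for our own business
## Architecture overview:
	The categraf collector is installed on each agent host, with the nginx domain name in its configuration file -> nginx acts as reverse proxy and load balancer and forwards the traffic to n9e-server -> n9e-server exposes port 17000, receives the requests and forwards them to the time-series database listed under Pushgw.Writers in its configuration file, where the data is stored. (The diagram labels this target vminsert:8428; 8428 is the port the single-node VictoriaMetrics below listens on, while the sample configuration above uses 8480 for vminsert.)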
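
The nginx layer described above can be sketched as follows. This is only an illustration under stated assumptions: the two backend addresses (10.0.0.11, 10.0.0.12), the domain name n9e.example.com and the output path are placeholders, not values from this deployment. The script writes an nginx configuration that balances requests across two n9e-server instances on port 17000 using nginx's default round-robin method.

```shell
# All hosts, names and paths below are placeholders, not values from this guide.
out=${NGINX_CONF_OUT:-./n9e-upstream.conf}

# Quoted heredoc so that nginx variables such as $host are written literally.
cat > "$out" <<'EOF'
upstream n9e_servers {
    server 10.0.0.11:17000;
    server 10.0.0.12:17000;
}

server {
    listen 80;
    server_name n9e.example.com;

    location / {
        proxy_pass http://n9e_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF

echo "wrote $out"
```

With something like this in place, the categraf writer and heartbeat URLs would point at the nginx domain name instead of a single n9e-server address.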

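1.3.1 Installing the VictoriaMetrics (VM) time-series database (optional)

# Below are some of the characteristics described in the official documentation
VictoriaMetrics has a simple architecture and high reliability, does well on performance, cost and scalability, has an active community, and is closely tied to the Prometheus ecosystem. If a single Prometheus instance cannot meet your capacity needs, you can use VictoriaMetrics as the time-series database.

VictoriaMetrics comes in a single-node version and a cluster version. If you write fewer than 1 million data points per second, VictoriaMetrics officially recommends the single-node version by default. To put that number in perspective: for machine monitoring alone, with about 200 metrics per machine collected every 10 seconds, each machine produces about 20 data points per second, so 1,000,000 / 20 = 50,000 machines. The single-node version gains performance linearly as you add CPU cores, memory and IOPS, and it is easy to configure and operate.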
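
The sizing estimate above can be reproduced with plain shell arithmetic. The numbers are the ones quoted in the paragraph; change the first two variables to match your own collection profile.

```shell
metrics_per_host=200      # metrics collected per machine
interval_seconds=10       # collection interval in seconds
limit_per_second=1000000  # single-node guideline: data points written per second

# Data points one machine produces per second.
points_per_host=$((metrics_per_host / interval_seconds))
# Number of machines that fit within the guideline.
max_hosts=$((limit_per_second / points_per_host))

echo "points per host per second: $points_per_host"
echo "hosts within the guideline: $max_hosts"
```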

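It can be used as long-term storage for Prometheus.
It can be used as a drop-in replacement for Prometheus in Grafana, because it supports the Prometheus querying API.
It can be used as a drop-in replacement for Graphite in Grafana, because it supports the Graphite API. According to the official documentation, VictoriaMetrics can reduce infrastructure costs by more than 10x compared with Graphite.
Easy to set up and operate:
  VictoriaMetrics consists of a single small executable without external dependencies.
  All configuration is done through explicit command-line flags with reasonable defaults.
  All data is stored in a single directory pointed to by the -storageDataPath command-line flag.
  The vmbackup / vmrestore tools make fast and easy backups from instant snapshots.
It implements a PromQL-like query language, MetricsQL, which provides improved functionality on top of PromQL.
It provides a global query view. Multiple Prometheus instances or any other data sources can ingest data into VictoriaMetrics, and that data can later be queried with a single query.
It provides high performance and good vertical and horizontal scalability for both data ingestion and data querying. Its documentation claims up to 20x better performance than InfluxDB and TimescaleDB.
When handling millions of unique time series (high cardinality), it is said to use 10x less RAM than InfluxDB and up to 7x less RAM than Prometheus, Thanos or Cortex.
You could say VictoriaMetrics is the enterprise edition of Prometheus.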

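#################################################### Installation steps ###############################################################
!!! If you already chose Prometheus above, you can skip VM. VM needs the cluster version to be reasonably full-featured; the single-node version has no alerting, which makes it of little use here. !!!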



1.[root@flshcat-server opt]# mkdir /opt/vm
2.[root@flshcat-server opt]# cd vm/
3.[root@flshcat-server vm]# wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.90.0/victoria-metrics-linux-amd64-v1.90.0.tar.gz
4.[root@flshcat-server vm]# tar -xvf victoria-metrics-linux-amd64-v1.90.0.tar.gz     
5.[root@flshcat-server vm]# ls			# the single-node package contains only the victoria-metrics-prod binary
victoria-metrics-linux-amd64-v1.90.0.tar.gz  victoria-metrics-prod
6.[root@flshcat-server vm]# cat <<EOF >/etc/systemd/system/victoria.service
[Unit]
Description="VictoriaMetrics"
Documentation=https://docs.victoriametrics.com/
After=network.target
[Service]
Type=simple
ExecStart=/opt/vm/victoria-metrics-prod
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=victoria-metrics
[Install]
WantedBy=multi-user.target
EOF
7.[root@flshcat-server vm]# systemctl daemon-reload
8.[root@flshcat-server vm]# systemctl enable --now victoria.service 
9.[root@flshcat-server vm]# ss -lntp | grep "8428"              # VictoriaMetrics listens on port 8428
LISTEN     0      128          *:8428                     *:*                   users:(("victoria-metric",pid=21095,fd=10))

1.3.2 Updating the n9e configuration for the VM time-series database

# This is a test setup, so the Prometheus and VM time-series databases are used side by side: n9e writes to both.
[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://127.0.0.1:9090/api/v1/write"
# Basic auth username
BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
## Optional TLS Config
# UseTLS = false
# TLSCA = "/etc/n9e/ca.pem"
# TLSCert = "/etc/n9e/cert.pem"
# TLSKey = "/etc/n9e/key.pem"
# InsecureSkipVerify = false
# [[Writers.WriteRelabels]]
# Action = "replace"
# SourceLabels = ["__address__"]
# Regex = "([^:]+)(?::\\d+)?"
# Replacement = "$1:80"
# TargetLabel = "__address__"
#
 
 
[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
# Url = "http://127.0.0.1:9090/api/v1/write"
Url = "http://127.0.0.1:8428/api/v1/write"
# # Basic auth username
BasicAuthUser = ""
# # Basic auth password
BasicAuthPass = ""
# # timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# # time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
 
[root@flshcat-server etc]# systemctl restart n9e.service
[root@flshcat-server etc]# systemctl status n9e.service  # the service is fine if no errors are reported
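
With two `[[Pushgw.Writers]]` blocks it is easy to leave the wrong `Url` line commented out. As a rough sanity check, the hypothetical helper below lists the active (uncommented) `Url` values under each writer block. The config path in the usage line is an assumption based on the install directory used in this guide.

```shell
# list_writer_urls FILE
# Prints the uncommented Url value of every [[Pushgw.Writers]] block in FILE.
list_writer_urls() {
  awk '
    /^\[\[Pushgw\.Writers\]\]/ { in_writer = 1; next }   # entering a writer block
    /^\[/                      { in_writer = 0 }         # any other section ends it
    in_writer && /^Url[ \t]*=/ {
      sub(/^Url[ \t]*=[ \t]*"/, "")   # strip the key and the opening quote
      sub(/".*$/, "")                 # strip the closing quote and any trailing comment
      print
    }
  ' "$1"
}

# Usage (path assumed from the /opt/n9e install directory):
#   list_writer_urls /opt/n9e/etc/config.toml
```

For the configuration shown in this section, the expected result is one line for the Prometheus endpoint on port 9090 and one for the VictoriaMetrics endpoint on port 8428.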

1.4 Installing ibex for alert self-healing (server)

1.[root@flshcat-server]# mkdir -p /opt/ibex && cd /opt/ibex && wget https://github.com/flashcatcloud/ibex/releases/download/v1.0.0/ibex-1.0.0.tar.gz   # create the target directory and download ibex
2.[root@flshcat-server ibex]# tar -xvf ibex-1.0.0.tar.gz -C /opt/ibex/
3.[root@flshcat-server ibex]# mysql -uroot -p123456 < sql/ibex.sql           # creates the ibex database automatically
[MySQL]                              # edit the MySQL section of the ibex server configuration file
# mysql address host:port
Address = "127.0.0.1:3306"
# mysql username
User = "root"
# mysql password
Password = "123456"
# database name
DBName = "ibex"
4.[root@aly n9e]# cat <<EOF >/etc/systemd/system/ibex.service
[Unit]
Description="ibex.service"
After=network.target

[Service]
Type=simple
ExecStart=/opt/ibex/ibex server
WorkingDirectory=/opt/ibex
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=ibex.service
[Install]
WantedBy=multi-user.target
EOF
5.[root@flshcat-server ibex]# systemctl daemon-reload
6.[root@flshcat-server ibex]# systemctl enable --now ibex.service
# RPC listens on port 20090; the HTTP server listens on port 10090

1.5 Installing the categraf collector (agent)

[root@aly yeyin]# wget https://download.flashcat.cloud/categraf-v0.2.35-linux-amd64.tar.gz
[root@aly yeyin]# tar zxvf categraf-v0.2.35-linux-amd64.tar.gz
[root@aly yeyin]# mkdir -p /opt/categraf        # the target directory must exist before copying
[root@aly yeyin]# cp -far categraf-v0.2.35-linux-amd64/* /opt/categraf
[root@aly yeyin]# cp /opt/categraf/conf/categraf.service /etc/systemd/system/
[root@aly yeyin]# systemctl daemon-reload
[root@aly yeyin]# systemctl enable --now categraf

1.6 Accessing Nightingale

http://8.130.93.111:17000/metric/explorer
# user: root
# password: root.2020


1.7 Agent configuration

1.# Basic settings
vim /opt/categraf/conf/config.toml
[[writers]]
url = "http://8.130.93.111:17000/prometheus/v1/write"       # address of the n9e server
[heartbeat]
enable = true
url = "http://8.130.93.111:17000/v1/