Nagios:
How will you check the status of services and hosts?
By using plugins: Nagios relies on plugins to check the status of services, hosts, and so on.
Nagios components:
1. Plugins: standalone extensions for Nagios. Each plugin checks a service (or host) and returns the result to the Nagios server.
2. Scheduler: the server part of Nagios; it runs the plugins at regular intervals.
3. GUI: the web interface of Nagios.
Nagios is built on a server/agent architecture.
1. The Nagios process (scheduler) executes the plugins.
2. Each plugin checks the status of a local or remote service and collects the result.
3. The plugin sends the result back to Nagios for processing.
4. The Nagios server notifies the admin of the status processed by the scheduler.
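The plugin/scheduler contract in steps 1 to 3 comes down to one line of text on standard output plus an exit code. The sketch below imitates the scheduler side in Bash: it runs a plugin, captures its output and exit status, and maps the status to a state name using the standard plugin return codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function names are hypothetical and this is an illustration, not Nagios's own scheduler code.

```shell
#!/usr/bin/env bash
# Illustrative sketch of what the scheduler does with a plugin result.

# Map a plugin exit code to its Nagios state name.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;   # 3 is UNKNOWN; this sketch also treats out-of-range codes as UNKNOWN
  esac
}

# Run a plugin command, keep the first line of its output and its exit status,
# and print "STATE: plugin output". Returns the plugin's exit status unchanged.
run_check() {
  local output rc
  output=$("$@" 2>&1)
  rc=$?
  printf '%s: %s\n' "$(state_name "$rc")" "${output%%$'\n'*}"
  return "$rc"
}
```

For example, `run_check /usr/local/nagios/libexec/check_disk.sh -p / -w 80 -c 90` would print the plugin's status line prefixed with the state the scheduler derives from its exit code.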
Nagios service checks:
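States: OK, WARNING, CRITICAL, UNKNOWN
A CRITICAL result first puts the service in a soft critical state, and Nagios re-checks it.
MCA: maximum check attempts (max_check_attempts)
If the check returns CRITICAL on 5 consecutive attempts (mca=5), the state becomes hard critical and notifications are sent.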
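The soft/hard logic can be sketched as a small state tracker: each consecutive non-OK result increments an attempt counter, and the state turns HARD only once the counter reaches max_check_attempts. This is a simplified model that ignores retry intervals, flapping and recovery handling, and the function name is hypothetical.

```shell
# Classify each check result as SOFT or HARD, given max_check_attempts.
# Usage: classify_results MAX_ATTEMPTS STATE [STATE...]
classify_results() {
  local max=$1 attempt=0 state
  shift
  for state in "$@"; do
    if [ "$state" = "OK" ]; then
      attempt=0                  # in this sketch, an OK result resets the counter
      echo "OK"
    else
      attempt=$((attempt + 1))   # one more consecutive non-OK result
      if [ "$attempt" -ge "$max" ]; then
        echo "HARD $state"       # attempts used up: the state is confirmed
      else
        echo "SOFT $state (attempt $attempt/$max)"
      fi
    fi
  done
}
```

With max_check_attempts=5 as in the notes, `classify_results 5 CRITICAL CRITICAL CRITICAL CRITICAL CRITICAL | tail -n 1` prints `HARD CRITICAL`.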
To monitor Windows Server with Nagios, the Nagios monitoring server must be a
Linux system. Once admins install and configure this setup, they can create monitors
for Windows machines with the Nagios Remote Data Processor (NRDP) agent.
Nagios provides complete monitoring of applications and application state –
including Windows applications, Linux applications, UNIX applications, and Web
applications.
Nagios XI:
Nagios XI is used to actively monitor machines via the Nagios Cross Platform Agent
(NCPA). NCPA is an advanced, cross-platform agent that can be installed on Windows /
Linux / AIX / Mac OS X machines.
Nagios Log Server can alert on any query against any log file on any system in your infrastructure.
Your commands.cfg file will contain:
define command {
    command_name    NagiosLogMonitor
    command_line    $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$
}
OR
define command {
    command_name    NagiosLogMonitor
    command_line    $USER1$/NagiosLogMonitor $HOSTADDRESS$
}
Your services.cfg file will look similar to:
define service {
    check_command          NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
    max_check_attempts     1
    service_description    500_ERRORS_LOGCHECK
    host_name              sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
    use                    fifteen-minute-interval
}
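In a check_command, the text before the first ! is the command name, and each later !-separated field becomes $ARG1$, $ARG2$, and so on, which Nagios substitutes into that command's command_line. The Bash sketch below imitates only that $ARGn$ substitution for illustration; the function name is hypothetical, and real Nagios also expands other macros such as $USER1$ and $HOSTNAME$, which this sketch leaves untouched.

```shell
# Expand $ARGn$ macros in a command_line template from a check_command string.
# Usage: expand_args TEMPLATE CHECK_COMMAND
expand_args() {
  local template=$1 check_command=$2
  local -a fields
  local i
  # Split on "!": fields[0] is the command name, fields[1..] are the arguments.
  IFS='!' read -r -a fields <<< "$check_command"
  for ((i = 1; i < ${#fields[@]}; i++)); do
    # Replace every literal "$ARGi$" with the i-th argument.
    template=${template//"\$ARG$i\$"/"${fields[i]}"}
  done
  echo "$template"
}
```

Under this model, the first command definition above, which references only $ARG1$, would pass just the first field (logrobot) to the plugin; the remaining fields of the service's check_command would not reach it.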
Yaml:
Consider the YAML example below:
---
environment: production
classes:
  nfs::server:
    exports:
      - /srv/share1
      - /srv/share3
parameters:
  parameter1
Check disk space used by a mounted volume:
This plugin is a Bash script that displays the disk space used by the specified mount point, partition, or volume.
It is written so that 'Performance Gauges' in Nagios XI work properly, showing the appropriate 'Warning' and 'Critical' regions.
For Performance Gauges to show details properly, performance data should be in the following format (the unit, if any, follows the value with no space):
'VarName'=CurrentValue[Unit];WarningValue;CriticalValue;MinimumValue;MaximumValue
Important:
1. It uses the output of the 'df' command.
2. The mount point/partition/volume and the Warning and Critical thresholds must be specified by the user.
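The following is a minimal sketch of such a check, not the plugin's actual source. It reads the used percentage for a mount point from `df -P`, compares it with the thresholds (in this sketch, reaching a threshold triggers it), and prints a status line with performance data in the format above. The function names and output wording are hypothetical.

```shell
# Build the status line and return code for a usage percentage.
# Usage: disk_status MOUNT USED_PCT WARN_PCT CRIT_PCT
disk_status() {
  local mount=$1 pct=$2 warn=$3 crit=$4 state rc
  if [ "$pct" -ge "$crit" ]; then
    state=CRITICAL; rc=2
  elif [ "$pct" -ge "$warn" ]; then
    state=WARNING; rc=1
  else
    state=OK; rc=0
  fi
  # Perfdata: 'label'=value;warn;crit;min;max (a percentage runs from 0 to 100)
  echo "$state - $mount: ${pct}% used | 'Usage'=${pct};${warn};${crit};0;100"
  return "$rc"
}

# Read the used percentage of MOUNT from POSIX-format df output and check it.
# Usage: check_disk_usage MOUNT WARN_PCT CRIT_PCT
check_disk_usage() {
  local mount=$1 warn=$2 crit=$3 pct
  # Column 5 of `df -P` is "Capacity", e.g. "18%"; strip the percent sign.
  pct=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$pct" ]; then
    echo "UNKNOWN - cannot read usage for $mount"
    return 3
  fi
  disk_status "$mount" "$pct" "$warn" "$crit"
}
```

A plugin wrapper would call `check_disk_usage /volume1 80 90` and then `exit $?` so that Nagios receives the return code.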
To monitor a remote Linux server:
1. Place the plugin in the /usr/local/nagios/libexec directory.
2. Add the following line to the nrpe.cfg file:
command[check_disk.sh]=sudo /usr/local/nagios/libexec/check_disk.sh $ARG1$
Because arguments are passed to this command, nrpe.cfg must also allow command arguments (dont_blame_nrpe=1).
3. Add the following line to the /etc/sudoers file:
nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_disk.sh
Usage:
On the monitoring server:
/usr/local/nagios/libexec/check_nrpe -H 172.22.246.126 -c check_disk.sh -a '-p /volume1 -w 80 -c 90'
Output:
OK- /Volume1 : Total Space= 468G, Used Space= 80G, Available Space= 365G i.e. 18% Usage | 'Usage'=18;80;90;0;100
Check file and directory size (Bash):
check_file_size.sh
Bash shell script to check the min/max size of files and/or directories.
Relies on "stat" and "du", which most distributions have.
Usage: check_file_size.sh [--minwarn size] [--maxwarn size] [--mincrit size] [--maxcrit size] [-m|--missingok] [-v|--verbose]
--minwarn, --maxwarn, --mincrit and --maxcrit all take a size in bytes
--missingok prevents errors from being raised if the files passed in are missing
Examples:
# Warn if /tmp is above 1GB, Critical if above 2GB
sh check_file_size.sh --maxwarn 1000000000 --maxcrit 2000000000 /tmp
# Warn if any one file in /tmp is above 1GB, Critical if above 2GB
sh check_file_size.sh --maxwarn 1000000000 --maxcrit 2000000000 /tmp/*
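The core of a min/max size check can be sketched as below. It measures a path with `du -sb` (apparent size in bytes, a GNU coreutils option) and compares it with whichever of the four thresholds are set, testing the critical thresholds before the warning ones. This illustrates the idea under those assumptions; it is not the source of check_file_size.sh, and the function names are hypothetical.

```shell
# Print the size in bytes of a file or directory (GNU du, apparent size).
path_size() {
  du -sb -- "$1" 2>/dev/null | cut -f1
}

# Compare SIZE with optional thresholds; pass "" to skip a threshold.
# Usage: size_state SIZE MINWARN MAXWARN MINCRIT MAXCRIT
size_state() {
  local size=$1 minwarn=$2 maxwarn=$3 mincrit=$4 maxcrit=$5
  # Critical thresholds take precedence over warning thresholds.
  if { [ -n "$maxcrit" ] && [ "$size" -gt "$maxcrit" ]; } ||
     { [ -n "$mincrit" ] && [ "$size" -lt "$mincrit" ]; }; then
    echo CRITICAL; return 2
  fi
  if { [ -n "$maxwarn" ] && [ "$size" -gt "$maxwarn" ]; } ||
     { [ -n "$minwarn" ] && [ "$size" -lt "$minwarn" ]; }; then
    echo WARNING; return 1
  fi
  echo OK
  return 0
}
```

The first example above corresponds roughly to `size_state "$(path_size /tmp)" '' 1000000000 '' 2000000000`.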
How to check file size on Linux?
check_myfilesize.pl
This plugin is written for Linux and determines the size of a file on any Unix/Linux platform. It needs only a basic Perl installation to run.
Usage: check_myfilesize.pl filename Critical_size(in bytes) Warning_size(in bytes)
How to check free RAM?
A simple shell script for Nagios to check free memory on a target system.
The plugin was designed to do active checks through check_by_ssh on clients such as a firewall.
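A minimal sketch of such a memory check is shown below. It reads MemTotal and MemAvailable from /proc/meminfo (MemAvailable exists on Linux 3.14 and later) and alerts when the available percentage drops to or below a threshold. The function name, threshold semantics and output wording are assumptions; run through check_by_ssh, the same script would simply execute on the target host.

```shell
# Check available memory as a percentage of total, from a meminfo-format file.
# Usage: check_free_mem WARN_PCT CRIT_PCT [MEMINFO_FILE]
# Alerts when the available percentage drops to or below a threshold.
check_free_mem() {
  local warn=$1 crit=$2 file=${3:-/proc/meminfo} total avail pct perf
  total=$(awk '$1 == "MemTotal:" { print $2 }' "$file")
  avail=$(awk '$1 == "MemAvailable:" { print $2 }' "$file")
  if [ -z "$total" ] || [ -z "$avail" ]; then
    echo "UNKNOWN - MemTotal/MemAvailable not found in $file"
    return 3
  fi
  pct=$((avail * 100 / total))   # integer percentage of memory still available
  perf="'free_pct'=${pct}%;${warn};${crit};0;100"
  if [ "$pct" -le "$crit" ]; then
    echo "CRITICAL - ${pct}% memory available | $perf"; return 2
  elif [ "$pct" -le "$warn" ]; then
    echo "WARNING - ${pct}% memory available | $perf"; return 1
  fi
  echo "OK - ${pct}% memory available | $perf"
  return 0
}
```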
How to check IO stats of one or all disks:
This plug-in checks the IO stats of one (or all) disks. It can be used to send alerts when a maximum hard drive IO/s or sectors read|write/s value is reached.
Usage:
./check_diskstat.sh -d DEVICE -w tps,read,write -c tps,read,write | -h
-d DEVICE              DEVICE must be given without /dev (e.g. -d sda)
-w/-c TPS,READ,WRITE   TPS means transfers per second (aka IO/s)
                       READ and WRITE are in sectors per second
Example: ./check_diskstat.sh -d sda -w 200,100000,100000 -c 300,200000,200000
This plugin uses the /sys filesystem to retrieve data. Average values are then calculated by keeping a history file.
In order to check all disks on your system, you can use check_all_diskstat.sh.
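The plugin's own history-file logic is not shown in these notes. As an illustration of the idea, the sketch below computes average rates from two samples of /sys/block/DEVICE/stat, whose fields 1, 3, 5 and 7 are reads completed, sectors read, writes completed and sectors written. TPS is taken here as reads plus writes completed per second; the function name and the two-sample approach are assumptions.

```shell
# Compute tps, read and write sectors/s between two /sys/block/DEV/stat samples.
# Usage: disk_rates "OLD_STAT_LINE" "NEW_STAT_LINE" ELAPSED_SECONDS
disk_rates() {
  local -a old new
  read -r -a old <<< "$1"
  read -r -a new <<< "$2"
  local elapsed=$3 tps rd wr
  # Bash arrays are 0-based: index 0 = reads completed, 2 = sectors read,
  # 4 = writes completed, 6 = sectors written. Integer division truncates.
  tps=$(( (new[0] - old[0] + new[4] - old[4]) / elapsed ))
  rd=$(( (new[2] - old[2]) / elapsed ))
  wr=$(( (new[6] - old[6]) / elapsed ))
  echo "tps=$tps read=$rd write=$wr"
}

# Example of taking two samples for one device (sda is an assumption):
# s1=$(cat /sys/block/sda/stat); sleep 5; s2=$(cat /sys/block/sda/stat)
# disk_rates "$s1" "$s2" 5
```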
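Another example script, which runs check_diskstat.sh on every disk and combines the results:
#!/bin/bash
# Run check_diskstat.sh for every physical disk and exit with the highest status.
EXITCODE=0
CHK=/usr/lib/nagios/plugins/check_diskstat.sh
# Thresholds (tps,read,write) can be overridden by the first two arguments.
WARN=${1:-"300,10000,10000"}
CRIT=${2:-"400,20000,20000"}
for DEVPATH in /sys/block/*; do
  DEVICE=${DEVPATH##*/}
  # Only real disks have a "device" symlink; this skips virtual block devices.
  if [ -L "/sys/block/$DEVICE/device" ]; then
    # /sys uses "!" where the /dev name has "/" (e.g. cciss!c0d0 -> cciss/c0d0).
    DEVNAME=$(echo "/dev/$DEVICE" | sed 's#!#/#g')
    echo -n "$DEVNAME: "
    OUTPUT=$("$CHK" -d "$DEVICE" -w "$WARN" -c "$CRIT")
    STATUS=$?
    # Keep the highest exit code seen so far.
    if [ "$EXITCODE" -le "$STATUS" ]; then
      EXITCODE=$STATUS
    fi
    # Append the device name to each perfdata label (tps= becomes tps_/dev/sda=).
    echo "$OUTPUT" | sed "s#=#_$DEVNAME=#g"
  fi
done
exit $EXITCODE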
How to check memory with a Python plugin?
./check_mem.py --help
How to check network bonding?
Check_bond.sh
Example OK output:
OK - Bonding Mode: adaptiveloadbalancing. Bond: up eth0 up eth1 up
Example CRITICAL output:
CRITICAL - Bonding Mode: adaptiveloadbalancing. eth0 down Bond: up eth1 up
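On Linux, the bonding driver reports its state in /proc/net/bonding/<bond>, with a "Bonding Mode:" line, an "MII Status:" line for the bond itself, and one "MII Status:" line per slave interface. A minimal sketch of a check built on that file is below; the function name, the default bond0 path and the output wording (modelled loosely on the examples above) are assumptions, not check_bond.sh itself.

```shell
# Report bond status from a /proc/net/bonding file.
# CRITICAL if the bond or any slave interface is not "up".
# Usage: check_bond_file [FILE]   (default: /proc/net/bonding/bond0)
check_bond_file() {
  local file=${1:-/proc/net/bonding/bond0}
  [ -r "$file" ] || { echo "UNKNOWN - cannot read $file"; return 3; }
  awk '
    /^Bonding Mode:/    { sub(/^Bonding Mode: */, ""); mode = $0 }
    /^Slave Interface:/ { iface = $3 }
    /^MII Status:/ {
      status = $3
      if (iface == "") { bond = status }   # first MII line belongs to the bond
      else { slaves = slaves " " iface " " status }
      if (status != "up") { bad = 1 }
    }
    END {
      printf "%s - Bonding Mode: %s. Bond: %s%s\n", (bad ? "CRITICAL" : "OK"), mode, bond, slaves
      exit (bad ? 2 : 0)
    }' "$file"
}
```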
How to check for new user creation on Linux:
Check_user_creation.sh – this script checks whether a new user has been created on your server.
Run the command below to create a reference file listing the existing users:
cut -d ':' -f 1 /etc/passwd > old_user.txt
Once an alert has been generated, copy new_user.txt to old_user.txt to stop further alerts.
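The comparison at the heart of such a check can be sketched with `comm`, which with `-13` prints only the lines unique to the second of two sorted inputs. The function name and the CRITICAL severity are assumptions; old_user.txt is the reference file created above.

```shell
# List users present in PASSWD_FILE but missing from the reference list.
# Usage: check_new_users REFERENCE_FILE [PASSWD_FILE]
check_new_users() {
  local ref=$1 passwd=${2:-/etc/passwd} new_users
  # comm -13 keeps only lines unique to the second (current) list; both inputs are sorted.
  new_users=$(comm -13 <(sort "$ref") <(cut -d ':' -f 1 "$passwd" | sort))
  if [ -n "$new_users" ]; then
    echo "CRITICAL - new user(s): ${new_users//$'\n'/ }"
    return 2
  fi
  echo "OK - no new users"
  return 0
}
```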
How to check open FDs (file descriptors)?
Check_open_fds
Simple bash script.
Works only on Linux.
Checks the total number of open file descriptors and compares it with a percentage of the maximum allowed by the kernel.
Default values are built in, so the script needs no arguments, but you can customize:
-w - warning level in number of FDs
-W - warning level in % of the kernel limit (default = 75%)
-c - critical level in number of FDs
-C - critical level in % of the kernel limit (default = 90%)
The warning levels must be lower than the critical levels (w < c, W < C).
The script uses logic from this page:
http://www.netadmintools.com/art295.html
Output of cat /proc/sys/fs/file-nr on different kernels:
3391 969 52427 # For kernels <= 2.4.x
2323 0 141241 # For kernels >= 2.6.x
Column 1: total allocated file descriptors
Column 2: total free (allocated but unused) file descriptors
Column 3: maximum open file descriptors (LIMIT)
The number of open file descriptors is column 1 - column 2.
On newer kernels column 2 is always 0, so column 1 alone is what we need; the script uses column 1 - column 2 for backward compatibility.
If too many files are open and this causes problems, you can raise the limit (as root; the change does not survive a reboot):
echo "104854" > /proc/sys/fs/file-max
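The column arithmetic above can be sketched as follows: open descriptors are column 1 minus column 2 of file-nr, and the percentage is taken against column 3. The function name, argument order and output wording are assumptions; the 75% and 90% defaults come from the option list above, and the absolute -w/-c levels are left out for brevity.

```shell
# Check open file descriptors against the kernel limit.
# Usage: check_open_fds [WARN_PCT] [CRIT_PCT] [FILE_NR]
check_open_fds() {
  local warn=${1:-75} crit=${2:-90} file=${3:-/proc/sys/fs/file-nr}
  local allocated free max open pct msg
  read -r allocated free max < "$file"
  open=$((allocated - free))   # column 1 - column 2 (column 2 is 0 on 2.6+)
  pct=$((open * 100 / max))    # integer percentage of the kernel limit
  msg="${open} of ${max} file descriptors open (${pct}%)"
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL - $msg"; return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING - $msg"; return 1
  fi
  echo "OK - $msg"
  return 0
}
```

With the 2.4-style sample line above (3391 969 52427), this reports 2422 open descriptors, about 4% of the limit.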
How to check VMware Server 2 virtual machines?
Check_vmware2.pl – Perl script for checking VMware Server 2.x.
The plugin is written for Linux.
How to check when RHEL was last updated?
Check_rhel_lastupdate.sh
vi /etc/nagios/servers/client.cfg
Add the host and service definitions for the client here. They specify which services to monitor; give the hostname and IP address of the machine you want Nagios to monitor.
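The internals of check_rhel_lastupdate.sh are not shown in these notes. As a sketch of how such a check could work, the newest package install time can be read with `rpm -qa --qf '%{INSTALLTIME}\n'` and compared with the current time. This treats the latest package installation as the "last update", which is an approximation, and the function names and day-based thresholds are assumptions.

```shell
# Epoch time of the most recently installed or updated RPM package.
last_rpm_install() {
  rpm -qa --qf '%{INSTALLTIME}\n' | sort -n | tail -n 1
}

# Classify the age of the last update in whole days.
# Usage: update_age_state LAST_EPOCH NOW_EPOCH WARN_DAYS CRIT_DAYS
update_age_state() {
  local last=$1 now=$2 warn=$3 crit=$4 days
  days=$(( (now - last) / 86400 ))
  if [ "$days" -ge "$crit" ]; then
    echo "CRITICAL - last update ${days} days ago"; return 2
  elif [ "$days" -ge "$warn" ]; then
    echo "WARNING - last update ${days} days ago"; return 1
  fi
  echo "OK - last update ${days} days ago"
  return 0
}
```

On an RHEL host this could be called as `update_age_state "$(last_rpm_install)" "$(date +%s)" 30 60`.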
Server Monitoring With Nagios
Capabilities
Nagios is recognized as the top solution to monitor servers in a variety of different ways. Server
monitoring is made easy in Nagios because of the flexibility to monitor your servers with and without
agents. With over 3500 different addons available to monitor your servers, the community at the Nagios
Exchange has left no stone unturned.
Nagios is fully capable of monitoring Windows servers, Linux servers, Unix servers, Solaris, AIX, HP-UX,
Mac OS X, and more.
These Nagios solutions provide server monitoring capabilities and benefits:
Nagios XI
Nagios Core
Windows Monitoring With Nagios
Nagios for Windows Performance Monitoring: Capabilities
Use Nagios for Windows network monitoring and receive complete monitoring of Microsoft Windows
desktop and server operating systems – including system metrics, service states, process states,
performance counters, event logs, applications (IIS, Exchange, etc.), services (Active Directory, DHCP,
etc.), and more.
Run reports on disk usage, receive alerts based on CPU trends, and track RAM utilization.
Log network interface statistics and be notified if certain adapters are handling unexpected bandwidth over certain
periods of time.
Gain insight and run reports off of granular performance data from Windows based operating systems.
Be notified if a process is either running, not running, or running too many instances.
Monitor mission critical Windows services in real-time.
Nagios provides a powerful network monitor tool for all versions of Windows.
Benefits
Implementing effective Windows monitoring with Nagios offers the following benefits:
Increased server, services, and application availability
Fast detection of network outages and protocol failures
Fast detection of failed services, processes and batch jobs
Solutions
These Nagios solutions provide Windows monitoring capabilities and benefits:
Nagios XI
Nagios Core
Linux Monitoring With Nagios
Capabilities
Nagios provides complete monitoring of Linux operating systems and distributions – including operating
system metrics, service state, process state, file system usage, and more. When you use Nagios to
monitor your Linux environment, you’re using one of the most powerful Linux monitoring tools on the
planet.
Benefits
Implementing effective Linux monitoring with Nagios offers the following benefits:
Increased server, services, and application availability
Fast detection of network outages and protocol failures
Nagios is the Linux Monitoring Tool.
Monitoring Linux Using SNMP
Learn how to monitor Linux machines with Nagios XI using SNMP. SNMP is an “agentless” method of
monitoring network devices and servers, and is often preferable to installing dedicated agents on target
machines.
Installing the XI Linux Agent
This document describes how to install the Linux monitoring agent on target RHEL, CentOS, Fedora,
SLES, OpenSUSE, Ubuntu, and Debian Linux servers. Other Linux distributions may be added in the
future.