Nagios Log Server
Practical Experience
Dave Williams
| 31-07-2015 | Dave Williams | © Atos
GB | Managed Services | TTS
Agenda
▶ Background
▶ Why choose Nagios Log Server
▶ Implementation
▶ Source Configuration
▶ Useful things to know
▶ Initial Dashboards
▶ Final Dashboards
▶ System Performance
▶ Conclusions
Background
▶UK based
– Mainframe (IBM & Honeywell)
– Unix (HP-UX, AIX, Solaris)
– Linux (RedHat, SLES, Debian)
– Network (CASE, 3COM, CISCO)
▶Working for Atos
– French Outsourcing Company
– Mainframes, Unix, HPC, Security, Managed Services, Advisory Services
Background
▶ System Monitoring
– OpenView
– Netview
– Open Master
▶ Open Source Monitoring
– NetSaint on AIX
– Nagios
– Nagios XI
Why choose Nagios Log Server?
▶ Needed a log server of some nature
▶ Already built an Elasticsearch & Logstash system (ELK without Kibana) by hand
▶ Used Splunk in a previous life to good effect
▶ Nagios Log Server was announced last year, after Ethan and others had taken note
▶ Seemed to be a ‘cost effective’ easy build option
▶ Included the authentication & access control necessary for a Managed Services environment.
Implementation
▶ Because we use CentOS, installed from source
– no great issues; the ntp requirement in the install script was overcome.
• Complete!
• 12 Aug 18:40:02 ntpdate[2930]: no server suitable for synchronization found
• ===================
• INSTALLATION ERROR!
• ===================
• Installation step failed - exiting.
• Check for error messages in the install log (install.log).
• If you require assistance in resolving the issue, please include install.log
• in your communications with Nagios Enterprises technical support.
Implementation
• The step that failed was: 'prereqs'
• # Set date/time because ssl certificates can be in the future... (fix for pypi and get-pip)
• # ntpdate -u pool.ntp.org
▶ Easily able to move data storage to a nominated filesystem
Implementation
▶ Connecting a new instance to the cluster :
– really is as simple as the manual describes
• install on new host
• connect to the web interface
• enter IP address / name of original cluster node
• enter Cluster ID of the original system
– Finish Installation.
Underlying Structure
[Diagram: Logstash on Server 1 … Server N pushes data into the Elasticsearch Cluster, which is queried by Kibana]
Source Configuration
▶ Creation of feeds is straightforward.
– First syslog, using remote syslog to accept other systems' data
– SNMP traps are also recorded, because SNMPTT writes them to syslog
– Could use Eventlog (NXLog) for Windows in future
▶ VMware logs, from ESXi not the VMs:
– Add Input:
    udp {
      type => 'esxilogs'
      port => 1514
    }
– Save and apply, adjust iptables if required
– Follow this VMware configuration guide to set up your ESXi hosts to log to udp://nagios.log.server.ip:1514
  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007329
– Or read https://assets.nagios.com/downloads/nagios-log-server/docs/Sending-ESXi-Logs-To-Nagios-Log-Server.pdf
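Before touching the ESXi hosts it can help to confirm that the new UDP input is actually listening. A minimal Python sketch, assuming the input above is on UDP port 1514; the helper names, the tag and the message text are hypothetical and only for illustration, and the line is a bare syslog-style message with a priority prefix rather than a full ESXi log record:

```python
import socket


def build_syslog_line(message, tag="esxi-test", facility=1, severity=5):
    """Build a bare syslog-style line: <PRI>tag: message (user.notice -> 13)."""
    pri = facility * 8 + severity
    return "<%d>%s: %s" % (pri, tag, message)


def send_test_message(host, port=1514, message="hello from test sender"):
    """Send one UDP datagram to the Logstash input and return the payload sent."""
    payload = build_syslog_line(message).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload


if __name__ == "__main__":
    # Only prints the line; call send_test_message("<log server address>") to send it.
    print(build_syslog_line("hello from test sender"))
```

If the input is working, the message should show up in the dashboard with type esxilogs shortly after calling send_test_message with the log server's address.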
Source Configuration
For NetFlow use this:
Logstash has native NetFlow v5 and v9 codecs. It can't handle high volume (I'm guessing no more than a few hundred flows per second).
    udp {
      host => "0.0.0.0"
      port => 2055
      codec => netflow { cache_ttl => 1 versions => [ 5, 9 ] }
      type => "netflow"
    }
– Save and apply, adjust iptables if required
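To exercise the NetFlow input without a real exporter, a single NetFlow v5 datagram can be built by hand. A minimal Python sketch, assuming the standard v5 layout (24-byte header followed by 48-byte flow records); the addresses, ports and counters are made-up test values, and build_v5_packet is a hypothetical helper name:

```python
import socket
import struct
import time

HEADER_FMT = "!HHIIIIBBH"                  # 24-byte NetFlow v5 header
RECORD_FMT = "!4s4s4sHHIIIIHHBBBBHHBBH"    # 48-byte NetFlow v5 flow record


def build_v5_packet(src="10.0.0.1", dst="10.0.0.2", sport=12345, dport=80,
                    packets=10, octets=1200, now=None):
    """Return one NetFlow v5 datagram carrying a single TCP flow record."""
    now = int(time.time()) if now is None else now
    header = struct.pack(
        HEADER_FMT,
        5,          # version
        1,          # number of flow records in this datagram
        60000,      # sys_uptime in milliseconds
        now,        # unix_secs
        0,          # unix_nsecs
        0,          # flow_sequence
        0, 0,       # engine_type, engine_id
        0,          # sampling_interval
    )
    record = struct.pack(
        RECORD_FMT,
        socket.inet_aton(src),
        socket.inet_aton(dst),
        socket.inet_aton("0.0.0.0"),   # next hop
        0, 0,                          # input / output interface index
        packets, octets,
        1000, 2000,                    # first / last seen (uptime ms)
        sport, dport,
        0,                             # pad1
        0x18,                          # TCP flags (PSH + ACK)
        6,                             # protocol (TCP)
        0,                             # type of service
        0, 0,                          # source / destination AS
        24, 24,                        # source / destination mask bits
        0,                             # pad2
    )
    return header + record


if __name__ == "__main__":
    print("datagram length:", len(build_v5_packet()))
```

Sending the returned bytes with a UDP socket to port 2055 on the log server should produce one netflow event; this is a test aid only and says nothing about how the codec behaves under load.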
Source Configuration (Pi)
http://www.paluch.biz/blog/134-capturing-and-visualizing-sensor-data-using-the-elk-stack.html
▶ IoT (Internet of Things) simple solution:
– RasPi distance sensor:
– The Raspberry Pi regularly sends its data to Logstash over the TCP input as JSON. JSON is the simplest data format available on IoT platforms.
    input {
      tcp {
        port => 9400
        codec => "json_lines"
      }
    }
    output {
      elasticsearch_http {
        host => "localhost"
        port => 9200
        index => "distance-%{+YYYY.MM.dd}"
      }
    }
    import socket
    import json
    import time
    from distancemeter import get_distance, cleanup

    # Logstash TCP/JSON host
    JSON_PORT = 9400
    JSON_HOST = '192.168.55.34'

    if __name__ == '__main__':
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((JSON_HOST, JSON_PORT))
            while True:
                distance = get_distance()
                data = {'message': 'distance %.1f cm' % distance,
                        'distance': distance,
                        'hostname': socket.gethostname()}
                # json_lines expects one JSON document per line
                s.sendall((json.dumps(data) + '\n').encode('utf-8'))
                print("Received distance = %.1f cm" % distance)
                time.sleep(0.2)
        # interrupt
        except KeyboardInterrupt:
            print("Program interrupted")
Source Configuration (Pi)
http://www.paluch.biz/blog/134-capturing-and-visualizing-sensor-data-using-the-elk-stack.html
Source Configuration (The Force Awakens)
Useful things to know
▶ How do I install Logstash plugins ?
– /usr/local/nagioslogserver/logstash/bin/plugin install logstash-codec-cef
– (Installs ArcSight logfile handler…)
▶ Check the latest upgrade documentation for how to pause shard allocation:
– https://assets.nagios.com/downloads/nagios-log-server/docs/Upgrade-Instructions-For-Nagios-Log-Server.pdf
– For large clusters this makes a real difference to how long a rolling update takes
▶ One of my favourite filters:
– if [severity_label] == "Notice" and [program] == "sudo" {
– drop {}
– }
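The filter above throws away every event that is both Notice severity and from sudo, and keeps everything else. The same logic as a small Python sketch, purely to illustrate the condition; the sample events are invented, and should_drop is a hypothetical name, not part of Logstash:

```python
def should_drop(event):
    """Mirror of the Logstash conditional: Notice-level messages from sudo."""
    return (event.get("severity_label") == "Notice"
            and event.get("program") == "sudo")


events = [
    {"severity_label": "Notice", "program": "sudo", "message": "session opened"},
    {"severity_label": "Error", "program": "sudo", "message": "auth failure"},
    {"severity_label": "Notice", "program": "sshd", "message": "Accepted publickey"},
]

# Only the first event satisfies both conditions, so two events remain.
kept = [e for e in events if not should_drop(e)]

if __name__ == "__main__":
    for e in kept:
        print(e["program"], e["severity_label"], e["message"])
```

Note that both conditions must hold: sudo errors and Notice messages from other programs still reach Elasticsearch.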
Useful things to know
▶ Get used to looking at curl -XGET 'http://localhost:9200/'
▶ Need the cluster state?
– # curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "80e9022e-f73f-429e-8927-xxxxxxxxxx",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 86,
  "active_shards" : 136,
  "relocating_shards" : 0,
  "initializing_shards" : 6,
  "unassigned_shards" : 30
}
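The health document is easy to turn into a Nagios-style result, for example in a plugin run locally through NRPE. A minimal Python sketch, assuming a mapping of green to OK, yellow to WARNING and red to CRITICAL; that mapping and the function names are illustrative choices, not something shipped with Nagios Log Server:

```python
import json
from urllib.request import urlopen

# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
NAGIOS_CODES = {"green": 0, "yellow": 1, "red": 2}
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def fetch_health(url="http://localhost:9200/_cluster/health"):
    """Fetch the cluster health document from a local Elasticsearch node."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarise(health):
    """Return (exit_code, status_line) for a cluster health document."""
    status = health.get("status")
    code = NAGIOS_CODES.get(status, 3)
    line = "%s - status=%s nodes=%d unassigned_shards=%d" % (
        LABELS[code], status,
        health.get("number_of_nodes", 0),
        health.get("unassigned_shards", 0))
    return code, line


# Values copied from the slide output above.
SAMPLE = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_shards": 136,
    "initializing_shards": 6,
    "unassigned_shards": 30,
}

if __name__ == "__main__":
    code, line = summarise(SAMPLE)
    print(line)
```

A real plugin would call summarise(fetch_health()) and exit with the returned code; the sample is used here so the sketch runs without a cluster.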
Useful things to know
▶ Monitoring the Nagios Log Server
– Other presentations will cover this topic – see Eric Loyd, Track 1 @ 2:30 today
▶ But mainly query :9200 locally (via NRPE) and then use check_procs for the appropriate processes.
▶ To uninstall manually :-
– Stop all of the relevant NLS processes (elasticsearch, logstash, and httpd)
and remove the following directories:
– rm -rf /usr/local/nagioslogserver
– rm -rf /var/www/html/nagioslogserver
– You can now do a ./fullinstall
Useful things to know
▶ If you run equipment that has to output syslog on port 514 then Log Server can cope (privileged port access); NetApp is an example
– There’s a document for this! https://assets.nagios.com/downloads/nagios-log-server/docs/Listening-On-Privileged-Ports-With-Nagios-Log-Server.pdf
– You can change logstash to run as the root user.
– Open /etc/sysconfig/logstash and find the line: LS_USER=nagios
– Change this line to read LS_USER=root
– Restart the logstash service: # service logstash restart
Useful things to know
▶ Alternative method of log shipping :-
– Was lumberjack but now logstash-forwarder (still lumberjack protocol )
• Encrypted shipping of compressed logs
• Low impact compared to a full Logstash install
• Use self signed certificates.
• Runs in EC2 micro instances
▶ CentOS 6
– wget http://packages.elasticsearch.org/logstashforwarder/centos/logstash-forwarder-0.3.1-1.x86_64.rpm
– rpm -ivh logstash-forwarder-0.3.1-1.x86_64.rpm
▶ CentOS 5
– wget http://download.elasticsearch.org/logstash-forwarder/packages/logstash-forwarder-0.3.1-1.x86_64.rpm
– rpm -ivh logstash-forwarder-0.3.1-1.x86_64.rpm
Useful things to know
▶ Logstash plugins – over 180 at https://github.com/logstash-plugins
– Nice thing to know:
    output {
      if [type] == "syslog"
         and [program] == "jenkins"
         and [job] == "Install on Cluster"
         and "_grokparsefailure" not in [tags]
      {
        nagios_nsca {
          host => "nagios.example.com"
          port => 5667
          send_nsca_config => "/etc/send_nsca.cfg"
          message_format => "%{job} %{repo}"
          nagios_host => "jenkins"
          nagios_service => "deployed %{repo}"
          nagios_status => "2"
        }
      } # if type=syslog, program=jenkins, job="Install on Cluster"
    } # output
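The output above does two things: it selects events with a four-part condition, and it fills %{field} references from the matching event before handing the result to NSCA. A Python sketch of both steps for illustration only; the sample event and its repo value are invented, and matches and expand are hypothetical names rather than Logstash functions:

```python
import re

FIELD_REF = re.compile(r"%\{([^}]+)\}")


def matches(event):
    """The same four-part condition as the Logstash 'if' above."""
    return (event.get("type") == "syslog"
            and event.get("program") == "jenkins"
            and event.get("job") == "Install on Cluster"
            and "_grokparsefailure" not in event.get("tags", []))


def expand(template, event):
    """Replace each %{field} with the event's value; unknown fields stay as written."""
    return FIELD_REF.sub(
        lambda m: str(event.get(m.group(1), m.group(0))), template)


event = {
    "type": "syslog",
    "program": "jenkins",
    "job": "Install on Cluster",
    "repo": "webapp",          # invented value for the example
    "tags": [],
}

if __name__ == "__main__":
    if matches(event):
        print("service:", expand("deployed %{repo}", event))
        print("message:", expand("%{job} %{repo}", event))
```

With the sample event the passive check would be submitted for host jenkins, service "deployed webapp", with status 2 (CRITICAL) as set by nagios_status.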
Initial Dashboards
▶ Apache dashboard :-
Hmm – what are the 404s?
Initial Dashboard
Initial Dashboards
▶ Zoom in by clicking on the 404 part of the Pie chart :-
Ah ! A good idea to find win40.jpg then.
Final Dashboards
Final Dashboards
Performance
▶ A good setting to help control ES memory usage is the indices field data cache size. Limiting this cache makes sense because you rarely need to retrieve logs that are older than a few days. By default the cache is unbounded, so ES holds field data for old indices in memory and never lets it go. So unless you have unlimited memory, it makes sense to limit the cache in this scenario.
▶ To limit the cache size, simply add the following value anywhere in your custom elasticsearch.yml configuration file. This setting, together with adjusting the Java heap size, should be enough to get started, but there are a few other things that might be worth checking.
▶ indices.fielddata.cache.size: 40%
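The percentage is taken relative to the node's Java heap, so the absolute limit depends on how the heap is sized. A small Python sketch of the arithmetic, assuming heap sizes expressed in GiB; fielddata_cache_bytes is a hypothetical helper for illustration:

```python
GIB = 1024 ** 3


def fielddata_cache_bytes(heap_gib, percent=40):
    """Bytes available to the field data cache for a given heap and percentage."""
    return int(heap_gib * GIB * percent / 100)


if __name__ == "__main__":
    for heap in (4, 16, 31):
        limit = fielddata_cache_bytes(heap)
        print("heap %2d GiB -> cache limit %.1f GiB" % (heap, limit / GIB))
```

For example, a 10 GiB heap with the 40% setting caps the cache at 4 GiB, leaving the rest of the heap for indexing and queries.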
Performance
▶ Another idea worth looking at for an easy performance boost is disabling swap if it has been enabled. In most cloud environments and images swap is turned off, but it is always a setting worth checking.
▶ To bypass the OS swap setting you can configure ES to lock its memory by adding the following to your elasticsearch.yml configuration file.
• bootstrap.mlockall: true
– To check that this value has been configured properly you can run this command.
– curl http://localhost:9200/_nodes/process?pretty
– ES may log a memory warning at startup (e.g. "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit)."). The warning means the lock was not obtained, so confirm that the command above reports "mlockall" : true; if it does not, raise the memlock limit at the OS level.
▶ CentOS /etc/sysctl.conf:
– fs.file-max = 16384
▶ CentOS /etc/security/limits.conf:
– * - nofile 16384
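The _nodes/process call returns one entry per node, each with an mlockall flag in its process section, so the memory-lock check is easy to automate across a cluster. A minimal Python sketch, assuming that response shape; the sample document and node names are invented, and nodes_without_mlockall is a hypothetical helper:

```python
def nodes_without_mlockall(nodes_process):
    """Return the names of nodes that do not report mlockall: true."""
    missing = []
    for node_id, node in nodes_process.get("nodes", {}).items():
        if not node.get("process", {}).get("mlockall", False):
            missing.append(node.get("name", node_id))
    return sorted(missing)


# Trimmed, invented example of a _nodes/process response.
SAMPLE = {
    "nodes": {
        "a1": {"name": "node-1", "process": {"mlockall": True}},
        "b2": {"name": "node-2", "process": {"mlockall": False}},
    }
}

if __name__ == "__main__":
    bad = nodes_without_mlockall(SAMPLE)
    print("nodes without locked memory:", ", ".join(bad) or "none")
```

Any node listed here started without its memory locked and can still be swapped out, whatever elasticsearch.yml says.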
Performance
▶ Rules of thumb :-
– Due to issues with JVM heap size, individual Elasticsearch nodes don't scale well
beyond 64GB of RAM. After reaching 64GB of RAM (with 31GB allocated to
the Java heap), you should scale horizontally rather than vertically.
– Elasticsearch has a lot of optimizations built around fast retrieval from disk,
and a lot of knobs you can tweak to ensure that the most frequently searched
indices live on SSD.
– With respect to the concern about high-volume indexing causing search
performance problems: if this is a problem you can use index routing to help
by ensuring that data is indexed on nodes with the fastest disk (say SSD in
RAID 0), then moved to nodes with spinning disk. If your cluster is search-heavy
you could also increase the number of replica shards, which requires
more storage but decreases search time.
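The first rule of thumb can be written down as a one-line sizing function. A Python sketch, assuming the commonly quoted guideline of giving the heap half of the machine's RAM; only the 64GB RAM / 31GB heap pair comes from the slide, and recommended_heap_gb is a hypothetical name:

```python
def recommended_heap_gb(ram_gb, cap_gb=31):
    """Half of RAM for the Java heap, never more than the 31GB ceiling."""
    return min(ram_gb / 2, cap_gb)


if __name__ == "__main__":
    for ram in (8, 16, 32, 64, 128):
        print("%3d GB RAM -> %g GB heap" % (ram, recommended_heap_gb(ram)))
```

Past 64GB of RAM the heap figure stops growing, which is the point at which adding nodes helps more than adding memory to one node.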
Conclusions
▶ Obvious ones first :
– You can’t run this on a RaspberryPi ! (Or maybe you can – ask me outside
this presentation….)
– You need log sources that matter
– You need time to develop filters and alerts that make sense to your
organisation.
▶ Anything can be a logfile
– You can point Logserver at any readable file and parse the content
Questions
Atos, the Atos logo, Atos Consulting, Atos Worldgrid, Worldline,
BlueKiwi, Bull, Canopy the Open Cloud Company, Yunano, Zero Email,
Zero Email Certified and The Zero Email Company are registered
trademarks of the Atos group. July 2015. © 2015 Atos. Confidential
information owned by Atos, to be used by the recipient only. This
document, or any part of it, may not be reproduced, copied, circulated
and/or distributed nor quoted without prior written approval from
Atos.
Thanks
For more information please contact:
T+ 33 1 98765432
M+ 44 (0) 7973226073
dave.2.williams@atos.net
