monitoring services

This is a user contributed tutorial.

Nagios is free, open source software for monitoring hosts, services and networks. It provides an extensible framework that can monitor almost anything through plugins. Some of the items that can be monitored with Nagios plugins are listed below.

=> Disk space usage of remote Linux and Windows servers
=> CPU usage
=> Memory usage
=> Hardware temperature
=> VPN tunnels
=> Routers and switches
=> Databases
=> Network services (DHCP, DNS, LDAP, SMTP etc.)

Nagios configuration is very granular and is managed through the following three categories of configuration files:

  • Nagios server and web console configuration files configure the Nagios server itself, e.g. nagios.cfg and cgi.cfg.
  • Resource files store user-defined macros and sensitive configuration information such as passwords.
  • Object definition configuration files store information about hosts, services, commands, contacts, notification periods, etc.
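
As a rough illustration of the third category, the following shell sketch prints a minimal host and service object definition. The host name and address passed in are placeholders, and the generic-host and generic-service templates and the check_http command are assumptions borrowed from the sample configuration shipped with Nagios; the sketch also assumes those templates supply the remaining required directives.

```shell
#!/bin/sh
# Sketch only: print a minimal Nagios host + service definition.
# Assumes the templates "generic-host" and "generic-service" and the
# command "check_http" already exist in your Nagios object files.

# nagios_host_cfg HOST_NAME ADDRESS
nagios_host_cfg() {
    cat <<EOF
define host {
    use        generic-host
    host_name  $1
    alias      $1
    address    $2
}

define service {
    use                  generic-service
    host_name            $1
    service_description  HTTP
    check_command        check_http
}
EOF
}
```

For example, nagios_host_cfg web01 192.0.2.10 > web01.cfg would write the definitions to a file. That file still has to be referenced from nagios.cfg through a cfg_file or cfg_dir directive before Nagios reads it.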

Nagios has a web front end that displays status. Besides receiving notifications about host and service status through email, SMS, etc., you can view the status of hosts and services in the Nagios web front end. You can project it on a screen in the NOC (Network Operations Center) to show the current status of your whole data center. You can also perform a few actions from the web console, such as disabling and enabling notifications for a specific service. If you have defined the relationships between your hosts properly in the Nagios configuration files, you can use the 3D display view to see a graphical representation of the whole data center. The web front end also provides reporting, where you can view historical data such as the availability of a particular service on a specific host over a period of time.


(Fig. 01 – Nagios web UI displaying status of various services on a Linux host)

Notification in Nagios is defined at a very granular level and covers a wide range of scenarios, including escalation, where a specific contact group is notified if an issue has not been fixed after a certain number of initial notifications. This is very helpful for automatically notifying the management team about a critical service that was not fixed immediately.

Nagios can also be configured in a distributed setup, where data centers in different parts of the world are monitored by local Nagios servers that report status back to a central Nagios server. This is achieved by NSCA (Nagios Service Check Acceptor), which sends monitoring results from the local Nagios server to the central server.
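
To make the NSCA step more concrete, here is a small shell sketch of how a local server might hand one passive service result to the central one. The send_nsca client reads tab-separated lines of host name, service description, return code and plugin output. The host names, service name and config path in the commented usage lines are placeholders, not values from this tutorial.

```shell
#!/bin/sh
# Sketch only: format one passive service check result for send_nsca.
# Fields are separated by tabs: host, service, return code, plugin output.
# Return codes follow the plugin convention:
#   0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN

# nsca_line HOST SERVICE RETURN_CODE OUTPUT
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical usage on the local Nagios server (placeholder names and paths):
#   nsca_line web01 HTTP 2 "Connection refused" |
#       send_nsca -H central.example.com -c /etc/nagios/send_nsca.cfg
```

The central server must be running the NSCA daemon and must have a matching service defined that accepts passive checks; otherwise the submitted result is discarded.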

The following articles from The Geek Stuff blog explain everything required to get a jumpstart on Nagios installation and configuration on Linux. They also explain how to monitor Linux and Windows hosts.

On October 9th, 2007, Guardian Digital announced the newest release of EnGarde Secure Linux: Community 3.0.17 (Version 3.0, Release 17) server edition.

Features:

- Enterprise Reliability and scalability in a Community Platform
- Integrated SELinux policies and Firewall Functionality
- Intrusion detection and Complete Monitoring Services
- Web and Email Security Services
- Quick and easy Network Installation
- Combining the best of Open Source technologies
- Support for TCB, an alternative password shadowing scheme, has been added. This allows most system utilities to work with the least amount of privilege possible and, when properly configured, can allow you to run a system with zero setuid binaries.
- powernowd, a daemon to control the CPU speed and voltage of your server, is also now available. Once properly configured, powernowd can dynamically adjust the speed and voltage of your CPU, via the kernel CPUFreq and sysfs interfaces, to preserve power when it's idle.
- A very early-stage version of Samba 4 is included for users to evaluate.
- And much more

Download EnGarde Secure Linux

=> Visit the official site to download the latest release. Don't forget to check out the EnGarde Secure Linux documentation section, which offers a quick start guide and other howtos.

When you cannot watch your server for service availability yourself, it is better to rely on an automated monitor and restart utility. For the last 4 days I was away from my server, enjoying my vacation. During this time my lighttpd web server died under load, but it was restarted automatically within 2 minutes. I had a utility called monit configured to monitor services on the Linux system. It offers all the features you need for system monitoring and performs error recovery on UNIX-like systems.

Before monit I had my own shell and Perl scripts for monitoring services. If a service failed, the script would try to restart it and send me an automated email. However, monit is a superior solution.
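
For comparison, a home-grown watchdog of that kind can be sketched in a few lines of shell. This is not the author's original script, only a minimal reconstruction of the idea. The function names are made up, and it assumes a mail command is available and that the script runs as root, since kill -0 also fails for processes owned by other users.

```shell
#!/bin/sh
# Sketch only: restart a service when its pid file points at no live process.

# is_running PIDFILE: succeed if the pid file names a live process.
is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# watch_service NAME PIDFILE RESTART_CMD [MAILTO]
# Does nothing if the service is alive; otherwise runs RESTART_CMD and,
# if MAILTO is given, mails a short report (assumes "mail" is installed).
watch_service() {
    name=$1 pidfile=$2 restart_cmd=$3 mailto=${4:-}
    if is_running "$pidfile"; then
        return 0
    fi
    echo "$name is not running, restarting"
    status=0
    sh -c "$restart_cmd" || status=$?
    if [ -n "$mailto" ]; then
        echo "$name was down on $(hostname); restart exit status $status" |
            mail -s "$name restarted" "$mailto"
    fi
    return $status
}
```

Run from cron every couple of minutes, a call such as watch_service lighttpd /var/run/lighttpd.pid "/etc/init.d/lighttpd start" covers the basic case. It only checks that the process exists, though, and has no protocol test or restart limit.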

monit is a utility for managing and monitoring processes, files, directories and devices on a Unix system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations. For example, monit can start a process if it is not running, restart a process if it does not respond and stop a process if it uses too many resources. You may use monit to monitor files, directories and devices for changes, such as timestamp changes, checksum changes or size changes.

Monitoring files, directories and devices on localhost for such changes is also useful for security: you can monitor the MD5 checksum of files that should not change.
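
The idea behind checksum monitoring can be sketched with plain md5sum. This is not monit's own mechanism, only an illustration of what such a check does. The function names are hypothetical and md5sum from GNU coreutils is assumed.

```shell
#!/bin/sh
# Sketch only: detect changes to files that should never change.

# record_baseline BASELINE FILE...: save the current MD5 checksums.
record_baseline() {
    baseline=$1
    shift
    md5sum "$@" > "$baseline"
}

# verify_baseline BASELINE: exit status 0 only if every file still matches.
# md5sum -c prints one OK/FAILED line per file listed in the baseline.
verify_baseline() {
    md5sum -c "$1"
}
```

The baseline file should be kept somewhere an attacker cannot rewrite it, otherwise a modified file and a modified baseline would still match.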

Personally, I always install and configure monit on all boxes which are under my control.

Install monit under Debian or Ubuntu Linux

Use the apt-get command to install monit:
# apt-get install monit
OR
$ sudo apt-get install monit

Install monit under Red Hat Enterprise Linux / CentOS Linux (source code installation)

Many distributions include monit. However, monit is not included in official Red Hat Enterprise Linux. Download the monit source code from the official web site using the wget command:
# cd /opt
# wget http://www.tildeslash.com/monit/dist/monit-4.8.2.tar.gz
Untar monit:
# tar -zxvf monit-4.8.2.tar.gz
# cd monit-4.8.2

Configure and compile monit:

# ./configure
# make

Install monit:

# make install

Copy monit configuration file:

# cp monitrc /etc/monitrc

By default, monit is installed at /usr/local/bin/monit.

How do I configure monit?

The monit configuration file is named monitrc and is located at /etc/monitrc by default. However, each distribution places the file in a different location:
=> Source code installation : /etc/monitrc
=> Debian / Ubuntu Linux installation : /etc/monit/monitrc

Open the monit configuration file and set up the values as follows:
# vi /etc/monitrc

a) Run it as a daemon and check the services (such as web, mysql, sshd) at 2-minute intervals:
set daemon 120

b) Set syslog logging with the 'daemon' facility:
set logfile syslog facility log_daemon

c) Set the mail server name used to send email alerts:
set mailserver mail.cyberciti.biz
Set the email format, such as the from address:
set mail-format {
    from: alert@nixcraft.in
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}

d) Now the most important part: restart the lighttpd or Apache web server if it fails or is killed by the Linux kernel for any reason:
check process lighttpd with pidfile /var/run/lighttpd.pid
    group lighttpd
    start program = "/etc/init.d/lighttpd start"
    stop program = "/etc/init.d/lighttpd stop"
    if failed host 127.0.0.1 port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout

Where,

  • check process lighttpd with pidfile /var/run/lighttpd.pid : Names the service entry (lighttpd) and the pid file monit uses to find the daemon
  • group lighttpd : Puts this entry in a monit service group named lighttpd, so that related entries can be managed together; this is not a Unix group
  • start program = "/etc/init.d/lighttpd start" : Command to start the lighttpd server
  • stop program = "/etc/init.d/lighttpd stop" : Command to stop the lighttpd server
  • if failed host 127.0.0.1 port 80 : Server IP address and port number (80) to test
  • protocol http then restart : If an HTTP test against the above IP and port fails, restart the web server
  • if 5 restarts within 5 cycles then timeout : If monit had to restart the web server 5 times within 5 cycles, give up and stop monitoring it, which avoids an endless restart loop

Here are my MySQL server restart configuration directives:
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysqld start"
    stop program = "/etc/init.d/mysqld stop"
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout

Here are my sshd server configuration directives:
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/sshd start"
    stop program = "/etc/init.d/sshd stop"
    if failed host 127.0.0.1 port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

Here are my Apache server restart configuration directives:
check process httpd with pidfile /var/run/httpd.pid
    group apache
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    if failed host 127.0.0.1 port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout
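
Before starting monit it can help to sanity-check the control file. The sketch below is an assumption-laden helper, not part of monit: the function names are made up and GNU stat is assumed. It first checks that group and others have no access to the file, because monit rejects control files more permissive than 0700, and then asks monit to parse the file with its -t syntax check.

```shell
#!/bin/sh
# Sketch only: sanity-check a monit control file before starting monit.

# monitrc_mode_ok FILE: succeed if group and others have no access.
# Reads the octal mode with GNU stat, e.g. 600 or 700 passes, 644 fails.
monitrc_mode_ok() {
    mode=$(stat -c %a "$1") || return 1
    case $mode in
        0|*00) return 0 ;;
        *)     return 1 ;;
    esac
}

# check_monitrc FILE: check permissions, then let monit parse the file.
# "monit -t" only runs a syntax check; it does not start the daemon.
check_monitrc() {
    monitrc_mode_ok "$1" || { echo "$1: try chmod 600 $1" >&2; return 1; }
    monit -t -c "$1"
}
```

With the source installation described above, this would be called as check_monitrc /etc/monitrc; on Debian or Ubuntu the path would be /etc/monit/monitrc.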

Replace the IP address 127.0.0.1 with your actual IP address. If you are using Debian, just start monit:
# /etc/init.d/monit start

If you are using Red Hat Enterprise Linux, start monit from the /etc/inittab file.
Open the /etc/inittab file:
# vi /etc/inittab
Append the following line:
mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc

Now start monit by telling init to re-read /etc/inittab:
# init q
OR
# telinit q

You can verify that monit has started from the /var/log/messages log file:
# tail -f /var/log/messages
Output:

Nov 21 04:39:21 server monit[8759]: Starting monit daemon
Nov 21 04:39:21 server monit[8759]: Monit started

If lighttpd dies, you will see something like the following in the log file:

Nov 21 04:45:13 server monit[8759]: 'lighttpd' process is not running
Nov 21 04:45:13 server monit[8759]: 'lighttpd' trying to restart
Nov 21 04:45:13 server monit[8759]: 'lighttpd' start: /etc/init.d/lighttpd

You may use monit to monitor daemon processes or similar programs running on localhost or started from the /etc/init.d/ location, such as:
=> Apache Web Server
=> SSH Server
=> Postfix/Sendmail MTA
=> MySQL etc

Further readings