≡ Menu

memory usage

Nagios: System and Network Monitoring Book

Nagios is a popular open source computer system and network monitoring application software. You can easily monitor all your hosts, network equipment and services. It can send alert when things go wrong and again when they get better.

The convenience and reliability that monitoring programs offer system administrators is astounding. Whether at home, commuting, or on vacation, admins can continuously monitor their networks, learning of issues long before they become catastrophes.

Nagios, the most popular open source solution for system and network monitoring, is extremely robust, but it's also intensely complex. This eagerly anticipated revision of the highly acclaimed Nagios: System and Network Monitoring, has been updated to address Nagios 3.0 and will help readers take full advantage of the many powerful features of the new version. Ethan Galstad, the main developer of Nagios, called the first edition of Nagios "incredibly detailed." He went on to say, "I don't think I could have gone into that much detail if I wrote a book myself."

Nagios, which runs on Linux and most *nix variants, can be configured to continuously monitor network services such as SMTP, POP3, HTTP, NNTP, SSH, and FTP. It can also supervise host resources (processor load, disk and memory usage, running processes, log files, and so on) and environmental factors, such as temperature and humidity. Readers of Nagios learn how to:

  • Install and configure the Nagios core, all standard plugins, and selected third-party plugins
  • Configure the notification system
  • Program event handlers to take automatic action when trouble occurs
  • Write Perl plugins to customize Nagios for unique system needs
  • Quickly understand Nagios data using graphing and visualization tools
  • Monitor Windows servers, SAP systems, and databases

This dense, all-inclusive guide to Nagios also contains a chapter that highlights the differences between Nagios versions 2 and 3 and gives practical migration and compatibility tips. Nagios, 2nd Edition is a key resource for any system and network administrator and will ease the pain of network monitoring migraines in no time.

Wolfgang Barth has written several books for professional network administrators, including The Firewall Book (Suse Press), Network Analysis (Suse Press), and Backup Solutions with Linux (Open Source Press). He is a professional system administrator with considerable experience using Nagios.

Book Info

  • Title: Nagios: System and Network Monitoring, 2nd Edition
  • Author: Wolfgang Barth
  • Pub Date: October 2008, 720 pp
  • ISBN 9781593271794, $59.95 USD
  • Download free chapter 18: "NagVis" (PDF)
  • Order info: order@oreilly.com // 1-800-998-9938 // 1-707-827-7000
  • Support nixCraft: Order Nagios: System and Network Monitoring from Amazon.

This is a user contributed tutorial.

Nagios is free, open source host, service and network monitoring services. Nagios provides an extensible framework, that can monitor pretty much anything using plugins. Some of the items that can be monitored using Nagios plugins are listed below.

=> Disk space usage of remote Linux and Windows server
=> CPU Usage
=> Memory usage
=> Hardware Temperature
=> VPN tunnels
=> Router and Switches
=> Databases
=> Network services (DHCP, DNS, LDAP, SMTP etc.)

Nagios Configurations are very granular and managed using following three different category of configuration files:

  • Nagios server and web console configuration files can be used to configure the Nagios server itself. For e.g. Use the nagios.cfg and cgi.cfg
  • Resource files can be used to store user defined macros and sensitive configuration informations such as passwords.
  • Object definition configuration files are used to store information about the hosts, services, commands, contacts, notification period etc.

Nagios has a web front end to display the status. Apart from getting the notification about the hosts and service status through email, SMS etc., you can also see the hosts, services, status through nagios web front end. You can project is on the NOC (Network Operation Center) to view the current status of your whole data center. You can also perform few actions on the web console such as disable and enable notification for a specific service. If you have defined the relationship between your hosts properly in the nagios configuration files, you can use the 3D display view to see a graphical representation of the whole data center visually. This also provides reporting feature where you can view the historic data such as availability of a particular service on a specific host over a period of time.

(Fig. 01 – Nagios web UI displaying status of various services on a Linux host)

Notification process on the Nagios is defined at a very granular level that it covers a wide range of possible scenarios on the notification including escalation process where a specific contact group can be notified if an issues has not been fixed after certain number of initial notifications. This is very helpful to automatically notify the management team about a critical service that was not fixed immediately.

Nagios can also be configured in a distributed setup, where datacenters from different parts of the world can be monitored using local nagios server that can report the status back to a central nagios server. This is achieved by NSCA (Nagios Service Check Acceptor) sending monitoring results from the local nagios server to the central server.

Following articles from The Geek Stuff blog, explains about everything that is required to get a jumpstart on the Nagios installation, configuration on Linux. This also explains about how to monitor Linux and Windows host.

You can find the memory used by a program (process) by looking into /proc directory or using standard command such as ps or top. However, you must calculate all memory usage by hand i.e. add Shared Memory + mapped file + total virtual memory size of the process + Resident Set Size + non-swapped physical memory used by process.

So how do you find the memory used by a process or program under Linux? [click to continue…]

Q. How can I find out Linux Resource utilization using vmstat command? How do I get information about high disk I/O and memory usage?

A. vmstat command reports information about processes, memory, paging, block IO, traps, and cpu activity. However, a real advantage of vmstat command output - is to the point and (concise) easy to read/understand. The output of vmstat command use to help identify system bottlenecks. Please note that Linux vmstat does not count itself as a running process.

Here is an output of vmstat command from my enterprise grade system:
$ vmstat -S M

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
3  0      0   1963    607   2359    0    0     0     0    0     1 32  0 68  0


  • The fist line is nothing but six different categories. The second line gives more information about each category. This second line gives all data you need.
  • -S M: vmstat lets you choose units (k, K, m, M) default is K (1024 bytes) in the default mode. I am using M since this system has over 4 GB memory. Without -M option it will use K as unit

$ vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
3  0      0 2485120 621952 2415368  0    0     0     0    0     1 32  0 68  0

Field Description For Vm Mode

(a) procs is the process-related fields are:

  • r: The number of processes waiting for run time.
  • b: The number of processes in uninterruptible sleep.

(b) memory is the memory-related fields are:

  • swpd: the amount of virtual memory used.
  • free: the amount of idle memory.
  • buff: the amount of memory used as buffers.
  • cache: the amount of memory used as cache.

(c) swap is swap-related fields are:

  • si: Amount of memory swapped in from disk (/s).
  • so: Amount of memory swapped to disk (/s).

(d) io is the I/O-related fields are:

  • bi: Blocks received from a block device (blocks/s).
  • bo: Blocks sent to a block device (blocks/s).

(e) system is the system-related fields are:

  • in: The number of interrupts per second, including the clock.
  • cs: The number of context switches per second.

(f) cpu is the CPU-related fields are:

These are percentages of total CPU time.

  • us: Time spent running non-kernel code. (user time, including nice time)
  • sy: Time spent running kernel code. (system time)
  • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
  • wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.

As you see the first output produced gives averages data since the last reboot. Additional reports give information on a sampling period of length delay. You need to sample data using delays i.e. collect data by setting intervals. For example collect data every 2 seconds (or collect data every 2 second 5 times only):
$ vmstat -S M 2
$ vmstat -S M 2 5

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
3  0      0   1756    607   2359    0    0     0     0    0     1 32  0 68  0
3  0      0   1756    607   2359    0    0     0     0 1018    65 38  0 62  0
3  0      0   1756    607   2359    0    0     0     0 1011    64 37  0 63  0
3  0      0   1756    607   2359    0    0     0    20 1018    72 37  0 63  0
3  0      0   1756    607   2359    0    0     0     0 1012    64 37  0 62  0
3  0      0   1756    607   2359    0    0     0     0 1011    65 38  0 63  0
3  0      0   1995    607   2359    0    0     0     0 1012    62 35  2 63  0
3  0      0   1731    607   2359    0    0     0     0 1012    64 34  3 62  0
3  0      0   1731    607   2359    0    0     0     0 1013    72 38  0 62  0
3  0      0   1731    607   2359    0    0     0     0 1013    63 37  0 63  0

This is what most system administrators do to identify system bottlenecks. I hope all of you find vmstat data is concise and easy to read.

See also:

You can use the watch command to execute a program or shell script periodically, display its output on screen repeatedly. This allows you to watch the program output change over time. By default, the program is run every 2 seconds. This is useful to monitor memory utilization or disk space usage over time without having to look at scrolling output.
[click to continue…]