20 Linux System Monitoring Tools Every SysAdmin Should Know

Posted in categories CentOS, Debian Linux, Howto, Linux, Monitoring, Networking, RedHat/Fedora Linux, Security, Sys admin. Last updated June 27, 2009.

Need to monitor Linux server performance? Try these built-in commands and a few add-on tools. Most Linux distributions ship with plenty of monitoring tools. They provide metrics about system activity that you can use to track down the possible causes of a performance problem. The commands discussed below are among the most basic ones for system analysis and for debugging server issues such as:

  1. Finding out bottlenecks.
  2. Disk (storage) bottlenecks.
  3. CPU and memory bottlenecks.
  4. Network bottlenecks.


#1: top – Process Activity Command

The top program provides a dynamic real-time view of a running system, i.e. actual process activity. By default, it displays the most CPU-intensive tasks running on the server and updates the list every few seconds (the exact default depends on the top version; press d to change it).

Fig.01: Linux top command

Commonly Used Hot Keys

The top command provides several useful hot keys:

Hot Key   Usage
t         Toggles the summary information display.
m         Toggles the memory information display.
A         Sorts the display by top consumers of various system resources. Useful for quick identification of performance-hungry tasks on a system.
f         Enters an interactive configuration screen for top. Helpful for setting up top for a specific task.
o         Enables you to interactively select the ordering within top.
r         Issues the renice command.
k         Issues the kill command.
z         Toggles between color and mono display.

=> Related: How do I Find Out Linux CPU Utilization?

#2: vmstat – System Activity, Hardware and System Information

The command vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity.
# vmstat 3
Sample Outputs:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 2540988 522188 5130400    0    0     2    32    4    2  4  1 96  0  0
 1  0      0 2540988 522188 5130400    0    0     0   720 1199  665  1  0 99  0  0
 0  0      0 2540956 522188 5130400    0    0     0     0 1151 1569  4  1 95  0  0
 0  0      0 2540956 522188 5130500    0    0     0     6 1117  439  1  0 99  0  0
 0  0      0 2540940 522188 5130512    0    0     0   536 1189  932  1  0 98  0  0
 0  0      0 2538444 522188 5130588    0    0     0     0 1187 1417  4  1 96  0  0
 0  0      0 2490060 522188 5130640    0    0     0    18 1253 1123  5  1 94  0  0
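
When you leave vmstat running, the interesting samples are usually the few where the CPU is waiting on I/O (the wa column). Here is a minimal sketch of a filter that keeps only those samples. The function name vmstat_high_iowait and the default limit of 10 are my own assumptions for illustration; the wa column is looked up by name in the header because its position can differ between vmstat versions.

```shell
# Print only the vmstat samples whose I/O wait (wa) is at or above a limit.
# vmstat_high_iowait is a hypothetical helper name, not part of vmstat.
vmstat_high_iowait() {
    awk -v limit="${1:-10}" '
        # Column header line: find the position of the "wa" column by name.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") wa = i; next }
        # Data lines start with a number; keep those with high I/O wait.
        wa && $1 ~ /^[0-9]+$/ && $wa + 0 >= limit { print }
    '
}

# Example: sample every 3 seconds, 10 times, and show samples with wa >= 10.
# vmstat 3 10 | vmstat_high_iowait 10
```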

Display Memory Utilization Slabinfo

# vmstat -m

Get Information About Active / Inactive Memory Pages

# vmstat -a
=> Related: How do I find out Linux Resource utilization to detect system bottlenecks?

#3: w – Find Out Who Is Logged on And What They Are Doing

The w command displays information about the users currently logged on to the machine and their processes.
# w username
# w vivek

Sample Outputs:

 17:58:47 up 5 days, 20:28,  2 users,  load average: 0.36, 0.26, 0.24
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    10.1.3.145       14:55    5.00s  0.04s  0.02s vim /etc/resolv.conf
root     pts/1    10.1.3.145       17:43    0.00s  0.03s  0.00s w

#4: uptime – Tell How Long The System Has Been Running

The uptime command shows how long the server has been running. It prints the current time, how long the system has been up, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
# uptime
Output:

 18:02:41 up 41 days, 23:42,  1 user,  load average: 0.00, 0.00, 0.00

A load of 1 can be considered optimal for a single CPU. The acceptable load varies from system to system: for a single-CPU system a load of 1 to 3, and for SMP systems a load of 6 to 10, might be acceptable.
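
Since the acceptable load depends on how many CPUs the box has, one way to compare servers is to divide the load average by the CPU count. This is a rough sketch, not an official metric: the helper name load_per_cpu is made up for this example, and reading the values from /proc/loadavg and nproc is an assumption about your system.

```shell
# Divide a load average by the CPU count so hosts of different sizes compare.
# load_per_cpu is a hypothetical helper name used only for this sketch.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) exit 1          # refuse to divide by zero
        printf "%.2f\n", load / cpus
    }'
}

# Example on Linux: 1-minute load average from /proc/loadavg, cores from nproc.
# load_per_cpu "$(cut -d " " -f 1 /proc/loadavg)" "$(nproc)"
```

A result well above 1.00 suggests that more tasks are waiting than the CPUs can run at once.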

#5: ps – Displays The Processes

ps command will report a snapshot of the current processes. To select all processes use the -A or -e option:
# ps -A
Sample Outputs:

  PID TTY          TIME CMD
    1 ?        00:00:02 init
    2 ?        00:00:02 migration/0
    3 ?        00:00:01 ksoftirqd/0
    4 ?        00:00:00 watchdog/0
    5 ?        00:00:00 migration/1
    6 ?        00:00:15 ksoftirqd/1
....
.....
 4881 ?        00:53:28 java
 4885 tty1     00:00:00 mingetty
 4886 tty2     00:00:00 mingetty
 4887 tty3     00:00:00 mingetty
 4888 tty4     00:00:00 mingetty
 4891 tty5     00:00:00 mingetty
 4892 tty6     00:00:00 mingetty
 4893 ttyS1    00:00:00 agetty
12853 ?        00:00:00 cifsoplockd
12854 ?        00:00:00 cifsdnotifyd
14231 ?        00:10:34 lighttpd
14232 ?        00:00:00 php-cgi
54981 pts/0    00:00:00 vim
55465 ?        00:00:00 php-cgi
55546 ?        00:00:00 bind9-snmp-stat
55704 pts/1    00:00:00 ps

ps is like top, but it prints a one-time snapshot instead of a live view and can report more detail about each process.

Show Long Format Output

# ps -Al
To turn on extra full mode (it will show command line arguments passed to process):
# ps -AlF

To See Threads (LWP and NLWP)

# ps -eLf

To See Threads After Processes

# ps -AlLm

Print All Processes On The Server

# ps ax
# ps axu

Print A Process Tree

# ps -ejH
# ps axjf
# pstree

Print Security Information

# ps -eo euser,ruser,suser,fuser,f,comm,label
# ps axZ
# ps -eM

See Every Process Running As User Vivek

# ps -U vivek -u vivek u

Set Output In a User-Defined Format

# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
# ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
# ps -eopid,tt,user,fname,tmout,f,wchan

Display Only The Process IDs of Lighttpd

# ps -C lighttpd -o pid=
OR
# pgrep lighttpd
OR
# pgrep -u vivek php-cgi

Display The Name of PID 55977

# ps -p 55977 -o comm=

Find Out The Top 10 Memory Consuming Processes

# ps auxf | sort -nr -k 4 | head -10

Find Out The Top 10 CPU Consuming Processes

# ps auxf | sort -nr -k 3 | head -10
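
The sort pipelines above also sort the ps header line in with the data, so the column titles get lost. Here is a small sketch that keeps the header on top. The function name top_by_column is my own invention, and it assumes the usual ps aux layout where column 3 is %CPU and column 4 is %MEM.

```shell
# Sort "ps aux"-style input by one numeric column, keeping the header on top.
# top_by_column is a hypothetical helper; column 3 is %CPU and 4 is %MEM.
top_by_column() {
    col=$1
    count=${2:-10}
    {
        IFS= read -r header && printf '%s\n' "$header"   # pass header through
        LC_ALL=C sort -k "$col,$col" -nr | head -n "$count"
    }
}

# Example: top 10 memory consumers, then top 10 CPU consumers.
# ps aux | top_by_column 4 10
# ps aux | top_by_column 3 10
```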

#6: free – Memory Usage

The command free displays the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel.
# free
Sample Output:

            total       used       free     shared    buffers     cached
Mem:      12302896    9739664    2563232          0     523124    5154740
-/+ buffers/cache:    4061800    8241096
Swap:      1052248          0    1052248
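
The "-/+ buffers/cache" line is the one to watch, because memory used for buffers and cache can be reclaimed when applications need it. As a sketch of where that number comes from, the same value can be computed from /proc/meminfo as MemTotal minus MemFree, Buffers and Cached. The helper name mem_used_kb is hypothetical, and newer kernels also expose a MemAvailable field that may be a better estimate.

```shell
# Memory in use excluding buffers and page cache, in kB, from meminfo-style data.
# mem_used_kb is a hypothetical helper; it defaults to reading /proc/meminfo.
mem_used_kb() {
    awk '
        /^MemTotal:/ { total   = $2 }
        /^MemFree:/  { free    = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached  = $2 }
        END { printf "%d\n", total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# Example: print the figure for the running system.
# mem_used_kb
```

With the numbers from the sample output above (12302896 total, 2563232 free, 523124 buffers, 5154740 cached) this gives 4061800, which matches the used column of the "-/+ buffers/cache" line.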

=> Related:

  1. Linux Find Out Virtual Memory PAGESIZE
  2. Linux Limit CPU Usage Per Process
  3. How much RAM does my Ubuntu / Fedora Linux desktop PC have?

#7: iostat – Average CPU Load, Disk Activity

The iostat command reports Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions and network filesystems (NFS).
# iostat
Sample Outputs:

Linux 2.6.18-128.1.14.el5 (www03.nixcraft.in) 	06/26/2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.50    0.09    0.51    0.03    0.00   95.86

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              22.04        31.88       512.03   16193351  260102868
sda1              0.00         0.00         0.00       2166        180
sda2             22.04        31.87       512.03   16189010  260102688
sda3              0.00         0.00         0.00       1615          0
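
On a server with many disks and partitions, you may only care about the device doing the most I/O. Here is a rough sketch that picks the row with the highest tps (transfers per second) from iostat output. The name busiest_device is made up for this example, and it assumes tps is the second column, as in the sample above.

```shell
# Print the device with the highest tps value from iostat output on stdin.
# busiest_device is a hypothetical helper; it assumes tps is column 2.
busiest_device() {
    awk '
        /^Device/ { in_table = 1; next }   # device table starts after this header
        NF == 0   { in_table = 0; next }   # a blank line ends the table
        in_table && $2 + 0 > max { max = $2 + 0; dev = $1; tps = $2 }
        END { if (dev != "") print dev, tps }
    '
}

# Example:
# iostat | busiest_device
```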

=> Related: Linux Track NFS Directory / Disk I/O Stats

#8: sar – Collect and Report System Activity

The sar command is used to collect, report, and save system activity information. To see the network counters, enter:
# sar -n DEV | more
To display the network counters from the 24th day of the month:
# sar -n DEV -f /var/log/sa/sa24 | more
You can also display real-time usage using sar:
# sar 4 5
Sample Outputs:

Linux 2.6.18-128.1.14.el5 (www03.nixcraft.in) 		06/26/2009

06:45:12 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
06:45:16 PM       all      2.00      0.00      0.22      0.00      0.00     97.78
06:45:20 PM       all      2.07      0.00      0.38      0.03      0.00     97.52
06:45:24 PM       all      0.94      0.00      0.28      0.00      0.00     98.78
06:45:28 PM       all      1.56      0.00      0.22      0.00      0.00     98.22
06:45:32 PM       all      3.53      0.00      0.25      0.03      0.00     96.19
Average:          all      2.02      0.00      0.27      0.01      0.00     97.70

=> Related: How to collect Linux system utilization data into a file

#9: mpstat – Multiprocessor Usage

The mpstat command displays activities for each available processor, processor 0 being the first one. Use mpstat -P ALL to display the average CPU utilization per processor:
# mpstat -P ALL
Sample Output:

Linux 2.6.18-128.1.14.el5 (www03.nixcraft.in)	 	06/26/2009

06:48:11 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
06:48:11 PM  all    3.50    0.09    0.34    0.03    0.01    0.17    0.00   95.86   1218.04
06:48:11 PM    0    3.44    0.08    0.31    0.02    0.00    0.12    0.00   96.04   1000.31
06:48:11 PM    1    3.10    0.08    0.32    0.09    0.02    0.11    0.00   96.28     34.93
06:48:11 PM    2    4.16    0.11    0.36    0.02    0.00    0.11    0.00   95.25      0.00
06:48:11 PM    3    3.77    0.11    0.38    0.03    0.01    0.24    0.00   95.46     44.80
06:48:11 PM    4    2.96    0.07    0.29    0.04    0.02    0.10    0.00   96.52     25.91
06:48:11 PM    5    3.26    0.08    0.28    0.03    0.01    0.10    0.00   96.23     14.98
06:48:11 PM    6    4.00    0.10    0.34    0.01    0.00    0.13    0.00   95.42      3.75
06:48:11 PM    7    3.30    0.11    0.39    0.03    0.01    0.46    0.00   95.69     76.89
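
With many processors the per-CPU table gets long, so it can help to print only the CPUs that are actually busy. This is a minimal sketch that lists every CPU whose %idle is below a limit. The function name busy_cpus and the default limit of 90 are assumptions of mine; the CPU and %idle columns are located by name in the header, since the column layout differs between sysstat versions.

```shell
# List CPUs whose %idle is below a limit, from "mpstat -P ALL" output on stdin.
# busy_cpus is a hypothetical helper; the default limit of 90 is arbitrary.
busy_cpus() {
    awk -v max_idle="${1:-90}" '
        # Header line: remember which fields hold the CPU id and %idle.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "%idle") idle = i
                if ($i == "CPU") cpu = i
            }
            next
        }
        # Data lines: skip the "all" summary row, print CPU id and %idle.
        idle && cpu && NF >= idle && $cpu != "all" && $idle + 0 < max_idle { print $cpu, $idle }
    '
}

# Example: show CPUs that are less than 50% idle.
# mpstat -P ALL | busy_cpus 50
```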

=> Related: Linux display each multiple SMP CPU processors utilization individually.

#10: pmap – Process Memory Usage

The pmap command reports the memory map of a process. Use it to find the causes of memory bottlenecks.
# pmap -d PID
To display process memory information for pid # 47394, enter:
# pmap -d 47394
Sample Outputs:

47394:   /usr/bin/php-cgi
Address           Kbytes Mode  Offset           Device    Mapping
0000000000400000    2584 r-x-- 0000000000000000 008:00002 php-cgi
0000000000886000     140 rw--- 0000000000286000 008:00002 php-cgi
00000000008a9000      52 rw--- 00000000008a9000 000:00000   [ anon ]
0000000000aa8000      76 rw--- 00000000002a8000 008:00002 php-cgi
000000000f678000    1980 rw--- 000000000f678000 000:00000   [ anon ]
000000314a600000     112 r-x-- 0000000000000000 008:00002 ld-2.5.so
000000314a81b000       4 r---- 000000000001b000 008:00002 ld-2.5.so
000000314a81c000       4 rw--- 000000000001c000 008:00002 ld-2.5.so
000000314aa00000    1328 r-x-- 0000000000000000 008:00002 libc-2.5.so
000000314ab4c000    2048 ----- 000000000014c000 008:00002 libc-2.5.so
.....
......
..
00002af8d48fd000       4 rw--- 0000000000006000 008:00002 xsl.so
00002af8d490c000      40 r-x-- 0000000000000000 008:00002 libnss_files-2.5.so
00002af8d4916000    2044 ----- 000000000000a000 008:00002 libnss_files-2.5.so
00002af8d4b15000       4 r---- 0000000000009000 008:00002 libnss_files-2.5.so
00002af8d4b16000       4 rw--- 000000000000a000 008:00002 libnss_files-2.5.so
00002af8d4b17000  768000 rw-s- 0000000000000000 000:00009 zero (deleted)
00007fffc95fe000      84 rw--- 00007ffffffea000 000:00000   [ stack ]
ffffffffff600000    8192 ----- 0000000000000000 000:00000   [ anon ]
mapped: 933712K    writeable/private: 4304K    shared: 768000K

The last line is very important:

  • mapped: 933712K total amount of memory mapped to files
  • writeable/private: 4304K the amount of private address space
  • shared: 768000K the amount of address space this process is sharing with others
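
The full map can run to hundreds of lines, so a per-mapping total is often easier to read. Here is a sketch that adds up the Kbytes column for each mapping name and sorts the result, largest first. The helper name pmap_by_mapping is hypothetical, and it assumes the pmap -d column layout shown above (Kbytes in column 2, mapping name from column 6 on).

```shell
# Sum the Kbytes column per mapping name from "pmap -d PID" output on stdin.
# pmap_by_mapping is a hypothetical helper written for the layout shown above.
pmap_by_mapping() {
    awk '
        # Mapping rows have a numeric Kbytes value in field 2 and a name from field 6 on.
        NF >= 6 && $2 ~ /^[0-9]+$/ {
            name = $6
            for (i = 7; i <= NF; i++) name = name " " $i
            kbytes[name] += $2
        }
        END { for (name in kbytes) print kbytes[name], name }
    ' | LC_ALL=C sort -nr
}

# Example: the ten largest mappings of pid 47394.
# pmap -d 47394 | pmap_by_mapping | head -10
```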

=> Related: Linux find the memory used by a program / process using pmap command

#11 and #12: netstat and ss – Network Statistics

The netstat command displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. The ss command dumps socket statistics and can show information similar to netstat.
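
A quick way to get a feel for socket statistics is to count TCP connections by state. The sketch below does that for the output of ss -tan (TCP only, all sockets, numeric addresses). The function name count_by_state is my own, and it assumes the state is the first column and the first line is a header.

```shell
# Count sockets per state from "ss -tan" output on stdin, most common first.
# count_by_state is a hypothetical helper; it assumes the state is column 1.
count_by_state() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' |
        LC_ALL=C sort -nr
}

# Example:
# ss -tan | count_by_state
```

A large number of sockets stuck in one state can be a first hint of a network or application problem.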

#13: iptraf – Real-time Network Statistics

The iptraf command is an interactive, colorful, ncurses-based IP LAN monitor. It generates various network statistics including TCP info, UDP counts, ICMP and OSPF information, Ethernet load info, node stats, IP checksum errors, and others. It can provide the following information in an easy-to-read format:

  • Network traffic statistics by TCP connection
  • IP traffic statistics by network interface
  • Network traffic statistics by protocol
  • Network traffic statistics by TCP/UDP port and by packet size
  • Network traffic statistics by Layer2 address
Fig.02: General interface statistics: IP traffic statistics by network interface
Fig.03 Network traffic statistics by TCP connection

#14: tcpdump – Detailed Network Traffic Analysis

tcpdump is a simple command that dumps traffic on a network. However, you need a good understanding of the TCP/IP protocols to use this tool. For example, to display traffic info about DNS, enter:
# tcpdump -i eth1 'udp port 53'
To display all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain data, not, for example, SYN and FIN packets and ACK-only packets, enter:
# tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
To display all FTP sessions to 202.54.1.5, enter:
# tcpdump -i eth1 'dst 202.54.1.5 and (port 21 or 20)'
To display all HTTP sessions to 192.168.1.5:
# tcpdump -ni eth0 'dst 192.168.1.5 and tcp and port http'
To save the packets to a file that you can open in wireshark for detailed analysis, enter:
# tcpdump -n -i eth1 -s 0 -w output.txt src or dst port 80

#15: strace – System Calls

strace traces system calls and signals. This is useful for debugging web server and other server problems. See how to use strace to trace a process and see what it is doing.

#16: /proc file system – Various Kernel Statistics

The /proc file system provides detailed information about various hardware devices and other Linux kernel information. See the Linux kernel /proc documentation for further details. Common /proc examples:
# cat /proc/cpuinfo
# cat /proc/meminfo
# cat /proc/zoneinfo
# cat /proc/mounts
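
Because /proc files are plain text, they are easy to use from scripts. As a small sketch, the function below counts the processors listed in /proc/cpuinfo. The name count_cpus is made up for this example, and it assumes each CPU has a line starting with "processor", which is the layout on x86 systems and may differ on other architectures.

```shell
# Count the "processor" entries in a cpuinfo-style file.
# count_cpus is a hypothetical helper; it defaults to /proc/cpuinfo.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example:
# count_cpus
```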

#17: Nagios – Server And Network Monitoring

Nagios is a popular open source computer system and network monitoring application. You can easily monitor all your hosts, network equipment and services. It can send an alert when things go wrong and again when they get better. FAN is “Fully Automated Nagios”. FAN’s goal is to provide a Nagios installation that includes most tools provided by the Nagios community. FAN provides a CD-ROM image in the standard ISO format, making it easy to install a Nagios server. In addition, a wide range of tools is included in the distribution to improve the user experience around Nagios.

#18: Cacti – Web-based Monitoring Tool

Cacti is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices. It can provide data about network, CPU, memory, logged in users, Apache, DNS servers and much more. See how to install and configure Cacti network graphing tool under CentOS / RHEL.

#19: KDE System Guard – Real-time Systems Reporting and Graphing

KSysguard is a network-enabled task and system monitor application for the KDE desktop. This tool can be run over an ssh session. It provides lots of features such as a client/server architecture that enables monitoring of local and remote hosts. The graphical front end uses so-called sensors to retrieve the information it displays. A sensor can return simple values or more complex information like tables. For each type of information, one or more displays are provided. Displays are organized in worksheets that can be saved and loaded independently from each other. So, KSysguard is not only a simple task manager but also a very powerful tool to control large server farms.

Fig.05 KDE System Guard {Image credit: Wikipedia}

See the KSysguard handbook for detailed usage.

#20: Gnome System Monitor – Real-time Systems Reporting and Graphing

The System Monitor application enables you to display basic system information and monitor system processes, usage of system resources, and file systems. You can also use System Monitor to modify the behavior of your system. Although not as powerful as the KDE System Guard, it provides the basic information which may be useful for new users:

  • Displays various basic information about the computer’s hardware and software.
  • Linux Kernel version
  • GNOME version
  • Hardware
  • Installed memory
  • Processors and speeds
  • System Status
  • Currently available disk space
  • Processes
  • Memory and swap space
  • Network usage
  • File Systems
  • Lists all mounted filesystems along with basic information about each.
Fig.06 The Gnome System Monitor application

Bonus: Additional Tools

A few more tools:

  • nmap – scan your server for open ports.
  • lsof – list open files, network connections and much more.
  • ntop web based tool – ntop is the best tool to see network usage in a way similar to what top command does for processes i.e. it is network traffic monitoring software. You can see network status, protocol wise distribution of traffic for UDP, TCP, DNS, HTTP and other protocols.
  • Conky – Another good monitoring tool for the X Window System. It is highly configurable and is able to monitor many system variables including the status of the CPU, memory, swap space, disk storage, temperatures, processes, network interfaces, battery power, system messages, e-mail inboxes etc.
  • GKrellM – It can be used to monitor the status of CPUs, main memory, hard disks, network interfaces, local and remote mailboxes, and many other things.
  • vnstat – vnStat is a console-based network traffic monitor. It keeps a log of hourly, daily and monthly network traffic for the selected interface(s).
  • htop – htop is an enhanced version of top, the interactive process viewer, which can display the list of processes in a tree form.
  • mtr – mtr combines the functionality of the traceroute and ping programs in a single network diagnostic tool.

Did I miss something? Please add your favorite system monitoring tool in the comments.

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating system/Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector. Follow him on Twitter, Facebook, Google+.

356 comments

    1. (quote)
      Pretty much common knowledge. . . .
      (/quote)

      Yea, right!
      I’ve been around the block two or three times – and a number of these are familiar to me – but some of the ways they’re used here were not. Also a fair number of these were absolutely brand-new – and they look damned useful!

      I am so going to book-mark this page it isn’t funny! It’s likely that I will want to spread this URL around like the Flu as well. . . . 😀

      @Vivek
      *GREAT* list – for those of us who are mere mortals. . . .

      Jim (JR)

  1. Nice list. For systems with just a few nodes I recommend Munin. It’s easy to install and configure. My favorite tool for monitoring a linux cluster is Ganglia.

    P.S. I think you should change this “#2: vmstat – Network traffic statistics by TCP connection …”

        1. Most of the time that happens if the fsck operation requires human interaction, which the boot fsck doesn’t have. Just restart it, if you don’t normally get a grub delay then hold down the shift key to get one, if you do then just select recovery mode, or single user mode, it depends on your distro. It’s the same thing in all, just tripping single user mode with a kernel arg, but it will let you boot, and run fsck on unmounted partitions. If it is your root partition, you may need to boot from an external medium, unless you have a kick ass initrd, lol.

      1. it is you’re – you are a tool. Please when randomly slamming someones post to feel better about yourself, at least you proper grammer. Then at least you sound like an intelligent a55h0le. 😛

        1. Sarcastic pro’s, N00bs, flaming, harsh language, grammar nazis. All we need now is a Hitler comparison and we have the full set. Who’s up for a ban?

          Also: before stuff can become common knowledge you’ll first have to encounter it at least once. Like here in this nice list. Thanks for sharing!

        2. Let me go ahead and re-write your comment, grammer nazi. It seems you have quite a few errors.

          “It is ‘you’re–you are’ a tool. Please, when randomly slamming someone’s post to feel better about yourself, at least use proper grammar. Then, at least, you sound like an intelligent a55h0le.”

          In the future, I would recommend proof-reading your own posts before you arrogantly correct others. I counted at least six mistakes in your “correction.” Have a nice day! 🙂

  2. I can see that the best tool to monitor processes, CPU, memory and disk bottlenecks at once is atop …

    But the tool itself can cause a lot of trouble in heavily loaded servers and it enables process accounting and has a service running all the time …

    To use it efficiently on RHEL , CentOS;
    1- install rpmforge repo
    2- # yum install atop
    3- # killall atop
    4- # chkconfig atop off
    5- # rm -rf /tmp/atop.d/ /var/log/atop/
    6- then don’t directly run “atop” command , but instead run it as follows;
    # ATOPACCT='' atop

    This tool has saved me hundreds of hours really! and helped me to diagnose bottlenecks and solve them that couldn’t otherwise be easily detected and would need many different tools

  3. You probably wanna add the iftop tool, it’s really simple and light, very useful when you need to have a last moment remote access to a server to see how the traffic is going.

  4. maybe it’s a typo too, but the title should be :
    “.. Tools Every SysAdmin MUST Know”
    and still, this is advanced user knowledge, at most. I would not trust a sysadmin that knows so few. And..

  5. Hi guys,

    good list – and some great submitted pointers to other useful tools.

    to those carp-ing on about typo’s – give us all a break. you’ve never made a typo? ever?

    Idea: How ’bout those who have never *ever* made an error in typing text be the first one(s) to give people grief about making a typo?

    I _used_ to be a real PITA about this; then I grew up.

    The purpose of this blog, and other forms of communication, is to *communicate* concepts and ideas. *If* you have received those clearly – in spite of the typos – then the purpose has been fulfilled.

    /me gets down off his soapbox

    .h

    1. I totally second that!
      WTF is up with people making such a big deal about spelling? I could understand if the complaints were in regards to a misspelling of a code-example, but if the language is coherent enough to get the idea across, then that’s all that really matters.

  6. Excellent list. Like Amr El-Sharnoby above, I also find atop indispensable and think it must be installed on every system.

    In addition I would like to add iotop to monitor disk usage per process and jnettop to very easily monitor bandwidth allocation between connections on a Linux system.

  7. One tool which seems to be missing from this list is LTTng. It is a system-wide tracing tool which helps understanding complex performance problems in multithreaded, multiprocess applications involving many userspace-kernel interactions.

    The project is available at http://www.lttng.org. Recent SuSE distributions, WindRiver, Monta Vista and STLinux offer the tracer as distribution packages. The standard way to use it is to install a patched kernel though. It comes with a trace analyzer, LTTV, which provides nice view of the overall system behavior.

    Mathieu

  8. Dude you forgot the most important of ALL!

    net-snmpd

    With it you can collect vast amounts of information. Then with snmpwalk and scripts you can create your own web NMS to collect simple information like ping, disk space, services down.

  9. Nice summary article.

    If your “system” is large and/or distributed, and the performance issues you’re tackling are complex, you may wish to explore Performance Co-Pilot (PCP). It unifies all of the performance data from the tools you’ve mentioned (and more), can be extended to include new applications and service layers, works across the network and for clusters and provides both real-time and retrospective analysis.

    See http://www.oss.sgi.com/projects/pcp

    PCP is included in the Debian-based and SUSE distributions and is likely to appear in the RH distributions in the future.

    As a bonus, PCP also works for monitoring non-Linux platforms (Windows and some of the Unix derivatives).

  10. Great article, many great suggestions.

    Was surprised not to see these among the suggestions:

    bmon – graphs/tracks network activity/bandwidth real time.
    etherape – great visual indicator of what traffic is going where on the network
    wireshark – tcpdump on steroids.
    multitail – tail multiple files in a single terminal window
    swatch – track your log files and fire off alerts

  11. Osmius: The Open Source Monitoring Tool is C++ and Java. Monitor “everything” connected to a network with incredible performance. Create and integrate Business Services, SLAs and ITIL processes such as availability management and capacity planning.

  12. Nice compilation. As usual, always very useful.

    It would be nice if some of you knowledgeable guys can shed some light on java heap monitoring thing, thread lock detection and analysis, heap analysis etc.

  13. From the guy who wrote the collect utility for Tru64:

    Name : collectl Relocations: (not relocatable)
    Version : 3.3.5 Vendor: Fedora Project
    Release : 1.fc10 Build Date: Fri Aug 21 13:22:42 2009
    Install Date: Tue Sep 1 18:10:34 2009 Build Host: x86-5.fedora.phx.redhat.com
    Group : Applications/System Source RPM: collectl-3.3.5-1.fc10.src.rpm
    Size : 1138212 License: GPLv2+ or Artistic
    Signature : DSA/SHA1, Mon Aug 31 14:42:40 2009, Key ID bf226fcc4ebfc273
    Packager : Fedora Project
    URL : http://collectl.sourceforge.net
    Summary : A utility to collect various linux performance data
    Description :
    A utility to collect linux performance data

    Best regards, Bob

  14. When I wrote collectl my goal was to replace as many utilities as possible for several reasons including:
    – not all write to log files
    – different output formats make correlation VERY difficult
    – sar is close but still too many things it doesn’t collect
    – I wanted option to generate data that can be easily plotted or loaded into spreadsheet
    – I wanted sub-second monitoring
    – I want an API and I want to be able to send data over sockets to other tools
    – and a whole lot more

    I think I succeeded on many fronts, in particular not having to worry if the right data is being collected. Just install rpm and type “/etc/init.d/collectl start” and you’re collecting everything such as slabs and processes every 60 seconds and everything else every 10 seconds AND using <0.1% of the CPU to do so. I personally believe if you're collecting performance counters at a minute or coarser you're not really seeing what your system is doing.

    As for the API, I worked with some folks at PNNL to monitor their 2300 node cluster, pass the data to ganglia and from there they pass it to their own real-time plotting tool that can display counters for the entire cluster in 3D. They also collectl counters from individual CPUs and pass that data to collectl as well.

    I put together a very simple mapping of 'standard' utilities like sar to the equivalent collectl commands just to get a feel for how they compare. But also keep in mind there are a lot of things collectl does for which there is no equivalent system command, such as Infiniband or Lustre monitoring. How about buddyinfo? And more…

    http://collectl.sourceforge.net/Matrix.html

    -mark

  15. Darn,
    I’ve been using Linux since Windows 98 was the current MicroSnot FOPA.
    I know all this stuff. I do not make typoous.
    Why do you post this stuff?
    We all know it.
    Sure we do!
    But do we remember it? I just read through it and found stuff that I used long ago and it was like I just learned it. I found stuff I didn’t know either.
    Hummmm…… Imagine that!
    Thanks, particularly for the PDF.
    Saved me making one.
    Hey, where’s the HTML to PDF howto?

    Thanks again.

  16. Is it possible to display hard drive temps from hddtemp in KSysGuard? They are available in Ksensors and GKrellM, without any configuration required. However I prefer the interface and flexibility of KSysGuard. Is there a way of configuring it?

    Andrew

    1. Zabbix is a great tool that it doesn’t require a entirely separate project to make it easy to install and use (like Nagios and FAN).

      I’ve been following it since its early days and its come a long way. Its sad that lists like this never give it its due, not even a foot note mention.

      while on that note.. really? your 17-20 makes the list, but nmap, mtr, and lsof get relegated to foot notes?

  17. Dear all Members,

    Thanks for sharing all your knowledge about Linux.. i really thankful for your share linux tips..!!

    thanks and continue this jurny…as well

    thank you..

  18. This is indeed an impressive collection of tools but I still have to ask if people are really happy with having to know so many names, so many switches and so many formats. If you run one command and see something weird doesn’t it bother you if you have to run a different tool but the anomaly already passed and you can no longer see it with a different tool? For example if you see a drop in network performance and wonder if there was a memory or cpu problem, it’s too late to go back and see what else was going on. I know it bothers me. Again, by running collectl I never have to worry about that because it collects everything (when run as a daemon) or you can just tell it to report lots of things when running interactively and by default it shows cpu, disk and network. If you want to add memory, you can always include it but you will need a wider screen to see the output.

    As a curiosity for those who run sar – I never do – what do you use for a monitoring interval? The default is to take 10 minute samples which I find quite worthless – remember sar has been around forever dating back to when cpus were much slower and monitoring much more expensive. I’d recommend to run sar with a 10 second sampling level like collectl and you’ll get far more out of it. The number of situations which this would be too much of a load on your system would be extremely rare. Anyone care to comment?

    -mark

  19. hi Mark

    absolutely agreed with you mate! if you are the sysadmin something – you will do it for yourself and do it right!
    Tools like ps, top and others are commonly used by people who administer non-production or desktop systems, or by users who come to a system temporarily and need a little bit of information about the box – and they are pretty good for that. )

  20. If you are running a web server and you have multiple clients writing code, you will one day see CPU slow to a crawl. “Why?”, you will ask. ps -ef and top will show that mysql is eating up resources…

    HMM?

    If only there was a tool which showed me what command was being issued against the database…

    mytop

    Once you find the select statement that has mysql running at 99% of the CPU, you can kill the query and then go chase down the client and kill them too (or in my case bill them at $250/hr for fixing their code).

  21. re mysql – it’s not necessarily that straightforward. I was working with someone who had a system with mysql that was crawling. it was taking multiple seconds for vi to echo a single character! we ran collectl on it and could see low cpu, low network and low disk i/o. Lots of available memory, so what gives? A close look showed me that even though the I/O rates were low, the average request sizes were also real low – probably due to small db requests.

    digging even deeper with collectl I saw the i/o request service times were multiple seconds! in other words when you requested an I/O operation, no matter how fast the disk is, it took over 2 seconds to complete and that’s why vi was so slow, it was trying to write to its backing store.
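
    As a rough sketch of where such service times come from: the kernel exposes per-device I/O counters in /proc/diskstats, and dividing the milliseconds spent on reads and writes by the number of completed requests gives an average time per request. The helper name avg_io_wait_ms is made up for this illustration, and the figure is an average since boot, whereas interval tools such as iostat -x report it per sample.

    ```shell
    # Print the average time (ms) each completed I/O took, per device.
    # Reads lines in /proc/diskstats format on stdin.
    # (avg_io_wait_ms is a hypothetical helper name, not a standard tool.)
    avg_io_wait_ms() {
        awk '{
            ios = $4 + $8      # reads completed + writes completed
            ms  = $7 + $11     # ms spent reading + ms spent writing
            if (ios > 0) printf "%s %.1f\n", $3, ms / ios
        }'
    }
    ```

    On a live system it can be run as avg_io_wait_ms < /proc/diskstats; a value in the thousands would correspond to the multi-second waits described here.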

    bottom line – running a single tool and only looking at one thing does not tell the whole story. you need to see multiple things AND see them at the same time.
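
    A minimal sketch of that idea, assuming a Linux /proc filesystem: sample several subsystems in one pass so that every value on a line shares the same timestamp. The function name snapshot and the choice of fields are illustrative only; collectl itself gathers far more.

    ```shell
    # Print one timestamped line: 1-minute load average, available memory (kB),
    # and total bytes received/sent summed over all network interfaces.
    # (snapshot is a hypothetical name chosen for this example.)
    snapshot() {
        local load mem net
        load=$(cut -d' ' -f1 /proc/loadavg)
        mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
        # The first two lines of /proc/net/dev are headers; $2 = rx bytes, $10 = tx bytes
        net=$(awk 'NR > 2 {rx += $2; tx += $10} END {printf "%.0f %.0f\n", rx, tx}' /proc/net/dev)
        echo "$(date +%T) load=$load mem_avail_kb=$mem net_rx_tx_bytes=$net"
    }
    ```

    Calling it in a loop, for example while snapshot; do sleep 10; done >> snapshot.log, leaves a record that can be read back after an anomaly has passed.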

    -mark

  22. I have a postfix mail server, and recently through tcpdump I have seen a lot of traffic to dc.mx.aol.com, fedExservices.com, wi.rr.com, mx1.dixie-net.com. I believe my mail server is spamming. How do I find out whether it is spamming, and how do I stop it? Please help.

  23. Where I work we have an ISA server acting as a proxy/firewall, which prevents me from monitoring internet traffic consumption. So I installed Debian as a network bridge between the ISA server and the LAN, and equipped it with various monitoring tools (bandwidthd, ntop, vnstat, iftop, iptraf, darkstat).

  24. Wow, this is some great info, as are the various inputs in the comments. One I would like to add is

    ulimit

    User limits – limit the use of system-wide resources.

    Syntax
    ulimit [-acdfHlmnpsStuv] [limit]

    Options

    -S Change and report the soft limit associated with a resource.
    -H Change and report the hard limit associated with a resource.

    -a All current limits are reported.
    -c The maximum size of core files created.
    -d The maximum size of a process’s data segment.
    -f The maximum size of files created by the shell(default option)
    -l The maximum size that may be locked into memory.
    -m The maximum resident set size.
    -n The maximum number of open file descriptors.
    -p The pipe buffer size.
    -s The maximum stack size.
    -t The maximum amount of cpu time in seconds.
    -u The maximum number of processes available to a single user.
    -v The maximum amount of virtual memory available to the process.

    ulimit provides control over the resources available to the shell and to processes started by it, on systems that allow such control.
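
    A short sketch of the options above in use, taking the open file descriptor limit (-n) as the example. The value 256 is arbitrary and assumed to be below the hard limit on the system where this is run.

    ```shell
    # Show the soft and hard limits on open file descriptors for this shell.
    ulimit -S -n
    ulimit -H -n

    # Lower the soft limit inside a subshell only; the parent shell keeps its limit.
    ( ulimit -S -n 256; ulimit -S -n )
    ```

    Because the change is made in a subshell, it applies only to commands started inside the parentheses.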

    1. I assume you can find the process ID – for example if your process is called foo.bar, you could do
      ps -ef | grep foo.bar
      this will give the PID (process ID) as well as other information.
      Then do
      kill -9 PID (where PID is the number you found in the above).

      If you are working on a Mac you may have to do ‘sudo kill -9 PID’, since killing a process you do not own is an “admin” action that it wants you to be sure about.

      Or if you use top, and you can see the process you want to kill in your list, you can just type k and you will be prompted for the PID (the screen will freeze so it’s easy to read). Type the number and press Enter, confirm, and the process is killed with signal 15, which is less “severe” than a “kill -9”, which really kills just about any process (without allowing it a graceful exit of any kind).

      Use with care!
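
      One way to combine the two signals, sketched under the assumption that a few seconds is enough for the process to shut down cleanly: send signal 15 first and fall back to signal 9 only if the process is still there. The function name stop_process and the 5 second default are made up for this example.

      ```shell
      # Send SIGTERM (15) first; use SIGKILL (9) only if the process is still
      # alive after a grace period (default 5 seconds).
      # (stop_process is a hypothetical helper, not a standard command.)
      stop_process() {
          local pid=$1
          local grace=${2:-5}
          kill -TERM "$pid" 2>/dev/null || return 1    # no such process, or no permission
          while [ "$grace" -gt 0 ]; do
              kill -0 "$pid" 2>/dev/null || return 0   # process has exited
              sleep 1
              grace=$((grace - 1))
          done
          kill -KILL "$pid" 2>/dev/null
      }
      ```

      Usage would be stop_process PID, or stop_process PID 10 to wait up to ten seconds before resorting to signal 9.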

  25. Thanks,

    I think it will be very helpful for me as I am practicing Oracle on Red Hat Linux 4. Today I will try to check it. I need one more bit of help: I am not clear about crontab. Suppose I want to add a crontab entry on my system for a script that I have kept in /home/oracle, and I want it to execute every hour. Can you send me the details of how to do that?
    Thanks,

    kalyan de.
    Chennai, india
    +91 9962300520

  26. atop

    man atop shows

    “The program atop is an interactive monitor to view the load on a Linux system. It shows the occupation of the most critical hardware resources (from a performance point of view) on system level, i.e. cpu, memory, disk and network. It also shows which processes are responsible for the indicated load with respect to cpu- and memory load on process level; disk- and network load is only shown per process if a kernel patch has been installed.”

  27. I was looking for Apache parameters on the web and found this page.

    So, my contribution is: try iftop, iptraf, ifstat, jnettop and ethstatus for graphical and CLI network monitoring.

    Use tcpdump and ngrep for packet sniffing

    HTB is very good for QoS in the network, especially if you need to reduce slower VPN network

  28. fuser command is missing from this list. it tells you which command is using a file at the moment. Since in Linux everything is a file, it is very useful to know!
    Use it this way:
    # to know which process listens on tcp port 80:
    fuser 80/tcp

    # to know which process uses the /dev/sdb1 filesystem:
    fuser -vm /dev/sdb1
    etc …

  29. Though i have come across most of these names, having them all in one list will prove to be a good resource. I am going to make a list from these and have it within my website which i use for reference.

    Thanks for the examples.

  30. I don’t believe that ftp usage by user is recorded anywhere, so you’d have to get inventive. The way I would do it is use collectl to show both processes sorted by I/O and ftp stats. Then it simply becomes a matter of seeing which processes are contributing to the I/O and who their owners are.
    -mark

  31. There is another tool, “incron”:
    This program is an “inotify cron” system. It consists of a daemon and a table manipulator. You can use it in a similar way to the regular cron. The difference is that the inotify cron handles filesystem events rather than time periods.

  32. Handy list.

    Also, these might be handy as well…

    lsdev – list of installed devices
    lsmod – list of loaded kernel modules
    ldd – to see the shared library dependencies of an executable file
    watch – re-runs a command at a specified interval and refreshes the output
    stat – details of any file
    getconf – to get HP server details
    runlevel – shows the previous and current run level

    Search in web for more detailed info.

    Good luck…

  33. Hi guys,
    I am totally new to Linux and to this website as well.
    Would someone help me here regarding the mirrordir utility?
    What would be the full syntax if I only want to copy/mirror the files changed or edited since the last mirror
    from source to destination?
    And how do I define a specific time to run this command, i.e. schedule it?
    Thanks in advance.

  34. Don’t forget systemtap (stap) which provides the equivalent of Solaris’ invaluable “dtrace” scripting utility. There’s a “dtrace” for Linux project but I haven’t been able to get it to compile on my OpenSuSE 11.x.

    SuSE Linux has “getdelays”, enabled via the “delayacct” switch on the grub kernel command line (starting with SuSE 10 Enterprise…). It’ll reveal the amount of time a given process spends waiting for CPU, disk (I/O) or memory (swap), great for isolating lag in the system.

    There are many many other monitoring tools (don’t know if these were mentioned before) atopsar (atop-related), the sysstat/sar-related sa* series (sadc, sadf, sa1), isag, saidar, blktrace (blktrace-blkiomon / blktrace-blkparse), iotop, ftop, htop, nigel’s monitor (nmon), famd/fileschanged, acctail, sysctl, dstat, iftop, btrace, ftop, iostat, iptraf, jnettop, collectl, nagios, the RRD-related tools, the sys-fs tools, big sister/brother … you could fill a book with them all.

  35. another good tool for monitoring traffic and network usage:
    vnstat
    this also keeps statistics for bandwidth usage over time, which can be displayed for daily, weekly and monthly usage. very useful if you don’t want to install a web-based tool for this.

  36. If you want to monitor CPU, memory, I/O and disk usage across multiple servers you can use Librato Silverline – it’s a commercial product but the first 8 cores are always free. You can actually do a lot more with Silverline, i.e. place apps in individual containers, assign resource quotas to containers, trigger events etc. but as a monitoring tool it is really great too.

  37. And a good sysadmin can always count on their preferred scripting language.

    I use Perl for monitoring a lot of basic infrastructure services, like DHCP, DNS and LDAP, and Zabbix for generating alarms and very nice graphs.

  38. Hi,

    One of my professors introduced me to Ubuntu, and I like this OS very much. Before, I was using XP, but now I have downloaded all the applications I need. I always love Linux, great article.

    sarath

  39. Great list, but why is TOP still used?

    It is a highly limited utility. HTOP can do all top can, plus a ton of stuff more:
    1. use colors for better readability. In the 21st century, all computers have a super hightech thing on their monitor called COLORS (sarcasm off)
    2. allow process termination and sending of signals (even multi select several processes)
    3. show cpu / ram usage with visual bars instead of numbers
    4. show ALL processes: top cannot do that, it just shows what is on the screen. It is the main limiting factor that made me chuck it to the curb.
    5. Use your cursor keys to explore what cannot be shown on the screen, for example full CLI parameters from commands.
    6. Active development. There are new features. Top is dead and there does not seem to have been any active development for 10 years (and that is how the tool looks)

  40. Dear All,

    My Oracle Enterprise Linux gets very slow when my local R12.1 starts.

    Using the “top” command I found that a lot of database users are running.
    Normally, in other R12 instances, only a few database users are present. Can anyone tell me what the problem might be? Is it an OS-level issue or an application issue, and where should I start the tuning?

    Kindly advise me.

    Thanks in Advance,
    Abdul Hameed

  41. “My Oracle Enterprise Linux gets very slow when my local R12.1 starts.”

    Arghh! Linux is turning into Windows!

    These are super machines, people! Remember when 4.2BSD came out, and people were saying “Unix is becoming VMS”? With 4.1 BSD, we had been flying on one MIPS machines (think of a one MHz clock rate – three orders of magnitude slower than today’s machines, not GHz… MHz!). So much was added so quickly into 4.2 (kernels were no longer a few hundred kilobytes at most) that performance took a nose dive. But then 4.3 BSD fixed things for a while (with lots of optimizations such as unrolling the instructions in a bcopy loop till they just filled an instruction cache line). It didn’t hurt either that memory was getting cheaper, and we could afford to upgrade our 30 user timesharing systems from four Megabytes to eight Megabytes, or even more! It takes an awful amount of software bloat (and blind ignorance of the principles we all learned in our “combinatorial algorithms” classes) to be able to make machines that are over a thousand times faster than the Vaxen we cut our teeth on be “slow”.

    Today’s Linux systems hardly feel much faster on multicore x86 machines than they did on personal MicroVaxes or the somewhat faster Motorola 68020 based workstations (except for compilations, which now really scream by – compiling a quarter meg kernel used to take hours, whereas now it feels like barely seconds pass when compiling kernels that, even compressed, are many times larger. But then, compiler writers for the most part (25 years ago, Green Hills employees seemed a glaring exception and I don’t know about Microsoft) have to prove they have learned good programming practices before their skills are considered acceptable). Other software, like the X server, still feels about the same as it did in the eighties, despite today’s machines being so much faster. And forget about Windows!

  42. Friends I have typed the corrected question here below. Please let me know if you can help:

    Part1 : Find out the system resources — CPU Usage, Memory Usage, & How many process are running currently in “exact numbers”?, what are the process?
    Part2: Assume a process CACHE is running on the same system — How many files are opened by CACHE out of the total numbers found above?? what are the files used by CACHE? Whats the virtual memory used by the process. What is the current run level of the process.
    Part3: How many users or terminals are accessing the process CACHE?
    Part4: The script should run every 15secs with the time of execution & date of script and the output should be given to a file “richprocess” in the same order as that of the question.
    Note: NO EXTERNAL TOOLS are allowed to be used with linux. Only shell script should be written for the same!

    1. I got the answer for it i used
      $vi file1
      #!/bin/bash
      # Append a resource snapshot to "richprocess" every 15 seconds.
      while true
      do
      echo "---$(date)----" >> richprocess
      echo " 1. virtual mem of the system" >> richprocess
      vmstat >> richprocess
      echo " 2. Free mem available in system" >> richprocess
      free -m >> richprocess
      echo " 3. Mem used by cache & to print files used by CACHE" >> richprocess
      # pgrep looks up the PIDs by name on its own; it does not read ps output
      pmap -x $(pgrep CACHE) >> richprocess
      sleep 15
      done
      :wq!
      $bash file1 &
      $cat richprocess # to see the output..

      I had a worse comment from someone telling me to try a nonexistent website, “www.Iwantothersdomyhomework.com”. Please don’t post things like this. I am asking for help only because I want to learn. Thanks for the support from this site.

  43. Thanks for sharing a good list of useful commands.

    I found a typo where there should not be a dash in front of the options for

    ps auxf

    in the command for
    Find Out The Top 10 Memory Consuming Process
    and
    Find Out top 10 CPU Consuming Process
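
    With the dash removed, the two pipelines would look roughly like this. This is a sketch that assumes the article pipes ps through sort on column 4 (%MEM) or column 3 (%CPU) of the ps output; the function names top_mem and top_cpu are only for illustration.

    ```shell
    # Top 10 processes by memory and by CPU, using BSD-style ps options (no dash).
    # Column 3 of the ps output is %CPU and column 4 is %MEM.
    # (top_mem and top_cpu are hypothetical wrapper names.)
    top_mem() { ps auxf | sort -nr -k 4 | head -n 10; }
    top_cpu() { ps auxf | sort -nr -k 3 | head -n 10; }
    ```

    The header line sorts as zero in a numeric sort, so it drops to the bottom and only appears when fewer than ten processes are running.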

  44. systemgraph http://www.decagon.de/sw/systemgraph/

    Nice graphical system statistics RRDTool frontend which produces hourly, daily, weekly, monthly … graphs of various system data. At the moment it provides graphs for memory usage, cpu info, cpu frequency, disk iostat, number of users, number of processes, number of open files, number of tcp connections, system load, network traffic, protocol statistics, harddisk/partition usage and temperatures, privoxy proxy statistics, ntpdrift, fan status and system temperatures.
    It is simple and it doesn’t require snmp. It consists only of some shell and perl scripts.

  45. Another interesting program which hasn’t been mentioned yet is Midnight Commander (mc). At least it’s my favourite file manager in a console environment.

    Thanks all for your contributions. There are a lot of interesting programs which I already use, or certainly will be using in the future.

  46. Very good post. I’ve had some problems trying to figure out historical data about disk usage. I still don’t know a good tool for that. sar is wonderful but it’s unable to record disk usage per process. Do you know any tool for that?

  47. It’s great, but I’m having a little trouble. I want to look at the details of a process, specifically one from Apache, but the result is always the same; does anyone have a trick for seeing them? To explain better: I have an Apache process that does not die. It keeps using resources for a long time and overloads the machine. When I look with “ps auxf” the result is
    apache 32327 85.7 0.5 261164 39036 ? R 22:49 0:49 _ /usr/sbin/httpd

    I want to see exactly what this process “32327” is doing. Any idea?

  48. Great article, there are many great suggestions! I want to contribute with these two:

    GoAccess – real-time Apache/nginx log analyzer and viewer, runs in a terminal in *nix systems.
    CCZE – modular log colorizer

  49. yeah really nice post !!!
    It really helps me, but what about CentOS Linux commands? Can anyone tell me about that? Are all the Linux commands the same for all versions of Linux (is that right, guys)?
    Or
    please email me if you know some commands for CentOS Linux, because I am using this Linux.

    regards,

  50. I am working in a small company with around 45 employees, and we are using a Linux server in our office. I need to check or monitor the websites that users access during office hours. Can anyone please suggest the correct command? Thanks

  51. Dear Sir,

    My name is Govardhan Raju from TIRUPATI, ANDHRA PRADESH, working as a Linux (RHEL4) operator. I want to take a data backup daily. Is there any possibility of backing up only the files with today’s date? Please suggest commands, with syntax, that are useful for taking a daily backup.

    Thanking U Sir,

    S Govardhan Raju

  52. Hi, I’m using Windows 7. How can I access UNIX commands on the Windows platform without installing any setup file or a UNIX operating system?

    Could you please suggest any to me.

    Thanks,

  53. Dear all ,

    I have deployed some 40 routers in cafes, and I have 60 more to deploy in different regions/areas. I want to monitor the WiFi routers from one place.

    I have connected a thin client with Debian installed to each router to provide internet to the customers at the cafe, with free browsing for 30 minutes.

    Can someone suggest a tool for monitoring the routers and the performance of my Debian machines?

    Regards
    Naveen C

    1. Such routers often include a management/monitoring package, which may be more immediately useful than using Debian-based commands, and the router software may allow for viewing the multiple routers you describe from a single screen. I know that the latest NETGEAR wireless routers include a software package like this.

      But, why just 30 minutes per customer? Isn’t that the wrong message to give the cafe customers?: Like, hurry up and drink your coffee/tea, and then get out!!
      Maybe you could try a one hour limit and see what happens. Linux is much more efficient than many people realize, even under heavy usage.

      I think that Starbucks and similar shops in North America tend to offer unlimited Internet access with any purchase – and most don’t really seem to enforce the purchase requirement, unless a “freeloader” is annoying or being offensive to other customers, etc.

  54. Hi Team,

    I need to find the hardware information in Linux; can you please advise?

    I received an alert as below:

    Tivoli MINOR for : Accelerator board battery failed

    thanks
    sudarshan

  55. You forgot monit (I don’t care why it failed at 3 a.m. – just fix it and tell me!) and collectd (just record how things are going over the months, without freaky sar..)

    Michael 😉

  56. Hi,

    How do I take a data backup on Linux Enterprise 6 on a daily basis, and how do I speed up (refresh) Linux? Are there any specific commands for this?

    help me out of this…

  57. My new favourite tool is “systemd-analyze”. It is great for pinpointing bottlenecks in startup. It can produce a very nice plot of every process, allowing you to instantly see what’s holding things up.

  58. And I am using the “watch” utility. It is not really a system monitoring tool, but in some cases we need to watch the output of a command continuously. It is not easy to enter the same command over and over and watch the output; in that case you can use this utility. You can set the interval between refreshes.

    Eg: watch -n 10 df -Th (this is just an example)
    This command will give you the output of df -Th every 10 seconds. Then you can easily keep an eye on the hard disk usage.

    Cheers…!

  59. Can somebody please help me with the Autosys / Control-M scheduling tools? I’m new to both of these tools and have never used them. I want some beginner tutorials to learn either of them.

    Also, which Unix commands are important for production support guys apart from normal commands like grep, find, less, more etc.?
    Any help in the form of documents / tutorials is appreciated…

    thanks in advance…

    I would appreciate your reply at my mail ID

  60. Thanks for that useful info.
    Whether or not we use web host managers, knowing how to use some tools manually is a must for a sysadmin, or at least for decent ones… What will you do when the host manager doesn’t work? Who has to repair it? You do, and you will need real knowledge of what’s in your hands 🙂

    So many thanks again.

  61. EXCELLENT work. One humble suggestion: when you use the top command, or any command, please also mention how to clear the workload of the system so that the system can be sped up.
    Regards. Very informative.

  62. E X C E L L E N T !!!!! Even this word is not sufficient for such lovely information, which you are sharing with all of us without any selfish motive. Kudos.

    With warm regards

    Mahesh Vakharia

  63. A small note regarding “ps aux” and “ps -aux”

    Quote:
    “Note that “ps -aux” is distinct from “ps aux”. The POSIX and UNIX standards require that “ps -aux” print all processes owned by a user named “x”, as well as printing all processes that would be selected by the -a option. If the user named “x” does not exist, this ps may interpret the command as “ps aux” instead and print a warning.”

    Source: http://superuser.com/questions/394414/ps-warns-me-about-bad-syntax-with-aux-options

  64. Hi,
    ————————————–
    Can not telnet to Debian 6.0 from Windows Box.
    —————————————–
    I have downloaded the file: telnetd_0.17-36_i386.deb and installed it on Debian 6.0 box using dpkg -i command. It was installed successfully. But I still do not find the telnetd process under the “ps -aef” output.

    How do I start the telnetd process automatically so that I can telnet to it from Windows box?

    Thanks.

  65. Hi Vivek and all,
    It’s a very useful site. I am about to complete my RHCE and am currently residing in Bangalore. Please guide me with more of these Linux topics at my mail: [email protected], and on how to face an interview as a sysadmin fresher.
    I would be pleased if you could even mail me about job openings in and around Bangalore, which would be very helpful.
    I hope I will join the beautiful world of Linux..
    Thanks, Mr. Vivek, for introducing me to various monitoring tools

  66. Check also netdata:

    netdata is a highly optimized Linux daemon providing real-time performance and health monitoring for Linux systems, applications and SNMP devices, over the web! It has been designed to permanently run on all systems, without disrupting the applications running on them.

    demo: http://my-netdata.io
    source: https://github.com/firehol/netdata
    wiki: https://github.com/firehol/netdata/wiki

    – real-time, per second updates, snappy refreshes!
    – 300+ charts out of the box, 2000+ metrics monitored!
    – zero configuration, zero maintenance, zero dependencies!
    – dozens of health monitoring alarms, out of the box!
