

Why Does a Segmentation Fault Occur on Linux / UNIX Systems?

According to Wikipedia:

A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

Usually, signal #11 (SIGSEGV) is raised; it is defined in the signal.h header file. The default action for a program upon receiving SIGSEGV is abnormal termination. This action ends the process, but it may generate a core file (also known as a core dump) to aid debugging, or perform some other platform-dependent action. A core dump is the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally.
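For example, the following tiny C program (the file name segfault.c is just an example) triggers SIGSEGV by writing through a NULL pointer:

/* segfault.c - minimal example that triggers SIGSEGV */
#include <stdio.h>

int main(void)
{
    int *p = NULL;   /* p points to address 0, which the process may not access */
    *p = 42;         /* invalid write - the kernel sends SIGSEGV */
    printf("never reached\n");
    return 0;
}

Compile and run it, and the shell will typically report something like this (bash appends "(core dumped)" only if core dumps are enabled):
$ cc segfault.c -o segfault
$ ./segfault
Segmentation fault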

A segmentation fault can also occur under the following circumstances:

a) A buggy program / command, which can only be fixed by applying a patch.

b) Accessing an array beyond its end in a C program (see the short example after this list).

c) Inside a chrooted jail, when a critical shared library, config file, or /dev/ entry is missing.

d) Faulty hardware, memory, or a driver can also cause the problem.

e) Not maintaining the recommended operating environment for your equipment (overheating can also generate this problem).
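As a sketch of case (b), here is a small C program that writes far past the end of an array. Out-of-bounds access is undefined behaviour, so a crash is likely but not guaranteed (compile without optimization to see it reliably):

/* overflow.c - writes far past the end of an array */
int main(void)
{
    int a[4];
    /* index 1000000 is way outside the 4-element array; the write
       usually lands on an unmapped page and the kernel raises SIGSEGV */
    a[1000000] = 1;
    return 0;
}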

Suggestions to debug Segmentation Fault errors

To debug this kind of error, try one or all of the following techniques:

  • Use gdb to track down the exact source of the problem.
  • Make sure the correct hardware is installed and configured.
  • Always apply all patches and keep the system updated.
  • Make sure all dependencies are installed inside the jail.
  • Turn on core dumping for supported services such as Apache (a sample session follows this list).
  • Use strace, which is a useful diagnostic, instructional, and debugging tool.
  • Google the error and find out if there is a solution to the problem.
  • Fix your C program for logical errors such as bad pointers, null pointer dereferences, out-of-bounds array access, and so on.
  • Analyze the core dump file generated by your system using gdb.
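Here is a minimal debugging session, assuming the crashing binary is called segfault (any name will do) and that core files land in the current directory (the actual location is controlled by /proc/sys/kernel/core_pattern):

$ ulimit -c unlimited                 # allow core files in the current shell
$ ./segfault
Segmentation fault (core dumped)
$ gdb ./segfault core                 # open the binary together with its core file
(gdb) bt                              # show the backtrace at the moment of the crash
$ strace -f -o trace.log ./segfault   # alternatively, log every system call to trace.log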


Please add your suggestions and debugging techniques in the comments below.

Linux server memory check

If your server crashes regularly, it could be a buggy kernel, a driver, the power supply, or any other hardware part. Memory (RAM) is one of the critical server parts. Bad memory can cause various problems such as random server restarts or program segfaults.

Generally, I recommend using the memtester command. It is an effective userspace tester for stress-testing the memory subsystem and is very good at finding intermittent and non-deterministic faults under Linux.
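For example, to run five passes over 1024 MB of memory (pick a size your system can spare, since memtester locks the memory it is testing):
$ sudo memtester 1024M 5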

Recently, Rahul Shah emailed me another interesting method for testing memory. His idea is based on an md5 checksum and the dd command.

First, find out the memory size using the free command.
$ free

 total       used       free     shared    buffers     cached
Mem:        768304     555616     212688          0      22012     270996
-/+ buffers/cache:     262608     505696
Swap:       979956          0     979956

In the above example, my server has 768304K of memory. Now use the dd command as follows:
$ dd if=/dev/urandom bs=768304 of=/tmp/memtest count=1050
$ md5sum /tmp/memtest; md5sum /tmp/memtest; md5sum /tmp/memtest

According to him, if the checksums do not match, you are guaranteed to have faulty memory. Read the dd command man page to understand all options. dd will create the /tmp/memtest file, and the written data fills up and stays cached in memory. By running md5sum repeatedly you are reading the same data back from memory (as it was cached).
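If you want to automate the comparison, a small shell sketch along these lines (reusing the example size and file name from above) will do it:

#!/bin/sh
# fill /tmp/memtest with random data, then checksum it three times and compare
dd if=/dev/urandom bs=768304 of=/tmp/memtest count=1050
sum1=$(md5sum /tmp/memtest | awk '{ print $1 }')
sum2=$(md5sum /tmp/memtest | awk '{ print $1 }')
sum3=$(md5sum /tmp/memtest | awk '{ print $1 }')
if [ "$sum1" = "$sum2" ] && [ "$sum2" = "$sum3" ]; then
    echo "Checksums match"
else
    echo "Checksums differ - memory may be faulty"
fi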

Looks like a good hack to me. However, I still recommend using the memtester userland program. Another option is to use the memtest86 program ISO. Download the ISO, burn it to a CD, and reboot your system with it to run the test (it may take quite some time). From the project home page:
Memtest86 is a thorough, stand-alone memory test for x86 architecture computers. BIOS-based memory tests are a quick, cursory check and often miss many of the failures that are detected by Memtest86.