Missing Memory

March 13, 2009 · 8 comments · Last updated April 22, 2009

Today, I upgraded a total of eight servers from 4GiB to 8GiB of RAM by installing additional memory modules, to improve system performance. We started each server and checked the memory count at the console. All servers booted normally after the upgrade, and services such as SMTP, NFS, CIFS and HTTP started as expected. Shortly afterwards, I got a call from the help desk about slow performance on the POP3 server.

The POP3 server node was returning timeout errors, and download speeds were very slow for all MUAs. I tried to ssh into the box, and it bounced back with a "port 22: Connection refused" error. I didn't want to pull the server from the rack, so I fired up the KVM over IP Java client. Eventually, I found that the server was reporting only 2GiB of RAM instead of the expected 8GiB. This was bad. The worst problem was that the POP3 server node did not fail over to the backup node: our LVS (Linux Virtual Server) based cluster failed to detect the problem. So I made a few changes to pick up a working POP3 node.
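The post doesn't say how the LVS director checked the health of its POP3 nodes. As an illustration only, here is a minimal sketch of an application-level check that a director-side health checker could run, assuming the node listens on the standard POP3 port 110. Per RFC 1939, a POP3 server sends a one-line greeting as soon as the connection opens, and a positive greeting starts with `+OK`. The function name `pop3_healthy` and the 5-second timeout are hypothetical choices, not part of the original setup.

```python
import socket


def pop3_healthy(host: str, port: int = 110, timeout: float = 5.0) -> bool:
    """Return True if a POP3 server sends a '+OK' greeting.

    Each network step (connect, every recv) is limited to `timeout` seconds.
    Refused connections, unreachable hosts and timeouts all count as
    unhealthy; socket.timeout is a subclass of OSError, so one except
    clause covers every case.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = b""
            # The greeting is a single line: read until newline, EOF or 512 bytes.
            while b"\n" not in greeting and len(greeting) < 512:
                chunk = sock.recv(512)
                if not chunk:  # server closed the connection
                    break
                greeting += chunk
    except OSError:
        return False
    # RFC 1939: a positive greeting starts with "+OK"; "-ERR" is a refusal.
    return greeting.startswith(b"+OK")
```

A node that accepts TCP connections but is too slow to answer fails this check by timing out, which makes it stricter than a bare "is the port open" test.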

My investigation revealed that the memory problem occurred because the new RAM was incompatible with the server motherboard. I verified the available RAM on the first five nodes and then went back to the office for something else. Another person hooked up the rest and told me that he had verified the available RAM. Whenever I perform a memory upgrade, I always verify the amount of memory the system reports after it reboots; I never assume the memory is there.
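The post-reboot check described above is easy to script. A minimal sketch, assuming Linux nodes that expose `/proc/meminfo` (the LVS cluster suggests Linux, but the post doesn't state the distribution). The 90% threshold is an assumption: `MemTotal` always reads somewhat below the installed RAM because firmware and the kernel reserve some memory, so an exact comparison would always fail. All function names here are hypothetical.

```python
from pathlib import Path

KIB_PER_GIB = 1024 * 1024  # /proc/meminfo reports sizes in KiB (shown as "kB")


def parse_memtotal_kib(meminfo_text: str) -> int:
    """Return MemTotal (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Typical line: "MemTotal:        8052728 kB"
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def memory_looks_complete(memtotal_kib: int, expected_gib: float,
                          min_fraction: float = 0.9) -> bool:
    """True if the kernel sees at least min_fraction of the expected RAM.

    The 0.9 default is an assumed tolerance, not a standard value.
    """
    return memtotal_kib >= expected_gib * KIB_PER_GIB * min_fraction


def check_meminfo_file(expected_gib: float, path: str = "/proc/meminfo") -> bool:
    """Compare the running kernel's view of RAM with what was installed."""
    total_kib = parse_memtotal_kib(Path(path).read_text())
    seen_gib = total_kib / KIB_PER_GIB
    ok = memory_looks_complete(total_kib, expected_gib)
    status = "OK" if ok else "PROBLEM"
    print(f"{status}: kernel sees {seen_gib:.2f} GiB, expected {expected_gib} GiB")
    return ok
```

Run right after each node reboots, `check_meminfo_file(8)` returns False on a box that sees only 2GiB, so the mismatch surfaces before users notice it.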

Another lesson learned: never, ever trust a third person. If he had verified the available RAM immediately after installing the new modules, we would have noticed the problem right away instead of waiting for users to complain. Another reason not to perform upgrades on Fridays.

8 comments

1 Rakesh Bhardwaj March 14, 2009 at 2:44 pm

Thanks for the information

2 phinoc March 15, 2009 at 1:28 am

“Another reason not to perform upgrades on Fridays” HAHAHAHAHA,
I love the overtime bonus in my pay, but you don't :p

3 harrywwc March 15, 2009 at 2:00 am

as I am often reminded when I delegate to another… “trust, but always verify”
.h

4 Samuel Huckins March 15, 2009 at 4:54 am

The 11th Commandment: Thou shalt make no hardware changes, neither shalt thou release software updates on the 5th day. :-)

5 Jonas March 16, 2009 at 8:55 am

This mailing list seemed pretty interesting at first glance, but has time after time proven itself rather low-tech. When upgrading servers, it could be useful to make sure you put the right kind of memory in the machines, how insightful...

6 Jon March 17, 2009 at 4:48 pm

We always run memtest (up to test #4) on all our new servers; you'd be amazed how many memory modules actually fail (from my experience, maybe 1 in 1000, including single-bit errors and non-“critical” errors).

I’m sure you could run many of these systems without noticing this problem, but better safe than sorry! :D

7 paresh April 10, 2009 at 6:06 am

Always buy the memory from the same vendor you bought the server from, or at least try to.

8 jeffatrackaid April 14, 2009 at 3:54 pm

On Red Hat, I’ve seen several issues where PAE kernels were not installed. Someone called their data center and had the RAM upgraded, but failed to install the appropriate kernel. At one very large server provider, I saw them go through 4 sticks of RAM before realizing it was the OS.
