Friday nights are special for my friends and me. We usually hang out in nearby pubs, and most of us are network or sys admins. Generally we go to Barton Centre (it is on the 13th floor), 1912 on St Marks Road, or some other funky place. There is usually a lot of discussion (cricket, other sports, the latest bugs and hacks, and so on) along with the fun.
Last Friday I was telling them about my Monday blues. In the middle of the afternoon, I was asked to reboot over 100 servers, and it took me more than 45 minutes. Because of some problem, the load on our servers had increased so much that the entire service went down for 2 to 3 hours. The outage is believed to have cost my employer at least half a million dollars in lost revenue. Later in the evening, everything was restored.
While I was explaining to my friends how horrible the experience was, Sandy joined us. He works for one of the biggest Internet companies in the world. He looked at me, laughed, and said, "Only 100 servers?" Then Sandy dropped the bomb on us: he claimed that a few months back they rebooted over 15,000 servers in 4 hours. :O Sure, we don't use the reboot command to reboot a server farm. The reboot operation is done using automated software and hardware-based solutions.
So my question is: how many servers have you rebooted at a time? One, two, or a thousand servers?