Google Data Center Information

CNet has published some interesting information about Google's data centers, estimating that the company has more than 200,000 servers spread across 36 data centers around the globe. From the article:

On the other hand, Dean seemingly thinks clusters of 1,800 servers are pretty routine, if not exactly ho-hum. And the software the company runs on top of that hardware, enabling a sub-half-second response to an ordinary Google search query that involves 700 to 1,000 servers, is another matter altogether.

Google doesn't reveal exactly how many servers it has, but I'd estimate it's easily in the hundreds of thousands. It puts 40 servers in each rack, Dean said, and by one reckoning, Google has 36 data centers across the globe. With 150 racks per data center, that would mean Google has more than 200,000 servers, and I'd guess it's far beyond that and growing every day.
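The arithmetic behind that estimate is easy to check. Here is a quick sketch in Python that uses only the three figures quoted above; the variable names are mine, not CNet's:

```python
# Figures quoted in the CNet article
servers_per_rack = 40
racks_per_data_center = 150
data_centers = 36

# Servers in one data center, then across all of them
servers_per_data_center = servers_per_rack * racks_per_data_center
total_servers = servers_per_data_center * data_centers

print(f"{servers_per_data_center:,} servers per data center")
print(f"{total_servers:,} servers in total")
```

This gives 6,000 servers per data center and 216,000 overall, which matches the "more than 200,000" figure in the quote.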

(Fig.01: Google data center [credit:cnet news])

I'm well aware of HA and clustering technologies, but this is a massive setup with a huge number of systems. Google uses a distributed storage system and other tools developed in-house.

Sounds like a great place to work :)

=> Google spotlights data center inner workings

Helmer: A Linux Commodity Computing Cluster in an IKEA Helmer Cabinet

Rendering is the process of generating an image from a model by means of computer programs. POV-Ray is one such free rendering program. This article explains how to build a home Linux render cluster using commodity computing techniques:

3D computer rendering is very CPU intensive, and the best way to speed up slow renders is usually to distribute them across more computers. Render farms are usually very large, expensive, and use a lot of energy. I wanted to build something that could be put in my home, not make too much noise, and run using very little energy... and be dirt cheap. Big problem? :) No, computer stuff costs almost nothing these days; it's just a matter of finding fun stuff to play with.

(Fig.01: Helmer Linux Cluster)

=> This is the story of Helmer. A linux cluster in a IKEA Helmer cabinet.

List of open source cluster management systems

M. Shuaib Khan has published a list of open-source cluster management systems.

Personally, I have used openMosix and Red Hat Cluster software (which is also based on open source software funded by Red Hat).

From the article: In the computing world, the term "cluster" refers to a group of independent computers combined through software and networking, which is often used to run highly compute-intensive jobs. With a cluster, you can build a high-speed supercomputer out of hundreds or even thousands of relatively low-speed systems. Cluster management software offers an easy-to-use interface for managing clusters, and automates the process of queuing jobs, matching the requirements of a job and the resources available to the cluster, and migrating jobs across the cluster:

=> openMosix
=> Kerrighed
=> OpenSSI
=> Gluster

Read the article; it covers the features, pros, and cons of each solution.
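To make the "queuing jobs" and "matching the requirements of a job and the resources available" part of that description concrete, here is a toy first-fit scheduler in Python. It is only a sketch of the general idea and is not how openMosix, Kerrighed, OpenSSI, or Gluster actually work. The `Node`, `Job`, and `schedule` names, and the CPU and memory figures, are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster member with some free resources (hypothetical model)."""
    name: str
    free_cpus: int
    free_mem_gb: int
    jobs: list = field(default_factory=list)


@dataclass
class Job:
    """A queued job and the resources it asks for (hypothetical model)."""
    name: str
    cpus: int
    mem_gb: int


def schedule(queue, nodes):
    """Place each queued job on the first node that can hold it.

    Returns a deque of jobs that did not fit anywhere and must keep waiting.
    """
    waiting = deque()
    while queue:
        job = queue.popleft()
        for node in nodes:
            if node.free_cpus >= job.cpus and node.free_mem_gb >= job.mem_gb:
                # Reserve the resources and record the placement
                node.free_cpus -= job.cpus
                node.free_mem_gb -= job.mem_gb
                node.jobs.append(job.name)
                break
        else:
            # No node had enough free resources for this job
            waiting.append(job)
    return waiting


nodes = [Node("node1", free_cpus=2, free_mem_gb=4),
         Node("node2", free_cpus=4, free_mem_gb=8)]
queue = deque([Job("render-a", cpus=2, mem_gb=2),
               Job("render-b", cpus=4, mem_gb=8),
               Job("render-c", cpus=2, mem_gb=4)])

waiting = schedule(queue, nodes)
for node in nodes:
    print(node.name, node.jobs)
print("still queued:", [job.name for job in waiting])
```

A real cluster manager also handles job migration, priorities, and node failures, all of which this sketch leaves out.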

Mount a Linux filesystem on a SAN from multiple nodes at the same time

If you try to mount an ext3 Linux filesystem on a SAN from multiple nodes at the same time, you will be in serious trouble.

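SAN-based storage allows multiple nodes to connect to the same devices at the same time. However, ext2 and ext3 are not cluster-aware file systems: each node caches and writes metadata without knowing about the others. Mounting them from several nodes at once can lead to disasters such as kernel panics, server hangs, and file system corruption.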

You need a cluster-aware file system with the following characteristics:

  1. Suitable for clusters with moderate scale-out and shared SAN volumes
  2. Symmetrical, parallel, journaled cluster file system
  3. POSIX access controls

Both GFS (Red Hat Global File System) and Lustre (a scalable, secure, robust, highly available cluster file system) can be used with SAN-based storage where multiple nodes connect to the same devices at the same time.
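As a rough safety check, you could scan a node's mount table for shared SAN devices that are mounted with a file system that is not cluster aware. The Python sketch below parses text in the `/proc/mounts` format (device, mount point, file system type, options). The device names are hypothetical, and the `CLUSTER_AWARE` set only lists the type names for the two file systems mentioned above, so treat it as an assumption to adjust for your own setup.

```python
# File system type names treated as cluster aware (assumption: extend as needed)
CLUSTER_AWARE = {"gfs", "gfs2", "lustre"}


def unsafe_shared_mounts(mounts_text, shared_devices):
    """Return (device, mount_point, fs_type) for shared devices
    mounted with a file system type that is not cluster aware."""
    unsafe = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if device in shared_devices and fs_type not in CLUSTER_AWARE:
            unsafe.append((device, mount_point, fs_type))
    return unsafe


# Sample data in /proc/mounts format; the device names are made up
sample = """\
/dev/sda1 / ext3 rw 0 0
/dev/mapper/san_lun0 /data ext3 rw 0 0
/dev/mapper/san_lun1 /shared gfs2 rw,noatime 0 0
"""

shared = {"/dev/mapper/san_lun0", "/dev/mapper/san_lun1"}
result = unsafe_shared_mounts(sample, shared)
for device, mount_point, fs_type in result:
    print(f"WARNING: {device} on {mount_point} uses {fs_type}")
```

On a real node you would pass in the contents of `/proc/mounts` instead of the sample string. The check only tells you what one node has mounted; it cannot tell you whether another node has the same LUN mounted too.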

Many newcomers get confused because Linux offers so many file systems. This paper (Linux File System Primer) discusses these file systems, why there are so many, and which ones are best suited to which workloads and data.