Apache Web Server: Log Analysis and Server Status Monitoring Tool

wtop is a really cool application for web server log analysis and for viewing server stats at a glance. It also has a powerful log-grepping capability. It is just like 'top' for your web server.

It can report the number of searches or signups per second, and it can create a histogram of response times. There is also another tool called logrep, a powerful command-line program for ad-hoc analysis and filtering of log files. You can dig up lots of information using the wtop tools.

You need Python version 2.5 to run wtop.

Download wtop

Type the following command:
$ cd /tmp
$ wget http://wtop.googlecode.com/files/wtop-0.5.6.tar.gz
$ tar -zxvf wtop-0.5.6.tar.gz
$ cd wtop-0.5.6
# python setup.py install

Configuring wtop

Once installed, you can start using the tool immediately. You need to edit the /etc/wtop.cfg file to set up parameters, Apache log files, and other directives:
# vi /etc/wtop.cfg
Sample configuration file:

# This must match your webserver log format. You MUST have at least %h, %r and %D
LOG_FORMAT=%h %l %u %t "%r" %>s %B "%{Referer}i" "%{User-Agent}i" %D
# max time before a request is logged in the "slow" column
# minimum requests/second before a URL class appears in top mode
# you can extend these to make any classes you wish
# the generic pattern is applied if a line does not match any
# of the named classes. By default it uses the top-level directory.
# incomplete list of known web robots
robots = r'(?:nutch|MSRBOT|translate.google.com|Feedster|Nutch|Gaisbot|Snapbot|VisBot|libwww|CazoodleBot|polybot|VadixBot|Sogou|SBider|BecomeBot|Yandex|Pagebull|chudo|Pockey|nicebot|entireweb|FeedwhipBOT|ConveraCrawler|NG/2.0|WebImages|Factbot|information-online|gsa-crawler|Jyxobot|SentinelCrawler|BlogPulseLive|YahooFeedSeeker|GurujiBot|wwwster|Y\!J-SRD|Findexa|SurveyBot|yetibot|discoveryengine|fastsearch|noxtrum|Googlebot|Snapbot|OGSearchSpider|heritrix|nutch-agent|Slurp|msnbot|cuill|Mediapartners|YahooSeeker|GrabPERF|keywen|ia_archiver|crawler.archive.org|Baiduspider|larbin|shopwiki)'
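To illustrate what a LOG_FORMAT like the one above implies, here is a small Python sketch (hypothetical code, not wtop's actual parser) that turns the combined log format with the trailing %D field into a regular expression and pulls out the request line, status, and response time; the function names and the sample log line are made up for the example:

```python
import re

# Hypothetical sketch (not wtop's actual code): a regex matching the
# combined log format above, with %D (response time in microseconds)
# as the last field.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)" (?P<usec>\d+)'
)

def parse_line(line):
    """Return (request, status, response time in ms), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return (m.group('request'), int(m.group('status')),
            int(m.group('usec')) / 1000.0)

sample = ('10.0.0.1 - - [01/Jun/2008:10:00:00 +0000] '
          '"GET /index.html HTTP/1.1" 200 5120 '
          '"-" "Mozilla/5.0" 45000')
print(parse_line(sample))  # ('GET /index.html HTTP/1.1', 200, 45.0)
```

This is why the config file insists on at least %h, %r and %D: without them there is no host, request, or timing data to aggregate.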

Now simply type wtop at a shell prompt:
$ wtop
To see all human traffic, enter:
$ logrep -m top -h access.log
To see response times for all MSNBot homepage hits, enter:
$ logrep -m grep -g MSNBot -i home -o status,msec,url access.log
To display the current log for traffic to pages about wordpress or themes sent from google.com, enter:
$ logrep -m tail -f 'url~wordpress|themes,ref~google.com' access.log
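The filter string passed to logrep above is a comma-separated list of field~regex conditions, all of which must match a request for it to pass. A minimal Python sketch of that semantics (an illustration only, not logrep's actual implementation; the record field names are assumptions):

```python
import re

# Illustration only (not logrep's code): parse a logrep-style filter
# string into (field, compiled regex) pairs and return a predicate
# that requires every condition to match.
def make_filter(expr):
    conds = []
    for part in expr.split(','):
        field, _, pattern = part.partition('~')
        conds.append((field, re.compile(pattern)))
    def matches(record):  # record is a dict of field name -> string
        return all(rx.search(record.get(f, '')) for f, rx in conds)
    return matches

f = make_filter('url~wordpress|themes,ref~google.com')
print(f({'url': '/tag/wordpress/', 'ref': 'http://www.google.com/search'}))  # True
print(f({'url': '/about/', 'ref': 'http://example.com/'}))                   # False
```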


Download Google Gadgets for Linux

Google Gadgets for Linux is an open-source implementation of the Google Gadgets platform and is now available for download. It is the first cross-platform desktop gadgets framework that works on Linux, Windows, and Mac OS X. From the Google blog:

For Gadgets for Linux, we don't just want to simply release the final offering, but we also want to give everyone a chance to tinker with the code powering the gadgets. For this project, fostering a transparent and lively developer community is just as important as serving our users.

Google Gadgets for Linux provides a platform for running desktop gadgets under Linux, catering to the unique needs of Linux users. We are compatible with the gadgets written for Google Desktop for Windows as well as the Universal Gadgets on iGoogle. Following Linux norms, this project will be open-sourced, under the Apache License.

You can download Google Gadgets for Linux at the project website.

=> Google Gadgets for Linux

Goosh.org Unix-like Shell For Google

Neat idea:

goosh.org - the unofficial google shell. This google-interface behaves similar to a unix-shell.
You type commands and the results are shown on this page.

=> goosh.org

Google Data Center Information

CNet has published interesting information about Google's data centers and estimates that the company has 200,000 servers spread across 36 data centers around the globe. From the article:

On the other hand, Dean seemingly thinks clusters of 1,800 servers are pretty routine, if not exactly ho-hum. And the software the company runs on top of that hardware, enabling a sub-half-second response to an ordinary Google search query that involves 700 to 1,000 servers, is another matter altogether.

Google doesn't reveal exactly how many servers it has, but I'd estimate it's easily in the hundreds of thousands. It puts 40 servers in each rack, Dean said, and by one reckoning, Google has 36 data centers across the globe. With 150 racks per data center, that would mean Google has more than 200,000 servers, and I'd guess it's far beyond that and growing every day.

(Fig.01: Google data center [credit:cnet news])

I'm well aware of HA and clustering technologies, but this is a massive setup with tons and tons of systems. Google uses a distributed storage system and other in-house developed tools.

Sounds like a great place to work :)

=> Google spotlights data center inner workings

Google Removed An Open Source Project After DMCA Complaint

The Digital Millennium Copyright Act (DMCA) is a United States copyright law. Google has removed an open-source project that enables the proprietary CoreAVC high-definition video decoder to run under the Linux operating system.

CoreAVC is a Windows codec for H.264 video developed by CoreCodec, which sells the codec in two versions, one priced at US$7.95 and another at $14.95. A Linux version is not available.

CoreAVC-for-Linux was an open-source project, hosted by Google, that developed patches allowing Linux applications such as MPlayer to use the CoreAVC codec. A cached version of the project's Web page said video performance was the main motivation for creating Linux support for CoreAVC.

=> Google Takes Down Open-source Project After DMCA Complaint

Update [ 11:29 pm IST ]: A CoreCodec worker using the screen name BetaBoy told an internal forum last night that "The DMCA removal request and the project reinstatement has been sent to Google."

Open Source Programming Contest – Can You Code 24 Hours Non-Stop?

Hackontest is a 24-hour programming competition between various open source software projects. The event takes place at OpenExpo on September 24/25, 2008 in Zurich, Switzerland. The contest is sponsored by Google. From the contest page:

The participating teams may win cash prizes of USD 1000, 2500 and 5000. Next to fun and competition, the elected open source developers receive a free trip to Zurich, Switzerland, including accommodation and meals from September 23 - 26, 2008 up to USD 1000 each person.

The idea of the Hackontest event is three-fold:

  • First of all, hackers (=smart programmers) of open source projects meet physically during 24h and enhance their software with a certain feature. Thus their Free Software project gets improved in terms of code and the developers have a fun time meeting in one place and competing for some nice prizes.
  • Second, users of open source software get the opportunity to file features they've missed in their favorite applications and operating systems. Therefore, during the selection process everyone who registers may file feature requests and others may vote and comment on them.
  • And third, visitors of the Hackontest event get the chance to see the commitment and team work with which open source software is created. Like this, the public becomes more aware of the creative processes and the power of collaborative effort by international open source communities.

=> Hackontest web site (via /.)

Google Code University: Learn How To Code / Program

Good learning stuff - at no cost!

From the page:

This website provides tutorials and sample course content so CS students and educators can learn more about current computing technologies and paradigms. In particular, this content is Creative Commons licensed which makes it easy for CS educators to use in their own classes.

The Courses section contains tutorials, lecture slides, and problem sets for a variety of topic areas:

* AJAX Programming
* Distributed Systems
* Web Security
* Languages

In the Tools 101 section, you will find a set of introductions to some common tools used in Computer Science such as version control systems and databases.

The CS Curriculum Search will help you find teaching materials that have been published to the web by faculty from CS departments around the world. You can refine your search to display just lectures, assignments or reference materials for a set of courses.

=> Google Code University (via Digg)