
Apache

wtop is a really cool application for web server log analysis and for seeing server stats at a glance. It also has powerful log-grepping capabilities. It is just like 'top' for your web server.

It can find out the number of searches or signups per second. It can also create a histogram of response times. The package also includes logrep, a powerful command-line program for ad-hoc analysis and filtering of log files. You can dig up lots of information using the wtop tools.

You need Python version 2.5 to run wtop.

Download wtop

Type the following commands:
$ cd /tmp
$ wget http://wtop.googlecode.com/files/wtop-0.5.6.tar.gz
$ tar -zxvf wtop-0.5.6.tar.gz
$ cd wtop-0.5.6
# python setup.py install

Configuring wtop

Once installed, you can start using the tool immediately. Edit the /etc/wtop.cfg file to set up parameters, Apache log files and other directives:
# vi /etc/wtop.cfg
Sample configuration file:

[main]
LOG_ROOT=/var/log/lighttpd/cyberciti.biz/
LOG_FILE=access.log
DEFAULT_OUTPUT_FIELDS=ts,class,ipcnt,ip,msec,uas,url
# This must match your webserver log format. You MUST have at least %h, %r and %D
LOG_FORMAT=%h %l %u %t "%r" %>s %B "%{Referer}i" "%{User-Agent}i" %D
[wtop]
# max time before a request is logged in the "slow" column
MAX_REQUEST_TIME=5000
# minimum requests/second before a URL class appears in top mode
MIN_RPS=0.2
[classes]
# you can extend these to make any classes you wish
home=^/(?:\?.*)?$
xml=\.xml(?:\?.*)?$
js=\.js(?:\?.*)?$
css=\.css(?:\?.*)?$
img=\.(?:png|gif|jpe?g|cur|ico|bmp)(?:\?.*)?$
[patterns]
# the generic pattern is applied if a line does not match any
# of the named classes. By default it uses the top-level directory.
generic=^/([^/\?]+)
# incomplete list of known web robots
robots = r'(?:nutch|MSRBOT|translate.google.com|Feedster|Nutch|Gaisbot|Snapbot|VisBot|libwww|CazoodleBot|polybot|VadixBot|Sogou|SBider|BecomeBot|Yandex|Pagebull|chudo|Pockey|nicebot|entireweb|FeedwhipBOT|ConveraCrawler|NG/2.0|WebImages|Factbot|information-online|gsa-crawler|Jyxobot|SentinelCrawler|BlogPulseLive|YahooFeedSeeker|GurujiBot|wwwster|Y\!J-SRD|Findexa|SurveyBot|yetibot|discoveryengine|fastsearch|noxtrum|Googlebot|Snapbot|OGSearchSpider|heritrix|nutch-agent|Slurp|msnbot|cuill|Mediapartners|YahooSeeker|GrabPERF|keywen|ia_archiver|crawler.archive.org|Baiduspider|larbin|shopwiki)'
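To see how these settings fit together, here is a minimal Python 3 sketch. It parses one line written in the LOG_FORMAT shown above, converts %D (which Apache logs in microseconds) to milliseconds, assigns a URL class using the [classes] and generic patterns, and flags robots.

This is not wtop's internal code. The following are assumptions made for illustration:
- classes are tried in file order;
- unmatched URLs fall back to "other";
- the robot list is shortened to a few names.

```python
import re

# Regex for: %h %l %u %t "%r" %>s %B "%{Referer}i" "%{User-Agent}i" %D
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<ref>[^"]*)" "(?P<ua>[^"]*)" (?P<usec>\d+)$'
)

# Patterns copied from the [classes] section; tried in order (assumption).
CLASSES = [
    ("home", re.compile(r"^/(?:\?.*)?$")),
    ("xml", re.compile(r"\.xml(?:\?.*)?$")),
    ("js", re.compile(r"\.js(?:\?.*)?$")),
    ("css", re.compile(r"\.css(?:\?.*)?$")),
    ("img", re.compile(r"\.(?:png|gif|jpe?g|cur|ico|bmp)(?:\?.*)?$")),
]
# The generic pattern: top-level directory of the URL.
GENERIC = re.compile(r"^/([^/\?]+)")
# Shortened robot list, for illustration only.
ROBOTS = re.compile(r"(?:Googlebot|msnbot|Slurp|Baiduspider|Yandex)")


def classify(url):
    """Return the first matching class name, else the top-level directory."""
    for name, pattern in CLASSES:
        if pattern.search(url):
            return name
    m = GENERIC.search(url)
    return m.group(1) if m else "other"  # "other" is a hypothetical fallback


def parse(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["msec"] = int(rec.pop("usec")) // 1000  # %D is in microseconds
    rec["class"] = classify(rec["url"])
    rec["robot"] = bool(ROBOTS.search(rec["ua"]))
    return rec
```

If your log lines fail to parse, the usual cause is a LOG_FORMAT that does not match what the web server actually writes. That is why the configuration insists on %h, %r and %D.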

Now simply type wtop at a shell prompt:
$ wtop
To see all human traffic, enter:
$ logrep -m top -h access.log
To see response times for all MSNBot homepage hits, enter:
$ logrep -m grep -g MSNBot -i home -o status,msec,url access.log
To tail the log for traffic to pages about WordPress or themes referred from google.com, enter:
$ logrep -m tail -f 'url~wordpress|themes,ref~google.com' access.log
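The filter string in the last example reads as "url matches wordpress|themes AND ref matches google.com". Here is a minimal Python 3 sketch of that idea. It compiles such a string into a predicate over a parsed log record.

This is not logrep's code. It assumes that ',' means AND and that '~' means a regex search on the named field. Other operators logrep may support are not handled, and a regex that itself contains a comma would be split incorrectly.

```python
import re


def compile_filter(expr):
    """Turn 'field~regex,field~regex' into a predicate over a record dict.

    Assumption: ',' joins clauses with AND and '~' is a regex search.
    """
    tests = []
    for clause in expr.split(","):
        field, sep, pattern = clause.partition("~")
        if not sep:
            raise ValueError("unsupported clause: %r" % clause)
        tests.append((field.strip(), re.compile(pattern)))

    def matches(record):
        # A missing field is treated as an empty string.
        return all(rx.search(record.get(field, "")) for field, rx in tests)

    return matches
```

For example, `compile_filter("url~wordpress|themes,ref~google.com")` accepts a record whose url is `/wordpress/tips` and whose ref is `http://www.google.com/search`. It rejects the same URL when the referrer is another site.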


Michael Ogawa has created some stunning visualizations of open source software projects such as Apache, Python, Eclipse IDE, and Postgres. From the project home page:

This visualization, called code_swarm, shows the history of commits in a software project. A commit happens when a developer makes changes to the code or documents and transfers them into the central project repository. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.

  • Code Swarm - An experiment in organic software visualization. (via Digg)

Many of our regular readers would like to know more about lighttpd hotlink protection using mod_rewrite. Lighttpd can use the HTTP referrer to detect hotlinking. It can be configured to partially protect hosted media from inline linking, usually by not serving the media or by serving a different file.

Lighttpd anti hotlinking configuration - redirect to another media

Open the lighttpd.conf configuration file:
# vi /etc/lighttpd/lighttpd.conf
Append the following directive to serve a default picture called /hotlink.png instead of hotlinked JPEG and PNG images:

$HTTP["referer"] =~ ".*BADDOMAIN\.com.*|.*IMAGESUCKERDOMAIN\.com.*|.*blogspot\.com.*" {
  url.rewrite = ("(?i)(/.*\.(jpe?g|png))$" => "/hotlink.png" )
}

So if a page on *.blogspot.com links to www.cyberciti.biz/image.png, visitors will get www.cyberciti.biz/hotlink.png instead. I've written a small script to detect excessive hotlinking from the log file and ban all those domains. Most types of electronic media can be redirected this way, including video files, music files and animations.
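Here is a minimal Python 3 sketch of both ideas. It is not the script mentioned above:
- `rewrite()` mimics the lighttpd rule, so you can check which requests it catches;
- `hotlinkers()` scans combined-format log lines and reports referring hosts that pulled at least a threshold number of JPEG/PNG images.

`OWN_HOSTS` and the threshold are hypothetical values to adjust. The referer test here is case-sensitive, as Python regexes are by default, so write domain names in lowercase.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Patterns copied from the lighttpd rule above (placeholders included).
BAD_REFERER = re.compile(r".*BADDOMAIN\.com.*|.*IMAGESUCKERDOMAIN\.com.*|.*blogspot\.com.*")
IMAGE_URL = re.compile(r"(?i)(/.*\.(jpe?g|png))$")


def rewrite(url, referer):
    """Return the path the rule above would serve for this request."""
    if BAD_REFERER.search(referer) and IMAGE_URL.search(url):
        return "/hotlink.png"
    return url


# Hypothetical setting for the log scan: your own domain(s).
OWN_HOSTS = ("cyberciti.biz",)
# Captures the request path and referer of a combined-format log line.
LOG_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<ref>[^"]*)"')


def is_own(host):
    """True if host is one of OWN_HOSTS or a subdomain of one."""
    return any(host == h or host.endswith("." + h) for h in OWN_HOSTS)


def hotlinkers(lines, threshold=100):
    """Return foreign referring hosts with at least `threshold` image hits."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        path = m.group("path").split("?", 1)[0]  # ignore query strings
        try:
            host = urlsplit(m.group("ref")).hostname or ""  # "-" gives ""
        except ValueError:
            continue  # malformed referer
        if host and not is_own(host) and IMAGE_URL.search(path):
            counts[host] += 1
    return sorted(h for h, n in counts.items() if n >= threshold)
```

The hosts returned by `hotlinkers()` are candidates you could add to the `$HTTP["referer"]` regex after a manual review.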

Related: Apache web server users can stop leechers using mod_rewrite / .htaccess rules.

Linux: Install Django Open Source Framework

Django is a high-level, open source Python web framework that encourages rapid development and clean, pragmatic design. Django is an awesome programming framework. Red Hat Magazine has published an excellent tutorial:

In today's world, web development is all about turnaround. Businesses want to maximize production outcome while minimizing development and production time. Small, lean development teams are increasingly becoming the normal large development departments. Enter Django: a popular Python web framework that invokes the RWAD (rapid web application development) and DRY (don't repeat yourself) principles with clean, pragmatic design.

This article is not about teaching you how to program in Python, nor how to use the Django framework. It's about showing how to promote your Django applications onto an existing Apache or Lighttpd environment.

=> Installing/Configuring/Caching Django on your Linux server

FreeBSD has issued an updated version of its Apache package. The release is considered important, and users of all prior versions are encouraged to upgrade.

Cross-site request forgery (CSRF) vulnerability in the balancer-manager in mod_proxy_balancer for Apache HTTP Server 2.2.x allows remote attackers to gain privileges via unspecified vectors.

The ap_proxy_http_process_response function in mod_proxy_http.c in the mod_proxy module in the Apache HTTP Server 2.0.63 and 2.2.8 does not limit the number of forwarded interim responses, which allows remote HTTP servers to cause a denial of service (memory consumption) via a large number of interim responses.

How do I upgrade Apache under FreeBSD?

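Simply run the following commands to update the ports tree, upgrade all outdated ports (including Apache) and list the installed versions: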
# portsnap fetch extract
# portupgrade -a
# portversion

You can now easily determine whether your ISP is throttling or shaping BitTorrent traffic with a simple online tool.

From the project web page:

Certain ISPs have been shown to rate limit or block BitTorrent traffic sent by their customers. While there are multiple reports of this on the web, only a few ISPs have admitted that they manipulate BitTorrent traffic. And, to date, it is hard for users without networking expertise to gain evidence about the behavior of their ISP.

This test suite creates a BitTorrent-like transfer between your machine and our server, and determines whether or not your ISP is limiting such traffic. This is a first step towards making traffic manipulation by ISPs more transparent to their customers.

=> Glasnost: Test if your ISP is manipulating BitTorrent traffic

You can also install this tool on your own server or laptop running Apache and PHP 4.3 or above:
$ cd /var/www/
$ sudo apt-get install libpcap0.8 libpcap0.8-dev
$ wget http://broadband.mpi-sws.mpg.de/transparency/glasnost-1.1.tgz
$ tar -zxvf glasnost-1.1.tgz
$ cd glasnost
$ make
$ su -c "chmod a+s bt_client"
$ mkdir logs
$ chmod 0777 logs

Fire up a web browser and go to http://localhost/glasnost/selftest.php or http://your-domain.com/glasnost/selftest.php

Updated for accuracy!

Gzip is the most popular and effective compression method. Most modern web browsers support and accept compressed data transfer. Gzipping typically reduces the response size by 60-70% compared to an uncompressed page. The end result is a faster web site experience for both dial-up users (they're not dead yet; I have a dial-up account for backup purposes) and broadband users. I've already written about speeding up Apache 2.x web access or downloads with mod_deflate.
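To get a feel for the savings on your own content, here is a minimal Python 3 sketch using the standard gzip module. The compression level of 6 is an assumption chosen as a middle setting.

The actual ratio depends on the content. Repetitive text such as HTML, CSS and JavaScript usually shrinks a lot, while already-compressed files such as JPEG or PNG images gain almost nothing.

```python
import gzip


def gzip_savings(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing data.

    A negative result means the compressed output is larger than the input,
    which happens with random or already-compressed data.
    """
    if not data:
        raise ValueError("data must not be empty")
    compressed = gzip.compress(data, compresslevel=level)
    assert gzip.decompress(compressed) == data  # sanity check: lossless
    return 1 - len(compressed) / len(data)
```

For example, read a saved copy of one of your pages with `open("index.html", "rb").read()` (a hypothetical file name) and pass the bytes to `gzip_savings()` to see the fraction saved.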

mod_compress for Lighttpd 1.4.xx

Lighttpd 1.4.xx supports gzip compression using mod_compress. This module can reduce network load and improve the overall throughput of the web server. All major HTTP clients support compression by announcing it in the Accept-Encoding header as follows:

Accept-Encoding: gzip, deflate

If lighttpd sees this header in the request, it can compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response:

Content-Encoding: gzip

This is how client and server negotiate the most suitable compression method. Lighttpd supports deflate, gzip and bzip2.
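Here is a minimal Python 3 sketch of the negotiation step. It reads the client's Accept-Encoding header and picks one of the methods the server supports, or None to send the response uncompressed.

The server-side preference order (gzip, then deflate, then bzip2) is an assumption, not lighttpd's documented order. Tokens with `q=0` are treated as refused, and `*` accepts any method not explicitly refused.

```python
def choose_encoding(accept_encoding, supported=("gzip", "deflate", "bzip2")):
    """Pick a Content-Encoding for the response, or None for no compression.

    `supported` is listed in (assumed) server preference order.
    """
    offered = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # unparseable quality value: treat as refused
        offered[token] = q
    for enc in supported:
        # An explicit entry wins; otherwise fall back to the "*" wildcard.
        if offered.get(enc, offered.get("*", 0.0)) > 0:
            return enc
    return None
```

With the header shown above, `choose_encoding("gzip, deflate")` returns `"gzip"`, and the server would then answer with `Content-Encoding: gzip`.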

Configure mod_compress

Open your lighttpd.conf file:
# vi /etc/lighttpd/lighttpd.conf
Append mod_compress to server.modules directive:
server.modules += ( "mod_compress" )
Set compress.cache-dir to the directory where compressed copies of files will be cached:
compress.cache-dir = "/tmp/lighttpdcompress/"
Finally, define the MIME types that should be compressed. The following enables compression for plain text, CSS, XML and JavaScript files:

compress.filetype           = ("text/plain","text/css", "text/xml", "text/javascript" )

Save and close the file. Create the /tmp/lighttpdcompress/ directory:
# mkdir -p /tmp/lighttpdcompress/
# chown lighttpd:lighttpd /tmp/lighttpdcompress/

Restart lighttpd:
# /etc/init.d/lighttpd restart

How do I enable mod_compress per virtual host?

Use the conditional $HTTP["host"] directive. For example, to turn on compression for theos.in:

$HTTP["host"] =~ "theos\.in" {
  compress.cache-dir = "/var/www/cache/theos.in/"
}

PHP dynamic compression

Open the php.ini file:
# vi /etc/php.ini
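To compress dynamic content with PHP, enable the following directive. Leave zlib.output_handler unset: it expects the name of an output handler rather than On, and it cannot be used together with zlib.output_compression.
zlib.output_compression = On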

Save and close the file. Restart lighttpd:
# service lighttpd restart

Cleaning cache directory

mod_compress does not remove old files from compress.cache-dir on its own. You need to run a script (for example, from cron) to clean out the cache directory.
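Here is a minimal Python 3 sketch of such a cleanup job. It deletes cached files whose modification time is older than a retention period and removes subdirectories that end up empty. The cache root itself is always kept.

The 10-day retention in the usage example is a hypothetical value. Point the function at the same directory you set in compress.cache-dir.

```python
import os
import time


def clean_cache(root, max_age_days, now=None):
    """Delete files under root older than max_age_days; return the count removed.

    Empty subdirectories are removed as well; root itself is kept.
    """
    root = os.path.normpath(root)
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = 0
    # Walk bottom-up so subdirectories are emptied before they are checked.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                pass  # already gone (e.g. removed by another process)
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return removed
```

For example, a nightly cron job could call `clean_cache("/tmp/lighttpdcompress/", 10)`. Lighttpd simply recreates any compressed file it needs again.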
