About nixCraft

Topics

How to write a Web spider on Linux system

Posted by Vivek Gite [Last updated: November 14, 2006]

Web spider is a program or automated script which browses the World Wide Web (WWW sites) in a systematic, automated manner. In particular search engines use spiders to crawling web pages.

You can write a simple spider and scraper that collects Internet content using perl, python, ruby or other scripting languages

Web spiders are software agents that traverse the Internet gathering, filtering, and potentially aggregating information for a user. Using common scripting languages and their collection of Web modules, you can easily develop Web spiders. This article shows you how to build spiders and scrapers for Linux to crawl a Web site and gather information, stock data, in this case. A spider is a program that crawls the Internet in a specific way for a specific purpose. The purpose could be to gather information or to understand the structure and validity of a Web site. Spiders are the basis for modern search engines, such as Google and AltaVista. These spiders automatically retrieve data from the Web and pass it on to other applications that index the contents of the Web site for the best set of search terms.

Similar to a spider, but with more interesting legal questions, is the Web scraper. A scraper is a type of spider that targets specific content from the Web, such as the cost of products or services. Read more at IBM developerworks...

E-mail this to a Friend    Printable Version

While gathering with your family and friends this month, make sure you capture each and every moment with a FREE 4 GB SD memory card with the purchase of select digital cameras from Canon, Nikon, Pentax and others from Amazon.com.

You may also be interested in other helpful articles:

Leave a Reply

We encourage your comments, and suggestions. But please stay on topic, be polite, and avoid spam. Thank you very much for stopping by our site!

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

*
To prove you're a person (not a spam script), type the security word shown in the picture. Click on the picture to hear an audio file of the word.
Click to hear an audio file of the anti-spam word

Copyright © 2004-2008 nixCraft. All rights reserved - TOS/Disclaimer - Privacy policy - Sitemap - Powered by Open source software.