Web spider is a program or automated script which browses the World Wide Web (WWW sites) in a systematic, automated manner. In particular search engines use spiders to crawling web pages.
You can write a simple spider and scraper that collects Internet content using perl, python, ruby or other scripting languages
Web spiders are software agents that traverse the Internet gathering, filtering, and potentially aggregating information for a user. Using common scripting languages and their collection of Web modules, you can easily develop Web spiders. This article shows you how to build spiders and scrapers for Linux to crawl a Web site and gather information, stock data, in this case. A spider is a program that crawls the Internet in a specific way for a specific purpose. The purpose could be to gather information or to understand the structure and validity of a Web site. Spiders are the basis for modern search engines, such as Google and AltaVista. These spiders automatically retrieve data from the Web and pass it on to other applications that index the contents of the Web site for the best set of search terms.
Similar to a spider, but with more interesting legal questions, is the Web scraper. A scraper is a type of spider that targets specific content from the Web, such as the cost of products or services. Read more at IBM developerworks...TwitterFacebookGoogle+PDF versionFound an error/typo on this page? Help us!
- 30 Cool Open Source Software I Discovered in 2013
- 30 Handy Bash Shell Aliases For Linux / Unix / Mac OS X
- Top 30 Nmap Command Examples For Sys/Network Admins
- 25 PHP Security Best Practices For Sys Admins
- 20 Linux System Monitoring Tools Every SysAdmin Should Know
- 20 Linux Server Hardening Security Tips
- Linux: 20 Iptables Examples For New SysAdmins
- Top 20 OpenSSH Server Best Security Practices
- Top 20 Nginx WebServer Best Security Practices
- 20 Examples: Make Sure Unix / Linux Configuration Files Are Free From Syntax Errors
- 15 Greatest Open Source Terminal Applications Of 2012
- My 10 UNIX Command Line Mistakes
- Top 10 Open Source Web-Based Project Management Software
- Top 5 Email Client For Linux, Mac OS X, and Windows Users
- The Novice Guide To Buying A Linux Laptop