A crawler is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, which is also known as a "spider" or a "bot." Crawlers are typically programmed to visit sites that have been submitted by their owners as new or updated. Entire sites or specific pages can be selectively visited and indexed. Crawlers apparently gained the name because they crawl through a site a page at a time, following the links to other pages on the site until all pages have been read.
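The page-by-page, link-following process described above can be sketched as a breadth-first traversal that stays on one site. This is a minimal illustration, not any search engine's actual crawler: `crawl_site` and `fetch_links` are hypothetical names, `fetch_links` stands in for downloading a page and extracting its links, and the small in-memory `TOY_SITE` replaces real HTTP requests so the example runs offline.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_site(start_url, fetch_links):
    """Breadth-first crawl of a single site.

    fetch_links(url) is a hypothetical helper returning the raw hrefs found
    on that page. Returns the pages in the order they were read.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # "read" the page; an indexer would process it here
        for href in fetch_links(url):
            link = urljoin(url, href).split("#", 1)[0]  # resolve relative links, drop fragments
            # Follow only links that stay on the same site and are not yet queued.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy site (hypothetical data): each page maps to the links it contains.
TOY_SITE = {
    "http://example.com/": ["/about", "/blog", "http://other.org/"],
    "http://example.com/about": ["/"],
    "http://example.com/blog": ["/blog/post1", "/about"],
    "http://example.com/blog/post1": ["../about"],
}

order = crawl_site("http://example.com/", lambda u: TOY_SITE.get(u, []))
print(order)
```

The `seen` set is what guarantees the crawl ends once every reachable page has been read, even though pages link back to each other. Breadth-first order is just one choice; a crawler could equally go depth-first or prioritize pages it considers more important.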
The crawler for the AltaVista search engine and its Web site is called Scooter. Scooter adheres to the rules of politeness for Web crawlers that are specified in the Standard for Robot Exclusion (SRE). Before indexing a site, it requests the server's robots.txt file to learn which files should be excluded from indexing. It does not (or cannot) pass through firewalls. It also uses a special algorithm to decide how long to wait between successive requests to a server, so that its requests do not slow response times for the server's other users.
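The two politeness rules above, honoring exclusions and pacing requests, can be sketched with Python's standard `urllib.robotparser`. Scooter's actual waiting algorithm is not described here, so the delay rule below is an illustrative assumption: wait at least `min_delay` seconds (or the nonstandard `Crawl-delay` value, if the site declares a larger one), and back off further when the previous response was slow. `PoliteFetcher`, `download`, `min_delay`, and `factor` are hypothetical names; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Checks robots.txt rules and spaces out requests to one server.

    The delay rule (the larger of a minimum delay and a multiple of the last
    response time) is an illustrative assumption, not Scooter's algorithm.
    """

    def __init__(self, robots_txt, user_agent="ExampleBot", min_delay=1.0,
                 factor=10.0, clock=time.monotonic, sleep=time.sleep):
        self.rules = RobotFileParser()
        self.rules.parse(robots_txt.splitlines())  # rules for this one server
        self.user_agent = user_agent
        declared = self.rules.crawl_delay(user_agent)  # None if no Crawl-delay line
        self.min_delay = max(min_delay, float(declared or 0))
        self.factor = factor
        self.clock, self.sleep = clock, sleep
        self.next_allowed = 0.0  # earliest time the next request may start

    def fetch(self, url, download):
        """Return download(url), or None if robots.txt excludes the URL."""
        if not self.rules.can_fetch(self.user_agent, url):
            return None  # excluded by the site owner
        wait = self.next_allowed - self.clock()
        if wait > 0:
            self.sleep(wait)
        start = self.clock()
        body = download(url)
        elapsed = self.clock() - start
        # A slow response suggests a busy server, so wait longer before the next request.
        self.next_allowed = self.clock() + max(self.min_delay, self.factor * elapsed)
        return body
```

Scaling the wait by the last response time is one simple way to reduce load on a server that is already struggling; a production crawler would typically keep one such fetcher per host.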
Continue Reading About crawler
- Search Engine Watch describes how search engines work and lists the names of the crawler programs used by each major search engine.
- The Web Server Administrator's Guide to the Robots Exclusion Protocol describes how to exclude specific pages from being visited by crawlers.