How Robots Work

Posted on 08 December 2003 by Demian Turner

Quite an interesting read over at ongoing:

Every search-engine robot ever written uses essentially the same approach, and it’s a simple one:

  1. Select one of the URIs you know about.

  2. Do a GET on that URI.

  3. Decide whether you can index whatever you got back. If not, go to Step 1.

  4. Update the search-engine index with what you just fetched.

  5. Extract the hyperlinks from what you just fetched.

  6. Add the URIs from those hyperlinks to the list you know about.

  7. Go to Step 1.
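To make the loop concrete, here is a minimal single-threaded sketch in Python of the seven quoted steps. It is not code from the ongoing article. It rests on several assumptions: only `text/html` responses count as "indexable", a plain dict stands in for the search-engine index, and the names `crawl`, `http_get`, `LinkExtractor` and `max_pages` are hypothetical. It also ignores things a real robot must handle, such as robots.txt and politeness delays. The fetcher is a parameter so the loop can be exercised without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (step 5)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_get(uri):
    """Default fetcher (hypothetical helper): returns (content_type, text)."""
    with urlopen(uri, timeout=10) as resp:
        content_type = resp.headers.get_content_type()
        charset = resp.headers.get_content_charset() or "utf-8"
        return content_type, resp.read().decode(charset, errors="replace")


def crawl(seeds, fetch=http_get, max_pages=100):
    """Run the seven-step loop; returns a toy index mapping URI -> page text."""
    frontier = deque(seeds)   # URIs we know about but have not fetched yet
    seen = set(seeds)         # every URI ever queued, so nothing is fetched twice
    index = {}                # stand-in for the search-engine index
    while frontier and len(index) < max_pages:      # 7. keep going back to step 1
        uri = frontier.popleft()                    # 1. select a known URI
        try:
            content_type, body = fetch(uri)         # 2. do a GET on it
        except Exception:
            continue                                # weird stuff happens: skip it
        if content_type != "text/html":
            continue                                # 3. not indexable: back to step 1
        index[uri] = body                           # 4. update the index
        parser = LinkExtractor()
        parser.feed(body)                           # 5. extract the hyperlinks
        for href in parser.links:
            link, _ = urldefrag(urljoin(uri, href)) # make absolute, drop #fragment
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)               # 6. add to the known URIs
    return index
```

Using a FIFO queue makes this a breadth-first crawl; which URI to pick in step 1 is exactly the policy decision real robots spend most of their effort on, so treat the queue as the simplest possible choice rather than a recommendation.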

[…] You can test all you want on your private network, but when you send your little software child off into the wilds of the Net, you can be sure that all sorts of weird stuff is going to start happening.
