Automated search engine robots, sometimes called "spiders" or "crawlers", are the seekers of web pages. How do they work? What is it they really do? Why are they important?
You'd think with all the fuss about indexing web pages to add to search engine databases, that robots would be great and powerful beings. Wrong. Search engine robots have only basic functionality like that of early browsers in terms of what they can understand in a web page. Like early browsers, robots just can't do certain things. Robots don't understand frames, Flash movies, images or JavaScript. They can't enter password protected areas and they can't click all those buttons you have on your website. They can be stopped cold while indexing a dynamically generated URL and slowed to a stop with JavaScript navigation.
How Do Search Engine Robots Work?
Think of search engine robots as automated data retrieval programs, traveling the web to find information and links.
When you submit a web page to a search engine at the "Submit a URL" page, the new URL is added to the robot's queue of websites to visit on its next foray out onto the web. Even if you don't directly submit a page, many robots will find your site because of links from other sites that point back to yours. This is one of the reasons why it is important to build your link popularity and to get links from other topical sites back to yours.
When arriving at your website, the automated robots first check to see if you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these may be directories containing only binaries or other files the robot doesn't need to concern itself with.
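The robots.txt check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of the idea, not how any particular search engine implements it. The robots.txt content, the robot name "ExampleBot" and the example.com URLs are all made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/binaries/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an invented user agent name.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/search")

print("index.html allowed:", allowed)
print("cgi-bin/search allowed:", blocked)
```

A well-behaved robot would run a check like this before requesting each page and skip any URL for which `can_fetch` returns `False`.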
Robots collect links from each page they visit, and later follow those links through to other pages. The entire World Wide Web is made up of links, the original idea being that you could follow links from one place to another. This is how robots get around.
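To make the link-following idea concrete, here is a minimal sketch in Python. It assumes a tiny, made-up "web" held in a dictionary rather than real network requests, and it uses a simple first-in, first-out queue of URLs. Real robots are far more elaborate, so treat the names `LinkCollector`, `PAGES` and `crawl` as hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical in-memory pages standing in for a real website.
PAGES = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/products.html">Products</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/products.html": '<a href="/about.html">About</a> <a href="/contact.html">Contact</a>',
    "http://example.com/contact.html": "<p>No links here.</p>",
}


def crawl(start_url, pages):
    """Visit pages in the order their links are discovered, each page once."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # page not available; nothing to read
        visited.append(url)
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("http://example.com/", PAGES)
print(order)
```

The `seen` set is what keeps the robot from looping forever when pages link back to each other, as the home and about pages do here.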
The "smarts" about indexing pages online comes from the search engine engineers, who devise the methods used to evaluate the information the search engine robots retrieve. When introduced into the search engine database, the information is available for searchers querying the search engine. When a search engine user enters their query into the search engine, there are a number of quick calculations done to make sure that the search engine presents just the right set of results to give their visitor the most relevant response to their query.
You can see which pages on your site the search engine robots have visited by looking at your server logs or the results from your log statistics program. Identifying the robots will show you when they visited your website, which pages they visited and how often they visit. Some robots are readily identifiable by their user agent names, like Google's "Googlebot"; others are a bit more obscure, like Inktomi's "Slurp". Still other robots may be listed in your logs that you cannot readily identify; some of them may even appear to be human-powered browsers.
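If you want to pick robot visits out of a raw log yourself, a short script can do it. The sketch below assumes your server writes the common "combined" log format, with the user agent as the last quoted field. The log lines, IP addresses and user agent strings are invented sample data, and matching on the user agent name is only a heuristic, since a robot can report any name it likes.

```python
import re

# Robot names mentioned in this article; extend the tuple as needed.
KNOWN_ROBOTS = ("Googlebot", "Slurp")

# Assumes the combined log format: the user agent is the last quoted field.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Invented sample lines for illustration only.
LOG_LINES = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [10/Oct/2004:14:02:11 -0700] "GET /about.html HTTP/1.0" 200 1450 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.30 - - [10/Oct/2004:14:05:47 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [11/Oct/2004:09:12:03 -0700] "GET /products.html HTTP/1.0" 200 3100 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]


def robot_visits(lines):
    """Map each known robot name to a list of (timestamp, path) visits."""
    visits = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for robot in KNOWN_ROBOTS:
            if robot.lower() in agent:
                visits.setdefault(robot, []).append(
                    (match.group("time"), match.group("path"))
                )
                break
    return visits


visits = robot_visits(LOG_LINES)
for robot, hits in visits.items():
    print(robot, len(hits), "visit(s):", [path for _, path in hits])
```

The result tells you when each robot came by, which pages it requested and how often, which is the same information a log statistics program would summarize for you.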
Along with identifying individual robots and counting the number of their visits, the statistics can also show you aggressive bandwidth-grabbing robots or robots you may not want visiting your website. In the resources section at the end of this article, you will find sites that list names and IP addresses of search engine robots to help you identify them.
How Do They Read The Pages On Your Website?
When the search engine robot visits your page, it looks at the visible text on the page, the content of the various tags in your page's source code (title tag, meta tags, etc.), and the hyperlinks on your page. From the words and the links that the robot finds, the search engine decides what your page is about. There are many factors used to figure out what "matters" and each search engine has its own algorithm in order to evaluate and process the information. Depending on how the robot is set up through the search engine, the information is indexed and then delivered to the search engine's database.
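Here is a rough sketch of what "reading" a page might look like, using Python's standard `html.parser` module. It pulls out the title tag, the named meta tags, the visible text and the hyperlinks, and it ignores script and style content. The sample page and the `PageReader` class are hypothetical, and what a real search engine does with this information is decided by its own algorithm.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Extracts title, meta tags, visible text and links from one page."""

    SKIPPED = {"script", "style"}  # content a reader would not see as text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip_depth:
            self.text_parts.append(text)


# Made-up sample page for illustration.
SAMPLE_HTML = """
<html>
<head>
<title>Fresh Coffee Beans</title>
<meta name="description" content="Small-batch roasted coffee beans.">
<script>var ignored = "not visible text";</script>
</head>
<body>
<h1>Our Coffee</h1>
<p>We roast beans every week.</p>
<a href="/order.html">Order online</a>
</body>
</html>
"""

reader = PageReader()
reader.feed(SAMPLE_HTML)

print("Title:", reader.title)
print("Meta:", reader.meta)
print("Text:", reader.text_parts)
print("Links:", reader.links)
```

Notice that the text inside the script tag never reaches the output, which mirrors the point made earlier that robots get little or nothing from JavaScript.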
The information delivered to the databases then becomes part of the search engine and directory ranking process. When the search engine visitor submits their query, the search engine digs through its database to give the final listing that is displayed on the results page.
The search engine databases update at varying times. Once you are in the search engine databases, the robots keep visiting you periodically, to pick up any changes to your pages, and to make sure they have the latest info. The number of times you are visited depends on how the search engine sets up its visits, which can vary per search engine.
Sometimes visiting robots are unable to access the website they are visiting. If your site is down, or you are experiencing huge amounts of traffic, the robot may not be able to access your site. When this happens, the website may not be re-indexed, depending on the frequency of the robot visits to your website. In most cases, robots that cannot access your pages will try again later, hoping that your site will be accessible then.
Resources
* SpiderSpotting - Search Engine Watch. http://searchenginewatch.com/webmasters/spiders.html
* Robotstxt.org - List of robots and protocols for setting up a robots.txt file. http://www.robotstxt.org/
* Spider-Food - Tutorials, forums and articles about Search Engine spiders and Search Engine Marketing. http://spider-food.net/
* Spiderhunter.com - Articles and resources about tracking Search Engine spiders. http://www.spiderhunter.com/
* Sim Spider Search Engine Robot Simulator - Search Engine World has a spider that simulates what the Search Engine robots read from your website. http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Daria Goetsch is the founder and Search Engine Marketing Consultant for Search Innovation Marketing, a Search Engine Optimization company serving small businesses. She has specialized in Search Engine Promotion since 1998, including three years as the Search Engine Specialist for O'Reilly Media, Inc., a technical book publishing company.
Copyright © 2002-2005 Search Innovation Marketing. http://www.searchinnovation.com All Rights Reserved.
Permission to reprint this article is granted if the article is reproduced in its entirety, without editing, including the bio information. Please include a hyperlink to http://www.searchinnovation.com when using this article in newsletters or online.