A robots.txt file tells web crawlers which sections of your site you don’t want indexed by search engines. The file is part of the Robots Exclusion Protocol.
Website owners generally want search engine robots to crawl and index their sites. However, you may have reasons for keeping parts of your site out of the index. That’s where a robots.txt file comes in handy: it tells web crawlers which pages to leave out of the index.
Say, for example, you have a website administration folder or a customer’s test folder as part of your website structure. There’s no value in having that content indexed, so the robots.txt file is a way of telling the search engines to ignore these pages.
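When you exclude a folder like this with the Disallow rule described below, keep in mind that its value is matched against the start of each URL path. A trailing slash limits the rule to the folder, while leaving the slash off also matches any other path that begins with the same letters. The minimal sketch below checks both cases with Python’s standard-library urllib.robotparser module. The /admin paths and the ExampleBot name are hypothetical, and individual search engines may add matching features (such as wildcards) of their own.

```python
from urllib.robotparser import RobotFileParser

def blocked(disallow_value, path):
    """True if a catch-all entry with this Disallow value blocks the path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    # ExampleBot is a placeholder crawler name; "*" applies to every robot.
    return not parser.can_fetch("ExampleBot", "https://www.example.com" + path)

# "/admin/" (with the slash) only matches paths inside the folder.
print(blocked("/admin/", "/admin/users.html"))        # True
print(blocked("/admin/", "/administration-faq.html"))  # False

# "/admin" (no slash) is a prefix of both paths, so it blocks both.
print(blocked("/admin", "/admin/users.html"))          # True
print(blocked("/admin", "/administration-faq.html"))   # True
```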
Guaranteed exclusion? – No. Standards-based web crawlers (Google, Bing/Yahoo!) and other well-known search engine robots look for your robots.txt file and follow its instructions, but compliance is voluntary. Lesser-known web crawlers may ignore the file, so there’s no guarantee that the pages you’ve identified won’t be indexed somewhere on the web.
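Compliance can’t be guaranteed because the check happens on the crawler’s side. A well-behaved crawler reads robots.txt and filters its own list of URLs before requesting anything; nothing on your server enforces the result. The sketch below shows that filtering step using Python’s standard-library urllib.robotparser module. The urls_to_crawl helper, the PoliteBot name, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def urls_to_crawl(robots_txt, user_agent, candidate_urls):
    """Keep only the URLs a compliant crawler would request.

    Nothing on the server enforces this: the filtering happens only
    because the crawler chooses to run it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]

robots_txt = "User-agent: *\nDisallow: /admin/\n"
queue = [
    "https://www.example.com/index.html",
    "https://www.example.com/admin/settings.html",
]
print(urls_to_crawl(robots_txt, "PoliteBot", queue))
# ['https://www.example.com/index.html']
```

A crawler that skips this step entirely would simply request both URLs.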
Is a robots.txt file hard to build? – No. Building the file is fairly simple. You can use a tool to generate a robots.txt file (search the web for “create robots.txt file”), or you can create one manually in a basic text editor. Once the file is ready, LiveEdit lets you browse to your robots.txt file and upload it to the root of your domain, which is where crawlers look for it. The most basic robots.txt file uses two rules:
User-agent: the robot to which the following rules apply (an asterisk, *, means all robots)
Disallow: the URL path, relative to the root of your domain, that you don’t want indexed
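If you’d rather generate the file than type it, these two rules are simple enough to produce from a short script. The sketch below uses a hypothetical helper, build_robots_txt (not part of any library), that writes one entry per user agent: a User-agent line followed by its Disallow lines. It then reads the result back with Python’s standard-library urllib.robotparser module to confirm that each robot picks up its own entry. ExampleBot, OtherBot, and the folder paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(rules):
    """Return robots.txt text for a {user_agent: [disallowed paths]} mapping.

    Each user agent becomes one entry (a User-agent line followed by its
    Disallow lines), and entries are separated by a blank line.
    """
    entries = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines += [f"Disallow: {path}" for path in paths]
        else:
            lines.append("Disallow:")  # empty value: nothing is blocked
        entries.append("\n".join(lines))
    return "\n\n".join(entries) + "\n"

# Hypothetical rules: keep one named crawler out entirely,
# and keep every other crawler out of two folders.
robots_txt = build_robots_txt({
    "ExampleBot": ["/"],
    "*": ["/admin/", "/test/"],
})
print(robots_txt)

# Read the generated text back to check which robot gets which entry.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))  # False
print(parser.can_fetch("OtherBot", "https://www.example.com/index.html"))    # True
print(parser.can_fetch("OtherBot", "https://www.example.com/admin/"))        # False
```

The printed text can be saved as a plain-text robots.txt file and uploaded as usual.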
A User-agent line and the Disallow lines that follow it form a single entry. You can include as many entries as you want in a single robots.txt file. Here’s a basic example:

User-agent: *
Disallow:

This robots.txt file tells all robots (user agents) that they can go anywhere and index whatever they find on the site, because the Disallow value is empty. The following entry tells robots that you don’t want your site indexed at all:

User-agent: *
Disallow: /
The only difference is the slash (“/”): an empty Disallow value blocks nothing, while a single slash blocks the entire site. So even with a basic robots.txt file, it’s important to say what you mean and mean what you say.
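To see how much that one character matters, the sketch below parses two catch-all entries, one with an empty Disallow value and one with Disallow: /, using Python’s standard-library urllib.robotparser module, and asks whether a crawler may fetch the same page under each. The URL and the ExampleBot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

def allows(robots_txt, url, user_agent="ExampleBot"):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow value
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash

url = "https://www.example.com/about.html"
print(allows(ALLOW_ALL, url))  # True
print(allows(BLOCK_ALL, url))  # False
```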
For simple websites, you probably won’t need to worry about creating a robots.txt file. However, as your site grows more complex and your SEO skills develop, you might want to experiment with excluding specific sections of your site.