How to create a robots.txt file

Since the mid-1990s there has been an informal agreement among webmasters and the people who run web robots that a robots.txt file on a server can specify access policies for those robots. Creating a robots.txt file is very easy.

To get started, open a text editor (Mac) or Notepad (Windows) and create an unformatted plain-text file with the file extension .txt.

What should be included in a robots.txt file?

The file should answer the two questions every visiting robot will ask:

  • Do these rules apply to me?
  • What on this server am I not allowed to access?

Keep in mind that the robots.txt file has to be placed in the root of your website (the same location where you put your index.html or index.php file).
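
If you're curious how a robot consumes this file, here is a minimal sketch using Python's standard library (urllib.robotparser). It fetches the robots.txt from the root of a site and asks whether a given URL may be crawled; the domain is the placeholder used later in this article, not a real site.

from urllib.robotparser import RobotFileParser

# robots.txt must be reachable at the root of the site,
# e.g. https://www.yourdomainname.com/robots.txt (placeholder domain)
parser = RobotFileParser()
parser.set_url("https://www.yourdomainname.com/robots.txt")
parser.read()  # fetch and parse the file

# ask whether a particular robot may fetch a particular URL
print(parser.can_fetch("Googlebot", "https://www.yourdomainname.com/admin/"))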

User-agent

In your text editor/notepad, the first thing you're going to write down is to whom your policy applies. You'll be using the User-agent command for this.

The value you enter here is the name of the robot. If you use a *, your policy will apply to all robots.

Example applying the policy to all robots:

User-agent: *

Example applying the policy to a single robot, in this case Google's crawler:

User-agent: Googlebot

Disallow

Using the Disallow command you specify which parts of your server or website a robot is not allowed to access. This is a convenient way to keep compliant robots out of the non-public parts of your website or server, for instance your webmail, your CMS, or pages that require a user login.

So again, keep in mind the location of this file (in the root of your web directory). Suppose your CMS is located at www.yourdomainname.com/admin; then you probably want your Disallow line to look like this:

Disallow: /admin/

If you want to disallow access to everything, this is the way to do it:

Disallow: /

You can add as many disallow lines as you want.
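
Before uploading the file you can sanity-check your rules locally. The sketch below uses only Python's standard library and parses an example rule set from memory; the paths are illustrative and stand in for whatever you disallow.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse rules from memory instead of fetching them

print(parser.can_fetch("Googlebot", "/admin/login.php"))  # False: blocked
print(parser.can_fetch("Googlebot", "/tmp/cache.html"))   # False: blocked
print(parser.can_fetch("Googlebot", "/index.html"))       # True: not matched by any rule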

Comments

If, for whatever reason, you'd like to include comments, you can do so by starting the line with a #.

Example:

# this is a comment for me and not read by robots

Putting the robots.txt together

Your file should now look something like this:

Sample disallowing everyone and everything:

#disallow all
User-agent: *
Disallow: /

Sample disallowing everyone except Google:

#disallow all except Google, which may access everything
User-agent: *
Disallow: /

#allowing Google
User-agent: Googlebot
Disallow:

#Google will now ignore your first group of rules and use the policy you've set specifically for it
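
A quick way to convince yourself this works is to feed the combined sample to Python's built-in parser and query it for Googlebot and for some other robot ("SomeOtherBot" below is just a made-up name). This is a local sanity check, not something the robots do for you.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/index.html"))     # True: Google's own group applies
print(parser.can_fetch("SomeOtherBot", "/index.html"))  # False: falls back to the * group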

Sample disallowing certain directories and files for all:

#allowing all access, but blocking a few directories and files
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /mysecretfile.html
Disallow: /mysecretfile.php

Sample disallowing a particular directory, but allowing one file in that directory:

User-agent: *
Disallow: /directory/
Allow: /directory/filethatisallowed.php
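
One caveat worth knowing: Allow was not part of the original robots.txt convention, and parsers resolve an Allow/Disallow overlap differently. Major crawlers such as Googlebot pick the most specific matching rule, while simpler parsers (Python's built-in one among them) apply the first rule that matches. Listing the Allow line first, as in the sketch below, gives the intended result either way; the file name is the placeholder from the sample above.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /directory/filethatisallowed.php
Disallow: /directory/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/directory/filethatisallowed.php"))  # True: the Allow rule wins
print(parser.can_fetch("Googlebot", "/directory/other.html"))             # False: the rest stays blocked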
