Learn SEO - Crawling

Every website is of interest to a crawler. However, a website will only be passed on to be indexed when it has a readable (aka friendly) URL and readable content. A page that contains nothing but code will still be read, but it won’t be considered valuable. Text and images will. As soon as a crawler finds those, your site will get indexed.

Can you influence a crawler or stop it?

At server level it’s possible to influence and stop a crawl bot. Influencing it is very simply done with a text file called robots.txt. In this file, you guide the crawler by telling it which directories on the server it is and isn’t allowed to crawl. Read how to set up a robots.txt file here.
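To get a feel for how a crawler reads these rules, here is a minimal sketch using Python’s standard urllib.robotparser module. The robots.txt content, the directory names, the bot name "ExampleBot" and the example.com URLs are all made up for illustration; your own file will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directories are assumptions for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /blog/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/admin/login"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/post-1"))
```

Keep in mind that this only shows what a well-behaved crawler would conclude from the file. A crawler that chooses to ignore robots.txt is not stopped by it.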

When certain crawlers really “bother” you, you can block them using a .htaccess file. This also happens at server level. Normally, the crawl bots from search engines won’t cause any problems, but there are crawlers from SEO-related companies that send a great deal of traffic to your site with the sole purpose of having their name show up in your Google Analytics or other statistics software. This form of advertisement, or spam, usually targets blog sites. Even though it’s not damaging, it can be pretty annoying to have them pop up in your statistics, so many sites choose to block this traffic. It’s something that needs constant maintenance, though.
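The actual blocking is done with Apache rules in the .htaccess file, but the idea behind such a rule is simple: compare the name a crawler announces itself with against a list of unwanted names. The Python sketch below only illustrates that matching logic, it is not a replacement for .htaccess. The bot names in the list are hypothetical.

```python
# Hypothetical fragments of unwanted crawler names; this is the list that
# needs regular maintenance as new spam crawlers appear.
BLOCKED_AGENT_FRAGMENTS = ("spamcrawler", "badseobot")


def should_block(user_agent: str, blocked=BLOCKED_AGENT_FRAGMENTS) -> bool:
    """Return True if the User-Agent string contains a blocked fragment."""
    ua = user_agent.lower()  # compare case-insensitively
    return any(fragment in ua for fragment in blocked)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a server would send: 403 for blocked bots, else 200."""
    return 403 if should_block(user_agent) else 200


print(status_for("Mozilla/5.0 (compatible; SpamCrawler/1.0)"))
print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

A crawler can announce itself under any name it likes, which is one reason the list never stays complete for long.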

There’s also something you can do in the code of your website. Using the “robots” meta tag, you can tell a crawler whether it may index a page and follow the links on it. While robots.txt is mainly used to shield off directories (for instance, the location of your admin pages, which you don’t want to show up in a search engine), these meta tags can be set per page.
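To check what a page tells crawlers through this meta tag, you could read it out with a few lines of Python. This is a rough sketch using the standard html.parser module. The sample page is made up, and "noindex" and "nofollow" are two commonly used values for the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page for illustration.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Admin page</body></html>"
)

print(sorted(robots_directives(PAGE)))
```

A page without the tag simply returns an empty set, which means no restrictions were set through the meta tag.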