Robots.txt File

Definition

What is the robots.txt file?

Simply put, the robots.txt file is a plain-text file that tells search engine crawlers which pages and files on a website they may crawl and which ones they should not.

If you want a more in-depth understanding of this topic, check out the FAQ section below:

Question #1: How does a robots.txt file work?

As noted above, a robots.txt file is designed to give web crawlers instructions on how to crawl a particular website, which means it is the first thing well-behaved crawlers request before crawling the pages and files on your website.

You can use it to do all sorts of handy things, including:

  1. Telling web crawlers which pages on your website to crawl and which ones not to crawl – Doing so helps prevent your website from slowing down or crashing under the extra load from search engine crawlers.
  2. Telling web crawlers how fast to crawl the pages on your website – This is done with the Crawl-delay directive, which some crawlers honor and others, including Googlebot, ignore. Where it is honored, it is another way to prevent your website from being overloaded and to ensure that there is always bandwidth left for actual human visitors.
  3. Preventing web crawlers from crawling specific files (such as PDF files and images) on your website – This is particularly useful if there are files on your website that you do not want search engine crawlers to fetch.
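
To make the uses above more concrete, here is a minimal sketch in Python that parses a small robots.txt file with the standard library's urllib.robotparser module and asks what a crawler may fetch. The rules, the paths (/admin/, /downloads/), the 10-second delay, and the crawler name ExampleBot are all made up for illustration; they are not taken from any real website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and the delay are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /downloads/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent crawl the given path."""
    return parser.can_fetch(user_agent, path)


print(is_allowed("/blog/first-post"))         # True: no rule matches this path
print(is_allowed("/admin/settings"))          # False: blocked by Disallow: /admin/
print(is_allowed("/downloads/brochure.pdf"))  # False: blocked by Disallow: /downloads/
print(parser.crawl_delay("ExampleBot"))       # 10
```

Note that urllib.robotparser matches rules by path prefix, so this sketch blocks the PDF by disallowing its whole directory rather than by file extension.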

Question #2: Can I use my robots.txt file to keep specific pages on my website from showing up in search?

No, you cannot rely on your robots.txt file to keep specific pages on your website from showing up in search. This is because even pages that are not crawled can still appear in search engine results pages (SERPs) if other web pages link to them. The only difference is that the result would not have a description.

If you want to make sure specific pages on your website do not show up in search, you can:

  1. Add a noindex tag to each page. If you decide to go with this option, make sure you allow web crawlers to crawl each of these pages so they can see the noindex instruction. One thing to consider about this approach, however, is that it will not work with crawlers that do not honor the noindex tag, which brings us to option number two:
  2. Completely remove the pages from your website or unpublish them. Of course, you can only do this for pages that serve no actual purpose and will not break your website when removed, which brings us to the final option:
  3. Password-protect each page so that only the users you choose can access it. This is perfect for things such as members-only pages.
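
As an illustration of the first option, the sketch below checks whether a page's HTML carries a robots meta tag with the noindex directive. It uses Python's built-in html.parser module. The class name RobotsMetaParser, the helper has_noindex, and the sample page are hypothetical names chosen for this example, and the check only looks at meta tags named "robots", not at crawler-specific tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical members-only page that should stay out of search results.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Members only</body></html>"
)
print(has_noindex(PAGE))
```

A check like this can be handy for confirming that the tag is actually present on every page you want kept out of search.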

Question #3: Where should I put the robots.txt file on my website?

To ensure that web crawlers see your robots.txt file, you need to put it in the root (top-level) directory of your website. Otherwise, web crawlers will simply assume that you do not have one and proceed to crawl, based on their default settings, all the pages and files on your website.

For example, if your main domain is www.thisismybrand.com, then your robots.txt file should be accessible via www.thisismybrand.com/robots.txt and not anywhere else, such as www.thisismybrand.com/products/robots.txt or www.thisismybrand.com/about/robots.txt.

If you own more than one domain, you will need a separate robots.txt file for each one; the same applies to each subdomain. In addition, the filename is case sensitive and must be written exactly as robots.txt, not Robots.txt, ROBOTS.TXT, or robots.TXT.
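
The placement rule can be sketched in a few lines of Python: given any page URL, the robots.txt file that applies to it sits at the root of the same host. The function name robots_txt_url is hypothetical, the https scheme is an assumption (the example domain above is given without one), and www.thisismyotherbrand.com is an invented second domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.thisismybrand.com/products/shoes?id=3"))
# https://www.thisismybrand.com/robots.txt

print(robots_txt_url("https://www.thisismyotherbrand.com/about/"))
# https://www.thisismyotherbrand.com/robots.txt
```

Each distinct host produces its own robots.txt URL, which is why every domain needs its own file.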

For more information on placing robots.txt files on your website, check out this helpful guide from the Google Developers centre.

Question #4: What is the difference between robots.txt and meta robots?

The main differences between robots.txt and meta robots are as follows:

  1. Robots.txt is an actual file while meta robots is a meta tag. The former contains a set of instructions, while the latter is a single instruction placed in a page's HTML.
  2. The robots.txt file affects the entire website while the meta robots tag only affects the specific page it is placed on. This means that you only need one robots.txt file per domain, but you can have many meta robots tags.

Question #5: What is the difference between robots.txt and x-robots?

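The main differences between robots.txt and x-robots (the X-Robots-Tag) are much the same as those between robots.txt and meta robots. Unlike meta robots, however, X-Robots-Tag is not a meta tag but an HTTP response header. It affects only the responses it is sent with, which means it can also cover non-HTML files such as PDFs and images.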
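
Because X-Robots-Tag travels as an HTTP response header, it is set by the web server or application rather than written into the page. The following Python sketch shows the idea under simple assumptions: the function response_headers is hypothetical, the rule "add noindex to every PDF" is only an example policy, and how the headers are actually attached to a response depends on the server or framework you use.

```python
import mimetypes


def response_headers(path):
    """Build response headers for a file, adding X-Robots-Tag for PDFs."""
    content_type, _ = mimetypes.guess_type(path)
    headers = {"Content-Type": content_type or "application/octet-stream"}
    # Example policy (an assumption): keep every PDF out of search results.
    if path.lower().endswith(".pdf"):
        headers["X-Robots-Tag"] = "noindex"
    return headers


print(response_headers("/downloads/brochure.pdf"))
print(response_headers("/about.html"))
```

As with the noindex meta tag, crawlers can only see this header if robots.txt allows them to request the file in the first place.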