There could be a number of reasons you wouldn’t want a search engine to index your site. I was working on a new site for a customer recently, and as it contained many pages from their original site, I didn’t want Google, Bing or any other search engine for that matter to index it.
Having this new test site indexed could cause them issues with the old site, and might even get the pages flagged as duplicate content. Just not worth the aggro.
So what to do?
I started looking at this and came up with a few snazzy ideas, then immediately kicked myself for overthinking the issue. All I needed to do was create a robots.txt file and tell the world not to index the site.
Here’s the snippet you need to place into the robots.txt file in the root of your domain:
User-agent: *
Disallow: /
Yes, it’s as simple as that!
Just make sure the permissions on the file allow anyone to read it. Now, when a search engine bot hits your site, it will read that file, see the ‘please don’t index me’ request and be on its merry way.
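If you want to sanity check the rules before relying on them, here is a minimal sketch using Python's standard urllib.robotparser module. It parses the two-line snippet and asks whether a given bot may fetch a page. The helper name is_allowed and the test.example.com address are made up for illustration, and this only shows how a well-behaved parser reads the rules, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt snippet above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    is_allowed is a hypothetical helper for this post, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # test.example.com is a placeholder; swap in your own test site.
    page = "https://test.example.com/any/page"
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Every bot name you try should come back as not allowed, because the * in User-agent matches all of them and Disallow: / covers every path on the site.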
Well, mostly. There are a number of bots that do not honour the robots.txt file, but those we deal with in other ways and in another post.
For more information on the configuration options for the robots.txt file, head over to the Google help page, Learn about robots.txt files.