You can place a robots.txt file in the root of your site to help inform search engines and other bots about the areas of your site that you don’t want them to access. For example, you may not want bots to access the content of your images folder:
User-agent: *
Disallow: /images/
You can also provide instructions for particular bots. For example, to exclude Google image search from your entire site, use this:
User-agent: Googlebot-Image
Disallow: /
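If you want to sanity-check rules like these, Python's standard-library urllib.robotparser evaluates the base robots.txt standard (though not Google's extensions, described below). A minimal sketch, using the hypothetical host example.com:

```python
import urllib.robotparser

# Parse the rules from the example above: block Google image search everywhere.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot-Image",
    "Disallow: /",
])

# Googlebot-Image is blocked from every path; other bots are unaffected.
print(rp.can_fetch("Googlebot-Image", "http://example.com/photo.jpg"))  # False
print(rp.can_fetch("SomeOtherBot", "http://example.com/photo.jpg"))     # True
```

In production you would point the parser at your live file with set_url() and read() instead of feeding it lines directly.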
The robots.txt standard is unfortunately very limited: it supports only the User-agent and Disallow fields, and the only wildcard allowed is a lone asterisk in a User-agent line, as in the previous example.
Google has introduced support for a couple of extensions to the robots.txt standard. First, you can use limited patterns in pathnames. Second, you can specify an Allow clause. Since these extensions are specific to Google, you should use them only in groups addressed to a Google-specific user agent, such as Googlebot, which all of Google's bots recognize.
For example, you can block PNG files from all Google user agents as follows:
User-agent: Googlebot
Disallow: /*.png$
As with regular expressions, the asterisk means to match any sequence of characters, and the dollar sign means to match the end of the string. Those are the only two pattern matching characters that Google supports.
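Because those are the only two pattern characters, Google-style patterns translate cleanly into ordinary regular expressions. A minimal sketch (the helper names are my own, not part of any library): escape everything literally, then restore the two wildcards:

```python
import re

def google_pattern_to_regex(pattern):
    """Translate a Google robots.txt path pattern into a regex string.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return regex + ("$" if anchored else "")

def matches(pattern, path):
    # robots.txt rules match from the start of the URL path.
    return re.match(google_pattern_to_regex(pattern), path) is not None

print(matches("/*.png$", "/images/logo.png"))       # True
print(matches("/*.png$", "/images/logo.png?x=1"))   # False: '$' anchors the end
```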
To disable all bots except for Google, use this:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
To exclude pages whose query string begins with sort, possibly followed by any other text, use this:
User-agent: Googlebot
Disallow: /*?sort
This clause, too, will work only with the Google bots.
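To see why this pattern catches only query strings that begin with sort, here is the hand-translated regex equivalent of /*?sort, following the wildcard rules above (the example URLs are hypothetical):

```python
import re

# '/*?sort' translated by hand: '*' becomes '.*', '?' is a literal
# question mark, and the pattern matches from the start of the path.
rule = re.compile(r"/.*\?sort")

print(bool(rule.match("/products?sort=price")))        # True: query string starts with sort
print(bool(rule.match("/products?page=2&sort=price"))) # False: '?sort' never appears literally
```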