Developing for the translation industry



 Friday, 05 February 2010

You can place a robots.txt file in the root of your site to help inform search engines and other bots about the areas of your site that you don’t want them to access. For example, you may not want bots to access the content of your images folder: 

User-agent: *
Disallow: /images/
 

You can also provide instructions for particular bots. For example, to exclude Google image search from your entire site, use this: 

User-agent: Googlebot-Image
Disallow: /
 

The robots.txt standard is unfortunately very limited. It supports only the User-agent and Disallow fields, and the only wildcard allowed is an asterisk used by itself in User-agent, as in the first example.
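To check how a parser reads the two rule groups above, here is a minimal sketch using Python's standard urllib.robotparser module. The host example.com and the bot name SomeBot are placeholders I made up for illustration, and combining both groups into one file is my assumption, not something the examples require.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups shown above, combined into a single file (an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string, without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "SomeBot" and example.com are hypothetical placeholders.
print(parser.can_fetch("SomeBot", "http://example.com/images/logo.png"))   # False
print(parser.can_fetch("SomeBot", "http://example.com/index.html"))        # True
print(parser.can_fetch("Googlebot-Image", "http://example.com/index.html"))  # False
```

A bot with its own group follows that group only; every other bot falls back to the rules under the asterisk.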

Google has introduced support for a couple of extensions to the robots.txt standard. First, you can use limited patterns in pathnames. Second, you can specify an Allow clause. Because these extensions are specific to Google, you should probably use them only under a Google user agent: either one of its specific bots or Googlebot, which all of Google's bots recognize.

For example, you can block PNG files from all Google user agents as follows: 

User-agent: Googlebot
Disallow: /*.png$
 

The asterisk matches any sequence of characters, as a shell wildcard does, and the dollar sign matches the end of the URL, as it does in regular expressions. Those are the only two pattern-matching characters that Google supports.
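The matching rules just described can be approximated in a few lines of Python. This is only a sketch of the behavior as explained here, not Google's actual implementation, and the function names pattern_to_regex and path_matches are hypothetical. It assumes a pattern matches from the start of the path, that an asterisk stands for any run of characters, and that a trailing dollar sign anchors the match at the end.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where an asterisk stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    """Return True if the rule pattern applies to the given URL path."""
    # re.match anchors at the start, which gives plain rules prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None

print(path_matches("/*.png$", "/images/logo.png"))          # True
print(path_matches("/*.png$", "/images/logo.png?size=2"))   # False
print(path_matches("/images/", "/images/logo.gif"))         # True
```

Without the dollar sign, the second call would return True, because the pattern would then match any path that merely contains .png.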

To block all bots except Google's, use this: 

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
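Python's standard urllib.robotparser module also understands Allow lines, so the rules above can be tried out locally. The sketch below assumes a placeholder host, example.com, and a made-up bot name, SomeBot. The helper allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above: everyone is blocked except Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Check a path on the placeholder host example.com for the given bot."""
    return rules.can_fetch(agent, "http://example.com" + path)

print(allowed("Googlebot", "/page.html"))  # True
print(allowed("SomeBot", "/page.html"))    # False
```

The Googlebot group takes precedence over the asterisk group for Googlebot, which is why the order of the two groups in the file does not matter here.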

To exclude pages whose query string begins with sort, whatever text follows it, use this:

User-agent: Googlebot
Disallow: /*?sort
 

This clause will also work only with the Google bots.

 


Friday, 05 February 2010 14:26:11 (Eastern Standard Time, UTC-05:00)

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2017
Stanislas Biron