
Monday, 14 March 2011

How To Use Robots.txt File?
URL: https://wwdq.blogspot.com/2011/03/how-to-use-robotstxt-file.html

The robots.txt file is often overlooked. Yet it plays an important part in getting web pages indexed properly, and it is very easy to set up.
I know that robots.txt is nothing new. But I've been preparing an SEO sheet for a while and wanted to share this small, useful portion with you.

What is robots.txt?

Robots.txt is a file used to keep search engine spiders (bots) from crawling selected content. The convention behind it is called the Robots Exclusion Protocol.

Why use robots.txt?

In general, we want our web pages to be indexed by search engines. But there may be content that we don't want crawled and indexed: a personal images folder, the website administration folder, a web developer's test folder for a customer, folders with no search value such as cgi-bin, and many more. The main idea is that we don't want them to be indexed.

Is the robots.txt file a guaranteed solution?

No. Standards-based bots, like those of Google, Yahoo, and the other big search engines, obey your robots.txt file, but only because they are programmed to. Any bot can be configured to ignore the file. Result: there is no guarantee.
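
The check happens on the bot's side. As a minimal sketch of what a well-behaved crawler might do before requesting a page, the snippet below uses Python's standard urllib.robotparser module. The rules, the bot name ExampleBot, and the example.com URLs are made up for illustration; a bot that simply skips this check is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real crawler would download them from /robots.txt.
RULES = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(user_agent: str, url: str, rules: str = RULES) -> bool:
    """Return True if the given rules let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite bot asks first and only fetches the page when the answer is True.
    print(is_allowed("ExampleBot", "http://example.com/admin/login.html"))
    print(is_allowed("ExampleBot", "http://example.com/index.html"))
```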

How to use the robots.txt file?

The robots.txt file has a few simple directives that manage the bots. These are:
  • User-agent: defines which bots the directives that follow apply to. * is a wildcard meaning all bots; a specific name such as Googlebot targets only Google's crawler.
  • Disallow: defines which folders or files are excluded. An empty value means nothing is excluded, / means everything is excluded, and /folder name/ or /filename excludes specific content. Values are matched as prefixes of the URL path: /folder name/ excludes everything inside that folder, while /folder name without the trailing slash excludes every path that begins with that text, including the folder's contents and files such as /folder name.html.
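
The effect of the trailing slash is easy to try out. The following is a small sketch that again relies on Python's standard urllib.robotparser, which compares each Disallow value with the start of the URL path. The helper name blocked, the bot name ExampleBot, and the /private paths are assumptions made for this illustration only.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_value: str, path: str) -> bool:
    """Return True if 'Disallow: <disallow_value>' excludes path for all bots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    return not parser.can_fetch("ExampleBot", "http://example.com" + path)


if __name__ == "__main__":
    paths = ["/private/a.html", "/private.html", "/private-notes/x.html"]
    # Compare an empty value, the site root, and the folder with and without
    # its trailing slash.
    for value in ["", "/", "/private/", "/private"]:
        for path in paths:
            print(repr(value), path, blocked(value, path))
```

With the trailing slash only the folder's contents are excluded; without it, anything whose path starts with the same characters is excluded too.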
There are also some other directives, which are not supported by all bots. These are:
  • Allow: works as the opposite of Disallow. It lists content that may be crawled, typically an exception inside a disallowed folder. * is a wildcard.
  • Request-rate: defines the crawl rate as pages/seconds. 1/20 means 1 page every 20 seconds.
  • Crawl-delay: defines how many seconds to wait after each successful crawl.
  • Visit-time: defines the hours during which you want your pages to be crawled. Example usage: 0100-0330, which means that pages will be crawled between 01:00 and 03:30 GMT.
  • Sitemap: points to the location of your sitemap file. You must use the complete URL of the file.
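
To see how a bot might read some of these extra directives, here is a hedged sketch using Python's standard urllib.robotparser (the site_maps() call needs Python 3.8 or later). The rules, the bot name ExampleBot, and the example.com sitemap URL are invented for the example. As far as I know this parser does not read Visit-time, so it is left out here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content using the extended directives.
RULES = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/20
Disallow: /cgi-bin/

Sitemap: http://example.com/sitemap.xml
"""


def read_extensions(rules: str, user_agent: str = "ExampleBot") -> dict:
    """Collect the crawl delay, request rate and sitemap URLs from the rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    rate = parser.request_rate(user_agent)  # named tuple (requests, seconds) or None
    return {
        "crawl_delay": parser.crawl_delay(user_agent),
        "requests": rate.requests if rate else None,
        "seconds": rate.seconds if rate else None,
        "sitemaps": parser.site_maps(),
    }


if __name__ == "__main__":
    print(read_extensions(RULES))
```

A crawler could then sleep for crawl_delay seconds between requests, or pace itself to the requests/seconds ratio.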

Robots.txt example:

User-agent: * # applies to all search engine spiders.
Disallow: /secretcontent/ # do not let them crawl the secretcontent folder.

Resources:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
http://www.robotstxt.org/
http://www.searchtools.com/robots/robots-txt.html
http://en.wikipedia.org/wiki/Robots.txt

Post Info:
Author = Riloaw. At: 16.08, Monday, 14 March 2011

