Understanding Robots.txt Files For Blogspot Blogs

We begin our discussion by understanding what robots.txt files are and how they influence search crawlers. A robots.txt file (as set up by the webmaster) is read by search engine spiders to determine which pages of a particular domain they may crawl, and therefore index, and which pages they should leave out.

Blogspot by default restricts search engines from indexing the Label pages of Blogspot blogs. This is done with the help of a robots.txt file.

Blogspot Labels

Labels help bloggers categorize their posts easily. For example, all my posts related to 'Blogging' can be found here:
http://www.inkjam.org/search/label/Blogging

An interesting thing to note here is the fact that a Blogspot blog does not have a separate page assigned to each of its 'Labels'. In fact, posts categorized under a label can be accessed only through a 'search' URL.

Robots.txt for Blogspot Blogs 

As stated earlier, Blogspot by default prevents web spiders from crawling its Label pages. This is what the default robots.txt file of a Blogspot blog looks like:
User-agent: *
Disallow: /search
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated 

The "User-agent: *" line means that the rules that follow apply to all robots.
The "Disallow: /search" line tells web spiders not to crawl any URL whose path begins with /search.

Note that the slash (/) stands for the root of your blog, so "Allow: /" covers every URL under your homepage. The use of the above robots.txt file can be better explained with an example. Let us imagine that your homepage URL is http://abc.blogspot.com/

The cumulative effect of the above robots.txt file is that search engines crawl all your pages except those whose URL begins with:
http://abc.blogspot.com/search

Thus label pages, which can be accessed only through a search URL, are kept out of the search engine index.
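
As a quick sanity check, the default rules can be fed to Python's standard urllib.robotparser module. This is only a sketch: the post URL /2013/01/my-post.html is a made-up example, the helper name is_crawlable is hypothetical, and Python's parser is not guaranteed to agree with every search engine's crawler in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The default Blogspot rules quoted above. The Sitemap line is left out
# because it does not affect which URLs may be crawled.
DEFAULT_RULES = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_crawlable(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `rules` let the crawler `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://abc.blogspot.com"
    # "/2013/01/my-post.html" is a hypothetical post URL for illustration.
    for path in ("/", "/2013/01/my-post.html", "/search/label/Blogging"):
        print(path, is_crawlable(DEFAULT_RULES, base + path))
```

With these rules the homepage and the post should be reported as crawlable, while the label page under /search should not.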

To get search engines to index your Blogspot blog's label pages, the robots.txt file shown above can be modified as follows:
User-agent: *
Disallow: 
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated 
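
To see what the empty Disallow line does, the modified rules can be checked with Python's standard urllib.robotparser module. This is a sketch under the assumption that the parser treats an empty Disallow value as "block nothing", which is how Python's implementation behaves; the helper name label_page_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The modified rules quoted above: an empty Disallow value blocks nothing.
MODIFIED_RULES = """\
User-agent: *
Disallow:
Allow: /
"""


def label_page_allowed(rules: str, label_url: str) -> bool:
    """Return True if `rules` let any crawler fetch `label_url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", label_url)


if __name__ == "__main__":
    url = "http://abc.blogspot.com/search/label/Blogging"
    print(url, label_page_allowed(MODIFIED_RULES, url))
```

Under these rules the label page should be reported as crawlable.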

Accessing your Robots.txt file

The robots.txt file for your Blogspot blog can be accessed as follows:

1. Go to 'Settings' and then click on 'Search Preferences'
2. Under 'Crawlers and Indexing', click 'Edit' next to 'Custom robots.txt'
3. Select 'Enable Custom Robots.txt content' to access your Blogspot blog's robots.txt file

Important Note: I would strongly recommend that you do not modify your robots.txt file to allow search engines to index your Label pages, as this might lead to duplicate content issues.

Setting up robots.txt files to prevent Search Engines from indexing Blogspot Archive pages

Archive pages, if indexed by search engines, can lead to duplicate content worries and push your search rankings down. You can, however, set up your robots.txt file to prevent search engines from indexing your Blogspot blog's Archive pages.

Continuing with our example blog - http://abc.blogspot.com

The monthly archive pages for the said blog would look something like this:
Month            URL
January, 2013    http://abc.blogspot.com/2013_01_01_archive.html
February, 2013   http://abc.blogspot.com/2013_02_01_archive.html
March, 2013      http://abc.blogspot.com/2013_03_01_archive.html
and so forth...

You can keep search engines away from your Blogspot archive pages by modifying the robots.txt file as follows:
User-agent: *
Disallow: /search
Disallow: /2013_01_01_archive.html 
Disallow: /2013_02_01_archive.html 
Disallow: /2013_03_01_archive.html 
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated

Note that you will have to manually add a Disallow line for each of your archive pages, in a manner similar to what is shown above.
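
Since one line is needed per month, a short script can generate them for pasting into the custom robots.txt box. The following Python sketch assumes that every monthly archive page follows the YYYY_MM_01_archive.html pattern shown in the table above; the function name archive_disallow_lines is hypothetical.

```python
from datetime import date


def archive_disallow_lines(start: date, end: date) -> list[str]:
    """Build one Disallow line per month from `start` to `end`, inclusive.

    Assumes the monthly archive URL pattern /YYYY_MM_01_archive.html.
    Only the year and month of each date are used.
    """
    lines = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        lines.append(f"Disallow: /{year}_{month:02d}_01_archive.html")
        # Step to the next month, rolling over to January after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return lines


if __name__ == "__main__":
    # Reproduces the three Disallow lines used in the example above.
    print("\n".join(archive_disallow_lines(date(2013, 1, 1), date(2013, 3, 1))))
```

The generated lines still have to be placed between "Disallow: /search" and "Allow: /" by hand.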
