
Understanding Robots.txt Files For Blogspot Blogs

We begin by looking at what robots.txt files are and how they influence search crawlers. A robots.txt file, set up by the webmaster, tells search engine spiders which pages of a domain they may crawl and which pages they should skip. Pages that cannot be crawled are generally left out of the search index.

By default, Blogspot stops search engines from crawling the Label pages of Blogspot blogs. It does this through the blog's robots.txt file.

Blogspot Labels

Labels help bloggers categorize their posts easily. For example, all my posts related to 'Blogging' can be found here:
http://www.inkjam.org/search/label/Blogging

Interestingly, a Blogspot blog does not have a separate page assigned to each of its Labels. Instead, the posts filed under a label can be reached only through the blog's search feature, at a URL under /search/label/.
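Because label pages always live under /search/label/, their URLs can be built mechanically. The Python sketch below is only an illustration: `label_url` is a hypothetical helper, and it assumes that labels containing spaces or other special characters are percent-encoded in the usual way (a space becoming %20).

```python
from urllib.parse import quote


def label_url(blog_url: str, label: str) -> str:
    """Build the URL of a Blogspot label page (hypothetical helper)."""
    # Label pages are served by the blog's search handler under /search/label/.
    # quote() percent-encodes characters such as spaces (" " -> "%20").
    return blog_url.rstrip("/") + "/search/label/" + quote(label)


print(label_url("http://www.inkjam.org/", "Blogging"))
# http://www.inkjam.org/search/label/Blogging
```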

Robots.txt for Blogspot Blogs 

As stated earlier, Blogspot prevents web spiders from crawling its Label pages by default. This is what a Blogspot blog's default robots.txt file looks like:
User-agent: *
Disallow: /search
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated 
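
The "User-agent: *" line means that the rules which follow apply to all robots.
The "Disallow: /search" line tells web spiders not to crawl any URL whose path begins with /search.
The "Allow: /" line permits everything else, and the "Sitemap" line points crawlers to the blog's post feed.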

Paths in robots.txt are relative to the root of your domain, so the slash (/) stands for your homepage URL. An example makes this clearer: imagine that your homepage URL is http://abc.blogspot.com/

The combined effect of the robots.txt file above is that search engines may crawl all of your pages except those whose URLs begin with:
http://abc.blogspot.com/search

Since label pages can only be reached through URLs under /search, they are kept out of search engine indexes.
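If you want to check the effect of these rules before relying on them, Python's standard `urllib.robotparser` module can evaluate a robots.txt file against sample URLs. The sketch below feeds it the default Blogspot file quoted above; the post URL is a made-up example. One caveat: this parser applies the first rule that matches, in file order, while Google picks the most specific (longest) matching rule. For this particular file both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Blogspot's default robots.txt, as quoted above.
DEFAULT_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS.splitlines())

urls = [
    "http://abc.blogspot.com/",                            # homepage
    "http://abc.blogspot.com/2013/01/my-first-post.html",  # hypothetical post URL
    "http://abc.blogspot.com/search/label/Blogging",       # label page
    "http://abc.blogspot.com/search?q=robots",             # search results page
]

for url in urls:
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict:8} {url}")
```

The homepage and the post come out as allowed, while both URLs under /search come out as blocked.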

To let search engines crawl your Blogspot blog's label pages, modify the robots.txt file as follows. An empty Disallow line blocks nothing, so every URL on the blog, including those under /search, may be crawled:
User-agent: *
Disallow: 
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated 
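The same `urllib.robotparser` check can confirm what the empty Disallow line does. This is a sketch using only the Python standard library; the label and search URLs are made-up examples on the running abc.blogspot.com blog.

```python
from urllib.robotparser import RobotFileParser

# The modified robots.txt from above: an empty Disallow value blocks nothing.
OPEN_ROBOTS = """\
User-agent: *
Disallow:
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated
"""

parser = RobotFileParser()
parser.parse(OPEN_ROBOTS.splitlines())

label_page = "http://abc.blogspot.com/search/label/Blogging"
print(parser.can_fetch("*", label_page))  # True
```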

Accessing Your Robots.txt File

You can access your Blogspot blog's robots.txt file as follows:

1. Go to 'Settings' and then click on 'Search Preferences'.
2. Under 'Crawlers and Indexing', click 'Edit' next to 'Custom robots.txt'.
3. Choose 'Enable custom robots.txt content' to view and edit your Blogspot blog's robots.txt file.

Important Note: I would strongly recommend that you do not modify your robots.txt file to let search engines crawl your Label pages, as this might lead to duplicate content issues (each label page repeats posts that already have their own URLs).

Setting Up Robots.txt to Prevent Search Engines from Indexing Blogspot Archive Pages

Archive pages repeat posts that already have their own URLs, so if search engines index them they can create duplicate content and may push your search rankings down. You can, however, set up your robots.txt file to prevent search engines from crawling your Blogspot blog's archive pages.

Continuing with our example blog, http://abc.blogspot.com, the monthly archive pages would look something like this:
Month           URL
January 2013    http://abc.blogspot.com/2013_01_01_archive.html
February 2013   http://abc.blogspot.com/2013_02_01_archive.html
March 2013      http://abc.blogspot.com/2013_03_01_archive.html
...and so forth.
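If you want to spot monthly archive pages in a list of URLs (for example, a crawl report), their naming pattern can be matched with a regular expression. This Python sketch assumes every monthly archive follows the YYYY_MM_01_archive.html pattern shown in the table; `archive_month` is a hypothetical helper, and URLs that deviate from this pattern are not matched.

```python
import re
from urllib.parse import urlparse

# Monthly archive paths look like /2013_01_01_archive.html (year, month, day 01),
# going by the example URLs in the table above.
ARCHIVE_PATH = re.compile(r"^/(\d{4})_(\d{2})_01_archive\.html$")


def archive_month(url: str):
    """Return (year, month) if url looks like a monthly archive page, else None."""
    match = ARCHIVE_PATH.match(urlparse(url).path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(archive_month("http://abc.blogspot.com/2013_02_01_archive.html"))  # (2013, 2)
print(archive_month("http://abc.blogspot.com/search/label/Blogging"))    # None
```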

You can keep crawlers away from Blogspot archive pages by modifying the robots.txt file as follows:
User-agent: *
Disallow: /search
Disallow: /2013_01_01_archive.html 
Disallow: /2013_02_01_archive.html 
Disallow: /2013_03_01_archive.html 
Allow: /

Sitemap: http://abc.blogspot.com/feeds/posts/default?orderby=updated

Note that you will have to add a Disallow line by hand for each of your archive pages, in the same way as shown above.
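Writing those lines by hand gets tedious as the archive grows, so a short script can generate them instead. This Python sketch is an illustration, not a Blogger feature: `archive_path` and `build_robots_txt` are hypothetical helpers that assume the monthly archive pattern from the table above, and the output is checked with the standard `urllib.robotparser` module. You would still paste the result into the Custom robots.txt box yourself.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def archive_path(year: int, month: int) -> str:
    # Assumed monthly archive pattern: /YYYY_MM_01_archive.html
    return f"/{year}_{month:02d}_01_archive.html"


def build_robots_txt(blog_url: str, months: list[tuple[int, int]]) -> str:
    """Return robots.txt text blocking /search plus the given archive months."""
    lines = ["User-agent: *", "Disallow: /search"]
    # One Disallow line per monthly archive page.
    lines += [f"Disallow: {archive_path(y, m)}" for y, m in months]
    lines += ["Allow: /", "", f"Sitemap: {blog_url}/feeds/posts/default?orderby=updated"]
    return "\n".join(lines) + "\n"


robots = build_robots_txt("http://abc.blogspot.com", [(2013, m) for m in range(1, 4)])
print(robots)

# Sanity check with Python's built-in parser (first matching rule wins).
parser = RobotFileParser()
parser.parse(robots.splitlines())
print(parser.can_fetch("*", "http://abc.blogspot.com/2013_02_01_archive.html"))  # False
print(parser.can_fetch("*", "http://abc.blogspot.com/2013_04_01_archive.html"))  # True
```

Running it prints the same three Disallow lines for January to March 2013 as the example above; extending the list of months extends the file.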
