Creating sitemaps and robots.txt files
Being unable to fall asleep again after waking up in the middle of the night makes for the perfect time to [insert your favorite activity here]. This time I decided to learn more about web crawlers and search indexing. More specifically, sitemaps and robots.txt files: simple files that help make sure crawlers find what they're looking for, or don't find what they shouldn't. Here are a few sites that talk about this.

The robots.txt file is the greeting card for web crawlers. When an indexing robot from, say, Google, Yahoo or some other search engine visits your website, it starts by looking for the robots.txt file. It's just a simple text file that can contain instructions for the robot: what to index, what not to index, etc. After looking at a few sites claiming that it never hurts to have one, I decided to find out the do's and don'ts of robots filemaking. Here is what I put into the robots.txt file I just created and dropped into my website's root directory:

User-agent: *
Disallow:

The User-agent part specifies which robot the instructions are for (Googlebot, for example); the asterisk (*) means 'anyone'. Disallow is where you would specify a directory or file that the bots aren't allowed to crawl (if you don't want a certain file to be found). Example: Disallow: /mysecretstuff/. As I wanted everyone to index everything, I left the Disallow value empty. Simple enough, eh? Next time a robot comes around, he'll see the text file there and be happy, kind of like leaving cookies for Santa.
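
If you'd like to check how a crawler might read these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the helper name build_parser are made up for illustration, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content given as a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The file from this post: an empty Disallow value lets everyone in everywhere.
allow_all = build_parser("User-agent: *\nDisallow:\n")

# The variant from the example: keep every robot out of one directory.
block_secret = build_parser("User-agent: *\nDisallow: /mysecretstuff/\n")

# example.com is a placeholder domain, not a real site of mine.
secret_url = "https://example.com/mysecretstuff/notes.html"
home_url = "https://example.com/index.html"

print(allow_all.can_fetch("Googlebot", secret_url))     # True
print(block_secret.can_fetch("Googlebot", secret_url))  # False
print(block_secret.can_fetch("Googlebot", home_url))    # True
```

Keep in mind that robots.txt is only a polite request: well-behaved robots follow it, but nothing forces them to.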

Now, a sitemap is an XML file listing the pages on your website. Again, this is for the robots: it helps them index everything on your site and makes sure they don't miss anything important. The file also contains some simple properties for each page, such as the update frequency (a hint for how often crawlers should revisit), the relative importance of particular pages and so on. If you own a site, many recommend a sitemap. Check out the links below to generate a map of your own site, then upload it to Google through Webmaster Tools.
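
For a small site you can also build the file by hand. The following is a rough sketch in Python, not what the generator sites actually do: the page list, the example.com URLs and the function name build_sitemap are all invented for illustration. The changefreq and priority elements are the sitemap protocol's names for the update frequency and importance mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]                # page address
        ET.SubElement(url, "changefreq").text = page["changefreq"]  # update frequency hint
        ET.SubElement(url, "priority").text = page["priority"]      # relative importance, 0.0 to 1.0
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages; swap in the real URLs of your own site.
pages = [
    {"loc": "https://example.com/", "changefreq": "weekly", "priority": "1.0"},
    {"loc": "https://example.com/about.html", "changefreq": "yearly", "priority": "0.3"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Save the output as sitemap.xml in the site's root directory so the robots can find it.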
