Default robots.txt File For Web-Server

How do I create a default robots.txt file for the Apache web-server running on a Linux/Unix/MS-Windows server?

Tutorial details
Difficulty level: Easy
Root privileges: No
Requirements: None
Est. reading time: 5m

Web spiders (also known as robots or crawlers) are programs that search engines use to “crawl” across the Internet and index pages on web servers. The robots.txt file helps webmasters and site owners tell web crawlers (robots) not to access all or part of a website. Site owners use the robots.txt file to give instructions about their site to web robots using the Robots Exclusion Protocol.

robots.txt File Syntax and Rules

The robots.txt file uses basic rules as follows:

  1. User-agent: The robot the following rules apply to.
  2. Disallow: The URL path you want to block.
  3. Allow: The URL path you want to allow.
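
To see how these three directives work together, here is a minimal Python sketch that checks paths against a set of rules using the standard library's urllib.robotparser module. The helper name is_allowed, the bot name ExampleBot and the /public/ path are made up for illustration. This module applies the first rule that matches, so a crawler that prefers the most specific rule may behave differently on more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: allow /public/, block everything else.
RULES = """\
User-agent: *
Allow: /public/
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    for path in ("/public/index.html", "/private/data.html"):
        print(path, is_allowed(RULES, "ExampleBot", path))
```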

Examples: The default robots.txt

To block all robots from the entire server, create or upload a robots.txt file as follows:

User-agent: *
Disallow: /

The two lines above are considered a single entry in the file. To allow all robots complete access to the entire server, create or upload a robots.txt file as follows. An empty Disallow value means that nothing is blocked:

User-agent: *
Disallow:

Please note that User-agent: * means “any robot”. You can include as many entries as you want, and each entry can contain multiple user-agents and multiple Disallow or Allow lines. The following example tells robots to stay away from the /foo/bar.php file:

User-agent: *
Disallow: /foo/bar.php

In this example, you instruct all robots not to enter the /cgi-bin/ and /print/ directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /print/

This example tells a specific robot called fooBar to stay away from your website. Here fooBar is a placeholder; replace it with the actual user-agent name of the bot you want to block:

User-agent: fooBar
Disallow: /

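To block files of a specific type, say all *.png image files, use the following syntax for Googlebot. Here * matches any sequence of characters and $ marks the end of the URL. These wildcards were not part of the original robots.txt standard, so some robots may ignore them: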

User-agent: Googlebot
Disallow: /*.png$
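
Python's standard urllib.robotparser module compares rule paths as plain prefixes and does not expand these wildcards, so here is a small sketch of how such a pattern could be matched by hand. The function names pattern_to_regex and is_blocked are my own, and the sketch only covers the * and $ characters described above.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, like a robots.txt rule.
    return pattern_to_regex(pattern).match(path) is not None

if __name__ == "__main__":
    for path in ("/images/logo.png", "/images/logo.png?size=2", "/images/logo.jpg"):
        print(path, is_blocked("/*.png$", path))
```

Without the trailing $, the same rule would also match /images/logo.png?size=2, because the pattern would then only need to match the beginning of the URL.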

The following example disallows a Robot named “fooBar” from the paths “/cgi-bin/” and “/pdfs/”:

# Tell "fooBar" where it can't go
User-agent: fooBar
Disallow: /cgi-bin/
Disallow: /pdfs/
# Allow all other robots to browse everywhere
User-agent: *
Disallow:
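
If you maintain entries for several robots, you may prefer to generate the file rather than type it by hand. The following is only a sketch: the render_robots_txt function and its (user-agents, disallowed paths) tuple layout are my own invention, not part of any standard tool. An entry with no paths is written with an empty Disallow: line, which blocks nothing.

```python
from typing import Iterable, List, Tuple

# One entry: (user-agent names, disallowed paths).
Entry = Tuple[List[str], List[str]]

def render_robots_txt(entries: Iterable[Entry]) -> str:
    """Render entries as robots.txt text, one blank line between entries."""
    blocks = []
    for agents, disallowed in entries:
        lines = [f"User-agent: {agent}" for agent in agents]
        # An empty path list becomes a bare "Disallow:" (allow everything).
        lines += [f"Disallow: {path}" for path in disallowed] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    print(render_robots_txt([
        (["fooBar"], ["/cgi-bin/", "/pdfs/"]),
        (["*"], []),
    ]), end="")
```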

In this example, I am only allowing a Web Spider named “googlebot” into a site, while denying all other Spiders:

# Allow "googlebot" in the site
User-agent: Googlebot
Disallow:
# Deny all other spiders
User-agent: *
Disallow: /
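
When you write per-robot entries like this, keep in mind that consecutive User-agent lines with no rule line between them are read as one entry, and that comment lines do not separate entries. The sketch below uses Python's urllib.robotparser to compare the two cases. The helper may_fetch, the bot name SomeOtherBot and the /index.html path are assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Googlebot has its own entry with an empty Disallow (nothing blocked).
SEPARATE_ENTRIES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# No rule line after Googlebot: both User-agent lines share "Disallow: /".
MERGED_ENTRY = """\
User-agent: Googlebot
User-agent: *
Disallow: /
"""

def may_fetch(robots_txt: str, user_agent: str, path: str = "/index.html") -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    print("separate, Googlebot:", may_fetch(SEPARATE_ENTRIES, "Googlebot"))
    print("separate, other bot:", may_fetch(SEPARATE_ENTRIES, "SomeOtherBot"))
    print("merged,   Googlebot:", may_fetch(MERGED_ENTRY, "Googlebot"))
```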

How do I create a robots.txt file on my server?

Please note that robots.txt is a special text file that must always be located in your web server’s document root, so that it is served as /robots.txt. Web robots are not required to respect robots.txt files, but most well-written web spiders follow the rules you define. You can create robots.txt on your own system and upload it using an FTP client.
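
Because robots look for the file at the top level of the host, you can work out its address from any page URL on the site. Here is a short Python sketch of that idea; the function name robots_url is my own and example.com is only a placeholder host.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port), replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post.html?x=1"))
```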

You can also log in to your server using the ssh command and use a text editor such as vi to create the robots.txt file. In this example, I log in to the server from an OS X or Linux/Unix based desktop system and create the file in the /var/www/html directory. MS-Windows users can try the PuTTY ssh client:

cd /var/www/html
vi robots.txt
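
If you would rather script this step than use an editor, the following Python sketch writes an allow-all robots.txt into a directory. It assumes you pass in your own document root (for example /var/www/html, which may differ on your server); the function name write_robots_txt is hypothetical, and the demo below uses a temporary directory as a stand-in.

```python
import tempfile
from pathlib import Path

# Default content: every robot may access everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

def write_robots_txt(docroot: str, content: str = ALLOW_ALL) -> Path:
    """Write content to <docroot>/robots.txt and return the file's path."""
    target = Path(docroot) / "robots.txt"
    target.write_text(content, encoding="utf-8")
    return target

if __name__ == "__main__":
    # A temporary directory stands in for the real document root here.
    with tempfile.TemporaryDirectory() as docroot:
        path = write_robots_txt(docroot)
        print(path.read_text(encoding="utf-8"), end="")
```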

Sample robots.txt file

A sample robots.txt file:

#Allow Google Media Partners bot
User-agent: Mediapartners-Google
Disallow:
#Block the bad bots
User-agent: ia_archiver
Disallow: /
User-agent: VoilaBot
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: BecomeJPBot
Disallow: /
User-agent: Exabot
Disallow: /
User-agent: 008
Disallow: /
User-agent: Sosospider
Disallow: /
#Block specific urls and directories for all bots
User-agent: *
Disallow: /low.html
Disallow: /lib/
Disallow: /rd/
Disallow: /tools/
Disallow: /tmp/
Disallow: /*?
Disallow: /view/pdf/faq/*.php
Disallow: /view/pdf/tips/*.php
Disallow: /view/pdf/cms/*.php
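
With a longer file like this it is easy to lose track of which robots are shut out completely. Here is a rough Python sketch that lists every user-agent whose entry contains Disallow: /. The function name fully_blocked_agents is my own, the SAMPLE text is a shortened file in the same style, and the parsing is deliberately simplified (it ignores Allow overrides and wildcards).

```python
from typing import List

SAMPLE = """\
#Allow Google Media Partners bot
User-agent: Mediapartners-Google
Disallow:
#Block the bad bots
User-agent: ia_archiver
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /lib/
Disallow: /tmp/
"""

def fully_blocked_agents(robots_txt: str) -> List[str]:
    """List user-agents whose entry contains a bare "Disallow: /"."""
    blocked: List[str] = []
    agents: List[str] = []   # user-agents of the entry being read
    in_rules = False         # True once the current entry has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a rule line ended the previous entry
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked

if __name__ == "__main__":
    print(fully_blocked_agents(SAMPLE))
```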
