Robots.txt Guide: How to Use It for SEO (With Examples)

Control how search engines crawl your site with a simple text file. Learn what robots.txt does, how to write one, and how to avoid the most common mistakes.

What is robots.txt?

Robots.txt is a plain text file that sits at the root of your website and tells search engine crawlers which pages they can and cannot access. Every website can have one, and it is always located at yoursite.com/robots.txt.

When a crawler like Googlebot visits your site, it checks this file first. The rules inside tell the bot which directories and pages to skip, which ones to crawl, and where to find your sitemap.
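For example, a minimal robots.txt might look like this (the paths and domain are illustrative):

```
# Rules for all crawlers
User-agent: *
# Skip the admin area
Disallow: /admin/
# Crawl everything else
Allow: /

# Where to find the sitemap
Sitemap: https://yoursite.com/sitemap.xml
```

Lines starting with # are comments, which crawlers ignore.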

It is important to understand that robots.txt is a suggestion, not a hard block. Well-behaved crawlers like Googlebot and Bingbot respect it, but malicious bots may ignore it entirely. If you need to truly prevent access to a page, use server-level authentication.

Robots.txt is part of technical SEO, which covers the behind-the-scenes work that helps search engines understand and crawl your site effectively. For a complete overview of how technical SEO fits into a broader strategy, see the full SEO guide.

Why robots.txt matters for SEO

  • Controls crawl budget. Search engines have a limited number of pages they will crawl per visit. Robots.txt helps focus that budget on pages that matter.
  • Prevents indexing of private pages. Keep admin panels, staging environments, and internal tools out of search results.
  • Avoids duplicate content issues. Block filtered or sorted versions of the same page that add no SEO value.
  • Protects server resources. Prevent aggressive bots from overloading your server by blocking them entirely.
  • Directs crawlers to your sitemap. The Sitemap directive tells search engines exactly where to find your XML sitemap.

Robots.txt does not remove pages from Google. If a page is already indexed and you want it removed, you need a noindex meta tag or a removal request in Search Console.
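The tag itself is one line of standard HTML in the page's head (a sketch; note that the page must stay crawlable, since Google has to fetch it to see the tag):

```
<!-- Tells compliant crawlers not to index this page.
     Do not also block it in robots.txt, or the tag will never be seen. -->
<meta name="robots" content="noindex">
```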

How robots.txt works

A robots.txt file is made up of simple directives, each on its own line. The four most common directives are User-agent, Disallow, Allow, and Sitemap.

1. User-agent. Specifies which crawler the rules apply to. Use * for all bots, or a specific name like Googlebot.

2. Disallow. Tells the bot not to crawl a specific path. Disallow: /admin/ blocks everything under the /admin/ directory.

3. Allow. Overrides a Disallow for a specific path. Useful when you block a directory but want one page inside it to be crawlable.

4. Sitemap. Points crawlers to your XML sitemap. It is conventionally placed at the bottom of the file, though crawlers will read it wherever it appears.
A typical robots.txt file looks like this, with each rule on its own line:

```
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```

Common robots.txt rules with examples

Here are the most practical robots.txt rules you will use on real sites. Each example shows the directives and what they do.

  • Block all crawlers from a directory. User-agent: * and Disallow: /private/ — prevents all bots from accessing anything under /private/.
  • Allow all crawlers everywhere. User-agent: * and Disallow: (empty) — this is the most permissive setting. Every page is crawlable.
  • Block a specific bot. User-agent: AhrefsBot and Disallow: / — blocks the Ahrefs crawler from your entire site while letting other bots through.
  • Block a single page. User-agent: * and Disallow: /thank-you — prevents crawlers from accessing that page. Note that rules are prefix matches, so this also blocks paths like /thank-you-2; major crawlers support a trailing $ (Disallow: /thank-you$) to match the exact URL only.
  • Allow one page in a blocked directory. User-agent: * and Disallow: /docs/ then Allow: /docs/public — blocks the docs directory except for the public page.
  • Point to your sitemap. Sitemap: https://yoursite.com/sitemap.xml — goes at the bottom of the file, outside any User-agent block.
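Put together, the rules above could form a single file like this (domain and paths are illustrative; blank lines separate the per-crawler groups):

```
User-agent: *
Disallow: /thank-you
Allow: /docs/public
Disallow: /docs/

User-agent: AhrefsBot
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
```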

Always test your robots.txt changes before deploying. A single misplaced rule can accidentally block your entire site from Google.
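One quick way to sanity-check rules locally before deploying is Python's built-in urllib.robotparser. A rough sketch follows; note that this parser applies rules in file order rather than Google's longest-match behavior, which is why the Allow line is listed first here:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules mirroring the examples above
rules = """\
User-agent: *
Allow: /docs/public
Disallow: /docs/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Anything under /docs/ is blocked except the allowed path
print(parser.can_fetch("*", "https://example.com/docs/internal"))  # False
print(parser.can_fetch("*", "https://example.com/docs/public"))    # True
print(parser.can_fetch("*", "https://example.com/blog/post"))      # True
```

This only approximates how Google evaluates rules, so treat it as a first check, not a substitute for verifying in Search Console.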

Common mistakes to avoid

  • Blocking your entire site. Disallow: / blocks everything. This is the most common and most damaging mistake, especially on staging sites that accidentally go live.
  • Blocking CSS and JavaScript. Google needs to render your pages. Blocking CSS/JS files prevents Google from understanding your page layout and content.
  • Using robots.txt to hide sensitive content. Robots.txt is publicly accessible. Anyone can read it. Never use it to hide passwords, API keys, or confidential pages.
  • Forgetting to update after a redesign. Old robots.txt rules may block new URL patterns or allow paths that no longer exist.
  • Not including a Sitemap directive. It takes one line and helps every search engine find your sitemap instantly.
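The first mistake above is worth seeing side by side, because the dangerous version differs from the harmless one by a single character. The two groups are shown together only for comparison; a real file would contain just one of them:

```
# Blocks the ENTIRE site, typically a staging config that went live
User-agent: *
Disallow: /

# Blocks nothing at all: an empty Disallow is fully permissive
User-agent: *
Disallow:
```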

Robots.txt is public. Do not list URLs in Disallow that you want to keep secret — you are actually advertising their existence.

How to create and test robots.txt

Setting up a robots.txt file takes just a few minutes. Follow these steps to get it right the first time.

1. Create the file. Use any text editor. Save it as robots.txt, all lowercase, with no extra file extension. Keep it plain text, no HTML.

2. Add your rules. Start with User-agent: * for rules that apply to all bots. Add Disallow lines for paths you want to block. Add your Sitemap line at the bottom.

3. Upload to your root directory. The file must be at yoursite.com/robots.txt. It will not work in a subdirectory.

4. Test in Search Console. Use Google Search Console's robots.txt report (which replaced the standalone tester tool) to verify your rules work as expected before relying on them.

5. Monitor regularly. Check your robots.txt after site migrations, redesigns, or CMS updates. Automated changes can overwrite your custom rules.

How RankSEO helps with technical SEO

Managing robots.txt manually works for simple sites, but as your site grows, it becomes easy to miss misconfigurations. Automated auditing catches problems before they hurt your rankings.

RankSEO's technical SEO features automatically audit your site for crawlability issues, including robots.txt misconfigurations:

  • Detects pages accidentally blocked from crawlers
  • Monitors robots.txt changes and alerts you to problems
  • Checks that your sitemap is properly referenced

Explore all RankSEO features or check out pricing to get started with automated technical SEO audits today.

Frequently Asked Questions

What does robots.txt do?

Robots.txt tells search engine crawlers which pages on your site they can and cannot access. It helps you control crawl budget, prevent indexing of private pages, and direct bots to your sitemap.

Can robots.txt stop a page from being indexed?

Robots.txt can prevent Google from crawling a page, but if other sites link to it, Google may still index the URL without crawling the content. To fully prevent indexing, use a noindex meta tag instead.

Is robots.txt important for SEO?

Yes. It helps search engines spend their crawl budget on your most important pages and prevents them from wasting time on admin pages, duplicates, or staging content.

What happens if there are errors in my robots.txt?

Errors can accidentally block important pages from being crawled and indexed. In the worst case, a single wrong rule like Disallow: / can remove your entire site from search results.

How do I fix pages blocked by robots.txt?

Use Google Search Console's robots.txt report to identify which rules are blocking which pages. Fix the rules in your file, re-upload, and request a re-crawl of affected pages.

Does a small site need a robots.txt file?

Even small sites benefit from a basic robots.txt file. At minimum, include a Sitemap directive so search engines can find your sitemap easily.