Robots.txt Guide: How to Use It for SEO (With Examples)
A simple text file sits at your site's root and controls how search engines crawl it. Learn what robots.txt does, how to write one correctly, and which common mistakes destroy your SEO.
What is robots.txt
Robots.txt is a plain text file positioned at your website's root that communicates instructions to search engine crawlers about which pages they can and cannot access. Every website can have one, and it's always located at yoursite.com/robots.txt.
When a crawler like Googlebot visits your site, it checks this file first. The rules inside tell the bot which directories and pages to skip, which ones to visit, and where to find your sitemap.
It's important to understand that robots.txt is advisory, not a hard block. Well-behaved bots like Google and Bing respect it, but malicious bots may ignore it entirely. For true access prevention, you need server-level authentication.
Robots.txt is part of technical SEO, which covers the behind-the-scenes mechanics that help search engines crawl and understand your site effectively. For a complete overview of how technical SEO fits into a broader strategy, see the full SEO guide.
Why robots.txt matters for SEO
- Preserves crawl budget. Search engines crawl only a limited number of URLs on a site in a given period. Robots.txt helps you focus that budget on the pages that matter most.
- Keeps crawlers out of private areas. Stop compliant crawlers from fetching admin interfaces, staging environments, and internal tools. A blocked URL can still be indexed if other sites link to it, so pair this with authentication or noindex where it matters.
- Reduces duplicate content problems. Block parameter variations, sorted versions, and filtered copies of the same page that waste crawl budget and confuse Google.
- Protects server resources. Block aggressive crawlers before they overwhelm your infrastructure. This only works for bots that honor robots.txt.
- Directs crawlers to your sitemap. The Sitemap directive tells search engines exactly where to find your XML sitemap.
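Robots.txt doesn't remove pages from Google. If a page is already indexed and you want it removed, you need a noindex meta tag or a removal request in Search Console. For noindex to work, the page must stay crawlable, because a crawler blocked by robots.txt never sees the tag.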
How robots.txt works
A robots.txt file consists of simple directives, each on its own line. The four most critical directives are User-agent, Disallow, Allow, and Sitemap.
User-agent
Specifies which crawler the rules apply to. Use an asterisk (*) for all bots, or a specific name such as Googlebot.
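To illustrate how User-agent groups are selected, here is a minimal sketch using Python's standard-library urllib.robotparser. The file contents and the yoursite.com URLs are illustrative assumptions, and Python's parser is simpler than a real search engine's, so treat it as a rough approximation rather than a model of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: one group for a named bot, one catch-all group.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named group applies to AhrefsBot; every other bot falls back to "*".
ahrefs_blog = parser.can_fetch("AhrefsBot", "https://yoursite.com/blog/")
google_blog = parser.can_fetch("Googlebot", "https://yoursite.com/blog/")
google_private = parser.can_fetch("Googlebot", "https://yoursite.com/private/report")

print(ahrefs_blog)     # False
print(google_blog)     # True
print(google_private)  # False
```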
Disallow
Tells the bot not to visit a specific path. Disallow: /admin/ prevents crawling everything under the /admin/ directory.
Allow
Overrides a Disallow for a specific path. Useful when you block a directory but want one page inside it crawlable.
Sitemap
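Points crawlers to your XML sitemap. The directive applies to every crawler regardless of User-agent groups and can appear anywhere in the file. Placing it at the bottom is a common convention.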
A typical robots.txt file looks like this: User-agent: * followed by Disallow: /admin/, Disallow: /staging/, Allow: /, and Sitemap: https://yoursite.com/sitemap.xml. Each rule goes on its own line.
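Written out line by line, that typical file can be held in a string and checked with Python's standard-library urllib.robotparser. This is a sketch under a few assumptions: the sample page paths are made up, and load_rules is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The typical file from the text, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)

print(rules.can_fetch("Googlebot", "https://yoursite.com/admin/login"))  # False
print(rules.can_fetch("Googlebot", "https://yoursite.com/blog/post"))    # True
print(rules.site_maps())  # ['https://yoursite.com/sitemap.xml']
```

site_maps() requires Python 3.8 or later.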
Common robots.txt rules with examples
Here are the most practical robots.txt rules you'll use on real sites. Each example shows the directives and what they accomplish.
- Block all crawlers from a directory. User-agent: * and Disallow: /private/ prevent all compliant bots from crawling anything under /private/.
- Allow all crawlers everywhere. User-agent: * with an empty Disallow: line is the most permissive setting. Every page is crawlable.
- Block a specific bot. User-agent: AhrefsBot and Disallow: / block the Ahrefs crawler from your entire site while letting other bots through.
- Block a single page. User-agent: * and Disallow: /thank-you stop crawlers from fetching that page. Rules match by prefix, so this also covers any URL that begins with /thank-you.
- Allow one page in a blocked directory. User-agent: * and Disallow: /docs/ followed by Allow: /docs/public block the docs directory except for the public page.
- Point to your sitemap. Sitemap: https://yoursite.com/sitemap.xml conventionally goes at the bottom of the file, outside any User-agent block.
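The "allow one page in a blocked directory" rule depends on how crawlers resolve conflicting rules. Under RFC 9309, which Google follows, the most specific (longest) matching path wins, and Allow wins a tie. The sketch below models only that precedence for plain prefix rules, without the * and $ wildcards. The is_allowed function and the Rule type are hypothetical names used for illustration.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path), e.g. ("disallow", "/docs/")

def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, rule_path in rules:
        # An empty rule path matches nothing, and non-matching rules are skipped.
        if not rule_path or not path.startswith(rule_path):
            continue
        longer = len(rule_path) > best_len
        tie_for_allow = len(rule_path) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(rule_path)
            allowed = directive == "allow"
    return allowed

docs_rules = [("disallow", "/docs/"), ("allow", "/docs/public")]

print(is_allowed("/docs/public", docs_rules))    # True
print(is_allowed("/docs/internal", docs_rules))  # False
print(is_allowed("/blog/", docs_rules))          # True
```

Not every parser works this way. Python's urllib.robotparser, for example, applies the first matching rule in file order, so it would report /docs/public as blocked unless the Allow line came first.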
Always test your robots.txt changes before deploying. A single misplaced rule can accidentally block your entire site from Google.
Common mistakes to avoid
- Blocking your entire site. Disallow: / blocks everything. This is the most common and most damaging mistake, especially on staging sites that accidentally go live.
- Blocking CSS and JavaScript. Google needs to render your pages. Blocking CSS and JS files prevents Google from understanding your page layout and content.
- Using robots.txt to hide sensitive content. Robots.txt is publicly accessible. Anyone can read it. Never use it to hide passwords, API keys, or confidential pages.
- Forgetting to update after a redesign. Old robots.txt rules may block new URL patterns or allow paths that no longer exist.
- Not including a Sitemap directive. It takes one line and helps every search engine find your sitemap instantly.
Robots.txt is public. Don't list URLs in Disallow that you want to keep secret. You're actually advertising their existence.
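Several of the mistakes above can be caught mechanically before a file goes live. The following is a minimal sketch of such a check, not a full validator: lint_robots is a hypothetical function, the warning texts are made up, and it covers only a site-wide block for all bots, blocked .css or .js paths, and a missing Sitemap line.

```python
from typing import List

def lint_robots(text: str) -> List[str]:
    """Return warnings for a few common robots.txt mistakes."""
    warnings: List[str] = []
    agents: List[str] = []   # user-agents named by the group being read
    in_rules = False         # True once the current group has a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / blocks the whole site for all bots")
            if field == "disallow" and value.lower().endswith((".css", ".js")):
                warnings.append(f"Disallow: {value} may block rendering resources")

    if not has_sitemap:
        warnings.append("No Sitemap directive found")
    return warnings

print(lint_robots("User-agent: *\nDisallow: /\n"))
# ['Disallow: / blocks the whole site for all bots', 'No Sitemap directive found']
```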
How to create and test robots.txt
Setting up a robots.txt file takes just a few minutes. Follow these steps to get it right the first time.
Create the file
Use any text editor. Save the file as robots.txt, all lowercase, and make sure the editor doesn't append a second extension such as .txt.txt or .html. Keep it plain text (UTF-8), no HTML.
Add your rules
Start with User-agent: * for rules that apply to all bots. Add Disallow lines for paths you want to block. Add your Sitemap line at the bottom.
Upload to your root directory
The file must be at yoursite.com/robots.txt. It won't work in subdirectories.
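Crawlers derive the robots.txt location from the scheme and host of the URL they are about to fetch, which is why the file only works at the root and why each subdomain needs its own copy. A small sketch of that derivation, where robots_url is a hypothetical helper name and the URLs are examples:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?ref=home"))
# https://yoursite.com/robots.txt
print(robots_url("https://shop.yoursite.com/cart"))
# https://shop.yoursite.com/robots.txt
```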
Test in Search Console
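Use the robots.txt report in Google Search Console, which replaced the older robots.txt tester, to confirm that Google can fetch and parse your file. Use the URL Inspection tool to check whether a specific URL is blocked before relying on your rules.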
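Alongside Search Console, a draft file can be checked locally against a list of URLs that must stay crawlable or must stay blocked. This is a sketch under stated assumptions: failed_expectations is a hypothetical helper, the URLs are examples, and Python's urllib.robotparser does not support path wildcards and resolves conflicts by file order, so it only approximates Googlebot.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

def failed_expectations(
    robots_text: str,
    expectations: Dict[str, bool],
    agent: str = "Googlebot",
) -> List[str]:
    """Return URLs whose crawlability differs from what was expected."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    failures = []
    for url, should_be_crawlable in expectations.items():
        if parser.can_fetch(agent, url) != should_be_crawlable:
            failures.append(url)
    return failures

DRAFT = """\
User-agent: *
Disallow: /admin/
"""

# True means the URL must be crawlable, False means it must be blocked.
EXPECTATIONS = {
    "https://yoursite.com/": True,
    "https://yoursite.com/pricing": True,
    "https://yoursite.com/admin/users": False,
}

print(failed_expectations(DRAFT, EXPECTATIONS))  # []
```

An empty list means every expectation held. Running the same check in a deploy pipeline would catch an accidental Disallow: / before it reaches production.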
Monitor regularly
Check your robots.txt after site migrations, redesigns, or CMS updates. Automated changes can overwrite your custom rules.
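One lightweight way to notice an overwritten file is to store a fingerprint of the known-good version and compare it with the live content on a schedule. The sketch below covers only the comparison. Fetching the file and sending an alert are left out, and fingerprint and has_changed are hypothetical names.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Hash robots.txt content, ignoring line endings and trailing spaces."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_changed(previous_fingerprint: str, current_text: str) -> bool:
    """True when the current content differs from the stored fingerprint."""
    return fingerprint(current_text) != previous_fingerprint

baseline = fingerprint("User-agent: *\nDisallow: /admin/\n")

# Same rules with Windows line endings and stray spaces: not a real change.
print(has_changed(baseline, "User-agent: *\r\nDisallow: /admin/  \r\n"))  # False
# A CMS update replaced the rules with a site-wide block: flagged.
print(has_changed(baseline, "User-agent: *\nDisallow: /\n"))  # True
```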
How Rank SEO helps with technical SEO
Validate your live robots.txt with our free robots.txt checker before you ship changes — it parses your file, simulates Googlebot access rules, and flags directives that could block important pages.
Managing robots.txt manually works for simple sites, but as your site grows, misconfigurations become easy to miss. Automated auditing catches problems before they hurt your rankings.
Rank SEO's technical SEO features automatically audit your site for crawlability issues, including robots.txt misconfigurations:
- Detects pages accidentally blocked from crawlers
- Monitors robots.txt changes and alerts you to problems
- Checks that your sitemap is properly referenced
Explore all Rank SEO features or check out pricing to get started with automated technical SEO audits today.
Frequently Asked Questions
Robots.txt tells search engine crawlers which pages on your site they can and cannot access. It helps you control crawl budget, keep crawlers out of private areas, and direct bots to your sitemap.
Robots.txt can prevent Google from crawling a page, but if other sites link to it, Google may still index the URL without crawling the content. To fully prevent indexing, use a noindex meta tag instead and leave the page crawlable so Google can see the tag.
Yes. It helps search engines spend their crawl budget on your most important pages and prevents them from wasting time on admin pages, duplicates, or staging content.
Errors can accidentally block important pages from being crawled and indexed. In the worst case, a single wrong rule like Disallow: / can remove your entire site from search results.
Use Google Search Console's URL Inspection tool and robots.txt report to identify which pages are blocked and which rules are responsible. Fix the rules in your file, re-upload, and request a re-crawl of affected pages.
Even small sites benefit from a basic robots.txt file. At minimum, include a Sitemap directive so search engines can find your sitemap easily.