Robots.txt File: The Complete Guide for SEO in 2026
A misconfigured robots.txt file can block your entire site from Google. This guide explains every directive, common mistakes, and how to test your robots.txt correctly.
A single misconfiguration in your robots.txt can block all of Google's crawlers from your site, wiping your rankings overnight. It's a small file with enormous consequences. Here's everything you need to know.
What Is robots.txt?
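robots.txt is a plain text file placed at the root of your domain (example.com/robots.txt). It uses the Robots Exclusion Protocol to tell web crawlers which pages they may and may not crawl. Its rules apply only to the exact protocol, host, and port the file is served from, so a subdomain such as blog.example.com needs its own file. Compliance is voluntary: Google respects it, but malicious crawlers often ignore it, so it is not a security mechanism.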
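Because crawlers only look for the file at the root, locating it is mechanical: keep the scheme and host (plus any explicit port) of a page URL and replace everything else with /robots.txt. The Python sketch below illustrates that lookup; `robots_url` is a hypothetical helper name, not part of any crawler's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # netloc keeps the host and any explicit port; path, query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://blog.example.com/2026/post?utm=x#top"))
print(robots_url("http://example.com:8080/a/b"))
```

A page on blog.example.com therefore resolves to that subdomain's own file, never to example.com's.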
Syntax and Directives
# Allow all crawlers everywhere
User-agent: *
Allow: /

# Block all crawlers from /admin
User-agent: *
Disallow: /admin/

# Block Googlebot only from /staging
User-agent: Googlebot
Disallow: /staging/

# Block everyone from everything (DANGEROUS)
User-agent: *
Disallow: /

# Link to sitemap
Sitemap: https://example.com/sitemap.xml
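To see how a compliant crawler reads these groups, the sketch below feeds two of the example groups, combined into one hypothetical file, to Python's standard-library urllib.robotparser. Treat it as an approximation of Google's parser only: robotparser applies the first matching rule in file order and ignores the * and $ path wildcards, while Google picks the most specific (longest) matching rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file combining two of the example groups above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines and marks the file as read,
# so can_fetch() works without any network request.
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Bingbot", "https://example.com/admin/settings"),
    ("Bingbot", "https://example.com/blog/post"),
    ("Googlebot", "https://example.com/staging/home"),
    # Googlebot has its own group, so the * group's /admin/ rule does not apply to it.
    ("Googlebot", "https://example.com/admin/settings"),
]

for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {url:40} {verdict}")

print("Sitemaps:", parser.site_maps())
```

Note the last check: once Googlebot has a group of its own, it follows only that group. Google documents the same behavior, so any rule you want Googlebot to obey must be repeated inside its specific group.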
What to Block and What Not To
- Block: /admin, /cart, /checkout, internal search results (/search), /login, staging URLs, and duplicate content pages (though a canonical tag is usually the better fix for duplicates)
- Never block: CSS and JavaScript files (Google needs them to render pages) or your sitemap, and never block your entire site by accident with a stray Disallow: /
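One way to catch these mistakes before they ship is to assert that a handful of must-stay-crawlable URLs remain fetchable. The Python sketch below does that with the standard-library urllib.robotparser; the URLs in `MUST_STAY_CRAWLABLE` and the helper name `blocked_critical_urls` are hypothetical placeholders, and robotparser only approximates Google's matching (first match wins, no wildcard support).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs that should always stay crawlable; replace with your own.
MUST_STAY_CRAWLABLE = [
    "https://example.com/",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/sitemap.xml",
]


def blocked_critical_urls(robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return the critical URLs that `agent` would be blocked from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in MUST_STAY_CRAWLABLE if not parser.can_fetch(agent, url)]


# A risky draft: blocking /assets/ hides the CSS and JS Google needs for rendering.
risky = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""
print(blocked_critical_urls(risky))
```

Running a check like this in CI means a draft that blocks stylesheets, scripts, or the whole site fails before it is published.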
robots.txt vs noindex
These are different tools with different behaviors:
- Disallow in robots.txt: Prevents crawling. Google won't visit the page, but may still index it if other pages link to it.
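- noindex meta tag: The page can be crawled but won't appear in search results. This is the right tool for pages you want de-indexed, but only if the page is not also disallowed in robots.txt: Google has to crawl the page to see the noindex tag.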
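The interaction between the two tools can be summarized as a small decision function. The Python sketch below is illustrative only: `diagnose` and `RobotsMetaFinder` are hypothetical names, robotparser approximates Google's rule matching, and the sketch checks only `<meta name="robots">` / `<meta name="googlebot">` tags, not the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def diagnose(robots_txt, url, html, agent="Googlebot"):
    """Describe how a crawler would treat `url` given robots.txt rules and page HTML."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    has_noindex = bool({"noindex", "none"} & finder.directives)
    if not rules.can_fetch(agent, url):
        if has_noindex:
            return "conflict: crawling is blocked, so the noindex tag is never seen"
        return "not crawled, but the URL can still be indexed from links"
    return "crawled, kept out of results" if has_noindex else "crawled and indexable"


robots = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(diagnose(robots, "https://example.com/private/report", page))
print(diagnose(robots, "https://example.com/public", page))
```

The first call shows the common mistake: the page carries noindex, but because robots.txt blocks it, Google never reads the tag.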
Testing robots.txt
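Check proposed rules locally or in a staging environment before publishing. After publishing, open the robots.txt report in Google Search Console (under Settings), which replaced the older robots.txt Tester, to confirm Google fetched the latest version without errors, and use the URL Inspection tool to check whether a specific URL is blocked.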
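For the local check, one approach is to diff the old and new files against a list of important URLs and review every URL whose verdict flips. The Python sketch below is a minimal version using urllib.robotparser; `changed_verdicts`, the sample files, and the URLs are hypothetical, and robotparser's first-match, no-wildcard matching only approximates Google's.

```python
from urllib.robotparser import RobotFileParser


def _parse(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def changed_verdicts(old_txt, new_txt, urls, agent="Googlebot"):
    """Map each URL whose crawlability changes to (allowed_before, allowed_after)."""
    old, new = _parse(old_txt), _parse(new_txt)
    changes = {}
    for url in urls:
        before, after = old.can_fetch(agent, url), new.can_fetch(agent, url)
        if before != after:
            changes[url] = (before, after)
    return changes


old_file = "User-agent: *\nDisallow: /admin/\n"
new_file = "User-agent: *\nDisallow: /admin\n"  # trailing slash dropped
urls = [
    "https://example.com/admin/",
    "https://example.com/admin-tips",
    "https://example.com/blog/",
]
print(changed_verdicts(old_file, new_file, urls))
```

The example catches a classic mistake: dropping the trailing slash turns Disallow: /admin/ into a prefix that also blocks /admin-tips. Google matches paths by prefix as well, so the same change would affect Googlebot.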