Robots.txt File: The Complete Guide for SEO in 2026

A single misconfigured line in your robots.txt can block Google's crawlers from your entire site, and rankings can slip quickly once pages stop being crawled. It's a small file with enormous consequences. Here's what you need to know.

What Is robots.txt?

robots.txt is a plain text file placed at the root of a host (example.com/robots.txt); each subdomain needs its own copy. It uses the Robots Exclusion Protocol to tell web crawlers which URL paths they may and may not crawl. The file is advisory, not access control: Google respects it, but malicious crawlers often ignore it.

Syntax and Directives

The snippets below are separate examples, not one combined file.

# Allow all crawlers everywhere
User-agent: *
Allow: /

# Block all crawlers from /admin
User-agent: *
Disallow: /admin/

# Block Googlebot only from /staging
# (Googlebot follows only the most specific group that names it, so it skips the * rules)
User-agent: Googlebot
Disallow: /staging/

# Block everyone from everything (DANGEROUS)
User-agent: *
Disallow: /

# Link to sitemap
Sitemap: https://example.com/sitemap.xml
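
To see how rules like these play out for a given crawler, here is a minimal Python sketch using the standard library's urllib.robotparser. The rule set, the bot name OtherBot, and the example.com URLs are illustrative assumptions. Python's parser is not guaranteed to match Google's behavior in every case (wildcard patterns in particular), so treat it as a sanity check rather than a definitive verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: one group for all crawlers, one for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /staging/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# (crawler name, URL) pairs to evaluate; all values are made up.
checks = [
    ("Googlebot", "https://example.com/staging/page"),
    ("Googlebot", "https://example.com/admin/"),
    ("OtherBot", "https://example.com/admin/"),
    ("OtherBot", "https://example.com/staging/page"),
]

for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {url} -> {verdict}")
```

Note that Googlebot is allowed into /admin/ here: because a group names it directly, it follows that group alone and the * rules do not apply to it.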

What to Block and What Not To

Block: /admin, /cart, /checkout, /search, /login, staging URLs, duplicate content pages
Never block: CSS and JS files (Google needs them to render pages) or your sitemap. And never block your entire site by accident with a stray "Disallow: /".
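
One way to guard against blocking the wrong things is to keep a short list of URLs that must stay crawlable and check every draft against it. The sketch below assumes a hypothetical draft file and hypothetical asset paths under /assets/; the helper name find_blocked is also an invention for this example.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that `agent` may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical draft that blocks the asset folder by mistake.
DRAFT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /assets/
"""

# Hypothetical URLs that should always remain crawlable.
MUST_CRAWL = [
    "https://example.com/",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/sitemap.xml",
]

blocked = find_blocked(DRAFT, MUST_CRAWL)
for url in blocked:
    print("Blocked but should be crawlable:", url)
```

An empty result means the draft leaves every listed URL crawlable according to this parser. A draft containing only "Disallow: /" would return the whole list.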

robots.txt vs noindex

These are different tools with different behaviors:

Disallow in robots.txt: Prevents crawling. Google won't visit the page, but may still index its URL if other pages link to it.
noindex meta tag: The page can be crawled, but won't appear in search results. For pages you want de-indexed, this is the right tool. It only works if the page is not disallowed in robots.txt, because Google has to crawl the page to see the tag.
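
The interaction between the two tools can be checked mechanically. The sketch below, under assumed inputs (a hypothetical robots.txt, made-up URLs, and a one-line HTML page), reports whether a page's noindex tag is reachable by the crawler or hidden behind a Disallow rule. The function names are illustrative, and it only looks at meta tags, not at HTTP headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Detects <meta name="robots" content="... noindex ...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        targets_robots = attr.get("name", "").lower() in ("robots", "googlebot")
        if targets_robots and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html: str) -> bool:
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return meta_parser.noindex


def noindex_status(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> str:
    """Classify how a noindex tag interacts with the robots.txt rules."""
    if not has_noindex(html):
        return "no noindex tag"
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if rules.can_fetch(agent, url):
        return "noindex visible to crawler"
    return "noindex hidden by Disallow"


# Hypothetical inputs for illustration.
RULES = "User-agent: *\nDisallow: /old/\n"
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'

print(noindex_status(RULES, "https://example.com/old/page", PAGE))
print(noindex_status(RULES, "https://example.com/promo", PAGE))
```

A result of "noindex hidden by Disallow" points to the conflict described above: the tag exists, but the crawler is told not to fetch the page that carries it.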

Testing robots.txt

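Google has retired the standalone robots.txt Tester. Use Google Search Console → Settings → robots.txt report to see which robots.txt files Google found, when it last fetched them, and any warnings or errors. To check rules before publishing, run them through a robots.txt parser or testing tool, and always test changes in a staging environment first.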
