robots.txt is a plain text file that tells crawlers which parts of your site they should or should not request. It is not a security layer, but it is an important signal for responsible bots. A tester helps you verify whether a given User-Agent can fetch a path based on the longest matching rule. This tool accepts the full file, a URL to check, and a crawler name, then shows the matched group, rule lengths, and the final allow or disallow decision.
Quick start - paste, set, test
Paste your robots.txt content, add a URL from your site, and pick a User-Agent. If your file uses multiple groups, make sure the group you expect to apply includes the User-Agent you entered. Click test. The result lists the group that matched, all rules with their match lengths, and the final decision. Remember that the longest match wins and that an Allow can override a Disallow if they tie on length. Keep a note of edge cases you want to check again after deployment.
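If you prefer to script the same paste-set-test check, Python's standard library ships urllib.robotparser. Below is a minimal sketch with placeholder file content, paths, and a made-up crawler name (ExampleBot); as far as I can tell, this parser uses plain prefix matching applied in file order and does not understand the * and $ wildcards discussed later, so its verdicts can differ from this tester or from Google on edge cases.

```python
from urllib import robotparser

# Hypothetical pasted file, paths, and crawler names - all placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Allow: /drafts/public/
Disallow: /drafts/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() accepts a full URL or a bare path.
for agent, path in [("ExampleBot", "/drafts/public/post.html"),
                    ("OtherBot", "/private/report.html"),
                    ("OtherBot", "/blog/")]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
    print(f"{agent} -> {path}: {verdict}")
```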
Groups and precedence - who the rules apply to
Groups begin with one or more User-agent lines followed by Allow and Disallow lines. A crawler looks for the most specific group that mentions it. If no specific group exists, it falls back to the * group. When two rules match the same path, the rule with the longer match decides. If both rules match with equal length, Allow wins. This model is consistent with the published standard and common crawler behavior, and it prevents surprising outcomes when you add narrow exceptions under a broad block.
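Here is a minimal sketch of that precedence model, under simplifying assumptions: rules are treated as plain path prefixes (no * or $ handling here) and the group lookup is reduced to exact User-agent name with * as the fallback. The function and group names are illustrative, not part of any library.

```python
def choose_group(groups, user_agent):
    """groups maps a User-agent token to a list of (kind, path) rules."""
    return groups.get(user_agent.lower(), groups.get("*", []))

def decide(rules, path):
    """Longest match wins; Allow wins a tie; no match means allowed."""
    best = ("allow", "")  # (kind, matched rule path)
    for kind, rule_path in rules:
        if path.startswith(rule_path) and (
            len(rule_path) > len(best[1])
            or (len(rule_path) == len(best[1]) and kind == "allow")
        ):
            best = (kind, rule_path)
    return best

# Hypothetical group: a broad block with a narrow exception under it.
groups = {
    "*": [("disallow", "/private/"), ("allow", "/private/help/")],
}

for path in ("/private/help/faq.html", "/private/notes.txt", "/blog/"):
    kind, rule = decide(choose_group(groups, "ExampleBot"), path)
    print(f"{path}: {kind} (matched {rule!r}, length {len(rule)})")
```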
Wildcards and anchors - how patterns behave
Two special tokens matter in common practice. * matches any sequence of characters, and $ anchors a rule to the end of the path. For example, Disallow: /*.pdf$ blocks exactly those paths that end in .pdf, while Disallow: /private/ blocks anything under that folder. Keep rules simple so you can reason about them later; overusing wildcards often creates unintended blocks that are hard to spot on a large site.
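One way to reason about these tokens is to translate a rule into a regular expression: * becomes "match anything", a trailing $ becomes an end anchor, and everything else is literal. The sketch below does just that and skips details such as percent-encoding normalization; the rule strings are examples.

```python
import re

def rule_to_regex(rule_path):
    """Translate a robots.txt path rule into a regex: '*' matches any
    run of characters, a trailing '$' anchors the end of the path."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + ("$" if anchored else ""))

pdf_rule = rule_to_regex("/*.pdf$")       # Disallow: /*.pdf$
folder_rule = rule_to_regex("/private/")  # Disallow: /private/

for path in ("/docs/manual.pdf", "/docs/manual.pdf?download=1", "/private/a.html"):
    print(path,
          "| pdf rule:", bool(pdf_rule.match(path)),
          "| folder rule:", bool(folder_rule.match(path)))
```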
Sitemaps, crawl delay, and notes
robots.txt can include Sitemap lines to point crawlers to your sitemaps. Crawl-delay still appears in some older files, but support is inconsistent and Google does not honor it; use server rate controls and caching if load is a concern. For the definitive overview, Google's documentation on robots.txt behavior and indexing (Google Search - robots.txt) is the most reliable reference when you implement or troubleshoot. The formal specification published by the IETF, RFC 9309, is also worth bookmarking for the exact language on matching.
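If you want to read the Sitemap lines out of a file programmatically, recent versions of the standard library expose them as well. The sketch below assumes a placeholder file and uses site_maps(), which is available in Python 3.8 and later and returns None when no Sitemap lines are present.

```python
from urllib import robotparser

# Hypothetical file with two Sitemap lines; the URLs are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() (Python 3.8+) returns the declared Sitemap URLs, or None.
print(parser.site_maps())
```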
Comparison - CMS plugins vs manual rules
| Aspect | CMS plugin | Manual file |
| --- | --- | --- |
| Setup speed | Fast | Fast for simple sites |
| Granularity | Limited by UI | Exact control |
| Error risk | Lower for basics | Higher without tests |
| Versioning | App managed | Git friendly |
Bullet notes - safe patterns you can trust
- Block private and staging paths by prefix rather than file type where possible.
- Allow assets like CSS and JS needed for rendering so crawlers can fetch them.
- Keep the file small and readable - comments help future maintainers.
- Test the most sensitive paths with your intended User-Agent before and after release - a small checklist sketch follows this list.
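One way to make that last point routine is a small checklist script you run before and after each release. Everything below - the file content, the paths, the ExampleBot name, and the expected outcomes - is a placeholder. Note that urllib.robotparser evaluates rules in file order, which is why the narrow Allow is listed before the broader Disallow so its verdicts line up with the longest-match model described earlier.

```python
from urllib import robotparser

# Hypothetical production robots.txt following the patterns above: prefix
# blocks for private and staging areas, with rendering assets allowed.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/assets/
Disallow: /private/
Disallow: /staging/
"""

# Each entry: (User-Agent, path, expected decision).
CHECKS = [
    ("ExampleBot", "/private/reports/2024.html", False),
    ("ExampleBot", "/private/assets/site.css", True),
    ("ExampleBot", "/blog/launch-post", True),
]

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, path, expected in CHECKS:
    actual = parser.can_fetch(agent, path)
    status = "OK" if actual == expected else "UNEXPECTED"
    print(f"{status}: {agent} {path} -> {'allowed' if actual else 'blocked'}")
```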
Common pitfalls - avoid silent indexing issues
Blocking crawling of content does not remove it from the index if the URLs are already known. If removal is the goal, serve a 404 or 410, or use noindex on the page while it remains accessible. Blocking assets required for rendering can cause crawlers to misjudge layout or mobile friendliness, so make exceptions for important asset paths. When you mirror production on staging, make sure the staging robots.txt is strict and cannot leak into public search.
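When removal rather than crawl control is the goal, it helps to spot-check what a URL actually serves once crawlers can reach it. Here is a rough standard-library sketch that reports the status code, any X-Robots-Tag header, and whether a robots meta tag containing noindex appears in the body (a crude substring check, not a real HTML parse); the URL is a placeholder.

```python
import urllib.request
import urllib.error

def removal_signals(url):
    """Report status code, X-Robots-Tag header, and a rough noindex check."""
    try:
        with urllib.request.urlopen(url) as resp:
            status = resp.status
            x_robots = resp.headers.get("X-Robots-Tag")
            body = resp.read(200_000).decode("utf-8", errors="replace").lower()
            meta_noindex = 'name="robots"' in body and "noindex" in body
    except urllib.error.HTTPError as err:
        # 404 and 410 land here; they are the removal signals we want to see.
        status = err.code
        x_robots = err.headers.get("X-Robots-Tag")
        meta_noindex = False
    return status, x_robots, meta_noindex

# Hypothetical URL to spot-check after deployment.
print(removal_signals("https://www.example.com/old-report.html"))
```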
Two questions before you ship
First, do your rules express intent in the simplest way possible - clear prefixes and short exceptions rather than overlapping wildcards? Second, if you blocked crawling for sections of the site, are you sure none of those URLs need rendering assets that live on blocked paths? A five-minute test with this tool can prevent days of diagnosis later.
robots.txt is not a cure-all. It is a polite signpost for crawlers and a small guard against wasted bandwidth. Keep your file clean, lean, and tested. Paired with sitemaps, proper canonical tags, and good server responses, it helps search engines understand how to spend time on your site where it matters most.