What is Robots.txt Generator?
Generate a robots.txt file and block AI training bots (GPTBot, ClaudeBot, PerplexityBot) with one toggle. Control crawl rules for 15 AI bots. Free.
Robots.txt Generator runs entirely in your browser using JavaScript. Your data never leaves your device.
Free Robots.txt Generator
Create a complete robots.txt file in seconds. Flip the "Block AI training bots" master toggle to add Disallow rules for GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, CCBot, and 8 more AI crawlers with one click, or use the per-bot toggles for granular control. Then set standard robots.txt rules: per-User-agent Disallow/Allow paths, Crawl-delay, and a Sitemap reference. A "What does each AI bot do?" panel explains every bot's purpose. The output is a copyable plain-text robots.txt block. 100% browser-based.
🤖 Block AI Training Bots
Add Disallow: / rules for all known AI training crawlers
Standard Crawl Rules
robots.txt Output
User-agent: *
Disallow: /private/
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a plain text file placed at the root of your domain (e.g., yoursite.com/robots.txt) that instructs web crawlers which pages they are allowed or not allowed to access. It uses a simple format: User-agent specifies which crawler the rule applies to, and Disallow specifies which paths that crawler should not access. It is not a security mechanism — it is a voluntary guideline that well-behaved crawlers are expected to follow.
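A minimal robots.txt file illustrating that format (the blocked path and sitemap URL here are placeholders):

```text
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
```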
Will blocking GPTBot stop ChatGPT from citing my content?
Blocking GPTBot prevents OpenAI from using your content in future training data crawls. However, if your content was already crawled and incorporated into GPT training before you added the block, it will not be retroactively removed. Additionally, blocking GPTBot does not prevent ChatGPT from accessing your content via its browsing tool (which respects robots.txt separately, using the ChatGPT-User agent). To block both training and browsing, block both GPTBot and ChatGPT-User.
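To opt out of both training crawls and browsing, the robots.txt blocks described above look like this:

```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
```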
What is the difference between AI training bots and AI answer-engine bots?
Training bots (GPTBot, CCBot, Bytespider) crawl the web to collect data for training large language models. Answer-engine bots (like Perplexity's PerplexityBot) crawl content to generate answers in real time in their search interfaces. Some bots do both. Google-Extended is specifically for Google's generative AI products (Bard/Gemini) and is separate from Googlebot, which handles regular search indexing — so blocking Google-Extended does not affect your regular Google search rankings.
Do AI bots respect robots.txt?
Major AI companies have publicly committed to respecting robots.txt. OpenAI (GPTBot), Anthropic (ClaudeBot), Perplexity (PerplexityBot), and Google (Google-Extended) all document that their bots honor Disallow rules. However, compliance is voluntary and cannot be technically enforced. Less scrupulous scrapers may ignore robots.txt entirely. For stronger protection, consider HTTP-level access controls or terms-of-service enforcement.
What is Google-Extended?
Google-Extended is a separate user agent token that controls whether Google can use your content to train and improve Bard and Gemini AI models. It is completely independent from Googlebot — blocking Google-Extended does NOT affect how Googlebot crawls and indexes your site for regular Google Search. You can block Google-Extended to opt out of AI training while still being fully indexed in Google Search.
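The opt-out is a single block in robots.txt:

```text
User-agent: Google-Extended
Disallow: /
```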
Can blocking AI bots hurt my SEO?
Blocking AI training bots (GPTBot, CCBot, ClaudeBot, etc.) does NOT affect your regular search engine rankings. These bots are separate from Googlebot and Bingbot, which handle search indexing. However, if you block PerplexityBot and Perplexity cites results in its answer engine, you may get less referral traffic from AI search interfaces. Blocking Google-Extended specifically does not affect your ranking in Google Search.
Where do I place my robots.txt file?
Your robots.txt file must be placed at the root of your domain — exactly at yourdomain.com/robots.txt. It cannot be in a subdirectory. For Next.js, place it in the /public folder as public/robots.txt, or generate it dynamically via app/robots.ts in the App Router. For WordPress, it is automatically accessible at yoursite.com/robots.txt — you can edit it via Yoast SEO or via the WordPress SEO settings in some hosting control panels.
How do I test if my robots.txt is working?
Use Google Search Console's robots.txt Tester tool (under Settings → robots.txt) to validate your file and test specific URLs against your rules. You can also fetch the file directly in a browser at yourdomain.com/robots.txt. To verify that a specific bot is being blocked, check your server access logs for that bot's User-agent string. For the robots.txt syntax, paste it into this tool and confirm the output matches your intended rules.
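You can also sanity-check rules locally with Python's standard-library urllib.robotparser, which implements the same matching logic most compliant crawlers use. The rules and URLs below are illustrative, not taken from any real site:

```python
from urllib import robotparser

# Illustrative robots.txt content; in practice, fetch it
# from yourdomain.com/robots.txt instead.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked everywhere; other agents only from /private/.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
print(rp.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

This is a quick way to confirm a rule does what you intend before deploying the file.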
You might also like
Browse all 9 Marketing & SEO tools →
SERP Snippet Preview
Preview exactly how your page looks in Google search results
Meta Description Generator
Generate SEO-optimized meta descriptions that boost click-through rate
UTM Builder
Build UTM tracking URLs with channel presets, bulk generation, and campaign history
Robots.txt and AI Bots: What Every Website Owner Needs to Know
The robots.txt file has been a cornerstone of web crawling etiquette since 1994. For most of its history, it was a straightforward document: tell Googlebot and Bingbot which pages to crawl, and provide a sitemap reference. In 2023 and 2024, a new category of crawler emerged — AI training and answer-engine bots — fundamentally changing why and how website owners manage robots.txt files.
The difference between AI training bots and search bots
Traditional search bots (Googlebot, Bingbot) crawl your content to index it for search results. Blocking them harms your search visibility. AI training bots (GPTBot, CCBot, Bytespider) crawl your content to collect training data for large language models — not to send you search traffic. Blocking these bots does not affect your search rankings. A third category — AI answer-engine bots (PerplexityBot, Google-Extended) — crawl content to generate real-time answers in AI-powered search interfaces, which may or may not send you referral traffic depending on the platform.
Google-Extended: opt out of Gemini training without affecting Search
Google-Extended is a distinct user agent token that controls whether Google can use your content to improve Bard and Gemini AI models. It is completely independent of Googlebot. Adding User-agent: Google-Extended / Disallow: / to your robots.txt opts you out of Gemini training while leaving your regular Google Search indexing entirely unaffected. This is one of the clearest opt-out mechanisms available from a major AI company and is straightforward to implement.
Does blocking AI bots actually work?
Major AI companies — OpenAI, Anthropic, Google, Perplexity, and Apple — have publicly stated that their crawlers respect robots.txt Disallow rules. Independent researchers have generally confirmed compliance from the major players. However, robots.txt is not technically enforceable. Less reputable scrapers may ignore it. For truly sensitive content (proprietary data, private documents), robots.txt alone is insufficient — use server-side authentication or access controls as well. For most website owners, robots.txt AI blocking is the most practical available opt-out mechanism.
robots.txt syntax reference
A valid robots.txt file uses a simple key-value syntax. Each block starts with one or more User-agent: lines naming the crawler (or * for all crawlers). Disallow: lines specify path prefixes to block — Disallow: / blocks everything, Disallow: /private/ blocks only that directory. Allow: lines can override Disallow rules for specific sub-paths. Crawl-delay: asks the bot to wait N seconds between requests. The Sitemap: directive should point to your XML sitemap at its absolute URL. Blank lines separate User-agent blocks.
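Putting those directives together, an annotated example file might look like this (all paths and the sitemap URL are placeholders):

```text
# Default rules for all crawlers
User-agent: *
Disallow: /private/
Allow: /private/press/
Crawl-delay: 10

# Block an AI training bot site-wide
User-agent: GPTBot
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
```

Note that Crawl-delay is honored by some crawlers but ignored by others (Googlebot, for example, does not support it).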