robots.txt for AI Crawlers: GPTBot, ClaudeBot, PerplexityBot
Updated: May 13, 2026
AI search engines use different crawlers than their training bots. OAI-SearchBot (ChatGPT Search) is separate from GPTBot (training). Blocking GPTBot to prevent training does not automatically allow OAI-SearchBot for search. Allow GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended in robots.txt to be indexed by AI search.
robots.txt for AI Crawlers
AI search engines use crawlers that are distinct from their training bots — and many sites block them accidentally.
The Crawlers You Need to Know
| Bot | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Content training (not search) |
| OAI-SearchBot | OpenAI | ChatGPT Search real-time indexing |
| PerplexityBot | Perplexity | Search indexing |
| ClaudeBot | Anthropic | Content indexing |
| Google-Extended | Google | AI Overviews training and search |
| Applebot-Extended | Apple | Apple Intelligence |
Critical distinction: GPTBot is for OpenAI training data. OAI-SearchBot is for ChatGPT Search real-time indexing. The two are controlled by separate robots.txt groups. If you block GPTBot to prevent training data collection, and your file also contains a restrictive catch-all group (User-agent: *) without an explicit group for OAI-SearchBot, your site will be invisible in ChatGPT Search answers.
The Common Mistake
Many sites use this pattern to block AI training:
User-agent: GPTBot
Disallow: /
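This blocks OpenAI training. The GPTBot group itself does not apply to OAI-SearchBot, but a crawler with no group of its own falls back to whatever blanket User-agent: * rules the file contains. If those blanket rules are restrictive and OAI-SearchBot is not specified separately, the result is the same: invisible in ChatGPT Search.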
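How a parser resolves these groups can be checked locally. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in for a real crawler (actual crawlers may differ in matching details). The `allowed` helper, the three sample files, and the `example.com` URL are illustrative assumptions, not part of any crawler's API.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical sample files for illustration.
GPTBOT_ONLY = "User-agent: GPTBot\nDisallow: /\n"
WITH_CATCH_ALL = GPTBOT_ONLY + "\nUser-agent: *\nDisallow: /\n"
WITH_EXPLICIT_ALLOW = WITH_CATCH_ALL + "\nUser-agent: OAI-SearchBot\nAllow: /\n"

for label, text in [
    ("GPTBot group only", GPTBOT_ONLY),
    ("plus catch-all block", WITH_CATCH_ALL),
    ("plus explicit OAI-SearchBot allow", WITH_EXPLICIT_ALLOW),
]:
    print(f"{label}: GPTBot={allowed(text, 'GPTBot')}, "
          f"OAI-SearchBot={allowed(text, 'OAI-SearchBot')}")
```

Under this parser, OAI-SearchBot is only blocked once a catch-all disallow is present, and an explicit group for it restores access.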
Recommended Configuration
If you want to be indexed by AI search but not used for training:
# Block training bots
User-agent: GPTBot
Disallow: /
# Allow search indexing bots explicitly
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
If you want maximum AI search visibility (allow both training and search):
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
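Both configurations above follow the same pattern of one group per crawler, so they can be generated from a small policy table. This is a minimal sketch: `build_robots_txt`, `SEARCH_ONLY`, and `MAX_VISIBILITY` are hypothetical names, and the output covers only whole-site allow or block rules.

```python
def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one robots.txt group per crawler.

    `policy` maps a user-agent token to True (allow the whole site)
    or False (block the whole site).
    """
    groups = []
    for agent, allow in policy.items():
        rule = "Allow: /" if allow else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # A blank line between groups keeps the file easy to read.
    return "\n\n".join(groups) + "\n"


# Search indexing allowed, training blocked.
SEARCH_ONLY = {
    "GPTBot": False,
    "OAI-SearchBot": True,
    "PerplexityBot": True,
    "ClaudeBot": True,
    "Google-Extended": True,
    "Applebot-Extended": True,
}

# Training and search both allowed.
MAX_VISIBILITY = dict.fromkeys(
    ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"],
    True,
)

print(build_robots_txt(SEARCH_ONLY))
```

Keeping the policy in one table makes it harder to forget a crawler when switching between the two configurations.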
Blocking Specific Sections
You can allow AI search indexing while protecting private content:
User-agent: OAI-SearchBot
Allow: /blog/
Allow: /docs/
Disallow: /admin/
Disallow: /user/
Disallow: /checkout/
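To see which paths these section rules actually cover, the same rules can be run through Python's standard-library `urllib.robotparser`. This is a sketch under assumptions: the `path_report` helper, the sample paths, and `example.com` are hypothetical, and this parser applies the first matching rule in file order, which may differ from crawlers that use longest-match precedence.

```python
from urllib.robotparser import RobotFileParser

SECTION_RULES = """\
User-agent: OAI-SearchBot
Allow: /blog/
Allow: /docs/
Disallow: /admin/
Disallow: /user/
Disallow: /checkout/
"""


def path_report(robots_txt: str, agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether `agent` may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        path: parser.can_fetch(agent, f"https://example.com{path}")
        for path in paths
    }


report = path_report(
    SECTION_RULES,
    "OAI-SearchBot",
    ["/blog/post", "/docs/api", "/admin/settings", "/checkout/cart", "/pricing"],
)
for path, ok in report.items():
    print(f"{path}: {'allowed' if ok else 'blocked'}")
```

Note that a path matching no rule, such as the hypothetical `/pricing`, is treated as allowed, so anything private must be listed in a Disallow line.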
Meta Tag Alternative
For page-level control, use the noai meta tag:
<meta name="robots" content="noai, noimageai">
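This is useful for pages you want to rank in Google but not be used in AI-generated answers. Because noai and noimageai are not part of an official standard, crawlers may ignore them, so treat the tag as a request rather than a guarantee.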
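To confirm that a page really ships the tag, its HTML can be scanned for robots meta directives. The sketch below uses Python's standard-library `html.parser`. `RobotsMetaParser` and `robots_directives` are hypothetical names, and the check only reads the markup; it says nothing about whether a given crawler honors the directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
found = robots_directives(page)
print("noai" in found, "noimageai" in found)
```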
Verification
After updating robots.txt:
- Check your live robots.txt at your-domain.com/robots.txt
- Use Google Search Console → robots.txt Tester for Google-Extended
- Wait 24-48 hours for crawlers to re-check
- Test visibility in Perplexity by searching for unique phrases from your content
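The first check in the list above can be scripted. This is a rough sketch using Python's standard-library `urllib.robotparser`: `load_live`, `audit`, and `AI_AGENTS` are hypothetical names, `load_live` needs network access, and the result reflects this parser's interpretation of the file rather than any crawler's actual behavior.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens from the table earlier in this article.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
    "ClaudeBot",
    "Google-Extended",
    "Applebot-Extended",
]


def load_live(domain: str) -> RobotFileParser:
    """Fetch and parse https://<domain>/robots.txt (requires network access)."""
    parser = RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()
    return parser


def audit(parser: RobotFileParser, url: str, agents=AI_AGENTS) -> dict[str, bool]:
    """Report, per crawler token, whether `url` may be fetched."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Example usage against a live site (replace the placeholder domain):
# for agent, ok in audit(load_live("your-domain.com"), "https://your-domain.com/").items():
#     print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Running the audit before and after a robots.txt change gives a quick diff of which crawlers gained or lost access.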