robots.txt for AI Crawlers: GPTBot, ClaudeBot, PerplexityBot

Updated: May 13, 2026

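AI search engines use different crawlers than their training bots. OAI-SearchBot (ChatGPT Search) is separate from GPTBot (training), and robots.txt matches each one against its own group. Blocking GPTBot therefore leaves OAI-SearchBot untouched, but a blanket User-agent: * disallow shuts out every crawler that has no group of its own. To be indexed by AI search, make sure OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended are allowed in robots.txt, and allow GPTBot as well if you accept training use.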

AI search engines use crawlers that are distinct from their training bots — and many sites block them accidentally.

The Crawlers You Need to Know

Bot                Company     Purpose
GPTBot             OpenAI      Content training (not search)
OAI-SearchBot      OpenAI      ChatGPT Search real-time indexing
PerplexityBot      Perplexity  Search indexing
ClaudeBot          Anthropic   Content indexing
Google-Extended    Google      Control token for Gemini AI training and grounding (per Google's documentation, it does not affect Google Search)
Applebot-Extended  Apple       Apple Intelligence

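Critical distinction: GPTBot collects OpenAI training data. OAI-SearchBot handles ChatGPT Search real-time indexing. robots.txt treats them as separate user agents, so blocking GPTBot alone does not block OAI-SearchBot. The risk is a blanket rule: if your file also contains User-agent: * with Disallow: / and you don't explicitly allow OAI-SearchBot, your site will be invisible in ChatGPT Search answers.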

The Common Mistake

Many sites use this pattern to block AI training:

User-agent: GPTBot
Disallow: /

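On its own, this blocks OpenAI training and nothing else. The mistake is pairing it with a blanket User-agent: * block, or disallowing every AI-looking bot wholesale. OAI-SearchBot then has no group of its own, falls back to the blanket rules, and the result is the same: invisible in ChatGPT Search.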
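How a given file is read can be checked rather than assumed. The sketch below uses Python's standard-library urllib.robotparser to compare a GPTBot-only block with the same block plus a wildcard group. The helper name allowed and the two sample files are illustrative, and this parser is only one interpretation of the rules; each crawler runs its own matcher.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Only GPTBot is named; no other group exists in the file.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# The same GPTBot group plus a blanket block for every other agent.
GPTBOT_PLUS_WILDCARD = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /
"""

for label, text in [("GPTBot only", GPTBOT_ONLY), ("plus wildcard", GPTBOT_PLUS_WILDCARD)]:
    for agent in ("GPTBot", "OAI-SearchBot"):
        print(f"{label:14} {agent:14} allowed={allowed(text, agent)}")
```

With this parser, OAI-SearchBot is allowed under the first file and blocked under the second, because an agent without its own group falls back to the wildcard group.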

If you want to be indexed by AI search but not used for training:

# Block training bots
User-agent: GPTBot
Disallow: /

# Allow search indexing bots explicitly
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /
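If the bot list changes often, the file above can be generated instead of edited by hand. This is a minimal sketch, assuming a hypothetical helper build_robots_txt that takes one list of agents to block and one to allow; it then parses its own output with urllib.robotparser as a sanity check.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """Emit one group per agent: 'Disallow: /' for blocked, 'Allow: /' for allowed."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    # Groups are separated by a blank line, matching the layout shown above.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(
    blocked=["GPTBot"],
    allowed=[
        "OAI-SearchBot",
        "PerplexityBot",
        "ClaudeBot",
        "Google-Extended",
        "Applebot-Extended",
    ],
)
print(robots_txt)

# Round-trip check: parse the generated text and confirm the intent survived.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in ("GPTBot", "OAI-SearchBot", "PerplexityBot"):
    print(agent, parser.can_fetch(agent, "/"))
```

Keeping the policy in two lists makes it easy to move a bot from one side to the other without touching the file format.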

If you want maximum AI search visibility (allow both training and search):

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

Blocking Specific Sections

You can allow AI search indexing while keeping crawlers out of private sections. robots.txt is advisory, so anything truly sensitive still needs authentication:

User-agent: OAI-SearchBot
Allow: /blog/
Allow: /docs/
Disallow: /admin/
Disallow: /user/
Disallow: /checkout/
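The section rules above can be spot-checked path by path. The sketch below feeds them to Python's urllib.robotparser; the sample paths and the helper name may_fetch are made up for illustration, and real crawlers may resolve edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

SECTION_RULES = """\
User-agent: OAI-SearchBot
Allow: /blog/
Allow: /docs/
Disallow: /admin/
Disallow: /user/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(SECTION_RULES.splitlines())


def may_fetch(path: str) -> bool:
    """Return True if OAI-SearchBot may fetch `path` under SECTION_RULES."""
    return parser.can_fetch("OAI-SearchBot", path)


# Hypothetical paths, chosen to cover allowed, disallowed, and unlisted cases.
for path in ["/blog/launch-post", "/docs/api", "/admin/settings", "/checkout/", "/pricing", "/admin"]:
    print(f"{path:20} {'allowed' if may_fetch(path) else 'blocked'}")
```

Two details are worth noticing. Paths that match no rule, such as /pricing, are allowed by default. Rules are prefix matches, so Disallow: /admin/ does not cover /admin without the trailing slash.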

Meta Tag Alternative

For page-level control, some sites add the noai meta tag:

<meta name="robots" content="noai, noimageai">

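This is intended for pages you want to rank in Google but keep out of AI-generated answers. The noai and noimageai values are non-standard directives, so check each crawler's documentation before relying on them.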
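To confirm which directives a page actually serves, the robots meta tag can be read back from the HTML. This is a minimal sketch using Python's standard-library html.parser; the class and function names are illustrative, and it only reports what the markup contains, not whether any crawler honors it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noai, noimageai"></head><body></body></html>'
print(sorted(robots_directives(page)))
```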

Verification

After updating robots.txt:

  1. Check your live robots.txt at your-domain.com/robots.txt
  2. Check the robots.txt report in Google Search Console to confirm Google has fetched and parsed the updated file
  3. Wait 24-48 hours for crawlers to re-fetch the file; exact timing varies by crawler
  4. Test visibility in Perplexity by searching for unique phrases from your content
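The first step can be scripted. The sketch below downloads a site's live robots.txt and reports, per AI user agent, whether the home page is fetchable according to Python's urllib.robotparser. The names audit_site and fetch_text are hypothetical, the fetch function is injectable so the logic can be tried without network access, and the result reflects this parser's reading of the file rather than any crawler's actual behavior.

```python
from typing import Callable
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
    "ClaudeBot",
    "Google-Extended",
    "Applebot-Extended",
]


def fetch_text(url: str) -> str:
    """Download `url` and return its body as text (raises on HTTP errors such as 404)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def audit_site(base_url: str, fetch: Callable[[str], str] = fetch_text) -> dict[str, bool]:
    """Map each AI user agent to whether it may fetch the site's home page."""
    root = base_url.rstrip("/")
    parser = RobotFileParser()
    parser.parse(fetch(root + "/robots.txt").splitlines())
    return {agent: parser.can_fetch(agent, root + "/") for agent in AI_AGENTS}


# Offline demonstration with a stand-in fetch function instead of a real request.
sample = "User-agent: GPTBot\nDisallow: /\n"
for agent, ok in audit_site("https://your-domain.com", fetch=lambda url: sample).items():
    print(f"{agent:18} {'allowed' if ok else 'blocked'}")
```

For a real check, call audit_site("https://your-domain.com") with your own domain and no fetch argument.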