Bots - Allow AI Crawlers to Index Your Site

Learn how to ensure AI search bots from ChatGPT, Claude, Perplexity and others can crawl your site. Check robots.txt and firewall settings to avoid accidentally blocking AI indexing.

Michael Buckbee

Problem

Some people have a knee-jerk reaction to AI bots crawling their site, believing that the bots or the companies behind them are “stealing” their content and should therefore be blocked.

But there are many different types of “content” in the world.

Stephen King’s content is NYT bestsellers like “IT”, and my content is literally this article, which, much to my dismay, is unlikely to be made into a movie with one of the Skarsgård boys kitted out in scary clown makeup.

There’s a fine line between distribution and theft - but services that amplify the marketing messages you’re putting out in the world and provide you with greater reach and visibility aren’t ripping you off.

Action

Since I do want AI search services to index my site, rewrite my copy, and present it as an answer to the questions prospects are asking, I need to make sure I’m not accidentally blocking any AI bots from crawling my site.

There are two ways you might be doing this:

Passively

Passively (aka “asking nicely”) with your site’s robots.txt file.

You can check any website for robots.txt issues by appending /robots.txt to its domain and looking at the result: <your-domain.com>/robots.txt
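If reading the file by eye gets tedious, the same check can be sketched in Python with the standard library's robots.txt parser. This is a minimal sketch, not a definitive tool: the bot names are a hand-picked subset of the ones in this article, and `blocked_bots` and `SAMPLE` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a small subset of the AI bot names covered in this article.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_bots(robots_txt, bots=AI_BOTS, path="/"):
    """Return the bots that the given robots.txt text disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


# Illustrative robots.txt content (not taken from any real site).
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))  # prints ['GPTBot']
```

To run it against a live site, paste the contents of <your-domain.com>/robots.txt into a string and pass that to `blocked_bots`.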

Actively

Actively with a website firewall or some type of security plugin.

Checking for active blocks is more difficult to do manually, so we developed an AI Search Console (accessible via the Knowatoa dashboard) that checks whether any of the 24 different AI bots we monitor are unable to reach your site.

If you’d rather check them yourself, you could use a browser extension like User Agent Switcher with the following user agents:


AI2Bot
Mozilla/5.0 (compatible; AI2Bot/1.0; +http://www.allenai.org/crawler)

Amazonbot
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

Anthropic AI Bot
Mozilla/5.0 (compatible; anthropic-ai/1.0; +http://www.anthropic.com/bot.html)

Claude Web
Mozilla/5.0 (compatible; claude-web/1.0; +http://www.anthropic.com/bot.html)

ClaudeBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Applebot-Extended
Mozilla/5.0 (compatible; Applebot-Extended/1.0; +http://www.apple.com/bot.html)

Applebot
Mozilla/5.0 (compatible; Applebot/1.0; +http://www.apple.com/bot.html)

BingBot
Mozilla/5.0 (compatible; BingBot/1.0; +http://www.bing.com/bot.html)

Bytespider
Mozilla/5.0 (compatible; Bytespider/1.0; +http://www.bytedance.com/bot.html)

GPTBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

ChatGPT-User
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

OAI-SearchBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

CCBot
Mozilla/5.0 (compatible; CCBot/1.0; +http://www.commoncrawl.org/bot.html)

DuckAssistBot
Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)

Google-Extended
Mozilla/5.0 (compatible; Google-Extended/1.0; +http://www.google.com/bot.html)

LinkedInBot
LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)

Meta External Fetcher
Mozilla/5.0 (compatible; meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler))

FacebookBot
Mozilla/5.0 (compatible; FacebookBot/1.0; +http://www.facebook.com/bot.html)

Omgili Bot
Mozilla/5.0 (compatible; omgili/1.0; +http://www.omgili.com/bot.html)

PerplexityBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

YouBot
Mozilla/5.0 (compatible; YouBot (+http://www.you.com))

Cohere AI
Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)

Timpi
Timpibot/0.8 (+http://www.timpi.io)

DiffBot
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)
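If clicking through a browser extension for every bot feels slow, the manual check can also be scripted. Below is a rough sketch using only Python's standard library. The helper names (`build_request`, `fetch_status`, `looks_blocked`, `report`) are hypothetical, only two of the user agents above are included, and treating any 4xx/5xx response as a possible block is a simplifying assumption rather than a rule.

```python
import urllib.error
import urllib.request

# Two of the user agents listed above; extend with the rest as needed.
USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
              "compatible; GPTBot/1.1; +https://openai.com/gptbot",
    "ClaudeBot": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
                 "compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
}


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the site sends back for this user agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is still useful


def looks_blocked(status):
    """Heuristic (an assumption): any 4xx or 5xx answer may indicate a block."""
    return status >= 400


def report(url):
    """Print one line per bot. This performs real network requests."""
    for name, user_agent in USER_AGENTS.items():
        status = fetch_status(url, user_agent)
        note = " (possibly blocked)" if looks_blocked(status) else ""
        print(f"{name}: HTTP {status}{note}")
```

Calling `report("https://your-domain.com/")` prints one status line per bot. Some firewalls also filter by IP address rather than user agent alone, so a clean result from your own machine is a good sign but not proof that the real bot gets through.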

Key Takeaways

  • AI bots aren’t stealing your content—they’re amplifying your marketing reach
  • Check both passive blocks (robots.txt) and active blocks (firewalls/security plugins)
  • Use Knowatoa’s AI Search Console to automatically monitor bot access
  • Allow all legitimate AI bots to crawl your site for maximum visibility

Test Your Site with AI Search Console

Want to know which AI bots can access your site right now? Use Knowatoa’s AI Search Console to automatically test all 24 AI bots and identify any blocks.
