Merged
24 changes: 24 additions & 0 deletions Public/robots.txt
@@ -1 +1,25 @@
Sitemap: https://swiftpackageindex.com/sitemap.xml

User-agent: Bytespider
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Cohere-ai
Disallow: /

User-agent: Seekr
Disallow: /
Member

FWIW, I've found a longer list at https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/

User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-Agent: FacebookBot
Disallow: /
User-Agent: Applebot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /
User-agent: YouBot
Disallow: /

Not sure how best to manage this going forward. I imagine there are crowdsourced templates of these user agents being maintained somewhere.
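
If we do sync from one of those templates, one low-maintenance option would be to keep the agent list in one place and regenerate robots.txt from it. A minimal sketch (hypothetical: the agent names are taken from the diff above and the linked list, and no such script exists in the repo):

import Foundation

// Single source of truth for the crawlers we want to block.
// Names taken from the diff above and the cyberciti list; adjust as needed.
let blockedAgents = [
    "Bytespider", "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
    "anthropic-ai", "Applebot", "Google-Extended", "Cohere-ai",
    "PerplexityBot", "Amazonbot", "FacebookBot", "Diffbot",
    "ImagesiftBot", "Omgilibot", "Omgili", "YouBot", "Seekr",
]

// Emit the sitemap line followed by a User-agent/Disallow pair per crawler.
var robots = "Sitemap: https://swiftpackageindex.com/sitemap.xml\n"
for agent in blockedAgents {
    robots += "\nUser-agent: \(agent)\nDisallow: /\n"
}

// Overwrite Public/robots.txt with the regenerated file.
try robots.write(toFile: "Public/robots.txt", atomically: true, encoding: .utf8)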

Also, I've created a draft rule to block them at the Cloudflare level, which I think we should enable:

(Screenshot: draft Cloudflare rule, 2024-11-19)
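
For reference, a Cloudflare custom rule for this kind of blocking typically matches on the User-Agent header with a Block action. A sketch of what the expression might look like (the agents listed here are examples; the exact expression in the draft rule may differ):

(http.user_agent contains "GPTBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "Bytespider")
or (http.user_agent contains "PerplexityBot")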

Member

I'll go ahead and enable the CF block to see if this stops the crawl traffic.

Member
@finestructure Nov 20, 2024

Actually, I'll give it until this afternoon. We updated robots.txt at noon yesterday, and Google updates its view of it every 24h. This might be true for the other crawlers as well, so let's see whether the initial robots.txt has any effect on its own after 24h.
