Add feature to block AI data scraping bots #990

Merged
dkotter merged 12 commits into develop from feature/736
Sep 10, 2025

dkotter (Collaborator) commented Aug 27, 2025

Description of the Change

This PR adds a new General Settings panel under the ClassifAI Registration section that currently has one setting: Block AI Bots.

Block AI Bots setting

If this is turned on, we modify the robots.txt file to block the following bots:

  • Applebot-Extended
  • CCBot
  • ClaudeBot
  • FacebookBot
  • Google-Extended
  • GPTBot
  • Meta-ExternalAgent

From my research, these appear to be the most widely used AI data scraping bots, though we can modify this list if needed. I am purposely not blocking any AI search or assistant bots, so even with this setting turned on, your site can still appear in AI search results or AI chat results (like ChatGPT). It may be worth adding a separate setting to block those, though I imagine most sites would not want them blocked.
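
For anyone who wants to see the shape of the resulting rules, here is a rough sketch in Python (not the plugin's actual PHP implementation) that appends one Disallow group per bot to an existing robots.txt body and then checks the result with the standard library's urllib.robotparser. The helper names append_block_rules and is_allowed, and the sample base file, are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Bot names mirror the list in this PR description.
AI_SCRAPER_BOTS = [
    "Applebot-Extended",
    "CCBot",
    "ClaudeBot",
    "FacebookBot",
    "Google-Extended",
    "GPTBot",
    "Meta-ExternalAgent",
]


def append_block_rules(robots_txt: str, bots: list[str]) -> str:
    """Return robots_txt with a 'Disallow: /' group appended for each bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return robots_txt.rstrip("\n") + "\n\n" + "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Check whether user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical starting file, used only for this illustration.
base = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Allow: /wp-admin/admin-ajax.php\n"
)
blocked = append_block_rules(base, AI_SCRAPER_BOTS)
```

Under these assumptions, is_allowed(blocked, "GPTBot") comes back False while a regular search crawler such as Googlebot is still allowed, which matches the intent of blocking only data scraping bots.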

Closes #736

How to test the Change

  1. Run npm run build
  2. Go to your robots.txt file and ensure you see no bots being blocked
  3. Go to the ClassifAI settings page
  4. Click on ClassifAI Registration (may be worth updating this to something else now, General Settings maybe?)
  5. Find the new Block AI Bots setting and toggle it on then save settings
  6. Go to your robots.txt file and ensure you see bots being blocked
  7. Turn the setting off, go back to the robots.txt file and ensure bots are no longer blocked
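
The robots.txt checks in the steps above could also be scripted. The following is a minimal sketch in Python, assuming the site serves its robots.txt at the usual /robots.txt path; blocked_bots and fetch_robots_txt are hypothetical helper names and are not part of the plugin.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# Bots this PR is expected to block when the setting is on.
EXPECTED_BOTS = (
    "Applebot-Extended",
    "CCBot",
    "ClaudeBot",
    "FacebookBot",
    "Google-Extended",
    "GPTBot",
    "Meta-ExternalAgent",
)


def blocked_bots(robots_txt: str, bots=EXPECTED_BOTS) -> set[str]:
    """Return the subset of bots that may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot for bot in bots if not parser.can_fetch(bot, "/")}


def fetch_robots_txt(site_url: str) -> str:
    """Download robots.txt from a site (site_url is supplied by the tester)."""
    with urlopen(site_url.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8")
```

With the setting off (steps 2 and 7), blocked_bots(fetch_robots_txt(site_url)) should be an empty set on a site with default rules; with it on (step 6), it should equal set(EXPECTED_BOTS).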

Changelog Entry

Added - New setting that, when turned on, modifies your site's robots.txt file to block the most common AI data scraping bots.
Fixed - Ensure error message shows properly if registration settings are initially saved with empty values.

Credits

Props @dkotter, @jeffpaul

dkotter added this to the 3.7.0 milestone Aug 27, 2025
dkotter self-assigned this Aug 27, 2025
dkotter requested review from a team and jeffpaul as code owners August 27, 2025 17:04
github-actions bot added the needs:code-review label (This requires code review.) Aug 27, 2025
dkotter changed the title from "Add feature to block AI scraping bots" to "Add feature to block AI data scraping bots" Aug 27, 2025
iamdharmesh (Member) previously approved these changes Sep 8, 2025

Thanks for working on this @dkotter. Looks good to me

> Click on ClassifAI Registration (may be worth updating this to something else now, General Settings maybe?)

Yes. As we already have a title at panel level, I think we are good to update this.

dkotter (Collaborator, Author) commented Sep 8, 2025

> Thanks for working on this @dkotter. Looks good to me
>
> > Click on ClassifAI Registration (may be worth updating this to something else now, General Settings maybe?)
>
> Yes. As we already have a title at panel level, I think we are good to update this.

@jeffpaul Curious for your thoughts on this. In introducing this new AI block setting, we now have more than just registration settings on this page. I'm fine to leave the header as ClassifAI Registration if we want, but I'm thinking something like Settings or General Settings may be better. I'm also worried, though, about confusing someone into thinking that is the page to configure all settings.

dkotter (Collaborator, Author) commented Sep 10, 2025

Note I've gone ahead and changed that menu item to be just Settings now instead of ClassifAI Registration. Happy to modify this if needed, but I think this makes more sense now that we've added this new setting.

dkotter merged commit bbd08fb into develop Sep 10, 2025
19 checks passed
dkotter deleted the feature/736 branch September 10, 2025 15:32

Development

Successfully merging this pull request may close these issues.

Setting: Block AI bots
