
Commit e0189ad

Merge pull request #6 from khnumdev/copilot/block-ai-crawlers-access

Block AI crawlers from training on blog content

2 parents ac8d21f + eb88d24 commit e0189ad

3 files changed, +121 -0 lines changed

README.md

Lines changed: 14 additions & 0 deletions
@@ -226,6 +226,20 @@ Example custom CSS file (`assets/main.scss`):

This project is open source and available under the terms specified in the [LICENSE](LICENSE) file.

## Terms of Use & AI Training Policy

**All content on this blog is protected by copyright.** Using any content from this site to train artificial intelligence or machine learning models is **strictly prohibited** without explicit written permission from the author.

This includes, but is not limited to:

- Training large language models (LLMs)
- Fine-tuning AI systems
- Creating datasets for machine learning
- Any form of automated content extraction for AI purposes

This website uses technical measures, including `robots.txt` directives and meta tags, to instruct AI crawlers not to access or use its content. Ignoring or circumventing these directives is a violation of these terms.

For licensing inquiries or permission requests, please contact the author.

## Contact

- **Author**: Andrés Pérez Gil

_includes/head.html

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">

{%- seo -%}
<link rel="stylesheet" href="{{ "/assets/main.css" | relative_url }}">
{%- feed_meta -%}
{%- if jekyll.environment == 'production' and site.google_analytics -%}
{%- include google-analytics.html -%}
{%- endif -%}

<!-- Block AI crawlers at page level -->
<meta name="robots" content="noai, noimageai">
<meta name="googlebot" content="noai, noimageai">
<meta name="AdsBot-Google" content="noai, noimageai">
</head>
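The page-level tags above only have an effect if a crawler reads and honors them. As far as we know, `noai` and `noimageai` are not among the robots meta rules that Google documents, so they work as an opt-out signal for crawlers that choose to support them rather than as an enforced block. The following is a minimal sketch, under that assumption, of how a cooperating crawler might read these tags with Python's standard `html.parser`. `RobotsMetaParser` and `opted_out_of_ai` are hypothetical names; they are not part of Jekyll or of any real crawler.

```python
from html.parser import HTMLParser
from typing import Optional


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collect comma-separated directives from <meta name=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        # Lowercased meta name (e.g. "robots", "googlebot") -> set of lowercased directives.
        self.directives: dict[str, set[str]] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "meta":  # HTMLParser already lowercases tag and attribute names
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if not name:
            return  # e.g. <meta charset="utf-8"> carries no directives
        content = attr_map.get("content", "")
        tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def opted_out_of_ai(html: str, crawler_name: str) -> bool:
    """Hypothetical check: does the page send `noai` to all robots or to this crawler by name?"""
    parser = RobotsMetaParser()
    parser.feed(html)
    applicable = parser.directives.get("robots", set()) | parser.directives.get(
        crawler_name.lower(), set()
    )
    return "noai" in applicable


# Usage with the three tags added to _includes/head.html.
head_html = """
<head>
<meta charset="utf-8">
<meta name="robots" content="noai, noimageai">
<meta name="googlebot" content="noai, noimageai">
<meta name="AdsBot-Google" content="noai, noimageai">
</head>
"""
print(opted_out_of_ai(head_html, "SomeAIBot"))  # → True
```

Because the generic `robots` tag already carries `noai`, the crawler-specific tags only matter to a crawler that reads its own named tag and ignores the generic one.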

robots.txt

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Block AI crawlers from training on this content
# See: https://darkvisitors.com/ for more AI bot user agents

# OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Google AI
User-agent: Google-Extended
Disallow: /

# Common Crawl (used by many AI companies)
User-agent: CCBot
Disallow: /

# Facebook/Meta
User-agent: FacebookBot
Disallow: /

# Anthropic
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

# Amazon
User-agent: Amazonbot
Disallow: /

# ByteDance/TikTok
User-agent: Bytespider
Disallow: /

# Apple
User-agent: Applebot-Extended
Disallow: /

# Perplexity AI
User-agent: PerplexityBot
Disallow: /

# Cohere
User-agent: cohere-ai
Disallow: /

# Other AI crawlers
User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: ai2bot
Disallow: /

# Allow legitimate search engines for SEO
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

# Default: allow all other bots
User-agent: *
Allow: /
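`robots.txt` is advisory. Compliant crawlers follow it, but it does not technically stop anyone from fetching pages. It is still worth confirming that the groups resolve the way the comments intend. Here is a minimal sketch using Python's standard `urllib.robotparser`. It runs against an abbreviated excerpt of the file above, not the full 90 lines, and `https://example.com/...` is a placeholder URL, not this blog's address.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the robots.txt above (not the full file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# Placeholder post URL used only for the check.
url = "https://example.com/2024/01/some-post/"
for agent in ["GPTBot", "Google-Extended", "CCBot", "Googlebot", "SomeOtherBot"]:
    print(agent, parser.can_fetch(agent, url))
# GPTBot False
# Google-Extended False
# CCBot False
# Googlebot True
# SomeOtherBot True
```

`urllib.robotparser` uses a simplified matcher. It picks the first group whose user-agent name appears as a substring of the queried agent and falls back to the `*` group otherwise. Treat the result as a sanity check of the file's intent, not as a model of how any particular crawler interprets it.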
