Protect your website or application from AI crawlers by implementing a `robots.txt` file on your domain to direct AI bot operators on what content they can and cannot scrape for AI model training.

AI bots are expected to follow the `robots.txt` directives.
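As an illustration of what "following the directives" could look like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ExampleAIBot` and `SomeOtherBot` user agents, the `example.com` URL, and the sample rules are hypothetical and are not taken from any real crawler or from this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler would run a check like this before each request.
print(may_fetch(ROBOTS_TXT, "ExampleAIBot", "https://example.com/page"))  # False
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))  # True
```

Nothing forces a crawler to run this check, so it only describes the behavior of operators who choose to comply.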
+ `robots.txt` files express your preferences. They do not prevent crawler operators from crawling your content at a technical level. Some crawler operators may disregard your `robots.txt` preferences and crawl your content regardless of what your `robots.txt` file says.
+
:::note
Respecting `robots.txt` is voluntary. If you want to prevent crawling, use AI Crawl Control's [manage AI crawlers](/ai-crawl-control/features/manage-ai-crawlers/) feature.
With the managed `robots.txt` enabled, Cloudflare will prepend our managed content before your original content, resulting in what you can view at https://www.crawlstop.com/robots.txt.
```txt title="Feature enabled"
- # NOTICE: The collection of content and other data on this
- # site through automated means, including any device, tool,
- # or process designed to data mine or scrape content, is
- # prohibited except (1) for the purpose of search engine indexing or
- # artificial intelligence retrieval augmented generation or (2) with express
- # written permission from this site’s operator.
-
- # To request permission to license our intellectual
- # property and/or other materials, please contact this
- # site’s operator directly.
+ # As a condition of accessing this website, you agree to abide by the
+ # following content-signals:
+
+ # (a) If a content-signal = yes, you may collect content for the
+ # corresponding use.
+ # (b) If a content-signal = no, you may not collect content for the
+ # corresponding use.
+ # (c) If the website operator does not include a content signal for a
+ # corresponding use, the website operator neither grants nor restricts
+ # permission via content signal with respect to the corresponding use.
+
+ # The content signals and their meanings are:
+
+ # search: building a search index and providing search results (e.g., returning
+ # hyperlinks and short excerpts from your website's contents). Search
+ # does not include providing AI-generated search summaries.
+ # ai-input: inputting content into one or more AI models (e.g., retrieval
+ # augmented generation, grounding, or other real-time taking of
+ # content for generative AI search answers).
+ # ai-train: training or fine-tuning AI models.
+
+ # ANY RESTRICTIONS EXPRESSED VIA CONTENT-SIGNALS ARE EXPRESS RESERVATIONS OF
+ # RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
+ # AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

# BEGIN Cloudflare Managed content

+ User-Agent: *
+ Content-signal: search=yes,ai-train=no
+ Allow: /
+
User-agent: Amazonbot
Disallow: /

@@ -81,7 +101,6 @@ Disallow: /lp
Disallow: /feedback
Disallow: /langtest

-
Sitemap: https://www.crawlstop.com/sitemap.xml
```
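To make the `Content-signal` line in the example above more concrete, here is a rough sketch of how a crawler might read those signals for each user-agent group. The parsing rules are an assumption inferred only from the example (comma-separated `name=yes|no` pairs attached to the preceding `User-agent` lines), not a confirmed grammar, and `parse_content_signals` is a hypothetical helper name.

```python
def parse_content_signals(robots_txt: str) -> dict[str, dict[str, bool]]:
    """Map each user-agent to its content signals, e.g. {'*': {'search': True}}."""
    signals: dict[str, dict[str, bool]] = {}
    current_agents: list[str] = []
    in_rules = False  # True once a non-User-agent directive was seen in the group
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
            continue
        in_rules = True
        if field == "content-signal":
            for pair in value.split(","):
                name, sep, setting = pair.partition("=")
                setting = setting.strip().lower()
                if sep and setting in ("yes", "no"):
                    for agent in current_agents:
                        signals.setdefault(agent, {})[name.strip().lower()] = setting == "yes"
    return signals


# Subset of the managed robots.txt shown above.
SAMPLE = """\
# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /
"""

print(parse_content_signals(SAMPLE))
# {'*': {'search': True, 'ai-train': False}}
```

In this sketch, `Amazonbot` does not appear in the result because its group carries no `Content-signal` line; its `Disallow: /` rule is ordinary `robots.txt` handling and is outside the scope of this helper.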
@@ -99,20 +118,62 @@ To implement a `robots.txt` file on your domain:
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain.
2. Go to **Security** > **Bots**.
3. Select **Configure Bot Fight Mode**.
- 4. Turn **Manage bot traffic with robots.txt** on.
+ 4. Turn **Instruct bot traffic with robots.txt** on.
</Steps>
</TabItem>
<TabItem label="New dashboard" icon="rocket">
<Steps>
- 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/login), and select your account and domain.
- 2. Go to **Security** > **Settings**.
- 3. Filter by **Bot traffic**.
- 4. Go to **Instruct AI bot traffic with robots.txt**.
- 5. Turn **Instruct AI bot traffic with robots.txt** on.
+ 1. In the Cloudflare dashboard, go to the Security Settings page.
+ 3. Go to **Instruct AI bot traffic with robots.txt**.
+ 4. Turn **Instruct AI bot traffic with robots.txt** on.
</Steps>
</TabItem>
</Tabs>

+ ## Content Signals Policy
+
+ Free zones that do not have their own `robots.txt` file and do not use the managed `robots.txt` feature will display the Content Signals Policy when a crawler requests the `robots.txt` file for your zone.
+
+ This file only outlines the Content Signals framework. It does not express your preferences or rights associated with your content.
+
+ ```txt title="Content Signals Policy"
+ # As a condition of accessing this website, you agree to abide by the
+ # following content-signals:
+
+ # (a) If a content-signal = yes, you may collect content for the
+ # corresponding use.
+ # (b) If a content-signal = no, you may not collect content for the
+ # corresponding use.
+ # (c) If the website operator does not include a content signal for a
+ # corresponding use, the website operator neither grants nor restricts
+ # permission via content signal with respect to the corresponding use.
+
+ # The content signals and their meanings are:
+
+ # search: building a search index and providing search results (e.g., returning
+ # hyperlinks and short excerpts from your website's contents). Search
+ # does not include providing AI-generated search summaries.
+ # ai-input: inputting content into one or more AI models (e.g., retrieval
+ # augmented generation, grounding, or other real-time taking of
+ # content for generative AI search answers).
+ # ai-train: training or fine-tuning AI models.
+
+ # ANY RESTRICTIONS EXPRESSED VIA CONTENT-SIGNALS ARE EXPRESS RESERVATIONS OF
+ # RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
+ # AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
+ ```
+
+ Cloudflare's Content Signals Policy is included by default in the `robots.txt` file when you turn on **Instruct AI bot traffic with robots.txt**.
+
+ If you would like to opt out of displaying the policy in your `robots.txt` file, you can uncheck **Display Content Signals Policy** under **Control AI Crawlers** in your zone's overview.
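Rules (a) to (c) of the policy text amount to a three-way outcome for each use: allowed, disallowed, or unspecified. The following sketch shows one way to model that, assuming the signals have already been read into a plain dictionary; the `Permission` enum and the `permission_for` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Permission(Enum):
    ALLOWED = "allowed"          # rule (a): content-signal = yes
    DISALLOWED = "disallowed"    # rule (b): content-signal = no
    UNSPECIFIED = "unspecified"  # rule (c): no signal for this use


def permission_for(signals: dict[str, bool], use: str) -> Permission:
    """Apply rules (a)-(c) of the Content Signals Policy to a single use."""
    if use not in signals:
        return Permission.UNSPECIFIED
    return Permission.ALLOWED if signals[use] else Permission.DISALLOWED


# Signals from the managed example: search=yes, ai-train=no, no ai-input signal.
signals = {"search": True, "ai-train": False}

for use in ("search", "ai-input", "ai-train"):
    print(use, permission_for(signals, use).value)
```

Keeping "unspecified" distinct from "disallowed" mirrors rule (c), under which a missing signal neither grants nor restricts permission.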