
Commit fcf9128

Cameron Whiteside and Oxyjun authored
AI Crawl Control: Add documentation and changelog for new robots.txt tab (#25966)
* AI Crawl Control: Added docs and changelog for new robots.txt tab
* AI Crawl Control: Improved changelog according to style guide
* Update src/content/docs/ai-crawl-control/features/track-robots-txt.mdx
* Update src/content/docs/ai-crawl-control/features/track-robots-txt.mdx
  Co-authored-by: Jun Lee <[email protected]>
* Update src/content/changelog/ai-crawl-control/2025-10-21-track-robots-txt.mdx

---------

Co-authored-by: Cameron Whiteside <[email protected]>
Co-authored-by: Jun Lee <[email protected]>
1 parent a774d72 commit fcf9128

File tree: 4 files changed, +190 −42 lines changed
src/content/changelog/ai-crawl-control/2025-10-21-track-robots-txt.mdx

Lines changed: 27 additions & 0 deletions

@@ -0,0 +1,27 @@
---
title: New Robots.txt tab for tracking crawler compliance
description: Monitor robots.txt file health, track crawler violations, and gain visibility into how AI crawlers interact with your directives.
date: 2025-10-21
---

AI Crawl Control now includes a **Robots.txt** tab that provides insights into how AI crawlers interact with your `robots.txt` files.

## What's new

The Robots.txt tab allows you to:

- Monitor the health status of `robots.txt` files across all your hostnames, including HTTP status codes, and identify hostnames that need a `robots.txt` file.
- Track the total number of requests to each `robots.txt` file, with breakdowns of successful versus unsuccessful requests.
- Check whether your `robots.txt` files contain [Content Signals](https://contentsignals.org/) directives for AI training, search, and AI input.
- Identify crawlers that request paths explicitly disallowed by your `robots.txt` directives, including the crawler name, operator, violated path, specific directive, and violation count.
- Filter `robots.txt` request data by crawler, operator, category, and custom time ranges.
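As a quick illustration of the success/failure breakdown above, outside of the dashboard: the docs in this commit count responses with status codes below 400 as successful and 400 or above as unsuccessful. The TypeScript sketch below applies the same threshold; the hostname is a placeholder.

```ts
// Classify a robots.txt response the same way the tab does: status codes
// below 400 count as successful, 400 and above as unsuccessful.
// "example.com" is a placeholder hostname.
async function checkRobotsTxt(hostname: string): Promise<void> {
  const url = `https://${hostname}/robots.txt`;
  const response = await fetch(url);
  const outcome = response.status < 400 ? "successful" : "unsuccessful";
  console.log(`${url} -> HTTP ${response.status} (${outcome})`);
}

checkRobotsTxt("example.com").catch(console.error);
```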
## Take action

When you identify non-compliant crawlers, you can:

- Block the crawler in the [Crawlers tab](/ai-crawl-control/features/manage-ai-crawlers/)
- Create custom [WAF rules](/waf/) for path-specific security
- Use [Redirect Rules](/rules/url-forwarding/) to guide crawlers to appropriate areas of your site

To get started, go to **AI Crawl Control** > **Robots.txt** in the Cloudflare dashboard. Learn more in the [Track robots.txt documentation](/ai-crawl-control/features/track-robots-txt/).

src/content/docs/ai-crawl-control/features/analyze-ai-traffic.mdx

Lines changed: 45 additions & 19 deletions
@@ -7,7 +7,7 @@ sidebar:
   order: 2
 ---

-import { Steps, Tabs, TabItem, DashButton } from "~/components";
+import { Aside, Steps, Tabs, TabItem, DashButton } from "~/components";

 AI Crawl Control metrics provide you with insight on how AI crawlers are interacting with your website ([Cloudflare zone](/fundamentals/concepts/accounts-and-zones/#zones)).

@@ -24,29 +24,55 @@ You can find meaningful information across both **Crawlers** and **Metrics** tabs

 The **Crawlers** tab provides you with the following information:

-- Total number of requests to crawl your website from common AI crawlers
-- Number of requests made by each AI crawler
-- Number of `robots.txt` violations for each crawler
+| Metric                  | Description                                                              |
+| ----------------------- | ------------------------------------------------------------------------ |
+| **Total requests**      | Total number of requests to crawl your website from common AI crawlers. |
+| **Requests by crawler** | Number of requests made by each AI crawler.                              |

 ## View AI Crawl Control metrics

 The **Metrics** tab provides you with the following metrics to help you understand how AI crawlers are interacting with your website.

-| Metric | Description |
-| ------ | ----------- |
-| Total requests | The total number of requests to crawl your website, from all AI crawlers |
-| Allowed requests | The number of crawler requests that received a successful response from your site |
-| Unsuccessful requests | The number of crawler requests that failed (HTTP 4xx or 5xx) as a result of a blocked request, other security rules, or website errors such as a crawler attempting to access a non-existent page |
-| Overall popular paths | The most popular pages crawled by AI crawlers, from all AI crawlers |
-| Most active AI crawlers by operators | The AI crawler owners with the highest number of requests to access your site |
-| Request by AI crawlers | A graph which displays the number of crawl requests from each AI crawler |
-| Most popular paths by AI crawlers | The most popular pages crawled by AI crawlers, for each AI crawler |
-| Referrals | A graph which displays the number of visits that were directed to your site from each AI operator |
-| Referers | The list of referers who directed visits to your site |
-
-## Filter date range
-
-You can use the date filter to choose the period of time you wish to analyze.
+### Analyze referrer data
+
+<Aside type="note">
+This feature is available for customers on a paid plan.
+</Aside>
+
+Identify traffic sources with referrer analytics to understand discovery patterns and content popularity from AI operators.
+
+- View top referrers driving traffic to your site.
+- Understand discovery patterns and content popularity from AI operators.
+
+### Track crawler requests over time
+
+Visualize crawler activity patterns over time using the **Requests over time** chart. You can group data by different dimensions to get more specific insights:
+
+| Dimension       | Description                                                                                  |
+| --------------- | -------------------------------------------------------------------------------------------- |
+| **Crawler**     | Track activity from individual AI crawlers (like GPTBot, ClaudeBot, and Bytespider).         |
+| **Category**    | Analyze crawlers by their purpose or type.                                                   |
+| **Operator**    | Discover which companies (such as OpenAI, Anthropic, and ByteDance) are crawling your site.  |
+| **Host**        | Break down activity across multiple subdomains.                                              |
+| **Status Code** | Monitor HTTP response codes (200s, 300s, 400s, 500s) to crawlers.                            |
+
+### Understand what content is crawled
+
+The **Most popular paths** table shows you which pages on your site are most frequently requested by AI crawlers. This can help you understand what content is most popular with different AI models.
+
+| Column               | Description                                                              |
+| -------------------- | ------------------------------------------------------------------------ |
+| **Path**             | The path of the page on your website that was requested.                |
+| **Hostname**         | The hostname of the requested page.                                     |
+| **Crawler**          | The name of the AI crawler that made the request.                       |
+| **Operator**         | The company that operates the AI crawler.                               |
+| **Allowed requests** | The number of times the path was successfully requested by the crawler. |
+
+You can also filter the results by path or content type to narrow down your analysis.
+
+## Filter and export data
+
+You can use the date filter to choose the period of time you wish to analyze. To export your data, select **Download CSV**. The downloaded file will include all applied filters and groupings.

 <Tabs>
 <TabItem label="Free plans">
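As a hedged aside on the new **Filter and export data** section above: once you have a **Download CSV** export, you can post-process it yourself. The column names in this TypeScript sketch (`crawler`, `requests`) are assumptions for illustration, not a documented export schema; check the header row of a real export and adjust.

```ts
import { readFileSync } from "node:fs";

// Total requests per crawler from a "Download CSV" export.
// The column names "crawler" and "requests" are assumptions; check the
// header row of your export and adjust them.
function totalRequestsByCrawler(csvPath: string): Map<string, number> {
  const [headerLine, ...rows] = readFileSync(csvPath, "utf8").trim().split("\n");
  const headers = headerLine.split(",").map((h) => h.trim().toLowerCase());
  const crawlerIndex = headers.indexOf("crawler");
  const requestsIndex = headers.indexOf("requests");

  const totals = new Map<string, number>();
  for (const row of rows) {
    // Naive comma split; use a CSV parser if fields can contain commas.
    const cols = row.split(",");
    const crawler = cols[crawlerIndex] ?? "unknown";
    const requests = Number(cols[requestsIndex]) || 0;
    totals.set(crawler, (totals.get(crawler) ?? 0) + requests);
  }
  return totals;
}

console.log(totalRequestsByCrawler("./ai-crawl-control-export.csv"));
```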
src/content/docs/ai-crawl-control/features/track-robots-txt.mdx

Lines changed: 84 additions & 0 deletions

@@ -0,0 +1,84 @@
---
title: Track robots.txt
pcx_content_type: concept
sidebar:
  order: 6
---

import { Steps, GlossaryTooltip, DashButton } from "~/components";

The **Robots.txt** tab in AI Crawl Control provides insights into how AI crawlers interact with your <GlossaryTooltip term="robots.txt">`robots.txt`</GlossaryTooltip> files across your hostnames. You can monitor request patterns, verify file availability, and identify crawlers that violate your directives.

To access robots.txt insights:

1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain.
2. Go to **AI Crawl Control**.

   <DashButton url="/?to=/:account/:zone/ai" />

3. Go to the **Robots.txt** tab.

## Check managed robots.txt status

The status card at the top of the tab shows whether Cloudflare is managing your `robots.txt` file.

When enabled, Cloudflare includes directives that block common AI crawlers used for training and adds its [Content Signals Policy](/bots/additional-configurations/managed-robots-txt/#content-signals-policy) to your `robots.txt`. For more details on how Cloudflare manages your `robots.txt` file, refer to [Managed `robots.txt`](/bots/additional-configurations/managed-robots-txt/).
## Filter robots.txt request data

You can apply filters at the top of the tab to narrow your analysis of robots.txt requests:

- Filter by specific crawler name (for example, Googlebot or specific AI bots).
- Filter by the entity running the crawler to understand direct licensing opportunities or existing agreements.
- Filter by general use cases (for example, AI training, general search, or AI assistant).
- Select a custom time frame for historical analysis.

The values in all tables and metrics will update according to your filters.

## Monitor robots.txt availability

The **Availability** table shows the historical request frequency and health status of `robots.txt` files across your hostnames over the selected time frame.

| Column          | Description |
| --------------- | ----------- |
| Path            | The specific hostname's `robots.txt` file being requested. Paths are listed from the most requested to the least. |
| Requests        | The total number of requests made to this path. Requests are broken down into:<br/>- **Successful:** HTTP status codes below 400 (including **200 OK** and redirects).<br/>- **Unsuccessful:** HTTP status codes of 400 or above. |
| Status          | The HTTP status code returned when the `robots.txt` file is requested. |
| Content Signals | An indicator showing whether the `robots.txt` file contains [Content Signals](https://contentsignals.org/) directives for usage in AI training, search, or AI input. |

From this table, you can take the following actions:

- Monitor for a high number of unsuccessful requests, which suggests that crawlers are having trouble accessing your `robots.txt` file.
- If the **Status** is `404 Not Found`, create a `robots.txt` file to provide clear directives.
- If the file exists, check for upstream WAF rules or other security settings that may be blocking access.
- If the **Content Signals** column indicates that signals are missing, add them to your `robots.txt` file. You can do this by following the [Content Signals](https://contentsignals.org/) instructions or by enabling [Managed `robots.txt`](/bots/additional-configurations/managed-robots-txt/) to have Cloudflare manage them for you.
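For illustration, a TypeScript sketch of the kind of check the Availability table automates for each hostname: request `robots.txt`, record the HTTP status, and look for Content Signals. The `Content-Signal:` line format is assumed from [contentsignals.org](https://contentsignals.org/), and the hostnames are placeholders.

```ts
// Check one hostname's robots.txt: is it reachable, and does it contain
// Content Signals? The "Content-Signal:" line format is assumed from
// contentsignals.org; verify against the current spec before relying on it.
async function inspectRobotsTxt(hostname: string) {
  const response = await fetch(`https://${hostname}/robots.txt`);
  if (response.status >= 400) {
    return { hostname, status: response.status, contentSignals: false };
  }
  const body = await response.text();
  const contentSignals = body
    .split("\n")
    .some((line) => line.trim().toLowerCase().startsWith("content-signal:"));
  return { hostname, status: response.status, contentSignals };
}

// Placeholder hostnames; use the hostnames listed in the Availability table.
Promise.all(["example.com", "docs.example.com"].map(inspectRobotsTxt))
  .then((results) => console.table(results))
  .catch(console.error);
```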
## Track robots.txt violations

The **Violations** table identifies AI crawlers that have requested paths explicitly disallowed by your `robots.txt` file. This helps you identify non-compliant crawlers and take appropriate action.

:::note[How violations are calculated]

The Violations table identifies mismatches between your **current** `robots.txt` directives and past crawler requests. Because violations are not logged in real time, recently added or changed rules may cause previously legitimate requests to be flagged as violations.

For example, if you add a new `Disallow` rule, all past requests to that path will appear as violations, even though they were not violations at the time of the request.
:::
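A minimal sketch of the mismatch check this note describes, assuming simple prefix matching of `Disallow` rules against recorded request paths. Real `robots.txt` matching also involves `Allow` rules, wildcards, and per-user-agent groups, and the crawler names and paths below are hypothetical.

```ts
// Compare past crawler requests against the *current* Disallow directives,
// as the note above describes. This sketch only does prefix matching; real
// robots.txt matching also handles Allow rules, wildcards, and user-agent groups.
interface PastRequest {
  crawler: string;
  path: string;
}

function findViolations(disallowedPrefixes: string[], requests: PastRequest[]) {
  return requests.flatMap((request) =>
    disallowedPrefixes
      .filter((prefix) => request.path.startsWith(prefix))
      .map((prefix) => ({ ...request, directive: `Disallow: ${prefix}` })),
  );
}

// Hypothetical data: a newly added "Disallow: /drafts/" rule flags a request
// that happened before the rule existed.
const violations = findViolations(
  ["/drafts/", "/private/"],
  [
    { crawler: "ExampleBot", path: "/drafts/post-1" },
    { crawler: "ExampleBot", path: "/blog/post-2" },
  ],
);
console.log(violations); // one violation: /drafts/post-1 matched "Disallow: /drafts/"
```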
| Column     | Description |
| ---------- | ----------- |
| Crawler    | The name of the bot that violated your `robots.txt` directives. The operator of the crawler is listed directly beneath the crawler name. |
| Path       | The specific URL or path the crawler attempted to access that was disallowed by your `robots.txt` file. |
| Directive  | The exact line from your `robots.txt` file that disallowed access to the path. |
| Violations | The count of HTTP requests made to the disallowed path/directive pair within the selected time frame. |

When you identify crawlers violating your `robots.txt` directives, you have several options:

- Navigate to the [**Crawlers** tab](/ai-crawl-control/features/manage-ai-crawlers/) to permanently block the non-compliant crawler.
- Use [Cloudflare WAF](/waf/) to create path-specific security rules for the violating crawler.
- Use [Redirect Rules](/rules/url-forwarding/) to guide violating crawlers to an appropriate area of your site.

## Related resources

- [Manage AI crawlers](/ai-crawl-control/features/manage-ai-crawlers/)
- [Analyze AI traffic](/ai-crawl-control/features/analyze-ai-traffic/)
- [Cloudflare WAF](/waf/)

src/content/docs/ai-crawl-control/index.mdx

Lines changed: 34 additions & 23 deletions
@@ -11,7 +11,15 @@ head:
 description: Monitor and control how AI services access your website content.
 ---

-import { Description, Feature, FeatureTable, Plan, LinkButton, RelatedProduct, Card } from "~/components";
+import {
+  Description,
+  Feature,
+  FeatureTable,
+  Plan,
+  LinkButton,
+  RelatedProduct,
+  Card,
+} from "~/components";

 <Plan type="all" />

@@ -53,6 +61,15 @@ With AI Crawl Control, you can:
 Gain insight into how AI crawlers are interacting with your pages.
 </Feature>

+<Feature
+  header="Track robots.txt"
+  href="/ai-crawl-control/features/track-robots-txt/"
+  cta="Track robots.txt"
+>
+  Track the health of `robots.txt` files and identify which crawlers are
+  violating your directives.
+</Feature>
+
 <Feature
   header="Pay Per Crawl"
   href="/ai-crawl-control/features/pay-per-crawl/what-is-pay-per-crawl/"
@@ -66,41 +83,35 @@ With AI Crawl Control, you can:
 ## Use cases

 <Card title="Publishers and content creators">
-Publishers and content creators can monitor which AI crawlers are accessing their articles and educational content. Set policies to allow beneficial crawlers while blocking others.
+  Publishers and content creators can monitor which AI crawlers are accessing
+  their articles and educational content. Set policies to allow beneficial
+  crawlers while blocking others.
 </Card>

 <Card title="E-commerce and business sites">
-E-commerce and business sites can identify AI crawler activity on product pages and business information. Control access to sensitive data like pricing and inventory.
+  E-commerce and business sites can identify AI crawler activity on product
+  pages and business information. Control access to sensitive data like pricing
+  and inventory.
 </Card>

 <Card title="Documentation sites">
-Documentation sites can track how AI crawlers are accessing their technical documentation. Gain insight into how AI crawlers are engaging with your site.
+  Documentation sites can track how AI crawlers are accessing their technical
+  documentation. Gain insight into how AI crawlers are engaging with your site.
 </Card>

 ---

 ## Related Products

-<RelatedProduct
-  header="Bots"
-  href="/bots/"
-  product="bots"
->
-  Identify and mitigate automated traffic to protect your domain from bad bots.
+<RelatedProduct header="Bots" href="/bots/" product="bots">
+  Identify and mitigate automated traffic to protect your domain from bad bots.
 </RelatedProduct>

-<RelatedProduct
-  header="Web Application Firewall"
-  href="/waf/"
-  product="waf"
->
-  Get automatic protection from vulnerabilities and the flexibility to create custom rules.
+<RelatedProduct header="Web Application Firewall" href="/waf/" product="waf">
+  Get automatic protection from vulnerabilities and the flexibility to create
+  custom rules.
 </RelatedProduct>

-<RelatedProduct
-  header="Analytics"
-  href="/analytics/"
-  product="analytics"
->
-  View and analyze traffic on your domain.
-</RelatedProduct>
+<RelatedProduct header="Analytics" href="/analytics/" product="analytics">
+  View and analyze traffic on your domain.
+</RelatedProduct>
