---
title: Track robots.txt
pcx_content_type: concept
sidebar:
  order: 6
---

import { Steps, GlossaryTooltip, DashButton } from "~/components";

The **Robots.txt** tab in AI Crawl Control provides insights into how AI crawlers interact with your <GlossaryTooltip term="robots.txt">`robots.txt`</GlossaryTooltip> files across your hostnames. You can monitor request patterns, verify file availability, and identify crawlers that violate your directives.

To access robots.txt insights:

1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain.
2. Go to **AI Crawl Control**.

   <DashButton url="/?to=/:account/:zone/ai" />

3. Go to the **Robots.txt** tab.

## Check managed robots.txt status

The status card at the top of the tab shows whether Cloudflare is managing your `robots.txt` file.

When enabled, Cloudflare adds directives that block common AI crawlers used for training and includes its [Content Signals Policy](/bots/additional-configurations/managed-robots-txt/#content-signals-policy) in your `robots.txt`. For more details on how Cloudflare manages your `robots.txt` file, refer to [Managed `robots.txt`](/bots/additional-configurations/managed-robots-txt/).
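A hand-written `robots.txt` that combines blocking directives with content signals might look like the sketch below. The crawler name and signal values are illustrative only, not the contents of Cloudflare's managed file; check the [Content Signals](https://contentsignals.org/) specification for the exact directive syntax.

```txt
# Illustrative example only; a managed robots.txt will differ.
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
```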

## Filter robots.txt request data

You can apply filters at the top of the tab to narrow your analysis of robots.txt requests:

- Filter by specific crawler name (for example, Googlebot or specific AI bots).
- Filter by the entity running the crawler to understand direct licensing opportunities or existing agreements.
- Filter by general use cases (for example, AI training, general search, or AI assistant).
- Select a custom time frame for historical analysis.

The values in all tables and metrics update according to your filters.

## Monitor robots.txt availability

The **Availability** table shows the historical request frequency and health status of `robots.txt` files across your hostnames over the selected time frame.

| Column          | Description |
| --------------- | ----------- |
| Path            | The `robots.txt` file requested on each hostname. Paths are listed from most to least requested. |
| Requests        | The total number of requests made to this path. Requests are broken down into:<br/>- **Successful:** HTTP status codes below 400 (including **200 OK** and redirects).<br/>- **Unsuccessful:** HTTP status codes of 400 or above. |
| Status          | The HTTP status code returned for requests to the `robots.txt` file. |
| Content Signals | An indicator showing whether the `robots.txt` file contains [Content Signals](https://contentsignals.org/), directives for usage in AI training, search, or AI input. |

From this table, you can take the following actions:

- Monitor for a high number of unsuccessful requests, which suggests that crawlers are having trouble accessing your `robots.txt` file.
  - If the **Status** is `404 Not Found`, create a `robots.txt` file to provide clear directives.
  - If the file exists, check for upstream WAF rules or other security settings that may be blocking access.
- If the **Content Signals** column indicates that signals are missing, add them to your `robots.txt` file. You can do this by following the [Content Signals](https://contentsignals.org/) instructions or by enabling [Managed `robots.txt`](/bots/additional-configurations/managed-robots-txt/) to have Cloudflare manage them for you.

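The successful/unsuccessful split in the **Requests** column follows the status-code rule above. A minimal sketch of that classification in plain Python (not a Cloudflare API; the function names are invented for illustration):

```python
def classify_status(code: int) -> str:
    """Mirror the Requests column: codes below 400 count as successful."""
    return "successful" if code < 400 else "unsuccessful"


def tally_requests(status_codes: list[int]) -> dict[str, int]:
    """Aggregate observed robots.txt response codes into the two buckets."""
    counts = {"successful": 0, "unsuccessful": 0}
    for code in status_codes:
        counts[classify_status(code)] += 1
    return counts
```

Note that redirects (3xx) land in the successful bucket, so a high unsuccessful count always points at 4xx/5xx responses.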
## Track robots.txt violations

The **Violations** table identifies AI crawlers that have requested paths explicitly disallowed by your `robots.txt` file. Use it to spot non-compliant crawlers and take appropriate action.

:::note[How violations are calculated]

The **Violations** table identifies mismatches between your **current** `robots.txt` directives and past crawler requests. Because violations are not logged in real time, recently added or changed rules may cause previously legitimate requests to be flagged as violations.

For example, if you add a new `Disallow` rule, all past requests to that path will appear as violations, even though they were not violations at the time of the request.
:::
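Conceptually, a violation is a request whose path the current directives disallow. Python's standard `urllib.robotparser` applies the same path-matching logic, which is handy for spot-checking a path against your directives. The bot name, paths, and directives below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directives; in practice, load your live robots.txt.
directives = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(directives)


def is_violation(user_agent: str, url: str) -> bool:
    """A request violates the file when the current rules disallow its path."""
    return not parser.can_fetch(user_agent, url)
```

Because the check runs against today's directives, a request made before a `Disallow` rule existed still comes back as a violation, which is exactly the retroactive behavior described in the note above.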

| Column     | Description |
| ---------- | ----------- |
| Crawler    | The name of the bot that violated your `robots.txt` directives. The operator of the crawler is listed directly beneath the crawler name. |
| Path       | The specific URL or path the crawler attempted to access that was disallowed by your `robots.txt` file. |
| Directive  | The exact line from your `robots.txt` file that disallowed access to the path. |
| Violations | The count of HTTP requests made to the disallowed path/directive pair within the selected time frame. |

When you identify crawlers violating your `robots.txt` directives, you have several options:

- Navigate to the [**Crawlers** tab](/ai-crawl-control/features/manage-ai-crawlers/) to permanently block the non-compliant crawler.
- Use [Cloudflare WAF](/waf/) to create path-specific security rules for the violating crawler.
- Use [Redirect Rules](/rules/url-forwarding/) to guide violating crawlers to an appropriate area of your site.

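For the WAF option, a custom rule can combine the crawler's user agent with the disallowed path. The expression below is a hypothetical sketch (bot name and path invented); verify the field names and operators against the Cloudflare Rules language reference before using it:

```txt
(http.user_agent contains "ExampleBot" and http.request.uri.path contains "/private/")
```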
## Related resources

- [Manage AI crawlers](/ai-crawl-control/features/manage-ai-crawlers/)
- [Analyze AI traffic](/ai-crawl-control/features/analyze-ai-traffic/)
- [Cloudflare WAF](/waf/)