Commit c4232ad

committed
Changes for WBA
1 parent 84d2b48 commit c4232ad

File tree

4 files changed

+242
-56
lines changed

src/content/docs/bots/concepts/bot/verified-bots/categories.mdx

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
pcx_content_type: reference
33
title: Verified bot categories
44
sidebar:
5-
order: 3
5+
order: 10
66
label: Categories
77

88
---
@@ -11,7 +11,7 @@ You can segment your verified bot traffic by its type and purpose by adding the
1111

1212
:::note
1313

14-
The Verified Bot Categories field is not compatible with legacy Firewall rules.
14+
The Verified Bot Categories field is not compatible with legacy Firewall rules.
1515
:::
1616

1717
## Categories

src/content/docs/bots/concepts/bot/verified-bots/policy.mdx

Lines changed: 2 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
pcx_content_type: reference
33
title: Verified bots policy
44
sidebar:
5-
order: 2
5+
order: 5
66
label: Policy
77

88
---
@@ -27,7 +27,7 @@ A bot crawling one site is not valid.
2727

2828
### Bot Identification
2929

30-
The user-agent with the following requirements:
30+
The user-agent or message signature with the following requirements:
3131

3232
- Have at least 5 characters.
3333
- Must not contain special characters.
@@ -72,22 +72,6 @@ If a search engine crawler skips `robots.txt`, it will be rejected.
7272

7373
The bot must have publicly documented expected behavior or user-agent format.
7474

75-
## IP Validation
76-
77-
A set of validation methods and requirements to gather set IP ranges for a verified service.
78-
79-
### Public IP List
80-
81-
- A fixed and limited set of IP addresses, which can be verified via publicly accessible plain-text, `JSON`, or `CSV`.
82-
- IP addresses used solely by the bot owner.
83-
- A user-agent match pattern.
84-
85-
### Reverse DNS
86-
87-
- A list of domain suffixes to validate DNS records.
88-
- IP addresses should have PTR records set correctly.
89-
- A user-agent match pattern.
90-
9175
## Breach of Policy
9276

9377
If any of the requirements to validate are breached, a service will be removed from the global allowlist.
@@ -100,39 +84,3 @@ If any of the requirements to validate are breached, a service will be removed f
10084
- A block of IPs not briefed on onboarding is added to the list.
10185
- The disclosed purpose of the service does not reflect on the traffic.
10286
- An AI Crawler that does not respect the crawl-delay directive in robots.txt.
103-
104-
## Online application
105-
106-
To submit a verified bot that Cloudflare is not [currently tracking](https://radar.cloudflare.com/verified-bots), fill out an [online application](https://dash.cloudflare.com/?to=/:account/configurations/verified-bots) in the Cloudflare dashboard for the fastest possible results. Bot operators who prefer not to create a free Cloudflare account can do so using our [old form](https://docs.google.com/forms/d/e/1FAIpQLSdqYNuULEypMnp4i5pROSc-uP6x65Xub9svD27mb8JChA_-XA/viewform?usp=sf_link), but the waiting time is up to several weeks for verified bot requests to be evaluated.
107-
108-
### Generic user-agents
109-
110-
User-agent patterns that match generic user-agents will be rejected by the Verified Bots API. When you add a user-agent pattern that is considered very common to the Verified Bot form, you may encounter an error message that will prompt you to correct the user-agent before you can submit again.
111-
112-
Generic user-agents include:
113-
114-
- `Dart`
115-
- `Go-http-client`
116-
- `GuzzleHttp`
117-
- `Google Chrome`
118-
- `Mozilla Firefox`
119-
- `Safari`
120-
- `Nessus`
121-
- `Websocket++`
122-
- `cloudflare-go`
123-
- `fasthttp`
124-
- `got`
125-
- `nginx-ssl early hints`
126-
- `node`
127-
- `node-fetch`
128-
- `okhttp`
129-
- `python-requests`
130-
- `uTorrent`
131-
132-
## Transient false negatives
133-
134-
Once Cloudflare lists a bot as a verified bot, this entry is cached and may get delisted if no traffic is seen in the Cloudflare network coming from the bot for a defined period of time.
135-
136-
It takes 24 hours for an inactive IP to be removed as a verified bot.
137-
138-
A bot can remain unlisted until Cloudflare sees traffic being sourced from the bot. When the bot is revalidated, it is listed as a verified bot again.
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
---
2+
pcx_content_type: concept
3+
title: Verified bots requirements
4+
sidebar:
5+
order: 3
6+
label: Requirements
7+
8+
---
9+
10+
import { GlossaryTooltip } from "~/components"
11+
12+
To add a bot to Cloudflare's list of <GlossaryTooltip term="verified bot">verified bots</GlossaryTooltip>, the bot must meet the following requirements:
13+
14+
1. The bot must follow the [verified bots policy](/bots/concepts/bot/verified-bots/policy/).
15+
2. The bot must be verified using one of the [verification methods](/bots/concepts/bot/verified-bots/verification/).
16+
17+
Once Cloudflare verifies a bot, it will appear on [Cloudflare Radar's list of verified bots](https://radar.cloudflare.com/verified-bots).
18+
19+
## Transient false negatives
20+
21+
Once Cloudflare lists a bot as a verified bot, the entry is cached. The bot may be delisted if Cloudflare sees no traffic from it on the Cloudflare network for a defined period of time.
22+
23+
It takes 24 hours for an inactive IP to be removed as a verified bot.
24+
25+
A bot can remain unlisted until Cloudflare sees traffic being sourced from the bot. When the bot is revalidated, it is listed as a verified bot again.
Lines changed: 213 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,213 @@
1+
---
2+
pcx_content_type: concept
3+
title: Verification methods
4+
sidebar:
5+
order: 7
6+
label: Verification methods
7+
8+
---
9+
10+
import { GlossaryTooltip, Steps } from "~/components"
11+
12+
To submit a verified bot that Cloudflare is not [currently tracking](https://radar.cloudflare.com/verified-bots), fill out an [online application](https://dash.cloudflare.com/?to=/:account/configurations/verified-bots) in the Cloudflare dashboard for the fastest possible results. Bot operators who prefer not to create a free Cloudflare account can apply using our [old form](https://docs.google.com/forms/d/e/1FAIpQLSdqYNuULEypMnp4i5pROSc-uP6x65Xub9svD27mb8JChA_-XA/viewform?usp=sf_link), but verified bot requests submitted this way can take up to several weeks to be evaluated.
13+
14+
Cloudflare can verify a bot in two ways:
15+
16+
- **Web Bot Auth**: An authentication method which leverages cryptographic signatures in HTTP messages to verify requests that come from an automated bot.
17+
- **IP validation**: An authentication method which identifies a bot by its range of IP addresses.
18+
19+
## Web Bot Auth
20+
21+
To authenticate a bot using Web Bot Auth, you need to:
22+
23+
1. Generate a valid signing key.
24+
2. Publish and host a URL which contains the public key derived from your signing key.
25+
3. Register your key directory URL with Cloudflare.
26+
27+
### 1. Generate a valid signing key
28+
29+
You need to generate a signing key which will be used to authenticate your bot's requests.
30+
31+
{/* prettier-ignore */}
32+
<Steps>
33+
1. Generate a unique [Ed25519](https://ed25519.cr.yp.to/) private key to sign your requests. This example uses the [OpenSSL](https://openssl-library.org/) `genpkey` command:
34+
35+
```sh
36+
openssl genpkey -algorithm ed25519 -out private-key.pem
37+
```
38+
2. Extract your public key.
39+
40+
```sh
41+
openssl pkey -in private-key.pem -pubout -out public-key.pem
42+
```
43+
3. Convert the public key to a JSON Web Key (JWK) using a tool of your choice. This example uses the [`jwker`](https://github.com/jphastings/jwker) command-line application:
44+
```sh
45+
go install github.com/jphastings/jwker/cmd/jwker@latest
46+
jwker public-key.pem public-key.jwk
47+
```
48+
</Steps>
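
If you prefer not to install a separate tool, the conversion in step 3 can be sketched with the Python standard library. This is a minimal sketch rather than an official tool. It assumes the PEM file was produced by the `openssl pkey -pubout` command above, so the decoded DER is the 44-byte Ed25519 SubjectPublicKeyInfo structure from RFC 8410: a fixed 12-byte prefix followed by the 32-byte raw public key. The function name `ed25519_pem_to_jwk` is hypothetical.

```python
import base64

# DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410). The raw
# 32-byte public key follows immediately after these 12 bytes.
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def ed25519_pem_to_jwk(pem_text: str) -> dict:
    """Convert a PEM-encoded Ed25519 public key into a JWK dictionary."""
    # Drop the BEGIN/END armor lines and join the base64 body.
    stripped = (line.strip() for line in pem_text.splitlines())
    body = "".join(s for s in stripped if s and not s.startswith("-----"))
    der = base64.b64decode(body)
    if len(der) != 44 or not der.startswith(ED25519_SPKI_PREFIX):
        raise ValueError("not an Ed25519 SubjectPublicKeyInfo public key")
    raw_key = der[len(ED25519_SPKI_PREFIX):]
    # JWK encodes "x" as base64url without padding.
    x = base64.urlsafe_b64encode(raw_key).rstrip(b"=").decode("ascii")
    return {"kty": "OKP", "crv": "Ed25519", "x": x}
```

To use it, read `public-key.pem` as text, pass the contents to `ed25519_pem_to_jwk`, and serialize the result with `json.dumps`.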
49+
50+
By following these steps, you have generated a private key and a public key, then converted the public key to a JWK.
51+
52+
### 2. Host a key directory
53+
54+
You need to host a key directory which creates a way for Cloudflare to authenticate your bot's requests.
55+
56+
<Steps>
57+
1. Host a key directory at a well-known message signatures directory. The key directory should serve a JSON Web Key Set (JWKS) including the public key derived from your signing key.
58+
59+
An example directory would be:
60+
```txt
61+
/.well-known/http-message-signatures-directory/
62+
```
63+
2. Serve the key directory over HTTPS (not HTTP).
64+
3. Sign your HTTP response using the HTTP message signature specification by attaching one signature per key in your key directory. This ensures no one else can mirror your directory and attempt to register on your behalf. Your response must include the following headers:
65+
- `Signature`: TBD
66+
- `Signature-Input`: TBD
67+
68+
The following example shows the annotated request and response with required headers against `https://example.com`.
69+
```txt
70+
GET /.well-known/http-message-signatures-directory HTTP/1.1
71+
Host: example.com
72+
Accept: application/http-message-signatures-directory+json
73+
74+
HTTP/1.1 200 OK
75+
Content-Type: application/http-message-signatures-directory+json
76+
Signature: sig1=:TD5arhV1ved6xtx63cUIFCMONT248cpDeVUAljLgkdozbjMNpJGr/WAx4PzHj+WeG0xMHQF1BOdFLDsfjdjvBA==:
77+
Signature-Input: sig1=("@authority");alg="ed25519";keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U";nonce="ZO3/XMEZjrvSnLtAP9M7jK0WGQf3J+pbmQRUpKDhF9/jsNCWqUh2sq+TH4WTX3/GpNoSZUa8eNWMKqxWp2/c2g==";tag="http-message-signatures-directory";created=1750105829;expires=1750105839
78+
Cache-Control: max-age=86400
79+
{
80+
"keys": [{
81+
"kty": "OKP",
82+
"crv": "Ed25519",
83+
"x": "JrQLj5P_89iXES9-vFgrIy29clF9CC_oPPsw3c5D0bs", // Base64 URL-encoded public key, with no padding
84+
}]
85+
}
86+
```
87+
</Steps>
88+
89+
:::note
90+
This URL serves a standard JSON Web Key Set. Besides `x`, `crv`, and `kty`, you can include other standard JSON Web Key parameters. You can also publish multiple Ed25519 keys, as well as non-Ed25519 keys.
91+
92+
Cloudflare uses only Ed25519 keys and only the `kty`, `crv`, and `x` parameters, formatted as shown above. All other key types and key parameters are ignored. Do not include information that would leak your private key, such as the `d` parameter.
93+
:::
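
Before publishing, you may want to check your key directory against the rules in the note above. The following is a minimal local self-check, not Cloudflare's validator: it keeps only Ed25519 keys, reduces them to `kty`, `crv`, and `x`, and refuses any key that carries the private `d` parameter. The function name `usable_ed25519_keys` is hypothetical.

```python
import json


def usable_ed25519_keys(jwks_text: str) -> list:
    """Return the Ed25519 public keys in a JWKS, reduced to kty/crv/x."""
    jwks = json.loads(jwks_text)
    usable = []
    for key in jwks.get("keys", []):
        if "d" in key:
            # "d" is the private key parameter and must never be published.
            raise ValueError("key directory leaks a private key ('d' parameter)")
        if key.get("kty") == "OKP" and key.get("crv") == "Ed25519" and "x" in key:
            # Other parameters (for example "kid") are dropped here to mirror
            # the note: only kty, crv, and x are taken into account.
            usable.append({"kty": "OKP", "crv": "Ed25519", "x": key["x"]})
    return usable
```

An empty result means the directory contains no key that this check would consider usable for Web Bot Auth.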
94+
95+
You can use the Cloudflare-developed [`http-signature-directory` CLI tool](https://crates.io/crates/http-signature-directory) to assist you in validating your directory.
96+
97+
### 3. Register your bot and key directory
98+
99+
You need to register your bot and its key directory to add your bot to the list of verified bots.
100+
101+
<Steps>
102+
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain.
103+
2. Go to **Manage Account** > **Configurations**.
104+
3. Go to the **Verified Bots** tab.
105+
4. For **Verification Method**: select **Request Signature**.
106+
5. For **Validation Instructions**: enter the URL of your key directory. You can additionally supply the user-agent values (and their match patterns) that your bot will send.
107+
6. Select **Submit**.
108+
</Steps>
109+
110+
Cloudflare accepts all valid Ed25519 keys found in your key directory. If a key is already registered with Cloudflare, Cloudflare will work with you to supply a new key or rotate your existing key.
111+
112+
:::note[Estimated review time]
113+
The estimated review time is approximately 1 week.
114+
115+
After successful verification, you will be able to send verified requests.
116+
:::
117+
118+
### 4. (After verification) Sign your requests
119+
120+
After your bot has been successfully verified, you need to sign your bot's requests.
121+
122+
<Steps>
123+
1. Choose a set of components to sign. A component is either an HTTP header or one of the [derived components](https://www.rfc-editor.org/rfc/rfc9421#name-derived-components) defined in the HTTP Message Signatures specification. Cloudflare recommends the following:
124+
- Choose at least the `@authority` derived component, which represents the domain you are sending requests to. For example, a request to `https://example.com` will be interpreted to have an `@authority` of `example.com`.
125+
- Use components that only contain ASCII values. The HTTP Message Signatures specification disallows non-ASCII characters, so using them will cause validation of your bot's requests to fail.
126+
127+
:::note[Use components with only ASCII values]
128+
Cloudflare currently does not support the `bs` or `sf` parameters, which are designed to serialize non-ASCII values into ASCII equivalents.
129+
:::
130+
- Add a `Content-Digest` header if you wish to sign your [message content](https://www.rfc-editor.org/rfc/rfc9421#name-message-content), then specify `Content-Digest` as a component to sign.
131+
2. [Calculate the base64 URL-encoded JWK thumbprint](https://www.rfc-editor.org/rfc/rfc8037.html#appendix-A.3) associated with your Ed25519 public key registered with Cloudflare.
132+
3. Construct a [`Signature-Input` header](https://www.rfc-editor.org/rfc/rfc9421#name-the-signature-input-http-fi) over your chosen components. The header must meet the following requirements.
133+
134+
| Required component parameter | Requirement |
135+
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
136+
| `tag` | This should be equal to `web-bot-auth`. |
137+
| `alg` | This should be equal to `ed25519`. |
138+
| `keyid` | This should be equal to the thumbprint computed in step 2. |
139+
| `created` | This should be equal to a `Unix` timestamp associated with when the message was sent by your application. |
140+
| `expires` | This should be equal to a `Unix` timestamp associated with when Cloudflare should no longer attempt to verify the message. A short `expires` reduces the likelihood of replay attacks, and Cloudflare recommends choosing suitable short-lived intervals. |
141+
4. Construct a [`Signature` header](https://www.rfc-editor.org/rfc/rfc9421#name-the-signature-http-field) over your chosen components.
142+
5. Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-00.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:
143+
- The message includes a `Signature-Agent` header whose value is not an `https://` URI.
144+
- The message includes a valid URI but does not enclose it in double quotes.
145+
- The message has a valid `Signature-Agent` header, but does not include it in the component list in `Signature-Input`.
146+
6. Attach these three headers to your bot's requests.
147+
</Steps>
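
The thumbprint from step 2 can be computed with the Python standard library. This is a minimal sketch that assumes the JWK thumbprint procedure from RFC 7638 with the required Ed25519 members named in RFC 8037 (`crv`, `kty`, and `x`): serialize those members as JSON with sorted keys and no whitespace, hash the result with SHA-256, and encode the digest as base64url without padding. The function name `jwk_thumbprint` is hypothetical.

```python
import base64
import hashlib
import json


def jwk_thumbprint(jwk: dict) -> str:
    """Compute the base64url-encoded SHA-256 thumbprint of an Ed25519 JWK."""
    # Only the required members take part in the hash; extras such as
    # "kid" or "use" are ignored.
    required = {name: jwk[name] for name in ("crv", "kty", "x")}
    # Canonical form: members in lexicographic order, no whitespace.
    canonical = json.dumps(required, separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The returned string is what you would place in the `keyid` parameter of `Signature-Input`. You can check an implementation like this against the worked example in RFC 8037, Appendix A.3.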
148+
149+
An example request may look like this:
150+
151+
```txt
152+
Signature-Agent: "https://signature-agent.test"
153+
Signature-Input: sig2=("@authority" "signature-agent")
154+
;created=1735689600
155+
;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"
156+
;alg="ed25519"
157+
;expires=1735693200
158+
;nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4lAZYadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLElW1qg=="
159+
;tag="web-bot-auth"
160+
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:
161+
```
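
To show how the headers in this example fit together, the sketch below builds the `Signature-Input` value and the signature base, which is the string that gets signed according to RFC 9421, section 2.5. It is a sketch under stated assumptions, not a complete client: the function names are hypothetical, the parameter order simply mirrors the example above, and the Ed25519 signing step itself is left out because it needs a cryptography library.

```python
def build_signature_input(label, components, keyid, created, expires, nonce=None):
    """Return (signature_params, header_value) for a Signature-Input header."""
    if expires <= created:
        raise ValueError("expires must be later than created")
    # Inner list of covered components, for example ("@authority" "signature-agent").
    inner_list = "(" + " ".join(f'"{name}"' for name in components) + ")"
    params = f';created={created};keyid="{keyid}";alg="ed25519";expires={expires}'
    if nonce is not None:
        params += f';nonce="{nonce}"'
    params += ';tag="web-bot-auth"'
    signature_params = inner_list + params
    return signature_params, f"{label}={signature_params}"


def build_signature_base(component_values, signature_params):
    """Return the string to sign: one line per component, then the parameters."""
    lines = [f'"{name}": {value}' for name, value in component_values]
    lines.append(f'"@signature-params": {signature_params}')
    # Lines are joined with a single newline and there is no trailing newline.
    return "\n".join(lines)
```

For a request to `https://example.com`, `component_values` would be `[("@authority", "example.com"), ("signature-agent", '"https://signature-agent.test"')]`. The `signature-agent` value keeps its double quotes because the signature base uses the header value as it is sent. Signing the resulting string with your Ed25519 private key, base64-encoding the signature, and wrapping it in colons gives the `Signature` header value.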
162+
163+
## IP validation
164+
165+
There are two types of IP validation: public IP list and reverse DNS.
166+
167+
### Public IP List
168+
169+
To verify a bot using a public IP list, you need to provide:
170+
171+
- A fixed and limited set of IP addresses, which can be verified via publicly accessible plain-text, `JSON`, or `CSV`.
172+
- IP addresses used solely by the bot owner.
173+
- A user-agent match pattern.
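
As an illustration of how a published IP list can be consumed, the sketch below checks whether a source address falls inside a plain-text list. The file layout is an assumption (one IP address or CIDR range per line, with `#` comment lines), and the function names are hypothetical. Cloudflare's own matching logic is not described here.

```python
import ipaddress


def parse_ip_list(text: str) -> list:
    """Parse a plain-text list with one IP address or CIDR range per line."""
    networks = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        # A bare address becomes a single-host network (/32 or /128).
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def ip_in_list(ip: str, networks: list) -> bool:
    """Return True if the address belongs to any listed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

Keeping the list small and fixed, as the requirements above ask, makes this kind of check cheap and easy to audit.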
174+
175+
### Reverse DNS
176+
177+
To verify a bot using reverse DNS, you need to provide:
178+
179+
- A list of domain suffixes to validate DNS records.
180+
- IP addresses should have PTR records set correctly.
181+
- A user-agent match pattern.
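
One common way to check these records yourself is forward-confirmed reverse DNS: look up the PTR record for an IP address, confirm the hostname ends in one of your domain suffixes, then resolve that hostname and confirm it maps back to the same IP address. The forward-confirmation step is an assumption added for illustration, since the requirements above only mention PTR records and domain suffixes. The function names are hypothetical.

```python
import ipaddress
import socket


def hostname_matches_suffix(hostname: str, suffixes: list) -> bool:
    """Return True if hostname equals a suffix or is a subdomain of one."""
    host = hostname.rstrip(".").lower()
    for suffix in suffixes:
        s = suffix.strip(".").lower()
        # Require a label boundary so "evil-example.com" does not match "example.com".
        if host == s or host.endswith("." + s):
            return True
    return False


def forward_confirmed_reverse_dns(ip: str, suffixes: list) -> bool:
    """Check the PTR record, the domain suffix, and the forward lookup for ip."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname_matches_suffix(hostname, suffixes):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    target = ipaddress.ip_address(ip)
    # Each result's sockaddr starts with the address; drop any IPv6 scope ID.
    return any(
        ipaddress.ip_address(info[4][0].split("%")[0]) == target for info in infos
    )
```

Running `forward_confirmed_reverse_dns` against each address your bot uses is a quick way to spot missing or misconfigured PTR records before you apply.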
182+
183+
## Generic user-agents
184+
185+
User-agent patterns that match generic user-agents will be rejected by the Verified Bots API. When you add a user-agent pattern that is considered very common to the Verified Bot form, you may encounter an error message that will prompt you to correct the user-agent before you can submit again.
186+
187+
Generic user-agents include:
188+
189+
- `Dart`
190+
- `Go-http-client`
191+
- `GuzzleHttp`
192+
- `Google Chrome`
193+
- `Mozilla Firefox`
194+
- `Safari`
195+
- `Nessus`
196+
- `Websocket++`
197+
- `cloudflare-go`
198+
- `fasthttp`
199+
- `got`
200+
- `nginx-ssl early hints`
201+
- `node`
202+
- `node-fetch`
203+
- `okhttp`
204+
- `python-requests`
205+
- `uTorrent`
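
To catch this error before submitting the form, you could compare your pattern against the list above. This is a minimal sketch that assumes a case-insensitive exact comparison. The Verified Bots API may match differently, and the list above is not necessarily exhaustive. The function name `is_generic_user_agent` is hypothetical.

```python
# Generic user-agents taken from the list above.
GENERIC_USER_AGENTS = (
    "Dart", "Go-http-client", "GuzzleHttp", "Google Chrome",
    "Mozilla Firefox", "Safari", "Nessus", "Websocket++",
    "cloudflare-go", "fasthttp", "got", "nginx-ssl early hints",
    "node", "node-fetch", "okhttp", "python-requests", "uTorrent",
)

_GENERIC_LOWER = {ua.lower() for ua in GENERIC_USER_AGENTS}


def is_generic_user_agent(pattern: str) -> bool:
    """Return True if the pattern is one of the listed generic user-agents."""
    return pattern.strip().lower() in _GENERIC_LOWER
```

A pattern that identifies your bot specifically, such as its product name and version, should pass this check.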
206+
207+
208+
## Additional resources
209+
210+
You may wish to refer to the following resources.
211+
212+
- Cloudflare's [`web-bot-auth` library in Rust](https://crates.io/crates/web-bot-auth).
213+
- Cloudflare's [`web-bot-auth` npm package in TypeScript](https://www.npmjs.com/package/web-bot-auth).
