Commit b1e5947

Moving over content from PR #25332
1 parent 7da4629 commit b1e5947

3 files changed, +25 -4 lines changed

src/content/docs/ai-search/configuration/data-source/website.mdx

Lines changed: 23 additions & 2 deletions
@@ -5,6 +5,8 @@ sidebar:
 order: 2
 ---

+import { DashButton, Steps } from "~/components"
+
 The Website data source allows you to connect a domain you own so its pages can be crawled, stored, and indexed.

 :::note
@@ -13,11 +15,12 @@ You can only crawl domains that you have onboarded onto the same Cloudflare acco
 Refer to [Onboard a domain](/fundamentals/manage-domains/add-site/) for more information on adding a domain to your Cloudflare account.
 :::

-:::caution[Bot protection may block crawling]
-If you use Cloudflare products that control or restrict bot traffic such as [Bot Management](/bots/), [Web Application Firewall (WAF)](/waf/), or [Turnstile](/turnstile/), the same rules will apply to the AI Search (AutoRAG) crawler. Make sure to configure an exception or an allow-list for the AutoRAG crawler in your settings.
+:::caution[Bot protection may block crawling]
+If you use Cloudflare products that control or restrict bot traffic such as [Bot Management](/bots/), [Web Application Firewall (WAF)](/waf/), or [Turnstile](/turnstile/), the same rules will apply to the AI Search (AutoRAG) crawler. Make sure to configure an exception or an allow-list for the AutoRAG crawler in your settings.
 :::

 ## How website crawling works
+
 When you connect a domain, the crawler looks for your website’s sitemap to determine which pages to visit:

 1. The crawler first checks the `robots.txt` for listed sitemaps. If it exists, it reads all sitemaps existing inside.
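The discovery step above, combined with the `<priority>`-based visit order this page describes in its next hunk, can be sketched in plain JavaScript. This is a hypothetical illustration, not Cloudflare's crawler: the helper names are invented, the sitemap XML is read with simple regular expressions instead of a real XML parser, and entries without `<priority>` are assumed to take the sitemaps.org protocol default of 0.5.

```js
// Hypothetical sketch of sitemap discovery and priority ordering.
// Not Cloudflare's crawler; helper names are illustrative only.

// Collect the URLs of every `Sitemap:` directive in a robots.txt body.
function sitemapsFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim()) // strip comments and padding
    .filter((line) => /^sitemap\s*:/i.test(line)) // directive names are case-insensitive
    .map((line) => line.slice(line.indexOf(":") + 1).trim()) // keep everything after the first colon
    .filter((url) => url.length > 0);
}

// Extract <loc> and optional <priority> from each <url> entry.
// Regex-based for brevity; a production crawler would use an XML parser.
function parseSitemapEntries(xml) {
  const entries = [];
  for (const match of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(match[1]);
    if (!loc) continue; // skip entries without a location
    const priority = /<priority>\s*([0-9.]+)\s*<\/priority>/.exec(match[1]);
    entries.push({ loc: loc[1], priority: priority ? Number(priority[1]) : undefined });
  }
  return entries;
}

// Order entries by descending priority. Assumption: a missing <priority>
// counts as 0.5, the default defined by the sitemaps.org protocol.
function orderByPriority(entries) {
  const DEFAULT_PRIORITY = 0.5;
  const score = (entry) => entry.priority ?? DEFAULT_PRIORITY;
  // Array.prototype.sort is stable, so equal priorities keep sitemap order.
  return [...entries].sort((a, b) => score(b) - score(a));
}

// Usage with inline sample data (no network access).
const robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml\n";
const xml = `<urlset>
  <url><loc>https://example.com/</loc><priority>1.0</priority></url>
  <url><loc>https://example.com/blog</loc></url>
  <url><loc>https://example.com/docs</loc><priority>0.8</priority></url>
</urlset>`;
console.log(sitemapsFromRobots(robots));
console.log(orderByPriority(parseSitemapEntries(xml)).map((entry) => entry.loc));
// Visit order: https://example.com/, https://example.com/docs, https://example.com/blog
```

Fetching `robots.txt` and each sitemap is left out so the sketch stays offline; in practice the discovered sitemap URLs would be fetched and fed to `parseSitemapEntries`.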
@@ -26,6 +29,24 @@ When you connect a domain, the crawler looks for your website’s sitemap to det

 Pages are visited, according to the `<priority>` attribute set on the sitemaps, if this field is defined.

+## How to set WAF rules to allowlist AutoRAG crawler
+
+If you have Security rules configured to block bot activity, you can add a rule to allowlist AutoRAG's crawler bot.
+
+<Steps>
+1. In the Cloudflare dashboard, go to the **Security rules** page of your account and domain.
+
+<DashButton url="/?to=/:account/:zone/security/security-rules" />
+
+2. To create a new empty rule, select **Create rule** > **Custom rules**.
+3. Enter a descriptive name for the rule in **Rule name**, such as `Allow AutoRAG`.
+4. Under **When incoming requests match**, use the **Field** drop-down list to choose _Bot Detection ID_. For **Operator**, select _equals_. For **Value**, enter `122933950`.
+5. Under **Then take action**, in the **Choose action** dropdown, choose _Skip_.
+6. Under **Place at**, select the order of the rule in the **Select order** dropdown to be _First_. Setting the order as _First_ allows this rule to be applied before subsequent rules.
+7. To save and deploy your rule, select **Deploy**.
+
+</Steps>
+
 ## Parsing options
 You can choose how pages are parsed during crawling:

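The dashboard steps above create one custom rule. As a rough sketch, the same rule can be written as the kind of JSON object the Cloudflare Rulesets API accepts, followed by a toy evaluator showing why placing it _First_ matters. Treat the details as assumptions to check against the Rulesets and custom rules documentation: that the _Bot Detection ID_ field corresponds to the `cf.bot_management.detection_ids` array field, and that _Skip_ with `action_parameters: { ruleset: "current" }` skips the remaining custom rules. The evaluator and the `blockBots` rule are hypothetical simplifications, not Cloudflare's rule engine.

```js
// Hypothetical sketch: the "Allow AutoRAG" custom rule as a plain object,
// plus a toy model of ordered rule evaluation. Field and parameter names are
// assumptions to verify against the Cloudflare Rulesets API documentation.

const AUTORAG_DETECTION_ID = 122933950; // value from the dashboard steps

function buildAllowAutoRagRule() {
  return {
    description: "Allow AutoRAG",
    // Assumption: "Bot Detection ID equals <id>" maps to this array-field check.
    expression: `any(cf.bot_management.detection_ids[*] eq ${AUTORAG_DETECTION_ID})`,
    action: "skip",
    // Assumption: skip the remaining custom rules in the current ruleset.
    action_parameters: { ruleset: "current" },
    enabled: true,
  };
}

// Toy evaluator (not Cloudflare's engine): rules run in list order and the
// first matching rule decides the outcome.
function evaluateInOrder(rules, request) {
  for (const rule of rules) {
    if (!rule.matches(request)) continue;
    if (rule.action === "skip") return "skip remaining custom rules";
    if (rule.action === "block") return "block";
  }
  return "no custom rule matched";
}

// Local stand-ins for two rules; `matches` replaces the real expression.
const allowAutoRag = {
  action: "skip",
  matches: (req) => req.detectionIds.includes(AUTORAG_DETECTION_ID),
};
const blockBots = { action: "block", matches: (req) => req.isBot }; // hypothetical bot-blocking rule

const crawlerRequest = { isBot: true, detectionIds: [AUTORAG_DETECTION_ID] };
console.log(buildAllowAutoRagRule().expression);
console.log(evaluateInOrder([allowAutoRag, blockBots], crawlerRequest)); // → skip remaining custom rules
console.log(evaluateInOrder([blockBots, allowAutoRag], crawlerRequest)); // → block
```

The second call shows the failure mode the _First_ placement avoids: when a broader bot-blocking rule runs earlier, it matches the crawler before the skip rule is ever reached.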
src/content/docs/ai-search/usage/rest-api.mdx

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai-search/rags/{
 -H "Authorization: Bearer {API_TOKEN}" \
 -d '{
 "query": "How do I train a llama to deliver coffee?",
-"model": @cf/meta/llama-3.3-70b-instruct-sd,
+"model": @cf/meta/llama-3.3-70b-instruct-fp8-fast,
 "rewrite_query": false,
 "max_num_results": 10,
 "ranking_options": {
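In the request body above, the model identifier is unquoted on both the removed and the added line, which is not valid JSON; the `model` value has to be a quoted string such as `"@cf/meta/llama-3.3-70b-instruct-fp8-fast"`. A minimal JavaScript sketch that builds the same request with `JSON.stringify`, which quotes the value automatically, is shown here. The endpoint URL is a parameter because the hunk header truncates its path, `ranking_options` is omitted because its contents are cut off, and the `Content-Type` header is an assumption not visible in this hunk.

```js
// Sketch of the curl request above using fetch (Node 18+ or Workers).
// The endpoint URL is passed in because its full path is truncated here;
// `ranking_options` is omitted for the same reason.

function buildAiSearchRequest(endpointUrl, apiToken, query) {
  const body = {
    query,
    model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast", // serialized as a quoted JSON string
    rewrite_query: false,
    max_num_results: 10,
  };
  return {
    url: endpointUrl,
    init: {
      method: "POST", // curl -d sends a POST
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json", // assumption: JSON request body
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request; not called here so the example runs offline.
async function sendAiSearch(endpointUrl, apiToken, query) {
  const { url, init } = buildAiSearchRequest(endpointUrl, apiToken, query);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`AI Search request failed: HTTP ${res.status}`);
  return res.json();
}

const { init } = buildAiSearchRequest(
  "https://example.invalid/ai-search", // placeholder, not the real endpoint
  "API_TOKEN",
  "How do I train a llama to deliver coffee?",
);
console.log(init.body); // the "model" value appears in double quotes
```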

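The Workers binding page, whose diff follows, makes the same model swap inside an `aiSearch` call. As a minimal sketch, assuming a Worker with an AI binding named `AI` and an AutoRAG instance named `my-autorag` (as in that page's example), a `fetch` handler could forward a `?q=` query parameter to `aiSearch` and return the result as JSON. The `q` parameter, the 400 response, and the stub binding used for the local check are hypothetical choices; `ranking_options` is left out because its contents are truncated in the diff.

```js
// Minimal Worker sketch calling the AutoRAG binding with the new model.
// Assumes an AI binding named `AI` and an AutoRAG instance "my-autorag".
// In a deployed Worker, export this object with `export default worker;`.

const worker = {
  async fetch(request, env) {
    // Hypothetical convention: the user's question arrives as ?q=...
    const query = new URL(request.url).searchParams.get("q");
    if (!query) {
      return new Response("Missing ?q= parameter", { status: 400 });
    }
    const answer = await env.AI.autorag("my-autorag").aiSearch({
      query,
      model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
      rewrite_query: true,
      max_num_results: 2,
    });
    return new Response(JSON.stringify(answer), {
      headers: { "content-type": "application/json" },
    });
  },
};

// Local check with a stub binding that echoes its arguments (no Cloudflare account needed).
const stubEnv = {
  AI: {
    autorag: (name) => ({
      aiSearch: async (options) => ({ instance: name, options }),
    }),
  },
};

worker
  .fetch(new Request("https://example.com/?q=coffee%20llama"), stubEnv)
  .then((res) => res.json())
  .then((body) => console.log(body.instance, body.options.model));
```

The stub only mirrors the call shape; against a real binding, the returned object is whatever `aiSearch` produces for your AutoRAG instance.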
src/content/docs/ai-search/usage/workers-binding.mdx

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ This method searches for relevant results from your data source and generates a
 ```js
 const answer = await env.AI.autorag("my-autorag").aiSearch({
 query: "How do I train a llama to deliver coffee?",
-model: "@cf/meta/llama-3.3-70b-instruct-sd",
+model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
 rewrite_query: true,
 max_num_results: 2,
 ranking_options: {
