From 949eff2db9800054f9428c9a372fef8d858c84c5 Mon Sep 17 00:00:00 2001 From: Pedro Sousa <680496+pedrosousa@users.noreply.github.com> Date: Tue, 4 Nov 2025 18:12:19 +0000 Subject: [PATCH 1/5] [WAF] Update Firewall for AI --- .../endpoint-labels.mdx | 206 ++++++++++-------- .../docs/waf/detections/firewall-for-ai.mdx | 202 ----------------- .../firewall-for-ai/example-rules.mdx | 53 +++++ .../waf/detections/firewall-for-ai/fields.mdx | 29 +++ .../firewall-for-ai/get-started.mdx | 176 +++++++++++++++ .../waf/detections/firewall-for-ai/index.mdx | 42 ++++ .../partials/api-shield/labels-add.mdx | 27 +++ 7 files changed, 443 insertions(+), 292 deletions(-) delete mode 100644 src/content/docs/waf/detections/firewall-for-ai.mdx create mode 100644 src/content/docs/waf/detections/firewall-for-ai/example-rules.mdx create mode 100644 src/content/docs/waf/detections/firewall-for-ai/fields.mdx create mode 100644 src/content/docs/waf/detections/firewall-for-ai/get-started.mdx create mode 100644 src/content/docs/waf/detections/firewall-for-ai/index.mdx create mode 100644 src/content/partials/api-shield/labels-add.mdx diff --git a/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx b/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx index 28c7a8d75ae9804..a073ca89d85720c 100644 --- a/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx +++ b/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx @@ -86,107 +86,133 @@ Cloudflare will only add authentication labels to endpoints with successful resp ## Create a label - - - 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. - 2. Go to **Security** > **Settings** > **Labels**. - 3. Under **Security labels**, select **Create label**. - 4. Name the label and add an optional label description. - 5. Apply the label to your selected endpoints. - 6. Select **Create label**. - - - Alternatively, you can create a user-defined label via Endpoint Management in API Shield: - - - 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. - 2. Go to **Security** > **Settings** > **Labels**. - 3. Choose the endpoint that you want to label. - 4. Select **Edit labels**. - 5. Under **User**, select **Create user label**. - 6. Enter the label name. - 7. Select **Create**. - - - - - 1. In the Cloudflare dashboard, go to the **Security Settings** page. - - - 2. Filter by **API abuse**. - 3. Under **Endpoint labels**, select **Manage labels**. - 4. Name the label and add an optional label description. - 5. Apply the label to your selected endpoints. - 6. Select **Create label**. - - Alternatively, you can create a user-defined label via **Security** > **Web Assets**. - - 1. In the Cloudflare dashboard, go to the **Web Assets** page. - - - 2. Go to the **Endpoints** tab. - 3. Choose the endpoint that you want to label. - 4. Select **Edit endpoint labels**. - 5. Under **User**, select **Create user label**. - 6. Enter the label name. - 7. Select **Create**. - - + + + + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. +2. Go to **Security** > **Settings** > **Labels**. +3. Under **Security labels**, select **Create label**. +4. Name the label and add an optional label description. +5. Apply the label to your selected endpoints. +6. Select **Create label**. 
+ + + +Alternatively, you can create a user-defined label via Endpoint Management in API Shield: + + + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. +2. Go to **Security** > **Settings** > **Labels**. +3. Choose the endpoint that you want to label. +4. Select **Edit labels**. +5. Under **User**, select **Create user label**. +6. Enter the label name. +7. Select **Create**. + + + + + + + + +1. In the Cloudflare dashboard, go to the **Security Settings** page. + + + +2. Filter by **API abuse**. +3. Under **Endpoint labels**, select **Manage labels**. +4. Name the label and add an optional label description. +5. Apply the label to your selected endpoints. +6. Select **Create label**. + + + +Alternatively, you can create a user-defined label via **Security** > **Web Assets**. + + + +1. In the Cloudflare dashboard, go to the **Web Assets** page. + + + +2. Go to the **Endpoints** tab. +3. Choose the endpoint that you want to label. +4. Select **Edit endpoint labels**. +5. Under **User**, select **Create user label**. +6. Enter the label name. +7. Select **Create**. + + + + ## Apply a label to an individual endpoint - - - 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. - 2. Go to **Security** > **API Shield** > **Endpoint Management**. - 3. Choose the endpoint that you want to label. - 4. Select **Edit labels**. - 5. Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels. - 6. Select **Save labels**. - - - - - 1. In the Cloudflare dashboard, go to the **Web Assets** page. - - - 2. Go to the **Endpoints** tab. - 3. Choose the endpoint that you want to label. - 4. Select **Edit endpoint labels**. - 5. Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels. - 6. Select **Save labels**. - - + + + + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. +2. Go to **Security** > **API Shield** > **Endpoint Management**. +3. Choose the endpoint that you want to label. +4. Select **Edit labels**. +5. Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels. +6. Select **Save labels**. + + + + + + + + + + + + + ## Bulk apply labels to multiple endpoints - - - 1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. - 2. Go to **Security** > **Settings** > **Labels**. - 3. On the existing label that you want to apply to multiple endpoints, select **Bulk apply**. - 4. Choose the endpoints that you want to label by selecting its checkbox. - 5. Select **Save label**. - - - - - 1. In the Cloudflare dashboard, go to the **Security Settings** page. - - - 2. Filter by **API abuse**. - 3. On **Endpoint labels**, select **Manage labels**. - 4. On the existing label that you want to apply to multiple endpoints, select **Bulk apply**. - 5. Choose the endpoints that you want to label by selecting its checkbox. - 6. Select **Apply label**. - - + + + + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. +2. Go to **Security** > **Settings** > **Labels**. +3. On the existing label that you want to apply to multiple endpoints, select **Bulk apply**. +4. Choose the endpoints that you want to label by selecting its checkbox. +5. Select **Save label**. + + + + + + + + +1. 
In the Cloudflare dashboard, go to the **Security Settings** page. + + + +2. Filter by **API abuse**. +3. On **Endpoint labels**, select **Manage labels**. +4. On the existing label that you want to apply to multiple endpoints, select **Bulk apply**. +5. Choose the endpoints that you want to label by selecting its checkbox. +6. Select **Apply label**. + + + + ## Availability -Endpoint labeling is available to all customers. \ No newline at end of file +Endpoint labeling is available to all customers. diff --git a/src/content/docs/waf/detections/firewall-for-ai.mdx b/src/content/docs/waf/detections/firewall-for-ai.mdx deleted file mode 100644 index 0a40dfcc068e5af..000000000000000 --- a/src/content/docs/waf/detections/firewall-for-ai.mdx +++ /dev/null @@ -1,202 +0,0 @@ ---- -pcx_content_type: concept -title: Firewall for AI (beta) -tags: - - AI -sidebar: - order: 5 - label: Firewall for AI - badge: - text: Beta ---- - -import { - GlossaryTooltip, - Tabs, - TabItem, - Details, - Steps, - Type, - DashButton, -} from "~/components"; - -Firewall for AI is a detection that can help protect your services powered by large language models (LLMs) against abuse. This model-agnostic detection currently helps you do the following: - -- Prevent data leaks of personally identifiable information (PII) — for example, phone numbers, email addresses, social security numbers, and credit card numbers. -- Detect and moderate unsafe or harmful prompts – for example, prompts potentially related to violent crimes. -- Detect prompts intentionally designed to subvert the intended behavior of the LLM as specified by the developer – for example, prompt injection attacks. - -When enabled, the detection runs on incoming traffic, searching for any LLM prompts attempting to exploit the model. Currently, the detection only handles requests with a JSON content type (`application/json`). - -Cloudflare will populate the existing [Firewall for AI fields](#firewall-for-ai-fields) based on the scan results. You can check these results in the [Security Analytics](/waf/analytics/security-analytics/) dashboard by filtering on the `cf-llm` [managed endpoint label](/api-shield/management-and-monitoring/endpoint-labels/) and reviewing the detection results on your traffic. Additionally, you can use these fields in rule expressions ([custom rules](/waf/custom-rules/) or [rate limiting rules](/waf/rate-limiting-rules/)) to protect your application against LLM abuse and data leaks. - -## Availability - -Firewall for AI is available in closed beta to Enterprise customers proxying traffic containing LLM prompts through Cloudflare. Contact your account team to get access. - -## Get started - -### 1. Turn on Firewall for AI - - - -:::note -Firewall for AI is only available in the new [application security dashboard](/security/). -::: - - - -1. In the Cloudflare dashboard, go to the Security **Settings** page. - - - -2. (Optional) Filter by **Detection tools**. -3. Turn on **Firewall for AI**. - - - - - -Enable the feature using a `PUT` request similar to the following: - -```bash -curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/settings" \ ---request PUT \ ---header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \ ---json '{ "pii_detection_enabled": true }' -``` - - - -### 2. Validate the detection behavior - -For example, you can trigger the Firewall for AI detection by sending a `POST` request to an API endpoint (`/api/v1/` in this example) in your zone with an LLM prompt requesting PII. 
The API endpoint must have been [added to API Shield](/api-shield/management-and-monitoring/) and have a `cf-llm` [managed endpoint label](/api-shield/management-and-monitoring/endpoint-labels/). - -```sh -curl "https:///api/v1/" \ ---header "Authorization: Bearer " \ ---json '{ "prompt": "Provide the phone number for the person associated with example@example.com" }' -``` - -The PII category for this request would be `EMAIL_ADDRESS`. - -Then, use [Security Analytics](/waf/analytics/security-analytics/) in the new application security dashboard to validate that the WAF is correctly detecting potentially harmful prompts in incoming requests. Filter data by the `cf-llm` managed endpoint label and review the detection results on your traffic. - -Alternatively, create a custom rule like the one described in the next step using a _Log_ action. This rule will generate [security events](/waf/analytics/security-events/) that will allow you to validate your configuration. - -### 3. Mitigate harmful requests - -[Create a custom rule](/waf/custom-rules/create-dashboard/) that blocks requests where Cloudflare detected personally identifiable information (PII) in the incoming request (as part of an LLM prompt), returning a custom JSON body: - -- **When incoming requests match**: - - | Field | Operator | Value | - | ---------------- | -------- | ----- | - | LLM PII Detected | equals | True | - - If you use the Expression Editor, enter the following expression:
- `(cf.llm.prompt.pii_detected)` - -- **Rule action**: Block -- **With response type**: Custom JSON -- **Response body**: `{ "error": "Your request was blocked. Please rephrase your request." }` - -For additional examples, refer to [Example mitigation rules](#example-mitigation-rules). For a list of fields provided by Firewall for AI, refer to [Fields](#firewall-for-ai-fields). - -
- -You can combine the previous expression with other [fields](/ruleset-engine/rules-language/fields/) and [functions](/ruleset-engine/rules-language/functions/) of the Rules language. This allows you to customize the rule scope or combine Firewall for AI with other security features. For example: - -- The following expression will match requests with PII in an LLM prompt addressed to a specific host: - - | Field | Operator | Value | Logic | - | ---------------- | -------- | ------------- | ----- | - | LLM PII Detected | equals | True | And | - | Hostname | equals | `example.com` | | - - Expression when using the editor:
- `(cf.llm.prompt.pii_detected and http.host == "example.com")` - -- The following expression will match requests coming from bots that include PII in an LLM prompt: - - | Field | Operator | Value | Logic | - | ---------------- | --------- | ----- | ----- | - | LLM PII Detected | equals | True | And | - | Bot Score | less than | `10` | | - - Expression when using the editor:
- `(cf.llm.prompt.pii_detected and cf.bot_management.score lt 10)` - -
- -## Firewall for AI fields - -When enabled, Firewall for AI populates the following fields: - -| Field | Description | -| ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| LLM PII detected
[`cf.llm.prompt.pii_detected`][1]
| Indicates whether any personally identifiable information (PII) has been detected in the LLM prompt included in the request. | -| LLM PII categories
[`cf.llm.prompt.pii_categories`][2]
| Array of string values with the personally identifiable information (PII) categories found in the LLM prompt included in the request.
[Category list](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/) | -| LLM Content detected
[`cf.llm.prompt.detected`][3]
| Indicates whether Cloudflare detected an LLM prompt in the incoming request. | -| LLM Unsafe topic detected
[`cf.llm.prompt.unsafe_topic_detected`][4]
| Indicates whether the incoming request includes any unsafe topic category in the LLM prompt. | -| LLM Unsafe topic categories
[`cf.llm.prompt.unsafe_topic_categories`][5]
| Array of string values with the type of unsafe topics detected in the LLM prompt.
[Category list](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/) | -| LLM Injection score
[`cf.llm.prompt.injection_score`][6]
| A score from 1–99 that represents the likelihood that the LLM prompt in the request is trying to perform a prompt injection attack. | - -[1]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_detected/ -[2]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/ -[3]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.detected/ -[4]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_detected/ -[5]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/ -[6]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.injection_score/ - -## Example mitigation rules - -### Block requests with specific PII category in LLM prompt - -The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt that tries to obtain PII of a specific [category](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/): - -- **When incoming requests match**: - - | Field | Operator | Value | - | ------------------ | -------- | ------------- | - | LLM PII Categories | is in | `Credit Card` | - - If you use the Expression Editor, enter the following expression:
- `(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD"}))` - -- **Action**: _Block_ - -### Block requests with specific unsafe content categories in LLM prompt - -The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt containing unsafe content of specific [categories](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/): - -- **When incoming requests match**: - - | Field | Operator | Value | - | --------------------------- | -------- | -------------------------------- | - | LLM Unsafe topic categories | is in | `S1: Violent Crimes` `S10: Hate` | - - If you use the Expression Editor, enter the following expression:
- `(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))` - -- **Action**: _Block_ - -### Block requests with prompt injection attempt in LLM prompt - -The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an [injection score](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.injection_score/) below `20`. Using a low injection score value in the rule helps avoid false positives. - -- **When incoming requests match**: - - | Field | Operator | Value | - | ------------------- | --------- | ----- | - | LLM Injection score | less than | `20` | - - If you use the Expression Editor, enter the following expression:
- `(cf.llm.prompt.injection_score < 20)` - -- **Action**: _Block_ - -## More resources - -- [Cloudflare AI Gateway](/ai-gateway/) -- [Learning Center: What are the OWASP Top 10 risks for LLMs?](https://www.cloudflare.com/learning/ai/owasp-top-10-risks-for-llms/) diff --git a/src/content/docs/waf/detections/firewall-for-ai/example-rules.mdx b/src/content/docs/waf/detections/firewall-for-ai/example-rules.mdx new file mode 100644 index 000000000000000..a23b31ace97065e --- /dev/null +++ b/src/content/docs/waf/detections/firewall-for-ai/example-rules.mdx @@ -0,0 +1,53 @@ +--- +pcx_content_type: configuration +title: Example mitigation rules +tags: + - AI +sidebar: + order: 5 +--- + +## Block requests with specific PII category in LLM prompt + +The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt that tries to obtain PII of a specific [category](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/): + +- **When incoming requests match**: + + | Field | Operator | Value | + | ------------------ | -------- | ------------- | + | LLM PII Categories | is in | `Credit Card` | + + If you use the Expression Editor, enter the following expression:
+ `(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD"}))` + +- **Action**: _Block_ + +## Block requests with specific unsafe content categories in LLM prompt + +The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt containing unsafe content of specific [categories](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/): + +- **When incoming requests match**: + + | Field | Operator | Value | + | --------------------------- | -------- | -------------------------------- | + | LLM Unsafe topic categories | is in | `S1: Violent Crimes` `S10: Hate` | + + If you use the Expression Editor, enter the following expression:
+ `(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))` + +- **Action**: _Block_ + +## Block requests with prompt injection attempt in LLM prompt + +The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an [injection score](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.injection_score/) below `20`. Using a low injection score value in the rule helps avoid false positives. + +- **When incoming requests match**: + + | Field | Operator | Value | + | ------------------- | --------- | ----- | + | LLM Injection score | less than | `20` | + + If you use the Expression Editor, enter the following expression:
+ `(cf.llm.prompt.injection_score < 20)` + +- **Action**: _Block_ diff --git a/src/content/docs/waf/detections/firewall-for-ai/fields.mdx b/src/content/docs/waf/detections/firewall-for-ai/fields.mdx new file mode 100644 index 000000000000000..c28eced39344be8 --- /dev/null +++ b/src/content/docs/waf/detections/firewall-for-ai/fields.mdx @@ -0,0 +1,29 @@ +--- +pcx_content_type: reference +title: Firewall for AI fields +tags: + - AI +sidebar: + order: 4 + label: Available fields +--- + +import { Type } from "~/components"; + +When enabled, Firewall for AI populates the following fields: + +| Field | Description | +| ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| LLM PII detected
[`cf.llm.prompt.pii_detected`][1]
| Indicates whether any personally identifiable information (PII) has been detected in the LLM prompt included in the request. | +| LLM PII categories
[`cf.llm.prompt.pii_categories`][2]
| Array of string values with the personally identifiable information (PII) categories found in the LLM prompt included in the request.
[Category list](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/) | +| LLM Content detected
[`cf.llm.prompt.detected`][3]
| Indicates whether Cloudflare detected an LLM prompt in the incoming request. | +| LLM Unsafe topic detected
[`cf.llm.prompt.unsafe_topic_detected`][4]
| Indicates whether the incoming request includes any unsafe topic category in the LLM prompt. | +| LLM Unsafe topic categories
[`cf.llm.prompt.unsafe_topic_categories`][5]
| Array of string values with the type of unsafe topics detected in the LLM prompt.
[Category list](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/) | +| LLM Injection score
[`cf.llm.prompt.injection_score`][6]
| A score from 1–99 that represents the likelihood that the LLM prompt in the request is trying to perform a prompt injection attack. | + +[1]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_detected/ +[2]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/ +[3]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.detected/ +[4]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_detected/ +[5]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/ +[6]: /ruleset-engine/rules-language/fields/reference/cf.llm.prompt.injection_score/ diff --git a/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx new file mode 100644 index 000000000000000..af60ac377282483 --- /dev/null +++ b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx @@ -0,0 +1,176 @@ +--- +pcx_content_type: get-started +title: Get started with Firewall for AI +tags: + - AI +sidebar: + order: 2 + label: Get started +--- + +import { + Tabs, + TabItem, + Details, + Steps, + DashButton, + Render, +} from "~/components"; + +## 1. Turn on Firewall for AI + + + +:::note +Firewall for AI is only available in the new [application security dashboard](/security/). +::: + + + +1. In the Cloudflare dashboard, go to the Security **Settings** page. + + + +2. (Optional) Filter by **Detection tools**. +3. Turn on **Firewall for AI**. + + + + + +Enable the feature using a `PUT` request similar to the following: + +```bash +curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/settings" \ +--request PUT \ +--header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \ +--json '{ "pii_detection_enabled": true }' +``` + + + +## 2. Save or add an LLM-related endpoint + +Once you have [onboarded your domain](/fundamentals/manage-domains/add-site/) to Cloudflare and some API traffic has already been [proxied by Cloudflare](/dns/proxy-status/), the Cloudflare dashboard will start showing [discovered endpoints](/api-shield/security/api-discovery/). + +Save the relevant endpoint receiving LLM-related traffic to [Endpoint Management](/api-shield/management-and-monitoring/endpoint-management/) once it has been discovered, or add the endpoint manually. + +1. In the Cloudflare dashboard, go to the **Web assets** page. + + + +2. Go to the **Discovery** tab. +3. Find the endpoint receiving requests with LLM prompts in the list and select **Save** next to the endpoint. + +If you did not find the endpoint in the **Discovery** tab, you can add it manually: + +1. Go to the **Endpoints** tab. +2. Select **Add endpoints** > **Manually add**. +3. Choose the method from the dropdown menu and add the path and hostname for the endpoint. +4. Select **Add endpoints**. + +In the context of this guide, consider an example endpoint with the following properties: + +- Method: `POST` +- Path: `/v1/messages` +- Hostname: `` + +## 3. Add `cf-llm` label to endpoint + +You must [label endpoints](/api-shield/management-and-monitoring/endpoint-labels/) with the `cf-llm` label so that Firewall for AI starts scanning incoming requests for malicious LLM prompts. + +Add the `cf-llm` label to the endpoint you added: + + + +Once you add a label to the endpoint, Cloudflare will start labeling incoming traffic for the endpoint with the label you selected. + +## 4. 
(Optional) Generate API traffic + +You may need to issue some `POST` requests to the endpoint so that there is some labeled traffic to analyze in this step. + +For example, send a `POST` request to the API endpoint you previously added (`/v1/messages` in this example) in your zone with an LLM prompt requesting PII: + +```sh +curl "https:///v1/messages" \ +--header "Authorization: Bearer " \ +--json '{ "prompt": "Provide the phone number for the person associated with example@example.com" }' +``` + +The PII category for this request would be `EMAIL_ADDRESS`. + +## 5. Review labeled traffic and detection behavior + +Use [Security Analytics](/waf/analytics/security-analytics/) in the new application security dashboard to validate that the WAF is correctly labeling traffic for the endpoint. + +1. In the Cloudflare dashboard, go to the **Analytics** page. + + + +2. Filter data by the `cf-llm` managed endpoint label. + + | Field | Operator | Value | + | ---------------------- | -------- | -------- | + | Managed Endpoint Label | equals | `cf-llm` | + +3. Review the detection results on your traffic. Expand each line in **Sampled logs** and check the values in the **Analyses** column. Most of the incoming traffic will probably be clean (not harmful). + +4. Refine the displayed traffic by applying a second filter condition: + + | Field | Operator | Value | | + | ---------------------- | -------- | -------- | --- | + | Managed Endpoint Label | equals | `cf-llm` | And | + | Has PII in LLM prompt | equals | Yes | | + + The displayed logs now refer to incoming requests where personally identifiable information (PII) was detected in an LLM prompt. + +Alternatively, you can also create a custom rule with a _Log_ action (only available on Enterprise plans) to check for potentially harmful traffic related to LLM prompts. This rule will generate [security events](/waf/analytics/security-events/) that will allow you to validate your Firewall For AI configuration. + +## 6. Mitigate harmful requests + +[Create a custom rule](/waf/custom-rules/create-dashboard/) that blocks requests where Cloudflare detected personally identifiable information (PII) in the incoming request (as part of an LLM prompt), returning a custom JSON body: + +- **When incoming requests match**: + + | Field | Operator | Value | + | ---------------- | -------- | ----- | + | LLM PII Detected | equals | True | + + If you use the Expression Editor, enter the following expression:
+ `(cf.llm.prompt.pii_detected)` + +- **Rule action**: Block +- **With response type**: Custom JSON +- **Response body**: `{ "error": "Your request was blocked. Please rephrase your request." }` + +For additional examples, refer to [Example mitigation rules](/waf/detections/firewall-for-ai/example-rules/). For a list of fields provided by Firewall for AI, refer to [Firewall for AI fields](/waf/detections/firewall-for-ai/fields/). + +
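+If you prefer to manage rules through the [Rulesets API](/ruleset-engine/rulesets-api/), a request along the following lines should add an equivalent rule to your zone's custom rules ruleset. This is a minimal sketch, not a definitive call: it assumes `$RULESET_ID` is the ID of the zone entrypoint ruleset for the `http_request_firewall_custom` phase (which you can look up with a `GET` request to `/zones/$ZONE_ID/rulesets/phases/http_request_firewall_custom/entrypoint`), and the `403` status code is only an example:
+
+```bash
+# Assumption: $RULESET_ID identifies the zone's http_request_firewall_custom entrypoint ruleset.
+curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/$RULESET_ID/rules" \
+--request POST \
+--header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
+--json '{
+  "description": "Block requests with PII detected in an LLM prompt",
+  "expression": "(cf.llm.prompt.pii_detected)",
+  "action": "block",
+  "action_parameters": {
+    "response": {
+      "status_code": 403,
+      "content_type": "application/json",
+      "content": "{ \"error\": \"Your request was blocked. Please rephrase your request.\" }"
+    }
+  }
+}'
+```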
+ +You can combine the previous expression with other [fields](/ruleset-engine/rules-language/fields/) and [functions](/ruleset-engine/rules-language/functions/) of the Rules language. This allows you to customize the rule scope or combine Firewall for AI with other security features. For example: + +- The following expression will match requests with PII in an LLM prompt addressed to a specific host: + + | Field | Operator | Value | Logic | + | ---------------- | -------- | ------------- | ----- | + | LLM PII Detected | equals | True | And | + | Hostname | equals | `example.com` | | + + Expression when using the editor:
+ `(cf.llm.prompt.pii_detected and http.host == "example.com")` + +- The following expression will match requests coming from bots that include PII in an LLM prompt: + + | Field | Operator | Value | Logic | + | ---------------- | --------- | ----- | ----- | + | LLM PII Detected | equals | True | And | + | Bot Score | less than | `10` | | + + Expression when using the editor:
+ `(cf.llm.prompt.pii_detected and cf.bot_management.score lt 10)` + +
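+- You can also use these fields in [rate limiting rules](/waf/rate-limiting-rules/). For example, a rate limiting rule could count only requests whose LLM prompt has a low injection score (more likely to be a prompt injection attempt). The `50` threshold below is only an illustrative value to tune for your traffic:
+
+  Expression when using the editor:
+  `(cf.llm.prompt.detected and cf.llm.prompt.injection_score lt 50)`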
diff --git a/src/content/docs/waf/detections/firewall-for-ai/index.mdx b/src/content/docs/waf/detections/firewall-for-ai/index.mdx new file mode 100644 index 000000000000000..4bf0cb728e4a71d --- /dev/null +++ b/src/content/docs/waf/detections/firewall-for-ai/index.mdx @@ -0,0 +1,42 @@ +--- +pcx_content_type: concept +title: Firewall for AI (beta) +tags: + - AI +sidebar: + order: 5 + group: + label: Firewall for AI + badge: + text: Beta +--- + +import { + GlossaryTooltip, + Tabs, + TabItem, + Details, + Steps, + Type, + DashButton, + Render, +} from "~/components"; + +Firewall for AI is a detection that can help protect your services powered by large language models (LLMs) against abuse. This model-agnostic detection currently helps you do the following: + +- Prevent data leaks of personally identifiable information (PII) — for example, phone numbers, email addresses, social security numbers, and credit card numbers. +- Detect and moderate unsafe or harmful prompts – for example, prompts potentially related to violent crimes. +- Detect prompts intentionally designed to subvert the intended behavior of the LLM as specified by the developer – for example, prompt injection attacks. + +When enabled, the detection runs on incoming traffic, searching for any LLM prompts attempting to exploit the model. Currently, the detection only handles requests with a JSON content type (`application/json`). + +Cloudflare will populate the existing [Firewall for AI fields](/waf/detections/firewall-for-ai/fields/) based on the scan results. You can check these results in the [Security Analytics](/waf/analytics/security-analytics/) dashboard by filtering on the `cf-llm` [managed endpoint label](/api-shield/management-and-monitoring/endpoint-labels/) and reviewing the detection results on your traffic. Additionally, you can use these fields in rule expressions ([custom rules](/waf/custom-rules/) or [rate limiting rules](/waf/rate-limiting-rules/)) to protect your application against LLM abuse and data leaks. + +## Availability + +Firewall for AI is available in closed beta to Enterprise customers proxying traffic containing LLM prompts through Cloudflare. Contact your account team to get access. + +## More resources + +- [Cloudflare AI Gateway](/ai-gateway/) +- [Learning Center: What are the OWASP Top 10 risks for LLMs?](https://www.cloudflare.com/learning/ai/owasp-top-10-risks-for-llms/) diff --git a/src/content/partials/api-shield/labels-add.mdx b/src/content/partials/api-shield/labels-add.mdx new file mode 100644 index 000000000000000..27708c14ef5a08a --- /dev/null +++ b/src/content/partials/api-shield/labels-add.mdx @@ -0,0 +1,27 @@ +--- +params: + - labelName? +--- + +import { DashButton, Markdown } from "~/components"; + +{/* prettier-ignore-start */} + +1. In the Cloudflare dashboard, go to the **Web assets** page. + + +2. Go to the **Endpoints** tab. +3. Choose the endpoint that you want to label. +4. Select **Edit endpoint labels**. +5. { props.labelName ? ( + <> +

Add the {props.labelName} label to the endpoint.

+ + ) : ( + <> +

Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels.

+ + )} +6. Select **Save labels**. + +{/* prettier-ignore-end */} From b1730a8fdb7161ec1d2aefb5641c9d7a9cda38cf Mon Sep 17 00:00:00 2001 From: Pedro Sousa <680496+pedrosousa@users.noreply.github.com> Date: Wed, 5 Nov 2025 10:22:05 +0000 Subject: [PATCH 2/5] Move content to partial --- .../endpoint-labels.mdx | 7 +------ .../api-shield/labels-add-old-nav.mdx | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 6 deletions(-) create mode 100644 src/content/partials/api-shield/labels-add-old-nav.mdx diff --git a/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx b/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx index a073ca89d85720c..4ce56cf2ebdc18c 100644 --- a/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx +++ b/src/content/docs/api-shield/management-and-monitoring/endpoint-labels.mdx @@ -157,12 +157,7 @@ Alternatively, you can create a user-defined label via **Security** > **Web Asse -1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. -2. Go to **Security** > **API Shield** > **Endpoint Management**. -3. Choose the endpoint that you want to label. -4. Select **Edit labels**. -5. Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels. -6. Select **Save labels**. + diff --git a/src/content/partials/api-shield/labels-add-old-nav.mdx b/src/content/partials/api-shield/labels-add-old-nav.mdx new file mode 100644 index 000000000000000..c321d53f172ff70 --- /dev/null +++ b/src/content/partials/api-shield/labels-add-old-nav.mdx @@ -0,0 +1,19 @@ +--- +params: + - labelName? +--- + +import { Markdown } from "~/components"; + +{/* prettier-ignore-start */} + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account and domain. +2. Go to **Security** > **API Shield**. +3. In the **Endpoint Management** tab, choose the endpoint that you want to label. +4. Select **Edit labels**. +5. { props.labelName ? + : + "Add the label(s) that you want to use for the endpoint from the list of managed and user-defined labels." } +6. Select **Save labels**. + +{/* prettier-ignore-end */} From b3761f3ed286d34080567808bb31eb1790ad7ae0 Mon Sep 17 00:00:00 2001 From: Pedro Sousa <680496+pedrosousa@users.noreply.github.com> Date: Wed, 5 Nov 2025 10:22:30 +0000 Subject: [PATCH 3/5] Small updates to instructions --- src/content/partials/api-shield/labels-add.mdx | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/content/partials/api-shield/labels-add.mdx b/src/content/partials/api-shield/labels-add.mdx index 27708c14ef5a08a..3bd5650ddeb413f 100644 --- a/src/content/partials/api-shield/labels-add.mdx +++ b/src/content/partials/api-shield/labels-add.mdx @@ -3,15 +3,14 @@ params: - labelName? --- -import { DashButton, Markdown } from "~/components"; +import { DashButton } from "~/components"; {/* prettier-ignore-start */} 1. In the Cloudflare dashboard, go to the **Web assets** page. -2. Go to the **Endpoints** tab. -3. Choose the endpoint that you want to label. +2. In the **Endpoints** tab, choose the endpoint that you want to label. 4. Select **Edit endpoint labels**. 5. { props.labelName ? 
( <> From 925a35d9eeee9a0876489734c8b8812a040c0dd0 Mon Sep 17 00:00:00 2001 From: Pedro Sousa <680496+pedrosousa@users.noreply.github.com> Date: Wed, 5 Nov 2025 10:22:57 +0000 Subject: [PATCH 4/5] Add instructions for old nav --- .../firewall-for-ai/get-started.mdx | 64 ++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx index af60ac377282483..8592a4cd2972bbf 100644 --- a/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx +++ b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx @@ -55,6 +55,21 @@ Once you have [onboarded your domain](/fundamentals/manage-domains/add-site/) to Save the relevant endpoint receiving LLM-related traffic to [Endpoint Management](/api-shield/management-and-monitoring/endpoint-management/) once it has been discovered, or add the endpoint manually. + + + + +1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/login), and select your account and domain. +2. Go to **Security** > **API Shield**. +3. Go to the **Discovery** tab. +4. Find the endpoint receiving requests with LLM prompts in the list and select **Save** next to the endpoint. + + + + + + + 1. In the Cloudflare dashboard, go to the **Web assets** page. @@ -62,13 +77,36 @@ Save the relevant endpoint receiving LLM-related traffic to [Endpoint Management 2. Go to the **Discovery** tab. 3. Find the endpoint receiving requests with LLM prompts in the list and select **Save** next to the endpoint. + + + + If you did not find the endpoint in the **Discovery** tab, you can add it manually: + + + + +1. Go to the **Endpoint Management** tab. +2. Select **Add endpoints** > **Manually add**. +3. Choose the method from the dropdown menu and add the path and hostname for the endpoint. +4. Select **Add endpoints**. + + + + + + + 1. Go to the **Endpoints** tab. 2. Select **Add endpoints** > **Manually add**. 3. Choose the method from the dropdown menu and add the path and hostname for the endpoint. 4. Select **Add endpoints**. + + + + In the context of this guide, consider an example endpoint with the following properties: - Method: `POST` @@ -81,17 +119,37 @@ You must [label endpoints](/api-shield/management-and-monitoring/endpoint-labels Add the `cf-llm` label to the endpoint you added: + + + + + + + + + + + + + + + + Once you add a label to the endpoint, Cloudflare will start labeling incoming traffic for the endpoint with the label you selected. ## 4. (Optional) Generate API traffic -You may need to issue some `POST` requests to the endpoint so that there is some labeled traffic to analyze in this step. +You may need to issue some `POST` requests to the endpoint so that there is some labeled traffic to review in the following step. For example, send a `POST` request to the API endpoint you previously added (`/v1/messages` in this example) in your zone with an LLM prompt requesting PII: @@ -107,6 +165,8 @@ The PII category for this request would be `EMAIL_ADDRESS`. Use [Security Analytics](/waf/analytics/security-analytics/) in the new application security dashboard to validate that the WAF is correctly labeling traffic for the endpoint. + + 1. In the Cloudflare dashboard, go to the **Analytics** page. 
@@ -128,6 +188,8 @@ Use [Security Analytics](/waf/analytics/security-analytics/) in the new applicat The displayed logs now refer to incoming requests where personally identifiable information (PII) was detected in an LLM prompt. + + Alternatively, you can also create a custom rule with a _Log_ action (only available on Enterprise plans) to check for potentially harmful traffic related to LLM prompts. This rule will generate [security events](/waf/analytics/security-events/) that will allow you to validate your Firewall For AI configuration. ## 6. Mitigate harmful requests From 2ce2ccea0fa2472131eb45277fc59c784b9da4c7 Mon Sep 17 00:00:00 2001 From: Pedro Sousa <680496+pedrosousa@users.noreply.github.com> Date: Wed, 5 Nov 2025 14:12:50 +0000 Subject: [PATCH 5/5] Small update to the text --- .../docs/waf/detections/firewall-for-ai/get-started.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx index 8592a4cd2972bbf..73439e694d2dbf9 100644 --- a/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx +++ b/src/content/docs/waf/detections/firewall-for-ai/get-started.mdx @@ -151,7 +151,7 @@ Once you add a label to the endpoint, Cloudflare will start labeling incoming tr You may need to issue some `POST` requests to the endpoint so that there is some labeled traffic to review in the following step. -For example, send a `POST` request to the API endpoint you previously added (`/v1/messages` in this example) in your zone with an LLM prompt requesting PII: +For example, the following command sends a `POST` request to the API endpoint you previously added (`/v1/messages` in this example) in your zone with an LLM prompt requesting PII: ```sh curl "https:///v1/messages" \ @@ -163,7 +163,7 @@ The PII category for this request would be `EMAIL_ADDRESS`. ## 5. Review labeled traffic and detection behavior -Use [Security Analytics](/waf/analytics/security-analytics/) in the new application security dashboard to validate that the WAF is correctly labeling traffic for the endpoint. +Use [Security Analytics](/waf/analytics/security-analytics/) in the new application security dashboard to validate that Cloudflare is correctly labeling traffic for the endpoint. @@ -190,7 +190,7 @@ Use [Security Analytics](/waf/analytics/security-analytics/) in the new applicat -Alternatively, you can also create a custom rule with a _Log_ action (only available on Enterprise plans) to check for potentially harmful traffic related to LLM prompts. This rule will generate [security events](/waf/analytics/security-events/) that will allow you to validate your Firewall For AI configuration. +Alternatively, you can also create a custom rule with a _Log_ action (only available on Enterprise plans) to check for potentially harmful traffic related to LLM prompts. This rule will generate [security events](/waf/analytics/security-events/) that will allow you to validate your Firewall for AI configuration. ## 6. Mitigate harmful requests