src/content/docs/waf/detections/firewall-for-ai.mdx (19 additions & 2 deletions)
@@ -16,6 +16,7 @@ Firewall for AI is a detection that can help protect your services powered by la
 - Prevent data leaks of personally identifiable information (PII) — for example, phone numbers, email addresses, social security numbers, and credit card numbers.
 - Detect and moderate unsafe or harmful prompts – for example, prompts potentially related to violent crimes.
+- Detect prompts intentionally designed to subvert the intended behavior of the LLM as specified by the developer – for example, prompt injection attacks and jailbreaking attempts.
 
 When enabled, the detection runs on incoming traffic, searching for any LLM prompts attempting to exploit the model.
@@ -124,14 +125,15 @@ When enabled, Firewall for AI populates the following fields:
 For a list of PII categories, refer to the [`cf.llm.prompt.pii_categories`](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/) field reference.
 
 For a list of unsafe topic categories, refer to the [`cf.llm.prompt.unsafe_topic_categories`](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/) field reference.
 
 ## Example mitigation rules
 
-### Block requests with specific PII category in prompt
+### Block requests with specific PII category in LLM prompt
 
 The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt that tries to obtain PII of a specific [category](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.pii_categories/):
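The PII rule described here is essentially a set-membership check over the categories the detection found in the prompt. A minimal Python sketch of that matching logic, assuming hypothetical category labels and a function name of my own choosing (this is not the Cloudflare Rules language, just an illustration of what `any(cf.llm.prompt.pii_categories[*] in {...})` evaluates):

```python
def pii_rule_matches(detected_categories, blocked_categories):
    # Mirrors any(cf.llm.prompt.pii_categories[*] in {...}):
    # the rule fires when any detected PII category is in the blocked set.
    return any(category in blocked_categories for category in detected_categories)

# Hypothetical category labels, for illustration only.
blocked = {"EMAIL_ADDRESS", "CREDIT_CARD"}
print(pii_rule_matches(["EMAIL_ADDRESS"], blocked))  # True
print(pii_rule_matches(["PHONE_NUMBER"], blocked))   # False
```

A prompt with no detected PII categories yields an empty list, so the rule never matches it.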
@@ -146,7 +148,7 @@ The following example [custom rule](/waf/custom-rules/create-dashboard/) will bl
 - **Action**: _Block_
 
-### Block requests with specific unsafe content categories in prompt
+### Block requests with specific unsafe content categories in LLM prompt
 
 The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an LLM prompt containing unsafe content of specific [categories](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/):
@@ -160,3 +162,18 @@ The following example [custom rule](/waf/custom-rules/create-dashboard/) will bl
 `(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))`
 
 - **Action**: _Block_
+
+### Block requests with prompt injection attempt in LLM prompt
+
+The following example [custom rule](/waf/custom-rules/create-dashboard/) will block requests with an [injection score](/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.injection_score/) below `20`. Using a low injection score value in the rule helps avoid false positives.
+
+- **If incoming requests match**:
+
+  | Field               | Operator  | Value |
+  | ------------------- | --------- | ----- |
+  | LLM Injection score | less than | `20`  |
+
+If you use the Expression Editor, enter the following expression:<br />
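The injection-score rule in the table above is a plain threshold comparison. A hedged sketch of that logic in Python (the function name is an assumption; the score semantics follow the field description: scores run 1–99, lower means a more likely injection attempt, and `100` means the request was not scored):

```python
def injection_rule_matches(injection_score):
    # Mirrors the rule "LLM Injection score less than 20".
    # Scores 1-99: lower = more likely a prompt injection attempt.
    # The special score 100 (unscored) never satisfies this threshold.
    return injection_score < 20

print(injection_rule_matches(7))    # True  (likely injection attempt)
print(injection_rule_matches(85))   # False (likely benign)
print(injection_rule_matches(100))  # False (request was not scored)
```

Because unscored requests carry the sentinel value `100`, a "less than" comparison conveniently skips them without any extra condition in the rule.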
+summary: A score from 1–99 that represents the likelihood that the LLM prompt in the request is trying to perform a prompt injection attack.
+description: |-
+  A low score (for example, below `20`) indicates that there is a high probability that the LLM prompt in the request is trying to perform a prompt injection attack.
+
+  The special score `100` indicates that Cloudflare did not score the request.
+
+  Requires a Cloudflare Enterprise plan. You must also enable [Firewall for AI](/waf/detections/firewall-for-ai/).