docs/attack_descriptions.md: 22 additions & 10 deletions
@@ -13,7 +13,7 @@ In code name: `suffix`
Uses a specially crafted suffix to bypass LLM filters and restrictions, forcing the model to generate prohibited or harmful content despite the original query. Evaluates the model's vulnerability to input data manipulation.

-_Original Paper: <https://arxiv.org/abs/2307.15043>, Original Code: <https://github.com/llm-attacks/llm-attacks>_
Challenges the LLM with a dataset of harmful behavior prompts in past tense form.
+Exploits attention blink phenomena by hiding adversarial questions between innocent questions in low-resource languages. The attack leverages the concept that LLMs may have reduced attention when processing mixed low-resource languages.

-<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/past_tense.py">View code on GitHub</a>
@@ -163,7 +175,7 @@ Probes an LLM's adherence to content restrictions when explicitly requested to i
<details>
<summary><strong>What Drives the Multi-stage?</strong></summary>

-Multi-stage attacks are inspired by the [Jailbreaking Black Box Large Language Models in Twenty Queries (PAIR)](https://arxiv.org/html/2310.08419) paper.
+Multi-stage attacks are inspired by the [Jailbreaking Black Box Large Language Models in Twenty Queries (PAIR)](https://arxiv.org/abs/2310.08419) paper.

For managing a multi-stage interaction between an attacker and tested chat clients, the `MultiStageInteractionSession` class is available [[source]](https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/client/chat_client.py). It contains the following properties:
* `attacker_session` is the session for the attacker.
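
To illustrate the turn-taking that a multi-stage session coordinates, here is a minimal sketch of a generic attacker/tested-model loop of the kind PAIR-style attacks run. Everything in it is hypothetical: `ToyChatSession`, `run_multi_stage`, the stub clients, and the `is_success` judge are illustrative names. They are not the API of LLAMATOR's `MultiStageInteractionSession`, because this diff does not show that class's constructor or methods.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


@dataclass
class ToyChatSession:
    """Hypothetical chat session: keeps a history and delegates replies to a callable."""

    respond: Callable[[List[Message]], str]
    history: List[Message] = field(default_factory=list)

    def say(self, message: str) -> str:
        # Record the incoming message, generate a reply, then record the reply.
        self.history.append(("user", message))
        reply = self.respond(self.history)
        self.history.append(("assistant", reply))
        return reply


def run_multi_stage(
    attacker: ToyChatSession,
    tested: ToyChatSession,
    goal: str,
    is_success: Callable[[str], bool],
    max_turns: int = 5,
) -> Tuple[bool, int]:
    """Alternate attacker and tested-model turns until the judge reports success
    or the turn budget is exhausted. Returns (success, turns_used)."""
    feedback = goal
    for turn in range(1, max_turns + 1):
        prompt = attacker.say(feedback)  # attacker proposes the next prompt
        answer = tested.say(prompt)      # tested model answers in its own session
        if is_success(answer):
            return True, turn
        feedback = answer                # attacker refines using the latest answer
    return False, max_turns


# Stub clients standing in for real LLMs (purely illustrative).
attacker = ToyChatSession(respond=lambda h: f"attempt {len(h) // 2 + 1}")
tested = ToyChatSession(respond=lambda h: "OK" if h[-1][1] == "attempt 3" else "REFUSED")
result = run_multi_stage(attacker, tested, goal="demo goal", is_success=lambda a: a == "OK")
print(result)  # (True, 3)
```

Keeping the attacker and the tested model in separate session objects means each side sees only its own history. This reflects the separation implied by the distinct `attacker_session` property.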
@@ -185,7 +197,7 @@ In code name: `autodan_turbo`
Implements the AutoDAN-Turbo attack methodology which uses a lifelong agent for strategy self-exploration to jailbreak LLMs. This attack automatically discovers jailbreak strategies without human intervention and combines them for more effective attacks.

-_Original Paper: <https://arxiv.org/abs/2410.05295v3>, Original Code: <https://github.com/SaFoLab-WISC/AutoDAN-Turbo>_
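
The AutoDAN-Turbo description above mentions strategies that are discovered automatically and reused over time. The sketch below is a rough, hypothetical illustration of the bookkeeping such a lifelong agent needs. It stores each strategy name together with the judge-score improvements that strategy produced, then returns the best ones for reuse. `ToyStrategyLibrary`, its scoring rule (keep positive gains, rank by mean gain), and the strategy names are all assumptions. This is not the AutoDAN-Turbo algorithm and not LLAMATOR's implementation; see the linked paper and code for the actual method.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List


class ToyStrategyLibrary:
    """Hypothetical lifelong store: strategy name -> score gains it produced."""

    def __init__(self) -> None:
        self._gains: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, strategy: str, score_before: float, score_after: float) -> None:
        # Keep a strategy only when it raised the judge score.
        gain = score_after - score_before
        if gain > 0:
            self._gains[strategy].append(gain)

    def top(self, k: int) -> List[str]:
        # Rank stored strategies by mean gain, best first, and return the top k.
        ranked = sorted(self._gains, key=lambda s: mean(self._gains[s]), reverse=True)
        return ranked[:k]


library = ToyStrategyLibrary()
library.record("strategy_a", 2.0, 5.0)  # gain 3.0
library.record("strategy_a", 3.0, 4.0)  # gain 1.0 -> mean 2.0
library.record("strategy_b", 1.0, 6.0)  # gain 5.0
library.record("strategy_c", 4.0, 3.0)  # no gain, not stored
print(library.top(2))  # ['strategy_b', 'strategy_a']
```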