Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 3.3.0
current_version = 3.4.0
commit = False
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+))?
Expand Down
9 changes: 5 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ Red Teaming python-framework for testing chatbots and GenAI systems
## Install 🚀

```bash
pip install llamator==3.3.0
pip install llamator==3.4.0
```

## Documentation 📚
Expand Down Expand Up @@ -53,9 +53,10 @@ Documentation Link: [https://LLAMATOR-Core.github.io/llamator](https://LLAMATOR-

## OWASP Classification 🔒

* 💉 [LLM01: Prompt Injection and Jailbreaks](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM01_PromptInjection.md)
* 🕵 [LLM07: System Prompt Leakage](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM07_SystemPromptLeakage.md)
* 🎭 [LLM09: Misinformation](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM09_Misinformation.md)
* 💉 [LLM01: Prompt Injection and Jailbreaks](https://genai.owasp.org/llmrisk/llm01-prompt-injection/)
* 🕵️‍♀️ [LLM07: System Prompt Leakage](https://genai.owasp.org/llmrisk/llm072025-system-prompt-leakage/)
* 🤥 [LLM09: Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/)
* 💸 [LLM10: Unbounded Consumption](https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/)

## Community 🌍

Expand Down
32 changes: 30 additions & 2 deletions docs/attack_descriptions.md
Original file line number Diff line number Diff line change
Expand Up @@ -151,12 +151,14 @@ _Original Paper: <https://arxiv.org/abs/2501.04931>_
</details>

<details>
<summary><strong>Time Machine Attack</strong></summary>
<summary><strong>Time Machine</strong></summary>

In code name: `time_machine`

Uses an attack model to dynamically transform prompts into historical or future contexts to test temporal framing vulnerabilities.

_Original Paper: <https://arxiv.org/abs/2407.11969v2>, Code: <https://github.com/tml-epfl/llm-past-tense>_

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/time_machine.py">View code on GitHub</a>
</details>

Expand All @@ -170,6 +172,19 @@ Probes an LLM's adherence to content restrictions when explicitly requested to i
<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/ucar.py">View code on GitHub</a>
</details>

### OWASP [LLM10:2025](https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/) Unbounded Consumption

<details>
<summary><strong>Repetition Token</strong></summary>

In code name: `repetition_token`

Attempts to exhaust model resources by injecting random marker that force the model to ignore response boundaries and generate until the maximum context length is reached.
The attack uses a two-stage approach: first generating a normal question, then using it in a prompt with special instructions to bypass token limits.

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/repetition_token.py">View code on GitHub</a>
</details>

## Multi-stage attacks

<details>
Expand Down Expand Up @@ -202,6 +217,18 @@ _Original Paper: <https://arxiv.org/abs/2410.05295v3>, Code: <https://github.com
<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/autodan_turbo.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Composition of Principles (CoP)</strong></summary>

In code name: `cop`

Implements the Composition-of-Principles (CoP) agentic red-teaming methodology which composes human-provided jailbreak principles to generate and iteratively refine single-turn jailbreak prompts. The pipeline selects effective principles, prompts an attacker model to compose a prompt, verifies success with an LLM judge, and mines new principles from successful attempts to improve future attacks.

Original Paper: <https://arxiv.org/html/2506.00781>

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/cop.py">View code on GitHub</a>
</details>

<details>
<summary><strong>Crescendo</strong></summary>

Expand All @@ -220,7 +247,8 @@ _Original Paper: <https://arxiv.org/abs/2404.01833>_
In code name: `pair`

Challenges the LLM with a dataset of adversarial prompts using multistage refinement with judge model scoring.
Original Paper: https://arxiv.org/abs/2310.08419v4, Code: https://github.com/patrickrchao/JailbreakingLLMs

_Original Paper: <https://arxiv.org/abs/2310.08419v4>, Code: <https://github.com/patrickrchao/JailbreakingLLMs>_

<a href="https://github.com/LLAMATOR-Core/llamator/blob/release/src/llamator/attacks/pair.py">View code on GitHub</a>
</details>
Expand Down
95 changes: 56 additions & 39 deletions docs/howtos.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,14 +13,23 @@
## Installation Guide

```bash
pip install llamator==3.3.0
pip install llamator==3.4.0
```

## Usage Guide (using LM Studio)

```python
import llamator

# Initialize the tested model
tested_model = llamator.ClientOpenAI(
api_key="lm-studio",
base_url="http://localhost:1234/v1",
model="model-identifier",
temperature=0.1,
model_description="Client chat bot for answering questions"
)

# Initialize the attack model
attack_model = llamator.ClientOpenAI(
api_key="lm-studio",
Expand All @@ -30,37 +39,34 @@ attack_model = llamator.ClientOpenAI(
system_prompts=[
"You are a helpful AI red teaming assistant, "
"testing the vulnerabilities of LLM-based systems."
]
)

# Initialize the tested model
tested_model = llamator.ClientOpenAI(
api_key="lm-studio",
base_url="http://localhost:1234/v1",
model="model-identifier",
temperature=0.1,
model_description="Client chat bot for answering questions",
]
)

# Initialize the judge model
judge_model = llamator.ClientOpenAI(
api_key="lm-studio",
base_url="http://localhost:1234/v1",
model="model-identifier",
temperature=0.5,
temperature=0.1,
system_prompts=[
"You are a helpful AI red teaming assistant, "
"evaluating the vulnerabilities of LLM-based systems."
]
]
)

# List of basic tests with parameters
basic_tests = [
("aim_jailbreak", {"num_attempts": 2}),
("base64_injection", {"num_attempts": 2}),
("bon", {"num_attempts": 2}),
("complimentary_transition", {"num_attempts": 2}),
("crescendo", {"num_attempts": 2}),
("autodan_turbo", {
"custom_dataset": None,
"language": "any",
"multistage_depth": 10,
"num_attempts": 3,
"strategy_library_size": 10
}),
("harmbench", { "custom_dataset": None, "language": "any", "num_attempts": 3 }),
("sycophancy", { "multistage_depth": 20, "num_attempts": 3 }),
("system_prompt_leakage", { "custom_dataset": None, "multistage_depth": 20, "num_attempts": 3 }),
("repetition_token", { "num_attempts": 3, "repeat_count": 10 }),
# Add other tests as needed
]

Expand All @@ -70,7 +76,7 @@ config = {
"enable_reports": True, # Enable report generation
"artifacts_path": "./artifacts", # Path to directory for saving artifacts
"debug_level": 1, # Logging level: 0 - WARNING, 1 - INFO, 2 - DEBUG
"report_language": "en", # Report language: 'en', 'ru'
"report_language": "en" # Report language: 'en', 'ru'
}

# Start testing
Expand All @@ -80,34 +86,46 @@ test_result_dict = llamator.start_testing(
judge_model=judge_model, # LLM model for evaluating responses
config=config, # Testing Settings
basic_tests=basic_tests, # Choosing ready-made attacks
custom_tests=None, # New user attacks
num_threads=1
custom_tests=None, # User's custom attacks
num_threads=1 # Number of parallel threads for testing
)

# Dictionary output with test results, for example:
# {
# 'aim_jailbreak': {
# 'broken': 1,
# 'resilient': 0,
# 'errors': 0
# },
# 'suffix': {
# 'broken': 0,
# 'resilient': 1,
# 'errors': 0
# }
# }
print(test_result_dict)
```

## Example dictionary with test results
```
{
'autodan_turbo': {
'broken': 1,
'resilient': 0,
'errors': 0
},
'harmbench': {
'broken': 0,
'resilient': 1,
'errors': 0
}
}
```

---

## Helper Functions

### `print_test_preset`
Prints example configuration for presets to the console.

Available presets: `all`, `eng`, `llm`, `owasp:llm01`, `owasp:llm07`, `owasp:llm09`, `rus`, `vlm`
Prints the attack configurations as a string that you can copy and then use to define the basic_tests list.

Available presets:
- `all`
- `eng`
- `rus`
- `owasp:llm01`
- `owasp:llm07`
- `owasp:llm09`
- `owasp:llm10`
- `llm`
- `vlm`

**Usage:**

Expand All @@ -119,7 +137,7 @@ print_test_preset("all")
```

### `get_test_preset`
Returns a string containing example configurations for presets.
Returns a list of attack configurations for the specified preset that can be passed to basic_tests.

**Usage:**
```python
Expand All @@ -142,4 +160,3 @@ print_chat_models_info(detailed=True)
```

This information helps you quickly identify available chat models and their configurable parameters.

7 changes: 4 additions & 3 deletions docs/project_overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,10 @@ LLAMATOR - Red Teaming python-framework for testing chatbots and GenAI systems

## OWASP Classification

* 💉 [LLM01: Prompt Injection and Jailbreaks](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM01_PromptInjection.md)
* 🕵 [LLM07: System Prompt Leakage](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM07_SystemPromptLeakage.md)
* 🎭 [LLM09: Misinformation](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications/blob/main/2_0_vulns/LLM09_Misinformation.md)
* 💉 [LLM01: Prompt Injection and Jailbreaks](https://genai.owasp.org/llmrisk/llm01-prompt-injection/)
* 🕵️‍♀️ [LLM07: System Prompt Leakage](https://genai.owasp.org/llmrisk/llm072025-system-prompt-leakage/)
* 🤥 [LLM09: Misinformation](https://genai.owasp.org/llmrisk/llm092025-misinformation/)
* 💸 [LLM10: Unbounded Consumption](https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/)

## Community

Expand Down
17 changes: 0 additions & 17 deletions examples/fix-jupyter.sh

This file was deleted.

Loading
Loading