diff --git a/README.md b/README.md
index 722cda2..cfa6c7a 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Code for the "AIRTBench" AI Red Teaming Agent
+# AIRTBench: Autonomous AI Red Teaming Agent Code
@@ -19,6 +19,12 @@
[](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml)
[](https://opensource.org/licenses/Apache-2.0)
[](https://github.com/dreadnode/AIRTBench-Code/releases)
+
+[](https://arxiv.org/abs/TODO)
+[](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)
+[](https://dreadnode.io/blog/ai-red-team-benchmark)
+[](https://docs.dreadnode.io/strikes/how-to/airtbench-agent)
+
[](https://github.com/dreadnode/AIRTBench-Code/stargazers)
[](https://github.com/dreadnode/AIRTBench-Code/pulls)
@@ -27,18 +33,32 @@
---
-This repository contains the code for the AIRTBench AI red teaming agent. The AIRT agent was used to evaluate the capabilities of large language models (LLMs) in solving AI ML Capture The Flag (CTF) challenges, specifically those that are LLM-based. The agent is designed to autonomously exploit LLMs by solving challenges on the Dreadnode Strikes platform.
+This repository contains the implementation of the AIRTBench autonomous AI red teaming agent, complementing our research paper [AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models](https://arxiv.org/abs/TODO) and accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
-The paper is available on [arXiV](TODO) and [ACL Anthology](TODO).
+The AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. Our agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.
-- [Code for the "AIRTBench" AI Red Teaming Agent](#code-for-the-airtbench-ai-red-teaming-agent)
+- [AIRTBench: Autonomous AI Red Teaming Agent Code](#airtbench-autonomous-ai-red-teaming-agent-code)
+ - [Agent Harness Construction](#agent-harness-construction)
- [Setup](#setup)
- [Documentation](#documentation)
- [Run the Evaluation](#run-the-evaluation)
- [Basic Usage](#basic-usage)
- [Challenge Filtering](#challenge-filtering)
+ - [Resources](#resources)
+ - [Dataset](#dataset)
+ - [Citation](#citation)
- [Model requests](#model-requests)
+## Agent Harness Construction
+
+The AIRTBench harness follows a modular architecture designed for extensibility and evaluation:
+
+Figure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.
+
## Setup
You can setup the virtual environment with uv:
@@ -55,8 +75,7 @@ Technical documentation for the AIRTBench agent is available in the [Dreadnode S
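-In order to run the code, you will need access to the Dreadnode strikes platform, see the [docs](https://docs.Dreadnode.io/strikes/overview) or submit for the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes).
+To run the code, you will need access to the Dreadnode Strikes platform; see the [docs](https://docs.Dreadnode.io/strikes/overview) or join the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes).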
-This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent works to solve a variety of AI ML CTF challenges from the dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and given access to execute python commands on a network-local container with custom [Dockerfile](./airtbench/container/Dockerfile). This example-agent is also a compliment to our research paper [AIRTBench: Can Language Models Autonomously Exploit
-Language Models?](https://arxiv.org/abs/TODO). # TODO: Add link to paper once published.
+This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent solves a variety of AI/ML CTF challenges from the Dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and is given access to execute Python commands in a network-local container built from a custom [Dockerfile](./airtbench/container/Dockerfile).
```bash
uv run -m airtbench --help
@@ -88,6 +107,29 @@ as needed to ensure they are network-isolated from each other. The process is ge
Check out [the challenge manifest](./airtbench/challenges/.challenges.yaml) to see current challenges in scope.
+## Resources
+
+- [Paper on arXiv](https://arxiv.org/abs/TODO)
+- [Blog post](https://dreadnode.io/blog/ai-red-team-benchmark)
+
+## Dataset
+
+- Download the dataset directly from [Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md).
+- Instructions for loading the dataset are also available in the [dataset](./dataset/README.md) directory.
+
+## Citation
+
+If you find our work helpful, please use the following citation.
+
+```bibtex
+@misc{TODO,
+ title = {AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},
+ author = {TODO},
+ year = {2025},
+ eprint = {arXiv:TODO},
+ url = {https://arxiv.org/abs/TODO}
+}
+```
## Model requests
diff --git a/airtbench/container/Dockerfile b/airtbench/container/Dockerfile
index 1f1c3f7..5194202 100644
--- a/airtbench/container/Dockerfile
+++ b/airtbench/container/Dockerfile
@@ -9,4 +9,4 @@ RUN pip install \
lightgbm \
xgboost \
kornia \
- lief \
\ No newline at end of file
+ lief
\ No newline at end of file
diff --git a/airtbench/main.py b/airtbench/main.py
index e1b1024..75d5752 100644
--- a/airtbench/main.py
+++ b/airtbench/main.py
@@ -26,7 +26,7 @@
g_current_dir = Path(__file__).parent
g_challenge_dir = g_current_dir / "challenges"
g_container_dir = g_current_dir / "container"
-memory_limit = ("4g",) # Add memory limit per container
+memory_limit = ("4g",)
g_system_prompt = """\
You are an autonomous AI Red Teaming agent for use in validating the security of \
diff --git a/assets/airtbench_architecture_diagram_dark.png b/assets/airtbench_architecture_diagram_dark.png
new file mode 100644
index 0000000..f28fc89
Binary files /dev/null and b/assets/airtbench_architecture_diagram_dark.png differ