Commit 01629f7

docs: update placeholder docs links and citation ref (#15)
* docs: update placeholder docs links and citation ref
* docs: expand dataset section of readme
* chore: paper title change
* docs: button up readme and add harness construction artifact
* docs: add spacing inbetween buttons
* docs: add slug placeholder to blog
* docs: add agent harness tech docs button
* fix: small dockerfile syntax nit
* chore: rm redundant comment
* docs: add blog title
* fix: increase harness arch size
* fix: increase harness arch size
* fix: whitespace hf
* fix: architecture diagram
* docs: add hf dataset ref
1 parent 96532fc commit 01629f7

4 files changed: +50 -8 lines changed


README.md

Lines changed: 48 additions & 6 deletions
@@ -1,4 +1,4 @@
-# Code for the "AIRTBench" AI Red Teaming Agent
+# AIRTBench: Autonomous AI Red Teaming Agent Code

 <div align="center">

@@ -19,6 +19,12 @@
 [![Renovate](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml/badge.svg)](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![GitHub release (latest by date)](https://img.shields.io/github/v/release/dreadnode/AIRTBench-Code)](https://github.com/dreadnode/AIRTBench-Code/releases)
+
+[![arXiv](https://img.shields.io/badge/arXiv-TODO-b31b1b.svg)](https://arxiv.org/abs/TODO)
+[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Dataset-ffca28.svg)](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)
+[![Dreadnode](https://img.shields.io/badge/Dreadnode-Blog-5714928f.svg)](https://dreadnode.io/blog/ai-red-team-benchmark)
+[![Agent Harness](https://img.shields.io/badge/📚_Agent_Harness-Documentation-5714928f.svg)](https://docs.dreadnode.io/strikes/how-to/airtbench-agent)
+
 [![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers)
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/dreadnode/AIRTBench-Code/pulls)

@@ -27,18 +33,32 @@

 ---

-This repository contains the code for the AIRTBench AI red teaming agent. The AIRT agent was used to evaluate the capabilities of large language models (LLMs) in solving AI ML Capture The Flag (CTF) challenges, specifically those that are LLM-based. The agent is designed to autonomously exploit LLMs by solving challenges on the Dreadnode Strikes platform.
+This repository contains the implementation of the AIRTBench autonomous AI red teaming agent, complementing our research paper [AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models](https://arxiv.org/abs/TODO) and accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".

-The paper is available on [arXiV](TODO) and [ACL Anthology](TODO).
+The AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. Our agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.

-- [Code for the "AIRTBench" AI Red Teaming Agent](#code-for-the-airtbench-ai-red-teaming-agent)
+- [AIRTBench: Autonomous AI Red Teaming Agent Code](#airtbench-autonomous-ai-red-teaming-agent-code)
+- [Agent Harness Construction](#agent-harness-construction)
 - [Setup](#setup)
 - [Documentation](#documentation)
 - [Run the Evaluation](#run-the-evaluation)
 - [Basic Usage](#basic-usage)
 - [Challenge Filtering](#challenge-filtering)
+- [Resources](#resources)
+- [Dataset](#dataset)
+- [Citation](#citation)
 - [Model requests](#model-requests)

+## Agent Harness Construction
+
+The AIRTBench harness follows a modular architecture designed for extensibility and evaluation:
+
+<div align="center">
+<img src="assets/airtbench_architecture_diagram_dark.png" alt="AIRTBench Architecture" width="100%">
+<br>
+<em>Figure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.</em>
+</div>
+
 ## Setup

 You can setup the virtual environment with uv:
@@ -55,8 +75,7 @@ Technical documentation for the AIRTBench agent is available in the [Dreadnode S

 <mark>In order to run the code, you will need access to the Dreadnode strikes platform, see the [docs](https://docs.Dreadnode.io/strikes/overview) or submit for the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes)</mark>.

-This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent works to solve a variety of AI ML CTF challenges from the dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and given access to execute python commands on a network-local container with custom [Dockerfile](./airtbench/container/Dockerfile). This example-agent is also a compliment to our research paper [AIRTBench: Can Language Models Autonomously Exploit
-Language Models?](https://arxiv.org/abs/TODO). # TODO: Add link to paper once published.
+This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent works to solve a variety of AI ML CTF challenges from the dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and given access to execute python commands on a network-local container with custom [Dockerfile](./airtbench/container/Dockerfile).

 ```bash
 uv run -m airtbench --help
@@ -88,6 +107,29 @@ as needed to ensure they are network-isolated from each other. The process is ge

 Check out [the challenge manifest](./airtbench/challenges/.challenges.yaml) to see current challenges in scope.

+## Resources
+
+- [📄 Paper on arXiv](https://arxiv.org/abs/TODO)
+- [📝 Blog post](https://dreadnode.io/blog/ai-red-team-benchmark)
+
+## Dataset
+
+- Download the dataset directly from [🤗Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)
+- Instructions for loading the dataset can be found in the [dataset](./dataset/README.md) directory also.
+
+## Citation
+
+If you find our work helpful, please use the following citations.
+
+```bibtex
+@misc{TODO,
+title = {AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},
+author = {TODO},
+year = {2025},
+eprint = {arXiv:TODO},
+url = {https://arxiv.org/abs/TODO}
+}
+```

 ## Model requests
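
The table-of-contents entry in the README diff changes together with the title because GitHub derives a heading's anchor from its text. The sketch below approximates that rule (lowercase, drop punctuation, turn spaces into hyphens). `github_style_slug` is a hypothetical helper written for illustration; it is not GitHub's actual implementation and ignores edge cases such as duplicate headings.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate the anchor GitHub generates for a Markdown heading.

    Assumption: lowercase the text, remove every character that is not a
    word character, hyphen, or space, then replace spaces with hyphens.
    """
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


# Headings taken from the removed and added lines of the README diff.
old_title = 'Code for the "AIRTBench" AI Red Teaming Agent'
new_title = "AIRTBench: Autonomous AI Red Teaming Agent Code"

old_anchor = github_style_slug(old_title)
new_anchor = github_style_slug(new_title)
print(old_anchor)
print(new_anchor)
```

Under these assumptions, both results match the anchors used in the old and new table-of-contents lines of the diff, which is why the link had to be updated along with the heading.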

airtbench/container/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -9,4 +9,4 @@ RUN pip install \
 lightgbm \
 xgboost \
 kornia \
-lief \
+lief
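
The Dockerfile change removes a trailing backslash from the last package in the `RUN pip install` list. A backslash at the end of a line is a continuation marker, so leaving one on the final line means the instruction has nothing to continue into. The commit does not say how Docker reacted to it. The sketch below is one possible way to catch this in review; `dangling_continuations` is a hypothetical helper, not part of the repository or of any Docker tooling.

```python
def dangling_continuations(dockerfile_text: str) -> list[int]:
    """Return 1-based line numbers whose trailing backslash continues
    into a blank line or into the end of the file."""
    lines = dockerfile_text.splitlines()
    flagged = []
    for index, line in enumerate(lines):
        if not line.rstrip().endswith("\\"):
            continue
        # A continuation is dangling when nothing meaningful follows it.
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if not next_line.strip():
            flagged.append(index + 1)
    return flagged


# Shortened versions of the instruction before and after the commit.
BEFORE = "RUN pip install \\\n    kornia \\\n    lief \\\n"
AFTER = "RUN pip install \\\n    kornia \\\n    lief\n"

print(dangling_continuations(BEFORE))
print(dangling_continuations(AFTER))
```

The check flags only the last line of the pre-commit snippet and nothing in the post-commit one.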

airtbench/main.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 g_current_dir = Path(__file__).parent
 g_challenge_dir = g_current_dir / "challenges"
 g_container_dir = g_current_dir / "container"
-memory_limit = ("4g",) # Add memory limit per container
+memory_limit = ("4g",)

 g_system_prompt = """\
 You are an autonomous AI Red Teaming agent for use in validating the security of \
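
In the `main.py` change, the trailing comma in `("4g",)` makes `memory_limit` a one-element tuple rather than the string `"4g"`. The diff does not show how the value is consumed, so it cannot be confirmed here whether a tuple is intended. The sketch below is a minimal illustration, assuming a consumer that wants a plain string; `normalize_memory_limit` is hypothetical and does not exist in the repository.

```python
def normalize_memory_limit(value):
    """Hypothetical helper: accept "4g" or ("4g",) and return the string."""
    if isinstance(value, str):
        return value
    if isinstance(value, tuple) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    raise TypeError(f"unsupported memory limit: {value!r}")


# Value exactly as written in the diff: the trailing comma creates a tuple.
memory_limit = ("4g",)

print(type(memory_limit).__name__)
print(normalize_memory_limit(memory_limit))
```

If a string is what the container API expects, writing `memory_limit = "4g"` directly would avoid the need for any normalization.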
