This repository contains the implementation of the AIRTBench autonomous AI red teaming agent, complementing our research paper [AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models](https://arxiv.org/abs/TODO) and the accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
The paper is available on [arXiv](TODO) and [ACL Anthology](TODO).
The AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. Our agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.
- [AIRTBench: Autonomous AI Red Teaming Agent Code](#airtbench-autonomous-ai-red-teaming-agent-code)

<em>Figure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.</em>
</div>
## Setup
You can set up the virtual environment with uv:
<mark>To run the code, you need access to the Dreadnode Strikes platform. See the [docs](https://docs.Dreadnode.io/strikes/overview) or join the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes)</mark>.
This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent solves a variety of AI/ML CTF challenges from the Dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and is given access to execute Python commands in a network-local container built from a custom [Dockerfile](./airtbench/container/Dockerfile).
```bash
uv run -m airtbench --help
```

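
The agent's basic action is to run Python code and read back the result. The sketch below illustrates that execute-and-observe step in simplified form, assuming a local subprocess in place of the network-local container. `run_python` and `ExecutionResult` are hypothetical names, not part of the AIRTBench codebase, and a plain subprocess provides none of the container's network isolation.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """What an agent would observe after running one code snippet (hypothetical type)."""
    stdout: str
    stderr: str
    exit_code: int
    timed_out: bool = False


def run_python(code: str, timeout: float = 30.0) -> ExecutionResult:
    """Run `code` in a fresh Python interpreter and capture its output.

    Hypothetical helper, not part of AIRTBench: it gives process separation
    only, with none of the container's network isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode output as str
            timeout=timeout,      # subprocess.run kills the child if it runs too long
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as an observation instead of propagating the exception.
        return ExecutionResult(stdout="", stderr=f"timed out after {timeout}s",
                               exit_code=-1, timed_out=True)
    return ExecutionResult(proc.stdout, proc.stderr, proc.returncode)


if __name__ == "__main__":
    result = run_python("print(2 + 2)")
    print(result.stdout.strip())  # 4
```

In a sketch like this, returning a structured result rather than raising on failure lets an agent loop hand errors and timeouts back to the model as ordinary observations.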
Check out [the challenge manifest](./airtbench/challenges/.challenges.yaml) to see current challenges in scope.
## Resources
- [📄 Paper on arXiv](https://arxiv.org/abs/TODO)
- [📝 Blog post](https://dreadnode.io/blog/ai-red-team-benchmark)
## Dataset
- Download the dataset directly from [🤗 Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)
- Instructions for loading the dataset are also available in the [dataset](./dataset/README.md) directory.
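
As an illustration of working with the loaded records, the sketch below computes a per-model solve rate. It assumes each record is a mapping with a model name and a boolean solved flag; the field names `model` and `flag_found` are hypothetical placeholders, so check the dataset card for the actual schema. With the Hugging Face `datasets` library installed, the records could probably be loaded with `load_dataset("dreadnode/AIRTBench")` (repository id taken from the link above; split names are not confirmed here).

```python
from collections import defaultdict
from typing import Iterable, Mapping


def solve_rates(records: Iterable[Mapping], model_key: str = "model",
                solved_key: str = "flag_found") -> dict[str, float]:
    """Return the fraction of solved records per model.

    `model_key` and `solved_key` are hypothetical field names; adjust them
    to the real dataset schema.
    """
    attempts: dict[str, int] = defaultdict(int)
    solved: dict[str, int] = defaultdict(int)
    for row in records:
        model = row[model_key]
        attempts[model] += 1
        if row[solved_key]:
            solved[model] += 1
    # Every model in `attempts` has at least one record, so no division by zero.
    return {m: solved[m] / attempts[m] for m in attempts}


if __name__ == "__main__":
    # Toy records for illustration only, not real AIRTBench results.
    toy = [
        {"model": "model-a", "flag_found": True},
        {"model": "model-a", "flag_found": False},
        {"model": "model-b", "flag_found": False},
    ]
    print(solve_rates(toy))  # {'model-a': 0.5, 'model-b': 0.0}
```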
## Citation
If you find our work helpful, please cite it using the following BibTeX entry.
```bibtex
@misc{TODO,
title = {AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},