Merged
2 changes: 1 addition & 1 deletion docs/how-to/airtbench-agent.mdx
@@ -13,7 +13,7 @@ For this guide, we'll assume you have the `dreadnode` package installed and are
<Info>
This agent also serves as a major functional component to complement our practical exploit research paper: "AIRTBench: Can Language Models Autonomously Exploit Language Models?" which explores the use of LLMs to solve CTF challenges in Crucible, Dreadnode's AI hacking playground.

-The paper discusses the design and implementation of the agent, as well as its performance on various challenges. You can find the paper [here](TODO) on arXiv, or learn more on our accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
+The paper discusses the design and implementation of the agent, as well as its performance on various challenges. You can find the paper [here](https://arxiv.org/abs/2506.14682) on arXiv, or learn more on our accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".
</Info>

In this guide, we'll cover building an agent capable of solving AI/ML capture-the-flag (CTF) challenges hosted on [Crucible](../../crucible/overview.mdx). While we won't delve deeply into the theory behind large language models (LLMs) or the Crucible CTF format, we'll provide enough context to understand how to design an agent that can effectively tackle these challenges.