12 changes: 12 additions & 0 deletions .env.example
@@ -0,0 +1,12 @@
DREADNODE_SERVER_URL="https://platform.dreadnode.io"
# See https://platform.dreadnode.io/account to get your API token and key (same value)
DREADNODE_TOKEN=YOUR_DREADNODE_API_TOKEN
DREADNODE_LOCAL_DIR="runs/"
LOGFIRE_IGNORE_NO_CONFIG=1

# AI provider API keys (replace <ADD_API_KEY> with your actual key)
ANTHROPIC_API_KEY=<ADD_API_KEY>
GROQ_API_KEY=<ADD_API_KEY>
OPENAI_API_KEY=<ADD_API_KEY>
TOGETHER_AI_API_KEY=<ADD_API_KEY>
GEMINI_API_KEY=<ADD_API_KEY>
9 changes: 0 additions & 9 deletions .gitignore
@@ -172,15 +172,6 @@ runs/*.jsonl
# Datasets

datasets/

# Callisto

callisto/challenges/public*

# JSON we want

!callisto/analysis/openai_challenges.json

*.parquet
*.json
*.csv
2 changes: 1 addition & 1 deletion .hooks/generate_pr_description.py
@@ -66,7 +66,7 @@ def get_diff(base_ref: str, source_ref: str, *, exclude: list[str] | None = None
def main(
    base_ref: str = "origin/main",
    source_ref: str = "HEAD",
    generator_id: str = "openai/gpt-4o-mini",
    generator_id: str = "groq/meta-llama/llama-4-scout-17b-16e-instruct",
    max_diff_lines: int = 1000,
    exclude: list[str] | None = None,
) -> None:
12 changes: 0 additions & 12 deletions .pre-commit-config.yaml
@@ -74,7 +74,6 @@ repos:
- "types-PyYAML"
- "types-requests"
- "types-setuptools"
exclude: (callisto/scripts/|scripts/challenge_manager\.py)

- repo: local
hooks:
@@ -90,14 +89,3 @@ repos:
entry: .hooks/prettier.sh
language: script
types: [json, yaml]

- id: check-challenges
name: Check challenge updates
entry: python scripts/challenge_manager.py --check
language: python
pass_filenames: false
always_run: true
files: ^callisto/challenges/
additional_dependencies:
- pyyaml
- jinja2
18 changes: 18 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,18 @@
{
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
},
"editor.defaultFormatter": "charliermarsh.ruff"
},
"python.testing.pytestArgs": [
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"mypy.runUsingActiveInterpreter": true,
"debugpy.debugJustMyCode": false,
"jupyter.debugJustMyCode": false
}
34 changes: 15 additions & 19 deletions README.md
@@ -34,9 +34,9 @@ The paper is available on [arXiv](TODO) and [ACL Anthology](TODO).
- [Code for the "AIRTBench" AI Red Teaming Agent](#code-for-the-airtbench-ai-red-teaming-agent)
- [Setup](#setup)
- [Run the Evaluation](#run-the-evaluation)
- [Basic Usage](#basic-usage)
- [Challenge Filtering](#challenge-filtering)
- [Model requests](#model-requests)
- [Support the Project and Contributing](#support-the-project-and-contributing)
- [Star History](#star-history)

## Setup

@@ -50,17 +50,25 @@ uv sync

<mark>To run the code, you need access to the Dreadnode Strikes platform. See the [docs](https://docs.Dreadnode.io/strikes/overview) or join the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes)</mark>.

This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent works to solve a variety of AI ML CTF challenges from the dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and given access to execute python commands on a network-local container with custom [Dockerfile](./ai_ctf/container/Dockerfile). This example-agent is also a compliment to our research paper [AIRTBench: Can Language Models Autonomously Exploit
This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent works to solve a variety of AI/ML CTF challenges from the Dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform and is given access to execute Python commands on a network-local container with a custom [Dockerfile](./airtbench/container/Dockerfile). This example agent also complements our research paper [AIRTBench: Can Language Models Autonomously Exploit
Language Models?](https://arxiv.org/abs/TODO). # TODO: Add link to paper once published.

```bash
uv run -m ai_ctf --help
uv run -m airtbench --help
```

### Basic Usage

```bash
uv run -m airtbench --model $MODEL --project $PROJECT --platform-api-key $DREADNODE_TOKEN --token $DREADNODE_TOKEN --server https://platform.dreadnode.io --max-steps 100 --inference_timeout 240 --enable-cache --no-give-up --challenges bear1 bear2
```

### Challenge Filtering

To run the agent only against LLM-based challenges (those matching the `is_llm:true` criteria), use the following command:

```bash
uv run -m ai_ctf --model <model> --llm-challenges-only
uv run -m airtbench --model <model> --llm-challenges-only
```
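
As a rough illustration of what this filter amounts to, the sketch below keeps only entries whose `is_llm` field is true. It is not the harness's actual implementation: the reduced `Challenge` dataclass, the `select_challenges` helper, and the `example_llm` id are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    # Hypothetical, reduced stand-in: only the fields needed for filtering.
    id: str
    is_llm: bool = False


def select_challenges(challenges: list[Challenge], llm_only: bool) -> list[Challenge]:
    """Return only LLM-based challenges when llm_only is set, otherwise all of them."""
    if not llm_only:
        return list(challenges)
    return [c for c in challenges if c.is_llm]


# Hypothetical manifest contents; "example_llm" is an invented id.
all_challenges = [
    Challenge("bear1"),
    Challenge("bear2"),
    Challenge("example_llm", is_llm=True),
]
llm_challenges = select_challenges(all_challenges, llm_only=True)
print([c.id for c in llm_challenges])
```

Because `is_llm` defaults to `False`, any challenge that does not set the field is excluded when the filter is on.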

The harness will automatically build the defined number of containers with the supplied flag, and load them
@@ -73,21 +81,9 @@ as needed to ensure they are network-isolated from each other. The process is ge
5. If the CTF challenge is solved and flag is observed, the agent must submit the flag
6. Otherwise run until an error, give up, or max-steps is reached
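
The stopping conditions in the last two steps can be sketched as a small loop. This is only an assumption-laden illustration, not the agent's real control flow: `Outcome`, `run_episode`, and the `step` callable are hypothetical names, and `max_steps` mirrors the `--max-steps` flag.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    SOLVED = "solved"        # flag observed and submitted
    GAVE_UP = "gave_up"      # agent chose to stop
    ERROR = "error"          # a step raised an exception
    MAX_STEPS = "max_steps"  # step budget exhausted


def run_episode(step: Callable[[int], Optional[Outcome]], max_steps: int) -> Outcome:
    """Call step(i) until it returns a terminal Outcome, raises, or the budget runs out."""
    for i in range(max_steps):
        try:
            outcome = step(i)
        except Exception:
            return Outcome.ERROR
        if outcome is not None:
            return outcome
    return Outcome.MAX_STEPS


# Hypothetical step function: the challenge is "solved" on the third step.
result = run_episode(lambda i: Outcome.SOLVED if i == 2 else None, max_steps=100)
print(result)
```

A step returns `None` to mean "keep going", so the loop ends on exactly one of the four conditions named in the list.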

Check out [the challenge manifest](./ai_ctf/challenges/.challenges.yaml) to see current challenges in scope.
Check out [the challenge manifest](./airtbench/challenges/.challenges.yaml) to see current challenges in scope.


## Model requests

If you know of a model that may be interesting to analyze, but do not have the resources to run it yourself, feel free to open a feature request via a GitHub issue.

## Support the Project and Contributing

We welcome any issues or contributions to the project, share the treasure! If you like our project, please feel free to drop us some love <3

### Star History

[![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers)

By watching the repo, you can also be notified of any upcoming releases.

<img src="https://api.star-history.com/svg?repos=dreadnode/AIRTBench-Code&type=Date" width="600" height="400">
If you know of a model that may be interesting to analyze, but do not have the resources to run it yourself, feel free to open a feature request via a GitHub issue.
2 changes: 2 additions & 0 deletions airtbench/.gitignore
@@ -0,0 +1,2 @@
*.bak
*.removed_notebooks/
Empty file added airtbench/__init__.py
Empty file.
4 changes: 4 additions & 0 deletions airtbench/__main__.py
@@ -0,0 +1,4 @@
from .main import app

if __name__ == "__main__":
    app()
27 changes: 27 additions & 0 deletions airtbench/challenges.py
@@ -0,0 +1,27 @@
import pathlib

import yaml # type: ignore [import-untyped]
from pydantic import BaseModel

current_dir = pathlib.Path(__file__).parent
challenges_dir = current_dir / "challenges"


class Challenge(BaseModel):
    id: str
    name: str
    category: str
    difficulty: str
    notebook: str
    is_llm: bool = False


def load_challenges() -> list[Challenge]:
    """
    Load challenges from the .challenges.yaml file in the challenges directory.

    Returns:
        list[Challenge]: A list of Challenge objects.
    """
    with (challenges_dir / ".challenges.yaml").open() as f:
        return [Challenge(id=key, **challenge) for key, challenge in yaml.safe_load(f).items()]
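
To make the manifest shape that `load_challenges` expects more concrete, here is a minimal sketch. It assumes the YAML file is a mapping from challenge id to that challenge's fields, which is what the `.items()` comprehension implies. A stdlib dataclass stands in for the pydantic model and a dict literal stands in for the result of `yaml.safe_load`, so the snippet has no third-party dependencies; the field values and the `example_llm` entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    """Stdlib stand-in for the pydantic model (same fields, no validation)."""

    id: str
    name: str
    category: str
    difficulty: str
    notebook: str
    is_llm: bool = False


# Hypothetical result of yaml.safe_load() on a manifest keyed by challenge id.
manifest = {
    "bear1": {
        "name": "Bear 1",
        "category": "example-category",
        "difficulty": "easy",
        "notebook": "bear1.ipynb",
    },
    "example_llm": {
        "name": "Example LLM challenge",
        "category": "example-category",
        "difficulty": "medium",
        "notebook": "example_llm.ipynb",
        "is_llm": True,
    },
}

# Same comprehension as load_challenges: the mapping key becomes the id.
challenges = [Challenge(id=key, **fields) for key, fields in manifest.items()]
for challenge in challenges:
    print(challenge.id, challenge.is_llm)
```

Entries that omit `is_llm` fall back to the default of `False`, which is why only challenges that opt in are treated as LLM-based.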