1 | 1 | # AIRTBench Dataset - External Release |
2 | 2 |
| 3 | +- [AIRTBench Dataset - External Release](#airtbench-dataset---external-release) |
| 4 | + - [Overview](#overview) |
| 5 | + - [Dataset Statistics](#dataset-statistics) |
| 6 | + - [Model Success Rates](#model-success-rates) |
| 7 | + - [Challenge Difficulty Distribution](#challenge-difficulty-distribution) |
| 8 | + - [Data Dictionary](#data-dictionary) |
| 9 | + - [Identifiers](#identifiers) |
| 10 | + - [Primary Outcomes](#primary-outcomes) |
| 11 | + - [Performance Metrics](#performance-metrics) |
| 12 | + - [Resource Usage](#resource-usage) |
| 13 | + - [Cost Analysis](#cost-analysis) |
| 14 | + - [Conversation Content](#conversation-content) |
| 15 | + - [Error Analysis](#error-analysis) |
| 16 | + - [Usage Examples](#usage-examples) |
| 17 | + - [Basic Analysis](#basic-analysis) |
| 18 | + - [Cost Analysis](#cost-analysis-1) |
| 19 | + - [Performance Analysis](#performance-analysis) |
| 20 | + - [Conversation Content](#conversation-content-1) |
| 21 | + - [Contact](#contact) |
| 22 | + - [Version History](#version-history) |
| 23 | + |
3 | 24 | ## Overview |
4 | 25 |
5 | | -This dataset contains the complete experimental results from the AIRTBench paper: "AIRTBench: An AI Red Teaming Benchmark for Evaluating Language Models' Ability to Autonomously Discover and Exploit AI/ML Security Vulnerabilities." |
| 26 | +This dataset contains the complete experimental results from the AIRTBench paper: "*AIRTBench: An AI Red Teaming Benchmark for Evaluating Language Models' Ability to Autonomously Discover and Exploit AI/ML Security Vulnerabilities.*" |
6 | 27 |
7 | | -The dataset includes 8,066 experimental runs across 12 different language models and 70 security challenges. |
| 28 | +The dataset includes 8,066 experimental runs across 12 different language models and 70 security challenges, and is available on [Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/). |
8 | 29 |
9 | 30 | ## Dataset Statistics |
10 | 31 |