This repository contains benchmark results for Trident Arena, compared against flagship AI models from Anthropic and OpenAI.
The files in audit-reports/ are professional audit reports that serve as the reference for which critical/high-severity issues exist in each benchmark.
We scanned each benchmark with Trident Arena, evaluated the same benchmarks with GPT-5.2xhigh and Opus 4.6, and then compared all outputs against the audits.
Each cell is shown as $x/y$, where:

- $y$ is the number of critical/high-severity issues confirmed by a professional audit
- $x$ is how many of those issues were identified by the corresponding system (Trident Arena / GPT-5.2xhigh / Opus 4.6)
| Protocol | Trident Arena | GPT-5.2xhigh | Opus 4.6 |
|---|---|---|---|
| Axelar | 5/7 | 0/7 | 0/7 |
| Bert Staking | 1/2 | 1/2 | 1/2 |
| Dexalot | 4/5 | 2/5 | 2/5 |
| Metadao | 3/3 | 1/3 | 1/3 |
| Pump Science | 1/2 | 0/2 | 1/2 |
| Watt | 7/11 | 6/11 | 6/11 |
| **Total** | 21/30 | 10/30 | 11/30 |
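As a sanity check, the totals and overall recall in the table above can be recomputed with a short script. The (found, confirmed) pairs below are hand-copied from the table; the script is illustrative and not part of the repository's tooling:

```python
# Per-protocol (x, y) cells for each system, in table order:
# Axelar, Bert Staking, Dexalot, Metadao, Pump Science, Watt.
results = {
    "Trident Arena": [(5, 7), (1, 2), (4, 5), (3, 3), (1, 2), (7, 11)],
    "GPT-5.2xhigh":  [(0, 7), (1, 2), (2, 5), (1, 3), (0, 2), (6, 11)],
    "Opus 4.6":      [(0, 7), (1, 2), (2, 5), (1, 3), (1, 2), (6, 11)],
}

for system, cells in results.items():
    found = sum(x for x, _ in cells)        # issues the system identified
    confirmed = sum(y for _, y in cells)    # audit-confirmed issues
    print(f"{system}: {found}/{confirmed} ({found / confirmed:.0%})")
# → Trident Arena: 21/30 (70%)
# → GPT-5.2xhigh: 10/30 (33%)
# → Opus 4.6: 11/30 (37%)
```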
| Protocol | Report |
|---|---|
| axelar.network | audit-reports/axelar.pdf |
| bert.global | audit-reports/bert-staking.pdf |
| dexalot.com | audit-reports/dexalot.pdf |
| metadao.fi | audit-reports/metadao.pdf |
| pump.science | audit-reports/pump-science.html |
| watt.si | audit-reports/watt.pdf |