Multi-role chain-of-thought LLM pipeline for Solidity security auditing
Install · Quick Start · How It Works · Configuration · Results
Static analyzers like Slither are fast and reliable, but their output is terse. A finding like reentrancy-eth tells you what fired, not why it matters in this specific contract, how an attacker would exploit it, or what the minimal fix looks like. This tool fills that gap.
solidity-cot-auditor takes Slither's JSON output and runs each finding through a four-role LLM chain:
Slither finding
│
▼
[Explainer] — technical explanation + true/false positive verdict
│
▼
[ExploitWriter] — minimal PoC sketch (for defenders)
│
▼
[Fixer] — unified diff of the minimal fix
│
▼
[Judge] — quality score + flags logical errors in the chain
│
▼
Markdown + JSON report
Each role is a separate LLM call with a focused system prompt. The chain-of-thought is preserved in the output so you can inspect each step.
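As a rough illustration of the chain described above (not the tool's actual implementation), the sketch below runs four roles in order and keeps every intermediate output. The `LLM` callable type, the prompt texts, `run_chain`, and `echo_llm` are all hypothetical names introduced for this example, and passing earlier outputs forward as context is an assumption about how the chain is wired.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet

# Hypothetical type: any callable taking (system_prompt, user_prompt) and returning text.
LLM = Callable[[str, str], str]

# Illustrative system prompts; the tool's real prompts are not shown in this README.
ROLE_PROMPTS: Dict[str, str] = {
    "Explainer": "Explain the finding and give a TRUE_POSITIVE or FALSE_POSITIVE verdict.",
    "ExploitWriter": "Write a minimal proof-of-concept sketch for defenders.",
    "Fixer": "Produce a unified diff of the minimal fix.",
    "Judge": "Score the chain from 1 to 5 and flag logical errors.",
}


@dataclass
class ChainResult:
    finding: str
    steps: Dict[str, str] = field(default_factory=dict)  # role name -> raw output


def run_chain(finding: str, llm: LLM, skip: FrozenSet[str] = frozenset()) -> ChainResult:
    """Run each role as a separate call, preserving every step for inspection."""
    result = ChainResult(finding=finding)
    for role, system_prompt in ROLE_PROMPTS.items():
        if role in skip:
            continue
        # Assumption: each role sees the finding plus all earlier role outputs.
        context = "\n\n".join(f"[{name}]\n{text}" for name, text in result.steps.items())
        user_prompt = f"Finding:\n{finding}\n\n{context}".strip()
        result.steps[role] = llm(system_prompt, user_prompt)
    return result


def echo_llm(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"({len(user_prompt)} chars of context) {system_prompt}"


if __name__ == "__main__":
    out = run_chain("reentrancy-eth in withdraw", echo_llm, skip=frozenset({"Judge"}))
    for role, text in out.steps.items():
        print(f"{role}: {text}")
```

Because `steps` keeps each role's raw output keyed by role name, a report writer can render the full chain rather than only the final verdict.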
## Install

    pip install -e ".[dev]"
    # slither is a separate install (requires solc)
    pip install slither-analyzer

## Quick Start

Audit a .sol file directly:

    export OPENAI_API_KEY=sk-...
    solidity-cot audit ./contracts/MyToken.sol --output reports/

Audit from a saved Slither JSON (useful in CI):

    slither MyToken.sol --json slither_out.json
    solidity-cot audit-json slither_out.json --project MyToken --source-root ./contracts

Try it on the included example:

    solidity-cot audit examples/contracts/SimpleBank.sol --skip-judge

## How It Works

Each role has a narrow, well-defined job. This matters because:
- A single "audit everything" prompt hallucinates more and produces generic output.
- Separating roles lets you swap or skip stages (e.g., skip exploit writing for informational findings).
- The Judge role catches when earlier roles contradict themselves or miss the point.
Findings are filtered by severity before entering the chain. The default is --min-severity medium. Informational findings (pragma version, naming conventions) are skipped unless you explicitly lower the threshold.
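A minimal sketch of such a severity filter, assuming the Slither JSON layout in which findings live under `results.detectors` and each carries an `impact` field. The `filter_findings` and `rank` helpers are hypothetical, and both the position of `optimization` at the bottom of the ordering and the most-severe-first sort before capping are assumptions, not confirmed behavior of the tool.

```python
from typing import Any, Dict, List

# Slither impact levels, lowest to highest (placement of "optimization" is an assumption).
SEVERITY_ORDER = ["optimization", "informational", "low", "medium", "high"]


def rank(finding: Dict[str, Any]) -> int:
    """Map a finding's impact to a number; unknown values rank lowest."""
    impact = str(finding.get("impact", "")).lower()
    return SEVERITY_ORDER.index(impact) if impact in SEVERITY_ORDER else 0


def filter_findings(
    slither_json: Dict[str, Any],
    min_severity: str = "medium",
    max_findings: int = 20,
) -> List[Dict[str, Any]]:
    """Keep findings at or above min_severity, most severe first, capped at max_findings."""
    threshold = SEVERITY_ORDER.index(min_severity.lower())
    detectors = (slither_json.get("results") or {}).get("detectors", [])
    kept = [d for d in detectors if rank(d) >= threshold]
    kept.sort(key=rank, reverse=True)  # stable: ties keep Slither's original order
    return kept[:max_findings]


if __name__ == "__main__":
    sample = {
        "results": {
            "detectors": [
                {"check": "solc-version", "impact": "Informational"},
                {"check": "unchecked-lowlevel", "impact": "Medium"},
                {"check": "reentrancy-eth", "impact": "High"},
            ]
        }
    }
    print([d["check"] for d in filter_findings(sample)])
    # ['reentrancy-eth', 'unchecked-lowlevel']
```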
## Configuration

Any OpenAI-compatible endpoint works. Point at a local vLLM server, Together AI, or Fireworks:

    export LLM_BASE_URL=http://localhost:8000/v1
    export LLM_MODEL=meta-llama/Llama-3-70b-instruct
    export LLM_API_KEY=dummy
    solidity-cot audit MyContract.sol

Anthropic Claude is also supported directly:

    export LLM_PROVIDER=anthropic
    export LLM_BASE_URL=https://api.anthropic.com
    export LLM_MODEL=claude-sonnet-4-6
    export ANTHROPIC_API_KEY=sk-ant-...
    solidity-cot audit MyContract.sol

| Flag | Default | Description |
|---|---|---|
| `--min-severity` | `medium` | Skip findings below this level |
| `--max-findings` | `20` | Cap findings sent to the LLM chain |
| `--skip-exploit` | off | Skip the ExploitWriter role |
| `--skip-fix` | off | Skip the Fixer role |
| `--skip-judge` | off | Skip the Judge quality check |
| `--slither-args` | `""` | Extra args forwarded to slither |
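To make the flag semantics concrete, here is a small `argparse` sketch that mirrors the table above. It is not the tool's real CLI code: `build_parser`, `stages_to_run`, and `slither_command` are hypothetical helpers, and treating the Explainer as always-on is an assumption based on the absence of a skip flag for it.

```python
import argparse
import shlex
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Flags and defaults copied from the table; everything else is illustrative."""
    parser = argparse.ArgumentParser(prog="solidity-cot audit")
    parser.add_argument("contract", help="path to a .sol file")
    parser.add_argument("--min-severity", default="medium")
    parser.add_argument("--max-findings", type=int, default=20)
    parser.add_argument("--skip-exploit", action="store_true")
    parser.add_argument("--skip-fix", action="store_true")
    parser.add_argument("--skip-judge", action="store_true")
    parser.add_argument("--slither-args", default="")
    return parser


def stages_to_run(args: argparse.Namespace) -> List[str]:
    """Translate the skip flags into an ordered list of roles."""
    stages = ["Explainer"]  # assumption: no flag disables the Explainer
    if not args.skip_exploit:
        stages.append("ExploitWriter")
    if not args.skip_fix:
        stages.append("Fixer")
    if not args.skip_judge:
        stages.append("Judge")
    return stages


def slither_command(args: argparse.Namespace) -> List[str]:
    """Build a slither invocation, splitting the extra args like a shell would."""
    extra = shlex.split(args.slither_args)
    return ["slither", args.contract, "--json", "slither_out.json", *extra]


if __name__ == "__main__":
    argv = ["MyContract.sol", "--skip-judge", "--slither-args=--filter-paths node_modules"]
    args = build_parser().parse_args(argv)
    print(stages_to_run(args))
    print(slither_command(args))
```

The `--slither-args=...` form is used in the example so that a value beginning with `--` is unambiguously read as the option's argument.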
## Results

On SimpleBank.sol (textbook reentrancy):

| Finding | Severity | Verdict | Judge |
|---|---|---|---|
| `reentrancy-eth` in `withdraw` | High | TRUE_POSITIVE | 4/5 |
The Fixer correctly identifies the Checks-Effects-Interactions fix and produces a minimal diff. The Judge flags no logical errors.
<details>
<summary>Sample output snippet</summary>

### Explanation
The `withdraw` function performs an external call (`msg.sender.call{value: amount}`) before
updating `balances[msg.sender]`. An attacker contract can re-enter `withdraw` in its fallback
function, draining the contract before the balance is decremented.
Verdict: TRUE_POSITIVE
### Exploit sketch
Attacker deploys a contract with a fallback that calls `withdraw()` again. On first entry,
balance check passes; on re-entry, balance is still non-zero (not yet decremented).
### Suggested fix
Move the state update before the external call (Checks-Effects-Interactions pattern):
```diff
- (bool ok, ) = msg.sender.call{value: amount}("");
- require(ok, "transfer failed");
- balances[msg.sender] -= amount;
+ balances[msg.sender] -= amount;
+ (bool ok, ) = msg.sender.call{value: amount}("");
+ require(ok, "transfer failed");
```

</details>
## Roadmap
- [x] Slither JSON parser
- [x] Four-role CoT chain (Explainer → Exploit → Fixer → Judge)
- [x] Markdown + JSON report output
- [x] OpenAI-compatible endpoint support
- [ ] Mythril integration (dynamic analysis findings)
- [ ] Batch mode: audit entire Foundry project
- [ ] GitHub Actions workflow template
- [ ] Fine-tuned model support (SFT on DeFi exploit dataset)
## Running Tests
```bash
pytest
```

Tests use a fake LLM client, so no API key is needed.
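For readers curious what a fake-client test can look like, here is a hedged sketch in pytest style. `FakeLLM`, its `complete` method, and the toy `explain` function are hypothetical and written only for this example; the project's real test fixtures may be shaped differently.

```python
from typing import Dict, List, Tuple


class FakeLLM:
    """Hypothetical stand-in for a real client: returns canned replies and records calls."""

    def __init__(self, replies: List[str]) -> None:
        self._replies = list(replies)
        self.calls: List[Tuple[str, str]] = []

    def complete(self, system_prompt: str, user_prompt: str) -> str:
        self.calls.append((system_prompt, user_prompt))
        return self._replies.pop(0)


def explain(finding: str, llm: FakeLLM) -> Dict[str, str]:
    """Toy Explainer: asks the model, then parses a 'Verdict:' line from the reply."""
    text = llm.complete("You are the Explainer.", finding)
    verdict = "UNKNOWN"
    for line in text.splitlines():
        if line.startswith("Verdict:"):
            verdict = line.split(":", 1)[1].strip()
    return {"explanation": text, "verdict": verdict}


def test_explainer_parses_verdict() -> None:
    fake = FakeLLM(["External call before state update.\nVerdict: TRUE_POSITIVE"])
    result = explain("reentrancy-eth in withdraw", fake)
    assert result["verdict"] == "TRUE_POSITIVE"
    # Exactly one model call, and it received the finding as the user prompt.
    assert fake.calls == [("You are the Explainer.", "reentrancy-eth in withdraw")]


def test_missing_verdict_is_unknown() -> None:
    fake = FakeLLM(["No verdict line here."])
    assert explain("some finding", fake)["verdict"] == "UNKNOWN"
```

Since the fake returns canned text, tests like these are deterministic and never touch the network.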
## License

Apache 2.0