solidity-cot-auditor

Multi-role chain-of-thought LLM pipeline for Solidity security auditing

Install · Quick Start · How It Works · Configuration · Results

Static analyzers like Slither are fast and reliable, but their output is terse. A finding like reentrancy-eth tells you what fired, not why it matters in this specific contract, how an attacker would exploit it, or what the minimal fix looks like. This tool fills that gap.

solidity-cot-auditor takes Slither's JSON output and runs each finding through a four-role LLM chain:

Slither finding
    │
    ▼
[Explainer]  — technical explanation + true/false positive verdict
    │
    ▼
[ExploitWriter]  — minimal PoC sketch (for defenders)
    │
    ▼
[Fixer]  — unified diff of the minimal fix
    │
    ▼
[Judge]  — quality score + flags logical errors in the chain
    │
    ▼
Markdown + JSON report

Each role is a separate LLM call with a focused system prompt. The chain-of-thought is preserved in the output so you can inspect each step.

Install

pip install -e ".[dev]"
# slither is a separate install (requires solc)
pip install slither-analyzer

Quick Start

Audit a .sol file directly:

export OPENAI_API_KEY=sk-...
solidity-cot audit ./contracts/MyToken.sol --output reports/

Audit from a saved Slither JSON (useful in CI):

slither MyToken.sol --json slither_out.json
solidity-cot audit-json slither_out.json --project MyToken --source-root ./contracts

Try it on the included example:

solidity-cot audit examples/contracts/SimpleBank.sol --skip-judge

How It Works

Role separation

Each role has a narrow, well-defined job. This matters because:

A single "audit everything" prompt hallucinates more and produces generic output.
Separating roles lets you swap or skip stages (e.g., skip exploit writing for informational findings).
The Judge role catches when earlier roles contradict themselves or miss the point.

Contested-weighted filtering

Findings are filtered by severity before entering the chain. The default is --min-severity medium. Informational findings (pragma version, naming conventions) are skipped unless you explicitly lower the threshold.

LLM compatibility

Any OpenAI-compatible endpoint works. Point at a local vLLM server, Together AI, or Fireworks:

export LLM_BASE_URL=http://localhost:8000/v1
export LLM_MODEL=meta-llama/Llama-3-70b-instruct
export LLM_API_KEY=dummy
solidity-cot audit MyContract.sol

Anthropic Claude is also supported directly:

export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.anthropic.com
export LLM_MODEL=claude-sonnet-4-6
export ANTHROPIC_API_KEY=sk-ant-...
solidity-cot audit MyContract.sol

Configuration

Flag	Default	Description
`--min-severity`	`medium`	Skip findings below this level
`--max-findings`	`20`	Cap findings sent to the LLM chain
`--skip-exploit`	off	Skip the ExploitWriter role
`--skip-fix`	off	Skip the Fixer role
`--skip-judge`	off	Skip the Judge quality check
`--slither-args`	`""`	Extra args forwarded to slither

Results

On SimpleBank.sol (textbook reentrancy):

Finding	Severity	Verdict	Judge
reentrancy-eth in `withdraw`	High	TRUE_POSITIVE	4/5

The Fixer correctly identifies the Checks-Effects-Interactions fix and produces a minimal diff. The Judge flags no logical errors.

Sample output snippet

### Explanation
The `withdraw` function performs an external call (`msg.sender.call{value: amount}`) before
updating `balances[msg.sender]`. An attacker contract can re-enter `withdraw` in its fallback
function, draining the contract before the balance is decremented.

Verdict: TRUE_POSITIVE

### Exploit sketch
Attacker deploys a contract with a fallback that calls `withdraw()` again. On first entry,
balance check passes; on re-entry, balance is still non-zero (not yet decremented).

### Suggested fix
Move the state update before the external call (Checks-Effects-Interactions pattern):
```diff
-        (bool ok, ) = msg.sender.call{value: amount}("");
-        require(ok, "transfer failed");
-        balances[msg.sender] -= amount;
+        balances[msg.sender] -= amount;
+        (bool ok, ) = msg.sender.call{value: amount}("");
+        require(ok, "transfer failed");


</details>

## Roadmap

- [x] Slither JSON parser
- [x] Four-role CoT chain (Explainer → Exploit → Fixer → Judge)
- [x] Markdown + JSON report output
- [x] OpenAI-compatible endpoint support
- [ ] Mythril integration (dynamic analysis findings)
- [ ] Batch mode: audit entire Foundry project
- [ ] GitHub Actions workflow template
- [ ] Fine-tuned model support (SFT on DeFi exploit dataset)

## Running Tests

```bash
pytest

Tests use a fake LLM client — no API key needed.

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github		.github
examples/contracts		examples/contracts
solidity_cot		solidity_cot
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

solidity-cot-auditor

Install

Quick Start

How It Works

Role separation

Contested-weighted filtering

LLM compatibility

Configuration

Results

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

solidity-cot-auditor

Install

Quick Start

How It Works

Role separation

Contested-weighted filtering

LLM compatibility

Configuration

Results

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages