Skip to content

polymarket-developers/mkzmbd4

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

solidity-cot-auditor

Multi-role chain-of-thought LLM pipeline for Solidity security auditing

Python 3.10+ License: Apache 2.0 Slither

Install · Quick Start · How It Works · Configuration · Results


Static analyzers like Slither are fast and reliable, but their output is terse. A finding like reentrancy-eth tells you what fired, not why it matters in this specific contract, how an attacker would exploit it, or what the minimal fix looks like. This tool fills that gap.

solidity-cot-auditor takes Slither's JSON output and runs each finding through a four-role LLM chain:

Slither finding
    │
    ▼
[Explainer]  — technical explanation + true/false positive verdict
    │
    ▼
[ExploitWriter]  — minimal PoC sketch (for defenders)
    │
    ▼
[Fixer]  — unified diff of the minimal fix
    │
    ▼
[Judge]  — quality score + flags logical errors in the chain
    │
    ▼
Markdown + JSON report

Each role is a separate LLM call with a focused system prompt. The chain-of-thought is preserved in the output so you can inspect each step.

Install

pip install -e ".[dev]"
# slither is a separate install (requires solc)
pip install slither-analyzer

Quick Start

Audit a .sol file directly:

export OPENAI_API_KEY=sk-...
solidity-cot audit ./contracts/MyToken.sol --output reports/

Audit from a saved Slither JSON (useful in CI):

slither MyToken.sol --json slither_out.json
solidity-cot audit-json slither_out.json --project MyToken --source-root ./contracts

Try it on the included example:

solidity-cot audit examples/contracts/SimpleBank.sol --skip-judge

How It Works

Role separation

Each role has a narrow, well-defined job. This matters because:

  • A single "audit everything" prompt hallucinates more and produces generic output.
  • Separating roles lets you swap or skip stages (e.g., skip exploit writing for informational findings).
  • The Judge role catches when earlier roles contradict themselves or miss the point.

Contested-weighted filtering

Findings are filtered by severity before entering the chain. The default is --min-severity medium. Informational findings (pragma version, naming conventions) are skipped unless you explicitly lower the threshold.

LLM compatibility

Any OpenAI-compatible endpoint works. Point at a local vLLM server, Together AI, or Fireworks:

export LLM_BASE_URL=http://localhost:8000/v1
export LLM_MODEL=meta-llama/Llama-3-70b-instruct
export LLM_API_KEY=dummy
solidity-cot audit MyContract.sol

Anthropic Claude is also supported directly:

export LLM_PROVIDER=anthropic
export LLM_BASE_URL=https://api.anthropic.com
export LLM_MODEL=claude-sonnet-4-6
export ANTHROPIC_API_KEY=sk-ant-...
solidity-cot audit MyContract.sol

Configuration

Flag Default Description
--min-severity medium Skip findings below this level
--max-findings 20 Cap findings sent to the LLM chain
--skip-exploit off Skip the ExploitWriter role
--skip-fix off Skip the Fixer role
--skip-judge off Skip the Judge quality check
--slither-args "" Extra args forwarded to slither

Results

On SimpleBank.sol (textbook reentrancy):

Finding Severity Verdict Judge
reentrancy-eth in withdraw High TRUE_POSITIVE 4/5

The Fixer correctly identifies the Checks-Effects-Interactions fix and produces a minimal diff. The Judge flags no logical errors.

Sample output snippet
### Explanation
The `withdraw` function performs an external call (`msg.sender.call{value: amount}`) before
updating `balances[msg.sender]`. An attacker contract can re-enter `withdraw` in its fallback
function, draining the contract before the balance is decremented.

Verdict: TRUE_POSITIVE

### Exploit sketch
Attacker deploys a contract with a fallback that calls `withdraw()` again. On first entry,
balance check passes; on re-entry, balance is still non-zero (not yet decremented).

### Suggested fix
Move the state update before the external call (Checks-Effects-Interactions pattern):
```diff
-        (bool ok, ) = msg.sender.call{value: amount}("");
-        require(ok, "transfer failed");
-        balances[msg.sender] -= amount;
+        balances[msg.sender] -= amount;
+        (bool ok, ) = msg.sender.call{value: amount}("");
+        require(ok, "transfer failed");

</details>

## Roadmap

- [x] Slither JSON parser
- [x] Four-role CoT chain (Explainer → Exploit → Fixer → Judge)
- [x] Markdown + JSON report output
- [x] OpenAI-compatible endpoint support
- [ ] Mythril integration (dynamic analysis findings)
- [ ] Batch mode: audit entire Foundry project
- [ ] GitHub Actions workflow template
- [ ] Fine-tuned model support (SFT on DeFi exploit dataset)

## Running Tests

```bash
pytest

Tests use a fake LLM client — no API key needed.

License

Apache 2.0

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages