diffalayze is a versatile toolkit for automating patch diffing of binary targets and enriching the results with deep-dive analysis from large language models (LLMs). It is designed for reverse engineers, vulnerability researchers, and security teams who need to track software changes, highlight potentially security-relevant modifications, and produce actionable insights quickly.
Check out the blog post!
`diffalayze` orchestrates the entire patch diffing workflow:
- Fetches old and new versions of a binary automatically
- Runs side-by-side diffs using Ghidriff in Docker
- Analyzes results using an LLM pipeline with structured scoring and severity levels
- Archives every run for reproducibility and auditing
The result: fast, repeatable, and AI-enhanced binary patch analysis.
```
┌─ diffalayze.py          → Main orchestration (CLI, Docker, Ghidriff, LLM)
│
├─ utils/
│   ├─ llm_client.py      → Backend-agnostic LLM helper (Ollama, OpenAI, Claude)
│   ├─ llmanalyze.py      → Pocketflow-based diff analysis pipeline
│   └─ prompts.yaml       → Prompt definitions for multi-stage LLM analysis
│
└─ targets/
    └─ <product-dir>/     → Target-specific directory (binaries, diffs, archives)
        └─ fetch_target.py → Script to download, extract & compare versions
```
- **Automated Binary Diffing** – Runs Ghidriff in a Docker container to produce detailed side-by-side (SxS) diffs.
- **LLM-Powered Security Analysis** – Converts raw diffs into structured, severity-rated reports (`NONE` → `CRITICAL`).
- **Modular & Backend-Agnostic** – Plug-and-play with OpenAI, Claude (Anthropic), or Ollama.
- **Trigger-Based Actions** – Execute custom commands automatically when a severity threshold is met.
- **Versioned Archiving** – Every run is timestamped and stored for future comparison and auditing.
Pull the Ghidriff Docker image:

```
docker pull ghcr.io/clearbluejar/ghidriff:latest
```
You must have permission to run Docker containers (e.g., be a member of the `docker` group).
Set up a Python environment and install the dependencies (the `openai` and `anthropic` packages are only needed for their respective backends):

```
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install openai anthropic
```
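Optionally, you can sanity-check that your backend credentials work before a full run. A minimal check for the OpenAI backend (assumes the optional `openai` package is installed and `OPENAI_API_KEY` is exported, as shown later):

```python
# Optional: verify OpenAI credentials by listing a few available model IDs.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically
print([model.id for model in client.models.list()][:5])
```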
A target lives in `targets/<product-name>/` and must include a `fetch_target.py` script that:

- Downloads the old and new binary versions
- Checks whether a new version is available
- Outputs the absolute paths to both versions, e.g.:
```
/home/user/diffalayze/targets/mrxsmb/old.mrxsmb.sys
/home/user/diffalayze/targets/mrxsmb/new.mrxsmb.sys
```
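For orientation, here is a minimal, purely illustrative `fetch_target.py` sketch. The URLs, file names, and version check are placeholders (not from this project); the only contract taken from this README is that the script downloads both versions, checks whether a new version exists, and outputs the absolute paths.

```python
#!/usr/bin/env python3
"""Illustrative fetch_target.py sketch -- replace the placeholder URLs,
file names, and version check with real logic for your target."""
from pathlib import Path
from urllib.request import urlretrieve

TARGET_DIR = Path(__file__).resolve().parent

# Placeholder download locations -- substitute your vendor's real URLs.
OLD_URL = "https://example.com/downloads/product-1.0.0.bin"
NEW_URL = "https://example.com/downloads/product-1.0.1.bin"


def fetch(url: str, dest: Path) -> Path:
    """Download url to dest unless a cached copy already exists."""
    if not dest.exists():
        urlretrieve(url, dest)
    return dest


def main() -> None:
    old_path = fetch(OLD_URL, TARGET_DIR / "old.product.bin")
    new_path = fetch(NEW_URL, TARGET_DIR / "new.product.bin")

    # Naive "is a new version available?" check: the two files must differ.
    if old_path.read_bytes() == new_path.read_bytes():
        print("No new version available.")
        return

    # Output the absolute paths to both versions (match the format used by
    # the real examples shipped in targets/).
    print(old_path)
    print(new_path)


if __name__ == "__main__":
    main()
```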
Tip: For Windows binaries, you can use the provided Winbindex-based example in `targets/`. Just adjust `dbfile`, `filename`, and `windows_version`.
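Those are presumably plain configuration values near the top of that example script; a hypothetical illustration (the values shown are made up, pick ones that match your target):

```python
# Hypothetical configuration values for the Winbindex-based example.
# The names come from this README; the values are placeholders.
dbfile = "mrxsmb.sys.json"   # Winbindex metadata database for the binary
filename = "mrxsmb.sys"      # binary to track
windows_version = "11-23H2"  # Windows release to pull builds for
```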
`diffalayze` supports any architecture Ghidra can handle – not just Windows binaries.
Show all available options:

```
python3 diffalayze.py -h
```
Example end-to-end run with the OpenAI backend:

```
export OPENAI_API_KEY="sk-..."
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
python3 diffalayze.py all -f -a -lb openai -lm gpt-5
```
Reports are saved to `targets/<target>/archive/ghidriffs_<timestamp>/analysis.md`.
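Because every run is archived, older reports remain available for comparison. For example, to list a target's archived reports newest-first (path layout taken from above; `mrxsmb` is just the example target):

```python
# List archived analysis reports for one target, newest first.
from pathlib import Path

archive = Path("targets/mrxsmb/archive")
for report in sorted(archive.glob("ghidriffs_*/analysis.md"), reverse=True):
    print(report)
```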
Example runs against a single target (here `tdx-1607`):

```
python3 diffalayze.py tdx-1607 -f -a -lb openai -lm gpt-5-nano -lv
python3 diffalayze.py tdx-1607 -f
```

Run the LLM analysis pipeline standalone on an existing SxS diff output (Ollama backend):

```
python3 utils/llmanalyze.py -i <path-to-sxs-output> -v -lb ollama -tn TDX
```

Run `./notify.sh` automatically once the severity threshold (`MEDIUM`) is met:

```
python3 diffalayze.py tdx-1607 -f -a -llt MEDIUM -ltc ./notify.sh -lb openai -lm gpt-5-nano
```
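The trigger command (`-ltc`) can be any executable. As a hedged illustration, assuming diffalayze simply executes the configured command once the threshold is met, a minimal Python-based hook you could point `-ltc` at instead of `notify.sh` might look like this:

```python
#!/usr/bin/env python3
"""Hypothetical trigger hook for -ltc.

Assumption: diffalayze simply runs this command when the severity threshold
(-llt) is reached; swap the body for your real alerting (Slack, email, ...).
"""
from datetime import datetime
from pathlib import Path

ALERT_LOG = Path("diffalayze_alerts.log")  # hypothetical log location


def main() -> None:
    stamp = datetime.now().isoformat(timespec="seconds")
    with ALERT_LOG.open("a") as fh:
        fh.write(f"{stamp} severity threshold reached -- review the latest analysis.md\n")


if __name__ == "__main__":
    main()
```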
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | API key for OpenAI |
| `ANTHROPIC_API_KEY` | API key for Claude (Anthropic) |
| `OLLAMA_URL` | Ollama endpoint (default: `http://localhost:11434`) |
| `SSL_CERT_FILE` | CA certificate file path |
`prompts.yaml` defines three stages:

- `per_doc` – Analyzes each diffed function in isolation
- `final_synthesis` – Combines per-function insights into a full report
- `evaluation` – Outputs a structured JSON with `level`, `score`, `summary`, and recommended actions

Easily customizable to match your security policy or analysis style.
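For reference, a hypothetical example of what the `evaluation` stage's structured output might contain, written here as a Python literal. Only the field names (`level`, `score`, `summary`, recommended actions) come from this README; the exact keys, value formats, and score scale are assumptions:

```python
# Hypothetical evaluation result -- field names from the README; everything
# else (key spellings, value types, score scale) is assumed.
evaluation = {
    "level": "MEDIUM",   # somewhere in the NONE -> CRITICAL range
    "score": 6.5,        # numeric severity score (scale assumed)
    "summary": "<short description of the security-relevant change>",
    "recommended_actions": [
        "<manual review step>",
        "<follow-up triage step>",
    ],
}
```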
- **Token usage** – Expect ~8K tokens per changed function + ~5K for the summary (varies by code size and model).
- **Human verification required** – LLM results are assistive, not authoritative – always review findings manually.
This project builds on the excellent work of the Ghidriff, Ghidra, Pocketflow, and Winbindex projects.
Use responsibly and only with permission from all relevant parties. This toolkit is intended for educational and research purposes only.