Clouseau is a hierarchical multi-agent system for iterative cyber attack investigation. Starting from a point of interest (e.g., a suspicious domain), it autonomously explores incident data sources, issues targeted queries, analyzes evidence, and incrementally reconstructs an attack narrative. Clouseau operates without training or predefined heuristics, instead leveraging LLMs' inherent understanding of systems and security to correlate and reason about incident data. Read the paper here.
artifact— source code for Clouseau, including prompts and processed scenarios for testing.claims— instructions and scripts to reproduce paper's main claims.
Create a virtual environment first with conda or any other alternatives. We tested this with Python 3.12.11.
pip install -r artifact/requirements.txtYou will need an OpenAI key to reproduce the results, please set API_KEY properly before running any script.
A run of clouseau to reproduce results for Single Host scenarios (Section 6.1). Run:
./claims/claim1/run.shThen inspect claims/claim1/average.csv, recall, precision and f1 should above 95%.
A run of clouseau to reproduce results for Single Host Extended scenarios (Section 6.1). Run:
./claims/claim2/run.shThen inspect claims/claim2/average.csv, recall, precision and f1 should above 95%.
A run of clouseau to reproduce results for Keyword Sensitivity scenarios (Section 6.2). Run:
./claims/claim3/run.shThen inspect claims/claim3/average.csv, recall, precision and f1 should above 95%.
Want to go beyond the paper claims? Use the CLI to explore, tweak, and run targeted experiments:
cd artifact
python app.py --helpThis shows how to:
- switch the underlying LLM model.
- enable single‑agent mode.
- evaluate on OpTC datasets.
To run single exeperiments, you can utilize the notebook artifact/clouseau.ipynb.