📄 Experiments Paper | 📄 Framework Paper | 📘 Documentation | ⚙️ Installation
The first framework designed to build and experiment with provenance-based intrusion detection systems (PIDSs) using deep learning architectures. It provides a single codebase to run the most recent state-of-the-art systems and to easily customize them into new variants.
The framework currently integrates the following PIDSs.
| PIDS | Venue | Paper |
|---|---|---|
| Velox | USENIX Security 2025 | Link |
| Orthrus | USENIX Security 2025 | Link |
| R-Caid | IEEE S&P 2024 | Link |
| Flash | IEEE S&P 2024 | Link |
| Kairos | IEEE S&P 2024 | Link |
| Magic | USENIX Security 2024 | Link |
| NodLink | NDSS 2024 | Link |
| ThreaTrace | IEEE TIFS 2022 | Link |
It also includes several easy-to-install provenance datasets for APT detection.
| Dataset | OS | Attacks | Size (GB) |
|---|---|---|---|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Android | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Windows | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Android | 2 | 49 |
| FIVEDIRECTIONS_E5 | Windows | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
Comprehensive documentation is available, explaining all possible arguments and providing examples of how to integrate new systems.
The framework integrates a pipeline composed of seven stages, each parameterizable via configurable arguments, enabling flexible customization of new systems.
```shell
git clone https://github.com/ubc-provenance/PIDSMaker.git
```
We have made the installation of PIDSMaker, including pre-processed databases for the DARPA TC and OpTC datasets, easy and fast. Simply follow these guidelines.
Once you have followed the installation guidelines, you can open a shell in the pids container and experiment in multiple ways.
Replace SYSTEM with one of velox, orthrus, nodlink, threatrace, kairos, rcaid, flash, or magic, and DATASET with a dataset name from the table above (for example CADETS_E3).
- Run in the shell:

  ```shell
  python pidsmaker/main.py SYSTEM DATASET
  ```

- Run in the shell, monitored with Weights & Biases (W&B):

  ```shell
  python pidsmaker/main.py SYSTEM DATASET --wandb
  ```

- Run in the background, monitored with W&B (recommended for multiple parallel runs and for research):

  ```shell
  ./run.sh SYSTEM DATASET
  ```

  You can still watch the logs in your shell using `tail -f nohup.out`.
We generally recommend using W&B for experiment monitoring and for keeping a history of runs (see installation guidelines).
Warning: Before performing evaluations, you should tune all systems (see docs here).
PIDSs exhibit significant instability—that is, high sensitivity to training perturbations—due to their self-supervised training nature. Running the same configuration with different random seeds or minor hyperparameter changes often yields substantially different results. Consequently, reproducing results as the framework evolves presents a real challenge.
Based on our experiments, we provide tuned hyperparameters for the main systems. However, we can't guarantee that these hyperparameters will lead to satisfactory results due to instability.
We recommend running each system multiple times to increase the likelihood of obtaining a run with good metrics. Alternatively, you can perform hyperparameter tuning for each system.
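Repeating a run can be scripted. The sketch below is a minimal, hypothetical helper (not part of PIDSMaker) that builds the `python pidsmaker/main.py SYSTEM DATASET` command shown earlier and launches it several times in a row. It assumes it is started from the repository root inside the pids container. Whether consecutive runs use different random seeds is not stated here, so check the documentation before relying on it.

```python
import subprocess

# System names as listed in this README.
SYSTEMS = ("velox", "orthrus", "nodlink", "threatrace",
           "kairos", "rcaid", "flash", "magic")


def build_command(system, dataset):
    """Build the documented command line for one run."""
    if system not in SYSTEMS:
        raise ValueError(f"unknown system: {system!r}")
    return ["python", "pidsmaker/main.py", system, dataset]


def run_repeated(system, dataset, n_runs=3, dry_run=True):
    """Launch the same configuration n_runs times, one after another.

    With dry_run=True (the default) the commands are only printed,
    nothing is executed, and an empty list is returned.
    Otherwise the exit code of each run is collected and returned.
    """
    command = build_command(system, dataset)
    return_codes = []
    for run_index in range(n_runs):
        print(f"run {run_index + 1}/{n_runs}: {' '.join(command)}")
        if not dry_run:
            completed = subprocess.run(command, check=False)
            return_codes.append(completed.returncode)
    return return_codes
```

Calling `run_repeated("kairos", "CADETS_E3", n_runs=5, dry_run=False)` would execute five sequential runs. Comparing their metrics is left to the generated figures or to W&B.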
The default configuration files in config/*.yml represent the architecture of existing PIDSs in YAML format. They contain the original hyperparameters used by each system.
The main strength of PIDSMaker is the customization of existing systems for easy experimentation. A few examples below.
Running Kairos with an embedding size of 128 instead of 100, and last-neighbor sampling over the 10 most recent neighbors instead of 20:

```shell
python pidsmaker/main.py kairos CADETS_E3 \
  --training.node_hid_dim=128 \
  --batching.intra_graph_batching.tgn_last_neighbor.tgn_neighbor_size=10
```

Running Orthrus with doc2vec instead of word2vec, and 3 GraphSAGE layers instead of 2 attention layers:
```shell
python pidsmaker/main.py orthrus CADETS_E3 \
  --featurization.used_method=doc2vec \
  --featurization.emb_dim=128 \
  --training.encoder.used_methods=tgn,sage \
  --training.encoder.sage.num_layers=3
```

Want to create a new PIDS? Create a new config under `config/your_system.yml`, inherit from existing PIDSs, and tune it as you want.
For example, the following config runs Magic with node type prediction instead of its hybrid objective (masked feature reconstruction plus structure prediction), uses a 2-layer MLP with ReLU as the decoder, and applies NodLink's thresholding method:

```yaml
_include_yml: magic

training:
  decoder:
    used_methods: predict_node_type
    predict_node_type:
      node_mlp:
        architecture_str: linear(0.5) | relu

evaluation:
  node_evaluation:
    threshold_method: nodlink
```

You can then visualize the results using the many generated figures, locally or on Weights & Biases.
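Both customization styles shown earlier, dotted command-line arguments such as `--training.node_hid_dim=128` and a child config that inherits from a parent via `_include_yml`, can be thought of as overrides on a nested configuration. The sketch below illustrates that idea in plain Python. It is an assumption about the general mechanism, not PIDSMaker's actual implementation. The helper names and the parent config contents are hypothetical placeholders.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_dotted_override(config, dotted_key, value):
    """Set config['a']['b']['c'] = value for dotted_key 'a.b.c' (in place)."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# Hypothetical parent config: the real contents of the Magic config differ.
parent = {
    "training": {"decoder": {"used_methods": "placeholder_objective"}},
    "evaluation": {"node_evaluation": {"threshold_method": "placeholder_threshold"}},
}

# Child overrides, using keys taken from the YAML example in this README.
child = {
    "training": {"decoder": {"used_methods": "predict_node_type"}},
    "evaluation": {"node_evaluation": {"threshold_method": "nodlink"}},
}

config = deep_merge(parent, child)

# Comparable in spirit to passing --training.node_hid_dim=128 on the command line.
apply_dotted_override(config, "training.node_hid_dim", 128)
```

In this reading, the child only states what differs from the parent, and anything it leaves out keeps the parent's value.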
PIDSMaker supports easy hyperparameter tuning for existing or new models. Follow the instructions available in our documentation.
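For intuition, a grid search runs one trial per combination of the candidate values. The sketch below enumerates such a grid in plain Python, using the same parameter names and values as the search configuration in this section. It only illustrates the size of the search space and is not how PIDSMaker schedules its trials. `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Same parameters and candidate values as the search config in this section.
grid = {
    "training.lr": [0.001, 0.0001],
    "training.node_hid_dim": [32, 64, 128, 256],
    "featurization.used_method": ["fasttext", "word2vec"],
}


def expand_grid(grid):
    """Return one dict per combination of candidate values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


trials = expand_grid(grid)
print(len(trials))  # 2 * 4 * 2 = 16 combinations
```

Since each added parameter multiplies the number of trials, and each trial is a full training run, it helps to keep the grid small.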
You can specify the range of hyperparameters to search in a YAML config.

```yaml
method: grid
parameters:
  training.lr:
    values: [0.001, 0.0001]
  training.node_hid_dim:
    values: [32, 64, 128, 256]
  featurization.used_method:
    values: [fasttext, word2vec]
```

Then run the framework in tuning mode.
```shell
./run.sh my_system CADETS_E3 --tuning_mode=hyperparameters
```

Once you find the best hyperparameters, store them in a YAML file and run your tuned model.
```shell
./run.sh my_system CADETS_E3 --tuned
```

If you use this work, please cite the following two papers:
```
@article{bilot2026pidsmaker,
  title={PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems},
  author={Bilot, Tristan and Jiang, Baoxiang and Pasquier, Thomas},
  journal={arXiv preprint arXiv:2601.22983},
  year={2026}
}

@inproceedings{bilot2025simpler,
  title={{Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems}},
  author={Bilot, Tristan and Jiang, Baoxiang and Li, Zefeng and El Madhoun, Nour and Al Agha, Khaldoun and Zouaoui, Anis and Pasquier, Thomas},
  booktitle={Security Symposium (USENIX Sec'25)},
  year={2025},
  organization={USENIX}
}
```
Pull requests are welcome! Please follow the contribution guidelines.
See licence.

