GitHub - ubc-provenance/PIDSMaker: A framework for building provenance-based intrusion detection systems with neural networks

📄 Experiments Paper | 📄 Framework Paper | 📘 Documentation | ⚙️ Installation

PIDSMaker is the first framework designed to build and experiment with provenance-based intrusion detection systems (PIDSs) using deep learning architectures. It provides a single codebase to run most recent state-of-the-art systems and to easily customize them into new variants.

Supported Systems

The framework currently integrates the following PIDSs.

| PIDS | Venue | Paper |
|---|---|---|
| Velox | USENIX Security 2025 | Link |
| Orthrus | USENIX Security 2025 | Link |
| R-Caid | IEEE S&P 2024 | Link |
| Flash | IEEE S&P 2024 | Link |
| Kairos | IEEE S&P 2024 | Link |
| Magic | USENIX Security 2024 | Link |
| NodLink | NDSS 2024 | Link |
| ThreaTrace | IEEE TIFS 2022 | Link |

Supported Datasets

It also includes several easy-to-install provenance datasets for APT detection.

| Dataset | OS | Attacks | Size (GB) |
|---|---|---|---|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Android | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Windows | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Android | 2 | 49 |
| FIVEDIRECTIONS_E5 | Windows | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |

📄 Documentation

Comprehensive documentation is available. It explains all possible arguments and provides examples of how to integrate new systems.

Pipeline

The framework integrates a pipeline composed of seven stages, each parameterizable via configurable arguments, enabling flexible customization of new systems.

Setup

⬇️ Clone the repo

```
git clone https://github.com/ubc-provenance/PIDSMaker.git
```

💻 Installation with Docker

We have made installing PIDSMaker, including pre-processed databases for the DARPA TC and OpTC datasets, easy and fast. Simply follow these guidelines.

🧪 Basic usage of the framework

Once you have followed the installation guidelines, you can open a shell in the pids container and experiment in multiple ways. Replace SYSTEM with one of velox, orthrus, nodlink, threatrace, kairos, rcaid, flash, or magic, and DATASET with a dataset name such as CADETS_E3.

Run in the shell:
```
python pidsmaker/main.py SYSTEM DATASET
```
Run in the shell, with monitoring on Weights & Biases (W&B):
```
python pidsmaker/main.py SYSTEM DATASET --wandb
```
Run in the background, with monitoring on W&B (recommended for multiple parallel runs and for research):
```
./run.sh SYSTEM DATASET
```

You can still watch the logs in your shell using tail -f nohup.out.

We generally recommend using W&B for experiment monitoring and history tracking (see installation guidelines).

Warning: Before performing evaluations, you should tune all systems (see docs here).

Reproducing results

PIDSs exhibit significant instability—that is, high sensitivity to training perturbations—due to their self-supervised training nature. Running the same configuration with different random seeds or minor hyperparameter changes often yields substantially different results. Consequently, reproducing results as the framework evolves presents a real challenge.

Based on our experiments, we provide tuned hyperparameters for the main systems. However, we can't guarantee that these hyperparameters will lead to satisfactory results due to instability.

We recommend running each system multiple times to increase the likelihood of obtaining a run with good metrics. Alternatively, you can perform hyperparameter tuning for each system.
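
The repeated runs recommended above can be scripted. The following is a minimal sketch, not part of PIDSMaker: the helpers `build_command` and `run_repeatedly` are hypothetical names, and the only things taken from this README are the `python pidsmaker/main.py SYSTEM DATASET` invocation and the `--wandb` flag. It sets no seed, since this README documents no seed argument.

```python
import subprocess


def build_command(system, dataset, use_wandb=False):
    """Build the argv list for one run, using the invocation shown in this README."""
    cmd = ["python", "pidsmaker/main.py", system, dataset]
    if use_wandb:
        cmd.append("--wandb")
    return cmd


def run_repeatedly(system, dataset, n_runs=3, use_wandb=False, dry_run=True):
    """Repeat the same configuration n_runs times and collect exit codes.

    With dry_run=True nothing is executed and every exit code is None,
    which allows checking the commands before launching long jobs.
    """
    results = []
    for _ in range(n_runs):
        cmd = build_command(system, dataset, use_wandb)
        if dry_run:
            results.append((cmd, None))
        else:
            # Runs are sequential: each call blocks until that run finishes.
            completed = subprocess.run(cmd, check=False)
            results.append((cmd, completed.returncode))
    return results


if __name__ == "__main__":
    for cmd, code in run_repeatedly("orthrus", "CADETS_E3", n_runs=3):
        print(" ".join(cmd), "->", code)
```

Pass `dry_run=False` from inside the pids container to actually launch the runs, then compare their metrics and keep the best run.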

Customize existing systems

The default configuration files in config/*.yml represent the architecture of existing PIDSs in YAML format. They contain the original hyperparameters used by each system.

The main strength of PIDSMaker is the customization of existing systems for easy experimentation. A few examples are shown below.

From CLI

Run Kairos with an embedding size of 128 instead of 100, and with last-neighbor sampling set to the last 10 neighbors instead of 20.

```
python pidsmaker/main.py kairos CADETS_E3 \
  --training.node_hid_dim=128 \
  --batching.intra_graph_batching.tgn_last_neighbor.tgn_neighbor_size=10
```

Run Orthrus with doc2vec instead of word2vec, and with 3 GraphSAGE layers instead of 2 attention layers.

```
python pidsmaker/main.py orthrus CADETS_E3 \
  --featurization.used_method=doc2vec \
  --featurization.emb_dim=128 \
  --training.encoder.used_methods=tgn,sage \
  --training.encoder.sage.num_layers=3
```
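
The dotted argument names mirror the nesting of the YAML configuration. To make that mapping concrete, here is a small sketch of how a `--a.b.c=value` argument could be applied to a nested dictionary. It is an illustration only and not PIDSMaker's actual argument parser: `parse_override` and `apply_override` are hypothetical helpers, and values are kept as strings without type conversion.

```python
def parse_override(arg):
    """Split '--a.b.c=value' into (['a', 'b', 'c'], 'value')."""
    if not arg.startswith("--") or "=" not in arg:
        raise ValueError(f"expected --dotted.key=value, got {arg!r}")
    key, value = arg[2:].split("=", 1)
    return key.split("."), value


def apply_override(config, arg):
    """Set one nested config entry in place from a dotted CLI argument."""
    path, value = parse_override(arg)
    node = config
    for part in path[:-1]:
        # Create intermediate sections on demand.
        node = node.setdefault(part, {})
    node[path[-1]] = value
    return config


# Illustrative starting config; the real defaults live in config/*.yml.
config = {"training": {"node_hid_dim": "100"}}
apply_override(config, "--training.node_hid_dim=128")
apply_override(config, "--featurization.used_method=doc2vec")
print(config)
```

Each dot descends one level, so `--training.encoder.sage.num_layers=3` addresses the `num_layers` key under `training`, `encoder`, `sage`.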

From a new YAML config file

Want to create a new PIDS? Create a new config under config/your_system.yml, inherit from existing PIDSs and tune it as you want.

The example below runs Magic with node type prediction instead of its hybrid objective (masked feature reconstruction plus structure prediction), uses a 2-layer MLP with ReLU as the decoder, and applies NodLink's thresholding method.

```
_include_yml: magic

training:
  decoder:
    used_methods: predict_node_type
    predict_node_type:
      node_mlp:
        architecture_str: linear(0.5) | relu

evaluation:
  node_evaluation:
    threshold_method: nodlink
```
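
Inheriting from a parent config and overriding a few keys can be read as a recursive dictionary merge. The sketch below shows that idea on plain Python dictionaries. It is an assumption about the general mechanism, not the framework's actual loader: `deep_merge` is a hypothetical helper, and the parent values are placeholders rather than Magic's real settings.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which `override` replaces or extends `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf value or new section: the child wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder parent config (illustrative values only).
parent = {
    "training": {"lr": 0.001, "decoder": {"used_methods": "parent_objective"}},
    "evaluation": {"node_evaluation": {"threshold_method": "parent_threshold"}},
}
# Child config: only the keys that differ from the parent.
child = {
    "training": {"decoder": {"used_methods": "predict_node_type"}},
    "evaluation": {"node_evaluation": {"threshold_method": "nodlink"}},
}

resolved = deep_merge(parent, child)
print(resolved["training"])
```

Keys absent from the child, such as `lr` here, keep the parent's value, which is why a new config only needs to list what changes.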

Visualization

You can then visualize the results using the many generated figures, locally or on Weights and Biases.

Hyperparameter tuning

PIDSMaker supports easy hyperparameter tuning for existing or new models. Follow the instructions available in our documentation.

You can specify the range of hyperparameters to search in a YAML config.

```
method: grid

parameters:
  training.lr:
    values: [0.001, 0.0001]
  training.node_hid_dim:
    values: [32, 64, 128, 256]
  featurization.used_method:
    values: [fasttext, word2vec]
```
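
Before launching a search, it helps to estimate how many runs it implies. Assuming `method: grid` means trying every combination of the listed values, the search space above can be enumerated with a short Python sketch. The helper `grid_combinations` is hypothetical and independent of how the framework schedules runs.

```python
from itertools import product

# Same search space as the YAML config above.
parameters = {
    "training.lr": [0.001, 0.0001],
    "training.node_hid_dim": [32, 64, 128, 256],
    "featurization.used_method": ["fasttext", "word2vec"],
}


def grid_combinations(params):
    """Return one dict per point of the Cartesian product of all value lists."""
    keys = list(params)
    value_lists = [params[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combos = grid_combinations(parameters)
print(len(combos))  # 2 * 4 * 2 = 16 combinations
print(combos[0])
```

Given the instability described in "Reproducing results", repeating each combination multiplies this count further, so it pays to keep the grid small.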

Then run the framework in tuning mode.

```
./run.sh my_system CADETS_E3 --tuning_mode=hyperparameters
```

Once you find the best hyperparameters, store them in a YAML file and run your tuned model.

```
./run.sh my_system CADETS_E3 --tuned
```

Citation

If you use this work, please cite the two following papers:

```
@article{bilot2026pidsmaker,
  title={PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems},
  author={Bilot, Tristan and Jiang, Baoxiang and Pasquier, Thomas},
  journal={arXiv preprint arXiv:2601.22983},
  year={2026}
}

@inproceedings{bilot2025simpler,
  title={{Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems}},
  author={Bilot, Tristan and Jiang, Baoxiang and Li, Zefeng and El Madhoun, Nour and Al Agha, Khaldoun and Zouaoui, Anis and Pasquier, Thomas},
  booktitle={Security Symposium (USENIX Sec'25)},
  year={2025},
  organization={USENIX}
}
```

Contributing

Pull requests are welcome! Please follow the contribution guidelines.

License

See license.
