Skip to content

A framework for building provenance-based intrusion detection systems with neural networks

License

Notifications You must be signed in to change notification settings

ubc-provenance/PIDSMaker

Repository files navigation

PIDSMAKER logo

Docs DOI License Release Stars

📄 Experiments Paper  |  📄 Framework Paper  |  📘 Documentation  |  ⚙️ Installation


The first framework designed to build and experiment with provenance-based intrusion detection systems (PIDSs) using deep learning architectures. It provides a single codebase to run most recent state-of-the-arts systems and easily customize them to develop new variants.

Supported Systems

The framework currently integrates the following PIDSs.

PIDS Venue Paper
Velox USENIX Security 2025 Link
Orthrus USENIX Security 2025 Link
R-Caid IEEE S&P 2024 Link
Flash IEEE S&P 2024 Link
Kairos IEEE S&P 2024 Link
Magic USENIX Security 2024 Link
NodLink NDSS 2024 Link
ThreaTrace IEEE TIFS 2022 Link

Supported Datasets

It also includes several easy-to-install provenance datasets for APT detection.

Dataset OS Attacks Size (GB)
CADETS_E3 FreeBSD 3 10
THEIA_E3 Linux 2 12
CLEARSCOPE_E3 Android 1 4.8
FIVEDIRECTIONS_E3 Windows 2 22
TRACE_E3 Linux 3 100
CADETS_E5 FreeBSD 2 276
THEIA_E5 Linux 1 36
CLEARSCOPE_E5 Android 2 49
FIVEDIRECTIONS_E5 Windows 4 280
TRACE_E5 Linux 1 710
optc_h201 Windows 1 9
optc_h501 Windows 1 6.7
optc_h051 Windows 1 7.7

📄 Documentation

A comprehensive documentation is available, explaining all possible arguments and providing examples on how integrating new systems.

Pipeline

The framework integrates a pipeline composed of seven stages, each parameterizable via configurable arguments, enabling flexible customization of new systems.

Setup

⬇️ Clone the repo

git clone https://github.com/ubc-provenance/PIDSMaker.git

💻 Installation with Docker

We have made the installation of PIDSMaker inclusing pre-processed databases for DARPA TC and OpTC datasets easy and fast. Simply follow these guidelines.

🧪 Basic usage of the framework

Once you have a followed the installation guidelines, you can open a shell in the pids container and experiment in multiple ways. Replace SYSTEM by velox, orthrus, nodlink, threatrace, kairos, rcaid, flash, magic.

  1. Run in the shell:

    python pidsmaker/main.py SYSTEM DATASET
  2. Run in the shell, monitored to weights & biases (W&B):

    python pidsmaker/main.py SYSTEM DATASET --wandb
  3. Run in background, monitored to W&B (recommended for multiple parallel runs and for research):

    ./run.sh SYSTEM DATASET

You can still watch the logs in your shell using tail -f nohup.out.

We generally using using W&B for experiment monitoring and historization (see installation guidelines).

Warning: Before performing evaluations, you should tune all systems (see docs here).

Reproducing results

PIDSs exhibit significant instability—that is, high sensitivity to training perturbations—due to their self-supervised training nature. Running the same configuration with different random seeds or minor hyperparameter changes often yields substantially different results. Consequently, reproducing results as the framework evolves presents a real challenge.

Based on our experiments, we provide tuned hyperparameters for the main systems. However, we can't guarantee that these hyperparameters will lead to satisfactory results due to instability.

We recommend running each system multiple times to increase the likelihood of obtaining a run with good metrics. Alternatively, you can perform hyperparameter tuning for each system.

Customize existing systems

The default configuration files in config/*.yml represent the architecture of existing PIDSs in YAML format. They contain the original hyperparameters used by each system.

The main strength of PIDSMaker is the customization of existing systems for easy experimentation. A few examples below.

From CLI

Running Kairos with embedding size of 128 instead of 100, and last neighbor sampling set to last 10 neighbors instead of 20.

python pidsmaker/main.py kairos CADETS_E3 \
  --training.node_hid_dim=128 \
  --batching.intra_graph_batching.tgn_last_neighbor.tgn_neighbor_size=10

Running Orthrus with Doc2vec instead of word2vec, and 3 GraphSAGE layers instead of 2 attention layers.

python pidsmaker/main.py orthrus CADETS_E3 \
  --featurization.used_method=doc2vec \
  --featurization.emb_dim=128 \
  --training.encoder.used_methods=tgn,sage \
  --training.encoder.sage.num_layers=3

From a new YAML config file

Want to create a new PIDS? Create a new config under config/your_system.yml, inherit from existing PIDSs and tune it as you want.

Magic with node type prediction instead of its hybrid masked feature reconstruction and structure prediction objective function, and use a 2-layer MLP with ReLU as decoder, and use NodLink's thresholding method.

_include_yml: magic

training:
  decoder:
    used_methods: predict_node_type
    predict_node_type:
      node_mlp:
        architecture_str: linear(0.5) | relu

evaluation:
  node_evaluation:
    threshold_method: nodlink

Visualization

You can then visualize the results using the many generated figures, locally or on Weights and Biases.

alt text

Hyperparameter tuning

PIDSMaker supports easy hyperparameter tuning for existing or new models. Follow the instructions available in our documentation.

You can specify the range of hyperparameters to search in a yaml config.

method: grid 

parameters:
  training.lr:
    values: [0.001, 0.0001]
  training.node_hid_dim:
    values: [32, 64, 128, 256]
  featurization.used_method:
    values: [fasttext, word2vec]

Then run the framework in tuning mode.

./run.sh my_system CADETS_E3 --tuning_mode=hyperparameters

Once you find the best hyperparameters, store them in a yaml file and run your tuned model.

./run.sh my_system CADETS_E3 --tuned

Citation

If you use this work, please cite the two following papers:

@article{bilot2026pidsmaker,
  title={PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems},
  author={Bilot, Tristan and Jiang, Baoxiang and Pasquier, Thomas},
  journal={arXiv preprint arXiv:2601.22983},
  year={2026}
}
@inproceedings{bilot2025simpler,
	title={{Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems}},
	author={Bilot, Tristan and Jiang, Baoxiang and  Li, Zefeng and  El Madhoun, Nour and Al Agha, Khaldoun and Zouaoui, Anis and Pasquier, Thomas},
	booktitle={Security Symposium (USENIX Sec'25)},
	year={2025},
	organization={USENIX}
}

Contributing

Pull requests are welcome! Please follow the contribution guidelines.

License

See licence.