The first framework designed to build and experiment with provenance-based intrusion detection systems (PIDSs) using deep learning architectures.
It provides a single codebase to run the most recent state-of-the-art systems and to easily customize them to develop new variants.
**Currently supported PIDSs**:
- **Velox** (USENIX Sec'25): [Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems](https://tfjmp.org/publications/2025-usenixsec-2.pdf)
- **Orthrus** (USENIX Sec'25): [ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems](https://www.usenix.org/system/files/conference/usenixsecurity25/sec25cycle1-prepub-103-jiang-baoxiang.pdf)
- **R-Caid** (IEEE S\&P'24): [R-CAID: Embedding Root Cause Analysis within Provenance-based Intrusion Detection](https://gangw.web.illinois.edu/rcaid-sp24.pdf)
- **Flash** (IEEE S\&P'24): [Flash: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning](https://dartlab.org/assets/pdf/flash.pdf)
- **Kairos** (IEEE S\&P'24): [Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance](https://arxiv.org/pdf/2308.05034)

It also includes several easy-to-install provenance datasets for APT detection.

| Dataset | OS | Attacks | Size (GB) |
|---------|------|---------|-----------|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Linux | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Linux | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Linux | 2 | 49 |
| FIVEDIRECTIONS_E5 | Linux | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
## 📄 Documentation
[Comprehensive documentation](https://ubc-provenance.github.io/PIDSMaker/) is available, explaining all possible arguments and providing examples of how to integrate new systems.
### Pipeline
The framework integrates a [pipeline](https://ubc-provenance.github.io/PIDSMaker/pipeline) composed of seven stages, each parameterized by configurable arguments, which makes it easy to customize existing systems and build new ones.
- Replace `DATASET` with one of `CADETS_E3`, `THEIA_E3`, `CLEARSCOPE_E3`, `FIVEDIRECTIONS_E3`, `TRACE_E3`, `CADETS_E5`, `THEIA_E5`, `CLEARSCOPE_E5`, `FIVEDIRECTIONS_E5`, `TRACE_E5`, `optc_h201`, `optc_h501`, or `optc_h051`.
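
Runs follow the pattern `python pidsmaker/main.py SYSTEM DATASET`. As a minimal sketch, a wrapper script could validate the dataset name against the list above before assembling the arguments. The helper `build_command` and the constant `SUPPORTED_DATASETS` are hypothetical and not part of PIDSMaker.

```python
# Hypothetical helper, not part of PIDSMaker: validates a dataset name and
# builds the argument list for `python pidsmaker/main.py SYSTEM DATASET`.
SUPPORTED_DATASETS = {
    "CADETS_E3", "THEIA_E3", "CLEARSCOPE_E3", "FIVEDIRECTIONS_E3", "TRACE_E3",
    "CADETS_E5", "THEIA_E5", "CLEARSCOPE_E5", "FIVEDIRECTIONS_E5", "TRACE_E5",
    "optc_h201", "optc_h501", "optc_h051",
}


def build_command(system: str, dataset: str) -> list[str]:
    """Return the argv list for one run, rejecting unknown dataset names."""
    # Strip stray whitespace (e.g. "TRACE_E5 ") before the membership check;
    # names are case-sensitive, so "optc_h201" must stay lowercase.
    name = dataset.strip()
    if name not in SUPPORTED_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return ["python", "pidsmaker/main.py", system, name]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, with `system` set to whichever system name `pidsmaker/main.py` expects.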
1. Run in the shell:
**Warning:** Before performing evaluations, you should tune all systems (see docs [here](https://ubc-provenance.github.io/PIDSMaker/features/tuning/)).
## Reproducing results
PIDSs exhibit significant instability (high sensitivity to training perturbations) because of their self-supervised training. Running the same configuration with different random seeds or minor hyperparameter changes often yields substantially different results. Consequently, reproducing results as the framework evolves is a real challenge.

Based on our experiments, we provide [tuned hyperparameters](https://ubc-provenance.github.io/PIDSMaker/tuned_systems) for the main systems. However, because of this instability, we cannot guarantee that these hyperparameters will lead to satisfactory results.

We recommend [running each system multiple times](https://ubc-provenance.github.io/PIDSMaker/features/instability/) to increase the likelihood of obtaining a run with good metrics. Alternatively, you can perform [hyperparameter tuning](https://ubc-provenance.github.io/PIDSMaker/features/tuning/) for each system.
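
Because single runs are unreliable, one option is to report a metric's spread across repeated runs rather than a single value. The sketch below is a minimal, standard-library illustration and not part of PIDSMaker. It assumes you have already collected one score per run (for example one AUC value per seed) and that higher scores are better.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RunSummary:
    mean: float
    stdev: float
    best: float   # assumes higher is better
    worst: float


def summarize_runs(scores: list[float]) -> RunSummary:
    """Summarize one metric collected over repeated runs of the same config.

    Hypothetical helper, not part of PIDSMaker.
    """
    if len(scores) < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("need at least two runs to estimate variability")
    return RunSummary(
        mean=statistics.mean(scores),
        stdev=statistics.stdev(scores),  # sample standard deviation
        best=max(scores),
        worst=min(scores),
    )
```

Reporting `mean` and `stdev` next to `best` makes the instability visible instead of presenting a lucky run as typical.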
## Customize existing systems
The default configuration files in `config/*.yml` represent the architecture of existing PIDSs in YAML format. They contain the original hyperparameters used by each system.
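
One way to picture customization is to start from a system's defaults and override a few values. The sketch below does this with a recursive merge over plain dictionaries, such as those obtained by parsing a YAML file. It is an illustration only. The keys `training`, `lr`, `num_epochs`, and `seed` are hypothetical and do not reflect PIDSMaker's actual configuration schema.

```python
import copy
from typing import Any


def merge_config(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new config where `overrides` replace matching keys in `defaults`.

    Nested dictionaries are merged recursively; any other value in `overrides`
    replaces the default outright. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
defaults = {"training": {"lr": 0.001, "num_epochs": 10}, "seed": 0}
variant = merge_config(defaults, {"training": {"lr": 0.0005}})
```

Merging recursively keeps unrelated defaults intact, so a variant only needs to state what differs from the original system.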
Tasks are the steps that compose the pipeline, from graph construction (`construction`) to detection (`evaluation`) and, optionally, triage (`triage`).
Each task takes the output of the previous task as input and writes its own output to disk so that the next task can use it. This enables "checkpointing" across the pipeline and avoids redundant computation. More information on tasks and the pipeline is available [here](../pipeline.md).

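The checkpointing behaviour can be illustrated with a minimal sketch in which each task receives the previous task's output and is skipped when its own output file already exists on disk. The task functions, the `pickle` storage format, and the file layout are assumptions for illustration, not PIDSMaker's implementation.

```python
import pickle
from pathlib import Path
from typing import Any, Callable

# A task is a (name, function) pair; the function maps the previous output to a new one.
Task = tuple[str, Callable[[Any], Any]]


def run_pipeline(tasks: list[Task], workdir: Path, initial_input: Any = None) -> Any:
    """Run tasks in order, caching each task's output under `workdir`.

    A task whose output file already exists is not recomputed; its cached
    output is loaded and passed to the next task instead.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    data = initial_input
    for name, fn in tasks:
        checkpoint = workdir / f"{name}.pkl"
        if checkpoint.exists():
            # Reuse the output written by an earlier run of this task.
            with checkpoint.open("rb") as f:
                data = pickle.load(f)
        else:
            data = fn(data)
            with checkpoint.open("wb") as f:
                pickle.dump(data, f)
    return data
```

This sketch never invalidates checkpoints. A real pipeline would also need to recompute a task, and everything downstream of it, whenever that task's arguments change.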
PIDSMaker supports several public datasets commonly used in APT detection research. This page describes each dataset and its attack scenarios.
## Overview

| Dataset | OS | Attacks | Size (GB) |
|---------|------|---------|-----------|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Linux | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Linux | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Linux | 2 | 49 |
| FIVEDIRECTIONS_E5 | Linux | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
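
Some datasets are large; TRACE_E5 alone is listed at 710 GB. The minimal sketch below checks free space before fetching a dataset. It assumes that the Size column approximates the on-disk footprint and that 1 GB = 10^9 bytes. The `DATASET_SIZE_GB` mapping copies the table, and `has_room_for` is a hypothetical helper, not part of PIDSMaker.

```python
import shutil

# Sizes copied from the table above (GB); assumed to approximate disk usage.
DATASET_SIZE_GB = {
    "CADETS_E3": 10, "THEIA_E3": 12, "CLEARSCOPE_E3": 4.8,
    "FIVEDIRECTIONS_E3": 22, "TRACE_E3": 100,
    "CADETS_E5": 276, "THEIA_E5": 36, "CLEARSCOPE_E5": 49,
    "FIVEDIRECTIONS_E5": 280, "TRACE_E5": 710,
    "optc_h201": 9, "optc_h501": 6.7, "optc_h051": 7.7,
}


def has_room_for(dataset: str, path: str = ".", margin: float = 1.2) -> bool:
    """Return True if the filesystem holding `path` has room for `dataset`.

    `margin` multiplies the listed size to leave headroom (20% by default);
    the value is an arbitrary choice for this sketch.
    """
    needed_bytes = DATASET_SIZE_GB[dataset] * 1e9 * margin
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_bytes
```

With the default margin, `has_room_for("TRACE_E5", "/data")` returns `False` on a volume with less than about 852 GB free.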
## DARPA TC

The DARPA Transparent Computing program produced benchmark datasets for evaluating provenance-based security systems.

### [Engagement 3 (E3) - April 2018](https://github.com/darpa-i2o/Transparent-Computing/blob/master/README-E3.md)
#### CADETS_E3

FreeBSD host with Nginx server exploitation.

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 49 min | Nginx exploited to deploy Drakon loader with root escalation. Netrecon executed after C2 connection, followed by failed `libdrakon` injection into `sshd`. Host crashed with kernel panic. |
| 1 | 40 min | Nginx re-exploited to deploy Drakon and MicroAPT implants under random names (`tmux`, `minions`, `sendmail`). Privilege escalation failed; MicroAPT ran unprivileged for port scanning. |
| 2 | 13 min | Nginx re-exploited to deploy new Drakon implant with root privileges. Multiple failed `sshd` injection attempts using renamed `libdrakon` copies. |

```shell
python pidsmaker/main.py SYSTEM CADETS_E3
```

#### THEIA_E3
47
+
48
+
Ubuntu host with Firefox exploitation.
49
+
50
+
| Attack id | Duration | Description |
51
+
|---|----------|-------------|
52
+
| 0 | 50 min | Malicious Firefox extension dropped Drakon implant. MicroAPT staged under `/var/log/mail`, connected to C2 for control and network scanning. |
53
+
| 1 | 30 min | Firefox exploited to drop Drakon implant as `/home/admin/clean` with root privileges, then copied as `profile`. Both connected to C2 server. |
54
+
55
+
```shell
56
+
python pidsmaker/main.py SYSTEM THEIA_E3
57
+
```
58
+
59
+
#### CLEARSCOPE_E3

Android device with Firefox exploitation.

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 54 min | Firefox exploited via malicious website. Drakon implant installed and elevated, but module loading failed. Persistent C2 connection maintained. |

```shell
python pidsmaker/main.py SYSTEM CLEARSCOPE_E3
```

### [Engagement 5 (E5) - May 2019](https://github.com/darpa-i2o/Transparent-Computing)
#### THEIA_E5

Ubuntu host with Firefox exploitation.

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 19 min | Firefox exploited via malicious website. Root gained with BinFmt-Elevate, Drakon shellcode injected into `sshd`, persistence file created, C2 access maintained. |

```shell
python pidsmaker/main.py SYSTEM THEIA_E5
```

#### CLEARSCOPE_E5

Android device with APK-based attacks.

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 41 min | Malicious `appstarter` APK loaded MicroAPT. Elevate driver installed for privilege escalation. Sensitive databases exfiltrated (calllog, calendar, SMS) and screenshot captured. |
| 1 | 8 min | MicroAPT deployed directly via adb shell after APK dropper failed. Privilege escalation via BinFmt-Elevate driver, then file exfiltration. |

Windows enterprise environment with realistic APT scenarios.
### optc_h201

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 1 h 58 min | PowerShell Empire stager executed with elevated access. Mimikatz used for credential theft, registry persistence set, recon performed, then pivoted to other hosts via WMI. |

```shell
python pidsmaker/main.py SYSTEM optc_h201
```

### optc_h501

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 5 h 01 min | Phishing email launched PowerShell Empire stager. Escalated via DeathStar, WMI persistence established, RDP tunneling and file exfiltration performed, then pivoted to other hosts. |

```shell
python pidsmaker/main.py SYSTEM optc_h501
```

### optc_h051

| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 3 h 56 min | Malicious Notepad++ update installed Meterpreter. Escalated to SYSTEM, migrated into LSASS for Mimikatz credential theft, established persistence, timestomped files, added admin account for RDP. |

```shell
python pidsmaker/main.py SYSTEM optc_h051
```

!!! note
    TODO: add descriptions for the CADETS_E5, FIVEDIRECTIONS, and TRACE datasets.

## Data structure
### Graph partitioning
Each dataset is partitioned into daily graphs, split into:

- **Train graphs**: normal activity, used for model training
- **Validation graphs**: normal activity, used for threshold calibration
- **Test graphs**: both normal activity and attacks
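
A minimal sketch of daily partitioning, assuming each event carries a Unix timestamp in seconds, that days are UTC calendar days, and that the train/validation/test days are given as explicit date sets. The event format, the day boundaries, and `split_by_day` itself are hypothetical. PIDSMaker defines its own splits per dataset.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date, datetime, timezone
from typing import Any


def split_by_day(
    events: Iterable[tuple[float, Any]],
    train_days: set[date],
    val_days: set[date],
    test_days: set[date],
) -> dict[str, list[list[Any]]]:
    """Group (timestamp, event) pairs into daily buckets, then into splits.

    Events on days that belong to no split are dropped.
    """
    daily: dict[date, list[Any]] = defaultdict(list)
    for ts, event in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        daily[day].append(event)

    def pick(days: set[date]) -> list[list[Any]]:
        # One list of events (one "daily graph") per selected day, oldest first.
        return [daily[d] for d in sorted(days) if d in daily]

    return {"train": pick(train_days), "val": pick(val_days), "test": pick(test_days)}
```

Keeping validation days free of attacks matters here: thresholds calibrated on them are meant to reflect normal activity only.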
## Adding custom datasets
To add a new dataset, define its configuration in `pidsmaker/config/config.py`: