
Commit 738324f

Merge pull request #31 from ubc-provenance/dev
Update the docs
2 parents cee9c46 + 875b241 commit 738324f

31 files changed

+2622
-137
lines changed

.github/img/pidsmaker_new.png

153 KB

.github/img/pidsmaker_title.png

412 KB

README.md

Lines changed: 65 additions & 29 deletions
@@ -1,24 +1,16 @@
1-
21
<p align="center">
3-
<img width="50%" src="./.github/img/pidsmaker.png" alt="PIDSMAKER logo"/><br><br>
4-
5-
<a href="https://ubc-provenance.github.io/PIDSMaker/">
6-
<img src="https://img.shields.io/badge/docs-online-pink.svg" alt="Documentation"/>
7-
</a>
8-
<a href="https://doi.org/10.5281/zenodo.15603122">
9-
<img src="https://img.shields.io/badge/DOI-10.5281%2Fzenodo.15603122-blue?logo=zenodo" alt="DOI"/>
10-
</a>
11-
<img src="https://img.shields.io/github/license/ubc-provenance/PIDSMaker?color=red" alt="License"/>
12-
</a>
13-
<a href="https://github.com/ubc-provenance/PIDSMaker/releases">
14-
<img src="https://img.shields.io/github/v/release/ubc-provenance/PIDSMaker" alt="Latest Release"/>
15-
</a>
16-
<a href="https://github.com/ubc-provenance/PIDSMaker/stargazers">
17-
<img src="https://img.shields.io/github/stars/ubc-provenance/PIDSMaker" alt="Stars"/>
18-
</a>
2+
<img width="80%" src="./.github/img/pidsmaker_title.png" alt="PIDSMAKER logo"/>
193
</p>
204

21-
---
5+
<div align="center">
6+
7+
[![Docs](https://img.shields.io/badge/Docs-Online-ed6a2f?style=flat&labelColor=gray)](https://ubc-provenance.github.io/PIDSMaker/)
8+
[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.15603122-ed6a2f?style=flat&labelColor=gray)](https://doi.org/10.5281/zenodo.15603122)
9+
[![License](https://img.shields.io/github/license/ubc-provenance/PIDSMaker?style=flat&color=ed6a2f&labelColor=gray)](LICENSE)
10+
[![Release](https://img.shields.io/github/v/release/ubc-provenance/PIDSMaker?style=flat&color=ed6a2f&labelColor=gray)](https://github.com/ubc-provenance/PIDSMaker/releases)
11+
[![Stars](https://img.shields.io/github/stars/ubc-provenance/PIDSMaker?style=flat&color=ed6a2f&labelColor=white&logo=github&logoColor=black)](https://github.com/ubc-provenance/PIDSMaker/stargazers)
12+
13+
</div>
2214

2315
<p align="center">
2416
<strong>
@@ -30,23 +22,56 @@
3022
</strong>
3123
</p>
3224

25+
---
26+
3327
The first framework designed to build and experiment with provenance-based intrusion detection systems (PIDSs) using deep learning architectures.
3428
It provides a single codebase to run the most recent state-of-the-art systems and to easily customize them into new variants.
3529

36-
**Currently supported PIDSs**:
37-
- **Velox** (USENIX Sec'25): [Sometimes Simpler is Better: A Comprehensive Analysis of State-of-the-Art Provenance-Based Intrusion Detection Systems](https://tfjmp.org/publications/2025-usenixsec-2.pdf)
38-
- **Orthrus** (USENIX Sec'25): [ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems](https://www.usenix.org/system/files/conference/usenixsecurity25/sec25cycle1-prepub-103-jiang-baoxiang.pdf)
39-
- **R-Caid** (IEEE S\&P'24): [R-CAID: Embedding Root Cause Analysis within Provenance-based Intrusion Detection](https://gangw.web.illinois.edu/rcaid-sp24.pdf)
40-
- **Flash** (IEEE S\&P'24): [Flash: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning](https://dartlab.org/assets/pdf/flash.pdf)
41-
- **Kairos** (IEEE S\&P'24): [Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance](https://arxiv.org/pdf/2308.05034)
42-
- **Magic** (USENIX Sec'24): [MAGIC: Detecting Advanced Persistent Threats via Masked Graph Representation Learning](https://www.usenix.org/system/files/usenixsecurity24-jia-zian.pdf)
43-
- **NodLink** (NDSS'24): [NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation](https://arxiv.org/pdf/2311.02331)
44-
- **ThreaTrace** (IEEE TIFS'22): [THREATRACE: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning](https://arxiv.org/pdf/2111.04333)
30+
### Supported Systems
31+
32+
The framework currently integrates the following PIDSs.
33+
34+
| PIDS | Venue | Paper |
|------------|----------------------|-------|
| Velox | USENIX Security 2025 | [Link](https://tfjmp.org/publications/2025-usenixsec-2.pdf) |
| Orthrus | USENIX Security 2025 | [Link](https://www.usenix.org/system/files/conference/usenixsecurity25/sec25cycle1-prepub-103-jiang-baoxiang.pdf) |
| R-Caid | IEEE S&P 2024 | [Link](https://gangw.web.illinois.edu/rcaid-sp24.pdf) |
| Flash | IEEE S&P 2024 | [Link](https://dartlab.org/assets/pdf/flash.pdf) |
| Kairos | IEEE S&P 2024 | [Link](https://arxiv.org/pdf/2308.05034) |
| Magic | USENIX Security 2024 | [Link](https://www.usenix.org/system/files/usenixsecurity24-jia-zian.pdf) |
| NodLink | NDSS 2024 | [Link](https://arxiv.org/pdf/2311.02331) |
| ThreaTrace | IEEE TIFS 2022 | [Link](https://arxiv.org/pdf/2111.04333) |
44+
45+
### Supported Datasets
46+
47+
It also includes several easy-to-install provenance datasets for APT detection.
48+
49+
| Dataset | OS | Attacks | Size (GB) |
|---------|------|---------|-----------|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Linux | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Linux | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Linux | 2 | 49 |
| FIVEDIRECTIONS_E5 | Linux | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
4564

4665
## 📄 Documentation
4766

4867
[Comprehensive documentation](https://ubc-provenance.github.io/PIDSMaker/) is available. It explains all possible arguments and provides examples of how to integrate new systems.
4968

69+
### Pipeline
70+
71+
The framework integrates a [pipeline](https://ubc-provenance.github.io/PIDSMaker/pipeline) composed of seven stages, each parameterizable via configurable arguments, enabling flexible customization of new systems.
72+
73+
<img src="docs/docs/img/pipeline.svg" style="width: 100%"/>
74+
5075

5176
## Setup
5277

@@ -63,8 +88,8 @@ We have made the installation of PIDSMaker including pre-processed databases for
6388

6489
Once you have followed the installation guidelines, you can open a shell in the `pids container` and experiment in multiple ways.
6590

66-
- Replace `SYSTEM` by `velox | orthrus | nodlink | threatrace | kairos | rcaid | flash | magic`.
67-
- Replace `DATASET` by `CLEARSCOPE_E3 | CADETS_E3 | THEIA_E3 | CLEARSCOPE_E5 | THEIA_E5 | optc_h201 | optc_h501 | optc_h051`.
91+
- Replace `SYSTEM` by `velox`, `orthrus`, `nodlink`, `threatrace`, `kairos`, `rcaid`, `flash`, or `magic`.
92+
- Replace `DATASET` by `CADETS_E3`, `THEIA_E3`, `CLEARSCOPE_E3`, `FIVEDIRECTIONS_E3`, `TRACE_E3`, `CADETS_E5`, `THEIA_E5`, `CLEARSCOPE_E5`, `FIVEDIRECTIONS_E5`, `TRACE_E5`, `optc_h201`, `optc_h501`, or `optc_h051`.
6893

6994
1. Run in the shell:
7095
```shell
@@ -87,6 +112,17 @@ We generally recommend using W&B for experiment monitoring and historization (see in
87112

88113
**Warning:** Before performing evaluations, you should tune all systems (see docs [here](https://ubc-provenance.github.io/PIDSMaker/features/tuning/)).
89114

115+
## Reproducing results
116+
117+
PIDSs exhibit significant instability—that is, high sensitivity to training perturbations—due to their self-supervised training nature.
118+
Running the same configuration with different random seeds or minor hyperparameter changes often yields substantially different results.
119+
Consequently, reproducing results as the framework evolves presents a real challenge.
120+
121+
Based on our experiments, we provide [tuned hyperparameters](https://ubc-provenance.github.io/PIDSMaker/tuned_systems) for the main systems.
122+
However, we can't guarantee that these hyperparameters will lead to satisfactory results due to instability.
123+
124+
We recommend [running each system multiple times](https://ubc-provenance.github.io/PIDSMaker/features/instability/) to increase the likelihood of obtaining a run with good metrics. Alternatively, you can perform [hyperparameter tuning](https://ubc-provenance.github.io/PIDSMaker/features/tuning/) for each system.
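One way to act on this recommendation is to repeat a run over several seeds and report the spread together with the best run. The following is a minimal sketch: `run_once` is a hypothetical stand-in for a full training and evaluation run that returns a single score, and it is not part of PIDSMaker.

```python
import random
import statistics


def run_once(seed: int) -> float:
    """Hypothetical placeholder for one train + evaluate run.

    It only simulates a seed-dependent score so the sketch is runnable.
    """
    rng = random.Random(seed)
    return 0.5 + 0.4 * rng.random()


def summarize(seeds):
    """Run once per seed and report mean, spread, and the best seed."""
    scores = {seed: run_once(seed) for seed in seeds}
    values = list(scores.values())
    best_seed = max(scores, key=scores.get)
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # needs at least two runs
        "best_seed": best_seed,
        "best": scores[best_seed],
    }


summary = summarize(range(5))
print(summary)
```

Reporting the mean and standard deviation next to the best run makes it visible how much of a result is due to the seed rather than the configuration.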
125+
90126
## Customize existing systems
91127
92128
The default configuration files in `config/*.yml` represent the architecture of existing PIDSs in YAML format. They contain the original hyperparameters used by each system.

docs/docs/assets/logo.ico

-227 KB
Binary file not shown.

docs/docs/assets/pidsmaker.ico

227 KB
Binary file not shown.

docs/docs/assets/pidsmaker_new.png

153 KB
412 KB

docs/docs/config/tasks.md

Lines changed: 20 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,30 @@
1-
Tasks are steps composing the pipeline, starting from graph construction (`construction`) to detection (`evaluation`) or optionally triage (`tracing`).
2-
Each task takes as input the output from the previous task and write its output to the disk so that the next task can use it. This process enables "checkpointing" across the pipeline and avoids the duplication of compute. More information on tasks and the pipeline [here](../pipeline.md).
1+
Tasks are steps composing the pipeline, starting from graph construction (`construction`) to detection (`evaluation`) or optionally triage (`triage`).
2+
Each task takes as input the output from the previous task and writes its output to the disk so that the next task can use it. This process enables "checkpointing" across the pipeline and avoids the duplication of compute. More information on tasks and the pipeline [here](../pipeline.md).
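The checkpointing idea described above can be sketched as follows. This is an illustration only, not PIDSMaker's implementation: the lowercase stage names follow the seven stage headings of this page, while `run_stage`, the JSON files, and the directory layout are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Stage names mirror the headings of this page; the lowercase spelling of
# the middle stages is an assumption.
STAGES = [
    "construction", "transformation", "featurization",
    "batching", "training", "evaluation", "triage",
]


def run_stage(name, previous_output):
    """Hypothetical stage body: records which stage produced its input."""
    source = previous_output["stage"] if previous_output else None
    return {"stage": name, "input_from": source}


def run_pipeline(workdir: Path, stages=STAGES):
    """Run stages in order, reusing any output already written to disk."""
    executed = []
    previous = None
    for name in stages:
        out_file = workdir / f"{name}.json"
        if out_file.exists():
            # Checkpoint hit: load the stored output instead of recomputing.
            previous = json.loads(out_file.read_text())
        else:
            previous = run_stage(name, previous)
            out_file.write_text(json.dumps(previous))
            executed.append(name)
    return executed


with tempfile.TemporaryDirectory() as tmp:
    first = run_pipeline(Path(tmp))
    second = run_pipeline(Path(tmp))

print(first)
print(second)
```

The second call finds every output on disk and recomputes nothing, which is the duplication of compute that checkpointing avoids. The sketch does not handle invalidating downstream outputs when an argument of an earlier stage changes.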
33

4-
### Preprocessing
4+
### Stage 1: Construction
55

6-
--8<-- "scripts/args/args_preprocessing.md"
6+
--8<-- "scripts/args/args_construction.md"
77

8-
### Featurization
8+
### Stage 2: Transformation
9+
10+
--8<-- "scripts/args/args_transformation.md"
11+
12+
### Stage 3: Featurization
913

1014
--8<-- "scripts/args/args_featurization.md"
1115

12-
### Detection
16+
### Stage 4: Batching
1317

14-
--8<-- "scripts/args/args_detection.md"
18+
--8<-- "scripts/args/args_batching.md"
1519

16-
### Triage
20+
### Stage 5: Training
1721

18-
--8<-- "scripts/args/args_triage.md"
22+
--8<-- "scripts/args/args_training.md"
23+
24+
### Stage 6: Evaluation
1925

26+
--8<-- "scripts/args/args_evaluation.md"
27+
28+
### Stage 7: Triage
29+
30+
--8<-- "scripts/args/args_triage.md"

docs/docs/create-db-from-scratch.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Install a dataset from scratch
1+
# Installing a dataset from scratch
22

33
PIDSMaker comes by default with pre-processed versions of DARPA datasets.
44
If you want to install them from scratch using the official public files, follow this guide.

docs/docs/datasets.md

Lines changed: 166 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,166 @@
1+
# Datasets
2+
3+
PIDSMaker supports several public datasets commonly used in APT detection research. This page describes each dataset and its attack scenarios.
4+
5+
## Overview
6+
7+
| Dataset | OS | Attacks | Size (GB) |
|---------|------|---------|-----------|
| CADETS_E3 | FreeBSD | 3 | 10 |
| THEIA_E3 | Linux | 2 | 12 |
| CLEARSCOPE_E3 | Linux | 1 | 4.8 |
| FIVEDIRECTIONS_E3 | Linux | 2 | 22 |
| TRACE_E3 | Linux | 3 | 100 |
| CADETS_E5 | FreeBSD | 2 | 276 |
| THEIA_E5 | Linux | 1 | 36 |
| CLEARSCOPE_E5 | Linux | 2 | 49 |
| FIVEDIRECTIONS_E5 | Linux | 4 | 280 |
| TRACE_E5 | Linux | 1 | 710 |
| optc_h201 | Windows | 1 | 9 |
| optc_h501 | Windows | 1 | 6.7 |
| optc_h051 | Windows | 1 | 7.7 |
22+
23+
24+
25+
## DARPA TC
26+
27+
The DARPA Transparent Computing program produced benchmark datasets for evaluating provenance-based security systems.
28+
29+
### [Engagement 3 (E3) - April 2018](https://github.com/darpa-i2o/Transparent-Computing/blob/master/README-E3.md)
30+
31+
32+
#### CADETS_E3
33+
34+
FreeBSD host with Nginx server exploitation.
35+
36+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 49 min | Nginx exploited to deploy Drakon loader with root escalation. Netrecon executed after C2 connection, followed by failed `libdrakon` injection into `sshd`. Host crashed with kernel panic. |
| 1 | 40 min | Nginx re-exploited to deploy Drakon and MicroAPT implants under random names (`tmux`, `minions`, `sendmail`). Privilege escalation failed; MicroAPT ran unprivileged for port scanning. |
| 2 | 13 min | Nginx re-exploited to deploy new Drakon implant with root privileges. Multiple failed `sshd` injection attempts using renamed `libdrakon` copies. |

```shell
python pidsmaker/main.py SYSTEM CADETS_E3
```
45+
46+
#### THEIA_E3
47+
48+
Ubuntu host with Firefox exploitation.
49+
50+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 50 min | Malicious Firefox extension dropped Drakon implant. MicroAPT staged under `/var/log/mail`, connected to C2 for control and network scanning. |
| 1 | 30 min | Firefox exploited to drop Drakon implant as `/home/admin/clean` with root privileges, then copied as `profile`. Both connected to C2 server. |

```shell
python pidsmaker/main.py SYSTEM THEIA_E3
```
58+
59+
#### CLEARSCOPE_E3
60+
61+
Android device with Firefox exploitation.
62+
63+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 54 min | Firefox exploited via malicious website. Drakon implant installed and elevated, but module loading failed. Persistent C2 connection maintained. |

```shell
python pidsmaker/main.py SYSTEM CLEARSCOPE_E3
```
70+
71+
### [Engagement 5 (E5) - May 2019](https://github.com/darpa-i2o/Transparent-Computing)
72+
73+
#### THEIA_E5
74+
75+
Ubuntu host with Firefox exploitation.
76+
77+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 19 min | Firefox exploited via malicious website. Root gained with BinFmt-Elevate, Drakon shellcode injected into `sshd`, persistence file created, C2 access maintained. |

```shell
python pidsmaker/main.py SYSTEM THEIA_E5
```
84+
85+
#### CLEARSCOPE_E5
86+
87+
Android device with APK-based attacks.
88+
89+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 41 min | Malicious `appstarter` APK loaded MicroAPT. Elevate driver installed for privilege escalation. Sensitive databases exfiltrated (calllog, calendar, SMS) and screenshot captured. |
| 1 | 8 min | MicroAPT deployed directly via adb shell after APK dropper failed. Privilege escalation via BinFmt Elevate driver, then file exfiltration. |

```shell
python pidsmaker/main.py SYSTEM CLEARSCOPE_E5
```
97+
98+
## [DARPA OpTC](https://github.com/FiveDirections/OpTC-data)
99+
100+
Windows enterprise environment with realistic APT scenarios.
101+
102+
### optc_h201
103+
104+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 1h58 | PowerShell Empire stager executed with elevated access. Mimikatz used for credential theft, registry persistence set, recon performed, then pivoted to other hosts via WMI. |

```shell
python pidsmaker/main.py SYSTEM optc_h201
```
111+
112+
### optc_h501
113+
114+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 5h01 | Phishing email launched PowerShell Empire stager. Escalated via DeathStar, WMI persistence established, RDP tunneling and file exfiltration performed, then pivoted to other hosts. |

```shell
python pidsmaker/main.py SYSTEM optc_h501
```
121+
122+
### optc_h051
123+
124+
| Attack id | Duration | Description |
|---|----------|-------------|
| 0 | 3h56 | Malicious Notepad++ update installed Meterpreter. Escalated to SYSTEM, migrated into LSASS for Mimikatz credential theft, established persistence, timestomped files, added admin account for RDP. |

```shell
python pidsmaker/main.py SYSTEM optc_h051
```
131+
132+
!!! note
    TODO: add descriptions for CADETS_E5, FIVED and TRACE datasets.
134+
135+
## Data structure
136+
137+
### Graph partitioning
138+
139+
Each dataset is partitioned into daily graphs, split into:
140+
141+
- **Train graphs**: Normal activity for model training
142+
- **Validation graphs**: Normal activity for threshold calibration
143+
- **Test graphs**: Contains both normal activity and attacks
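The three-way split above can be illustrated with a small helper. This is a sketch under assumptions: it treats the daily graphs as an ordered list and assigns the first ones to training, the next ones to validation, and the rest to testing. The function `split_daily_graphs` and the `graph_N` names are hypothetical, and the actual assignment for each dataset is defined in its configuration rather than computed this way.

```python
def split_daily_graphs(days, n_train, n_val):
    """Split an ordered list of daily graph names into three groups."""
    if n_train < 1 or n_val < 1 or n_train + n_val >= len(days):
        raise ValueError("need at least one graph in each of the three splits")
    return {
        "train_files": days[:n_train],                    # normal activity, training
        "val_files": days[n_train:n_train + n_val],       # normal activity, thresholds
        "test_files": days[n_train + n_val:],             # normal activity and attacks
    }


days = [f"graph_{i}" for i in range(1, 7)]
splits = split_daily_graphs(days, n_train=3, n_val=1)
print(splits)
```

Keeping the validation graphs free of attacks matters because they are used for threshold calibration, so an attack day placed there would skew the threshold.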
144+
145+
## Adding custom datasets
146+
147+
To add a new dataset, define its configuration in `pidsmaker/config/config.py`:
148+
149+
```python
DATASET_DEFAULT_CONFIG = {
    "MY_DATASET": {
        "database": "my_database_name",
        "num_node_types": 3,
        "num_edge_types": 10,
        "train_files": ["graph_1", "graph_2", "graph_3"],
        "val_files": ["graph_4"],
        "test_files": ["graph_5", "graph_6"],
        "ground_truth_relative_path": ["MY_DATASET/labels.csv"],
        "attack_to_time_window": [
            ["MY_DATASET/labels.csv", "2024-01-05 10:00:00", "2024-01-05 12:00:00"],
        ],
    },
}
```
165+
166+
Then follow the [database creation guide](create-db-from-scratch.md) to load your data.
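Before loading data, a new entry can be sanity-checked with a few lines of Python. The sketch below is not part of PIDSMaker: `check_dataset_entry` is a hypothetical helper, the key names are copied from the example entry above, and treating all of them as required is an assumption.

```python
from datetime import datetime

# Keys copied from the example entry; treating every one of them as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_KEYS = {
    "database", "num_node_types", "num_edge_types",
    "train_files", "val_files", "test_files",
    "ground_truth_relative_path", "attack_to_time_window",
}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the example entry


def check_dataset_entry(entry: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - entry.keys())]
    if problems:
        return problems

    # Every graph file should be listed exactly once across the splits.
    splits = ("train_files", "val_files", "test_files")
    all_files = [name for split in splits for name in entry[split]]
    if len(all_files) != len(set(all_files)):
        problems.append("a graph file is listed more than once across the splits")

    # Each window is [label file, start, end]; start must precede end.
    for label_file, start, end in entry["attack_to_time_window"]:
        if label_file not in entry["ground_truth_relative_path"]:
            problems.append(f"{label_file} is not in ground_truth_relative_path")
        if datetime.strptime(start, TIME_FORMAT) >= datetime.strptime(end, TIME_FORMAT):
            problems.append(f"empty or inverted time window for {label_file}")
    return problems


example = {
    "database": "my_database_name",
    "num_node_types": 3,
    "num_edge_types": 10,
    "train_files": ["graph_1", "graph_2", "graph_3"],
    "val_files": ["graph_4"],
    "test_files": ["graph_5", "graph_6"],
    "ground_truth_relative_path": ["MY_DATASET/labels.csv"],
    "attack_to_time_window": [
        ["MY_DATASET/labels.csv", "2024-01-05 10:00:00", "2024-01-05 12:00:00"],
    ],
}
problems = check_dataset_entry(example)
print(problems)
```

A check like this catches a graph file reused across splits or an inverted attack window early, before a full pipeline run is spent on a misconfigured dataset.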
