Commit b8f148f

Update summary_dataset.md
1 parent 4193df3 commit b8f148f

1 file changed, +5 -1 lines changed

docs/immune/summary_dataset.md

Lines changed: 5 additions & 1 deletion
@@ -21,6 +21,9 @@

 Develop a dataset for network security event summarization to be integrated with the Slips Immune system, optimized for deployment on low-resource hardware such as the Raspberry Pi 5. This dataset will be used to fine-tune compact language models capable of generating concise and actionable summaries of security incidents from raw Slips alert data, enabling real-time threat analysis in resource-constrained environments.

+The current version of the dataset used for fine-tuning LLMs is available [here](https://github.com/stratosphereips/Slips-tools/raw/refs/heads/main/alert_summary/datasets/summarization_dataset_extended.json.gz).
+
+
 ## 2. Limitations

 ### Hardware Constraints
@@ -38,6 +41,7 @@ Develop a dataset for network security event summarization to be integrated with
 The dataset generation process consists of four stages, each implemented as Python scripts with shell wrappers that simplify execution, handle argument validation, and automate file naming. This modular design enables flexible experimentation with different models and configurations while maintaining reproducibility.

 **Detailed documentation**: See [summary_dataset_workflow.md](summary_dataset_workflow.md) for complete pipeline specifications and advanced usage.
+The complete set of scripts for creating and updating the dataset is available in the [Slips-tools repository](https://github.com/stratosphereips/Slips-tools/tree/main/alert_summary).

 ### Stage 1: Incident Sampling
 Extract security incidents from Slips `alerts.json` logs with category labels (Malware/Normal):
@@ -167,4 +171,4 @@ Each incident in the final dataset contains:

 Token efficiency enables deployment on Raspberry Pi 5 while maintaining security analysis quality suitable for real-time intrusion detection.

-The current version of the dataset used for fine-tuning LLMs is available [here](https://github.com/stratosphereips/Slips-tools/raw/refs/heads/main/alert_summary/datasets/summarization_dataset.json.gz)
+
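
The dataset link added in this commit points to a gzip-compressed JSON file. A minimal sketch of fetching and parsing it with the Python standard library is shown here. The helper names `download_dataset` and `load_gzipped_json` are hypothetical and are not part of the Slips-tools scripts. The internal layout of the file is not described in this diff, so the loader assumes either a single JSON document or, as a fallback, one JSON object per line.

```python
import gzip
import json
import urllib.request

# URL taken from the line added in this commit.
DATASET_URL = (
    "https://github.com/stratosphereips/Slips-tools/raw/refs/heads/main/"
    "alert_summary/datasets/summarization_dataset_extended.json.gz"
)


def load_gzipped_json(raw: bytes):
    """Decompress gzip bytes and parse the payload as JSON.

    Assumption: the payload is either one JSON document or JSON Lines.
    """
    text = gzip.decompress(raw).decode("utf-8")
    try:
        # Case 1: the whole file is a single JSON document.
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2 (fallback): one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def download_dataset(url: str = DATASET_URL, timeout: float = 60.0):
    """Fetch the compressed dataset over HTTPS and return the parsed content."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return load_gzipped_json(response.read())
```

Calling `download_dataset()` returns the parsed content. Inspecting its type and the first entry is a reasonable first step, since the record schema is defined by the pipeline documentation rather than by this commit.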
