**File:** `docs/immune/DATASET_REPORT.md` (168 additions)
# Network Event Summarization Dataset for Slips IDS

## Table of Contents

- [1. Task description](#1-task-description)
- [2. Limitations](#2-limitations)
- [Hardware Constraints](#hardware-constraints)
- [Scope Constraints](#scope-constraints)
- [3. Dataset Generation Workflow](#3-dataset-generation-workflow)
- [Stage 1: Incident Sampling](#stage-1-incident-sampling)
- [Stage 2: Structural Analysis](#stage-2-structural-analysis)
- [Stage 3: Multi-Model LLM Analysis](#stage-3-multi-model-llm-analysis)
- [Stage 4: Dataset Correlation](#stage-4-dataset-correlation)
- [Dataset Extension](#dataset-extension)
- [Workflow Diagram](#workflow-diagram)
- [Event Grouping Strategy](#event-grouping-strategy)
- [Additional Optimizations](#additional-optimizations)
- [Dataset Structure](#dataset-structure)

## 1. Task description

Develop a dataset for network security event summarization to be integrated with the Slips Immune system, optimized for deployment on low-resource hardware such as the Raspberry Pi 5. This dataset will be used to fine-tune compact language models capable of generating concise and actionable summaries of security incidents from raw Slips alert data, enabling real-time threat analysis in resource-constrained environments.

## 2. Limitations

### Hardware Constraints
- **Platform**: Raspberry Pi 5 with limited RAM and processing power
- **Model Size**: Only small language models (1.5B-3B parameters) are viable on target hardware
- **Real-time Processing**: A target of 10-15 seconds per incident on the RPi5 with Ollama requires aggressive token optimization

### Scope Constraints
- **Alert Format**: Analysis currently limited to Slips alert format; generalization to other IDS outputs requires format adaptation
- **Token Budget**: Input and output tokens must be minimized to enable real-time inference on resource-constrained hardware (~2000 tokens max)
- **Output Constraints**: Summaries must be concise (150-300 tokens) while maintaining security context
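
As a sanity check on these budgets, a rough pre-flight estimate can flag incidents that will not fit the input window. This is an illustrative sketch only: the 4-characters-per-token heuristic and the function names are assumptions, not part of the pipeline, and the model's real tokenizer will count differently.

```python
# Rough pre-flight token-budget check (illustrative; ~4 chars per token is a
# common heuristic for English text, not the model's actual tokenizer).
MAX_INPUT_TOKENS = 2000   # input budget from the scope constraints
MAX_OUTPUT_TOKENS = 300   # upper end of the summary budget

def estimate_tokens(text: str) -> int:
    """Approximate token count: about one token per 4 characters."""
    return max(1, len(text) // 4)

def fits_budget(incident_text: str) -> bool:
    """True if the incident text likely fits the 2000-token input budget."""
    return estimate_tokens(incident_text) <= MAX_INPUT_TOKENS
```

Incidents that fail this check are candidates for the `--group-events` compression described below.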

## 3. Dataset Generation Workflow

The dataset generation process consists of four stages, each implemented as Python scripts with shell wrappers that simplify execution, handle argument validation, and automate file naming. This modular design enables flexible experimentation with different models and configurations while maintaining reproducibility.

**Detailed documentation**: See [README_dataset_summary_workflow.md](README_dataset_summary_workflow.md) for complete pipeline specifications and advanced usage.

### Stage 1: Incident Sampling
Extract security incidents from Slips `alerts.json` logs with category labels (Malware/Normal):

```bash
./sample_dataset.sh 20 my_dataset --category malware --seed 42
```

**Output**: `my_dataset.jsonl` (JSONL format with incidents and events)
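
A quick way to verify the sample is a small loader over the JSONL lines; the `category` field is taken from the stage description above, while the helper name is an illustrative assumption rather than part of the pipeline.

```python
import json
from collections import Counter

def category_counts(jsonl_lines):
    """Count sampled incidents per category ("Malware"/"Normal"),
    given the lines of a JSONL file (one JSON object per line)."""
    counts = Counter()
    for line in jsonl_lines:
        if line.strip():
            counts[json.loads(line)["category"]] += 1
    return counts

# e.g.: with open("my_dataset.jsonl") as fh: print(category_counts(fh))
```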

### Stage 2: Structural Analysis
Generate DAG-based chronological analysis of incident events:

```bash
./generate_dag_analysis.sh my_dataset.jsonl
```

**Output**: `my_dataset.dag.json` (incident metadata + event timeline)

### Stage 3: Multi-Model LLM Analysis
Query multiple language models with optimized prompts:

```bash
# GPT-4o-mini (baseline)
./generate_llm_analysis.sh my_dataset.jsonl --model gpt-4o-mini \
--group-events --behavior-analysis

# Qwen2.5:3b (target model)
./generate_llm_analysis.sh my_dataset.jsonl --model qwen2.5:3b \
--base-url http://10.147.20.102:11434/v1 --group-events --behavior-analysis

# Qwen2.5:1.5b (minimal model)
./generate_llm_analysis.sh my_dataset.jsonl --model qwen2.5:1.5b \
--base-url http://10.147.20.102:11434/v1 --group-events --behavior-analysis
```

**Outputs**: Model-specific JSON files with `summary` and `behavior_analysis` fields

### Stage 4: Dataset Correlation
Merge all analyses into unified dataset by incident ID:

```bash
python3 correlate_incidents.py my_dataset.*.json \
--jsonl my_dataset.jsonl -o final_dataset.json
```

**Output**: `final_dataset.json` (consolidated dataset with all analyses)

### Dataset Extension

To expand existing datasets without regeneration, use `merge_datasets.py` to combine multiple correlated datasets with automatic deduplication:

```bash
# Generate new samples with different seed
./sample_dataset.sh 20 extension --category malware --seed 99

# Run full analysis pipeline on extension
./generate_dag_analysis.sh extension.jsonl
./generate_llm_analysis.sh extension.jsonl --model qwen2.5:3b --group-events --behavior-analysis

# Correlate extension data
python3 correlate_incidents.py extension.*.json --jsonl extension.jsonl -o extension_dataset.json

# Merge with existing dataset (removes duplicates by incident_id)
python3 merge_datasets.py final_dataset.json extension_dataset.json -o final_dataset_v2.json
```

This approach enables incremental dataset growth while maintaining consistency across all analysis fields.
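
The deduplication step can be pictured as keeping the first record seen per `incident_id`. This sketch mirrors the behavior described above; the real `merge_datasets.py` may apply different precedence rules.

```python
def merge_by_incident_id(*datasets):
    """Merge correlated datasets, keeping the first record seen per
    incident_id (earlier datasets win on duplicates)."""
    seen, merged = set(), []
    for dataset in datasets:
        for incident in dataset.get("incidents", []):
            if incident["incident_id"] not in seen:
                seen.add(incident["incident_id"])
                merged.append(incident)
    return {"total_incidents": len(merged), "incidents": merged}

base = {"incidents": [{"incident_id": "a"}, {"incident_id": "b"}]}
extension = {"incidents": [{"incident_id": "b"}, {"incident_id": "c"}]}
combined = merge_by_incident_id(base, extension)  # keeps a, b, c; drops duplicate "b"
```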

### Workflow Diagram

```
Raw Slips Logs (alerts.json)
[sample_dataset.py] → incidents.jsonl
├─→ [alert_dag_parser.py] → incidents.dag.json
├─→ [alert_dag_parser_llm.py + GPT-4o-mini] → incidents.llm.gpt-4o-mini.json
├─→ [alert_dag_parser_llm.py + Qwen2.5:3b] → incidents.llm.qwen2.5.json
└─→ [alert_dag_parser_llm.py + Qwen2.5:1.5b] → incidents.llm.qwen2.5.1.5b.json
[correlate_incidents.py] → final_dataset.json
```

### Event Grouping Strategy

The `--group-events` optimization reduces token count through pattern normalization and grouping:

1. **Pattern Normalization**: Replaces variable components in event descriptions with placeholders
- IPv4 addresses → `<IP>`
- Port numbers → `<PORT>` (handles formats: `443/TCP`, `port: 80`)
- Standalone numbers → `<NUM>`

2. **Pattern-Based Grouping**: Groups events with identical normalized patterns
- Example: "Connection to 192.168.1.5:443" + "Connection to 10.0.2.15:443" → single pattern "Connection to `<IP>`:`<PORT>`"
- Preserves count, time range, and sample values (first 5 unique IPs/ports) per group

3. **Token Reduction**:
- 103 events: 3,522 → 976 tokens (72% reduction)
- 4,604 events: ~50,000 → 1,897 tokens (96% reduction)

4. **Information Loss Analysis**:
- **Lost**: Individual timestamps (only ranges), complete IP/port lists (max 5 samples), exact event sequence, duplicate frequency tracking
- **Retained**: Semantic patterns, event counts, representative samples, temporal context, protocol details, attack patterns
- **Impact**: Small incidents (~28% loss), large incidents (~90-95% loss, mostly repetitive data)
- **Justification**: Enables LLM summarization on RPi5; alternative is inability to process large incidents
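
The normalization and grouping steps above can be sketched in a few lines. The placeholder regexes here are illustrative; the exact rules used by the pipeline may differ.

```python
import re
from collections import defaultdict

def normalize(event: str) -> str:
    """Replace variable components with placeholders (illustrative rules)."""
    event = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "<IP>", event)  # IPv4 addresses
    event = re.sub(r"\b\d+/(?:TCP|UDP)\b", "<PORT>", event)        # 443/TCP style
    event = re.sub(r"(?<=:)\s*\d+\b", "<PORT>", event)             # :443, port: 80
    event = re.sub(r"\b\d+\b", "<NUM>", event)                     # standalone numbers
    return event

def group_events(events):
    """Group events by normalized pattern; keep count and up to 5 samples."""
    groups = defaultdict(list)
    for ev in events:
        groups[normalize(ev)].append(ev)
    return {p: {"count": len(evs), "samples": evs[:5]}
            for p, evs in groups.items()}

grouped = group_events(["Connection to 192.168.1.5:443",
                        "Connection to 10.0.2.15:443"])
# both events collapse into the single pattern "Connection to <IP>:<PORT>"
```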

### Additional Optimizations

**Dual-Prompt Analysis** (`--behavior-analysis`): Generates both severity-filtered summaries and structured technical flow analysis, providing richer training signals for model fine-tuning.

**Severity Filtering Strategy**: The dual-prompt approach implements intelligent filtering to manage token budgets:
- Prioritizes high-threat evidence in summaries for focused incident assessment
- May omit low-confidence events to reduce token consumption
- Balanced by generating both severity-filtered summaries and comprehensive behavior analysis
- Trade-off: the filtered summary alone may omit detail, but pairing it with the comprehensive behavior analysis preserves complete incident coverage while keeping outputs concise enough for resource-constrained deployment

**Multi-Model Evaluation**: Compares GPT-4o (quality baseline), GPT-4o-mini, Qwen2.5:3b (target deployment), and Qwen2.5:1.5b (minimal viable model) to assess performance-resource trade-offs.

### Dataset Structure

Each incident in the final dataset contains:
- **Metadata**: incident_id, category, source_ip, timewindow, threat_level
- **DAG Analysis**: Chronological event timeline with threat scores
- **LLM Summaries**: Model-specific severity assessments
- **Behavior Analysis**: Structured network flow descriptions

Token efficiency enables deployment on Raspberry Pi 5 while maintaining security analysis quality suitable for real-time intrusion detection.
---

**File:** `docs/immune/DATASET_RISK_REPORT.md` (155 additions)
# Network Event Cause & Risk Analysis Dataset for Slips IDS

## Table of Contents

- [1. Task Description](#1-task-description)
- [2. Relationship to Summarization Workflow](#2-relationship-to-summarization-workflow)
- [3. Dataset Generation Workflow](#3-dataset-generation-workflow)
- [Workflow Overview](#workflow-overview)
- [Stage 3: Multi-Model Cause & Risk Analysis](#stage-3-multi-model-cause--risk-analysis)
- [Stage 4: Dataset Correlation](#stage-4-dataset-correlation)
- [Dataset Structure](#dataset-structure)
- [4. Use Cases and Applications](#4-use-cases-and-applications)

## 1. Task Description

Develop a dataset for **root cause analysis and risk assessment** of network security incidents from Slips IDS alerts. This complementary workflow focuses on structured security analysis rather than event summarization, providing:

1. **Cause Analysis** - Categorized incident attribution (Malicious Activity / Legitimate Activity / Misconfigurations)
2. **Risk Assessment** - Structured evaluation (Risk Level / Business Impact / Investigation Priority)

**Target Deployment**: Same hardware constraints as [summarization workflow](DATASET_REPORT.md#2-limitations) (Raspberry Pi 5, 1.5B-3B parameter models).

## 2. Relationship to Summarization Workflow

Both workflows share identical **Stages 1-2** (incident sampling and DAG generation) but diverge in LLM analysis approach:

| Aspect | Summarization Workflow | Risk Analysis Workflow |
|--------|------------------------|------------------------|
| **Documentation** | [DATASET_REPORT.md](DATASET_REPORT.md) | This document |
| **Detailed Guide** | [README_dataset_summary_workflow.md](README_dataset_summary_workflow.md) | [README_dataset_risk_workflow.md](README_dataset_risk_workflow.md) |
| **Analysis Script** | `generate_llm_analysis.sh` | `generate_cause_risk_analysis.sh` |
| **Correlation Script** | `correlate_incidents.py` | `correlate_risks.py` |
| **Output Fields** | `summary` + `behavior_analysis` | `cause_analysis` + `risk_assessment` |
| **LLM Prompts** | 2 per incident (event summarization + behavior patterns) | 2 per incident (cause attribution + risk scoring) |
| **Primary Use Case** | Incident timeline reconstruction, behavior pattern identification | Root cause analysis, threat prioritization, SOC decision support |

**Recommendation**: Generate both datasets from the same sampled incidents to enable comparative analysis and multi-task model training.

## 3. Dataset Generation Workflow

### Workflow Overview

**Stages 1-2** (Sampling + DAG): See [DATASET_REPORT.md §3](DATASET_REPORT.md#3-dataset-generation-workflow) - identical to summarization workflow.

**Quick commands:**
```bash
# Stage 1: Sample 100 incidents
./sample_dataset.sh 100 my_dataset --seed 42

# Stage 2: Generate DAG analysis
./generate_dag_analysis.sh datasets/my_dataset.jsonl
```

### Stage 3: Multi-Model Cause & Risk Analysis

Query LLMs with dual prompts for cause attribution and risk assessment:

```bash
# GPT-4o-mini (recommended baseline)
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl \
--model gpt-4o-mini --group-events

# Qwen2.5:3b (target deployment model)
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl \
--model qwen2.5:3b \
--base-url http://10.147.20.102:11434/v1 --group-events
```

**Output Structure** (per incident):
```json
{
"cause_analysis": "**Possible Causes:**\n\n**1. Malicious Activity:**\n• Port scanning indicates reconnaissance...\n\n**2. Legitimate Activity:**\n• Could be network monitoring tools...\n\n**3. Misconfigurations:**\n• Firewall allowing unrestricted scanning...\n\n**Conclusion:** Most likely malicious reconnaissance activity.",

"risk_assessment": "**Risk Level:** High\n\n**Justification:** Active scanning + C2 connections...\n\n**Business Impact:** Potential data breach or service disruption...\n\n**Likelihood of Malicious Activity:** High - Systematic attack pattern...\n\n**Investigation Priority:** Immediate - Block source IP and investigate."
}
```

### Stage 4: Dataset Correlation

Merge all analyses (DAG + LLM cause/risk assessments) by incident ID:

```bash
python3 correlate_risks.py datasets/my_dataset.*.json \
--jsonl datasets/my_dataset.jsonl \
-o datasets/final_dataset_risk.json
```

### Dataset Structure

Final output contains merged analyses with model-specific risk assessments:

```json
{
"total_incidents": 100,
"incidents": [
{
"incident_id": "uuid",
"category": "Malware",
"source_ip": "192.168.1.113",
"timewindow": "5",
"timeline": "2024-04-05 16:53:07 to 16:53:50",
"threat_level": 15.36,
"event_count": 4604,
"dag_analysis": "• 16:53 - 222 horizontal port scans [HIGH]\n...",
"cause_risk_gpt_4o_mini": {
"cause_analysis": "**1. Malicious Activity:** Reconnaissance scanning...",
"risk_assessment": "**Risk Level:** High\n**Justification:**..."
},
"cause_risk_gpt_4o": { ... },
"cause_risk_qwen2_5": { ... }
}
]
}
```

**Key differences from summarization dataset**:
- `cause_risk_*` fields replace `llm_*` fields
- Structured 3-category cause analysis (vs. free-form summary)
- 5-field risk assessment framework (vs. behavior flow description)

## 4. Use Cases and Applications

### Security Operations Center (SOC)
- **Automated Triage**: Risk level + investigation priority for alert queue sorting
- **Incident Attribution**: Distinguish malicious attacks from misconfigurations
- **Resource Allocation**: Business impact assessment for team assignments
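
The automated-triage idea can be sketched by pulling the `**Risk Level:**` field out of the free-text `risk_assessment` shown earlier. The field format is taken from the example output above; the priority mapping and helper names are hypothetical.

```python
import re

# Hypothetical ordering for alert-queue sorting.
PRIORITY = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def risk_level(assessment: str) -> str:
    """Extract the level from a '**Risk Level:** High' style field."""
    m = re.search(r"\*\*Risk Level:\*\*\s*(\w+)", assessment)
    return m.group(1) if m else "Unknown"

def triage_sort(incidents):
    """Order the alert queue so the highest-risk incidents come first."""
    return sorted(incidents,
                  key=lambda i: PRIORITY.get(risk_level(i["risk_assessment"]), 99))
```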

### Model Training Applications
- **Classification Tasks**: Train models to categorize incidents (malicious/legitimate/misconfiguration)
- **Risk Scoring**: Fine-tune models for threat level prediction
- **Decision Support**: Generate actionable recommendations (block/monitor/investigate)

### Dataset Comparison
Use both workflows together:
- **Summarization**: "What happened?" (temporal sequences, behavior patterns)
- **Risk Analysis**: "Why did it happen?" + "How urgent?" (attribution, prioritization)

**Combined Training Strategy**:
```bash
# Generate both datasets from same incidents
./generate_llm_analysis.sh datasets/my_dataset.jsonl --model qwen2.5:3b --group-events --behavior-analysis
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl --model qwen2.5:3b --group-events

# Correlate separately
python3 correlate_incidents.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o summary_dataset.json
python3 correlate_risks.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o risk_dataset.json

# Multi-task training: Merge datasets and train single model on both tasks
```
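
One way to realize the multi-task merge is to pair each incident's DAG context with both target outputs, joined on `incident_id`. The flat `summary`/`risk_assessment` keys and the instruction prefixes below are illustrative assumptions; the real datasets nest these under per-model fields.

```python
def build_multitask_pairs(summary_dataset, risk_dataset):
    """Build instruction/output pairs for both tasks, joined on incident_id
    (illustrative field names; real datasets nest per-model fields)."""
    summaries = {i["incident_id"]: i for i in summary_dataset["incidents"]}
    pairs = []
    for inc in risk_dataset["incidents"]:
        match = summaries.get(inc["incident_id"])
        if match is None:
            continue
        context = inc.get("dag_analysis", "")
        pairs.append({"instruction": "Summarize the incident:\n" + context,
                      "output": match.get("summary", "")})
        pairs.append({"instruction": "Assess the risk:\n" + context,
                      "output": inc.get("risk_assessment", "")})
    return pairs
```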

---

**For detailed implementation**: See [README_dataset_risk_workflow.md](README_dataset_risk_workflow.md)
**For workflow comparison**: See [WORKFLOWS_OVERVIEW.md](WORKFLOWS_OVERVIEW.md) (if available)
**For evaluation methods**: See [LLM_EVALUATION_GUIDE.md](LLM_EVALUATION_GUIDE.md)
---

**File:** `docs/immune/Immune.md` (26 additions, 2 deletions)

This is the main guide to the documentation for the changes made to Slips as part of incorporating the immunology ideas.

### Architecture

- [Main Architecture of Slips Immune](https://stratospherelinuxips.readthedocs.io/en/develop/immune/immune_architecture.html)

### Raspberry Pi & LLM Research
- [Research RPI Limitations](https://stratospherelinuxips.readthedocs.io/en/develop/immune/research_rpi_limitations_and_define_acceptable_performance_benchmarks.html)
- [Slips Compatibility In The RPI](https://stratospherelinuxips.readthedocs.io/en/develop/immune/reimplement_slips_features_incompatible_with_the_rpi.html)
- [Installing Slips On the RPI](https://stratospherelinuxips.readthedocs.io/en/develop/immune/installing_slips_in_the_rpi.html)
- [LLM Research and Selection](https://stratospherelinuxips.readthedocs.io/en/develop/immune/research_and_selection_of_llm_candidates.html)
- [LLM RPI Performance](https://stratospherelinuxips.readthedocs.io/en/develop/immune/research_rpi_llm_performance.html)
- [LLM RPI Finetuning Frameworks](https://stratospherelinuxips.readthedocs.io/en/develop/immune/finetuning_frameworks_rpi_5.html)
- [LLM Summarization Dataset](https://stratospherelinuxips.readthedocs.io/en/develop/immune/summary_dataset.html)

### Security & Network Configuration

- [ARP Poisoning](https://stratospherelinuxips.readthedocs.io/en/develop/immune/arp_poisoning.html)
- [ARP Poisoning Risks](https://stratospherelinuxips.readthedocs.io/en/develop/immune/arp_poisoning_risks.html)
- [Blocking with Slips as an Access Point](https://stratospherelinuxips.readthedocs.io/en/develop/immune/blocking_in_slips.html)
- [IDS-in-the-middle Traffic routing](https://stratospherelinuxips.readthedocs.io/en/develop/immune/ids_in_the_middle_traffic_routing.html)
- [RPI Failover Mechanisms](https://stratospherelinuxips.readthedocs.io/en/develop/immune/failover_mechanisms.html)

### Datasets & LLM Training

**Overview Documents:**
- [Dataset Generation Workflows Overview](https://stratospherelinuxips.readthedocs.io/en/develop/immune/WORKFLOWS_OVERVIEW.html) - Quick comparison of summarization vs. risk workflows
- [Summarization Dataset Report](https://stratospherelinuxips.readthedocs.io/en/develop/immune/DATASET_REPORT.html) - Event summarization and behavior analysis
- [Risk Analysis Dataset Report](https://stratospherelinuxips.readthedocs.io/en/develop/immune/DATASET_RISK_REPORT.html) - Root cause and risk assessment

**Detailed Workflow Guides:**
- [Summarization Workflow Implementation](https://stratospherelinuxips.readthedocs.io/en/develop/immune/README_dataset_summary_workflow.html) - Step-by-step guide for generating summarization datasets
- [Risk Analysis Workflow Implementation](https://stratospherelinuxips.readthedocs.io/en/develop/immune/README_dataset_risk_workflow.html) - Step-by-step guide for generating risk datasets
- [Alert DAG Parser Documentation](https://stratospherelinuxips.readthedocs.io/en/develop/immune/README_alert_dag.html) - DAG structural analysis reference

**Datasets Evaluation (LLM-as-a-judge):**
- [LLM Evaluation Guide](https://stratospherelinuxips.readthedocs.io/en/develop/immune/LLM_EVALUATION_GUIDE.html) - How to evaluate and compare LLM models
- [Summarization Evaluation Results](https://stratospherelinuxips.readthedocs.io/en/develop/immune/summary_report.html) - Performance metrics for summarization models
- [Risk Analysis Evaluation Results](https://stratospherelinuxips.readthedocs.io/en/develop/immune/risk_summary.html) - Performance metrics for risk assessment models

**LLM Finetuning:**
- [LLM RPI Finetuning Frameworks](https://stratospherelinuxips.readthedocs.io/en/develop/immune/finetuning_frameworks_rpi_5.html)