.. docs/pipeline.rst
Stage 4: Inspection
===================

Overview
--------

The **Inspection** stage performs time-series-based anomaly detection on prefiltered DNS request batches.
Its primary purpose is to reduce the load on the `Detection` stage by filtering out non-suspicious traffic early.

This stage uses StreamAD models, supporting univariate, multivariate, and ensemble techniques, to detect unusual
patterns in request volume and packet sizes.
Main Class
----------

.. py:currentmodule:: src.inspector.inspector
.. autoclass:: Inspector
   :members:
   :undoc-members:
   :show-inheritance:
The :class:`Inspector` class is responsible for:

- Loading batches from Kafka
- Extracting time-series features (e.g., frequency and packet size)
- Applying anomaly detection models
- Forwarding suspicious batches to the detector stage
Usage
-----
Stage 5: Detection
==================

Overview
--------

The **Detection** stage is the core of the heiDGAF pipeline. It consumes **suspicious batches** passed from the
`Inspector`, applies **pre-trained ML models** to classify individual DNS requests, and issues alerts based on
aggregated probabilities.

The pre-trained models used here are licensed under **EUPL-1.2** and built from the following datasets:

- `DGTA-BENCH - Domain Generation and Tunneling Algorithms for Benchmark <https://data.mendeley.com/datasets/2wzf9bz7xr/1>`_
Main Class
----------

.. py:currentmodule:: src.detector.detector
.. autoclass:: Detector
   :members:
   :undoc-members:
   :show-inheritance:
The :class:`Detector` class:

- Consumes a batch flagged as suspicious.
- Downloads and validates the ML model (if necessary).
- Extracts features from domain names (e.g. character distributions, entropy, label statistics).
- Computes a probability per request and an overall risk score per batch.
- Emits alerts to ClickHouse and logs them to ``/tmp/warnings.json`` where applicable.
514
536
515
537
Usage
516
538
-----
517
539
518
-
The :class:`Detector` consumes anomalous batches of requests.
519
-
It calculates a probability score for each request, and at last, an overall score of the batch.
520
-
Alerts are log to ``/tmp/warnings.json``.
540
+
1. The `Detector` listens on the Kafka topic from the Inspector (``inspector_to_detector``).
541
+
2. For each suspicious batch:
542
+
- Extracts features for every domain request.
543
+
- Applies the loaded ML model (after scaling) to compute class probabilities.
544
+
- Marks a request as malicious if its probability exceeds the configured `threshold`.
545
+
3. Computes an **overall score** (e.g. median of malicious probabilities) for the batch.
546
+
4. If malicious requests exist, issues an **alert** record and logs it; otherwise, the batch is filtered.
547
+
548
+
Alerts are recorded in ClickHouse and also appended to a local JSON file (`warnings.json`) for external monitoring.
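The aggregation in steps 2 to 4 can be sketched as below. This is a simplified illustration: the function name ``score_batch`` and the 0.5 default threshold are assumptions, and the real Detector works on full request records rather than bare probabilities.

```python
from statistics import median


def score_batch(probabilities, threshold=0.5):
    """Aggregate per-request malicious probabilities into a batch verdict.

    Sketch only: mirrors the documented steps (threshold per request,
    median of malicious probabilities as the overall score).
    """
    malicious = [p for p in probabilities if p > threshold]
    overall = median(malicious) if malicious else 0.0
    return {
        "malicious_count": len(malicious),
        "overall_score": overall,
        "alert": bool(malicious),
    }


verdict = score_batch([0.1, 0.92, 0.85, 0.3, 0.97], threshold=0.5)
```

A batch with no request above the threshold yields no alert and is filtered, matching step 4.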
521
549
522
550
Configuration
523
551
-------------
524
552
525
-
In case you want to load self-trained models, the :class:`Detector` needs a URL path, model name, and SHA256 checksum to download the model during start-up.
553
+
You may use the provided, pre-trained models or supply your own. To use a custom model, specify:
554
+
555
+
- `base_url`: URL from which to fetch model artifacts
556
+
- `model`: model name
557
+
- `checksum`: SHA256 digest for integrity validation
558
+
- `threshold`: probability threshold for classifying a request as malicious
559
+
560
+
These parameters are loaded at startup and used to download, verify, and load the model/scaler if not already cached locally (in temp directory).
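The download-and-verify flow can be sketched as follows. This is an assumption-laden illustration, not the Detector's actual loader: the function name ``fetch_model`` and the URL layout ``{base_url}/{model}`` are made up for the example; only the SHA256-verify-then-cache idea comes from the documentation above.

```python
import hashlib
import os
import tempfile
import urllib.request


def fetch_model(base_url, model, checksum):
    """Download a model artifact (if not cached in the temp directory)
    and verify its SHA256 digest before returning the local path."""
    path = os.path.join(tempfile.gettempdir(), model)
    if not os.path.exists(path):
        # Hypothetical URL layout; the real artifact layout may differ.
        urllib.request.urlretrieve(f"{base_url}/{model}", path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != checksum:
        raise ValueError(f"checksum mismatch for {model}: got {digest}")
    return path
```

Verifying the digest before loading protects against truncated downloads and tampered artifacts, which matters when models are fetched from a remote ``base_url`` at startup.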