Commit f486aad

Update inspector and detector docu
1 parent 8b0ae07 commit f486aad

1 file changed

+46
-11
lines changed

docs/pipeline.rst

Lines changed: 46 additions & 11 deletions
@@ -433,17 +433,28 @@ Stage 4: Inspection
433433
Overview
434434
--------
435435

436-
The `Inspector` stage is responsible to run time-series based anomaly detection on prefiltered batches. This stage
437-
is essentiell to reduce the load on the `Detection` stage. Otherwise, resource complexity increases disproportionately.
436+
The **Inspection** stage performs time-series-based anomaly detection on prefiltered DNS request batches.
437+
Its primary purpose is to reduce the load on the `Detection` stage by filtering out non-suspicious traffic early.
438+
439+
This stage uses StreamAD models (univariate, multivariate, and ensemble techniques) to detect unusual patterns
440+
in request volume and packet sizes.
441+
438442

439443
Main Class
440444
----------
441445

442446
.. py:currentmodule:: src.inspector.inspector
443447
.. autoclass:: Inspector
448+
   :members:
449+
   :undoc-members:
450+
   :show-inheritance:
451+
452+
The :class:`Inspector` class is responsible for:
444453

445-
The :class:`Inspector` is the primary class to run StreamAD models for time-series based anomaly detection, such as the Z-Score outlier detection.
446-
In addition, it features fine-tuning settings for models and anomaly thresholds.
454+
- Loading batches from Kafka
455+
- Extracting time-series features (e.g., frequency and packet size)
456+
- Applying anomaly detection models
457+
- Forwarding suspicious batches to the detector stage
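
The anomaly detection step can be illustrated with a plain rolling Z-score over per-window request counts. This is only a minimal sketch of the general idea: it does not use the StreamAD API, and the function name, window size, and threshold are illustrative assumptions rather than values taken from the ``Inspector`` implementation.

```python
from statistics import mean, pstdev


def zscore_anomalies(counts, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    Hypothetical helper: `counts` is a list of request counts per time step.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A flat history gives no scale for a Z-score, so skip this step.
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    requests_per_window = [10, 12, 11, 13, 12, 11, 95, 12]
    print(zscore_anomalies(requests_per_window))
```

A batch would then be forwarded to the detector stage only if enough time steps are flagged. How many count as "enough" is a tuning decision not specified here.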
447458

448459
Usage
449460
-----
@@ -498,9 +509,9 @@ Stage 5: Detection
498509
Overview
499510
--------
500511

501-
The `Detector` resembles the heart of heiDGAF. It runs pre-trained machine learning models to get a probability outcome for the DNS requests.
502-
The pre-trained models are under the EUPL-1.2 license online available.
503-
In total, we rely on the following data sets for the pre-trained models we offer:
512+
The **Detection** stage is the core of the heiDGAF pipeline. It consumes **suspicious batches** passed from the `Inspector`, applies **pre-trained ML models** to classify individual DNS requests, and issues alerts based on aggregated probabilities.
513+
514+
The pre-trained models used here are licensed under **EUPL-1.2** and built from the following datasets:
504515

505516
- `CIC-Bell-DNS-2021 <https://www.unb.ca/cic/datasets/dns-2021.html>`_
506517
- `DGTA-BENCH - Domain Generation and Tunneling Algorithms for Benchmark <https://data.mendeley.com/datasets/2wzf9bz7xr/1>`_
@@ -511,15 +522,39 @@ Main Class
511522

512523
.. py:currentmodule:: src.detector.detector
513524
.. autoclass:: Detector
525+
:members:
526+
:undoc-members:
527+
:show-inheritance:
528+
529+
The :class:`Detector` class:
530+
531+
- Consumes a batch flagged as suspicious.
532+
- Downloads and validates the ML model (if necessary).
533+
- Extracts features from domain names (e.g. character distributions, entropy, label statistics).
534+
- Computes a probability per request and an overall risk score per batch.
535+
- Emits alerts to ClickHouse and logs them to ``/tmp/warnings.json`` where applicable.
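
As a rough illustration of the feature extraction step, the sketch below computes a few domain-name features of the kind listed above (entropy, label statistics, a simple character ratio). The feature set and the function names are assumptions for illustration. The features actually used by the ``Detector`` may differ.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (in bits) of the character distribution of `text`."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(domain):
    """Hypothetical feature extractor for a single domain name."""
    name = domain.rstrip(".").lower()
    labels = name.split(".") if name else []
    chars = name.replace(".", "")
    n = len(chars)
    return {
        "length": len(name),
        "label_count": len(labels),
        "max_label_length": max((len(label) for label in labels), default=0),
        "digit_ratio": sum(ch.isdigit() for ch in chars) / n if n else 0.0,
        "entropy": shannon_entropy(chars),
    }


if __name__ == "__main__":
    print(extract_features("www.example.com"))
```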
514536

515537
Usage
516538
-----
517539

518-
The :class:`Detector` consumes anomalous batches of requests.
519-
It calculates a probability score for each request, and at last, an overall score of the batch.
520-
Alerts are log to ``/tmp/warnings.json``.
540+
1. The `Detector` listens on the Kafka topic from the Inspector (``inspector_to_detector``).
541+
2. For each suspicious batch:
542+
   - Extracts features for every domain request.
543+
   - Applies the loaded ML model (after scaling) to compute class probabilities.
544+
   - Marks a request as malicious if its probability exceeds the configured ``threshold``.
545+
3. Computes an **overall score** (e.g. median of malicious probabilities) for the batch.
546+
4. If malicious requests exist, issues an **alert** record and logs it; otherwise, the batch is filtered.
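
The thresholding and scoring steps above can be sketched as follows. This assumes the per-request probabilities have already been computed by the model. The function name, the default threshold, and the shape of the returned record are hypothetical, and the median is used only because it is given above as an example of an overall score.

```python
from statistics import median


def score_batch(probabilities, threshold=0.5):
    """Return an alert record for a batch, or None if the batch is filtered.

    `probabilities` holds one malicious-class probability per request.
    """
    malicious = [p for p in probabilities if p > threshold]
    if not malicious:
        # No request exceeds the threshold, so the batch is filtered out.
        return None
    return {
        "malicious_count": len(malicious),
        "overall_score": median(malicious),
    }


if __name__ == "__main__":
    print(score_batch([0.1, 0.9, 0.7, 0.8], threshold=0.5))
    print(score_batch([0.1, 0.2], threshold=0.5))
```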
547+
548+
Alerts are recorded in ClickHouse and also appended to a local JSON file (``warnings.json``) for external monitoring.
521549

522550
Configuration
523551
-------------
524552

525-
In case you want to load self-trained models, the :class:`Detector` needs a URL path, model name, and SHA256 checksum to download the model during start-up.
553+
You may use the provided, pre-trained models or supply your own. To use a custom model, specify:
554+
555+
- ``base_url``: URL from which to fetch model artifacts
556+
- ``model``: model name
557+
- ``checksum``: SHA256 digest for integrity validation
558+
- ``threshold``: probability threshold for classifying a request as malicious
559+
560+
These parameters are loaded at startup and used to download, verify, and load the model and scaler if they are not already cached locally (in the temporary directory).
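
The integrity check implied by ``checksum`` can be sketched with the standard library alone. This is an illustrative outline, not the ``Detector``'s actual loading code: the helper names are assumptions, and downloading and caching are left out.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected_checksum):
    """Return True if the file at `path` matches the configured checksum."""
    return sha256_of_file(path) == expected_checksum.strip().lower()
```

A model file that fails this check should not be loaded. Whether to retry the download or abort start-up in that case is a design decision left open here.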
