In the summary below you will find exemplary views of the Grafana dashboards. The metrics were obtained using the [mock-generator](./docker/docker-compose.send-real-logs.yml).
#### Or run the modules locally on your machine:

```sh
python -m venv .venv
source .venv/bin/activate

sh install_requirements.sh
```

Alternatively, you can use `pip install` and enter all needed requirements individually with `-r requirements.*.txt`.

Now, you can start each stage, e.g. the inspector:

```sh
python src/inspector/inspector.py
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>
## Usage

### Configuration

To configure **heiDGAF** according to your needs, use the provided `config.yaml`.

The most relevant settings concern your specific log line format, the model you want to use, and possibly your infrastructure.

The section `pipeline.log_collection.collector.logline_format` has to be adjusted to reflect your specific input log line format. Using our adjustable and flexible log line configuration, you can rename, reorder, and fully configure each field of a valid log line. Freely define timestamps, RegEx patterns, lists, and IP addresses.
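As a purely illustrative sketch (the field names, type labels, and nesting below are hypothetical and must be matched against the shipped `config.yaml` and your own log source), such a field configuration could look like:

```yaml
# Hypothetical sketch only -- field names and type labels are assumptions,
# not the authoritative heiDGAF schema.
pipeline:
  log_collection:
    collector:
      logline_format:
        - [ "timestamp", Timestamp, "%Y-%m-%dT%H:%M:%S.%fZ" ]   # timestamp with format string
        - [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ] ] # value from a fixed list
        - [ "client_ip", IpAddress ]                             # validated IP address field
        - [ "domain_name", RegEx, '^[A-Za-z0-9.-]+$' ]           # free-form RegEx pattern
```

Each entry names a field, assigns it a type, and (where applicable) constrains its values, which is what allows renaming and reordering fields without code changes.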
|`pipeline.data_inspection.inspector.mode`| Mode of operation for the data inspector. |`univariate` (options: `multivariate`, `ensemble`)|
|`pipeline.data_inspection.inspector.ensemble.model`| Model to use when inspector mode is `ensemble`. |`WeightEnsemble`|
|`pipeline.data_inspection.inspector.ensemble.module`| Module name for the ensemble model. |`streamad.process`|
|`pipeline.data_inspection.inspector.models`| List of models to use for data inspection (e.g., anomaly detection). | Array of model definitions (e.g., `{"model": "ZScoreDetector", "module": "streamad.model", "model_args": {"is_global": false}}`)|
|`pipeline.data_inspection.inspector.anomaly_threshold`| Threshold for classifying an observation as an anomaly. |`0.01`|
|`pipeline.data_analysis.detector.model`| Model to use for data analysis (e.g., DGA detection). |`rf` (Random Forest; option: `XGBoost`)|
|`pipeline.data_analysis.detector.checksum`| Checksum for the model file to ensure integrity. |`021af76b2385ddbc76f6e3ad10feb0bb081f9cf05cff2e52333e31040bbf36cc`|
|`pipeline.data_analysis.detector.base_url`| Base URL for downloading the model if not present locally. |`https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/`|
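Pulled together, the inspector- and detector-related defaults from the table above can be written in `config.yaml` roughly as follows. The values are taken directly from the table; the exact nesting is a sketch, so consult the shipped `config.yaml` for the authoritative layout:

```yaml
# Sketch assembled from the documented keys and defaults above;
# verify the nesting against the shipped config.yaml.
pipeline:
  data_inspection:
    inspector:
      mode: univariate            # options: multivariate, ensemble
      ensemble:
        model: WeightEnsemble     # used only when mode is "ensemble"
        module: streamad.process
      models:
        - model: ZScoreDetector
          module: streamad.model
          model_args:
            is_global: false
      anomaly_threshold: 0.01     # scores above this are flagged as anomalies
  data_analysis:
    detector:
      model: rf                   # Random Forest; alternative: XGBoost
      checksum: 021af76b2385ddbc76f6e3ad10feb0bb081f9cf05cff2e52333e31040bbf36cc
      base_url: https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
```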
<p align="right">(<a href="#readme-top">back to top</a>)</p>
## Models and Training

To train and test our and possibly your own models, we currently rely on the following datasets:
After setting up the [dataset directories](#insert-test-data) (and adding the code for your model class, if applicable), you can start the training process by running the following commands:

The results will be saved to `./results` by default, unless configured otherwise.
### Data

> [!IMPORTANT]
> We support custom schemes.

Depending on your data and use case, you can customize the data scheme to fit your needs.
The configuration below is part of the [main configuration file](./config.yaml), which is detailed in our [documentation](https://heidgaf.readthedocs.io/en/latest/usage.html#id2).