Commit a417234

Merge pull request #85 from stefanDeveloper/update-read-the-docs
Add missing configuration parameters and training, testing, expanatio…
2 parents 361c4f8 + 80246da commit a417234

3 files changed: 124 additions, 20 deletions

docs/configuration.rst

Lines changed: 18 additions & 18 deletions
@@ -146,41 +146,41 @@ functionality of the modules.
       - Default Value
       - Description
     * - mode
-      - ``univariate``
-      - TODO
+      - ``univariate`` (options: ``multivariate``, ``ensemble``)
+      - Mode of operation for the data inspector.
     * - ensemble.model
       - ``WeightEnsemble``
-      - TODO
+      - Model to use when the inspector mode is ``ensemble``.
     * - ensemble.module
       - ``streamad.process``
-      - TODO
+      - Python module for the ensemble model.
     * - ensemble.model_args
       -
-      - TODO
+      - Additional arguments for the ensemble model.
     * - models.model
       - ``ZScoreDetector``
-      - TODO
+      - Model to use for data inspection.
     * - models.module
       - ``streamad.model``
-      - TODO
+      - Base Python module for the inspection models.
     * - models.model_args
       -
-      - TODO
+      - Additional arguments for the model.
     * - models.model_args.is_global
       - ``false``
-      - TODO
+      -
     * - anomaly_threshold
       - ``0.01``
-      - TODO
+      - Threshold for classifying an observation as an anomaly.
     * - score_threshold
       - ``0.5``
-      - TODO
+      - Threshold for the anomaly score.
     * - time_type
       - ``ms``
-      - TODO
+      - Unit of time used in time range calculations.
     * - time_range
       - ``20``
-      - TODO
+      - Time window for data inspection.

 ``pipeline.data_analysis``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -193,17 +193,17 @@ functionality of the modules.
       - Default Value
       - Description
     * - model
-      - ``rf``
-      - TODO
+      - ``rf`` (option: ``XGBoost``)
+      - Model to use for the detector.
     * - checksum
       - Not given here
-      - TODO
+      - Checksum for the model file to ensure integrity.
     * - base_url
       - https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
-      - TODO
+      - Base URL for downloading the model if it is not present locally.
     * - threshold
       - ``0.5``
-      - TODO
+      - Threshold for the detector's classification.

 Environment Configuration
 .........................
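Taken together, the inspector and detector options documented in the tables above could be expressed in a YAML configuration roughly as follows. This is a hypothetical sketch only: the section names (``data_inspection``, ``inspector``, ``detector``) and the exact key nesting mirror the parameter names in the tables, not a verified schema from the project.

```yaml
# Illustrative configuration sketch; key layout is an assumption.
pipeline:
  data_inspection:
    inspector:
      mode: univariate            # options: multivariate, ensemble
      ensemble:
        model: WeightEnsemble     # used only when mode is ensemble
        module: streamad.process
        model_args:
      models:
        - model: ZScoreDetector
          module: streamad.model
          model_args:
            is_global: false
      anomaly_threshold: 0.01     # threshold for flagging an observation
      score_threshold: 0.5        # threshold for the anomaly score
      time_type: ms               # unit for time_range
      time_range: 20              # inspection window
  data_analysis:
    detector:
      model: rf                   # option: XGBoost
      checksum: <sha256-of-model-file>
      base_url: https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
      threshold: 0.5
```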

docs/training.rst

Lines changed: 92 additions & 2 deletions
@@ -19,5 +19,95 @@ For hyperparameter optimisation we use ``optuna``.
 It offers GPU support to get the best parameters.

-TODO: add configuration parameters for the training, test and explanation process
-------------------------------------------------------------------------------------
+Training Parameters
+-------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set to train the model on; choose between all available datasets.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model to train: XGBoost, RandomForest, or GBM.
+   * - ``--model_output_path``
+     - ``Path``
+     - ``./results/model``
+     - Path to store the model. Output is ``{MODEL}_{SHA256}.pickle``.
+
+Testing Parameters
+------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set to test the model on.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model architecture to test.
+   * - ``--model_path``
+     - ``Path``
+     -
+     - Path to the trained model.
+
+Explanation Parameters
+----------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set used to explain model predictions.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model architecture to explain.
+   * - ``--model_path``
+     - ``Path``
+     -
+     - Path to the trained model.
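The flag tables above can be sketched as a command-line parser. The snippet below is a minimal illustration of the training flags using plain ``argparse``; the project's actual entry point and parser library are not shown in this diff and may differ.

```python
# Hypothetical sketch of a CLI matching the documented training flags.
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a detection model.")
    parser.add_argument("--dataset",
                        choices=["combine", "dgarchive", "cic", "dgta"],
                        default="combine",
                        help="Data set to train the model on.")
    parser.add_argument("--dataset_path", type=Path,
                        help="Dataset path; follow the expected folder structure.")
    parser.add_argument("--dataset_max_rows", type=int, default=-1,
                        help="Maximum rows to load from each dataset (-1 loads all).")
    parser.add_argument("--model", choices=["xg", "rf", "gbm"],
                        help="Model to train: XGBoost, RandomForest, or GBM.")
    parser.add_argument("--model_output_path", type=Path,
                        default=Path("./results/model"),
                        help="Path to store the {MODEL}_{SHA256}.pickle output.")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--model", "xg", "--dataset_max_rows", "1000"])
print(args.dataset, args.model, args.dataset_max_rows)  # combine xg 1000
```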

docs/usage.rst

Lines changed: 14 additions & 0 deletions
@@ -50,6 +50,20 @@ Now, you can start each module, e.g. the `Inspector`:

    (.venv) $ python src/inspector/main.py

+Commit Hook
+-----------
+
+When contributing to the project, you might notice failed pipeline runs.
+These can be caused by the pre-commit hook finding formatting errors. We therefore suggest running
+
+.. code-block:: console
+
+   (.venv) $ pre-commit run --show-diff-on-failure --color=always --all-files
+
+before committing your changes to GitHub.
+This reformats the code accordingly, preventing errors in the pipeline.
+
 Configuration
 -------------
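For reference, the pre-commit hook described above is driven by a ``.pre-commit-config.yaml`` file in the repository root. The following is a hypothetical minimal example using common, publicly available hooks; the project's actual hook list may differ.

```yaml
# .pre-commit-config.yaml (illustrative only; not the project's real config)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black
```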
