Commit a417234

Merge pull request #85 from stefanDeveloper/update-read-the-docs
Add missing configuration parameters and training, testing, expanatio…
2 parents 361c4f8 + 80246da commit a417234

3 files changed: 124 additions, 20 deletions

docs/configuration.rst

Lines changed: 18 additions & 18 deletions
@@ -146,41 +146,41 @@ functionality of the modules.
       - Default Value
       - Description
     * - mode
-      - ``univariate``
-      - TODO
+      - ``univariate`` (options: ``multivariate``, ``ensemble``)
+      - Mode of operation for the data inspector.
     * - ensemble.model
       - ``WeightEnsemble``
-      - TODO
+      - Model to use when the inspector mode is ``ensemble``.
     * - ensemble.module
       - ``streamad.process``
-      - TODO
+      - Python module for the ensemble model.
     * - ensemble.model_args
       -
-      - TODO
+      - Additional arguments for the ensemble model.
     * - models.model
       - ``ZScoreDetector``
-      - TODO
+      - Model to use for data inspection.
     * - models.module
       - ``streamad.model``
-      - TODO
+      - Base Python module for the inspection models.
     * - models.model_args
       -
-      - TODO
+      - Additional arguments for the model.
     * - models.model_args.is_global
       - ``false``
-      - TODO
+      -
     * - anomaly_threshold
       - ``0.01``
-      - TODO
+      - Threshold for classifying an observation as an anomaly.
     * - score_threshold
       - ``0.5``
-      - TODO
+      - Threshold for the anomaly score.
     * - time_type
       - ``ms``
-      - TODO
+      - Unit of time used in time range calculations.
     * - time_range
       - ``20``
-      - TODO
+      - Time window for data inspection.

 ``pipeline.data_analysis``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -193,17 +193,17 @@ functionality of the modules.
       - Default Value
       - Description
     * - model
-      - ``rf``
-      - TODO
+      - ``rf`` (option: ``XGBoost``)
+      - Model to use for the detector.
     * - checksum
       - Not given here
-      - TODO
+      - Checksum for the model file to ensure integrity.
     * - base_url
       - https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
-      - TODO
+      - Base URL for downloading the model if it is not present locally.
     * - threshold
       - ``0.5``
-      - TODO
+      - Threshold for the detector's classification.

 Environment Configuration
 .........................
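Taken together, the inspector and detector options documented in the tables above could be expressed in a YAML configuration roughly as follows. This is a hypothetical sketch only: the section names (``data_inspection``, ``inspector``, ``detector``) and the exact key nesting mirror the parameter names in the tables, not a verified schema from the project.

```yaml
# Illustrative configuration sketch; key layout is an assumption.
pipeline:
  data_inspection:
    inspector:
      mode: univariate            # options: multivariate, ensemble
      ensemble:
        model: WeightEnsemble     # used only when mode is ensemble
        module: streamad.process
        model_args:
      models:
        - model: ZScoreDetector
          module: streamad.model
          model_args:
            is_global: false
      anomaly_threshold: 0.01     # threshold for flagging an observation
      score_threshold: 0.5        # threshold for the anomaly score
      time_type: ms               # unit for time_range
      time_range: 20              # inspection window
  data_analysis:
    detector:
      model: rf                   # option: XGBoost
      checksum: <sha256-of-model-file>
      base_url: https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/
      threshold: 0.5
```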

docs/training.rst

Lines changed: 92 additions & 2 deletions
@@ -19,5 +19,95 @@ For hyperparameter optimisation we use ``optuna``.
 It offers GPU support to get the best parameters.

-TODO: add configuration parameters for the training, test and explanation process
-------------------------------------------------------------------------------------
+Training Parameters
+-------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set to train the model on; choose between all available datasets.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model to train: XGBoost, RandomForest, or GBM.
+   * - ``--model_output_path``
+     - ``Path``
+     - ``./results/model``
+     - Path to store the model. Output is ``{MODEL}_{SHA256}.pickle``.
+
+Testing Parameters
+------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set to test the model on.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model architecture to test.
+   * - ``--model_path``
+     - ``Path``
+     -
+     - Path to the trained model.
+
+Explanation Parameters
+----------------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``--dataset``
+     - ``[combine|dgarchive|cic|dgta]``
+     - ``combine``
+     - Data set used to explain model predictions.
+   * - ``--dataset_path``
+     - ``Path``
+     -
+     - Dataset path; follow the expected folder structure.
+   * - ``--dataset_max_rows``
+     - ``int``
+     - ``-1``
+     - Maximum rows to load from each dataset.
+   * - ``--model``
+     - ``[xg|rf|gbm]``
+     -
+     - Model architecture to explain.
+   * - ``--model_path``
+     - ``Path``
+     -
+     - Path to the trained model.
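The flag tables above can be sketched as a command-line parser. The snippet below is a minimal illustration of the training flags using plain ``argparse``; the project's actual entry point and parser library are not shown in this diff and may differ.

```python
# Hypothetical sketch of a CLI matching the documented training flags.
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a detection model.")
    parser.add_argument("--dataset",
                        choices=["combine", "dgarchive", "cic", "dgta"],
                        default="combine",
                        help="Data set to train the model on.")
    parser.add_argument("--dataset_path", type=Path,
                        help="Dataset path; follow the expected folder structure.")
    parser.add_argument("--dataset_max_rows", type=int, default=-1,
                        help="Maximum rows to load from each dataset (-1 loads all).")
    parser.add_argument("--model", choices=["xg", "rf", "gbm"],
                        help="Model to train: XGBoost, RandomForest, or GBM.")
    parser.add_argument("--model_output_path", type=Path,
                        default=Path("./results/model"),
                        help="Path to store the {MODEL}_{SHA256}.pickle output.")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--model", "xg", "--dataset_max_rows", "1000"])
print(args.dataset, args.model, args.dataset_max_rows)  # combine xg 1000
```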

docs/usage.rst

Lines changed: 14 additions & 0 deletions
@@ -50,6 +50,20 @@ Now, you can start each module, e.g. the `Inspector`:

    (.venv) $ python src/inspector/main.py

+Commit Hook
+-----------
+
+When contributing to the project, you might notice failed pipeline runs.
+These can be caused by the pre-commit hook finding formatting errors. We therefore suggest running
+
+.. code-block:: console
+
+   (.venv) $ pre-commit run --show-diff-on-failure --color=always --all-files
+
+before committing your changes to GitHub.
+This reformats the code accordingly, preventing errors in the pipeline.
+
 Configuration
 -------------
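For reference, the pre-commit hook described above is driven by a ``.pre-commit-config.yaml`` file in the repository root. The following is a hypothetical minimal example using common, publicly available hooks; the project's actual hook list may differ.

```yaml
# .pre-commit-config.yaml (illustrative only; not the project's real config)
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black
```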
