CESNET
diff --git a/README.md b/README.md
Lines changed: 113 additions & 0 deletions
diff --git a/docs/annotations.md b/docs/annotations.md
Lines changed: 100 additions & 0 deletions
diff --git a/docs/benchmarks.md b/docs/benchmarks.md
Lines changed: 53 additions & 0 deletions
diff --git a/docs/benchmarks/device_type_classification/69270dcc1819.md b/docs/benchmarks/device_type_classification/69270dcc1819.md
Lines changed: 31 additions & 0 deletions
diff --git a/docs/benchmarks/device_type_classification/941261e8c367.md b/docs/benchmarks/device_type_classification/941261e8c367.md
Lines changed: 31 additions & 0 deletions
diff --git a/docs/benchmarks/device_type_classification/bf0aec939afe.md b/docs/benchmarks/device_type_classification/bf0aec939afe.md
Lines changed: 31 additions & 0 deletions
diff --git a/docs/benchmarks/multivariate_forecasting/generic_model/0197980a87c0.md b/docs/benchmarks/multivariate_forecasting/generic_model/0197980a87c0.md
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,113 @@
+<p align="center">
+    <img src="https://raw.githubusercontent.com/CESNET/cesnet-tszoo/main/docs/images/tszoo.svg" width="450">
+</p>
+
+[![](https://img.shields.io/badge/license-BSD-blue.svg)](https://github.com/CESNET/cesnet-tszoo/blob/main/LICENSE)
+[![](https://img.shields.io/badge/docs-cesnet--tszoo-blue.svg)](https://cesnet.github.io/cesnet-tszoo/)
+[![](https://img.shields.io/badge/python->=3.10-blue.svg)](https://pypi.org/project/cesnet-tszoo/)
+[![](https://img.shields.io/pypi/v/cesnet-tszoo)](https://pypi.org/project/cesnet-tszoo/)
+
+The goal of the `cesnet-tszoo` project is to provide time series datasets together with useful tools for preprocessing and reproducibility, such as:
+
+- An API for downloading, configuring, and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each with various sources and aggregations.
+- Examples of configuration options:
+  - Data can be split into train/val/test sets, either by time series or by time periods.
+  - Data can be transformed with built-in or custom scalers.
+  - Missing values can be handled with built-in or custom fillers.
+- Creation and import of benchmarks for easy reproducibility of experiments.
+- Creation and import of annotations for specific time series, specific times, or specific times within specific time series.
+
+## Datasets
+
+| Name                      | CESNET-TimeSeries24                                                                       | CESNET-AGG23                                                                                          |
+|---------------------------|-------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
+| _Published in_            | 2025                                                                                      | 2023                                                                                                  |
+| _Collection duration_     | 40 weeks                                                                                  | 10 weeks                                                                                              |
+| _Collection period_       | 9.10.2023 - 14.7.2024                                                                     | 25.2.2023 - 3.5.2023                                                                                  |
+| _Aggregation window_      | 1 day, 1 hour, 10 min                                                                     | 1 min                                                                                                 |
+| _Sources_                 | CESNET3: Institutions, Institution subnets, IP addresses                                  | CESNET2                                                                                               |
+| _Number of time series_   | Institutions: 849, Institution subnets: 1644, IP addresses: 825372                        | 1                                                                                                     |
+| _Cite_                    | [https://doi.org/10.1038/s41597-025-04603-x](https://doi.org/10.1038/s41597-025-04603-x)  | [https://doi.org/10.23919/CNSM59352.2023.10327823](https://doi.org/10.23919/CNSM59352.2023.10327823)  |
+| _Zenodo URL_              | [https://zenodo.org/records/13382427](https://zenodo.org/records/13382427)                | [https://zenodo.org/records/8053021](https://zenodo.org/records/8053021)                              |
+| _Related papers_          |                                                                                           |                                                                                                       |
+
+## Installation
+
+Install the package from pip with:
+
+```bash
+pip install cesnet-tszoo
+```
+
+or, for an editable install:
+
+```bash
+pip install -e git+https://github.com/CESNET/cesnet-tszoo
+```
+
+## Examples
+
+### Initialize dataset to create train, validation, and test dataframes
+
+#### Using [`TimeBasedCesnetDataset`](cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset) dataset
+
+```python
+from cesnet_tszoo.datasets import CESNET_TimeSeries24
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType
+from cesnet_tszoo.configs import TimeBasedConfig
+
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False)
+config = TimeBasedConfig(
+    ts_ids=50, # number of randomly selected time series from dataset
+    train_time_period=range(0, 100), 
+    val_time_period=range(100, 150), 
+    test_time_period=range(150, 250), 
+    features_to_take=["n_flows", "n_packets"])
+dataset.set_dataset_config_and_initialize(config)
+
+train_dataframe = dataset.get_train_df()
+val_dataframe = dataset.get_val_df()
+test_dataframe = dataset.get_test_df()
+```
+
+Time-based datasets are configured with [`TimeBasedConfig`](cesnet_tszoo.configs.time_based_config.TimeBasedConfig).
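+
+As a rough mental model only, here is a minimal sketch that is not the library's implementation. A time-based split keeps the same selected time series in every set and slices them along the time axis using the given `id_time` ranges. The helper `split_by_time` and the `(time series, time, features)` array layout are hypothetical.
+
+```python
+from itertools import combinations
+
+import numpy as np
+
+
+def split_by_time(data: np.ndarray, train: range, val: range, test: range):
+    """Hypothetical helper: slice a (time series, time, features) array along the time axis."""
+    # Overlapping periods would leak information between sets, so reject them.
+    for a, b in combinations((train, val, test), 2):
+        if set(a) & set(b):
+            raise ValueError("Time periods must not overlap.")
+    return data[:, list(train)], data[:, list(val)], data[:, list(test)]
+
+
+# 50 series, 250 time steps, 2 features (e.g. n_flows, n_packets) with placeholder values.
+rng = np.random.default_rng(0)
+data = rng.random((50, 250, 2))
+
+train, val, test = split_by_time(data, range(0, 100), range(100, 150), range(150, 250))
+print(train.shape, val.shape, test.shape)  # (50, 100, 2) (50, 50, 2) (50, 100, 2)
+```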
+
+#### Using [`SeriesBasedCesnetDataset`](cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset) dataset
+
+```python
+from cesnet_tszoo.datasets import CESNET_TimeSeries24
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType
+from cesnet_tszoo.configs import SeriesBasedConfig
+
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=True)
+config = SeriesBasedConfig(
+    time_period=range(0, 250), 
+    train_ts=100, # number of randomly selected time series from dataset
+    val_ts=30, # number of randomly selected time series from dataset
+    test_ts=20, # number of randomly selected time series from dataset
+    features_to_take=["n_flows", "n_packets"])
+dataset.set_dataset_config_and_initialize(config)
+
+train_dataframe = dataset.get_train_df()
+val_dataframe = dataset.get_val_df()
+test_dataframe = dataset.get_test_df()
+```
+
+Series-based datasets are configured with [`SeriesBasedConfig`](cesnet_tszoo.configs.series_based_config.SeriesBasedConfig).
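+
+A series-based split works the other way around. Every set covers the same time period but contains different time series. The sketch below draws random sets of IDs and assumes that the train, validation, and test series are disjoint. `split_by_series` is a hypothetical helper, and the IDs are placeholders.
+
+```python
+import numpy as np
+
+
+def split_by_series(ts_ids: np.ndarray, n_train: int, n_val: int, n_test: int, seed: int = 0):
+    """Hypothetical helper: randomly draw disjoint sets of time series IDs."""
+    total = n_train + n_val + n_test
+    if total > len(ts_ids):
+        raise ValueError("Not enough time series for the requested set sizes.")
+    rng = np.random.default_rng(seed)
+    # Shuffle once and cut consecutive chunks, so no series can land in two sets.
+    chosen = rng.permutation(ts_ids)[:total]
+    return chosen[:n_train], chosen[n_train : n_train + n_val], chosen[n_train + n_val :]
+
+
+# Placeholder IDs for the 849 institution time series; real ts_id values may differ.
+all_ids = np.arange(849)
+train_ids, val_ids, test_ids = split_by_series(all_ids, n_train=100, n_val=30, n_test=20)
+print(len(train_ids), len(val_ids), len(test_ids))  # 100 30 20
+```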
+
+#### Using [`load_benchmark`](cesnet_tszoo.benchmarks.load_benchmark)
+
+```python
+from cesnet_tszoo.benchmarks import load_benchmark
+
+benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
+dataset = benchmark.get_initialized_dataset()
+
+train_dataframe = dataset.get_train_df()
+val_dataframe = dataset.get_val_df()
+test_dataframe = dataset.get_test_df()
+```
+
+Whether the loaded dataset is series-based or time-based depends on the benchmark. The data can then be loaded from it in the same way as from the datasets in the previous examples.
+
+## Papers
@@ -0,0 +1,100 @@
+# Annotations
+
+This tutorial shows how to use annotations.
+
+!!! info "Note"
+    For all options and more detailed examples, refer to the Jupyter notebook [`annotations`](https://github.com/CESNET/cesnet-tszoo/blob/tutorial_notebooks/annotations.ipynb).
+
+## Basics
+
+- You can get annotations of a specific type with the `get_annotations` method.
+- The `get_annotations` method returns annotations as a pandas DataFrame.
+
+There are three annotation types:
+
+1. **AnnotationType.TS_ID** -> Annotations for a whole specific time series
+2. **AnnotationType.ID_TIME** -> Annotations for a specific time, independent of time series
+3. **AnnotationType.BOTH** -> Annotations for a specific time in a specific time series
+
+```python
+
+from cesnet_tszoo.utils.enums import AnnotationType                                                                          
+
+dataset.get_annotations(on=AnnotationType.TS_ID)
+dataset.get_annotations(on=AnnotationType.ID_TIME)
+dataset.get_annotations(on=AnnotationType.BOTH)
+
+```
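+
+`get_annotations` returns a pandas DataFrame, so its result can be combined with the data using ordinary pandas operations. The sketch below assumes hypothetical column names (`ts_id`, `id_time`, `n_flows`) and a `BOTH` annotation table with one column per annotation group. In the library, the ts_id column is named after the ID of the used dataset.
+
+```python
+import pandas as pd
+
+# Hypothetical data with one row per (time series, time) pair.
+values = pd.DataFrame({
+    "ts_id": [3, 3, 7],
+    "id_time": [0, 1, 0],
+    "n_flows": [120, 95, 40],
+})
+
+# Hypothetical BOTH annotations: key columns plus one column per annotation group.
+annotations = pd.DataFrame({
+    "ts_id": [3],
+    "id_time": [0],
+    "test3": ["test_annotation3_3_0"],
+})
+
+# A left join keeps every data row; rows without an annotation get NaN in "test3".
+annotated = values.merge(annotations, on=["ts_id", "id_time"], how="left")
+print(annotated)
+```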
+
+## Annotation groups
+- An annotation group can be understood as a column name in the DataFrame/CSV.
+- You can add annotation groups or remove them.
+
+```python
+
+from cesnet_tszoo.utils.enums import AnnotationType                                                                          
+
+# Adding groups
+dataset.add_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
+dataset.add_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
+dataset.add_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
+
+# Removing groups
+dataset.remove_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
+dataset.remove_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
+dataset.remove_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
+
+```
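+
+An annotation group behaves like a column. Adding a group can therefore be pictured as adding a DataFrame column, and removing a group as dropping one. This is only an analogy sketched with plain pandas, not the library's internals, and the `ts_id` column name is hypothetical.
+
+```python
+import pandas as pd
+
+# Hypothetical TS_ID annotation table: one key column, no groups yet.
+annotations = pd.DataFrame({"ts_id": [3, 5]})
+
+# Adding a group is like adding an empty column...
+with_group = annotations.assign(test1=pd.Series([None, None], dtype="object"))
+print(list(with_group.columns))  # ['ts_id', 'test1']
+
+# ...and removing it drops the column together with all its values.
+without_group = with_group.drop(columns=["test1"])
+print(list(without_group.columns))  # ['ts_id']
+```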
+
+## Annotation values
+- Annotations are specific values for a selected annotation group and `AnnotationType`.
+- You can add annotations or remove them.
+- Adding annotations
+    - When you add an annotation to an annotation group that does not exist, the group is created.
+    - To override an existing annotation, specify the same `annotation_group`, `ts_id` and `id_time` together with the new annotation.
+    - Setting `enforce_ids` to `True` ensures that the given `ts_id` and `id_time` belong to the used dataset.
+- Removing annotations
+    - When a row's annotations are removed from every annotation group, that row is removed from the DataFrame.
+
+```python                                                                     
+
+# Adding annotations
+dataset.add_annotation(annotation="test_annotation1_3", annotation_group="test1", ts_id=3, id_time=None, enforce_ids=True) # Adds to AnnotationType.TS_ID
+dataset.add_annotation(annotation="test_annotation2_0", annotation_group="test2", ts_id=None, id_time=0, enforce_ids=True) # Adds to AnnotationType.ID_TIME
+dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True) # Adds to AnnotationType.BOTH
+
+# Removing annotations
+dataset.remove_annotation(annotation_group="test1", ts_id=3, id_time=None) # Removes from AnnotationType.TS_ID
+dataset.remove_annotation(annotation_group="test2", ts_id=None, id_time=0) # Removes from AnnotationType.ID_TIME
+dataset.remove_annotation(annotation_group="test3", ts_id=3, id_time=0) # Removes from AnnotationType.BOTH
+
+```    
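+
+The rules above can be modeled with a small in-memory structure. `AnnotationTable` is a hypothetical class that only illustrates the described behavior and is not part of `cesnet_tszoo`. It overrides annotations by key, creates groups on demand, drops empty rows, and picks the annotation type from which IDs are set.
+
+```python
+from typing import Dict, Optional, Tuple
+
+# Row key: (ts_id, id_time); None means "not set".
+Key = Tuple[Optional[int], Optional[int]]
+
+
+class AnnotationTable:
+    """Hypothetical in-memory model of the add/override/remove rules."""
+
+    def __init__(self) -> None:
+        # Row key -> {annotation_group: annotation}
+        self.rows: Dict[Key, Dict[str, str]] = {}
+
+    def add(self, annotation: str, group: str, ts_id: Optional[int], id_time: Optional[int]) -> None:
+        # Creates the row and group on demand; the same key and group overrides the old value.
+        self.rows.setdefault((ts_id, id_time), {})[group] = annotation
+
+    def remove(self, group: str, ts_id: Optional[int], id_time: Optional[int]) -> None:
+        row = self.rows.get((ts_id, id_time))
+        if row is None:
+            return
+        row.pop(group, None)
+        # A row without an annotation in any group is dropped.
+        if not row:
+            del self.rows[(ts_id, id_time)]
+
+    @staticmethod
+    def annotation_type(ts_id: Optional[int], id_time: Optional[int]) -> str:
+        # Mirrors the comments in the example above: the IDs that are set decide the type.
+        if ts_id is None and id_time is None:
+            raise ValueError("At least one of ts_id and id_time must be set.")
+        if ts_id is not None and id_time is not None:
+            return "BOTH"
+        return "TS_ID" if ts_id is not None else "ID_TIME"
+
+
+table = AnnotationTable()
+table.add("first", "test3", ts_id=3, id_time=0)
+table.add("second", "test3", ts_id=3, id_time=0)  # overrides "first"
+print(table.rows)  # {(3, 0): {'test3': 'second'}}
+table.remove("test3", ts_id=3, id_time=0)
+print(table.rows)  # {}
+```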
+
+## Exporting annotations
+- You can export your annotations with the `save_annotations` method.
+- `save_annotations` creates a CSV file at `os.path.join(dataset.annotations_root, identifier)`.
+- When `force_write` is `True`, an existing file with the same name is overwritten.
+- Do not add ".csv" to the identifier, because it is added automatically.
+
+```python                                                                     
+
+from cesnet_tszoo.utils.enums import AnnotationType   
+
+dataset.save_annotations(identifier="test_name", on=AnnotationType.BOTH, force_write=True)
+
+```   
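+
+The naming rule can be sketched as follows. `annotations_csv_path` and `check_can_write` are hypothetical helpers that mirror the described behavior, and a temporary directory stands in for `dataset.annotations_root`.
+
+```python
+import os
+import tempfile
+
+
+def annotations_csv_path(annotations_root: str, identifier: str) -> str:
+    """Hypothetical helper: '.csv' is appended to the identifier automatically."""
+    if identifier.endswith(".csv"):
+        # Following the advice above, the identifier is passed without the extension.
+        raise ValueError("Pass the identifier without the '.csv' extension.")
+    return os.path.join(annotations_root, identifier + ".csv")
+
+
+def check_can_write(path: str, force_write: bool) -> None:
+    """Hypothetical helper: refuse to overwrite an existing file unless force_write is True."""
+    if os.path.exists(path) and not force_write:
+        raise FileExistsError(f"{path} already exists; set force_write=True to overwrite it.")
+
+
+root = tempfile.mkdtemp()  # stands in for dataset.annotations_root
+path = annotations_csv_path(root, "test_name")
+check_can_write(path, force_write=False)  # fine: the file does not exist yet
+print(os.path.basename(path))  # test_name.csv
+```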
+
+## Importing annotations
+- You can import existing annotations, either your own or built-in ones.
+- Setting `enforce_ids` to `True` ensures that every `ts_id` and `id_time` in the imported annotations belongs to the used dataset.
+- The `import_annotations` method automatically detects the `AnnotationType` of the imported annotations based on which columns exist: the ts_id column (named after the ts_id of the used dataset) and/or the `id_time` column.
+- It first attempts to load built-in annotations. If no built-in annotations with that identifier exist, it attempts to load custom annotations from the `"data_root"/tszoo/annotations/` directory.
+
+```python                                                                     
+
+from cesnet_tszoo.utils.enums import AnnotationType   
+
+dataset.import_annotations(identifier="test_name", enforce_ids=True)
+
+```   
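+
+The detection and lookup order described above can be sketched like this. Both helpers are hypothetical. `id_institution` is used only as an example ts_id column name, and the custom file path assumes the same `.csv` naming as `save_annotations`.
+
+```python
+import os
+from typing import Iterable, Set, Tuple
+
+
+def detect_annotation_type(columns: Iterable[str], ts_id_name: str) -> str:
+    """Hypothetical: infer the annotation type from which ID columns are present."""
+    cols = set(columns)
+    has_ts_id, has_id_time = ts_id_name in cols, "id_time" in cols
+    if has_ts_id and has_id_time:
+        return "BOTH"
+    if has_ts_id:
+        return "TS_ID"
+    if has_id_time:
+        return "ID_TIME"
+    raise ValueError(f"Neither '{ts_id_name}' nor 'id_time' column found.")
+
+
+def resolve_annotations_source(identifier: str, builtin: Set[str], data_root: str) -> Tuple[str, str]:
+    """Hypothetical lookup order: built-in annotations first, then the custom directory."""
+    if identifier in builtin:
+        return ("built-in", identifier)
+    return ("custom", os.path.join(data_root, "tszoo", "annotations", identifier + ".csv"))
+
+
+print(detect_annotation_type(["id_institution", "id_time", "test3"], "id_institution"))  # BOTH
+print(resolve_annotations_source("test_name", set(), "/some_directory/"))
+```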
+
@@ -0,0 +1,53 @@
+# Benchmarks
+
+CESNET-TS-Zoo enables easy sharing and reuse of configuration files to support open science, reproducibility, and transparent comparison of time series modeling approaches.
+
+We provide a collection of pre-defined configurations that serve as benchmarks, including use cases like network traffic forecasting and anomaly detection.
+
+The library includes tools for both importing and exporting configurations as benchmarks. This allows researchers to cite a specific benchmark via its unique hash or to share their own approach as a configuration file.
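+
+The sketch below shows why a short hash can identify a configuration: it derives a stable 12-character ID from a configuration dictionary. The scheme (canonical JSON plus SHA-256) is hypothetical and chosen only for illustration. It is not how CESNET-TS-Zoo computes its benchmark hashes.
+
+```python
+import hashlib
+import json
+
+
+def config_fingerprint(config: dict, length: int = 12) -> str:
+    """Hypothetical: derive a short, stable ID from a configuration dictionary."""
+    # sort_keys makes the result independent of the order in which keys were inserted.
+    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
+
+
+a = config_fingerprint({"dataset": "CESNET-TimeSeries24", "aggregation": "AGG_1_DAY", "train_size": 0.6})
+b = config_fingerprint({"train_size": 0.6, "aggregation": "AGG_1_DAY", "dataset": "CESNET-TimeSeries24"})
+print(a == b)  # True
+```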
+
+To load and use a benchmark in your code, simply use the following snippet:
+
+```python
+from cesnet_tszoo.benchmarks import load_benchmark
+
+benchmark = load_benchmark("<benchmark_hash>", "<path-to-datasets>")
+dataset = benchmark.get_initialized_dataset()
+```
+
+!!! info "Note"
+    A more detailed tutorial on how to use benchmarks is available [`here`][benchmark_tutorial].
+
+## Available benchmarks
+
+#### Network Traffic Forecasting Benchmarks
+
+Network traffic forecasting plays a crucial role in network management and security. We therefore prepared several benchmarks for evaluating network traffic forecasting methods for both management and security tasks. We split the `Network Traffic Forecasting Benchmarks` into these two groups:
+
+- ["Univariate forecasting - Transmitted data size"][univariate_forecasting]: Benchmarks in this group support the most commonly used forecasting task in network management.
+- ["Multivariate forecasting"][multivariate_forecasting]: Benchmarks in this group target multivariate forecasting of network traffic features, which is more often used in network security for anomaly/outlier detection.
+
+#### Network Device Type Classification Benchmarks
+
+Network device type classification focuses on evaluating the performance of models for classifying types of network devices. The goal of these benchmarks is to allow comparison of various classification algorithms and methods in the context of network devices. This task is valuable in environments where it is essential to quickly and efficiently identify devices in a network for monitoring, security, and traffic optimization purposes. Analyzing the benchmarks helps determine which methods are most suitable for deployment in real-world scenarios.
+
+The network device type classification benchmarks are described in detail [here][device_type_classification].
+
+#### Anomaly Detection Benchmarks
+
+These benchmarks are in progress and will be added soon.
+
+#### Similarity Search Benchmarks
+
+These benchmarks are in progress and will be added soon.
+
+## Available dataset configs from related works
+
+To support reproducibility, CESNET-TS-Zoo allows sharing ts-zoo configs with others through a pull request from a forked repository.
+
+Each related work includes configs and a usage example. Please follow the authors' instructions in the example to ensure comparable results. The following configs are already included in ts-zoo:
+
+| DOI  | Task | Configs link |
+|:-----------------|:-----------------:|:-----------------:|
+| <https://doi.org/10.48550/arXiv.2503.17410> | Univariate forecasting  | [configs][arxiv.org/abs/2503.17410]  |
@@ -0,0 +1,31 @@
+# Benchmark 69270dcc1819 { #69270dcc1819 }
+
+| Parameter | Value |
+|:-----------------|:-----------------:|
+| Benchmark hash |  69270dcc1819 |
+| Original paper |  None |
+| Dataset |  CESNET-TimeSeries24 |
+| Aggregation |  AGG_10_MINUTES |
+| Source |  IP_ADDRESSES_FULL |
+| Train size |  0.6 |
+| Val size |  0.2 |
+| Test size |  0.2 |
+| Uni/Multi variate |  Multivariate |
+| Metrics |  all |
+| Default value |  None* |
+| Filler |  None* |
+| Scaler |  None* |
+| Sliding window train |  None |
+| Sliding window prediction |  None |
+| Sliding window step |  None |
+| Set shared size |  None |
+| All batch size |  7* |
+| Train TS IDs |  1.0 |
+| Test TS IDs |  0.0 |
+
+!!! info "Note"
+    Values marked with * can be changed by users in the benchmark.
+
+| Related work | Accuracy | Precision | Recall | F1-score |
+|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
+|  |   |  |  |  |
@@ -0,0 +1,31 @@
+# Benchmark 941261e8c367 { #941261e8c367 }
+
+| Parameter | Value |
+|:-----------------|:-----------------:|
+| Benchmark hash |  941261e8c367 |
+| Original paper |  None |
+| Dataset |  CESNET-TimeSeries24 |
+| Aggregation |  AGG_1_HOUR |
+| Source |  IP_ADDRESSES_FULL |
+| Train size |  0.6 |
+| Val size |  0.2 |
+| Test size |  0.2 |
+| Uni/Multi variate |  Multivariate |
+| Metrics |  all |
+| Default value |  None* |
+| Filler |  None* |
+| Scaler |  None* |
+| Sliding window train |  None |
+| Sliding window prediction |  None |
+| Sliding window step |  None |
+| Set shared size |  None |
+| All batch size |  7* |
+| Train TS IDs |  1.0 |
+| Test TS IDs |  0.0 |
+
+!!! info "Note"
+    Values marked with * can be changed by users in the benchmark.
+
+| Related work | Accuracy | Precision | Recall | F1-score |
+|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
+|  |   |  |  |  |
@@ -0,0 +1,31 @@
+# Benchmark bf0aec939afe { #bf0aec939afe }
+
+| Parameter | Value |
+|:-----------------|:-----------------:|
+| Benchmark hash |  bf0aec939afe |
+| Original paper |  None |
+| Dataset |  CESNET-TimeSeries24 |
+| Aggregation |  AGG_1_DAY |
+| Source |  IP_ADDRESSES_FULL |
+| Train size |  0.6 |
+| Val size |  0.2 |
+| Test size |  0.2 |
+| Uni/Multi variate |  Multivariate |
+| Metrics |  all |
+| Default value |  None* |
+| Filler |  None* |
+| Scaler |  None* |
+| Sliding window train |  None |
+| Sliding window prediction |  None |
+| Sliding window step |  None |
+| Set shared size |  None |
+| All batch size |  7* |
+| Train TS IDs |  1.0 |
+| Test TS IDs |  0.0 |
+
+!!! info "Note"
+    Values marked with * can be changed by users in the benchmark.
+
+| Related work | Accuracy | Precision | Recall | F1-score |
+|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
+|  |   |  |  |  |
@@ -0,0 +1,30 @@
+# Benchmark 0197980a87c0 { #0197980a87c0 }
+
+| Parameter | Value |
+|:-----------------|:-----------------:|
+| Benchmark hash |  0197980a87c0 |
+| Original paper |  None |
+| Dataset |  CESNET-TimeSeries24 |
+| Aggregation |  AGG_1_DAY |
+| Source |  IP_ADDRESSES_FULL |
+| Train size |  0.6 |
+| Val size |  0.1 |
+| Test size |  0.3 |
+| Uni/Multi variate |  Multivariate |
+| Metrics |  all |
+| Default value |  None* |
+| Filler |  None* |
+| Scaler |  None* |
+| Sliding window train |  7* |
+| Sliding window prediction |  1* |
+| Sliding window step |  1* |
+| Set shared size |  7* |
+| Train TS IDs |  0.5 |
+| Test TS IDs |  0.5 |
+
+!!! info "Note"
+    Values marked with * can be changed by users in the benchmark.
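+
+This sketch reads the sliding window parameters as the input window length, the number of predicted steps, and the shift between consecutive windows. That reading and the helper `sliding_windows` are assumptions made for illustration. It shows how a train window of 7, a prediction window of 1, and a step of 1 turn a series into input/target pairs.
+
+```python
+import numpy as np
+
+
+def sliding_windows(series: np.ndarray, window: int, horizon: int, step: int):
+    """Hypothetical: cut a (time, features) array into (input, target) window pairs."""
+    inputs, targets = [], []
+    # The last start must leave room for both the input window and the prediction horizon.
+    for start in range(0, len(series) - window - horizon + 1, step):
+        inputs.append(series[start : start + window])
+        targets.append(series[start + window : start + window + horizon])
+    return np.stack(inputs), np.stack(targets)
+
+
+series = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
+x, y = sliding_windows(series, window=7, horizon=1, step=1)
+print(x.shape, y.shape)  # (3, 7, 2) (3, 1, 2)
+```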
+
+| Related work | RMSE | R2-score | SMAPE |
+|:-----------------|:-----------------:|:-----------------:|:-----------------:|
+|  |   |  |  |
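+
+For reference, the metrics in the table above can be computed as shown in the sketch below. The benchmark does not state which SMAPE variant it uses. This sketch assumes the common definition: 100 times the mean of |forecast - actual| / ((|actual| + |forecast|) / 2), with 0/0 treated as 0.
+
+```python
+import numpy as np
+
+
+def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
+    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
+
+
+def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
+    # 1 - (residual sum of squares / total sum of squares)
+    ss_res = np.sum((y_true - y_pred) ** 2)
+    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
+    return float(1 - ss_res / ss_tot)
+
+
+def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
+    diff = np.abs(y_pred - y_true)
+    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
+    # Treat 0/0 as 0 so exact forecasts of zero values are not penalized.
+    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
+    return float(100 * np.mean(ratio))
+
+
+y_true = np.array([100.0, 200.0, 0.0, 50.0])
+y_pred = np.array([110.0, 190.0, 0.0, 50.0])
+print(rmse(y_true, y_pred), r2_score(y_true, y_pred), smape(y_true, y_pred))
+```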