Commit af10b25

committed
Docs: Added init documentation
1 parent fe92eb2 commit af10b25

100 files changed

+4778
-0
lines changed

README.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
<p align="center">
2+
<img src="https://raw.githubusercontent.com/CESNET/cesnet-tszoo/main/docs/images/tszoo.svg" width="450">
3+
</p>
4+
5+
[![](https://img.shields.io/badge/license-BSD-blue.svg)](https://github.com/CESNET/cesnet-tszoo/blob/main/LICENSE)
6+
[![](https://img.shields.io/badge/docs-cesnet--tszoo-blue.svg)](https://cesnet.github.io/cesnet-tszoo/)
7+
[![](https://img.shields.io/badge/python->=3.10-blue.svg)](https://pypi.org/project/cesnet-tszoo/)
8+
[![](https://img.shields.io/pypi/v/cesnet-tszoo)](https://pypi.org/project/cesnet-tszoo/)
9+
10+
The goal of the `cesnet-tszoo` project is to provide time series datasets together with tools for preprocessing and reproducibility, such as:
11+
12+
- API for downloading, configuring, and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each with various sources and aggregations.
13+
- Examples of configuration options:
14+
- Data can be split into train/val/test sets. Split can be done by time series or by time periods.
15+
- Transforming data with built-in or custom scalers.
16+
- Handling missing values with built-in or custom fillers.
17+
- Creation and import of benchmarks, for easy reproducibility of experiments.
18+
- Creation and import of annotations. Annotations can be created for a specific time series, a specific time, or a specific time in a specific time series.
19+
20+
## Datasets
21+
22+
| Name | CESNET-TimeSeries24 | CESNET-AGG23 |
23+
|---------------------------|-------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
24+
| _Published in_ | 2025 | 2023 |
25+
| _Collection duration_ | 40 weeks | 10 weeks |
26+
| _Collection period_ | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 |
27+
| _Aggregation window_ | 1 day, 1 hour, 10 min | 1 min |
28+
| _Sources_ | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 |
29+
| _Number of time series_ | Institutions: 849, Institution subnets: 1644, IP addresses: 825372 | 1 |
30+
| _Cite_ | [https://doi.org/10.1038/s41597-025-04603-x](https://doi.org/10.1038/s41597-025-04603-x) | [https://doi.org/10.23919/CNSM59352.2023.10327823](https://doi.org/10.23919/CNSM59352.2023.10327823) |
31+
| _Zenodo URL_ | [https://zenodo.org/records/13382427](https://zenodo.org/records/13382427) | [https://zenodo.org/records/8053021](https://zenodo.org/records/8053021) |
32+
| _Related papers_ | | |
33+
34+
## Installation
35+
36+
Install the package from pip with:
37+
38+
```bash
39+
pip install cesnet-tszoo
40+
```
41+
42+
or, for an editable install, with:
43+
44+
```bash
45+
pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo
46+
```
47+
48+
## Examples
49+
50+
### Initialize dataset to create train, validation, and test dataframes
51+
52+
#### Using [`TimeBasedCesnetDataset`](cesnet_tszoo.datasets.time_based_cesnet_dataset.TimeBasedCesnetDataset) dataset
53+
54+
```python
55+
from cesnet_tszoo.datasets import CESNET_TimeSeries24
56+
from cesnet_tszoo.utils.enums import SourceType, AgreggationType
57+
from cesnet_tszoo.configs import TimeBasedConfig
58+
59+
dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False)
60+
config = TimeBasedConfig(
61+
ts_ids=50, # number of randomly selected time series from dataset
62+
train_time_period=range(0, 100),
63+
val_time_period=range(100, 150),
64+
test_time_period=range(150, 250),
65+
features_to_take=["n_flows", "n_packets"])
66+
dataset.set_dataset_config_and_initialize(config)
67+
68+
train_dataframe = dataset.get_train_df()
69+
val_dataframe = dataset.get_val_df()
70+
test_dataframe = dataset.get_test_df()
71+
```
72+
73+
Time-based datasets are configured with [`TimeBasedConfig`](cesnet_tszoo.configs.time_based_config.TimeBasedConfig).
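
To make the idea of a time-based split concrete, here is a minimal standalone sketch in plain Python. It does not call `cesnet-tszoo`; the helper `split_by_time` and the toy data are hypothetical, and it assumes that the integer ranges given to `TimeBasedConfig` act as positional indices on the time axis of every selected series.

```python
# Illustrative only: a toy model of a time-based split, not the cesnet-tszoo implementation.
# Assumption: each period is a range of positional time indices applied to every series.

def split_by_time(series, train_period, val_period, test_period):
    """Slice every time series with the same three index ranges."""
    def take(period):
        return {ts_id: [values[i] for i in period] for ts_id, values in series.items()}
    return take(train_period), take(val_period), take(test_period)


# Two toy series with 250 time steps each (the values are simply the time index).
toy_series = {ts_id: list(range(250)) for ts_id in (0, 1)}

train, val, test = split_by_time(
    toy_series,
    train_period=range(0, 100),
    val_period=range(100, 150),
    test_period=range(150, 250),
)

# Every series appears in all three sets; only the covered time steps differ.
print(len(train[0]), len(val[0]), len(test[0]))
```

In this model the three sets share the same time series and differ only in which time steps they cover.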
74+
75+
#### Using [`SeriesBasedCesnetDataset`](cesnet_tszoo.datasets.series_based_cesnet_dataset.SeriesBasedCesnetDataset) dataset
76+
77+
```python
78+
from cesnet_tszoo.datasets import CESNET_TimeSeries24
79+
from cesnet_tszoo.utils.enums import SourceType, AgreggationType
80+
from cesnet_tszoo.configs import SeriesBasedConfig
81+
82+
dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=True)
83+
config = SeriesBasedConfig(
84+
time_period=range(0, 250),
85+
train_ts=100, # number of randomly selected time series from dataset
86+
val_ts=30, # number of randomly selected time series from dataset
87+
test_ts=20, # number of randomly selected time series from dataset
88+
features_to_take=["n_flows", "n_packets"])
89+
dataset.set_dataset_config_and_initialize(config)
90+
91+
train_dataframe = dataset.get_train_df()
92+
val_dataframe = dataset.get_val_df()
93+
test_dataframe = dataset.get_test_df()
94+
```
95+
96+
Series-based datasets are configured with [`SeriesBasedConfig`](cesnet_tszoo.configs.series_based_config.SeriesBasedConfig).
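
For contrast, a series-based split can be pictured as follows. This is a hedged sketch in plain Python rather than library code: the helper `split_by_series` and the fixed seed are hypothetical, and it assumes the three groups are drawn without overlap. The count of 849 series is taken from the Institutions source in the dataset table.

```python
import random

# Illustrative only: a toy model of a series-based split, not the cesnet-tszoo implementation.
# Assumption: train, validation, and test series are sampled without overlap.

def split_by_series(ts_ids, n_train, n_val, n_test, seed=0):
    """Randomly pick disjoint groups of series IDs for train, val, and test."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(ts_ids), n_train + n_val + n_test)
    return (
        picked[:n_train],
        picked[n_train:n_train + n_val],
        picked[n_train + n_val:],
    )


all_ts_ids = range(849)  # the Institutions source lists 849 time series
train_ts, val_ts, test_ts = split_by_series(all_ts_ids, n_train=100, n_val=30, n_test=20)

# Each set holds different series; all of them would cover the same time period.
print(len(train_ts), len(val_ts), len(test_ts))
```

Here every set spans the same `time_period`, and the sets differ in which time series they contain.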
97+
98+
#### Using [`load_benchmark`](cesnet_tszoo.benchmarks.load_benchmark)
99+
100+
```python
101+
from cesnet_tszoo.benchmarks import load_benchmark
102+
103+
benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
104+
dataset = benchmark.get_initialized_dataset()
105+
106+
train_dataframe = dataset.get_train_df()
107+
val_dataframe = dataset.get_val_df()
108+
test_dataframe = dataset.get_test_df()
109+
```
110+
111+
Whether the loaded dataset is series-based or time-based depends on the benchmark. The data that can then be loaded corresponds to the dataset examples above.
112+
113+
## Papers

docs/annotations.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# Annotations
2+
3+
This tutorial shows how to use annotations.
4+
5+
!!! info "Note"
6+
For every option and more detailed examples, refer to the Jupyter notebook [`annotations`](https://github.com/CESNET/cesnet-tszoo/blob/tutorial_notebooks/annotations.ipynb).
7+
8+
## Basics
9+
10+
- You can get annotations of a specific type with the `get_annotations` method.
11+
- The `get_annotations` method returns annotations as a pandas DataFrame.
12+
13+
There are three annotation types:
14+
15+
1. **AnnotationType.TS_ID** -> Annotations for a whole time series
16+
2. **AnnotationType.ID_TIME** -> Annotations for a specific time, independent of the time series
17+
3. **AnnotationType.BOTH** -> Annotations for a specific time in a specific time series
18+
19+
```python
20+
21+
from cesnet_tszoo.utils.enums import AnnotationType
22+
23+
dataset.get_annotations(on=AnnotationType.TS_ID)
24+
dataset.get_annotations(on=AnnotationType.ID_TIME)
25+
dataset.get_annotations(on=AnnotationType.BOTH)
26+
27+
```
28+
29+
## Annotation groups
30+
- An annotation group can be understood as a column name in the DataFrame/CSV.
31+
- You can add annotation groups or remove them.
32+
33+
```python
34+
35+
from cesnet_tszoo.utils.enums import AnnotationType
36+
37+
# Adding groups
38+
dataset.add_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
39+
dataset.add_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
40+
dataset.add_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
41+
42+
# Removing groups
43+
dataset.remove_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
44+
dataset.remove_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
45+
dataset.remove_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
46+
47+
```
48+
49+
## Annotation values
50+
- Annotations are specific values for a selected annotation group and `AnnotationType`.
51+
- You can add annotations or remove them.
52+
- Adding annotation
53+
- When adding an annotation to an annotation group that does not exist, the group is created.
54+
- To override an existing annotation, specify the same `annotation_group`, `ts_id`, and `id_time` together with the new annotation.
55+
- Setting `enforce_ids` to True ensures that the given `ts_id` and `id_time` belong to the dataset in use.
56+
- Removing annotations
57+
- Removing the annotation from every annotation group of a row removes that row from the DataFrame.
58+
59+
```python
60+
61+
# Adding annotations
62+
dataset.add_annotation(annotation="test_annotation1_3", annotation_group="test1", ts_id=3, id_time=None, enforce_ids=True) # Adds to AnnotationType.TS_ID
63+
dataset.add_annotation(annotation="test_annotation2_0", annotation_group="test2", ts_id=None, id_time=0, enforce_ids=True) # Adds to AnnotationType.ID_TIME
64+
dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True) # Adds to AnnotationType.BOTH
65+
66+
# Removing annotations
67+
dataset.remove_annotation(annotation_group="test1", ts_id=3, id_time=None) # Removes from AnnotationType.TS_ID
68+
dataset.remove_annotation(annotation_group="test2", ts_id=None, id_time=0) # Removes from AnnotationType.ID_TIME
69+
dataset.remove_annotation(annotation_group="test3", ts_id=3, id_time=0) # Removes from AnnotationType.BOTH
70+
71+
```
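
The add, override, and remove rules described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the library's implementation: the dictionary layout and the two helper functions are hypothetical, and only the `AnnotationType.BOTH` case (rows keyed by `ts_id` and `id_time`) is modelled.

```python
# Toy model of AnnotationType.BOTH: rows keyed by (ts_id, id_time), one column per group.
# Illustrative only; names and storage layout are assumptions, not cesnet-tszoo internals.
annotations = {}  # {(ts_id, id_time): {annotation_group: annotation}}


def add_annotation(annotation, annotation_group, ts_id, id_time):
    """Add the value, or override it when the same row and group already exist."""
    annotations.setdefault((ts_id, id_time), {})[annotation_group] = annotation


def remove_annotation(annotation_group, ts_id, id_time):
    """Remove one value and drop the row once no group has a value left."""
    row = annotations.get((ts_id, id_time), {})
    row.pop(annotation_group, None)
    if not row:
        annotations.pop((ts_id, id_time), None)


add_annotation("first", "test3", ts_id=3, id_time=0)
add_annotation("second", "test3", ts_id=3, id_time=0)  # same key: overrides "first"
add_annotation("other", "test4", ts_id=3, id_time=0)

remove_annotation("test3", ts_id=3, id_time=0)  # row stays because "test4" still has a value
row_after_first_removal = dict(annotations[(3, 0)])

remove_annotation("test4", ts_id=3, id_time=0)  # last value removed: the row disappears
print(row_after_first_removal, annotations)
```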
72+
73+
## Exporting annotations
74+
- You can export the annotations you created with the `save_annotations` method.
75+
- `save_annotations` creates a CSV file at: `os.path.join(dataset.annotations_root, identifier)`.
76+
- When the `force_write` parameter is True, an existing file with the same name is overwritten.
77+
- Do not add ".csv" to the identifier; it is added automatically.
78+
79+
```python
80+
81+
from cesnet_tszoo.utils.enums import AnnotationType
82+
83+
dataset.save_annotations(identifier="test_name", on=AnnotationType.BOTH, force_write=True)
84+
85+
```
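
The file naming and overwrite rules can be sketched as below. The function `save_annotations_sketch` is hypothetical and only mirrors the behaviour described in the list above; in particular, raising `FileExistsError` when `force_write` is False is an assumption, since the text does not say what the library does in that case. The column name `ts_id` is a placeholder as well.

```python
import csv
import os
import tempfile

# Illustrative only: mirrors the described naming and overwrite rules, not the library code.


def save_annotations_sketch(rows, annotations_root, identifier, force_write=False):
    """Write rows to <annotations_root>/<identifier>.csv (".csv" is appended here)."""
    path = os.path.join(annotations_root, identifier + ".csv")
    if os.path.exists(path) and not force_write:
        # Assumption: refuse to overwrite unless force_write is set.
        raise FileExistsError(path)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


rows = [["ts_id", "id_time", "test3"], [3, 0, "test_annotation3_3_0"]]

with tempfile.TemporaryDirectory() as root:
    save_annotations_sketch(rows, root, "test_name")
    try:
        save_annotations_sketch(rows, root, "test_name")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True
    second = save_annotations_sketch(rows, root, "test_name", force_write=True)
    saved_name = os.path.basename(second)

print(saved_name, overwrite_blocked)
```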
86+
87+
## Importing annotations
88+
- You can import existing annotations, either your own or built-in ones.
89+
- Setting `enforce_ids` to True ensures that every `ts_id` and `id_time` in the imported annotations belongs to the dataset in use.
90+
- The `import_annotations` method automatically detects the `AnnotationType` of the imported annotations from the presence of the ts_id column (it expects the ts_id name of the dataset in use) and the id_time column.
91+
- It first attempts to load built-in annotations. If no built-in annotations with the given identifier exist, it attempts to load custom annotations from the `"data_root"/tszoo/annotations/` directory.
92+
93+
```python
94+
95+
from cesnet_tszoo.utils.enums import AnnotationType
96+
97+
dataset.import_annotations(identifier="test_name", enforce_ids=True)
98+
99+
```
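
The column-based detection described above could look roughly like this. It is a sketch, not the library's code: `AnnotationKind` is a hypothetical stand-in for `AnnotationType`, the default column names are placeholders, and raising `ValueError` when neither column is present is an assumption.

```python
from enum import Enum

# Illustrative only: a hypothetical stand-in for AnnotationType and its detection logic.


class AnnotationKind(Enum):
    TS_ID = "ts_id"
    ID_TIME = "id_time"
    BOTH = "both"


def detect_annotation_kind(columns, ts_id_name="ts_id", time_name="id_time"):
    """Infer the annotation kind from which identifier columns are present."""
    has_ts = ts_id_name in columns
    has_time = time_name in columns
    if has_ts and has_time:
        return AnnotationKind.BOTH
    if has_ts:
        return AnnotationKind.TS_ID
    if has_time:
        return AnnotationKind.ID_TIME
    # Assumption: a file with neither identifier column is rejected.
    raise ValueError("expected a ts_id column, an id_time column, or both")


print(detect_annotation_kind(["id_time", "test2"]))
```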
100+

docs/benchmarks.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# Benchmarks
2+
3+
CESNET-TS-Zoo enables easy sharing and reuse of configuration files to support open science, reproducibility, and transparent comparison of time series modeling approaches.
4+
5+
We provide a collection of pre-defined configurations that serve as benchmarks, including use cases like network traffic forecasting and anomaly detection.
6+
7+
The library includes tools for both importing and exporting configurations as benchmarks. This allows researchers to cite a specific benchmark via its unique hash or to share their own approach as a configuration file.
8+
9+
To load and use a benchmark in your code, simply use the following snippet:
10+
11+
```python
12+
from cesnet_tszoo.benchmarks import load_benchmark
13+
14+
benchmark = load_benchmark("<benchmark_hash>", "<path-to-datasets>")
15+
dataset = benchmark.get_initialized_dataset()
16+
```
17+
18+
!!! info "Note"
19+
A more detailed tutorial on how to use benchmarks is available [`here`][benchmark_tutorial].
20+
21+
## Available benchmarks
22+
23+
#### Network Traffic Forecasting Benchmarks
24+
25+
Network traffic forecasting plays a crucial role in network management and security. Therefore, we prepared several benchmarks for evaluation of network traffic forecasting methods for both management and security tasks. We split the `Network Traffic Forecasting Benchmarks` into these two groups:
26+
27+
- ["Univariate forecasting - Transmitted data size"][univariate_forecasting]: Benchmarks in this group support the most commonly used forecasting task in network management.
28+
- ["Multivariate forecasting"][multivariate_forecasting]: Benchmarks in this group are designed for multivariate forecasting of network traffic features, which is more often used in network security for anomaly/outlier detection.
29+
30+
#### Network Device Type Classification Benchmarks
31+
32+
Network device type classification focuses on evaluating the performance of models for classifying types of network devices. The goal of this benchmark is to allow comparison of various classification algorithms and methods in the context of network devices. This task is valuable in environments where it is essential to quickly and efficiently identify devices in a network for monitoring, security, and traffic optimization purposes. Analyzing the benchmarks helps determine which methods are most suitable for deployment in real-world scenarios.
33+
34+
The network device type classification benchmarks are described in detail [here][device_type_classification].
35+
36+
37+
#### Anomaly Detection Benchmarks
38+
39+
These benchmarks are in preparation and will be added soon.
40+
41+
#### Similarity Search Benchmarks
42+
43+
These benchmarks are in preparation and will be added soon.
44+
45+
## Available dataset configs from related works
46+
47+
To support reproducibility, CESNET-TS-Zoo lets you share ts-zoo configs with others through a pull request from a forked repository.
48+
49+
Each related work contains configs and an example of usage. Please follow the authors' instructions in the example to ensure comparable results. The following configs are already included in ts-zoo:
50+
51+
| DOI | Task | Configs link |
52+
|:-----------------|:-----------------:|:-----------------:|
53+
| <https://doi.org/10.48550/arXiv.2503.17410> | Univariate forecasting | [configs][arxiv.org/abs/2503.17410] |
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Benchmark 69270dcc1819 { #69270dcc1819 }
2+
3+
| Parameter | Value |
4+
|:-----------------|:-----------------:|
5+
| Benchmark hash | 0d523e69c328 |
6+
| Original paper | None |
7+
| Dataset | CESNET-TimeSeries24 |
8+
| Aggregation | AGG_10_MINUTES |
9+
| Source | IP_ADDRESSES_FULL |
10+
| Train size | 0.6 |
11+
| Val size | 0.2 |
12+
| Test size | 0.2 |
13+
| Uni/Multi variate | Multivariate |
14+
| Metrics | all |
15+
| Default value | None* |
16+
| Filler | None* |
17+
| Scaler | None* |
18+
| Sliding window train | None |
19+
| Sliding window prediction | None |
20+
| Sliding window step | None |
21+
| Set shared size | None |
22+
| All batch size | 7* |
23+
| Train TS IDs | 1.0 |
24+
| Test TS IDs | 0.0 |
25+
26+
!!! info "Note"
27+
Values marked with * can be changed by users in the benchmark.
28+
29+
| Related work | Accuracy | Precision | Recall | F1-score |
30+
|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
31+
| | | | | |
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Benchmark 941261e8c367 { #941261e8c367 }
2+
3+
| Parameter | Value |
4+
|:-----------------|:-----------------:|
5+
| Benchmark hash | 0d523e69c328 |
6+
| Original paper | None |
7+
| Dataset | CESNET-TimeSeries24 |
8+
| Aggregation | AGG_1_HOUR |
9+
| Source | IP_ADDRESSES_FULL |
10+
| Train size | 0.6 |
11+
| Val size | 0.2 |
12+
| Test size | 0.2 |
13+
| Uni/Multi variate | Multivariate |
14+
| Metrics | all |
15+
| Default value | None* |
16+
| Filler | None* |
17+
| Scaler | None* |
18+
| Sliding window train | None |
19+
| Sliding window prediction | None |
20+
| Sliding window step | None |
21+
| Set shared size | None |
22+
| All batch size | 7* |
23+
| Train TS IDs | 1.0 |
24+
| Test TS IDs | 0.0 |
25+
26+
!!! info "Note"
27+
Values marked with * can be changed by users in the benchmark.
28+
29+
| Related work | Accuracy | Precision | Recall | F1-score |
30+
|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
31+
| | | | | |
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Benchmark bf0aec939afe { #bf0aec939afe }
2+
3+
| Parameter | Value |
4+
|:-----------------|:-----------------:|
5+
| Benchmark hash | 0d523e69c328 |
6+
| Original paper | None |
7+
| Dataset | CESNET-TimeSeries24 |
8+
| Aggregation | AGG_1_DAY |
9+
| Source | IP_ADDRESSES_FULL |
10+
| Train size | 0.6 |
11+
| Val size | 0.2 |
12+
| Test size | 0.2 |
13+
| Uni/Multi variate | Multivariate |
14+
| Metrics | all |
15+
| Default value | None* |
16+
| Filler | None* |
17+
| Scaler | None* |
18+
| Sliding window train | None |
19+
| Sliding window prediction | None |
20+
| Sliding window step | None |
21+
| Set shared size | None |
22+
| All batch size | 7* |
23+
| Train TS IDs | 1.0 |
24+
| Test TS IDs | 0.0 |
25+
26+
!!! info "Note"
27+
Values marked with * can be changed by users in the benchmark.
28+
29+
| Related work | Accuracy | Precision | Recall | F1-score |
30+
|:-----------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
31+
| | | | | |
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
# Benchmark 0197980a87c0 { #0197980a87c0 }
2+
3+
| Parameter | Value |
4+
|:-----------------|:-----------------:|
5+
| Benchmark hash | 0197980a87c0 |
6+
| Original paper | None |
7+
| Dataset | CESNET-TimeSeries24 |
8+
| Aggregation | AGG_1_DAY |
9+
| Source | IP_ADDRESSES_FULL |
10+
| Train size | 0.6 |
11+
| Val size | 0.1 |
12+
| Test size | 0.3 |
13+
| Uni/Multi variate | Multivariate |
14+
| Metrics | all |
15+
| Default value | None* |
16+
| Filler | None* |
17+
| Scaler | None* |
18+
| Sliding window train | 7* |
19+
| Sliding window prediction | 1* |
20+
| Sliding window step | 1* |
21+
| Set shared size | 7* |
22+
| Train TS IDs | 0.5 |
23+
| Test TS IDs | 0.5 |
24+
25+
!!! info "Note"
26+
Values marked with * can be changed by users in the benchmark.
27+
28+
| Related work | RMSE | R2-score | SMAPE |
29+
|:-----------------|:-----------------:|:-----------------:|:-----------------:|
30+
| | | | |
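
As a rough worked example of what the parameters of benchmark 0197980a87c0 imply, the sketch below turns the fractional sizes into index ranges and counts sliding windows. Everything here is an assumption made for illustration: that the 40-week collection yields 280 daily steps at `AGG_1_DAY`, that the fractions are applied in chronological order, and that a window of 7 inputs plus 1 predicted step must fit entirely inside the training range. The helpers `split_bounds` and `count_windows` are hypothetical and not part of `cesnet-tszoo`.

```python
# Illustrative arithmetic only; the library may compute splits and windows differently.


def split_bounds(n_steps, train_size, val_size, test_size):
    """Turn fractional sizes into consecutive (start, stop) index ranges."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("the three sizes must sum to 1")
    train_stop = round(n_steps * train_size)
    val_stop = train_stop + round(n_steps * val_size)
    return (0, train_stop), (train_stop, val_stop), (val_stop, n_steps)


def count_windows(n_steps, window, horizon, step):
    """Number of (input window, prediction horizon) pairs that fit in n_steps."""
    usable = n_steps - window - horizon
    return 0 if usable < 0 else usable // step + 1


n_days = 40 * 7  # assumption: 40 weeks of daily aggregates
train, val, test = split_bounds(n_days, train_size=0.6, val_size=0.1, test_size=0.3)
train_windows = count_windows(train[1] - train[0], window=7, horizon=1, step=1)

print(train, val, test, train_windows)
```

Under these assumptions the training range covers 168 days, validation 28, and testing 84, and the training range yields 161 windows.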
