Commit df81bff

Merge pull request #18 from CESNET/dev
Version 2.0.0 release
2 parents aa36f2e + b6d4770 commit df81bff

292 files changed, +39663 -58448 lines changed

README.md

Lines changed: 30 additions & 6 deletions
@@ -12,7 +12,7 @@ The goal of `cesnet-tszoo` project is to provide time series datasets with usefu
 - API for downloading, configuring and loading CESNET-TimeSeries24, CESNET-AGG23 datasets. Each with various sources and aggregations.
 - Example of configuration options:
 - Data can be split into train/val/test sets. Split can be done by time series or by time periods.
-- Transforming of data with built-in scalers or with custom scalers.
+- Transforming of data with built-in transformers or with custom transformers.
 - Handling missing values built-in fillers or with custom fillers.
 - Creation and import of benchmarks, for easy reproducibility of experiments.
 - Creation and import of annotations. Can create annotations for specific time series, specific time or specific time in specific time series.
@@ -56,7 +56,7 @@ from cesnet_tszoo.datasets import CESNET_TimeSeries24
 from cesnet_tszoo.utils.enums import SourceType, AgreggationType
 from cesnet_tszoo.configs import TimeBasedConfig
 
-dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False)
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED)
 config = TimeBasedConfig(
     ts_ids=50, # number of randomly selected time series from dataset
     train_time_period=range(0, 100),
@@ -72,19 +72,43 @@ test_dataframe = dataset.get_test_df()
 
 Time-based datasets are configured with [`TimeBasedConfig`](https://cesnet.github.io/cesnet-tszoo/reference_time_based_config/).
 
+#### Using [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset] dataset
+```python
+from cesnet_tszoo.datasets import CESNET_TimeSeries24
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType
+from cesnet_tszoo.configs import DisjointTimeBasedConfig
+
+dataset = CESNET_TimeSeries24.get_dataset("/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.DISJOINT_TIME_BASED)
+config = DisjointTimeBasedConfig(
+    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
+    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
+    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
+    train_time_period=range(0, 100),
+    val_time_period=range(100, 150),
+    test_time_period=range(150, 250),
+    features_to_take=["n_flows", "n_packets"])
+dataset.set_dataset_config_and_initialize(config)
+
+train_dataframe = dataset.get_train_df()
+val_dataframe = dataset.get_val_df()
+test_dataframe = dataset.get_test_df()
+```
+
+Disjoint-time-based datasets are configured with [`DisjointTimeBasedConfig`][cesnet_tszoo.configs.disjoint_time_based_config.DisjointTimeBasedConfig].
+
 #### Using [`SeriesBasedCesnetDataset`](https://cesnet.github.io/cesnet-tszoo/reference_series_based_cesnet_dataset/) dataset
 
 ```python
 from cesnet_tszoo.datasets import CESNET_TimeSeries24
 from cesnet_tszoo.utils.enums import SourceType, AgreggationType
 from cesnet_tszoo.configs import SeriesBasedConfig
 
-dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=True)
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.SERIES_BASED)
 config = SeriesBasedConfig(
     time_period=range(0, 250),
-    train_ts=100, # number of randomly selected time series from dataset
-    val_ts=30, # number of randomly selected time series from dataset
-    test_ts=20, # number of randomly selected time series from dataset
+    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
+    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
+    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
     features_to_take=["n_flows", "n_packets"])
 dataset.set_dataset_config_and_initialize(config)
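
Note on the reworded `train_ts` / `val_ts` / `test_ts` comments: they state that each count is drawn from time series not already used by the other two sets. As a rough illustration of that idea only, and not the library's actual selection code, disjoint sets can be obtained by shuffling the available IDs once and slicing the result. The helper name `split_disjoint_ids` and its `seed` parameter are hypothetical.

```python
import random


def split_disjoint_ids(all_ids, train_ts, val_ts, test_ts, seed=None):
    """Pick three non-overlapping random subsets of time series IDs.

    Hypothetical helper for illustration; cesnet-tszoo may select IDs differently.
    """
    ids = list(all_ids)
    needed = train_ts + val_ts + test_ts
    if needed > len(ids):
        raise ValueError(f"requested {needed} series, only {len(ids)} available")

    rng = random.Random(seed)
    rng.shuffle(ids)  # one shuffle, then slice: the slices cannot overlap

    train = ids[:train_ts]
    val = ids[train_ts:train_ts + val_ts]
    test = ids[train_ts + val_ts:needed]
    return train, val, test


# Same counts as the README example: 50 / 20 / 10 out of 200 made-up IDs.
train_ids, val_ids, test_ids = split_disjoint_ids(range(200), 50, 20, 10, seed=0)
```

Shuffling once and slicing keeps the three sets disjoint by construction, so no membership checks are needed afterwards.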

cesnet_tszoo/benchmarks.py

Lines changed: 60 additions & 46 deletions
Large diffs are not rendered by default.

cesnet_tszoo/configs/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 from cesnet_tszoo.configs.series_based_config import SeriesBasedConfig
 from cesnet_tszoo.configs.time_based_config import TimeBasedConfig
+from cesnet_tszoo.configs.disjoint_time_based_config import DisjointTimeBasedConfig
