@@ -12,7 +12,7 @@ The goal of `cesnet-tszoo` project is to provide time series datasets with usefu
 - API for downloading, configuring and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each with various sources and aggregations.
 - Examples of configuration options:
   - Data can be split into train/val/test sets. Split can be done by time series or by time periods.
-  - Transforming of data with built-in scalers or with custom scalers.
+  - Transforming data with built-in or custom transformers.
   - Handling missing values with built-in fillers or with custom fillers.
 - Creation and import of benchmarks, for easy reproducibility of experiments.
 - Creation and import of annotations. Annotations can target specific time series, specific times, or specific times within specific time series.
@@ -56,7 +56,7 @@ from cesnet_tszoo.datasets import CESNET_TimeSeries24
-from cesnet_tszoo.utils.enums import SourceType, AgreggationType
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
 from cesnet_tszoo.configs import TimeBasedConfig
 
-dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False)
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED)
 config = TimeBasedConfig(
     ts_ids=50,  # number of randomly selected time series from dataset
     train_time_period=range(0, 100),
@@ -72,19 +72,43 @@ test_dataframe = dataset.get_test_df()
 
 Time-based datasets are configured with [`TimeBasedConfig`](https://cesnet.github.io/cesnet-tszoo/reference_time_based_config/).
 
+#### Using [`DisjointTimeBasedCesnetDataset`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset] dataset
+```python
+from cesnet_tszoo.datasets import CESNET_TimeSeries24
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
+from cesnet_tszoo.configs import DisjointTimeBasedConfig
+
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.DISJOINT_TIME_BASED)
+config = DisjointTimeBasedConfig(
+    train_ts=50,  # number of randomly selected time series from the dataset that are in neither val_ts nor test_ts
+    val_ts=20,  # number of randomly selected time series from the dataset that are in neither train_ts nor test_ts
+    test_ts=10,  # number of randomly selected time series from the dataset that are in neither train_ts nor val_ts
+    train_time_period=range(0, 100),
+    val_time_period=range(100, 150),
+    test_time_period=range(150, 250),
+    features_to_take=["n_flows", "n_packets"])
+dataset.set_dataset_config_and_initialize(config)
+
+train_dataframe = dataset.get_train_df()
+val_dataframe = dataset.get_val_df()
+test_dataframe = dataset.get_test_df()
+```
+
+Disjoint-time-based datasets are configured with [`DisjointTimeBasedConfig`][cesnet_tszoo.configs.disjoint_time_based_config.DisjointTimeBasedConfig].
+
 #### Using [`SeriesBasedCesnetDataset`](https://cesnet.github.io/cesnet-tszoo/reference_series_based_cesnet_dataset/) dataset
 
 ```python
 from cesnet_tszoo.datasets import CESNET_TimeSeries24
-from cesnet_tszoo.utils.enums import SourceType, AgreggationType
+from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
 from cesnet_tszoo.configs import SeriesBasedConfig
 
-dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, is_series_based=True)
+dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.SERIES_BASED)
 config = SeriesBasedConfig(
     time_period=range(0, 250),
-    train_ts=100, # number of randomly selected time series from dataset
-    val_ts=30, # number of randomly selected time series from dataset
-    test_ts=20, # number of randomly selected time series from dataset
+    train_ts=50,  # number of randomly selected time series from the dataset that are in neither val_ts nor test_ts
+    val_ts=20,  # number of randomly selected time series from the dataset that are in neither train_ts nor test_ts
+    test_ts=10,  # number of randomly selected time series from the dataset that are in neither train_ts nor val_ts
     features_to_take=["n_flows", "n_packets"])
 dataset.set_dataset_config_and_initialize(config)
 
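This change replaces the boolean `is_series_based` flag with a `dataset_type` argument and adds a third variant, `DatasetType.DISJOINT_TIME_BASED`. To make the difference between the three split strategies concrete, here is a minimal pure-Python sketch on toy data. It does not call `cesnet_tszoo` and is not its implementation: the function names (`time_based_split`, `sample_disjoint_ids`, `disjoint_time_based_split`, `series_based_split`), the representation of a split as a list of series IDs plus a `range` of time indices, and the val/test periods used for the time-based case are all hypothetical. It assumes only what the configs in the README state: a time-based split reuses one set of series across different time periods, a series-based split uses non-overlapping series over one shared time period, and a disjoint-time-based split uses both non-overlapping series and separate time periods.

```python
import random

# Hypothetical toy stand-in for a dataset (no cesnet_tszoo code involved):
# 100 time series with IDs 0..99.
ALL_TS_IDS = list(range(100))


def time_based_split(ts_ids, train_period, val_period, test_period):
    """TIME_BASED: every set uses the same series; only the time periods differ."""
    ids = list(ts_ids)
    return {"train": (ids, train_period), "val": (ids, val_period), "test": (ids, test_period)}


def sample_disjoint_ids(all_ids, n_train, n_val, n_test, seed=0):
    """Draw three non-overlapping random groups of series IDs."""
    rng = random.Random(seed)
    # A single sample without replacement guarantees the three groups share no ID.
    picked = rng.sample(all_ids, n_train + n_val + n_test)
    return picked[:n_train], picked[n_train:n_train + n_val], picked[n_train + n_val:]


def disjoint_time_based_split(all_ids, n_train, n_val, n_test,
                              train_period, val_period, test_period, seed=0):
    """DISJOINT_TIME_BASED: each set gets its own series and its own time period."""
    train_ids, val_ids, test_ids = sample_disjoint_ids(all_ids, n_train, n_val, n_test, seed)
    return {"train": (train_ids, train_period),
            "val": (val_ids, val_period),
            "test": (test_ids, test_period)}


def series_based_split(all_ids, n_train, n_val, n_test, time_period, seed=0):
    """SERIES_BASED: each set gets its own series, all over the same time period."""
    train_ids, val_ids, test_ids = sample_disjoint_ids(all_ids, n_train, n_val, n_test, seed)
    return {"train": (train_ids, time_period),
            "val": (val_ids, time_period),
            "test": (test_ids, time_period)}


if __name__ == "__main__":
    splits = {
        "time-based": time_based_split(random.Random(0).sample(ALL_TS_IDS, 50),
                                       range(0, 100), range(100, 150), range(150, 250)),
        "disjoint-time-based": disjoint_time_based_split(ALL_TS_IDS, 50, 20, 10,
                                                         range(0, 100), range(100, 150), range(150, 250)),
        "series-based": series_based_split(ALL_TS_IDS, 50, 20, 10, range(0, 250)),
    }
    for name, split in splits.items():
        # Print (number of series, number of time steps) for train/val/test.
        print(name, {part: (len(ids), len(period)) for part, (ids, period) in split.items()})
```

Running the script prints `(series, time steps)` per set: `(50, 100)`, `(50, 50)`, `(50, 100)` for time-based; `(50, 100)`, `(20, 50)`, `(10, 100)` for disjoint-time-based; and `(50, 250)`, `(20, 250)`, `(10, 250)` for series-based. Drawing all IDs in one sample without replacement and then slicing is just a simple way to guarantee the "in neither val_ts nor test_ts" property; how `cesnet_tszoo` actually selects series is not claimed here.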