Commit da853ed

Merge pull request #30 from CESNET/feat-updating-docs
Updated all docs to match new features and various changes.
2 parents 698a906 + 9808321 commit da853ed

68 files changed, +766 -239 lines changed

README.md

Lines changed: 5 additions & 1 deletion
@@ -14,6 +14,8 @@ The goal of `cesnet-tszoo` project is to provide time series datasets with usefu
 - Data can be split into train/val/test sets. Split can be done by time series or by time periods.
 - Transforming of data with built-in transformers or with custom transformers.
 - Handling missing values built-in fillers or with custom fillers.
+- Applying custom handlers.
+- Changing order of when are preprocesses applied/fitted
 - Creation and import of benchmarks, for easy reproducibility of experiments.
 - Creation and import of annotations. Can create annotations for specific time series, specific time or specific time in specific time series.

@@ -60,6 +62,8 @@ If you use CESNET TS-Zoo, please cite our paper:

 ## Examples

+For detailed examples refer to [`Tutorial notebooks`](https://github.com/CESNET/cesnet-ts-zoo-tutorials)
+
 ### Initialize dataset to create train, validation, and test dataframes

 #### Using [`TimeBasedCesnetDataset`](https://cesnet.github.io/cesnet-tszoo/reference_time_based_cesnet_dataset/) dataset
@@ -146,4 +150,4 @@ val_dataframe = dataset.get_val_df()
 test_dataframe = dataset.get_test_df()
 ```

-Whether loaded dataset is series-based or time-based depends on the benchmark. What can be loaded corresponds to previous datasets.
+Loaded dataset can be one of the above.
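The README now lists changing the order in which preprocesses are applied/fitted as a feature. As a minimal illustration of why that order matters, the sketch below runs the same two steps in both orders. `fill_gaps` and `min_max_scale` are hypothetical NumPy stand-ins, not cesnet-tszoo's own fillers or transformers.

```python
import numpy as np


# Hypothetical stand-ins for two preprocesses; these are NOT cesnet-tszoo fillers/transformers.
def fill_gaps(data: np.ndarray) -> np.ndarray:
    """Replace missing values (NaN) with 0."""
    return np.nan_to_num(data, nan=0.0)


def min_max_scale(data: np.ndarray) -> np.ndarray:
    """Scale each feature column to [0, 1]; NaNs are ignored when computing min/max."""
    col_min = np.nanmin(data, axis=0)
    col_max = np.nanmax(data, axis=0)
    return (data - col_min) / (col_max - col_min)


def apply_in_order(data: np.ndarray, steps) -> np.ndarray:
    """Apply each preprocess step in the given order."""
    for step in steps:
        data = step(data)
    return data


# One feature, four time steps, one missing value; shape is (times, features).
series = np.array([[10.0], [np.nan], [20.0], [30.0]])

fill_then_scale = apply_in_order(series, [fill_gaps, min_max_scale])
scale_then_fill = apply_in_order(series, [min_max_scale, fill_gaps])

print(fill_then_scale.ravel())  # the filled 0 becomes the new minimum, shifting every scaled value
print(scale_then_fill.ravel())  # the gap is filled after scaling, so it lands on 0 in scaled units
```

Filling first lets the fill value influence what the scaler learns. Scaling first keeps the scaler's statistics based only on observed values.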

cesnet_tszoo/configs/base_config.py

Lines changed: 24 additions & 22 deletions
@@ -20,12 +20,12 @@
 import cesnet_tszoo.utils.custom_handler.factory as custom_handler_factories
 from cesnet_tszoo.utils.custom_handler.factory import PerSeriesCustomHandlerFactory, AllSeriesCustomHandlerFactory, NoFitCustomHandlerFactory
 import cesnet_tszoo.utils.css_styles.utils as css_utils
-from cesnet_tszoo.data_models.holders import FillingHolder, AnomalyHandlerHolder, TransformerHolder, PerSeriesCustomHandlerHolder, NoFitCustomHandlerHolder, AllSeriesCustomHandlerHolder
+from cesnet_tszoo.data_models.holders import FillingHolder, AnomalyHandlerHolder, TransformerHolder, AllSeriesCustomHandlerHolder


 class DatasetConfig(ABC):
 """
-Base class for configuration management. This class should **not** be used directly. Instead, use one of its derived classes, such as [`SeriesBasedConfig`][cesnet_tszoo.configs.series_based_config.SeriesBasedConfig] or [`TimeBasedConfig`][cesnet_tszoo.configs.time_based_config.TimeBasedConfig].
+Base class for configuration management. This class should **not** be used directly. Instead, use one of its derived classes, such as TimeBasedConfig, DisjointTimeBasedConfig or SeriesBasedConfig.

 For available configuration options, refer to [here][cesnet_tszoo.configs.base_config.DatasetConfig--configuration-options].

@@ -35,9 +35,13 @@ class DatasetConfig(ABC):
 used_test_workers: Tracks the number of test workers in use. Helps determine if the test dataloader should be recreated based on worker changes.
 used_all_workers: Tracks the total number of all workers in use. Helps determine if the all dataloader should be recreated based on worker changes.
 import_identifier: Tracks the name of the config upon import. None if not imported.
+filler_factory: Represents factory used to create passed Filler type.
+anomaly_handler_factory: Represents factory used to create passed Anomaly Handler type.
+transformer_factory: Represents factory used to create passed Transformer type.
+can_fit_fillers: Whether fillers in this config, can be fitted.
 logger: Logger for displaying information.

-The following attributes are initialized when [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.cesnet_dataset.CesnetDataset.set_dataset_config_and_initialize] is called:
+The following attributes are initialized when CesnetDataset.set_dataset_config_and_initialize is called:

 Attributes:
 aggregation: The aggregation period used for the data.
@@ -49,13 +53,11 @@ class DatasetConfig(ABC):
 used_singular_train_time_series: Currently used singular train set time series for dataloader.
 used_singular_val_time_series: Currently used singular validation set time series for dataloader.
 used_singular_test_time_series: Currently used singular test set time series for dataloader.
-used_singular_all_time_series: Currently used singular all set time series for dataloader.
-transformers: Prepared transformers for fitting/transforming. Can be one transformer, array of transformers or `None`.
-train_fillers: Fillers used in the train set. `None` if no filler is used or train set is not used.
-val_fillers: Fillers used in the validation set. `None` if no filler is used or validation set is not used.
-test_fillers: Fillers used in the test set. `None` if no filler is used or test set is not used.
-all_fillers: Fillers used for the all set. `None` if no filler is used or all set is not used.
-anomaly_handlers: Prepared anomaly handlers for fitting/handling anomalies. Can be array of anomaly handlers or `None`.
+used_singular_all_time_series: Currently used singular all set time series for dataloader.
+train_preprocess_order: All preprocesses used for train set.
+val_preprocess_order: All preprocesses used for val set.
+test_preprocess_order: All preprocesses used for test set.
+all_preprocess_order: All preprocesses used for all set.
 is_initialized: Flag indicating if the configuration has already been initialized. If true, config initialization will be skipped.
 version: Version of cesnet-tszoo this config was made in.
 export_update_needed: Whether config was updated to newer version and should be exported.
@@ -69,7 +71,8 @@ class DatasetConfig(ABC):
 val_batch_size: Batch size for the validation dataloader, when window size is None.
 test_batch_size: Batch size for the test dataloader, when window size is None.
 all_batch_size: Batch size for the all dataloader, when window size is None.
-fill_missing_with: Defines how to fill missing values in the dataset. Can pass enum [`FillerType`][cesnet_tszoo.utils.enums.FillerType] for built-in filler or pass a type of custom filler that must derive from [`Filler`][cesnet_tszoo.utils.filler.Filler] base class.
+preprocess_order: Defines in which order preprocesses are used. Also can add to order a type of PerSeriesCustomHandler, AllSeriesCustomHandler or NoFitCustomHandler.
+fill_missing_with: Defines how to fill missing values in the dataset. Can pass enum [`FillerType`][cesnet_tszoo.utils.enums.FillerType] for built-in filler or pass a type of custom filler that must derive from [`Filler`][cesnet_tszoo.utils.filler.filler.Filler] base class.
 transform_with: Defines the transformer to transform the dataset. Can pass enum [`TransformerType`][cesnet_tszoo.utils.enums.TransformerType] for built-in transformer, pass a type of custom transformer or instance of already fitted transformer(s).
 handle_anomalies_with: Defines the anomaly handler for handling anomalies in the dataset. Can pass enum [`AnomalyHandlerType`][cesnet_tszoo.utils.enums.AnomalyHandlerType] for built-in anomaly handler or a type of custom anomaly handler.
 partial_fit_initialized_transformers: If `True`, partial fitting on train set is performed when using initiliazed transformers.
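The parameter descriptions above say `fill_missing_with` accepts either a `FillerType` enum member for a built-in filler or a custom filler type derived from `Filler`. In `__init__`, that value is turned into `filler_factory` via `get_filler_factory`. The sketch below shows one way such dispatch could work. Every name in it (`Filler`, `ZeroFiller`, `ForwardFiller`, `FillerKind`, `resolve_filler_type`, `CustomFiller`) is hypothetical, and the real factories in cesnet-tszoo may behave differently.

```python
from enum import Enum
from typing import Optional, Type


class Filler:
    """Hypothetical base class standing in for the library's filler interface."""

    def fill(self, values):
        raise NotImplementedError


class ZeroFiller(Filler):
    """Replace missing values (None) with 0.0."""

    def fill(self, values):
        return [0.0 if v is None else v for v in values]


class ForwardFiller(Filler):
    """Repeat the last observed value; leading gaps stay None."""

    def fill(self, values):
        out, last = [], None
        for v in values:
            last = v if v is not None else last
            out.append(last)
        return out


class FillerKind(Enum):
    # Hypothetical enum members; not the real FillerType values.
    ZERO = "zero"
    FORWARD = "forward"


BUILT_IN_FILLERS = {FillerKind.ZERO: ZeroFiller, FillerKind.FORWARD: ForwardFiller}


def resolve_filler_type(fill_missing_with) -> Optional[Type[Filler]]:
    """Map an enum member, a Filler subclass or None to the class a factory would instantiate."""
    if fill_missing_with is None:
        return None  # no filling requested
    if isinstance(fill_missing_with, FillerKind):
        return BUILT_IN_FILLERS[fill_missing_with]  # built-in filler
    if isinstance(fill_missing_with, type) and issubclass(fill_missing_with, Filler):
        return fill_missing_with  # custom filler passed as a type
    raise TypeError("fill_missing_with must be a FillerKind, a Filler subclass or None")


class CustomFiller(Filler):
    """A user-defined filler passed as a type, as the docs describe."""

    def fill(self, values):
        return [-1.0 if v is None else v for v in values]
```

Resolving the parameter once at construction time means an invalid value fails early instead of at first use. This matches the commit moving the factory creation up in `__init__`.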
@@ -121,6 +124,10 @@ def __init__(self,
 self.used_test_workers = None
 self.used_all_workers = None
 self.import_identifier = None
+self.filler_factory = filler_factories.get_filler_factory(fill_missing_with)
+self.anomaly_handler_factory = anomaly_handler_factories.get_anomaly_handler_factory(handle_anomalies_with)
+self.transformer_factory = transformer_factories.get_transformer_factory(transform_with, create_transformer_per_time_series, partial_fit_initialized_transformers)
+self.can_fit_fillers = can_fit_fillers
 self.logger = logger

 self.aggregation = None
@@ -133,6 +140,10 @@ def __init__(self,
 self.used_singular_val_time_series = None
 self.used_singular_test_time_series = None
 self.used_singular_all_time_series = None
+self.train_preprocess_order: list[PreprocessNote] = []
+self.val_preprocess_order: list[PreprocessNote] = []
+self.test_preprocess_order: list[PreprocessNote] = []
+self.all_preprocess_order: list[PreprocessNote] = []
 self.is_initialized = False
 self.version = version.current_version
 self.export_update_needed = False
@@ -143,6 +154,7 @@ def __init__(self,
 self.val_batch_size = val_batch_size
 self.test_batch_size = test_batch_size
 self.all_batch_size = all_batch_size
+self.preprocess_order = list(preprocess_order)
 self.partial_fit_initialized_transformers = partial_fit_initialized_transformers
 self.include_time = include_time
 self.include_ts_id = include_ts_id
@@ -158,16 +170,6 @@ def __init__(self,
 self.train_dataloader_order = train_dataloader_order
 self.random_state = random_state

-self.filler_factory = filler_factories.get_filler_factory(fill_missing_with)
-self.anomaly_handler_factory = anomaly_handler_factories.get_anomaly_handler_factory(handle_anomalies_with)
-self.transformer_factory = transformer_factories.get_transformer_factory(transform_with, create_transformer_per_time_series, partial_fit_initialized_transformers)
-self.preprocess_order = list(preprocess_order)
-self.train_preprocess_order: list[PreprocessNote] = []
-self.val_preprocess_order: list[PreprocessNote] = []
-self.test_preprocess_order: list[PreprocessNote] = []
-self.all_preprocess_order: list[PreprocessNote] = []
-self.can_fit_fillers = can_fit_fillers
-
 if self.random_state is not None:
 np.random.seed(random_state)

@@ -447,7 +449,7 @@ def _set_default_values(self, default_values: dict[str, Number]) -> None:
 self.default_values = temp_default_values

 def _set_preprocess_order(self):
-"""Validates and converts preprocess order to their enum variant. """
+"""Validates and converts preprocess order to their enum variant. Also initializes preprocess_orders for all sets. """

 for i, order in enumerate(self.preprocess_order):
 if isinstance(order, (str, PreprocessType)):
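`_set_preprocess_order` is documented as validating the order, converting entries to their enum variant, and initializing the per-set orders. The sketch below shows that idea under stated assumptions. The `PreprocessType` values come from the documented default `["handling_anomalies", "filling_gaps", "transforming"]`. `CustomHandler` is a hypothetical stand-in for the built-in custom handler base classes. Plain lists replace `PreprocessNote`. Dropping anomaly handling outside the train set follows the note that the anomaly handler is only used for the train set; treating the all set the same way is an assumption.

```python
from enum import Enum


class PreprocessType(Enum):
    # Values taken from the documented default preprocess_order.
    HANDLING_ANOMALIES = "handling_anomalies"
    FILLING_GAPS = "filling_gaps"
    TRANSFORMING = "transforming"


class CustomHandler:
    """Hypothetical stand-in for the built-in custom handler base classes."""


def validate_order(preprocess_order):
    """Convert strings/enum members to PreprocessType and keep custom handler types."""
    validated = []
    for i, order in enumerate(preprocess_order):
        if isinstance(order, (str, PreprocessType)):
            # PreprocessType("filling_gaps") looks a member up by value;
            # passing a member returns it unchanged; unknown strings raise ValueError.
            validated.append(PreprocessType(order))
        elif isinstance(order, type) and issubclass(order, CustomHandler):
            validated.append(order)
        else:
            raise ValueError(f"preprocess_order[{i}] is not a valid preprocess: {order!r}")
    return validated


def build_set_orders(validated):
    """Build one order per set; anomaly handling is kept only for the train set (assumption)."""
    return {
        set_name: [
            step for step in validated
            if not (step is PreprocessType.HANDLING_ANOMALIES and set_name != "train")
        ]
        for set_name in ("train", "val", "test", "all")
    }


class MyHandler(CustomHandler):
    """Hypothetical user handler inserted between built-in steps."""


orders = build_set_orders(
    validate_order(["handling_anomalies", "filling_gaps", MyHandler, PreprocessType.TRANSFORMING])
)
```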

cesnet_tszoo/configs/disjoint_time_based_config.py

Lines changed: 19 additions & 18 deletions
@@ -16,7 +16,7 @@
 from cesnet_tszoo.configs.handlers.time_based_handler import TimeBasedHandler
 import cesnet_tszoo.utils.css_styles.utils as css_utils
 from cesnet_tszoo.utils.custom_handler.factory import PerSeriesCustomHandlerFactory, NoFitCustomHandlerFactory
-from cesnet_tszoo.data_models.holders import PerSeriesCustomHandlerHolder, NoFitCustomHandlerHolder
+from cesnet_tszoo.data_models.holders import NoFitCustomHandlerHolder
 from cesnet_tszoo.data_models.preprocess_note import PreprocessNote


@@ -27,22 +27,25 @@ class DisjointTimeBasedConfig(SeriesBasedHandler, TimeBasedHandler, DatasetConfi
 Used to configure the following:

 - Train, validation, test, all sets (time period, sizes, features, window size)
-- Handling missing values (default values, [`fillers`][cesnet_tszoo.utils.filler])
-- Handling anomalies ([`anomaly handlers`][cesnet_tszoo.utils.anomaly_handler])
-- Data transformation using [`transformers`][cesnet_tszoo.utils.transformer]
+- Handling missing values (default values, [`fillers`][cesnet_tszoo.utils.filler.filler])
+- Handling anomalies ([`anomaly handlers`][cesnet_tszoo.utils.anomaly_handler.anomaly_handler])
+- Data transformation using [`transformers`][cesnet_tszoo.utils.transformer.transformer]
+- Applying custom handlers ([`custom handlers`][cesnet_tszoo.utils.custom_handler.custom_handler])
+- Changing order of preprocesses
 - Dataloader options (train/val/test/all/init workers, batch sizes)
 - Plotting

 **Important Notes:**

-- Custom fillers must inherit from the [`fillers`][cesnet_tszoo.utils.filler.Filler] base class.
-- Custom anomaly handlers must inherit from the [`anomaly handlers`][cesnet_tszoo.utils.anomaly_handler.AnomalyHandler] base class.
+- Custom fillers must inherit from the [`fillers`][cesnet_tszoo.utils.filler.filler.Filler] base class.
+- Custom anomaly handlers must inherit from the [`anomaly handlers`][cesnet_tszoo.utils.anomaly_handler.anomaly_handler.AnomalyHandler] base class.
 - Selected anomaly handler is only used for train set.
-- It is recommended to use the [`transformers`][cesnet_tszoo.utils.transformer.Transformer] base class, though this is not mandatory as long as it meets the required methods.
+- It is recommended to use the [`transformers`][cesnet_tszoo.utils.transformer.transformer.Transformer] base class, though this is not mandatory as long as it meets the required methods.
 - If a transformer is already initialized and `partial_fit_initialized_transformers` is `False`, the transformer does not require `partial_fit`.
 - Otherwise, the transformer must support `partial_fit`.
 - Transformers must implement `transform` method.
 - Both `partial_fit` and `transform` methods must accept an input of type `np.ndarray` with shape `(times, features)`.
+- Custom handlers must be derived from one of the built-in [`custom handler`][cesnet_tszoo.utils.custom_handler.custom_handler] classes
 - `train_time_period`, `val_time_period`, `test_time_period` can overlap, but they should keep order of `train_time_period` < `val_time_period` < `test_time_period`

 For available configuration options, refer to [here][cesnet_tszoo.configs.disjoint_time_based_config.DisjointTimeBasedConfig--configuration-options].
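The notes above only require a transformer to provide `partial_fit` and `transform` that accept an `np.ndarray` of shape `(times, features)`. Inheriting from the `Transformer` base class is recommended but optional. The sketch below shows such a duck-typed transformer, assuming a running min-max scaling is wanted. The class name and behaviour are illustrative, not part of cesnet-tszoo. Per the `transform_with` description, a custom transformer would presumably be passed as a type.

```python
import numpy as np


class RunningMinMaxTransformer:
    """Duck-typed transformer sketch (hypothetical): partial_fit and transform on
    arrays of shape (times, features), without inheriting from Transformer."""

    def __init__(self):
        self.data_min_ = None
        self.data_max_ = None

    def partial_fit(self, data: np.ndarray) -> "RunningMinMaxTransformer":
        # Update per-feature min/max incrementally, one batch of times at a time.
        batch_min = np.nanmin(data, axis=0)
        batch_max = np.nanmax(data, axis=0)
        if self.data_min_ is None:
            self.data_min_, self.data_max_ = batch_min, batch_max
        else:
            self.data_min_ = np.minimum(self.data_min_, batch_min)
            self.data_max_ = np.maximum(self.data_max_, batch_max)
        return self

    def transform(self, data: np.ndarray) -> np.ndarray:
        span = self.data_max_ - self.data_min_
        # Avoid division by zero for constant features.
        span = np.where(span == 0, 1.0, span)
        return (data - self.data_min_) / span


transformer = RunningMinMaxTransformer()
transformer.partial_fit(np.array([[0.0, 10.0], [5.0, 20.0]]))  # first train batch
transformer.partial_fit(np.array([[10.0, 15.0]]))              # second train batch
scaled = transformer.transform(np.array([[5.0, 15.0]]))
```

Because the statistics are updated per batch, `partial_fit` can be called repeatedly over the train set without holding it in memory at once.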
@@ -54,6 +57,10 @@ class DisjointTimeBasedConfig(SeriesBasedHandler, TimeBasedHandler, DatasetConfi
 uses_all_time_period: Whether all time period set should be used.
 uses_all_ts: Whether all time series set should be used.
 import_identifier: Tracks the name of the config upon import. None if not imported.
+filler_factory: Represents factory used to create passed Filler type.
+anomaly_handler_factory: Represents factory used to create passed Anomaly Handler type.
+transformer_factory: Represents factory used to create passed Transformer type.
+can_fit_fillers: Whether fillers in this config, can be fitted.
 logger: Logger for displaying information.

 The following attributes are initialized when [`set_dataset_config_and_initialize`][cesnet_tszoo.datasets.disjoint_time_based_cesnet_dataset.DisjointTimeBasedCesnetDataset.set_dataset_config_and_initialize] is called:
@@ -78,12 +85,9 @@ class DisjointTimeBasedConfig(SeriesBasedHandler, TimeBasedHandler, DatasetConfi
 used_singular_train_time_series: Currently used singular train set time series for dataloader.
 used_singular_val_time_series: Currently used singular validation set time series for dataloader.
 used_singular_test_time_series: Currently used singular test set time series for dataloader.
-transformers: Prepared transformers for fitting/transforming. Can be one transformer, array of transformers or `None`.
-train_fillers: Fillers used in the train set. `None` if no filler is used or train set is not used.
-val_fillers: Fillers used in the validation set. `None` if no filler is used or validation set is not used.
-test_fillers: Fillers used in the test set. `None` if no filler is used or test set is not used.
-all_fillers: Fillers used for the all set.
-anomaly_handlers: Prepared anomaly handlers for fitting/handling anomalies. Can be array of anomaly handlers or `None`.
+train_preprocess_order: All preprocesses used for train set.
+val_preprocess_order: All preprocesses used for val set.
+test_preprocess_order: All preprocesses used for test set.
 is_initialized: Flag indicating if the configuration has already been initialized. If true, config initialization will be skipped.
 version: Version of cesnet-tszoo this config was made in.
 export_update_needed: Whether config was updated to newer version and should be exported.
@@ -107,7 +111,8 @@ class DisjointTimeBasedConfig(SeriesBasedHandler, TimeBasedHandler, DatasetConfi
 train_batch_size: Batch size for the train dataloader. Affects number of returned times in one batch. `Default: 32`
 val_batch_size: Batch size for the validation dataloader. Affects number of returned times in one batch. `Default: 64`
 test_batch_size: Batch size for the test dataloader. Affects number of returned times in one batch. `Default: 128`
-fill_missing_with: Defines how to fill missing values in the dataset. Can pass enum [`FillerType`][cesnet_tszoo.utils.enums.FillerType] for built-in filler or pass a type of custom filler that must derive from [`Filler`][cesnet_tszoo.utils.filler.Filler] base class. `Default: None`
+preprocess_order: Defines in which order preprocesses are used. Also can add to order a type of [`AllSeriesCustomHandler`][cesnet_tszoo.utils.custom_handler.AllSeriesCustomHandler] or [`NoFitCustomHandler`][cesnet_tszoo.utils.custom_handler.NoFitCustomHandler]. `Default: ["handling_anomalies", "filling_gaps", "transforming"]`
+fill_missing_with: Defines how to fill missing values in the dataset. Can pass enum [`FillerType`][cesnet_tszoo.utils.enums.FillerType] for built-in filler or pass a type of custom filler that must derive from [`Filler`][cesnet_tszoo.utils.filler.filler.Filler] base class. `Default: None`
 transform_with: Defines the transformer used to transform the dataset. Can pass enum [`TransformerType`][cesnet_tszoo.utils.enums.TransformerType] for built-in transformer, pass a type of custom transformer or instance of already fitted transformer(s). `Default: None`
 handle_anomalies_with: Defines the anomaly handler for handling anomalies in the train set. Can pass enum [`AnomalyHandlerType`][cesnet_tszoo.utils.enums.AnomalyHandlerType] for built-in anomaly handler or a type of custom anomaly handler. `Default: None`
 partial_fit_initialized_transformers: If `True`, partial fitting on train set is performed when using initiliazed transformers. `Default: False`
@@ -153,10 +158,6 @@ def __init__(self,
 nan_threshold: float = 1.0,
 random_state: int | None = None):

-# to remove
-
-# to remove
-
 self.logger = logging.getLogger("disjoint_time_based_config")

 TimeBasedHandler.__init__(self, self.logger, train_batch_size, val_batch_size, test_batch_size, 1, False, sliding_window_size, sliding_window_prediction_size, sliding_window_step, set_shared_size, train_time_period, val_time_period, test_time_period)
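The Important Notes allow `train_time_period`, `val_time_period` and `test_time_period` to overlap, as long as they keep the order train < val < test. All three are passed to `TimeBasedHandler.__init__` above. The sketch below assumes each period is a `(start, end)` pair of `datetime` values and that "keep order" means both the starts and the ends are non-decreasing. The real `TimeBasedHandler` may represent and check periods differently, and `periods_keep_order` is a hypothetical helper.

```python
from datetime import datetime

# Assumed representation of a time period: (start, end).
Period = tuple[datetime, datetime]


def periods_keep_order(train: Period, val: Period, test: Period) -> bool:
    """Allow overlap, but require train <= val <= test for both starts and ends (assumed rule)."""
    starts = [train[0], val[0], test[0]]
    ends = [train[1], val[1], test[1]]
    return starts == sorted(starts) and ends == sorted(ends)


train = (datetime(2023, 1, 1), datetime(2023, 3, 1))
val = (datetime(2023, 2, 15), datetime(2023, 4, 1))  # overlaps train but starts and ends later
test = (datetime(2023, 4, 1), datetime(2023, 5, 1))

overlapping_but_ordered = periods_keep_order(train, val, test)
reversed_order = periods_keep_order(test, val, train)
```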
