nbs/docs/capabilities/cross_validation.ipynb
Lines changed: 9 additions & 3 deletions
@@ -78,7 +78,7 @@
78
78
"source": [
79
79
"## 2. Read the data\n",
80
80
"\n",
81
-
"For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats likes `.csv`. \n",
81
+
"For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. However, you can use ordinary pandas operations to read your data in other formats like `.csv`. \n",
82
82
"\n",
83
83
"The input to `NeuralForecast` is always a data frame in [long format](https://www.theanalysisfactor.com/wide-and-long-data/) with three columns: `unique_id`, `ds` and `y`:\n",
84
84
"\n",
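The long format referenced in this hunk can be illustrated with a minimal pandas sketch. The wide table, the series ids `"H1"`/`"H2"`, and the values are made up for illustration; the point is only that `DataFrame.melt` turns one-column-per-series data into one `(unique_id, ds, y)` row per observation.

```python
import pandas as pd

# Hypothetical wide table: one column per series, one row per time step.
wide = pd.DataFrame(
    {
        "ds": [1, 2, 3],
        "H1": [10.0, 11.0, 12.0],
        "H2": [20.0, 21.0, 22.0],
    }
)

# melt() stacks the series columns into rows, producing the long format
# with one (unique_id, ds, y) observation per row.
long_df = (
    wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
    .loc[:, ["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```

Data that is already long (as the M4 parquet file is described to be) needs no reshaping, only the three expected column names.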
@@ -180,7 +180,7 @@
180
180
"cell_type": "markdown",
181
181
"metadata": {},
182
182
"source": [
183
-
"For simplicity, we use only a single series to explore in detail the cross-validation functionality. Also, let's use the first 700 time steps, such that we work with round numbers, making it easier to visualize and understand cross-validation."
183
+
"For simplicity, we focus on a single time series to explore the cross-validation functionality in detail. We also use only the first 700 time steps, which allows us to work with round numbers and makes the cross-validation process easier to visualize and understand."
184
184
]
185
185
},
186
186
{
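Restricting the data to one series and its first 700 time steps, as the reworded cell describes, might look like the sketch below. The helper `first_steps_of_series`, the series id `"H1"`, and the synthetic frame standing in for the M4 data are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

def first_steps_of_series(df: pd.DataFrame, series_id: str, n_steps: int) -> pd.DataFrame:
    """Keep one series and its first `n_steps` observations, ordered by `ds`."""
    one_series = df[df["unique_id"] == series_id].sort_values("ds")
    return one_series.head(n_steps).reset_index(drop=True)

# Synthetic stand-in for the hourly M4 data: two series with 1,000 steps each.
df = pd.DataFrame(
    {
        "unique_id": ["H1"] * 1000 + ["H2"] * 1000,
        "ds": list(range(1, 1001)) * 2,
        "y": [float(i) for i in range(2000)],
    }
)

subset = first_steps_of_series(df, "H1", 700)
print(subset["ds"].min(), subset["ds"].max(), len(subset))  # 1 700 700
```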
@@ -449,7 +449,7 @@
449
449
"cell_type": "markdown",
450
450
"metadata": {},
451
451
"source": [
452
-
"In the figure above, we see that we have 4 cutoff points, which correspond to our four cross-validation windows. Of course, notice that the windows are set from the end of the dataset. That way, the model trains on past data to predict future data.\n",
452
+
"In the figure above, we observe four cutoff points, each corresponding to a cross-validation window. Note that these windows are defined from the end of the dataset, ensuring that the model is trained on past data to predict future data.\n",
453
453
"\n",
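The statement that windows are defined from the end of the dataset can be made concrete with a small arithmetic sketch. `cutoffs_from_end` is a hypothetical helper, and it assumes integer `ds` values where the cutoff is the last time step used for training in each window; the library's internal indexing may differ.

```python
def cutoffs_from_end(last_ds: int, h: int, n_windows: int, step_size: int) -> list[int]:
    """Return the last training time step (cutoff) of each window, oldest first.

    Windows are anchored at the end of the series: the final window forecasts
    the last `h` steps, and each earlier window is shifted back by `step_size`.
    """
    return [last_ds - h - (n_windows - 1 - i) * step_size for i in range(n_windows)]

# Hypothetical settings: 700 integer time steps, h=100, four non-overlapping windows.
print(cutoffs_from_end(last_ds=700, h=100, n_windows=4, step_size=100))
# [300, 400, 500, 600]
```

Because every cutoff is computed backwards from `last_ds`, each window trains only on steps up to its cutoff and forecasts the `h` steps that follow it.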
454
454
":::{.callout-warning collapse=\"true\"}\n",
455
455
"## Important note\n",
@@ -655,11 +655,17 @@
655
655
"metadata": {},
656
656
"source": [
657
657
"In the figure above, we see that our two folds overlap between time steps 601 and 650, since the step size is 50. This happens because:\n",
658
+
"\n",
658
659
"- fold 1: model is trained using time steps 0 to 550 and predicts 551 to 650 (h=100)\n",
659
660
"- fold 2: model is trained using time steps 0 to 600 (`step_size=50`) and predicts 601 to 700\n",
660
661
"\n",
661
662
"Be aware that when evaluating a model trained with overlapping cross-validation windows, some time steps have more than one prediction. This may bias your evaluation metric, as the repeated time steps are taken into account in the metric multiple times."
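The two folds listed in this cell can be reproduced with a short sketch that derives each window's forecast range from the end of the series and counts the time steps predicted more than once. `forecast_ranges` is a hypothetical helper assuming integer `ds` values from 1 to 700.

```python
from collections import Counter

def forecast_ranges(last_ds: int, h: int, n_windows: int, step_size: int) -> list[tuple[int, int]]:
    """(first, last) forecasted time step of each window, oldest first."""
    ranges = []
    for i in range(n_windows):
        cutoff = last_ds - h - (n_windows - 1 - i) * step_size  # last training step
        ranges.append((cutoff + 1, cutoff + h))
    return ranges

ranges = forecast_ranges(last_ds=700, h=100, n_windows=2, step_size=50)
print(ranges)  # [(551, 650), (601, 700)]

# Count how many windows predict each time step; steps with count > 1 overlap.
counts = Counter(t for first, last in ranges for t in range(first, last + 1))
repeated = sorted(t for t, c in counts.items() if c > 1)
print(repeated[0], repeated[-1], len(repeated))  # 601 650 50
```

With `step_size >= h` the ranges no longer overlap (for example, `step_size=100` gives `(501, 600)` and `(601, 700)`), so no time step enters the metric twice.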
Fit `models` to a large set of time series from DataFrame `df`.
454
+
Fit `models` to a large set of time series from DataFrame `df`
455
455
and store fitted models for later inspection.
456
456
457
457
Args:
458
458
df (pandas, polars or spark DataFrame, or a list of parquet files containing the series, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
459
-
If None, a previously stored dataset is required. Defaults to None.
460
-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
461
-
val_size (int, optional): Size of validation set. Defaults to 0.
462
-
use_init_models (bool, optional): Use initial model passed when NeuralForecast object was instantiated. Defaults to False.
463
-
verbose (bool): Print processing steps. Defaults to False.
464
-
id_col (str): Column that identifies each serie. Defaults to 'unique_id'.
465
-
time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to 'ds'.
466
-
target_col (str): Column that contains the target. Defaults to 'y'.
459
+
If None, a previously stored dataset is required.
460
+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous variables.
461
+
val_size (int, optional): Size of validation set.
462
+
use_init_models (bool, optional): Use the initial models passed when the NeuralForecast object was instantiated.
463
+
verbose (bool): Print processing steps.
464
+
id_col (str): Column that identifies each series.
465
+
time_col (str): Column that identifies each timestep; its values can be timestamps or integers.
466
+
target_col (str): Column that contains the target.
467
467
distributed_config (neuralforecast.DistributedConfig): Configuration to use for DDP training. Currently only spark is supported.
468
-
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None.
468
+
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction).
469
469
470
470
Returns:
471
471
NeuralForecast: Returns `NeuralForecast` class with fitted `models`.
@@ -580,7 +580,7 @@ def make_future_dataframe(
580
580
581
581
Args:
582
582
df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
583
-
Only required if this is different than the one used in the fit step. Defaults to None.
583
+
Only required if it differs from the one used in the fit step.
584
584
"""
585
585
if not self._fitted:
586
586
raise Exception("You must fit the model first.")
@@ -821,14 +821,14 @@ def predict(
821
821
822
822
Args:
823
823
df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
824
-
If a DataFrame is passed, it is used to generate forecasts. Defaults to None.
825
-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
826
-
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous. Defaults to None.
827
-
verbose (bool): Print processing steps. Defaults to False.
824
+
If a DataFrame is passed, it is used to generate forecasts.
825
+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous variables.
826
+
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous variables.
827
+
verbose (bool): Print processing steps.
828
828
engine (spark session): Distributed engine for inference. Only used if df is a spark dataframe or if fit was called on a spark dataframe.
829
-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
830
-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
831
-
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models. Defaults to None.
829
+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
830
+
quantiles (list of floats, optional): Target quantiles to predict; an alternative to `level`.
831
+
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models.
832
832
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
833
833
834
834
Returns:
@@ -1031,14 +1031,14 @@ def explain(
1031
1031
outputs (list of int, optional): List of outputs to explain for models with multiple outputs. Defaults to [0] (first output).
1032
1032
explainer (str): Name of the explainer to use. Options are 'IntegratedGradients', 'ShapleyValueSampling', 'InputXGradient'. Defaults to 'IntegratedGradients'.
1033
1033
df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
1034
-
If a DataFrame is passed, it is used to generate forecasts. Defaults to None.
1035
-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
1036
-
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous. Defaults to None.
1034
+
If a DataFrame is passed, it is used to generate forecasts.
1035
+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous variables.
1036
+
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous variables.
1037
1037
h (int): The forecast horizon. Can be larger than the horizon set during training.
1038
-
verbose (bool): Print processing steps. Defaults to False.
1038
+
verbose (bool): Print processing steps.
1039
1039
engine (spark session): Distributed engine for inference. Only used if df is a spark dataframe or if fit was called on a spark dataframe.
1040
-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1041
-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1040
+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1041
+
quantiles (list of floats, optional): Target quantiles to predict; an alternative to `level`.
1042
1042
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
1043
1043
1044
1044
Returns:
@@ -1360,24 +1360,24 @@ def cross_validation(
1360
1360
1361
1361
Args:
1362
1362
df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
1363
-
If None, a previously stored dataset is required. Defaults to None.
1363
+
If None, a previously stored dataset is required.
1364
1364
static_df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
1365
-
n_windows (int): Number of windows used for cross validation. Defaults to 1.
1366
-
step_size (int): Step size between each window. Defaults to 1.
1365
+
n_windows (int): Number of windows used for cross validation.
1366
+
step_size (int): Step size between each window.
1367
1367
val_size (int, optional): Length of validation size. If passed, set `n_windows=None`. Defaults to 0.
1368
-
test_size (int, optional): Length of test size. If passed, set `n_windows=None`. Defaults to None.
1369
-
use_init_models (bool, optional): Use initial model passed when object was instantiated. Defaults to False.
1370
-
verbose (bool): Print processing steps. Defaults to False.
1368
+
test_size (int, optional): Length of the test set. If passed, set `n_windows=None`.
1369
+
use_init_models (bool, optional): Use the initial models passed when the object was instantiated.
1370
+
verbose (bool): Print processing steps.
1371
1371
refit (bool or int): Retrain model for each cross validation window.
1372
1372
If False, the models are trained at the beginning and then used to predict each window.
1373
-
If positive int, the models are retrained every `refit` windows. Defaults to False.
1374
-
id_col (str): Column that identifies each serie. Defaults to 'unique_id'.
1373
+
If positive int, the models are retrained every `refit` windows.
1374
+
id_col (str): Column that identifies each series.
1375
1375
time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to 'ds'.
1376
-
target_col (str): Column that contains the target. Defaults to 'y'.
1376
+
target_col (str): Column that contains the target.
1377
1377
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None.
1378
-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1379
-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1380
-
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models. Defaults to None.
1378
+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1379
+
quantiles (list of floats, optional): Target quantiles to predict; an alternative to `level`.
1380
+
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models.
1381
1381
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
1382
1382
1383
1383
Returns:
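The `refit` rule documented in the `cross_validation` docstring (train once if `False`, retrain every `refit` windows if a positive int) can be sketched as a schedule of window indices. `refit_windows` is a hypothetical helper; the assumptions that `refit=True` retrains in every window and that an integer schedule starts at window 0 are illustrative, not taken from the library source.

```python
def refit_windows(n_windows: int, refit) -> list[int]:
    """Indices of the cross-validation windows in which models are (re)trained."""
    # Check the bools first: bool is a subclass of int in Python.
    if refit is True:
        return list(range(n_windows))  # retrain for every window
    if refit is False:
        return [0]  # train once at the beginning, reuse for all windows
    if isinstance(refit, int) and refit > 0:
        return list(range(0, n_windows, refit))  # retrain every `refit` windows
    raise ValueError("refit must be a bool or a positive int")

print(refit_windows(6, 2))      # [0, 2, 4]
print(refit_windows(4, False))  # [0]
```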
@@ -1533,9 +1533,9 @@ def predict_insample(
1533
1533
to predict historic values of a time series from the stored dataframe.
1534
1534
1535
1535
Args:
1536
-
step_size (int): Step size between each window. Defaults to 1.
1537
-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1538
-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1536
+
step_size (int): Step size between each window.
1537
+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1538
+
quantiles (list of floats, optional): Target quantiles to predict; an alternative to `level`.
1539
1539
1540
1540
Returns:
1541
1541
fcsts_df (pandas.DataFrame): DataFrame with insample predictions for all fitted `models`.
@@ -1705,9 +1705,9 @@ def save(
1705
1705
1706
1706
Args:
1707
1707
path (str): Directory to save current status.
1708
-
model_index (list, optional): List to specify which models from list of self.models to save. Defaults to None.
1709
-
save_dataset (bool): Whether to save dataset or not. Defaults to True.
1710
-
overwrite (bool): Whether to overwrite files or not. Defaults to False.
1708
+
model_index (list, optional): List specifying which models from `self.models` to save.
1709
+
save_dataset (bool): Whether to save dataset or not.
1710
+
overwrite (bool): Whether to overwrite files or not.
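With the "Defaults to ..." phrases removed from these docstrings, default values have to be read from the function signatures instead, assuming the signatures declare them (the diff does not show the `def` parameter lists). A minimal sketch using the standard-library `inspect.signature`: `cross_validation_stub` is a hypothetical stand-in whose defaults are copied from the docstring lines this commit removes, not the real `neuralforecast` signature.

```python
import inspect

# Hypothetical stand-in: defaults copied from the removed "Defaults to ..."
# phrases of the `cross_validation` docstring, not the library's own signature.
def cross_validation_stub(df=None, static_df=None, n_windows=1, step_size=1,
                          val_size=0, test_size=None, use_init_models=False,
                          verbose=False, refit=False, id_col="unique_id",
                          time_col="ds", target_col="y", prediction_intervals=None,
                          level=None, quantiles=None, h=None, **data_kwargs):
    ...

def signature_defaults(func) -> dict:
    """Map each parameter that declares a default to that default value."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }

defaults = signature_defaults(cross_validation_stub)
print(defaults["n_windows"], defaults["step_size"], defaults["id_col"])  # 1 1 unique_id
```

The same helper could be pointed at a real method (for example `NeuralForecast.cross_validation`); parameters without defaults, such as `self` or `**data_kwargs`, are skipped.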