
Commit e1ff369

Rogelio Melo authored and committed
Improve grammar and readability in documentation
1 parent c9b184a commit e1ff369

File tree

2 files changed

+55
-49
lines changed

nbs/docs/capabilities/cross_validation.ipynb

Lines changed: 9 additions & 3 deletions
@@ -78,7 +78,7 @@
7878
"source": [
7979
"## 2. Read the data\n",
8080
"\n",
81-
"For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats likes `.csv`. \n",
81+
"For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. However, you can use ordinary pandas operations to read your data in other formats like `.csv`. \n",
8282
"\n",
8383
"The input to `NeuralForecast` is always a data frame in [long format](https://www.theanalysisfactor.com/wide-and-long-data/) with three columns: `unique_id`, `ds` and `y`:\n",
8484
"\n",
@@ -180,7 +180,7 @@
180180
"cell_type": "markdown",
181181
"metadata": {},
182182
"source": [
183-
"For simplicity, we use only a single series to explore in detail the cross-validation functionality. Also, let's use the first 700 time steps, such that we work with round numbers, making it easier to visualize and understand cross-validation."
183+
"For simplicity, we focus on a single time series to explore the cross-validation functionality in detail. We also use only the first 700 time steps, which allows us to work with round numbers and makes the cross-validation process easier to visualize and understand."
184184
]
185185
},
186186
{
@@ -449,7 +449,7 @@
449449
"cell_type": "markdown",
450450
"metadata": {},
451451
"source": [
452-
"In the figure above, we see that we have 4 cutoff points, which correspond to our four cross-validation windows. Of course, notice that the windows are set from the end of the dataset. That way, the model trains on past data to predict future data. \n",
452+
"In the figure above, we observe four cutoff points, each corresponding to a cross-validation window. Note that these windows are defined from the end of the dataset, ensuring that the model is trained on past data to predict future data.\n",
453453
"\n",
454454
":::{.callout-warning collapse=\"true\"}\n",
455455
"## Important note\n",
@@ -655,11 +655,17 @@
655655
"metadata": {},
656656
"source": [
657657
"In the figure above, we see that our two folds overlap between time steps 601 and 650, since the step size is 50. This happens because:\n",
658+
"\n",
658659
"- fold 1: model is trained using time steps 0 to 550 and predicts 551 to 650 (h=100)\n",
659660
"- fold 2: model is trained using time steps 0 to 600 (`step_size=50`) and predicts 601 to 700\n",
660661
"\n",
661662
"Be aware that when evaluating a model trained with overlapping cross-validation windows, some time steps have more than one prediction. This may bias your evaluation metric, as the repeated time steps are taken into account in the metric multiple times."
662663
]
664+
},
665+
{
666+
"cell_type": "markdown",
667+
"metadata": {},
668+
"source": []
663669
}
664670
],
665671
"metadata": {
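
The fold boundaries described in the last changed notebook cell (700 time steps, `h=100`, `step_size=50`, two windows) can be reproduced with a short calculation. The following is a minimal sketch, not NeuralForecast's implementation: the helper `cv_windows` is a hypothetical name introduced here, and it assumes the windows are laid out backward from the last time step, as the notebook text describes.

```python
def cv_windows(n_steps, h, step_size, n_windows):
    """Return (cutoff, first_predicted, last_predicted) for each fold.

    Hypothetical helper for illustration only. Time steps are numbered
    as in the notebook text: the model trains up to `cutoff` and then
    predicts the next `h` steps.
    """
    # Total span covered by all windows, counted from the end of the series.
    test_size = h + step_size * (n_windows - 1)
    windows = []
    for i in range(n_windows):
        cutoff = n_steps - test_size + i * step_size
        windows.append((cutoff, cutoff + 1, cutoff + h))
    return windows


windows = cv_windows(n_steps=700, h=100, step_size=50, n_windows=2)
for fold, (cutoff, start, end) in enumerate(windows, start=1):
    print(f"fold {fold}: train up to {cutoff}, predict {start} to {end}")

# Time steps that are predicted by both folds.
overlap_start = max(start for _, start, _ in windows)
overlap_end = min(end for _, _, end in windows)
print(f"overlap: {overlap_start} to {overlap_end}")
```

Because `step_size` is smaller than `h`, consecutive windows share `h - step_size` time steps, which is why those steps are counted more than once in an evaluation metric.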

neuralforecast/core.py

Lines changed: 46 additions & 46 deletions
@@ -225,7 +225,7 @@ def __init__(
225225
local_static_scaler_type: Optional[str] = None,
226226
):
227227
"""The `core.NeuralForecast` class allows you to efficiently fit multiple `NeuralForecast` models
228-
for large sets of time series. It operates with pandas DataFrame `df` that identifies series
228+
for large sets of time series. It operates with a pandas DataFrame `df` that identifies series
229229
and datestamps with the `unique_id` and `ds` columns. The `y` column denotes the target
230230
time series variable.
231231
@@ -234,9 +234,9 @@ def __init__(
234234
see [collection here](./models).
235235
freq (str or int): Frequency of the data. Must be a valid pandas or polars offset alias, or an integer.
236236
local_scaler_type (str, optional): Scaler to apply per-serie to temporal features before fitting, which is inverted after predicting.
237-
Can be 'standard', 'robust', 'robust-iqr', 'minmax' or 'boxcox'. Defaults to None.
237+
Can be 'standard', 'robust', 'robust-iqr', 'minmax' or 'boxcox'.
238238
local_static_scaler_type (str, optional): Scaler to apply to static exogenous features before fitting.
239-
Can be 'standard', 'robust', 'robust-iqr', 'minmax' or 'boxcox'. Defaults to None.
239+
Can be 'standard', 'robust', 'robust-iqr', 'minmax' or 'boxcox'.
240240
241241
Returns:
242242
NeuralForecast: Returns instantiated `NeuralForecast` class.
@@ -449,23 +449,23 @@ def fit(
449449
distributed_config: Optional[DistributedConfig] = None,
450450
prediction_intervals: Optional[PredictionIntervals] = None,
451451
) -> None:
452-
"""Fit the core.NeuralForecast.
452+
"""Fit the core.NeuralForecast
453453
454-
Fit `models` to a large set of time series from DataFrame `df`.
454+
Fit `models` to a large set of time series from DataFrame `df`
455455
and store fitted models for later inspection.
456456
457457
Args:
458458
df (pandas, polars or spark DataFrame, or a list of parquet files containing the series, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
459-
If None, a previously stored dataset is required. Defaults to None.
460-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
461-
val_size (int, optional): Size of validation set. Defaults to 0.
462-
use_init_models (bool, optional): Use initial model passed when NeuralForecast object was instantiated. Defaults to False.
463-
verbose (bool): Print processing steps. Defaults to False.
464-
id_col (str): Column that identifies each serie. Defaults to 'unique_id'.
465-
time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to 'ds'.
466-
target_col (str): Column that contains the target. Defaults to 'y'.
459+
If None, a previously stored dataset is required.
460+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous.
461+
val_size (int, optional): Size of validation set.
462+
use_init_models (bool, optional): Use initial model passed when NeuralForecast object was instantiated.
463+
verbose (bool): Print processing steps.
464+
id_col (str): Column that identifies each serie.
465+
time_col (str): Column that identifies each timestep, its values can be timestamps or integers.
466+
target_col (str): Column that contains the target.
467467
distributed_config (neuralforecast.DistributedConfig): Configuration to use for DDP training. Currently only spark is supported.
468-
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None.
468+
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction).
469469
470470
Returns:
471471
NeuralForecast: Returns `NeuralForecast` class with fitted `models`.
@@ -580,7 +580,7 @@ def make_future_dataframe(
580580
581581
Args:
582582
df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
583-
Only required if this is different than the one used in the fit step. Defaults to None.
583+
Only required if this is different than the one used in the fit step.
584584
"""
585585
if not self._fitted:
586586
raise Exception("You must fit the model first.")
@@ -821,14 +821,14 @@ def predict(
821821
822822
Args:
823823
df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
824-
If a DataFrame is passed, it is used to generate forecasts. Defaults to None.
825-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
826-
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous. Defaults to None.
827-
verbose (bool): Print processing steps. Defaults to False.
824+
If a DataFrame is passed, it is used to generate forecasts.
825+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous.
826+
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous.
827+
verbose (bool): Print processing steps.
828828
engine (spark session): Distributed engine for inference. Only used if df is a spark dataframe or if fit was called on a spark dataframe.
829-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
830-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
831-
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models. Defaults to None.
829+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
830+
quantiles (list of floats, optional): Alternative to level, target quantiles to predict.
831+
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models.
832832
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
833833
834834
Returns:
@@ -1031,14 +1031,14 @@ def explain(
10311031
outputs (list of int, optional): List of outputs to explain for models with multiple outputs. Defaults to [0] (first output).
10321032
explainer (str): Name of the explainer to use. Options are 'IntegratedGradients', 'ShapleyValueSampling', 'InputXGradient'. Defaults to 'IntegratedGradients'.
10331033
df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
1034-
If a DataFrame is passed, it is used to generate forecasts. Defaults to None.
1035-
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
1036-
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous. Defaults to None.
1034+
If a DataFrame is passed, it is used to generate forecasts.
1035+
static_df (pandas, polars or spark DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous.
1036+
futr_df (pandas, polars or spark DataFrame, optional): DataFrame with [`unique_id`, `ds`] columns and `df`'s future exogenous.
10371037
h (int): The forecast horizon. Can be larger than the horizon set during training.
1038-
verbose (bool): Print processing steps. Defaults to False.
1038+
verbose (bool): Print processing steps.
10391039
engine (spark session): Distributed engine for inference. Only used if df is a spark dataframe or if fit was called on a spark dataframe.
1040-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1041-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1040+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1041+
quantiles (list of floats, optional): Alternative to level, target quantiles to predict.
10421042
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
10431043
10441044
Returns:
@@ -1360,24 +1360,24 @@ def cross_validation(
13601360
13611361
Args:
13621362
df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`, `ds`, `y`] and exogenous variables.
1363-
If None, a previously stored dataset is required. Defaults to None.
1363+
If None, a previously stored dataset is required.
13641364
static_df (pandas or polars DataFrame, optional): DataFrame with columns [`unique_id`] and static exogenous. Defaults to None.
1365-
n_windows (int): Number of windows used for cross validation. Defaults to 1.
1366-
step_size (int): Step size between each window. Defaults to 1.
1365+
n_windows (int): Number of windows used for cross validation.
1366+
step_size (int): Step size between each window.
13671367
val_size (int, optional): Length of validation size. If passed, set `n_windows=None`. Defaults to 0.
1368-
test_size (int, optional): Length of test size. If passed, set `n_windows=None`. Defaults to None.
1369-
use_init_models (bool, optional): Use initial model passed when object was instantiated. Defaults to False.
1370-
verbose (bool): Print processing steps. Defaults to False.
1368+
test_size (int, optional): Length of test size. If passed, set `n_windows=None`.
1369+
use_init_models (bool, optional): Use initial model passed when object was instantiated.
1370+
verbose (bool): Print processing steps.
13711371
refit (bool or int): Retrain model for each cross validation window.
13721372
If False, the models are trained at the beginning and then used to predict each window.
1373-
If positive int, the models are retrained every `refit` windows. Defaults to False.
1374-
id_col (str): Column that identifies each serie. Defaults to 'unique_id'.
1373+
If positive int, the models are retrained every `refit` windows.
1374+
id_col (str): Column that identifies each serie.
13751375
time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to 'ds'.
1376-
target_col (str): Column that contains the target. Defaults to 'y'.
1376+
target_col (str): Column that contains the target.
13771377
prediction_intervals (PredictionIntervals, optional): Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None.
1378-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1379-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1380-
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models. Defaults to None.
1378+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1379+
quantiles (list of floats, optional): Alternative to level, target quantiles to predict.
1380+
h (int, optional): Forecasting horizon. If None, uses the horizon of the fitted models.
13811381
data_kwargs (kwargs): Extra arguments to be passed to the dataset within each model.
13821382
13831383
Returns:
@@ -1533,9 +1533,9 @@ def predict_insample(
15331533
to predict historic values of a time series from the stored dataframe.
15341534
15351535
Args:
1536-
step_size (int): Step size between each window. Defaults to 1.
1537-
level (list of ints or floats, optional): Confidence levels between 0 and 100. Defaults to None.
1538-
quantiles (list of floats, optional): Alternative to level, target quantiles to predict. Defaults to None.
1536+
step_size (int): Step size between each window.
1537+
level (list of ints or floats, optional): Confidence levels between 0 and 100.
1538+
quantiles (list of floats, optional): Alternative to level, target quantiles to predict.
15391539
15401540
Returns:
15411541
fcsts_df (pandas.DataFrame): DataFrame with insample predictions for all fitted `models`.
@@ -1705,9 +1705,9 @@ def save(
17051705
17061706
Args:
17071707
path (str): Directory to save current status.
1708-
model_index (list, optional): List to specify which models from list of self.models to save. Defaults to None.
1709-
save_dataset (bool): Whether to save dataset or not. Defaults to True.
1710-
overwrite (bool): Whether to overwrite files or not. Defaults to False.
1708+
model_index (list, optional): List to specify which models from list of self.models to save.
1709+
save_dataset (bool): Whether to save dataset or not.
1710+
overwrite (bool): Whether to overwrite files or not.
17111711
"""
17121712
# Standardize path without '/'
17131713
if path[-1] == "/":
