|
78 | 78 | "source": [ |
79 | 79 | "## 2. Read the data\n", |
80 | 80 | "\n", |
81 | | - "For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats likes `.csv`. \n", |
| 81 | + "For this tutorial, we use part of the hourly M4 dataset. It is stored in a parquet file for efficiency. However, you can use ordinary pandas operations to read your data in other formats like `.csv`. \n", |
82 | 82 | "\n", |
83 | 83 | "The input to `NeuralForecast` is always a data frame in [long format](https://www.theanalysisfactor.com/wide-and-long-data/) with three columns: `unique_id`, `ds` and `y`:\n", |
84 | 84 | "\n", |
|
180 | 180 | "cell_type": "markdown", |
181 | 181 | "metadata": {}, |
182 | 182 | "source": [ |
183 | | - "For simplicity, we use only a single series to explore in detail the cross-validation functionality. Also, let's use the first 700 time steps, such that we work with round numbers, making it easier to visualize and understand cross-validation." |
| 183 | + "For simplicity, we focus on a single time series to explore the cross-validation functionality in detail. We also use only the first 700 time steps, which allows us to work with round numbers and makes the cross-validation process easier to visualize and understand." |
184 | 184 | ] |
185 | 185 | }, |
186 | 186 | { |
|
449 | 449 | "cell_type": "markdown", |
450 | 450 | "metadata": {}, |
451 | 451 | "source": [ |
452 | | - "In the figure above, we see that we have 4 cutoff points, which correspond to our four cross-validation windows. Of course, notice that the windows are set from the end of the dataset. That way, the model trains on past data to predict future data. \n", |
| 452 | + "In the figure above, we observe four cutoff points, each corresponding to a cross-validation window. Note that these windows are defined from the end of the dataset, ensuring that the model is trained on past data to predict future data.\n", |
453 | 453 | "\n", |
454 | 454 | ":::{.callout-warning collapse=\"true\"}\n", |
455 | 455 | "## Important note\n", |
|
655 | 655 | "metadata": {}, |
656 | 656 | "source": [ |
657 | 657 | "In the figure above, we see that our two folds overlap between time steps 601 and 650, since the step size is 50. This happens because:\n", |
| 658 | + "\n", |
658 | 659 | "- fold 1: model is trained using time steps 0 to 550 and predicts 551 to 650 (h=100)\n", |
659 | 660 | "- fold 2: model is trained using time steps 0 to 600 (`step_size=50`) and predicts 601 to 700\n", |
660 | 661 | "\n", |
661 | 662 | "Be aware that when evaluating a model trained with overlapping cross-validation windows, some time steps have more than one prediction. This may bias your evaluation metric, as the repeated time steps are taken into account in the metric multiple times." |
662 | 663 | ] |
| 664 | + }, |
| 665 | + { |
| 666 | + "cell_type": "markdown", |
| 667 | + "metadata": {}, |
| 668 | + "source": [] |
663 | 669 | } |
664 | 670 | ], |
665 | 671 | "metadata": { |
|
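The fold boundaries quoted in the last markdown cell (fold 1 predicts 551 to 650, fold 2 predicts 601 to 700) can be checked with a few lines of arithmetic. The sketch below is only an illustration: `cv_windows` is a hypothetical helper written for this note, not part of the `NeuralForecast` API, and it assumes windows are laid out from the end of a 700-step series with `h=100`, `step_size=50`, and two windows, as the notebook text describes.

```python
def cv_windows(n_steps, h, step_size, n_windows):
    """Return (cutoff, first_predicted, last_predicted) for each fold.

    Hypothetical helper: windows are placed from the end of the series,
    so the last fold always ends at n_steps.
    """
    windows = []
    for i in range(n_windows):
        # Earlier folds are shifted back from the series end by step_size each.
        last = n_steps - (n_windows - 1 - i) * step_size
        # The model trains on everything up to and including the cutoff.
        cutoff = last - h
        windows.append((cutoff, cutoff + 1, last))
    return windows


folds = cv_windows(n_steps=700, h=100, step_size=50, n_windows=2)

# Time steps predicted by both folds: from the latest start to the earliest end.
overlap = (max(f[1] for f in folds), min(f[2] for f in folds))

for cutoff, first, last in folds:
    print(f"train up to {cutoff}, predict {first} to {last}")
print(f"overlap: {overlap[0]} to {overlap[1]}")
```

Under these assumptions, the overlap appears whenever `step_size` is smaller than `h`; setting `step_size` equal to `h` would make the predicted ranges contiguous with no shared time steps.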