Commit ada33c2

add perfect validation score q
1 parent 89bef8a commit ada33c2

1 file changed (+3, -0 lines changed)

articles/machine-learning/how-to-automl-forecasting-faq.md

Lines changed: 3 additions & 0 deletions
@@ -70,6 +70,9 @@ AutoML uses machine learning best practices, such as cross-validated model selec
- The training data uses **features whose future values aren't known**, up to the forecast horizon. AutoML's regression models currently assume all features are known to the forecast horizon. We advise you to explore your data before training and remove any feature columns that are known only historically.
- There are **significant structural differences (regime changes) between the training, validation, and test portions of the data**. For example, consider the effect of the COVID-19 pandemic on demand for almost any good during 2020 and 2021; this is a classic example of a regime change. Over-fitting due to regime change is the most challenging issue to address because it's highly scenario dependent and can require deep knowledge to identify. As a first line of defense, try to reserve 10-20% of the total history for validation or cross-validation data. It isn't always possible to reserve this much validation data if the training history is short, but it's a best practice. See our guide on [configuring validation](./how-to-auto-train-forecast.md#training-and-validation-data) for more information.
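
As a rough illustration of the 10-20% guideline above, the sketch below holds out the most recent portion of a time-ordered history for validation. The helper `split_history` and its `validation_fraction` parameter are hypothetical names used only for this example; they are not part of AutoML, which configures validation through its own settings.

```python
def split_history(rows, validation_fraction=0.2):
    """Split time-ordered rows into (train, validation).

    The most recent rows go to validation so that the model is always
    evaluated on data that comes after everything it was trained on.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    n_valid = max(1, round(len(rows) * validation_fraction))
    if n_valid >= len(rows):
        raise ValueError("history is too short to reserve validation data")
    return rows[:-n_valid], rows[-n_valid:]


# Hypothetical history: 100 observations, already sorted oldest to newest.
history = list(range(100))
train, valid = split_history(history, validation_fraction=0.2)
```

Splitting by position rather than at random keeps future observations out of the training portion, which is the point of the guideline. A very short history may not support a 10-20% holdout at all.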

## What does it mean if my training job achieves perfect validation scores?

It's possible to see perfect scores when viewing validation metrics from a training job. A perfect score means that the forecast and the actuals on the validation set are the same, or very nearly the same; for example, a root mean squared error of 0.0 or an R2 score of 1.0. A perfect validation score is _usually_ an indicator that the model is severely overfit, likely due to [data leakage](#how-can-i-prevent-over-fitting-and-data-leakage). The best course of action is to inspect the data for leaks and drop the columns that cause them.
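
One way to start inspecting the data for leaks is to look for feature columns that track the target almost exactly. The following is a minimal sketch of that heuristic in plain Python, not an AutoML feature: the helper names, the example columns, and the `0.999` threshold are all assumptions made for illustration. A near-perfect linear correlation is only one symptom of leakage, so a clean result here does not rule out a leak.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def suspect_leaks(columns, target, threshold=0.999):
    """Return feature names whose |correlation| with the target meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if name != target and abs(pearson(values, columns[target])) >= threshold
    ]


# Hypothetical training data: "demand_in_cases" is derived from the target.
data = {
    "demand": [10.0, 12.0, 9.0, 15.0, 11.0],
    "price": [5.0, 4.5, 5.5, 4.0, 5.0],
    "demand_in_cases": [1.0, 1.2, 0.9, 1.5, 1.1],
}
leaky = suspect_leaks(data, target="demand")

# Drop the suspect columns before training.
cleaned = {name: values for name, values in data.items() if name not in leaky}
```

Here `price` is strongly but not perfectly correlated with `demand`, so it is kept, while the rescaled copy of the target is flagged. Columns flagged this way still deserve a manual look before they are dropped, since a legitimately strong predictor can also score high.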

## What if my time series data doesn't have regularly spaced observations?
