fix(models): pad predictions to match index length in prepare_prediction_models

cyiallou · cyiallou · commit 3715a96d7802 · 2025-05-02T23:43:58.000+02:00
Fix a bug in `prepare_prediction_models` where model outputs could
mismatch the expected index length when the available data is smaller
than the configured moving average window size.

Instead of slicing and risking index errors, predictions are now
right-aligned to the reference index: missing leading entries are padded
with NaN, and surplus predictions are truncated from the left.

This makes the model preparation robust even with minimal data and prevents
silent failures or incorrect series construction.

Signed-off-by: cyiallou - Costas <42914163+cyiallou@users.noreply.github.com>
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -14,3 +14,4 @@
 ## Bug Fixes
 
 * Fixed a bug where the Solar Maintenance app would crash if some requested inverter components were missing from the reporting data. Missing components are now handled gracefully with a warning.
+* Fixed a bug in prediction model preparation where predictions could fail or mismatch the expected index when using minimal data and a large moving average window. Predictions are now correctly aligned with the data index, with missing values padded with NaN.
diff --git a/src/frequenz/lib/notebooks/solar/maintenance/models.py b/src/frequenz/lib/notebooks/solar/maintenance/models.py
@@ -145,15 +145,7 @@ def prepare_prediction_models(
         predictions = model_function(data=data_to_use, **model_params)
 
         if not isinstance(predictions, pd.Series):
-            if len(data_to_use) - len(predictions) > 0:
-                tmp_series = pd.Series(data=np.nan, index=data_to_use.index)
-                tmp_series.iloc[-len(predictions) :] = predictions
-                predictions = tmp_series
-            else:
-                predictions = pd.Series(
-                    data=predictions, index=data_to_use.index[-len(predictions) :]
-                )
-            predictions.name = "predictions"
+            predictions = _align_predictions_to_index(predictions, data_to_use.index)
 
         prediction_models[label] = {"predictions": predictions}
 
@@ -558,3 +550,34 @@ def _get_pvgis_hourly(
     predictions: SeriesFloat = simulation_data["power_W"]
     predictions.name = "predictions"
     return predictions
+
+
+def _align_predictions_to_index(
+    predictions: NDArray[np.float64],
+    reference_index: pd.Index[Any],
+) -> SeriesFloat:
+    """Align predictions to a reference index, padding with NaNs if necessary.
+
+    Args:
+        predictions: The prediction outputs.
+        reference_index: The DataFrame index to align to.
+
+    Returns:
+        A pandas Series with predictions aligned to the reference index.
+
+    Note:
+        If predictions are longer than the reference index, predictions will
+        be truncated from the left. If predictions are shorter, they are
+        right-aligned and earlier entries are filled with NaN.
+    """
+    reference_length = len(reference_index)
+    prediction_length = len(predictions)
+
+    if prediction_length > reference_length:
+        predictions = predictions[-reference_length:]
+    elif prediction_length < reference_length:
+        padded_predictions = np.full(reference_length, np.nan, dtype=np.float64)
+        padded_predictions[-prediction_length:] = predictions
+        predictions = padded_predictions
+
+    return pd.Series(data=predictions, index=reference_index, name="predictions")
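
To make the three cases in the helper's docstring concrete, here is a minimal standalone sketch that mirrors the logic of `_align_predictions_to_index` from the diff above. The function name without the leading underscore, the sample index, and the sample values are assumptions made for illustration; they are not part of the commit.

```python
import numpy as np
import pandas as pd


def align_predictions_to_index(predictions, reference_index):
    """Illustrative copy of the helper's logic, not the library's own code."""
    reference_length = len(reference_index)
    prediction_length = len(predictions)

    if prediction_length > reference_length:
        # Too many predictions: keep only the most recent ones.
        predictions = predictions[-reference_length:]
    elif prediction_length < reference_length:
        # Too few predictions: right-align them and fill earlier slots with NaN.
        padded = np.full(reference_length, np.nan, dtype=np.float64)
        padded[-prediction_length:] = predictions
        predictions = padded

    return pd.Series(data=predictions, index=reference_index, name="predictions")


# Hypothetical sample data: five daily timestamps.
index = pd.date_range("2025-01-01", periods=5, freq="D")

# Shorter than the index (e.g. a moving average that drops the first two points).
shorter = align_predictions_to_index(np.array([1.0, 2.0, 3.0]), index)

# Longer than the index: the oldest two values are dropped.
longer = align_predictions_to_index(np.arange(7, dtype=np.float64), index)

print(shorter)
print(longer)
```

In this sketch both results share the full five-entry index, so they can be stored side by side with the input data. One edge case is not covered by the branches as written: with a zero-length `predictions` array, `padded[-0:]` selects the whole array, so the assignment raises a NumPy broadcasting error rather than returning an all-NaN series.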
