Description
Detailed Description
The current evaluation pipeline relies on iterrows() and nested loops to generate forecast horizons and to match timestamps. For each test row and each of its horizons, the implementation applies a boolean mask over the entire pv_data dataframe to find the closest timestamp within a ±5-minute window. Every lookup therefore costs a full-dataframe scan, on top of the Python-level iteration over rows and horizons.
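To make the bottleneck concrete, here is a minimal reconstruction of the pattern described above. It is not the project's code: the column names (`id`, `timestamp`, `power`), horizons expressed in minutes, the function name `match_loop` and the sample data are all hypothetical assumptions chosen for illustration.

```python
import pandas as pd

def match_loop(test_df, pv_data, horizons_min, window="5min"):
    """Hypothetical row-wise matcher: one full mask over pv_data per (row, horizon)."""
    tol = pd.Timedelta(window)
    rows = []
    for _, row in test_df.iterrows():  # Python-level loop over test rows
        for h in horizons_min:  # nested loop over forecast horizons (minutes)
            target = row["timestamp"] + pd.Timedelta(minutes=h)
            # This mask touches every row of pv_data for every single lookup
            mask = (pv_data["timestamp"] >= target - tol) & (pv_data["timestamp"] <= target + tol)
            candidates = pv_data[mask]
            if candidates.empty:
                power = float("nan")  # nothing within ±window
            else:
                # Closest timestamp inside the window
                best = (candidates["timestamp"] - target).abs().idxmin()
                power = candidates.at[best, "power"]
            rows.append({"id": row["id"], "horizon_min": h, "target": target, "power": power})
    return pd.DataFrame(rows)

# Hypothetical sample data
pv_data = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:12", "2024-01-01 00:31",
                                 "2024-01-01 00:44", "2024-01-01 01:03"]),
    "power": [0.0, 1.0, 2.0, 3.0, 4.0],
})
test_df = pd.DataFrame({"id": [1, 2],
                        "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:30"])})

result = match_loop(test_df, pv_data, horizons_min=[0, 15, 60])
print(result)
```

With N test rows, H horizons and M rows in pv_data, this does N × H masks of length M, so the work grows as N × H × M.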
Context
Evaluation runs may involve large datasets and more than 100k horizon expansions. Because the pipeline repeats the filtering for every horizon of every test entry, and each filter scans all of pv_data, the work grows roughly as (number of horizon expansions) × (rows in pv_data), so runtime scales poorly as the dataset grows. The bottleneck therefore appears to be algorithmic (row-wise iteration and repeated dataframe scans) rather than hardware-related.
Possible Implementation
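Focus on algorithmic optimization: remove the Python-level row-wise iteration and stop rescanning the whole pv_data dataframe for every horizon, so that all horizons and timestamp matches are processed in bulk.
This should significantly improve performance and scalability.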
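One possible approach, offered as a sketch rather than a tested patch for this project: expand the horizons with a cross join (`DataFrame.merge(how="cross")`, available in pandas 1.2+) and replace the per-row ±5-minute mask with `pandas.merge_asof` using `direction="nearest"` and a 5-minute `tolerance`. The column names (`id`, `timestamp`, `power`), minute-based horizons, the function name `match_vectorized` and the sample data are hypothetical assumptions.

```python
import pandas as pd

def match_vectorized(test_df, pv_data, horizons_min, window="5min"):
    """Hypothetical bulk matcher: cross join for horizons, merge_asof for nearest timestamps."""
    # One row per (test row, horizon), built without a Python loop
    horizons = pd.DataFrame({"horizon_min": list(horizons_min)})
    expanded = test_df.merge(horizons, how="cross")
    expanded["target"] = expanded["timestamp"] + pd.to_timedelta(expanded["horizon_min"], unit="min")
    expanded["_order"] = range(len(expanded))  # remember the original row order
    # merge_asof requires both sides to be sorted by their join keys
    expanded = expanded.sort_values("target")
    right = (pv_data[["timestamp", "power"]]
             .rename(columns={"timestamp": "pv_timestamp"})
             .sort_values("pv_timestamp"))
    matched = pd.merge_asof(
        expanded, right,
        left_on="target", right_on="pv_timestamp",
        direction="nearest",             # closest pv timestamp, before or after
        tolerance=pd.Timedelta(window),  # no match within ±window -> NaN power
    )
    # Restore the (test row, horizon) order that the nested loops would produce
    matched = matched.sort_values("_order").reset_index(drop=True)
    return matched[["id", "horizon_min", "target", "power"]]

# Hypothetical sample data
pv_data = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:12", "2024-01-01 00:31",
                                 "2024-01-01 00:44", "2024-01-01 01:03"]),
    "power": [0.0, 1.0, 2.0, 3.0, 4.0],
})
test_df = pd.DataFrame({"id": [1, 2],
                        "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:30"])})

result = match_vectorized(test_df, pv_data, horizons_min=[0, 15, 60])
print(result)
```

Here the cost is dominated by two sorts and one merge pass, roughly O((N·H + M) log(N·H + M)), instead of one full scan of pv_data per lookup. Before switching over, it is worth checking on a sample that this produces the same matches as the existing loop, including which timestamp wins when two candidates are equally close.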