
Optimisation for Evaluation Pipeline #344

@Sharkyii

Description

Detailed Description
The current evaluation pipeline relies on iterrows() and nested loops to generate forecast horizons and perform timestamp matching. For each test row, the implementation repeatedly applies boolean masking over the entire pv_data dataframe to find the closest timestamp within a ±5 minute window. This leads to repeated full-dataframe scans and unnecessary Python-level iteration.
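A minimal sketch of the pattern described above, assuming hypothetical column names (`timestamp`, `pv_output`, `target_time`); the real schema may differ. Each test row triggers a full boolean mask over `pv_data`, so the cost grows as O(N·M):

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
pv_data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=1000, freq="5min"),
    "pv_output": np.random.rand(1000),
})
test_df = pd.DataFrame({"target_time": pv_data["timestamp"].sample(100, random_state=0)})

window = pd.Timedelta("5min")
results = []
for _, row in test_df.iterrows():  # Python-level iteration over test rows
    # Boolean mask scans the entire pv_data frame on every iteration
    mask = (pv_data["timestamp"] - row["target_time"]).abs() <= window
    candidates = pv_data[mask]
    if not candidates.empty:
        # Pick the row whose timestamp is closest to the target
        pos = (candidates["timestamp"] - row["target_time"]).abs().argmin()
        results.append(candidates.iloc[pos]["pv_output"])
```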

Context
Evaluation runs may involve large datasets and 100k+ horizon expansions. Because the pipeline performs repeated filtering operations for each horizon and each test entry, runtime scales poorly as dataset size increases. The bottleneck appears to be algorithmic (row-wise iteration and repeated dataframe scans) rather than hardware-related.

Possible Implementation
Focus on algorithmic optimisation, for example replacing the row-wise loop and repeated boolean masking with a single vectorised nearest-timestamp join over sorted data. Optimising the data-processing logic should significantly improve performance and scalability.
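One possible approach (not a committed design, and column names are hypothetical) is `pandas.merge_asof`, which performs a single sorted nearest-match join with a tolerance window instead of one full-frame scan per test row:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
test_df = pd.DataFrame({
    "target_time": pd.to_datetime(["2024-01-01 10:02", "2024-01-01 10:33"]),
})
pv_data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01 10:00", periods=12, freq="5min"),
    "pv_output": np.arange(12, dtype=float),
})

# merge_asof requires both frames to be sorted on the join keys;
# direction="nearest" picks the closest timestamp on either side,
# and tolerance enforces the ±5 minute matching window.
matched = pd.merge_asof(
    test_df.sort_values("target_time"),
    pv_data.sort_values("timestamp"),
    left_on="target_time",
    right_on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("5min"),
)
```

This turns the per-row O(M) scan into a single O((N+M)) merge after sorting, which should matter most at the 100k+ horizon-expansion scale mentioned above.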

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests