Description
Problem
The forecasting model currently ingests all available features without filtering for multicollinearity. This is particularly problematic for weather data, where features like maximum temperature (tmax) and sun duration (tsun) are often highly correlated (> 0.9).
While the Gradient Boosting model handles this gracefully for prediction accuracy, it harms interpretability. The model may arbitrarily split importance between two redundant features, making it impossible for the user to know which factor is the true trigger.
Proposed Solution
Implement an automated, efficient feature selection step using a Correlation Matrix Filter. This was chosen over Recursive Feature Elimination (RFE) for its speed and deterministic nature, which is critical for a desktop application user experience.
Implementation Details
- Algorithm: Pearson Correlation.
- Threshold: > 0.90 (Value should be defined as a constant).
- Process:
  - Calculate the correlation matrix: `df.corr().abs()`.
  - Iterate through the upper triangle of the matrix.
  - Identify columns with a correlation coefficient higher than the threshold.
  - Drop one column from each correlated pair.
- Tie-breaking: If features A and B are correlated, prefer keeping the one that is more "raw" or has fewer missing values if possible. Otherwise, simply drop the second one encountered to ensure deterministic behavior.
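The steps above could be sketched as follows. This is a minimal illustration, not the final implementation; the function and constant names are placeholders, not taken from the codebase:

```python
import numpy as np
import pandas as pd

# Threshold defined as a named constant, per the issue; name is illustrative.
CORRELATION_THRESHOLD = 0.90

def drop_correlated_features(df: pd.DataFrame,
                             threshold: float = CORRELATION_THRESHOLD) -> pd.DataFrame:
    """Drop one feature from each pair whose absolute Pearson correlation
    exceeds `threshold`. Columns are scanned in order, so the second
    (later) column of each correlated pair is dropped deterministically."""
    corr = df.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each
    # pair is inspected exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```

Scanning only the upper triangle and dropping the later column of each pair gives the deterministic behavior the tie-breaking rule asks for.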
Location
- Implement the filtering logic in `forecasting/feature_engine.py` or a new utility module.
- Call this filter in `forecasting/train_model.py` immediately after data processing and before model training.
Acceptance Criteria
- Unit test proving that perfectly correlated features result in one being dropped.
- Pipeline runs without errors.
- Feature importance outputs (if exposed) no longer show diluted signals spread across redundant variables.
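A unit test for the first criterion could look like this. The filter is inlined here as a stand-in; in the real test it would be imported from wherever the logic lands (e.g. `forecasting/feature_engine.py`), and all names are illustrative:

```python
import numpy as np
import pandas as pd

# Stand-in for the proposed filter; the real test would import it
# from the module where it is implemented.
def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.90) -> pd.DataFrame:
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return df.drop(columns=[c for c in upper.columns if (upper[c] > threshold).any()])

def test_perfectly_correlated_feature_is_dropped():
    df = pd.DataFrame({
        "tmax": [1.0, 2.0, 3.0, 4.0],
        "tsun": [2.0, 4.0, 6.0, 8.0],   # tsun = 2 * tmax, correlation == 1.0
        "prcp": [5.0, 1.0, 4.0, 2.0],   # weakly correlated with the others
    })
    filtered = drop_correlated_features(df)
    assert "tmax" in filtered.columns      # first of the pair is kept
    assert "tsun" not in filtered.columns  # second of the pair is dropped
    assert "prcp" in filtered.columns      # uncorrelated feature survives
```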