
feat: Add script to fetch Belgium solar data from Elia (Issue #121)#132

Open
AswaniSahoo wants to merge 1 commit into openclimatefix:main from AswaniSahoo:feature/add-belgium-elia-solar-data

@AswaniSahoo

Pull Request

Description

This PR adds a new script to fetch solar generation data from the Elia (Belgium TSO) Open Data Platform. This is the foundational step required to extend PVNet models to Belgium (supporting Issue #121 — Country Coordination).

The new EliaData class allows users to fetch 15-minute resolution photovoltaic power generation data from Belgium's national grid operator. The Elia Open Data API is fully public — no API key is required.

Key features (following the patterns established in PR #127):

  • get_data() — Fetches raw solar generation data with automatic pagination (using params.copy() pattern from fetch_eia_data.py)
  • get_dataset() — Converts data to xarray.Dataset compatible with ocf-data-sampler (Zarr-compatible)
  • No API key needed — Elia data is fully public
  • 15-minute resolution measured & monitored solar capacity
  • Historical data available from 2012+
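
The pagination behaviour listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `fetch_all_records`, `fetch_page`, and `fake_fetch_page` are hypothetical names, the HTTP call is replaced by an injected callable so the sketch runs without network access, and the response shape (a `results` list plus a `total_count`) is an assumption about the records endpoint.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def fetch_all_records(
    fetch_page: Callable[[Dict[str, Any]], Page],
    where: str,
    limit: int = 100,
) -> List[Dict[str, Any]]:
    """Collect every record matching `where`, one page at a time."""
    base_params = {
        "where": where,
        "order_by": "datetime ASC",
        "limit": limit,
        "offset": 0,
    }
    records: List[Dict[str, Any]] = []
    while True:
        # Copy so each request gets its own dict and base_params is never mutated.
        params = base_params.copy()
        params["offset"] = len(records)
        page = fetch_page(params)
        results = page.get("results", [])
        records.extend(results)
        total = page.get("total_count")  # assumed field name
        # Stop on a short or empty page, or once the reported total is reached.
        if len(results) < limit or (total is not None and len(records) >= total):
            break
    return records


def fake_fetch_page(params: Dict[str, Any]) -> Page:
    """Stand-in for an HTTP GET: serves 250 fake records, no network involved."""
    data = [{"datetime": i} for i in range(250)]
    start = params["offset"]
    return {"total_count": len(data), "results": data[start:start + params["limit"]]}


records = fetch_all_records(fake_fetch_page, where="datetime >= '2024-01-01T00:00:00Z'")
```

With a page size of 100, the stand-in above is called three times (offsets 0, 100, and 200) before the short final page ends the loop.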

Data source: Elia Open Data — Solar PV (ods087)

Relates to #121


How Has This Been Tested?

I have tested this change in two ways:

  1. Manual Verification: Verified the script locally against the live Elia API. Successfully fetched 15-minute solar generation data for Belgium for a 24-hour period.
  2. Automated Tests: Added tests/test_elia.py which uses unittest.mock to verify the API request logic, pagination, parameter formatting, date filtering, xarray conversion, and error handling without requiring a real API connection.

Test results:

  • pytest tests/test_elia.py -v: all 9 tests pass
  • pytest tests/ -v: all 72 tests pass (no regressions)
  • pytest --cov: 95% coverage on fetch_elia_data.py (only the __main__ block is uncovered)

  • Yes


Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

Copilot AI review requested due to automatic review settings February 16, 2026 19:02

Copilot AI left a comment

Pull request overview

Adds Belgium solar PV generation ingestion via Elia’s public Opendatasoft API, providing a new data-fetching script and accompanying tests as a foundation for Belgium PVNet support (Issue #121).

Changes:

  • Introduce EliaData with get_data() (pagination + filtering) and get_dataset() (xarray conversion).
  • Add unit tests validating request construction, pagination, filtering, conversion, and error/empty handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
src/open_data_pvnet/scripts/fetch_elia_data.py | New Elia API client to fetch PV generation data and convert it to an xarray.Dataset.
tests/test_elia.py | Adds mocked tests for Elia request logic, pagination, and dataset conversion behavior.

value_cols = [
    c
    for c in df.columns
    if c not in ("datetime_utc", "resolutioncode", "mostrecent")

Copilot AI Feb 16, 2026

The exclusion list for value_cols includes "mostrecent", but the API field used in the tests/fixtures is "mostrecentforecast". As written, this filter won’t exclude the forecast column (and may indicate a typo/unused exclusion). Align the exclusion list with actual field names or remove the special-casing to avoid silently including/unintentionally coercing the wrong columns.

Suggested change
- if c not in ("datetime_utc", "resolutioncode", "mostrecent")
+ if c not in (
+     "datetime_utc",
+     "resolutioncode",
+     "mostrecent",
+     "mostrecentforecast",
+ )
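
To make the point of this comment concrete, here is a small sketch of why the original exclusion misses the forecast column. It assumes only the field names quoted in the comment; the `measured` column and the constant names are hypothetical, chosen for illustration.

```python
# Column names mirror those mentioned in the review comment;
# "measured" is a hypothetical value column added for illustration.
columns = ["datetime_utc", "resolutioncode", "mostrecentforecast", "measured"]

NON_VALUE_COLS = ("datetime_utc", "resolutioncode", "mostrecent")

# Tuple membership compares whole strings, not prefixes, so
# "mostrecentforecast" is not matched by "mostrecent" and slips through.
before = [c for c in columns if c not in NON_VALUE_COLS]

# Listing the actual field name excludes it as intended.
FIXED_NON_VALUE_COLS = NON_VALUE_COLS + ("mostrecentforecast",)
after = [c for c in columns if c not in FIXED_NON_VALUE_COLS]
```

Whether the forecast column should be excluded at all is a question for the PR author; the sketch only shows that the current list does not do so.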

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +61
        limit: int = 100,
    ) -> Optional[pd.DataFrame]:
        """
        Fetch solar generation data from the Elia Open Data API.

        Automatically paginates through all available results for the
        requested date range.

        Args:
            start_date: Start date string (YYYY-MM-DD)
            end_date: End date string (YYYY-MM-DD)
            dataset: Elia dataset identifier (default: ods087 for solar PV)
            limit: Number of records per API page (max 100)

        Returns:
            pd.DataFrame with solar generation records, or None if error/empty
        """
        url = f"{self.base_url}/{dataset}/records"

        where_clause = (
            f"datetime >= '{start_date}T00:00:00Z' "
            f"AND datetime <= '{end_date}T23:59:59Z'"
        )

        params = {
            "where": where_clause,
            "order_by": "datetime ASC",
            "limit": limit,
            "offset": 0,
        }
Copilot AI Feb 16, 2026

The docstring says limit is "max 100" but the method doesn’t enforce it. If callers pass a higher value, the API may error; consider validating/clamping limit (and also ensuring it’s > 0) so pagination logic remains reliable.
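
One way the validation suggested here might look is sketched below. `validate_limit` and `MAX_PAGE_SIZE` are hypothetical names, and the ceiling of 100 is taken from the docstring rather than confirmed against the API. The sketch clamps oversized values and rejects non-positive ones; raising on oversized values instead would be an equally reasonable choice.

```python
MAX_PAGE_SIZE = 100  # assumption: taken from the docstring's "max 100"


def validate_limit(limit: int, max_limit: int = MAX_PAGE_SIZE) -> int:
    """Return a page size in [1, max_limit], rejecting invalid input."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise TypeError(f"limit must be an int, got {type(limit).__name__}")
    if limit <= 0:
        # A zero or negative page size would stop pagination from advancing.
        raise ValueError(f"limit must be positive, got {limit}")
    # Clamp rather than fail when the caller asks for too many records per page.
    return min(limit, max_limit)
```

Calling this once at the top of `get_data()` would keep the pagination loop's step size within the range the docstring promises.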

Comment on lines +60 to +61
_, kwargs = mock_get.call_args
assert "ods087" in kwargs["params"]["where"] or "ods087" in _[0]
Copilot AI Feb 16, 2026

This assertion mixes checking the dataset ID in the URL with checking it in the where clause, but the dataset ID shouldn’t appear in where. As an OR, it also weakens the test (it could pass even if the URL is wrong). Consider asserting explicitly on the requested URL/path (and keep where assertions focused on date filtering).

Suggested change
_, kwargs = mock_get.call_args
assert "ods087" in kwargs["params"]["where"] or "ods087" in _[0]
args, kwargs = mock_get.call_args
assert "ods087" in args[0]
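
A self-contained sketch of the stricter assertion style is shown below. It is not the PR's test: `fetch_page` is a hypothetical helper that takes the `get` callable as an argument, so a `MagicMock` can be passed in directly instead of patching `requests.get`. The base URL is the one used in the PR.

```python
from unittest.mock import MagicMock

BASE_URL = "https://opendata.elia.be/api/explore/v2.1/catalog/datasets"


def fetch_page(get, dataset, where):
    """Hypothetical helper: issue one GET for a dataset's records."""
    url = f"{BASE_URL}/{dataset}/records"
    return get(url, params={"where": where, "limit": 100, "offset": 0}, timeout=30)


mock_get = MagicMock()
where = "datetime >= '2024-01-01T00:00:00Z' AND datetime <= '2024-01-01T23:59:59Z'"
fetch_page(mock_get, "ods087", where)

# call_args unpacks into the positional and keyword arguments of the last call.
args, kwargs = mock_get.call_args
requested_url = args[0]
sent_where = kwargs["params"]["where"]

# Check the dataset ID where it belongs (the URL path) ...
assert requested_url.endswith("/ods087/records")
# ... and keep the where-clause checks about date filtering only.
assert "ods087" not in sent_where
assert "2024-01-01T00:00:00Z" in sent_where
```

Separating the two checks means a wrong URL fails the test on its own, which the original `or` condition could not guarantee.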

Comment on lines +21 to +33
    def __init__(self) -> None:
        self.base_url = (
            "https://opendata.elia.be/api/explore/v2.1/catalog/datasets"
        )
        self.default_dataset = "ods087"

    def get_data(
        self,
        start_date: str,
        end_date: str,
        dataset: str = "ods087",
        limit: int = 100,
    ) -> Optional[pd.DataFrame]:
Copilot AI Feb 16, 2026

default_dataset is set on the instance but get_data()/get_dataset() hard-code their own default dataset="ods087", so the class has two sources of truth. Consider making dataset optional (default None) and falling back to self.default_dataset, or otherwise referencing self.default_dataset so future updates don’t drift.
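
The fallback described in this comment could look roughly like the sketch below. `EliaDataSketch` is a hypothetical, trimmed-down stand-in for the PR's `EliaData` class, and `records_url` is an illustrative method rather than part of the PR. Only the dataset-resolution logic is shown.

```python
from typing import Optional


class EliaDataSketch:
    """Hypothetical stand-in for the PR's EliaData class."""

    def __init__(self, default_dataset: str = "ods087") -> None:
        self.base_url = "https://opendata.elia.be/api/explore/v2.1/catalog/datasets"
        self.default_dataset = default_dataset

    def records_url(self, dataset: Optional[str] = None) -> str:
        """Build the records URL, falling back to the instance default."""
        # None means "use self.default_dataset", so the default lives in one place.
        resolved = dataset if dataset is not None else self.default_dataset
        return f"{self.base_url}/{resolved}/records"


client = EliaDataSketch()
default_url = client.records_url()
override_url = client.records_url("other_dataset")  # hypothetical dataset ID
```

With this shape, changing the default dataset means editing a single value in `__init__`, and `get_data()` and `get_dataset()` would both pick it up.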
