feat: Add script to fetch Belgium solar data from Elia (Issue #121)#132
feat: Add script to fetch Belgium solar data from Elia (Issue #121)#132AswaniSahoo wants to merge 1 commit intoopenclimatefix:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds Belgium solar PV generation ingestion via Elia’s public Opendatasoft API, providing a new data-fetching script and accompanying tests as a foundation for Belgium PVNet support (Issue #121).
Changes:
- Introduce
EliaDatawithget_data()(pagination + filtering) andget_dataset()(xarray conversion). - Add unit tests validating request construction, pagination, filtering, conversion, and error/empty handling.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/open_data_pvnet/scripts/fetch_elia_data.py |
New Elia API client to fetch PV generation data and convert it to an xarray.Dataset. |
tests/test_elia.py |
Adds mocked tests for Elia request logic, pagination, and dataset conversion behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| value_cols = [ | ||
| c | ||
| for c in df.columns | ||
| if c not in ("datetime_utc", "resolutioncode", "mostrecent") |
There was a problem hiding this comment.
The exclusion list for value_cols includes "mostrecent", but the API field used in the tests/fixtures is "mostrecentforecast". As written, this filter won’t exclude the forecast column (and may indicate a typo/unused exclusion). Align the exclusion list with actual field names or remove the special-casing to avoid silently including/unintentionally coercing the wrong columns.
| if c not in ("datetime_utc", "resolutioncode", "mostrecent") | |
| if c not in ( | |
| "datetime_utc", | |
| "resolutioncode", | |
| "mostrecent", | |
| "mostrecentforecast", | |
| ) |
| limit: int = 100, | ||
| ) -> Optional[pd.DataFrame]: | ||
| """ | ||
| Fetch solar generation data from the Elia Open Data API. | ||
|
|
||
| Automatically paginates through all available results for the | ||
| requested date range. | ||
|
|
||
| Args: | ||
| start_date: Start date string (YYYY-MM-DD) | ||
| end_date: End date string (YYYY-MM-DD) | ||
| dataset: Elia dataset identifier (default: ods087 for solar PV) | ||
| limit: Number of records per API page (max 100) | ||
|
|
||
| Returns: | ||
| pd.DataFrame with solar generation records, or None if error/empty | ||
| """ | ||
| url = f"{self.base_url}/{dataset}/records" | ||
|
|
||
| where_clause = ( | ||
| f"datetime >= '{start_date}T00:00:00Z' " | ||
| f"AND datetime <= '{end_date}T23:59:59Z'" | ||
| ) | ||
|
|
||
| params = { | ||
| "where": where_clause, | ||
| "order_by": "datetime ASC", | ||
| "limit": limit, | ||
| "offset": 0, | ||
| } |
There was a problem hiding this comment.
The docstring says limit is "max 100" but the method doesn’t enforce it. If callers pass a higher value, the API may error; consider validating/clamping limit (and also ensuring it’s > 0) so pagination logic remains reliable.
| _, kwargs = mock_get.call_args | ||
| assert "ods087" in kwargs["params"]["where"] or "ods087" in _[0] |
There was a problem hiding this comment.
This assertion mixes checking the dataset ID in the URL with checking it in the where clause, but the dataset ID shouldn’t appear in where. As an OR, it also weakens the test (it could pass even if the URL is wrong). Consider asserting explicitly on the requested URL/path (and keep where assertions focused on date filtering).
| _, kwargs = mock_get.call_args | |
| assert "ods087" in kwargs["params"]["where"] or "ods087" in _[0] | |
| args, kwargs = mock_get.call_args | |
| assert "ods087" in args[0] |
| def __init__(self) -> None: | ||
| self.base_url = ( | ||
| "https://opendata.elia.be/api/explore/v2.1/catalog/datasets" | ||
| ) | ||
| self.default_dataset = "ods087" | ||
|
|
||
| def get_data( | ||
| self, | ||
| start_date: str, | ||
| end_date: str, | ||
| dataset: str = "ods087", | ||
| limit: int = 100, | ||
| ) -> Optional[pd.DataFrame]: |
There was a problem hiding this comment.
default_dataset is set on the instance but get_data()/get_dataset() hard-code their own default dataset="ods087", so the class has two sources of truth. Consider making dataset optional (default None) and falling back to self.default_dataset, or otherwise referencing self.default_dataset so future updates don’t drift.
Pull Request
Description
This PR adds a new script to fetch solar generation data from the Elia (Belgium TSO) Open Data Platform. This is the foundational step required to extend PVNet models to Belgium (supporting Issue #121 — Country Coordination).
The new
EliaDataclass allows users to fetch 15-minute resolution photovoltaic power generation data from Belgium's national grid operator. The Elia Open Data API is fully public — no API key is required.Key features (following the patterns established in PR #127):
get_data()— Fetches raw solar generation data with automatic pagination (usingparams.copy()pattern fromfetch_eia_data.py)get_dataset()— Converts data toxarray.Datasetcompatible with ocf-data-sampler (Zarr-compatible)Data source: Elia Open Data — Solar PV (ods087)
Relates to #121
How Has This Been Tested?
I have tested this change in two ways:
tests/test_elia.pywhich usesunittest.mockto verify the API request logic, pagination, parameter formatting, date filtering, xarray conversion, and error handling without requiring a real API connection.Test results:
pytest tests/test_elia.py -v— all 9 tests passpytest tests/ -v— all 72 tests pass (no regressions)pytest --cov— 95% coverage onfetch_elia_data.py(only__main__block uncovered)Yes
Checklist: