Extends PVNet to support the USA #108

Open
prasanna1504 wants to merge 12 commits into openclimatefix:main from prasanna1504:bug

@prasanna1504
Contributor

Pull Request

Description

Extends PVNet to support the United States by adding data ingestion for U.S. solar generation (EIA API) and GFS weather data processing. Enables training/validation for U.S. regions using the same CLI as UK.

Key Changes:

  • EIA Data Ingestion: fetch_eia_data.py and collect_eia_data.py to fetch hourly solar generation by Balancing Authority (7 major ISOs: CAISO, ERCOT, PJM, MISO, NYISO, ISO-NE, SPP)
  • GFS Processing: Complete pipeline that downloads GFS GRIB2 files from NOAA S3 and converts them to Zarr with channel filtering; supports --region us and --region global
  • US Config: Added gfs_us_data_config.yaml for US-specific GFS settings
  • CLI Integration: Extended GFS provider with --region flag (defaults to "global" for backward compatibility)
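
For reviewers, the pagination behaviour that the EIA fetcher needs can be sketched roughly as follows. This is not the code in fetch_eia_data.py. It assumes an offset/length style endpoint whose JSON body nests rows under `response.data`, and the names `iter_eia_rows`, `fake_fetch_page`, and the `CISO` respondent code are illustrative only. The page fetcher is injected so the loop can be exercised without network access or an API key.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical page fetcher signature: (offset, length) -> parsed JSON body.
PageFetcher = Callable[[int, int], Dict]


def iter_eia_rows(fetch_page: PageFetcher, page_size: int = 5000) -> Iterator[Dict]:
    """Yield rows from an offset/length paginated endpoint until it is exhausted."""
    offset = 0
    while True:
        body = fetch_page(offset, page_size)
        # Assumed response shape: {"response": {"data": [...]}}
        rows: List[Dict] = body.get("response", {}).get("data", [])
        if not rows:
            break
        yield from rows
        offset += len(rows)
        # A short page means there is nothing left to request.
        if len(rows) < page_size:
            break


def fake_fetch_page(offset: int, length: int) -> Dict:
    """Stand-in for a real HTTP call: serves 12 hourly rows in slices."""
    all_rows = [
        {"period": f"2025-01-01T{hour:02d}", "respondent": "CISO", "value": hour}
        for hour in range(12)
    ]
    return {"response": {"data": all_rows[offset : offset + length]}}


rows = list(iter_eia_rows(fake_fetch_page, page_size=5))
```

In a real client the injected function would wrap the HTTP request, which keeps the retry and error-handling logic separate from the paging loop and makes both easy to unit test.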

Fixes

Fixes #103

How Has This Been Tested?

  • Unit tests: Added test_eia_fetcher.py and test_collect_eia.py covering API client, data collection, pagination, and error handling

  • Integration: Verified GFS download from NOAA S3, GRIB→Zarr conversion, CLI --region us flag, and backward compatibility

  • Code quality: Formatted with black, linted with ruff, Google-style docstrings

  • Yes, I have tested this code

  • Yes, I have tested plotting changes (if data processing is affected)

Checklist

  • My code follows OCF's coding style guidelines (coding_style.md)
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@codecov

codecov bot commented Dec 28, 2025

Codecov Report

❌ Patch coverage is 50.48544% with 102 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.21%. Comparing base (12d4558) to head (582492f).
⚠️ Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/open_data_pvnet/nwp/gfs.py | 11.82% | 82 Missing ⚠️ |
| src/open_data_pvnet/scripts/fetch_eia_data.py | 83.05% | 10 Missing ⚠️ |
| src/open_data_pvnet/scripts/collect_eia_data.py | 82.35% | 9 Missing ⚠️ |
| src/open_data_pvnet/scripts/archive.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #108      +/-   ##
==========================================
+ Coverage   45.72%   46.21%   +0.48%     
==========================================
  Files          16       18       +2     
  Lines        1124     1413     +289     
==========================================
+ Hits          514      653     +139     
- Misses        610      760     +150     


@prasanna1504
Contributor Author

prasanna1504 commented Dec 28, 2025

@siddharth7113 @jcamier @peterdudfield hey, could you review this PR? It fixes #103.

@siddharth7113 siddharth7113 self-requested a review December 29, 2025 11:22
@prasanna1504 prasanna1504 marked this pull request as draft January 5, 2026 11:47
@prasanna1504
Contributor Author

Summary of Changes

1. New US Data Pipeline

A complete pipeline has been implemented to fetch, process, and store US solar generation data.

| Component | File(s) | Description |
| --- | --- | --- |
| Data Collection | collect_eia_data.py, fetch_eia_data.py | Pulls raw solar generation data from the EIA API |
| Preprocessing | preprocess_eia_for_sampler.py | Converts raw data into Zarr format for ocf-data-sampler. Mimics the UK GSP structure (coordinates, capacity, timestamps) |
| S3 Integration | upload_eia_to_s3.py | Uploads processed Zarr files to s3://ocf-open-data-pvnet/data/us/eia/ |
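
To illustrate the preprocessing step, here is a minimal sketch of the reshaping involved: long-format rows (one per timestamp and Balancing Authority) are pivoted into a time × location grid with integer location ids, which is the kind of layout a GSP-style dataset uses. It is not the logic in preprocess_eia_for_sampler.py. The field names `period`, `respondent`, and `value`, the 1-based ids, and the function name `pivot_generation` are assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def pivot_generation(
    rows: List[Dict], ba_order: List[str]
) -> Tuple[List[str], List[int], List[List[float]]]:
    """Pivot long rows into (times, location_ids, grid[time][location])."""
    # ISO-style period strings sort chronologically when sorted as text.
    times = sorted({row["period"] for row in rows})
    time_index = {t: i for i, t in enumerate(times)}
    loc_index = {ba: i for i, ba in enumerate(ba_order)}

    # Start from NaN so missing (time, location) pairs stay visibly missing.
    grid = [[math.nan] * len(ba_order) for _ in times]
    for row in rows:
        t = time_index[row["period"]]
        loc = loc_index[row["respondent"]]
        grid[t][loc] = float(row["value"])

    # Hypothetical 1-based integer ids, one per Balancing Authority.
    location_ids = list(range(1, len(ba_order) + 1))
    return times, location_ids, grid


sample_rows = [
    {"period": "2025-01-01T01", "respondent": "CISO", "value": 12},
    {"period": "2025-01-01T00", "respondent": "CISO", "value": 10},
    {"period": "2025-01-01T00", "respondent": "ERCO", "value": 5},
]
times, location_ids, grid = pivot_generation(sample_rows, ["CISO", "ERCO"])
```

The resulting arrays could then be wrapped in an xarray Dataset and written out with `Dataset.to_zarr`. Keeping gaps as NaN rather than zero avoids presenting missing reports as zero generation.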

2. GFS (Weather) Updates

  • Global Support: Modified src/open_data_pvnet/nwp/gfs.py to better handle Global Forecast System (GFS) data.
  • Logic Fixes: Enabled channel filtering and improved region handling for processing weather data over the US.
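
As a rough illustration of what channel filtering and region handling involve, the sketch below shows the two pieces of logic in isolation. It is not the code in gfs.py. The bounding box, the channel names, and the function names are assumptions made for the example. One detail worth noting: GFS grids typically use 0 to 360 degree longitudes and list latitude from north to south, so a box given in the -180 to 180 convention has to be converted before slicing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bounding boxes (lat_min, lat_max, lon_min, lon_max) in degrees,
# with longitudes in the -180..180 convention. None means "no cropping".
REGION_BOUNDS: Dict[str, Optional[Tuple[float, float, float, float]]] = {
    "global": None,
    "us": (24.0, 50.0, -125.0, -66.0),  # rough contiguous-US box, an assumption
}


def to_gfs_longitude(lon: float) -> float:
    """Convert a -180..180 longitude to the 0..360 convention."""
    return lon % 360.0


def region_slices(region: str = "global") -> Optional[Dict[str, Tuple[float, float]]]:
    """Return (start, stop) pairs per coordinate, or None for the full grid."""
    if region not in REGION_BOUNDS:
        raise ValueError(f"Unknown region: {region!r}")
    bounds = REGION_BOUNDS[region]
    if bounds is None:
        return None
    lat_min, lat_max, lon_min, lon_max = bounds
    return {
        # Latitude is assumed to be stored north to south, so start at lat_max.
        "latitude": (lat_max, lat_min),
        "longitude": (to_gfs_longitude(lon_min), to_gfs_longitude(lon_max)),
    }


def filter_channels(available: List[str], wanted: List[str]) -> List[str]:
    """Keep only the wanted channels, preserving the order of the source data."""
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"Requested channels not in dataset: {missing}")
    return [name for name in available if name in wanted]


us_slices = region_slices("us")
kept = filter_channels(["t2m", "dswrf", "tcc"], ["dswrf", "t2m"])
```

With xarray, the returned pairs could be passed to `ds.sel(latitude=slice(*pair), longitude=slice(*pair))`. Failing loudly on a missing channel, rather than silently dropping it, makes config mistakes show up at conversion time instead of during training.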

3. Verification & Testing

  • Compatibility Checks: Added test_eia_sampler_compatibility.py to verify the processed US data loads correctly in the training sampler.
  • Unit Tests: Comprehensive tests added for the new fetchers and uploaders (test_collect_eia.py, test_upload_s3.py).

4. Documentation

  • New Guide: Created docs/us_data_preprocessing.md detailing the full workflow: fetching → preprocessing → uploading to S3.
  • Getting Started: Updated docs/getting_started.md with code snippets for accessing US EIA data from S3 using xarray.
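
For anyone who wants to try the data before reading the docs, a minimal sketch of opening a processed store with xarray might look like this. It assumes the bucket permits anonymous reads and that `s3fs` and `zarr` are installed. The store name passed in is a placeholder, since the actual object names under the prefix are not listed in this PR, and the helper names are hypothetical.

```python
EIA_PREFIX = "s3://ocf-open-data-pvnet/data/us/eia/"  # prefix quoted in this PR


def eia_store_url(store_name: str, prefix: str = EIA_PREFIX) -> str:
    """Join the S3 prefix and a Zarr store name without doubling slashes."""
    return prefix.rstrip("/") + "/" + store_name.strip("/")


def open_eia_dataset(store_name: str, anon: bool = True):
    """Open a Zarr store lazily; nothing is downloaded until data is accessed."""
    import xarray as xr  # imported here so the URL helper works without xarray

    # anon=True is an assumption that the bucket allows unauthenticated reads.
    return xr.open_zarr(eia_store_url(store_name), storage_options={"anon": anon})


# "example.zarr" is a placeholder name, not a real object in the bucket.
url = eia_store_url("example.zarr")
```

Calling `open_eia_dataset` with a real store name returns a lazily loaded dataset, so inspecting its coordinates and variables is cheap even for large stores.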

Let me know if you have any questions or would like any changes!

@prasanna1504 prasanna1504 marked this pull request as ready for review January 5, 2026 13:45
Development

Successfully merging this pull request may close these issues.

[META] Extend PVNet solar generation model to the United States