
Added Germany PV & GFS data download pipelines #124

Draft
Sharkyii wants to merge 10 commits into openclimatefix:main from Sharkyii:data/Germany

Conversation

@Sharkyii

Description

This PR adds automated data download pipelines for Germany, including:

  • Solar PV generation data using the Bundesnetzagentur SMARD API (15-minute resolution); see the sketch below
  • GFS meteorological data download notebook for Germany using NOAA public datasets
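For reviewers unfamiliar with SMARD, here is a minimal sketch of the kind of request the first bullet describes. The chart_data URL pattern, the filter ID 4068 (realised photovoltaic generation), region "DE", and resolution "quarterhour" are assumptions about SMARD's public API, not necessarily what this PR's script does:

import requests
import pandas as pd

BASE = "https://www.smard.de/app/chart_data"
FILTER_ID = 4068            # assumed SMARD filter ID: realised PV generation
REGION = "DE"
RESOLUTION = "quarterhour"  # 15-minute values

def download_pv_generation(n_chunks: int = 4) -> pd.DataFrame:
    # The index endpoint lists the start timestamps (epoch ms) of available data chunks.
    index_url = f"{BASE}/{FILTER_ID}/{REGION}/index_{RESOLUTION}.json"
    timestamps = requests.get(index_url, timeout=30).json()["timestamps"]
    frames = []
    for ts in timestamps[-n_chunks:]:  # only the most recent chunks, for brevity
        data_url = f"{BASE}/{FILTER_ID}/{REGION}/{FILTER_ID}_{REGION}_{RESOLUTION}_{ts}.json"
        series = requests.get(data_url, timeout=30).json()["series"]  # [[epoch_ms, MW], ...]
        df = pd.DataFrame(series, columns=["time_ms", "pv_mw"])
        df["time"] = pd.to_datetime(df["time_ms"], unit="ms", utc=True)
        frames.append(df[["time", "pv_mw"]])
    return pd.concat(frames, ignore_index=True).dropna()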

I will proceed with Step 5 (Model Training Pipeline Integration).

Relates #121

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@Sharkyii Sharkyii marked this pull request as ready for review January 22, 2026 17:44
@Sharkyii Sharkyii marked this pull request as draft January 22, 2026 17:44
@Sharkyii
Author

@peterdudfield So far I have downloaded one year of data and uploaded the generation data for Germany here:
Hugging Face: https://huggingface.co/datasets/Shark26/germany_pv_data
S3: s3://germany_pv_data
I will soon test the data by training the model.
Could you please check it once, @siddharth7113?
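For anyone who wants to pull the uploaded files for a quick look, a minimal sketch using huggingface_hub (the on-disk format of the dataset is not stated in this thread, so this only fetches the raw files):

from huggingface_hub import snapshot_download

# Download the raw dataset files from the Hub into the local cache and print the path.
local_dir = snapshot_download(repo_id="Shark26/germany_pv_data", repo_type="dataset")
print(local_dir)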

Removed hardcoded zarr paths for GSP and GFS data in Germany configuration.
@Sharkyii
Author

Sharkyii commented Feb 3, 2026

@peterdudfield @siddharth7113
I added scripts to download the data, but I am still not able to run save_samples and train the model, because of this error:

python save_samples.py +datamodule.sample_output_dir="./samples" +datamodule.num_train_samples=10 +datamodule.num_val_samples=5

CONFIG
├── trainer
│ └── _target_: lightning.pytorch.trainer.trainer.Trainer
│ accelerator: auto
│ devices: auto
│ min_epochs: null
│ max_epochs: null
│ reload_dataloaders_every_n_epochs: 0
│ num_sanity_val_steps: 8
│ fast_dev_run: false
│ accumulate_grad_batches: 4
│ log_every_n_steps: 50

├── model
│ └── _target_: pvnet.models.multimodal.multimodal.Model
│ output_quantiles:
│ - 0.02
│ - 0.1
│ - 0.25
│ - 0.5
│ - 0.75
│ - 0.9
│ - 0.98
│ nwp_encoders_dict:
│ gfs:
│ _target_: pvnet.models.multimodal.encoders.encoders3d.ResConv3DNet2
│ _partial_: true
│ in_channels: 14
│ out_features: 32
│ n_res_blocks: 1
│ hidden_channels: 6
│ image_size_pixels: 2
│ output_network:
│ _target_: pvnet.models.multimodal.linear_networks.networks.ResFCNet2
│ _partial_: true
│ fc_hidden_features: 128
│ n_res_blocks: 6
│ res_block_layers: 2
│ dropout_frac: 0.0
│ embedding_dim: 16
│ include_sun: true
│ include_gsp_yield_history: false
│ include_site_yield_history: false
│ forecast_minutes: 480
│ history_minutes: 60
│ nwp_history_minutes:
│ gfs: 180
│ nwp_forecast_minutes:
│ gfs: 540
│ nwp_interval_minutes:
│ gfs: 180
│ optimizer:
│ _target_: pvnet.optimizers.EmbAdamWReduceLROnPlateau
│ lr: 0.0001
│ weight_decay: 0.01
│ amsgrad: true
│ patience: 5
│ factor: 0.1
│ threshold: 0.002

├── datamodule
│ └── _target_: pvnet.data.DataModule
│ configuration: C:\Users\SNEH\open-data-pvnet\src\open_data_pvnet\configs\PVNet_configs\datamodule\configuration\germany_configuration.yaml
│ num_workers: 8
│ prefetch_factor: 2
│ batch_size: 8
│ train_period:
│ - null
│ - '2023-06-30'
│ val_period:
│ - '2023-07-01'
│ - '2023-12-31'
│ sample_output_dir: ./samples
│ num_train_samples: 10
│ num_val_samples: 5

├── callbacks
│ └── early_stopping:
│ _target_: lightning.pytorch.callbacks.EarlyStopping
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ patience: 10
│ min_delta: 0
│ learning_rate_monitor:
│ _target_: lightning.pytorch.callbacks.LearningRateMonitor
│ logging_interval: epoch
│ model_summary:
│ _target_: lightning.pytorch.callbacks.ModelSummary
│ max_depth: 3
│ model_checkpoint:
│ _target_: lightning.pytorch.callbacks.ModelCheckpoint
│ monitor: ${resolve_monitor_loss:${model.output_quantiles}}
│ mode: min
│ save_top_k: 1
│ save_last: true
│ every_n_epochs: 1
│ verbose: false
│ filename: epoch={epoch}-step={step}
│ dirpath: PLACEHOLDER/${model_name}
│ auto_insert_metric_name: false
│ save_on_train_epoch_end: false

├── logger
│ └── wandb:
│ _target_: lightning.pytorch.loggers.wandb.WandbLogger
│ project: GFS_TEST_RUN
│ name: ${model_name}
│ save_dir: PLACEHOLDER
│ offline: false
│ id: ${oc.env:WANDB_RUN_ID}
│ log_model: true
│ prefix: ''
│ job_type: train
│ group: ''
│ tags: []

└── seed
└── 2727831
----- Saving val samples -----
Error executing job with overrides: ['+datamodule.sample_output_dir=./samples', '+datamodule.num_train_samples=10', '+datamodule.num_val_samples=5']
Traceback (most recent call last):
File "C:\Users\open-data-pvnet\src\open_data_pvnet\scripts\save_samples.py", line 171, in main
val_dataset = get_dataset(
^^^^^^^^^^^^
File "C:\Users\open-data-pvne\src\open_data_pvnet\scripts\save_samples.py", line 106, in get_dataset
return dataset_cls(config_path, start_time=start_time, end_time=end_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 254, in init
super().init(config_filename, start_time, end_time, gsp_ids)
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\torch_datasets\datasets\pvnet_uk.py", line 100, in init
datasets_dict = get_dataset_dict(config.input_data, gsp_ids=gsp_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\load_dataset.py", line 24, in get_dataset_dict
da_gsp = open_gsp(
^^^^^^^^^
File "C:\Users\SNEH\AppData\Local\Programs\Python\Python312\Lib\site-packages\ocf_data_sampler\load\gsp.py", line 60, in open_gsp
raise ValueError(
ValueError: Some GSP IDs in the GSP generation data are not available in the locations file.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
PS C:\Users\open-data-pvnet\src\open_data_pvnet
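The ValueError means the gsp_id values in the generation data and the locations file disagree. A quick way to list the offending IDs, with hypothetical paths standing in for the ones configured in germany_configuration.yaml (and assuming the locations file is a CSV with a gsp_id column):

import pandas as pd
import xarray as xr

# Hypothetical paths: substitute the GSP zarr_path and locations file from germany_configuration.yaml.
gen = xr.open_zarr("germany_pv_data.zarr")
locations = pd.read_csv("germany_gsp_locations.csv")

gen_ids = set(gen["gsp_id"].values.tolist())
loc_ids = set(locations["gsp_id"].tolist())

# Any IDs printed here must be added to the locations file, or dropped from the
# generation data, before save_samples.py can build its dataset.
print("In generation data but missing from locations:", sorted(gen_ids - loc_ids))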

Updated zarr_path to an empty string for flexibility.
Refactor process_grib function to handle multiple levels and improve error handling.
Contributor

We only put notebooks in this directory. For the solar data, I would recommend using scripts/nwp.

Author

Sure, will modify it.

Contributor

Please avoid using emojis in the code; also, this file is missing its extension.

@Sharkyii
Author

@siddharth7113 how could I overcome the above error? Should I use my own model temporarily?

@siddharth7113
Contributor

Hi @Sharkyii ,

I have left some comments on current code.

My advice is to do the following things in separate PRs, preferably in this order:

  1. Write a script to download the data; make sure the script is clean, with docstrings and tests (something that has also been missing from the old scripts), and put it in scripts/generation (see the sketch after this list).

  2. Once this is done, I would recommend getting the NWP data using the GFS script and then writing the model configs. Test these and open a draft PR for them.
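As a rough illustration of the docstrings-and-tests shape point 1 asks for, with invented file and function names (nothing here is prescribed by the repo):

# scripts/generation/download_germany_pv.py  (hypothetical module)
import pandas as pd

def save_generation(df: pd.DataFrame, output_path: str) -> None:
    """Write a PV generation dataframe to CSV at the given output path."""
    df.to_csv(output_path, index=False)

# tests/test_download_germany_pv.py  (hypothetical pytest module)
def test_save_generation(tmp_path):
    df = pd.DataFrame({"time": ["2023-01-01T00:00"], "pv_mw": [0.0]})
    out = tmp_path / "pv.csv"
    save_generation(df, str(out))
    assert out.exists()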

@siddharth7113
Contributor

> @peterdudfield @siddharth7113 I added scripts to download the data, but I am still not able to run save_samples and train the model, because of this error [...]

save_samples.py is for making samples once the dataset is downloaded and in a format compatible with ocf-data-sampler; it is used to create samples in the format expected for PVNet training.

@siddharth7113
Contributor

> @siddharth7113 how could I overcome the above error? Should I use my own model temporarily?

refer to this
