Add WGMS data by albangossard · Pull Request #177 · ODINN-SciML/MassBalanceMachine

albangossard · 2026-03-24T16:09:52Z

Includes the preprocessing of WGMS data added by @JoachimPiret in #172 + some fixes and unit tests

* Improve the efficiency of two methods of class AggregatedDataset(): init() and mapSplitsToDataset().
* Allow saving the dataframe in parquet format in addition to csv format.
* Add the possibility to split test and train based on subregion (c-region), as well as the possibility to add randomness so that the split differs from one sampling to the next in set_train_test_split(). assign_train_test_indices(self, train_indices, test_indices, test_size) is defined to update the dataloader with the selected test/train division after 10 samplings based on subregion.
* Adaptation of dataset.py so that _get_output_filename() can choose between csv and parquet output.
* Alban's feedback on PR: mapSplitsToDataset() and init() made more efficient for large datasets; output format can be csv or parquet.
* Alban's feedback on PR: split on subregion added; _create_group_kfold_splits() modified to cross-validate on subregion.
* Add function to plot the test and train datasets (SMB versus elevation).
* Adaptation of dataloader to answer review of #158 and new plot functions.
* To answer review of #158.
* Correct __init__ for mbm plot.
* Preprocess WGMS data to be used by MBM #169.

---------

Co-authored-by: Alban Gossard <alban.paul.gossard@gmail.com>
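The subregion-based train/test split mentioned in the commit message could look roughly like the sketch below. It uses only the standard library and is not the code in set_train_test_split(); the function name split_by_subregion, the record schema with a "subregion" key, the subregion codes, and the seed parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_subregion(records, test_size=0.2, seed=None):
    """Split records so each subregion lands entirely in train or in test.

    `records` is a list of dicts with a "subregion" key (hypothetical schema).
    Returns (train_indices, test_indices) as sorted lists of positions.
    """
    # Group record positions by subregion.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec["subregion"]].append(idx)

    # Shuffle the subregion names; a fixed seed makes the split reproducible,
    # while seed=None gives a different split from one sampling to the next.
    subregions = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(subregions)

    # Fill the test set with whole subregions until the target size is reached.
    target = test_size * len(records)
    train_indices, test_indices = [], []
    for name in subregions:
        if len(test_indices) < target:
            test_indices.extend(groups[name])
        else:
            train_indices.extend(groups[name])
    return sorted(train_indices), sorted(test_indices)


# Toy data: the subregion codes and SMB values are made up.
records = [
    {"subregion": "A", "smb": -0.4},
    {"subregion": "A", "smb": -0.1},
    {"subregion": "B", "smb": 0.3},
    {"subregion": "B", "smb": 0.2},
    {"subregion": "C", "smb": -1.2},
    {"subregion": "D", "smb": 0.5},
]
train_idx, test_idx = split_by_subregion(records, test_size=0.3, seed=0)
```

Because whole subregions are assigned at once, the realized test fraction only approximates test_size, and with a single subregion every record ends up in the test set.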

codecov · 2026-03-24T16:18:25Z

Codecov Report

❌ Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.86%. Comparing base (a387a54) to head (9e9d2ce).
⚠️ Report is 3 commits behind head on dev.

Files with missing lines	Patch %	Lines
tests/data_processing/test_wgms_preprocessing.py	90.47%	2 Missing ⚠️
massbalancemachine/data_processing/wgms.py	97.14%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #177      +/-   ##
==========================================
+ Coverage   43.08%   43.86%   +0.77%     
==========================================
  Files          61       62       +1     
  Lines        4804     4854      +50     
==========================================
+ Hits         2070     2129      +59     
+ Misses       2734     2725       -9

albangossard added the "new mass balance data" label (New ground truth dataset to be used as a target) Mar 24, 2026

add WGMS unit tests + fix a few issues + add data filtering

9e9d2ce

albangossard force-pushed the wgms branch from 7c4698f to 9e9d2ce March 24, 2026 16:13

albangossard merged commit e6889e1 into dev Mar 24, 2026
8 checks passed

albangossard deleted the wgms branch March 24, 2026 16:24

albangossard mentioned this pull request Mar 24, 2026

Add WGMS dataset to repository #169

Open

albangossard merged 2 commits into dev from wgms
