
Add WGMS data #177

Merged
albangossard merged 2 commits into dev from wgms
Mar 24, 2026


@albangossard
Member

Includes the preprocessing of WGMS data added by @JoachimPiret in #172, plus some fixes and unit tests.

* improve the efficiency of two methods of class AggregatedDataset: __init__() and mapSplitsToDataset()

* allow recording the dataframe in Parquet format in addition to CSV format

* add the possibility to split between test and train based on subregion (c-region), as well as the possibility to add randomness so that the split differs from one sampling to the next in set_train_test_split(). assign_train_test_indices(self, train_indices, test_indices, test_size) is defined to update the dataloader with the selected test/train split after 10 samplings based on subregion.

* adaptation of dataset.py to choose the output format of _get_output_filename() between CSV and Parquet

* Alban's feedback on PR: mapSplitsToDataset() and __init__() more efficient for large datasets, output format to CSV and Parquet

* Alban's feedback on PR: split on subregion added, modification of _create_group_kfold_splits() to cross-validate on subregion

* Add function to plot test and train datasets (SMB versus elevation)

* Adaptation of dataloader to answer review of #158 and new plot functions

* to answer review of #158

* correct __init__ for mbm plot

* preprocess WGMS data to be used by MBM #169
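
For readers unfamiliar with splitting on subregion, here is a minimal standard-library sketch of the idea: every sample from a given subregion ends up on the same side of the split, and folds are built from whole subregions. The names `split_by_subregion` and `group_kfold_by_subregion` are hypothetical and are not the signatures of `set_train_test_split()` or `_create_group_kfold_splits()`; the actual implementation in this PR may differ.

```python
import random
from collections import defaultdict


def split_by_subregion(subregions, test_size=0.2, seed=None):
    """Return (train_indices, test_indices) with no subregion shared between them.

    Hypothetical helper for illustration only (not the MBM API).
    subregions: one label per sample.
    test_size: approximate fraction of *subregions* (not samples) held out.
    seed: set it for a reproducible split, leave None for a new split each call.
    """
    groups = defaultdict(list)
    for idx, region in enumerate(subregions):
        groups[region].append(idx)

    regions = sorted(groups)  # sort first so a given seed gives the same split
    random.Random(seed).shuffle(regions)

    n_test = max(1, round(test_size * len(regions)))
    test_indices = sorted(i for r in regions[:n_test] for i in groups[r])
    train_indices = sorted(i for r in regions[n_test:] for i in groups[r])
    return train_indices, test_indices


def group_kfold_by_subregion(subregions, n_splits=3, seed=None):
    """Yield (train_indices, val_indices); each subregion is held out exactly once."""
    regions = sorted(set(subregions))
    random.Random(seed).shuffle(regions)
    # Deal the shuffled subregions round-robin into n_splits folds.
    folds = [set(regions[k::n_splits]) for k in range(n_splits)]
    for held_out in folds:
        val = [i for i, r in enumerate(subregions) if r in held_out]
        train = [i for i, r in enumerate(subregions) if r not in held_out]
        yield train, val


# Toy labels; real subregion codes would come from the dataset.
labels = ["A", "A", "B", "C", "C", "C", "D", "E"]
train_idx, test_idx = split_by_subregion(labels, test_size=0.4, seed=0)
for fold_train, fold_val in group_kfold_by_subregion(labels, n_splits=3, seed=0):
    assert not set(fold_train) & set(fold_val)
```

Holding out whole subregions, rather than individual samples, keeps spatially correlated measurements from leaking between train and test.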

---------

Co-authored-by: Alban Gossard <alban.paul.gossard@gmail.com>
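
As an illustration of the CSV/Parquet choice mentioned in the commit list, the sketch below picks the file suffix from a format string and dispatches the write accordingly. `build_output_path` and `save_dataframe` are hypothetical names, not the signature of `_get_output_filename()` in dataset.py. `save_dataframe` assumes a pandas DataFrame, and writing Parquet with pandas additionally requires an engine such as pyarrow.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("csv", "parquet")


def build_output_path(directory, stem, fmt="csv"):
    """Return directory/stem.<fmt>, rejecting formats other than csv or parquet.

    Hypothetical helper for illustration only (not the MBM API).
    """
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    return Path(directory) / f"{stem}.{fmt}"


def save_dataframe(df, path):
    """Write a pandas DataFrame to CSV or Parquet depending on the path suffix."""
    path = Path(path)
    if path.suffix == ".parquet":
        df.to_parquet(path, index=False)  # needs pyarrow or fastparquet installed
    elif path.suffix == ".csv":
        df.to_csv(path, index=False)
    else:
        raise ValueError(f"Unsupported suffix {path.suffix!r}")
```

For example, `build_output_path("data", "wgms", "parquet")` gives the path `data/wgms.parquet`, which can then be passed to `save_dataframe`.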
@albangossard added the "new mass balance data" label (New ground truth dataset to be used as a target) on Mar 24, 2026
@codecov

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.86%. Comparing base (a387a54) to head (9e9d2ce).
⚠️ Report is 3 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tests/data_processing/test_wgms_preprocessing.py | 90.47% | 2 Missing ⚠️ |
| massbalancemachine/data_processing/wgms.py | 97.14% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #177      +/-   ##
==========================================
+ Coverage   43.08%   43.86%   +0.77%     
==========================================
  Files          61       62       +1     
  Lines        4804     4854      +50     
==========================================
+ Hits         2070     2129      +59     
+ Misses       2734     2725       -9     
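
The percentages in the diff above can be checked from the hit and line counts. The short sketch below recomputes them, assuming coverage is hits divided by lines and that percentages are truncated to two decimals rather than rounded. The truncation is inferred from the figures in this report (2070/4804 is about 43.089%, shown as 43.08%); `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage in percent, truncated (not rounded) to two decimals."""
    return (hits * 10_000 // lines) / 100


base_hits, base_lines = 2070, 4804  # base (a387a54)
head_hits, head_lines = 2129, 4854  # head (9e9d2ce)

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)

# The change is computed on the exact ratios, then truncated the same way.
delta = int((head_hits / head_lines - base_hits / base_lines) * 10_000) / 100

# Misses are simply lines minus hits.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(base, head, delta, base_misses, head_misses)
```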

@albangossard merged commit e6889e1 into dev on Mar 24, 2026
8 checks passed
@albangossard deleted the wgms branch on March 24, 2026 16:24

2 participants