-
Notifications
You must be signed in to change notification settings - Fork 51
Cropping overlap control using cone distance #1554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…ve to extra readers (ecmwf#1231) * initial commit for ICON data reader - new version * merging ICON Zarr and ICON CMIP6 NetCDF4 data readers * fixed inconsistencies + added descriptive comments * removed unecessary attribute * added attribute cf.multiprocessing_method * changed self.colnames self.cols_idx from parent to children classes * config file for CMIP6 * ICON config * code ruffing * fixed lines that were too long * temporary fix for multiprocedding method * unecessary code removed and added more channels * title: changes to dataloading method description: - fixed datetime, lat, lon reglated bugs - added more memory efficient way of loading channels * change in len_hrs and step_hrs * config for CMIP6 monthly data * title: handling built-in ICON levels for CMIP6 data (details below) description: - added method select_by_level() to construct selected colnames using pre-selected levels in data config file - changes in _get() method to handle built-in plevs from CMIP6 ICON - added method get_cols() to construct general colnames - clipping lat/lon - changed data config attributes from source_channels/target_channels to source/target - fixed bug in asserting non positive stds * removed breaking change * committing ruffing changes * added ICON CMIP6 monthly data to the stream * removed the previous stream * changed per-level reading for CMIP6 only * dtype missmatch fix - training on ICON CMIP6 day + Amon channels * changed len_hrs value to accommodate monthly data * check in merged reader for art * remove cmip * revert icon cmip specific changes * ruff * speed up loading 500x by selecting time slice first * correct type * update config * linting * ruff * current working loader param setup * reset to standard setting * current settings * adjustments to multistream data sampler to be tested * reset to original since it works * update registry * current config * remove legacy code and local files * reset default config * adjust stream config * remove icon data reader from main codebase * move scheduler setting inside datareader --------- Co-authored-by: sbAsma <[email protected]> Co-authored-by: Timothy Hunter <[email protected]>
* Update to develop, prepare for new experiment series * Setting o48 as default in era5 config Committer: Matthias Karlbauer <[email protected]> On branch mk/develop/fe_experiments Your branch is ahead of 'origin/mk/develop/fe_experiments' by 57 commits. (use "git push" to publish your local commits) Changes to be committed: modified: config/streams/era5_1deg/era5.yml * Updated default config to 256 dim latent size On branch mk/develop/fe_experiments Your branch is ahead of 'origin/mk/develop/fe_experiments' by 58 commits. (use "git push" to publish your local commits) Changes to be committed: modified: config/default_config.yml * Update branch to latest develop * Change epochs from 64 to 32 * ICON ESM (CMIP6) data reader * hooking-up the ICON ESM data reader * ICON ESM historical data config * added config multiprocessing_method as param * ICON ESM custom config params * fixed multiprocessing method needed for ICON ESM * fixed data type + cleaned code * added param streams_output * changed stream config names * ruffing changes * restored era5 file * removed experiment specific config * removed unecessary file * restored default config * restore era5 file * removed wrong eval config * another attempt to fix era5 filename * ruff requested fixes * Add configurable multiprocessing method with fork as default * Another add configurable multiprocessing method with fork as default * fixing forgotten conflicts in default_config --------- Co-authored-by: Matthias Karlbauer <[email protected]> Co-authored-by: Matthias Karlbauer <[email protected]>
* added integ test for jepa * added weight > 0.0 checks * lint * added jepa test command * i_obs -> i_stream * added integration-test-all * using .get()
* modified ratio plot * heat-map * fix cosmetics * lint * change config * add metric name to heatmap * multiply forecasts by step_hrs * remove breakpoints * fix variable order * lint * implement lead time plotting * lint * implement comments
* Revert "Make missing fstep 0 produce a more meaningfull error (ecmwf#1444)" This reverts commit e3b29be. * add function to truncate runs * lint * add warning
* mae loss function * implement generalized lp-norm * docstring updates
| hl_mask: 3 # Can be different from teacher | ||
| rate: 0.4 # Student cone: 40% of sphere (~72° radius) | ||
| method: "geodesic_disk" # Must be geodesic_disk | ||
| center_distance_degrees: 180 # REQUIRED: Student center is 45° from teacher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a little unclear, the center_distance_degrees is required, but the center_distance_degrees can be any chosen angle right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could also include a range as below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the comment is confusing as I had to set center_distance_degrees to 45 before submitting the PR ( I was trying different values to see if they work)
Any angle you choose should work as demonstrated in the example below.
| # number of healpix cells | ||
| self.healpix_level_data = cf.healpix_level | ||
| self.healpix_num_cells = 12 * (4**cf.healpix_level) | ||
| self._hp_cache = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, I think this is helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it helped speeding up the computation (we do not have to repeat hp calculation every single time)
| instance default if None. | ||
| masking_strategy_config : dict | None | ||
| Optional override of strategy config (e.g., {'hl_mask': 3}). | ||
| target_metadata : dict | None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps state here that this is required when we do overlap with the cones? Otherwise it is unused right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, this is only used when doing cone overlap. I am not sure if there is a more elegant way to do it without adding target_metadata
| "relationship 'cone_distance' requires 'center_cell' in target_metadata" | ||
| ) | ||
| # Get teacher's hl_mask level to properly convert coordinates | ||
| teacher_hl_mask = target_metadata.get("hl_mask", masking_strategy_config.get("hl_mask", 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this default to 0? Perhaps should check that hl_mask is specified with an assert like the above lines? Do we default to 0 in other parts of the code?
Description
This PR implements cone distance for spatial sampling. The core logic uses angular distance (in degrees/radii) to determine the spacing between the centers of the Teacher and Student cones/crops.
Key Features:
Geodesic Support: Currently optimized to work specifically with geodesic_disk.
Multi-Level Sampling: Allows Student and Teacher crops to be sampled from different hl_level masks.
Performance Optimizations: I’ve implemented caching for the Healpix object initialization to improve efficiency.
Geometric Logic: Uses spherical trigonometry to calculate destination points given a starting (lon, lat), distance, and azimuth (bearing).
Configuration:
Usage Example: Refer to default_config for the necessary parameters and an explanation of how center_distance_degrees is used for overlap control.
Documentation: I have included long/detailed docstrings for all new functions to make the review process easier.
Issue Number
Closes #1515
Is this PR a draft? Mark it as draft.
Checklist before asking for review
./scripts/actions.sh lint./scripts/actions.sh unit-test./scripts/actions.sh integration-testlaunch-slurm.py --time 60