Refactor `_resampled_scene()` and `_reduce_data()` methods of the Scene class (#3178)
base: main
Conversation
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files:

```
@@           Coverage Diff            @@
##            main    #3178    +/-   ##
========================================
- Coverage   96.28%   96.28%   -0.01%
========================================
  Files         436      436
  Lines       57830    57940    +110
========================================
+ Hits        55681    55785    +104
- Misses       2149     2155      +6
========================================
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
I'll do some more refactoring to …
Pull Request Test Coverage Report for Build 16566762203 (details)

💛 Coveralls
Thanks! I've merged this into #3168 but there is still an issue with missing test coverage.
```python
replace_anc(res, pres)
```

```python
@classmethod
def _get_new_datasets_from_parent(self, new_datasets, parent_dataset):
```
The `self` should be `cls` since these are classmethods now, but given that they are classmethods (or could be staticmethods), how about we move these outside of the Scene class? And could they (or should they) be moved to the satpy.resample subpackage?
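A minimal sketch of the two options being discussed, using hypothetical simplified names and logic rather than satpy's actual code:

```python
# Hypothetical, simplified sketch of the two options above. Neither version
# actually uses its first argument, which is why a module-level function
# works just as well as a classmethod.

class Scene:
    @classmethod
    def _get_resampled_parent(cls, new_datasets, parent_key):
        # A classmethod's first argument is conventionally `cls`, not `self`.
        if parent_key is not None:
            return new_datasets[parent_key]

def get_resampled_parent(new_datasets, parent_key):
    # Same logic as a plain function outside the Scene class (e.g. in a
    # resample-related module).
    if parent_key is not None:
        return new_datasets[parent_key]

print(Scene._get_resampled_parent({"a": "resampled"}, "a"))  # → resampled
print(get_resampled_parent({"a": "resampled"}, None))        # → None
```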
The diff replaces the inline reduction logic:

```python
try:
    if reduce_data:
        key = source_area
        try:
            (slice_x, slice_y), source_area = reductions[key]
        except KeyError:
            if resample_kwargs.get("resampler") == "gradient_search":
                factor = resample_kwargs.get("shape_divisible_by", 2)
            else:
                factor = None
            try:
                slice_x, slice_y = source_area.get_area_slices(
                    destination_area, shape_divisible_by=factor)
            except TypeError:
                slice_x, slice_y = source_area.get_area_slices(
                    destination_area)
            source_area = source_area[slice_y, slice_x]
            reductions[key] = (slice_x, slice_y), source_area
        dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
    else:
        LOG.debug("Data reduction disabled by the user")
```

with a call to the new helper:

```python
slice_x, slice_y = self._get_source_dest_slices(source_area, destination_area, reductions, resample_kwargs)
source_area = source_area[slice_y, slice_x]
reductions[source_area] = (slice_x, slice_y), source_area
dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
```
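The `except TypeError` branch in the hunk above uses a common version-compatibility pattern: try the newer keyword argument first, and retry without it if the installed version's signature does not accept it. A simplified, self-contained illustration (the function names here are stand-ins, not pyresample's real API):

```python
# Stand-in for a newer get_area_slices() that accepts shape_divisible_by.
def get_area_slices(dest, shape_divisible_by=None):
    return slice(0, 4), slice(0, 4)

# Stand-in for an older signature without the keyword.
def legacy_get_area_slices(dest):
    return slice(0, 3), slice(0, 3)

def call_with_fallback(func, dest, factor):
    try:
        # Prefer the newer keyword argument...
        return func(dest, shape_divisible_by=factor)
    except TypeError:
        # ...and fall back if this version's signature rejects it.
        return func(dest)

print(call_with_fallback(get_area_slices, "dest", 2))         # keyword accepted
print(call_with_fallback(legacy_get_area_slices, "dest", 2))  # falls back
```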
I'm curious if the `_get_source_dest_slices` operation is the only step that raises `NotImplementedError` or if `_slice_data` also does it? If the former, then maybe `_slice_data` should be moved outside of the try/except. Thoughts?
```python
@classmethod
def _get_new_datasets_from_parent(self, new_datasets, parent_dataset):
    if parent_dataset is not None:
        return new_datasets[DataID.from_dataarray(parent_dataset)]
```
`DataID.from_dataarray` returns a single `DataID`, right? I think I'd prefer a different name for this method. I think the purpose of this chunk of code is to say "if we've resampled the parent already, use the resampled parent", right? Or rather, if the current dataset has a parent, it should have been resampled already, so we should use the resampled version of the parent. I think?

In addition to renaming, it seems that `parent_dataset` is only used in the later steps to check for `is None`. I'm wondering if we can remove the use of `parent_dataset` in favor of `pres` and "bundle" this method's operation with the `dataset_walker` to be something like:

```python
for ds_id, dataset, resampled_parent in resampled_dataset_walker(datasets, new_datasets):
```

Or something like that.

...and if that is done, then there might be an argument for putting `_replace_anc_for_new_datasets` and `_update_area` into the for loop generator too. This changes the purpose of the for loop to be "what datasets do we need to resample" and then the inside of the for loop logic is just "reduce data", "resample data", "store result".

I'll admit the code was ugly and the logic of `new_scn._datasets` and `new_datasets` really isn't helping that.
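The generator idea above could be sketched roughly like this, using hypothetical names (`resampled_dataset_walker`, plain dicts standing in for satpy's `DatasetDict` and `DataID` machinery):

```python
# Hypothetical sketch: the walker yields each dataset together with the
# already-resampled version of its parent, if it has one.

def resampled_dataset_walker(datasets, new_datasets):
    for ds_id, dataset in datasets.items():
        parent_id = dataset.get("parent")  # stand-in for the ancillary lookup
        resampled_parent = new_datasets.get(parent_id)
        yield ds_id, dataset, resampled_parent

datasets = {"b": {"parent": "a"}, "c": {"parent": None}}
new_datasets = {"a": "resampled-a"}  # "a" was resampled earlier

for ds_id, dataset, resampled_parent in resampled_dataset_walker(datasets, new_datasets):
    print(ds_id, resampled_parent)
# b resampled-a
# c None
```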
Sorry for all the comments and no regular review. I just keep brainstorming.

One other idea: what if only `new_datasets` gets modified in the for loop and assigning to `new_scn._datasets` is left for a second for loop (e.g. `for ds_id, new_data_arr in new_datasets.items():`)? Maybe that would clean up the code inside the loop. I feel like a lot of this ugliness is caused by `new_scn` (or rather the `DatasetDict` inside) copying the `DataArray` and/or making modifications to it.
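A minimal sketch of that two-loop shape, with a stub `resample` and plain dicts standing in for the Scene's `DatasetDict`:

```python
# Hypothetical illustration: the first loop only builds new_datasets, and a
# second pass copies the results into the scene's container.

def resample(data):
    return [x * 2 for x in data]  # stand-in for the real resampling step

datasets = {"chan1": [1, 2], "chan2": [3, 4]}
new_scn_datasets = {}  # stand-in for new_scn._datasets

new_datasets = {}
for ds_id, dataset in datasets.items():
    new_datasets[ds_id] = resample(dataset)  # first loop: compute results only

for ds_id, new_data_arr in new_datasets.items():
    new_scn_datasets[ds_id] = new_data_arr   # second loop: store into the scene

print(new_scn_datasets)  # → {'chan1': [2, 4], 'chan2': [6, 8]}
```

Keeping mutation of the scene out of the main loop avoids interleaving "compute" and "store" concerns, which is where the copying behavior of the real `DatasetDict` makes the code hard to follow.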
This PR refactors `Scene._resampled_scene()`, which @gerritholl reported to be complicated in #3168 (comment). As there was a change to `_reduce_data()`, I did some additional refactoring to it too.

- [ ] Add your name to AUTHORS.md if not there already