
Conversation

@pnuu (Member) commented Jul 28, 2025

This PR refactors Scene._resampled_scene(), which @gerritholl reported as overly complicated in #3168 (comment). Since the change also touched _reduce_data(), I did some additional refactoring there too.

  • Closes #xxxx
  • Tests added
  • Fully documented
  • Add your name to AUTHORS.md if not there already

@pnuu pnuu self-assigned this Jul 28, 2025
@pnuu pnuu requested review from djhoese and mraspaud as code owners July 28, 2025 08:13
@pnuu pnuu added component:scene cleanup Code cleanup but otherwise no change in functionality labels Jul 28, 2025
@codecov (bot) commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 91.07143% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.28%. Comparing base (5cc91a2) to head (2d8d883).
⚠️ Report is 46 commits behind head on main.

Files with missing lines Patch % Lines
satpy/scene.py 91.07% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3178      +/-   ##
==========================================
- Coverage   96.28%   96.28%   -0.01%     
==========================================
  Files         436      436              
  Lines       57830    57940     +110     
==========================================
+ Hits        55681    55785     +104     
- Misses       2149     2155       +6     
Flag            Coverage Δ
behaviourtests  3.78% <17.85%> (+<0.01%) ⬆️
unittests       96.37% <91.07%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@pnuu (Member, Author) commented Jul 28, 2025

I'll do some more refactoring to _reduce_data().

@coveralls commented Jul 28, 2025

Pull Request Test Coverage Report for Build 16566762203

Details

  • 51 of 56 (91.07%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 96.381%

Changes Missing Coverage   Covered Lines   Changed/Added Lines   %
satpy/scene.py             51              56                    91.07%

Totals Coverage Status
Change from base Build 16528353281: 0.0%
Covered Lines: 56082
Relevant Lines: 58188

💛 - Coveralls

@pnuu pnuu changed the title Refactor Scene._resampled_scene() Refactor _resampled_scene() and _reduce_data() methods of the Scene class Jul 28, 2025
@gerritholl (Member) commented:
Thanks! I've merged this into #3168 but there is still an issue with missing test coverage.

replace_anc(res, pres)

@classmethod
def _get_new_datasets_from_parent(self, new_datasets, parent_dataset):
Review comment (Member):
The self should be cls since these are classmethods now. But given that they are classmethods (or could be staticmethods), how about we move them outside of the Scene? And could they (or should they) be moved to the satpy.resample subpackage?
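The suggestion above could look something like the following minimal sketch, assuming the helper becomes a plain module-level function. The function name and the dict-keyed lookup are illustrative assumptions; the real code uses DataID.from_dataarray(), which is stubbed here so the sketch runs standalone.

```python
# Hypothetical sketch: since the helper uses no Scene state, it could become
# a module-level function (e.g. in the satpy.resample subpackage). The plain
# dict lookup below stands in for DataID.from_dataarray() so this runs
# standalone; it is not the merged implementation.

def get_new_dataset_from_parent(new_datasets, parent_dataset):
    """Return the already-resampled counterpart of ``parent_dataset``, or None."""
    if parent_dataset is None:
        return None
    # Real code: new_datasets[DataID.from_dataarray(parent_dataset)]
    return new_datasets[parent_dataset["name"]]
```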

Comment on lines 959 to +963
Old code:

try:
    if reduce_data:
        key = source_area
        try:
            (slice_x, slice_y), source_area = reductions[key]
        except KeyError:
            if resample_kwargs.get("resampler") == "gradient_search":
                factor = resample_kwargs.get("shape_divisible_by", 2)
            else:
                factor = None
            try:
                slice_x, slice_y = source_area.get_area_slices(
                    destination_area, shape_divisible_by=factor)
            except TypeError:
                slice_x, slice_y = source_area.get_area_slices(
                    destination_area)
            source_area = source_area[slice_y, slice_x]
            reductions[key] = (slice_x, slice_y), source_area
        dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
    else:
        LOG.debug("Data reduction disabled by the user")

New code:

slice_x, slice_y = self._get_source_dest_slices(source_area, destination_area, reductions, resample_kwargs)
source_area = source_area[slice_y, slice_x]
reductions[source_area] = (slice_x, slice_y), source_area
dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
Review comment (Member):
I'm curious whether the _get_source_dest_slices operation is the only step that raises NotImplementedError, or if _slice_data does too. If the former, then maybe _slice_data should be moved outside of the try/except. Thoughts?

@classmethod
def _get_new_datasets_from_parent(self, new_datasets, parent_dataset):
    if parent_dataset is not None:
        return new_datasets[DataID.from_dataarray(parent_dataset)]
Review comment (Member):
DataID.from_dataarray returns a single DataID, right? I think I'd prefer a different name for this method. I think the purpose of this chunk of code is to say "if we've resampled the parent already, use the resampled parent", right? Or rather, if the current dataset has a parent, it should have been resampled already, so we should use the resampled version of the parent. I think?

In addition to renaming, it seems that parent_dataset is only used in the later steps to check for is None. I'm wondering if we can remove the use of parent_dataset in favor of pres and "bundle" this method's operation with the dataset_walker to be something like:

for ds_id, dataset, resampled_parent in resampled_dataset_walker(datasets, new_datasets):

Or something like that.

...and if that is done, then there might be an argument for putting _replace_anc_for_new_datasets and _update_area into the for loop generator too. This changes the purpose of the for loop to be "what datasets do we need to resample" and then the inside of the for loop logic is just "reduce data", "resample data", "store result".

I'll admit the code was ugly and the logic of new_scn._datasets and new_datasets really isn't helping that.
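The proposed generator might look like the sketch below. Everything here is hypothetical: the real dataset_walker tracks parents differently, and the DataID lookup is stubbed with plain dict keys so the sketch runs standalone. It only illustrates the control flow of yielding each dataset alongside the already-resampled version of its parent (or None when there is no parent).

```python
# Hypothetical sketch of the resampled_dataset_walker generator suggested
# above. Datasets are plain dicts with an optional "parent" key standing in
# for satpy's real parent/ancillary tracking.

def resampled_dataset_walker(datasets, new_datasets):
    """Yield (ds_id, dataset, resampled_parent) for each dataset."""
    for ds_id, dataset in datasets.items():
        parent_id = dataset.get("parent")  # stand-in for the walker's parent tracking
        resampled_parent = new_datasets.get(parent_id) if parent_id else None
        yield ds_id, dataset, resampled_parent
```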

Review comment (Member):
Sorry for all the comments and no regular review. I just keep brainstorming.

One other idea: what if only new_datasets gets modified in the for loop, and assigning to new_scn._datasets is left for a second for loop (e.g. for ds_id, new_data_arr in new_datasets.items():)? Maybe that would clean up the code inside the loop. I feel like a lot of this ugliness is caused by new_scn (or rather the DatasetDict inside it) copying the DataArray and/or making modifications to it.
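That two-loop idea could be sketched as follows. This is only an illustration of the structure, under the assumption that resampling each dataset is independent; resample_one and new_scn are placeholders, not real satpy API.

```python
# Hypothetical sketch of the two-pass structure suggested above: first
# resample into a plain dict, then assign into the Scene in a separate
# pass so DatasetDict's copy-on-assignment behavior stays out of the
# main resampling loop. resample_one is a placeholder callable.

def resample_all(datasets, resample_one):
    """First pass: build a plain dict of resampled results."""
    new_datasets = {}
    for ds_id, data_arr in datasets.items():
        new_datasets[ds_id] = resample_one(data_arr)
    return new_datasets

# Second pass, kept separate from the resampling logic:
# for ds_id, new_data_arr in new_datasets.items():
#     new_scn._datasets[ds_id] = new_data_arr
```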
