Data validation checker by dkazanc · Pull Request #214 · DiamondLightSource/httomolibgpu

dkazanc · 2025-06-02T16:37:31Z

adding data validation modules, tests and correcting all functions in misc


yousefmoazzam

Looks reasonable to me!

If it hasn't been done already, I'd recommend running the memory hook tests in httomo-backends against these changes, to double-check that the memory estimation of methods that use the new validation function hasn't been affected.

Another thing to mention is that httomolibgpu has a .pre-commit-config.yaml with black formatting set up, so I'm assuming that httomolibgpu code should be formatted with black? If so, you may want to check whether black has been applied to the changed files. I can see some things, like missing newline characters and no spaces after commas, that I think wouldn't be present if black had been applied (though I could be wrong!).

tests/test_misc/test_supp_func.py

tests/test_prep/test_stripe.py

httomolibgpu/misc/supp_func.py

dkazanc · 2025-06-05T15:23:37Z

Very good point about the httomo-backends tests, thanks. I see that the changes do affect them and some of the tests are failing. I'll do some research, but it looks like there are mainly two lines to blame:
xp.nan_to_num(data, copy=False, nan=0.0, posinf=0.0, neginf=0.0)
and
zero_elements_total = int(xp.count_nonzero(data == 0))
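A quick way to see the allocation in the second line: `data == 0` materialises a boolean array with the same shape as `data` (one byte per element, i.e. a quarter of the size of a float32 input) before `count_nonzero` reduces it. The sketch below uses NumPy on the CPU as a stand-in for `xp` (CuPy mirrors this part of the NumPy API) and shows one way to bound that temporary by counting block by block. `count_zeros_chunked` and its `chunk` parameter are hypothetical, and chunking is only an illustration; the PR itself went on to add a CUDA kernel.

```python
import numpy as np


def count_zeros_chunked(data: np.ndarray, chunk: int = 16) -> int:
    """Hypothetical helper: count zeros `chunk` slices (axis 0) at a time.

    Each iteration allocates a boolean temporary of shape
    (chunk, *data.shape[1:]) instead of one as large as `data`.
    """
    total = 0
    for start in range(0, data.shape[0], chunk):
        block = data[start:start + chunk]  # basic slice: a view, no copy
        total += int(np.count_nonzero(block == 0))
    return total


data = np.ones((64, 32, 32), dtype=np.float32)
data[3, 5, 7] = 0.0
data[40] = 0.0  # a whole zero slice

# The original one-liner first builds a full-size boolean mask:
full_mask = data == 0
print(full_mask.dtype, full_mask.nbytes, data.nbytes)  # bool 65536 262144
print(count_zeros_chunked(data))  # 1025
```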

dkazanc · 2025-06-05T15:29:26Z

This also allocates memory:
xp.all(xp.isfinite(data))
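`xp.isfinite(data)` likewise returns a full-size boolean array before `xp.all` reduces it to a single value. A minimal sketch, again assuming NumPy as a CPU stand-in for `xp`, of checking finiteness one block of slices at a time and returning as soon as a non-finite value is found. `all_finite_chunked` is a hypothetical helper, not part of httomolibgpu.

```python
import numpy as np


def all_finite_chunked(data: np.ndarray, chunk: int = 16) -> bool:
    """Hypothetical helper: True if no element is NaN, +Inf or -Inf.

    Works on `chunk` slices along axis 0 at a time and stops at the first
    block containing a non-finite value, so the boolean temporary never
    exceeds one block.
    """
    for start in range(0, data.shape[0], chunk):
        if not np.isfinite(data[start:start + chunk]).all():
            return False  # early exit: later blocks are never touched
    return True


data = np.zeros((100, 8, 8), dtype=np.float32)
print(all_finite_chunked(data))  # True
data[99, 0, 0] = np.inf
print(all_finite_chunked(data))  # False
```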

dkazanc · 2025-06-05T16:16:02Z

Yes, it looks like both the NaN/Inf detection and removal and the zero count create a copy of the data.

This actually opens up a more general discussion; sorry for the many words.

I can see two ways of dealing with this, as modifying every memory estimator doesn't seem appealing at all:

1. Write bespoke element-wise kernels for CuPy arrays that process the data without creating any copies of it. I reckon we will need to loop through the arrays and either fix the NaNs/Infs or just detect them; the same goes for zeros. This will probably also be more computationally efficient than the current approach.
2. Make the data validator a separate method in the library, with its own memory estimator, and have the framework insert it between each method, similar to how data_reducer is added after the loader by default. The data validator would then be removed from the methods themselves in the httomolibgpu library, as it becomes a standalone method.
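Option no. 1 amounts to a single pass that rewrites non-finite values and counts zeros without allocating anything the size of the input. On the GPU that would be a CuPy/CUDA element-wise kernel; the runnable sketch below only approximates the idea on the CPU with NumPy, processing one block along axis 0 at a time so that temporaries are bounded by the block size. The function name, its `chunk` parameter and the returned counts are all hypothetical.

```python
import numpy as np


def sanitise_and_count_zeros(data: np.ndarray, chunk: int = 16) -> tuple:
    """Hypothetical CPU sketch of option no. 1.

    Replaces NaN/+Inf/-Inf with 0 in place and counts zeros, one block of
    `chunk` slices at a time. Returns (values replaced, zeros afterwards).
    """
    replaced = 0
    zeros = 0
    for start in range(0, data.shape[0], chunk):
        block = data[start:start + chunk]  # view into `data`
        nonfinite = ~np.isfinite(block)  # block-sized mask only
        replaced += int(np.count_nonzero(nonfinite))
        block[nonfinite] = 0  # writes through the view into `data`
        zeros += int(np.count_nonzero(block == 0))
    return replaced, zeros


data = np.ones((10, 4, 4), dtype=np.float32)
data[0, 0, 0] = np.nan
data[5, 1, 1] = np.inf
data[9, 3, 3] = -np.inf
data[2, 2, 2] = 0.0
replaced, zeros = sanitise_and_count_zeros(data, chunk=3)
print(replaced, zeros)  # 3 4
```

Because each block is a view, the boolean assignment writes straight back into `data`; a genuine element-wise kernel would avoid even the per-block mask.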

Both approaches have benefits actually:

no. 1 seems like a nice algorithmic workaround and probably isn't very time-consuming to do. Also, I don't like the fact that such a simple method creates copies of the data; that is probably not very efficient.
no. 2 has the benefit of making this method available for any other backend; it would also take care of httomolib and tomopy methods, for instance. Notably, we do not apply the data validator to data_reducer and save_to_images of httomolib. In addition, it makes it simpler for us to log, through the framework, the warnings coming from the method. We could also modify it so that it returns some information about the presence of NaNs/Infs/zeros.
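A rough sketch of option no. 2: a standalone validator that a framework could call after each method, returning counts of NaNs/Infs/zeros and logging warnings, instead of each method checking its own output. Everything here (`validate_data`, `run_pipeline`, the report keys, the toy methods) is hypothetical; the dtype guard assumes the float/uint16 restriction mentioned in the commit log. Since it would be a method with its own memory estimator, plain full-array expressions are tolerable here.

```python
import logging

import numpy as np

logger = logging.getLogger("data_validation_sketch")


def validate_data(data: np.ndarray, method_name: str = "unknown") -> dict:
    """Hypothetical standalone validator: report, warn, never modify.

    The full-array expressions below allocate boolean temporaries, which
    a dedicated memory estimator for this method would account for.
    """
    if not (np.issubdtype(data.dtype, np.floating) or data.dtype == np.uint16):
        raise TypeError(f"unsupported dtype {data.dtype}: expected float or uint16")
    report = {
        "method": method_name,
        "nans": int(np.count_nonzero(np.isnan(data))),
        "infs": int(np.count_nonzero(np.isinf(data))),
        "zeros": int(np.count_nonzero(data == 0)),
    }
    if report["nans"] or report["infs"]:
        logger.warning(
            "%s: output contains %d NaN(s) and %d Inf(s)",
            method_name, report["nans"], report["infs"],
        )
    if report["zeros"]:
        logger.warning("%s: output contains %d zero(s)", method_name, report["zeros"])
    return report


def run_pipeline(data: np.ndarray, methods) -> tuple:
    """Hypothetical framework loop: validate after every method."""
    reports = []
    for method in methods:
        data = method(data)
        reports.append(validate_data(data, method.__name__))
    return data, reports


def normalise(x):
    return x / x.max()


def minus_log(x):
    with np.errstate(divide="ignore"):  # log(0) -> -inf; warning suppressed
        return -np.log(x)


projections = np.array([[2.0, 1.0], [0.5, 0.0]], dtype=np.float32)
result, reports = run_pipeline(projections, [normalise, minus_log])
```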

To me it looks like we need both 1 and 2 :) But we can do this in stages: if no. 1 resolves the memory issue, we can merge it first and then do no. 2. Any thoughts?


dkazanc · 2025-06-06T15:57:17Z

Latest update: I've implemented no. 1 from above. All tests now pass in the branch and with httomo-backends.

adding data validation modules, tests and correcting all functions in misc

3302b09

dkazanc mentioned this pull request Jun 2, 2025

Data transfer checker DiamondLightSource/httomo#372

Closed

dkazanc added 2 commits June 3, 2025 16:59

modification to the supplementary function

30d78c9

enabling data validation for the remaining methods

86b5ae0

dkazanc added the run-zenodo-tests Run Zenodo tests for each PR label Jun 4, 2025

removes flatten func from the zeros check

cca5362

dkazanc marked this pull request as ready for review June 4, 2025 14:51

dkazanc and others added 2 commits June 4, 2025 16:04

docs update

ba72baf

Merge branch 'main' into datavalidator

44442f9

dkazanc requested a review from yousefmoazzam June 4, 2025 15:23

yousefmoazzam reviewed Jun 5, 2025


tests/test_misc/test_supp_func.py (outdated)

tests/test_prep/test_stripe.py (outdated)

httomolibgpu/misc/supp_func.py

dkazanc added 2 commits June 5, 2025 16:11

fixing few suggestions, applying black formatting

d95e59d

removes irrelevant stripe_ri test

f3de1a1

dkazanc added 2 commits June 6, 2025 15:43

adding cuda kernel for nansinfs correction and correction to zero calculator

11ec6e0

fixing tests as data checker accepts only float and uint16 data

6557f01

dkazanc and others added 3 commits June 9, 2025 08:56

adds data validator cuda kernel

5248a7f

modifies warning signature and re-nabled checking in disortion corr

ee78f0c

Merge branch 'main' into datavalidator

760512a

dkazanc merged commit b582acc into main Jun 12, 2025
1 of 3 checks passed

dkazanc deleted the datavalidator branch June 12, 2025 11:44

dkazanc merged 13 commits into main from datavalidator