
Conversation

@effigies effigies commented Nov 8, 2019

Following some initial discussion in #832, this PR allows ArrayProxy objects to apply scale factors within a target dtype, rather than always defaulting to float64. This prevents memory spikes and speeds up get_fdata(dtype=np.float32).

This obviously has consequences for accumulated rounding error. The old behavior can be recovered with get_fdata().astype(np.float32), although the caching behavior differs.
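To illustrate the difference, here is a standalone NumPy sketch; the raw values and scale factors below are made up for illustration and are not nibabel API:

```python
import numpy as np

# Hypothetical on-disk integers and scale factors, standing in for a
# NIfTI image's raw data and its scl_slope/scl_inter (illustrative values).
raw = np.array([12345, -20000, 32767], dtype=np.int16)
slope, inter = 1.000001e-3, 0.5

# New behavior: apply the scale factors directly in float32
new = raw.astype(np.float32) * np.float32(slope) + np.float32(inter)

# Old behavior: scale in float64, then downcast
old = (raw.astype(np.float64) * slope + inter).astype(np.float32)

# The results agree to float32 precision, but need not be bit-identical
assert new.dtype == old.dtype == np.float32
assert np.allclose(new, old, rtol=1e-5, atol=1e-6)
```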

This involves a refactor that unifies access to unscaled and scaled data through file slices; get_scaled(), get_unscaled() and __array__ now call fileslice with the () slicer. In some informal benchmarking, img.dataobj[()] is at least as fast as np.asanyarray(img.dataobj):

In [60]: %timeit _ = np.array(img.dataobj)                                                          
7.38 s ± 169 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [62]: %timeit _ = img.dataobj[:]                                                                 
5.89 s ± 376 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [63]: %timeit _ = img.dataobj[()]                                                                
5.76 s ± 159 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [67]: %timeit _ = np.asanyarray(img.dataobj)                                                     
6.48 s ± 233 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

One open question is whether Nifti1Header.get_slope_inter() should return np.float32 values or continue returning Python floats. If it continues to return float, get_data() will keep returning float64 arrays whenever scale factors are applied.
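The return type matters because of NumPy's promotion rules: multiplying an integer array by a Python float promotes the result to float64, while a np.float32 scalar keeps it in float32. A minimal demonstration:

```python
import numpy as np

raw = np.array([1, 2, 3], dtype=np.int16)

# A Python float scale factor forces promotion to float64...
assert (raw * 1.5).dtype == np.float64

# ...whereas a np.float32 scale factor keeps the result in float32
assert (raw * np.float32(1.5)).dtype == np.float32
```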

This is a potentially consequential API change, so I'd appreciate wide comments from @nipy/team-nibabel, and any other interested parties.

Tests on the way.

codecov bot commented Nov 8, 2019

Codecov Report

Merging #833 into master will decrease coverage by 0.29%.
The diff coverage is 91.86%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #833     +/-   ##
=========================================
- Coverage   90.32%   90.03%   -0.3%     
=========================================
  Files          96       98      +2     
  Lines       12192    12355    +163     
  Branches     2136     2165     +29     
=========================================
+ Hits        11013    11124    +111     
- Misses        834      882     +48     
- Partials      345      349      +4
Impacted Files Coverage Δ
nibabel/ecat.py 88.26% <100%> (+0.19%) ⬆️
nibabel/minc1.py 91.01% <100%> (+0.42%) ⬆️
nibabel/dataobj_images.py 95.77% <100%> (+0.25%) ⬆️
nibabel/arrayproxy.py 100% <100%> (ø) ⬆️
nibabel/brikhead.py 97.54% <100%> (+0.04%) ⬆️
nibabel/parrec.py 91.86% <75.86%> (-2.59%) ⬇️
nibabel/testing_pytest/__init__.py 62% <0%> (ø)
nibabel/testing_pytest/np_features.py 33.33% <0%> (ø)
nibabel/nicom/dicomwrappers.py 90.9% <0%> (+0.08%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 785e9e2...8be3a0e. Read the comment docs.


pep8speaks commented Nov 8, 2019

Hello @effigies, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2019-11-13 19:08:14 UTC

get_fdata(dtype=float32) and get_fdata(dtype=float64).astype(float32)
are no longer equivalent
effigies commented Nov 8, 2019

The existing failing tests ended up taking up most of my day, so I'll think about an appropriate set of new tests on the train on Sunday.

The allclose checks might cover most of what we need, but I suspect we'll want to really exercise the bounds where a proxy gets pushed to higher precision.
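A sketch of the kind of bounds check I have in mind, in pure NumPy with hypothetical tolerances (not the final test):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(-32768, 32767, size=1000, dtype=np.int16)
slope, inter = 3.14159e-4, -1.25

# float64 reference vs. direct float32 scaling
ref = raw.astype(np.float64) * slope + inter
out = raw.astype(np.float32) * np.float32(slope) + np.float32(inter)

# Error should stay on the order of float32 rounding
assert out.dtype == np.float32
assert np.allclose(out, ref, rtol=1e-5, atol=1e-6)
```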

def _get_unscaled(self, slicer):
    if canonical_slicers(slicer, self._shape, False) == \
            canonical_slicers((), self._shape, False):
        raw_data = array_from_file(self._shape,
@effigies
Member Author
This is a heavy check for whether we can just grab the whole file, which ensures we get an mmap if possible.

Might break some people using dataobj[:] to always retrieve an in-memory copy.
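A rough pure-NumPy stand-in for what that check decides — this is not nibabel's canonical_slicers, just an illustration of "does this slicer cover the whole array?":

```python
import numpy as np

def is_full_slicer(slicer, shape):
    # Stand-in for comparing canonical_slicers(slicer, shape) against
    # canonical_slicers((), shape): treat a slicer as "full" if it
    # yields a same-shape view over the same memory.
    probe = np.empty(shape, dtype=np.int8)
    view = probe[slicer]
    return view.shape == probe.shape and np.shares_memory(view, probe)

assert is_full_slicer((), (3, 4))             # the () slicer is a full read
assert is_full_slicer(slice(None), (3, 4))    # so is [:]
assert not is_full_slicer((slice(0, 2),), (3, 4))
```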

@effigies
Member Author

This is ready for a review.

@rmarkello rmarkello left a comment
Contributor
From a cursory check (a bit out of practice) this looks good -- just have a few minor comments!

@effigies
Member Author

Thanks for the review, @rmarkello.

I'll plan to merge on Friday if there are no further comments.

@effigies
Member Author

Alright, here we go...

@effigies effigies merged commit b773576 into nipy:master Nov 15, 2019
@effigies effigies added this to the 3.0.0 milestone Nov 15, 2019