
Conversation

@effigies effigies commented Nov 8, 2019

Following some initial discussion in #832, this PR allows ArrayProxy objects to apply scale factors within a target dtype, rather than always defaulting to float64. This prevents memory spikes and speeds up get_fdata(dtype=np.float32).

This obviously has consequences for accumulated rounding error. The old behavior can be recovered with get_fdata().astype(np.float32), although the caching behavior differs.
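To illustrate the difference, here is a standalone NumPy sketch; the raw values and scale factors below are made up for illustration and are not nibabel API:

```python
import numpy as np

# Hypothetical on-disk integers and scale factors, standing in for a
# NIfTI image's raw data and its scl_slope/scl_inter (illustrative values).
raw = np.array([12345, -20000, 32767], dtype=np.int16)
slope, inter = 1.000001e-3, 0.5

# New behavior: apply the scale factors directly in float32
new = raw.astype(np.float32) * np.float32(slope) + np.float32(inter)

# Old behavior: scale in float64, then downcast
old = (raw.astype(np.float64) * slope + inter).astype(np.float32)

# The results agree to float32 precision, but need not be bit-identical
assert new.dtype == old.dtype == np.float32
assert np.allclose(new, old, rtol=1e-5, atol=1e-6)
```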

This involves a refactor that unifies access to unscaled and scaled data through file slices; get_scaled(), get_unscaled() and __array__ now call fileslice with the () slicer. In some informal benchmarking, img.dataobj[()] is at least as fast as np.asanyarray(img.dataobj):

In [60]: %timeit _ = np.array(img.dataobj)                                                          
7.38 s ± 169 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [62]: %timeit _ = img.dataobj[:]                                                                 
5.89 s ± 376 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [63]: %timeit _ = img.dataobj[()]                                                                
5.76 s ± 159 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [67]: %timeit _ = np.asanyarray(img.dataobj)                                                     
6.48 s ± 233 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

One open question is whether Nifti1Header.get_slope_inter() should return np.float32 values or continue returning Python floats. If it continues to return float, get_data() will keep returning float64 arrays whenever scale factors are applied.
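The return type matters because of NumPy's promotion rules: multiplying an integer array by a Python float promotes the result to float64, while a np.float32 scalar keeps it in float32. A minimal demonstration:

```python
import numpy as np

raw = np.array([1, 2, 3], dtype=np.int16)

# A Python float scale factor forces promotion to float64...
assert (raw * 1.5).dtype == np.float64

# ...whereas a np.float32 scale factor keeps the result in float32
assert (raw * np.float32(1.5)).dtype == np.float32
```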

This is a potentially consequential API change, so I'd appreciate wide comments from @nipy/team-nibabel, and any other interested parties.

Tests on the way.

codecov bot commented Nov 8, 2019

Codecov Report

Merging #833 into master will decrease coverage by 0.29%.
The diff coverage is 91.86%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #833     +/-   ##
=========================================
- Coverage   90.32%   90.03%   -0.3%     
=========================================
  Files          96       98      +2     
  Lines       12192    12355    +163     
  Branches     2136     2165     +29     
=========================================
+ Hits        11013    11124    +111     
- Misses        834      882     +48     
- Partials      345      349      +4
Impacted Files Coverage Δ
nibabel/ecat.py 88.26% <100%> (+0.19%) ⬆️
nibabel/minc1.py 91.01% <100%> (+0.42%) ⬆️
nibabel/dataobj_images.py 95.77% <100%> (+0.25%) ⬆️
nibabel/arrayproxy.py 100% <100%> (ø) ⬆️
nibabel/brikhead.py 97.54% <100%> (+0.04%) ⬆️
nibabel/parrec.py 91.86% <75.86%> (-2.59%) ⬇️
nibabel/testing_pytest/__init__.py 62% <0%> (ø)
nibabel/testing_pytest/np_features.py 33.33% <0%> (ø)
nibabel/nicom/dicomwrappers.py 90.9% <0%> (+0.08%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 785e9e2...8be3a0e. Read the comment docs.


pep8speaks commented Nov 8, 2019

Hello @effigies, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2019-11-13 19:08:14 UTC

get_fdata(dtype=float32) and get_fdata(dtype=float64).astype(float32)
are no longer equivalent
effigies commented Nov 8, 2019

The existing failing tests ended up taking up most of my day, so I'll think about an appropriate set of new tests on the train on Sunday.

The allclose checks might cover most of what we need, but I suspect we'll want to really exercise the bounds where a proxy gets pushed to higher precision.
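A sketch of the kind of bounds check I have in mind, in pure NumPy with hypothetical tolerances (not the final test):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.integers(-32768, 32767, size=1000, dtype=np.int16)
slope, inter = 3.14159e-4, -1.25

# float64 reference vs. direct float32 scaling
ref = raw.astype(np.float64) * slope + inter
out = raw.astype(np.float32) * np.float32(slope) + np.float32(inter)

# Error should stay on the order of float32 rounding
assert out.dtype == np.float32
assert np.allclose(out, ref, rtol=1e-5, atol=1e-6)
```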

def _get_unscaled(self, slicer):
    if canonical_slicers(slicer, self._shape, False) == \
            canonical_slicers((), self._shape, False):
        raw_data = array_from_file(self._shape,
@effigies
Member Author
This is a heavy check for whether we can just grab the whole file, which ensures we get an mmap if possible.

Might break some people using dataobj[:] to always retrieve an in-memory copy.
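A rough pure-NumPy stand-in for what that check decides — this is not nibabel's canonical_slicers, just an illustration of "does this slicer cover the whole array?":

```python
import numpy as np

def is_full_slicer(slicer, shape):
    # Stand-in for comparing canonical_slicers(slicer, shape) against
    # canonical_slicers((), shape): treat a slicer as "full" if it
    # yields a same-shape view over the same memory.
    probe = np.empty(shape, dtype=np.int8)
    view = probe[slicer]
    return view.shape == probe.shape and np.shares_memory(view, probe)

assert is_full_slicer((), (3, 4))             # the () slicer is a full read
assert is_full_slicer(slice(None), (3, 4))    # so is [:]
assert not is_full_slicer((slice(0, 2),), (3, 4))
```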

@effigies
Member Author

This is ready for a review.

@rmarkello rmarkello left a comment
Contributor
From a cursory check (a bit out of practice) this looks good -- just have a few minor comments!

@effigies
Member Author

Thanks for the review, @rmarkello.

I'll plan to merge on Friday if there are no further comments.

@effigies
Member Author

Alright, here we go...

@effigies effigies merged commit b773576 into nipy:master Nov 15, 2019
@effigies effigies added this to the 3.0.0 milestone Nov 15, 2019