NF: Enable data scaling within the target dtype #833
Codecov Report
Coverage Diff

| | master | #833 | +/- |
|---|---|---|---|
| Coverage | 90.32% | 90.03% | -0.3% |
| Files | 96 | 98 | +2 |
| Lines | 12192 | 12355 | +163 |
| Branches | 2136 | 2165 | +29 |
| Hits | 11013 | 11124 | +111 |
| Misses | 834 | 882 | +48 |
| Partials | 345 | 349 | +4 |
Hello @effigies, Thank you for updating! Cheers! There are no style issues detected in this Pull Request. 🍻 Comment last updated at 2019-11-13 19:08:14 UTC
get_fdata(dtype=float32) and get_fdata(dtype=float64).astype(float32) are no longer equivalent
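To illustrate why the two call patterns can diverge, here is a minimal NumPy-only sketch. It does not use nibabel; `raw`, `slope`, and `inter` are hypothetical stand-ins for an on-disk integer array and its header scale factors. Scaling directly in `float32` rounds at every arithmetic step, while scaling in `float64` rounds only once at the final cast, so the results may differ in the last bits.

```python
import numpy as np

# Hypothetical stored data and scale factors (not taken from any real image).
raw = np.arange(-1000, 1000, dtype=np.int16)
slope, inter = 0.1, 5.0

# Scale within the target dtype: every intermediate is float32.
in_target = raw.astype(np.float32) * np.float32(slope) + np.float32(inter)

# Scale in float64 first, then cast the finished result down to float32.
via_double = (raw.astype(np.float64) * slope + inter).astype(np.float32)

# Both arrays are float32, but they are not guaranteed to be bit-identical.
max_abs_diff = np.abs(in_target.astype(np.float64) - via_double).max()
print(in_target.dtype, via_double.dtype, max_abs_diff)
```

The first form never allocates a `float64` intermediate, which is the memory saving this change is after; the second form is the one to use when the old numerical results are needed.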
The existing failing tests ended up taking up most of my day, so I'll think about an appropriate set of new tests on the train on Sunday. The
raw_data = array_from_file(self._shape,
def _get_unscaled(self, slicer):
    if canonical_slicers(slicer, self._shape, False) == \
            canonical_slicers((), self._shape, False):
This is a heavy check for whether we can just grab the whole file, which ensures we get an mmap if possible. It might break code that relies on `dataobj[:]` to always retrieve an in-memory copy.
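The idea behind that check can be sketched without nibabel. The helper below, `selects_whole_array`, is hypothetical and deliberately simplified (it only understands plain slices and a single `Ellipsis`); it is not `canonical_slicers` and makes no claim about how that function normalizes its input. It only shows how `()`, `[:]`, and `[...]` can all be recognized as "give me everything".

```python
def selects_whole_array(slicer, shape):
    """Return True if `slicer` selects every element of an array of `shape`.

    Simplified illustration: integer, None, and array indices return False.
    """
    if not isinstance(slicer, tuple):
        slicer = (slicer,)

    # Expand a single Ellipsis into the full slices it stands for.
    ellipses = [i for i, s in enumerate(slicer) if s is Ellipsis]
    if len(ellipses) > 1:
        return False
    if ellipses:
        n_missing = len(shape) - (len(slicer) - 1)
        if n_missing < 0:
            return False
        i = ellipses[0]
        slicer = slicer[:i] + (slice(None),) * n_missing + slicer[i + 1:]

    if len(slicer) > len(shape):
        return False
    # Trailing axes that are not mentioned are selected in full.
    slicer = slicer + (slice(None),) * (len(shape) - len(slicer))

    for s, n in zip(slicer, shape):
        if not isinstance(s, slice):
            return False
        # A full forward slice normalizes to start=0, stop=n, step=1.
        if s.indices(n) != (0, n, 1):
            return False
    return True
```

With a check like this, `dataobj[()]`, `dataobj[:]`, and `dataobj[...]` would all take the whole-file path, which is why `dataobj[:]` could start returning a memory map rather than a copy.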
This is ready for a review.
From a cursory check (a bit out of practice) this looks good -- just have a few minor comments!
Thanks for the review, @rmarkello. I'll plan to merge on Friday if there are no further comments.
Alright, here we go...
Following some initial discussion in #832, this PR allows ArrayProxies to scale data objects within a target dtype, rather than always defaulting to float64, which will prevent memory spikes and speed up results for `get_fdata(dtype=np.float32)`.

This obviously has consequences for accumulating rounding errors. The old behavior can be achieved with `get_fdata().astype(np.float32)`, although the caching properties are changed.

This involves a refactor that uniformizes access to unscaled and scaled data using file slices, and `get_scaled()`, `get_unscaled()` and `__array__` will now call `fileslice` with the `()` slicer. With some informal benchmarking, `img.dataobj[()]` is at least as fast as `np.asanyarray(img.dataobj)`.

One question is whether to have `Nifti1Header.get_slope_inter()` return `np.float32`s or continue returning `float`s. By continuing to return `float`, `get_data()` will continue to return `float64` arrays when scale factors are applied.

This is a potentially consequential API change, so I'd appreciate wide comments from @nipy/team-nibabel, and any other interested parties.
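The slope/intercept question comes down to NumPy type promotion, which can be seen in isolation. The sketch below uses a hypothetical `int16` array and made-up scale factors rather than a real header. It only shows how the type of the scalar decides the dtype of the scaled result.

```python
import numpy as np

# Hypothetical on-disk data; the values and scale factors are made up.
raw = np.array([0, 100, 200], dtype=np.int16)

# Python floats as scale factors: an int16 array is promoted to float64.
scaled_pyfloat = raw * 2.0 + 1.0

# np.float32 scale factors: the result stays in float32.
scaled_f32 = raw * np.float32(2.0) + np.float32(1.0)

print(scaled_pyfloat.dtype)  # float64
print(scaled_f32.dtype)      # float32
```

So if the scale factors stay Python `float`s, scaling small integer data keeps producing `float64` unless the caller asks for another dtype explicitly.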
Tests on the way.