-
Notifications
You must be signed in to change notification settings - Fork 265
NF: GiftiImage method agg_data to return usable data arrays #793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @htwangtw, Thank you for updating! Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, Comment last updated at 2019-10-28 19:54:00 UTC |
Codecov Report
@@ Coverage Diff @@
## master #793 +/- ##
==========================================
+ Coverage 90.1% 90.32% +0.22%
==========================================
Files 96 96
Lines 11910 12192 +282
Branches 2125 2136 +11
==========================================
+ Hits 10731 11013 +282
Misses 834 834
Partials 345 345
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! A couple minor comments and a big strategic one.
We'll also need some test files and tests to ensure that the method behaves as expected. If the files are under, say 100KB, we can put them in nibabel/gifti/tests/data. If not, we can create a submodule in nibabel-data.
Just to set expectations, I'm not going to worry about style issues or docstrings until we settle on the algorithm and the tests. When things are looking basically ready, I'll do a nit-picky review.
# surf.gii is a special case of having two data array of different intent code | ||
|
||
if (self.numDA > 1 and all(el == all_intent[0] for el in all_intent)): | ||
return np.column_stack(all_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this one. It's not clear to me that:
- We can assume multiple arrays of the same intent will be stackable. The shapes might not work out. For example, suppose I have a GIFTI with multiple pointsets of different lengths. (I don't know why you'd want this, but it's definitely not ruled out by the standard.)
- A user will prefer this stacking for any arbitrary intent. For instance, if I have a two-parameter statistical distribution like
NIFTI_INTENT_NORMAL
, I might get alternating columns(mu, sigma, mu, sigma, ...)
. In that case, perhaps concatenating on a new axis would be preferred. The nice thing about time series is that the spec is pretty clear that each time step is a 1-D array. And if we seeNIFTI_INTENT_NONE
, all bets are off.
I'm hesitant to make concatenation contingent on the shapes working out, as the output type will then be an even more complex function of the input data than we're already proposing. I see a few options:
- Try to
np.column_stack
, and if it fails, raise an error sayingagg_data
failed, so please munge the data arrays yourself. np.column_stack
for time series only (I didn't see any others that seem obvious candidates),tuple
for everything else. This should be pretty safe.- Take option (2) by default, but allow it to be parameterized with an
aggregator
parameter:def agg_data(self, intent_code=None, aggregator=None): if aggregator is None: if intent_code == 'NIFTI_INTENT_TIME_SERIES': aggregator = np.column_stack else: aggregator = tuple
- Require an aggregator parameter, using
tuple
by default. This would make time series a less special case, but would make the return type much more predictable.
I'm inclined toward (2), with an option to move to (3) if people actually want control over that. Not sure they will, as they can always do np.column_stack(img.agg_data())
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote something but for got to hit reply.....
Anyway, the TLDR of my original comment was, these are great suggestions! I didn't think about those weird but totally legit use of gifti format. I agreed to go for 2 for now and move to 3. I think we will need more people to use these functions to know what is common for users.
Option 1 is just cruel. Option 4 is going to be confusing for users that don't really know the data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I think the logic is nearly there, so I mostly focused on updating the docstring in this review.
For adding tests, we should think about where we want to keep them. If they're small (<100KiB), then we can just put them directly in the repository. Otherwise, we can add them to a data repository that we include as a submodule. Do you have suggestions as to files to include?
Retrun a numpy arrary of aggregated GiftiDataArray of the same intent code | ||
or | ||
Retrun GiftiDataArray in tuples for surface files | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some doctests that demonstrate how to use this. If we include a time series and a surface file, we could follow the examples of get_fdata()
:
nibabel/nibabel/dataobj_images.py
Lines 287 to 342 in 22fe8c2
The cache can effect the behavior of the image, because if the cache is | |
full, or you have an array image, then modifying the returned array | |
will modify the result of future calls to ``get_fdata()``. For example | |
you might do this: | |
>>> import os | |
>>> import nibabel as nib | |
>>> from nibabel.testing import data_path | |
>>> img_fname = os.path.join(data_path, 'example4d.nii.gz') | |
>>> img = nib.load(img_fname) # This is a proxy image | |
>>> nib.is_proxy(img.dataobj) | |
True | |
The array is not yet cached by a call to "get_fdata", so: | |
>>> img.in_memory | |
False | |
After we call ``get_fdata`` using the default `caching` == 'fill', the | |
cache contains a reference to the returned array ``data``: | |
>>> data = img.get_fdata() | |
>>> img.in_memory | |
True | |
We modify an element in the returned data array: | |
>>> data[0, 0, 0, 0] | |
0.0 | |
>>> data[0, 0, 0, 0] = 99 | |
>>> data[0, 0, 0, 0] | |
99.0 | |
The next time we call 'get_fdata', the method returns the cached | |
reference to the (modified) array: | |
>>> data_again = img.get_fdata() | |
>>> data_again is data | |
True | |
>>> data_again[0, 0, 0, 0] | |
99.0 | |
If you had *initially* used `caching` == 'unchanged' then the returned | |
``data`` array would have been loaded from file, but not cached, and: | |
>>> img = nib.load(img_fname) # a proxy image again | |
>>> data = img.get_fdata(caching='unchanged') | |
>>> img.in_memory | |
False | |
>>> data[0, 0, 0] = 99 | |
>>> data_again = img.get_fdata(caching='unchanged') | |
>>> data_again is data | |
False | |
>>> data_again[0, 0, 0, 0] | |
0.0 |
I would show aggregating data:
- without intent code
- with matching intent codes
- with mismatching intent codes (should return
()
) - with tuple intent codes
Co-Authored-By: Chris Markiewicz <[email protected]>
Co-Authored-By: Chris Markiewicz <[email protected]>
Co-Authored-By: Chris Markiewicz <[email protected]>
Co-Authored-By: Chris Markiewicz <[email protected]>
@htwangtw Just a note that I'm going to aim to feature freeze 3.0 on October 28 (three weeks from Monday) so we can have a decent RC period. I think we're probably going to need one or two more rounds of review to get this one in. |
ENH: Add GIFTI test data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, this looks good, but the formatting is off. If you want to build the docs locally, you can see the effects:
make html
open build/html/reference/nibabel.gifti.html
You may need to pip install -r doc-requirements.txt
for this to work, and it might be cleanest to do in a fresh venv.
Also, can you merge/rebase master? Your branch is pretty far behind at this point. |
…ti_improve_io * 'master' of https://github.com/htwangtw/nibabel: (94 commits) Fixed A-P to P-A in coord systems docs DOCTEST: Delete reference to mmap data to avoid warning DOCTEST: Avoid fixed ID in doctest Re-import externals/netcdf.py from scipy DOCTEST: Delete reference to mmap data to avoid warning DOCTEST: Avoid fixed ID in doctest Re-import externals/netcdf.py from scipy MAINT: 2.5.2-dev add @hbraunDSP affiliation and ORCID MAINT: Update setup.cfg nose version to match installation instructions DOC: Add Henry Braun to contributor list MAINT: Update zenodo.json MAINT: Version 2.5.1 DOC: Update changelog for upcoming 2.5.1 release DOC: Update test_image_api docstring for clarity, consistency ENH: Remove img.get_data() from internal use except for appropriate tests TEST: Reproduce nipygh-802 MAINT: Check nightly builds on 3.7 MAINT: Check nightly builds on 3.7 FIX: Coerce data types on writing GIFTI DataArrays ...
Thanks for the tips! I have rebased the branch to the latest master. |
Well that sure had unintended consequences... Perhaps let's undo that last one, and move the difficult to display demonstrations to a test instead of in the doctest. |
Yep I have realised that after I pushed the commit.... |
I just meant that we can exercise some of the behavior in Another option is printing things with numpy.array2string when necessary, which gives you more control over the format. There are also some strategies for handling variable output here: https://docs.python.org/3/library/doctest.html#warnings |
This is looking great! We have full coverage of this method, which is a good place to be, and we cover a couple different types of images. I think I want to restore a couple examples in the docstring, if a little less detail to avoid the earlier issues. I can submit a PR to this branch based on your tests. I think it's a good time to invite wider comment on the API. Does anybody have concerns? In particular, if we can think of ways to induce unintuitive output, it would be good to add to the test battery. I'm not sure we need to overly concern ourselves with truly pathological inputs, but sensible inputs producing surprising outputs is worth thinking about. |
This is great! We might just need to comment out the examples so the docstest won't fail. Let me know if there's a better idea. One of the most common problems with gifti metadata assigns arbitrary intent code like |
I've opened htwangtw#2 with some suggestions for updating the docstring. Feel free to merge that as-is or build on it. I don't have any further concerns at this point, and I'm not seeing any general outcry, so I think once we finalize the docstring, this will be ready to merge.
I'm not sure we can reasonably hope to detect this case. I suspect workbench is able to do it because it's loading it as a time series and ignoring the intent codes altogether. Perhaps we can think of an API for easily modifying intent codes en masse, but that should probably be a separate PR. (We could also add warnings when loading the extension doesn't match the intents.) |
DOC: More comprehensive agg_data examples
My guess is the same as yours. After thinking about it a bit, including special cases will just add more complexity of the code.
Good point. Adding warning is a good idea - maybe list out the intent codes in the file when showing the warning as well? I agree this should be a separate PR. |
In it goes! Thanks very much for this. Feel free to open a new issue if/when you want to start thinking about other methods. |
Here's the first attempt of
agg_data
related to issue #789I looked into other official file types and found most of them consist of one or more
DataArray
s of the same intent type. The three exceptions aresurf.gii
with twoDataArray
of different intent code (pointset, triangle)coord.gii
with oneDataArray
set to pointsettopo.gii
withDataArray
set to triangle.This makes the implementation of random wild intent types slightly easier. If there is more than one data array and all intents are the same,
agg_data
will return a stacked array of all values indarrays
. Otherwise it returns a tuple of array(s).Let me know what you think.