NF: GiftiImage method agg_data to return usable data arrays #793

htwangtw · 2019-08-18T15:22:56Z

Here's the first attempt at agg_data, related to issue #789.
I looked into the other official file types and found that most of them consist of one or more DataArrays of the same intent type. The three exceptions are:

surf.gii, with two DataArrays of different intent codes (pointset, triangle)
coord.gii, with one DataArray set to pointset
topo.gii, with a DataArray set to triangle.

This makes handling arbitrary intent types slightly easier. If there is more than one data array and all intents are the same, agg_data returns a stacked array of all values in darrays. Otherwise it returns a tuple of array(s).

Let me know what you think.

pep8speaks · 2019-08-18T15:22:58Z

Hello @htwangtw, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2019-10-28 19:54:00 UTC

codecov · 2019-08-18T15:29:17Z

Codecov Report

Merging #793 into master will increase coverage by 0.22%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage    90.1%   90.32%   +0.22%     
==========================================
  Files          96       96              
  Lines       11910    12192     +282     
  Branches     2125     2136      +11     
==========================================
+ Hits        10731    11013     +282     
  Misses        834      834              
  Partials      345      345

Impacted Files	Coverage Δ
nibabel/testing/__init__.py	`98.07% <100%> (+0.2%)`	⬆️
nibabel/gifti/gifti.py	`95.42% <100%> (+0.52%)`	⬆️
nibabel/gifti/giftiio.py	`100% <0%> (ø)`	⬆️
nibabel/keywordonly.py	`100% <0%> (ø)`	⬆️
nibabel/streamlines/tractogram_file.py	`100% <0%> (ø)`	⬆️
nibabel/arrayproxy.py	`100% <0%> (ø)`	⬆️
nibabel/affines.py	`100% <0%> (ø)`	⬆️
nibabel/funcs.py	`80.28% <0%> (ø)`	⬆️
nibabel/imageclasses.py	`100% <0%> (ø)`	⬆️
nibabel/streamlines/array_sequence.py	`100% <0%> (ø)`	⬆️
... and 34 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de44a10...5ea1e88. Read the comment docs.

effigies

Thanks for this! A couple minor comments and a big strategic one.

We'll also need some test files and tests to ensure that the method behaves as expected. If the files are under, say 100KB, we can put them in nibabel/gifti/tests/data. If not, we can create a submodule in nibabel-data.

Just to set expectations, I'm not going to worry about style issues or docstrings until we settle on the algorithm and the tests. When things are looking basically ready, I'll do a nit-picky review.

nibabel/gifti/gifti.py

effigies · 2019-08-18T23:30:17Z

nibabel/gifti/gifti.py

+        # surf.gii is a special case of having two data array of different intent code
+
+        if (self.numDA > 1 and all(el == all_intent[0] for el in all_intent)):
+            return np.column_stack(all_data)


I'm not sure about this one. It's not clear to me that:

1. We can assume multiple arrays of the same intent will be stackable. The shapes might not work out. For example, suppose I have a GIFTI with multiple pointsets of different lengths. (I don't know why you'd want this, but it's definitely not ruled out by the standard.)

2. A user will prefer this stacking for any arbitrary intent. For instance, if I have a two-parameter statistical distribution like NIFTI_INTENT_NORMAL, I might get alternating columns (mu, sigma, mu, sigma, ...). In that case, perhaps concatenating on a new axis would be preferred. The nice thing about time series is that the spec is pretty clear that each time step is a 1-D array. And if we see NIFTI_INTENT_NONE, all bets are off.
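To make the two concerns above concrete, here is a minimal sketch using plain NumPy arrays rather than real GIFTI files. The array names and values are made up for illustration; only np.column_stack and np.stack are real calls.

```python
import numpy as np

# Two hypothetical pointsets with different numbers of vertices.
small = np.zeros((3, 3))
large = np.zeros((5, 3))
try:
    np.column_stack([small, large])
    stack_failed = False
except ValueError:
    # Row counts differ (3 vs 5), so the arrays cannot be stacked.
    stack_failed = True

# Two hypothetical two-parameter arrays: column 0 is mu, column 1 is sigma.
first = np.array([[0.0, 1.0], [0.1, 1.1]])
second = np.array([[5.0, 2.0], [5.1, 2.1]])

# column_stack places the columns side by side: mu, sigma, mu, sigma.
interleaved = np.column_stack([first, second])

# Stacking on a new trailing axis keeps each parameter grouped instead.
separated = np.stack([first, second], axis=-1)

print(stack_failed, interleaved.shape, separated.shape)
```

Which layout a user wants depends on the intent, which is the argument for not stacking arbitrary intents by default.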

I'm hesitant to make concatenation contingent on the shapes working out, as the output type will then be an even more complex function of the input data than we're already proposing. I see a few options:

1. Try to np.column_stack, and if it fails, raise an error saying agg_data failed, so please munge the data arrays yourself.

2. np.column_stack for time series only (I didn't see any others that seem obvious candidates), tuple for everything else. This should be pretty safe.

3. Take option (2) by default, but allow it to be parameterized with an aggregator parameter:

       def agg_data(self, intent_code=None, aggregator=None):
           if aggregator is None:
               if intent_code == 'NIFTI_INTENT_TIME_SERIES':
                   aggregator = np.column_stack
               else:
                   aggregator = tuple

4. Require an aggregator parameter, using tuple by default. This would make time series a less special case, but would make the return type much more predictable.

I'm inclined toward (2), with an option to move to (3) if people actually want control over that. Not sure they will, as they can always do np.column_stack(img.agg_data()). WDYT?
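For reference, option (3) could look roughly like the following as a standalone function. This is only a sketch under assumptions: `aggregate` is a hypothetical helper that takes a list of arrays directly, not the method that ended up in nibabel.

```python
import numpy as np


def aggregate(arrays, intent_code=None, aggregator=None):
    """Hypothetical stand-in for the aggregation step, not nibabel's code."""
    if aggregator is None:
        if intent_code == 'NIFTI_INTENT_TIME_SERIES':
            # Each time step is a 1-D array, so stacking columns is safe.
            aggregator = np.column_stack
        else:
            # Anything else is returned unstacked.
            aggregator = tuple
    return aggregator(arrays)


# Three made-up time steps over four vertices.
steps = [np.arange(4.0) + t for t in range(3)]

series = aggregate(steps, 'NIFTI_INTENT_TIME_SERIES')   # 2-D array
shapes = aggregate(steps, 'NIFTI_INTENT_SHAPE')         # tuple of arrays
forced = aggregate(steps, 'NIFTI_INTENT_SHAPE', aggregator=np.column_stack)

print(series.shape, len(shapes), forced.shape)
```

With option (2) alone, the last call is what a user would write by hand as np.column_stack(img.agg_data()).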

htwangtw · 2019-09-01

I wrote something but forgot to hit reply...

Anyway, the TL;DR of my original comment was: these are great suggestions! I hadn't thought about those weird but totally legitimate uses of the GIFTI format. I agree we should go with (2) for now and move to (3) later. I think we will need more people using these functions to learn what is common for users.
Option 1 is just cruel. Option 4 is going to be confusing for users who don't really know the data structure.

effigies

Awesome! I think the logic is nearly there, so I mostly focused on updating the docstring in this review.

For adding tests, we should think about where we want to keep them. If they're small (<100KiB), then we can just put them directly in the repository. Otherwise, we can add them to a data repository that we include as a submodule. Do you have suggestions as to files to include?

nibabel/gifti/gifti.py

effigies · 2019-09-07T15:16:52Z

nibabel/gifti/gifti.py

+        Retrun a numpy arrary of aggregated GiftiDataArray of the same intent code
+        or
+        Retrun GiftiDataArray in tuples for surface files
+


It would be good to add some doctests that demonstrate how to use this. If we include a time series and a surface file, we could follow the examples of get_fdata():

nibabel/nibabel/dataobj_images.py

Lines 287 to 342 in 22fe8c2

    The cache can effect the behavior of the image, because if the cache is
    full, or you have an array image, then modifying the returned array
    will modify the result of future calls to ``get_fdata()``. For example
    you might do this:

    >>> import os
    >>> import nibabel as nib
    >>> from nibabel.testing import data_path
    >>> img_fname = os.path.join(data_path, 'example4d.nii.gz')

    >>> img = nib.load(img_fname) # This is a proxy image
    >>> nib.is_proxy(img.dataobj)
    True

    The array is not yet cached by a call to "get_fdata", so:

    >>> img.in_memory
    False

    After we call ``get_fdata`` using the default `caching` == 'fill', the
    cache contains a reference to the returned array ``data``:

    >>> data = img.get_fdata()
    >>> img.in_memory
    True

    We modify an element in the returned data array:

    >>> data[0, 0, 0, 0]
    0.0
    >>> data[0, 0, 0, 0] = 99
    >>> data[0, 0, 0, 0]
    99.0

    The next time we call 'get_fdata', the method returns the cached
    reference to the (modified) array:

    >>> data_again = img.get_fdata()
    >>> data_again is data
    True
    >>> data_again[0, 0, 0, 0]
    99.0

    If you had *initially* used `caching` == 'unchanged' then the returned
    ``data`` array would have been loaded from file, but not cached, and:

    >>> img = nib.load(img_fname) # a proxy image again
    >>> data = img.get_fdata(caching='unchanged')
    >>> img.in_memory
    False
    >>> data[0, 0, 0] = 99
    >>> data_again = img.get_fdata(caching='unchanged')
    >>> data_again is data
    False
    >>> data_again[0, 0, 0, 0]
    0.0

I would show aggregating data:

without intent code

with matching intent codes

with mismatching intent codes (should return ())

with tuple intent codes
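As a rough illustration of those four cases, here is a toy model in plain Python. Everything in it is an assumption made for the example: the data arrays are simple (intent, values) pairs, and this `agg_data` is a hypothetical free function that only mimics the selection logic, not the nibabel method.

```python
# Hypothetical stand-in: each "data array" is an (intent, values) pair.
DARRAYS = [
    ('pointset', [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    ('triangle', [[0, 1, 2]]),
]


def agg_data(darrays, intent_code=None):
    """Toy selection logic only; not nibabel's implementation."""
    if isinstance(intent_code, tuple):
        # One result per requested intent, in the requested order.
        return tuple(agg_data(darrays, code) for code in intent_code)
    return tuple(values for intent, values in darrays
                 if intent_code is None or intent == intent_code)


everything = agg_data(DARRAYS)                         # without intent code
points = agg_data(DARRAYS, 'pointset')                 # matching intent code
nothing = agg_data(DARRAYS, 'time series')             # mismatching intent code
ordered = agg_data(DARRAYS, ('triangle', 'pointset'))  # tuple of intent codes

print(len(everything), len(points), nothing, len(ordered))
```

The tuple form is useful for surface files because the caller fixes the order of the results regardless of the order in the file.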

nibabel/gifti/gifti.py

Co-Authored-By: Chris Markiewicz <[email protected]>

effigies · 2019-10-03T13:57:39Z

@htwangtw Just a note that I'm going to aim to feature freeze 3.0 on October 28 (three weeks from Monday) so we can have a decent RC period. I think we're probably going to need one or two more rounds of review to get this one in.

ENH: Add GIFTI test data

effigies

In general, this looks good, but the formatting is off. If you want to build the docs locally, you can see the effects:

make html
open build/html/reference/nibabel.gifti.html

You may need to pip install -r doc-requirements.txt for this to work, and it might be cleanest to do in a fresh venv.

effigies · 2019-10-09T15:58:37Z

Also, can you merge/rebase master? Your branch is pretty far behind at this point.

@hbraunDSP

…ti_improve_io

* 'master' of https://github.com/htwangtw/nibabel: (94 commits)
  Fixed A-P to P-A in coord systems docs
  DOCTEST: Delete reference to mmap data to avoid warning
  DOCTEST: Avoid fixed ID in doctest
  Re-import externals/netcdf.py from scipy
  DOCTEST: Delete reference to mmap data to avoid warning
  DOCTEST: Avoid fixed ID in doctest
  Re-import externals/netcdf.py from scipy
  MAINT: 2.5.2-dev
  add @hbraunDSP affiliation and ORCID
  MAINT: Update setup.cfg nose version to match installation instructions
  DOC: Add Henry Braun to contributor list
  MAINT: Update zenodo.json
  MAINT: Version 2.5.1
  DOC: Update changelog for upcoming 2.5.1 release
  DOC: Update test_image_api docstring for clarity, consistency
  ENH: Remove img.get_data() from internal use except for appropriate tests
  TEST: Reproduce nipygh-802
  MAINT: Check nightly builds on 3.7
  MAINT: Check nightly builds on 3.7
  FIX: Coerce data types on writing GIFTI DataArrays
  ...

htwangtw · 2019-10-10T09:37:42Z

Thanks for the tips! I have rebased the branch to the latest master.
This is my first time doing a rebase, so please let me know if anything looks wrong.

effigies · 2019-10-16T22:34:24Z

Well that sure had unintended consequences...

Perhaps let's undo that last one, and move the difficult to display demonstrations to a test instead of in the doctest.

htwangtw · 2019-10-17T07:21:29Z

Yep I have realised that after I pushed the commit....
What do you mean by moving the docstring to a test? Do you have an example?

effigies · 2019-10-21T03:05:24Z

I just meant that we can exercise some of the behavior in nibabel/gifti/tests/test_gifti.py instead of in the doctest. This is probably the easiest way to deal with the np.int32 showing up in Linux but not Windows.

Another option is printing things with numpy.array2string when necessary, which gives you more control over the format.
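A minimal sketch of that idea, using a made-up int32 array: the repr of an array may or may not include a dtype suffix depending on the platform's default integer type, while np.array2string prints only the values.

```python
import numpy as np

# Hypothetical triangle indices stored as 32-bit integers.
faces = np.array([[0, 1, 2], [2, 1, 3]], dtype=np.int32)

# repr(faces[0]) can differ between platforms; array2string shows only
# the values, so the same text can be used in a doctest everywhere.
text = np.array2string(faces[0])
print(text)
```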

There are also some strategies for handling variable output here: https://docs.python.org/3/library/doctest.html#warnings
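One of the standard-library tools for variable output is the ELLIPSIS option flag. The sketch below is not taken from nibabel: `describe` is a hypothetical function whose doctest lets `...` absorb the part of the output that is allowed to vary, and the runner code at the bottom just executes that doctest programmatically.

```python
import doctest


def describe(values):
    """Return a short description of a sequence (hypothetical example).

    The ELLIPSIS directive lets ``...`` match any text, here the type name.

    >>> describe([1, 2, 3])  # doctest: +ELLIPSIS
    '... of length 3'
    """
    return '{} of length {}'.format(type(values).__name__, len(values))


# Collect and run the doctest above without relying on module discovery.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(describe, 'describe', module=False,
                        globs={'describe': describe}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

The same directive works for a dtype suffix or any other fragment that varies between platforms, at the cost of checking less of the output.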

effigies · 2019-10-22T12:38:22Z

This is looking great! We have full coverage of this method, which is a good place to be, and we cover a couple of different types of images. I think I want to restore a couple of examples in the docstring, if with a little less detail, to avoid the earlier issues. I can submit a PR to this branch based on your tests.

I think it's a good time to invite wider comment on the API. Does anybody have concerns? In particular, if we can think of ways to induce unintuitive output, it would be good to add to the test battery. I'm not sure we need to overly concern ourselves with truly pathological inputs, but sensible inputs producing surprising outputs is worth thinking about.

htwangtw · 2019-10-22T13:32:54Z

This is great! We might just need to comment out the examples so the doctest won't fail. Let me know if there's a better idea.

One of the most common problems with GIFTI metadata is that an arbitrary intent code, like NIFTI_NORMAL or something like that, is assigned to a time series image. Workbench can load and display those just fine despite the mistake. Any other suggestions are welcome!

effigies · 2019-10-27T17:43:31Z

I've opened htwangtw#2 with some suggestions for updating the docstring. Feel free to merge that as-is or build on it.

I don't have any further concerns at this point, and I'm not seeing any general outcry, so I think once we finalize the docstring, this will be ready to merge.

One of the most common problems with GIFTI metadata is that an arbitrary intent code, like NIFTI_NORMAL or something like that, is assigned to a time series image.

I'm not sure we can reasonably hope to detect this case. I suspect workbench is able to do it because it's loading it as a time series and ignoring the intent codes altogether. Perhaps we can think of an API for easily modifying intent codes en masse, but that should probably be a separate PR. (We could also add warnings when loading the extension doesn't match the intents.)

DOC: More comprehensive agg_data examples

htwangtw · 2019-10-28T20:01:30Z

I'm not sure we can reasonably hope to detect this case. I suspect workbench is able to do it because it's loading it as a time series and ignoring the intent codes altogether.
I never really investigated how Workbench handles those cases under the hood.

My guess is the same as yours. After thinking about it a bit, including special cases would just add more complexity to the code.

Perhaps we can think of an API for easily modifying intent codes en masse, but that should probably be a separate PR. (We could also add warnings when loading the extension doesn't match the intents.)

Good point. Adding a warning is a good idea. Maybe it could also list the intent codes found in the file? I agree this should be a separate PR.
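A rough sketch of what such a warning might look like, purely as a starting point for that separate PR. Both `EXPECTED_INTENTS` and `check_intents` are hypothetical names, and the mapping only covers the three surface-style extensions listed at the top of this thread.

```python
import warnings

# Hypothetical mapping from file extension to the intents expected in it.
EXPECTED_INTENTS = {
    '.surf.gii': {'pointset', 'triangle'},
    '.coord.gii': {'pointset'},
    '.topo.gii': {'triangle'},
}


def check_intents(filename, intents):
    """Warn if the intents found do not match the extension (hypothetical)."""
    found = set(intents)
    for extension, expected in EXPECTED_INTENTS.items():
        if filename.endswith(extension) and found != expected:
            # List the intents in the file so the user can see the mismatch.
            warnings.warn('{} has intents {}, expected {}'.format(
                filename, sorted(found), sorted(expected)))
            return False
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    ok = check_intents('lh.surf.gii', ['pointset', 'triangle'])
    bad = check_intents('lh.coord.gii', ['normal'])

print(ok, bad, len(caught))
```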

effigies · 2019-10-28T20:33:08Z

In it goes! Thanks very much for this.

Feel free to open a new issue if/when you want to start thinking about other methods.

add gifti image class function agg_data

6feb581

htwangtw changed the title ~~add gifti image class function agg_data~~ NF:add gifti image class function agg_data Aug 18, 2019

htwangtw changed the title ~~NF:add gifti image class function agg_data~~ NF: add gifti image class function agg_data Aug 18, 2019

htwangtw changed the title ~~NF: add gifti image class function agg_data~~ [WIP]NF: add gifti image class function agg_data Aug 18, 2019

htwangtw changed the title ~~[WIP]NF: add gifti image class function agg_data~~ WIP/NF: add gifti image class function agg_data Aug 18, 2019

htwangtw marked this pull request as ready for review August 18, 2019 15:36

PEP8

72471a7

effigies reviewed Aug 18, 2019

View reviewed changes

effigies added this to the 3.0.0 RC1 milestone Aug 22, 2019

limit stacking to timeseries data only

98c68af

effigies reviewed Sep 7, 2019

View reviewed changes

htwangtw and others added 8 commits September 10, 2019 10:42

Update nibabel/gifti/gifti.py

448e073

Co-Authored-By: Chris Markiewicz <[email protected]>

Update nibabel/gifti/gifti.py

15d1dd7

Co-Authored-By: Chris Markiewicz <[email protected]>

Update nibabel/gifti/gifti.py

85f65a4

Co-Authored-By: Chris Markiewicz <[email protected]>

Update nibabel/gifti/gifti.py

de8f028

Co-Authored-By: Chris Markiewicz <[email protected]>

doc string draft 1, need to finish the examples

154321c

ENH: Add general test data retriever

0530484

DOCTEST: Retreive a surface file using test_data

8c6cd7d

DATA: Add 10 time point time series GIFTI in fsaverage3 space

c54b696

effigies mentioned this pull request Sep 12, 2019

ENH: Add GIFTI test data htwangtw/nibabel#1

Merged

TEST: Test new test_data function

641feaa

htwangtw added 2 commits October 8, 2019 10:01

Merge pull request #1 from effigies/enh/gifti_test_data

bee6065

ENH: Add GIFTI test data

add doc and example for surface gii files

bd95ce8

effigies reviewed Oct 9, 2019

View reviewed changes

Changing numpy float print style to 1.13

bf6eb97

htwangtw added 8 commits October 21, 2019 16:13

Revert "Rename example file"

a9471a0

This reverts commit f633676.

Revert "Changing numpy float print style to 1.13"

9a52b78

This reverts commit bf6eb97.

Revert "Revert "Rename example file""

0447bbc

This reverts commit a9471a0.

Move the docstring to test

1ecaa26

Remove the actual docstring to prevent errors

384475b

Revert "Remove the actual docstring to prevent errors"

3fb7003

This reverts commit 384475b.

Remove docstring in agg_data

08a8752

Remove docstring in test

bb7517d

effigies changed the title ~~WIP/NF: add gifti image class function agg_data~~ NF: GiftiImage method agg_data to return usable data arrays Oct 22, 2019

htwangtw added 4 commits October 22, 2019 16:53

add minimum example to docstring

033ca51

add shape gifti

76efe61

fix the test with shape gii

c8c2c43

delete trailing whitespace

654ee5b

effigies mentioned this pull request Oct 24, 2019

REL: 3.0.0rc1 #829

Merged

11 tasks

DOC: More comprehensive agg_data examples

64b019a

Merge pull request #2 from effigies/doc/agg_data

5ea1e88

DOC: More comprehensive agg_data examples

effigies merged commit ddf2683 into nipy:master Oct 28, 2019

effigies modified the milestones: 3.0.0 RC1, 3.0.0 Oct 28, 2019

effigies mentioned this pull request Feb 6, 2020

giftiio.write deprecation; suggested alternative misleading #880

Open

htwangtw deleted the gifti_improve_io branch February 8, 2021 21:13
