Skip to content

Conversation

htwangtw
Copy link
Contributor

@htwangtw htwangtw commented Aug 18, 2019

Here's the first attempt of agg_datarelated to issue #789
I looked into other official file types and found most of them consist of one or more DataArrays of the same intent type. The three exceptions are

  1. surf.gii with two DataArray of different intent code (pointset, triangle)
  2. coord.gii with one DataArray set to pointset
  3. topo.gii with DataArray set to triangle.

This makes the implementation of random wild intent types slightly easier. If there is more than one data array and all intents are the same, agg_data will return a stacked array of all values in darrays. Otherwise it returns a tuple of array(s).

Let me know what you think.

@pep8speaks
Copy link

pep8speaks commented Aug 18, 2019

Hello @htwangtw, Thank you for updating!

Cheers! There are no style issues detected in this Pull Request. 🍻 To test for issues locally, pip install flake8 and then run flake8 nibabel.

Comment last updated at 2019-10-28 19:54:00 UTC

@codecov
Copy link

codecov bot commented Aug 18, 2019

Codecov Report

Merging #793 into master will increase coverage by 0.22%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage    90.1%   90.32%   +0.22%     
==========================================
  Files          96       96              
  Lines       11910    12192     +282     
  Branches     2125     2136      +11     
==========================================
+ Hits        10731    11013     +282     
  Misses        834      834              
  Partials      345      345
Impacted Files Coverage Δ
nibabel/testing/__init__.py 98.07% <100%> (+0.2%) ⬆️
nibabel/gifti/gifti.py 95.42% <100%> (+0.52%) ⬆️
nibabel/gifti/giftiio.py 100% <0%> (ø) ⬆️
nibabel/keywordonly.py 100% <0%> (ø) ⬆️
nibabel/streamlines/tractogram_file.py 100% <0%> (ø) ⬆️
nibabel/arrayproxy.py 100% <0%> (ø) ⬆️
nibabel/affines.py 100% <0%> (ø) ⬆️
nibabel/funcs.py 80.28% <0%> (ø) ⬆️
nibabel/imageclasses.py 100% <0%> (ø) ⬆️
nibabel/streamlines/array_sequence.py 100% <0%> (ø) ⬆️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de44a10...5ea1e88. Read the comment docs.

@htwangtw htwangtw changed the title add gifti image class function agg_data NF:add gifti image class function agg_data Aug 18, 2019
@htwangtw htwangtw changed the title NF:add gifti image class function agg_data NF: add gifti image class function agg_data Aug 18, 2019
@htwangtw htwangtw changed the title NF: add gifti image class function agg_data [WIP]NF: add gifti image class function agg_data Aug 18, 2019
@htwangtw htwangtw changed the title [WIP]NF: add gifti image class function agg_data WIP/NF: add gifti image class function agg_data Aug 18, 2019
@htwangtw htwangtw marked this pull request as ready for review August 18, 2019 15:36
Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! A couple minor comments and a big strategic one.

We'll also need some test files and tests to ensure that the method behaves as expected. If the files are under, say 100KB, we can put them in nibabel/gifti/tests/data. If not, we can create a submodule in nibabel-data.

Just to set expectations, I'm not going to worry about style issues or docstrings until we settle on the algorithm and the tests. When things are looking basically ready, I'll do a nit-picky review.

# surf.gii is a special case of having two data array of different intent code

if (self.numDA > 1 and all(el == all_intent[0] for el in all_intent)):
return np.column_stack(all_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this one. It's not clear to me that:

  1. We can assume multiple arrays of the same intent will be stackable. The shapes might not work out. For example, suppose I have a GIFTI with multiple pointsets of different lengths. (I don't know why you'd want this, but it's definitely not ruled out by the standard.)
  2. A user will prefer this stacking for any arbitrary intent. For instance, if I have a two-parameter statistical distribution like NIFTI_INTENT_NORMAL, I might get alternating columns (mu, sigma, mu, sigma, ...). In that case, perhaps concatenating on a new axis would be preferred. The nice thing about time series is that the spec is pretty clear that each time step is a 1-D array. And if we see NIFTI_INTENT_NONE, all bets are off.

I'm hesitant to make concatenation contingent on the shapes working out, as the output type will then be an even more complex function of the input data than we're already proposing. I see a few options:

  1. Try to np.column_stack, and if it fails, raise an error saying agg_data failed, so please munge the data arrays yourself.
  2. np.column_stack for time series only (I didn't see any others that seem obvious candidates), tuple for everything else. This should be pretty safe.
  3. Take option (2) by default, but allow it to be parameterized with an aggregator parameter:
    def agg_data(self, intent_code=None, aggregator=None):
        if aggregator is None:
            if intent_code == 'NIFTI_INTENT_TIME_SERIES':
                aggregator = np.column_stack
            else:
                aggregator = tuple
  1. Require an aggregator parameter, using tuple by default. This would make time series a less special case, but would make the return type much more predictable.

I'm inclined toward (2), with an option to move to (3) if people actually want control over that. Not sure they will, as they can always do np.column_stack(img.agg_data()). WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote something but for got to hit reply.....

Anyway, the TLDR of my original comment was, these are great suggestions! I didn't think about those weird but totally legit use of gifti format. I agreed to go for 2 for now and move to 3. I think we will need more people to use these functions to know what is common for users.
Option 1 is just cruel. Option 4 is going to be confusing for users that don't really know the data structure.

@effigies effigies added this to the 3.0.0 RC1 milestone Aug 22, 2019
Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I think the logic is nearly there, so I mostly focused on updating the docstring in this review.

For adding tests, we should think about where we want to keep them. If they're small (<100KiB), then we can just put them directly in the repository. Otherwise, we can add them to a data repository that we include as a submodule. Do you have suggestions as to files to include?

Retrun a numpy arrary of aggregated GiftiDataArray of the same intent code
or
Retrun GiftiDataArray in tuples for surface files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some doctests that demonstrate how to use this. If we include a time series and a surface file, we could follow the examples of get_fdata():

The cache can effect the behavior of the image, because if the cache is
full, or you have an array image, then modifying the returned array
will modify the result of future calls to ``get_fdata()``. For example
you might do this:
>>> import os
>>> import nibabel as nib
>>> from nibabel.testing import data_path
>>> img_fname = os.path.join(data_path, 'example4d.nii.gz')
>>> img = nib.load(img_fname) # This is a proxy image
>>> nib.is_proxy(img.dataobj)
True
The array is not yet cached by a call to "get_fdata", so:
>>> img.in_memory
False
After we call ``get_fdata`` using the default `caching` == 'fill', the
cache contains a reference to the returned array ``data``:
>>> data = img.get_fdata()
>>> img.in_memory
True
We modify an element in the returned data array:
>>> data[0, 0, 0, 0]
0.0
>>> data[0, 0, 0, 0] = 99
>>> data[0, 0, 0, 0]
99.0
The next time we call 'get_fdata', the method returns the cached
reference to the (modified) array:
>>> data_again = img.get_fdata()
>>> data_again is data
True
>>> data_again[0, 0, 0, 0]
99.0
If you had *initially* used `caching` == 'unchanged' then the returned
``data`` array would have been loaded from file, but not cached, and:
>>> img = nib.load(img_fname) # a proxy image again
>>> data = img.get_fdata(caching='unchanged')
>>> img.in_memory
False
>>> data[0, 0, 0] = 99
>>> data_again = img.get_fdata(caching='unchanged')
>>> data_again is data
False
>>> data_again[0, 0, 0, 0]
0.0

I would show aggregating data:

  • without intent code
  • with matching intent codes
  • with mismatching intent codes (should return ())
  • with tuple intent codes

@effigies
Copy link
Member

effigies commented Oct 3, 2019

@htwangtw Just a note that I'm going to aim to feature freeze 3.0 on October 28 (three weeks from Monday) so we can have a decent RC period. I think we're probably going to need one or two more rounds of review to get this one in.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this looks good, but the formatting is off. If you want to build the docs locally, you can see the effects:

make html
open build/html/reference/nibabel.gifti.html

You may need to pip install -r doc-requirements.txt for this to work, and it might be cleanest to do in a fresh venv.

@effigies
Copy link
Member

effigies commented Oct 9, 2019

Also, can you merge/rebase master? Your branch is pretty far behind at this point.

…ti_improve_io

* 'master' of https://github.com/htwangtw/nibabel: (94 commits)
  Fixed A-P to P-A in coord systems docs
  DOCTEST: Delete reference to mmap data to avoid warning
  DOCTEST: Avoid fixed ID in doctest
  Re-import externals/netcdf.py from scipy
  DOCTEST: Delete reference to mmap data to avoid warning
  DOCTEST: Avoid fixed ID in doctest
  Re-import externals/netcdf.py from scipy
  MAINT: 2.5.2-dev
  add @hbraunDSP affiliation and ORCID
  MAINT: Update setup.cfg nose version to match installation instructions
  DOC: Add Henry Braun to contributor list
  MAINT: Update zenodo.json
  MAINT: Version 2.5.1
  DOC: Update changelog for upcoming 2.5.1 release
  DOC: Update test_image_api docstring for clarity, consistency
  ENH: Remove img.get_data() from internal use except for appropriate tests
  TEST: Reproduce nipygh-802
  MAINT: Check nightly builds on 3.7
  MAINT: Check nightly builds on 3.7
  FIX: Coerce data types on writing GIFTI DataArrays
  ...
@htwangtw
Copy link
Contributor Author

Thanks for the tips! I have rebased the branch to the latest master.
This is the first time I do a rebase so please let me know if there's anything not right.

@effigies
Copy link
Member

Well that sure had unintended consequences...

Perhaps let's undo that last one, and move the difficult to display demonstrations to a test instead of in the doctest.

@htwangtw
Copy link
Contributor Author

Yep I have realised that after I pushed the commit....
What do you mean by moving the docstring to a test? Do you have an example?

@effigies
Copy link
Member

I just meant that we can exercise some of the behavior in nibabel/gifti/tests/test_gifti.py instead of in the doctest. This is probably the easiest way to deal with the np.int32 showing up in Linux but not Windows.

Another option is printing things with numpy.array2string when necessary, which gives you more control over the format.

There are also some strategies for handling variable output here: https://docs.python.org/3/library/doctest.html#warnings

@effigies
Copy link
Member

This is looking great! We have full coverage of this method, which is a good place to be, and we cover a couple different types of images. I think I want to restore a couple examples in the docstring, if a little less detail to avoid the earlier issues. I can submit a PR to this branch based on your tests.

I think it's a good time to invite wider comment on the API. Does anybody have concerns? In particular, if we can think of ways to induce unintuitive output, it would be good to add to the test battery. I'm not sure we need to overly concern ourselves with truly pathological inputs, but sensible inputs producing surprising outputs is worth thinking about.

@effigies effigies changed the title WIP/NF: add gifti image class function agg_data NF: GiftiImage method agg_data to return usable data arrays Oct 22, 2019
@htwangtw
Copy link
Contributor Author

This is great! We might just need to comment out the examples so the docstest won't fail. Let me know if there's a better idea.

One of the most common problems with gifti metadata assigns arbitrary intent code like NIFTI_NORMAL or something like that for timeseries image. Workbench can load and display those just fine despite the mistake. Any other suggestions are welcome!

@effigies effigies mentioned this pull request Oct 24, 2019
11 tasks
@effigies
Copy link
Member

I've opened htwangtw#2 with some suggestions for updating the docstring. Feel free to merge that as-is or build on it.

I don't have any further concerns at this point, and I'm not seeing any general outcry, so I think once we finalize the docstring, this will be ready to merge.

One of the most common problems with gifti metadata assigns arbitrary intent code like NIFTI_NORMAL or something like that for timeseries image.

I'm not sure we can reasonably hope to detect this case. I suspect workbench is able to do it because it's loading it as a time series and ignoring the intent codes altogether. Perhaps we can think of an API for easily modifying intent codes en masse, but that should probably be a separate PR. (We could also add warnings when loading the extension doesn't match the intents.)

DOC: More comprehensive agg_data examples
@htwangtw
Copy link
Contributor Author

htwangtw commented Oct 28, 2019

I'm not sure we can reasonably hope to detect this case. I suspect workbench is able to do it because it's loading it as a time series and ignoring the intent codes altogether.
I never really investigate how workbench under the hood for those cases.

My guess is the same as yours. After thinking about it a bit, including special cases will just add more complexity of the code.

Perhaps we can think of an API for easily modifying intent codes en masse, but that should probably be a separate PR. (We could also add warnings when loading the extension doesn't match the intents.)

Good point. Adding warning is a good idea - maybe list out the intent codes in the file when showing the warning as well? I agree this should be a separate PR.

@effigies
Copy link
Member

In it goes! Thanks very much for this.

Feel free to open a new issue if/when you want to start thinking about other methods.

@effigies effigies merged commit ddf2683 into nipy:master Oct 28, 2019
@effigies effigies modified the milestones: 3.0.0 RC1, 3.0.0 Oct 28, 2019
@htwangtw htwangtw deleted the gifti_improve_io branch February 8, 2021 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants