Commit c54080f

Merge pull request #287 from matthew-brett/in-memory-doc-rewrite
DOC: `get_data` docstring rewrite

Rewrite of docstring for get_data to give more details on the cache.
2 parents 04e6465 + 643266f commit c54080f

1 file changed (+113, -20 lines)
nibabel/spatialimages.py

Lines changed: 113 additions & 20 deletions
@@ -6,7 +6,7 @@
66
# copyright and license terms.
77
#
88
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9-
''' Very simple spatial image class
9+
''' A simple spatial image class
1010
1111
The image class maintains the association between a 3D (or greater)
1212
array, and an affine transform that maps voxel coordinates to some world space.
@@ -27,7 +27,7 @@
2727
``fname``, where the derivation may differ between formats.
2828
* to_file_map() - save image to files with which the image is already
2929
associated.
30-
* .get_shape() (Deprecated)
30+
* .get_shape() (deprecated)
3131
3232
properties:
3333
@@ -451,34 +451,127 @@ def __str__(self):
451451
def get_data(self, caching='fill'):
452452
""" Return image data from image with any necessary scalng applied
453453
454-
If the image data is a array proxy (data not yet read from disk) then
455-
the default behavior (`caching` == "fill") is to read the data, and
456-
store in an internal cache. Future calls to ``get_data`` will return
457-
the cached copy.
454+
If the image data is an array proxy (an object that knows how to load
455+
the image data from disk) then the default behavior (`caching` ==
456+
"fill") is to read the data from the proxy, and store in an internal
457+
cache. Future calls to ``get_data`` will return the cached array.
458458
459-
Once the data has been cached and returned from a proxy array, the
460-
cached array can be modified by modifying the returned array, because
461-
the returned array is a reference to the array in the cache. Regardless
462-
of the `caching` flag, this is always true of an in-memory image (where
463-
the image data is an array rather than an array proxy).
459+
Once the data has been cached and returned from an image array proxy,
460+
if you modify the returned array, you will also modify the cached
461+
array (because they are the same array). Regardless of the `caching`
462+
flag, this is always true of an in-memory image (where the image data
463+
is an array rather than an array proxy).
464464
465465
Parameters
466466
----------
467467
caching : {'fill', 'unchanged'}, optional
468-
This argument has no effect in the case where the image data is an
469-
array, or the image data has already been cached. If the image data
470-
is an array proxy, and the image data has not yet been cached, then
471-
'fill' (the default) will read the data from the array proxy, and
472-
store in an internal cache, so that future calls to ``get_data``
473-
will return the cached copy. If 'unchanged' then leave the current
474-
state of caching unchanged; return the cached copy if it exists, if
475-
not, load the data from disk and return that, but without filling
476-
the cache.
468+
See the Notes section for a detailed explanation. This argument
469+
specifies whether the image object should fill in an internal
470+
cached reference to the returned image data array. "fill" specifies
471+
that the image should fill an internal cached reference if
472+
currently empty. Future calls to ``get_data`` will return this
473+
cached reference. You might prefer "fill" to save the image object
474+
from having to reload the array data from disk on each call to
475+
``get_data``. "unchanged" means that the image should not fill in
476+
the internal cached reference if the cache is currently empty. You
477+
might prefer "unchanged" to "fill" if you want to make sure that
478+
the call to ``get_data`` does not create an extra (cached)
479+
reference to the returned array. In this case it is easier for
480+
Python to free the memory from the returned array.
477481
478482
Returns
479483
-------
480484
data : array
481485
array of image data
486+
487+
See also
488+
--------
489+
uncache: empty the array data cache
490+
491+
Notes
492+
-----
493+
All images have a property ``dataobj`` that represents the image array
494+
data. Images that have been loaded from files usually do not load the
495+
array data from file immediately, in order to reduce image load time
496+
and memory use. For these images, ``dataobj`` is an *array proxy*; an
497+
object that knows how to load the image array data from file. Images
498+
with an array proxy ``dataobj`` are called *proxy images*. In contrast,
499+
images created directly from numpy arrays carry a simple reference to
500+
their array data in ``dataobj``. These are *in-memory images*.
501+
502+
By default (`caching` == "fill"), when you call ``get_data`` on a
503+
proxy image, we load the array data from disk, store (cache) an
504+
internal reference to this array data, and return the array. The next
505+
time you call ``get_data``, you will get the cached reference to the
506+
array, so we don't have to load the array data from disk again.
507+
508+
In-memory images are already in memory, so there is no benefit to
509+
caching, and the `caching` keyword has no effect.
510+
511+
For proxy images, you may not want to fill the cache after reading the
512+
data from disk because the cache will hold onto the array memory until
513+
the image object is deleted, or you use the image ``uncache`` method.
514+
If you don't want to fill the cache, then always use
515+
``get_data(caching='unchanged')``; in this case ``get_data`` will not
516+
fill the cache (store the reference to the array) if the cache is empty
517+
(no reference to the array). If the cache is full, "unchanged" leaves
518+
the cache full and returns the cached array reference.
519+
520+
The cache can affect the behavior of the image, because if the cache is
521+
full, or you have an in-memory image, then modifying the returned array
522+
will modify the result of future calls to ``get_data()``. For example
523+
you might do this:
524+
525+
>>> import os
526+
>>> import nibabel as nib
527+
>>> from nibabel.testing import data_path
528+
>>> img_fname = os.path.join(data_path, 'example4d.nii.gz')
529+
530+
>>> img = nib.load(img_fname) # This is a proxy image
531+
>>> nib.is_proxy(img.dataobj)
532+
True
533+
534+
The array is not yet cached by a call to "get_data", so:
535+
>>> img.in_memory
536+
False
537+
538+
After we call ``get_data`` using the default `caching='fill'`, the cache
539+
contains a reference to the returned array ``data``:
540+
541+
>>> data = img.get_data()
542+
>>> img.in_memory
543+
True
544+
545+
We modify an element in the returned data array:
546+
547+
>>> data[0, 0, 0, 0]
548+
0
549+
>>> data[0, 0, 0, 0] = 99
550+
>>> data[0, 0, 0, 0]
551+
99
552+
553+
The next time we call 'get_data', the method returns the cached
554+
reference to the (modified) array:
555+
556+
>>> data_again = img.get_data()
557+
>>> data_again is data
558+
True
559+
>>> data_again[0, 0, 0, 0]
560+
99
561+
562+
If you had *initially* used `caching` == 'unchanged' then the returned
563+
``data`` array would have been loaded from file, but not cached, and:
564+
565+
>>> img = nib.load(img_fname) # a proxy image again
566+
>>> data = img.get_data(caching='unchanged')
567+
>>> img.in_memory
568+
False
569+
>>> data[0, 0, 0, 0] = 99
570+
>>> data_again = img.get_data(caching='unchanged')
571+
>>> data_again is data
572+
False
573+
>>> data_again[0, 0, 0, 0]
574+
0
482575
"""
483576
if caching not in ('fill', 'unchanged'):
484577
raise ValueError('caching value should be "fill" or "unchanged"')
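
The caching rules described in the new docstring can be illustrated without nibabel. The following is a minimal sketch, not the library's implementation: `ToyProxyImage`, its `_loader` callable, its `_data_cache` attribute, and `load_zeros` are hypothetical names invented for illustration, and a plain Python list stands in for the image array. Only the `caching` values, the `in_memory` property, and the `uncache` method mirror names that appear in the docstring.

```python
class ToyProxyImage:
    """Hypothetical stand-in for a proxy image (not nibabel code)."""

    def __init__(self, loader):
        self._loader = loader    # assumed: a callable playing the role of the array proxy
        self._data_cache = None  # the cache starts empty

    @property
    def in_memory(self):
        # True once a reference to the loaded data is held in the cache
        return self._data_cache is not None

    def get_data(self, caching='fill'):
        if caching not in ('fill', 'unchanged'):
            raise ValueError('caching value should be "fill" or "unchanged"')
        if self._data_cache is not None:
            # A full cache is returned under either setting
            return self._data_cache
        data = self._loader()    # simulate a fresh load from disk
        if caching == 'fill':
            self._data_cache = data  # keep a reference for later calls
        return data

    def uncache(self):
        # Drop the cached reference so the data can be freed
        self._data_cache = None


def load_zeros():
    # Each call returns a new object, like a fresh read from file
    return [0, 0, 0, 0]


img = ToyProxyImage(load_zeros)

# 'unchanged' on an empty cache: every call reloads, so edits do not persist
first = img.get_data(caching='unchanged')
first[0] = 99
second = img.get_data(caching='unchanged')

# 'fill' stores the reference; later calls return the same (modified) object
cached = img.get_data()
cached[0] = 99
again = img.get_data(caching='unchanged')
```

In this sketch `second` is a freshly loaded list, so the earlier assignment to `first` is not visible in it, while `again` is the same object as `cached` and therefore reflects the modification. Calling `img.uncache()` afterwards would empty the cache again.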