Commit 6adeb15

DOC: get_data docstring rewrite
Rewrite of docstring for get_data to give more details on the cache.
1 parent b91336e commit 6adeb15

1 file changed, +77 -13 lines changed

nibabel/spatialimages.py

Lines changed: 77 additions & 13 deletions
@@ -451,10 +451,10 @@ def __str__(self):
     def get_data(self, caching='fill'):
         """ Return image data from image with any necessary scaling applied
 
-        If the image data is a array proxy (data not yet read from disk) then
-        the default behavior (`caching` == "fill") is to read the data, and
-        store in an internal cache. Future calls to ``get_data`` will return
-        the cached copy.
+        If the image data is an array proxy (an object that knows how to load
+        the image data from disk) then the default behavior (`caching` ==
+        "fill") is to read the data from the proxy, and store it in an internal
+        cache. Future calls to ``get_data`` will return the cached copy.
 
         Once the data has been cached and returned from a proxy array, the
         cached array can be modified by modifying the returned array, because
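
The rewritten summary describes an array proxy as an object that knows how to load the image data from disk. The following is a minimal sketch of that idea only, not nibabel's own `ArrayProxy` class: the hypothetical `ToyArrayProxy` keeps just the means to produce the data, a Python list stands in for the file on disk, and `array.array` stands in for the loaded numpy array so the example runs without any image file.

```python
from array import array


class ToyArrayProxy:
    """Hypothetical array proxy: holds no array data, only a way to load it."""

    def __init__(self, source):
        # `source` plays the role of the file on disk (an assumption for this
        # sketch; a real proxy would hold a filename or file object).
        self._source = source
        self.n_loads = 0  # counts how often the data was "read from disk"

    def load(self):
        # Every call reads a fresh copy, as re-reading a file would.
        self.n_loads += 1
        return array('d', self._source)


proxy = ToyArrayProxy([1.0, 2.0, 3.0])
first = proxy.load()
second = proxy.load()
print(proxy.n_loads)    # 2: each load re-reads the data
print(first is second)  # False: each load returns a new array
```

Because every `load` re-reads and returns a new array, an image that held only a proxy would repeat the disk read on each `get_data` call; keeping a reference to the first loaded array avoids that, which is the cache the docstring describes.
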
@@ -465,20 +465,84 @@ def get_data(self, caching='fill'):
 
         Parameters
         ----------
         caching : {'fill', 'unchanged'}, optional
-            This argument has no effect in the case where the image data is an
-            array, or the image data has already been cached. If the image data
-            is an array proxy, and the image data has not yet been cached, then
-            'fill' (the default) will read the data from the array proxy, and
-            store in an internal cache, so that future calls to ``get_data``
-            will return the cached copy. If 'unchanged' then leave the current
-            state of caching unchanged; return the cached copy if it exists, if
-            not, load the data from disk and return that, but without filling
-            the cache.
+            See the Notes section for a detailed explanation. This argument
+            specifies whether the image object should fill in an internal
+            cached reference to the returned image data array. "fill" specifies
+            that the image should fill an internal cached reference if
+            currently empty. Future calls to ``get_data`` will return this
+            cached reference. You might prefer "fill" to save the image object
+            from having to reload the array data from disk on each call to
+            ``get_data``. "unchanged" means that the image should not fill in
+            the internal cached reference if the cache is currently empty. You
+            might prefer "unchanged" to "fill" if you want to make sure that
+            the call to ``get_data`` does not create an extra (cached)
+            reference to the returned array. In this case it is easier for
+            Python to free the memory from the returned array.
 
         Returns
         -------
         data : array
             array of image data
+
+        See Also
+        --------
+        uncache : empty the array data cache
+
+        Notes
+        -----
+        All images have a property ``dataobj`` that represents the image array
+        data. Images that have been loaded from files usually do not load the
+        array data from file immediately, in order to reduce image load time
+        and memory use. For these images, ``dataobj`` is an *array proxy*; an
+        object that knows how to load the image array data from file. Images
+        with an array proxy ``dataobj`` are called *proxy images*. In contrast,
+        images created directly from numpy arrays carry a simple reference to
+        their array data in ``dataobj``. These are *in-memory images*.
+
+        By default (`caching` == "fill"), when you call ``get_data`` on a
+        proxy image, we load the array data from disk, store (cache) an
+        internal reference to this array data, and return the array. The next
+        time you call ``get_data``, you will get the cached reference to the
+        array, so we don't have to load the array data from disk again.
+
+        In-memory images are already in memory, so there is no benefit to
+        caching, and the `caching` keyword has no effect.
+
+        For proxy images, you may not want to fill the cache after reading the
+        data from disk because the cache will hold onto the array memory until
+        the image object is deleted, or you use the image ``uncache`` method.
+        If you don't want to fill the cache, then always use
+        ``get_data(caching='unchanged')``; in this case ``get_data`` will not
+        fill the cache (store the reference to the array) if the cache is empty
+        (no reference to the array). If the cache is full, "unchanged" leaves
+        the cache full and returns the cached array reference.
+
+        The cache can affect the behavior of the image, because if the cache is
+        full, or you have an in-memory image, then modifying the returned array
+        will modify the result of future calls to ``get_data()``. For example,
+        you might do this:
+
+            img = load('my_image.nii') # a proxy image
+            data = img.get_data()
+            data[0, 0, 0] = 99
+
+        In this case the cache is full (default `caching='fill'`), and the
+        cache contains a reference to the returned array ``data``, so the next
+        time you call ``get_data()``:
+
+            data_again = img.get_data()
+            data_again is data # will be True
+            data_again[0, 0, 0] == 99 # will be True
+
+        If you had *initially* used `caching` == 'unchanged' then the returned
+        ``data`` array is loaded from file, but not cached, and:
+
+            img = load('my_image.nii') # a proxy image
+            data = img.get_data(caching='unchanged')
+            data[0, 0, 0] = 99
+            data_again = img.get_data(caching='unchanged')
+            data_again is data # will be False
+            data_again[0, 0, 0] == 99 # will be False
         """
         if caching not in ('fill', 'unchanged'):
             raise ValueError('caching value should be "fill" or "unchanged"')
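
The two examples in the Notes (default `caching='fill'` versus `caching='unchanged'`) can be reproduced with a small pure-Python model. This is a sketch under stated assumptions, not nibabel's implementation: `ToyProxyImage`, `_load`, `_cache` and `load_count` are hypothetical names, the model only mirrors the rules the docstring states (fill the cache if empty, return the cached reference if full, reject other `caching` values), and `array.array` stands in for a numpy array, so a flat index replaces `data[0, 0, 0]`.

```python
from array import array


class ToyProxyImage:
    """Hypothetical model of the get_data caching rules (not nibabel code)."""

    def __init__(self, source):
        self._source = source  # stands in for the array proxy / file on disk
        self._cache = None     # internal cached reference, initially empty
        self.load_count = 0    # number of simulated reads from disk

    def _load(self):
        # Simulated read through the proxy: always returns a new array.
        self.load_count += 1
        return array('d', self._source)

    def get_data(self, caching='fill'):
        if caching not in ('fill', 'unchanged'):
            raise ValueError('caching value should be "fill" or "unchanged"')
        if self._cache is not None:
            # Cache full: both modes return the cached reference.
            return self._cache
        data = self._load()
        if caching == 'fill':
            # Cache empty and caching='fill': store the reference.
            self._cache = data
        return data

    def uncache(self):
        # Drop the internal reference; the next get_data reloads.
        self._cache = None


# Default caching='fill': the returned array is the cached array.
img = ToyProxyImage([0.0, 1.0, 2.0])
data = img.get_data()
data[0] = 99
data_again = img.get_data()
print(data_again is data, data_again[0] == 99)  # True True

# caching='unchanged' from the start: each call reloads from "disk".
img = ToyProxyImage([0.0, 1.0, 2.0])
data = img.get_data(caching='unchanged')
data[0] = 99
data_again = img.get_data(caching='unchanged')
print(data_again is data, data_again[0] == 99)  # False False
```

The `uncache` method in the sketch mirrors the ``uncache`` method named in the docstring: after `img.uncache()`, the next `get_data()` call goes back to the proxy and reads the data again.
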

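The Notes state that a filled cache holds onto the array memory until the image is deleted or ``uncache`` is called, and that "unchanged" makes it easier for Python to free the returned array. The reference-counting point behind this can be made visible with a `weakref`. In this minimal sketch a plain dict `cache` (hypothetical) stands in for the image's internal cache slot, `read_from_disk` (hypothetical) simulates loading through the proxy, and `array.array` stands in for a numpy array because it supports weak references. The immediate freeing after `del` relies on CPython's reference counting.

```python
import weakref
from array import array

# Hypothetical stand-in for the image's internal cache slot.
cache = {}


def read_from_disk():
    """Hypothetical simulated read through an array proxy."""
    return array('d', [0.0] * 1000)


# "fill" analogue: the cache holds a second reference to the array.
data = read_from_disk()
cache['data'] = data
ref = weakref.ref(data)
del data                        # the caller drops its reference...
kept_alive = ref() is not None  # ...but the cache keeps the array alive

# "uncache" analogue: dropping the cached reference frees the array.
cache.clear()
freed_after_uncache = ref() is None

# "unchanged" analogue: no cached reference, so del frees it at once.
data = read_from_disk()
ref = weakref.ref(data)
del data
freed_without_cache = ref() is None

print(kept_alive, freed_after_uncache, freed_without_cache)  # True True True
```
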