
Commit d7ce13a

DOC+TST - scaling in principle
Set out the process of using the scalefactor and intercept in a somewhat formal way to explain it to myself. Test the rules determining output dtypes and show that numpy 1.5.1 differs from numpy 1.6.0
1 parent 2a96c40 commit d7ce13a

File tree

2 files changed: +159 -0 lines changed


doc/source/devel/scaling.rst

Lines changed: 73 additions & 0 deletions
###########################
Scalefactors and intercepts
###########################

SPM Analyze and nifti1 images have *scalefactors*. nifti1 images also have
*intercepts*. If ``A`` is an array in memory, and ``S`` is the array that will
be written to disk, then::

    R = (A - intercept) / scalefactor

and ``R == S`` if ``R`` already has the data dtype we need to write.
If we load the image from disk, we exactly recover ``S`` (and ``R``). To get
something approximating ``A`` (say ``A_prime``) we apply the scalefactor and
intercept::

    A_prime = (S * scalefactor) + intercept

In a perfect world ``A`` would be exactly the same as ``A_prime``. However,
``scalefactor`` and ``intercept`` are floating point values. With floating
point, if ``r = (a - b) / c; p = (r * c) + b`` it is not necessarily true that
``p == a``. For example:

>>> import numpy as np
>>> a = 10
>>> b = np.e
>>> c = np.pi
>>> r = (a - b) / c
>>> p = (r * c) + b
>>> p == a
False

So there will be some error in this reconstruction, even when ``R`` is the same
type as ``S``.
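The size of that round-trip error can be checked directly. A quick sketch using the same values as the example above; the ``1e-12`` bound is a generous illustration, not a derived limit:

```python
import numpy as np

a = 10.0
b = np.e   # plays the role of the intercept
c = np.pi  # plays the role of the scalefactor
r = (a - b) / c
p = (r * c) + b

# The round trip is not exact, but the error is tiny -- on the order of
# the float64 machine epsilon scaled by the magnitude of the values.
err = abs(p - a)
assert p != a
assert err < 1e-12
```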
More common is the situation where ``R`` is a different type from ``S``. If
``R`` is of type ``r_dtype``, ``S`` is of type ``s_dtype`` and
``cast_function(R, dtype)`` is some function that casts ``R`` to the desired
type ``dtype``, then::

    R = (A - intercept) / scalefactor
    S = cast_function(R, s_dtype)
    R_prime = cast_function(S, r_dtype)
    A_prime = (R_prime * scalefactor) + intercept

The type of ``R`` will depend on what numpy did for upcasting ``A, intercept,
scalefactor``.
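The four steps above can be sketched in numpy. The ``cast_function`` here is a hypothetical round-clip-convert cast, chosen for illustration; it is not nibabel's actual casting implementation:

```python
import numpy as np

def cast_function(arr, dtype):
    # Hypothetical cast: round, clip to the target range for integer
    # targets, then convert.  A real writer must be more careful.
    arr = np.asarray(arr)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        arr = np.clip(np.rint(arr), info.min, info.max)
    return arr.astype(dtype)

A = np.array([0.0, 1.1, 100.0])  # the in-memory array
intercept, scalefactor = 0.5, 0.25
s_dtype, r_dtype = np.int16, np.float64

R = (A - intercept) / scalefactor
S = cast_function(R, s_dtype)        # what goes to disk
R_prime = cast_function(S, r_dtype)  # recovered on load
A_prime = (R_prime * scalefactor) + intercept

# Rounding to integers bounds the error by half a scalefactor step
assert np.abs(A - A_prime).max() <= scalefactor / 2 + 1e-12
```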
In order that ``cast_function(S, r_dtype)`` can best reverse ``cast_function(R,
s_dtype)``, the second call needs to know the type of ``R``, which is not
stored. The type of ``R`` depends on the types of ``A`` and of ``intercept,
scalefactor``. We don't know the type of ``A`` because it is not stored.

``R`` is likely to be a floating point type because of the application of the
scalefactor and intercept. If ``(intercept, scalefactor)`` are not the identity
(0, 1), then we can ensure that ``R`` is at minimum the type of ``intercept,
scalefactor`` by making these be at least 1D arrays, so that their floating
point types will win the upcast in ``R = (A - intercept) / scalefactor``.
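The effect of making ``intercept, scalefactor`` at least 1D arrays can be seen directly. A minimal demonstration; the corresponding scalar case is deliberately omitted because its result depends on the numpy version:

```python
import numpy as np

A = np.zeros((3,), dtype=np.int8)           # integer data array
intercept = np.atleast_1d(np.float64(3.0))  # 1D, so its dtype wins the upcast
scalefactor = np.atleast_1d(np.float64(2.0))

R = (A - intercept) / scalefactor
# R is upcast to at least float64, the dtype of the 1D intercept /
# scalefactor arrays.  With float64 *scalars* instead, the result dtype
# would depend on the numpy version's scalar casting rules.
assert R.dtype == np.float64
```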
See the file ``nibabel/tests/test_cast_assumptions.py`` for tests of the
predictability of the output dtype given the input dtypes.

The cast of ``R`` to ``S`` and back to ``R_prime`` can lose precision if the
types of ``R`` and ``S`` have different resolution.
Our job is to select:

* scalefactor
* intercept
* ``cast_function``

such that we minimize some measure of difference between ``A`` and
``A_prime``.
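As an illustration of one possible choice, the sketch below fits the data range of ``A`` onto the full range of the output integer type. This naive range-fit is an assumption for demonstration, not nibabel's actual algorithm, and ``choose_scaling`` is a hypothetical helper:

```python
import numpy as np

def choose_scaling(A, s_dtype):
    """Pick (scalefactor, intercept) mapping A's range onto s_dtype's range.

    A naive range-fit sketch; a real writer must also handle NaNs and
    scalefactor-only (Analyze) formats.
    """
    info = np.iinfo(s_dtype)
    a_min, a_max = float(A.min()), float(A.max())
    if a_min == a_max:  # constant data: any scalefactor works
        return 1.0, a_min
    scalefactor = (a_max - a_min) / (info.max - info.min)
    intercept = a_min - info.min * scalefactor
    return scalefactor, intercept

A = np.array([-1.0, 0.3, 2.0])
scalefactor, intercept = choose_scaling(A, np.int16)
R = (A - intercept) / scalefactor
S = np.rint(R).astype(np.int16)
A_prime = (S * scalefactor) + intercept

# Error per voxel is at most half a scalefactor step (plus float fuzz)
assert np.abs(A - A_prime).max() <= scalefactor / 2 + 1e-9
```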
Lines changed: 86 additions & 0 deletions
""" Investigate casting rules

Specifically, does the data affect the output type?

To do this, we combine all types and investigate the output, when:

* type A, B have data (0)
* type A has max / min for type A, type B has data (0)
* type A has max / min for type A, type B has max / min for type B

We expect that in all cases the same dtype will result.

In fact this _is_ true if A and B are both at least 1D arrays, but it is not
true if A or B is a scalar (for numpy 1.6.1). It looks as if numpy 1.6.1 first
checks whether the scalar B can be cast to the type of A (``np.can_cast``); if
so, the return type is the type of A, otherwise it uses the array casting
rules.

Thus - for numpy 1.6.1::

>>> import numpy as np
>>> Adata = np.array([127], dtype=np.int8)
>>> Bdata = np.int16(127)
>>> (Adata + Bdata).dtype
dtype('int8')
>>> Bdata = np.int16(128)
>>> (Adata + Bdata).dtype
dtype('int16')
>>> Bdata = np.array([127], dtype=np.int16)
>>> (Adata + Bdata).dtype
dtype('int16')
"""
from distutils.version import LooseVersion

import numpy as np

from nose.tools import assert_equal

NP_VERSION = LooseVersion(np.__version__)

ALL_TYPES = (np.sctypes['int'] + np.sctypes['uint'] + np.sctypes['float'] +
             np.sctypes['complex'])


def get_info(np_type):
    # Return floating point info for float types, else integer info
    try:
        return np.finfo(np_type)
    except ValueError:
        return np.iinfo(np_type)
def test_cast_assumptions():
    # Check that dtype is predictable from binary operations
    npa = np.array
    for A in ALL_TYPES:
        a_info = get_info(A)
        for B in ALL_TYPES:
            b_info = get_info(B)
            Adata = np.zeros((2,), dtype=A)
            Bdata = np.zeros((2,), dtype=B)
            Bscalar = B(0)  # 0 can always be cast to type A
            out_dtype = (Adata + Bdata).dtype
            out_sc_dtype = (Adata + Bscalar).dtype
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            Adata[0], Adata[1] = a_info.min, a_info.max
            assert_equal(out_dtype, (Adata + Bdata).dtype)
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            # Combined array gives same dtype
            assert_equal(out_dtype, npa([Adata[0:1], Bdata[0:1]]).dtype)
            assert_equal(out_sc_dtype, (Adata + Bscalar).dtype)
            assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            # Combined array with scalars gives promoted (can_cast) dtype
            assert_equal(out_dtype, npa([Adata[0], Bscalar]).dtype)
            Bdata[0], Bdata[1] = b_info.min, b_info.max
            Bscalar = B(b_info.max)  # cannot always be cast to type A
            assert_equal(out_dtype, (Adata + Bdata).dtype)
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            # Combined array with scalars - promoted dtype
            assert_equal(out_dtype, npa([Adata[0], Bscalar]).dtype)
            # Here numpy >= 1.6.0 differs from previous versions
            if NP_VERSION <= '1.5.1' or np.can_cast(Bscalar, A):
                assert_equal(out_sc_dtype, (Adata + Bscalar).dtype)
                assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            else:  # casting rules changed for 1.6 onwards
                assert_equal(out_dtype, (Adata + Bscalar).dtype)
                assert_equal(out_dtype, (Adata * Bscalar).dtype)
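As a side note (not part of the original test), numpy from 1.6 onwards also exposes its promotion logic directly through ``np.result_type``. For two array operands the result depends only on the dtypes, never on the values, in every numpy version:

```python
import numpy as np

Adata = np.array([127], dtype=np.int8)
Bdata = np.array([127], dtype=np.int16)

# Array-with-array promotion is purely dtype-based
out = np.result_type(Adata, Bdata)
assert out == np.dtype('int16')
assert (Adata + Bdata).dtype == out
```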
