
Commit d7ce13a

DOC+TST - scaling in principle
Set out the process of using the scalefactor and intercept in a somewhat formal way to explain it to myself. Test the rules determining output dtypes and show that numpy 1.5.1 differs from numpy 1.6.0
1 parent 2a96c40 commit d7ce13a

File tree

2 files changed: +159 -0 lines changed


doc/source/devel/scaling.rst

Lines changed: 73 additions & 0 deletions
###########################
Scalefactors and intercepts
###########################

SPM Analyze and nifti1 images have *scalefactors*. nifti1 images also have
*intercepts*. If ``A`` is an array in memory, and ``S`` is the array that will
be written to disk, then::

    R = (A - intercept) / scalefactor

and ``R == S`` if ``R`` already has the data dtype we need to write.
If we load the image from disk, we exactly recover ``S`` (and ``R``). To get
something approximating ``A`` (say ``A_prime``) we apply the scalefactor and
intercept::

    A_prime = (S * scalefactor) + intercept

In a perfect world ``A`` would be exactly the same as ``A_prime``. However,
``scalefactor`` and ``intercept`` are floating point values. With floating
point, if ``r = (a - b) / c; p = (r * c) + b`` it is not necessarily true that
``p == a``. For example:

>>> import numpy as np
>>> a = 10
>>> b = np.e
>>> c = np.pi
>>> r = (a - b) / c
>>> p = (r * c) + b
>>> p == a
False

So there will be some error in this reconstruction, even when ``R`` is the same
type as ``S``.
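The size of that round-trip error can be checked directly. A quick sketch using the same values as the example above; the ``1e-12`` bound is a generous illustration, not a derived limit:

```python
import numpy as np

a = 10.0
b = np.e   # plays the role of the intercept
c = np.pi  # plays the role of the scalefactor
r = (a - b) / c
p = (r * c) + b

# The round trip is not exact, but the error is tiny -- on the order of
# the float64 machine epsilon scaled by the magnitude of the values.
err = abs(p - a)
assert p != a
assert err < 1e-12
```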
More common is the situation where ``R`` is a different type from ``S``. If
``R`` is of type ``r_dtype``, ``S`` is of type ``s_dtype`` and
``cast_function(R, dtype)`` is some function that casts ``R`` to the desired
type ``dtype``, then::

    R = (A - intercept) / scalefactor
    S = cast_function(R, s_dtype)
    R_prime = cast_function(S, r_dtype)
    A_prime = (R_prime * scalefactor) + intercept

The type of ``R`` will depend on what numpy did for upcasting ``A, intercept,
scalefactor``.
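The four steps above can be sketched in numpy. The ``cast_function`` here is a hypothetical round-clip-convert cast, chosen for illustration; it is not nibabel's actual casting implementation:

```python
import numpy as np

def cast_function(arr, dtype):
    # Hypothetical cast: round, clip to the target range for integer
    # targets, then convert.  A real writer must be more careful.
    arr = np.asarray(arr)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        arr = np.clip(np.rint(arr), info.min, info.max)
    return arr.astype(dtype)

A = np.array([0.0, 1.1, 100.0])  # the in-memory array
intercept, scalefactor = 0.5, 0.25
s_dtype, r_dtype = np.int16, np.float64

R = (A - intercept) / scalefactor
S = cast_function(R, s_dtype)        # what goes to disk
R_prime = cast_function(S, r_dtype)  # recovered on load
A_prime = (R_prime * scalefactor) + intercept

# Rounding to integers bounds the error by half a scalefactor step
assert np.abs(A - A_prime).max() <= scalefactor / 2 + 1e-12
```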
In order that ``cast_function(S, r_dtype)`` can best reverse ``cast_function(R,
s_dtype)``, the second call needs to know the type of ``R``, which is not
stored. The type of ``R`` depends on the types of ``A`` and of ``intercept,
scalefactor``. We don't know the type of ``A`` because it is not stored.

``R`` is likely to be a floating point type because of the application of the
scalefactor and intercept. If ``(intercept, scalefactor)`` are not the identity
(0, 1), then we can ensure that ``R`` is at minimum the type of ``intercept,
scalefactor`` by making these be at least 1D arrays, so that their floating
point types will win the upcast in ``R = (A - intercept) / scalefactor``.
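The effect of making ``intercept, scalefactor`` at least 1D arrays can be seen directly. A minimal demonstration; the corresponding scalar case is deliberately omitted because its result depends on the numpy version:

```python
import numpy as np

A = np.zeros((3,), dtype=np.int8)           # integer data array
intercept = np.atleast_1d(np.float64(3.0))  # 1D, so its dtype wins the upcast
scalefactor = np.atleast_1d(np.float64(2.0))

R = (A - intercept) / scalefactor
# R is upcast to at least float64, the dtype of the 1D intercept /
# scalefactor arrays.  With float64 *scalars* instead, the result dtype
# would depend on the numpy version's scalar casting rules.
assert R.dtype == np.float64
```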
See the file ``nibabel/tests/test_cast_assumptions.py`` for tests of the
predictability of the output dtype given the input dtypes.

The cast of ``R`` to ``S`` and back to ``R_prime`` can lose precision if the
types of ``R`` and ``S`` have different resolution.
Our job is to select:

* scalefactor
* intercept
* ``cast_function``

such that we minimize some measure of difference between ``A`` and
``A_prime``.
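As an illustration of one possible choice, the sketch below fits the data range of ``A`` onto the full range of the output integer type. This naive range-fit is an assumption for demonstration, not nibabel's actual algorithm, and ``choose_scaling`` is a hypothetical helper:

```python
import numpy as np

def choose_scaling(A, s_dtype):
    """Pick (scalefactor, intercept) mapping A's range onto s_dtype's range.

    A naive range-fit sketch; a real writer must also handle NaNs and
    scalefactor-only (Analyze) formats.
    """
    info = np.iinfo(s_dtype)
    a_min, a_max = float(A.min()), float(A.max())
    if a_min == a_max:  # constant data: any scalefactor works
        return 1.0, a_min
    scalefactor = (a_max - a_min) / (info.max - info.min)
    intercept = a_min - info.min * scalefactor
    return scalefactor, intercept

A = np.array([-1.0, 0.3, 2.0])
scalefactor, intercept = choose_scaling(A, np.int16)
R = (A - intercept) / scalefactor
S = np.rint(R).astype(np.int16)
A_prime = (S * scalefactor) + intercept

# Error per voxel is at most half a scalefactor step (plus float fuzz)
assert np.abs(A - A_prime).max() <= scalefactor / 2 + 1e-9
```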
Lines changed: 86 additions & 0 deletions
""" Investigate casting rules

Specifically, does the data affect the output type?

To do this, we combine all types and investigate the output, when:

* type A, B have data (0)
* type A has max / min for type A, type B has data (0)
* type A has max / min for type A, type B has max / min for type B

We expect that in all cases the same dtype will result.

In fact this _is_ true if A and B are both at least 1D arrays, but it is not
true if A or B is a scalar (for numpy 1.6.1). It looks as if numpy 1.6.1 first
checks whether the scalar B can be cast to the type of A (``np.can_cast``); if
so, the return type is the type of A, otherwise it uses the array casting
rules.

Thus - for numpy 1.6.1::

>>> import numpy as np
>>> Adata = np.array([127], dtype=np.int8)
>>> Bdata = np.int16(127)
>>> (Adata + Bdata).dtype
dtype('int8')
>>> Bdata = np.int16(128)
>>> (Adata + Bdata).dtype
dtype('int16')
>>> Bdata = np.array([127], dtype=np.int16)
>>> (Adata + Bdata).dtype
dtype('int16')
"""
from distutils.version import LooseVersion

import numpy as np

from nose.tools import assert_equal

NP_VERSION = LooseVersion(np.__version__)

ALL_TYPES = (np.sctypes['int'] + np.sctypes['uint'] + np.sctypes['float'] +
             np.sctypes['complex'])


def get_info(np_type):
    # Return floating point info for float types, else integer info
    try:
        return np.finfo(np_type)
    except ValueError:
        return np.iinfo(np_type)
def test_cast_assumptions():
    # Check that dtype is predictable from binary operations
    npa = np.array
    for A in ALL_TYPES:
        a_info = get_info(A)
        for B in ALL_TYPES:
            b_info = get_info(B)
            Adata = np.zeros((2,), dtype=A)
            Bdata = np.zeros((2,), dtype=B)
            Bscalar = B(0)  # 0 can always be cast to type A
            out_dtype = (Adata + Bdata).dtype
            out_sc_dtype = (Adata + Bscalar).dtype
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            Adata[0], Adata[1] = a_info.min, a_info.max
            assert_equal(out_dtype, (Adata + Bdata).dtype)
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            # Combined array gives same dtype
            assert_equal(out_dtype, npa([Adata[0:1], Bdata[0:1]]).dtype)
            assert_equal(out_sc_dtype, (Adata + Bscalar).dtype)
            assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            # Combined array with scalars gives promoted (can_cast) dtype
            assert_equal(out_dtype, npa([Adata[0], Bscalar]).dtype)
            Bdata[0], Bdata[1] = b_info.min, b_info.max
            Bscalar = B(b_info.max)  # cannot always be cast to type A
            assert_equal(out_dtype, (Adata + Bdata).dtype)
            assert_equal(out_dtype, (Adata * Bdata).dtype)
            # Combined array with scalars - promoted dtype
            assert_equal(out_dtype, npa([Adata[0], Bscalar]).dtype)
            # Here numpy >= 1.6.0 differs from previous versions
            if NP_VERSION <= '1.5.1' or np.can_cast(Bscalar, A):
                assert_equal(out_sc_dtype, (Adata + Bscalar).dtype)
                assert_equal(out_sc_dtype, (Adata * Bscalar).dtype)
            else:  # casting rules changed for 1.6 onwards
                assert_equal(out_dtype, (Adata + Bscalar).dtype)
                assert_equal(out_dtype, (Adata * Bscalar).dtype)
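As a side note (not part of the original test), numpy from 1.6 onwards also exposes its promotion logic directly through ``np.result_type``. For two array operands the result depends only on the dtypes, never on the values, in every numpy version:

```python
import numpy as np

Adata = np.array([127], dtype=np.int8)
Bdata = np.array([127], dtype=np.int16)

# Array-with-array promotion is purely dtype-based
out = np.result_type(Adata, Bdata)
assert out == np.dtype('int16')
assert (Adata + Bdata).dtype == out
```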
