Commit a07c90c

seberg and mdhaber authored

DOC: Add documentation explaining our promotion rules (numpy#25705)

* DOC: Add documentation explaining our promotion rules

  This adds a dedicated page about promotion rules.

* Try clarifying the promotion figure a bit more and maybe fix doc warnings/errors
* DOC: adjustments to promotion rule documentation
* Small edits
* Address Marten's review comments
* Also adopt a suggestion by Matt (slightly rephrasing another part)
* Address/adopt Marten's review comments

---------

Co-authored-by: Matt Haberland <[email protected]>

1 parent 886d361 commit a07c90c

File tree: 3 files changed, +1728 −0 lines changed

Lines changed: 256 additions & 0 deletions

@@ -0,0 +1,256 @@
.. currentmodule:: numpy

.. _arrays.promotion:

****************************
Data type promotion in NumPy
****************************

When mixing two different data types, NumPy has to determine the appropriate
dtype for the result of the operation. This step is referred to as *promotion*
or *finding the common dtype*.

In typical cases, the user does not need to worry about the details of
promotion, since the promotion step usually ensures that the result will
either match or exceed the precision of the input.

For example, when the inputs are of the same dtype, the dtype of the result
matches the dtype of the inputs:

>>> np.int8(1) + np.int8(1)
np.int8(2)

Mixing two different dtypes normally produces a result with the dtype of the
higher precision input:

>>> np.int8(4) + np.int64(8)  # 64 > 8
np.int64(12)
>>> np.float32(3) + np.float16(3)  # 32 > 16
np.float32(6.0)

In typical cases, this does not lead to surprises. However, if you work with
non-default dtypes like unsigned integers and low-precision floats, or if you
mix NumPy integers, NumPy floats, and Python scalars, some details of the
NumPy promotion rules may be relevant. Note that these detailed rules do not
always match those of other languages [#hist-reasons]_.

Numerical dtypes come in four "kinds" with a natural hierarchy:

1. unsigned integers (``uint``)
2. signed integers (``int``)
3. float (``float``)
4. complex (``complex``)

In addition to their kind, NumPy numerical dtypes also have an associated
precision, specified in bits. Together, the kind and precision specify the
dtype. For example, a ``uint8`` is an unsigned integer stored using 8 bits.

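The kind and precision of a dtype can be inspected directly on a ``dtype``
instance; a minimal sketch:

```python
import numpy as np

# A dtype combines a kind (here 'u' for unsigned integer) with a
# precision in bits (itemsize is given in bytes).
dt = np.dtype("uint8")
print(dt.kind)            # 'u'
print(dt.itemsize * 8)    # 8 bits of precision
```
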
The result of an operation will always have a kind equal to or higher than
that of any of the inputs. Furthermore, the result will always have a
precision greater than or equal to those of the inputs. Together, these rules
can already lead to some results which may be unexpected:

1. When mixing floating point numbers and integers, the precision of the
   integer may force the result to a higher precision floating point. For
   example, the result of an operation involving ``int64`` and ``float16``
   is ``float64``.
2. When mixing unsigned and signed integers with the same precision, the
   result will have *higher* precision than either input. Additionally,
   if one of them is already 64-bit, no higher precision integer is
   available, so for example an operation involving ``int64`` and
   ``uint64`` gives ``float64``.

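Both surprises can be checked directly with `np.result_type`; a short
illustration:

```python
import numpy as np

# The int64 precision forces the floating point result up to float64:
print(np.result_type(np.int64, np.float16))   # float64

# No signed integer dtype can hold every uint64 value, so NumPy
# promotes the mixed signed/unsigned 64-bit case to float64:
print(np.result_type(np.int64, np.uint64))    # float64

# Below 64 bits, mixed signed/unsigned promotes to a wider integer:
print(np.result_type(np.int8, np.uint8))      # int16
```
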
Please see the `Numerical promotion`_ section and image below for details
on both.

Detailed behavior of Python scalars
-----------------------------------

Since NumPy 2.0 [#NEP50]_, an important point in our promotion rules is
that although operations involving two NumPy dtypes never lose precision,
operations involving a NumPy dtype and a Python scalar (``int``, ``float``,
or ``complex``) *can* lose precision. For instance, it is probably intuitive
that the result of an operation between a Python integer and a NumPy integer
should be a NumPy integer. However, Python integers have arbitrary precision
whereas all NumPy dtypes have fixed precision, so the arbitrary precision
of Python integers cannot be preserved.

More generally, NumPy considers the "kind" of Python scalars, but ignores
their precision when determining the result dtype. This is often convenient.
For instance, when working with arrays of a low precision dtype, it is usually
desirable for simple operations with Python scalars to preserve the dtype.

>>> arr_float32 = np.array([1, 2.5, 2.1], dtype="float32")
>>> arr_float32 + 10.0  # undesirable to promote to float64
array([11. , 12.5, 12.1], dtype=float32)
>>> arr_int16 = np.array([3, 5, 7], dtype="int16")
>>> arr_int16 + 10  # undesirable to promote to int64
array([13, 15, 17], dtype=int16)

In both cases, the result precision is dictated by the NumPy dtype.
Because of this, ``arr_float32 + 3.0`` behaves the same as
``arr_float32 + np.float32(3.0)``, and ``arr_int16 + 10`` behaves as
``arr_int16 + np.int16(10)``.

As another example, when mixing NumPy integers with a Python ``float``
or ``complex``, the result always has type ``float64`` or ``complex128``:

>>> np.int16(1) + 1.0
np.float64(2.0)

However, these rules can also lead to surprising behavior when working with
low precision dtypes.

First, since the Python value is converted to a NumPy one before the operation
can be performed, operations can fail with an error even when the result seems
obvious. For instance, ``np.int8(1) + 1000`` cannot proceed because ``1000``
exceeds the maximum value of an ``int8``. When the Python scalar
cannot be coerced to the NumPy dtype, an error is raised:

>>> np.int8(1) + 1000
Traceback (most recent call last):
...
OverflowError: Python integer 1000 out of bounds for int8
>>> np.int64(1) * 10**100
Traceback (most recent call last):
...
OverflowError: Python int too large to convert to C long
>>> np.float32(1) + 1e300
np.float32(inf)
... RuntimeWarning: overflow encountered in cast

Second, since the Python float or integer precision is always ignored, a low
precision NumPy scalar will keep using its lower precision unless explicitly
converted to a higher precision NumPy dtype or Python scalar (e.g. via
``int()``, ``float()``, or ``scalar.item()``). This lower precision may be
detrimental to some calculations or lead to incorrect results, especially in
the case of integer overflows:

>>> np.int8(100) + 100  # the result exceeds the capacity of int8
np.int8(-56)
... RuntimeWarning: overflow encountered in scalar add

Note that NumPy warns when overflows occur for scalars, but not for arrays;
e.g., ``np.array(100, dtype="uint8") + 100`` will *not* warn.

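As noted above, an explicit conversion to a higher precision type avoids the
overflow; a small sketch:

```python
import numpy as np

x = np.int8(100)

# Converting to a Python int first gives exact arbitrary-precision
# arithmetic, so the result is no longer clamped to int8:
print(int(x) + 100)        # 200

# Converting to a wider NumPy dtype also avoids the overflow:
print(np.int64(x) + 100)   # 200
```
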
Numerical promotion
-------------------

The following image shows the numerical promotion rules with the kinds
on the vertical axis and the precision on the horizontal axis.

.. figure:: figures/nep-0050-promotion-no-fonts.svg
    :figclass: align-center

    The input dtype with the higher kind determines the kind of the result
    dtype. The result dtype has a precision as low as possible without
    appearing to the left of either input dtype in the diagram.

Note the following specific rules and observations:

1. When a Python ``float`` or ``complex`` interacts with a NumPy integer
   the result will be ``float64`` or ``complex128`` (yellow border).
   NumPy booleans will also be cast to the default integer [#default-int]_.
   This rule does not apply when NumPy floating point values are also
   involved.
2. The precision is drawn such that ``float16 < int16 < uint16`` because
   large ``uint16`` values do not fit into ``int16`` and large ``int16``
   values lose precision when stored in a ``float16``.
   This pattern is broken, however, by the fact that NumPy always considers
   ``float64`` and ``complex128`` to be acceptable promotion results for
   any integer value.
3. A special case is that NumPy promotes many combinations of signed and
   unsigned integers to ``float64``. A higher kind is used here because no
   signed integer dtype is sufficiently precise to hold a ``uint64``.

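The ordering in rule 2 and the special case in rule 3 can be verified with
`np.result_type`:

```python
import numpy as np

# float16 < int16 in the diagram: promoting them requires float32,
# since large int16 values lose precision in a float16:
print(np.result_type(np.float16, np.int16))   # float32

# int16 < uint16: large uint16 values do not fit into int16:
print(np.result_type(np.int16, np.uint16))    # int32

# No signed integer can hold every uint64, so the kind is raised:
print(np.result_type(np.int64, np.uint64))    # float64
```
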
Exceptions to the general promotion rules
-----------------------------------------

In NumPy, promotion refers to what specific functions do with their result
dtype, and in some cases this means that NumPy may deviate from what
`np.result_type` would give.

Behavior of ``sum`` and ``prod``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``np.sum`` and ``np.prod`` will always return the default integer type
when summing over integer values (or booleans). This is usually an ``int64``.
The reason for this is that integer summations are otherwise very likely
to overflow and give confusing results.
This rule also applies to the underlying ``np.add.reduce`` and
``np.multiply.reduce``.

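A quick illustration of this reduction rule:

```python
import numpy as np

a = np.array([100, 100], dtype=np.int8)

# Summing integers uses the default integer type, so the result does
# not wrap around in the input's int8 range:
print(np.sum(a))          # 200
print(np.sum(a).dtype)    # the default integer, usually int64
```
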
Notable behavior with NumPy or Python integer scalars
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NumPy promotion refers to the result dtype and operation precision,
but the operation will sometimes dictate that result.
Division always returns floating point values and comparisons always
return booleans.

This leads to what may appear as "exceptions" to the rules:

* NumPy comparisons with Python integers or mixed precision integers always
  return the correct result. The inputs will never be cast in a way which
  loses precision.
* Equality comparisons between types which cannot be promoted will be
  considered all ``False`` (equality) or all ``True`` (not-equal).
* Unary math functions like ``np.sin`` that always return floating point
  values accept any Python integer input by converting it to ``float64``.
* Division always returns floating point values and thus also allows
  divisions between any NumPy integer and any Python integer value by
  casting both to ``float64``.

In principle, some of these exceptions may make sense for other functions.
Please raise an issue if you feel this is the case.

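For example, the lossless comparisons and the float-returning division can be
seen directly:

```python
import numpy as np

# A comparison with a Python integer is exact, even when the value
# could not be represented in the other operand's dtype:
print(np.uint64(2**63) > -1)              # True

# Division between integers always yields float64:
print(np.int32(7) / np.int32(2))          # 3.5
print((np.int32(7) / np.int32(2)).dtype)  # float64
```
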
Promotion of non-numerical datatypes
------------------------------------

NumPy extends promotion to non-numerical types, although in many cases
promotion is not well defined and is simply rejected.

The following rules apply:

* NumPy byte strings (``np.bytes_``) can be promoted to unicode strings
  (``np.str_``). However, casting the bytes to unicode will fail for
  non-ASCII characters.
* For some purposes NumPy will promote almost any other datatype to strings.
  This applies to array creation and concatenation.
* Array constructors like ``np.array()`` will use ``object`` dtype when
  there is no viable promotion.
* Structured dtypes can promote when their field names and order match.
  In that case all fields are promoted individually.
* NumPy ``timedelta`` can in some cases promote with integers.

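Two of these rules in action; a small sketch:

```python
import numpy as np

# Byte strings promote to unicode strings (the length is carried over):
print(np.promote_types("S5", "U3"))       # <U5

# For array creation, most other dtypes can promote to strings:
print(np.array([1, "a"]).dtype.kind)      # 'U'

# With no viable promotion, np.array() falls back to object dtype:
print(np.array([{"x": 1}, 1]).dtype)      # object
```
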
.. note::
    Some of these rules are somewhat surprising, and are being considered for
    change in the future. However, any backward-incompatible changes have to
    be weighed against the risks of breaking existing code. Please raise an
    issue if you have particular ideas about how promotion should work.

Details of promoted ``dtype`` instances
---------------------------------------

The above discussion has mainly dealt with the behavior when mixing different
DType classes.
A ``dtype`` instance attached to an array can carry additional information
such as byte-order, metadata, string length, or the exact structured dtype
layout.

While the string length or field names of a structured dtype are important,
NumPy considers byte-order, metadata, and the exact layout of a structured
dtype as storage details.
During promotion NumPy does *not* take these storage details into account:

* Byte-order is converted to native byte-order.
* Metadata attached to the dtype may or may not be preserved.
* Resulting structured dtypes will be packed (but aligned if the inputs were).

This is the best behavior for most programs, where storage details are not
relevant to the final results and where the use of incorrect byte-order
could drastically slow down evaluation.

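For instance, byte-order is normalized to the native order during promotion:

```python
import numpy as np

# Promoting a big-endian and a little-endian int32 gives a native-order
# int32; the byte-order storage detail is discarded:
result = np.result_type(np.dtype(">i4"), np.dtype("<i4"))
print(result)            # int32
print(result.isnative)   # True
```
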
.. [#hist-reasons] To a large degree, this may just be due to choices made
   early on in NumPy's predecessors. For more details, see
   :ref:`NEP 50 <NEP50>`.

.. [#NEP50] See also :ref:`NEP 50 <NEP50>`, which changed the rules for
   NumPy 2.0.
   Previous versions of NumPy would sometimes return higher precision results
   based on the input value of Python scalars.
   Further, previous versions of NumPy would typically ignore the higher
   precision of NumPy scalars or 0-D arrays for promotion purposes.

.. [#default-int] The default integer is marked as ``int64`` in the schema
   but is ``int32`` on 32-bit platforms (most modern platforms are 64-bit).

doc/source/reference/arrays.rst

Lines changed: 1 addition & 0 deletions

@@ -41,6 +41,7 @@ of also more complicated arrangements of data.
    arrays.ndarray
    arrays.scalars
    arrays.dtypes
+   arrays.promotion
    arrays.nditer
    arrays.classes
    maskedarray