.. currentmodule:: numpy

.. _arrays.promotion:

****************************
Data type promotion in NumPy
****************************

When mixing two different data types, NumPy has to determine the appropriate
dtype for the result of the operation. This step is referred to as *promotion*
or *finding the common dtype*.

In typical cases, the user does not need to worry about the details of
promotion, since the promotion step usually ensures that the result will
either match or exceed the precision of the input.

For example, when the inputs are of the same dtype, the dtype of the result
matches the dtype of the inputs:

    >>> np.int8(1) + np.int8(1)
    np.int8(2)

Mixing two different dtypes normally produces a result with the dtype of the
higher precision input:

    >>> np.int8(4) + np.int64(8)  # 64 > 8
    np.int64(12)
    >>> np.float32(3) + np.float16(3)  # 32 > 16
    np.float32(6.0)

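The common dtype can also be queried without performing an operation. The sketch below uses ``np.promote_types``, which takes exactly two dtypes, and ``np.result_type``, which accepts any number of dtypes or arrays. It assumes a recent NumPy is installed.

```python
import numpy as np

# promote_types takes exactly two dtypes and returns their common dtype.
int_pair = np.promote_types(np.int8, np.int64)         # int64
float_pair = np.promote_types(np.float16, np.float32)  # float32

# result_type accepts any number of dtypes (or arrays) at once.
many = np.result_type(np.int8, np.int16, np.int64)     # int64

print(int_pair, float_pair, many)
```
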
In typical cases, this does not lead to surprises. However, if you work with
non-default dtypes like unsigned integers and low-precision floats, or if you
mix NumPy integers, NumPy floats, and Python scalars, some details of NumPy
promotion rules may be relevant. Note that these detailed rules do not always
match those of other languages [#hist-reasons]_.

Numerical dtypes come in four "kinds" with a natural hierarchy.

1. unsigned integers (``uint``)
2. signed integers (``int``)
3. float (``float``)
4. complex (``complex``)

In addition to kind, NumPy numerical dtypes also have an associated precision,
specified in bits. Together, the kind and precision specify the dtype. For
example, a ``uint8`` is an unsigned integer stored using 8 bits.

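Both properties can be read from any dtype. This minimal sketch uses ``dtype.kind``, a one-character code (``'u'``, ``'i'``, ``'f'`` or ``'c'``), and ``dtype.itemsize``, which is given in bytes and is multiplied by 8 to get bits. The ``describe`` helper is hypothetical and exists only for illustration.

```python
import numpy as np


def describe(dtype):
    """Hypothetical helper: return (kind code, precision in bits) for a dtype."""
    dt = np.dtype(dtype)
    return dt.kind, dt.itemsize * 8


# Kind codes: 'u' unsigned, 'i' signed, 'f' float, 'c' complex.
for name in ["uint8", "int16", "float32", "complex128"]:
    print(name, describe(name))
```
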
The result of an operation will always be of an equal or higher kind than any
of the inputs. Furthermore, the result will always have a precision greater
than or equal to those of the inputs. Already, this can lead to some examples
which may be unexpected:

1. When mixing floating point numbers and integers, the precision of the
   integer may force the result to a higher precision floating point. For
   example, the result of an operation involving ``int64`` and ``float16``
   is ``float64``.
2. When mixing unsigned and signed integers with the same precision, the
   result will have *higher* precision than either input. Additionally,
   if one of them already has 64-bit precision, no higher precision integer
   is available; for example, an operation involving ``int64`` and ``uint64``
   gives ``float64``.

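Both cases can be checked with ``np.promote_types``. This sketch only queries the result dtype and does not perform any arithmetic. It assumes a recent NumPy.

```python
import numpy as np

# Case 1: the int64 precision forces a float64 result.
int_float = np.promote_types(np.int64, np.float16)  # float64

# Case 2: same-precision signed/unsigned mixes go one size up...
small_mix = np.promote_types(np.int8, np.uint8)     # int16
# ...except at 64 bits, where no larger integer dtype exists.
big_mix = np.promote_types(np.int64, np.uint64)     # float64
```
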
See the `Numerical promotion`_ section and the image below for details
on both.

Detailed behavior of Python scalars
-----------------------------------
Since NumPy 2.0 [#NEP50]_, an important point in our promotion rules is
that although operations involving two NumPy dtypes never lose precision,
operations involving a NumPy dtype and a Python scalar (``int``, ``float``,
or ``complex``) *can* lose precision. For instance, it is probably intuitive
that the result of an operation between a Python integer and a NumPy integer
should be a NumPy integer. However, Python integers have arbitrary precision
whereas all NumPy dtypes have fixed precision, so the arbitrary precision
of Python integers cannot be preserved.

More generally, NumPy considers the "kind" of Python scalars, but ignores
their precision when determining the result dtype. This is often convenient.
For instance, when working with arrays of a low precision dtype, it is usually
desirable for simple operations with Python scalars to preserve the dtype:

    >>> arr_float32 = np.array([1, 2.5, 2.1], dtype="float32")
    >>> arr_float32 + 10.0  # undesirable to promote to float64
    array([11. , 12.5, 12.1], dtype=float32)
    >>> arr_int16 = np.array([3, 5, 7], dtype="int16")
    >>> arr_int16 + 10  # undesirable to promote to int64
    array([13, 15, 17], dtype=int16)

In both cases, the result precision is dictated by the NumPy dtype.
Because of this, ``arr_float32 + 10.0`` behaves the same as
``arr_float32 + np.float32(10.0)``, and ``arr_int16 + 10`` behaves the same as
``arr_int16 + np.int16(10)``.

As another example, when mixing NumPy integers with a Python ``float``
or ``complex``, the result always has type ``float64`` or ``complex128``:

    >>> np.int16(1) + 1.0
    np.float64(2.0)

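The same holds for arrays. This is a short sketch, assuming NumPy >= 2.0. The ``int16`` array does not limit the result to a low precision float.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int16)

# The Python float/complex only sets the kind; the result is 64/128-bit.
as_float = ints + 1.0    # float64, not float16 or float32
as_complex = ints * 1j   # complex128
```
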
However, these rules can also lead to surprising behavior when working with
low precision dtypes.

First, since the Python value is converted to a NumPy one before the operation
can be performed, operations can fail with an error when the result seems
obvious. For instance, ``np.int8(1) + 1000`` cannot continue because ``1000``
exceeds the maximum value of an ``int8``. When a Python integer cannot be
coerced to the NumPy dtype, an error is raised, while a Python ``float`` that
is too large overflows to ``inf`` with a warning:

    >>> np.int8(1) + 1000
    Traceback (most recent call last):
      ...
    OverflowError: Python integer 1000 out of bounds for int8
    >>> np.int64(1) * 10**100
    Traceback (most recent call last):
      ...
    OverflowError: Python int too large to convert to C long
    >>> np.float32(1) + 1e300
    np.float32(inf)
    ... RuntimeWarning: overflow encountered in cast

Second, since the Python float or integer precision is always ignored, a low
precision NumPy scalar will keep using its lower precision unless explicitly
converted to a higher precision NumPy dtype or Python scalar (e.g. via ``int()``,
``float()``, or ``scalar.item()``). This lower precision may be detrimental to
some calculations or lead to incorrect results, especially in the case of integer
overflows:

    >>> np.int8(100) + 100  # the result exceeds the capacity of int8
    np.int8(-56)
    ... RuntimeWarning: overflow encountered in scalar add

Note that NumPy warns when overflows occur for scalars, but not for arrays;
e.g., ``np.array([100], dtype="int8") + 100`` will *not* warn.

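This sketch turns every warning into an error to show that the array operation wraps around silently. The array values are arbitrary and chosen only for illustration.

```python
import warnings

import numpy as np

arr = np.array([100, 50], dtype=np.int8)
with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning would now raise
    result = arr + 100  # wraps around without a warning
```
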
Numerical promotion
-------------------

The following image shows the numerical promotion rules with the kinds
on the vertical axis and the precision on the horizontal axis.

.. figure:: figures/nep-0050-promotion-no-fonts.svg
   :figclass: align-center

The input dtype with the higher kind determines the kind of the result dtype.
The result dtype has a precision as low as possible without appearing to the
left of either input dtype in the diagram.

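A few pairs from the diagram can be checked with ``np.promote_types``. In each case the result has the higher kind and the lowest precision that does not sit to the left of either input. The pairs below are an illustrative selection, not an exhaustive table.

```python
import numpy as np

pairs = [
    (np.uint8, np.int8),      # int16: smallest signed int holding both
    (np.int16, np.float16),   # float32: float16 sits left of int16
    (np.uint16, np.float16),  # float32
    (np.int32, np.float32),   # float64
]
results = [np.promote_types(a, b) for a, b in pairs]

for (a, b), res in zip(pairs, results):
    print(np.dtype(a).name, "+", np.dtype(b).name, "->", res.name)
```
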
Note the following specific rules and observations:

1. When a Python ``float`` or ``complex`` interacts with a NumPy integer,
   the result will be ``float64`` or ``complex128`` (yellow border).
   NumPy booleans will also be cast to the default integer [#default-int]_.
   This is not relevant when NumPy floating point values are also
   involved.
2. The precision is drawn such that ``float16 < int16 < uint16`` because
   large ``uint16`` values do not fit in ``int16``, and large ``int16`` values
   lose precision when stored in a ``float16``.
   This pattern, however, is broken since NumPy always considers ``float64``
   and ``complex128`` to be acceptable promotion results for any integer
   value.
3. A special case is that NumPy promotes many combinations of signed and
   unsigned integers to ``float64``. A higher kind is used here because no
   signed integer dtype is sufficiently precise to hold a ``uint64``.


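Rule 1 can be illustrated with ``np.result_type``. This is a sketch, assuming NumPy >= 2.0, in which Python scalars passed to ``np.result_type`` are treated as weak. ``np.int_`` stands for the platform default integer.

```python
import numpy as np

# A Python int with a NumPy bool gives the default integer (np.int_).
bool_and_int = np.result_type(np.bool_, 1)
# A Python float with a NumPy bool or integer gives float64.
bool_and_float = np.result_type(np.bool_, 1.0)
int_and_float = np.result_type(np.int8, 1.0)
# If a NumPy float is also involved, it sets the precision instead.
with_numpy_float = np.result_type(np.int8, np.float16, 1.0)  # float16
```
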
Exceptions to the general promotion rules
-----------------------------------------

In NumPy, promotion ultimately refers to what a specific function does with
the result, and in some cases this means that NumPy may deviate from what
`np.result_type` would give.

Behavior of ``sum`` and ``prod``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``np.sum`` and ``np.prod`` will always return the default integer type when
reducing booleans or signed integers with a lower precision than the default
integer; lower precision unsigned integers are reduced using the unsigned
integer of the same size. This is usually ``int64`` (or ``uint64``).
The reason for this is that integer summations are otherwise very likely
to overflow and give confusing results.
This rule also applies to the underlying ``np.add.reduce`` and
``np.multiply.reduce``.

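The following sketch shows the upcast and how passing ``dtype=`` explicitly opts back into the narrow type. The array values are arbitrary. The unsigned check compares sizes against ``np.int_`` instead of naming a specific dtype, so that it does not depend on the platform.

```python
import numpy as np

signed = np.array([100, 100], dtype=np.int8)
unsigned = np.array([200, 100], dtype=np.uint8)

total = np.sum(signed)             # default integer, so no int8 overflow
total_u = np.add.reduce(unsigned)  # unsigned integer of the default size

# Passing dtype= explicitly opts back into the narrow type (and its overflow).
narrow = np.sum(signed, dtype=np.int8)
```
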
Notable behavior with NumPy or Python integer scalars
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NumPy promotion refers to the result dtype and operation precision,
but the operation will sometimes dictate that result.
Division always returns floating point values, and comparisons always return
booleans.

This leads to what may appear as "exceptions" to the rules:

* NumPy comparisons with Python integers or mixed precision integers always
  return the correct result. The inputs will never be cast in a way which
  loses precision.
* Equality comparisons between types which cannot be promoted will be
  considered all ``False`` (equality) or all ``True`` (not-equal).
* Unary math functions like ``np.sin`` that always return floating point
  values accept any Python integer input by converting it to ``float64``.
* Division always returns floating point values and thus also allows divisions
  between any NumPy integer and any Python integer value by casting both
  to ``float64``.

In principle, some of these exceptions may make sense for other functions.
Please raise an issue if you feel this is the case.

Promotion of non-numerical datatypes
------------------------------------

NumPy extends the promotion to non-numerical types, although in many cases
promotion is not well defined and simply rejected.

The following rules apply:

* NumPy byte strings (``np.bytes_``) can be promoted to unicode strings
  (``np.str_``). However, casting the bytes to unicode will fail for
  non-ASCII characters.
* For some purposes NumPy will promote almost any other datatype to strings.
  This applies to array creation or concatenation.
* The array constructors like ``np.array()`` will use ``object`` dtype when
  there is no viable promotion.
* Structured dtypes can promote when their field names and order match.
  In that case all fields are promoted individually.
* NumPy ``timedelta`` can in some cases promote with integers.

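Several of these rules can be checked with a short sketch. The dtypes and field names below are arbitrary examples. Structured promotion assumes a NumPy version where structured dtypes promote field by field, which is the behavior described above.

```python
import numpy as np

# Byte strings promote to unicode strings; the longer length is kept.
text = np.promote_types("S3", "U5")  # unicode string of length 5

# Array creation from mixed numbers and strings promotes to a string dtype.
mixed = np.array([1, "a"])

# Structured dtypes with matching field names and order promote per field.
small = np.dtype([("a", np.int8), ("b", np.float32)])
large = np.dtype([("a", np.int64), ("b", np.float16)])
combined = np.promote_types(small, large)  # a -> int64, b -> float32
```
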
.. note::
   Some of these rules are somewhat surprising, and are being considered for
   change in the future. However, any backward-incompatible changes have to
   be weighed against the risks of breaking existing code. Please raise an
   issue if you have particular ideas about how promotion should work.

Details of promoted ``dtype`` instances
---------------------------------------
The above discussion has mainly dealt with the behavior when mixing different
DType classes.
A ``dtype`` instance attached to an array can carry additional information
such as byte-order, metadata, string length, or exact structured dtype layout.

While the string length or field names of a structured dtype are important,
NumPy considers byte-order, metadata, and the exact layout of a structured
dtype as storage details.
During promotion NumPy does *not* take these storage details into account:

* Byte-order is converted to native byte-order.
* Metadata attached to the dtype may or may not be preserved.
* Resulting structured dtypes will be packed (but aligned if inputs were).

This behavior is the best choice for most programs, where storage details
are not relevant to the final results and where the use of incorrect byte-order
could drastically slow down evaluation.

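The byte-order rule can be observed directly. This is a minimal sketch using a big-endian ``int32`` dtype (``">i4"``) as an arbitrary example. Both the promoted dtype and the result of arithmetic come back in native byte order.

```python
import numpy as np

big_endian = np.dtype(">i4")
# promote_types returns a dtype in native byte order.
promoted = np.promote_types(big_endian, big_endian)

# Arithmetic on non-native arrays also produces native-order results.
arr = np.arange(3, dtype=big_endian)
result = arr + arr
```
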

.. [#hist-reasons] To a large degree, this may simply be due to choices made
   early on in NumPy's predecessors. For more details, see :ref:`NEP 50 <NEP50>`.

.. [#NEP50] See also :ref:`NEP 50 <NEP50>`, which changed the rules for NumPy 2.0.
   Previous versions of NumPy would sometimes return higher precision results
   based on the input value of Python scalars.
   Further, previous versions of NumPy would typically ignore the higher
   precision of NumPy scalars or 0-D arrays for promotion purposes.

.. [#default-int] The default integer is marked as ``int64`` in the schema
   but is ``int32`` on 32-bit platforms. However, most modern PCs are 64-bit.