Commit 0b01b48
authored
BUG: Resolve Divide by Zero on Apple silicon + test failures (numpy#19926)
* Resolve divide by zero in reciprocal numpy#18555
clang has an optimization bug where a vector that is only partially loaded / stored will get narrowed to only the lanes used, which can be fine in some cases. However, in numpy's `reciprocal` function a partial load is explicitly extended to the full width of the register (filled with '1's) to avoid divide-by-zero. clang's optimization ignores the explicit filling with '1's.
The changes here insert a dummy `volatile` variable. This convinces clang not to narrow the load and ignore the explicit filling of '1's.
`volatile` can be expensive since it forces loads / stores to refresh contents whenever the variable is used. numpy has its own template / macro system that'll expand the loop function below for sqrt, absolute, square, and reciprocal. Additionally, the loop can be called on a full array if there's overlap between src and dst. Consequently, we try to limit the scope that we need to apply `volatile`. Intention is it should only be needed when compiling with clang, against Apple arm64, and only for the `reciprocal` function. Moreover, `volatile` is only needed when a vector is partially loaded.
Testing:
Beyond fixing the cases mentioned in the GitHub issue, the changes here also resolve several failures in numpy's test suite.
Before:
```
FAILED numpy/core/tests/test_scalarmath.py::TestBaseMath::test_blocked - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_ufunc.py::TestUfuncGenericLoops::test_unary_PyUFunc_O_O_method_full[reciprocal] - AssertionError: FloatingPointError not raised
FAILED numpy/core/tests/test_umath.py::TestPower::test_power_float - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
FAILED numpy/core/tests/test_umath.py::TestAVXUfuncs::test_avx_based_ufunc - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormDouble::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormSingle::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
FAILED numpy/linalg/tests/test_linalg.py::TestNormInt64::test_axis - RuntimeWarning: divide by zero encountered in reciprocal
8 failed, 14759 passed, 204 skipped, 1268 deselected, 34 xfailed in 69.90s (0:01:09)
```
After:
```
FAILED numpy/core/tests/test_umath.py::TestSpecialFloats::test_tan - AssertionError: FloatingPointError not raised by tan
1 failed, 14766 passed, 204 skipped, 1268 deselected, 34 xfailed in 70.37s (0:01:10)
```
* Enhancement on top of workaround for clang bug in reciprocal
Enhancement on top of workaround for clang bug in reciprocal (numpy#18555)
Numpy's FP unary loops use a partial load / store on every iteration. The partial load / store helpers each insert a switch statement to know how many elements to handle. This causes a lot of unnecessary branches to be inserted in the loops. The partial load / store is only needed on the final iteration of the loop if it isn't a full vector.
The changes here breakout the final iteration to use the partial load / stores. The loop has been changed to use full load / stores. Additionally, this means we don't need to conditionalize the volatile workaround in the loop.
* Address Azure CI failures with older versions of clang
- -ftrapping-math is default enabled for Numpy, but support in clang is mainly for x86_64
- Apple Clang and Clang have different, but overlapping versions
- Non-Apple Clang versions come from looking at when they started supporting -ftrapping-math for x86_64
Testing was done against Apple Clang versions
- v11 / x86_64 - failed previously, now passes (azure failure)
- v12+ / x86_64 - passes before and after
- v13 / arm64 - failed before initial patch, passes after1 parent 2d112a9 commit 0b01b48
1 file changed
+83
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
77 | 77 | | |
78 | 78 | | |
79 | 79 | | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
80 | 130 | | |
81 | 131 | | |
82 | 132 | | |
| |||
87 | 137 | | |
88 | 138 | | |
89 | 139 | | |
| 140 | + | |
90 | 141 | | |
91 | 142 | | |
92 | 143 | | |
| |||
101 | 152 | | |
102 | 153 | | |
103 | 154 | | |
| 155 | + | |
| 156 | + | |
104 | 157 | | |
105 | 158 | | |
106 | 159 | | |
| |||
126 | 179 | | |
127 | 180 | | |
128 | 181 | | |
129 | | - | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
130 | 200 | | |
131 | 201 | | |
132 | 202 | | |
| |||
140 | 210 | | |
141 | 211 | | |
142 | 212 | | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
143 | 222 | | |
144 | 223 | | |
145 | 224 | | |
146 | 225 | | |
147 | 226 | | |
148 | 227 | | |
149 | 228 | | |
| 229 | + | |
150 | 230 | | |
151 | 231 | | |
152 | 232 | | |
153 | 233 | | |
154 | 234 | | |
155 | 235 | | |
156 | 236 | | |
| 237 | + | |
| 238 | + | |
157 | 239 | | |
158 | 240 | | |
159 | 241 | | |
| |||
0 commit comments