-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
Description
Bug report
Bug description:
The long_true_divide() has a fast path for small integers:
Lines 4720 to 4740 in 726e8e8
| /* Fast path for a and b small (exactly representable in a double). | |
| Relies on floating-point division being correctly rounded; results | |
| may be subject to double rounding on x86 machines that operate with | |
| the x87 FPU set to 64-bit precision. */ | |
| a_is_small = a_size <= MANT_DIG_DIGITS || | |
| (a_size == MANT_DIG_DIGITS+1 && | |
| a->long_value.ob_digit[MANT_DIG_DIGITS] >> MANT_DIG_BITS == 0); | |
| b_is_small = b_size <= MANT_DIG_DIGITS || | |
| (b_size == MANT_DIG_DIGITS+1 && | |
| b->long_value.ob_digit[MANT_DIG_DIGITS] >> MANT_DIG_BITS == 0); | |
| if (a_is_small && b_is_small) { | |
| double da, db; | |
| da = a->long_value.ob_digit[--a_size]; | |
| while (a_size > 0) | |
| da = da * PyLong_BASE + a->long_value.ob_digit[--a_size]; | |
| db = b->long_value.ob_digit[--b_size]; | |
| while (b_size > 0) | |
| db = db * PyLong_BASE + b->long_value.ob_digit[--b_size]; | |
| result = da / db; | |
| goto success; | |
| } |
As mentioned in the comment, this code relies on correct rounding of floating-point division. Wrong behavior could be e.g. triggered by building the 32-bit CPython on 64-bit Linux host (with CPython floats being IEEE doubles, little-endian):
./configure CFLAGS="-m32" LDFLAGS="-m32"
sed -i -e 's!#define PY_HAVE_PERF_TRAMPOLINE 1!#undef PY_HAVE_PERF_TRAMPOLINE!g' pyconfig.h
make -s
>>> 12143 / 31517 # correct answer is 0.385284132373005
0.3852841323730051I think it's a bug. Why not enable this part conditionally, if X87_DOUBLE_ROUNDING is not defined? Attached patch fixes the problem for me, it seems that the rest of code doesn't suffer from double rounding and more or less closely follows to the pure-Python version of this routine.
Alternatively, we can even drop special handling for small ints (X87_DOUBLE_ROUNDING may not catch all cases of broken division). I didn't measured yet performance impact of this, but IMO correctness should be more important.
A patch with ifdef X87_DOUBLE_ROUNDING
diff --git a/Objects/longobject.c b/Objects/longobject.c
index 43c0db753a0..dd06e5878dc 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -4611,7 +4611,7 @@ long_true_divide(PyObject *v, PyObject *w)
PyLongObject *a, *b, *x;
Py_ssize_t a_size, b_size, shift, extra_bits, diff, x_size, x_bits;
digit mask, low;
- int inexact, negate, a_is_small, b_is_small;
+ int inexact, negate;
double dx, result;
CHECK_BINOP(v, w);
@@ -4717,14 +4717,15 @@ long_true_divide(PyObject *v, PyObject *w)
if (a_size == 0)
goto underflow_or_zero;
+#ifndef X87_DOUBLE_ROUNDING
/* Fast path for a and b small (exactly representable in a double).
Relies on floating-point division being correctly rounded; results
may be subject to double rounding on x86 machines that operate with
the x87 FPU set to 64-bit precision. */
- a_is_small = a_size <= MANT_DIG_DIGITS ||
+ int a_is_small = a_size <= MANT_DIG_DIGITS ||
(a_size == MANT_DIG_DIGITS+1 &&
a->long_value.ob_digit[MANT_DIG_DIGITS] >> MANT_DIG_BITS == 0);
- b_is_small = b_size <= MANT_DIG_DIGITS ||
+ int b_is_small = b_size <= MANT_DIG_DIGITS ||
(b_size == MANT_DIG_DIGITS+1 &&
b->long_value.ob_digit[MANT_DIG_DIGITS] >> MANT_DIG_BITS == 0);
if (a_is_small && b_is_small) {
@@ -4738,6 +4739,7 @@ long_true_divide(PyObject *v, PyObject *w)
result = da / db;
goto success;
}
+#endifCPython versions tested on:
CPython main branch
Operating systems tested on:
Linux