Closed
Changes from 22 commits
29 commits
64ab5ed
gh-113804: Support "x" and "X" format types for floats
skirpichev Dec 28, 2023
024b02d
Address review: fix asserts
skirpichev Jan 12, 2024
005577c
Merge branch 'main' into hex-format
skirpichev Jan 16, 2024
97134b9
Merge branch 'main' into hex-format
skirpichev Feb 20, 2024
e05059f
Merge branch 'main' into hex-format
skirpichev Feb 29, 2024
b73515f
Expand news entry & change section
skirpichev Feb 29, 2024
7d47aef
Merge branch 'master' into hex-format
skirpichev Mar 3, 2024
025de36
Don't clear trailing zeros + misc for float.hex() compatibility
skirpichev Mar 4, 2024
8739f6b
_Py_dg_dtoa_hex() got special option for float.hex()
skirpichev Mar 4, 2024
7525d1a
Obviously, this is not going in 3.13...
skirpichev May 29, 2024
19e5083
Merge branch 'main' into hex-format
skirpichev May 30, 2024
faed5f3
Merge branch 'main' into hex-format
skirpichev May 30, 2024
01b490e
Merge branch 'master' into hex-format
skirpichev May 30, 2024
322a011
Address review: revert changes in old-style formatting
skirpichev May 30, 2024
9621d36
Address review: revert x/X formatting for complexes
skirpichev May 30, 2024
e721305
Apply suggestions from code review
skirpichev May 31, 2024
7c04f82
address review: rename/document _hex option
skirpichev May 31, 2024
e78b85c
Merge branch 'master' into hex-format
skirpichev May 31, 2024
d973c7a
address review: move autoprec decl & comment on tohex_nbits const
skirpichev May 31, 2024
4940336
address review: memory allocation
skirpichev May 31, 2024
d1ff85e
fix: catch PyMem_Malloc() error and document returned value
skirpichev May 31, 2024
089afe4
fix: warnings
skirpichev May 31, 2024
be84c10
address review: _Py_dg_dtoa_hex -> _Py_float_to_hex
skirpichev Jun 1, 2024
c98b092
address review: round-to-even
skirpichev Jun 1, 2024
e089ff5
Merge branch 'master' into hex-format
skirpichev Jun 3, 2024
795d1da
+
skirpichev Jun 17, 2024
58e84fd
Merge branch 'master' into hex-format-test
skirpichev Jun 17, 2024
728671a
cleanup
skirpichev Jun 17, 2024
4eaa4f8
+1
skirpichev Jun 19, 2024
5 changes: 4 additions & 1 deletion Doc/c-api/conversion.rst
Original file line number Diff line number Diff line change
@@ -124,7 +124,7 @@ The following functions provide locale-independent string to number conversions.
*format_code*, *precision*, and *flags*.

*format_code* must be one of ``'e'``, ``'E'``, ``'f'``, ``'F'``,
``'g'``, ``'G'`` or ``'r'``. For ``'r'``, the supplied *precision*
``'g'``, ``'G'``, ``'x'``, ``'X'`` or ``'r'``. For ``'r'``, the supplied *precision*
must be 0 and is ignored. The ``'r'`` format code specifies the
standard :func:`repr` format.

@@ -151,6 +151,9 @@ The following functions provide locale-independent string to number conversions.

.. versionadded:: 3.1

.. versionchanged:: 3.14
Support ``'x'`` and ``'X'`` format types for :class:`float`.


.. c:function:: int PyOS_stricmp(const char *s1, const char *s2)

24 changes: 24 additions & 0 deletions Doc/library/string.rst
@@ -588,6 +588,30 @@ The available presentation types for :class:`float` and
| | as altered by the other format modifiers. |
+---------+----------------------------------------------------------+

Additionally, the following presentation types are available for :class:`float`:

+---------+----------------------------------------------------------+
| Type | Meaning |
+=========+==========================================================+
| ``'x'`` | Represent the number by a hexadecimal string in the |
| | form ``[±][0x]h[.hhh]p±d``, where there is one |
| | hexadecimal digit before the dot and the fractional part |
| | either is exact or the number of its hexadecimal digits |
| | is equal to the specified precision. The exponent ``d`` |
| | is written in decimal, it always contains at least one |
| | digit, and it gives the power of 2 by which to multiply |
| | the coefficient. |
| | |
| | If the ``'#'`` option is specified, the prefix ``'0x'`` |
| | will be inserted before an integer part. |
+---------+----------------------------------------------------------+
| ``'X'`` | Same as ``'x'``, but uses uppercase digits, the ``0X`` |
Member

I don't think that digits can be upper or lower cases:

Suggested change
| ``'X'`` | Same as ``'x'``, but uses uppercase digits, the ``0X`` |
| ``'X'`` | Same as ``'x'``, but uses uppercase, the ``0X`` |

Contributor Author

Hexadecimal digits can:) I thought this was obvious from the context, so "hexadecimal" was omitted (it doesn't fit on one line). I can add this.

I worry that in your version it might not be clear that uppercase is used not just for the prefix.

| | prefix and ``'P'`` as the exponent separator. |
+---------+----------------------------------------------------------+

.. versionchanged:: 3.14
Support ``'x'`` and ``'X'`` format types for :class:`float`.


.. _formatexamples:
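
A note on the documented form ``[±][0x]h[.hhh]p±d``: the sketch below is only an illustration of how such a string maps to a value (coefficient times a power of 2). The regular expression and the name `hex_float_value` are assumptions made for this example and are not part of the patch; it runs on a stock Python and does not need the new format types.

```python
import math
import re

# Grammar from the table above: [±][0x]h[.hhh]p±d
HEX_FLOAT = re.compile(
    r"(?P<sign>[+-]?)(?:0[xX])?"
    r"(?P<int>[0-9a-fA-F])(?:\.(?P<frac>[0-9a-fA-F]*))?"
    r"[pP](?P<exp>[+-]\d+)"
)


def hex_float_value(text):
    """Evaluate coefficient * 2**d for a string in the documented form."""
    match = HEX_FLOAT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a hexadecimal float: {text!r}")
    frac = match["frac"] or ""
    # One hex digit before the dot, len(frac) hex digits after it.
    coefficient = int(match["int"] + frac, 16) / 16 ** len(frac)
    # The exponent is decimal and gives the power of 2.
    value = math.ldexp(coefficient, int(match["exp"]))
    return -value if match["sign"] == "-" else value


# Strings taken from the expected values in the test_float.py hunk of this diff.
print(hex_float_value("1.8p-11") == float.fromhex("0x0.0030p+0"))
print(hex_float_value("0X1.FE12P+0") == float.fromhex("0x1.fe12p0"))
```

Both comparisons are expected to print `True`, since the strings are the expected `'x'` and `'#X'` renderings of those two floats in the tests.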

1 change: 1 addition & 0 deletions Include/codecs.h
@@ -168,6 +168,7 @@ PyAPI_FUNC(PyObject *) PyCodec_NameReplaceErrors(PyObject *exc);

#ifndef Py_LIMITED_API
PyAPI_DATA(const char *) Py_hexdigits;
PyAPI_DATA(const char *) Py_hexdigits_upper;
#endif

#ifdef __cplusplus
2 changes: 2 additions & 0 deletions Include/internal/pycore_floatobject.h
@@ -55,6 +55,8 @@ extern PyObject* _Py_string_to_number_with_underscores(

extern double _Py_parse_inf_or_nan(const char *p, char **endptr);

extern char * _Py_dg_dtoa_hex(double x, int precision, int always_add_sign,
Member

I suggest calling this something clearer like _Py_float_to_hex. The dg_dtoa stuff doesn't make much sense here: "dg" is a reference to David Gay, since he wrote the original version of the dtoa.c code that Python's dtoa.c is based on; this code has nothing to do with Gay's code.

Contributor Author

Thanks, does make sense.

Initially that helper was a static function in pystrtod.c; just like _Py_dg_dtoa(), it does, well, "convert double to ASCII string"... It was exported and moved to floatobject.c to (1) slightly reduce the diff and (2) support the workaround for hex().

I think goal (1) wasn't very successful. So maybe this helper belongs in pystrtod.c? If so, does _Py_dtoa_hex() sound OK?

int use_alt_formatting, int upper, int float_hex);

#ifdef __cplusplus
}
35 changes: 30 additions & 5 deletions Lib/test/test_float.py
@@ -700,12 +700,37 @@ def test_format(self):
# % formatting
self.assertEqual(format(-1.0, '%'), '-100.000000%')

# hexadecimal format
x = float.fromhex('0x0.0030p+0')
self.assertEqual(format(x, 'x'), '1.8p-11')
self.assertEqual(format(x, 'X'), '1.8P-11')
self.assertEqual(format(x, '.0x'), '1p-10')
x = float.fromhex('0x1.7p+0')
self.assertEqual(format(x, '.0x'), '1p+0')
x = float.fromhex('0x0.1p-1022') # subnormal
self.assertEqual(format(x, 'x'), '0.1p-1022')
x = float.fromhex('0x0.0040p+0')
self.assertEqual(format(x, 'x'), '1p-10')
self.assertEqual(format(x, '>10x'), ' 1p-10')
self.assertEqual(format(x, '>#10x'), ' 0x1p-10')
self.assertEqual(format(x, '>010x'), '000001p-10')
self.assertEqual(format(x, '>#010x'), '0000x1p-10')
self.assertEqual(format(x, '#010x'), '0x0001p-10')
self.assertEqual(format(x, '<10x'), '1p-10 ')
self.assertEqual(format(x, '<#10x'), '0x1p-10 ')
x = float.fromhex('0x1.fe12p0')
self.assertEqual(format(x, 'x'), '1.fe12p+0')
self.assertEqual(format(x, '#X'), '0X1.FE12P+0')
self.assertEqual(format(x, '.3x'), '1.fe1p+0')
self.assertEqual(format(x, '.1x'), '1.0p+1')
self.assertEqual(format(x, '#.1x'), '0x1.0p+1')

# conversion to string should fail
self.assertRaises(ValueError, format, 3.0, "s")

# confirm format options expected to fail on floats, such as integer
# presentation types
for format_spec in 'sbcdoxX':
# confirm format options expected to fail on floats, such as some
# integer presentation types
for format_spec in 'sbcdo':
self.assertRaises(ValueError, format, 0.0, format_spec)
self.assertRaises(ValueError, format, 1.0, format_spec)
self.assertRaises(ValueError, format, -1.0, format_spec)
@@ -1472,7 +1497,7 @@ def roundtrip(x):
self.identical(x, roundtrip(x))
self.identical(-x, roundtrip(-x))

# fromHex(toHex(x)) should exactly recover x, for any non-NaN float x.
# roundtrip(x) should exactly recover x, for any non-NaN float x.
import random
for i in range(10000):
e = random.randrange(-1200, 1200)
@@ -1483,7 +1508,7 @@ def roundtrip(x):
except OverflowError:
pass
else:
self.identical(x, fromHex(toHex(x)))
self.identical(x, roundtrip(x))

def test_subclass(self):
class F(float):
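
For readers without the test file at hand, the randomized round-trip check in the hunks above can be sketched as follows. This is a minimal standalone version, assuming `roundtrip` only goes through `float.hex()` and `float.fromhex()`; the helper in the actual test suite may exercise more paths. `check_roundtrip` and its parameters are names invented for this example.

```python
import math
import random


def roundtrip(x):
    # Assumption: only the float.hex() / float.fromhex() pair is exercised.
    return float.fromhex(x.hex())


def check_roundtrip(samples=10000, seed=0):
    """Return how many random finite floats survived the round trip."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(samples):
        e = rng.randrange(-1200, 1200)
        m = rng.random()
        s = rng.choice([1.0, -1.0])
        try:
            x = s * math.ldexp(m, e)
        except OverflowError:
            continue  # exponent too large for a double, skip this sample
        y = roundtrip(x)
        # Compare value and sign, so that -0.0 is told apart from 0.0.
        assert y == x
        assert math.copysign(1.0, y) == math.copysign(1.0, x)
        checked += 1
    return checked


print(check_roundtrip(1000) > 0)
```

The exponent range deliberately overshoots the range of a double on both sides, so the loop also covers subnormals and values that underflow to zero.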
2 changes: 0 additions & 2 deletions Lib/test/test_str.py
@@ -1322,9 +1322,7 @@ def __repr__(self):

# test number formatter errors:
self.assertRaises(ValueError, '{0:x}'.format, 1j)
self.assertRaises(ValueError, '{0:x}'.format, 1.0)
self.assertRaises(ValueError, '{0:X}'.format, 1j)
self.assertRaises(ValueError, '{0:X}'.format, 1.0)
self.assertRaises(ValueError, '{0:o}'.format, 1j)
self.assertRaises(ValueError, '{0:o}'.format, 1.0)
self.assertRaises(ValueError, '{0:u}'.format, 1j)
6 changes: 3 additions & 3 deletions Lib/test/test_types.py
@@ -525,9 +525,9 @@ def test(f, format_spec, result):
self.assertRaises(TypeError, 3.0.__format__, None)
self.assertRaises(TypeError, 3.0.__format__, 0)

# confirm format options expected to fail on floats, such as integer
# presentation types
for format_spec in 'sbcdoxX':
# confirm format options expected to fail on floats, such as some
# integer presentation types
for format_spec in 'sbcdo':
self.assertRaises(ValueError, format, 0.0, format_spec)
self.assertRaises(ValueError, format, 1.0, format_spec)
self.assertRaises(ValueError, format, -1.0, format_spec)
@@ -0,0 +1,2 @@
Support formatting floats (using "x" and "X" format types) in hexadecimal
notation, like ``0x1.2efp-2``. Patch by Sergey B Kirpichev.
192 changes: 140 additions & 52 deletions Objects/floatobject.c
@@ -1134,14 +1134,7 @@ float_conjugate_impl(PyObject *self)
return float_float(self);
}

/* turn ASCII hex characters into integer values and vice versa */

static char
char_from_hex(int x)
{
assert(0 <= x && x < 16);
return Py_hexdigits[x];
}
/* turn ASCII hex characters into integer values */

static int
hex_from_char(char c) {
@@ -1208,11 +1201,136 @@ hex_from_char(char c) {
return x;
}

/* convert a float to a hexadecimal string */
/* Convert a float to a hexadecimal string [±][0x]h[.hhhhhhhh]p±d,
where the fractional part either is exact (precision < 0) or the
number of digits after the dot is equal to the precision.

The return value is a pointer to buffer with the converted string or NULL if
the conversion failed. The caller is responsible for freeing the returned
string by calling PyMem_Free().

The exponent d is written in decimal, it always contains at least one digit,
and it gives the power of 2 by which to multiply the coefficient.

x - the double to be converted
precision - the desired precision
always_add_sign - nonzero if a '+' sign should be included for x > 0
use_alt_formatting - nonzero if the hexadecimal prefix should be added.
upper - nonzero, if uppercase letters should be used for hexadecimal
numbers, prefix and the exponent separator.
float_hex - use float.hex() format.
*/

char *
_Py_dg_dtoa_hex(double x, int precision, int always_add_sign,
int use_alt_formatting, int upper, int float_hex)
{
int e;
double m = frexp(fabs(x), &e);

int autoprec = precision < 0;
if (autoprec) {
/* DBL_MANT_DIG rounded up to the next integer of the form 4k+1 */
const double tohex_nbits = DBL_MANT_DIG + 3 - (DBL_MANT_DIG+2)%4;
precision = (int) (tohex_nbits - 1)/4;
if (!x && float_hex) {
/* for compatibility with float.hex(), we keep just one
digit of zero */
precision = 1;
}
}

/* normalization */
if (m) {
int shift = 1 - Py_MAX(DBL_MIN_EXP - e, 0);
m = ldexp(m, shift);
e -= shift;
}

/* round to precision digits */
if (!autoprec) {
do {
double frac = ldexp(m, 4*precision);
frac -= floor(frac);
frac *= 16.0;
if (frac >= 8.0) {
m += ldexp(1.0, -4*precision);
}
if ((int)(m) & 0x2) {
m /= 2.0;
e += 1;
}
else {
break;
}
} while (1);
}

/* Conservative estimation for number of digits in the exponent.
IEEE quadruple precision floats should fit. */
const size_t exp_len = 5;

/* Allocate space for [±][0x] h[.] [hhhhhhhh] p± d '\0' */
size_t size = 1 + 2 + precision + 2 + exp_len + 1;
if (use_alt_formatting) {
size += 2;
}
char *s = PyMem_Malloc(size);
if (!s) {
return NULL;
}

/* sign and prefix */
size_t si = 0;
if (copysign(1.0, x) == -1.0) {
s[si] = '-';
si++;
}
else if (always_add_sign) {
s[si] = '+';
si++;
}
if (use_alt_formatting) {
s[si] = '0';
si++;
s[si] = upper ? 'X' : 'x';
si++;
}

/* mantissa */
const char *hexmap = upper ? Py_hexdigits_upper : Py_hexdigits;
assert(0 <= (int)m && (int)m < 16);
s[si] = hexmap[(int)m];
si++;
m -= (int)m;
s[si] = '.';
for (int i = 0; i < precision; i++) {
si++;
m *= 16.0;
assert(0 <= (int)m && (int)m < 16);
s[si] = hexmap[(int)m];
m -= (int)m;
}

/* clear trailing zeros from mantissa */
if (autoprec && !float_hex) {
while (s[si] == '0') {
si--;
}
}

/* TOHEX_NBITS is DBL_MANT_DIG rounded up to the next integer
of the form 4k+1. */
#define TOHEX_NBITS DBL_MANT_DIG + 3 - (DBL_MANT_DIG+2)%4
/* clear trailing dot */
if (s[si] != '.') {
si++;
}

/* exponent */
s[si] = upper ? 'P' : 'p';
si++;
si += snprintf(s + si, exp_len + 2, "%+d", e) + 1;

return s;
}
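
To make the control flow of the helper above easier to follow, here is a rough Python transliteration of the revision shown in this diff: auto precision, normalization, the round-half-up loop, digit extraction and trailing-zero handling. It is a reading aid under stated assumptions, not the patch itself. `float_to_hex_sketch` and its keyword names are invented here, uppercase and the `+` sign option are left out, and only finite inputs are handled.

```python
import math
import sys

HEXDIGITS = "0123456789abcdef"


def float_to_hex_sketch(x, precision=-1, alt=False, float_hex=False):
    """Hypothetical Python mirror of the C helper above (finite x only)."""
    if not math.isfinite(x):
        raise ValueError("finite floats only")
    m, e = math.frexp(abs(x))

    autoprec = precision < 0
    if autoprec:
        mant_dig = sys.float_info.mant_dig
        # DBL_MANT_DIG rounded up to the next integer of the form 4k+1
        nbits = mant_dig + 3 - (mant_dig + 2) % 4
        precision = (nbits - 1) // 4
        if x == 0 and float_hex:
            precision = 1  # float.hex() keeps a single zero digit

    # Normalization: one leading digit; subnormals keep the minimum exponent.
    if m:
        shift = 1 - max(sys.float_info.min_exp - e, 0)
        m = math.ldexp(m, shift)
        e -= shift

    # Round half up to `precision` hex digits, renormalizing on carry.
    if not autoprec:
        while True:
            frac = math.ldexp(m, 4 * precision)
            frac -= math.floor(frac)
            frac *= 16.0
            if frac >= 8.0:
                m += math.ldexp(1.0, -4 * precision)
            if int(m) & 0x2:
                m /= 2.0
                e += 1
            else:
                break

    # Digit extraction: peel off one hex digit at a time.
    digits = [HEXDIGITS[int(m)]]
    m -= int(m)
    for _ in range(precision):
        m *= 16.0
        digits.append(HEXDIGITS[int(m)])
        m -= int(m)

    frac_digits = "".join(digits[1:])
    if autoprec and not float_hex:
        frac_digits = frac_digits.rstrip("0")  # clear trailing zeros
    mantissa = digits[0] + ("." + frac_digits if frac_digits else "")
    sign = "-" if math.copysign(1.0, x) < 0 else ""
    prefix = "0x" if alt else ""
    return f"{sign}{prefix}{mantissa}p{e:+d}"


x = float.fromhex("0x1.fe12p0")
print(float_to_hex_sketch(x))                # 1.fe12p+0
print(float_to_hex_sketch(x, precision=1))   # 1.0p+1
print(float_to_hex_sketch(1.0, alt=True, float_hex=True) == (1.0).hex())  # True
```

The printed values match the expectations in the `test_float.py` hunk for the same inputs. With `alt=True, float_hex=True` the sketch follows the argument combination that `float_hex_impl` passes in the hunk below, so it should agree with the built-in `float.hex()` for finite values.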

/*[clinic input]
float.hex
@@ -1229,54 +1347,24 @@ static PyObject *
float_hex_impl(PyObject *self)
/*[clinic end generated code: output=0ebc9836e4d302d4 input=bec1271a33d47e67]*/
{
double x, m;
int e, shift, i, si, esign;
/* Space for 1+(TOHEX_NBITS-1)/4 digits, a decimal point, and the
trailing NUL byte. */
char s[(TOHEX_NBITS-1)/4+3];

CONVERT_TO_DOUBLE(self, x);
PyObject *result = NULL;
double x = PyFloat_AS_DOUBLE(self);

if (isnan(x) || isinf(x))
if (isnan(x) || isinf(x)) {
return float_repr((PyFloatObject *)self);

if (x == 0.0) {
if (copysign(1.0, x) == -1.0)
return PyUnicode_FromString("-0x0.0p+0");
else
return PyUnicode_FromString("0x0.0p+0");
}

m = frexp(fabs(x), &e);
shift = 1 - Py_MAX(DBL_MIN_EXP - e, 0);
m = ldexp(m, shift);
e -= shift;
char *buf = _Py_dg_dtoa_hex(x, -1, 0, 1, 0, 1);

si = 0;
s[si] = char_from_hex((int)m);
si++;
m -= (int)m;
s[si] = '.';
si++;
for (i=0; i < (TOHEX_NBITS-1)/4; i++) {
m *= 16.0;
s[si] = char_from_hex((int)m);
si++;
m -= (int)m;
if (buf) {
result = PyUnicode_FromString(buf);
PyMem_Free(buf);
}
s[si] = '\0';

if (e < 0) {
esign = (int)'-';
e = -e;
else {
PyErr_NoMemory();
}
else
esign = (int)'+';

if (x < 0.0)
return PyUnicode_FromFormat("-0x%sp%c%d", s, esign, e);
else
return PyUnicode_FromFormat("0x%sp%c%d", s, esign, e);
return result;
}

/* Convert a hexadecimal string to a float. */
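
As a quick reference for the special cases that `float_hex_impl` handles in both the old and the new code (NaN and infinities fall back to `repr`, zeros keep one fractional digit), the existing `float.hex()` behaviour can be checked from Python. The helper name `hex_of_specials` is made up for this example.

```python
import math


def hex_of_specials():
    """Map a label to float.hex() output for a few special values."""
    values = {
        "inf": math.inf,
        "-inf": -math.inf,
        "nan": math.nan,
        "zero": 0.0,
        "negative zero": -0.0,
    }
    return {label: value.hex() for label, value in values.items()}


for label, text in hex_of_specials().items():
    print(f"{label}: {text}")
# inf: inf
# -inf: -inf
# nan: nan
# zero: 0x0.0p+0
# negative zero: -0x0.0p+0
```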
1 change: 1 addition & 0 deletions Python/codecs.c
@@ -17,6 +17,7 @@ Copyright (c) Corporation for National Research Initiatives.
#include "pycore_ucnhash.h" // _PyUnicode_Name_CAPI

const char *Py_hexdigits = "0123456789abcdef";
const char *Py_hexdigits_upper = "0123456789ABCDEF";

/* --- Codec Registry ----------------------------------------------------- */
