@@ -3,12 +3,12 @@ Title: Fix the default floating-point representation in std::format
Shortname: P3505
Revision: 0
Audience: LEWG
- Status: D
+ Status: P
Group: WG21
URL:
Editor: Victor Zverovich, victor.zverovich@gmail.com
No abstract: true
- Date: 2025-02-01
+ Date: 2025-03-15
Markup Shorthands: markdown yes
</pre>

@@ -23,9 +23,9 @@ Introduction {#intro}
When `std::format` was proposed for standardization, floating-point formatting
was defined in terms of `std::to_chars` to simplify specification. While being
a positive change overall, this introduced a small but undesirable change
- compared to the reference implementation in [[FMT]], resulting in surprising
- behavior to users, performance regression and an inconsistency with other
- mainstream programming languages that have similar facilities. This paper
+ compared to the design and reference implementation in [[FMT]], resulting in
+ surprising behavior to users, performance regression and an inconsistency with
+ other mainstream programming languages that have similar facilities. This paper
proposes fixing this issue, bringing the floating-point formatting on par with
other languages and in line with the original design intent.

@@ -112,7 +112,7 @@ described above. It was great for explicit format specifiers such as `e` but,
as it turned out recently, it introduced an undesirable change to the default
format. The problem is that `std::to_chars` defines "shortness" in terms of the
number of characters in the output, which is different from the "shortness" of
- decimal significand normally used both in the literature and in the reference.
+ decimal significand normally used both in the literature and in the industry.

The exponent range is much easier to reason about. For example, in this model
`100000.0` and `120000.0` are printed in the same format:
@@ -134,9 +134,9 @@ auto s2 = std::format("{}", 120000.0); // s2 == "120000"

It seems surprising and undesirable.

- If the shortness of the output was indeed the main criteria then it is unclear
- why the output format requires redundant `+` and leading zero in the exponent.
- Those are included in the output because, according to the specification of
+ Note that the output `1e+05` does not actually have the fewest possible
+ characters, because the `+` and the leading zero in the exponent are redundant.
+ Both are nevertheless required because, according to the specification of
`to_chars` ([[charconv.to.chars](https://eel.is/c++draft/charconv.to.chars)]),

> `value` is converted to a string in the style of `printf` in the `"C"` locale.
@@ -146,6 +146,9 @@ and the exponential format is defined as follows by the C standard ([[N3220]]):
> A `double` argument representing a floating-point number is converted in the
> style *[−] d.ddd e±dd* ...

+ Nevertheless, users interpreting the shortness condition too literally may find
+ this surprising.
+
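The exponent shape quoted above can be observed directly with `printf`-style formatting. The following is a minimal sketch, not part of the proposal; `exponent_style` is a hypothetical helper name, and the sketch assumes a C library that follows the C standard's rule of printing at least two exponent digits.

```c++
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the paper): formats `value` with printf's %e
// conversion and zero digits after the decimal point, so that only the
// leading digit and the exponent remain.
std::string exponent_style(double value) {
  char buffer[32];
  int size = std::snprintf(buffer, sizeof(buffer), "%.0e", value);
  return std::string(buffer, size > 0 ? static_cast<std::size_t>(size) : 0);
}
```

With a conforming C library, `exponent_style(100000.0)` yields `1e+05`: the sign and the leading zero come from the `printf` style rather than from any shortness criterion.
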
Even more importantly, the current representation violates the original
shortness requirement from [[STEELE-WHITE]]:

@@ -175,19 +178,34 @@ rounding condition
> ([[round.style](https://eel.is/c++draft/round.style)]).

Apart from giving a false sense of accuracy to users, it also has negative
- performance implications. Producing "garbage digits" means that you
- may no longer be able to use the optimized float-to-string algorithm such as
- Dragonbox ([[DRAGONBOX]]) and Ryū ([[RYU]]) in some cases. It also introduces
- complicated logic to handle those cases. If the fallback algorithm does
- multiprecision arithmetic this may even require additional allocation(s).
+ performance implications. Many optimized float-to-string algorithms based on
+ Steele and White's criteria, such as Dragonbox ([[DRAGONBOX]]) and Ryū
+ ([[RYU]]), target only those criteria, in particular the shortness of the
+ decimal significand rather than the number of characters. As a result, an
+ implementation of the default floating-point handling of `std::format`
+ (and `std::to_chars`) cannot rely directly on these otherwise perfectly
+ appropriate algorithms. Instead, it has to introduce non-trivial logic
+ dedicated to computing these "garbage digits". Furthermore, the need for
+ dedicated logic is unlikely to be merely a gap in algorithm research: in this
+ case we have to compute more digits than the precision implied by the data
+ type, so it is natural to expect that more precision is needed than in the
+ case without garbage digits. (In other words, even if a new algorithm that
+ correctly handles the garbage digits case according to the current C++
+ standard were invented, it would likely still include some special handling
+ of that case, in one form or another.)
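
To make "more digits than the actual precision" concrete, the sketch below prints every decimal digit of the stored binary value and compares the count with `max_digits10`. It is an illustration only: `exact_digits` is a hypothetical helper, and the sketch assumes a C library whose `%.0f` conversion prints the exact value (as mainstream implementations do).

```c++
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper (not from the paper): prints all decimal digits of the
// exactly stored binary value, with no fractional part.
std::string exact_digits(double value) {
  char buffer[400];  // large enough for the largest finite double
  int size = std::snprintf(buffer, sizeof(buffer), "%.0f", value);
  return std::string(buffer, size > 0 ? static_cast<std::size_t>(size) : 0);
}

// A double never needs more than this many significant decimal digits to
// round-trip; digits printed beyond them carry no additional information.
constexpr int round_trip_digits = std::numeric_limits<double>::max_digits10;
```

For `1234567890123456700000.0`, `exact_digits` produces the 22-digit string `1234567890123456774144`, while `round_trip_digits` is 17, so the trailing digits are exactly the "garbage digits" that a shortest-significand algorithm would not compute.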

The performance issue can be illustrated with the following simple benchmark:

```c++
#include <format>
#include <benchmark/benchmark.h>

+ // Output: "1.2345678901234568e+22"
double normal_input = 12345678901234567000000.0;
+
+ // Output (current): "1234567890123456774144"
+ // Output (desired): "1.2345678901234568e+21"
double garbage_input = 1234567890123456700000.0;

void normal(benchmark::State& state) {
@@ -211,7 +229,7 @@ BENCHMARK_MAIN();

Results on macOS with Apple clang version 16.0.0 (clang-1600.0.26.6) and libc++:

- ```
+ ```text
% ./double-benchmark
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
@@ -234,10 +252,10 @@ garbage 91.4 ns 91.4 ns 7675186
Results on GNU/Linux with gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 and
libstdc++:

- ```
- $ ./int-benchmark
+ ```text
+ $ ./double-benchmark
2025-02-02T17:22:25+00:00
- Running ./int-benchmark
+ Running ./double-benchmark
Run on (2 X 48 MHz CPU s)
CPU Caches:
  L1 Data 128 KiB (x2)
@@ -254,10 +272,10 @@ garbage 90.6 ns 90.6 ns 7360351
Results on Windows with Microsoft (R) C/C++ Optimizing Compiler Version
19.40.33811 for ARM64 and Microsoft STL:

- ```
- >int-benchmark.exe
+ ```text
+ >double-benchmark.exe
2025-02-02T08:10:39-08:00
- Running int-benchmark.exe
+ Running double-benchmark.exe
Run on (2 X 2000 MHz CPU s)
CPU Caches:
  L1 Instruction 192 KiB (x2)
@@ -283,6 +301,33 @@ normal(benchmark::State&):
159.00 ms ... std::__1::to_chars_result std::__1::_Floating_to_chars[abi:ne180100] <...>(char*, char*, double, std::__1::chars_format, int)
```

+ For comparison, here are the results of running the same benchmark with
+ `std::format` replaced with `fmt::format`, which doesn't produce "garbage
+ digits":
+
+ ```text
+ $ ./double-benchmark
+ Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
+ This does not affect benchmark measurements, only the metadata output.
+ ***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
+ 2025-03-15T08:18:56-07:00
+ Running ./double-benchmark
+ Run on (8 X 24 MHz CPU s)
+ CPU Caches:
+   L1 Data 64 KiB
+   L1 Instruction 128 KiB
+   L2 Unified 4096 KiB (x8)
+ Load Average: 3.00, 3.91, 4.85
+ ------------------------------------------------------
+ Benchmark Time CPU Iterations
+ ------------------------------------------------------
+ fmt_normal 53.0 ns 53.0 ns 13428484
+ fmt_garbage 53.4 ns 53.4 ns 13032712
+ ```
+
+ As expected, the time is nearly identical between the two cases. This
+ demonstrates that the performance gap can be eliminated if this paper is accepted.
+
Locale makes the situation even more confusing to users. Consider the following
example:

@@ -463,8 +508,9 @@ Table 105 — Meaning of type options for floating-point types
 <td>
 <ins>Let <code>fmt</code> be `chars_format::fixed` if <code>value</code>
 is in the range [10<sup>-4</sup>, 10<sup><i>n</i></sup>), where
- 10<sup><i>n</i></sup> is 2<sup><code>std::numeric_limits<decltype(value)>::digits</code></sup>
- rounded to the nearest power of 10, `chars_format::scientific` otherwise.
+ 10<sup><i>n</i></sup> is
+ 2<sup><code>std::numeric_limits<decltype(value)>::digits</code> + 1</sup>
+ rounded down to the nearest power of 10, `chars_format::scientific` otherwise.
 </ins>

 If *precision* is specified, equivalent to
@@ -484,8 +530,8 @@ Implementation and usage experience {#impl}

The current proposal is based on the existing implementation in [[FMT]], which
has been available and widely used for over 6 years. Similar logic based on the
- value range rather than output size is implemented in Python, Java, JavaScript,
- Rust and Swift.
+ value range rather than the output size is implemented in Python, Java,
+ JavaScript, Rust and Swift.
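
The value-range rule from the proposed wording can be sketched as follows. This is an illustration under stated assumptions rather than the {fmt} implementation: `fixed_upper_exponent` and `uses_fixed` are hypothetical names, the range check is assumed to apply to the magnitude of a finite, nonzero value, and log10(2) is approximated by 0.30103.

```c++
#include <cmath>
#include <limits>

// Hypothetical helper: returns n such that 10^n is 2^(digits + 1) rounded
// down to a power of 10, i.e. floor((digits + 1) * log10(2)), computed in
// integer arithmetic with log10(2) approximated as 0.30103.
template <typename T>
constexpr int fixed_upper_exponent() {
  return (std::numeric_limits<T>::digits + 1) * 30103 / 100000;
}

// Hypothetical helper: true if a finite, nonzero `value` falls in
// [10^-4, 10^n) and would therefore use chars_format::fixed.
// Assumption: the check is applied to the magnitude of the value.
template <typename T>
bool uses_fixed(T value) {
  T magnitude = std::fabs(value);
  return magnitude >= T(1e-4) &&
         magnitude < std::pow(T(10), fixed_upper_exponent<T>());
}

static_assert(fixed_upper_exponent<float>() == 7);
static_assert(fixed_upper_exponent<double>() == 16);
```

For `double`, `digits` is 53, so 2<sup>54</sup> (about 1.8 × 10<sup>16</sup>) rounds down to 10<sup>16</sup>. Under this sketch `120000.0` and `100000.0` both select the fixed format, while `1234567890123456700000.0` selects the scientific one.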

<!-- Grisu in {fmt}: https://github.com/fmtlib/fmt/issues/147#issuecomment-461118641 -->
