Commit 2eb094c

feat: scaling widening changed to always use std::int64_t for smaller types
1 parent 0b6fabb commit 2eb094c

4 files changed: +34 -23 lines changed

docs/users_guide/framework_basics/representation_types.md

Lines changed: 13 additions & 6 deletions
@@ -216,7 +216,7 @@ flowchart TD
 FIXED --> G{"magnitude?"}
 EWS --> G
 G -- "integral (e.g. m→mm, ×1000)" --> I["exact integer multiplication"]
-G -- "rational (e.g. ft→m, ×3048/10000)" --> R["double-width integer arithmetic<br>(avoids overflow &amp; FP rounding)"]
+G -- "rational (e.g. ft→m, ×3048/10000)" --> R["widened integer arithmetic<br>(int64_t or 128-bit;<br>avoids overflow &amp; FP rounding)"]
 G -- "irrational (e.g. deg→rad, ×π/180)" --> IR["long double fixed-point approximation"]
 ```

@@ -256,18 +256,25 @@ as a fallback when this operator is not available.
 The design preference order is therefore:
 **exact integer > exact rational > approximate irrational**.

-??? question "Why double-width integers for the rational path?"
+??? question "Why widened integers for the rational path?"

 The library computes `value * numerator / denominator` entirely in integer
 arithmetic. Without extra width, the intermediate product
 `value * numerator` can overflow even when the final result fits — for example,
 converting feet to metres multiplies by 3048 before dividing by 10000, which
 overflows a 64-bit integer for values above ~3×10¹⁵.

-Double-width integers (e.g. 128-bit for 64-bit values) absorb that intermediate
-growth. Using `long double` instead would violate the no-FP principle above and
-introduce rounding (`0.3048` is not exactly representable in binary floating-point),
-and on ARM / Apple Silicon `long double == double` anyway, giving no extra range.
+**mp-units uses widened integers to absorb that intermediate growth:**
+
+- For types up to 32 bits (`int8_t`, `int16_t`, `int32_t`): widens to `int64_t`
+- For `int64_t`: widens to 128-bit arithmetic (`__int128` or custom emulation)
+
+This approach provides maximum safety headroom for smaller types (with no performance
+cost on modern 64-bit systems), while 128-bit arithmetic handles the vast majority
+of real-world `int64_t` conversions. Using `long double` instead would violate the
+no-FP principle above and introduce rounding (`0.3048` is not exactly representable
+in binary floating-point), and on ARM / Apple Silicon `long double == double` anyway,
+giving no extra range.

 If different internal fields need different scale factors, encode that logic in
 `operator*` and `operator/` — the library routes scaling through them via the
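
The intermediate-overflow argument in the "Why widened integers" note of this hunk can be checked with a small standalone sketch. The helper names `product_overflows` and `feet_to_metres_widened` are hypothetical and are not part of mp-units; the sketch only assumes non-negative inputs and the 3048/10000 ratio quoted in the docs.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of mp-units): true if v * num would exceed
// the maximum of T. Assumes v >= 0 and num > 0.
template<typename T>
constexpr bool product_overflows(T v, T num)
{
  return v > std::numeric_limits<T>::max() / num;
}

// Hypothetical helper: feet -> metres as v * 3048 / 10000, with the
// intermediate product held in std::int64_t instead of std::int32_t.
constexpr std::int32_t feet_to_metres_widened(std::int32_t v)
{
  return static_cast<std::int32_t>(static_cast<std::int64_t>(v) * 3048 / 10000);
}

// 1'000'000 ft: the product 3'048'000'000 does not fit in std::int32_t ...
static_assert(product_overflows<std::int32_t>(1'000'000, 3048));
// ... yet the final result, 304'800 m, fits easily once the product is widened.
static_assert(feet_to_metres_widened(1'000'000) == 304'800);

// For std::int64_t the product stays in range up to INT64_MAX / 3048,
// which is where the "~3×10¹⁵" figure in the docs comes from.
constexpr std::int64_t int64_limit = std::numeric_limits<std::int64_t>::max() / 3048;
static_assert(int64_limit > 3'000'000'000'000'000 && int64_limit < 3'100'000'000'000'000);
```

All checks run at compile time, so a translation unit containing this block either compiles or reports which assumption failed.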

docs/users_guide/framework_basics/value_conversions.md

Lines changed: 13 additions & 12 deletions
@@ -222,9 +222,9 @@ that a naive implementation cannot handle correctly:
 (a `double` has only 53 bits of mantissa).

 Both challenges are addressed by using **fixed-point arithmetic**: the conversion factor is
-represented at compile time as a double-width integer constant, so the runtime computation
-is a pure integer multiply followed by a right-shift with no risk of intermediate overflow
-and no floating-point operations.
+represented at compile time as a widened integer constant (64-bit for types up to 32 bits,
+128-bit for `int64_t`), so the runtime computation is a pure integer multiply followed by
+a right-shift with no risk of intermediate overflow and no floating-point operations.

 ??? info "Implementation details"

@@ -238,23 +238,24 @@ and no floating-point operations.
 | Non-integer ratio | otherwise | `ft → m` ($\times 0.3048$) | fixed-point multiply | explicit |

 For the non-integer case the magnitude is converted **at compile time** to a
-fixed-point constant with double the bit-width of the representation type. For
-example, when scaling a 32-bit integer value, a 64-bit fixed-point intermediate is
-used. The actual runtime computation is then a pure integer multiply followed by a
-right-shift:
+fixed-point constant with widened bit-width. The library uses **`int64_t` for
+all types up to 32 bits** (`int8_t`, `int16_t`, `int32_t`) and **128-bit arithmetic
+for `int64_t`**. The actual runtime computation is then a pure integer multiply
+followed by a right-shift:

 $$
 \text{result} = \left\lfloor \text{value} \times \lfloor M \cdot 2^N \rfloor \right\rfloor \gg N
 $$

-where $N$ equals the bit-width of the source representation type. On platforms where
-`__int128` is available (most 64-bit targets), the double-width arithmetic is
+where $N$ equals 64 for types up to 32 bits, and 128 for `int64_t`. On platforms
+where `__int128` is available (most 64-bit targets), the 128-bit arithmetic is
 implemented natively; on others, a portable `double_width_int` emulation is used in
 `constexpr` context.

-Because the intermediate is double-width, it cannot overflow as long as the input
-value fits in the representation type — a value of `std::int64_t` will never silently
-overflow during the multiplication step.
+Because the intermediate is significantly widened (e.g., `int64_t` for types up to
+32 bits, providing much more than double-width headroom for smaller types), it cannot
+overflow as long as the input value fits in the representation type — for example,
+a value of `std::int32_t` computed in `int64_t` has 32 extra bits of safety margin.

 For the non-integer ratio path, the result is **truncated toward zero**. The
 fixed-point constant is rounded *away* from zero at compile time to compensate for
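
The multiply-then-shift scheme described in this hunk can be illustrated with a minimal sketch for the `ft → m` factor. Everything here is an assumption made for illustration: the names `N`, `ft_to_m_fixed` and `feet_to_metres_fixed` are hypothetical, the value is unsigned, and `N = 32` is chosen only so that the product fits a 64-bit intermediate. It is not the library's constant, bit-width, or code.

```cpp
#include <cstdint>

// Illustrative number of fractional bits (an assumption, not the library's N).
constexpr unsigned N = 32;

// 0.3048 == 381/1250. The fixed-point constant is ceil((381/1250) * 2^N),
// i.e. rounded away from zero, computed with integer arithmetic only.
constexpr std::uint64_t ft_to_m_fixed = ((std::uint64_t{381} << N) + 1249) / 1250;

// Hypothetical helper: one 64-bit multiply followed by a right shift.
// feet < 2^32 and ft_to_m_fixed < 2^31, so the product stays below 2^63.
constexpr std::uint32_t feet_to_metres_fixed(std::uint32_t feet)
{
  return static_cast<std::uint32_t>((std::uint64_t{feet} * ft_to_m_fixed) >> N);
}

static_assert(ft_to_m_fixed == 1'309'106'032);
// Exact multiples of the denominator come out exactly because the constant
// was rounded up; a rounded-down constant would risk yielding 3'047 here.
static_assert(feet_to_metres_fixed(10'000) == 3'048);
static_assert(feet_to_metres_fixed(1'250) == 381);
```

Rounding the constant up is what keeps exact cases exact after the truncating shift. The trade-off in this simplified version is that, for sufficiently large inputs, the result may come out one higher than the exactly truncated quotient.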

src/core/include/mp-units/framework/scaling.h

Lines changed: 7 additions & 4 deletions
@@ -87,10 +87,13 @@ template<typename common_t, auto M>
 // M is a pure rational p/q (no irrational factors such as π).
 constexpr common_t num = get_value<common_t>(numerator(M));
 constexpr common_t den = get_value<common_t>(denominator(M));
-if constexpr (sizeof(common_t) < sizeof(int128_t)) {
-  // Use a wider native type to avoid intermediate overflow.
-  using wide_t = double_width_int_for_t<common_t>;
-  return static_cast<common_t>(static_cast<wide_t>(v) * num / den);
+if constexpr (sizeof(common_t) <= sizeof(std::int32_t)) {
+  // Use int64_t for all types up to int32_t: provides maximum safety headroom
+  // with no performance cost on modern 64-bit systems.
+  return static_cast<common_t>(static_cast<std::int64_t>(v) * num / den);
+} else if constexpr (sizeof(common_t) < sizeof(int128_t)) {
+  // Use 128-bit arithmetic for int64_t.
+  return static_cast<common_t>(static_cast<int128_t>(v) * num / den);
 } else {
   // At max integer width: no wider type available, compute in common_t directly.
   return v * num / den;
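
For readers who want to try the new three-way dispatch outside the library, the following is a standalone sketch of the same branching. `scale_rational` and its `(v, num, den)` signature are hypothetical stand-ins for the library's internals, and the sketch assumes a compiler that provides `__int128` (GCC or Clang on 64-bit targets) in place of the library's `int128_t`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the rational branch shown in the hunk above.
// Assumes the compiler provides __int128 (e.g. GCC or Clang, 64-bit target).
template<typename T>
constexpr T scale_rational(T v, T num, T den)
{
  if constexpr (sizeof(T) <= sizeof(std::int32_t)) {
    // Types up to 32 bits: hold the intermediate product in std::int64_t.
    return static_cast<T>(static_cast<std::int64_t>(v) * num / den);
  } else if constexpr (sizeof(T) < sizeof(__int128)) {
    // 64-bit types: hold the intermediate product in 128 bits.
    return static_cast<T>(static_cast<__int128>(v) * num / den);
  } else {
    // No wider type available: compute directly in T.
    return v * num / den;
  }
}

// 2'000'000'000 * 3048 does not fit in 32 bits, but the quotient does.
static_assert(scale_rational<std::int32_t>(2'000'000'000, 3048, 10000) == 609'600'000);
// 4e15 * 3048 does not fit in 64 bits, but the quotient does.
static_assert(scale_rational<std::int64_t>(4'000'000'000'000'000, 3048, 10000) ==
              1'219'200'000'000'000);
```

Both assertions use inputs whose intermediate product exceeds the width of the representation type, so they pass only because the product is computed in the wider type.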

test/static/fixed_point_test.cpp

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ static_assert(scale<long>(mag<60>, 2l) == 120l);
 static_assert(scale<int>(mag_ratio<1, 1000>, 5000) == 5);
 static_assert(scale<int>(mag_ratio<1, 60>, 120) == 2);

-// rational M (3/2 * 4 == 6): exact double-width integer arithmetic
+// rational M (3/2 * 4 == 6): exact widened integer arithmetic (int64_t for int)
 static_assert(scale<int>(mag_ratio<3, 2>, 4) == 6);
 // (1/3 * 9 == 3)
 static_assert(scale<int>(mag_ratio<1, 3>, 9) == 3);
