feat: scaling widening changed to always use std::int64_t for smaller types

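The widening rule for the rational path can be sketched as a standalone function. This is a minimal illustration, not the library's code: `scale_rational` is a hypothetical name, it takes the numerator and denominator as plain arguments instead of a magnitude, and the 64-bit branch is deliberately simplified (the real code switches to a 128-bit type there).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not part of mp-units: scales v by num/den and widens
// the intermediate product for types up to 32 bits, as this commit describes.
template<typename T>
constexpr T scale_rational(T v, T num, T den)
{
  static_assert(std::is_integral_v<T>, "integer representation types only");
  if constexpr (sizeof(T) <= sizeof(std::int32_t)) {
    // Types up to 32 bits: the product is computed in std::int64_t.
    return static_cast<T>(static_cast<std::int64_t>(v) * num / den);
  } else {
    // Simplification for this sketch only: no 128-bit type is used here,
    // so the product is computed in T and may overflow for large inputs.
    return v * num / den;
  }
}

// ft -> m (x 3048/10000): 1'000'000 * 3048 = 3'048'000'000 exceeds the
// std::int32_t maximum (2'147'483'647), but fits easily in std::int64_t.
static_assert(scale_rational<std::int32_t>(1'000'000, 3048, 10000) == 304'800);

// std::int16_t: 30'000 * 3 = 90'000 exceeds the std::int16_t maximum (32'767).
static_assert(scale_rational<std::int16_t>(30'000, 3, 4) == 22'500);
```

Both assertions would fail to compile as constant expressions if a 32-bit product overflowed, which is the case the `int64_t` intermediate avoids.
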
mpusz · mpusz · commit 2eb094c0b118 · 2026-04-08T17:56:26.000+02:00
diff --git a/docs/users_guide/framework_basics/representation_types.md b/docs/users_guide/framework_basics/representation_types.md
@@ -216,7 +216,7 @@ flowchart TD
     FIXED --> G{"magnitude?"}
     EWS --> G
     G -- "integral (e.g. m→mm, ×1000)" --> I["exact integer multiplication"]
-    G -- "rational (e.g. ft→m, ×3048/10000)" --> R["double-width integer arithmetic<br>(avoids overflow &amp; FP rounding)"]
+    G -- "rational (e.g. ft→m, ×3048/10000)" --> R["widened integer arithmetic<br>(int64_t or 128-bit;<br>avoids overflow &amp; FP rounding)"]
     G -- "irrational (e.g. deg→rad, ×π/180)" --> IR["long double fixed-point approximation"]
 ```
 
@@ -256,18 +256,25 @@ as a fallback when this operator is not available.
     The design preference order is therefore:
     **exact integer > exact rational > approximate irrational**.
 
-??? question "Why double-width integers for the rational path?"
+??? question "Why widened integers for the rational path?"
 
     The library computes `value * numerator / denominator` entirely in integer
     arithmetic. Without extra width, the intermediate product
     `value * numerator` can overflow even when the final result fits — for example,
     converting feet to metres multiplies by 3048 before dividing by 10000, which
     overflows a 64-bit integer for values above ~3×10¹⁵.
 
-    Double-width integers (e.g. 128-bit for 64-bit values) absorb that intermediate
-    growth. Using `long double` instead would violate the no-FP principle above and
-    introduce rounding (`0.3048` is not exactly representable in binary floating-point),
-    and on ARM / Apple Silicon `long double == double` anyway, giving no extra range.
+    **mp-units uses widened integers to absorb that intermediate growth:**
+
+    - For types up to 32 bits (`int8_t`, `int16_t`, `int32_t`): widens to `int64_t`
+    - For `int64_t`: widens to 128-bit arithmetic (`__int128` or custom emulation)
+
+    Widening every smaller type to `int64_t` gives at least 32 bits of headroom for the
+    intermediate product at no performance cost on modern 64-bit systems, while 128-bit
+    arithmetic covers the vast majority of real-world `int64_t` conversions. Using
+    `long double` instead would violate the no-FP principle above and introduce rounding
+    (`0.3048` is not exactly representable in binary floating-point); on ARM / Apple
+    Silicon `long double == double` anyway, so it would give no extra range.
 
 If different internal fields need different scale factors, encode that logic in
 `operator*` and `operator/` — the library routes scaling through them via the
diff --git a/docs/users_guide/framework_basics/value_conversions.md b/docs/users_guide/framework_basics/value_conversions.md
@@ -222,9 +222,9 @@ that a naive implementation cannot handle correctly:
   (a `double` has only 53 bits of mantissa).
 
 Both challenges are addressed by using **fixed-point arithmetic**: the conversion factor is
-represented at compile time as a double-width integer constant, so the runtime computation
-is a pure integer multiply followed by a right-shift with no risk of intermediate overflow
-and no floating-point operations.
+represented at compile time as a widened integer constant (64-bit for types up to 32 bits,
+128-bit for `int64_t`), so the runtime computation is a pure integer multiply followed by
+a right-shift with no risk of intermediate overflow and no floating-point operations.
 
 ??? info "Implementation details"
 
@@ -238,23 +238,24 @@ and no floating-point operations.
     | Non-integer ratio | otherwise                 | `ft → m` ($\times 0.3048$) | fixed-point multiply |  explicit  |
 
     For the non-integer case the magnitude is converted **at compile time** to a
-    fixed-point constant with double the bit-width of the representation type.  For
-    example, when scaling a 32-bit integer value, a 64-bit fixed-point intermediate is
-    used.  The actual runtime computation is then a pure integer multiply followed by a
-    right-shift:
+    fixed-point constant with widened bit-width. The library uses **`int64_t` for
+    all types up to 32 bits** (`int8_t`, `int16_t`, `int32_t`) and **128-bit arithmetic
+    for `int64_t`**.  The actual runtime computation is then a pure integer multiply
+    followed by a right-shift:
 
     $$
     \text{result} = \left\lfloor \text{value} \times \lfloor M \cdot 2^N \rfloor \right\rfloor \gg N
     $$
 
-    where $N$ equals the bit-width of the source representation type.  On platforms where
-    `__int128` is available (most 64-bit targets), the double-width arithmetic is
+    where $N$ equals 64 for types up to 32 bits, and 128 for `int64_t`.  On platforms
+    where `__int128` is available (most 64-bit targets), the 128-bit arithmetic is
     implemented natively; on others, a portable `double_width_int` emulation is used in
     `constexpr` context.
 
-    Because the intermediate is double-width, it cannot overflow as long as the input
-    value fits in the representation type — a value of `std::int64_t` will never silently
-    overflow during the multiplication step.
+    Because the intermediate is widened (to `int64_t` for types up to 32 bits, which is
+    more than double width for `int8_t` and `int16_t`), it cannot overflow as long as the
+    input value fits in the representation type. For example, a `std::int32_t` value
+    computed in `int64_t` has 32 extra bits of headroom.
 
     For the non-integer ratio path, the result is **truncated toward zero**.  The
     fixed-point constant is rounded *away* from zero at compile time to compensate for
diff --git a/src/core/include/mp-units/framework/scaling.h b/src/core/include/mp-units/framework/scaling.h
@@ -87,10 +87,13 @@ template<typename common_t, auto M>
     // M is a pure rational p/q (no irrational factors such as π).
     constexpr common_t num = get_value<common_t>(numerator(M));
     constexpr common_t den = get_value<common_t>(denominator(M));
-    if constexpr (sizeof(common_t) < sizeof(int128_t)) {
-      // Use a wider native type to avoid intermediate overflow.
-      using wide_t = double_width_int_for_t<common_t>;
-      return static_cast<common_t>(static_cast<wide_t>(v) * num / den);
+    if constexpr (sizeof(common_t) <= sizeof(std::int32_t)) {
+      // Widen all types up to 32 bits to int64_t: at least 32 bits of headroom for the
+      // intermediate product, with no performance cost on modern 64-bit systems.
+      return static_cast<common_t>(static_cast<std::int64_t>(v) * num / den);
+    } else if constexpr (sizeof(common_t) < sizeof(int128_t)) {
+      // Use 128-bit arithmetic for int64_t.
+      return static_cast<common_t>(static_cast<int128_t>(v) * num / den);
     } else {
       // At max integer width: no wider type available, compute in common_t directly.
       return v * num / den;
diff --git a/test/static/fixed_point_test.cpp b/test/static/fixed_point_test.cpp
@@ -58,7 +58,7 @@ static_assert(scale<long>(mag<60>, 2l) == 120l);
 static_assert(scale<int>(mag_ratio<1, 1000>, 5000) == 5);
 static_assert(scale<int>(mag_ratio<1, 60>, 120) == 2);
 
-// rational M (3/2 * 4 == 6): exact double-width integer arithmetic
+// rational M (3/2 * 4 == 6): exact widened integer arithmetic (int64_t for int)
 static_assert(scale<int>(mag_ratio<3, 2>, 4) == 6);
 // (1/3 * 9 == 3)
 static_assert(scale<int>(mag_ratio<1, 3>, 9) == 3);