Merged
2 changes: 0 additions & 2 deletions .github/workflows/cmake.yml
@@ -2,8 +2,6 @@ name: CMake

on:
  push:
-    branches-ignore:
-      - main
  pull_request:
    branches: [ main ]

2 changes: 2 additions & 0 deletions .github/workflows/coverage.yml
@@ -1,6 +1,8 @@
name: Code Coverage

on:
+  push:
+    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:  # Manual trigger
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,45 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

#### 2026-02-26 - decimal128 support for dfloat

- **dfloat decimal128** (`dfloat<34, 12>`): full IEEE 754-2008 decimal128 support (34 significant digits, 128 bits)
- Conditional `significand_t` type alias: `uint64_t` for ndigits <= 19, `__uint128_t` for ndigits <= 38 (guarded by `__SIZEOF_INT128__`)
- All significand-handling code updated: `unpack()`, `pack()`, `normalize_and_pack()`, arithmetic operators, comparisons, DPD codec, manipulators
- Addition: mixed scale-up/scale-down strategy for decimal128 (safe_scale_up = 3 digits to stay within `__uint128_t` capacity)
- Multiplication: schoolbook split multiply with 17-digit halves (each partial product fits `__uint128_t`)
- Division: iterative long division (remainder * 10 per step, fits `__uint128_t`)
- BID trailing significand > 64 bits: two-pass read/write (low 64 bits + high 46 bits for 110-bit field)
- DPD trailing > 64 bits: declet-by-declet bit-level read/write (11 declets for decimal128)
- Native `operator<` comparison (replaces double delegation, which would lose precision for 34-digit values)
- Standard aliases `decimal128` and `decimal128_dpd` enabled (guarded by `__SIZEOF_INT128__`)
- `setbits()` overload for `__uint128_t` to support full 128-bit raw bit setting
- New regression test `static/float/dfloat/standard/decimal128.cpp`: field widths, special values, integer round-trip, decimal exactness, arithmetic, BID/DPD agreement, comparisons
- All existing decimal32/decimal64 tests unaffected — 18/18 tests pass on both gcc and clang

#### 2026-02-26 - dfloat (IEEE 754-2008 Decimal FP) and hfloat (IBM System/360 Hex FP)

- **dfloat: IEEE 754-2008 decimal floating-point** (`dfloat<ndigits, es, Encoding, bt>`):
- Complete implementation with both BID (Binary Integer Decimal) and DPD (Densely Packed Decimal) encodings via `DecimalEncoding` enum template parameter
- IEEE 754-2008 combination field encode/decode (5-bit field discriminating MSD 0-7 vs 8-9 vs inf/NaN)
- Arithmetic operations (+, -, *, /) using `__uint128_t` wide intermediates for precision
- DPD codec with canonical IEEE 754-2008 Table 3.3 truth table — all 1000 encode/decode round-trips verified
- Standard aliases: `decimal32`, `decimal64`, `decimal128` (BID) and `decimal32_dpd`, `decimal64_dpd`, `decimal128_dpd` (DPD)
- Math library (all functions delegating through double), numeric_limits (radix=10), traits
- Regression tests: assignment/conversion, comparison operators, addition, subtraction, multiplication, division, decimal32 standard format, DPD codec exhaustive verification (17 tests total across dfloat+hfloat)

- **hfloat: IBM System/360 hexadecimal floating-point** (`hfloat<ndigits, es, bt>`):
- Classic 1964-era HFP: base-16 exponent, no hidden bit, no NaN, no infinity, no subnormals
- Truncation rounding only (never rounds up), overflow saturates to maxpos/maxneg
- Wobbling precision: 0-3 leading zero bits in the most significant hex digit
- Standard aliases: `hfloat_short` (32-bit), `hfloat_long` (64-bit), `hfloat_extended` (128-bit)
- Math library, numeric_limits (radix=16, has_infinity=false, has_quiet_NaN=false), traits
- Regression tests: assignment/conversion, comparison operators, addition, subtraction, multiplication, division, short precision standard format

- Both types pass `ReportTrivialityOfType` (trivially constructible/copyable) and compile with zero warnings on both gcc and clang

### Fixed

#### 2026-02-26 - Issue Triage, Clang/Android Binary128, and Posit CLI Precision
9 changes: 8 additions & 1 deletion CMakeLists.txt
@@ -20,7 +20,7 @@ if(NOT DEFINED UNIVERSAL_VERSION_MAJOR)
set(UNIVERSAL_VERSION_MAJOR 3)
endif()
if(NOT DEFINED UNIVERSAL_VERSION_MINOR)
- set(UNIVERSAL_VERSION_MINOR 103)
+ set(UNIVERSAL_VERSION_MINOR 104)
endif()
if(NOT DEFINED UNIVERSAL_VERSION_PATCH)
set(UNIVERSAL_VERSION_PATCH 1)
@@ -165,6 +165,7 @@ option(UNIVERSAL_BUILD_NUMBER_FIXPNTS "Set to ON to build static fi
option(UNIVERSAL_BUILD_NUMBER_BFLOATS "Set to ON to build static bfloat tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_CFLOATS "Set to ON to build static cfloat tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_DFLOATS "Set to ON to build static dfloat tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_HFLOATS "Set to ON to build static hfloat tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_DOUBLE_DOUBLE "Set to ON to build static dd tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_QUAD_DOUBLE "Set to ON to build static qd tests" OFF)
option(UNIVERSAL_BUILD_NUMBER_DD_CASCADE "Set to ON to build static dd cascade tests" OFF)
@@ -800,6 +801,7 @@ if(UNIVERSAL_BUILD_NUMBER_STATICS)
set(UNIVERSAL_BUILD_NUMBER_BFLOATS ON)
set(UNIVERSAL_BUILD_NUMBER_CFLOATS ON)
set(UNIVERSAL_BUILD_NUMBER_DFLOATS ON)
set(UNIVERSAL_BUILD_NUMBER_HFLOATS ON)
set(UNIVERSAL_BUILD_NUMBER_DOUBLE_DOUBLE ON)
set(UNIVERSAL_BUILD_NUMBER_QUAD_DOUBLE ON)
set(UNIVERSAL_BUILD_NUMBER_DD_CASCADE ON)
@@ -1025,6 +1027,11 @@ if(UNIVERSAL_BUILD_NUMBER_DFLOATS)
add_subdirectory("static/float/dfloat")
endif(UNIVERSAL_BUILD_NUMBER_DFLOATS)

# hexadecimal floats (IBM System/360)
if(UNIVERSAL_BUILD_NUMBER_HFLOATS)
add_subdirectory("static/float/hfloat")
endif(UNIVERSAL_BUILD_NUMBER_HFLOATS)

# double-double floats
if(UNIVERSAL_BUILD_NUMBER_DOUBLE_DOUBLE)
add_subdirectory("static/highprecision/dd")
243 changes: 243 additions & 0 deletions docs/plans/decimal-hexadecimal-float-implementation.md
@@ -0,0 +1,243 @@
# Implementation Plan: dfloat (Decimal FP) and hfloat (Hexadecimal FP)

## Context

Universal is an educational C++ template library for custom arithmetic types. It currently has binary floating-point (`cfloat`) but lacks decimal and hexadecimal floating-point systems. IBM mainframes historically provided all three in hardware: binary (IEEE 754), hexadecimal (System/360, 1964 -- hardware benefits from reduced alignment shifts), and decimal (financial industry, exact base-10 representation). Adding `dfloat` (IEEE 754-2008 decimal) and `hfloat` (IBM HFP) completes the floating-point radix family and serves as an educational resource for comparing encoding tradeoffs.

A `dfloat` skeleton already exists but is entirely stubbed. `hfloat` is greenfield.

## Design Decisions (User-Confirmed)

- **dfloat encoding**: Template parameter `DecimalEncoding` enum `{BID, DPD}` -- both encodings in one type for educational comparison
- **dfloat template**: `dfloat<ndigits, es, Encoding, bt>` (keep existing param names, add Encoding)
- **hfloat behavior**: Classic IBM System/360 -- no NaN, no infinity, no subnormals, truncation rounding, overflow saturates
- **hfloat template**: `hfloat<ndigits, es, bt>` where ndigits = hex fraction digits, es = exponent bits (7 for standard)
- **Scope**: Full implementation including math library

---

## Phase 1: dfloat Core Infrastructure (BID)

Rewrite the dfloat skeleton with correct IEEE 754-2008 decimal storage layout and BID encoding.

### Storage Layout

IEEE 754-2008 format: `[sign(1)] [combination(5)] [exponent_continuation(w)] [trailing_significand(t)]`

| Alias | Config | Bits |
|-------|--------|------|
| decimal32 | `dfloat<7, 6, BID>` | 1+5+6+20 = 32 |
| decimal64 | `dfloat<16, 8, BID>` | 1+5+8+50 = 64 |
| decimal128 | `dfloat<34, 12, BID>` | 1+5+12+110 = 128 |

Key static constexprs:
```cpp
static constexpr unsigned p = ndigits; // precision digits
static constexpr unsigned w = es; // exponent continuation bits
static constexpr unsigned t = nbits - 1 - 5 - w; // trailing significand bits
static constexpr int bias = (3 << (w - 1)) + p - 2;
```
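
As a sanity check on these constants, the following sketch recomputes the field widths and bias for the three standard formats. `decimal_layout` is a hypothetical helper, not part of Universal, and deriving `t` as ten bits per three trailing digits is an assumption about how `nbits` is obtained.

```cpp
// Hypothetical helper (not a Universal type): derives the IEEE 754-2008
// decimal interchange layout from precision p and exponent continuation width w.
template<unsigned ndigits, unsigned es>
struct decimal_layout {
    static constexpr unsigned p = ndigits;                 // precision digits
    static constexpr unsigned w = es;                      // exponent continuation bits
    static constexpr unsigned t = 10u * ((p - 1u) / 3u);   // assumption: 10 bits per 3 trailing digits
    static constexpr unsigned nbits = 1u + 5u + w + t;     // sign + combination + continuation + trailing
    static constexpr int bias = (3 << (w - 1)) + static_cast<int>(p) - 2;
};

static_assert(decimal_layout<7, 6>::nbits == 32 && decimal_layout<7, 6>::bias == 101, "decimal32");
static_assert(decimal_layout<16, 8>::nbits == 64 && decimal_layout<16, 8>::bias == 398, "decimal64");
static_assert(decimal_layout<34, 12>::nbits == 128 && decimal_layout<34, 12>::bias == 6176, "decimal128");
```

The biases 101, 398, and 6176 are the values IEEE 754-2008 lists for decimal32, decimal64, and decimal128.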

Combination field (5 bits `abcde`):
- `ab != 11`: exp MSBs = `ab`, MSD = `0cde` (digit 0-7)
- `ab == 11, cd != 11`: exp MSBs = `cd`, MSD = `100e` (digit 8-9)
- `11110`: infinity; `11111`: NaN
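
The three cases can be sketched as a small decoder. `CombinationKind`, `Combination`, and `decode_combination` are hypothetical names chosen for illustration; the bit tests follow the IEEE 754-2008 rule that the 8-9 case applies when `ab == 11` and `cd != 11`.

```cpp
// Hypothetical names for illustration only.
enum class CombinationKind { Finite, Infinity, NaN };

struct Combination {
    CombinationKind kind;
    unsigned exp_msbs;  // two most significant bits of the biased exponent
    unsigned msd;       // most significant decimal digit, 0..9
};

// Decode the 5-bit combination field abcde, with a in bit 4 and e in bit 0.
constexpr Combination decode_combination(unsigned abcde) {
    const unsigned ab = (abcde >> 3) & 3u;
    const unsigned cd = (abcde >> 1) & 3u;
    const unsigned e  = abcde & 1u;
    if (ab != 3u) return Combination{ CombinationKind::Finite, ab, abcde & 7u };  // MSD = 0cde
    if (cd != 3u) return Combination{ CombinationKind::Finite, cd, 8u | e };      // MSD = 100e
    return Combination{ e ? CombinationKind::NaN : CombinationKind::Infinity, 0u, 0u };
}

static_assert(decode_combination(0x1Bu).msd == 9u, "11011 encodes MSD 9");
static_assert(decode_combination(0x1Eu).kind == CombinationKind::Infinity, "11110 is infinity");
```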

### Files to Modify

| File | Changes |
|------|---------|
| `include/sw/universal/number/dfloat/dfloat_fwd.hpp` | Add `DecimalEncoding` enum, 4-param forward decl |
| `include/sw/universal/number/dfloat/dfloat_impl.hpp` | Complete rewrite: fix storage calc, combination field encode/decode, BID significand pack/unpack, `clear/setzero/setinf/setnan/setsign`, `iszero/isinf/isnan/sign/scale`, `maxpos/minpos/zero/minneg/maxneg`, `convert_ieee754` (double->dfloat), `convert_to_ieee754` (dfloat->double), `convert_signed/convert_unsigned`, comparison operators |
| `include/sw/universal/number/dfloat/dfloat.hpp` | Add Encoding template param to aliases, uncomment traits/numeric_limits includes |
| `include/sw/universal/number/dfloat/manipulators.hpp` | Implement `to_binary()` showing field boundaries, `type_tag()` with encoding name |
| `include/sw/universal/number/dfloat/attributes.hpp` | Fix template params, implement `dynamic_range()` |
| `static/float/dfloat/api/api.cpp` | Rewrite for 4-param template, test BID `dfloat<7,6>` and `dfloat<16,8>` |

### Files to Create

| File | Purpose |
|------|---------|
| `include/sw/universal/traits/dfloat_traits.hpp` | `is_dfloat_trait`, `is_dfloat`, `enable_if_dfloat` (pattern: `traits/cfloat_traits.hpp`) |
| `include/sw/universal/number/dfloat/numeric_limits.hpp` | `std::numeric_limits<dfloat>` specialization (radix=10) |

### Reference Files
- `include/sw/universal/number/cfloat/cfloat_impl.hpp` -- class structure pattern
- `include/sw/universal/traits/cfloat_traits.hpp` -- traits pattern

---

## Phase 2: dfloat Arithmetic (BID)

Implement all four arithmetic operations for BID encoding.

### Algorithm Outlines

**Addition**: Unpack both operands to `(sign, exponent, significand_integer)`. Align by dividing smaller-exponent significand by `10^shift`. Add/subtract based on signs. Normalize result to p digits. Round per IEEE 754-2008.
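
A minimal sketch of the alignment step for decimal32-sized operands (p = 7) follows. `Unpacked`, `ten_to`, and `add_unpacked` are hypothetical names. Dividing alone would drop digits that still fit (1.5 + 2 would give 3), so the sketch assumes a mixed strategy: scale the larger-exponent operand up while a 64-bit intermediate has headroom, and divide the other operand only for the remaining shift. Rounding is plain truncation here, not the IEEE 754-2008 rounding the plan calls for.

```cpp
#include <cstdint>

// Hypothetical unpacked operand: value = (-1)^sign * significand * 10^exponent.
struct Unpacked {
    bool sign;
    int exponent;
    std::uint64_t significand;  // assumed to hold at most kDigits digits on input
};

constexpr unsigned kDigits = 7;     // p for decimal32
constexpr unsigned kHeadroom = 12;  // 7 + 12 = 19 digits still fit in uint64_t

constexpr std::uint64_t ten_to(unsigned n) {
    return n == 0 ? std::uint64_t{1} : 10 * ten_to(n - 1);
}

inline Unpacked add_unpacked(Unpacked a, Unpacked b) {
    if (a.exponent < b.exponent) { const Unpacked tmp = a; a = b; b = tmp; }  // a has the larger exponent
    const unsigned shift = static_cast<unsigned>(a.exponent - b.exponent);

    const unsigned up = shift < kHeadroom ? shift : kHeadroom;
    a.significand *= ten_to(up);         // exact: no digits are lost
    a.exponent -= static_cast<int>(up);

    const unsigned down = shift - up;    // remaining shift drops digits of b
    b.significand = down >= kDigits ? 0 : b.significand / ten_to(down);

    Unpacked r{ a.sign, a.exponent, 0 };
    if (a.sign == b.sign) {
        r.significand = a.significand + b.significand;
    } else if (a.significand >= b.significand) {
        r.significand = a.significand - b.significand;
    } else {
        r.sign = b.sign;
        r.significand = b.significand - a.significand;
    }
    // Normalize back to p digits, truncating (a simplification of this sketch).
    while (r.significand >= ten_to(kDigits)) { r.significand /= 10; ++r.exponent; }
    if (r.significand == 0) r.sign = false;
    return r;
}
```

With these definitions, `add_unpacked({false, -1, 1}, {false, -1, 2})` yields significand 3 at exponent -1, i.e. 0.1 + 0.2 is exactly 0.3.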

**Multiplication**: `result_sig = sig_a * sig_b` (needs 2p-digit intermediate via `__uint128_t` or custom wide int). `result_exp = exp_a + exp_b`. Normalize to p digits.
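
For p = 7 the 2p-digit product still fits in 64 bits, which keeps a sketch short. `Unpacked` and `mul_unpacked` are hypothetical names, and normalization truncates rather than rounding.

```cpp
#include <cstdint>

// Hypothetical unpacked operand: value = (-1)^sign * significand * 10^exponent.
struct Unpacked {
    bool sign;
    int exponent;
    std::uint64_t significand;  // at most 7 digits on input
};

constexpr std::uint64_t kLimit = 10000000;  // 10^p for p = 7

inline Unpacked mul_unpacked(Unpacked a, Unpacked b) {
    // Two 7-digit significands give at most 14 digits, well inside uint64_t.
    Unpacked r{ a.sign != b.sign, a.exponent + b.exponent, a.significand * b.significand };
    // Normalize to p digits; the dropped digits are truncated in this sketch.
    while (r.significand >= kLimit) { r.significand /= 10; ++r.exponent; }
    return r;
}
```

For wider formats the same structure applies, but the intermediate needs `__uint128_t` or a split multiply.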

**Division**: `result_sig = (sig_a * 10^p) / sig_b`. `result_exp = exp_a - exp_b - p`, which compensates for scaling the dividend by `10^p`. Remainder determines rounding.
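
A sketch of the division outline for p = 7, again with hypothetical names (`Unpacked`, `div_unpacked`). Because the dividend is scaled by `10^p` before the integer divide, the sketch subtracts `p` from the exponent difference to keep the value unchanged. The quotient is truncated, and a zero divisor is treated as a precondition violation rather than mapped to infinity or NaN.

```cpp
#include <cstdint>

// Hypothetical unpacked operand: value = (-1)^sign * significand * 10^exponent.
struct Unpacked {
    bool sign;
    int exponent;
    std::uint64_t significand;  // at most 7 digits on input
};

constexpr int kDigits = 7;                  // p for decimal32
constexpr std::uint64_t kLimit = 10000000;  // 10^kDigits

// Precondition: b.significand != 0.
inline Unpacked div_unpacked(Unpacked a, Unpacked b) {
    Unpacked r{ a.sign != b.sign,
                a.exponent - b.exponent - kDigits,            // undo the 10^p scaling
                (a.significand * kLimit) / b.significand };   // at most 14 digits
    // Normalize to p digits by truncation.
    while (r.significand >= kLimit) { r.significand /= 10; ++r.exponent; }
    return r;
}
```

Followed literally, the outline yields few significant digits when `sig_a` is much smaller than `sig_b`; pre-normalizing the dividend would be one way to address that.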

### Files to Modify
- `dfloat_impl.hpp`: Implement `operator+=`, `-=`, `*=`, `/=`, `operator-()`, `operator++/--`

### Files to Create
- `static/float/dfloat/conversion/assignment.cpp` -- native type round-trip tests
- `static/float/dfloat/conversion/decimal_conversion.cpp` -- string conversion tests (verify 0.1 exact)
- `static/float/dfloat/logic/logic.cpp` -- comparison tests including NaN semantics
- `static/float/dfloat/arithmetic/addition.cpp`
- `static/float/dfloat/arithmetic/subtraction.cpp`
- `static/float/dfloat/arithmetic/multiplication.cpp`
- `static/float/dfloat/arithmetic/division.cpp`

---

## Phase 3: dfloat DPD Encoding

Add DPD (Densely Packed Decimal) as alternate encoding, branching via `if constexpr`.

DPD maps 3 BCD digits to a 10-bit declet. Each declet classifies digits as "small" (0-7) or "large" (8-9), giving 8 encoding patterns. Encode/decode via constexpr lookup tables (1000-entry encode, 1024-entry decode).
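
The eight patterns can be sketched directly from the small/large classification, without tables. The function names below are hypothetical and the code assumes C++14 `constexpr`; the bit assignments follow the standard DPD declet layout `pqr stu v wxy`, and the round-trip check mirrors the exhaustive test planned for the codec.

```cpp
#include <cstdint>

// Assemble the ten declet bits p q r s t u v w x y (p is bit 9, y is bit 0).
constexpr std::uint16_t declet(unsigned pq, unsigned r, unsigned st, unsigned u,
                               unsigned v, unsigned wx, unsigned y) {
    return static_cast<std::uint16_t>((pq << 8) | (r << 7) | (st << 5) | (u << 4) |
                                      (v << 3) | (wx << 1) | y);
}

// Encode three decimal digits (d2 = hundreds, d0 = units), each in 0..9.
constexpr std::uint16_t dpd_encode(unsigned d2, unsigned d1, unsigned d0) {
    const unsigned bc = (d2 >> 1) & 3u, d = d2 & 1u;
    const unsigned fg = (d1 >> 1) & 3u, h = d1 & 1u;
    const unsigned jk = (d0 >> 1) & 3u, m = d0 & 1u;
    // Selector bits: 1 where the digit is "large" (8-9), ordered d2 d1 d0.
    switch (((d2 >> 3) << 2) | ((d1 >> 3) << 1) | (d0 >> 3)) {
    case 0:  return declet(bc, d, fg, h, 0, jk, m);  // all small
    case 1:  return declet(bc, d, fg, h, 1, 0, m);   // d0 large
    case 2:  return declet(bc, d, jk, h, 1, 1, m);   // d1 large
    case 3:  return declet(bc, d, 2, h, 1, 3, m);    // d1, d0 large
    case 4:  return declet(jk, d, fg, h, 1, 2, m);   // d2 large
    case 5:  return declet(fg, d, 1, h, 1, 3, m);    // d2, d0 large
    case 6:  return declet(jk, d, 0, h, 1, 3, m);    // d2, d1 large
    default: return declet(0, d, 3, h, 1, 3, m);     // all large
    }
}

// Decode a declet back to its value 0..999.
constexpr unsigned dpd_decode(unsigned x) {
    const unsigned pq = (x >> 8) & 3u, r = (x >> 7) & 1u;
    const unsigned st = (x >> 5) & 3u, u = (x >> 4) & 1u;
    const unsigned v = (x >> 3) & 1u, wx = (x >> 1) & 3u, y = x & 1u;
    unsigned d2 = (pq << 1) | r, d1 = (st << 1) | u, d0 = (wx << 1) | y;  // v == 0: all small
    if (v == 1) {
        if (wx == 0)      { d0 = 8 | y; }
        else if (wx == 1) { d1 = 8 | u; d0 = (st << 1) | y; }
        else if (wx == 2) { d2 = 8 | r; d0 = (pq << 1) | y; }
        else if (st == 0) { d2 = 8 | r; d1 = 8 | u; d0 = (pq << 1) | y; }
        else if (st == 1) { d2 = 8 | r; d1 = (pq << 1) | u; d0 = 8 | y; }
        else if (st == 2) { d1 = 8 | u; d0 = 8 | y; }
        else              { d2 = 8 | r; d1 = 8 | u; d0 = 8 | y; }
    }
    return 100 * d2 + 10 * d1 + d0;
}

constexpr bool dpd_round_trips() {
    for (unsigned n = 0; n < 1000; ++n) {
        if (dpd_decode(dpd_encode(n / 100, (n / 10) % 10, n % 10)) != n) return false;
    }
    return true;
}
static_assert(dpd_round_trips(), "every value 0..999 must survive encode/decode");
```

Lookup tables, as the plan proposes, trade this branching for a 1000-entry and a 1024-entry array; a function like `dpd_encode` could be used to generate them at compile time.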

### Files to Create
- `include/sw/universal/number/dfloat/dpd_codec.hpp` -- encode/decode tables + functions
- `static/float/dfloat/standard/dpd_codec.cpp` -- exhaustive verification of all 1000 encodings

### Files to Modify
- `dfloat_impl.hpp`: Add `if constexpr (Encoding == DPD)` branches in significand pack/unpack

---

## Phase 4: dfloat Standard Aliases and Polish

### Files to Modify
- `dfloat.hpp`: Add aliases:
```cpp
using decimal32 = dfloat<7, 6, DecimalEncoding::BID, uint32_t>;
using decimal64 = dfloat<16, 8, DecimalEncoding::BID, uint32_t>;
using decimal128 = dfloat<34, 12, DecimalEncoding::BID, uint32_t>;
using decimal32_dpd = dfloat<7, 6, DecimalEncoding::DPD, uint32_t>;
// etc.
```
- `manipulators.hpp`: Add `color_print`, `pretty_print`, `components`

### Files to Create
- `static/float/dfloat/standard/decimal32.cpp` -- field width verification, known bit patterns
- `static/float/dfloat/standard/decimal64.cpp`
- `static/float/dfloat/standard/decimal128.cpp`

---

## Phase 5: hfloat Core Infrastructure

Create IBM System/360 hexadecimal floating-point from scratch.

### Storage Layout

Format: `[sign(1)] [exponent(7)] [hex_fraction(ndigits*4)]`

Value: `(-1)^sign * 16^(exponent - 64) * 0.f1f2...fn`

| Alias | Config | Bits |
|-------|--------|------|
| hfloat_short | `hfloat<6, 7>` | 1+7+24 = 32 |
| hfloat_long | `hfloat<14, 7>` | 1+7+56 = 64 |
| hfloat_extended | `hfloat<28, 7>` | 1+7+112 = 120 (stored in 128) |

Key behaviors:
- No hidden bit, no NaN, no infinity, no subnormals
- Truncation rounding only
- Overflow saturates to maxpos/maxneg
- Wobbling precision: 0-3 leading zero bits in the most significant hex digit
- Zero: fraction all zeros
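
To make the value formula concrete, the sketch below decodes a raw 32-bit short-format pattern into a `double`. `hfp_short_to_double` is a hypothetical free function, not the planned `hfloat` API; it treats any all-zero fraction as zero, as stated above.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: decode [sign(1)] [exponent(7)] [fraction(24)] to double.
inline double hfp_short_to_double(std::uint32_t bits) {
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 24) & 0x7Fu);  // excess-64
    const std::uint32_t fraction = bits & 0x00FFFFFFu;            // six hex digits, no hidden bit
    if (fraction == 0) return 0.0;                                // zero: fraction all zeros
    // value = 16^(exponent - 64) * fraction / 16^6 = fraction * 2^(4*(exponent - 64) - 24)
    const double magnitude = std::ldexp(static_cast<double>(fraction), 4 * (exponent - 64) - 24);
    return negative ? -magnitude : magnitude;
}
```

For example, `0xC276A000` has sign 1, exponent `0x42` (16^2), and fraction `0x76A000`, which decodes to -118.625.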

### Files to Create (all new)

**Headers** (`include/sw/universal/number/hfloat/`):
- `hfloat.hpp` -- umbrella header
- `hfloat_fwd.hpp` -- forward declarations + aliases
- `exceptions.hpp` -- exception hierarchy (no NaN exceptions; overflow/underflow)
- `hfloat_impl.hpp` -- main class: template `<ndigits, es, bt>`, storage, constructors, operators, conversions, comparisons. `setinf()` maps to `maxpos()`. SpecificValue `qnan/snan` maps to zero.
- `numeric_limits.hpp` -- `radix=16`, `has_infinity=false`, `has_quiet_NaN=false`
- `manipulators.hpp` -- `type_tag`, `to_binary` (show hex digit boundaries), `to_hex`
- `attributes.hpp` -- `dynamic_range`, `sign`, `scale` (returns `4*(exp-64)`)

**Traits**: `include/sw/universal/traits/hfloat_traits.hpp`

**Tests** (`static/float/hfloat/`):
- `CMakeLists.txt`
- `api/api.cpp`
- `conversion/assignment.cpp`
- `conversion/hex_conversion.cpp`
- `logic/logic.cpp`
- `standard/short.cpp`, `standard/long.cpp`, `standard/extended.cpp`

### CMake Wiring (root `CMakeLists.txt`)

3 insertion points:
1. ~line 167: `option(UNIVERSAL_BUILD_NUMBER_HFLOATS "Set to ON to build static hfloat tests" OFF)`
2. ~line 802 in `STATICS` cascade: `set(UNIVERSAL_BUILD_NUMBER_HFLOATS ON)`
3. ~line 1027 after dfloat: `if(UNIVERSAL_BUILD_NUMBER_HFLOATS) add_subdirectory("static/float/hfloat") endif()`

---

## Phase 6: hfloat Arithmetic

Implement arithmetic with hex-digit alignment and truncation rounding.

**Addition**: Align by shifting fraction right by `4*(exp_large - exp_small)` bits. Add/subtract. Normalize by shifting hex digits until leading hex digit != 0. Truncate.

**Multiplication**: `result_frac = frac_a * frac_b` (wide multiply). `result_exp = exp_a + exp_b - 64`. Normalize to ndigits hex digits. Truncate.

**Division**: `result_frac = (frac_a << ndigits*4) / frac_b`. `result_exp = exp_a - exp_b + 64`. Normalize. Truncate.
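
A minimal sketch of the multiplication outline for the short format follows. `HfpShort` and `hfp_mul` are hypothetical names, operands are assumed to be normalized (leading hex digit nonzero) or zero, and flushing exponent underflow to zero is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical unpacked short-format operand.
struct HfpShort {
    bool sign;
    int exponent;            // biased (excess-64), 0..127
    std::uint32_t fraction;  // 24 bits = six hex digits, value = fraction / 16^6
};

inline HfpShort hfp_mul(HfpShort a, HfpShort b) {
    const bool sign = a.sign != b.sign;
    if (a.fraction == 0 || b.fraction == 0) return HfpShort{ false, 0, 0 };

    // 24 x 24 -> 48-bit product, value = product / 2^48.
    std::uint64_t product = static_cast<std::uint64_t>(a.fraction) * b.fraction;
    int exponent = a.exponent + b.exponent - 64;

    // Two normalized fractions are each >= 1/16, so the product has at most
    // one leading zero hex digit; shift it out if present.
    if ((product >> 44) == 0) { product <<= 4; --exponent; }

    if (exponent > 127) return HfpShort{ sign, 127, 0xFFFFFFu };  // overflow saturates
    if (exponent < 0) return HfpShort{ false, 0, 0 };             // assumption: underflow to zero

    // Keep the top six hex digits; the low 24 bits are truncated, never rounded.
    return HfpShort{ sign, exponent, static_cast<std::uint32_t>(product >> 24) };
}
```

Multiplying fractions `0x100001` and `0x180000` (both at exponent 65) leaves exactly half a unit in the discarded bits; the sketch returns `0x180001`, whereas round-to-nearest-even would give `0x180002`.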

### Files to Create
- `static/float/hfloat/arithmetic/addition.cpp`
- `static/float/hfloat/arithmetic/subtraction.cpp`
- `static/float/hfloat/arithmetic/multiplication.cpp`
- `static/float/hfloat/arithmetic/division.cpp`
- `static/float/hfloat/performance/perf.cpp`

---

## Phase 7: Math Libraries (Both Types)

Initial implementation delegates through `double` for all math functions:
```cpp
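// Schematic only: `func` stands for any <cmath> function and `TypeName`
// for a dfloat or hfloat instantiation.
template<...> TypeName func(TypeName x) {
    return TypeName(std::func(double(x)));
}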
```

Functions: `exp/exp2/exp10/expm1`, `log/log2/log10/log1p`, `pow/sqrt/cbrt/hypot`, `sin/cos/tan/asin/acos/atan/atan2`, `sinh/cosh/tanh/asinh/acosh/atanh`, `trunc/floor/ceil/round`, `fmod/remainder`, `copysign/nextafter/fabs/abs`, `fmin/fmax/fdim`, `erf/erfc/tgamma/lgamma`, `fma`

For hfloat: `isnan()` always returns false, `isinf()` always returns false, overflow saturates.

### Files to Create (per type)
- `math/classify.hpp`, `math/exponent.hpp`, `math/logarithm.hpp`, `math/trigonometry.hpp`, `math/hyperbolic.hpp`, `math/sqrt.hpp`, `math/pow.hpp`, `math/minmax.hpp`, `math/next.hpp`, `math/truncate.hpp`, `math/fractional.hpp`, `math/hypot.hpp`, `math/error_and_gamma.hpp`
- `mathlib.hpp` umbrella
- Test files: `static/float/{dfloat,hfloat}/math/*.cpp`

---

## Phase 8: Integration and Testing

1. Build and test with gcc (`make -j4`)
2. Build and test with clang (critical -- clang is stricter on UB, uninitialized vars)
3. Verify `ReportTrivialityOfType` passes for all configurations
4. Verify `constexpr` correctness (no `std::frexp/ldexp` in constexpr paths)
5. Key test: `dfloat` represents 0.1 exactly (unlike binary float)
6. Key test: `hfloat` truncation rounding never rounds up

### Build Commands
```bash
mkdir build_dfloat && cd build_dfloat
cmake -DUNIVERSAL_BUILD_NUMBER_DFLOATS=ON -DUNIVERSAL_BUILD_NUMBER_HFLOATS=ON ..
make -j4
ctest
```

### Safety Reminders
- ONE build at a time, `make -j4` max (never `-j$(nproc)`)
- Test with BOTH gcc and clang before considering done
- Never claim tests pass without running them