Commit 1cc5e96

Ravenwater, claude, and singul4ri7y authored
V3.97: hygiene release (#507)
* Incrementing SEMVER to v3.96.1

* Fix UBSan: guard negative exponent overflow in areal conversion

  When exponent is a large negative (e.g. -72), the shift `1ull << -exponent` exceeds 63 bits, causing undefined behavior. Add `exponent > -64` guard so both positive and negative extremes fall through to the safe ipow() path.

* weird difference between double and duble

* systems and position paper roadmaps

* Add papers/ artifact tree with three mixed-precision solver case studies

  Create self-contained papers/ directory for systems and position paper artifacts that can be zipped and shared with reviewers:
  - papers/systems-paper/iterative_refinement.cpp: Carson & Higham three-precision LU-IR across IEEE, posit, cfloat, dd, cross-family
  - papers/systems-paper/conjugate_gradient.cpp: CG for SPD systems with single-precision and two-precision (low preconditioner) configurations
  - papers/systems-paper/idrs.cpp: IDR(s) for non-symmetric systems with shadow space dimension sweep and number system comparison

  Move paper docs from docs/papers/ to papers/docs/ for co-location. Add UNIVERSAL_BUILD_PAPERS CMake option (wired into BUILD_ALL cascade).

* Add changelog and session doc for paper artifact tree and solver studies

* Add LaTeX scaffolding for arXiv systems paper

  Plain article class (12pt) with full section structure, TODO placeholders, 29 BibTeX references (14 from JOSS + 15 new), and Makefile for local builds.

* Add changelog and session doc for LaTeX paper scaffolding

* arxiv paper first draft

* arxiv systems paper draft v3

* Adding a mixed-precision Attention head with KV cache test case for the ArXiv paper

* Complete posit2 arithmetic, conversion, logic, and assignment test suites

  Add all four arithmetic operations (sub, mul, div plus existing add) via blocktriple pipeline, port conversion/assignment/logic regression tests from original posit, and fix three bugs discovered during testing:
  - convert_ieee754() extractBits too small: nbits+4 lost IEEE sticky bits causing false midpoint ties; now uses max(numeric_limits<Real>::digits, nbits+4)
  - Integer assignment via blocktriple had hidden-bit off-by-one in round(); rerouted through convert_ieee754(static_cast<double>(rhs))
  - Literal comparison operators accessed private _block member; replaced with delegation to posit-posit comparison operators

* Fix posit2 clang test failures: regime value() and setbits() UB

  positRegime::value() used manual division (1.0l / uint64_t(1) << -e2) for negative exponents, which produced wrong results under clang due to a codegen issue in this template context. Replace with std::ldexp() matching the original posit implementation.

  posit::setbits(uint64_t) used an uninitialized blockbinary temporary leaving upper MSU bits as garbage. Replace with _block.setbits(value) which properly masks the MSU.

* Guard posit2 long double operators with LONG_DOUBLE_SUPPORT for MSVC

  On MSVC, LONG_DOUBLE_SUPPORT is 0 but the long double comparison operators were not guarded, causing ambiguous conversion errors since the posit(long double) constructor was correctly excluded. Add matching #if LONG_DOUBLE_SUPPORT guards to friend declarations, operator implementations, and test code.

* Implementation of Unum 2.0 (for v3.96) (#505)

* Skeletal implementation of Unum 2.0

* Fix point multiplication bug and add support for reverse interval

* Added pow() and abs() + some optimizations

* Refactor according to Codacy suggestions

* Added op table/matrix for efficient operations

* Fix unum2 includes

* Fix operation matrix bug in unum2_impl.hpp

* unum2_fwd.hpp, static test and improvements

* Change bitset::_Find_first() to manually finding the bit for compatibility reasons

* Make unum2 friend class a little more specific on class lattice

* Make lattice parameters public to bypass MSVC build fails

* incrementing SEMVER to v3.97

* Fix posit<8,2> fast specialization: replace broken float_assign with proven convert_to_bb

  The hand-rolled float_assign truncated toward zero instead of rounding to nearest, causing 25K+ failures across arithmetic and conversion tests. Replaced with the battle-tested convert_to_bb path used by posit<16,1>, posit<16,2>, and posit<32,2>. Also fixed reciprocal() NaR handling and enabled regression testing (MANUAL_TESTING 0).

* Fix posit<32,2> RISC-V failure: replace long double float_assign with double

  The root cause of the 100% arithmetic failure rate on RISC-V was float_assign(long double) using std::numeric_limits<long double>::digits to size the conversion bitblock. On x86, long double is 80-bit (dfbits=63); on RISC-V, long double is 128-bit quad (dfbits=112), causing convert_to_bb to instantiate with a different bitblock size that produces wrong results.

  Since double's 52 fraction bits exceed posit<32,2>'s maximum 28 fraction bits, double is more than sufficient. This matches the proven pattern already used by posit<16,1>, posit<16,2>, and posit<8,2>. Also fixed reciprocal() NaR handling (same fix as posit<8,2>).

* Add RISC-V 64 cross-compilation CI job with QEMU emulation

  Adds a new CI matrix entry that cross-compiles for RISC-V 64 using g++-riscv64-linux-gnu and runs tests via qemu-riscv64-static. This catches architecture-specific issues like the long double dfbits divergence fixed in d0ef1c2.

* Fix RISC-V CI: set QEMU_LD_PREFIX for dynamic linker resolution

  The qemu-user-static package registers a binfmt_misc handler that intercepts RISC-V binaries but runs them without the -L sysroot flag, causing "Could not open '/lib/ld-linux-riscv64-lp64d.so.1'" errors. Setting QEMU_LD_PREFIX ensures QEMU finds the RISC-V sysroot regardless of whether it is invoked via CMAKE_CROSSCOMPILING_EMULATOR or binfmt_misc.

* Fix cfloat sNaN test failures on RISC-V and add POWER CI job

  RISC-V, ARM, and POWER architectures quiet sNaN (signaling NaN) on any FP register contact, so cfloat's sNaN encoding cannot survive a round-trip through native float/double on these platforms.
  - Add UNIVERSAL_SNAN_ROUND_TRIPS_NATIVE_FP macro to architecture.hpp documenting sNaN behaviour per architecture (only defined on x86)
  - Guard sNaN round-trip tests in cfloat_test_suite.hpp and assignment.cpp: skip sNaN-specific assertions on non-x86 platforms
  - Fix gcc_long_double.hpp to_binary/to_triple/color_print for POWER's 128-bit IEEE quad long double (112 fraction bits, no x86 bit63 field)
  - Add POWER ppc64le cross-compilation CI job with QEMU emulation
  - Add cmake/toolchains/ppc64le-linux-gnu.cmake toolchain file

  Tested locally: 389/389 pass on RISC-V, 389/389 pass on POWER, 7/7 cfloat tests pass on native x86.

* Fix broken cmake install after include directory reorganization (#503)

  The include tree was moved from include/universal/ to include/sw/universal/ but the install rules were never updated, causing "file INSTALL cannot find" errors. Fix the install source path, BUILD_INTERFACE, and include_install_dir to match the new layout.

---------

Signed-off-by: Theodore Omtzigt <theo@stillwater-sc.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: SD Asif Hossein <101280084+singul4ri7y@users.noreply.github.com>
1 parent 6daf46f commit 1cc5e96
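
The UBSan fix described in the commit message comes down to a range check in front of a 64-bit shift. A minimal sketch of that guard follows. The helper name `pow2_scale` is hypothetical and not a function in the library, and `std::ldexp` only stands in for the `ipow()` fallback that the message mentions.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not library code): returns 2^exponent as a double.
// Shifting a 64-bit value by 64 or more positions is undefined behavior,
// so the fast shift path is taken only for -64 < exponent < 64.
inline double pow2_scale(int exponent) {
    if (exponent >= 0 && exponent < 64) {
        return static_cast<double>(std::uint64_t(1) << exponent);
    }
    if (exponent < 0 && exponent > -64) {
        // -exponent is in [1, 63] here, so the shift is well defined
        return 1.0 / static_cast<double>(std::uint64_t(1) << -exponent);
    }
    // Both extremes fall through to a safe path; std::ldexp is an
    // assumption standing in for the library's ipow().
    return std::ldexp(1.0, exponent);
}
```

With this shape, an exponent such as -72 never reaches the shift, which is the case the message calls out.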

11 files changed: +247 -134 lines changed


.github/workflows/cmake.yml

Lines changed: 32 additions & 1 deletion
@@ -2,7 +2,7 @@ name: CMake

 on:
   push:
-    branches: [ v3.96, main ]
+    branches: [ v3.97, main ]
   pull_request:
     branches: [ main ]

@@ -49,6 +49,20 @@ jobs:
             name: macOS x64 (Apple Clang)
             artifact: macos-x64
             cmake_flags: -DUNIVERSAL_BUILD_CI_LITE=ON
+          # RISC-V cross-compilation with QEMU emulation
+          - os: ubuntu-latest
+            name: Linux RISC-V 64 (GCC cross)
+            artifact: linux-riscv64-gcc
+            cross: riscv64
+            toolchain: cmake/toolchains/riscv64-linux-gnu.cmake
+            cmake_flags: -DUNIVERSAL_BUILD_CI_LITE=ON
+          # IBM POWER cross-compilation with QEMU emulation
+          - os: ubuntu-latest
+            name: Linux POWER 64 LE (GCC cross)
+            artifact: linux-ppc64le-gcc
+            cross: ppc64le
+            toolchain: cmake/toolchains/ppc64le-linux-gnu.cmake
+            cmake_flags: -DUNIVERSAL_BUILD_CI_LITE=ON

     steps:
       - name: Checkout
@@ -60,6 +74,22 @@ jobs:
           sudo apt-get update
           sudo apt-get install -y clang

+      - name: Install RISC-V cross-compiler and QEMU
+        if: matrix.cross == 'riscv64'
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y g++-riscv64-linux-gnu qemu-user-static
+          # Set sysroot so QEMU can find the RISC-V dynamic linker and libs,
+          # whether invoked via CMAKE_CROSSCOMPILING_EMULATOR or binfmt_misc
+          echo "QEMU_LD_PREFIX=/usr/riscv64-linux-gnu" >> $GITHUB_ENV
+
+      - name: Install POWER cross-compiler and QEMU
+        if: matrix.cross == 'ppc64le'
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y g++-powerpc64le-linux-gnu qemu-user-static
+          echo "QEMU_LD_PREFIX=/usr/powerpc64le-linux-gnu" >> $GITHUB_ENV
+
       # ccache for Linux and macOS
       - name: Install and configure ccache
         if: runner.os != 'Windows'
@@ -98,6 +128,7 @@ jobs:
             -DCMAKE_C_COMPILER_LAUNCHER=${{ env.CMAKE_C_COMPILER_LAUNCHER || '' }}
             -DCMAKE_CXX_COMPILER_LAUNCHER=${{ env.CMAKE_CXX_COMPILER_LAUNCHER || '' }}
             ${{ matrix.cmake_flags }}
+            ${{ matrix.toolchain && format('-DCMAKE_TOOLCHAIN_FILE={0}/{1}', github.workspace, matrix.toolchain) || '' }}
             ${{ matrix.cc && format('-DCMAKE_C_COMPILER={0}', matrix.cc) || '' }}
             ${{ matrix.cxx && format('-DCMAKE_CXX_COMPILER={0}', matrix.cxx) || '' }}

CMakeLists.txt

Lines changed: 5 additions & 5 deletions
@@ -20,7 +20,7 @@ if(NOT DEFINED UNIVERSAL_VERSION_MAJOR)
   set(UNIVERSAL_VERSION_MAJOR 3)
 endif()
 if(NOT DEFINED UNIVERSAL_VERSION_MINOR)
-  set(UNIVERSAL_VERSION_MINOR 96)
+  set(UNIVERSAL_VERSION_MINOR 97)
 endif()
 if(NOT DEFINED UNIVERSAL_VERSION_PATCH)
   set(UNIVERSAL_VERSION_PATCH 1)
@@ -565,8 +565,8 @@ if(WIN32)
   set(config_install_dir CMake)
 elseif(UNIX)
   set(include_install_dir include)
-  set(include_install_dir_postfix "${project_library_target_name}")
-  set(include_install_dir_full "${include_install_dir}/${include_install_dir_postfix}")
+  set(include_install_dir_postfix "")
+  set(include_install_dir_full "${include_install_dir}")

   set(config_install_dir share/${PACKAGE_NAME})
 else()
@@ -614,7 +614,7 @@ message(STATUS "include_install_dir_postfix = ${include_install_dir_postfix}")

 # configure the library target
 target_include_directories(${project_library_target_name}
-  INTERFACE $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}>
+  INTERFACE $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/sw>
   $<INSTALL_INTERFACE:${include_install_dir_full}>)

 # uninstall target
@@ -658,7 +658,7 @@ install(FILES
   DESTINATION ${config_install_dir} COMPONENT cmake)

 # Install headers
-install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/${project_library_target_name}
+install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/sw/
   DESTINATION ${include_install_dir})

 if(UNIVERSAL_BUILD_ALL)

cmake/toolchains/ppc64le-linux-gnu.cmake

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# CMake toolchain file for IBM POWER 64-bit little-endian cross-compilation
+# Uses powerpc64le-linux-gnu GCC cross-compiler and QEMU user-mode emulation
+#
+# Usage:
+# cmake -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/ppc64le-linux-gnu.cmake ..
+#
+# Prerequisites (Ubuntu/Debian):
+# sudo apt-get install g++-powerpc64le-linux-gnu qemu-user-static
+
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR ppc64le)
+
+set(CMAKE_C_COMPILER powerpc64le-linux-gnu-gcc)
+set(CMAKE_CXX_COMPILER powerpc64le-linux-gnu-g++)
+
+# QEMU user-mode emulation for running cross-compiled test binaries
+set(CMAKE_CROSSCOMPILING_EMULATOR "qemu-ppc64le-static;-L;/usr/powerpc64le-linux-gnu")
+
+# Search paths for cross-compiled libraries
+set(CMAKE_FIND_ROOT_PATH /usr/powerpc64le-linux-gnu)
+set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
+set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
+set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

cmake/toolchains/riscv64-linux-gnu.cmake

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# CMake toolchain file for RISC-V 64-bit cross-compilation
+# Uses riscv64-linux-gnu GCC cross-compiler and QEMU user-mode emulation
+#
+# Usage:
+# cmake -DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/riscv64-linux-gnu.cmake ..
+#
+# Prerequisites (Ubuntu/Debian):
+# sudo apt-get install g++-riscv64-linux-gnu qemu-user-static
+
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR riscv64)
+
+set(CMAKE_C_COMPILER riscv64-linux-gnu-gcc)
+set(CMAKE_CXX_COMPILER riscv64-linux-gnu-g++)
+
+# QEMU user-mode emulation for running cross-compiled test binaries
+set(CMAKE_CROSSCOMPILING_EMULATOR "qemu-riscv64-static;-L;/usr/riscv64-linux-gnu")
+
+# Search paths for cross-compiled libraries
+set(CMAKE_FIND_ROOT_PATH /usr/riscv64-linux-gnu)
+set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
+set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
+set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

include/sw/universal/native/nonconstexpr/gcc_long_double.hpp

Lines changed: 61 additions & 2 deletions
@@ -15,7 +15,7 @@ namespace sw { namespace universal {
 inline std::tuple<bool, int, std::uint64_t> ieee_components(long double fp) {
     static_assert(std::numeric_limits<double>::is_iec559,
         "This function only works when double complies with IEC 559 (IEEE 754)");
-    static_assert(sizeof(long double) == 16, "This function only works when double is 80 bit.");
+    static_assert(sizeof(long double) == 16, "This function only works when long double is 16 bytes.");

     long_double_decoder dd{ fp }; // initializes the first member of the union
     // Reading inactive union parts is forbidden in constexpr :-(
@@ -115,14 +115,35 @@ inline std::string to_binary(long double number, bool bNibbleMarker = false) {

     s << '.';

-    // print fraction bits
+#if defined(UNIVERSAL_ARCH_POWER)
+    // POWER: IEEE 754 binary128 — 112 fraction bits (48 upper + 64 lower)
+    // No explicit integer bit (implicit leading 1 for normals)
+    {
+        uint64_t mask = (uint64_t(1) << 47);
+        for (int i = 47; i >= 0; --i) {
+            s << ((decoder.parts.upper & mask) ? '1' : '0');
+            if (bNibbleMarker && i != 0 && (i % 4) == 0) s << '\'';
+            mask >>= 1;
+        }
+    }
+    {
+        uint64_t mask = (uint64_t(1) << 63);
+        for (int i = 63; i >= 0; --i) {
+            s << ((decoder.parts.fraction & mask) ? '1' : '0');
+            if (bNibbleMarker && i != 0 && (i % 4) == 0) s << '\'';
+            mask >>= 1;
+        }
+    }
+#else
+    // x86: 80-bit extended — bit63 is the explicit integer bit, then 63 fraction bits
     uint64_t mask = (uint64_t(1) << 62);
     s << (decoder.parts.bit63 ? '1' : '0');
     for (int i = 62; i >= 0; --i) {
         s << ((decoder.parts.fraction & mask) ? '1' : '0');
         if (bNibbleMarker && i != 0 && (i % 4) == 0) s << '\'';
         mask >>= 1;
     }
+#endif

     return s.str();
 }
@@ -151,12 +172,30 @@ inline std::string to_triple(long double number) {
     s << scale << ',';

     // print fraction bits
+#if defined(UNIVERSAL_ARCH_POWER)
+    // POWER: 112 fraction bits (48 upper + 64 lower), implicit leading 1
+    {
+        uint64_t mask = (uint64_t(1) << 47);
+        for (int i = 47; i >= 0; --i) {
+            s << ((decoder.parts.upper & mask) ? '1' : '0');
+            mask >>= 1;
+        }
+    }
+    {
+        uint64_t mask = (uint64_t(1) << 63);
+        for (int i = 63; i >= 0; --i) {
+            s << ((decoder.parts.fraction & mask) ? '1' : '0');
+            mask >>= 1;
+        }
+    }
+#else
     s << (decoder.parts.bit63 ? '1' : '0');
     uint64_t mask = (uint64_t(1) << 61);
     for (int i = 61; i >= 0; --i) {
         s << ((decoder.parts.fraction & mask) ? '1' : '0');
         mask >>= 1;
     }
+#endif

     s << ')';
     return s.str();
@@ -195,13 +234,33 @@ inline std::string color_print(long double number) {
     s << '.';

     // print fraction bits
+#if defined(UNIVERSAL_ARCH_POWER)
+    // POWER: 112 fraction bits (48 upper + 64 lower), implicit leading 1
+    {
+        uint64_t mask = (uint64_t(1) << 47);
+        for (int i = 47; i >= 0; --i) {
+            s << magenta << ((decoder.parts.upper & mask) ? '1' : '0');
+            if (i > 0 && i % 4 == 0) s << magenta << '\'';
+            mask >>= 1;
+        }
+    }
+    {
+        uint64_t mask = (uint64_t(1) << 63);
+        for (int i = 63; i >= 0; --i) {
+            s << magenta << ((decoder.parts.fraction & mask) ? '1' : '0');
+            if (i > 0 && i % 4 == 0) s << magenta << '\'';
+            mask >>= 1;
+        }
+    }
+#else
     s << magenta << (decoder.parts.bit63 ? '1' : '0');
     uint64_t mask = (uint64_t(1) << 61);
     for (int i = 61; i >= 0; --i) {
         s << magenta << ((decoder.parts.fraction & mask) ? '1' : '0');
         if (i > 0 && i % 4 == 0) s << magenta << '\'';
         mask >>= 1;
     }
+#endif

     s << def;
     return s.str();
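
The POWER branches in this file repeat the same mask-walking loop for the 48-bit upper word and the 64-bit lower word of the binary128 fraction. As a sketch of that loop in isolation, the helper below prints the low `nbits` of a word MSB-first with optional nibble markers. Both `append_bits` and `quad_fraction_to_binary` are hypothetical names for illustration and are not part of gcc_long_double.hpp.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write the low `nbits` bits of `word`, MSB first.
// Requires 1 <= nbits <= 64. A marker is emitted after every bit whose
// index is a nonzero multiple of 4, matching the loops in the diff.
inline void append_bits(std::ostream& s, std::uint64_t word, int nbits, bool nibbleMarker) {
    std::uint64_t mask = std::uint64_t(1) << (nbits - 1);
    for (int i = nbits - 1; i >= 0; --i) {
        s << ((word & mask) ? '1' : '0');
        if (nibbleMarker && i != 0 && (i % 4) == 0) s << '\'';
        mask >>= 1;
    }
}

// binary128 fraction field laid out as in the POWER branch:
// 48 upper bits followed by 64 lower bits, 112 bits in total.
inline std::string quad_fraction_to_binary(std::uint64_t upper48, std::uint64_t lower64,
                                           bool nibbleMarker = false) {
    std::ostringstream s;
    append_bits(s, upper48, 48, nibbleMarker);
    append_bits(s, lower64, 64, nibbleMarker);
    return s.str();
}
```

Because the marker is suppressed at index 0, no marker is written at the boundary between the upper and lower words; that mirrors the loops in the diff rather than proposing a change to them.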

include/sw/universal/number/posit/specialized/posit_32_2.hpp

Lines changed: 19 additions & 8 deletions
@@ -90,15 +90,15 @@ class posit<NBITS_IS_32, ES_IS_2> {
     constexpr posit& operator=(short rhs) { return integer_assign((long)(rhs)); }
     constexpr posit& operator=(int rhs) { return integer_assign((long)(rhs)); }
     constexpr posit& operator=(long rhs) { return integer_assign(rhs); }
-    posit& operator=(long long rhs) { return float_assign((long double)(rhs)); }
+    posit& operator=(long long rhs) { return float_assign((double)(rhs)); }
     constexpr posit& operator=(char rhs) { return integer_assign((long)(rhs)); }
     constexpr posit& operator=(unsigned short rhs) { return integer_assign((long)(rhs)); }
     constexpr posit& operator=(unsigned int rhs) { return integer_assign((long)(rhs)); }
-    posit& operator=(unsigned long rhs) { return float_assign((long double)(rhs)); }
-    posit& operator=(unsigned long long rhs) { return float_assign((long double)(rhs)); }
-    posit& operator=(float rhs) { return float_assign((long double)rhs); }
-    posit& operator=(double rhs) { return float_assign((long double)rhs); }
-    posit& operator=(long double rhs) { return float_assign(rhs); }
+    posit& operator=(unsigned long rhs) { return float_assign((double)(rhs)); }
+    posit& operator=(unsigned long long rhs) { return float_assign((double)(rhs)); }
+    posit& operator=(float rhs) { return float_assign((double)rhs); }
+    posit& operator=(double rhs) { return float_assign(rhs); }
+    posit& operator=(long double rhs) { return float_assign((double)rhs); }

     explicit operator long double() const { return to_long_double(); }
     explicit operator double() const { return to_double(); }
@@ -434,6 +434,11 @@ class posit<NBITS_IS_32, ES_IS_2> {
         return tmp;
     }
     posit reciprocal() const {
+        if (isnar()) {
+            posit p;
+            p.setnar();
+            return p;
+        }
         posit p = 1.0 / *this;
         return p;
     }
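
The guard added to `reciprocal()` returns NaR before any division is attempted. The toy type below sketches that early-return pattern in a form that compiles on its own. `ToyPosit` is a hypothetical stand-in (a double payload plus a NaR flag), not the library's posit. Its zero check is an extra assumption, added so the toy never divides by zero; in posit arithmetic division by zero also yields NaR, and the real class leaves that case to its division operator.

```cpp
// Toy stand-in for a posit: a double payload plus an explicit NaR flag.
// Hypothetical type for illustration only.
struct ToyPosit {
    double value = 0.0;
    bool   nar   = false;

    bool isnar() const { return nar; }
    void setnar() { nar = true; value = 0.0; }

    ToyPosit reciprocal() const {
        ToyPosit p;
        // NaR in, NaR out. The zero check is this sketch's own addition.
        if (isnar() || value == 0.0) {
            p.setnar();
            return p;
        }
        p.value = 1.0 / value;
        return p;
    }
};
```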
@@ -663,8 +668,14 @@ class posit<NBITS_IS_32, ES_IS_2> {
         _bits = sign ? -raw : raw;
         return *this;
     }
-    posit& float_assign(long double rhs) {
-        constexpr int dfbits = std::numeric_limits<long double>::digits - 1;
+    // convert a double precision IEEE floating point to a posit<32,2>.
+    // Use double (not long double) so dfbits is consistent across
+    // architectures: x86 long double=80-bit, RISC-V long double=128-bit,
+    // which causes convert_to_bb to instantiate with different bitblock
+    // sizes and produce wrong results. Double's 52 fraction bits are
+    // more than sufficient for a 32-bit posit (max 28 fraction bits).
+    posit& float_assign(double rhs) {
+        constexpr int dfbits = std::numeric_limits<double>::digits - 1;
         internal::value<dfbits> v(rhs);
         // special case processing
         if (v.iszero()) {
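
The reason for switching `float_assign` to `double` is that `std::numeric_limits<long double>::digits` differs between targets while the value for `double` does not. The sketch below makes that visible at compile time. The names `fraction_bits`, `kPositFractionBits`, and `long_double_fraction_bits` are hypothetical, and the value 28 is taken from the diff comment rather than derived here.

```cpp
#include <limits>

// digits counts the hidden bit, so the stored fraction width is digits - 1.
template <typename Real>
constexpr int fraction_bits() {
    return std::numeric_limits<Real>::digits - 1;
}

// Maximum posit<32,2> fraction width as quoted in the diff comment.
constexpr int kPositFractionBits = 28;

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes an IEEE 754 double");
static_assert(fraction_bits<double>() == 52,
              "binary64 stores 52 fraction bits on every IEEE 754 target");
static_assert(fraction_bits<double>() >= kPositFractionBits,
              "double is wide enough to feed a posit<32,2> conversion");

// Platform-dependent: 63 for x87 80-bit extended, 112 for IEEE binary128,
// 52 where long double is the same format as double.
constexpr int long_double_fraction_bits = fraction_bits<long double>();
```

Sizing a template on `fraction_bits<double>()` therefore gives one instantiation on every target, whereas sizing it on the long double value gives a different instantiation per architecture.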
