Commit aedfa5a

Recoded one of the carryStep overloads. Saves one assembly instruction (and allowed the optimizer to save another).
CarryFused code size at WIDTH=512 drops by about 100 bytes. No measurable change in speed.
1 parent 3ec3c4e commit aedfa5a

1 file changed: +3 -2 lines changed

src/cl/carryutil.cl

Lines changed: 3 additions & 2 deletions
@@ -69,8 +69,9 @@ Word OVERLOAD carryStep(i64 x, i64 *outCarry, bool isBigWord) {

 Word OVERLOAD carryStep(i64 x, i32 *outCarry, bool isBigWord) {
   u32 nBits = bitlen(isBigWord);
-  Word w = lowBits(x, nBits);
-  *outCarry = xtract32(x, nBits) + (w < 0);
+  x <<= 32 - nBits;
+  Word w = as_int2(x).x >> (32 - nBits);
+  *outCarry = as_int2(x).y + (w < 0);
   return w;
 }
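
Reviewer note: the following is a rough host-side C sketch of why the old and new bodies should produce the same word and carry. It is not code from the repository. The helper definitions are assumptions: `lowBits(x, n)` is taken to sign-extend the low `n` bits of `x`, `xtract32(x, n)` is taken to be `(i32)(x >> n)`, and `n_bits` is passed in directly instead of calling `bitlen(isBigWord)`. The names `carry_step_old`, `carry_step_new`, and `carry_steps_agree` are hypothetical. `as_int2(x).x` and `.y` are modeled as the low and high 32-bit halves of the 64-bit value.

```c
#include <stdint.h>

typedef int32_t  i32;
typedef int64_t  i64;
typedef uint32_t u32;
typedef uint64_t u64;

/* Assumed behavior of lowBits: sign-extend the low `bits` bits (1..32).
   Relies on two's-complement conversion and arithmetic right shift of
   signed values, which mainstream compilers provide. */
static i32 low_bits_signed(i64 x, u32 bits) {
  u32 shifted = (u32)x << (32 - bits);
  return (i32)shifted >> (32 - bits);
}

/* Model of the removed lines, with xtract32 assumed to be (i32)(x >> n). */
static i32 carry_step_old(i64 x, i32 *out_carry, u32 n_bits) {
  i32 w = low_bits_signed(x, n_bits);
  *out_carry = (i32)(x >> n_bits) + (w < 0);
  return w;
}

/* Model of the added lines: one 64-bit left shift places the low n_bits
   at the top of the low half and bits [n_bits, n_bits + 32) in the high half. */
static i32 carry_step_new(i64 x, i32 *out_carry, u32 n_bits) {
  u64 s  = (u64)x << (32 - n_bits);   /* x <<= 32 - nBits       */
  i32 lo = (i32)(u32)s;               /* as_int2(x).x           */
  i32 hi = (i32)(u32)(s >> 32);       /* as_int2(x).y           */
  i32 w  = lo >> (32 - n_bits);       /* sign-extending shift   */
  *out_carry = hi + (w < 0);
  return w;
}

/* Returns 1 when both models give the same word and the same carry. */
static int carry_steps_agree(i64 x, u32 n_bits) {
  i32 c_old = 0, c_new = 0;
  i32 w_old = carry_step_old(x, &c_old, n_bits);
  i32 w_new = carry_step_new(x, &c_new, n_bits);
  return w_old == w_new && c_old == c_new;
}
```

For example, with `x = 5` and `n_bits = 3`, both models return the word `-3` with carry `1`, and `-3 + 1 * 8` reconstructs `5`. Under these assumptions the high half after the shift holds exactly the bits that `x >> n_bits` would truncate to 32 bits, so the separate extract step is no longer needed.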
