Skip to content

Commit 8a3644e

Browse files
committed
[DRAFT] Better explanation for round() vs round-() in decompose
Only the AVX2 one is updated currently. The NEON one will also be updated if this looks good. Signed-off-by: jammychiou1 <[email protected]>
1 parent dc73450 commit 8a3644e

File tree

4 files changed

+96
-8
lines changed

4 files changed

+96
-8
lines changed

dev/x86_64/src/poly_decompose_32_avx2.c

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -72,10 +72,32 @@ void mld_poly_decompose_32_avx2(int32_t *a1, int32_t *a0)
7272
* _mm256_mulhi_epu16() below.
7373
*/
7474

75+
/* check-magic: 4290772992 == 1 / (1 / 4092 - 1025 / 2**22) */
7576
/*
7677
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
77-
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
78-
* 1 / 4092.
78+
* for 0 <= f1' < 2^16.
79+
*
80+
* To see this, consider the (signed) error f1' * (1 / B - 1025 / 2^22)
81+
* between f1' / B and the (under-)approximation f1' * 1025 / 2^22. Because
82+
* eps := 1 / B - 1025 / 2^22 is 1 / 4290772992 ≈ 2^(-31.99) < 2^(-31), we
83+
* have 0 <= f1' * eps < 2^16 * 2^(-31) = 1 / 2^15 < 1 / B (note that f1' is
84+
* non-negative).
85+
*
86+
* On the other hand, 1 / B is the spacing between the integral multiples
87+
* of 1 / B, which includes all rounding boundaries n + 0.5 (since B is
88+
* even). Hence, if f1' / B is not of the form n + 0.5, then it is at least
89+
* 1 / B away from the nearest rounding boundary, so moving from f1' / B to
90+
* f1' * 1025 / 2^22 does not affect the rounding result, no matter the
91+
* type of rounding used in either side. In particular, we have
92+
* round-(f1' / B) = round(f1' * 1025 / 2^22) as claimed.
93+
*
94+
* As for the remaining case where f1' / B _is_ of the form n + 0.5, because
95+
* f1' * 1025 / 2^22 is slightly but strictly below f1' / B = n + 0.5 (note
96+
* that f1' and thus the error f1' * eps cannot be 0 here), it is always
97+
* rounded down to n. More precisely, we have round-(f1' / B) =
98+
* round(f1' * 1025 / 2^22), where the round-down on the LHS is essential,
99+
* and on the RHS the type of rounding again does not matter. This concludes
100+
* the proof.
79101
*
80102
* round(f1' * 1025 / 2^22) is in turn computed in 2 steps as
81103
* round(floor(f1' * 1025 / 2^16) / 2^6). The mulhi computes f1'' =

dev/x86_64/src/poly_decompose_88_avx2.c

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -73,10 +73,32 @@ void mld_poly_decompose_88_avx2(int32_t *a1, int32_t *a0)
7373
* _mm256_mulhi_epu16() below.
7474
*/
7575

76+
/* check-magic: 1560281088 == 1 / (1 / 1488 - 11275 / 2**24) */
7677
/*
7778
* Compute f1 = round-(f1' / B) ≈ round(f1' * 11275 / 2^24). This is exact
78-
* for 0 <= f1' < 2^16. Note that half is rounded down since 11275 / 2^24 ≲
79-
* 1 / 1488.
79+
* for 0 <= f1' < 2^16.
80+
*
81+
* To see this, consider the (signed) error f1' * (1 / B - 11275 / 2^24)
82+
* between f1' / B and the (under-)approximation f1' * 11275 / 2^24. Because
83+
* eps := 1 / B - 11275 / 2^24 is 1 / 1560281088 ≈ 2^(-30.54) < 2^(-30), we
84+
* have 0 <= f1' * eps < 2^16 * 2^(-30) = 1 / 2^14 < 1 / B (note that f1' is
85+
* non-negative).
86+
*
87+
* On the other hand, 1 / B is the spacing between the integral multiples
88+
* of 1 / B, which includes all rounding boundaries n + 0.5 (since B is
89+
* even). Hence, if f1' / B is not of the form n + 0.5, then it is at least
90+
* 1 / B away from the nearest rounding boundary, so moving from f1' / B to
91+
* f1' * 11275 / 2^24 does not affect the rounding result, no matter the
92+
* type of rounding used in either side. In particular, we have
93+
* round-(f1' / B) = round(f1' * 11275 / 2^24) as claimed.
94+
*
95+
* As for the remaining case where f1' / B _is_ of the form n + 0.5, because
96+
* f1' * 11275 / 2^24 is slightly but strictly below f1' / B = n + 0.5 (note
97+
* that f1' and thus the error f1' * eps cannot be 0 here), it is always
98+
* rounded down to n. More precisely, we have round-(f1' / B) =
99+
* round(f1' * 11275 / 2^24), where the round-down on the LHS is essential,
100+
* and on the RHS the type of rounding again does not matter. This concludes
101+
* the proof.
80102
*
81103
* round(f1' * 11275 / 2^24) is in turn computed in 2 steps as
82104
* round(floor(f1' * 11275 / 2^16) / 2^8). The mulhi computes f1'' =

mldsa/src/native/x86_64/src/poly_decompose_32_avx2.c

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -72,10 +72,32 @@ void mld_poly_decompose_32_avx2(int32_t *a1, int32_t *a0)
7272
* _mm256_mulhi_epu16() below.
7373
*/
7474

75+
/* check-magic: 4290772992 == 1 / (1 / 4092 - 1025 / 2**22) */
7576
/*
7677
* Compute f1 = round-(f1' / B) ≈ round(f1' * 1025 / 2^22). This is exact
77-
* for 0 <= f1' < 2^16. Note that half is rounded down since 1025 / 2^22 ≲
78-
* 1 / 4092.
78+
* for 0 <= f1' < 2^16.
79+
*
80+
* To see this, consider the (signed) error f1' * (1 / B - 1025 / 2^22)
81+
* between f1' / B and the (under-)approximation f1' * 1025 / 2^22. Because
82+
* eps := 1 / B - 1025 / 2^22 is 1 / 4290772992 ≈ 2^(-31.99) < 2^(-31), we
83+
* have 0 <= f1' * eps < 2^16 * 2^(-31) = 1 / 2^15 < 1 / B (note that f1' is
84+
* non-negative).
85+
*
86+
* On the other hand, 1 / B is the spacing between the integral multiples
87+
* of 1 / B, which includes all rounding boundaries n + 0.5 (since B is
88+
* even). Hence, if f1' / B is not of the form n + 0.5, then it is at least
89+
* 1 / B away from the nearest rounding boundary, so moving from f1' / B to
90+
* f1' * 1025 / 2^22 does not affect the rounding result, no matter the
91+
* type of rounding used in either side. In particular, we have
92+
* round-(f1' / B) = round(f1' * 1025 / 2^22) as claimed.
93+
*
94+
* As for the remaining case where f1' / B _is_ of the form n + 0.5, because
95+
* f1' * 1025 / 2^22 is slightly but strictly below f1' / B = n + 0.5 (note
96+
* that f1' and thus the error f1' * eps cannot be 0 here), it is always
97+
* rounded down to n. More precisely, we have round-(f1' / B) =
98+
* round(f1' * 1025 / 2^22), where the round-down on the LHS is essential,
99+
* and on the RHS the type of rounding again does not matter. This concludes
100+
* the proof.
79101
*
80102
* round(f1' * 1025 / 2^22) is in turn computed in 2 steps as
81103
* round(floor(f1' * 1025 / 2^16) / 2^6). The mulhi computes f1'' =

mldsa/src/native/x86_64/src/poly_decompose_88_avx2.c

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -73,10 +73,32 @@ void mld_poly_decompose_88_avx2(int32_t *a1, int32_t *a0)
7373
* _mm256_mulhi_epu16() below.
7474
*/
7575

76+
/* check-magic: 1560281088 == 1 / (1 / 1488 - 11275 / 2**24) */
7677
/*
7778
* Compute f1 = round-(f1' / B) ≈ round(f1' * 11275 / 2^24). This is exact
78-
* for 0 <= f1' < 2^16. Note that half is rounded down since 11275 / 2^24 ≲
79-
* 1 / 1488.
79+
* for 0 <= f1' < 2^16.
80+
*
81+
* To see this, consider the (signed) error f1' * (1 / B - 11275 / 2^24)
82+
* between f1' / B and the (under-)approximation f1' * 11275 / 2^24. Because
83+
* eps := 1 / B - 11275 / 2^24 is 1 / 1560281088 ≈ 2^(-30.54) < 2^(-30), we
84+
* have 0 <= f1' * eps < 2^16 * 2^(-30) = 1 / 2^14 < 1 / B (note that f1' is
85+
* non-negative).
86+
*
87+
* On the other hand, 1 / B is the spacing between the integral multiples
88+
* of 1 / B, which includes all rounding boundaries n + 0.5 (since B is
89+
* even). Hence, if f1' / B is not of the form n + 0.5, then it is at least
90+
* 1 / B away from the nearest rounding boundary, so moving from f1' / B to
91+
* f1' * 11275 / 2^24 does not affect the rounding result, no matter the
92+
* type of rounding used in either side. In particular, we have
93+
* round-(f1' / B) = round(f1' * 11275 / 2^24) as claimed.
94+
*
95+
* As for the remaining case where f1' / B _is_ of the form n + 0.5, because
96+
* f1' * 11275 / 2^24 is slightly but strictly below f1' / B = n + 0.5 (note
97+
* that f1' and thus the error f1' * eps cannot be 0 here), it is always
98+
* rounded down to n. More precisely, we have round-(f1' / B) =
99+
* round(f1' * 11275 / 2^24), where the round-down on the LHS is essential,
100+
* and on the RHS the type of rounding again does not matter. This concludes
101+
* the proof.
80102
*
81103
* round(f1' * 11275 / 2^24) is in turn computed in 2 steps as
82104
* round(floor(f1' * 11275 / 2^16) / 2^8). The mulhi computes f1'' =

0 commit comments

Comments
 (0)