gccrs: Fix 128-bit non-decimal integer literal saturation#4454
gccrs: Fix 128-bit non-decimal integer literal saturation#4454nsvke wants to merge 1 commit intoRust-GCC:masterfrom
Conversation
62acb63 to
927d24e
Compare
powerboat9
left a comment
There was a problem hiding this comment.
Just need to fix the endian-dependent behavior in the test
|
Thanks for the feedback! I'll update the test shortly. |
927d24e to
cd32199
Compare
| unsafe { | ||
| let hex_val: u128 = 0x10000000000000000_u128; | ||
| if (hex_val >> 64) as u8 != 1 { | ||
| abort(); |
There was a problem hiding this comment.
I would change this to return 1 and check for the status using dejaGNU to avoid relying on the extern abi and keep the test somewhat minimal.
There was a problem hiding this comment.
Thanks for the suggestion! I removed the extern abi and abort calls. The test now simply returns 1 on failure and 0 on success to leverage dejaGNU's exit status checking. It's indeed much cleaner and minimal this way.
0c95523 to
24c465b
Compare
gcc/rust/lex/rust-lex.cc
Outdated
There was a problem hiding this comment.
I don't think existent_str needs to be a parameter -- seems like it's just adding a 0x or 0 prefix
There was a problem hiding this comment.
You are absolutely right. The original code passed 0x or 0 via existent_str based on a switch-case. However, since my mpz implementation doesn't need the 0x prefix and we already have the base parameter, passing it through the signature is now redundant.
I will remove existent_str from the parameters, clean up the function signature, and rename the local variable to something more appropriate. I'll push the fix as soon as I get back to my computer.
This patch replaces std::strtol with GNU MP (GMP) for arbitrary-precision
parsing to properly support 128-bit literals.
Additionally, it refactors the lexer by removing the redundant
`existent_str` parameter from `parse_non_decimal_int_literal`. The base
is now passed directly, and the parsed digits are collected internally,
making the previous prefix-passing logic ("0x", "0b", "0o") obsolete
and ensuring cleaner compatibility with mpz_set_str.
gcc/rust/ChangeLog:
* lex/rust-lex.cc (Lexer::parse_non_decimal_int_literal): Use GMP
for base conversion to support 128-bit literals. Remove existent_str
parameter.
(Lexer::parse_non_decimal_int_literals): Remove prefix string
initialization and update function calls.
* lex/rust-lex.h (Lexer::parse_non_decimal_int_literal): Update
function signature to remove existent_str.
gcc/testsuite/ChangeLog:
* rust/execute/non_decimal_128_saturation.rs: New test.
Signed-off-by: Enes Cevik <nsvke@proton.me>
24c465b to
d9e6204
Compare
| mpz_set_str (dec_num, raw_str.c_str (), base); | ||
| char *s = mpz_get_str (NULL, 10, dec_num); | ||
| std::string dec_str = s; | ||
| free (s); |
There was a problem hiding this comment.
GMP may use a custom memory allocator, so free() isn't guaranteed to match.
take a look https://gmplib.org/manual/Converting-Integers
| free (s); | |
| void (*freefunc)(void *, size_t); | |
| mp_get_memory_functions (NULL, NULL, &freefunc); | |
| freefunc (s, strlen (s) + 1); |
There was a problem hiding this comment.
This document covers general usage. Looking at the GMP usages across the GCC source, they all call free() directly
| existent_str = std::to_string (dec_num); | ||
| mpz_t dec_num; | ||
| mpz_init (dec_num); | ||
| mpz_set_str (dec_num, raw_str.c_str (), base); |
There was a problem hiding this comment.
mpz_set_str return value unchecked return -1 on invalid input.
Digits are already validated by is_digit_func
assert checking this value would be dfensive.
| mpz_set_str (dec_num, raw_str.c_str (), base); | |
| int ret = mpz_set_str (dec_num, raw_str.c_str (), base); | |
| gcc_assert (ret == 0); |
There was a problem hiding this comment.
Yes, it should be fine since it's validated by is_digit_func. That said, it might still be a good idea, I'll add it.
moabo3li
left a comment
There was a problem hiding this comment.
It would be great to test the other integer types (u/i8, u/i16, u/i32, u/i64, and i128) using their maximum and minimum values.
Also, test an empty raw string.
If the input is 0x_u128, it should fail with the error no valid digits found for number, similar to rustc.
Note: mpz_set_str returns -1. I think the previous code had a similar issue with strtol, which returns 0.
The test currently checks the smallest value that exceeds the minimum for each type, i.e., it verifies whether an overflow occurs. As long as this works, I don't think testing min or max values would broaden the test coverage. |
We are aware of this. There is an issue for it (#4490) and you can check the related discussions on Zulip. This was not addressed in this PR to avoid scope creep. |
I was thinking that since we are going to change the way we parse literals, this will not only affect 128-bit types. |
Description
This PR addresses the saturation bug where large non-decimal integer literals (such as 128-bit hex, bin, and octal values) were truncated to 64-bit limits (
LONG_MAX) during the lexing phase.Key Changes:
std::strtolinLexer::parse_non_decimal_int_literalwith GNU MP (GMP) for arbitrary-precision base conversion.Lexer::parse_non_decimal_int_literalsto pass the raw string without prepending the"0x"prefix, ensuring compatibility withmpz_set_str.non_decimal_128_saturation.rs) to verify thatgcc/rust/ChangeLog:
gcc/testsuite/ChangeLog:
Closes #4453