gccrs: Fix 128-bit non-decimal integer literal saturation #4454

Open
nsvke wants to merge 1 commit into Rust-GCC:master from nsvke:fix-non-decimal-128-saturation

Conversation

@nsvke
Contributor

@nsvke nsvke commented Feb 28, 2026

Description

This PR fixes a saturation bug in which large non-decimal integer literals (such as 128-bit hexadecimal, binary, and octal values) were clamped to the 64-bit limit (LONG_MAX) during lexing.

Key Changes:

  • Replaced the usage of std::strtol in Lexer::parse_non_decimal_int_literal with GNU MP (GMP) for arbitrary-precision base conversion.
  • Corrected the hex dispatcher in Lexer::parse_non_decimal_int_literals to pass the raw string without prepending the "0x" prefix, ensuring compatibility with mpz_set_str.
  • Added a regression test (non_decimal_128_saturation.rs) to verify that values of $2^{64}$ and above are parsed without truncation.

gcc/rust/ChangeLog:

* lex/rust-lex.cc (Lexer::parse_non_decimal_int_literal): Use GMP for base conversion to support 128-bit literals.
(Lexer::parse_non_decimal_int_literals): Fix hex prefix inconsistency by passing pure string.

gcc/testsuite/ChangeLog:

* rust/execute/non_decimal_128_saturation.rs: New test.

Closes #4453

@nsvke nsvke force-pushed the fix-non-decimal-128-saturation branch from 62acb63 to 927d24e Compare February 28, 2026 18:52
Collaborator

@powerboat9 powerboat9 left a comment

Just need to fix the endian-dependent behavior in the test

@nsvke
Contributor Author

nsvke commented Feb 28, 2026

Thanks for the feedback! I'll update the test shortly.

@nsvke nsvke force-pushed the fix-non-decimal-128-saturation branch from 927d24e to cd32199 Compare March 1, 2026 08:14
@nsvke nsvke requested a review from powerboat9 March 1, 2026 08:23
unsafe {
let hex_val: u128 = 0x10000000000000000_u128;
if (hex_val >> 64) as u8 != 1 {
abort();
Member

I would change this to return 1 and check the exit status using DejaGnu, to avoid relying on the extern ABI and to keep the test minimal.

Contributor Author

Thanks for the suggestion! I removed the extern ABI and the abort calls. The test now simply returns 1 on failure and 0 on success, relying on DejaGnu's exit status checking. It's indeed much cleaner and more minimal this way.

@nsvke nsvke force-pushed the fix-non-decimal-128-saturation branch 2 times, most recently from 0c95523 to 24c465b Compare March 6, 2026 13:25
Collaborator

I don't think existent_str needs to be a parameter -- seems like it's just adding a 0x or 0 prefix

Contributor Author

You are absolutely right. The original code passed 0x or 0 via existent_str based on a switch-case. However, since my mpz implementation doesn't need the 0x prefix and we already have the base parameter, passing it through the signature is now redundant.

I will remove existent_str from the parameters, clean up the function signature, and rename the local variable to something more appropriate. I'll push the fix as soon as I get back to my computer.

This patch replaces std::strtol with GNU MP (GMP) for arbitrary-precision
parsing to properly support 128-bit literals.

Additionally, it refactors the lexer by removing the redundant
`existent_str` parameter from `parse_non_decimal_int_literal`. The base
is now passed directly, and the parsed digits are collected internally,
making the previous prefix-passing logic ("0x", "0b", "0o") obsolete
and ensuring cleaner compatibility with mpz_set_str.

gcc/rust/ChangeLog:

	* lex/rust-lex.cc (Lexer::parse_non_decimal_int_literal): Use GMP
	for base conversion to support 128-bit literals. Remove existent_str
	parameter.
	(Lexer::parse_non_decimal_int_literals): Remove prefix string
	initialization and update function calls.
	* lex/rust-lex.h (Lexer::parse_non_decimal_int_literal): Update
	function signature to remove existent_str.

gcc/testsuite/ChangeLog:

	* rust/execute/non_decimal_128_saturation.rs: New test.

Signed-off-by: Enes Cevik <nsvke@proton.me>
@nsvke nsvke force-pushed the fix-non-decimal-128-saturation branch from 24c465b to d9e6204 Compare March 17, 2026 11:12
mpz_set_str (dec_num, raw_str.c_str (), base);
char *s = mpz_get_str (NULL, 10, dec_num);
std::string dec_str = s;
free (s);
Contributor

GMP may use a custom memory allocator, so free() isn't guaranteed to match.
Take a look at https://gmplib.org/manual/Converting-Integers

Suggested change
- free (s);
+ void (*freefunc)(void *, size_t);
+ mp_get_memory_functions (NULL, NULL, &freefunc);
+ freefunc (s, strlen (s) + 1);

https://gmplib.org/manual/Custom-Allocation

Contributor Author

That document covers general usage. Looking at the GMP usages across the GCC source, they all call free() directly.

existent_str = std::to_string (dec_num);
mpz_t dec_num;
mpz_init (dec_num);
mpz_set_str (dec_num, raw_str.c_str (), base);
Contributor

The return value of mpz_set_str is unchecked; it returns -1 on invalid input.
Digits are already validated by is_digit_func, so an assert on this value would be purely defensive.

Suggested change
- mpz_set_str (dec_num, raw_str.c_str (), base);
+ int ret = mpz_set_str (dec_num, raw_str.c_str (), base);
+ gcc_assert (ret == 0);

Contributor Author

Yes, it should be fine since it's validated by is_digit_func. That said, it might still be a good idea, so I'll add it.

Contributor

@moabo3li moabo3li left a comment

It would be great to test the other integer types (u/i8, u/i16, u/i32, u/i64, and i128) using their maximum and minimum values.
Also, test an empty raw string.

If the input is 0x_u128, it should fail with the error "no valid digits found for number", similar to rustc.

Note: mpz_set_str returns -1 in that case. I think the previous code had a similar issue with strtol, which returns 0.

@nsvke
Contributor Author

nsvke commented Mar 21, 2026

> It would be great to test the other integer types (u/i8, u/i16, u/i32, u/i64, and i128) using their maximum and minimum values.

The test currently checks the smallest value that exceeds the minimum for each type, i.e., it verifies whether an overflow occurs. As long as this works, I don't think testing min or max values would broaden the test coverage.

@nsvke
Contributor Author

nsvke commented Mar 21, 2026

> If the input is 0x_u128, it should fail with the error "no valid digits found for number", similar to rustc.

We are aware of this. There is an issue for it (#4490) and you can check the related discussions on Zulip. This was not addressed in this PR to avoid scope creep.

@moabo3li
Contributor

> > It would be great to test the other integer types (u/i8, u/i16, u/i32, u/i64, and i128) using their maximum and minimum values.

> The test currently checks the smallest value that exceeds the minimum for each type, i.e., it verifies whether an overflow occurs. As long as this works, I don't think testing min or max values would broaden the test coverage.

My thinking was that, since we are changing the way literals are parsed, this will affect more than just the 128-bit types.
We have already encountered some edge cases, such as the Alpine 32-bit CI failing at the maximum value, and there are no tests covering this.
(If this is outside the scope of this PR, that’s totally fine.)

Development

Successfully merging this pull request may close these issues.

Lexer: Non-decimal integer literals (hex, bin, oct) are saturated at 64-bit limits

4 participants