Skip to content

Conversation

@smilczek
Copy link
Contributor

@smilczek smilczek commented Aug 22, 2025

AsmLexer expects the buffer it's provided for lexing to be
NULL-terminated, where the NULL terminator is pointed to by
CurBuf.end(). However, this expectation isn't explicitly stated
anywhere.

This commit adds a couple of comments as well as an assert as means of
documenting this expectation.

@llvmbot llvmbot added the llvm:mc Machine (object) code label Aug 22, 2025
@smilczek smilczek changed the title [MCParser] AsmLexer invalid read fix. [LLVM] AsmLexer invalid read fix. Aug 22, 2025
@smilczek
Copy link
Contributor Author

@MaskRay

@MaskRay
Copy link
Member

MaskRay commented Aug 22, 2025

Need a llvm/test/MC/AsmParser test to confirm we actually have the out-of-bounds error and fix it.

@nikic nikic changed the title [LLVM] AsmLexer invalid read fix. [MC] AsmLexer invalid read fix. Aug 22, 2025
@smilczek
Copy link
Contributor Author

smilczek commented Sep 9, 2025

Hey, @MaskRay

I added a test case where invalid read is performed without the CurPtr != CurBuf.end() check.

In order to detect the out of bounds access I modified the while loop condition to access the memory through CurBuf, which being a StringRef contains asserts.
However, not to introduce any overhead that might come with this approach, I made it so it happens only on debug builds.

The test I added will be reported a failure on debug builds unless CurPtr != CurBuf.end() condition is present.

This is the simplest approach to creating the test that I came up with but there might be a better way.
If something comes to mind please let me know!

Regards,
Szymon

@smilczek
Copy link
Contributor Author

Ping

@smilczek
Copy link
Contributor Author

@MaskRay Ping

@smilczek
Copy link
Contributor Author

Ping

@github-actions
Copy link

github-actions bot commented Oct 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

if (Lexer.getTok().getKind() == AsmToken::Error)
Error = true;
}
ASSERT_TRUE(Error == false);
Copy link
Contributor Author

@smilczek smilczek Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to EXPECT_THROW here, unfortunately exceptions are disabled.
The test (and with the CI Checks) will fail in current state. Is there a good way to expect failure in unit tests?

This is meant to test if null-termination is at least checked for. A lit test cannot be created, because llvm-mc will always null-terminate whatever buffer it's provided.

Copy link
Member

@MaskRay MaskRay Oct 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

googletest supports https://google.github.io/googletest/reference/assertions.html#death but llvm-project doesn't use the feature. There is possibly portability issue, but if the CI is ok I think it's good to have it.

I think it's also fine not to have the coverage (i.e. drop this negative test), but thanks for thinking about it and adding a MCParser unittest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unittest removed in 92a16a4

// `Buf.end()`. It must be safe to dereference `Buf.end()`.
assert(*Buf.end() == '\0' &&
"Buffer provided to AsmLexer lacks null terminator.");

Copy link
Contributor Author

@smilczek smilczek Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time playing around with buffer allocation in order to ensure buffer allocated in by llvm-mc doesn't allocate any additional memory (WritableMemoryBuffer::getNewUninitMemBuffer adds BufAlign.value() to RealLen instead of BufAlign.value() - 1, which means the buffer will always have at least 1 extra byte allocated)

After a while I finally realized that AsmLexer always expects that the null terminator of the buffer is present and it is NOT included in the buffer's length. That means that CurBuf.end() will always point to \0 and CurBuf.end() is always valid memory.

Unfortunately I couldn't find this expectation documented anywhere.
It also doesn't seem to be enforced anywhere in AsmLexer.
In contrast, it seems that AsmParser with LLLexer (not MCAsmParser) does enforce null terminator.

I could patch AsmLexer to support non-null-terminated buffers, however is that a change we desire?
Should AsmLexer also enforce null terminator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To additionally clarify, since I didn't mention it above, if a buffer IS null-terminated, however the null terminator resides at CurBuf.end() - 1 (which doesn't seem unreasonable), the problem with invalid reads will appear.

bool EndStatementAtEOF) {
// Null terminator must be part of the actual buffer. It must reside at
// `Buf.end()`. It must be safe to dereference `Buf.end()`.
assert(*Buf.end() == '\0' &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the best way to test for null-termination. Invalid read won't be detected by non-LLVM_USE_SANITIZER=Address builds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's right to require a NUL terminator, like regular MemoryBuffer APIs provide (getFileOrSTDIN MemoryBuffer::getFile.

We should add a comment to the header file llvm/include/llvm/MC/MCParser/AsmLexer.h about this requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added in 8a0b79b

@smilczek smilczek changed the title [MC] AsmLexer invalid read fix. [MC] AsmLexer assert buffer is null-terminated at CurBuf.end() Oct 16, 2025
@smilczek
Copy link
Contributor Author

My confusion arose from the fact that it is very often the case with buffers (arrays, vectors), that the .end() method returns an 'Iterator past the last element'.

String is an exception, and in its case, the .end() method instead returns a pointer to the null-terminator.
Looks like AsmLexer also expects this property from StringRef by extension, but StringRef's buffer is not necessarily required to be null-terminated.

When receiving a buffer to be parsed, it seems reasonable not to require a null terminator and if the null terminator is present, to treat it as part of the buffer and include it in size passed to StringRef's constructor.

@smilczek smilczek requested a review from MaskRay October 16, 2025 08:24
@smilczek smilczek force-pushed the AsmParser-patch branch 2 times, most recently from c57126b to 18ecc9c Compare October 20, 2025 12:03
AsmLexer expects the buffer it's provided for lexing to be
NULL-terminated, where the NULL terminator is pointed to by
`CurBuf.end()`. However, this expectation isn't explicitly stated
anywhere.

This commit adds a couple of comments as well as an assert as means of
documenting this expectation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:mc Machine (object) code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants