7 changes: 7 additions & 0 deletions llvm/include/llvm/MC/MCParser/AsmLexer.h
@@ -45,6 +45,7 @@ class AsmLexer {
SmallVector<AsmToken, 1> CurTok;

const char *CurPtr = nullptr;
/// NULL-terminated buffer. NULL terminator must reside at `CurBuf.end()`.
StringRef CurBuf;

/// The location and description of the current error
@@ -191,6 +192,12 @@ class AsmLexer {
/// literals.
void setLexHLASMStrings(bool V) { LexHLASMStrings = V; }

/// Set buffer to be lexed.
/// `Buf` must be NULL-terminated. NULL terminator must reside at `Buf.end()`.
/// `ptr` if provided must be in range [`Buf.begin()`, `Buf.end()`] or NULL.
/// Specifies where lexing of buffer should begin.
/// `EndStatementAtEOF` specifies whether `AsmToken::EndOfStatement` should be
/// returned upon reaching end of buffer.
LLVM_ABI void setBuffer(StringRef Buf, const char *ptr = nullptr,
bool EndStatementAtEOF = true);

5 changes: 5 additions & 0 deletions llvm/lib/MC/MCParser/AsmLexer.cpp
@@ -120,6 +120,11 @@ AsmLexer::AsmLexer(const MCAsmInfo &MAI) : MAI(MAI) {

void AsmLexer::setBuffer(StringRef Buf, const char *ptr,
bool EndStatementAtEOF) {
// Buffer must be NULL-terminated. NULL terminator must reside at `Buf.end()`.
// It must be safe to dereference `Buf.end()`.
assert(*Buf.end() == '\0' &&
Contributor Author

This is not the best way to test for null termination. An invalid read won't be detected in builds without LLVM_USE_SANITIZER=Address.

Member

I think it's right to require a NUL terminator, like the regular MemoryBuffer APIs provide (getFileOrSTDIN, MemoryBuffer::getFile).

We should add a comment to the header file llvm/include/llvm/MC/MCParser/AsmLexer.h about this requirement.

Contributor Author

Comments added in 8a0b79b

"Buffer provided to AsmLexer lacks null terminator.");

Contributor Author

@smilczek, Oct 15, 2025

I spent some time experimenting with buffer allocation to make sure the buffer allocated by llvm-mc has no additional memory behind it (WritableMemoryBuffer::getNewUninitMemBuffer adds BufAlign.value() to RealLen instead of BufAlign.value() - 1, which means the buffer will always have at least 1 extra byte allocated).

After a while I finally realized that AsmLexer always expects the buffer's null terminator to be present and NOT included in the buffer's length. That means CurBuf.end() must always point to \0 and must always be valid memory.
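
To make that expectation concrete, here is a minimal sketch. It uses std::string_view as a stand-in for llvm::StringRef so that it builds without LLVM; the helper names `hasTerminatorAtEnd`, `asLexerBuffer` and `checkedLexerBuffer` are hypothetical and are not part of AsmLexer.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for the lexer's precondition, with std::string_view in place of
// llvm::StringRef (an assumption made so the sketch builds without LLVM).
// The caller must guarantee that the byte at Buf.data() + Buf.size() is
// readable; this function cannot verify that by itself.
inline bool hasTerminatorAtEnd(std::string_view Buf) {
  return *(Buf.data() + Buf.size()) == '\0';
}

// Hypothetical helper: view a std::string the way the lexer expects it.
// std::string guarantees data()[size()] == '\0', and size() excludes it,
// so the terminator sits exactly at the view's end().
inline std::string_view asLexerBuffer(const std::string &Source) {
  return std::string_view(Source.data(), Source.size());
}

// Hypothetical wrapper mirroring the shape of the assertion in this patch.
inline std::string_view checkedLexerBuffer(std::string_view Buf) {
  assert(hasTerminatorAtEnd(Buf) && "buffer lacks null terminator at end()");
  return Buf;
}
```

A view built with `asLexerBuffer` satisfies the check because the terminator lives one byte past the counted length. The check itself is only as safe as the caller's guarantee that this byte can be read.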

Unfortunately, I couldn't find this expectation documented anywhere.
It also doesn't seem to be enforced anywhere in AsmLexer.
In contrast, AsmParser with LLLexer (not MCAsmParser) does seem to enforce a null terminator.

I could patch AsmLexer to support non-null-terminated buffers, but is that a change we want?
Should AsmLexer also enforce a null terminator?

Contributor Author

To clarify further, since I didn't mention it above: if a buffer IS null-terminated but the null terminator resides at CurBuf.end() - 1 (which doesn't seem unreasonable), the problem with invalid reads will still appear.
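
The two layouts can be compared in a small sketch. It again uses std::string_view as a stand-in for llvm::StringRef, and the names `Storage`, `terminatorAtEnd`, `terminatorInsideLength` and `endPointsAtTerminator` are hypothetical. The demo array deliberately carries one spare byte so that the read at end() stays in bounds here; a real buffer gives no such guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Demo storage with one spare byte after the terminator, so that reading
// one past either view below stays inside the array. Real buffers carry no
// such guarantee, which is where the invalid read comes from.
inline constexpr char Storage[] = {'a', 'b', '\0', 'X'};

// Terminator excluded from the length: end() points at '\0'.
inline std::string_view terminatorAtEnd() {
  return std::string_view(Storage, 2);
}

// Terminator counted in the length: '\0' sits at end() - 1 and end()
// points at the spare byte 'X'.
inline std::string_view terminatorInsideLength() {
  return std::string_view(Storage, 3);
}

// Same check as the assertion in this patch, applied to the stand-in type.
inline bool endPointsAtTerminator(std::string_view Buf) {
  return *(Buf.data() + Buf.size()) == '\0';
}
```

Only the first view passes the check. The second is null-terminated in the everyday sense, yet a lexer that stops at the byte found at end() would read past the terminator it was given.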

CurBuf = Buf;

if (ptr)