Skip to content

Conversation

@parth-07
Copy link
Contributor

@parth-07 parth-07 commented Jul 2, 2025

The linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen.

The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered.

readExpr() calls readPrimary()
readPrimary() calls readParenExpr()

readParenExpr():

expect("("); // no-op, current token still points to '('
Expression *E = readExpr(); // The cycle continues...

Closes #146722

@github-actions
Copy link

github-actions bot commented Jul 2, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2025

@llvm/pr-subscribers-lld

Author: Parth (parth-07)

Changes

The linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen.

The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered.

readExpr() calls readPrimary()
readPrimary() calls readParenExpr()

readParenExpr():

expect("("); // no-op, current token still points to '('
Expression *E = readExpr(); // The cycle continues...

Closes #146722


Full diff: https://github.com/llvm/llvm-project/pull/146723.diff

2 Files Affected:

  • (modified) lld/ELF/ScriptParser.cpp (+3)
  • (modified) lld/test/ELF/linkerscript/align-section.test (+16-1)
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 593d5636f2455..58b4239c4aefd 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1229,6 +1229,9 @@ SymbolAssignment *ScriptParser::readSymbolAssignment(StringRef name) {
 // This is an operator-precedence parser to parse a linker
 // script expression.
 Expr ScriptParser::readExpr() {
+  // Do not try to read expression if an error has already been encountered.
+  if (atEOF())
+    return {};
   // Our lexer is context-aware. Set the in-expression bit so that
   // they apply different tokenization rules.
   SaveAndRestore saved(lexState, State::Expr);
diff --git a/lld/test/ELF/linkerscript/align-section.test b/lld/test/ELF/linkerscript/align-section.test
index 7a28fef2076ed..db99dc82f5514 100644
--- a/lld/test/ELF/linkerscript/align-section.test
+++ b/lld/test/ELF/linkerscript/align-section.test
@@ -1,7 +1,22 @@
 # REQUIRES: x86
+# RUN: rm -rf %t && split-file %s %t
+
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o %t.o
-# RUN: ld.lld -o %t --script %s %t.o -shared
+# RUN: ld.lld -o %t.1.out --script %t/a.t %t.o -shared
 
 # lld shouldn't crash.
 
+#--- a.t
 SECTIONS { .foo : ALIGN(2M) {} }
+
+#--- b.t
+SECTIONS
+{
+  S :ALIGN(4096) {}
+}
+
+# RUN: not ld.lld -o /dev/null --script %t/b.t 2>&1 | FileCheck %s
+
+CHECK: error: {{.*}}b.t:3: malformed number: :
+CHECK: >>>   S :ALIGN(4096) {}
+CHECK: >>>

@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2025

@llvm/pr-subscribers-lld-elf

Author: Parth (parth-07)

Changes

The linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen.

The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered.

readExpr() calls readPrimary()
readPrimary() calls readParenExpr()

readParenExpr():

expect("("); // no-op, current token still points to '('
Expression *E = readExpr(); // The cycle continues...

Closes #146722


Full diff: https://github.com/llvm/llvm-project/pull/146723.diff

2 Files Affected:

  • (modified) lld/ELF/ScriptParser.cpp (+3)
  • (modified) lld/test/ELF/linkerscript/align-section.test (+16-1)
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 593d5636f2455..58b4239c4aefd 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1229,6 +1229,9 @@ SymbolAssignment *ScriptParser::readSymbolAssignment(StringRef name) {
 // This is an operator-precedence parser to parse a linker
 // script expression.
 Expr ScriptParser::readExpr() {
+  // Do not try to read expression if an error has already been encountered.
+  if (atEOF())
+    return {};
   // Our lexer is context-aware. Set the in-expression bit so that
   // they apply different tokenization rules.
   SaveAndRestore saved(lexState, State::Expr);
diff --git a/lld/test/ELF/linkerscript/align-section.test b/lld/test/ELF/linkerscript/align-section.test
index 7a28fef2076ed..db99dc82f5514 100644
--- a/lld/test/ELF/linkerscript/align-section.test
+++ b/lld/test/ELF/linkerscript/align-section.test
@@ -1,7 +1,22 @@
 # REQUIRES: x86
+# RUN: rm -rf %t && split-file %s %t
+
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o %t.o
-# RUN: ld.lld -o %t --script %s %t.o -shared
+# RUN: ld.lld -o %t.1.out --script %t/a.t %t.o -shared
 
 # lld shouldn't crash.
 
+#--- a.t
 SECTIONS { .foo : ALIGN(2M) {} }
+
+#--- b.t
+SECTIONS
+{
+  S :ALIGN(4096) {}
+}
+
+# RUN: not ld.lld -o /dev/null --script %t/b.t 2>&1 | FileCheck %s
+
+CHECK: error: {{.*}}b.t:3: malformed number: :
+CHECK: >>>   S :ALIGN(4096) {}
+CHECK: >>>

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I've got a small suggestion that means we don't need the comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work, although I think it may be more idiomatic to use

if (errCount(ctx))
  return 0;

That way you don't need the comment to explain that atEOF will return true if there's an error. The return 0 comes from other places where an ErrAlways has occurred, such as

      ErrAlways(ctx) << loc << ": division by zero";
      return 0;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the CI this may have caused a problem with
FAIL: lld :: ELF/linkerscript/custom-section-type.s (1500 of 3105)

2025-07-02T15:45:44.9280956Z ld.lld: warning: section type mismatch for progbits
2025-07-02T15:45:44.9282076Z >>> /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lld/test/ELF/linkerscript/Output/custom-section-type.s.tmp/mismatch.o:(progbits): SHT_NOTE
2025-07-02T15:45:44.9283202Z >>> output section progbits: SHT_PROGBITS
2025-07-02T15:45:44.9283407Z 
2025-07-02T15:45:44.9283520Z ld.lld: warning: section type mismatch for expr
2025-07-02T15:45:44.9284363Z >>> /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lld/test/ELF/linkerscript/Output/custom-section-type.s.tmp/mismatch.o:(expr): Unknown
2025-07-02T15:45:44.9285740Z >>> output section expr: Unknown

Assuming this is related to the patch. It may be that we've terminated too early before enough context for the error message can be accumulated. Which may mean that the check needs to be put closer to the point where an infinite loop may occur. Or we need a different approach.

Copy link
Contributor Author

@parth-07 parth-07 Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your inputs @smithp35.

although I think it may be more idiomatic to use

 if (errCount(ctx))
   return 0;

I agree that errCount(ctx) would work as well, however, I think it helps to make the code flow easier to understand and more intuitive if the behavior of parsing expression is consistent for both the error-case and the actual end-of-file case, and atEOF() covers both these cases. Please let me know your thoughts on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is related to the patch. It may be that we've terminated too early before enough context for the error message can be accumulated. Which may mean that the check needs to be put closer to the point where an infinite loop may occur. Or we need a different approach.

Yes, it was related to the patch. Thank you for sharing your thoughts. It turned out the error was due to the return value of getExpr() in the error path. The return value was an empty function object. It should be a 0-value equivalent of the lld::elf::Expr. I have fixed the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a small suggestion that means we don't need the comment.

I agree that the comment was a little redundant. I have removed the comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This looks OK to me. I've added the maintainer MaskRay and MysteryMath, who has been involved in linker script parsing recently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me as well. A few parser functions call atEOF() at the beginning, when they expect to consume at least one token.

@parth-07 parth-07 requested a review from smithp35 July 2, 2025 20:01
@parth-07 parth-07 force-pushed the AlignParseIssue branch 2 times, most recently from ddb5aaf to 11364e6 Compare July 2, 2025 20:19
@smithp35 smithp35 requested review from MaskRay and mysterymath July 3, 2025 09:23
@parth-07 parth-07 requested a review from MaskRay July 3, 2025 17:40
@parth-07 parth-07 requested a review from MaskRay July 5, 2025 11:20
The linker was crashing due to stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.

The root cause of the stack overflow is how we parse expressions
(readExpr) in linker script and the behavior of ScriptLexer::expect(...)
utility. ScriptLexer::expect does not do anything if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error
has already been encountered.

readExpr() calls readPrimary()
readPrimary() calls readParenExpr()

readParenExpr():

  expect("("); // no-op, current token still points to '('
  Expression *E = readExpr(); // The cycle continues...

Closes llvm#146722

Signed-off-by: Parth Arora <[email protected]>
@parth-07 parth-07 requested a review from MaskRay July 5, 2025 17:21
@MaskRay MaskRay merged commit 923a3cc into llvm:main Jul 6, 2025
9 checks passed
@github-actions
Copy link

github-actions bot commented Jul 6, 2025

@parth-07 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[LLD] Linker crashes when trying to parse ':ALIGN'

4 participants