- 
                Notifications
    
You must be signed in to change notification settings  - Fork 15.1k
 
[LLD] Fix crash on parsing ':ALIGN' in linker script #146723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 
          
 Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using  If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.  | 
    
| 
          
 @llvm/pr-subscribers-lld Author: Parth (parth-07) ChangesThe linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen. The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered. readExpr() calls readPrimary() readParenExpr(): expect("("); // no-op, current token still points to '(' Closes #146722 Full diff: https://github.com/llvm/llvm-project/pull/146723.diff 2 Files Affected: 
 diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 593d5636f2455..58b4239c4aefd 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1229,6 +1229,9 @@ SymbolAssignment *ScriptParser::readSymbolAssignment(StringRef name) {
 // This is an operator-precedence parser to parse a linker
 // script expression.
 Expr ScriptParser::readExpr() {
+  // Do not try to read expression if an error has already been encountered.
+  if (atEOF())
+    return {};
   // Our lexer is context-aware. Set the in-expression bit so that
   // they apply different tokenization rules.
   SaveAndRestore saved(lexState, State::Expr);
diff --git a/lld/test/ELF/linkerscript/align-section.test b/lld/test/ELF/linkerscript/align-section.test
index 7a28fef2076ed..db99dc82f5514 100644
--- a/lld/test/ELF/linkerscript/align-section.test
+++ b/lld/test/ELF/linkerscript/align-section.test
@@ -1,7 +1,22 @@
 # REQUIRES: x86
+# RUN: rm -rf %t && split-file %s %t
+
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o %t.o
-# RUN: ld.lld -o %t --script %s %t.o -shared
+# RUN: ld.lld -o %t.1.out --script %t/a.t %t.o -shared
 
 # lld shouldn't crash.
 
+#--- a.t
 SECTIONS { .foo : ALIGN(2M) {} }
+
+#--- b.t
+SECTIONS
+{
+  S :ALIGN(4096) {}
+}
+
+# RUN: not ld.lld -o /dev/null --script %t/b.t 2>&1 | FileCheck %s
+
+CHECK: error: {{.*}}b.t:3: malformed number: :
+CHECK: >>>   S :ALIGN(4096) {}
+CHECK: >>>
 | 
    
| 
          
 @llvm/pr-subscribers-lld-elf Author: Parth (parth-07) ChangesThe linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen. The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered. readExpr() calls readPrimary() readParenExpr(): expect("("); // no-op, current token still points to '(' Closes #146722 Full diff: https://github.com/llvm/llvm-project/pull/146723.diff 2 Files Affected: 
 diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 593d5636f2455..58b4239c4aefd 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1229,6 +1229,9 @@ SymbolAssignment *ScriptParser::readSymbolAssignment(StringRef name) {
 // This is an operator-precedence parser to parse a linker
 // script expression.
 Expr ScriptParser::readExpr() {
+  // Do not try to read expression if an error has already been encountered.
+  if (atEOF())
+    return {};
   // Our lexer is context-aware. Set the in-expression bit so that
   // they apply different tokenization rules.
   SaveAndRestore saved(lexState, State::Expr);
diff --git a/lld/test/ELF/linkerscript/align-section.test b/lld/test/ELF/linkerscript/align-section.test
index 7a28fef2076ed..db99dc82f5514 100644
--- a/lld/test/ELF/linkerscript/align-section.test
+++ b/lld/test/ELF/linkerscript/align-section.test
@@ -1,7 +1,22 @@
 # REQUIRES: x86
+# RUN: rm -rf %t && split-file %s %t
+
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux /dev/null -o %t.o
-# RUN: ld.lld -o %t --script %s %t.o -shared
+# RUN: ld.lld -o %t.1.out --script %t/a.t %t.o -shared
 
 # lld shouldn't crash.
 
+#--- a.t
 SECTIONS { .foo : ALIGN(2M) {} }
+
+#--- b.t
+SECTIONS
+{
+  S :ALIGN(4096) {}
+}
+
+# RUN: not ld.lld -o /dev/null --script %t/b.t 2>&1 | FileCheck %s
+
+CHECK: error: {{.*}}b.t:3: malformed number: :
+CHECK: >>>   S :ALIGN(4096) {}
+CHECK: >>>
 | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. I've got a small suggestion that means we don't need the comment.
        
          
                lld/ELF/ScriptParser.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will work, although I think it may be more idiomatic to use
if (errCount(ctx))
  return 0;
That way you don't need the comment to explain that atEOF will return true if there's an error. The return 0 comes from other places where an ErrAlways has occurred, such as
      ErrAlways(ctx) << loc << ": division by zero";
      return 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the CI this may have caused a problem with
FAIL: lld :: ELF/linkerscript/custom-section-type.s (1500 of 3105)
2025-07-02T15:45:44.9280956Z ld.lld: warning: section type mismatch for progbits
2025-07-02T15:45:44.9282076Z >>> /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lld/test/ELF/linkerscript/Output/custom-section-type.s.tmp/mismatch.o:(progbits): SHT_NOTE
2025-07-02T15:45:44.9283202Z >>> output section progbits: SHT_PROGBITS
2025-07-02T15:45:44.9283407Z 
2025-07-02T15:45:44.9283520Z ld.lld: warning: section type mismatch for expr
2025-07-02T15:45:44.9284363Z >>> /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lld/test/ELF/linkerscript/Output/custom-section-type.s.tmp/mismatch.o:(expr): Unknown
2025-07-02T15:45:44.9285740Z >>> output section expr: Unknown
Assuming this is related to the patch. It may be that we've terminated too early before enough context for the error message can be accumulated. Which may mean that the check needs to be put closer to the point where an infinite loop may occur. Or we need a different approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your inputs @smithp35.
although I think it may be more idiomatic to use
 if (errCount(ctx))
   return 0;
I agree that errCount(ctx) would work as well, however, I think it helps to make the code flow easier to understand and more intuitive if the behavior of parsing expression is consistent for both the error-case and the actual end-of-file case, and atEOF() covers both these cases. Please let me know your thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this is related to the patch. It may be that we've terminated too early before enough context for the error message can be accumulated. Which may mean that the check needs to be put closer to the point where an infinite loop may occur. Or we need a different approach.
Yes, it was related to the patch. Thank you for sharing your thoughts. It turned out the error was due to the return value of getExpr() in the error path. The return value was an empty function object. It should be a 0-value equivalent of the lld::elf::Expr. I have fixed the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a small suggestion that means we don't need the comment.
I agree that the comment was a little redundant. I have removed the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. This looks OK to me. I've added the maintainer MaskRay and MysteryMath, who has been involved in linker script parsing recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok to me as well. A few parser functions call atEOF() at the beginning, when they expect to consume at least one token.
ddb5aaf    to
    11364e6      
    Compare
  
    The linker was crashing due to stack overflow when parsing ':ALIGN' in
an output section description. This commit fixes the linker script
parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions
(readExpr) in linker script and the behavior of ScriptLexer::expect(...)
utility. ScriptLexer::expect does not do anything if errors have already
been encountered during linker script parsing. In particular, it never
increments the current token position in the script file, even if the
current token is the same as the expected token. This causes an infinite
call cycle on parsing an expression such as '(4096)' when an error
has already been encountered.
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
  expect("("); // no-op, current token still points to '('
  Expression *E = readExpr(); // The cycle continues...
Closes llvm#146722
Signed-off-by: Parth Arora <[email protected]>
    | 
          
 @parth-07 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!  | 
    
The linker was crashing due to stack overflow when parsing ':ALIGN' in an output section description. This commit fixes the linker script parser so that the crash does not happen.
The root cause of the stack overflow is how we parse expressions (readExpr) in linker script and the behavior of ScriptLexer::expect(...) utility. ScriptLexer::expect does not do anything if errors have already been encountered during linker script parsing. In particular, it never increments the current token position in the script file, even if the current token is the same as the expected token. This causes an infinite call cycle on parsing an expression such as '(4096)' when an error has already been encountered.
readExpr() calls readPrimary()
readPrimary() calls readParenExpr()
readParenExpr():
expect("("); // no-op, current token still points to '('
Expression *E = readExpr(); // The cycle continues...
Closes #146722