Skip to content

Conversation

@cjacek
Copy link
Contributor

@cjacek cjacek commented Oct 14, 2024

The assumption that a symbol is either Defined or Undefined is not always true for some cases. For example, mangleMaybe may create a weak alias to a lazy archive symbol.

The assumption that a symbol is either `Defined` or `Undefined` is not always true
for some cases. For example, `mangleMaybe` may create a weak alias to a lazy archive symbol.
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-lld

Author: Jacek Caban (cjacek)

Changes

The assumption that a symbol is either Defined or Undefined is not always true for some cases. For example, mangleMaybe may create a weak alias to a lazy archive symbol.


Full diff: https://github.com/llvm/llvm-project/pull/112243.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+2-2)
  • (modified) lld/COFF/Symbols.cpp (+4-4)
  • (modified) lld/COFF/Symbols.h (+4-1)
  • (added) lld/test/COFF/weak-lazy.s (+18)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 85a58a3677181a..12e1ae62811270 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -1340,7 +1340,7 @@ void LinkerDriver::maybeCreateECExportThunk(StringRef name, Symbol *&sym) {
   if (!sym)
     return;
   if (auto undef = dyn_cast<Undefined>(sym))
-    def = undef->getWeakAlias();
+    def = undef->getDefinedWeakAlias();
   else
     def = dyn_cast<Defined>(sym);
   if (!def)
@@ -1376,7 +1376,7 @@ void LinkerDriver::createECExportThunks() {
       continue;
     Defined *targetSym;
     if (auto undef = dyn_cast<Undefined>(sym))
-      targetSym = undef->getWeakAlias();
+      targetSym = undef->getDefinedWeakAlias();
     else
       targetSym = dyn_cast<Defined>(sym);
     if (!targetSym)
diff --git a/lld/COFF/Symbols.cpp b/lld/COFF/Symbols.cpp
index 567c2b93776c94..89f2da02bdcf42 100644
--- a/lld/COFF/Symbols.cpp
+++ b/lld/COFF/Symbols.cpp
@@ -112,12 +112,12 @@ DefinedImportThunk::DefinedImportThunk(COFFLinkerContext &ctx, StringRef name,
                                        ImportThunkChunk *chunk)
     : Defined(DefinedImportThunkKind, name), wrappedSym(s), data(chunk) {}
 
-Defined *Undefined::getWeakAlias() {
+Symbol *Undefined::getWeakAlias() {
   // A weak alias may be a weak alias to another symbol, so check recursively.
   DenseSet<Symbol *> weakChain;
   for (Symbol *a = weakAlias; a; a = cast<Undefined>(a)->weakAlias) {
-    if (auto *d = dyn_cast<Defined>(a))
-      return d;
+    if (!isa<Undefined>(a))
+      return a;
     if (!weakChain.insert(a).second)
       break; // We have a cycle.
   }
@@ -125,7 +125,7 @@ Defined *Undefined::getWeakAlias() {
 }
 
 bool Undefined::resolveWeakAlias() {
-  Defined *d = getWeakAlias();
+  Defined *d = getDefinedWeakAlias();
   if (!d)
     return false;
 
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 9b21e09bf83a42..a898ebf05fd809 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -340,7 +340,10 @@ class Undefined : public Symbol {
   // If this symbol is external weak, try to resolve it to a defined
   // symbol by searching the chain of fallback symbols. Returns the symbol if
   // successful, otherwise returns null.
-  Defined *getWeakAlias();
+  Symbol *getWeakAlias();
+  Defined *getDefinedWeakAlias() {
+    return dyn_cast_or_null<Defined>(getWeakAlias());
+  }
 
   // If this symbol is external weak, replace this object with aliased symbol.
   bool resolveWeakAlias();
diff --git a/lld/test/COFF/weak-lazy.s b/lld/test/COFF/weak-lazy.s
new file mode 100644
index 00000000000000..2812ba7af8b5f8
--- /dev/null
+++ b/lld/test/COFF/weak-lazy.s
@@ -0,0 +1,18 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=i686-windows %s -o %t.obj
+# RUN: llvm-lib -machine:x86 -out:%t-func.lib %t.obj
+
+# -export:func creates a weak alias to a lazy symbol. Make sure we can handle that when processing -export:func2=func.
+# RUN: lld-link -dll -noentry -machine:x86 -out:%t.dll %t-func.lib -export:func -export:func2=func
+
+        .text
+        .def    @feat.00;
+        .scl    3;
+        .type   0;
+        .endef
+        .globl  @feat.00
+.set @feat.00, 1
+        .globl _func@0
+_func@0:
+        retl

@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-lld-coff

Author: Jacek Caban (cjacek)

Changes

The assumption that a symbol is either Defined or Undefined is not always true for some cases. For example, mangleMaybe may create a weak alias to a lazy archive symbol.


Full diff: https://github.com/llvm/llvm-project/pull/112243.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+2-2)
  • (modified) lld/COFF/Symbols.cpp (+4-4)
  • (modified) lld/COFF/Symbols.h (+4-1)
  • (added) lld/test/COFF/weak-lazy.s (+18)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 85a58a3677181a..12e1ae62811270 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -1340,7 +1340,7 @@ void LinkerDriver::maybeCreateECExportThunk(StringRef name, Symbol *&sym) {
   if (!sym)
     return;
   if (auto undef = dyn_cast<Undefined>(sym))
-    def = undef->getWeakAlias();
+    def = undef->getDefinedWeakAlias();
   else
     def = dyn_cast<Defined>(sym);
   if (!def)
@@ -1376,7 +1376,7 @@ void LinkerDriver::createECExportThunks() {
       continue;
     Defined *targetSym;
     if (auto undef = dyn_cast<Undefined>(sym))
-      targetSym = undef->getWeakAlias();
+      targetSym = undef->getDefinedWeakAlias();
     else
       targetSym = dyn_cast<Defined>(sym);
     if (!targetSym)
diff --git a/lld/COFF/Symbols.cpp b/lld/COFF/Symbols.cpp
index 567c2b93776c94..89f2da02bdcf42 100644
--- a/lld/COFF/Symbols.cpp
+++ b/lld/COFF/Symbols.cpp
@@ -112,12 +112,12 @@ DefinedImportThunk::DefinedImportThunk(COFFLinkerContext &ctx, StringRef name,
                                        ImportThunkChunk *chunk)
     : Defined(DefinedImportThunkKind, name), wrappedSym(s), data(chunk) {}
 
-Defined *Undefined::getWeakAlias() {
+Symbol *Undefined::getWeakAlias() {
   // A weak alias may be a weak alias to another symbol, so check recursively.
   DenseSet<Symbol *> weakChain;
   for (Symbol *a = weakAlias; a; a = cast<Undefined>(a)->weakAlias) {
-    if (auto *d = dyn_cast<Defined>(a))
-      return d;
+    if (!isa<Undefined>(a))
+      return a;
     if (!weakChain.insert(a).second)
       break; // We have a cycle.
   }
@@ -125,7 +125,7 @@ Defined *Undefined::getWeakAlias() {
 }
 
 bool Undefined::resolveWeakAlias() {
-  Defined *d = getWeakAlias();
+  Defined *d = getDefinedWeakAlias();
   if (!d)
     return false;
 
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 9b21e09bf83a42..a898ebf05fd809 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -340,7 +340,10 @@ class Undefined : public Symbol {
   // If this symbol is external weak, try to resolve it to a defined
   // symbol by searching the chain of fallback symbols. Returns the symbol if
   // successful, otherwise returns null.
-  Defined *getWeakAlias();
+  Symbol *getWeakAlias();
+  Defined *getDefinedWeakAlias() {
+    return dyn_cast_or_null<Defined>(getWeakAlias());
+  }
 
   // If this symbol is external weak, replace this object with aliased symbol.
   bool resolveWeakAlias();
diff --git a/lld/test/COFF/weak-lazy.s b/lld/test/COFF/weak-lazy.s
new file mode 100644
index 00000000000000..2812ba7af8b5f8
--- /dev/null
+++ b/lld/test/COFF/weak-lazy.s
@@ -0,0 +1,18 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=i686-windows %s -o %t.obj
+# RUN: llvm-lib -machine:x86 -out:%t-func.lib %t.obj
+
+# -export:func creates a weak alias to a lazy symbol. Make sure we can handle that when processing -export:func2=func.
+# RUN: lld-link -dll -noentry -machine:x86 -out:%t.dll %t-func.lib -export:func -export:func2=func
+
+        .text
+        .def    @feat.00;
+        .scl    3;
+        .type   0;
+        .endef
+        .globl  @feat.00
+.set @feat.00, 1
+        .globl _func@0
+_func@0:
+        retl

@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-platform-windows

Author: Jacek Caban (cjacek)

Changes

The assumption that a symbol is either Defined or Undefined is not always true for some cases. For example, mangleMaybe may create a weak alias to a lazy archive symbol.


Full diff: https://github.com/llvm/llvm-project/pull/112243.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+2-2)
  • (modified) lld/COFF/Symbols.cpp (+4-4)
  • (modified) lld/COFF/Symbols.h (+4-1)
  • (added) lld/test/COFF/weak-lazy.s (+18)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 85a58a3677181a..12e1ae62811270 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -1340,7 +1340,7 @@ void LinkerDriver::maybeCreateECExportThunk(StringRef name, Symbol *&sym) {
   if (!sym)
     return;
   if (auto undef = dyn_cast<Undefined>(sym))
-    def = undef->getWeakAlias();
+    def = undef->getDefinedWeakAlias();
   else
     def = dyn_cast<Defined>(sym);
   if (!def)
@@ -1376,7 +1376,7 @@ void LinkerDriver::createECExportThunks() {
       continue;
     Defined *targetSym;
     if (auto undef = dyn_cast<Undefined>(sym))
-      targetSym = undef->getWeakAlias();
+      targetSym = undef->getDefinedWeakAlias();
     else
       targetSym = dyn_cast<Defined>(sym);
     if (!targetSym)
diff --git a/lld/COFF/Symbols.cpp b/lld/COFF/Symbols.cpp
index 567c2b93776c94..89f2da02bdcf42 100644
--- a/lld/COFF/Symbols.cpp
+++ b/lld/COFF/Symbols.cpp
@@ -112,12 +112,12 @@ DefinedImportThunk::DefinedImportThunk(COFFLinkerContext &ctx, StringRef name,
                                        ImportThunkChunk *chunk)
     : Defined(DefinedImportThunkKind, name), wrappedSym(s), data(chunk) {}
 
-Defined *Undefined::getWeakAlias() {
+Symbol *Undefined::getWeakAlias() {
   // A weak alias may be a weak alias to another symbol, so check recursively.
   DenseSet<Symbol *> weakChain;
   for (Symbol *a = weakAlias; a; a = cast<Undefined>(a)->weakAlias) {
-    if (auto *d = dyn_cast<Defined>(a))
-      return d;
+    if (!isa<Undefined>(a))
+      return a;
     if (!weakChain.insert(a).second)
       break; // We have a cycle.
   }
@@ -125,7 +125,7 @@ Defined *Undefined::getWeakAlias() {
 }
 
 bool Undefined::resolveWeakAlias() {
-  Defined *d = getWeakAlias();
+  Defined *d = getDefinedWeakAlias();
   if (!d)
     return false;
 
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 9b21e09bf83a42..a898ebf05fd809 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -340,7 +340,10 @@ class Undefined : public Symbol {
   // If this symbol is external weak, try to resolve it to a defined
   // symbol by searching the chain of fallback symbols. Returns the symbol if
   // successful, otherwise returns null.
-  Defined *getWeakAlias();
+  Symbol *getWeakAlias();
+  Defined *getDefinedWeakAlias() {
+    return dyn_cast_or_null<Defined>(getWeakAlias());
+  }
 
   // If this symbol is external weak, replace this object with aliased symbol.
   bool resolveWeakAlias();
diff --git a/lld/test/COFF/weak-lazy.s b/lld/test/COFF/weak-lazy.s
new file mode 100644
index 00000000000000..2812ba7af8b5f8
--- /dev/null
+++ b/lld/test/COFF/weak-lazy.s
@@ -0,0 +1,18 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=i686-windows %s -o %t.obj
+# RUN: llvm-lib -machine:x86 -out:%t-func.lib %t.obj
+
+# -export:func creates a weak alias to a lazy symbol. Make sure we can handle that when processing -export:func2=func.
+# RUN: lld-link -dll -noentry -machine:x86 -out:%t.dll %t-func.lib -export:func -export:func2=func
+
+        .text
+        .def    @feat.00;
+        .scl    3;
+        .type   0;
+        .endef
+        .globl  @feat.00
+.set @feat.00, 1
+        .globl _func@0
+_func@0:
+        retl

@cjacek
Copy link
Contributor Author

cjacek commented Oct 14, 2024

I encountered this issue while working with ARM64EC code, which makes heavy use of weak (anti-dependency) aliases. However, the problem also affects other targets. The test in this PR fails on an assertion during a cast in getWeakAlias.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM

Symbol *Undefined::getWeakAlias() {
// A weak alias may be a weak alias to another symbol, so check recursively.
DenseSet<Symbol *> weakChain;
for (Symbol *a = weakAlias; a; a = cast<Undefined>(a)->weakAlias) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... previously, this cast<Undefined>(a) was broken (invoking UB etc), if we had a weak alias which pointed at something which wasn't either Undefined or Defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

The assumption that a symbol is either Undefined or Defined is true for some callers, but not all. See the other comment for details.

// symbol by searching the chain of fallback symbols. Returns the symbol if
// successful, otherwise returns null.
Defined *getWeakAlias();
Symbol *getWeakAlias();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In trying to read the diff and the effects of it - I see that a number of callers are switched from getWeakAlias() to getDefinedWeakAlias() - but what I don't see from the point of view of the diff, is what remaining callers there are, that still are calling getWeakAlias()?

Because any caller that is switched to getDefinedWeakAlias() will essentially be doing the same as before (minus the UB in cast<Undefined>()), but which callers do get the new behaviour of getWeakAlias() where it potentially returns something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining callers use getWeakAlias to check if a symbol can be resolved. After the run() call, all such symbols should be replaced with a Defined symbol (assuming valid input, though a malformed archive index could disrupt this). This means that for many callers, getWeakAlias isn’t expected to handle lazy symbols, making it functionally equivalent to getDefinedWeakAlias. They only differ if new weak aliases are defined between run() calls.

In the test example from this PR, using getDefinedWeakAlias in findMangle would mostly work as well, but it's a bit more fragile. Processing -export:func would create a weak alias to _func@0, but -export:func2=func would still consider it unresolvable and attempt to mangle it again. On the other hand, getWeakAlias returns the lazy symbol, so we know it will be resolvable and can skip the unnecessary mangling attempt.

@cjacek cjacek merged commit ba898db into llvm:main Oct 15, 2024
12 checks passed
@cjacek cjacek deleted the weak-lazy branch October 15, 2024 20:58
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
…vm#112243)

The assumption that a symbol is either `Defined` or `Undefined` is not
always true for some cases. For example, `mangleMaybe` may create a weak
alias to a lazy archive symbol.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants