Skip to content

Conversation

@kovdan01
Copy link
Contributor

@kovdan01 kovdan01 commented Mar 15, 2025

Symtab is first filled with the data from the bitcode file, and all undefined symbols except TLS ones are STT_NOTYPE. Since auth key for a signed GOT entry depends on the symbol type being STT_FUNC or not, we need to update the symtab after the bitcode is compiled to an ELF object and update symbol types for function symbols. This patch implements the described behavior.

@kovdan01 kovdan01 requested a review from asl March 15, 2025 19:58
@kovdan01 kovdan01 self-assigned this Mar 15, 2025
@kovdan01 kovdan01 moved this to In Progress in Pointer Authentication Tasks Mar 15, 2025
Symtab is first filled with the data from the bitcode file, and all symbols
except TLS ones are `STT_NOTYPE`. Since auth key for a signed GOT entry
depends on the symbol type being `STT_FUNC` or not, we need to update the
symtab after the bitcode is compiled to an ELF object and update symbol types
for function symbols. This patch implements the described behavior.
@kovdan01 kovdan01 force-pushed the pauth-lto-signed-got-key-main branch from eacd62a to 479f585 Compare March 16, 2025 00:24
@kovdan01 kovdan01 marked this pull request as ready for review March 16, 2025 00:24
@llvmbot
Copy link
Member

llvmbot commented Mar 16, 2025

@llvm/pr-subscribers-lld

Author: Daniil Kovalev (kovdan01)

Changes

Symtab is first filled with the data from the bitcode file, and all symbols except TLS ones are STT_NOTYPE. Since auth key for a signed GOT entry depends on the symbol type being STT_FUNC or not, we need to update the symtab after the bitcode is compiled to an ELF object and update symbol types for function symbols. This patch implements the described behavior.


Full diff: https://github.com/llvm/llvm-project/pull/131467.diff

2 Files Affected:

  • (modified) lld/ELF/Driver.cpp (+7)
  • (added) lld/test/ELF/lto/aarch64-pac-got-func.ll (+38)
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 18f9fed0d08e2..434f17786c861 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2618,6 +2618,13 @@ void LinkerDriver::compileBitcodeFiles(bool skipLinkedOutput) {
     auto *obj = cast<ObjFile<ELFT>>(file.get());
     obj->parse(/*ignoreComdats=*/true);
 
+    for (typename ELFT::Sym elfSym : obj->template getGlobalELFSyms<ELFT>()) {
+      StringRef elfSymName = check(elfSym.getName(obj->getStringTable()));
+      if (Symbol *sym = ctx.symtab->find(elfSymName))
+        if (sym->type == STT_NOTYPE)
+          sym->type = elfSym.getType();
+    }
+
     // For defined symbols in non-relocatable output,
     // compute isExported and parse '@'.
     if (!ctx.arg.relocatable)
diff --git a/lld/test/ELF/lto/aarch64-pac-got-func.ll b/lld/test/ELF/lto/aarch64-pac-got-func.ll
new file mode 100644
index 0000000000000..e42f78d30adba
--- /dev/null
+++ b/lld/test/ELF/lto/aarch64-pac-got-func.ll
@@ -0,0 +1,38 @@
+; REQUIRES: aarch64
+
+; RUN: llvm-as %s -o %t.o
+; RUN: ld.lld %t.o -shared -o %t
+; RUN: llvm-readelf -r -x.got %t | FileCheck %s
+
+; CHECK:      Relocation section '.rela.dyn' at offset 0x2a8 contains 2 entries:
+; CHECK-NEXT:     Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
+; CHECK-NEXT: 00000000000203d8  0000000100000412 R_AARCH64_AUTH_GLOB_DAT 0000000000000000 foo + 0
+; CHECK-NEXT: 00000000000203e0  0000000200000412 R_AARCH64_AUTH_GLOB_DAT 0000000000000000 var + 0
+
+; CHECK:      Hex dump of section '.got':
+; CHECK-NEXT: 0x000203d8 00000000 00000080 00000000 000000a0
+;;                                      ^^ 0b10000000 bit 63 address diversity = true, bits 61..60 key = IA
+;;                                                        ^^ 0b10100000 bit 63 address diversity = true, bits 61..60 key = DA
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+@var = external global ptr
+
+declare void @foo()
+
+define void @bar() #0 {
+entry:
+  store ptr ptrauth (ptr @foo, i32 0), ptr @var
+  ret void
+}
+
+define void @_start() {
+entry:
+  ret void
+}
+
+attributes #0 = {"target-features"="+pauth"}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 8, !"ptrauth-elf-got", i32 1}

@llvmbot
Copy link
Member

llvmbot commented Mar 16, 2025

@llvm/pr-subscribers-lld-elf

Author: Daniil Kovalev (kovdan01)

Changes

Symtab is first filled with the data from the bitcode file, and all symbols except TLS ones are STT_NOTYPE. Since auth key for a signed GOT entry depends on the symbol type being STT_FUNC or not, we need to update the symtab after the bitcode is compiled to an ELF object and update symbol types for function symbols. This patch implements the described behavior.


Full diff: https://github.com/llvm/llvm-project/pull/131467.diff

2 Files Affected:

  • (modified) lld/ELF/Driver.cpp (+7)
  • (added) lld/test/ELF/lto/aarch64-pac-got-func.ll (+38)
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 18f9fed0d08e2..434f17786c861 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2618,6 +2618,13 @@ void LinkerDriver::compileBitcodeFiles(bool skipLinkedOutput) {
     auto *obj = cast<ObjFile<ELFT>>(file.get());
     obj->parse(/*ignoreComdats=*/true);
 
+    for (typename ELFT::Sym elfSym : obj->template getGlobalELFSyms<ELFT>()) {
+      StringRef elfSymName = check(elfSym.getName(obj->getStringTable()));
+      if (Symbol *sym = ctx.symtab->find(elfSymName))
+        if (sym->type == STT_NOTYPE)
+          sym->type = elfSym.getType();
+    }
+
     // For defined symbols in non-relocatable output,
     // compute isExported and parse '@'.
     if (!ctx.arg.relocatable)
diff --git a/lld/test/ELF/lto/aarch64-pac-got-func.ll b/lld/test/ELF/lto/aarch64-pac-got-func.ll
new file mode 100644
index 0000000000000..e42f78d30adba
--- /dev/null
+++ b/lld/test/ELF/lto/aarch64-pac-got-func.ll
@@ -0,0 +1,38 @@
+; REQUIRES: aarch64
+
+; RUN: llvm-as %s -o %t.o
+; RUN: ld.lld %t.o -shared -o %t
+; RUN: llvm-readelf -r -x.got %t | FileCheck %s
+
+; CHECK:      Relocation section '.rela.dyn' at offset 0x2a8 contains 2 entries:
+; CHECK-NEXT:     Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
+; CHECK-NEXT: 00000000000203d8  0000000100000412 R_AARCH64_AUTH_GLOB_DAT 0000000000000000 foo + 0
+; CHECK-NEXT: 00000000000203e0  0000000200000412 R_AARCH64_AUTH_GLOB_DAT 0000000000000000 var + 0
+
+; CHECK:      Hex dump of section '.got':
+; CHECK-NEXT: 0x000203d8 00000000 00000080 00000000 000000a0
+;;                                      ^^ 0b10000000 bit 63 address diversity = true, bits 61..60 key = IA
+;;                                                        ^^ 0b10100000 bit 63 address diversity = true, bits 61..60 key = DA
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+@var = external global ptr
+
+declare void @foo()
+
+define void @bar() #0 {
+entry:
+  store ptr ptrauth (ptr @foo, i32 0), ptr @var
+  ret void
+}
+
+define void @_start() {
+entry:
+  ret void
+}
+
+attributes #0 = {"target-features"="+pauth"}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 8, !"ptrauth-elf-got", i32 1}

@kovdan01
Copy link
Contributor Author

Would be glad to see feedback from anyone interested

@kovdan01
Copy link
Contributor Author

@MaskRay Would be glad to see your feedback on the changes

1 similar comment
@kovdan01
Copy link
Contributor Author

kovdan01 commented Apr 7, 2025

@MaskRay Would be glad to see your feedback on the changes

@MaskRay
Copy link
Member

MaskRay commented Apr 8, 2025

This is somewhat unusual, as we typically don’t track the type of undefined symbols. The subject could likely be clarified to refer specifically to undefined symbols.

I see how this happens. When the module flag "ptrauth-elf-got" is set, llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp sets the function symbol MCSA_ELF_TypeFunction (translates to ELF STT_FUNC). The description could be clarified.
elfSym.getName(obj->getStringTable()) is wasteful - as we could zip obj->template getGlobalELFSyms<ELFT>() with obj->getGlobalSymbols(). I wonder whether we could restrict the special behavior to AArch64 PAuth only.


define void @bar() #0 {
entry:
store ptr ptrauth (ptr @foo, i32 0), ptr @var
Copy link
Member

@MaskRay MaskRay Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename var to something else then add

store ptr ptrauth (ptr @func, i32 0), ptr @g
store ptr ptrauth (ptr @func_undef, i32 0), ptr @g
store ptr ptrauth (ptr @var_undef, i32 0), ptr @g
store ptr ptrauth (ptr @var, i32 0), ptr @g

Use llvm.used so that the symbols will not be optimized out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use llvm.used so that the symbols will not be optimized out.

Maybe I'm doing smth wrong, but it looks like that llvm.used has no effect and only the symbol from the last store operation is present in GOT. This is what I've added to the code (right before @bar definition, just in case):

@llvm.used = appending global [4 x ptr] [ ptr @func, ptr @func_undef, ptr @var, ptr @var_undef], section "llvm.metadata"

Adding @g to this array, removing section "llvm.metadata", moving this array definition to the end of the file does not change the behavior.

This smells like a bug, given llvm.used description from IR reference: https://llvm.org/docs/LangRef.html#the-llvm-used-global-variable.

@MaskRay does this look like a bug for you as well, or maybe I'm just mis-using this special array?

As for now, I've just used a separate global for each store - g1, g2, g3, g4. Not elegant, but at least it works :)

@kovdan01
Copy link
Contributor Author

elfSym.getName(obj->getStringTable()) is wasteful - as we could zip obj->template getGlobalELFSyms<ELFT>() with obj->getGlobalSymbols().

@MaskRay Do I get your intention correct - you suggest to change this:

    for (typename ELFT::Sym elfSym : obj->template getGlobalELFSyms<ELFT>()) {
      StringRef elfSymName = check(elfSym.getName(obj->getStringTable()));
      if (Symbol *sym = ctx.symtab->find(elfSymName))
        if (sym->type == STT_NOTYPE)
          sym->type = elfSym.getType();
    }

to smth like this:

    for (const Symbol *globSym : obj->getGlobalSymbols())
      if (Symbol *sym = ctx.symtab->find(globSym->getName()))
        if (sym->type == STT_NOTYPE)
          sym->type = globSym->type;

If so, this does not look correct - function symbols from obj->getGlobalSymbols() will have STT_NOTYPE type, while corresponding ELF symbols will have STT_FUNC type. We want the latter behavior in order to change symbol type in symtab.

Please let me know if I misunderstood you.

@kovdan01
Copy link
Contributor Author

I wonder whether we could restrict the special behavior to AArch64 PAuth only.

I've added a check against AArch64 - see efaa9e2.

As for check against PAuth - probably we can look through relocations, and if at least one AUTH GOT-generating relocation is present (R_AARCH64_AUTH_LD64_GOT_LO12_NC, R_AARCH64_AUTH_GOT_ADD_LO12_NC, R_AARCH64_AUTH_GOT_LD_PREL19, R_AARCH64_AUTH_GOT_ADR_PREL_LO21, R_AARCH64_AUTH_ADR_GOT_PAGE), we enable this special logic (because the whole point of this is specifying correct auth key for AUTH GOT entry).

@MaskRay Please let me know if such a check for PAuth looks OK for you.

Probably, in future we could rely on build attributes section, but, as far as I understand, this is not currently supported in lld (please correct me if I'm wrong).

@kovdan01 kovdan01 requested a review from MaskRay April 14, 2025 13:51
@kovdan01 kovdan01 merged commit 8fdebff into llvm:main Apr 19, 2025
10 of 11 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Pointer Authentication Tasks Apr 19, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…#131467)

Symtab is first filled with the data from the bitcode file, and all
undefined symbols except TLS ones are `STT_NOTYPE`. Since auth key for a
signed GOT entry depends on the symbol type being `STT_FUNC` or not, we
need to update the symtab after the bitcode is compiled to an ELF object
and update symbol types for function symbols. This patch implements the
described behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants