Skip to content

Conversation

@dominik-steenken
Copy link
Contributor

This commit allows -mstack-protector-guard=global for s390x.

It also adds a new arch-specific option -mstack-protector-guard-record, analogous to -mrecord-mcount, which will cause clang to emit a __stack_protector_loc section containing all the locations in the output binary that load the stack guard address, for the purposes of later rewriting of those loads by the kernel. This new option only works together with the global stack protector.

In order to minimize exposure of the stack guard, both the storing of the stack guard onto the stack, and the later comparison of that value against the reference value, are handled via direct mem-to-mem instructions, those being mvc and clc.

This is achieved by introducing two new pseudo instructions, MOVE_STACK_GUARD and COMPARE_STACK_GUARD, which are inserted by the DAGCombiner after SelectionDAG construction. These pseudos stick around throughout the entire backend pipeline, and are lowered only in the AsmPrinter.

This commit also adds tests for both kinds of stack protectors (tls and global), for the proper insertion of the pseudos, the proper emission of the, __stack_protector_loc section, as well as the option compatibility checks for the new options.

@llvmbot llvmbot added backend:SystemZ clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Nov 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang-driver

Author: Dominik Steenken (dominik-steenken)

Changes

This commit allows -mstack-protector-guard=global for s390x.

It also adds a new arch-specific option -mstack-protector-guard-record, analogous to -mrecord-mcount, which will cause clang to emit a __stack_protector_loc section containing all the locations in the output binary that load the stack guard address, for the purposes of later rewriting of those loads by the kernel. This new option only works together with the global stack protector.

In order to minimize exposure of the stack guard, both the storing of the stack guard onto the stack, and the later comparison of that value against the reference value, are handled via direct mem-to-mem instructions, those being mvc and clc.

This is achieved by introducing two new pseudo instructions, MOVE_STACK_GUARD and COMPARE_STACK_GUARD, which are inserted by the DAGCombiner after SelectionDAG construction. These pseudos stick around throughout the entire backend pipeline, and are lowered only in the AsmPrinter.

This commit also adds tests for both kinds of stack protectors (tls and global), for the proper insertion of the pseudos, the proper emission of the, __stack_protector_loc section, as well as the option compatibility checks for the new options.


Patch is 46.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169317.diff

18 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+1)
  • (modified) clang/include/clang/Options/Options.td (+8)
  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+8)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+29-11)
  • (added) clang/test/CodeGen/SystemZ/stack-guard-pseudos.c (+16)
  • (modified) clang/test/Driver/stack-protector-guard.c (+18)
  • (modified) llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp (+127)
  • (modified) llvm/lib/Target/SystemZ/SystemZAsmPrinter.h (+2)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp (+7-2)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+107-7)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (-2)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+24-35)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.h (-1)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.td (+13)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-global-nopic.ll (+157)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-global-pic.ll (+159)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-tls.ll (+135)
  • (removed) llvm/test/CodeGen/SystemZ/stack-guard.ll (-33)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 52360b67b306c..a1c6597a1c90b 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -157,6 +157,7 @@ CODEGENOPT(InstrumentForProfiling , 1, 0, Benign) ///< Set when -pg is enabled.
 CODEGENOPT(CallFEntry , 1, 0, Benign) ///< Set when -mfentry is enabled.
 CODEGENOPT(MNopMCount , 1, 0, Benign) ///< Set when -mnop-mcount is enabled.
 CODEGENOPT(RecordMCount , 1, 0, Benign) ///< Set when -mrecord-mcount is enabled.
+CODEGENOPT(StackProtectorGuardRecord, 1, 0, Benign) ///< Set when -mstack-protector-guard-record is enabled.
 CODEGENOPT(PackedStack , 1, 0, Benign) ///< Set when -mpacked-stack is enabled.
 CODEGENOPT(LessPreciseFPMAD  , 1, 0, Benign) ///< Enable less precise MAD instructions to
                                              ///< be generated.
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 34a6651d2445c..db6884d734412 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -5749,6 +5749,14 @@ def mstack_protector_guard_reg_EQ : Joined<["-"], "mstack-protector-guard-reg=">
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Use the given reg for addressing the stack-protector guard">,
   MarshallingInfoString<CodeGenOpts<"StackProtectorGuardReg">>;
+def mstackprotector_guard_record
+    : Flag<["-"], "mstack-protector-guard-record">,
+      HelpText<
+          "Generate a __stack_protector_loc section entry for each load of "
+          "the stackguard address.">,
+      Visibility<[ClangOption, CC1Option]>,
+      Group<m_Group>,
+      MarshallingInfoFlag<CodeGenOpts<"StackProtectorGuardRecord">>;
 def mfentry : Flag<["-"], "mfentry">, HelpText<"Insert calls to fentry at function entry (x86/SystemZ only)">,
   Visibility<[ClangOption, CC1Option]>, Group<m_Group>,
   MarshallingInfoFlag<CodeGenOpts<"CallFEntry">>;
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 64e594d09067b..150bf6dbe9f3f 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -1195,6 +1195,14 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
     }
   }
 
+  if (CGM.getCodeGenOpts().StackProtectorGuardRecord) {
+    if (CGM.getCodeGenOpts().StackProtectorGuard != "global")
+      CGM.getDiags().Report(diag::err_opt_not_valid_without_opt)
+          << "-mstack-protector-guard-record"
+          << "-mstack-protector-guard=global";
+    Fn->addFnAttr("mstackprotector-guard-record");
+  }
+
   if (CGM.getCodeGenOpts().PackedStack) {
     if (getContext().getTargetInfo().getTriple().getArch() !=
         llvm::Triple::systemz)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 2f0aec3ec3c37..1b0693d7aa8a7 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -3438,22 +3438,24 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
   }
 
   const std::string &TripleStr = EffectiveTriple.getTriple();
+  StringRef GuardValue;
   if (Arg *A = Args.getLastArg(options::OPT_mstack_protector_guard_EQ)) {
-    StringRef Value = A->getValue();
+    GuardValue = A->getValue();
     if (!EffectiveTriple.isX86() && !EffectiveTriple.isAArch64() &&
         !EffectiveTriple.isARM() && !EffectiveTriple.isThumb() &&
-        !EffectiveTriple.isRISCV() && !EffectiveTriple.isPPC())
+        !EffectiveTriple.isRISCV() && !EffectiveTriple.isPPC() &&
+        !EffectiveTriple.isSystemZ())
       D.Diag(diag::err_drv_unsupported_opt_for_target)
           << A->getAsString(Args) << TripleStr;
     if ((EffectiveTriple.isX86() || EffectiveTriple.isARM() ||
-         EffectiveTriple.isThumb()) &&
-        Value != "tls" && Value != "global") {
+         EffectiveTriple.isThumb() || EffectiveTriple.isSystemZ()) &&
+        GuardValue != "tls" && GuardValue != "global") {
       D.Diag(diag::err_drv_invalid_value_with_suggestion)
-          << A->getOption().getName() << Value << "tls global";
+          << A->getOption().getName() << GuardValue << "tls global";
       return;
     }
     if ((EffectiveTriple.isARM() || EffectiveTriple.isThumb()) &&
-        Value == "tls") {
+        GuardValue == "tls") {
       if (!Args.hasArg(options::OPT_mstack_protector_guard_offset_EQ)) {
         D.Diag(diag::err_drv_ssp_missing_offset_argument)
             << A->getAsString(Args);
@@ -3477,18 +3479,19 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
       CmdArgs.push_back("-target-feature");
       CmdArgs.push_back("+read-tp-tpidruro");
     }
-    if (EffectiveTriple.isAArch64() && Value != "sysreg" && Value != "global") {
+    if (EffectiveTriple.isAArch64() && GuardValue != "sysreg" &&
+        GuardValue != "global") {
       D.Diag(diag::err_drv_invalid_value_with_suggestion)
-          << A->getOption().getName() << Value << "sysreg global";
+          << A->getOption().getName() << GuardValue << "sysreg global";
       return;
     }
     if (EffectiveTriple.isRISCV() || EffectiveTriple.isPPC()) {
-      if (Value != "tls" && Value != "global") {
+      if (GuardValue != "tls" && GuardValue != "global") {
         D.Diag(diag::err_drv_invalid_value_with_suggestion)
-            << A->getOption().getName() << Value << "tls global";
+            << A->getOption().getName() << GuardValue << "tls global";
         return;
       }
-      if (Value == "tls") {
+      if (GuardValue == "tls") {
         if (!Args.hasArg(options::OPT_mstack_protector_guard_offset_EQ)) {
           D.Diag(diag::err_drv_ssp_missing_offset_argument)
               << A->getAsString(Args);
@@ -3562,6 +3565,21 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
     }
     A->render(Args, CmdArgs);
   }
+
+  if (Arg *A = Args.getLastArg(options::OPT_mstackprotector_guard_record)) {
+    if (!EffectiveTriple.isSystemZ()) {
+      D.Diag(diag::err_drv_unsupported_opt_for_target)
+          << A->getAsString(Args) << TripleStr;
+      return;
+    }
+    if (GuardValue != "global") {
+      D.Diag(diag::err_drv_argument_only_allowed_with)
+          << "-mstack-protector-guard-record"
+          << "-mstack-protector-guard=global";
+      return;
+    }
+    A->render(Args, CmdArgs);
+  }
 }
 
 static void RenderSCPOptions(const ToolChain &TC, const ArgList &Args,
diff --git a/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c b/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c
new file mode 100644
index 0000000000000..b364aa4028ec7
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -S -mllvm -stop-after=systemz-isel -stack-protector 1 -triple=s390x-ibm-linux < %s -o - | FileCheck -check-prefix=CHECK-PSEUDOS %s
+// RUN: not %clang_cc1 -S -stack-protector 1 -mstack-protector-guard-record -triple=s390x-ibm-linux < %s -o - 2>&1 | FileCheck -check-prefix=CHECK-OPTS %s 
+// CHECK-PSEUDOS:   bb.0.entry:
+// CHECK-PSEUDOS:     %3:addr64bit = LOAD_STACK_GUARD_ADDRESS
+// CHECK-PSEUDOS:     MOVE_STACK_GUARD %stack.0.StackGuardSlot, 0, %3
+// CHECK-PSEUDOS:     COMPARE_STACK_GUARD %stack.0.StackGuardSlot, 0, %3, implicit-def $cc
+
+extern char *strcpy (char * D, const char * S);
+int main(int argc, char *argv[])
+{
+    char Buffer[8] = {0};
+    strcpy(Buffer, argv[1]);
+    return 0;
+}
+
+// CHECK-OPTS: error: option '-mstack-protector-guard-record' cannot be specified without '-mstack-protector-guard=global'
diff --git a/clang/test/Driver/stack-protector-guard.c b/clang/test/Driver/stack-protector-guard.c
index 666c83079e519..8c8aacfa574c7 100644
--- a/clang/test/Driver/stack-protector-guard.c
+++ b/clang/test/Driver/stack-protector-guard.c
@@ -155,3 +155,21 @@
 
 // CHECK-TLS-POWERPC32: "-cc1" {{.*}}"-mstack-protector-guard=tls" "-mstack-protector-guard-offset=24" "-mstack-protector-guard-reg=r2"
 // INVALID-REG-POWERPC32: error: invalid value 'r3' in 'mstack-protector-guard-reg=', expected one of: r2
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=tls %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_TLS_SYSTEMZ %s
+// CHECK_TLS_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=tls"
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=global %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_GLOBAL_SYSTEMZ %s
+// CHECK_GLOBAL_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=global"
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=global \
+// RUN:  -mstack-protector-guard-record %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_GLOBAL_RECORD_SYSTEMZ %s
+// CHECK_GLOBAL_RECORD_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=global" "-mstack-protector-guard-record"
+
+// RUN: not %clang -target systemz-unknown-elf -mstack-protector-guard=tls \
+// RUN:  -mstack-protector-guard-record %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=INVALID_TLS_RECORD_SYSTEMZ %s
+// INVALID_TLS_RECORD_SYSTEMZ: error: invalid argument '-mstack-protector-guard-record' only allowed with '-mstack-protector-guard=global'
diff --git a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
index e31d7c6a86476..ca27ff5d28241 100644
--- a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
@@ -24,9 +24,11 @@
 #include "llvm/BinaryFormat/GOFF.h"
 #include "llvm/CodeGen/MachineModuleInfoImpls.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+#include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/Mangler.h"
 #include "llvm/IR/Module.h"
 #include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/MCSectionELF.h"
 #include "llvm/MC/MCStreamer.h"
@@ -213,6 +215,16 @@ SystemZAsmPrinter::AssociatedDataAreaTable::insert(const MachineOperand MO) {
   return insert(Sym, ADAslotType);
 }
 
+namespace {
+unsigned long getStackGuardOffset(const MachineBasicBlock *MBB) {
+  // In the TLS (default) case, AddrReg will contain the thread pointer, so we
+  // need to add 40 bytes to get the actual address of the stack guard.
+  StringRef GuardType =
+      MBB->getParent()->getFunction().getParent()->getStackProtectorGuard();
+  return (GuardType == "global") ? 0 : 40;
+}
+} // namespace
+
 void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   SystemZ_MC::verifyInstructionPredicates(MI->getOpcode(),
                                           getSubtargetInfo().getFeatureBits());
@@ -740,6 +752,42 @@ void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   case SystemZ::EH_SjLj_Setup:
     return;
 
+  case SystemZ::LOAD_STACK_GUARD: {
+    // If requested, record address of stack guard address load
+    if (MF->getFunction().hasFnAttribute("mstackprotector-guard-record"))
+      emitStackProtectorLocEntry();
+    Register AddrReg = emitLoadStackGuardAddress(MI);
+    LoweredMI = MCInstBuilder(SystemZ::LG)
+                    .addReg(AddrReg)
+                    .addImm(getStackGuardOffset(MI->getParent()))
+                    .addReg(0);
+  } break;
+
+  case SystemZ::LOAD_STACK_GUARD_ADDRESS:
+    // If requested, record address of stack guard address load
+    if (MF->getFunction().hasFnAttribute("mstackprotector-guard-record"))
+      emitStackProtectorLocEntry();
+    emitLoadStackGuardAddress(MI);
+    return;
+
+  case SystemZ::COMPARE_STACK_GUARD:
+    LoweredMI = MCInstBuilder(SystemZ::CLC)
+                    .addReg(MI->getOperand(0).getReg())
+                    .addImm(MI->getOperand(1).getImm())
+                    .addImm(8)
+                    .addReg(MI->getOperand(2).getReg())
+                    .addImm(getStackGuardOffset(MI->getParent()));
+    break;
+
+  case SystemZ::MOVE_STACK_GUARD:
+    LoweredMI = MCInstBuilder(SystemZ::MVC)
+                    .addReg(MI->getOperand(0).getReg())
+                    .addImm(MI->getOperand(1).getImm())
+                    .addImm(8)
+                    .addReg(MI->getOperand(2).getReg())
+                    .addImm(getStackGuardOffset(MI->getParent()));
+    break;
+
   default:
     Lower.lower(MI, LoweredMI);
     break;
@@ -747,6 +795,85 @@ void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, LoweredMI);
 }
 
+void SystemZAsmPrinter::emitStackProtectorLocEntry() {
+  MCSymbol *Sym = OutContext.createTempSymbol();
+  OutStreamer->pushSection();
+  OutStreamer->switchSection(OutContext.getELFSection(
+      "__stack_protector_loc", ELF::SHT_PROGBITS, ELF::SHF_ALLOC));
+  OutStreamer->emitSymbolValue(Sym, getDataLayout().getPointerSize());
+  OutStreamer->popSection();
+  OutStreamer->emitLabel(Sym);
+}
+
+// Emit the stack guard address load, depending on guard type.
+// Return the register the stack guard address was loaded into.
+Register SystemZAsmPrinter::emitLoadStackGuardAddress(const MachineInstr *MI) {
+  const MachineBasicBlock *MBB = MI->getParent();
+  const MachineFunction &MF = *MBB->getParent();
+  const Register AddrReg = MI->getOperand(0).getReg();
+  const MCRegisterInfo &MRI = *TM.getMCRegisterInfo();
+  const Register Reg32 = MRI.getSubReg(AddrReg, SystemZ::subreg_l32);
+
+  const Module *M = MF.getFunction().getParent();
+  StringRef GuardType = M->getStackProtectorGuard();
+
+  if (GuardType.empty() || (GuardType == "tls")) {
+    // EAR can only load the low subregister so use a shift for %a0 to produce
+    // the GR containing %a0 and %a1.
+
+    // ear <reg>, %a0
+    MCInst EAR1 = MCInstBuilder(SystemZ::EAR)
+                      .addReg(Reg32)
+                      .addReg(SystemZ::A0)
+                      .addReg(AddrReg);
+
+    // sllg <reg>, <reg>, 32
+    MCInst SLLG = MCInstBuilder(SystemZ::SLLG)
+                      .addReg(AddrReg)
+                      .addReg(AddrReg)
+                      .addReg(0)
+                      .addImm(32);
+
+    // ear <reg>, %a1
+    MCInst EAR2 = MCInstBuilder(SystemZ::EAR)
+                      .addReg(Reg32)
+                      .addReg(SystemZ::A1)
+                      .addReg(AddrReg);
+
+    EmitToStreamer(*OutStreamer, EAR1);
+    EmitToStreamer(*OutStreamer, SLLG);
+    EmitToStreamer(*OutStreamer, EAR2);
+  } else if (GuardType == "global") {
+    // Obtain the global value.
+    const auto *GV = M->getGlobalVariable(
+        "__stack_chk_guard", PointerType::getUnqual(M->getContext()));
+    assert(GV &&
+           "could not create reference to global variable __stack_chk_guard");
+    auto *Sym = TM.getSymbol(GV);
+    // Ref->
+    // Emit the address load.
+    MCInst Load;
+    if (M->getPICLevel() == PICLevel::NotPIC) {
+      Load = MCInstBuilder(SystemZ::LARL)
+                 .addReg(AddrReg)
+                 .addExpr(MCSymbolRefExpr::create(Sym, OutContext));
+    } else {
+      Load =
+          MCInstBuilder(SystemZ::LGRL)
+              .addReg(AddrReg)
+              .addExpr(MCSymbolRefExpr::create(Sym, SystemZ::S_GOT, OutContext))
+              .addExpr(getGlobalOffsetTable(OutContext));
+    }
+    EmitToStreamer(*OutStreamer, Load);
+  } else {
+    llvm_unreachable(
+        (Twine("Unknown stack protector type \"") + GuardType + "\"")
+            .str()
+            .c_str());
+  }
+  return AddrReg;
+}
+
 // Emit the largest nop instruction smaller than or equal to NumBytes
 // bytes.  Return the size of nop emitted.
 static unsigned EmitNop(MCContext &OutContext, MCStreamer &OutStreamer,
diff --git a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
index cb101e472824f..22e82a691be64 100644
--- a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+++ b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
@@ -139,6 +139,8 @@ class LLVM_LIBRARY_VISIBILITY SystemZAsmPrinter : public AsmPrinter {
   void LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI,
                                      SystemZMCInstLower &Lower);
   void LowerPATCHABLE_RET(const MachineInstr &MI, SystemZMCInstLower &Lower);
+  Register emitLoadStackGuardAddress(const MachineInstr *MI);
+  void emitStackProtectorLocEntry();
   void emitAttributes(Module &M);
 };
 } // end namespace llvm
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index a05fdc74e6366..fa1daa8bf8c54 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -10,10 +10,11 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "SystemZTargetMachine.h"
 #include "SystemZISelLowering.h"
+#include "SystemZTargetMachine.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/SelectionDAGISel.h"
+#include "llvm/IR/Module.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/raw_ostream.h"
@@ -369,7 +370,11 @@ class SystemZDAGToDAGISel : public SelectionDAGISel {
       if (F.hasFnAttribute("mrecord-mcount"))
         report_fatal_error("mrecord-mcount only supported with fentry-call");
     }
-
+    if (F.getParent()->getStackProtectorGuard() != "global") {
+      if (F.hasFnAttribute("mstack-protector-guard-record"))
+        report_fatal_error("mstack-protector-guard-record only supported with "
+                           "mstack-protector-guard=global");
+    }
     Subtarget = &MF.getSubtarget<SystemZSubtarget>();
     return SelectionDAGISel::runOnMachineFunction(MF);
   }
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index eb93024bed35c..e1662dbedae51 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -14,12 +14,13 @@
 #include "SystemZCallingConv.h"
 #include "SystemZConstantPoolValue.h"
 #include "SystemZMachineFunctionInfo.h"
+#include "SystemZRegisterInfo.h"
 #include "SystemZTargetMachine.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/CodeGen/CallingConvLower.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
-#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
 #include "llvm/IR/GlobalAlias.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -8026,6 +8027,25 @@ SDValue SystemZTargetLowering::combineSTORE(
                                SN->getMemOperand());
     }
   }
+
+  // combine STORE (LOAD_STACK_GUARD) into MOVE_STACK_GUARD
+  if (Op1->isMachineOpcode() &&
+      (Op1->getMachineOpcode() == SystemZ::LOAD_STACK_GUARD)) {
+    // If so, create a MOVE_STACK_GUARD node to replace the store,
+    // and a LOAD_STACK_GUARD_ADDRESS to replace the LOAD_STACK_GUARD
+    MachineSDNode *LoadAddr = DAG.getMachineNode(
+        SystemZ::LOAD_STACK_GUARD_ADDRESS, SDLoc(SN), MVT::i64);
+    int FI = cast<FrameIndexSDNode>(SN->getOperand(2))->getIndex();
+    // FrameIndex, Dummy Displacement
+    SDValue Ops[] = {DAG.getTargetFrameIndex(FI, MVT::i64),
+                     DAG.getTargetConstant(0, SDLoc(SN), MVT::i64),
+                     SDValue(LoadAddr, 0), SN->getChain()};
+    MachineSDNode *Move = DAG.getMachineNode(SystemZ::MOVE_STACK_GUARD,
+                                             SDLoc(SN), MVT::Other, Ops);
+
+    return SDValue(Move, 0);
+  }
+
   // Combine STORE (BSWAP) into STRVH/STRV/STRVG/VSTBR
   if (!SN->isTruncatingStore() &&
       Op1.getOpcode() == ISD::BSWAP &&
@@ -8845,25 +8865,103 @@ SystemZTargetLowering::getJumpConditionMergingParams(Instruction::BinaryOps Opc,
   return {-1, -1, -1};
 }
 
+namespace {
+bool isStackGuardCheck(SDNode const *N, int &FI, SDValue &InChain,
+                       SDValue &OutChain, SDValue &StackGuardLoad,
+                       SystemZTargetLowering::DAGCombinerInfo &DCI) {
+  auto Comp = N->getOperand(4);
+  if (Comp->getOpcode() != SystemZISD::ICMP)
+    return false;
+
+  if (!Comp->hasOneUse())
+    return false;
+
+  SDValue LHS = Comp->getOperand(0);
+  SDValue RHS = Comp->getOperand(1);
+  LoadSDNode *FILoad;
+
+  if (LHS.isMachineOpcode() &&
+      LHS.getMachineOpcode() == SystemZ::LOAD_STACK_GUARD &&
+      ISD::isNormalLoad(RHS.getNode()) &&
+      dyn_cast<FrameIndexSDNode>(RHS.getOperand(1))) {
+    StackGuardLoad = LHS;
+    FILoad = cast<LoadSDNode>(RHS);
+  } else...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-backend-systemz

Author: Dominik Steenken (dominik-steenken)

Changes

This commit allows -mstack-protector-guard=global for s390x.

It also adds a new arch-specific option -mstack-protector-guard-record, analogous to -mrecord-mcount, which will cause clang to emit a __stack_protector_loc section containing all the locations in the output binary that load the stack guard address, for the purposes of later rewriting of those loads by the kernel. This new option only works together with the global stack protector.

In order to minimize exposure of the stack guard, both the storing of the stack guard onto the stack, and the later comparison of that value against the reference value, are handled via direct mem-to-mem instructions, those being mvc and clc.

This is achieved by introducing two new pseudo instructions, MOVE_STACK_GUARD and COMPARE_STACK_GUARD, which are inserted by the DAGCombiner after SelectionDAG construction. These pseudos stick around throughout the entire backend pipeline, and are lowered only in the AsmPrinter.

This commit also adds tests for both kinds of stack protectors (tls and global), for the proper insertion of the pseudos, the proper emission of the, __stack_protector_loc section, as well as the option compatibility checks for the new options.


Patch is 46.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169317.diff

18 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+1)
  • (modified) clang/include/clang/Options/Options.td (+8)
  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+8)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+29-11)
  • (added) clang/test/CodeGen/SystemZ/stack-guard-pseudos.c (+16)
  • (modified) clang/test/Driver/stack-protector-guard.c (+18)
  • (modified) llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp (+127)
  • (modified) llvm/lib/Target/SystemZ/SystemZAsmPrinter.h (+2)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp (+7-2)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+107-7)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (-2)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+24-35)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.h (-1)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.td (+13)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-global-nopic.ll (+157)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-global-pic.ll (+159)
  • (added) llvm/test/CodeGen/SystemZ/stack-guard-tls.ll (+135)
  • (removed) llvm/test/CodeGen/SystemZ/stack-guard.ll (-33)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 52360b67b306c..a1c6597a1c90b 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -157,6 +157,7 @@ CODEGENOPT(InstrumentForProfiling , 1, 0, Benign) ///< Set when -pg is enabled.
 CODEGENOPT(CallFEntry , 1, 0, Benign) ///< Set when -mfentry is enabled.
 CODEGENOPT(MNopMCount , 1, 0, Benign) ///< Set when -mnop-mcount is enabled.
 CODEGENOPT(RecordMCount , 1, 0, Benign) ///< Set when -mrecord-mcount is enabled.
+CODEGENOPT(StackProtectorGuardRecord, 1, 0, Benign) ///< Set when -mstack-protector-guard-record is enabled.
 CODEGENOPT(PackedStack , 1, 0, Benign) ///< Set when -mpacked-stack is enabled.
 CODEGENOPT(LessPreciseFPMAD  , 1, 0, Benign) ///< Enable less precise MAD instructions to
                                              ///< be generated.
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index 34a6651d2445c..db6884d734412 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -5749,6 +5749,14 @@ def mstack_protector_guard_reg_EQ : Joined<["-"], "mstack-protector-guard-reg=">
   Visibility<[ClangOption, CC1Option]>,
   HelpText<"Use the given reg for addressing the stack-protector guard">,
   MarshallingInfoString<CodeGenOpts<"StackProtectorGuardReg">>;
+def mstackprotector_guard_record
+    : Flag<["-"], "mstack-protector-guard-record">,
+      HelpText<
+          "Generate a __stack_protector_loc section entry for each load of "
+          "the stackguard address.">,
+      Visibility<[ClangOption, CC1Option]>,
+      Group<m_Group>,
+      MarshallingInfoFlag<CodeGenOpts<"StackProtectorGuardRecord">>;
 def mfentry : Flag<["-"], "mfentry">, HelpText<"Insert calls to fentry at function entry (x86/SystemZ only)">,
   Visibility<[ClangOption, CC1Option]>, Group<m_Group>,
   MarshallingInfoFlag<CodeGenOpts<"CallFEntry">>;
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 64e594d09067b..150bf6dbe9f3f 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -1195,6 +1195,14 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
     }
   }
 
+  if (CGM.getCodeGenOpts().StackProtectorGuardRecord) {
+    if (CGM.getCodeGenOpts().StackProtectorGuard != "global")
+      CGM.getDiags().Report(diag::err_opt_not_valid_without_opt)
+          << "-mstack-protector-guard-record"
+          << "-mstack-protector-guard=global";
+    Fn->addFnAttr("mstackprotector-guard-record");
+  }
+
   if (CGM.getCodeGenOpts().PackedStack) {
     if (getContext().getTargetInfo().getTriple().getArch() !=
         llvm::Triple::systemz)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 2f0aec3ec3c37..1b0693d7aa8a7 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -3438,22 +3438,24 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
   }
 
   const std::string &TripleStr = EffectiveTriple.getTriple();
+  StringRef GuardValue;
   if (Arg *A = Args.getLastArg(options::OPT_mstack_protector_guard_EQ)) {
-    StringRef Value = A->getValue();
+    GuardValue = A->getValue();
     if (!EffectiveTriple.isX86() && !EffectiveTriple.isAArch64() &&
         !EffectiveTriple.isARM() && !EffectiveTriple.isThumb() &&
-        !EffectiveTriple.isRISCV() && !EffectiveTriple.isPPC())
+        !EffectiveTriple.isRISCV() && !EffectiveTriple.isPPC() &&
+        !EffectiveTriple.isSystemZ())
       D.Diag(diag::err_drv_unsupported_opt_for_target)
           << A->getAsString(Args) << TripleStr;
     if ((EffectiveTriple.isX86() || EffectiveTriple.isARM() ||
-         EffectiveTriple.isThumb()) &&
-        Value != "tls" && Value != "global") {
+         EffectiveTriple.isThumb() || EffectiveTriple.isSystemZ()) &&
+        GuardValue != "tls" && GuardValue != "global") {
       D.Diag(diag::err_drv_invalid_value_with_suggestion)
-          << A->getOption().getName() << Value << "tls global";
+          << A->getOption().getName() << GuardValue << "tls global";
       return;
     }
     if ((EffectiveTriple.isARM() || EffectiveTriple.isThumb()) &&
-        Value == "tls") {
+        GuardValue == "tls") {
       if (!Args.hasArg(options::OPT_mstack_protector_guard_offset_EQ)) {
         D.Diag(diag::err_drv_ssp_missing_offset_argument)
             << A->getAsString(Args);
@@ -3477,18 +3479,19 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
       CmdArgs.push_back("-target-feature");
       CmdArgs.push_back("+read-tp-tpidruro");
     }
-    if (EffectiveTriple.isAArch64() && Value != "sysreg" && Value != "global") {
+    if (EffectiveTriple.isAArch64() && GuardValue != "sysreg" &&
+        GuardValue != "global") {
       D.Diag(diag::err_drv_invalid_value_with_suggestion)
-          << A->getOption().getName() << Value << "sysreg global";
+          << A->getOption().getName() << GuardValue << "sysreg global";
       return;
     }
     if (EffectiveTriple.isRISCV() || EffectiveTriple.isPPC()) {
-      if (Value != "tls" && Value != "global") {
+      if (GuardValue != "tls" && GuardValue != "global") {
         D.Diag(diag::err_drv_invalid_value_with_suggestion)
-            << A->getOption().getName() << Value << "tls global";
+            << A->getOption().getName() << GuardValue << "tls global";
         return;
       }
-      if (Value == "tls") {
+      if (GuardValue == "tls") {
         if (!Args.hasArg(options::OPT_mstack_protector_guard_offset_EQ)) {
           D.Diag(diag::err_drv_ssp_missing_offset_argument)
               << A->getAsString(Args);
@@ -3562,6 +3565,21 @@ static void RenderSSPOptions(const Driver &D, const ToolChain &TC,
     }
     A->render(Args, CmdArgs);
   }
+
+  if (Arg *A = Args.getLastArg(options::OPT_mstackprotector_guard_record)) {
+    if (!EffectiveTriple.isSystemZ()) {
+      D.Diag(diag::err_drv_unsupported_opt_for_target)
+          << A->getAsString(Args) << TripleStr;
+      return;
+    }
+    if (GuardValue != "global") {
+      D.Diag(diag::err_drv_argument_only_allowed_with)
+          << "-mstack-protector-guard-record"
+          << "-mstack-protector-guard=global";
+      return;
+    }
+    A->render(Args, CmdArgs);
+  }
 }
 
 static void RenderSCPOptions(const ToolChain &TC, const ArgList &Args,
diff --git a/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c b/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c
new file mode 100644
index 0000000000000..b364aa4028ec7
--- /dev/null
+++ b/clang/test/CodeGen/SystemZ/stack-guard-pseudos.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -S -mllvm -stop-after=systemz-isel -stack-protector 1 -triple=s390x-ibm-linux < %s -o - | FileCheck -check-prefix=CHECK-PSEUDOS %s
+// RUN: not %clang_cc1 -S -stack-protector 1 -mstack-protector-guard-record -triple=s390x-ibm-linux < %s -o - 2>&1 | FileCheck -check-prefix=CHECK-OPTS %s 
+// CHECK-PSEUDOS:   bb.0.entry:
+// CHECK-PSEUDOS:     %3:addr64bit = LOAD_STACK_GUARD_ADDRESS
+// CHECK-PSEUDOS:     MOVE_STACK_GUARD %stack.0.StackGuardSlot, 0, %3
+// CHECK-PSEUDOS:     COMPARE_STACK_GUARD %stack.0.StackGuardSlot, 0, %3, implicit-def $cc
+
+extern char *strcpy (char * D, const char * S);
+int main(int argc, char *argv[])
+{
+    char Buffer[8] = {0};
+    strcpy(Buffer, argv[1]);
+    return 0;
+}
+
+// CHECK-OPTS: error: option '-mstack-protector-guard-record' cannot be specified without '-mstack-protector-guard=global'
diff --git a/clang/test/Driver/stack-protector-guard.c b/clang/test/Driver/stack-protector-guard.c
index 666c83079e519..8c8aacfa574c7 100644
--- a/clang/test/Driver/stack-protector-guard.c
+++ b/clang/test/Driver/stack-protector-guard.c
@@ -155,3 +155,21 @@
 
 // CHECK-TLS-POWERPC32: "-cc1" {{.*}}"-mstack-protector-guard=tls" "-mstack-protector-guard-offset=24" "-mstack-protector-guard-reg=r2"
 // INVALID-REG-POWERPC32: error: invalid value 'r3' in 'mstack-protector-guard-reg=', expected one of: r2
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=tls %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_TLS_SYSTEMZ %s
+// CHECK_TLS_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=tls"
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=global %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_GLOBAL_SYSTEMZ %s
+// CHECK_GLOBAL_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=global"
+
+// RUN: %clang -### -target systemz-unknown-elf -mstack-protector-guard=global \
+// RUN:  -mstack-protector-guard-record %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=CHECK_GLOBAL_RECORD_SYSTEMZ %s
+// CHECK_GLOBAL_RECORD_SYSTEMZ: "-cc1" {{.*}}"-mstack-protector-guard=global" "-mstack-protector-guard-record"
+
+// RUN: not %clang -target systemz-unknown-elf -mstack-protector-guard=tls \
+// RUN:  -mstack-protector-guard-record %s 2>&1 | \
+// RUN:  FileCheck -check-prefix=INVALID_TLS_RECORD_SYSTEMZ %s
+// INVALID_TLS_RECORD_SYSTEMZ: error: invalid argument '-mstack-protector-guard-record' only allowed with '-mstack-protector-guard=global'
diff --git a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
index e31d7c6a86476..ca27ff5d28241 100644
--- a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
@@ -24,9 +24,11 @@
 #include "llvm/BinaryFormat/GOFF.h"
 #include "llvm/CodeGen/MachineModuleInfoImpls.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+#include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/Mangler.h"
 #include "llvm/IR/Module.h"
 #include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/MCSectionELF.h"
 #include "llvm/MC/MCStreamer.h"
@@ -213,6 +215,16 @@ SystemZAsmPrinter::AssociatedDataAreaTable::insert(const MachineOperand MO) {
   return insert(Sym, ADAslotType);
 }
 
+namespace {
+unsigned long getStackGuardOffset(const MachineBasicBlock *MBB) {
+  // In the TLS (default) case, AddrReg will contain the thread pointer, so we
+  // need to add 40 bytes to get the actual address of the stack guard.
+  StringRef GuardType =
+      MBB->getParent()->getFunction().getParent()->getStackProtectorGuard();
+  return (GuardType == "global") ? 0 : 40;
+}
+} // namespace
+
 void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   SystemZ_MC::verifyInstructionPredicates(MI->getOpcode(),
                                           getSubtargetInfo().getFeatureBits());
@@ -740,6 +752,42 @@ void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   case SystemZ::EH_SjLj_Setup:
     return;
 
+  case SystemZ::LOAD_STACK_GUARD: {
+    // If requested, record address of stack guard address load
+    if (MF->getFunction().hasFnAttribute("mstackprotector-guard-record"))
+      emitStackProtectorLocEntry();
+    Register AddrReg = emitLoadStackGuardAddress(MI);
+    LoweredMI = MCInstBuilder(SystemZ::LG)
+                    .addReg(AddrReg)
+                    .addImm(getStackGuardOffset(MI->getParent()))
+                    .addReg(0);
+  } break;
+
+  case SystemZ::LOAD_STACK_GUARD_ADDRESS:
+    // If requested, record address of stack guard address load
+    if (MF->getFunction().hasFnAttribute("mstackprotector-guard-record"))
+      emitStackProtectorLocEntry();
+    emitLoadStackGuardAddress(MI);
+    return;
+
+  case SystemZ::COMPARE_STACK_GUARD:
+    LoweredMI = MCInstBuilder(SystemZ::CLC)
+                    .addReg(MI->getOperand(0).getReg())
+                    .addImm(MI->getOperand(1).getImm())
+                    .addImm(8)
+                    .addReg(MI->getOperand(2).getReg())
+                    .addImm(getStackGuardOffset(MI->getParent()));
+    break;
+
+  case SystemZ::MOVE_STACK_GUARD:
+    LoweredMI = MCInstBuilder(SystemZ::MVC)
+                    .addReg(MI->getOperand(0).getReg())
+                    .addImm(MI->getOperand(1).getImm())
+                    .addImm(8)
+                    .addReg(MI->getOperand(2).getReg())
+                    .addImm(getStackGuardOffset(MI->getParent()));
+    break;
+
   default:
     Lower.lower(MI, LoweredMI);
     break;
@@ -747,6 +795,85 @@ void SystemZAsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, LoweredMI);
 }
 
+void SystemZAsmPrinter::emitStackProtectorLocEntry() {
+  MCSymbol *Sym = OutContext.createTempSymbol();
+  OutStreamer->pushSection();
+  OutStreamer->switchSection(OutContext.getELFSection(
+      "__stack_protector_loc", ELF::SHT_PROGBITS, ELF::SHF_ALLOC));
+  OutStreamer->emitSymbolValue(Sym, getDataLayout().getPointerSize());
+  OutStreamer->popSection();
+  OutStreamer->emitLabel(Sym);
+}
+
+// Emit the stack guard address load, depending on guard type.
+// Return the register the stack guard address was loaded into.
+Register SystemZAsmPrinter::emitLoadStackGuardAddress(const MachineInstr *MI) {
+  const MachineBasicBlock *MBB = MI->getParent();
+  const MachineFunction &MF = *MBB->getParent();
+  const Register AddrReg = MI->getOperand(0).getReg();
+  const MCRegisterInfo &MRI = *TM.getMCRegisterInfo();
+  const Register Reg32 = MRI.getSubReg(AddrReg, SystemZ::subreg_l32);
+
+  const Module *M = MF.getFunction().getParent();
+  StringRef GuardType = M->getStackProtectorGuard();
+
+  if (GuardType.empty() || (GuardType == "tls")) {
+    // EAR can only load the low subregister so use a shift for %a0 to produce
+    // the GR containing %a0 and %a1.
+
+    // ear <reg>, %a0
+    MCInst EAR1 = MCInstBuilder(SystemZ::EAR)
+                      .addReg(Reg32)
+                      .addReg(SystemZ::A0)
+                      .addReg(AddrReg);
+
+    // sllg <reg>, <reg>, 32
+    MCInst SLLG = MCInstBuilder(SystemZ::SLLG)
+                      .addReg(AddrReg)
+                      .addReg(AddrReg)
+                      .addReg(0)
+                      .addImm(32);
+
+    // ear <reg>, %a1
+    MCInst EAR2 = MCInstBuilder(SystemZ::EAR)
+                      .addReg(Reg32)
+                      .addReg(SystemZ::A1)
+                      .addReg(AddrReg);
+
+    EmitToStreamer(*OutStreamer, EAR1);
+    EmitToStreamer(*OutStreamer, SLLG);
+    EmitToStreamer(*OutStreamer, EAR2);
+  } else if (GuardType == "global") {
+    // Obtain the global value.
+    const auto *GV = M->getGlobalVariable(
+        "__stack_chk_guard", PointerType::getUnqual(M->getContext()));
+    assert(GV &&
+           "could not create reference to global variable __stack_chk_guard");
+    auto *Sym = TM.getSymbol(GV);
+    // Ref->
+    // Emit the address load.
+    MCInst Load;
+    if (M->getPICLevel() == PICLevel::NotPIC) {
+      Load = MCInstBuilder(SystemZ::LARL)
+                 .addReg(AddrReg)
+                 .addExpr(MCSymbolRefExpr::create(Sym, OutContext));
+    } else {
+      Load =
+          MCInstBuilder(SystemZ::LGRL)
+              .addReg(AddrReg)
+              .addExpr(MCSymbolRefExpr::create(Sym, SystemZ::S_GOT, OutContext))
+              .addExpr(getGlobalOffsetTable(OutContext));
+    }
+    EmitToStreamer(*OutStreamer, Load);
+  } else {
+    llvm_unreachable(
+        (Twine("Unknown stack protector type \"") + GuardType + "\"")
+            .str()
+            .c_str());
+  }
+  return AddrReg;
+}
+
 // Emit the largest nop instruction smaller than or equal to NumBytes
 // bytes.  Return the size of nop emitted.
 static unsigned EmitNop(MCContext &OutContext, MCStreamer &OutStreamer,
diff --git a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
index cb101e472824f..22e82a691be64 100644
--- a/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
+++ b/llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
@@ -139,6 +139,8 @@ class LLVM_LIBRARY_VISIBILITY SystemZAsmPrinter : public AsmPrinter {
   void LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI,
                                      SystemZMCInstLower &Lower);
   void LowerPATCHABLE_RET(const MachineInstr &MI, SystemZMCInstLower &Lower);
+  Register emitLoadStackGuardAddress(const MachineInstr *MI);
+  void emitStackProtectorLocEntry();
   void emitAttributes(Module &M);
 };
 } // end namespace llvm
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index a05fdc74e6366..fa1daa8bf8c54 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -10,10 +10,11 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "SystemZTargetMachine.h"
 #include "SystemZISelLowering.h"
+#include "SystemZTargetMachine.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/SelectionDAGISel.h"
+#include "llvm/IR/Module.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/raw_ostream.h"
@@ -369,7 +370,11 @@ class SystemZDAGToDAGISel : public SelectionDAGISel {
       if (F.hasFnAttribute("mrecord-mcount"))
         report_fatal_error("mrecord-mcount only supported with fentry-call");
     }
-
+    if (F.getParent()->getStackProtectorGuard() != "global") {
+      if (F.hasFnAttribute("mstack-protector-guard-record"))
+        report_fatal_error("mstack-protector-guard-record only supported with "
+                           "mstack-protector-guard=global");
+    }
     Subtarget = &MF.getSubtarget<SystemZSubtarget>();
     return SelectionDAGISel::runOnMachineFunction(MF);
   }
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index eb93024bed35c..e1662dbedae51 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -14,12 +14,13 @@
 #include "SystemZCallingConv.h"
 #include "SystemZConstantPoolValue.h"
 #include "SystemZMachineFunctionInfo.h"
+#include "SystemZRegisterInfo.h"
 #include "SystemZTargetMachine.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/CodeGen/CallingConvLower.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
-#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
 #include "llvm/IR/GlobalAlias.h"
 #include "llvm/IR/IntrinsicInst.h"
@@ -8026,6 +8027,25 @@ SDValue SystemZTargetLowering::combineSTORE(
                                SN->getMemOperand());
     }
   }
+
+  // combine STORE (LOAD_STACK_GUARD) into MOVE_STACK_GUARD
+  if (Op1->isMachineOpcode() &&
+      (Op1->getMachineOpcode() == SystemZ::LOAD_STACK_GUARD)) {
+    // If so, create a MOVE_STACK_GUARD node to replace the store,
+    // and a LOAD_STACK_GUARD_ADDRESS to replace the LOAD_STACK_GUARD
+    MachineSDNode *LoadAddr = DAG.getMachineNode(
+        SystemZ::LOAD_STACK_GUARD_ADDRESS, SDLoc(SN), MVT::i64);
+    int FI = cast<FrameIndexSDNode>(SN->getOperand(2))->getIndex();
+    // FrameIndex, Dummy Displacement
+    SDValue Ops[] = {DAG.getTargetFrameIndex(FI, MVT::i64),
+                     DAG.getTargetConstant(0, SDLoc(SN), MVT::i64),
+                     SDValue(LoadAddr, 0), SN->getChain()};
+    MachineSDNode *Move = DAG.getMachineNode(SystemZ::MOVE_STACK_GUARD,
+                                             SDLoc(SN), MVT::Other, Ops);
+
+    return SDValue(Move, 0);
+  }
+
   // Combine STORE (BSWAP) into STRVH/STRV/STRVG/VSTBR
   if (!SN->isTruncatingStore() &&
       Op1.getOpcode() == ISD::BSWAP &&
@@ -8845,25 +8865,103 @@ SystemZTargetLowering::getJumpConditionMergingParams(Instruction::BinaryOps Opc,
   return {-1, -1, -1};
 }
 
+namespace {
+bool isStackGuardCheck(SDNode const *N, int &FI, SDValue &InChain,
+                       SDValue &OutChain, SDValue &StackGuardLoad,
+                       SystemZTargetLowering::DAGCombinerInfo &DCI) {
+  auto Comp = N->getOperand(4);
+  if (Comp->getOpcode() != SystemZISD::ICMP)
+    return false;
+
+  if (!Comp->hasOneUse())
+    return false;
+
+  SDValue LHS = Comp->getOperand(0);
+  SDValue RHS = Comp->getOperand(1);
+  LoadSDNode *FILoad;
+
+  if (LHS.isMachineOpcode() &&
+      LHS.getMachineOpcode() == SystemZ::LOAD_STACK_GUARD &&
+      ISD::isNormalLoad(RHS.getNode()) &&
+      dyn_cast<FrameIndexSDNode>(RHS.getOperand(1))) {
+    StackGuardLoad = LHS;
+    FILoad = cast<LoadSDNode>(RHS);
+  } else...
[truncated]

@dominik-steenken
Copy link
Contributor Author

fixes #147976.

unsigned Size = (MI.getOpcode() == TargetOpcode::LOAD_STACK_GUARD)
? 6
: 0; // lg to load value
if (GuardType == "global")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why size 0 for the LG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is a case of clang-format introducing line breaks that change the perceived meaning of a comment. The comment was meant for the whole statement. Basically, the size of LOAD_STACK_GUARD[_ADDRESS] is computed by the size required for loading the stack guard's address (this depends on the guard's type), and then we need to either add 6 for the lg in case of LOAD_STACK_GUARD, or nothing in case of LOAD_STACK_GUARD_ADDRESS.

I'll move the comment one line up, that should make that clear.

#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous #include.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@dominik-steenken dominik-steenken force-pushed the global-stack-protector-s390x branch from 63cdc78 to 931e649 Compare December 5, 2025 09:25
@uweigand
Copy link
Member

uweigand commented Dec 5, 2025

The latest commit 931e649 seem to be broken?

Before reviewing the details, I still have more general questions on the approach. Combining the LOAD_STACK_GUARD into MOVE_STACK_GUARD and COMPARE_STACK_GUARD does make sense to me to ensure the stack canary never ends up in any register. [ As an aside, are there any uses of LOAD_STACK_GUARD remaining after that optimization? That would unfortunate - maybe we should rather error out if that happens? ]

However, with your current approach, the address of the stack canary not only end up in a register (this seems to be unavoidable), but it might actually end up spilled to a stack slot - the LOAD_STACK_GUARD_ADDRESS is created before register allocation, so it loads the address into a virtual register that might get spilled later. [ Note that given that fact, it seems pointless to move the implementation of loading the address all the way to the AsmPrinter. ]

Having checked some of the history of the stack protector implementation in GCC, it does appear that having the address of the stack canary spilled to the stack is a potential weakness that should be avoided. Given that, this new implementation is even worse in this respect then the current implementation - the current implementation does leak the canary into a register, but at least neither the canary nor its address can end up spilled to the stack.

I seem to recall that in an earlier attempt, you delayed the split between loading and using the address to post-RA, but that ran into register allocation problems with the scratch register? I think we might have to go back to that approach and fix those register allocation problems - sorry!

@github-actions
Copy link

github-actions bot commented Dec 5, 2025

🐧 Linux x64 Test Results

  • 193672 tests passed
  • 6238 tests skipped

✅ The build succeeded and all tests passed.

@dominik-steenken
Copy link
Contributor Author

dominik-steenken commented Dec 5, 2025

The latest commit 931e649 seem to be broken?

That is quite strange, debugging it now.

Before reviewing the details, I still have more general questions on the approach. Combining the LOAD_STACK_GUARD into MOVE_STACK_GUARD and COMPARE_STACK_GUARD does make sense to me to ensure the stack canary never ends up in any register. [ As an aside, are there any uses of LOAD_STACK_GUARD remaining after that optimization? That would unfortunate - maybe we should rather error out if that happens? ]

I don't think that uses of LOAD_STACK_GUARD would remain. We could assert out in the asm printer if we encounter one instead of trying to properly expand it.

However, with your current approach, the address of the stack canary not only end up in a register (this seems to be unavoidable), but it might actually end up spilled to a stack slot - the LOAD_STACK_GUARD_ADDRESS is created before register allocation, so it loads the address into a virtual register that might get spilled later. [ Note that given that fact, it seems pointless to move the implementation of loading the address all the way to the AsmPrinter. ]

Could we not use the same approach that the existing LOAD_STACK_GUARD uses in order to avoid the canary being spilled? I.e. could we not mark LOAD_STACK_GUARD_ADDRESS isReMaterializable, and do some further nudging with kill flags to ensure that the value isn't reused? Defining it as mayLoad and adding a volatile MachineMemOperand might also, help, right? Although it is not technically a load in all cases, and mayLoad = 1 also adds some code paths that assume that there is an operand attached to the instruction that is loaded "from", so there may be additional work there.

Having checked some of the history of the stack protector implementation in GCC, it does appear that having the address of the stack canary spilled to the stack is a potential weakness that should be avoided. Given that, this new implementation is even worse in this respect then the current implementation - the current implementation does leak the canary into a register, but at least neither the canary nor its address can end up spilled to the stack.

I understand but i am hoping that this can be avoided with the same approach that current code uses to avoid leaking the canary to the stack (see above).

I seem to recall that in an earlier attempt, you delayed the split between loading and using the address to post-RA, but that ran into register allocation problems with the scratch register? I think we might have to go back to that approach and fix those register allocation problems - sorry!

I did have issues with scratch registers, yes. From what i'm reading, i would have to use the RegisterScavenger to obtain an unused register, and I was having some trouble with that. I guess that could be taken up again.

@uweigand
Copy link
Member

uweigand commented Dec 5, 2025

Could we not use the same approach that the existing LOAD_STACK_GUARD uses in order to avoid the canary being spilled? I.e. could we not mark LOAD_STACK_GUARD_ADDRESS isReMaterializable, and do some further nudging with kill flags to ensure that the value isn't reused?

If that works reliably, I guess I'd be fine with it. We should have test cases to demonstrate that it is isn't spilled, however.

This commit allows `-mstack-protector-guard=global` for `s390x`.

It also adds a new arch-specific option `-mstack-protector-guard-record`,
analogous to `-mrecord-mcount`, which will cause `clang` to emit a
`__stack_protector_loc` section containing all the locations in the output
binary that load the stack guard address, for the purposes of later rewriting
of those loads by the kernel. This new option only works together with the
`global` stack protector.

In order to minimize exposure of the stack guard, both the storing of the
stack guard onto the stack, and the later comparison of that value against
the reference value, are handled via direct mem-to-mem instructions, those
being `mvc` and `clc`.

This is achieved by introducing two new pseudo instructions, `MOVE_STACK_GUARD`
and `COMPARE_STACK_GUARD`, which are inserted by the DAGCombiner after
SelectionDAG construction. These pseudos stick around throughout the entire
backend pipeline, and are lowered only in the AsmPrinter.

This commit also adds tests for both kinds of stack protectors (tls and global),
for the proper insertion of the pseudos, the proper emission of the,
`__stack_protector_loc` section, as well as the option compatibility checks
for the new options.
@dominik-steenken dominik-steenken force-pushed the global-stack-protector-s390x branch from 931e649 to a70200d Compare December 5, 2025 14:29
@JonPsson1
Copy link
Contributor

I wonder why the LOAD_STACK_GUARD_ADDRESS MI is needed at all? It seems the only purpose it has is to provide an ADDR64 register to emit an address in AsmPrinter. Wouldn't it be better to add an extra ADDR64 use operand to the MOVE_STACK_GUARD / COMPARE_STACK_GUARD pseudos, which should make a physreg allocated for it, which could then be used in emitLoadStackGuardAddress(). That way it would never be loaded into a vreg and risk being spilled.

@uweigand
Copy link
Member

uweigand commented Dec 5, 2025

I wonder why the LOAD_STACK_GUARD_ADDRESS MI is needed at all? It seems the only purpose it has is to provide an ADDR64 register to emit an address in AsmPrinter. Wouldn't it be better to add an extra ADDR64 use operand to the MOVE_STACK_GUARD / COMPARE_STACK_GUARD pseudos, which should make a physreg allocated for it, which could then be used in emitLoadStackGuardAddress(). That way it would never be loaded into a vreg and risk being spilled.

Yes, that's what Dominik tried. The problem was that if the stack slot operand to MOVE_STACK_GUARD goes out of displacement range and requires a temporary register, the compiler would sometimes allocate that same extra physregs for it. This shouldn't happen if the physreg is marked as earlyclobber, but it still did somehow ... This is what I meant above we'd have to go back to and investigate and fix.

@JonPsson1
Copy link
Contributor

ah, I was thinking about the LOAD_STACK_GUARD_ADDRESS preg being potentially spilled.

I guess I am not quite sure about the details about when the FI is known to go out of range. Isn't that decided during PrologEpilog insertion, after all the spill slots have been created, and FrameIndices are eliminated? If it is then out of range, a vreg is allocated and used to load the address, and the vreg is handled with the RegScavenger, but that would mean that some other value is being preemptied, but the address is now in the scavenged reg. So then it wouldn't be on the stack. Or maybe this is about when there already is e.g. a huge alloca before regalloc, so the LAY is somehow produced earlier? Maybe that should be avoided somehow so it never happens early so that it always falls back on the scavenger?

I am thinking that since this is in the entry block, and in the return block(s), maybe it wouldn't be such a big loss to allocate needed regs (one for the external canary address, one for the FI address when it is out of range) as extra use operands. It would at least be simpler: one pseudo for each move/compare to be expanded late, which would also be easier to verify the location of in relation to other memory accesses. This would make this more robust against all other optimizations as well, even though those MI flags should generally be enough. But maybe the scavenger solution is better in the sense that it is only used if actually needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:SystemZ clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema"

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants