[ELF] handle new NVIDIA GPU variants. #151604

Artem-B · 2025-07-31T22:09:14Z

No description provided.

llvmbot · 2025-07-31T22:09:46Z

@llvm/pr-subscribers-llvm-binary-utilities

Author: Artem Belevich (Artem-B)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/151604.diff

3 Files Affected:

(modified) llvm/include/llvm/BinaryFormat/ELF.h (+3)
(modified) llvm/lib/Object/ELFObjectFile.cpp (+9)
(modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+3-1)

diff --git a/llvm/include/llvm/BinaryFormat/ELF.h b/llvm/include/llvm/BinaryFormat/ELF.h
index ad35d7f05d5da..749971e354f66 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -973,7 +973,10 @@ enum : unsigned {
 
   // SM based processor values.
   EF_CUDA_SM100 = 0x6400,
+  EF_CUDA_SM101 = 0x6500,
+  EF_CUDA_SM103 = 0x6700,
   EF_CUDA_SM120 = 0x7800,
+  EF_CUDA_SM121 = 0x7900,
 
   // Set when using an accelerator variant like sm_100a.
   EF_CUDA_ACCELERATORS = 0x8,
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index 0919c6aad74f2..aff047c297cc2 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -688,11 +688,20 @@ StringRef ELFObjectFileBase::getNVPTXCPUName() const {
   case ELF::EF_CUDA_SM100:
     return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_100a"
                                                           : "sm_100";
+  case ELF::EF_CUDA_SM101:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_101a"
+                                                          : "sm_101";
+  case ELF::EF_CUDA_SM103:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_103a"
+                                                          : "sm_103";
 
   // Rubin architecture.
   case ELF::EF_CUDA_SM120:
     return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_120a"
                                                           : "sm_120";
+  case ELF::EF_CUDA_SM121:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_121a"
+                                                          : "sm_121";
   default:
     llvm_unreachable("Unknown EF_CUDA_SM value");
   }
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index 94ce38605f5c9..1321d594416c5 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1683,7 +1683,9 @@ const EnumEntry<unsigned> ElfHeaderNVPTXFlags[] = {
     ENUM_ENT(EF_CUDA_SM75, "sm_75"),   ENUM_ENT(EF_CUDA_SM80, "sm_80"),
     ENUM_ENT(EF_CUDA_SM86, "sm_86"),   ENUM_ENT(EF_CUDA_SM87, "sm_87"),
     ENUM_ENT(EF_CUDA_SM89, "sm_89"),   ENUM_ENT(EF_CUDA_SM90, "sm_90"),
-    ENUM_ENT(EF_CUDA_SM100, "sm_100"), ENUM_ENT(EF_CUDA_SM120, "sm_120"),
+    ENUM_ENT(EF_CUDA_SM100, "sm_100"), ENUM_ENT(EF_CUDA_SM101, "sm_101"),
+    ENUM_ENT(EF_CUDA_SM103, "sm_103"), ENUM_ENT(EF_CUDA_SM120, "sm_120"),
+    ENUM_ENT(EF_CUDA_SM121, "sm_121"),
 };
 
 const EnumEntry<unsigned> ElfHeaderRISCVFlags[] = {
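
The values in the ELF.h hunk follow a visible pattern: shifting each one right by eight bits gives the decimal SM number (0x67 is 103, 0x79 is 121), while the `a` suffix is carried separately by the 0x8 bit. The C++ sketch below decodes a flags word under that reading. `kSmMask`, `kSmShift`, `smNumber`, and `isAccelerated` are hypothetical names, not LLVM identifiers, and the 0xFF00 field width is an assumption inferred from the values in the diff rather than something the patch states.

```cpp
#include <cstdint>

// Assumption: the SM value occupies bits 8..15 of e_flags. This is inferred
// from the enum values in the diff (0x6400 ... 0x7900), not stated by it.
constexpr std::uint32_t kSmMask = 0xFF00;
constexpr unsigned kSmShift = 8;

// Accelerator bit; the value is EF_CUDA_ACCELERATORS from the diff.
constexpr std::uint32_t kAccelerators = 0x8;

// Hypothetical helper: extract the decimal SM number from a flags word.
constexpr unsigned smNumber(std::uint32_t flags) {
  return (flags & kSmMask) >> kSmShift;
}

// Hypothetical helper: true when the accelerator ("a" suffix) bit is set.
constexpr bool isAccelerated(std::uint32_t flags) {
  return (flags & kAccelerators) != 0;
}

// Each enum value in the hunk decodes to the SM number in its name.
static_assert(smNumber(0x6400) == 100, "EF_CUDA_SM100");
static_assert(smNumber(0x6500) == 101, "EF_CUDA_SM101");
static_assert(smNumber(0x6700) == 103, "EF_CUDA_SM103");
static_assert(smNumber(0x7800) == 120, "EF_CUDA_SM120");
static_assert(smNumber(0x7900) == 121, "EF_CUDA_SM121");
```

Under these assumptions, a flags word such as `0x6700 | 0x8` decodes to SM number 103 with the accelerator bit set, which is the combination the `getNVPTXCPUName` switch reports as "sm_103a".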

jhuber6

LG, thanks.

Artem-B · 2025-07-31T22:11:39Z

@jhuber6 The patch does not work correctly. It appears that those EF_CUDA_SM1xx enums are treated as bitfields somewhere. For sm_103a, I see readelf reporting sm_100, sm_101, and sm_103a:

% bin/llvm-readelf -h foo103
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 41 08 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            41
  ABI Version:                       8
  Type:                              EXEC (Executable file)
  Machine:                           NVIDIA CUDA architecture
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          2104 (bytes into file)
  Start of section headers:          1400 (bytes into file)
  Flags:                             0x600670A, sm_100, sm_101, sm_103a
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         11
  Section header string table index: 1

jhuber6 · 2025-07-31T22:14:02Z

Hm, that's surprising. It's something in ELFDumper, I'd say.

jhuber6

Hm, I might've left a bug that accidentally worked.

I think this here needs to use the correct bitmask:

llvm-project/llvm/tools/llvm-readobj/ELFDumper.cpp, line 3663 in d3a9cde:

    unsigned(ELF::EF_CUDA_SM));

Artem-B · 2025-07-31T22:40:57Z

> I think this here needs to use the correct bitmask

Yup. That was it. Fixed now.
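
The spurious `sm_100, sm_101` entries in the readelf output are what a bit-flag comparison produces when it is applied to enum-like values: every bit of 0x6400 and 0x6500 is also set in 0x6700. The C++ sketch below is a simplified model of the two comparison styles, not the ELFDumper code itself. The helper names and the 0xFF00 mask are assumptions chosen to fit the values in the diff.

```cpp
#include <cstdint>

// Values copied from the diff; the names are local stand-ins, not LLVM's.
constexpr std::uint32_t kSm100 = 0x6400;
constexpr std::uint32_t kSm101 = 0x6500;
constexpr std::uint32_t kSm103 = 0x6700;
constexpr std::uint32_t kSm120 = 0x7800;

// Assumed mask covering the byte that holds the SM value.
constexpr std::uint32_t kSmMask = 0xFF00;

// Bit-flag test: true when every bit of `entry` is set in `flags`.
constexpr bool matchesAsBitFlag(std::uint32_t flags, std::uint32_t entry) {
  return (flags & entry) == entry;
}

// Enum test: true only when the masked field equals `entry` exactly.
constexpr bool matchesAsEnum(std::uint32_t flags, std::uint32_t entry) {
  return (flags & kSmMask) == entry;
}

// e_flags value from the readelf output in this thread.
constexpr std::uint32_t kFlags = 0x600670A;

// Bit-flag matching reproduces the extra sm_100 and sm_101 hits.
static_assert(matchesAsBitFlag(kFlags, kSm100), "spurious sm_100");
static_assert(matchesAsBitFlag(kFlags, kSm101), "spurious sm_101");
static_assert(matchesAsBitFlag(kFlags, kSm103), "expected sm_103");
static_assert(!matchesAsBitFlag(kFlags, kSm120), "no sm_120 hit");

// Masked equality matches the sm_103 entry only.
static_assert(!matchesAsEnum(kFlags, kSm100), "sm_100 rejected");
static_assert(!matchesAsEnum(kFlags, kSm101), "sm_101 rejected");
static_assert(matchesAsEnum(kFlags, kSm103), "sm_103 accepted");
```

With the masked comparison only the 0x6700 entry matches, which is consistent with the single sm_103 name the header should show once the field is treated as an enum.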

Artem-B · 2025-08-01T00:34:21Z

@jhuber6 Fun fact: the f architecture variant does not have any distinguishing marks in the ELF header. It looks like a plain un-suffixed arch. It does get packed as an f arch into the fatbin, but NVIDIA apparently forgot to reflect the f arch in the cubin. :-/

Either that, or I'm missing something here.

@AlexMaclean would you happen to have an idea whether the cubin compiled for the f variant of the architectures is expected to look exactly like a cubin for the plain architecture, without suffix?

Is the idea that 'sm_101f' allows (a subset of) the instructions from sm_101a but, unlike binaries built for the actual sm_101a, can also be executed on sm_103? In that sense, keeping the ELF marked as sm_101 sort of makes sense: it would behave exactly the same way as sm_101, and the difference exists only at the ptxas compilation level, in which instructions are allowed. PTX that uses instructions available only in the f variants may not be compilable for newer GPUs, but for whatever instructions ptxas accepts, the binary will behave the same way for sm_101 and sm_101f.

jhuber6 · 2025-08-01T01:59:04Z

> @jhuber6 Fun fact: the f architecture variant does not have any distinguishing marks in the ELF header. It looks like a plain un-suffixed arch. It does get packed as an f arch into the fatbin, but NVIDIA apparently forgot to reflect the f arch in the cubin. :-/

I've learned to stop questioning things when it comes to NVIDIA's binary decisions. Add it to gems like the PTX .section keyword only working in debug mode and weak linkage being implemented wrong.

Artem-B · 2025-08-01T18:21:37Z

To be fair, it heavily depends on the particular team at NVIDIA. Anecdotally, the more exposed a team is to open source (or other kinds of external influence), the better things tend to work. NVIDIA's teams working on NVPTX, CCCL, and OpenXLA are great to work with. Oddities tend to surface in NVIDIA's black boxes that were never intended for external tinkering (e.g. the nvcc front-end, binary tools, some binary-only libraries). The problem is that there's usually no good communication channel to the owners of the components with those issues: there's still no public bug tracker of any kind for CUDA SDK components.

AlexMaclean · 2025-08-01T20:02:27Z

I've forwarded the question along internally but this isn't an area I have much familiarity with.

Artem-B requested a review from jhuber6 July 31, 2025 22:09

llvmbot added the llvm:binary-utilities label Jul 31, 2025

jhuber6 approved these changes Jul 31, 2025

jhuber6 reviewed Jul 31, 2025

Artem-B added 3 commits July 31, 2025 15:49

[ELF] handle new NVIDIA GPU variants.

f396d5a

correctly handle new GPU values

ebfa740

clang-format

0981290

Artem-B force-pushed the sm103-elf branch from d9e2c71 to 0981290 July 31, 2025 22:49

jhuber6 approved these changes Jul 31, 2025

Artem-B merged commit 4e596fc into llvm:main Aug 1, 2025
9 checks passed

Artem-B deleted the sm103-elf branch August 1, 2025 00:21
