@Artem-B
Member

@Artem-B Artem-B commented Jul 31, 2025

No description provided.

@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Artem Belevich (Artem-B)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/151604.diff

3 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/ELF.h (+3)
  • (modified) llvm/lib/Object/ELFObjectFile.cpp (+9)
  • (modified) llvm/tools/llvm-readobj/ELFDumper.cpp (+3-1)
diff --git a/llvm/include/llvm/BinaryFormat/ELF.h b/llvm/include/llvm/BinaryFormat/ELF.h
index ad35d7f05d5da..749971e354f66 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -973,7 +973,10 @@ enum : unsigned {
 
   // SM based processor values.
   EF_CUDA_SM100 = 0x6400,
+  EF_CUDA_SM101 = 0x6500,
+  EF_CUDA_SM103 = 0x6700,
   EF_CUDA_SM120 = 0x7800,
+  EF_CUDA_SM121 = 0x7900,
 
   // Set when using an accelerator variant like sm_100a.
   EF_CUDA_ACCELERATORS = 0x8,
diff --git a/llvm/lib/Object/ELFObjectFile.cpp b/llvm/lib/Object/ELFObjectFile.cpp
index 0919c6aad74f2..aff047c297cc2 100644
--- a/llvm/lib/Object/ELFObjectFile.cpp
+++ b/llvm/lib/Object/ELFObjectFile.cpp
@@ -688,11 +688,20 @@ StringRef ELFObjectFileBase::getNVPTXCPUName() const {
   case ELF::EF_CUDA_SM100:
     return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_100a"
                                                           : "sm_100";
+  case ELF::EF_CUDA_SM101:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_101a"
+                                                          : "sm_101";
+  case ELF::EF_CUDA_SM103:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_103a"
+                                                          : "sm_103";
 
   // Rubin architecture.
   case ELF::EF_CUDA_SM120:
     return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_120a"
                                                           : "sm_120";
+  case ELF::EF_CUDA_SM121:
+    return getPlatformFlags() & ELF::EF_CUDA_ACCELERATORS ? "sm_121a"
+                                                          : "sm_121";
   default:
     llvm_unreachable("Unknown EF_CUDA_SM value");
   }
diff --git a/llvm/tools/llvm-readobj/ELFDumper.cpp b/llvm/tools/llvm-readobj/ELFDumper.cpp
index 94ce38605f5c9..1321d594416c5 100644
--- a/llvm/tools/llvm-readobj/ELFDumper.cpp
+++ b/llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1683,7 +1683,9 @@ const EnumEntry<unsigned> ElfHeaderNVPTXFlags[] = {
     ENUM_ENT(EF_CUDA_SM75, "sm_75"),   ENUM_ENT(EF_CUDA_SM80, "sm_80"),
     ENUM_ENT(EF_CUDA_SM86, "sm_86"),   ENUM_ENT(EF_CUDA_SM87, "sm_87"),
     ENUM_ENT(EF_CUDA_SM89, "sm_89"),   ENUM_ENT(EF_CUDA_SM90, "sm_90"),
-    ENUM_ENT(EF_CUDA_SM100, "sm_100"), ENUM_ENT(EF_CUDA_SM120, "sm_120"),
+    ENUM_ENT(EF_CUDA_SM100, "sm_100"), ENUM_ENT(EF_CUDA_SM101, "sm_101"),
+    ENUM_ENT(EF_CUDA_SM103, "sm_103"), ENUM_ENT(EF_CUDA_SM120, "sm_120"),
+    ENUM_ENT(EF_CUDA_SM121, "sm_121"),
 };
 
 const EnumEntry<unsigned> ElfHeaderRISCVFlags[] = {

Contributor

@jhuber6 jhuber6 left a comment

LG, thanks.

@Artem-B
Member Author

Artem-B commented Jul 31, 2025

@jhuber6 The patch does not work correctly. It appears that those EF_CUDA_SM1xx enums are treated as bit flags somewhere. For sm_103a, I see llvm-readelf reporting sm_100, sm_101, and sm_103a:

% bin/llvm-readelf -h foo103
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 41 08 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            41
  ABI Version:                       8
  Type:                              EXEC (Executable file)
  Machine:                           NVIDIA CUDA architecture
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          2104 (bytes into file)
  Start of section headers:          1400 (bytes into file)
  Flags:                             0x600670A, sm_100, sm_101, sm_103a
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         11
  Section header string table index: 1
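
For reference, the Flags value above can be decoded by hand. The sketch below is a hypothetical helper, not an LLVM API. It assumes that the SM number is stored in bits 8..15 of e_flags, which is inferred only from the EF_CUDA_SM1xx values in this patch (0x64 = 100, 0x67 = 103, 0x78 = 120), and that the 0x8 bit is EF_CUDA_ACCELERATORS as defined in ELF.h.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not part of LLVM.
// Assumption: the SM number occupies bits 8..15 of e_flags, as suggested by
// EF_CUDA_SM100 = 0x6400, EF_CUDA_SM103 = 0x6700, EF_CUDA_SM120 = 0x7800.
std::string decodeSmName(uint32_t EFlags) {
  const uint32_t SmNumber = (EFlags >> 8) & 0xFF; // 0x67 is 103 in decimal
  const bool IsAccelerated = (EFlags & 0x8) != 0; // EF_CUDA_ACCELERATORS
  return "sm_" + std::to_string(SmNumber) + (IsAccelerated ? "a" : "");
}
```

Under those assumptions, decodeSmName(0x600670A) yields a single name, sm_103a, which is what the header should have reported.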

@jhuber6
Contributor

jhuber6 commented Jul 31, 2025

Hm, that's surprising. I'd say it's something in ELFDumper.

Contributor

@jhuber6 jhuber6 left a comment

Hm, I might've left a bug that accidentally worked.

I think this here needs to use the correct bitmask:

unsigned(ELF::EF_CUDA_SM));

@Artem-B
Member Author

Artem-B commented Jul 31, 2025

I think this here needs to use the correct bitmask

Yup. That was it. Fixed now.
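
A minimal sketch of why the wrong mask produced three names for one architecture. This is not the ELFDumper code; FlagName, SmNames, and matchFlags are hypothetical names, and both mask values used in the note below (0xFF as a mask that misses the SM field, 0xFF00 as one that covers it) are assumptions inferred from the enum values in this patch. The accelerator suffix is left out to keep the sketch short.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical table entry for illustration; not the LLVM EnumEntry type.
struct FlagName {
  uint32_t Value;
  const char *Name;
};

// Values copied from the ELF.h hunk in this patch.
const FlagName SmNames[] = {
    {0x6400, "sm_100"}, {0x6500, "sm_101"}, {0x6700, "sm_103"},
    {0x7800, "sm_120"}, {0x7900, "sm_121"},
};

// Entries that overlap EnumMask are compared for equality within the mask.
// Entries that do not overlap it fall back to independent bit-flag matching.
std::vector<std::string> matchFlags(uint32_t EFlags, uint32_t EnumMask) {
  std::vector<std::string> Matches;
  for (const FlagName &F : SmNames) {
    const bool IsEnum = (F.Value & EnumMask) != 0;
    const bool Hit = IsEnum ? (EFlags & EnumMask) == F.Value
                            : (EFlags & F.Value) == F.Value;
    if (Hit)
      Matches.push_back(F.Name);
  }
  return Matches;
}
```

With these assumptions, matchFlags(0x600670A, 0xFF) returns sm_100, sm_101, and sm_103, because 0x6400 and 0x6500 are bitwise subsets of 0x6700. matchFlags(0x600670A, 0xFF00) returns only sm_103.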

@Artem-B Artem-B merged commit 4e596fc into llvm:main Aug 1, 2025
9 checks passed
@Artem-B Artem-B deleted the sm103-elf branch August 1, 2025 00:21
@Artem-B
Member Author

Artem-B commented Aug 1, 2025

@jhuber6 Fun fact: the f architecture variant does not have any distinguishing marks in the ELF header. It looks like a plain un-suffixed arch. It does get packed as an f arch into the fatbin, but NVIDIA apparently forgot to reflect the f arch in cubin. :-/

Either that, or I'm missing something here.

@AlexMaclean would you happen to have an idea whether the cubin compiled for the f variant of the architectures is expected to look exactly like a cubin for the plain architecture, without suffix?

Is the idea that sm_101f allows (a subset of) the instructions from sm_101a but, unlike binaries built for the actual sm_101a, can also be executed on sm_103? In that sense, keeping the ELF marked as sm_101 sort of makes sense, as it would behave exactly the same way as sm_101, and the difference is only at the ptxas compilation level, in terms of which instructions are allowed. PTX that uses instructions available only in the f variants may not be compilable for newer GPUs, but for whatever instructions ptxas accepts, the binary will behave the same way for sm_101 and sm_101f.

@jhuber6
Contributor

jhuber6 commented Aug 1, 2025

@jhuber6 Fun fact: the f architecture variant does not have any distinguishing marks in the ELF header. It looks like a plain un-suffixed arch. It does get packed as an f arch into the fatbin, but NVIDIA apparently forgot to reflect the f arch in cubin. :-/

I've learned to stop questioning things when it comes to NVIDIA's binary decisions. Add it to gems like the PTX .section keyword only working in debug mode or weak linkage being implemented wrong.

@Artem-B
Member Author

Artem-B commented Aug 1, 2025

To be fair, it heavily depends on the particular team at NVIDIA. Anecdotally, the more exposed a team is to open source (or other kinds of external influence), the better things tend to work. NVIDIA's teams working on NVPTX, CCCL, and OpenXLA are great to work with. Oddities tend to surface in NVIDIA's black boxes that they never intended for external tinkering (e.g. the nvcc front-end, binary tools, some binary-only libraries). The problem is that there's usually no good communication channel to the owners of the components with those issues: there's still no public bug tracker of any kind for CUDA SDK components.

@AlexMaclean
Member

I've forwarded the question along internally, but this isn't an area I have much familiarity with.
