Skip to content

Conversation

@modiking
Copy link
Contributor

PTX source files are expected to only contain ASCII text (https://docs.nvidia.com/cuda/parallel-thread-execution/#source-format).

ptxas has so far not enforced this but is moving towards doing so. This revealed a problem where the null terminator is getting printed out in the output file in MLIR path when outputting ptx directly. Remove the extraneous null.

@llvmbot
Copy link
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-mlir-gpu
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-llvm

Author: None (modiking)

Changes

PTX source files are expected to only contain ASCII text (https://docs.nvidia.com/cuda/parallel-thread-execution/#source-format).

ptxas has so far not enforced this but is moving towards doing so. This revealed a problem where the null terminator is getting printed out in the output file in MLIR path when outputting ptx directly. Remove the extraneous null.


Full diff: https://github.com/llvm/llvm-project/pull/133019.diff

2 Files Affected:

  • (modified) mlir/lib/Target/LLVM/NVVM/Target.cpp (+2-5)
  • (modified) mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp (+4-1)
diff --git a/mlir/lib/Target/LLVM/NVVM/Target.cpp b/mlir/lib/Target/LLVM/NVVM/Target.cpp
index cccb2c276ae37..ebdc993a93caf 100644
--- a/mlir/lib/Target/LLVM/NVVM/Target.cpp
+++ b/mlir/lib/Target/LLVM/NVVM/Target.cpp
@@ -722,11 +722,8 @@ NVPTXSerializer::moduleToObject(llvm::Module &llvmModule) {
 
   // Return PTX if the compilation target is `assembly`.
   if (targetOptions.getCompilationTarget() ==
-      gpu::CompilationTarget::Assembly) {
-    // Make sure to include the null terminator.
-    StringRef bin(serializedISA->c_str(), serializedISA->size() + 1);
-    return SmallVector<char, 0>(bin.begin(), bin.end());
-  }
+      gpu::CompilationTarget::Assembly)
+    return SmallVector<char, 0>(serializedISA->begin(), serializedISA->end());
 
   std::optional<SmallVector<char, 0>> result;
   moduleToObjectTimer.startTimer();
diff --git a/mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp b/mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
index eabfd1c4d32eb..1231019e5dadf 100644
--- a/mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
+++ b/mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
@@ -129,7 +129,10 @@ TEST_F(MLIRTargetLLVMNVVM, SKIP_WITHOUT_NVPTX(SerializeNVVMToPTX)) {
     ASSERT_TRUE(!object->empty());
 
     ASSERT_TRUE(
-        StringRef(object->data(), object->size()).contains("nvvm_kernel"));
+        StringRef(object->data(), object->size()).contains("nvvm_kernel"));     
+    ASSERT_TRUE(
+        StringRef(object->data(), object->size()).count('\0') == 0);
+
   }
 }
 

@github-actions
Copy link

github-actions bot commented Mar 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@modiking
Copy link
Contributor Author

modiking commented Mar 25, 2025

Looking around I couldn't find if there was a convention to what should be outputted by moduleToObject. I did see that the ROCDL target omits the null terminator (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVM/ROCDL/Target.cpp#L440). @fabianmcg do you remember if there was a specific reason the null terminator got added in 5093413 and only for NVPTX?

@fabianmcg
Copy link
Contributor

fabianmcg commented Mar 26, 2025

Looking around I couldn't find if there was a convention to what should be outputted by moduleToObject. I did see that the ROCDL target omits the null terminator (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVM/ROCDL/Target.cpp#L440).

They don't have it because there's no JIT path for ROCm.

@fabianmcg do you remember if there was a specific reason the null terminator got added in 5093413 and only for NVPTX?

Yes, that was added for the ptx-JIT execution path, see the driver function https://github.com/llvm/llvm-project/blob/main/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp#L143-L144
Which according to the CUDA driver docs it requires a null-terminated string.

The image may be a cubin or fatbin as output by nvcc, or a NULL-terminated PTX, either as output by nvcc or hand-written.

Also, FWIW \0 is ASCII https://en.wikipedia.org/wiki/ASCII

@joker-eph
Copy link
Collaborator

If this is about ptxas then the fix may rather be at the point where we call ptxas. We have to write this down to a file somehow to pass it to ptxas and this is where we write an extra null character to the file.

Here is how we prepare the input to ptxas:

llvm::raw_fd_ostream ptxStream(ptxFile->first, ec);
if (ec) {
emitError(loc) << "Couldn't open the file: `" << ptxFile->first
<< "`, error message: " << ec.message();
return std::nullopt;
}
ptxStream << ptxCode;
if (ptxStream.has_error()) {
emitError(loc) << "An error occurred while writing the PTX to: `"
<< ptxFile->first << "`.";
return std::nullopt;
}
ptxStream.flush();
}

It's not clear to me why there would be the extra null character in the file though?

@modiking
Copy link
Contributor Author

modiking commented Mar 29, 2025

Looking around I couldn't find if there was a convention to what should be outputted by moduleToObject. I did see that the ROCDL target omits the null terminator (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVM/ROCDL/Target.cpp#L440).

They don't have it because there's no JIT path for ROCm.

@fabianmcg do you remember if there was a specific reason the null terminator got added in 5093413 and only for NVPTX?

Yes, that was added for the ptx-JIT execution path, see the driver function https://github.com/llvm/llvm-project/blob/main/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp#L143-L144 Which according to the CUDA driver docs it requires a null-terminated string.

That's good to know, thanks! I'm thinking then if this is specific to the JIT execution path it may be better to add the null there instead of to the ptx under gpu::CompilationTarget::Assembly. WDYT?

The image may be a cubin or fatbin as output by nvcc, or a NULL-terminated PTX, either as output by nvcc or hand-written.

Also, FWIW \0 is ASCII https://en.wikipedia.org/wiki/ASCII

That's true 😅. Nevertheless ptxas team does want to exclude '\0' as well so a way to do that would still be great. I'll make sure the documentation gets updated to reflect this as well.

If this is about ptxas then the fix may rather be at the point where we call ptxas. We have to write this down to a file somehow to pass it to ptxas and this is where we write an extra null character to the file.

On that path we're in gpu::CompilationTarget::Binary or gpu::CompilationTarget::Fatbin and there's no extra null character written in the file. serializedISA has no null character at the end but one gets added under the gpu::CompilationTarget::Assembly path:

// Return PTX if the compilation target is `assembly`.
if (targetOptions.getCompilationTarget() ==
gpu::CompilationTarget::Assembly) {
// Make sure to include the null terminator.
StringRef bin(serializedISA->c_str(), serializedISA->size() + 1);
return SmallVector<char, 0>(bin.begin(), bin.end());
}
std::optional<SmallVector<char, 0>> result;
moduleToObjectTimer.startTimer();
// Compile to binary.
#if MLIR_ENABLE_NVPTXCOMPILER
result = compileToBinaryNVPTX(*serializedISA);
#else
result = compileToBinary(*serializedISA);
#endif // MLIR_ENABLE_NVPTXCOMPILER

On the specific path that's causing issues, gpu::CompilationTarget::Assembly is being requested and then the contents of the buffer is dumped to a file. This file would contain a null character and then when passed to 12.8+ ptxas causes an error.

@fabianmcg
Copy link
Contributor

fabianmcg commented Mar 29, 2025

That's good to know, thanks! I'm thinking then if this is specific to the JIT execution path it may be better to add the null there instead of to the ptx under gpu::CompilationTarget::Assembly. WDYT?

I think it's cheaper to just not print the terminator, rather than having to copy the string to add one for the JIT path, you can do that with:

fileStream << StringRef(ptxData).drop_back();
// Or
fileStream << StringRef(ptxData).consume_back('\0');

See Mehdi's comment as well.

But I'm ok with either option.

@joker-eph
Copy link
Collaborator

On that path we're in gpu::CompilationTarget::Binary or gpu::CompilationTarget::Fatbin and there's no extra null character written in the file. serializedISA has no null character at the end but one gets added under the gpu::CompilationTarget::Assembly path:

I don't follow what you're describing.
Can you provide a reproducer for the bug and a backtrace of the point where the error occurs?

@joker-eph
Copy link
Collaborator

I dug a bit more to see how things are fitting.

The moduleToObject interface is returning a "bag of bytes" (as materialized by the use of SmallVector<char, 0> and not any *String* class), as such there shouldn't be an expectation of null termination in the output of moduleToObject.

When the output is embedded in the IR, we will always add a null terminator since we store it as a StringAttr, but with the correct size (that is not including the null terminator): https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Support/StorageUniquer.h#L112-L114

However we currently have also a null terminator within the string:

  gpu.binary @kernel_module1  [#gpu.object<#nvvm.target<chip = "sm_70">, properties = {O = 2 : i32}, assembly = "//\0A// Generated by NVIDIA NVVM Compiler\0A//\0A// Compiler Build ID: UNKNOWN\0A// UNKNOWN\0A// Based on LLVM 20.0.0git\0A//\0A\0A.version 6.0\0A.target sm_70\0A.address_size 64\0A\0A\09// .globl\09kernel\0A\0A.visible .func kernel()\0A{\0A\0A\0A\09ret;\0A\0A}\0A\00">]

See the \00 at the end of the string.

So this change LGTM.

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but can you rebase to retrigger CI?

@modiking
Copy link
Contributor Author

modiking commented Apr 3, 2025

I dug a bit more to see how things are fitting.

The moduleToObject interface is returning a "bag of bytes" (as materialized by the use of SmallVector<char, 0> and not any *String* class), as such there shouldn't be an expectation of null termination in the output of moduleToObject.

When the output is embedded in the IR, we will always add a null terminator since we store it as a StringAttr, but with the correct size (that is not including the null terminator): https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Support/StorageUniquer.h#L112-L114

However we currently have also a null terminator within the string:

  gpu.binary @kernel_module1  [#gpu.object<#nvvm.target<chip = "sm_70">, properties = {O = 2 : i32}, assembly = "//\0A// Generated by NVIDIA NVVM Compiler\0A//\0A// Compiler Build ID: UNKNOWN\0A// UNKNOWN\0A// Based on LLVM 20.0.0git\0A//\0A\0A.version 6.0\0A.target sm_70\0A.address_size 64\0A\0A\09// .globl\09kernel\0A\0A.visible .func kernel()\0A{\0A\0A\0A\09ret;\0A\0A}\0A\00">]

See the \00 at the end of the string.

So this change LGTM.

Thanks for taking a look!

I'm double checking that the integration tests in JIT mode pass with the change and I'm noticing that:

/data/mmo/llvm-project/mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA$ /data/mmo/llvm-project/build-jit-mlir/bin/mlir-runner    --shared-libs=/data/mmo/llvm-project/build-jit-mlir/lib/libmlir_cuda_runtime.so    --shared-libs=/data/mmo/llvm-project/build-jit-mlir/lib/libmlir_c_runner_utils.so    --e main --entry-point-result
=void out.ll --dump-object-file
JIT compilation failed with: 'ptxas application ptx input, line 147; fatal   : Parsing error near 'kernel0': syntax error
ptxas fatal   : Ptx assembly aborted due to errors'

Checking out the dumped object file I'm seeing that when the assembly is embedded into the JIT program it doesn't automatically get a null character at the end:

/data/mmo/llvm-project/mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA$ readelf -x .lrodata out.ll.o | tai
l
  0x00000f50 09257264 36382c20 25726436 382c2031 .%rd68, %rd68, 1
  0x00000f60 363b0a09 6164642e 73363420 09257264 6;..add.s64 .%rd
  0x00000f70 36372c20 25726436 372c2031 363b0a09 67, %rd67, 16;..
  0x00000f80 73657470 2e6c742e 73363420 09257036 setp.lt.s64 .%p6
  0x00000f90 2c202572 6436392c 20257264 31333b0a , %rd69, %rd13;.
  0x00000fa0 09402570 36206272 61200924 4c5f5f42 .@%p6 bra .$L__B
  0x00000fb0 42305f37 3b0a0962 72612e75 6e692009 B0_7;..bra.uni .
  0x00000fc0 244c5f5f 4242305f 383b0a24 4c5f5f42 $L__BB0_8;.$L__B
  0x00000fd0 42305f39 3a0a0972 65743b0a 0a7d0a   B0_9:..ret;..}.

So when it gets embedded in the binary we don't get a null character. Fortunately when this gets embedded as a llvm::ConstantDataArray we can specify whether we want it null-terminated or not so I made a change there. I think it's not any more invasive than currently always embedding it but take a look and LMK.

@modiking modiking force-pushed the nvptx_null_terminator branch from abdafae to c2dad87 Compare April 3, 2025 06:03
Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@modiking modiking merged commit 9f2feeb into llvm:main Apr 3, 2025
11 checks passed
@modiking
Copy link
Contributor Author

modiking commented Apr 3, 2025

Thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants