Skip to content

Conversation

@IgWod-IMG
Copy link
Contributor

By default, mlir-translate writes all output into a single file even when --split-input-file is used. This is not an issue for text files as they can be easily split with an output separator. However, this causes issues with binary SPIR-V modules.

Firstly, a binary file with multiple modules is not a valid SPIR-V, but will be created if multiple modules are specified in the same file and separated by "// -----". This does not cause issues with MLIR internal tools but does not work with SPIRV-Tools.

Secondly, splitting binary files after serialization is non-trivial, when compared to text files, so using an external tool is not desirable.

This patch adds a SPIR-V serialization option that write SPIR-V modules to separate files in addition to writing them to the mlir-translate output file. This is not the ideal solution and ideally mlir-translate would allow generating multiple output files when --split-input-file is used, however adding such functionality is again non-trival due to how processing and splitting is done: function handles writing to a single os that are passed around, and number of split buffers is not known ahead of time. As such a I propose to have a SPIR-V internal option that will dump modules to files in the form they can be processed by spirv-val. The behaviour of the new added argument may be confusing, but benefits from being internal to SPIR-V target.

Alternatively, we could expose the spirv option in mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp, and slice the output file on the SPIR-V magic number, and not keep the file generated by default by mlir-translate. This would be a bit cleaner in API sense, as it would not generate the additional file containing all modules together. However, it pushes SPIR-V specific code into the generic part of the mlir-translate and slicing is potentially more error prone that just writing a single module after it was serialized.

… files

By default, `mlir-translate` writes all output into a single file even
when `--split-input-file` is used. This is not an issue for text files
as they can be easily split with an output separator. However, this
causes issues with binary SPIR-V modules.

Firstly, a binary file with multiple modules is not a valid SPIR-V,
but will be created if multiple modules are specified in the same
file and separated by "// -----". This does not cause issues with
MLIR internal tools but does not work with SPIRV-Tools.

Secondly, splitting binary files after serialization is non-trivial,
when compared to text files, so using an external tool is not
desirable.

This patch adds a SPIR-V serialization option that write SPIR-V modules
to separate files in addition to writing them to the `mlir-translate`
output file. This is not the ideal solution and ideally `mlir-translate`
would allow generating multiple output files when `--split-input-file`
is used, however adding such functionality is again non-trival due to
how processing and splitting is done: function handles writing to
a single `os` that are passed around, and number of split buffers is not
known ahead of time. As such a I propose to have a SPIR-V internal option
that will dump modules to files in the form they can be processed by
`spirv-val`. The behaviour of the new added argument may be confusing,
but benefits from being internal to SPIR-V target.

Alternatively, we could expose the spirv option in `mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp`,
and slice the output file on the SPIR-V magic number, and not keep
the file generated by default by `mlir-translate`. This would be a bit
cleaner in API sense, as it would not generate the additional file
containing all modules together. However, it pushes SPIR-V specific
code into the generic part of the `mlir-translate` and slicing is
potentially more error prone that just writing a single module
after it was serialized.
@llvmbot
Copy link
Member

llvmbot commented Aug 8, 2025

@llvm/pr-subscribers-mlir-spirv

@llvm/pr-subscribers-mlir

Author: Igor Wodiany (IgWod-IMG)

Changes

By default, mlir-translate writes all output into a single file even when --split-input-file is used. This is not an issue for text files as they can be easily split with an output separator. However, this causes issues with binary SPIR-V modules.

Firstly, a binary file with multiple modules is not a valid SPIR-V, but will be created if multiple modules are specified in the same file and separated by "// -----". This does not cause issues with MLIR internal tools but does not work with SPIRV-Tools.

Secondly, splitting binary files after serialization is non-trivial, when compared to text files, so using an external tool is not desirable.

This patch adds a SPIR-V serialization option that write SPIR-V modules to separate files in addition to writing them to the mlir-translate output file. This is not the ideal solution and ideally mlir-translate would allow generating multiple output files when --split-input-file is used, however adding such functionality is again non-trival due to how processing and splitting is done: function handles writing to a single os that are passed around, and number of split buffers is not known ahead of time. As such a I propose to have a SPIR-V internal option that will dump modules to files in the form they can be processed by spirv-val. The behaviour of the new added argument may be confusing, but benefits from being internal to SPIR-V target.

Alternatively, we could expose the spirv option in mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp, and slice the output file on the SPIR-V magic number, and not keep the file generated by default by mlir-translate. This would be a bit cleaner in API sense, as it would not generate the additional file containing all modules together. However, it pushes SPIR-V specific code into the generic part of the mlir-translate and slicing is potentially more error prone that just writing a single module after it was serialized.


Full diff: https://github.com/llvm/llvm-project/pull/152678.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Target/SPIRV/Serialization.h (+8)
  • (modified) mlir/lib/Target/SPIRV/TranslateRegistration.cpp (+39-5)
diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h
index 225777e25d607..feb80b7d0970e 100644
--- a/mlir/include/mlir/Target/SPIRV/Serialization.h
+++ b/mlir/include/mlir/Target/SPIRV/Serialization.h
@@ -27,6 +27,14 @@ struct SerializationOptions {
   bool emitSymbolName = true;
   /// Whether to emit `OpLine` location information for SPIR-V ops.
   bool emitDebugInfo = false;
+  /// Whether to store a module to an additional file during
+  /// serialization. This is used to store the SPIR-V module to the
+  /// file in addition to writing it to `os` passed from the calling
+  /// tool. This saved file is later used for validation.
+  bool saveModuleForValidation = false;
+  /// A prefix prepended to the file used when `saveModuleForValidation`
+  /// is set to `true`.
+  std::string validationFilePrefix = "";
 };
 
 /// Serializes the given SPIR-V `module` and writes to `binary`. On failure,
diff --git a/mlir/lib/Target/SPIRV/TranslateRegistration.cpp b/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
index ac338d555e320..5272c63db6831 100644
--- a/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
+++ b/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
@@ -23,6 +23,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/SourceMgr.h"
+#include "llvm/Support/ToolOutputFile.h"
 
 using namespace mlir;
 
@@ -76,24 +77,57 @@ void registerFromSPIRVTranslation() {
 // Serialization registration
 //===----------------------------------------------------------------------===//
 
-static LogicalResult serializeModule(spirv::ModuleOp module,
-                                     raw_ostream &output) {
+// Static variable is probably not ideal, but it lets us have unique files names
+// without taking additional parameters from `mlir-translate`.
+static size_t validationFileCounter = 0;
+
+static LogicalResult
+serializeModule(spirv::ModuleOp module, raw_ostream &output,
+                const spirv::SerializationOptions &options) {
+
   SmallVector<uint32_t, 0> binary;
   if (failed(spirv::serialize(module, binary)))
     return failure();
 
-  output.write(reinterpret_cast<char *>(binary.data()),
-               binary.size() * sizeof(uint32_t));
+  size_t sizeInBytes = binary.size() * sizeof(uint32_t);
+
+  output.write(reinterpret_cast<char *>(binary.data()), sizeInBytes);
+
+  if (options.saveModuleForValidation) {
+    std::string errorMessage;
+    std::string filename =
+        options.validationFilePrefix + std::to_string(validationFileCounter++);
+    auto validationOutput = openOutputFile(filename, &errorMessage);
+    if (!validationOutput) {
+      llvm::errs() << errorMessage << "\n";
+      return failure();
+    }
+    validationOutput->os().write(reinterpret_cast<char *>(binary.data()),
+                                 sizeInBytes);
+    validationOutput->keep();
+  }
 
   return mlir::success();
 }
 
 namespace mlir {
 void registerToSPIRVTranslation() {
+  static llvm::cl::opt<std::string> validationFilesPrefix(
+      "spirv-save-validation-files-with-prefix",
+      llvm::cl::desc(
+          "When non-empty string is passed each serialized SPIR-V module is "
+          "saved to an additional file that starts with the given prefix. This "
+          "is used to generate separate binaries for validation, where "
+          "`--split-input-file` normally combines all outputs into one. The "
+          "one combined output (`-o`) is still written."),
+      llvm::cl::init(""));
+
   TranslateFromMLIRRegistration toBinary(
       "serialize-spirv", "serialize SPIR-V dialect",
       [](spirv::ModuleOp module, raw_ostream &output) {
-        return serializeModule(module, output);
+        return serializeModule(module, output,
+                               {true, false, (validationFilesPrefix != ""),
+                                validationFilesPrefix});
       },
       [](DialectRegistry &registry) {
         registry.insert<spirv::SPIRVDialect>();

@IgWod-IMG
Copy link
Contributor Author

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a global variable counter is inherently unsafe

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a lit test so that this code gets executed? Even if we don't use spirv-val, we can at least make sure the expected number of files is produced (using requires: shell and ls dir | wc -l)

@IgWod-IMG
Copy link
Contributor Author

IgWod-IMG commented Aug 11, 2025

I agree the global counter solution was not great. I wanted to keep it simple, but in the hindsight, I realised how many issues I haven't considered. I have updated the code to use llvm::sys::fs::createUniqueFile (link) that I have only found. It is thread safe and generates a unique name. I've just pushed a relevant change, and I'll work on adding a test now.

@IgWod-IMG
Copy link
Contributor Author

I have added a test and an example in the comment. Please let me know if there is anything else. Once that’s merged, I’ll start updating Target tests to use it for validation checks.

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just a few small issues

@IgWod-IMG
Copy link
Contributor Author

@mahabadm @davidegrohmann Any more comments or can I merge it once green?

// RUN: rm -rf %t
// RUN: mkdir %t && mlir-translate --serialize-spirv --no-implicit-module \
// RUN: --split-input-file --spirv-save-validation-files-with-prefix=%t/foo %s \
// RUN: && ls %t | wc -l | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is wc -l here intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ls prints all the files and wc -l counts how many files were printed, then // CHECK: 4 checks that the number printed by wc is 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok all good then

@IgWod-IMG IgWod-IMG enabled auto-merge (squash) August 12, 2025 12:48
@IgWod-IMG IgWod-IMG merged commit 0f346a4 into llvm:main Aug 12, 2025
10 checks passed
@IgWod-IMG IgWod-IMG deleted the img_serialize-into-files branch August 12, 2025 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants