[mlir][spirv] Enable serializer to write SPIR-V modules into separate files #152678

IgWod-IMG · 2025-08-08T09:40:47Z

By default, mlir-translate writes all output into a single file even when --split-input-file is used. This is not an issue for text files as they can be easily split with an output separator. However, this causes issues with binary SPIR-V modules.

Firstly, a binary file with multiple modules is not valid SPIR-V, but one will be created if multiple modules are specified in the same file and separated by "// -----". This does not cause issues with MLIR internal tools but does not work with SPIRV-Tools.

Secondly, splitting binary files after serialization is non-trivial compared to text files, so using an external tool is not desirable.

This patch adds a SPIR-V serialization option that writes SPIR-V modules to separate files in addition to writing them to the mlir-translate output file. This is not the ideal solution: ideally, mlir-translate would allow generating multiple output files when --split-input-file is used. However, adding such functionality is again non-trivial due to how processing and splitting are done: the functions write to a single os that is passed around, and the number of split buffers is not known ahead of time. As such, I propose a SPIR-V internal option that dumps modules to files in a form that can be processed by spirv-val. The behaviour of the newly added argument may be confusing, but it benefits from being internal to the SPIR-V target.

Alternatively, we could expose the SPIR-V option in mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp, slice the output file on the SPIR-V magic number, and not keep the file generated by default by mlir-translate. This would be a bit cleaner in the API sense, as it would not generate the additional file containing all modules together. However, it pushes SPIR-V-specific code into the generic part of mlir-translate, and slicing is potentially more error-prone than just writing a single module after it has been serialized.
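
To illustrate why slicing on the magic number is the riskier alternative, here is a minimal sketch of what such a splitter might look like. It is not part of this patch; `splitOnMagic` and `kSpirvMagic` are hypothetical names, and the sketch assumes the concatenated output has already been read into host-endian 32-bit words.

```cpp
#include <cstdint>
#include <vector>

// SPIR-V magic number: the first word of every SPIR-V module header.
constexpr uint32_t kSpirvMagic = 0x07230203;

// Hypothetical helper (not in the patch): naively starts a new module at
// every word equal to the magic number. Assumes host-endian words.
std::vector<std::vector<uint32_t>>
splitOnMagic(const std::vector<uint32_t> &words) {
  std::vector<std::vector<uint32_t>> modules;
  for (uint32_t word : words) {
    // Open a new module on a magic word (or for leading words, if any).
    if (word == kSpirvMagic || modules.empty())
      modules.emplace_back();
    modules.back().push_back(word);
  }
  return modules;
}
```

The weakness is visible in the condition: any operand word that happens to equal `0x07230203`, for example a literal constant, would trigger a false split, whereas writing each module right after it is serialized needs no such guess.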

llvmbot · 2025-08-08T09:41:20Z

@llvm/pr-subscribers-mlir-spirv

@llvm/pr-subscribers-mlir

Author: Igor Wodiany (IgWod-IMG)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/152678.diff

2 Files Affected:

(modified) mlir/include/mlir/Target/SPIRV/Serialization.h (+8)
(modified) mlir/lib/Target/SPIRV/TranslateRegistration.cpp (+39-5)

diff --git a/mlir/include/mlir/Target/SPIRV/Serialization.h b/mlir/include/mlir/Target/SPIRV/Serialization.h
index 225777e25d607..feb80b7d0970e 100644
--- a/mlir/include/mlir/Target/SPIRV/Serialization.h
+++ b/mlir/include/mlir/Target/SPIRV/Serialization.h
@@ -27,6 +27,14 @@ struct SerializationOptions {
   bool emitSymbolName = true;
   /// Whether to emit `OpLine` location information for SPIR-V ops.
   bool emitDebugInfo = false;
+  /// Whether to store a module to an additional file during
+  /// serialization. This is used to store the SPIR-V module to the
+  /// file in addition to writing it to `os` passed from the calling
+  /// tool. This saved file is later used for validation.
+  bool saveModuleForValidation = false;
+  /// A prefix prepended to the file used when `saveModuleForValidation`
+  /// is set to `true`.
+  std::string validationFilePrefix = "";
 };
 
 /// Serializes the given SPIR-V `module` and writes to `binary`. On failure,
diff --git a/mlir/lib/Target/SPIRV/TranslateRegistration.cpp b/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
index ac338d555e320..5272c63db6831 100644
--- a/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
+++ b/mlir/lib/Target/SPIRV/TranslateRegistration.cpp
@@ -23,6 +23,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/SourceMgr.h"
+#include "llvm/Support/ToolOutputFile.h"
 
 using namespace mlir;
 
@@ -76,24 +77,57 @@ void registerFromSPIRVTranslation() {
 // Serialization registration
 //===----------------------------------------------------------------------===//
 
-static LogicalResult serializeModule(spirv::ModuleOp module,
-                                     raw_ostream &output) {
+// Static variable is probably not ideal, but it lets us have unique files names
+// without taking additional parameters from `mlir-translate`.
+static size_t validationFileCounter = 0;
+
+static LogicalResult
+serializeModule(spirv::ModuleOp module, raw_ostream &output,
+                const spirv::SerializationOptions &options) {
+
   SmallVector<uint32_t, 0> binary;
   if (failed(spirv::serialize(module, binary)))
     return failure();
 
-  output.write(reinterpret_cast<char *>(binary.data()),
-               binary.size() * sizeof(uint32_t));
+  size_t sizeInBytes = binary.size() * sizeof(uint32_t);
+
+  output.write(reinterpret_cast<char *>(binary.data()), sizeInBytes);
+
+  if (options.saveModuleForValidation) {
+    std::string errorMessage;
+    std::string filename =
+        options.validationFilePrefix + std::to_string(validationFileCounter++);
+    auto validationOutput = openOutputFile(filename, &errorMessage);
+    if (!validationOutput) {
+      llvm::errs() << errorMessage << "\n";
+      return failure();
+    }
+    validationOutput->os().write(reinterpret_cast<char *>(binary.data()),
+                                 sizeInBytes);
+    validationOutput->keep();
+  }
 
   return mlir::success();
 }
 
 namespace mlir {
 void registerToSPIRVTranslation() {
+  static llvm::cl::opt<std::string> validationFilesPrefix(
+      "spirv-save-validation-files-with-prefix",
+      llvm::cl::desc(
+          "When non-empty string is passed each serialized SPIR-V module is "
+          "saved to an additional file that starts with the given prefix. This "
+          "is used to generate separate binaries for validation, where "
+          "`--split-input-file` normally combines all outputs into one. The "
+          "one combined output (`-o`) is still written."),
+      llvm::cl::init(""));
+
   TranslateFromMLIRRegistration toBinary(
       "serialize-spirv", "serialize SPIR-V dialect",
       [](spirv::ModuleOp module, raw_ostream &output) {
-        return serializeModule(module, output);
+        return serializeModule(module, output,
+                               {true, false, (validationFilesPrefix != ""),
+                                validationFilesPrefix});
       },
       [](DialectRegistry &registry) {
         registry.insert<spirv::SPIRVDialect>();

IgWod-IMG · 2025-08-08T09:43:57Z

CC: @davidegrohmann @mahabadm @Hardcode84

mlir/lib/Target/SPIRV/TranslateRegistration.cpp

kuhar

I think having a global variable counter is inherently unsafe

mlir/include/mlir/Target/SPIRV/Serialization.h

mlir/lib/Target/SPIRV/TranslateRegistration.cpp

kuhar

Can you also add a lit test so that this code gets executed? Even if we don't use spirv-val, we can at least make sure the expected number of files is produced (using requires: shell and ls dir | wc -l)

IgWod-IMG · 2025-08-11T13:00:14Z

I agree the global counter solution was not great. I wanted to keep it simple, but in hindsight I realised how many issues I hadn't considered. I have updated the code to use llvm::sys::fs::createUniqueFile, which I have only just found. It is thread-safe and generates a unique name. I've just pushed the relevant change, and I'll work on adding a test now.
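
As a rough illustration of the idea of letting the system pick a unique name rather than keeping a global counter, the sketch below uses POSIX `mkstemp` as a stand-in. This is an assumption made for the sake of a self-contained example: it is not the LLVM API and not the code merged in this PR, and `writeToUniqueFile` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper (POSIX analogue, not the LLVM API): writes `binary` to
// a freshly created file whose name starts with `prefix`. The final name is
// chosen by mkstemp, so no shared counter is needed.
bool writeToUniqueFile(const std::string &prefix,
                       const std::vector<uint32_t> &binary,
                       std::string &resultPath) {
  // mkstemp replaces the trailing "XXXXXX" in place and creates the file.
  std::string pattern = prefix + "XXXXXX";
  int fd = mkstemp(&pattern[0]);
  if (fd < 0)
    return false;

  const char *bytes = reinterpret_cast<const char *>(binary.data());
  std::size_t sizeInBytes = binary.size() * sizeof(uint32_t);
  std::size_t written = 0;
  while (written < sizeInBytes) {
    ssize_t n = write(fd, bytes + written, sizeInBytes - written);
    if (n < 0) {
      close(fd);
      return false;
    }
    written += static_cast<std::size_t>(n);
  }
  close(fd);
  resultPath = pattern;
  return true;
}
```

Two calls with the same prefix yield two different paths, which is the property the counter was meant to provide.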

IgWod-IMG · 2025-08-11T14:08:39Z

I have added a test and an example in the comment. Please let me know if there is anything else. Once that’s merged, I’ll start updating Target tests to use it for validation checks.

mlir/include/mlir/Target/SPIRV/Serialization.h

kuhar

Looks good overall, just a few small issues

mlir/test/Target/SPIRV/mlir-translate.mlir

mlir/lib/Target/SPIRV/TranslateRegistration.cpp

mlir/test/Target/SPIRV/mlir-translate.mlir

IgWod-IMG · 2025-08-11T16:58:00Z

@mahabadm @davidegrohmann Any more comments or can I merge it once green?

davidegrohmann · 2025-08-12T07:38:39Z

mlir/test/Target/SPIRV/mlir-translate.mlir

+// RUN: rm -rf %t
+// RUN: mkdir %t && mlir-translate --serialize-spirv --no-implicit-module \
+// RUN: --split-input-file --spirv-save-validation-files-with-prefix=%t/foo %s \
+// RUN: && ls %t | wc -l | FileCheck %s


Is wc -l here intended?

IgWod-IMG · 2025-08-12

Yes, ls prints all the files and wc -l counts how many files were printed, then // CHECK: 4 checks that the number printed by wc is 4.

davidegrohmann · 2025-08-12

OK, all good then.

IgWod-IMG requested review from antiagainst and kuhar as code owners August 8, 2025 09:40

llvmbot added mlir:spirv mlir labels Aug 8, 2025

missing header for mlir-capi-execution-engine-test

cbfbd1b

davidegrohmann reviewed Aug 8, 2025

Check if prefix dir exists

c2bda5a

kuhar requested changes Aug 8, 2025

kuhar reviewed Aug 8, 2025

IgWod-IMG added 2 commits August 11, 2025 10:20

Address minor comments

625308d

Use createUniqueFile

dc922ea

IgWod-IMG added 2 commits August 11, 2025 13:50

Add test

6c84de6

Update comment

2462ceb

davidegrohmann reviewed Aug 11, 2025

kuhar reviewed Aug 11, 2025

Address more feedback

ad50d49

kuhar approved these changes Aug 11, 2025

Simplify return and break cmd

45453b1

davidegrohmann reviewed Aug 12, 2025

IgWod-IMG enabled auto-merge (squash) August 12, 2025 12:48

IgWod-IMG merged commit 0f346a4 into llvm:main Aug 12, 2025
10 checks passed

IgWod-IMG deleted the img_serialize-into-files branch August 12, 2025 12:48

IgWod-IMG mentioned this pull request Aug 28, 2025

spirv-val: Allow spir-val to process directories KhronosGroup/SPIRV-Tools#6276

Closed
