Skip to content

Conversation

bcardosolopes
Copy link
Member

For a total of 20 attributes, 18 debug information related + 2 regular ones (loop and alias_scope).

Quick background on how this work: if a given attribute isn't supported, by default its textual form is dumped into the bytecode. In order to get proper encoding, an attribute needs a tablegen description of it and its element. There's an additional rule here: if an attribute is only used by another attribute, it's user need also to have an encoding in order for it to be encoded. (e.g. DICompileUnitAttr only gets encoded while in DISubprogramAttr if the later also has an encoded form), otherwise text is used. For this reason, this PR does a bunch at the same time, otherwise there isn't really much to test (easy to break it down if needed though).

The PR is tested against some of our internal apps, successfully round-tripping around 14Gb of llvm dialect text. Some interesting findings include a 800K mlir textual file that used to become 1.2G in bytecode format - now down to 100K due to proper encoding of debug info attributes.

In the future we should find a way to merge this together in the attribute definitions (perhaps autogenerate the entries from LLVM attribute descriptions), seems like we can benefit from the boilerplate. It's not clear yet how to solve some of the tablegen issues; some fields require manual translation of flag values using LocalVar, others require custom getters, etc. Ideas on that front are welcome.

A next natural step here is to add type support, LLVM structs can also lead to non-neglible disk footprint.

@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2025

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Bruno Cardoso Lopes (bcardosolopes)

Changes

For a total of 20 attributes, 18 debug information related + 2 regular ones (loop and alias_scope).

Quick background on how this work: if a given attribute isn't supported, by default its textual form is dumped into the bytecode. In order to get proper encoding, an attribute needs a tablegen description of it and its element. There's an additional rule here: if an attribute is only used by another attribute, it's user need also to have an encoding in order for it to be encoded. (e.g. DICompileUnitAttr only gets encoded while in DISubprogramAttr if the later also has an encoded form), otherwise text is used. For this reason, this PR does a bunch at the same time, otherwise there isn't really much to test (easy to break it down if needed though).

The PR is tested against some of our internal apps, successfully round-tripping around 14Gb of llvm dialect text. Some interesting findings include a 800K mlir textual file that used to become 1.2G in bytecode format - now down to 100K due to proper encoding of debug info attributes.

In the future we should find a way to merge this together in the attribute definitions (perhaps autogenerate the entries from LLVM attribute descriptions), seems like we can benefit from the boilerplate. It's not clear yet how to solve some of the tablegen issues; some fields require manual translation of flag values using LocalVar, others require custom getters, etc. Ideas on that front are welcome.

A next natural step here is to add type support, LLVM structs can also lead to non-neglible disk footprint.


Patch is 29.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162577.diff

9 Files Affected:

  • (modified) mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt (+4)
  • (added) mlir/include/mlir/Dialect/LLVMIR/LLVMDialectBytecode.td (+357)
  • (modified) mlir/lib/Dialect/LLVMIR/CMakeLists.txt (+2)
  • (modified) mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp (+3)
  • (added) mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp (+155)
  • (added) mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.h (+27)
  • (added) mlir/test/Dialect/LLVMIR/bytecode.mlir (+69)
  • (modified) mlir/test/Dialect/LLVMIR/debuginfo.mlir (+1)
  • (modified) mlir/test/Dialect/LLVMIR/roundtrip.mlir (+3-2)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt b/mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt
index 8d9474bf37894..c301e0b40e8fe 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt
+++ b/mlir/include/mlir/Dialect/LLVMIR/CMakeLists.txt
@@ -48,6 +48,10 @@ mlir_tablegen(LLVMIntrinsicFromLLVMIRConversions.inc -gen-intr-from-llvmir-conve
 mlir_tablegen(LLVMConvertibleLLVMIRIntrinsics.inc -gen-convertible-llvmir-intrinsics)
 add_mlir_dialect_tablegen_target(MLIRLLVMIntrinsicConversionsIncGen)
 
+set(LLVM_TARGET_DEFINITIONS LLVMDialectBytecode.td)
+mlir_tablegen(LLVMDialectBytecode.cpp.inc -gen-bytecode -bytecode-dialect="LLVM")
+add_public_tablegen_target(MLIRLLVMDialectBytecodeIncGen)
+
 set(LLVM_TARGET_DEFINITIONS BasicPtxBuilderInterface.td)
 mlir_tablegen(BasicPtxBuilderInterface.h.inc -gen-op-interface-decls)
 mlir_tablegen(BasicPtxBuilderInterface.cpp.inc -gen-op-interface-defs)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMDialectBytecode.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMDialectBytecode.td
new file mode 100644
index 0000000000000..2474dece3f7c3
--- /dev/null
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMDialectBytecode.td
@@ -0,0 +1,357 @@
+//===-- LLVMDialectBytecode.td - LLVM bytecode defs --------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is the LLVM bytecode reader/writer definition file.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_DIALECT_BYTECODE
+#define LLVM_DIALECT_BYTECODE
+
+include "mlir/IR/BytecodeBase.td"
+
+//===----------------------------------------------------------------------===//
+// Bytecode classes for attributes and types.
+//===----------------------------------------------------------------------===//
+
+def String :
+  WithParser <"succeeded($_reader.readString($_var))",
+  WithBuilder<"$_args",
+  WithPrinter<"$_writer.writeOwnedString($_getter)",
+  WithType   <"StringRef">>>>;
+
+class Attr<string type> : WithType<type, Attribute>;
+
+class OptionalAttribute<string type> :
+  WithParser <"succeeded($_reader.readOptionalAttribute($_var))",
+  WithPrinter<"$_writer.writeOptionalAttribute($_getter)",
+  WithType<type, Attribute>>>;
+
+class OptionalInt<string type> :
+  WithParser <"succeeded(readOptionalInt($_reader, $_var))",
+  WithPrinter<"writeOptionalInt($_writer, $_getter)",
+  WithType<"std::optional<" # type # ">", VarInt>>>;
+
+class OptionalArrayRef<string eltType> :
+  WithParser <"succeeded(readOptionalArrayRef<"
+    # eltType # ">($_reader, $_var))",
+  WithPrinter<"writeOptionalArrayRef<"
+    # eltType # ">($_writer, $_getter)",
+  WithType<"SmallVector<"
+    # eltType # ">", Attribute>>>;
+
+class EnumClassFlag<string flag, string getter> :
+    WithParser<"succeeded($_reader.readVarInt($_var))",
+    WithBuilder<"(" # flag # ")$_args",
+    WithPrinter<"$_writer.writeVarInt((uint64_t)$_name." # getter # ")",
+    WithType<"uint64_t", VarInt>>>>;
+
+//===----------------------------------------------------------------------===//
+// General notes
+// - For each attribute or type entry, the argument names should match
+//   LLVMAttrDefs.td
+// - The mnemonics are either LLVM or builtin MLIR attributes and types, but
+//   regular C++ types are also allowed to match builders and parsers.
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// AliasScopeAttr
+//===----------------------------------------------------------------------===//
+
+def AliasScopeAttr : DialectAttribute<(attr
+  Attr<"Attribute">:$id,
+  Attr<"AliasScopeDomainAttr">:$domain,
+  OptionalAttribute<"StringAttr">:$description
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIBasicTypeAttr
+//===----------------------------------------------------------------------===//
+
+def DIBasicTypeAttr : DialectAttribute<(attr
+  VarInt:$tag,
+  String:$name,
+  VarInt:$sizeInBits,
+  VarInt:$encoding
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIExpressionAttr, DIExpressionElemAttr
+//===----------------------------------------------------------------------===//
+
+def DIExpressionElemAttr : DialectAttribute<(attr
+  VarInt:$opcode,
+  OptionalArrayRef<"uint64_t">:$arguments
+)>;
+
+def DIExpressionAttr : DialectAttribute<(attr
+  OptionalArrayRef<"DIExpressionElemAttr">:$operations
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIFileAttr
+//===----------------------------------------------------------------------===//
+
+def DIFileAttr : DialectAttribute<(attr
+  String:$name,
+  String:$directory
+)>;
+
+//===----------------------------------------------------------------------===//
+// DILocalVariableAttr
+//===----------------------------------------------------------------------===//
+
+def DILocalVariableAttr : DialectAttribute<(attr
+  Attr<"DIScopeAttr">:$scope, // Non-optional attribute
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line,
+  VarInt:$arg,
+  VarInt:$alignInBits,
+  OptionalAttribute<"DITypeAttr">:$type,
+  EnumClassFlag<"DIFlags", "getFlags()">:$_rawflags,
+  LocalVar<"DIFlags", "(DIFlags)_rawflags">:$flags
+)> {
+  // DILocalVariableAttr direct getter uses a `StringRef` for `name`. Since the
+  // more direct getter is prefered during bytecode reading, force the base one
+  // and prevent crashes for empty `StringAttr`.
+  let cBuilder = "$_resultType::get(context, $_args)";
+}
+
+//===----------------------------------------------------------------------===//
+// DISubroutineTypeAttr
+//===----------------------------------------------------------------------===//
+
+def DISubroutineTypeAttr : DialectAttribute<(attr
+  VarInt:$callingConvention,
+  OptionalArrayRef<"DITypeAttr">:$types
+)>;
+
+//===----------------------------------------------------------------------===//
+// DICompileUnitAttr
+//===----------------------------------------------------------------------===//
+
+def DICompileUnitAttr : DialectAttribute<(attr
+  Attr<"DistinctAttr">:$id,
+  VarInt:$sourceLanguage,
+  Attr<"DIFileAttr">:$file,
+  OptionalAttribute<"StringAttr">:$producer,
+  Bool:$isOptimized,
+  EnumClassFlag<"DIEmissionKind", "getEmissionKind()">:$_rawEmissionKind,
+  LocalVar<"DIEmissionKind", "(DIEmissionKind)_rawEmissionKind">:$emissionKind,
+  EnumClassFlag<"DINameTableKind", "getNameTableKind()">:$_rawNameTableKind,
+  LocalVar<"DINameTableKind",
+           "(DINameTableKind)_rawNameTableKind">:$nameTableKind
+)>;
+
+//===----------------------------------------------------------------------===//
+// DISubprogramAttr
+//===----------------------------------------------------------------------===//
+
+def DISubprogramAttr : DialectAttribute<(attr
+  OptionalAttribute<"DistinctAttr">:$recId,
+  Bool:$isRecSelf,
+  OptionalAttribute<"DistinctAttr">:$id,
+  OptionalAttribute<"DICompileUnitAttr">:$compileUnit,
+  OptionalAttribute<"DIScopeAttr">:$scope, // TODO: DIScopeAttr
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"StringAttr">:$linkageName,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line,
+  VarInt:$scopeLine,
+  EnumClassFlag<"DISubprogramFlags", "getSubprogramFlags()">:$_rawflags,
+  LocalVar<"DISubprogramFlags", "(DISubprogramFlags)_rawflags">:$subprogramFlags,
+  OptionalAttribute<"DISubroutineTypeAttr">:$type,
+  OptionalArrayRef<"DINodeAttr">:$retainedNodes,
+  OptionalArrayRef<"DINodeAttr">:$annotations
+)>;
+
+//===----------------------------------------------------------------------===//
+// DICompositeTypeAttr
+//===----------------------------------------------------------------------===//
+
+def DICompositeTypeAttr : DialectAttribute<(attr
+  OptionalAttribute<"DistinctAttr">:$recId,
+  Bool:$isRecSelf,
+  VarInt:$tag,
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line,
+  OptionalAttribute<"DIScopeAttr">:$scope,
+  OptionalAttribute<"DITypeAttr">:$baseType,
+  EnumClassFlag<"DIFlags", "getFlags()">:$_rawflags,
+  LocalVar<"DIFlags", "(DIFlags)_rawflags">:$flags,
+  VarInt:$sizeInBits,
+  VarInt:$alignInBits,
+  OptionalArrayRef<"DINodeAttr">:$elements,
+  OptionalAttribute<"DIExpressionAttr">:$dataLocation,
+  OptionalAttribute<"DIExpressionAttr">:$rank,
+  OptionalAttribute<"DIExpressionAttr">:$allocated,
+  OptionalAttribute<"DIExpressionAttr">:$associated
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIDerivedTypeAttr
+//===----------------------------------------------------------------------===//
+
+def DIDerivedTypeAttr : DialectAttribute<(attr
+  VarInt:$tag,
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"DITypeAttr">:$baseType,
+  VarInt:$sizeInBits,
+  VarInt:$alignInBits,
+  VarInt:$offsetInBits,
+  OptionalInt<"unsigned">:$dwarfAddressSpace,
+  OptionalAttribute<"DINodeAttr">:$extraData // TODO: DINodeAttr
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIImportedEntityAttr
+//===----------------------------------------------------------------------===//
+
+def DIImportedEntityAttr : DialectAttribute<(attr
+  VarInt:$tag,
+  Attr<"DIScopeAttr">:$scope,
+  Attr<"DINodeAttr">:$entity,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line,
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalArrayRef<"DINodeAttr">:$elements
+)>;
+
+//===----------------------------------------------------------------------===//
+// DIGlobalVariableAttr, DIGlobalVariableExpressionAttr
+//===----------------------------------------------------------------------===//
+
+def DIGlobalVariableAttr : DialectAttribute<(attr
+  OptionalAttribute<"DIScopeAttr">:$scope,
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"StringAttr">:$linkageName,
+  Attr<"DIFileAttr">:$file,
+  VarInt:$line,
+  Attr<"DITypeAttr">:$type,
+  Bool:$isLocalToUnit,
+  Bool:$isDefined,
+  VarInt:$alignInBits
+)>;
+
+def DIGlobalVariableExpressionAttr : DialectAttribute<(attr
+  Attr<"DIGlobalVariableAttr">:$var,
+  OptionalAttribute<"DIExpressionAttr">:$expr
+)>;
+
+//===----------------------------------------------------------------------===//
+// DILabelAttr
+//===----------------------------------------------------------------------===//
+
+def DILabelAttr : DialectAttribute<(attr
+  Attr<"DIScopeAttr">:$scope,
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line
+)> {
+  // DILabelAttr direct getter uses a `StringRef` for `name`. Since the
+  // more direct getter is prefered during bytecode reading, force the base one
+  // and prevent crashes for empty `StringAttr`.
+  let cBuilder = "$_resultType::get(context, $_args)";
+}
+
+//===----------------------------------------------------------------------===//
+// DILexicalBlockAttr, DILexicalBlockFileAttr
+//===----------------------------------------------------------------------===//
+
+def DILexicalBlockAttr : DialectAttribute<(attr
+  Attr<"DIScopeAttr">:$scope,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$line,
+  VarInt:$column
+)>;
+
+def DILexicalBlockFileAttr : DialectAttribute<(attr
+  Attr<"DIScopeAttr">:$scope,
+  OptionalAttribute<"DIFileAttr">:$file,
+  VarInt:$discriminator
+)>;
+
+//===----------------------------------------------------------------------===//
+// DINamespaceAttr
+//===----------------------------------------------------------------------===//
+
+def DINamespaceAttr : DialectAttribute<(attr
+  OptionalAttribute<"StringAttr">:$name,
+  OptionalAttribute<"DIScopeAttr">:$scope,
+  Bool:$exportSymbols
+)>;
+
+//===----------------------------------------------------------------------===//
+// DISubrangeAttr
+//===----------------------------------------------------------------------===//
+
+def DISubrangeAttr : DialectAttribute<(attr
+  OptionalAttribute<"Attribute">:$count,
+  OptionalAttribute<"Attribute">:$lowerBound,
+  OptionalAttribute<"Attribute">:$upperBound,
+  OptionalAttribute<"Attribute">:$stride
+)>;
+
+//===----------------------------------------------------------------------===//
+// LoopAnnotationAttr
+//===----------------------------------------------------------------------===//
+
+def LoopAnnotationAttr : DialectAttribute<(attr
+  OptionalAttribute<"BoolAttr">:$disableNonforced,
+  OptionalAttribute<"LoopVectorizeAttr">:$vectorize, // TODO: LoopVectorizeAttr
+  OptionalAttribute<"LoopInterleaveAttr">:$interleave, // TODO: LoopInterleaveAttr
+  OptionalAttribute<"LoopUnrollAttr">:$unroll, // TODO: LoopUnrollAttr
+  OptionalAttribute<"LoopUnrollAndJamAttr">:$unrollAndJam, // TODO: LoopUnrollAndJamAttr
+  OptionalAttribute<"LoopLICMAttr">:$licm, // TODO: LoopLICMAttr
+  OptionalAttribute<"LoopDistributeAttr">:$distribute, // TODO: LoopDistributeAttr
+  OptionalAttribute<"LoopPipelineAttr">:$pipeline, // TODO: LoopPipelineAttr
+  OptionalAttribute<"LoopPeeledAttr">:$peeled, // TODO: LoopPeeledAttr
+  OptionalAttribute<"LoopUnswitchAttr">:$unswitch, // TODO: LoopUnswitchAttr
+  OptionalAttribute<"BoolAttr">:$mustProgress,
+  OptionalAttribute<"BoolAttr">:$isVectorized,
+  OptionalAttribute<"FusedLoc">:$startLoc,
+  OptionalAttribute<"FusedLoc">:$endLoc,
+  OptionalArrayRef<"AccessGroupAttr">:$parallelAccesses // TODO: AccessGroupAttr
+)>;
+
+//===----------------------------------------------------------------------===//
+// Attributes & Types with custom bytecode handling.
+//===----------------------------------------------------------------------===//
+
+def LLVMDialectAttributes : DialectAttributes<"LLVM"> {
+  let elems = [
+    AliasScopeAttr,
+    DIBasicTypeAttr,
+    DICompileUnitAttr,
+    DICompositeTypeAttr,
+    DIDerivedTypeAttr,
+    DIExpressionElemAttr,
+    DIExpressionAttr,
+    DIFileAttr,
+    DIGlobalVariableAttr,
+    DIGlobalVariableExpressionAttr,
+    DIImportedEntityAttr,
+    DILabelAttr,
+    DILexicalBlockAttr,
+    DILexicalBlockFileAttr,
+    DILocalVariableAttr,
+    DINamespaceAttr,
+    DISubprogramAttr,
+    DISubrangeAttr,
+    DISubroutineTypeAttr,
+    LoopAnnotationAttr
+  ];
+}
+
+def LLVMDialectTypes : DialectTypes<"LLVM"> {
+  let elems = [];
+}
+
+#endif // LLVM_DIALECT_BYTECODE
diff --git a/mlir/lib/Dialect/LLVMIR/CMakeLists.txt b/mlir/lib/Dialect/LLVMIR/CMakeLists.txt
index ec581ac7277e3..cc66face1c002 100644
--- a/mlir/lib/Dialect/LLVMIR/CMakeLists.txt
+++ b/mlir/lib/Dialect/LLVMIR/CMakeLists.txt
@@ -8,11 +8,13 @@ add_mlir_dialect_library(MLIRLLVMDialect
   IR/LLVMMemorySlot.cpp
   IR/LLVMTypes.cpp
   IR/LLVMTypeSyntax.cpp
+  IR/LLVMDialectBytecode.cpp
 
   ADDITIONAL_HEADER_DIRS
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/LLVMIR
 
   DEPENDS
+  MLIRLLVMDialectBytecodeIncGen
   MLIRLLVMOpsIncGen
   MLIRLLVMTypesIncGen
   MLIRLLVMIntrinsicOpsIncGen
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
index 5d08cccb4faab..7ca09d9c943e0 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
@@ -29,6 +29,8 @@
 #include "llvm/IR/DataLayout.h"
 #include "llvm/Support/Error.h"
 
+#include "LLVMDialectBytecode.h"
+
 #include <numeric>
 #include <optional>
 
@@ -4237,6 +4239,7 @@ void LLVMDialect::initialize() {
   // Support unknown operations because not all LLVM operations are registered.
   allowUnknownOperations();
   declarePromisedInterface<DialectInlinerInterface, LLVMDialect>();
+  detail::addBytecodeInterface(this);
 }
 
 #define GET_OP_CLASSES
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp
new file mode 100644
index 0000000000000..f06c764ab5f6d
--- /dev/null
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp
@@ -0,0 +1,155 @@
+//===- LLVMDialectBytecode.cpp - LLVM Bytecode Implementation -------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "LLVMDialectBytecode.h"
+#include "mlir/Bytecode/BytecodeImplementation.h"
+#include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
+#include "mlir/IR/Diagnostics.h"
+#include "llvm/ADT/APFloat.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/TypeSwitch.h"
+#include <type_traits>
+
+using namespace mlir;
+using namespace mlir::LLVM;
+
+namespace {
+
+// Provide some forward declarations of the functions that will be generated by
+// the include below.
+static void write(DIExpressionElemAttr attribute,
+                  DialectBytecodeWriter &writer);
+static LogicalResult writeAttribute(Attribute attribute,
+                                    DialectBytecodeWriter &writer);
+
+//===--------------------------------------------------------------------===//
+// Optional ArrayRefs
+//
+// Note that both the writer and reader functions consider attributes to be
+// optional. This is because the attribute may be present by empty.
+//===--------------------------------------------------------------------===//
+
+template <class EntryTy>
+static void writeOptionalArrayRef(DialectBytecodeWriter &writer,
+                                  ArrayRef<EntryTy> storage) {
+  if (!storage.empty()) {
+    writer.writeOwnedBool(true);
+    writer.writeList(storage, [&](EntryTy val) {
+      if constexpr (std::is_base_of_v<Attribute, EntryTy>) {
+        (void)writer.writeOptionalAttribute(val);
+      } else if constexpr (std::is_integral_v<EntryTy>) {
+        (void)writer.writeVarInt(val);
+      } else
+        static_assert(true, "EntryTy not supported");
+    });
+  } else {
+    writer.writeOwnedBool(false);
+  }
+}
+
+template <class EntryTy>
+static LogicalResult readOptionalArrayRef(DialectBytecodeReader &reader,
+                                          SmallVectorImpl<EntryTy> &storage) {
+  bool isPresent = false;
+  if (failed(reader.readBool(isPresent)))
+    return failure();
+  // Nothing to do here, the array is empty.
+  if (!isPresent)
+    return success();
+
+  auto readOperations = [&]() -> FailureOr<EntryTy> {
+    EntryTy temp;
+    if constexpr (std::is_base_of_v<Attribute, EntryTy>) {
+      if (succeeded(reader.readOptionalAttribute(temp)))
+        return temp;
+    } else if constexpr (std::is_integral_v<EntryTy>) {
+      if (succeeded(reader.readVarInt(temp)))
+        return temp;
+    } else
+      static_assert(true, "EntryTy not supported");
+    return failure();
+  };
+
+  return reader.readList(storage, readOperations);
+}
+
+//===--------------------------------------------------------------------===//
+// Optional integral types
+//===--------------------------------------------------------------------===//
+
+template <class EntryTy>
+static void writeOptionalInt(DialectBytecodeWriter &writer,
+                             std::optional<EntryTy> storage) {
+  static_assert(std::is_integral_v<EntryTy>,
+                "EntryTy must be an integral type");
+  EntryTy val = storage.value_or(0);
+  writer.writeVarIntWithFlag(val, storage.has_value());
+}
+
+template <class EntryTy>
+static LogicalResult readOptionalInt(DialectBytecodeReader &reader,
+                                     std::optional<EntryTy> &storage) {
+  static_assert(std::is_integral_v<EntryTy>,
+                "EntryTy must be an integral type");
+  uint64_t result = 0;
+  bool flag = false;
+  if (failed(reader.readVarIntWithFlag(result, flag)))
+    return failure();
+  if (flag)
+    storage = (decltype(storage.value()))result;
+  else
+    storage = std::nullopt;
+  return success();
+}
+
+//===--------------------------------------------------------------------===//
+// Tablegen generated bytecode functions
+//===--------------------------------------------------------------------===//
+
+#include "mlir/Dialect/LLVMIR/LLVMDialectBytecode.cpp.inc"
+
+//===---------------...
[truncated]

@bcardosolopes bcardosolopes requested a review from Dinistro October 9, 2025 00:58
Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding bytecode support!

What does happen if we will modify an attribute in the future. Will there be some kind of compilation error if we forget to update the bytecode equivalent?

Comment on lines +117 to +120
EnumClassFlag<"DIFlags", "getFlags()">:$_rawflags,
LocalVar<"DIFlags", "(DIFlags)_rawflags">:$flags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this trick somehow needed to convert from an integer to the actual enum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, inspired by bytecode support in other dialects, far from pretty tho =(

Possible future work is to improve tablegen emitter to maybe do more about these, but AFAIK we can't do better right now with tablegen - one possibility is to write plain C++ for these attributes, but that doesn't sounds like much win either.

@bcardosolopes
Copy link
Member Author

Thanks for the fast review!

What does happen if we will modify an attribute in the future. Will there be some kind of compilation error if we forget to update the bytecode equivalent?

Not in the way we'd like (a nice tablegen choke perhaps) but because these tablegen generate call to the builders, we'd get an error for the builder call mismatch, which is not ideal but works (for most part?). Note that the first extra commit in this PR was due to exactly this: when bringing the work upstream, DICompositeTypeAttr had a different order and that was caught at build time.

Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the left over comments (plus possibly using the roundtrip flag in the test).

It seems there are some left over comments from the first review that github kindly folded away (in this file mlir/lib/Dialect/LLVMIR/IR/LLVMDialectBytecode.cpp). And maybe giving the roundtrip flag a shot makes sense as well.

@@ -0,0 +1,81 @@
// RUN: mlir-opt -emit-bytecode %s | mlir-opt --mlir-print-debuginfo | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RUN: mlir-opt -emit-bytecode %s | mlir-opt --mlir-print-debuginfo | FileCheck %s
// RUN: mlir-opt -verify-roundtrip %s

It seems like we could use the roundtrip flag to check this. Looking at the implementation

static LogicalResult doVerifyRoundTrip(Operation *op,
the flag seems to check both the bytecode and the textual roundtrip.

The benefit of using this flag is that we don't need the check lines. If there is an easy way to verify that flag indeed detects bytecode failures - for example by introducing a bug - then the roundtrip flag would be preferable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The roundtrip flag has been created specifically to validate bytecode.

@gysit
Copy link
Contributor

gysit commented Oct 10, 2025

Not in the way we'd like (a nice tablegen choke perhaps) but because these tablegen generate call to the builders, we'd get an error for the builder call mismatch, which is not ideal but works (for most part?). Note that the first extra commit in this PR was due to exactly this: when bringing the work upstream, DICompositeTypeAttr had a different order and that was caught at build time.

That is indeed not optimal. E.g. the builder may still match if the position of two integers is swapped. I guess we have to rely on roundtrip tests then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants