[uArch][XeGPU] Add XeGPU uArch definition. #153706

mshahneo · 2025-08-14T22:34:52Z

The uArch infrastructure provides:

- A set of data structures to represent a uArch and its necessary components (e.g., instructions, register files, caches).
- A set of utility interfaces that are common to a family of ops (e.g., mma ops, 2DBlockIO ops). The implementations of these interfaces are provided by the specific instructions. Each family of ops provides the following five common APIs, although some families may have additional utility APIs:
  - getSupportedShapes
  - getSupportedTypes
  - checkSupportedShapesAndTypes
  - checkSupportedTypes
  - validate

Add support for the PVC and BMG architectures.
Add support for the DPAS instruction.
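
As a rough illustration of the five common APIs named in this description, the sketch below models an MMA-style interface in plain C++. It is not the code from this patch: `ElemType` and `Operand` are hypothetical stand-ins for `mlir::Type` and `MMAOpndEnum`, and the shapes and types accepted by `ToyDpas` are invented placeholders rather than real DPAS hardware data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the patch itself uses mlir::Type and MMAOpndEnum.
enum class ElemType { F16, BF16, F32 };
enum class Operand { A, B, C, D };
using Shape = std::pair<uint32_t, uint32_t>;

// The five APIs shared by one family of ops (here: MMA ops).
struct MMAInterfaceSketch {
  virtual ~MMAInterfaceSketch() = default;
  virtual std::vector<Shape> getSupportedShapes(ElemType type,
                                                Operand opnd) const = 0;
  virtual std::vector<ElemType> getSupportedTypes(Operand opnd) const = 0;
  virtual bool checkSupportedTypes(ElemType ta, ElemType tb, ElemType tc,
                                   ElemType td) const = 0;
  virtual bool checkSupportedShapesAndTypes(Shape a, Shape b, Shape c, Shape d,
                                            ElemType ta, ElemType tb,
                                            ElemType tc, ElemType td) const = 0;
  virtual bool validate(Shape a, Shape b, Shape c, Shape d, ElemType ta,
                        ElemType tb, ElemType tc, ElemType td) const = 0;
};

// Toy instruction with placeholder restrictions: A/B are F16, C/D are F32.
struct ToyDpas final : MMAInterfaceSketch {
  std::vector<ElemType> getSupportedTypes(Operand opnd) const override {
    if (opnd == Operand::A || opnd == Operand::B)
      return {ElemType::F16};
    return {ElemType::F32};
  }
  std::vector<Shape> getSupportedShapes(ElemType type,
                                        Operand opnd) const override {
    if (!contains(getSupportedTypes(opnd), type))
      return {}; // unsupported type: no shapes at all
    if (opnd == Operand::B)
      return {Shape{16, 16}};
    return {Shape{8, 16}}; // A, C and D share one placeholder shape
  }
  bool checkSupportedTypes(ElemType ta, ElemType tb, ElemType tc,
                           ElemType td) const override {
    return contains(getSupportedTypes(Operand::A), ta) &&
           contains(getSupportedTypes(Operand::B), tb) &&
           contains(getSupportedTypes(Operand::C), tc) &&
           contains(getSupportedTypes(Operand::D), td);
  }
  bool checkSupportedShapesAndTypes(Shape a, Shape b, Shape c, Shape d,
                                    ElemType ta, ElemType tb, ElemType tc,
                                    ElemType td) const override {
    return checkSupportedTypes(ta, tb, tc, td) &&
           contains(getSupportedShapes(ta, Operand::A), a) &&
           contains(getSupportedShapes(tb, Operand::B), b) &&
           contains(getSupportedShapes(tc, Operand::C), c) &&
           contains(getSupportedShapes(td, Operand::D), d);
  }
  // Per-operand support plus matmul consistency: (MxK) * (KxN) -> (MxN).
  bool validate(Shape a, Shape b, Shape c, Shape d, ElemType ta, ElemType tb,
                ElemType tc, ElemType td) const override {
    return checkSupportedShapesAndTypes(a, b, c, d, ta, tb, tc, td) &&
           a.second == b.first && a.first == c.first &&
           b.second == c.second && c == d;
  }

private:
  template <typename T>
  static bool contains(const std::vector<T> &values, const T &value) {
    return std::find(values.begin(), values.end(), value) != values.end();
  }
};

// Usage: a consistent problem passes, a wrong A type is rejected.
inline void toyDpasExample() {
  ToyDpas dpas;
  assert(dpas.validate({8, 16}, {16, 16}, {8, 16}, {8, 16}, ElemType::F16,
                       ElemType::F16, ElemType::F32, ElemType::F32));
  assert(!dpas.checkSupportedTypes(ElemType::F32, ElemType::F16, ElemType::F32,
                                   ElemType::F32));
}
```

In this sketch `validate` is the only method that checks cross-operand consistency; the other four answer per-operand questions.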

llvmbot · 2025-08-14T22:35:22Z

@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Md Abdullah Shahneous Bari (mshahneo)

Patch is 32.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153706.diff

12 Files Affected:

(added) mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h (+182)
(added) mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h (+266)
(added) mlir/include/mlir/Dialect/XeGPU/uArch/uArchInterfaces.h (+75)
(modified) mlir/lib/Dialect/LLVMIR/CMakeLists.txt (+1)
(modified) mlir/lib/Dialect/LLVMIR/IR/XeVMDialect.cpp (+9)
(modified) mlir/lib/Dialect/XeGPU/CMakeLists.txt (+1)
(modified) mlir/lib/Dialect/XeGPU/IR/CMakeLists.txt (+1)
(modified) mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp (+9)
(modified) mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt (+1)
(modified) mlir/lib/Dialect/XeGPU/Utils/CMakeLists.txt (+2-1)
(added) mlir/lib/Dialect/XeGPU/uArch/CMakeLists.txt (+11)
(added) mlir/lib/Dialect/XeGPU/uArch/IntelGpuXe2.cpp (+197)

diff --git a/mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h b/mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
new file mode 100644
index 0000000000000..9179838f8c148
--- /dev/null
+++ b/mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h
@@ -0,0 +1,182 @@
+//===--- IntelGpuXe2.h ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Xe2 uArch definition.
+///
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_DIALECT_XEGPU_UARCH_INTEL_GPU_XE2_H
+#define MLIR_DIALECT_XEGPU_UARCH_INTEL_GPU_XE2_H
+
+#include "mlir/Dialect/XeGPU/uArch/uArchInterfaces.h"
+#include "mlir/IR/BuiltinTypes.h"
+#include "mlir/IR/TypeUtilities.h"
+#include <map>
+#include <string>
+#include <vector>
+
+namespace mlir {
+namespace xegpu {
+namespace uArch {
+namespace Xe2Plus {
+struct XeCoreInfo {
+  uint32_t num_threads;
+  SharedMemory shared_memory;
+  uint32_t num_vector_units;
+  uint32_t num_matrix_units;
+
+  // Constructor
+  XeCoreInfo(uint32_t num_threads, const SharedMemory &shared_memory,
+             uint32_t num_vector_units, uint32_t num_matrix_units)
+      : num_threads(num_threads), shared_memory(shared_memory),
+        num_vector_units(num_vector_units), num_matrix_units(num_matrix_units) {
+  }
+};
+
+struct Xe2Plus : public uArch {
+  XeCoreInfo xe_core;
+  Xe2Plus(
+      const std::string &archName, const std::string &archDescription,
+      const XeCoreInfo &xeCore,
+      const std::vector<uArchHierarchyComponent> &hierarchy = {},
+      const std::map<std::string, RegisterFileInfo> &regInfo = {},
+      const std::vector<CacheInfo> &cacheInfo = {},
+      const std::map<std::string, std::shared_ptr<Instruction>> &instrs = {})
+      : uArch(archName, archDescription, hierarchy, regInfo, cacheInfo, instrs),
+        xe_core(xeCore) {}
+};
+
+// struct to represent DPAS instruction
+struct DPASInstruction : public Instruction, public MMAInstructionInterface {
+  DPASInstruction()
+      : Instruction("dpas",                   // name
+                    "Dot Product Accumulate") // description
+  {}
+
+  // Override all virtuals from MatrixOpInterface
+  virtual std::vector<std::pair<uint32_t, uint32_t>>
+  getSupportedShapes(mlir::Type dataType, MMAOpndEnum matrixType) override;
+  virtual std::vector<mlir::Type>
+  getSupportedTypes(MLIRContext &context, MMAOpndEnum matrixType) override;
+  virtual bool
+  checkSupportedShapesAndTypes(std::pair<uint32_t, uint32_t> AShape,
+                               std::pair<uint32_t, uint32_t> BShape,
+                               std::pair<uint32_t, uint32_t> CShape,
+                               std::pair<uint32_t, uint32_t> DShape,
+                               mlir::Type AType, mlir::Type BType,
+                               mlir::Type CType, mlir::Type DType) override;
+  virtual bool checkSupportedTypes(mlir::Type AType, mlir::Type BType,
+                                   mlir::Type CType, mlir::Type DType) override;
+  virtual bool validate(std::pair<uint32_t, uint32_t> AShape,
+                        std::pair<uint32_t, uint32_t> BShape,
+                        std::pair<uint32_t, uint32_t> CShape,
+                        std::pair<uint32_t, uint32_t> DShape, mlir::Type AType,
+                        mlir::Type BType, mlir::Type CType,
+                        mlir::Type DType) override;
+  virtual std::vector<uint32_t> getSupportedM(mlir::Type type) override;
+  virtual std::vector<uint32_t> getSupportedK(mlir::Type type) override;
+  virtual std::vector<uint32_t> getSupportedN(mlir::Type type) override;
+};
+
+namespace PVCuArch {
+struct PVCuArch : public Xe2Plus {
+  // Maintaines ownership of the instructions owned by PVUarch
+  std::vector<std::shared_ptr<Instruction>> owned_instructions;
+  PVCuArch()
+      : Xe2Plus("pvc",                        // archName
+                "Ponte Vecchio Architecture", // archDescription
+                XeCoreInfo(8, SharedMemory(512 * 1024, 4), 8, 8), // xeCore
+                {/* register_file_info */}, // Optional: empty
+                {/* cache_info */},         // Optional: empty
+                {/* instructions */}        // Optional: empty
+        ) {
+    // Initialize uArchHierarchy
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("thread", 0));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeCore", 8));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeSlice", 16));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeStack", 4));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("gpu", 2));
+    // Intialize register file info
+    // GRF
+    this->register_file_info.emplace(
+        "GRF",
+        RegisterFileInfo(64 * 1024,          // size in bits
+                         {"small", "large"}, // GRF modes
+                         {128, 256},         // registers per thread per mode
+                         0,                  // number of banks
+                         0                   // bank size
+                         ));
+    // Initialize cache info
+    // L1 cache, XeCore level
+    this->cache_info.push_back(
+        CacheInfo(512 * 1024, 64, this->uArch_hierarchy[1]));
+    // L3 cache, XeStack level
+    this->cache_info.push_back(
+        CacheInfo(512 * 1024, 64, this->uArch_hierarchy[3]));
+
+    // Add the instructions
+    auto dpas = std::make_shared<DPASInstruction>();
+    instructions.emplace(dpas->getName(), dpas);
+    // instructions[dpas->name] = dpas.get();
+    owned_instructions.push_back(dpas);
+  }
+};
+} // namespace PVCuArch
+
+namespace BMGuArch {
+struct BMGuArch : public Xe2Plus {
+  // Maintaines ownership of the instructions owned by PVUarch
+  std::vector<std::shared_ptr<Instruction>> owned_instructions;
+  BMGuArch()
+      : Xe2Plus("bmg",                     // archName
+                "Battlemage Architecture", // archDescription
+                XeCoreInfo(8, SharedMemory(256 * 1024, 4), 8, 8), // xeCore
+                {/* register_file_info */}, // Optional: empty
+                {/* cache_info */},         // Optional: empty
+                {/* instructions */},       // Optional: empty
+                {/* restrictions */}        // Optional: empty
+        ) {
+    // Initialize uArchHierarchy
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("thread", 0));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeCore", 8));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeSlice", 4));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeStack", 5));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("gpu", 1));
+    // Intialize register file info
+    // GRF
+    this->register_file_info["GRF"] =
+        RegisterFileInfo(64 * 1024,          // size in bits
+                         {"small", "large"}, // GRF modes
+                         {128, 256},         // registers per thread per mode
+                         0,                  // number of banks
+                         0                   // bank size
+        );
+    // Initialize cache info
+    // L1 cache, XeCore level
+    this->cache_info.push_back(
+        CacheInfo(256 * 1024, 64, this->uArch_hierarchy[1]));
+    // L3 cache, XeStack level
+    this->cache_info.push_back(
+        CacheInfo(18 * 1024 * 1024, 256, this->uArch_hierarchy[3]));
+
+    // Add the instructions
+    auto dpas = std::make_shared<DPASInstruction>();
+    instructions.emplace(dpas->getName(), dpas);
+    // instructions[dpas->name] = dpas.get();
+    owned_instructions.push_back(dpas);
+  }
+};
+} // namespace BMGuArch
+
+} // namespace Xe2Plus
+} // namespace uArch
+} // namespace xegpu
+} // namespace mlir
+
+#endif // MLIR_DIALECT_XEGPU_UARCH_INTEL_GPU_XE2_H
diff --git a/mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h b/mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
new file mode 100644
index 0000000000000..9bda86df2aff9
--- /dev/null
+++ b/mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h
@@ -0,0 +1,266 @@
+//===--- uArch.h ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Base uArch definition for different architectures.
+///
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_DIALECT_XEGPU_UARCH_BASE_H
+#define MLIR_DIALECT_XEGPU_UARCH_BASE_H
+
+#include <any>
+#include <functional>
+#include <iostream>
+#include <map>
+#include <mutex>
+#include <shared_mutex>
+#include <tuple>
+
+#include "mlir/IR/Types.h"
+
+namespace mlir {
+namespace xegpu {
+namespace uArch {
+// Architecture HW component hierarchy to present thread, core, socket ...
+struct uArchHierarchyComponent {
+  std::string name = ""; // optional name of the hierarchy component
+  // no. of lower hierarchy component it contains, e.g., for PVC XeCore it
+  // contains 8 threads, so no_of_component=8
+  uint32_t no_of_component;
+  // Constructor
+  uArchHierarchyComponent(const std::string &name, uint32_t no_of_component)
+      : name(name), no_of_component(no_of_component) {}
+};
+
+// An enum class to represent the scope of an instruction
+enum class InstructionScopeEnum { WorkItem, Subgroup, Workgroup, Cluster };
+
+// A struct to represent basic information about an instruction
+// This struct is used to represent the information about an instruction in the
+// uArch The information includes:
+// - the name of the instruction,
+// - the description of the instruction
+// - the scope of the instruction,
+//
+// The information is represented as strings
+// For example, the information about an instruction can be represented as:
+// Instruction instr = {"dpas", "Dot Product Accumulate Systolic  (DPAS) is a
+// matrix multiply-add operation", "subgroup"};
+
+// The primary purpose of the Instruction struct is to provide a generic way to
+// represent information about an instruction and to use this information to
+// generate the uArch. Specifc instruction in a uArch can inherit from this
+// struct and add more fields as needed
+
+struct Instruction {
+  // @TODO: Add more fields as needed
+  Instruction(std::string name, std::string desc)
+      : name(std::move(name)), description(std::move(desc)) {}
+
+  virtual ~Instruction() = default;
+  // Get methods
+  std::string getName() { return name; }
+  std::string getDescription() { return description; }
+  InstructionScopeEnum getScope() { return scope; }
+
+protected:
+  std::string name;
+  std::string description;
+  InstructionScopeEnum scope;
+};
+
+// A struct to represent register file information
+struct RegisterFileInfo {
+  // Constructor
+  RegisterFileInfo() = default;
+  RegisterFileInfo(uint32_t size, const std::vector<std::string> &mode,
+                   const std::vector<uint32_t> &numRegs, uint32_t num_banks,
+                   uint32_t bank_size)
+      : size(size), mode(mode), num_regs_per_thread_per_mode(numRegs),
+        num_banks(num_banks), bank_size(bank_size) {}
+
+  // Get methods
+  uint32_t getSize() const { return size; }
+
+  const std::vector<std::string> &getModes() const { return mode; }
+
+  const std::vector<uint32_t> &getNumRegsPerThreadPerMode() const {
+    return num_regs_per_thread_per_mode;
+  }
+
+  uint32_t getNumBanks() const { return num_banks; }
+
+  uint32_t getBankSize() const { return bank_size; }
+
+protected:
+  uint32_t size;                 // size per register in bits
+  std::vector<std::string> mode; // e.g., "small", "large" GRF modes
+  std::vector<uint32_t>
+      num_regs_per_thread_per_mode; // number of registers per thread per mode
+  uint32_t num_banks;
+  uint32_t bank_size;
+};
+
+// A struct to represent cache information
+
+struct CacheInfo {
+  // Constructor
+  CacheInfo(uint32_t size, uint32_t line_size,
+            const uArchHierarchyComponent &component)
+      : size(size), line_size(line_size), component(component) {}
+
+  virtual ~CacheInfo() = default;
+
+  // Get methods
+  uint32_t getSize() const { return size; }
+  uint32_t getLineSize() const { return line_size; }
+  const uArchHierarchyComponent &getComponent() const { return component; }
+
+protected:
+  uint32_t size;
+  uint32_t line_size;
+  // At which component level the cache is shared
+  uArchHierarchyComponent component;
+
+  // @TODO: Add more fields as needed (e.g., associativity, num_banks,
+  // bank_size, num_ports, port_width, bank_conflicts)
+};
+
+// A struct to represent the uArch
+// This struct is used to represent the microarchitecture of a target device
+// The uArch includes:
+// - the name of the uArch,
+// - the description of the uArch,
+// - uArch hierarchy
+// - Rgister File information
+// - Cache information
+// - the set of instructions supported by the uArch,
+struct uArch {
+  // Constructor
+  uArch() = default;
+  uArch(const std::string &name, const std::string &description,
+        const std::vector<uArchHierarchyComponent> &uArch_hierarchy = {},
+        const std::map<std::string, RegisterFileInfo> &register_file_info = {},
+        const std::vector<CacheInfo> &cache_info = {},
+        const std::map<std::string, std::shared_ptr<Instruction>>
+            &instructions = {})
+      : name(name), description(description), uArch_hierarchy(uArch_hierarchy),
+        register_file_info(register_file_info), cache_info(cache_info),
+        instructions(instructions) {}
+
+  // Get methods
+  const std::string &getName() const { return name; }
+
+  const std::string &getDescription() const { return description; }
+
+  const std::vector<uArchHierarchyComponent> &getHierarchy() const {
+    return uArch_hierarchy;
+  }
+
+  const std::map<std::string, RegisterFileInfo> &getRegisterFileInfo() const {
+    return register_file_info;
+  }
+
+  const std::vector<CacheInfo> &getCacheInfo() const { return cache_info; }
+
+  const std::map<std::string, std::shared_ptr<Instruction>> &
+  getInstructions() const {
+    return instructions;
+  }
+
+  // Get the name of the supported instruction names for that
+  // architecture. It returns the names of the instructions added to the uArch.
+  std::vector<std::string> getSupportedInstructionNames() const {
+    std::vector<std::string> instructionNames;
+    for (const auto &inst : instructions) {
+      instructionNames.push_back(inst.first);
+    }
+    return instructionNames;
+  }
+
+  // Checks if an instruction is supported in this uArch
+  bool checkSupportedInstruction(const std::string &instructionName) const {
+    return instructions.find(instructionName) != instructions.end();
+  }
+
+protected:
+  std::string name; // Similar to target triple
+  std::string description;
+  std::vector<uArchHierarchyComponent> uArch_hierarchy;
+  std::map<std::string, RegisterFileInfo> register_file_info;
+  std::vector<CacheInfo> cache_info;
+  std::map<std::string, std::shared_ptr<Instruction>> instructions;
+};
+
+// A struct to represent shared memory information
+struct SharedMemory {
+  // Constructor
+  SharedMemory(uint32_t size, uint32_t alignment)
+      : size(size), alignment(alignment) {}
+
+  // Getters
+  uint32_t getSize() const { return size; }
+  uint32_t getAlignment() const { return alignment; }
+
+protected:
+  uint32_t size;      // in bytes
+  uint32_t alignment; // in bytes
+  // @TODO: Add more fields as needed (e.g., latency, throughput, bandwidth)
+};
+
+struct uArchMap {
+public:
+  // Singleton instance
+  static uArchMap &instance() {
+    static uArchMap instance;
+    return instance;
+  }
+
+  // Insert or update a key-value pair
+  void insert(const std::string &key, std::shared_ptr<uArch> value) {
+    std::unique_lock<std::shared_mutex> lock(mutex_);
+    // map_[key] = std::move(value); // safe to overwrite
+    map_.emplace(key, std::move(value)); // safe to overwrite
+  }
+
+  // Get a value by key (concurrent safe read)
+  std::shared_ptr<uArch> get(const std::string &key) const {
+    std::shared_lock<std::shared_mutex> lock(mutex_);
+    auto it = map_.find(key);
+    if (it != map_.end())
+      return it->second;
+    return nullptr;
+  }
+
+  // Check if a key exists
+  bool contains(const std::string &key) const {
+    std::shared_lock<std::shared_mutex> lock(mutex_);
+    return map_.find(key) != map_.end();
+  }
+
+  // Remove a key
+  bool erase(const std::string &key) {
+    std::unique_lock<std::shared_mutex> lock(mutex_);
+    return map_.erase(key) > 0;
+  }
+
+private:
+  uArchMap() = default;
+  uArchMap(const uArchMap &) = delete;
+  uArchMap &operator=(const uArchMap &) = delete;
+
+  mutable std::shared_mutex mutex_;
+  std::map<std::string, std::shared_ptr<uArch>> map_;
+};
+
+} // namespace uArch
+} // namespace xegpu
+} // namespace mlir
+
+#endif // MLIR_DIALECT_XEGPU_UARCH_BASE_H
diff --git a/mlir/include/mlir/Dialect/XeGPU/uArch/uArchInterfaces.h b/mlir/include/mlir/Dialect/XeGPU/uArch/uArchInterfaces.h
new file mode 100644
index 0000000000000..27d44c38317a1
--- /dev/null
+++ b/mlir/include/mlir/Dialect/XeGPU/uArch/uArchInterfaces.h
@@ -0,0 +1,75 @@
+//===--- uArchInterfaces.h ---*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Defines the utility interfaces that are implemented by individual
+/// instructions.
+///
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_DIALECT_XEGPU_UARCH_INTERFACES_H
+#define MLIR_DIALECT_XEGPU_UARCH_INTERFACES_H
+
+#include "mlir/Dialect/XeGPU/uArch/uArchBase.h"
+#include "mlir/IR/BuiltinTypes.h"
+#include "mlir/IR/TypeUtilities.h"
+#include <map>
+#include <string>
+#include <vector>
+
+namespace mlir {
+namespace xegpu {
+namespace uArch {
+
+enum class MMAOpndEnum { MatrixA, MatrixB, MatrixC, MatrixD };
+struct MMAInstructionInterface {
+  // Get supported Matrix shapes
+  virtual std::vector<std::pair<uint32_t, uint32_t>>
+  getSupportedShapes(mlir::Type dataType, MMAOpndEnum matrixType) = 0;
+
+  // @TODO: This method takes an context object as a parameter, this is to
+  // create the mlir::Type objects from the same context. Since type objects are
+  // uniqued in a specific context, to do things like "aType == bType" (where
+  // aType and bType are both same type) kind of checks, the both types should
+  // be from the same context.
+  //
+  // One alternative to this is to create enum to represent each types, but this
+  // adds an extra burden to user to convert these enums to specific types. In
+  // fact the utility that would convert enumToType() and vice versa would still
+  // have to use the context object.
+  //
+  // Untill we have a better solution, we stick to passing context object to
+  // this method.
+  virtual std::vector<mlir::Type> getSupportedTypes(MLIRContext &context,
+                                                    MMAOpndEnum matrixType) = 0;
+  virtual bool
+  checkSupportedShapesAndTypes(std::pair<uint32_t, uint32_t> AShape,
+                               std::pair<uint32_t, uint32_t> BShape,
+                               std::pair<uint32_t, uint32_t> CShape,
+                               std::pair<uint32_t, uint32_t> DShape,
+                               mlir::Type AType, mlir::Type BType,
+                               mlir::Type CType, mlir::Type DType) = 0;
+  virtual bool checkSupportedTypes(mlir::Type AType, mlir::Type BType,
+                                   mlir::Type CType, mlir::Type DType) = 0;
+  virtual bool validate(std::pair<uint32_t, uint32_t> AShape,
+                        std::pair<uint32_t, uint32_t> BShape,
+                        std::pair<uint32_t, uint32_t> CShape,
+                   ...
[truncated]


rolfmorel · 2025-08-19T15:39:08Z

High-level comment: could the "family of ops" be represented by OpInterfaces? That is, are the XeVM ops (and those of XeGPU) in each of the families such that they correspond to each of the Instruction structs that you have? If so, might be nicer to attach the required shape information directly to the ops (though presumably through static methods (on interfaces) so you don't first need an instance of an op to learn about its required shapes).
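
For illustration only, the static-method idea could look roughly like this in plain C++. This does not use MLIR's actual OpInterface machinery; `ToyMmaOp`, `ToyBlockLoadOp`, `isShapeSupported`, and the shape values are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Shape = std::pair<uint32_t, uint32_t>;

// Hypothetical op classes: the shape tables are exposed through static
// methods, so no op instance is needed to query them.
struct ToyMmaOp {
  static std::vector<Shape> getSupportedShapes() { return {Shape{8, 16}}; }
};
struct ToyBlockLoadOp {
  static std::vector<Shape> getSupportedShapes() {
    return {Shape{8, 16}, Shape{16, 16}};
  }
};

// Generic helper that works for any op type exposing the static method.
template <typename OpTy>
bool isShapeSupported(Shape shape) {
  const std::vector<Shape> shapes = OpTy::getSupportedShapes();
  return std::find(shapes.begin(), shapes.end(), shape) != shapes.end();
}

// Usage: the query is keyed on the op type, not on an op instance.
inline void staticQueryExample() {
  assert(isShapeSupported<ToyBlockLoadOp>({16, 16}));
  assert(!isShapeSupported<ToyMmaOp>({16, 16}));
}
```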

mshahneo · 2025-08-20T19:50:06Z

Hi Rolf,

Thank you for bringing it up. I don't have any opposition to this in principle. However, I have a few concerns/discussion points:

The instructions (and their restrictions) have a scope (work-item, subgroup, workgroup). However, XeGPU ops operate in all three of those scopes. What would the interfaces look like for an XeGPU op that is not in the native scope of the instruction? For example, the DPAS instruction is natively subgroup-scoped. For an XeGPU op with workgroup scope, how would specific interfaces (e.g., getSupportedShapes) work? Would they return null, or something else?
The second concern is that both XeGPU and XeVM ops would have to implement the interfaces, which may add extra code. One way to resolve this would be to keep a common implementation of these interfaces in one place. In that vein, I would argue that the current setup could be that place. In follow-up PRs, we could make XeGPU/XeVM ops implement these interfaces but internally call the instruction-implemented versions. (@chencha3, @silee2, what is your opinion as XeGPU and XeVM code owners?)
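
A minimal C++17 sketch of that delegation idea, under stated assumptions: the op and instruction types below are hypothetical toys, and returning `std::nullopt` for a non-native scope is just one possible answer to the open question above, not a decision made in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

enum class Scope { WorkItem, Subgroup, Workgroup };
using Shape = std::pair<uint32_t, uint32_t>;

// Single shared implementation, playing the role of the Instruction struct.
// The shape value is a placeholder.
struct ToyDpasInstruction {
  static constexpr Scope nativeScope = Scope::Subgroup;
  static std::vector<Shape> getSupportedShapes() { return {Shape{8, 16}}; }
};

// Shapes are only defined at the native scope; otherwise "no answer".
inline std::optional<std::vector<Shape>> queryShapes(Scope opScope) {
  if (opScope != ToyDpasInstruction::nativeScope)
    return std::nullopt;
  return ToyDpasInstruction::getSupportedShapes();
}

// Both op flavors forward to the shared implementation instead of
// duplicating the tables.
struct ToyXeGPUDpasOp {
  Scope scope; // XeGPU ops may live at several scopes
  std::optional<std::vector<Shape>> getSupportedShapes() const {
    return queryShapes(scope);
  }
};
struct ToyXeVMMmaOp {
  // Assumed to always be subgroup-scoped in this sketch.
  std::optional<std::vector<Shape>> getSupportedShapes() const {
    return queryShapes(Scope::Subgroup);
  }
};

// Usage: a workgroup-scoped op gets no shapes, a subgroup-scoped op does.
inline void delegationExample() {
  assert(!ToyXeGPUDpasOp{Scope::Workgroup}.getSupportedShapes().has_value());
  assert(ToyXeGPUDpasOp{Scope::Subgroup}.getSupportedShapes().has_value());
}
```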

adam-smnk

High-level question: is there any info that gets pulled at runtime?
I imagine most of these hardware properties don't really change. Can't all this be static compile time info?
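
To make the question concrete, here is a small sketch of what fully static, compile-time hardware info could look like. The struct and function names are hypothetical; the numbers are copied from the `XeCoreInfo` constructor calls in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Plain aggregate: no strings, maps, or heap allocation.
struct XeCoreDesc {
  uint32_t numThreads;
  uint32_t sharedMemoryBytes;
  uint32_t numVectorUnits;
  uint32_t numMatrixUnits;
};

// Values taken from XeCoreInfo(8, SharedMemory(512 * 1024, 4), 8, 8) for PVC
// and XeCoreInfo(8, SharedMemory(256 * 1024, 4), 8, 8) for BMG.
constexpr XeCoreDesc kPvcXeCore{8, 512 * 1024, 8, 8};
constexpr XeCoreDesc kBmgXeCore{8, 256 * 1024, 8, 8};

enum class Arch { PVC, BMG };

constexpr const XeCoreDesc &getXeCore(Arch arch) {
  return arch == Arch::PVC ? kPvcXeCore : kBmgXeCore;
}

// The lookup can be evaluated entirely by the compiler.
static_assert(getXeCore(Arch::BMG).sharedMemoryBytes == 256 * 1024,
              "BMG shared memory size is known at compile time");

// Usage at runtime is an ordinary function call with no initialization step.
inline void staticInfoExample() {
  assert(getXeCore(Arch::PVC).numThreads == 8);
}
```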

adam-smnk · 2025-08-21T15:53:22Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+  virtual std::vector<uint32_t> getSupportedN(mlir::Type type) override;
+};
+
+namespace PVCuArch {


nit: I think we can skip this extra nested namespace

I was actually keeping this in case we decide to make specific versions of uArchs (e.g., different versions of BMG).

Agreed that the namespace nesting is becoming very deep.

Isn't inheritance enough to capture the different generations of uArch? E.g. all just living in ::mlir::xegpu::uarch.

adam-smnk · 2025-08-21T16:00:16Z

mlir/lib/Dialect/XeGPU/IR/XeGPUDialect.cpp

 #include <mlir/Dialect/XeGPU/IR/XeGPUAttrs.cpp.inc>
      >();
+
+  // Populate the uArchMap with the supported target devices


I don't think it belongs in the dialect initializer.
Besides that, let's start with a simple design and let users create uArch instances on demand.

I can remove this from the dialect initializer.
I kept it here because this way the uArch map is populated once the dialect is loaded, and any pass or the dialect can use it from there on. But the other option can also work, where a pass initializes the uArch map and uses it afterwards.

Any specific reason why it can't be in the dialect initializer?

Right now, uArch info is not strictly necessary for the dialect, i.e., you can still do a lot, like transforming ops and running generic transforms, without pulling in the hardware info. It makes me "pay the price" for something I don't need.
I can see that moving forward this might change, as a lot of passes might want to use it. But that's a somewhat premature decision.

Also, AFAIK the whole runtime initialization is the reason why you had to introduce mutexes into table builders. It adds extra complexity to both design and runtime without a clear payoff.

I see, makes sense.
Removed them.
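
The updated code is not shown in this excerpt. As a hedged sketch of the "create on demand" direction, function-local statics give lazy, one-time construction without an explicit mutex, because C++11 guarantees thread-safe initialization of such statics. `ToyUArch`, `getPvc`, `getBmg`, and `lookupUArch` are hypothetical names; the shared memory sizes are the ones used in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct ToyUArch {
  std::string name;
  uint32_t sharedMemoryBytes;
};

// Each description is built once, on first use. No global map and no
// user-managed locking are involved.
inline const ToyUArch &getPvc() {
  static const ToyUArch pvc{"pvc", 512 * 1024};
  return pvc;
}
inline const ToyUArch &getBmg() {
  static const ToyUArch bmg{"bmg", 256 * 1024};
  return bmg;
}

// Returns nullptr for unknown names; nothing is built until requested.
inline const ToyUArch *lookupUArch(const std::string &name) {
  if (name == "pvc")
    return &getPvc();
  if (name == "bmg")
    return &getBmg();
  return nullptr;
}

// Usage: a pass that needs hardware info asks for it; others pay nothing.
inline void onDemandExample() {
  const ToyUArch *bmg = lookupUArch("bmg");
  assert(bmg != nullptr && bmg->sharedMemoryBytes == 256 * 1024);
  assert(lookupUArch("unknown") == nullptr);
}
```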

mshahneo · 2025-08-27T04:09:28Z

High-level question: is there any info that gets pulled at runtime? I imagine most of these hardware properties don't really change. Can't all this be static compile time info?

Nothing in particular; it is all static info. I actually tried to follow the MLIR way. Since it is like a utility, I tried to follow the utility functions in different dialects. But I am open to change.

akroviakov

The choice of runtime vs static allocations and data structures can be optimized in the subsequent iterations.
The "string vs enum" is more of a structure/API decision, so it can be discussed in this PR. Is there any particular reason to favor strings over enums?

The uarch has some non-obvious structure (no_of_component for lower level, hierarchy level inclusion in cache info struct). Without any motivating example these seem overcomplicated and are likely to be replaced/moved anyway when a concrete use case appears.

How about a lightweight definition first (e.g., excluding num_matrix_units), but with at least one practical usage in any part of xegpu (e.g., dpas)? As you noted, there are quite a number of fields that can be added, so why not let use cases decide which fields should actually be added.

akroviakov · 2025-09-02T13:33:58Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+                {/* restrictions */}        // Optional: empty
+        ) {
+    // Initialize uArchHierarchy
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("thread", 0));


How shall one interpret no_of_component being 0? The end of hierarchy? Where is this "lower level component count" semantics required, that it is better than a direct correspondence (eliminating the gpu level)?

Removed it to make the design simpler.

akroviakov · 2025-09-02T13:36:21Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+
+struct Instruction {
+  // @TODO: Add more fields as needed
+  Instruction(std::string name, std::string desc)


How is the scope set?

Ah, missed it, fixed.
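
The actual fix is not shown in this excerpt. One plausible shape for it, sketched with hypothetical `InstructionSketch` and `DpasSketch` types, is to take the scope as a constructor argument so that `getScope()` never reads an uninitialized member. The subgroup scope for DPAS follows the discussion earlier in this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class InstructionScopeEnum { WorkItem, Subgroup, Workgroup, Cluster };

struct InstructionSketch {
  // The scope is a required constructor argument in this sketch.
  InstructionSketch(std::string name, std::string desc,
                    InstructionScopeEnum scope)
      : name(std::move(name)), description(std::move(desc)), scope(scope) {}
  virtual ~InstructionSketch() = default;

  const std::string &getName() const { return name; }
  const std::string &getDescription() const { return description; }
  InstructionScopeEnum getScope() const { return scope; }

protected:
  std::string name;
  std::string description;
  InstructionScopeEnum scope;
};

// DPAS is described in this thread as natively subgroup-scoped.
struct DpasSketch : InstructionSketch {
  DpasSketch()
      : InstructionSketch("dpas", "Dot Product Accumulate",
                          InstructionScopeEnum::Subgroup) {}
};

// Usage: the scope is always well defined after construction.
inline void scopeExample() {
  DpasSketch dpas;
  assert(dpas.getScope() == InstructionScopeEnum::Subgroup);
}
```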

akroviakov · 2025-09-02T13:38:03Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+
+protected:
+  uint32_t size;                 // size per register in bits
+  std::vector<std::string> mode; // e.g., "small", "large" GRF modes


Should these really be strings and not enums?

Agreed - if there's a fixed list of valid modes, just make them into an enum.

Thanks, using enums now.
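
The enum-based revision is not part of this excerpt. A minimal sketch of what it could look like, assuming a hypothetical `RegisterFileMode` enum and `RegisterFileSketch` struct, with the GRF numbers taken from the patch (64 * 1024 bits, 128 and 256 registers per thread):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// A fixed list of valid modes, so typos become compile errors rather than
// failed string lookups.
enum class RegisterFileMode { Small, Large };

struct RegisterFileSketch {
  uint32_t sizeInBits;
  std::map<RegisterFileMode, uint32_t> regsPerThread;

  // Throws std::out_of_range if the mode was not registered.
  uint32_t getNumRegsPerThread(RegisterFileMode mode) const {
    return regsPerThread.at(mode);
  }
};

inline RegisterFileSketch makeGrf() {
  return RegisterFileSketch{64 * 1024,
                            {{RegisterFileMode::Small, 128},
                             {RegisterFileMode::Large, 256}}};
}

// Usage: mode and register count are tied together by the map key.
inline void grfExample() {
  assert(makeGrf().getNumRegsPerThread(RegisterFileMode::Large) == 256);
}
```

Keying the register counts by mode also removes the implicit requirement that two parallel vectors stay in the same order.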

akroviakov · 2025-09-02T13:44:14Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+        ) {
+    // Initialize uArchHierarchy
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("thread", 0));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeCore", 8));


"XeCore"

String vs enums?

vs normal member

Removed it for simpler design.

akroviakov · 2025-09-02T14:05:13Z

mlir/lib/Dialect/XeGPU/uArch/IntelGpuXe2.cpp

+}
+
+std::vector<mlir::Type>
+DPASInstruction::getSupportedTypes(MLIRContext &context,


What is the motivating use case behind this method?

There are times when it is necessary to know the types but not the shape (e.g., XeGPU shapes are often forgiving, but one can use the supported types to check for early failure.)
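
As a small illustration of that early-failure use case, the sketch below rejects an operand by element type alone, before any shape is known. `ElemType`, `getSupportedATypes`, and `canLowerToMma` are hypothetical, and the type list is a placeholder rather than the real DPAS support table.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class ElemType { F16, BF16, F32, I8 };

// Placeholder list; real support depends on the target.
inline std::vector<ElemType> getSupportedATypes() {
  return {ElemType::F16, ElemType::BF16};
}

// Early, shape-independent check: fail fast if the element type can never
// be handled, regardless of how the shape is later tiled.
inline bool canLowerToMma(ElemType aType) {
  const std::vector<ElemType> types = getSupportedATypes();
  return std::find(types.begin(), types.end(), aType) != types.end();
}

// Usage: F16 passes the early check, I8 is rejected in this toy table.
inline void earlyCheckExample() {
  assert(canLowerToMma(ElemType::F16));
  assert(!canLowerToMma(ElemType::I8));
}
```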

rolfmorel · 2025-09-09T10:59:14Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+  virtual std::vector<uint32_t> getSupportedN(mlir::Type type) override;
+};
+
+namespace PVCuArch {


Agreed that the namespace nesting is becoming very deep.

Isn't inheritance enough to capture the different generations of uArch? E.g. all just living in ::mlir::xegpu::uarch.

rolfmorel · 2025-09-09T11:05:48Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+
+protected:
+  uint32_t size;                 // size per register in bits
+  std::vector<std::string> mode; // e.g., "small", "large" GRF modes


Agreed - if there's a fixed list of valid modes, just make them into an enum.

rolfmorel · 2025-09-09T11:06:33Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+// - the name of the uArch,
+// - the description of the uArch,
+// - uArch hierarchy
+// - Rgister File information


nit: Register

Reduced namespaces.
Using enums.
Fixed the typo.

rolfmorel · 2025-09-09T16:08:33Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+// An enum class to represent the scope of an instruction
+enum class InstructionScopeEnum { WorkItem, Subgroup, Workgroup, Cluster };
+
+// A struct to represent basic information about an instruction


Nit: /// and no empty line before struct definition?

Also elsewhere.

rolfmorel · 2025-09-11T12:07:41Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+    // L1 cache, XeCore level
+    this->cache_info.push_back(
+        CacheInfo(512 * 1024, 64, this->uArch_hierarchy[1]));
+    // L3 cache, XeStack level


I am probably missing some context, though why is L2 not added here as well?

This is due to GPU naming conventions: the last-level cache is often called L3 instead of L2.

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

rolfmorel · 2025-09-11T12:25:10Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+        ) {
+    // Initialize uArchHierarchy
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("thread", 0));
+    this->uArch_hierarchy.push_back(uArchHierarchyComponent("XeCore", 8));


vs normal member

adam-smnk

I still have concerns when it comes to wrapping fully static data into runtime structures.
It adds both complexity to the design and runtime overhead.

The choice of runtime vs static allocations and data structures can be optimized in the subsequent iterations.

Pragmatically speaking, this rarely happens.

Design choices aside, overall info and utilities provided look useful.
The exact APIs can be refined once we start integrating uArch info into passes. Then we'll see what works and what's missing.

It might be better to just take it for a test drive once a general cleanup is done.
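One way the static-versus-runtime concern might be addressed later is a compile-time table, sketched below. The field names are assumptions: the patch only shows `CacheInfo(512 * 1024, 64, ...)`, so treating the second argument as a line size is a guess, and `kXeCoreCaches` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct CacheInfo {
  uint32_t size;     // first CacheInfo(...) argument in the patch
  uint32_t lineSize; // assumed meaning of the second argument
};

// Compile-time table: no heap allocation and no constructor code at startup.
// The single entry mirrors the L1 values shown in the diff.
inline constexpr std::array<CacheInfo, 1> kXeCoreCaches{{{512 * 1024, 64}}};

// The data can be checked by the compiler instead of at run time.
static_assert(kXeCoreCaches[0].size == 512 * 1024, "unexpected L1 size");

inline void example() { assert(kXeCoreCaches[0].lineSize == 64); }
```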

adam-smnk · 2025-09-15T10:50:08Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+namespace uArch {
+// Architecture HW component hierarchy to present thread, core, socket ...
+struct uArchHierarchyComponent {
+  std::string name = ""; // optional name of the hierarchy component


I don't think this can be optional. Without some kind of identifier, these objects are meaningless.

Removed it for simpler design. Will add it later.

adam-smnk · 2025-09-15T10:50:26Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+  // no. of lower hierarchy component it contains, e.g., for PVC XeCore it
+  // contains 8 threads, so no_of_component=8
+  uint32_t no_of_component;
+  // Constructor


I'd also remove the default constructor.

Removed it for simpler design.

adam-smnk · 2025-09-15T11:22:09Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+};
+
+// struct to represent DPAS instruction
+struct DPASInstruction : public Instruction, public MMAInstructionInterface {


nit: It would be helpful to document somewhere which abstraction level this info corresponds to.
For methods like getSupportedShapes, whether it is subgroup or work-item level, etc.

The instruction itself actually contains the scope.
Sorry, it was missing in the previous iteration.
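As a rough sketch of "the instruction contains the scope": the enum below is copied from the patch, while the `Instruction` base and its `getScope` accessor are hypothetical, and modelling DPAS at subgroup scope is an assumption made for this example.

```cpp
#include <cassert>

// Enum as it appears in the diff under review.
enum class InstructionScopeEnum { WorkItem, Subgroup, Workgroup, Cluster };

// Hypothetical minimal base: every instruction records its own scope.
struct Instruction {
  explicit Instruction(InstructionScopeEnum scope) : scope(scope) {}
  virtual ~Instruction() = default;
  InstructionScopeEnum getScope() const { return scope; }

private:
  InstructionScopeEnum scope;
};

// Assumption: DPAS is described at subgroup scope.
struct DPASInstruction : Instruction {
  DPASInstruction() : Instruction(InstructionScopeEnum::Subgroup) {}
};

inline void example() {
  DPASInstruction dpas;
  // Callers can ask the instruction which level its shapes refer to.
  assert(dpas.getScope() == InstructionScopeEnum::Subgroup);
}
```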

adam-smnk · 2025-09-15T11:23:41Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+                               mlir::Type CType, mlir::Type DType) override;
+  virtual bool checkSupportedTypes(mlir::Type AType, mlir::Type BType,
+                                   mlir::Type CType, mlir::Type DType) override;
+  virtual bool validate(std::pair<uint32_t, uint32_t> AShape,


What's the difference between this and checkSupportedShapesAndTypes?

For this specific case, none.

We wanted to keep the 5 interfaces consistent across all the Instruction Interfaces. That's why it exists here.
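To make the "no difference for this case" answer concrete, here is a sketch in which `validate` simply forwards to `checkSupportedShapesAndTypes`. The signatures are simplified, `ElemType` stands in for `mlir::Type`, `DemoMMA` is hypothetical, and the rule it checks is generic matrix-multiply consistency, not the real DPAS constraints.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using Shape = std::pair<uint32_t, uint32_t>; // (rows, cols)
enum class ElemType { F16, F32 };            // stand-in for mlir::Type

// Two of the five common APIs, with simplified signatures.
struct MMAInstructionInterface {
  virtual ~MMAInstructionInterface() = default;
  virtual bool checkSupportedShapesAndTypes(Shape a, Shape b, Shape d,
                                            ElemType ab, ElemType acc) = 0;
  virtual bool validate(Shape a, Shape b, Shape d, ElemType ab,
                        ElemType acc) = 0;
};

struct DemoMMA : MMAInstructionInterface {
  bool checkSupportedShapesAndTypes(Shape a, Shape b, Shape d, ElemType ab,
                                    ElemType acc) override {
    // Illustrative rule: inner dimensions agree and D is (M x N).
    bool shapesAgree = a.second == b.first && d == Shape{a.first, b.second};
    bool typesAgree = ab == ElemType::F16 && acc == ElemType::F32;
    return shapesAgree && typesAgree;
  }

  // No extra constraints for this instruction, so validate just forwards.
  bool validate(Shape a, Shape b, Shape d, ElemType ab,
                ElemType acc) override {
    return checkSupportedShapesAndTypes(a, b, d, ab, acc);
  }
};
```

An instruction with additional constraints could override `validate` to add them on top of the shape and type check, which is one reason to keep both entry points in the interface.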

adam-smnk · 2025-09-15T11:33:17Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+  // Get methods
+  uint32_t getSize() const { return size; }
+
+  const std::vector<std::string> &getModes() const { return mode; }


nit: generally it's preferred to stick to LLVM's STL wrappers e.g., SmallVector instead of std::vector
I'd suggest replacing std::vector uses unless you have a specific reason to use std

adam-smnk · 2025-09-15T11:45:39Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+    // Add the instructions
+    auto dpas = std::make_shared<DPASInstruction>();
+    instructions.emplace(dpas->getName(), dpas);
+    // instructions[dpas->name] = dpas.get();


nit: remove dead code

Thanks, done.

adam-smnk · 2025-09-15T11:48:33Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+
+protected:
+  std::string name;
+  std::string description;


What's the use case for these descriptions? Is there anything that would need them at runtime?
Seems like standard code comments could replace these and make objects leaner.

I see, makes sense. Removed.

I can still see the member string, is this intended?

adam-smnk · 2025-09-15T11:50:00Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchInterfaces.h

@@ -0,0 +1,75 @@
+//===--- uArchInterfaces.h ---*- C++-*-===//


nit: header formatting

adam-smnk · 2025-09-15T11:59:46Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+#include <vector>
+
+namespace mlir {
+namespace xegpu {


nit: maybe it should live under xevm.

Otherwise, adding this info to the xevm dialect initialization breaks layering, because the lower-level dialect would then depend on the more abstract one. I'm not sure whether xegpu currently depends on xevm explicitly, but logically it does, so there's a risk of a circular dependency.

I understand the issue.
The problem with the XeVM dialect is that it is one of the target dialects and lives alongside the other leaf dialects. I didn't want to complicate that.

adam-smnk · 2025-09-15T12:05:44Z

@Jianhui-Li ping for review

mshahneo · 2025-09-25T16:41:49Z

Thank you so much, @adam-smnk , @akroviakov , @rolfmorel for your suggestions.
I tried to address your suggestions and concerns.
Please let me know how it looks.

Jianhui-Li

LGTM.

Jianhui-Li · 2025-09-26T18:49:00Z

mlir/include/mlir/Dialect/XeGPU/uArch/IntelGpuXe2.h

+namespace mlir {
+namespace xegpu {
+namespace uArch {
+struct XeCoreInfo {


nit: suggest moving this to uArchBase.h

Thanks, Jianhui, fixed.

mshahneo · 2025-10-06T16:00:22Z

Hi @adam-smnk , @akroviakov , @rolfmorel,

Ping for your final thoughts :).

akroviakov

LGTM

akroviakov · 2025-10-07T11:47:26Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+//
+// The information is represented as strings
+// For example, the information about an instruction can be represented as:
+// Instruction instr = {"dpas", "Dot Product Accumulate Systolic  (DPAS) is a


This example no longer appears to be valid, judging by the constructor.

Fixed. Thanks, Artem.

akroviakov · 2025-10-07T12:03:36Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+
+protected:
+  std::string name;
+  std::string description;


I can still see the member string, is this intended?

mshahneo · 2025-10-07T17:15:50Z

LGTM

Thanks, Artem, fixed your suggestions.

rolfmorel · 2025-10-07T19:53:22Z

mlir/include/mlir/Dialect/XeGPU/uArch/uArchBase.h

+  // @TODO: Add more instructions as needed
+};

+llvm::StringRef toString(InstructionKind name) {


Nit: would this make more sense as a method?

(Similarly, parseInstructionKind could be a static method.)

Matter of taste though. If there's precedent in the codebase, please ignore this comment.
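For comparison, the method-based variant suggested here might look like the sketch below. The `Instruction` wrapper, `getName`, and `parseKind` are hypothetical names, and `std::string_view`/`std::optional` stand in for the LLVM types so the example builds on its own.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class InstructionKind { DPAS };

struct Instruction {
  explicit constexpr Instruction(InstructionKind kind) : kind(kind) {}

  // Method form of the free function toString(InstructionKind).
  constexpr std::string_view getName() const {
    switch (kind) {
    case InstructionKind::DPAS:
      return "dpas";
    }
    return "unknown";
  }

  // Static-method form of parseInstructionKind; empty on unknown names.
  static constexpr std::optional<InstructionKind>
  parseKind(std::string_view name) {
    if (name == "dpas")
      return InstructionKind::DPAS;
    return std::nullopt;
  }

  InstructionKind kind;
};

inline void example() {
  // Name and kind round-trip through the two helpers.
  assert(Instruction::parseKind("dpas") == InstructionKind::DPAS);
}
```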

rolfmorel · 2025-10-07T20:00:43Z

This looks a fair bit simpler now - many thanks for iterating, @mshahneo ! I am happy for this to go in.

mshahneo requested a review from adam-smnk August 14, 2025 22:34

llvmbot added mlir:llvm mlir:gpu mlir labels Aug 14, 2025

mshahneo requested a review from rolfmorel August 14, 2025 22:35

mshahneo requested review from Jianhui-Li, chencha3, charithaintc and silee2 August 14, 2025 22:35

mshahneo force-pushed the uarch_definition_upstream_pr_1 branch from ed85437 to b1d37c0 Compare August 14, 2025 22:42

adam-smnk reviewed Aug 21, 2025

adam-smnk requested a review from rengolin August 21, 2025 16:02

Address review comments.

31e02af

akroviakov reviewed Sep 2, 2025

rolfmorel reviewed Sep 11, 2025

adam-smnk reviewed Sep 15, 2025

mshahneo added 6 commits September 24, 2025 17:10

Address review comments.

84e28a9

Simplify the design:
- Remove uArchHierarchyComponent.
- LLVMize names.
- Replace String usage with enum whenever possible.

Address review comments.

22dbba0

Simplify design: - Remove dialect initialization and necessary mechanism.

Address review comments.

82737ce

Move all the implementation to the .h file. Move uArchInterfaces to uArchBase.

Address review comments.

38ff707

Use LLVM data structures whenever possible.

Add/Remove some spacings.

e5a5ac6

Address review comments.

f1535d1

Jianhui-Li approved these changes Sep 26, 2025

akroviakov approved these changes Oct 7, 2025

mshahneo added 2 commits October 7, 2025 17:09

Address review comments.

3f4136a

Address review comments.

b50f200

Fix a small compile error.

140a3aa

rolfmorel reviewed Oct 7, 2025
