@Muzammiluddin-Syed-ECE
Contributor

@Muzammiluddin-Syed-ECE Muzammiluddin-Syed-ECE commented Mar 27, 2025

When performing cross-lane reductions using subgroup_reduce ops across contiguous lanes on AMD GPUs, lower to Data Parallel Primitives (DPP) ops when possible. This reduces latency on applicable devices.
See related Issue
To do:

  • Improve lowering to subgroup_reduce in compatible matvecs (these get directly lowered to gpu.shuffles in an earlier pass)
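
To make the idea concrete, here is a minimal plain C++ model of a cross-lane sum inside one 16-lane row. It is a sketch under stated assumptions, not the code in this patch: `combineWith` and `reduceWithinRow` are hypothetical names, lanes are just array slots, and the first two steps swap values inside pairs and quads (lane XOR 1, lane XOR 2) rather than shifting, so that every lane ends up with the full cluster sum. The last two steps mirror lanes within groups of 8 and 16, which is assumed to match how the `row_half_mirror` and `row_mirror` DPP modes pick their source lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRowSize = 16;
using Row = std::array<int, kRowSize>;

// One reduction step: each lane adds the value held by the lane that
// `sourceLane` selects for it. All reads see the values from before the step.
template <typename SourceFn>
Row combineWith(const Row &in, SourceFn sourceLane) {
  Row out{};
  for (std::size_t lane = 0; lane < kRowSize; ++lane)
    out[lane] = in[lane] + in[sourceLane(lane)];
  return out;
}

// Sum-reduce clusters of `clusterSize` contiguous lanes (2, 4, 8 or 16).
Row reduceWithinRow(Row values, unsigned clusterSize) {
  if (clusterSize >= 2) // swap neighbours inside each pair (model assumption)
    values = combineWith(values, [](std::size_t l) { return l ^ 1; });
  if (clusterSize >= 4) // swap pairs inside each quad (model assumption)
    values = combineWith(values, [](std::size_t l) { return l ^ 2; });
  if (clusterSize >= 8) // mirror each group of 8 lanes
    values = combineWith(values,
                         [](std::size_t l) { return (l & 8) | (7 - (l & 7)); });
  if (clusterSize >= 16) // mirror the whole row
    values = combineWith(values, [](std::size_t l) { return 15 - l; });
  return values;
}
```

Each step doubles the number of lanes that contribute to every partial sum, so a 16-lane cluster needs four steps instead of fifteen sequential additions.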

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Mar 27, 2025

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-backend-amdgpu

Author: Muzammil (Muzammiluddin-Syed-ECE)

Changes

[DRAFT]
See related Issue
We can better leverage DPP ops in the AMDGPU dialect when lowering subgroup reduce ops.

To this end this PR implements a new pass where we perform such a lowering.

To do:

  • Improve lowering to subgroup_reduce in compatible matvecs (these get directly lowered to gpu.shuffles in an earlier pass)
  • Add test for pass

Patch is 52.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133204.diff

10 Files Affected:

  • (added) mlir/include/mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h (+35)
  • (modified) mlir/include/mlir/Conversion/Passes.h (+1)
  • (modified) mlir/include/mlir/Conversion/Passes.td (+16)
  • (modified) mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp (+1)
  • (modified) mlir/lib/Conversion/CMakeLists.txt (+1)
  • (added) mlir/lib/Conversion/GPUToAMDGPU/CMakeLists.txt (+22)
  • (added) mlir/lib/Conversion/GPUToAMDGPU/GPUToAMDGPU.cpp (+203)
  • (modified) mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt (+1)
  • (modified) mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp (+161-1)
  • (added) mlir/test/Conversion/GPUToAMDGPU/gpu-to-amdgpu.mlir (+463)
diff --git a/mlir/include/mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h b/mlir/include/mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h
new file mode 100644
index 0000000000000..fea9b7ed50bcc
--- /dev/null
+++ b/mlir/include/mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h
@@ -0,0 +1,35 @@
+//===- GPUToAMDGPU.h - Convert GPU to AMDGPU dialect ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+#ifndef MLIR_CONVERSION_GPUTOAMDGPU_GPUTOAMDGPU_H_
+#define MLIR_CONVERSION_GPUTOAMDGPU_GPUTOAMDGPU_H_
+
+
+#include "mlir/IR/PatternMatch.h"
+#include <memory>
+#include <string>
+
+namespace mlir {
+
+class LLVMTypeConverter;
+class RewritePatternSet;
+class TypeConverter;
+class Pass;
+
+#define GEN_PASS_DECL_CONVERTGPUTOAMDGPUPASS
+#include "mlir/Conversion/Passes.h.inc"
+
+void populateSubgroupReduceLoweringPatterns(LLVMTypeConverter &converter,
+                                            RewritePatternSet &patterns,
+                                            unsigned subgroupSize,
+                                            PatternBenefit benefit);
+// void populateGPUToAMDGPUConversionPatterns(LLVMTypeConverter &converter,
+//                                             RewritePatternSet &patterns);
+
+} // namespace mlir
+
+#endif // MLIR_CONVERSION_GPUTOAMDGPU_GPUTOAMDGPU_H_
\ No newline at end of file
diff --git a/mlir/include/mlir/Conversion/Passes.h b/mlir/include/mlir/Conversion/Passes.h
index ccd862f67c068..1189423799092 100644
--- a/mlir/include/mlir/Conversion/Passes.h
+++ b/mlir/include/mlir/Conversion/Passes.h
@@ -34,6 +34,7 @@
 #include "mlir/Conversion/FuncToLLVM/ConvertFuncToLLVMPass.h"
 #include "mlir/Conversion/FuncToSPIRV/FuncToSPIRVPass.h"
 #include "mlir/Conversion/GPUCommon/GPUCommonPass.h"
+#include "mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h"
 #include "mlir/Conversion/GPUToLLVMSPV/GPUToLLVMSPVPass.h"
 #include "mlir/Conversion/GPUToNVVM/GPUToNVVMPass.h"
 #include "mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h"
diff --git a/mlir/include/mlir/Conversion/Passes.td b/mlir/include/mlir/Conversion/Passes.td
index bbba495e613b2..6a1deeb230794 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -643,6 +643,22 @@ def ConvertGPUToSPIRV : Pass<"convert-gpu-to-spirv", "ModuleOp"> {
   ];
 }
 
+//===----------------------------------------------------------------------===//
+// GPUToAMDGPU
+//===----------------------------------------------------------------------===//
+
+def ConvertGPUToAMDGPUPass : Pass<"convert-gpu-to-amdgpu"> {
+  let summary = "Generate AMDGPU operations for gpu operations";
+  let dependentDialects = [
+    "LLVM::LLVMDialect",
+    "::mlir::gpu::GPUDialect",
+    "amdgpu::AMDGPUDialect",
+  ];
+  let options = [Option<"subgroupSize", "subgroup-size", "unsigned",
+                        /*default=*/"64",
+                        "Size of subgroup">];
+}
+
 //===----------------------------------------------------------------------===//
 // ConvertIndexToLLVMPass
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
index 949424db7c4d6..5296f75571188 100644
--- a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+++ b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -1214,6 +1214,7 @@ struct ConvertAMDGPUToROCDLPass
   using Base::Base;
 
   void runOnOperation() override {
+    llvm::errs() << " WHEN DOES AMDGPU TO ROCDL RUN\n";
     MLIRContext *ctx = &getContext();
     FailureOr<Chipset> maybeChipset = Chipset::parse(chipset);
     if (failed(maybeChipset)) {
diff --git a/mlir/lib/Conversion/CMakeLists.txt b/mlir/lib/Conversion/CMakeLists.txt
index b6c21440c571c..b957a4473f1e6 100644
--- a/mlir/lib/Conversion/CMakeLists.txt
+++ b/mlir/lib/Conversion/CMakeLists.txt
@@ -24,6 +24,7 @@ add_subdirectory(FuncToEmitC)
 add_subdirectory(FuncToLLVM)
 add_subdirectory(FuncToSPIRV)
 add_subdirectory(GPUCommon)
+add_subdirectory(GPUToAMDGPU)
 add_subdirectory(GPUToLLVMSPV)
 add_subdirectory(GPUToNVVM)
 add_subdirectory(GPUToROCDL)
diff --git a/mlir/lib/Conversion/GPUToAMDGPU/CMakeLists.txt b/mlir/lib/Conversion/GPUToAMDGPU/CMakeLists.txt
new file mode 100644
index 0000000000000..9b82b5dc63d9c
--- /dev/null
+++ b/mlir/lib/Conversion/GPUToAMDGPU/CMakeLists.txt
@@ -0,0 +1,22 @@
+add_mlir_conversion_library(MLIRGPUToAMDGPU
+  GPUToAMDGPU.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/GPUToAMDGPU
+  
+  DEPENDS
+  MLIRConversionPassIncGen
+
+  LINK_COMPONENTS
+  Core
+
+  LINK_LIBS PUBLIC
+  MLIRLLVMCommonConversion
+  MLIRLLVMDialect
+  MLIRGPUDialect
+  MLIRAMDGPUDialect
+  MLIRAMDGPUUtils
+  MLIRROCDLDialect
+  MLIRPass
+  MLIRTransforms
+  )
diff --git a/mlir/lib/Conversion/GPUToAMDGPU/GPUToAMDGPU.cpp b/mlir/lib/Conversion/GPUToAMDGPU/GPUToAMDGPU.cpp
new file mode 100644
index 0000000000000..c2fc8b2e19ae6
--- /dev/null
+++ b/mlir/lib/Conversion/GPUToAMDGPU/GPUToAMDGPU.cpp
@@ -0,0 +1,203 @@
+//===- GPUToAMDGPU.cpp - GPU to AMDGPU dialect conversion -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Conversion/GPUToAMDGPU/GPUToAMDGPU.h"
+
+#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
+#include "mlir/Conversion/LLVMCommon/Pattern.h"
+#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
+#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
+#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
+#include "mlir/IR/BuiltinTypes.h"
+#include "mlir/IR/TypeUtilities.h"
+#include "mlir/Pass/Pass.h"
+
+#include "mlir/Conversion/GPUCommon/GPUCommonPass.h"
+#include "mlir/Dialect/GPU/IR/GPUDialect.h"
+#include "mlir/Dialect/Vector/IR/VectorOps.h"
+
+#include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MathExtras.h"
+#include <cassert>
+#include <cstdint>
+
+#include "../LLVMCommon/MemRefDescriptor.h"
+
+#include "llvm/ADT/STLExtras.h"
+#include <optional>
+
+namespace mlir {
+#define GEN_PASS_DEF_CONVERTGPUTOAMDGPUPASS
+#include "mlir/Conversion/Passes.h.inc"
+} // namespace mlir
+
+using namespace mlir;
+
+namespace {
+struct ClusterInfo {
+  unsigned clusterStride;
+  unsigned clusterSize;
+  unsigned subgroupSize;
+};
+
+static FailureOr<ClusterInfo>
+getAndValidateClusterInfo(gpu::SubgroupReduceOp op, unsigned subgroupSize) {
+  assert(llvm::isPowerOf2_32(subgroupSize));
+
+  std::optional<uint32_t> clusterSize = op.getClusterSize();
+  assert(!clusterSize ||
+         llvm::isPowerOf2_32(*clusterSize)); // Verifier should've caught this.
+  if (clusterSize && *clusterSize > subgroupSize)
+    return op.emitOpError()
+           << "cluster size " << *clusterSize
+           << " is greater than subgroup size " << subgroupSize;
+  unsigned effectiveClusterSize = clusterSize.value_or(subgroupSize);
+
+  auto clusterStride = op.getClusterStride();
+  assert(llvm::isPowerOf2_32(clusterStride)); // Verifier should've caught this.
+  if (clusterStride >= subgroupSize)
+    return op.emitOpError()
+           << "cluster stride " << clusterStride
+           << " is not less than subgroup size " << subgroupSize;
+
+  return ClusterInfo{clusterStride, effectiveClusterSize, subgroupSize};
+}
+
+Value createSubgroupDPPReduction(OpBuilder &b, Location loc, Value input,
+                                 gpu::AllReduceOperation mode,
+                                 const ClusterInfo &ci) {
+  Value result = input;
+  if (ci.clusterSize >= 2) {
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 1);
+    Value dppResult =
+        b.create<amdgpu::DPPOp>(loc, result.getType(), result, result,
+                                amdgpu::DPPPerm::row_shr, permArg);
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  if (ci.clusterSize >= 4) {
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 2);
+    Value dppResult =
+        b.create<amdgpu::DPPOp>(loc, result.getType(), result, result,
+                                amdgpu::DPPPerm::row_shr, permArg);
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  if (ci.clusterSize >= 8) {
+    Value dppResult = b.create<amdgpu::DPPOp>(
+        loc, result.getType(), result, result, amdgpu::DPPPerm::row_half_mirror,
+        b.getUnitAttr());
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  if (ci.clusterSize >= 16) {
+    Value dppResult =
+        b.create<amdgpu::DPPOp>(loc, result.getType(), result, result,
+                                amdgpu::DPPPerm::row_mirror, b.getUnitAttr());
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  if (ci.clusterSize >= 32) {
+    // auto permArg = builder.getInt32(15);
+    // auto rowMask = builder.getInt32("0xa");
+    // auto bankMask = builder.getInt32("0xf");
+    // auto boundCtrl = builder.getBoolAttr(false);
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 15);
+    Value dppResult = b.create<amdgpu::DPPOp>(
+        loc, result.getType(), result, result, amdgpu::DPPPerm::row_bcast_15,
+        b.getUnitAttr(), 10, 15, false);
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  if (ci.clusterSize == 64) {
+    // auto permArg = builder.getInt32(31);
+    // auto rowMask = builder.getInt32("0xc");
+    // auto bankMask = builder.getInt32("0xf");
+    // auto boundCtrl = builder.getBoolAttr(false);
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 31);
+    Value dppResult = b.create<amdgpu::DPPOp>(
+        loc, result.getType(), result, result, amdgpu::DPPPerm::row_bcast_31,
+        b.getUnitAttr(), 12, 15, false);
+    result = vector::makeArithReduction(b, loc, gpu::convertReductionKind(mode),
+                                        result, dppResult);
+  }
+
+  // // read lane 63 with the final result.
+  // auto lane = b.getIntegerAttr(b.getIntegerType(32), 63);
+  // result = b.create<ROCDL::ReadLaneOp>(loc, input.getType(), result, lane);
+  assert(result.getType() == input.getType());
+  return result;
+}
+
+struct ScalarSubgroupReduceToShuffles final
+    : OpRewritePattern<gpu::SubgroupReduceOp> {
+  ScalarSubgroupReduceToShuffles(MLIRContext *ctx, unsigned subgroupSize,
+                                 bool matchClustered, PatternBenefit benefit)
+      : OpRewritePattern(ctx, benefit), subgroupSize(subgroupSize),
+        matchClustered(matchClustered) {}
+
+  LogicalResult matchAndRewrite(gpu::SubgroupReduceOp op,
+                                PatternRewriter &rewriter) const override {
+    llvm::errs() << "ScalarSubgroupReduceToShuffles" << "\n";
+    if (op.getClusterSize().has_value() != matchClustered) {
+      return rewriter.notifyMatchFailure(
+          op, llvm::formatv("op is {0}clustered but pattern is configured to "
+                            "only match {1}clustered ops",
+                            matchClustered ? "non-" : "",
+                            matchClustered ? "" : "non-"));
+    }
+
+    auto ci = getAndValidateClusterInfo(op, subgroupSize);
+    if (failed(ci))
+      return failure();
+
+    Location loc = op.getLoc();
+    rewriter.replaceOp(op, createSubgroupDPPReduction(
+                               rewriter, loc, op.getValue(), op.getOp(), *ci));
+    return success();
+  }
+
+private:
+  unsigned subgroupSize = 0;
+  bool matchClustered = false;
+};
+
+struct ConvertGPUToAMDGPUPass
+    : public impl::ConvertGPUToAMDGPUPassBase<ConvertGPUToAMDGPUPass> {
+  using Base::Base;
+
+  void runOnOperation() override {
+    RewritePatternSet patterns(&getContext());
+    LLVMTypeConverter converter(&getContext());
+    LLVMConversionTarget target(getContext());
+    target.addLegalDialect<::mlir::LLVM::LLVMDialect>();
+    target.addLegalDialect<::mlir::amdgpu::AMDGPUDialect>();
+    target.addLegalDialect<::mlir::ROCDL::ROCDLDialect>();
+
+    int subgroupSizeInt = static_cast<int>(subgroupSize);
+    populateSubgroupReduceLoweringPatterns(converter, patterns, subgroupSizeInt,
+                                           PatternBenefit(1));
+    if (failed(applyPartialConversion(getOperation(), target,
+                                      std::move(patterns))))
+      signalPassFailure();
+  }
+};
+} // namespace
+
+void mlir::populateSubgroupReduceLoweringPatterns(
+    LLVMTypeConverter &converter, RewritePatternSet &patterns, unsigned subgroupSize, PatternBenefit benefit) {
+  patterns.add<ScalarSubgroupReduceToShuffles>(
+      patterns.getContext(), subgroupSize, /*matchClustered=*/true, benefit);
+}
\ No newline at end of file
diff --git a/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt b/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
index 945e3ccdfa87b..52484ac69a3e2 100644
--- a/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
+++ b/mlir/lib/Conversion/GPUToROCDL/CMakeLists.txt
@@ -15,6 +15,7 @@ add_mlir_conversion_library(MLIRGPUToROCDLTransforms
   MLIRMathToLLVM
   MLIRMathToROCDL
   MLIRAMDGPUToROCDL
+  MLIRGPUToAMDGPU
   MLIRFuncToLLVM
   MLIRGPUDialect
   MLIRGPUToGPURuntimeTransforms
diff --git a/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp b/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
index 43eff3eddcc49..0b553274eceb4 100644
--- a/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/SubgroupReduceLowering.cpp
@@ -11,10 +11,12 @@
 //===----------------------------------------------------------------------===//
 
 #include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/GPU/Transforms/Passes.h"
 #include "mlir/Dialect/GPU/Utils/GPUUtils.h"
 #include "mlir/Dialect/Vector/IR/VectorOps.h"
+#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
 #include "mlir/IR/BuiltinTypes.h"
 #include "mlir/IR/Location.h"
 #include "mlir/IR/PatternMatch.h"
@@ -24,6 +26,8 @@
 #include <cassert>
 #include <cstdint>
 
+#define DPP
+
 using namespace mlir;
 
 namespace {
@@ -188,6 +192,8 @@ Value createSubgroupShuffleReduction(OpBuilder &builder, Location loc,
                                      function_ref<Value(Value)> unpackFn) {
   // Lane value always stays in the original type. We use it to perform arith
   // reductions.
+  llvm::errs() << "Cluster Stride: " << ci.clusterStride << "\n";
+  llvm::errs() << "Cluster Size: " << ci.clusterSize << "\n";
   Value laneVal = input;
   // Parallel reduction using butterfly shuffles.
   for (unsigned i = ci.clusterStride; i < ci.clusterStride * ci.clusterSize;
@@ -206,6 +212,146 @@ Value createSubgroupShuffleReduction(OpBuilder &builder, Location loc,
   return laneVal;
 }
 
+#ifdef DPP
+Value createSubgroupDPPReduction(OpBuilder &b, Location loc,
+  Value input, gpu::AllReduceOperation mode,
+  const ClusterInfo &ci,
+  function_ref<Value(Value)> packFn,
+  function_ref<Value(Value)> unpackFn) {
+  llvm::errs() << "createSubgroupDPPReduction" << "\n";
+  Value result = input;
+  if (ci.clusterSize >= 2) {
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 1);
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_shr, permArg);
+    llvm::errs() << dppResult << " c 2 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+
+  if (ci.clusterSize >= 4) {
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 2);
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_shr, permArg);
+    llvm::errs() << dppResult << " c 4 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+
+  if (ci.clusterSize >= 8) {
+
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_half_mirror, b.getUnitAttr());
+    llvm::errs() << dppResult << " c 8 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+
+  if (ci.clusterSize >= 16) {
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_mirror, b.getUnitAttr());
+    llvm::errs() << dppResult << " c 16 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+
+  if (ci.clusterSize >= 32) {
+    // auto permArg = builder.getInt32(15);
+    // auto rowMask = builder.getInt32("0xa");
+    // auto bankMask = builder.getInt32("0xf");
+    // auto boundCtrl = builder.getBoolAttr(false);
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 15);
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_bcast_15, b.getUnitAttr(), 10, 15, false);
+    llvm::errs() << dppResult << " c 32 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+
+  if (ci.clusterSize == 64) {
+    // auto permArg = builder.getInt32(31);
+    // auto rowMask = builder.getInt32("0xc");
+    // auto bankMask = builder.getInt32("0xf");
+    // auto boundCtrl = builder.getBoolAttr(false);
+    auto permArg = b.getIntegerAttr(b.getIntegerType(32), 31);
+    Value dppResult = b.create<amdgpu::DPPOp>(loc, result.getType(), result, result, amdgpu::DPPPerm::row_bcast_31, b.getUnitAttr(), 12, 15, false);
+    llvm::errs() << dppResult << " c 64 \n";
+    result = vector::makeArithReduction(b, loc,
+      gpu::convertReductionKind(mode),
+      result, dppResult);
+  }
+  
+  // // read lane 63 with the final result. 
+  // auto lane = b.getIntegerAttr(b.getIntegerType(32), 63);
+  // result = b.create<ROCDL::ReadLaneOp>(loc, input.getType(), result, lane);  
+  assert(result.getType() == input.getType());
+  return result;
+}
+#endif
+
+// Value createSubgroupDPPReduction(OpBuilder &b, Location loc,
+//   Value input, gpu::AllReduceOperation mode,
+//   const ClusterInfo &ci,
+//   function_ref<Value(Value)> packFn,
+//   function_ref<Value(Value)> unpackFn) {
+
+//   Value result = input;
+//   if (ci.clusterSize >= 2) {
+//     auto permArg = b.getInt32(1);
+//     Value dppResult = builder.create<amdgpu::DPPOp>(packFn(result), packFn(result), amdgpu::DPPPerm::row_shr, permArg);
+//     result = vector::makeArithReduction(builder, loc,
+//       gpu::convertReductionKind(mode),
+//       result, unpackFn(dppResult));
+//   }
+
+//   if (ci.clusterSize >= 4) {
+//     auto permArg = builder.getInt32(2);
+//     Value dppResult = builder.create<amdgpu::DPPOp>(packFn(result), packFn(result), amdgpu::DPPPerm::row_shr, permArg);
+//     result = vector::makeArithReduction(builder, loc,
+//       gpu::convertReductionKind(mode),
+//       result, unpackFn(dppResult));
+//   }
+
+//   if (ci.clusterSize >= 8) {
+//     Value dppResult = builder.create<amdgpu::DPPOp>(packFn(result), packFn(result), amdgpu::DPPPerm::row_half_mirror);
+//     result = vector::makeArithReduction(builder, loc,
+//       gpu::convertReductionKind(mode),
+//       result, unpackFn(dppResult));
+//   }
+
+//   if (ci.clusterSize >= 16) {
+//     Value dppResult = builder.create<amdgpu::DPPOp>(packFn(result), packFn(result), amdgpu::DPPPerm::row_mirror);
+//     result = vector::makeArithReduction(builder, loc,
+//       gpu::convertReductionKind(mode),
+//       result, unpackFn(dppResult));
+//   }
+
+//   if (ci.clusterSize >= 32) {
+//     auto permArg = builder.getInt32(15);...
[truncated]

@Muzammiluddin-Syed-ECE Muzammiluddin-Syed-ECE marked this pull request as draft March 27, 2025 04:18
Contributor

@krzysz00 krzysz00 left a comment

I've got some notes here

@arsenm
Contributor

arsenm commented Mar 27, 2025

This doesn't belong in mlir. We have a separate PR adding more wave reduce intrinsics already

#126469 and related

@krzysz00
Contributor

@arsenm This is meant to implement https://github.com/GPUOpen-Drivers/llpc/blob/188bbf6a5b9403813e51d39f6fc8429550dbf267/lgc/builder/SubgroupBuilder.cpp#L570 - do the intrinsics you mention handle all those cases?

@kuhar
Member

kuhar commented Mar 28, 2025

> @arsenm This is meant to implement https://github.com/GPUOpen-Drivers/llpc/blob/188bbf6a5b9403813e51d39f6fc8429550dbf267/lgc/builder/SubgroupBuilder.cpp#L570 - do the intrinsics you mention handle all those cases?

+1, the gpu.subgroup_reduce seems to be more flexible around vector types and clusters (with strides): https://mlir.llvm.org/docs/Dialects/GPU/#gpusubgroup_reduce-gpusubgroupreduceop
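
As an illustration of what clusters with strides mean, the sketch below models the butterfly loop from `createSubgroupShuffleReduction` in the diff (offsets `clusterStride`, `2 * clusterStride`, ... below `clusterStride * clusterSize`) in plain C++. The function name `clusteredReduce` is hypothetical, the lane exchange is assumed to be XOR-style, and the reduction is fixed to integer addition for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum-reduce every cluster with a butterfly: at each step lane `l` adds the
// value of lane `l ^ offset`. Assumes lanes.size(), clusterSize and
// clusterStride are powers of two and that
// clusterStride * clusterSize <= lanes.size().
std::vector<int> clusteredReduce(std::vector<int> lanes, unsigned clusterSize,
                                 unsigned clusterStride) {
  for (unsigned offset = clusterStride; offset < clusterStride * clusterSize;
       offset <<= 1) {
    std::vector<int> next(lanes.size());
    for (std::size_t l = 0; l < lanes.size(); ++l)
      next[l] = lanes[l] + lanes[l ^ offset]; // partner differs in one bit
    lanes = next;
  }
  return lanes;
}
```

With 8 lanes holding 0..7, a cluster size of 4 and stride of 2 groups lanes {0, 2, 4, 6} and {1, 3, 5, 7}; with stride 1 the clusters are the contiguous groups {0..3} and {4..7}, which is the case the DPP lowering targets.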

@github-actions

github-actions bot commented Mar 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@arsenm
Contributor

arsenm commented Mar 28, 2025

> @arsenm This is meant to implement https://github.com/GPUOpen-Drivers/llpc/blob/188bbf6a5b9403813e51d39f6fc8429550dbf267/lgc/builder/SubgroupBuilder.cpp#L570 - do the intrinsics you mention handle all those cases?

The current use is just for backend internal uses, but the hope is to generalize it for other uses

Contributor

@krzysz00 krzysz00 left a comment

I assume tests for gpu-to-amdgpu are coming?

@Muzammiluddin-Syed-ECE Muzammiluddin-Syed-ECE changed the title Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs Mar 28, 2025
Contributor

@krzysz00 krzysz00 left a comment

... I've just realized I have a structural comment - that is, darn it, we might want to move the code again. If you look at LowerGpuOpsToNVVMOps.cpp, that implements the subgroup reduce lowering as part of the conversion to Nvidia-flavored LLVM IR.

Can you take a look and see why that pattern works / why we can't just stick this in LowerGPUOpsToROCDL?

@stefankoncarevic
Contributor

Are there plans to ensure compatibility with RDNA architectures, specifically Navi3 and Navi4, in this implementation?
If so, it's important to note that bcast_15 and bcast_31 are not supported for Navi3 and Navi4. Instead, you might want to consider using permlane as an alternative for these architectures.
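
For readers unfamiliar with the two row broadcasts, here is a plain C++ model of the last two steps of the 64-lane reduction in the diff. It is a sketch under assumptions: the function names are hypothetical, the reduction is fixed to integer addition, lanes in rows disabled by the row mask are left unchanged, and `row_bcast_15` / `row_bcast_31` are modeled as "lane 15 of each row feeds the next row" and "lane 31 feeds rows 2 and 3". The row masks 0xA and 0xC are the values 10 and 12 passed in the diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 64;
constexpr std::size_t kRowSize = 16;
using Wave = std::array<int, kLanes>;

// Model of row_bcast_15: the last lane of each row is added into every lane
// of the following row, but only for rows enabled in rowMask (bit r = row r).
Wave addRowBcast15(const Wave &in, unsigned rowMask) {
  Wave out = in;
  for (std::size_t lane = kRowSize; lane < kLanes; ++lane) {
    std::size_t row = lane / kRowSize;
    if (rowMask & (1u << row))
      out[lane] = in[lane] + in[row * kRowSize - 1];
  }
  return out;
}

// Model of row_bcast_31: lane 31 is added into every lane of rows 2 and 3,
// again gated by rowMask.
Wave addRowBcast31(const Wave &in, unsigned rowMask) {
  Wave out = in;
  for (std::size_t lane = 2 * kRowSize; lane < kLanes; ++lane) {
    std::size_t row = lane / kRowSize;
    if (rowMask & (1u << row))
      out[lane] = in[lane] + in[31];
  }
  return out;
}

// Input: every lane already holds the total of its own 16-lane row.
// Output: lane 63 holds the total of the whole 64-lane wave.
Wave combineRows(const Wave &perRowTotals) {
  Wave w = addRowBcast15(perRowTotals, 0xA); // rows 1 and 3 enabled
  return addRowBcast31(w, 0xC);              // rows 2 and 3 enabled
}
```

Only the last lane is guaranteed to hold the full sum in this model, which is why a target without these broadcasts needs a different cross-row step.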

@krzysz00
Contributor

krzysz00 commented Apr 2, 2025

Specific note: looking at the device libraries, they use the share(0) permutation and then specifically permlanex16 to get the row broadcasts (and might shift left instead of right)

@Muzammiluddin-Syed-ECE
Contributor Author

Muzammiluddin-Syed-ECE commented Apr 2, 2025

> ... I've just realized I have a structural comment - that is, darn it, we might want to move the code again. If you look at LowerGpuOpsToNVVMOps.cpp, that implements the subgroup reduce lowering as part of the conversion to Nvidia-flavored LLVM IR.
>
> Can you take a look and see why that pattern works / why we can't just stick this in LowerGPUOpsToROCDL?

Ok so after actually running a few sizes of MatVecs, I see that it runs into the same issue as our pass of "ExpandGPUOps" decomposing the subgroup_reduce before it can make it to these passes.

So, in conclusion, why does that pattern work? It doesn't...

@Muzammiluddin-Syed-ECE
Contributor Author

> Specific note: looking at the device libraries, they use the share(0) permutation and then specifically permlanex16 to get the row broadcasts (and might shift left instead of right)

I can't seem to find an equivalent op to permlanex16 defined in the ROCDL or AMDGPU dialects in mlir, should I be using the intrinsics from llvm here instead?

@krzysz00
Contributor

krzysz00 commented Apr 2, 2025

> Ok so after actually running a few sizes of MatVecs, I see that it runs into the same issue as our pass of "ExpandGPUOps" decomposing the subgroup_reduce before it can make it to these passes.
>
> So, in conclusion, why does that pattern work? It doesn't...

I think this means IREE's ExpandGPUOps needs to be fixed to run this pattern before the expansion to shuffles, or with a higher benefit than it.
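
The "higher benefit" point can be illustrated with a toy model in plain C++. This is not MLIR's pattern driver: `ToyPattern` and `pickPattern` are hypothetical names, and the only behavior modeled is that, among patterns that match an op, the one with the larger benefit is tried first.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A stand-in for a rewrite pattern: a name, a benefit, and a match predicate
// over the op's cluster size.
struct ToyPattern {
  std::string name;
  unsigned benefit;
  std::function<bool(unsigned)> matches;
};

// Return the name of the matching pattern with the highest benefit,
// or an empty string if nothing matches.
std::string pickPattern(std::vector<ToyPattern> patterns,
                        unsigned clusterSize) {
  std::stable_sort(patterns.begin(), patterns.end(),
                   [](const ToyPattern &a, const ToyPattern &b) {
                     return a.benefit > b.benefit;
                   });
  for (const ToyPattern &p : patterns)
    if (p.matches(clusterSize))
      return p.name;
  return "";
}
```

If the DPP pattern is registered with a benefit above the shuffle expansion, it wins whenever it applies, and the shuffle expansion still handles everything it rejects.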

> I can't seem to find an equivalent op to permlanex16 defined in the ROCDL or AMDGPU dialects in mlir, should I be using the intrinsics from llvm here instead?

You'll at the very least want to add rocdl.permlanex16 if it doesn't exist - and you may want amdgpu.permlanex16 if there are bitcasts / splitting up vectors required

Contributor

@krzysz00 krzysz00 left a comment

Overall:

We need - and this is probably calling out for making a separate PR just for permlanex16 - tests in mlir/test/Dialect/LLVMIR/rocdl.mlir and mlir/test/Target/LLVMIR/rocdl.mlir for permlanex16

Looking at the Nvidia code, they're testing their equivalent of this pattern using the transform dialect - see mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir. While we aren't doing a conversion pattern and so we can't just copy them, defining a transform dialect op for these rewrites may be a good idea.

Signed-off-by: Muzammiluddin Syed <[email protected]>
Signed-off-by: Muzammiluddin Syed <[email protected]>
Signed-off-by: Muzammiluddin Syed <[email protected]>
Signed-off-by: Muzammiluddin Syed <[email protected]>
Signed-off-by: Muzammiluddin Syed <[email protected]>
Contributor

@krzysz00 krzysz00 left a comment

Overall seems to be in a good state, just have one thing I want to check before I approve this

Signed-off-by: Muzammiluddin Syed <[email protected]>
Member

@kuhar kuhar left a comment

LGTM but please wait for @krzysz00's approval before landing

Signed-off-by: Muzammiluddin Syed <[email protected]>
Contributor

@krzysz00 krzysz00 left a comment

LGTM, thanks for all the work on this

@pashu123 pashu123 merged commit 905f1d8 into llvm:main Apr 24, 2025
11 checks passed
@github-actions

@Muzammiluddin-Syed-ECE Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Collaborator

llvm-ci commented Apr 24, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-8-cmake-build-only running on rocm-docker-rhel-8 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/204/builds/7329

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7645/7780] Linking CXX shared library lib/libclang-cpp.so.21.0git
[7646/7780] Creating library symlink lib/libclang-cpp.so
[7647/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestSPIRVCPURunnerPipeline.cpp.o
[7648/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o
[7649/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o
[7650/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o
[7651/7780] Building CXX object tools/mlir/test/lib/Dialect/GPU/CMakeFiles/MLIRGPUTestPasses.dir/TestGpuRewrite.cpp.o
[7652/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o
[7653/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestVulkanRunnerPipeline.cpp.o
[7654/7780] Linking CXX shared library lib/libMLIRGPUTransforms.so.21.0git
FAILED: lib/libMLIRGPUTransforms.so.21.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRGPUTransforms.so.21.0git -o lib/libMLIRGPUTransforms.so.21.0git tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AllReduceLowering.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AsyncRegionRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/BufferDeallocationOpInterfaceImpl.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/DecomposeMemRefs.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/EliminateBarriers.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/GlobalIdRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/MemoryPromotion.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ParallelLoopMapper.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o 
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ShuffleRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SPIRVAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/lib:"  lib/libMLIRAffineUtils.so.21.0git  lib/libMLIRAsyncDialect.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRGPUUtils.so.21.0git  lib/libMLIRIndexDialect.so.21.0git  lib/libMLIRNVVMTarget.so.21.0git  lib/libMLIRSPIRVTarget.so.21.0git  lib/libMLIRROCDLTarget.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRAffineAnalysis.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRNVVMToLLVMIRTranslation.so.21.0git  lib/libMLIRSPIRVSerialization.so.21.0git  lib/libMLIRSPIRVDialect.so.21.0git  lib/libMLIRSPIRVImageInterfaces.so.21.0git  lib/libMLIRSPIRVBinaryUtils.so.21.0git  lib/libMLIRGPUDialect.so.21.0git  lib/libMLIRTargetLLVM.so.21.0git  lib/libMLIRExecutionEngineUtils.so.21.0git  lib/libMLIRROCDLToLLVMIRTranslation.so.21.0git  lib/libMLIRTargetLLVMIRExport.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRTranslateLib.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRLLVMIRTransforms.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRSubsetOpInterface.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  lib/libMLIRROCDLDialect.so.21.0git  
lib/libMLIRLLVMDialect.so.21.0git  lib/libLLVMAMDGPU
ibLLVMLinker.so.21.0git  lib/libLLVMFrontendOpenMP.so.21.0git  lib/libLLVMFrontendOffloading.so.21.0git  lib/libLLVMTarget.so.21.0git  lib/libLLVMTransformUtils.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMAMDGPUDesc.so.21.0git  lib/libLLVMAnalysis.so.21.0git  lib/libLLVMMCParser.so.21.0git  lib/libLLVMIRReader.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMAMDGPUInfo.so.21.0git  lib/libLLVMMC.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libLLVMTargetParser.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.21.0git  lib/libMLIRShapedOpInterfaces.so.21.0git  lib/libMLIRUBDialect.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIRValueBoundsOpInterface.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libMLIRSupport.so.21.0git  -lpthread  lib/libLLVMSupport.so.21.0git  -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/lib && :
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o: In function `(anonymous namespace)::ScalarSubgroupReduceToDPP::matchAndRewrite(mlir::gpu::SubgroupReduceOp, mlir::PatternRewriter&) const':
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x2f0): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x33f): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x376): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x418): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x46c): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x4a3): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x533): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x58d): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x5c4): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x653): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x6ad): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x6e4): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xb0f): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xb62): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xb99): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xc2a): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xc7f): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xcb6): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
collect2: error: ld returned 1 exit status
[7655/7780] Building CXX object tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o
[7656/7780] Building CXX object tools/mlir/lib/Conversion/GPUToROCDL/CMakeFiles/obj.MLIRGPUToROCDLTransforms.dir/LowerGpuOpsToROCDLOps.cpp.o
[7657/7780] Building CXX object tools/mlir/lib/Dialect/SparseTensor/Pipelines/CMakeFiles/obj.MLIRSparseTensorPipelines.dir/SparseTensorPipelines.cpp.o
[7658/7780] Building CXX object tools/mlir/lib/Conversion/GPUCommon/CMakeFiles/obj.MLIRGPUToGPURuntimeTransforms.dir/GPUToLLVMConversion.cpp.o
[7659/7780] Building CXX object tools/mlir/lib/Dialect/GPU/TransformOps/CMakeFiles/obj.MLIRGPUTransformOps.dir/GPUTransformOps.cpp.o
[7660/7780] Building CXX object tools/mlir/lib/Conversion/GPUToNVVM/CMakeFiles/obj.MLIRGPUToNVVMTransforms.dir/LowerGpuOpsToNVVMOps.cpp.o
[7661/7780] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
[7662/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7663/7780] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7664/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7665/7780] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
ninja: build stopped: subcommand failed.
['ninja'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py", line 50, in step
Step 7 (build cmake config) failure: build cmake config (failure)

@llvm-ci
Collaborator

llvm-ci commented Apr 24, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-ubuntu-22-cmake-build-only running on rocm-docker-ubu-22 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/203/builds/8516

Here is the relevant piece of the build log for reference:
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7646/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o
[7647/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestSPIRVCPURunnerPipeline.cpp.o
[7648/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o
[7649/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o
[7650/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o
[7651/7780] Building CXX object tools/mlir/test/lib/Dialect/GPU/CMakeFiles/MLIRGPUTestPasses.dir/TestGpuRewrite.cpp.o
[7652/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestVulkanRunnerPipeline.cpp.o
[7653/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o
[7654/7780] Building CXX object tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o
[7655/7780] Linking CXX shared library lib/libMLIRGPUTransforms.so.21.0git
FAILED: lib/libMLIRGPUTransforms.so.21.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRGPUTransforms.so.21.0git -o lib/libMLIRGPUTransforms.so.21.0git tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AllReduceLowering.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AsyncRegionRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/BufferDeallocationOpInterfaceImpl.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/DecomposeMemRefs.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/EliminateBarriers.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/GlobalIdRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/MemoryPromotion.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ParallelLoopMapper.cpp.o 
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ShuffleRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SPIRVAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib:"  lib/libMLIRAffineUtils.so.21.0git  lib/libMLIRAsyncDialect.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRGPUUtils.so.21.0git  lib/libMLIRIndexDialect.so.21.0git  lib/libMLIRNVVMTarget.so.21.0git  lib/libMLIRSPIRVTarget.so.21.0git  lib/libMLIRROCDLTarget.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRAffineAnalysis.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRNVVMToLLVMIRTranslation.so.21.0git  lib/libMLIRSPIRVSerialization.so.21.0git  lib/libMLIRSPIRVDialect.so.21.0git  lib/libMLIRSPIRVImageInterfaces.so.21.0git  lib/libMLIRSPIRVBinaryUtils.so.21.0git  lib/libMLIRGPUDialect.so.21.0git  lib/libMLIRTargetLLVM.so.21.0git  lib/libMLIRExecutionEngineUtils.so.21.0git  lib/libMLIRROCDLToLLVMIRTranslation.so.21.0git  lib/libMLIRTargetLLVMIRExport.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRTranslateLib.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRLLVMIRTransforms.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRSubsetOpInterface.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  
lib/libMLIRRuntimeVerifiableOpInterface.so.21.0g
LLVMAMDGPUCodeGen.so.21.0git  lib/libLLVMPasses.so.21.0git  lib/libLLVMipo.so.21.0git  lib/libLLVMLinker.so.21.0git  lib/libLLVMFrontendOpenMP.so.21.0git  lib/libLLVMFrontendOffloading.so.21.0git  lib/libLLVMTarget.so.21.0git  lib/libLLVMTransformUtils.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMAMDGPUDesc.so.21.0git  lib/libLLVMAnalysis.so.21.0git  lib/libLLVMMCParser.so.21.0git  lib/libLLVMIRReader.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMAMDGPUInfo.so.21.0git  lib/libLLVMMC.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libLLVMTargetParser.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.21.0git  lib/libMLIRShapedOpInterfaces.so.21.0git  lib/libMLIRUBDialect.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIRValueBoundsOpInterface.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libLLVMSupport.so.21.0git  
-Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o: in function `(anonymous namespace)::ScalarSubgroupReduceToDPP::matchAndRewrite(mlir::gpu::SubgroupReduceOp, mlir::PatternRewriter&) const':
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x3a2): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x411): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x454): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x566): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x62c): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x69c): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x6d3): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x767): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x7d7): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x80e): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xde3): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xe50): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xe87): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xf0d): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xf7a): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xfb1): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
collect2: error: ld returned 1 exit status
[7656/7780] Building CXX object tools/mlir/lib/Conversion/GPUToROCDL/CMakeFiles/obj.MLIRGPUToROCDLTransforms.dir/LowerGpuOpsToROCDLOps.cpp.o
[7657/7780] Building CXX object tools/mlir/lib/Dialect/SparseTensor/Pipelines/CMakeFiles/obj.MLIRSparseTensorPipelines.dir/SparseTensorPipelines.cpp.o
[7658/7780] Building CXX object tools/mlir/lib/Conversion/GPUCommon/CMakeFiles/obj.MLIRGPUToGPURuntimeTransforms.dir/GPUToLLVMConversion.cpp.o
[7659/7780] Building CXX object tools/mlir/lib/Dialect/GPU/TransformOps/CMakeFiles/obj.MLIRGPUTransformOps.dir/GPUTransformOps.cpp.o
[7660/7780] Building CXX object tools/mlir/lib/Conversion/GPUToNVVM/CMakeFiles/obj.MLIRGPUToNVVMTransforms.dir/LowerGpuOpsToNVVMOps.cpp.o
[7661/7780] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
[7662/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7663/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7664/7780] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7665/7780] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
ninja: build stopped: subcommand failed.
['ninja'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py", line 50, in step
    yield
  File "/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py", line 41, in main
    run_command(["ninja"])
Step 7 (build cmake config) failure: build cmake config (failure)
...
[7646/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o
[7647/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestSPIRVCPURunnerPipeline.cpp.o
[7648/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o
[7649/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o
[7650/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o
[7651/7780] Building CXX object tools/mlir/test/lib/Dialect/GPU/CMakeFiles/MLIRGPUTestPasses.dir/TestGpuRewrite.cpp.o
[7652/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestVulkanRunnerPipeline.cpp.o
[7653/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o
[7654/7780] Building CXX object tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o
[7655/7780] Linking CXX shared library lib/libMLIRGPUTransforms.so.21.0git
FAILED: lib/libMLIRGPUTransforms.so.21.0git 

@llvm-ci
Collaborator

llvm-ci commented Apr 24, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-9-cmake-build-only running on rocm-docker-rhel-9 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/205/builds/7307

Here is the relevant piece of the build log for reference:
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7644/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/EliminateBarriers.cpp.o
[7645/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/DecomposeMemRefs.cpp.o
[7646/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o
[7647/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestSPIRVCPURunnerPipeline.cpp.o
[7648/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o
[7649/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o
[7650/7780] Building CXX object tools/mlir/test/lib/Dialect/GPU/CMakeFiles/MLIRGPUTestPasses.dir/TestGpuRewrite.cpp.o
[7651/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o
[7652/7780] Building CXX object tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o
[7653/7780] Linking CXX shared library lib/libMLIRGPUTransforms.so.21.0git
FAILED: lib/libMLIRGPUTransforms.so.21.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRGPUTransforms.so.21.0git -o lib/libMLIRGPUTransforms.so.21.0git tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AllReduceLowering.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/AsyncRegionRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/BufferDeallocationOpInterfaceImpl.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/DecomposeMemRefs.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/EliminateBarriers.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/GlobalIdRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/KernelOutlining.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/MemoryPromotion.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ModuleToBinary.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/NVVMAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ParallelLoopMapper.cpp.o 
tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ROCDLAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/ShuffleRewriter.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SPIRVAttachTarget.cpp.o tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/lib:"  lib/libMLIRAffineUtils.so.21.0git  lib/libMLIRAsyncDialect.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRGPUUtils.so.21.0git  lib/libMLIRIndexDialect.so.21.0git  lib/libMLIRNVVMTarget.so.21.0git  lib/libMLIRSPIRVTarget.so.21.0git  lib/libMLIRROCDLTarget.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRAffineAnalysis.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRNVVMToLLVMIRTranslation.so.21.0git  lib/libMLIRSPIRVSerialization.so.21.0git  lib/libMLIRSPIRVDialect.so.21.0git  lib/libMLIRSPIRVImageInterfaces.so.21.0git  lib/libMLIRSPIRVBinaryUtils.so.21.0git  lib/libMLIRGPUDialect.so.21.0git  lib/libMLIRTargetLLVM.so.21.0git  lib/libMLIRExecutionEngineUtils.so.21.0git  lib/libMLIRROCDLToLLVMIRTranslation.so.21.0git  lib/libMLIRTargetLLVMIRExport.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRTranslateLib.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRLLVMIRTransforms.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRSubsetOpInterface.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  
lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  li
DGPUCodeGen.so.21.0git  lib/libLLVMPasses.so.21.0git  lib/libLLVMipo.so.21.0git  lib/libLLVMLinker.so.21.0git  lib/libLLVMFrontendOpenMP.so.21.0git  lib/libLLVMFrontendOffloading.so.21.0git  lib/libLLVMTarget.so.21.0git  lib/libLLVMTransformUtils.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMAMDGPUDesc.so.21.0git  lib/libLLVMAnalysis.so.21.0git  lib/libLLVMMCParser.so.21.0git  lib/libLLVMIRReader.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMAMDGPUInfo.so.21.0git  lib/libLLVMMC.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libLLVMTargetParser.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.21.0git  lib/libMLIRShapedOpInterfaces.so.21.0git  lib/libMLIRUBDialect.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIRValueBoundsOpInterface.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libLLVMSupport.so.21.0git  
-Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/GPU/CMakeFiles/obj.MLIRGPUTransforms.dir/Transforms/SubgroupReduceLowering.cpp.o: in function `(anonymous namespace)::ScalarSubgroupReduceToDPP::matchAndRewrite(mlir::gpu::SubgroupReduceOp, mlir::PatternRewriter&) const':
SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x3e1): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x452): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x489): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x59c): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x660): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x6d7): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x70e): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x7a1): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x818): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0x84f): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xdac): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xe19): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xe50): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xedf): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xf4a): undefined reference to `mlir::amdgpu::DPPOp::build(mlir::OpBuilder&, mlir::OperationState&, mlir::Type, mlir::Value, mlir::Value, mlir::amdgpu::DPPPerm, mlir::Attribute, unsigned int, unsigned int, bool)'
/usr/bin/ld: SubgroupReduceLowering.cpp:(.text._ZNK12_GLOBAL__N_125ScalarSubgroupReduceToDPP15matchAndRewriteEN4mlir3gpu16SubgroupReduceOpERNS1_15PatternRewriterE+0xf81): undefined reference to `mlir::detail::TypeIDResolver<mlir::amdgpu::DPPOp, void>::id'
collect2: error: ld returned 1 exit status
[7654/7780] Building CXX object tools/mlir/test/lib/Pass/CMakeFiles/MLIRTestPass.dir/TestVulkanRunnerPipeline.cpp.o
[7655/7780] Building CXX object tools/mlir/lib/Dialect/GPU/Pipelines/CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o
[7656/7780] Building CXX object tools/mlir/lib/Conversion/GPUToROCDL/CMakeFiles/obj.MLIRGPUToROCDLTransforms.dir/LowerGpuOpsToROCDLOps.cpp.o
[7657/7780] Building CXX object tools/mlir/lib/Dialect/SparseTensor/Pipelines/CMakeFiles/obj.MLIRSparseTensorPipelines.dir/SparseTensorPipelines.cpp.o
[7658/7780] Building CXX object tools/mlir/lib/Conversion/GPUCommon/CMakeFiles/obj.MLIRGPUToGPURuntimeTransforms.dir/GPUToLLVMConversion.cpp.o
[7659/7780] Building CXX object tools/mlir/lib/Dialect/GPU/TransformOps/CMakeFiles/obj.MLIRGPUTransformOps.dir/GPUTransformOps.cpp.o
[7660/7780] Building CXX object tools/mlir/lib/Conversion/GPUToNVVM/CMakeFiles/obj.MLIRGPUToNVVMTransforms.dir/LowerGpuOpsToNVVMOps.cpp.o
[7661/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7662/7780] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7663/7780] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
[7664/7780] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7665/7780] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
ninja: build stopped: subcommand failed.
['ninja'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py", line 50, in step
    yield
Step 7 (build cmake config) failure: build cmake config (failure)

@Kewen12
Contributor

Kewen12 commented Apr 24, 2025

Hi, this PR breaks our buildbots. Could you please fix it?

FAILED: lib/libMLIRGPUTransforms.so.21.0git

@Muzammiluddin-Syed-ECE
Contributor Author

Addressed in #137107
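
For context: every undefined reference in the log above is to an `mlir::amdgpu::DPPOp` symbol (`TypeIDResolver<...>::id` or `DPPOp::build`), and the failing link command passes `-Wl,-z,defs`, which rejects unresolved symbols when building a shared library. That pattern usually means the library defining the op is missing from the target's link dependencies. The contents of #137107 are not shown in this thread, so the fragment below is only a sketch of what such a fix typically looks like. It assumes the target is declared with `add_mlir_dialect_library` and that the AMDGPU dialect library is named `MLIRAMDGPUDialect`; the file path is assumed, and the existing source and library entries are elided rather than reproduced from the actual file.

```cmake
# Hypothetical fragment of mlir/lib/Dialect/GPU/CMakeLists.txt (assumed path).
add_mlir_dialect_library(MLIRGPUTransforms
  # ... existing sources elided ...
  Transforms/SubgroupReduceLowering.cpp

  LINK_LIBS PUBLIC
  # ... existing link libraries elided ...
  MLIRAMDGPUDialect  # assumed name; would provide mlir::amdgpu::DPPOp
  MLIRGPUDialect
)
```

A static build can hide this kind of omission, because symbols are resolved only when the final executable is linked. That is probably why the failure showed up on a bot that builds `lib/libMLIRGPUTransforms.so`. Configuring with `-DBUILD_SHARED_LIBS=ON` and building the `MLIRGPUTransforms` target should be enough to check a dependency change like this locally.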

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…MD GPUs (llvm#133204)

When performing cross-lane reductions using subgroup_reduce ops across
contiguous lanes on AMD GPUs, lower to Data Parallel Primitives (DPP)
ops when possible. This reduces latency on applicable devices.
See related [Issue](iree-org/iree#20007)
To do:
- Improve lowering to subgroup_reduce in compatible matvecs (these get
directly lowered to gpu.shuffles in an earlier pass)

---------

Signed-off-by: Muzammiluddin Syed <[email protected]>