Skip to content

Conversation

@jroelofs
Copy link
Contributor

Transposes are only inverses of each other when they have matching shapes.

rdar://145592582

Transposes are only inverses of each other when they have matching shapes.

rdar://145592582
@llvmbot
Copy link
Member

llvmbot commented Apr 11, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Jon Roelofs (jroelofs)

Changes

Transposes are only inverses of each other when they have matching shapes.

rdar://145592582


Full diff: https://github.com/llvm/llvm-project/pull/135397.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp (+3-2)
  • (added) llvm/test/Transforms/LowerMatrixIntrinsics/transpose-fold.ll (+35)
diff --git a/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp b/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
index ab16ec77be105..60619dbe2f589 100644
--- a/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
+++ b/llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
@@ -804,9 +804,10 @@ class LowerMatrixIntrinsics {
                        m_Value(TA), m_ConstantInt(R), m_ConstantInt(C))))
       return nullptr;
 
-    // Transpose of a transpose is a nop
+    // Transpose of a transpose is a nop when the shapes match.
     Value *TATA;
-    if (match(TA, m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(TATA)))) {
+    if (match(TA, m_Intrinsic<Intrinsic::matrix_transpose>(
+                      m_Value(TATA), m_Specific(C), m_Specific(R)))) {
       updateShapeAndReplaceAllUsesWith(I, TATA);
       eraseFromParentAndMove(&I, II, BB);
       eraseFromParentAndMove(TA, II, BB);
diff --git a/llvm/test/Transforms/LowerMatrixIntrinsics/transpose-fold.ll b/llvm/test/Transforms/LowerMatrixIntrinsics/transpose-fold.ll
new file mode 100644
index 0000000000000..ef2e224f69c5e
--- /dev/null
+++ b/llvm/test/Transforms/LowerMatrixIntrinsics/transpose-fold.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='lower-matrix-intrinsics' -S < %s | FileCheck %s
+
+; We can only fold matching transposes.
+
+define void @reshape(ptr %in, ptr %out) {
+; CHECK-LABEL: define void @reshape(
+; CHECK-SAME: ptr [[IN:%.*]], ptr [[OUT:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[COL_LOAD:%.*]] = load <4 x double>, ptr [[IN]], align 8
+; CHECK-NEXT:    [[SPLIT:%.*]] = shufflevector <4 x double> [[COL_LOAD]], <4 x double> poison, <2 x i32> <i32 0, i32 1>
+; CHECK-NEXT:    [[SPLIT1:%.*]] = shufflevector <4 x double> [[COL_LOAD]], <4 x double> poison, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[TMP0:%.*]] = extractelement <2 x double> [[SPLIT]], i64 0
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x double> poison, double [[TMP0]], i64 0
+; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <2 x double> [[SPLIT1]], i64 0
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <2 x double> [[TMP1]], double [[TMP2]], i64 1
+; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x double> [[SPLIT]], i64 1
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <2 x double> poison, double [[TMP4]], i64 0
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x double> [[SPLIT1]], i64 1
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <2 x double> [[TMP5]], double [[TMP6]], i64 1
+; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[OUT]], align 8
+; CHECK-NEXT:    [[VEC_GEP:%.*]] = getelementptr double, ptr [[OUT]], i64 2
+; CHECK-NEXT:    store <2 x double> [[TMP7]], ptr [[VEC_GEP]], align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = load <4 x double>, ptr %in, align 8
+  %1 = tail call <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> %0, i32 4, i32 1)
+  %2 = tail call <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> %1, i32 1, i32 4)
+  %3 = tail call <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> %2, i32 2, i32 2)
+  %4 = tail call <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> %3, i32 2, i32 2)
+  %5 = tail call <4 x double> @llvm.matrix.transpose.v4f64(<4 x double> %4, i32 2, i32 2)
+  store <4 x double> %5, ptr %out, align 8
+  ret void
+}

@jroelofs jroelofs requested a review from fpetrogalli April 11, 2025 19:00
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@jroelofs jroelofs merged commit e838b8b into llvm:main Apr 12, 2025
13 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 12, 2025

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-win running on avx512-intel64-win while building llvm at step 4 "cmake stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/6246

Here is the relevant piece of the build log for the reference
Step 4 (cmake stage 1) failure: 'cmake -G ...' (failure)
'cmake' is not recognized as an internal or external command,
operable program or batch file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants