Skip to content

[InstCombine] Propagate invariant.load metadata across unpacked loads #152186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

VedantParanjape
Copy link
Contributor

For loads that operate on aggregate type, instcombine unpacks the loads. It does not preserve the invariant.load metadata. This patch fixes that, it looks for the metadata in the parent load and attaches the metadata to the unpacked loads.

%struct.double2 = type { double, double }
%struct.double1 = type { double }

define %struct.double2 @func1(ptr %a) {
  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
  ret %struct.double2 %1
}

!1 = !{}

Reproducer: https://godbolt.org/z/hcY8MMvYh

For loads that operate on aggregate type, instcombine unpacks the
loads. It does not preserve the invariant.load metadata. This patch
fixes that, it looks for the metadata in the parent load and attaches
the metadata to the unpacked loads.
@VedantParanjape VedantParanjape requested a review from nikic as a code owner August 5, 2025 18:11
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Aug 5, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 5, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Vedant Paranjape (VedantParanjape)

Changes

For loads that operate on aggregate type, instcombine unpacks the loads. It does not preserve the invariant.load metadata. This patch fixes that, it looks for the metadata in the parent load and attaches the metadata to the unpacked loads.

%struct.double2 = type { double, double }
%struct.double1 = type { double }

define %struct.double2 @<!-- -->func1(ptr %a) {
  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
  ret %struct.double2 %1
}

!1 = !{}

Reproducer: https://godbolt.org/z/hcY8MMvYh


Full diff: https://github.com/llvm/llvm-project/pull/152186.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+10)
  • (added) llvm/test/Transforms/InstCombine/invariant-metadata-propagation.ll (+46)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index 0be1034b046b6..68252b06e3a9a 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -718,6 +718,14 @@ static Instruction *combineLoadToOperationType(InstCombinerImpl &IC,
   return nullptr;
 }
 
+// Check if the aggregate load has a invariant.load metadata
+// If aggregate load has invariant.load metadata, add it to the
+// unpacked loads as well.
+static void copyInvariantLoadMetadata(LoadInst &LI, LoadInst *NewLoad) {
+  if (MDNode *MD = LI.getMetadata("invariant.load"))
+    NewLoad->setMetadata("invariant.load", MD);
+}
+
 static Instruction *unpackLoadToAggregate(InstCombinerImpl &IC, LoadInst &LI) {
   // FIXME: We could probably with some care handle both volatile and atomic
   // stores here but it isn't clear that this is important.
@@ -737,6 +745,7 @@ static Instruction *unpackLoadToAggregate(InstCombinerImpl &IC, LoadInst &LI) {
       LoadInst *NewLoad = IC.combineLoadToNewType(LI, ST->getTypeAtIndex(0U),
                                                   ".unpack");
       NewLoad->setAAMetadata(LI.getAAMetadata());
+      copyInvariantLoadMetadata(LI, NewLoad);
       return IC.replaceInstUsesWith(LI, IC.Builder.CreateInsertValue(
         PoisonValue::get(T), NewLoad, 0, Name));
     }
@@ -764,6 +773,7 @@ static Instruction *unpackLoadToAggregate(InstCombinerImpl &IC, LoadInst &LI) {
           Name + ".unpack");
       // Propagate AA metadata. It'll still be valid on the narrowed load.
       L->setAAMetadata(LI.getAAMetadata());
+      copyInvariantLoadMetadata(LI, L);
       V = IC.Builder.CreateInsertValue(V, L, i);
     }
 
diff --git a/llvm/test/Transforms/InstCombine/invariant-metadata-propagation.ll b/llvm/test/Transforms/InstCombine/invariant-metadata-propagation.ll
new file mode 100644
index 0000000000000..acc5e7ca8d2b4
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/invariant-metadata-propagation.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S < %s -passes=instcombine | FileCheck %s
+
+%struct.double2 = type { double, double }
+%struct.double1 = type { double }
+
+define %struct.double2 @func1(ptr addrspace(1) %a) {
+; CHECK-LABEL: define %struct.double2 @func1(
+; CHECK-SAME: ptr addrspace(1) [[A:%.*]]) {
+; CHECK-NEXT:    [[DOTUNPACK:%.*]] = load double, ptr addrspace(1) [[A]], align 16, !invariant.load [[META0:![0-9]+]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue [[STRUCT_DOUBLE2:%.*]] poison, double [[DOTUNPACK]], 0
+; CHECK-NEXT:    [[DOTELT1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(1) [[A]], i64 8
+; CHECK-NEXT:    [[DOTUNPACK2:%.*]] = load double, ptr addrspace(1) [[DOTELT1]], align 8, !invariant.load [[META0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue [[STRUCT_DOUBLE2]] [[TMP1]], double [[DOTUNPACK2]], 1
+; CHECK-NEXT:    ret [[STRUCT_DOUBLE2]] [[TMP2]]
+;
+  %1 = load %struct.double2, ptr addrspace(1) %a, align 16, !invariant.load !1
+  ret %struct.double2 %1
+}
+
+define %struct.double2 @func2(ptr %a) {
+; CHECK-LABEL: define %struct.double2 @func2(
+; CHECK-SAME: ptr [[A:%.*]]) {
+; CHECK-NEXT:    [[DOTUNPACK:%.*]] = load double, ptr [[A]], align 16, !invariant.load [[META0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue [[STRUCT_DOUBLE2:%.*]] poison, double [[DOTUNPACK]], 0
+; CHECK-NEXT:    [[DOTELT1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT:    [[DOTUNPACK2:%.*]] = load double, ptr [[DOTELT1]], align 8, !invariant.load [[META0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue [[STRUCT_DOUBLE2]] [[TMP1]], double [[DOTUNPACK2]], 1
+; CHECK-NEXT:    ret [[STRUCT_DOUBLE2]] [[TMP2]]
+;
+  %1 = load %struct.double2, ptr %a, align 16, !invariant.load !1
+  ret %struct.double2 %1
+}
+
+define %struct.double1 @func3(ptr %a) {
+; CHECK-LABEL: define %struct.double1 @func3(
+; CHECK-SAME: ptr [[A:%.*]]) {
+; CHECK-NEXT:    [[DOTUNPACK:%.*]] = load double, ptr [[A]], align 16, !invariant.load [[META0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue [[STRUCT_DOUBLE1:%.*]] poison, double [[DOTUNPACK]], 0
+; CHECK-NEXT:    ret [[STRUCT_DOUBLE1]] [[TMP1]]
+;
+  %1 = load %struct.double1, ptr %a, align 16, !invariant.load !1
+  ret %struct.double1 %1
+}
+
+!1 = !{}

@VedantParanjape VedantParanjape requested review from fhahn and arsenm August 5, 2025 18:18
@VedantParanjape VedantParanjape self-assigned this Aug 5, 2025
// If aggregate load has invariant.load metadata, add it to the
// unpacked loads as well.
static void copyInvariantLoadMetadata(LoadInst &LI, LoadInst *NewLoad) {
if (MDNode *MD = LI.getMetadata("invariant.load"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should access by MD_ enum instead of by name. Also isn't this what copyMetadataForLoad is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that function will correctly propagate metadata for unpacking loads. For example, it copies MD_tbaa as is. But the tbaa nodes would change for scalar loads vs aggregate loads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arsenm now using enums instead of by name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is still that most of these metadata should propagate, this isn't unique to invariant.load

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use copyMetadataForLoad.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyMetadataForLoad sounds like only designed for load type casting. And some code that checks NewType->isPointerTy() implicitly assumes the OldType is also a pointer, which sounds incorrect for spliting loads.

    // Note, essentially every kind of metadata should be preserved here! This
    // routine is supposed to clone a load instruction changing *only its type*.
    // The only metadata it makes sense to drop is metadata which is invalidated
    // when the pointer type changes. This should essentially never be the case
    // in LLVM, but we explicitly switch over only known metadata to be
    // conservatively correct. If you are adding metadata to LLVM which pertains
    // to loads, you almost certainly want to add it here.

Shall we add a copyMetadataForSplitLoad?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyMetadataForLoad sounds like only designed for load type casting.

You are right. If we have to repeat it multiple times in the future, it is definitely better to have a helper function.

@VedantParanjape VedantParanjape removed the request for review from fhahn August 6, 2025 22:07
Copy link

github-actions bot commented Aug 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@VedantParanjape
Copy link
Contributor Author

gentle ping! This is part of my internship project and I will finish in a week. Could the reviewers please take a look at it, and let me know how to proceed with this patch? Thanks.

Comment on lines 775 to 777
AAMDNodes adjustedAANodes = LI.getAAMetadata().adjustForAccess(
SL->getElementOffset(i),
SL->getElementOffset(i).getKnownMinValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AAMDNodes adjustedAANodes = LI.getAAMetadata().adjustForAccess(
SL->getElementOffset(i),
SL->getElementOffset(i).getKnownMinValue());
TypeSize offset = SL->getElementOffset(i);
AAMDNodes adjustedAANodes = LI.getAAMetadata();
if (offset.isFixed()) {
adjustedAANodes = adjustedAANodes.adjustForAccess(
offset.getFixedValue(),
ST->getElementType(i), DL);
}
else {
adjustedAANodes.TBAA = nullptr;
adjustedAANodes.TBAAStruct = nullptr;
}

For structs with scalable vector elements, the TBAA information is not applicable. You can either zero out or assert here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please file a separate patch for this change. I don't want to block this patch.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is a separate issue unrelated to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will upload this as a separate patch.

@@ -737,6 +745,7 @@ static Instruction *unpackLoadToAggregate(InstCombinerImpl &IC, LoadInst &LI) {
LoadInst *NewLoad = IC.combineLoadToNewType(LI, ST->getTypeAtIndex(0U),
".unpack");
NewLoad->setAAMetadata(LI.getAAMetadata());
copyInvariantLoadMetadata(LI, NewLoad);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
copyInvariantLoadMetadata(LI, NewLoad);
NewLI->copyMetadata(LI, LLVMContext::MD_invariant_load);

// Propagate AA metadata. It'll still be valid on the narrowed load.
L->setAAMetadata(LI.getAAMetadata());
L->setAAMetadata(adjustedAANodes);
copyInvariantLoadMetadata(LI, L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
copyInvariantLoadMetadata(LI, L);
NewLI->copyMetadata(LI, LLVMContext::MD_invariant_load);

Copy link
Contributor Author

@VedantParanjape VedantParanjape Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtcxzyw I have fixed these issues, could you please give a lgtm? I will create another PR for adjusting AA metadata.

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 13, 2025

Just out of curiosity, is the original IR generated from the CUDA intrinsic __ldg(const double2 *p)? If this is the case, is it profitable to unpack the aggregate load?

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 13, 2025

Just out of curiosity, is the original IR generated from the CUDA intrinsic __ldg(const double2 *p)? If this is the case, is it profitable to unpack the aggregate load?

Interesting, the result with your patch looks the best :)
https://godbolt.org/z/fTanbbqq1

@fiigii
Copy link

fiigii commented Aug 13, 2025

is the original IR generated from the CUDA intrinsic __ldg(const double2 *p)?

Yes, other high-level compilers also can use invariant.load to express ld.global.nc with NVPTX.

If this is the case, is it profitable to unpack the aggregate load?

Yes, load store vectorization relies on this IC function to vectorize overaligned aggregate loads, which is one of the most important optimizations for CUDA

@VedantParanjape
Copy link
Contributor Author

VedantParanjape commented Aug 13, 2025

This change seems to crash the following testcase. Seems totally unrelated to my change, and it is a llc testcase.

-- Testing: 60849 tests, 64 workers --
  Testing:  0.. 10.. 20.. 30..
  FAIL: LLVM :: CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll (23899 of 60849)
  ******************** TEST 'LLVM :: CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll' FAILED ********************
  Exit Code: 1

@dtcxzyw
Copy link
Member

dtcxzyw commented Aug 14, 2025

This change seems to crash the following testcase. Seems totally unrelated to my change, and it is a llc testcase.

-- Testing: 60849 tests, 64 workers --
  Testing:  0.. 10.. 20.. 30..
  FAIL: LLVM :: CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll (23899 of 60849)
  ******************** TEST 'LLVM :: CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll' FAILED ********************
  Exit Code: 1

Yeah it was also observed in another PR #152851 (comment)

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@VedantParanjape VedantParanjape removed the request for review from nikic August 14, 2025 02:27
@VedantParanjape
Copy link
Contributor Author

LGTM.

can I merge bypassing build requirements? or wait for the build issue to get resolved.

@VedantParanjape VedantParanjape enabled auto-merge (squash) August 14, 2025 02:50
@nikic
Copy link
Contributor

nikic commented Aug 14, 2025

LGTM.

can I merge bypassing build requirements? or wait for the build issue to get resolved.

Generally, feel free to merge as-is if there are only pre-existing test failures (like in this case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants