Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 9 additions & 7 deletions llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21198,7 +21198,11 @@ bool SLPVectorizerPass::vectorizeStores(
}
}

// MaxRegVF represents the number of instructions (scalar, or vector in
// case of revec) that can be vectorized to naturally fit in a vector
// register.
unsigned MaxRegVF = MaxVF;

MaxVF = std::min<unsigned>(MaxVF, bit_floor(Operands.size()));
if (MaxVF < MinVF) {
LLVM_DEBUG(dbgs() << "SLP: Vectorization infeasible as MaxVF (" << MaxVF
Expand All @@ -21207,13 +21211,11 @@ bool SLPVectorizerPass::vectorizeStores(
continue;
}

unsigned Sz = 1 + Log2_32(MaxVF) - Log2_32(MinVF);
SmallVector<unsigned> CandidateVFs(Sz + (NonPowerOf2VF > 0 ? 1 : 0));
unsigned Size = MinVF;
for (unsigned &VF : reverse(CandidateVFs)) {
VF = Size > MaxVF ? NonPowerOf2VF : Size;
Size *= 2;
}
SmallVector<unsigned> CandidateVFs;
for (unsigned VF = std::max(MaxVF, NonPowerOf2VF); VF >= MinVF;
VF = divideCeil(VF, 2))
CandidateVFs.push_back(VF);

Comment on lines +21214 to +21218
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to keep the preallocated version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for making this change is because I found it really hard to understand:

  • What the min-max range of resulting VFs are.
  • Whether the list of VF goes from high to low, or low to high.

The parts that I found specifically difficult to follow are:

  • Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional + 1. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice.
  • The reverse when iterating the values in CandidateVFs to initialise the VF seemed unnecessary.

I actually had to print the data in CandidateVFs to confirm the code did what I suspected it did :)

In the current code I think it is easier to see what the minimum and maximum values are (the loop starts either at MaxVF or NonPowerOf2VF, and ends at MinVF), and to understand the increment (it goes from high -> low, dividing by 2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer the push_back based version. Computing the size of the vector beforehand adds a lot of unnecessary complexity, whereas we can just push the candidate VFs one by one and not worry about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for making this change is because I found it really hard to understand:

  • What the min-max range of resulting VFs are.
  • Whether the list of VF goes from high to low, or low to high.

The code is pretty small, so I don't see much difference

The parts that I found specifically difficult to follow are:

  • Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional + 1. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice.

It highly depends on the target. True for something like X86 to ARM NEON, false for others

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It highly depends on the target. True for something like X86 to ARM NEON, false for others

I see how for very wide vectors it may need to allocate a wider buffer. But I doubt that has any meaningful impact on compile-time in practice, given that this code is not on a critical path. Unless that assumption is false, I would favour readability over performance.

The code is pretty small, so I don't see much difference

All I can say is that @gbossu and myself tripped over this code recently and we find this change an improvement in the readability of this function. I hope that you're open to accepting this change, to make the code easier for us and others to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if you think it makes it easier to understand, though for me the original version is much easier to read :)

unsigned End = Operands.size();
unsigned Repeat = 0;
constexpr unsigned MaxAttempts = 4;
Expand Down
Loading