-
Notifications
You must be signed in to change notification settings - Fork 30
[AIEX] Add a combiners to optimize unaligned vector loads #698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aie-public
Are you sure you want to change the base?
Conversation
d3c310b to
4838126
Compare
| const bool IsSExtExtract = (Opcode == SExtExtractOpcode); | ||
| const bool IsPlainExtract = (Opcode == TargetOpcode::G_EXTRACT_VECTOR_ELT); | ||
|
|
||
| if (!IsZExtExtract && !IsSExtExtract && !IsPlainExtract) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks as if the pattern's opcode check precludes this case?
|
|
||
| // Get the index operand | ||
| const Register IdxReg = MI.getOperand(2).getReg(); | ||
| auto IdxCst = getIConstantVRegValWithLookThrough(IdxReg, MRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const?
| if (MMO->getAlign().value() >= LoadVecSizeInBytes) | ||
| return false; | ||
|
|
||
| const unsigned ElemSizeInBytes = ElemSize / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare closer to its first use
| const Register PtrReg = LoadMI->getOperand(1).getReg(); | ||
| const LLT S20 = LLT::scalar(20); | ||
|
|
||
| // Calculate byte offset: Index * ElemSizeInBytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is redundant
| /// %new_ptr:_(p0) = G_PTR_ADD %ptr, %offset | ||
| /// %elt:_(sX) = G_LOAD %new_ptr :: (align 1) | ||
| /// %result:_(s32) = G_[Z/S]EXT %elt | ||
| bool llvm::matchUnalignedExtractLoad(MachineInstr &MI, MachineRegisterInfo &MRI, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more descriptive name for MI? ExtractMI?
|
|
||
| // Set insertion point right after the original vector load | ||
| if (LoadMI->getNextNode()) | ||
| B.setInstr(*LoadMI->getNextNode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't std::next(iterator(LoadMI)) be always valid for setInsertPt() ?
| // alignment using GCD to find the maximum provable alignment | ||
| const unsigned OrigAlign = MMO->getAlign().value(); | ||
| const unsigned ScalarAlign = | ||
| ByteOffset == 0 ? OrigAlign : std::gcd(OrigAlign, (unsigned)ByteOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be written as std::gcd(OrigAlign, OrigAlign + ByteOffset);
| // Handle the result based on the original opcode | ||
| if (IsZExtExtract || IsSExtExtract) { | ||
| // Need to extend to s32 | ||
| const Register DstReg = MI.getOperand(0).getReg(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be hoisted, it's necessary for all cases. Then we can do
if (IsZExtract) {
} else if (IsSExtract) {
} else {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second set of extracts can be removed actually.
4838126 to
e684035
Compare
|
Another combiner was included to improve scalarization of unaligned vector loads. |
In this case, we can load the scalar value directly instead of building a full vector (legalizer will scalarize this load anyway) to extract.
| unsigned NewElemSize = 0; | ||
| if (Alignment >= 8 && ElemSize < 64) { | ||
| NewElemSize = 64; | ||
| } else if (Alignment >= 4 && ElemSize < 32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will only arrive here with ElemSize 8 or 16, both smaller than 32 and 64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this will be simplified.
e684035 to
20e7808
Compare
|
|
||
| // Check if the vector size is compatible with the new element size | ||
| const unsigned VecSizeInBits = DstTy.getSizeInBits(); | ||
| if (VecSizeInBits % NewElemSize != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be discarding 64 while 32 or 16 might still work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not handling 64 anymore.
| const unsigned ElemSize = ElemTy.getSizeInBits(); | ||
|
|
||
| // Skip if the load is only used for extracts - let matchUnalignedExtractLoad | ||
| // handle it This prevents the two combiners from competing for the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing .
| return false; | ||
|
|
||
| // Skip if the load has a single user that is a G_STORE with the same | ||
| // alignment This case can be perfectly scalarized during legalization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing .
| return false; | ||
|
|
||
| // Calculate new number of elements | ||
| const unsigned NewNumElems = VecSizeInBits / NewElemSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's confusing that both these variable are in bits, but not both called InBits
| const unsigned NewNumElems = VecSizeInBits / NewElemSize; | ||
|
|
||
| // Capture the pointer register before creating the lambda | ||
| const Register PtrReg = LoadMI.getOperand(1).getReg(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can writePtrReg = LoadMI.getOperand(1).getReg()in the capture list, giving it local scope
| MachineFunction &MF = B.getMF(); | ||
|
|
||
| // Create the new vector type with better-aligned elements | ||
| const LLT NewVecTy = LLT::fixed_vector(NewNumElems, NewElemSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capture NewVecTy, dropping the two constructor parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved NewNumElems to the lambda.
…gnment In this case, we can improve the legalized code.
20e7808 to
ec55fce
Compare



In this case, we can load the scalar value directly instead of building a full vector (legalizer will scalarize this load anyway) to extract.
This combiner was motivated by the following real case (there are more pathological cases, related to <32 x s6> for example):
Peano's current final code for this case is:
With this PR: