-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Insert symbols for prefetch targets read from basic blocks section profile. #168439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🐧 Linux x64 Test Results
|
| // the block if `SubblockIndex` is the last call in the block). | ||
| struct SubblockID { | ||
| UniqueBBID BBID; | ||
| unsigned SubblockIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the callsite index within the block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had originally named this callsiteIndex, but it will be a misnomer if there are not calls in the block (where we will use the index 0). What do u suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can define two 'pseudo calls' one at the beginning of a block and one at the end, so the first real call always get id == 1.
Besides we may want to differentiate between the code point before and after the call, so perhaps we can use negative sign for it. For instance t 2,-1 means the target is at block 2 before the first call, while t 2,1 means the target is in block 2 after the first call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CallSiteIndex seems more readable to me. It is fine that when it is zero it means the start of the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefetch targets and instructions are only placed at the beginning of a subblock (either the beginning of the block , or after calls). So we don't need the negativity.
we can define two 'pseudo calls' one at the beginning of a block and one at the end, so the first real call always get id == 1.
No need for a pseudo call at the end, but with one at the beginning, we get subBlockIndex+1. I can change it back to int callsiteIndex and have callsiteIndex == -1 refer to the prefetch target being mapped to the beginning of the block. WDYT about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I went back and changed it to unsigned 1-based callsiteIndexes. I explain in comments that callsite index of zero means the start of the block.
| DenseMap<UniqueBBID, SmallVector<unsigned>> PrefetchTargetsByBBID; | ||
| for (const auto &Target : PrefetchTargets) | ||
| PrefetchTargetsByBBID[Target.BBID].push_back(Target.SubblockIndex); | ||
| for (auto &MBB : MF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to iterate the BBIDs and use MF->getBlockNumbered(ID) method to get MBB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't store a map from BBIDs to MBBs. So there is no way around once iterating over MBBs if that's what you mean (We can create the map here but it would still require an iteration).
| ;; | ||
| ;; Specify the bb sections profile: | ||
| ; RUN: echo 'v1' > %t | ||
| ; RUN: echo 'f _Z3foob' >> %t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format is different from what is described in RFC which uses , instead of @
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I changed it to use comma. Alternatively, we can use spaces which will make the parsing code simpler.
|
Is this usable for anyone outside of google? I'm not fluent in the propeller codebase, so possible I overlooked something, but I don't see any code dealing with instruction prefetching in the propeller code at https://github.com/google/llvm-propeller (main branch)... |
The feature was added just today to the main branch: google/llvm-propeller@f36e59d We have also discussed how this pass can be reused in other frameworks in the RFC; TLDR is we'll get there when we get there. Currently, we rely on BB_ADDR_MAP to do precise mapping to subblocks. |
b676430 to
b3202a1
Compare
85649e7 to
0c17e45
Compare
tmsri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially reviewed.
| // the block if `SubblockIndex` is the last call in the block). | ||
| struct SubblockID { | ||
| UniqueBBID BBID; | ||
| unsigned SubblockIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CallSiteIndex seems more readable to me. It is fine that when it is zero it means the start of the block.
| // the edge a -> b (a is not cloned). The index of the path in this vector | ||
| // determines the `UniqueBBID::CloneID` of the cloned blocks in that path. | ||
| SmallVector<SmallVector<unsigned>> ClonePaths; | ||
| // Code prefetch targets, specified by the subblock ID of which beginning must |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest rephrasing this comment to something like: "Code Prefetch Target list, the beginning of each Subblock is the target".
| /// `i`th call (or the beginning of the block if `i==0`) to before the`i+1`th | ||
| /// callsite (or the end of the block). The prefetch target is always the | ||
| /// beginning of the subblock. | ||
| SmallVector<unsigned> PrefetchTargetSubblockIndexes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define CallSiteIndex or SubblockIndex as unsigned type and use that instead of unsigned?
| }; | ||
|
|
||
| for (auto &MI : MBB) { | ||
| EmitPrefetchTargetSymbolIfNeeded(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have EmitPrefetchTargetIt take a parameter which is the unsigned callsiteindex. I would remove the updates to the PrefetchTargets from the lambda and do it within the loop and replace this line with:
if (PrefetchTargetIt != PrefetchTargets.end() &&
*PrefetchTargetIt == LastCallSiteIndex) {
EmitPrefetchTargetSymbolIfNeeded(*PrefetchTargetIt);
++PrefetchTargetIt;
}| // version, in which case the prefetch targets should also be replaced. | ||
| OutStreamer->emitSymbolAttribute( | ||
| PrefetchTargetSymbol, | ||
| MF->getFunction().isWeakForLinker() ? MCSA_Weak : MCSA_Global); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to internal linkage? Can you make this symbol a global, that would conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We are relying on funique as mentioned in the RFC. Should we add a check here to ensure funique has been used (it's possible that funique is not applied on some functions).
This is the first PR to enable the prefetch optimization via Propeller based on our RFC. It enables emitting special symbols prefixed with
__llvm_prefetch_targetto point to the prefetch targets as specified via directives in the profile. A prefetch target is uniquely identified by its function name, basic block ID, and the subblock index (used when the target is after a call instruction).A new pass is added which sets a field in basic blocks which have prefetch targets. The next PR will add the prefetch insertion logic into the same pass.