This repository was archived by the owner on May 7, 2024. It is now read-only.
[bitmanip][WiP] [RFC] Add automatic generation of pack* #267
Draft
rdolbeau wants to merge 1 commit into riscvarchive:riscv-gcc-10.2.0-rvb from
This adds automatic generation of pack* instructions (pack, packu, packh) beyond zero-extension.
This is implemented via a custom pass that
a) reorganizes chains of '[ix]or' to exhibit regular patterns;
b) matches common patterns of pack/packu/packh and replaces them with the appropriate instruction
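
To make the two steps concrete, here is a minimal C sketch of what the three instructions compute on RV32, written from the draft bitmanip specification rather than from this patch. The helper names (`pack32`, `packu32`, `packh32`, `combine_flat`, `combine_packed`) are hypothetical and only for illustration; the expression shapes are an assumption about the kind of source a matcher could recognize, not the exact patterns the pass looks for.

```c
#include <stdint.h>

/* Reference models for RV32 (XLEN = 32), per the draft bitmanip spec. */

/* pack: low half of rs1 in the low half, low half of rs2 in the high half. */
static uint32_t pack32(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & 0xffffu) | (rs2 << 16);
}

/* packu: high half of rs1 in the low half, high half of rs2 in the high half. */
static uint32_t packu32(uint32_t rs1, uint32_t rs2)
{
    return (rs1 >> 16) | (rs2 & 0xffff0000u);
}

/* packh: low byte of rs1 in bits 7:0, low byte of rs2 in bits 15:8. */
static uint32_t packh32(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & 0xffu) | ((rs2 & 0xffu) << 8);
}

/* A flat chain of three ORs over four shifted bytes. */
static uint32_t combine_flat(uint32_t b0, uint32_t b1, uint32_t b2, uint32_t b3)
{
    return (b0 & 0xffu) | ((b1 & 0xffu) << 8) |
           ((b2 & 0xffu) << 16) | ((b3 & 0xffu) << 24);
}

/* The same value with the chain regrouped so that each OR is one pack*. */
static uint32_t combine_packed(uint32_t b0, uint32_t b1, uint32_t b2, uint32_t b3)
{
    return pack32(packh32(b0, b1), packh32(b2, b3));
}
```

The regrouped form is the point of step a): once the chain pairs bytes first and halves second, each OR corresponds to a single `packh` or `pack`.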
(as the code is very different, this PR draft replaces #262)
Not sure if it's completely suitable, but it should help quantify how useful those instructions are for a given codebase.
For instance, in openssl 1.1.1k on `rv32gcbk_zbr_zbt`, the compiler reorganizes 470 sequences of three `[ix]or` and produces 1125 `packh` and 682 `pack` (plus some more that are matched directly). Many (most?) of them seem to be related to byte-by-byte loading of 32-bit words (`lbu`/`lbu`/`lbu`/`lbu`/`packh`/`packh`/`pack`). If the addresses were provably aligned, they could be replaced by the much more efficient `lw`/`grevi 0x18`, but that is difficult to achieve in the back-end; the source code should probably be changed to use an explicit load-word/byte-reversal wherever a 32-bit load is legal.

For comparison, the resulting objects (all `.o` in the directory) collectively contain (the source was patched to implement an AES with scalar K, hence the `aes32*`):

[edit] Adding a source file showing some of the patterns that match (or don't in some cases):
pack-pattern.txt
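
As an illustration of the byte-by-byte load discussed above, the sketch below puts the two source-level forms side by side. It is not taken from openssl or from `pack-pattern.txt`; the function names are hypothetical, and the word-load variant relies on the GCC/Clang builtin `__builtin_bswap32` and the predefined `__BYTE_ORDER__` macro. Whether the compiler actually emits `lw` plus a byte reversal for the second form depends on the target and on what it can prove about alignment.

```c
#include <stdint.h>
#include <string.h>

/* Byte-by-byte big-endian load: the shape described above as
 * lbu/lbu/lbu/lbu followed by packh/packh/pack. Safe for any alignment. */
static uint32_t load_be32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Explicit word load followed by a byte reversal. memcpy keeps the code
 * well-defined in C; a single word load is only possible where the
 * target allows it for the given address. */
static uint32_t load_be32_word(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* Little-endian targets (such as RISC-V) need the bytes reversed. */
    w = __builtin_bswap32(w);
#endif
    return w;
}
```

Both functions return the same value for the same four bytes; the second simply states the word load and the reversal explicitly instead of leaving the compiler to rediscover them.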