core_arch::x86
: Fix the implementation of _kshift
instructions
#1930
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
The
_kshiftri_mask32
,_kshiftri_mask64
,_kshiftli_mask32
and_kshiftli_mask64
intrinsics incore_arch::x86
module do not handle a critical edge case when the shift amount exceeds the bit-length of the input argument.EDIT: updated the 8-bit and 16-bit versions of the same.
Current behaviour
For
_kshiftri_mask32
and_kshiftli_mask32
intrinsics, when the shift amount (passed as a const-generic argument) exceeds 32, the shift amount applied to the argument becomesshift % 32
.Similar is the case with the 8-bit, 16-bit and 64-bit variants of the same.
Godbolt link for the 32-bit variant with minimal replication.
Godbolt link for the 64-bit variant with minimal replication.
Expected behaviour
When the shift amount exceeds 32 (for the 32-bit versions) or 64 (for the 64-bit versions) the result becomes zero, in line with Intel's documentation on the same.
Similar is the case with 8-bit and 16-bit versions.
Fix
Use
unbounded_shr()
andunbounded_shl()
functions instead of>>
and<<
operations.Godbolt link that shows the corrected version of the 32-bit implementation.
Godbolt link that shows the corrected version of the 64-bit implementation
r? @sayantn
cc: @folkertdev