`core_arch::x86` : Fix the implementation of `_kshift` instructions #1930

madhav-madhusoodanan · 2025-10-02T20:48:21Z

Summary

The `_kshiftri_mask32`, `_kshiftri_mask64`, `_kshiftli_mask32` and `_kshiftli_mask64` intrinsics in the `core_arch::x86` module do not handle a critical edge case: a shift amount that exceeds the bit length of the input argument.

EDIT: the 8-bit and 16-bit versions have been updated as well.

Current behaviour

For the `_kshiftri_mask32` and `_kshiftli_mask32` intrinsics, when the shift amount (passed as a const-generic argument) exceeds 32, the shift actually applied to the argument is `shift % 32`.

The 8-bit, 16-bit and 64-bit variants behave the same way.

Godbolt link for the 32-bit variant with a minimal reproduction.
Godbolt link for the 64-bit variant with a minimal reproduction.
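As an illustration of the wrap-around described above, here is a minimal sketch in plain Rust. It does not call the intrinsics and is not the `stdarch` code: `masked_shl32` and `masked_shr32` are hypothetical helpers that use `wrapping_shl` / `wrapping_shr` to model a shift amount that gets reduced modulo the mask width.

```rust
/// Hypothetical model of the pre-fix behaviour for a 32-bit mask:
/// `wrapping_shl` reduces the shift amount modulo 32 before shifting.
fn masked_shl32(a: u32, count: u32) -> u32 {
    a.wrapping_shl(count)
}

/// Same model for the right shift.
fn masked_shr32(a: u32, count: u32) -> u32 {
    a.wrapping_shr(count)
}

fn main() {
    let a: u32 = 0b1011;

    // 33 % 32 == 1, so a shift by 33 acts like a shift by 1
    // instead of clearing the mask.
    println!("{:#b}", masked_shl32(a, 33)); // prints 0b10110
    println!("{:#b}", masked_shr32(a, 33)); // prints 0b101
}
```

In this model a shift by 33 and a shift by 1 are indistinguishable, which is the behaviour this PR reports as incorrect.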

Expected behaviour

When the shift amount exceeds 32 (for the 32-bit versions) or 64 (for the 64-bit versions), the result should be zero, as specified in Intel's documentation.

The same applies to the 8-bit and 16-bit versions.

Fix

Use the `unbounded_shr()` and `unbounded_shl()` methods instead of the `>>` and `<<` operators.

Godbolt link that shows the corrected version of the 32-bit implementation.
Godbolt link that shows the corrected version of the 64-bit implementation.
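A rough sketch of the idea behind the fix, assuming a toolchain where the integer `unbounded_shl` / `unbounded_shr` methods are available. The function names below (`kshiftli32_model` and friends) are hypothetical stand-ins operating on plain integers, not the actual intrinsic definitions in `stdarch`. The standard library documents these methods as returning 0 once the shift amount is greater than or equal to the type's bit width.

```rust
/// Hypothetical stand-in for a corrected 32-bit left mask shift.
/// `unbounded_shl` yields 0 once COUNT reaches the bit width.
fn kshiftli32_model<const COUNT: u32>(a: u32) -> u32 {
    a.unbounded_shl(COUNT)
}

/// Hypothetical stand-in for a corrected 32-bit right mask shift.
fn kshiftri32_model<const COUNT: u32>(a: u32) -> u32 {
    a.unbounded_shr(COUNT)
}

/// Hypothetical stand-in for a corrected 64-bit right mask shift.
fn kshiftri64_model<const COUNT: u32>(a: u64) -> u64 {
    a.unbounded_shr(COUNT)
}

fn main() {
    let a: u32 = 0b1011;

    // In-range shifts behave exactly like `<<` and `>>`.
    println!("{:#b}", kshiftli32_model::<1>(a)); // prints 0b10110

    // Out-of-range shifts clear the mask instead of wrapping around.
    println!("{}", kshiftli32_model::<33>(a)); // prints 0
    println!("{}", kshiftri32_model::<33>(u32::MAX)); // prints 0
    println!("{}", kshiftri64_model::<65>(u64::MAX)); // prints 0
}
```

Passing the shift amount as a const generic mirrors how the PR describes the intrinsics being called; the choice of `u32` for `COUNT` here is an assumption made for the sketch.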

r? @sayantn
cc: @folkertdev

sayantn · 2025-10-02T20:50:50Z

There are 8- and 16-bit versions too in avx512dq, check them too pls

madhav-madhusoodanan · 2025-10-02T21:20:47Z

Done @sayantn

sayantn · 2025-10-02T22:42:50Z

Thanks, I have opened #1931 to fix the other shift-related bugs

rustbot assigned sayantn Oct 2, 2025

fix: update the implementation of _kshiftri_mask32, _kshiftri_mask64, _kshiftli_mask32 and _kshiftli_mask64 to zero out when the amount of shift exceeds the bit length of the input argument. (6e263ec)

madhav-madhusoodanan force-pushed the x86_fix_kshift_instructions branch from 4be142a to 6e263ec October 2, 2025 20:51

madhav-madhusoodanan added 2 commits October 3, 2025 02:27

fix: update the implementation of _kshiftri_mask8 and _kshiftli_mask8 to zero out when the amount of shift exceeds the bit length of the input argument. (0697a43)

fix: update the implementation of _kshiftri_mask16 and _kshiftli_mask16 to zero out when the amount of shift exceeds 16. (29027b6)

sayantn added this pull request to the merge queue Oct 2, 2025

Merged via the queue into rust-lang:master with commit 03ad8a7 Oct 2, 2025
63 checks passed
