[X86][CodeGen] - Use shift operators for const value shifts, instead of built-ins for SSE emulation of MMX intrinsics. #129197
```diff
@@ -880,11 +880,11 @@ _mm_sll_si64(__m64 __m, __m64 __count)
 ///    A 32-bit integer value.
 /// \returns A 64-bit integer vector containing the left-shifted value. If
 ///    \a __count is greater or equal to 64, the result is set to 0.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_SSE2
-_mm_slli_si64(__m64 __m, int __count)
-{
-    return __trunc64(__builtin_ia32_psllqi128((__v2di)__anyext128(__m),
-                                              __count));
+static __inline__ __m64 __DEFAULT_FN_ATTRS_SSE2 _mm_slli_si64(__m64 __m,
+                                                              int __count) {
+  if (__builtin_constant_p(__count))
+    return (__m64)((__count > 63) ? 0 : ((long long)__m << __count));
+  return __trunc64(__builtin_ia32_psllqi128((__v2di)__anyext128(__m), __count));
 }
 
 /// Right-shifts each 16-bit integer element of the first parameter,
```

Comment on lines +885 to +887:

Contributor: Why not just do this unconditionally? The fold can always be done in the backend.

Contributor (Author): The non-constant shifts become worse: https://godbolt.org/z/9xv5hxjv7. I think accommodating both cases at the DAG level might not be as trivial.

Contributor: -1 on the whole concept of doing this in the header.

Contributor: @arsenm, do we have options? Solving it in CodeGen is too late, since the middle end could already have optimized the shifts.
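For context on the dispatch the patch relies on: `__builtin_constant_p(__count)` folds to 1 only when the count is a compile-time constant, so literal counts take the plain C shift (which the optimizer folds to an immediate-shift instruction or a constant), while variable counts still reach the SSE2 builtin. Below is a minimal standalone sketch of the same pattern; the names `shl64`, `shl64_slow`, `by_const`, and `by_var` are illustrative stand-ins, not part of the patch:

```c
#include <stdint.h>

/* Stand-in for the non-constant path; the real header calls
 * __builtin_ia32_psllqi128 here. */
static uint64_t shl64_slow(uint64_t v, int count) {
  return (count < 0 || count > 63) ? 0 : v << count;
}

/* The dispatch pattern from the patch: the constant branch is selected
 * at compile time for literal counts, the fallback otherwise. */
static inline uint64_t shl64(uint64_t v, int count) {
  if (__builtin_constant_p(count))
    return (count > 63) ? 0 : v << count;
  return shl64_slow(v, count);
}

uint64_t by_const(uint64_t v) { return shl64(v, 3); }      /* folds to v << 3 */
uint64_t by_var(uint64_t v, int n) { return shl64(v, n); } /* takes shl64_slow */
```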
```diff
@@ -1115,11 +1115,11 @@ _mm_srl_si64(__m64 __m, __m64 __count)
 /// \param __count
 ///    A 32-bit integer value.
 /// \returns A 64-bit integer vector containing the right-shifted value.
-static __inline__ __m64 __DEFAULT_FN_ATTRS_SSE2
-_mm_srli_si64(__m64 __m, int __count)
-{
-    return __trunc64(__builtin_ia32_psrlqi128((__v2di)__anyext128(__m),
-                                              __count));
+static __inline__ __m64 __DEFAULT_FN_ATTRS_SSE2 _mm_srli_si64(__m64 __m,
+                                                              int __count) {
+  if (__builtin_constant_p(__count))
+    return (__m64)((__count > 63) ? 0 : ((long long)__m >> __count));
+  return __trunc64(__builtin_ia32_psrlqi128((__v2di)__anyext128(__m), __count));
 }
 
 /// Performs a bitwise AND of two 64-bit integer vectors.
```

Contributor: I'd like to point out that this changes the behavior for negative immediates. Before this change we returned zero; now the shift is emitted as-is, and a shift by a negative amount is UB. The intrinsic's description doesn't specify what should happen for a negative count.
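To make the reviewer's point concrete: the builtin path zeroes the result for any out-of-range count (including negative ones, per the comment above), while the new constant path only checks `__count > 63`, so a negative literal count falls through to the C shift operator, which is undefined behavior. A hedged sketch of the two behaviors, using `uint64_t` for a logical shift and illustrative function names (the patch itself uses `long long`):

```c
#include <stdint.h>

/* Old semantics, as described in the review comment: any out-of-range
 * count, negative included, yields all zeros. */
static uint64_t srl64_old(uint64_t v, int count) {
  if (count < 0 || count > 63)
    return 0;
  return v >> count;
}

/* New constant path: only count > 63 is caught, so a negative count
 * reaches the shift operator -- undefined behavior in C. */
static uint64_t srl64_new(uint64_t v, int count) {
  if (count > 63)
    return 0;
  return v >> count; /* UB when count < 0 */
}
```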
Contributor: Should we limit it to 64-bit only?

Contributor (Author): You mean also make a similar change for _mm_slli_si32? If yes, we may not have an example at hand where that is needed. I will add it as needed/requested.

Contributor: No, I mean `long long` is less efficient on 32-bit: https://godbolt.org/z/4KeY4hrhq
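One possible shape for that suggestion (purely illustrative, not something the patch does): guard the scalar constant path on 64-bit targets, so 32-bit builds always keep the vector builtin and avoid the multi-instruction 64-bit scalar shift sequence. A standalone sketch, with `psllq_stub` standing in for the SSE2 builtin and `slli64` as a hypothetical name:

```c
#include <stdint.h>

/* Stub for the vector-builtin path (__builtin_ia32_psllqi128 in the
 * real header); defined here only so the sketch compiles on its own. */
static uint64_t psllq_stub(uint64_t v, int count) {
  return (count < 0 || count > 63) ? 0 : v << count;
}

/* Hypothetical 64-bit-only guard: __x86_64__ is predefined by GCC and
 * Clang when targeting x86-64, so 32-bit x86 never takes the scalar
 * path that would expand a 64-bit shift into several instructions. */
static inline uint64_t slli64(uint64_t v, int count) {
#ifdef __x86_64__
  if (__builtin_constant_p(count))
    return (count > 63) ? 0 : v << count;
#endif
  return psllq_stub(v, count);
}
```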