
[SSE2] poor performance with emulated _mm_extract_epi8 #28111

@llvmbot

Description

Bugzilla Link 27737
Version 3.8
OS All
Reporter LLVM Bugzilla Contributor
CC @d0k,@RKSimon,@rotateright

Extended Description

I have some code that makes heavy use of _mm_extract_epi8().
When I build it with clang 3.8 and -msse2 (without -msse4.1), the program runs very slowly.

To build with GCC and clang 3.4, 3.6 and 3.7, I use this macro:

#ifndef _mm_extract_epi8 /* SSE4.1 required. */
#define _mm_extract_epi8(__xmm, __n)					\
    ((_mm_extract_epi16(__xmm, ((__n) >> 1)) >> (8 * ((__n) & 1))) & 0xff)
#endif
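
For readers who want to check the emulation itself, here is a minimal sketch that compares each emulated lane with the byte stored in memory. The names EXTRACT_EPI8_SSE2 and count_mismatches are hypothetical and not part of the original code; the macro body is the same expression as above, renamed so it cannot clash with the real SSE4.1 intrinsic. It assumes an x86 target with SSE2 enabled.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_extract_epi16 */
#include <stdint.h>

/* Hypothetical name; same expression as the macro in the report.
 * n must be a compile-time constant in the range 0..15. */
#define EXTRACT_EPI8_SSE2(xmm, n) \
    ((_mm_extract_epi16((xmm), ((n) >> 1)) >> (8 * ((n) & 1))) & 0xff)

/* Hypothetical helper: returns how many of the 16 emulated lanes
 * differ from the corresponding byte in memory (0 means all match). */
static int count_mismatches(const uint8_t bytes[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    int bad = 0;

    /* The lane index has to be an immediate, so the loop is unrolled by hand. */
#define CHECK_LANE(n) bad += (EXTRACT_EPI8_SSE2(v, n) != bytes[n])
    CHECK_LANE(0);  CHECK_LANE(1);  CHECK_LANE(2);  CHECK_LANE(3);
    CHECK_LANE(4);  CHECK_LANE(5);  CHECK_LANE(6);  CHECK_LANE(7);
    CHECK_LANE(8);  CHECK_LANE(9);  CHECK_LANE(10); CHECK_LANE(11);
    CHECK_LANE(12); CHECK_LANE(13); CHECK_LANE(14); CHECK_LANE(15);
#undef CHECK_LANE

    return bad;
}
```

Calling count_mismatches() on any 16-byte buffer should return 0 under both -msse2 and -msse4.1. This only checks correctness; it says nothing about the slowdown reported here, which concerns the code clang 3.8 generates for the same expression.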

Test results:

AMD Athlon(tm) 5350 APU with Radeon(tm) R3      (2050.04-MHz K8-class CPU)
GCC:		20391006000 (SSE4.1) /  20116413000 (SSE2)
clang 3.8:	22329895000 (SSE4.1) / 117304135000 (SSE2) !!!
clang 3.7:	22367008000 (SSE4.1) /  25542571000 (SSE2)
clang 3.6:	22306648000 (SSE4.1) /  25914115000 (SSE2)
clang 3.4:	23684031000 (SSE4.1) /  25914115000 (SSE2)

Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (2999.72-MHz K8-class CPU)
GCC:		12031595000 (SSE4.1) / 12011303000 (SSE2)
clang 3.8:	12431116000 (SSE4.1) / 73035466000 (SSE2) !!!
clang 3.7:	12458839000 (SSE4.1) / 13317058000 (SSE2)
clang 3.6:	12462181000 (SSE4.1) / 14119683000 (SSE2)
clang 3.4:	13555167000 (SSE4.1) / 13178893000 (SSE2)
