Replies: 2 comments
-
I see that this is double-precision AVX; are you comparing against single-precision SSE? I'd expect not much difference, as they are both 4-wide, and AVX might even be slower since it uses more memory. The AVX version should be more accurate, if nothing else :) I agree that memory latency is the issue. Bullet just isn't very SoA-friendly the way it is currently written; to truly benefit from AVX (especially single-precision 8-wide, and AVX-512 at 16-wide), large parts of Bullet would need to be rewritten.
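To make the AoS-vs-SoA point concrete, here is a minimal sketch (hypothetical types, not Bullet code) of the two layouts. In the AoS form Bullet effectively uses, consecutive objects interleave x, y, z, so filling all SIMD lanes with the same component needs a gather; in the SoA form, each component is contiguous and maps directly onto wide loads.

```cpp
#include <cstddef>
#include <vector>

// AoS: one xyz tuple per object, the layout Bullet effectively uses today.
struct Vec3AoS { double x, y, z; };

// SoA: each component stored contiguously, the layout a 4-wide (double)
// or 8-wide (float) AVX kernel wants.
struct Vec3SoA {
    std::vector<double> x, y, z;
};

// Scalar stand-in for a SIMD kernel: sum of dot(a[i], b[i]).
// In AoS form the components of one object are adjacent, but the same
// component of consecutive objects is strided, so wide loads need gathers.
double dotSumAoS(const std::vector<Vec3AoS>& a, const std::vector<Vec3AoS>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i].x * b[i].x + a[i].y * b[i].y + a[i].z * b[i].z;
    return s;
}

// In SoA form the same loop maps onto contiguous vector loads:
// a.x[i..i+3] would be a single 256-bit load in a double-precision AVX path.
double dotSumSoA(const Vec3SoA& a, const Vec3SoA& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.x.size(); ++i)
        s += a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    return s;
}
```

Both functions compute the same result; only the memory-access pattern differs, which is why the SoA rewrite (not the per-vector intrinsics) is where the real AVX win would come from.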
-
No, I compared against the double-precision scalar version (we always use the double-precision version, since we simulate on the whole Earth, so the coordinates are large). Thanks for your feedback; I think your analysis is correct (I came to the same conclusion). Having AVX used for a single matrix/vector transformation, or a vector normalization, is not really worth it.
-
Hi Erwin,
I've spent a few days implementing an AVX path (using the SSE one as a reference) for the double-precision version of Bullet.
Most of the functions that have an SSE implementation now also have an AVX code path, apart from two or three of them (search for "TODO_AVX" in the source code to locate them).
Before spending more time implementing an AVX path for the remaining big functions (maxdot/mindot), I decided to benchmark the current implementation (i.e. all methods/functions that have an SSE implementation now have an AVX path as well, apart from the few mentioned above). The gain was negligible in my test scenario.
So there is a reference AVX implementation available; it is correct and works well, but it doesn't provide any significant improvement (at least, not enough to justify breaking compatibility with older non-AVX CPUs), not in our typical scenario anyway.
I assume the memory latency overhead is huge compared to the few SIMD instructions performed.
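A back-of-envelope Amdahl's-law sketch supports this. The cycle counts below are ballpark assumptions, not measurements: a 3x3 matrix times vector transform is roughly 9 multiplies + 6 adds, while one cold load from DRAM can cost on the order of 100 cycles, so even an ideal 4x ALU speedup barely moves the total.

```cpp
// Hypothetical additive cost model: total time = ALU cycles + memory cycles.
double transformCycles(double aluCycles, double memCycles) {
    return aluCycles + memCycles;
}

// End-to-end speedup when SIMD divides only the ALU portion by the lane
// count and the memory stall is untouched (Amdahl's law).
double speedup(double scalarAlu, double simdLanes, double memCycles) {
    return transformCycles(scalarAlu, memCycles)
         / transformCycles(scalarAlu / simdLanes, memCycles);
}
```

With the assumed numbers (15 ALU cycles, 4 lanes, 100-cycle miss), `speedup(15.0, 4.0, 100.0)` comes out just under 1.11, i.e. less than a 10% end-to-end gain despite the ideal 4x ALU speedup, which is consistent with the negligible benchmark result. Only with the memory stall removed (the SoA rewrite) does the full 4x show through.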
The commit can be found in my fork on GitHub: gjaegy@f2e7c78
I'm happy to answer any questions if you want to have a look at this work. Maybe someone with deeper AVX expertise could take a look and find a way to speed it up.