
SSE/AVX code optimization #2

Open
troosh wants to merge 5 commits into sirzooro:boinc from troosh:boinc

Conversation

@troosh
troosh commented Jan 12, 2018

I tried to optimize the SSE/AVX version of MovePairSearch::MovePairSearch().
However, I'm not sure the result is correct (why does the test WU do some work but not find anything?).

@sirzooro
Owner

Thanks for your contribution!

The mask uses 9 bits, so after packing the vector elements to int8 you are losing one bit. You can fix this by calling _mm_packs_epi16 on the result of _mm_cmpeq_epi16.
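To illustrate the point about packing order, here is a minimal sketch. It is not code from this PR: the helper names, the row layout (eight 9-bit masks in 16-bit lanes) and the "no bits in common with `used`" test are assumptions chosen for the example. The first function compares at 16-bit width and only then narrows the all-ones/all-zeros result; the second narrows the masks first, so any value with bit 8 set saturates and the comparison goes wrong.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: bit i of the result is set when (rows[i] & used) == 0.
// The compare runs at 16-bit width, so all 9 mask bits take part in it.
static int free_lanes_cmp_then_pack(const uint16_t rows[8], uint16_t used) {
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rows));
    __m128i u  = _mm_set1_epi16(static_cast<short>(used));
    __m128i eq = _mm_cmpeq_epi16(_mm_and_si128(v, u), _mm_setzero_si128());
    // Each lane of eq is 0xFFFF (-1) or 0x0000; signed saturation maps
    // -1 to 0xFF and 0 to 0x00, so nothing is lost when narrowing.
    __m128i packed = _mm_packs_epi16(eq, eq);
    return _mm_movemask_epi8(packed) & 0xFF;  // one bit per original lane
}

// Hypothetical lossy variant: narrows the 9-bit masks to int8 first.
// Values above 127 (for example 0x100) saturate to 0x7F, so the AND below
// no longer sees the original bit pattern.
static int free_lanes_pack_then_cmp(const uint16_t rows[8], uint16_t used) {
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rows));
    __m128i u  = _mm_set1_epi16(static_cast<short>(used));
    __m128i v8 = _mm_packs_epi16(v, v);
    __m128i u8 = _mm_packs_epi16(u, u);
    __m128i eq = _mm_cmpeq_epi8(_mm_and_si128(v8, u8), _mm_setzero_si128());
    return _mm_movemask_epi8(eq) & 0xFF;
}
```

With rows {0x100, 0x001, 0x0FF, 0x180, 0x000, 0x1FF, 0x002, 0x101} and used = 0x001, the first function reports lanes 0, 3, 4 and 6 as free, while the second misses lanes 0 and 3 because 0x100 and 0x180 both saturate to 0x7F.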

BTW, I have a few other optimizations waiting on my PC that are not pushed here yet. One of them changes the element type of squareA_MaskT to uint16_t, so that one SSE vector can hold a whole row, as AVX2 does now. Looking at your changes, I realized that the code can be optimized further by using the packs instruction. Thanks again!

@troosh
Author

troosh commented Jan 12, 2018

It's a pity that Issues are not enabled in this repository, so I'll write here, sorry.

It would be useful to use the WUs from https://github.com/sirzooro/RakeSearch/releases/download/v1.0/test.tgz as default workunits, together with a script to check the results. Or ask @CrystalFrost about this.

@sirzooro
Copy link
Owner

I have enabled issues; they were disabled by default (probably inherited when I forked this repo).

I also pushed all my new changes to the optimizations2 branch. Please take a look to avoid duplicate work.
