

@iverase (Contributor) commented Aug 14, 2025

After #132797, the method BQVectorUtils#transposeHalfByte can easily be vectorized by using a helper array that defines the shift to apply to each vector element.
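The following is a minimal plain-Java sketch of the shift-array idea. It is not the code from this PR. It assumes a bit-plane layout chosen for illustration: each 4-bit value (0..15) contributes one bit to each of four planes, eight consecutive values are packed per byte with the first value in the most significant bit, and the planes are stored back to back. The real layout in BQVectorUtils#transposeHalfByte may differ. The class, the method names and `SHIFTS` are hypothetical. The inner loop over eight lanes only stands in for what a vectorized version could do with a lane-wise shift followed by an OR reduction.

```java
import java.util.Arrays;

public class TransposeHalfByteSketch {

    // Hypothetical helper array: the shift applied to each of 8 consecutive lanes,
    // so that the first value of a group lands in the most significant bit.
    private static final int[] SHIFTS = {7, 6, 5, 4, 3, 2, 1, 0};

    /**
     * Scalar reference for the assumed layout: plane b holds bit b of every value,
     * packed 8 values per byte (MSB first); each plane is ceil(n / 8) bytes long.
     * Input values are assumed to be in 0..15.
     */
    static byte[] transposeScalar(byte[] q) {
        int planeLen = (q.length + 7) / 8;
        byte[] out = new byte[4 * planeLen];
        for (int i = 0; i < q.length; i++) {
            for (int b = 0; b < 4; b++) {
                int bit = (q[i] >> b) & 1;
                // Position within the byte is computed per element.
                out[b * planeLen + i / 8] |= (byte) (bit << (7 - (i % 8)));
            }
        }
        return out;
    }

    /**
     * Lane-style variant: for each group of 8 values and each plane, every lane
     * does the same work (extract bit b, shift by SHIFTS[lane]) and the lanes are
     * OR-ed together. Plain Java loops, not the Panama Vector API.
     */
    static byte[] transposeWithShiftArray(byte[] q) {
        int planeLen = (q.length + 7) / 8;
        byte[] out = new byte[4 * planeLen];
        // Pad with zeros to a multiple of 8 so every group has 8 lanes.
        byte[] padded = Arrays.copyOf(q, planeLen * 8);
        for (int g = 0; g < planeLen; g++) {
            for (int b = 0; b < 4; b++) {
                int acc = 0;
                for (int lane = 0; lane < 8; lane++) {
                    int bit = (padded[g * 8 + lane] >> b) & 1;
                    acc |= bit << SHIFTS[lane];
                }
                out[b * planeLen + g] = (byte) acc;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] q = {15, 0, 1, 2, 4, 8, 3, 5, 9};
        // Both variants should produce identical bit planes.
        System.out.println(Arrays.equals(transposeScalar(q), transposeWithShiftArray(q)));
    }
}
```

Running `main` prints `true`. The shift array is useful because every lane performs the same operation with only its shift differing. A whole group can then, in principle, be handled by one lane-wise shift and one reduction instead of element by element.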

With a 128-bit vector size on my Mac, the improvement is around 60%:

Benchmark                                           (dims)   Mode  Cnt  Score   Error   Units
TransposeHalfByteBenchmark.transposeHalfByte           384  thrpt    5  3.344 ± 0.100  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte           782  thrpt    5  1.599 ± 0.020  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte          1024  thrpt    5  1.263 ± 0.010  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     384  thrpt    5  1.661 ± 0.031  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     782  thrpt    5  0.805 ± 0.034  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy    1024  thrpt    5  0.625 ± 0.022  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     384  thrpt    5  5.891 ± 0.047  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     782  thrpt    5  2.829 ± 0.025  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama    1024  thrpt    5  2.205 ± 0.042  ops/ms

With a 256-bit vector size on a GCP instance, it is up to 3x faster:

Benchmark                                           (dims)   Mode  Cnt  Score   Error   Units
TransposeHalfByteBenchmark.transposeHalfByte           384  thrpt    5  1.198 ± 0.035  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte           782  thrpt    5  0.931 ± 0.055  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte          1024  thrpt    5  0.454 ± 0.006  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     384  thrpt    5  0.751 ± 0.033  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     782  thrpt    5  0.347 ± 0.004  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy    1024  thrpt    5  0.284 ± 0.010  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     384  thrpt    5  3.368 ± 0.187  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     782  thrpt    5  1.625 ± 0.004  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama    1024  thrpt    5  1.354 ± 0.059  ops/ms

With a 512-bit vector size on a GCP instance, it is up to 2x faster:

Benchmark                                           (dims)   Mode  Cnt  Score   Error   Units
TransposeHalfByteBenchmark.transposeHalfByte           384  thrpt    5  1.251 ± 0.049  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte           782  thrpt    5  0.737 ± 0.013  ops/ms
TransposeHalfByteBenchmark.transposeHalfByte          1024  thrpt    5  0.469 ± 0.036  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     384  thrpt    5  0.683 ± 0.064  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy     782  thrpt    5  0.341 ± 0.007  ops/ms
TransposeHalfByteBenchmark.transposeHalfByteLegacy    1024  thrpt    5  0.260 ± 0.002  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     384  thrpt    5  2.634 ± 0.056  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama     782  thrpt    5  1.258 ± 0.016  ops/ms
TransposeHalfByteBenchmark.transposeHalfBytePanama    1024  thrpt    5  1.026 ± 0.054  ops/ms
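The "up to" figures for 256 and 512 bits can be checked against the tables. The small sketch below assumes that those figures compare transposeHalfBytePanama with transposeHalfByte. For each dimension it divides the Panama score by the transposeHalfByte score and reports the largest ratio. The scores are copied from the tables above, and the class name is hypothetical.

```java
import java.util.Locale;

public class SpeedupCheck {

    // Throughput scores (ops/ms) from the 256-bit and 512-bit tables,
    // in the order dims = 384, 782, 1024.
    static final double[] SCALAR_256 = {1.198, 0.931, 0.454};
    static final double[] PANAMA_256 = {3.368, 1.625, 1.354};
    static final double[] SCALAR_512 = {1.251, 0.737, 0.469};
    static final double[] PANAMA_512 = {2.634, 1.258, 1.026};

    /** Element-wise throughput ratio candidate / baseline. */
    static double[] speedups(double[] baseline, double[] candidate) {
        double[] r = new double[baseline.length];
        for (int i = 0; i < baseline.length; i++) {
            r[i] = candidate[i] / baseline[i];
        }
        return r;
    }

    /** Largest value in the array. */
    static double max(double[] xs) {
        double m = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            m = Math.max(m, x);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "256-bit max speedup: %.2fx%n", max(speedups(SCALAR_256, PANAMA_256)));
        System.out.printf(Locale.ROOT, "512-bit max speedup: %.2fx%n", max(speedups(SCALAR_512, PANAMA_512)));
    }
}
```

It prints `2.98x` for 256 bits (at 1024 dims) and `2.19x` for 512 bits (also at 1024 dims), consistent with the roughly 3x and 2x figures.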

@elasticsearchmachine added the Team:Search Relevance label Aug 14, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent (Member) left a comment

LOL the panama vector version is easier to read than the regular one !!!!

@ChrisHegarty (Contributor) left a comment

You're on 🔥 @iverase !! LGTM

@iverase merged commit 8c01b67 into elastic:main on Aug 14, 2025 (32 of 33 checks passed)
@iverase deleted the transposeHalfByte-vector branch on August 14, 2025 16:06
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Aug 15, 2025

Labels: >non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0
