Contributor

@iverase iverase commented Aug 4, 2025

I read somewhere that multiplication is generally faster than division when using SIMD instructions. I had a go at replacing the divisions with multiplications in the panamized methods used by OptimizedScalarQuantizer, and we get a measurable speedup!
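
As a rough illustration of the idea (not the actual change in this PR, which touches the Panama Vector API code paths), the sketch below hoists a reciprocal out of a loop so that each element costs a multiplication instead of a division. The class name `DivToMul`, the method names, and the `step` parameter are hypothetical and chosen only for this example. Note that `x * (1 / y)` is not guaranteed to be bit-identical to `x / y` for arbitrary inputs, so a change like this can shift results by a rounding error.

```java
// Hypothetical sketch: replace a per-element division with a multiplication
// by a precomputed reciprocal. Names are illustrative, not from the PR.
final class DivToMul {

    // Baseline: one floating-point division per element.
    static void scaleWithDiv(float[] in, float[] out, float step) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] / step;
        }
    }

    // Variant: compute the reciprocal once, then multiply per element.
    // May differ from the baseline in the last bit for some inputs.
    static void scaleWithMul(float[] in, float[] out, float step) {
        final float invStep = 1f / step;
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * invStep;
        }
    }

    public static void main(String[] args) {
        float[] in = {1f, 2.5f, -3f};
        float[] viaDiv = new float[in.length];
        float[] viaMul = new float[in.length];
        // 0.25 is a power of two, so both variants are exact here.
        scaleWithDiv(in, viaDiv, 0.25f);
        scaleWithMul(in, viaMul, 0.25f);
        System.out.println(java.util.Arrays.toString(viaDiv));
        System.out.println(java.util.Arrays.toString(viaMul));
    }
}
```

Whether this helps in practice depends on the hardware and on what the JIT already does, which is why the benchmark numbers are the real evidence here.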

Before:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score     Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  168.238 ±  23.585  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   93.736 ±  14.887  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   62.421 ±   5.945  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  173.864 ±  35.138  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   88.646 ±  19.756  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   57.225 ±   8.842  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  173.472 ±  16.121  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   93.513 ±   9.126  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   68.805 ±  10.624  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  656.885 ± 114.497  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  357.374 ±  66.247  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  234.931 ±  30.954  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  596.822 ±  86.150  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  314.400 ±  30.220  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  218.100 ±  15.042  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  647.844 ±  41.396  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  358.260 ±  20.754  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  250.937 ±  20.001  ops/ms

After:

Benchmark                                 (bits)  (dims)   Mode  Cnt    Score     Error   Units
OptimizedScalarQuantizerBenchmark.scalar       1     384  thrpt   15  181.036 ±  32.714  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1     702  thrpt   15   91.953 ±  10.118  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       1    1024  thrpt   15   64.169 ±   9.674  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     384  thrpt   15  177.412 ±  38.977  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4     702  thrpt   15   89.876 ±  14.582  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       4    1024  thrpt   15   60.455 ±  13.816  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     384  thrpt   15  192.380 ±  21.849  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7     702  thrpt   15   96.493 ±   8.962  ops/ms
OptimizedScalarQuantizerBenchmark.scalar       7    1024  thrpt   15   69.664 ±   5.691  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     384  thrpt   15  735.320 ±  87.558  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1     702  thrpt   15  397.016 ±  48.072  ops/ms
OptimizedScalarQuantizerBenchmark.vector       1    1024  thrpt   15  278.980 ±  31.580  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     384  thrpt   15  721.900 ± 117.902  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4     702  thrpt   15  359.691 ±  49.942  ops/ms
OptimizedScalarQuantizerBenchmark.vector       4    1024  thrpt   15  230.584 ±   9.912  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     384  thrpt   15  725.781 ±  49.995  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7     702  thrpt   15  419.100 ±  42.762  ops/ms
OptimizedScalarQuantizerBenchmark.vector       7    1024  thrpt   15  278.497 ±  26.565  ops/ms

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 4, 2025
Member

@benwtrent benwtrent left a comment

🤯

Contributor

@john-wagster john-wagster left a comment

nice lgtm

@iverase iverase merged commit a4045d8 into elastic:main Aug 4, 2025
33 checks passed
@iverase iverase deleted the div2mul branch August 4, 2025 14:50
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 5, 2025
…cking

* upstream/main: (26 commits)
  [Fleet] add privileges to `kibana_system` to read integrations data (elastic#132400)
  Add `TestEntitlementsRule` with support for dynamic entitled node paths for testing (elastic#132077)
  Reduce logging frequency for GCS per project clients (elastic#132429)
  Skip update/100_synthetic_source tests in yamlRestCompatTests (elastic#132296)
  Correct exception for missing nested path (elastic#132408)
  Fixing esql release tests elastic#132369 (elastic#132406)
  Adjust date docvalue formatting to return 4xx instead of 5xx (elastic#132414)
  Handle nested fields with the termvectors REST API in artificial docs (elastic#92568)
  Only collect bulk scored vectors when exceeding min competitive (elastic#132293)
  Fix release tests diskbbq update (elastic#132405)
  ESQL: Fix skipping of generative tests (elastic#132390)
  Short circuit failure handling in OIDC flow (elastic#130618)
  Small optimization in OptimizedScalarQuantizer by using mul instead of div (elastic#132397)
  Aggs: Add validation to Bucket script pipeline agg (elastic#132320)
  ESQL: Multiple parameters in ungrouped aggs (elastic#132375)
  ESQL: Explain test operators (elastic#132374)
  EQL: Deal with internally created IN in a different way for EQL (elastic#132167)
  Speed up hierarchical k-means by computing distances in bulk (elastic#132384)
  Reduce the number of fields per document (elastic#132322)
  Assert current thread in ESQL (elastic#132324)
  ...

Labels

>non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0

4 participants