@iverase
Contributor

@iverase iverase commented Jun 30, 2025

This change adds the n_probe value to the output; it is 0 for non-ivf runs. It also separates index data from search data, so a typical output looks like:

index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
----------  --------  --------------  --------------------  ------------  
ivf          1000000           50382                132819             0

index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited
----------  -------  -----------  ----------------  -------------  ------  ------  --------  
ivf             100         3.69              0.00           0.00  271.00    0.97  58917.00

In addition, this change allows defining an array of n_probe values in the configuration file, so we can test different values in the same run. For example, defining n_probe as:

  "n_probe" : [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],

will produce the following output:

index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
----------  --------  --------------  --------------------  ------------  
ivf          1000000           50382                132819             0

index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited
----------  -------  -----------  ----------------  -------------  ------  ------  --------  
ivf              10         1.18              0.00           0.00  847.46    0.82   7244.59
ivf              20         1.36              0.00           0.00  735.29    0.89  13288.69
ivf              30         1.66              0.00           0.00  602.41    0.92  19266.67
ivf              40         1.93              0.00           0.00  518.13    0.94  24995.41
ivf              50         2.21              0.00           0.00  452.49    0.94  30739.60
ivf              60         2.51              0.00           0.00  398.41    0.95  36428.00
ivf              70         2.76              0.00           0.00  362.32    0.96  41952.59
ivf              80         2.99              0.00           0.00  334.45    0.96  47599.64
ivf              90         3.31              0.00           0.00  302.11    0.96  53254.45
ivf             100         3.69              0.00           0.00  271.00    0.97  58917.00

This makes it easier to plot the n_probe curve while making changes.
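As a rough illustration of how the search table above could feed a plot, here is a minimal Python sketch that parses the whitespace-aligned output into (n_probe, recall, latency) points. It is not part of this change: `parse_search_table` and `n_probe_curve` are hypothetical helper names, and the sketch assumes the column headers contain no spaces, as in the output shown here.

```python
def parse_search_table(text):
    """Parse the whitespace-aligned search results table into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= {"-", " "}:
            continue  # skip the dashed separator row
        cells = line.split()
        # First column is the index type (a string); the rest are numeric.
        row = {"index_type": cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = float(cell)
        rows.append(row)
    return rows


def n_probe_curve(rows):
    """Return (n_probe, recall, latency_ms) tuples sorted by n_probe."""
    return sorted((int(r["n_probe"]), r["recall"], r["latency(ms)"]) for r in rows)


# A few rows copied from the output above.
SAMPLE = """\
index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall   visited
----------  -------  -----------  ----------------  -------------  ------  ------  --------
ivf              10         1.18              0.00           0.00  847.46    0.82   7244.59
ivf              20         1.36              0.00           0.00  735.29    0.89  13288.69
ivf              30         1.66              0.00           0.00  602.41    0.92  19266.67
"""

if __name__ == "__main__":
    for n_probe, recall, latency in n_probe_curve(parse_search_table(SAMPLE)):
        print(f"n_probe={n_probe:>3}  recall={recall:.2f}  latency={latency:.2f} ms")
```

The resulting tuples can be passed to any plotting library to draw recall or latency against n_probe.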

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Jun 30, 2025
Contributor

@john-wagster john-wagster left a comment

i like it! lgtm

@iverase iverase added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Jun 30, 2025
@elasticsearchmachine elasticsearchmachine merged commit 7bc215a into elastic:main Jun 30, 2025
32 checks passed
@iverase iverase deleted the nProbeIndexTester branch June 30, 2025 12:47
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
…ic#130316)


Labels

auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
>non-issue
:Search Relevance/Search (Catch all for Search Relevance)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v9.2.0

4 participants