Skip to content

Conversation

@weizijun
Copy link
Contributor

Now the num_candidates must be lower then 10000, In some case, use want to get more vectors.
I think the num_candidates can be a parameter like max_window_size, The use can change the default vaule.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 17, 2025
@henningandersen henningandersen added :Search Relevance/Vectors Vector search and removed needs:triage Requires assignment of a team area label labels Mar 17, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Mar 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

* main: (95 commits)
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testLifecycleAppliedToFailureStore elastic#124999
  Merge template mappings properly during validation (elastic#124784)
  [Build] Rework internal build plugin plugin to work with Isolated Projects (elastic#123461)
  [Build] Require reason for usesDefaultDistribution (elastic#124707)
  Mute org.elasticsearch.packaging.test.DockerTests test011SecurityEnabledStatus elastic#124990
  Mute org.elasticsearch.xpack.ilm.TimeSeriesDataStreamsIT testRolloverAction elastic#124987
  Mute org.elasticsearch.packaging.test.BootstrapCheckTests test10Install elastic#124957
  Mute org.elasticsearch.integration.DataStreamLifecycleServiceRuntimeSecurityIT testRolloverLifecycleAndForceMergeAuthorized elastic#124978
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQuery elastic#124977
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocal elastic#121672
  Mention zero-window state in networking docs (elastic#124967)
  Remove remoteAddress field from TransportResponse (elastic#120016)
  Include failures in partial response (elastic#124929)
  Prevent work starvation bug if using scaling EsThreadPoolExecutor with core pool size = 0  (elastic#124732)
  Re-enable analysis stemmer test (elastic#124961)
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryStopIT testStopQueryLocalNoRemotes elastic#124959
  ESQL: Catch parsing exception (elastic#124958)
  ESQL: Improve error message for ( and [ (elastic#124177)
  Mute org.elasticsearch.xpack.esql.qa.single_node.EsqlSpecIT test {lookup-join.MvJoinKeyFromRow SYNC} elastic#124951
  Mute org.elasticsearch.datastreams.lifecycle.DataStreamLifecycleServiceIT testErrorRecordingOnRetention elastic#124950
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
@benwtrent
Copy link
Member

Now the num_candidates must be lower then 10000, In some case, use want to get more vectors.

To me, this falls within the "too many clauses" type of error. We make this limit to prevent dramatic cluster performance issues as certain scratch data is eagerly initialized.

If this is indeed something that is desired (e.g. get more than 10k candidates), I wonder if we are better suited for a index wide setting that places the upper limit.

I am not a huge fan of tacking on more parameters to an already complicated set of index options.

@weizijun
Copy link
Contributor Author

I am not a huge fan of tacking on more parameters to an already complicated set of index options.

yeah, I wanted to implement it in index settings, but I found it a little hard, and easy implement in the mapping.
Maybe we can try to implement in the setting.

@benwtrent
Copy link
Member

yeah, I wanted to implement it in index settings, but I found it a little hard, and easy implement in the mapping.
Maybe we can try to implement in the setting.

I can understand the difficulty as this will require moving validation logic and such around. Additionally, it should probably be a "dynamic", which then means you have a value that gets updated on setting updates.

But this is the way it should go, instead of adding more configuration points to the mapped field.

@weizijun
Copy link
Contributor Author

But this is the way it should go, instead of adding more configuration points to the mapped field.

I find that the check can be add in the DfsPhase, it can get the index setting, and check the num_candidates value.
Is this the right way?

@weizijun
Copy link
Contributor Author

Hi, @benwtrent , I add a new PR #125065
It implement the feature in the index settings.

@benwtrent
Copy link
Member

closing in deference to: #125065

Though that PR might not be merged either.

@benwtrent benwtrent closed this May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants