
Conversation

@jimczi (Contributor) commented Jan 28, 2025

This commit integrates `MinimalServiceSettings` (introduced in #120560) into the cluster state for all registered models in the `ModelRegistry`. These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations.

To ensure consistency, the cluster state metadata must remain synchronized with the models in the inference index. If a mismatch is detected during startup, the master node performs an upgrade to load all model settings from the index.
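
For readers less familiar with the registry internals, here is a minimal, self-contained sketch of the idea, using made-up stand-in types rather than the actual Elasticsearch classes: the cluster state carries a small per-model settings snapshot that consumers can read synchronously, instead of fetching the full configuration from the inference index asynchronously.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Simplified illustration only; the real MinimalServiceSettings and ModelRegistry
 * live in the inference plugin and have a richer API than this sketch.
 */
public final class MinimalSettingsSketch {

    /** Minimal per-model settings, mirroring the kind of data described above. */
    record MinimalServiceSettings(String service, String taskType, Integer dimensions) {}

    /** Stand-in for the cluster-state metadata, keyed by inference endpoint id. */
    private final Map<String, MinimalServiceSettings> metadata = new ConcurrentHashMap<>();

    void register(String inferenceId, MinimalServiceSettings settings) {
        metadata.put(inferenceId, settings);
    }

    /** Synchronous lookup: safe to call from code paths that must not block on I/O. */
    Optional<MinimalServiceSettings> getMinimalSettings(String inferenceId) {
        return Optional.ofNullable(metadata.get(inferenceId));
    }

    public static void main(String[] args) {
        MinimalSettingsSketch registry = new MinimalSettingsSketch();
        // The endpoint id and settings below are invented for the example.
        registry.register("my-elser-endpoint",
            new MinimalServiceSettings("elasticsearch", "sparse_embedding", null));
        // A consumer can now inspect the task type without an asynchronous round trip.
        registry.getMinimalSettings("my-elser-endpoint")
            .ifPresent(settings -> System.out.println(settings.taskType()));
    }
}
```
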
@elasticsearchmachine (Collaborator)

Hi @jimczi, I've created a changelog YAML for you.

@jonathan-buttner (Contributor)

Just thinking out loud while looking at the PR:

> These settings allow consumers to access configuration details without requiring asynchronous calls to retrieve full model configurations.

Do consumers have to access this information frequently?

Do you think there'd be any benefit to using a cache specific to these settings instead of the cluster state metadata? I guess we'd run into the same issue of needing it synchronized across all the nodes 🤔

The Elastic Inference Service makes an asynchronous authorization call when the node boots up to determine which default inference endpoints are enabled. Does that cause any issues with this solution? Basically it means we don't know immediately after the node boots up what all the default inference endpoints are.

Here's where that call happens:

https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/elastic/ElasticInferenceService.java#L186
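
To make the timing concern concrete, here is a rough sketch of the window being described, with invented names rather than the actual `ElasticInferenceService` code: until the asynchronous authorization call completes, the node simply does not know which default endpoints exist.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative model of the startup window only; not the real authorization code. */
public final class DefaultEndpointAuthSketch {

    private final AtomicReference<Set<String>> enabledDefaults = new AtomicReference<>(Set.of());

    /** Simulates the asynchronous authorization call issued when the node boots. */
    void onNodeStart(CompletableFuture<Set<String>> authorizationCall) {
        authorizationCall.thenAccept(enabledDefaults::set);
    }

    /** Returns false until the authorization response has arrived. */
    boolean isDefaultEndpointEnabled(String inferenceId) {
        return enabledDefaults.get().contains(inferenceId);
    }

    public static void main(String[] args) {
        DefaultEndpointAuthSketch sketch = new DefaultEndpointAuthSketch();
        CompletableFuture<Set<String>> auth = new CompletableFuture<>();
        sketch.onNodeStart(auth);
        // The endpoint id is made up; right after boot the answer is still "unknown".
        System.out.println(sketch.isDefaultEndpointEnabled(".some-default-endpoint")); // false
        auth.complete(Set.of(".some-default-endpoint"));
        System.out.println(sketch.isDefaultEndpointEnabled(".some-default-endpoint")); // true
    }
}
```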

@jimczi (Contributor, Author) commented Jan 29, 2025

> Do consumers have to access this information frequently?

For `semantic_text`, only when creating a new field, so very infrequently.

> The Elastic Inference Service makes an asynchronous authorization call when the node boots up to determine which default inference endpoints are enabled. Does that cause any issues with this solution? Basically it means we don't know immediately after the node boots up what all the default inference endpoints are.

I think that's fine. That would mean that creating a new `semantic_text` field that depends on these models could fail before the default endpoints are added.
On a separate note, do we handle the case where the default models change? With the current system, if the authorisation fails, the default models won't be loaded, but they might already be stored in the index, right?

@jonathan-buttner (Contributor)

> For `semantic_text`, only when creating a new field, so very infrequently.

If it's very infrequent, what's the benefit of moving it into the cluster state? Is it because the call to get the whole model is expensive?

> I think that's fine. That would mean that creating a new `semantic_text` field that depends on these models could fail before the default endpoints are added.

Yeah, probably not super likely, since for that to happen the `semantic_text` field would have to be created pretty much immediately after a node finishes booting 🤷‍♂️

> On a separate note, do we handle the case where the default models change? With the current system, if the authorisation fails, the default models won't be loaded, but they might already be stored in the index, right?

Oh, that's an interesting point I hadn't thought of. I think what you're saying is: what happens if we revoke access to a model after it was previously granted?

Let me ping the EIS team on how we should handle that.

@jimczi (Contributor, Author) commented Jan 29, 2025

> If it's very infrequent, what's the benefit of moving it into the cluster state? Is it because the call to get the whole model is expensive?

The call would happen on the master node when updating or creating a mapping, so we cannot block the thread to get the model from the index. Today we are lenient and fetch the model definition at a later stage, but now that we want to add options to set up the inner fields, we have to know the model early.
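
To illustrate the constraint, here is a small hypothetical sketch (the names are invented for the example, not the actual `semantic_text` mapper code): validation during mapping creation stays non-blocking because it only consults the minimal settings available in memory via the cluster state, rather than fetching the model document from the inference index.

```java
import java.util.Optional;

/** Illustrative sketch only; the real mapper and registry APIs differ. */
public final class SemanticFieldValidationSketch {

    /** Backed by cluster-state metadata in this sketch, so lookups return immediately. */
    interface MinimalSettingsLookup {
        Optional<String> taskType(String inferenceId);
    }

    /** Called while building the field mapping; must not block the master thread. */
    static void validateSemanticField(String inferenceId, MinimalSettingsLookup lookup) {
        String taskType = lookup.taskType(inferenceId)
            .orElseThrow(() -> new IllegalArgumentException(
                "unknown inference endpoint [" + inferenceId + "]"));
        if (!taskType.equals("sparse_embedding") && !taskType.equals("text_embedding")) {
            throw new IllegalArgumentException(
                "inference endpoint [" + inferenceId + "] has unsupported task type [" + taskType + "]");
        }
    }

    public static void main(String[] args) {
        // Pretend cluster-state metadata; the endpoint id is made up.
        MinimalSettingsLookup lookup = id -> Optional.of("text_embedding");
        validateSemanticField("my-embedding-endpoint", lookup);
        System.out.println("mapping validated without blocking on the inference index");
    }
}
```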

@jonathan-buttner (Contributor)

> The call would happen on the master node when updating or creating a mapping, so we cannot block the thread to get the model from the index. Today we are lenient and fetch the model definition at a later stage, but now that we want to add options to set up the inner fields, we have to know the model early.

Ah I see.

@jonathan-buttner (Contributor) left a review comment:

The approach makes sense to me. I left a couple questions for things that weren't immediately clear to me.

@davidkyle (Member) left a review comment:

Everything looks great if you can remove the `return_minimal_config` parameter from the REST API pls.

@davidkyle (Member) left a review comment:

LGTM

@jimczi jimczi merged commit 270ec53 into elastic:main Mar 18, 2025
17 checks passed
@jimczi jimczi deleted the model_registry_cluster_state branch March 18, 2025 10:12
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Mar 19, 2025
When retrieving a default inference endpoint for the first time, the system automatically creates the endpoint.
However, unlike the `put inference model` action, the `get` action does not redirect the request to the master node.

Since elastic#121106, we rely on the assumption that every model creation (`put model`) must run on the master node, as it modifies the cluster state. However, this assumption led to a bug where the get action tries to store default inference endpoints from a different node.

This change resolves the issue by preventing default inference endpoints from being added to the cluster state. These endpoints are not strictly needed there, as they are already reported by inference services upon startup.

**Note:** This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage.
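
As a rough sketch of the shape of this fix, with stand-in types rather than the real registry code: default endpoints are filtered out before the metadata destined for the cluster state is built, so only user-created endpoints are persisted there.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative sketch only; the real ModelRegistry metadata is richer than a flat map. */
public final class ExcludeDefaultsSketch {

    record Endpoint(String inferenceId, boolean isDefault, String taskType) {}

    /** Only user-created endpoints end up in the cluster-state metadata. */
    static Map<String, String> toClusterStateMetadata(List<Endpoint> endpoints) {
        return endpoints.stream()
            .filter(endpoint -> !endpoint.isDefault())
            .collect(Collectors.toMap(Endpoint::inferenceId, Endpoint::taskType));
    }

    public static void main(String[] args) {
        // Both endpoint ids are invented for the example.
        List<Endpoint> endpoints = List.of(
            new Endpoint(".some-default-endpoint", true, "sparse_embedding"),   // default: skipped
            new Endpoint("my-openai-embeddings", false, "text_embedding"));     // user-created: kept
        System.out.println(toClusterStateMetadata(endpoints)); // {my-openai-embeddings=text_embedding}
    }
}
```
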
jimczi added a commit that referenced this pull request Mar 19, 2025

Exclude Default Inference Endpoints from Cluster State Storage (#125242)
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Mar 19, 2025

Exclude Default Inference Endpoints from Cluster State Storage (elastic#125242)
elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025

* Add ModelRegistryMetadata to Cluster State (#121106)

* fix test compil

* fix serialisation

* Exclude Default Inference Endpoints from Cluster State Storage (#125242)
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025

Exclude Default Inference Endpoints from Cluster State Storage (elastic#125242)
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Add ModelRegistryMetadata to Cluster State (#121106)
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…ic#125242)

When retrieving a default inference endpoint for the first time, the system automatically creates the endpoint.
However, unlike the `put inference model` action, the `get` action does not redirect the request to the master node.

Since elastic#121106, we rely on the assumption that every model creation (`put model`) must run on the master node, as it modifies the cluster state. However, this assumption led to a bug where the get action tries to store default inference endpoints from a different node.

This change resolves the issue by preventing default inference endpoints from being added to the cluster state. These endpoints are not strictly needed there, as they are already reported by inference services upon startup.

**Note:** This bug did not prevent the default endpoints from being used, but it caused repeated attempts to store them in the index, resulting in logging errors on every usage.
@jimczi jimczi mentioned this pull request Apr 1, 2025