Conversation

@maxhniebergall commented Oct 9, 2024

Overview

The following fields can be updated:

  • endpoint secrets
  • task settings
  • num_allocations (for endpoints backed by in-cluster TrainedModels)
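The update semantics shown in the examples below — a partial request body that changes only the listed fields and leaves everything else intact — can be sketched as follows. This is an illustrative Python sketch of the merge behavior, not the server implementation, and the helper name is hypothetical:

```python
# Illustrative sketch: merge a partial _update body over the stored endpoint
# configuration, leaving unspecified fields unchanged.

def apply_partial_update(existing: dict, update: dict) -> dict:
    """Return a new endpoint config with the update merged over the original."""
    merged = dict(existing)
    for section in ("service_settings", "task_settings"):
        if section in update:
            # Keys present in the update win; other keys are preserved.
            merged[section] = {**existing.get(section, {}), **update[section]}
    return merged

endpoint = {
    "inference_id": "elser_endpoint1",
    "service_settings": {"num_allocations": 1, "num_threads": 1},
    "task_settings": {},
}
updated = apply_partial_update(endpoint, {"service_settings": {"num_allocations": 2}})
print(updated["service_settings"])  # {'num_allocations': 2, 'num_threads': 1}
```

Note how `num_threads` survives the update even though the request body mentions only `num_allocations`, matching the ELSER example below.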

Examples:

ELSER

Put ELSER:
response:

{
    "inference_id": "elser_endpoint1",
    "task_type": "sparse_embedding",
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
        "model_id": ".elser_model_2"
    },
    "task_settings": {}
}

Update ELSER:
request:
localhost:9200/_inference/sparse_embedding/elser_endpoint1/_update

{
    "service_settings": {
        "num_allocations": 2
    }
}

response:

{
    "inference_id": "elser_endpoint1",
    "task_type": "sparse_embedding",
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 2,
        "num_threads": 1,
        "model_id": ".elser_model_2"
    },
    "task_settings": {}
}

GET Trained Models:
response:

...
            "deployment_stats": {
                "deployment_id": "elser_endpoint1",
                "model_id": ".elser_model_2",
                "threads_per_allocation": 1,
                "number_of_allocations": 2,
                "queue_capacity": 1024,
                "state": "started",
                "allocation_status": {
                    "allocation_count": 2,
                    "target_allocation_count": 2,
                    "state": "fully_allocated"
                },
                "cache_size": "417.8mb",
                "priority": "normal",
                "start_time": 1728508000822,
                "peak_throughput_per_minute": 0,
                "nodes": [
                    {
...

GET endpoints:
response:

...
        {
            "inference_id": "elser_endpoint1",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 2,
                "num_threads": 1,
                "model_id": ".elser_model_2"
            },
            "task_settings": {}
        }
...

Cohere Rerank

Put endpoint
request:

{
    "service": "cohere",
    "service_settings": {
        "model_id": "rerank-english-v3.0",
        "api_key": "<REDACTED>"
    },
    "task_settings": {
        "return_documents": true
    }
}

response:

{
    "inference_id": "testss",
    "task_type": "rerank",
    "service": "cohere",
    "service_settings": {
        "model_id": "rerank-english-v3.0",
        "rate_limit": {
            "requests_per_minute": 10000
        }
    },
    "task_settings": {
        "return_documents": true
    }
}

Update endpoint:
request:
localhost:9200/_inference/rerank/testss/_update

{
    "task_settings": {
        "top_n": 1
    }
}

response:

{
    "task_settings": {
        "top_n": 1
    }
}

GET endpoints:
response:

{
    "endpoints": [
        {
            "inference_id": ".elser-2",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 1,
                    "max_number_of_allocations": 8
                }
            },
            "task_settings": {}
        },
        {
            "inference_id": "elser_endpoint1",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 2,
                "num_threads": 1,
                "model_id": ".elser_model_2"
            },
            "task_settings": {}
        },
        {
            "inference_id": "testss",
            "task_type": "rerank",
            "service": "cohere",
            "service_settings": {
                "model_id": "rerank-english-v3.0",
                "rate_limit": {
                    "requests_per_minute": 10000
                }
            },
            "task_settings": {
                "top_n": 1,
                "return_documents": true
            }
        }
    ]
}

@elasticsearchmachine added the Team:ML (Meta label for the ML team) label Oct 9, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator)

Hi @maxhniebergall, I've created a changelog YAML for you.

@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

@elasticmachine (Collaborator)

merge conflict between base and head

Max Hniebergall added 3 commits October 10, 2024 09:08
…eUpgradeApi

# Conflicts:
#	server/src/main/java/org/elasticsearch/inference/TaskSettings.java
#	x-pack/plugin/inference/src/internalClusterTest/java/org/elasticsearch/xpack/inference/integration/ModelRegistryIT.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/completion/AlibabaCloudSearchCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/sparse/AlibabaCloudSearchSparseTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/amazonbedrock/completion/AmazonBedrockChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/completion/AzureAiStudioChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureopenai/completion/AzureOpenAiCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingsTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/CustomElandRerankTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/rerank/GoogleVertexAiRerankTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/completion/OpenAiChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/embeddings/OpenAiEmbeddingsTaskSettingsTests.java
@maxhniebergall added the auto-backport (Automatically create backport pull requests when merged) label Oct 10, 2024
@maxhniebergall enabled auto-merge (squash) October 10, 2024 13:48
@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

}

/**
* Only for non-in-cluster services. Combines the existing model with the new settings to create a new model using the
Member:

Is the "only for non-in-cluster services" part outdated, or how does it reconcile with the comment below?

            // In cluster services can only have their num_allocations updated, so this is a special case

Contributor Author:

yes, you're right, that comment is outdated. I'll update it. Thanks!

modelRegistry.getModelWithSecrets(inferenceEntityId, ActionListener.wrap((model) -> {
    if (model == null) {
        listener.onFailure(
            ExceptionsHelper.badRequestException(Messages.INFERENCE_ENTITY_NON_EXISTANT_NO_UPDATE, inferenceEntityId)
Member:

Should we throw a 404 for these two? Though I see that doesn't exist in ExceptionsHelper?

Contributor Author:

yes, I think that would be more correct. I'll update it.

);

client.prepareBulk().add(configRequest).setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).execute(subListener);
logger.error(
Member:

Does this call the below attached listener if we call subListener.onFailure? If yes, I think we'll log two errors for the same issue. If no, I think we need to call preventDeletionLock.remove(inferenceEntityId);

Might be easier to always call subListener in each block, then replace that last block with:

.addListener(finalListener.delegateResponse((subListener, e) -> {
    preventDeletionLock.remove(inferenceEntityId);
    finalListener.onFailure(
        new ElasticsearchStatusException(
            format(
                "Failed to rollback while handling failure to update inference endpoint [%s]. "
                    + "Endpoint may be in an inconsistent state due to [%s]",
                inferenceEntityId,
                e.getMessage()
            ),
            RestStatus.INTERNAL_SERVER_ERROR,
            e
        )
    );
}));

Contributor Author:

The SubscribableListener documentation says "A failure of any step will bypass the remaining steps and ultimately fail finalListener." I do wonder if these lines after 379 will ever actually be executed... Actually, I think they need to be added into the listener passed on 379.

I think the final block is set up like this because the bulk API doesn't throw exceptions; it collects them into the BulkResponse configResponse object, and we need to turn that into an exception for the listener.

Similarly, I am calling onResponse or onFailure on the finalListener instead of the subListener because I need to call those functions with the correct object types (boolean). The last block isn't really for the whole set of calls; it's only there to catch the failure case.

Contributor Author:

Actually, it also occurs to me that if the bulk API throws an exception for some reason, we won't properly remove the lock, so I need to add a new action listener.
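The point discussed here — a bulk-style API that reports per-item failures inside its response object rather than throwing — can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the actual Elasticsearch Java code:

```python
# Hypothetical sketch: a bulk-style call never raises; failures are recorded
# per item in the response, so the caller must inspect the response and
# convert any failures into an exception for the listener.

class EndpointUpdateError(Exception):
    pass

def complete_from_bulk_response(response: dict, on_success, on_failure) -> None:
    failed = [item for item in response.get("items", []) if "error" in item]
    if failed:
        # No exception was thrown by the bulk call; build one ourselves.
        on_failure(EndpointUpdateError(f"{len(failed)} update(s) failed"))
    else:
        on_success(True)
```

In this shape the "last block" only translates response-level failures into a listener failure; it never sees exceptions thrown before the bulk call completes.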

);
return;
} else {
preventDeletionLock.add(inferenceEntityId);
Member:

Nice

format("Failed to update inference endpoint [%s] due to [%s]", inferenceEntityId, configResponse.buildFailureMessage())
);
// Since none of our updates succeeded at this point, we can simply return.
finalListener.onFailure(
Member:

Do we also need to unlock here as well?

Contributor Author:

Yeah, I think that's correct. We probably should be unlocking whenever we call the finalListener or the final sub-listener. Do you think you could add this tomorrow? If not, I can probably do it at some point before FF.
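The unlock-on-every-path idea agreed on above can be sketched like this (Python for brevity; the names mirror the discussion but the structure is a hypothetical illustration, not the actual implementation):

```python
# Hypothetical sketch of the prevent-deletion lock: lock the endpoint ID
# before the update starts, and release it on every exit path, whether the
# update succeeds or raises.

prevent_deletion_lock: set[str] = set()

def update_with_lock(inference_entity_id: str, do_update):
    prevent_deletion_lock.add(inference_entity_id)
    try:
        return do_update()
    finally:
        # Runs on success and on failure, so the lock is always released.
        prevent_deletion_lock.discard(inference_entity_id)
```

The try/finally plays the role of "unlocking whenever we call the finalListener": no completion path can leave the ID locked.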

@davidkyle (Member)

@elasticmachine update branch

@maxhniebergall merged commit 6b714e2 into main Oct 11, 2024
17 checks passed
@maxhniebergall deleted the inferenceUpgradeApi branch October 11, 2024 21:39
@elasticsearchmachine (Collaborator)

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 114457

davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 13, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
@maxhniebergall (Contributor Author)

💚 All backports created successfully

Branch: 8.x

Questions? Please refer to the Backport tool documentation.

elasticsearchmachine pushed a commit that referenced this pull request Oct 14, 2024
…existing inference endpoints (#114457) (#114734)

* [Inference API] Introduce Update API to change some aspects of existing inference endpoints (#114457)

(cherry picked from commit 6b714e2)

* Fix syntax error caused by old JDK?
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024

Labels

auto-backport (Automatically create backport pull requests when merged), backport pending, >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.16.0, v9.0.0
