Conversation

@maxhniebergall commented Oct 9, 2024

Overview

The following fields can be updated:

  • endpoint secrets
  • task settings
  • num_allocations (for endpoints backed by in-cluster TrainedModels)
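The update semantics shown in the examples below — a partial request body that changes only the listed fields and leaves everything else intact — can be sketched as follows. This is an illustrative Python sketch of the merge behavior, not the server implementation, and the helper name is hypothetical:

```python
# Illustrative sketch: merge a partial _update body over the stored endpoint
# configuration, leaving unspecified fields unchanged.

def apply_partial_update(existing: dict, update: dict) -> dict:
    """Return a new endpoint config with the update merged over the original."""
    merged = dict(existing)
    for section in ("service_settings", "task_settings"):
        if section in update:
            # Keys present in the update win; other keys are preserved.
            merged[section] = {**existing.get(section, {}), **update[section]}
    return merged

endpoint = {
    "inference_id": "elser_endpoint1",
    "service_settings": {"num_allocations": 1, "num_threads": 1},
    "task_settings": {},
}
updated = apply_partial_update(endpoint, {"service_settings": {"num_allocations": 2}})
print(updated["service_settings"])  # {'num_allocations': 2, 'num_threads': 1}
```

Note how `num_threads` survives the update even though the request body mentions only `num_allocations`, matching the ELSER example below.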

Examples:

ELSER

Put ELSER:
response:

{
    "inference_id": "elser_endpoint1",
    "task_type": "sparse_embedding",
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
        "model_id": ".elser_model_2"
    },
    "task_settings": {}
}

Update ELSER:
request:
localhost:9200/_inference/sparse_embedding/elser_endpoint1/_update

{
    "service_settings": {
        "num_allocations": 2
    }
}

response:

{
    "inference_id": "elser_endpoint1",
    "task_type": "sparse_embedding",
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 2,
        "num_threads": 1,
        "model_id": ".elser_model_2"
    },
    "task_settings": {}
}

GET Trained Models:
response:

...
            "deployment_stats": {
                "deployment_id": "elser_endpoint1",
                "model_id": ".elser_model_2",
                "threads_per_allocation": 1,
                "number_of_allocations": 2,
                "queue_capacity": 1024,
                "state": "started",
                "allocation_status": {
                    "allocation_count": 2,
                    "target_allocation_count": 2,
                    "state": "fully_allocated"
                },
                "cache_size": "417.8mb",
                "priority": "normal",
                "start_time": 1728508000822,
                "peak_throughput_per_minute": 0,
                "nodes": [
                    {
...

GET endpoints:
response:

...
        {
            "inference_id": "elser_endpoint1",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 2,
                "num_threads": 1,
                "model_id": ".elser_model_2"
            },
            "task_settings": {}
        }
...

Cohere Rerank

Put endpoint
request:

{
    "service": "cohere",
    "service_settings": {
        "model_id": "rerank-english-v3.0",
        "api_key": "<REDACTED>"
    },
    "task_settings": {
        "return_documents": true
    }
}

response:

{
    "inference_id": "testss",
    "task_type": "rerank",
    "service": "cohere",
    "service_settings": {
        "model_id": "rerank-english-v3.0",
        "rate_limit": {
            "requests_per_minute": 10000
        }
    },
    "task_settings": {
        "return_documents": true
    }
}

Update endpoint:
request:
localhost:9200/_inference/rerank/testss/_update

{
    "task_settings": {
        "top_n": 1
    }
}

response:

{
    "task_settings": {
        "top_n": 1
    }
}

GET endpoints:
response:

{
    "endpoints": [
        {
            "inference_id": ".elser-2",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 1,
                    "max_number_of_allocations": 8
                }
            },
            "task_settings": {}
        },
        {
            "inference_id": "elser_endpoint1",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 2,
                "num_threads": 1,
                "model_id": ".elser_model_2"
            },
            "task_settings": {}
        },
        {
            "inference_id": "testss",
            "task_type": "rerank",
            "service": "cohere",
            "service_settings": {
                "model_id": "rerank-english-v3.0",
                "rate_limit": {
                    "requests_per_minute": 10000
                }
            },
            "task_settings": {
                "top_n": 1,
                "return_documents": true
            }
        }
    ]
}

@elasticsearchmachine added the Team:ML (Meta label for the ML team) label Oct 9, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator)

Hi @maxhniebergall, I've created a changelog YAML for you.

@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

@elasticmachine (Collaborator)

merge conflict between base and head

Max Hniebergall added 3 commits October 10, 2024 09:08
…eUpgradeApi

# Conflicts:
#	server/src/main/java/org/elasticsearch/inference/TaskSettings.java
#	x-pack/plugin/inference/src/internalClusterTest/java/org/elasticsearch/xpack/inference/integration/ModelRegistryIT.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/completion/AlibabaCloudSearchCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/alibabacloudsearch/sparse/AlibabaCloudSearchSparseTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/amazonbedrock/completion/AmazonBedrockChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureaistudio/completion/AzureAiStudioChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/azureopenai/completion/AzureOpenAiCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/cohere/embeddings/CohereEmbeddingsTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/CustomElandRerankTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/rerank/GoogleVertexAiRerankTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/completion/OpenAiChatCompletionTaskSettingsTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/openai/embeddings/OpenAiEmbeddingsTaskSettingsTests.java
@maxhniebergall added the auto-backport (Automatically create backport pull requests when merged) label Oct 10, 2024
@maxhniebergall enabled auto-merge (squash) October 10, 2024 13:48
@maxhniebergall (Contributor Author)

@elasticmachine merge upstream

}

/**
* Only for non-in-cluster services. Combines the existing model with the new settings to create a new model using the
Member:

Is the "only for non-in-cluster services" part outdated, or how does it reconcile with the comment below?

            // In cluster services can only have their num_allocations updated, so this is a special case

Contributor Author:

yes, you're right, that comment is outdated. I'll update it. Thanks!

modelRegistry.getModelWithSecrets(inferenceEntityId, ActionListener.wrap((model) -> {
    if (model == null) {
        listener.onFailure(
            ExceptionsHelper.badRequestException(Messages.INFERENCE_ENTITY_NON_EXISTANT_NO_UPDATE, inferenceEntityId)
Member:

Should we throw a 404 for these two? Though I see that doesn't exist in ExceptionsHelper?

Contributor Author:

yes, I think that would be more correct. I'll update it.

);

client.prepareBulk().add(configRequest).setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).execute(subListener);
logger.error(
Member:

Does this call the below attached listener if we call subListener.onFailure? If yes, I think we'll log two errors for the same issue. If no, I think we need to call preventDeletionLock.remove(inferenceEntityId);

Might be easier to always call subListener in each block, then replace that last block with:

.addListener(finalListener.delegateResponse((subListener, e) -> {
    preventDeletionLock.remove(inferenceEntityId);
    finalListener.onFailure(
        new ElasticsearchStatusException(
            format(
                "Failed to rollback while handling failure to update inference endpoint [%s]. "
                    + "Endpoint may be in an inconsistent state due to [%s]",
                inferenceEntityId,
                e.getMessage()
            ),
            RestStatus.INTERNAL_SERVER_ERROR,
            e
        )
    );
}));

Contributor Author:

The SubscribableListener documentation says "A failure of any step will bypass the remaining steps and ultimately fail finalListener." I do wonder if these lines after 379 will ever actually be executed... Actually, I think they need to be added into the listener passed on 379.

I think the final block is set up like this because the bulk API doesn't throw exceptions; it collects them into the BulkResponse configResponse object, and we need to turn that into an exception for the listener.

Similarly, I am calling onResponse or onFailure on the finalListener instead of the subListener because I need to call those functions with the correct object types (boolean). The last block isn't really for the whole set of calls; it's only there to catch the failure case.

Contributor Author:

Actually, it also occurs to me that if the bulk API throws an exception for some reason, we won't properly remove the lock, so I need to add a new action listener.
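The point discussed here — a bulk-style API that reports per-item failures inside its response object rather than throwing — can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the actual Elasticsearch Java code:

```python
# Hypothetical sketch: a bulk-style call never raises; failures are recorded
# per item in the response, so the caller must inspect the response and
# convert any failures into an exception for the listener.

class EndpointUpdateError(Exception):
    pass

def complete_from_bulk_response(response: dict, on_success, on_failure) -> None:
    failed = [item for item in response.get("items", []) if "error" in item]
    if failed:
        # No exception was thrown by the bulk call; build one ourselves.
        on_failure(EndpointUpdateError(f"{len(failed)} update(s) failed"))
    else:
        on_success(True)
```

In this shape the "last block" only translates response-level failures into a listener failure; it never sees exceptions thrown before the bulk call completes.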

);
return;
} else {
preventDeletionLock.add(inferenceEntityId);
Member:

Nice

format("Failed to update inference endpoint [%s] due to [%s]", inferenceEntityId, configResponse.buildFailureMessage())
);
// Since none of our updates succeeded at this point, we can simply return.
finalListener.onFailure(
Member:

Do we also need to unlock here as well?

Contributor Author:

Yeah, I think that's correct. We probably should be unlocking whenever we call the finalListener or the final sub-listener. Do you think you could add this tomorrow? If not, I can probably do it at some point before FF.
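The unlock-on-every-path idea agreed on above can be sketched like this (Python for brevity; the names mirror the discussion but the structure is a hypothetical illustration, not the actual implementation):

```python
# Hypothetical sketch of the prevent-deletion lock: lock the endpoint ID
# before the update starts, and release it on every exit path, whether the
# update succeeds or raises.

prevent_deletion_lock: set[str] = set()

def update_with_lock(inference_entity_id: str, do_update):
    prevent_deletion_lock.add(inference_entity_id)
    try:
        return do_update()
    finally:
        # Runs on success and on failure, so the lock is always released.
        prevent_deletion_lock.discard(inference_entity_id)
```

The try/finally plays the role of "unlocking whenever we call the finalListener": no completion path can leave the ID locked.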

@davidkyle (Member)

@elasticmachine update branch

@maxhniebergall merged commit 6b714e2 into main Oct 11, 2024
17 checks passed
@maxhniebergall deleted the inferenceUpgradeApi branch October 11, 2024 21:39
@elasticsearchmachine (Collaborator)

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 114457

davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 13, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
maxhniebergall added a commit to maxhniebergall/elasticsearch that referenced this pull request Oct 14, 2024
@maxhniebergall (Contributor Author)

💚 All backports created successfully

Branch: 8.x

Questions? Please refer to the Backport tool documentation.

elasticsearchmachine pushed a commit that referenced this pull request Oct 14, 2024
…existing inference endpoints (#114457) (#114734)

* [Inference API] Introduce Update API to change some aspects of existing inference endpoints (#114457)

(cherry picked from commit 6b714e2)

* Fix syntax error caused by old JDK?
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024

Labels

auto-backport (Automatically create backport pull requests when merged), backport pending, >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.16.0, v9.0.0
