@maxhniebergall
Contributor

@maxhniebergall maxhniebergall commented Oct 4, 2024

Summary

This change deprecates task_settings and aliases it to "parameters". When creating an inference endpoint, users can pass an object named "parameters" in place of "task_settings"; the object accepts all of the same fields as before. If the user passes "task_settings" to the create endpoint API, the operation still succeeds, but a warning asks them to migrate to "parameters". The response to the create endpoint API includes both a "parameters" and a "task_settings" object with identical fields, so users can update their programs to expect "parameters" instead of "task_settings" without breaking backwards compatibility. The response to GET _inference/all will likewise include both "parameters" and "task_settings" until at least 9.0.
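The aliasing described above can be sketched roughly as follows. This is a minimal Python illustration, not the code in this PR: `normalize_create_request`, `build_response_settings`, and the warning text are hypothetical names chosen for the example, and rejecting a request that carries both objects is an assumption, since the summary does not say how that case is handled.

```python
import copy

# Hypothetical wording; the actual deprecation message is not quoted in this PR.
DEPRECATION_WARNING = '"task_settings" is deprecated; use "parameters" instead'


def normalize_create_request(body):
    """Return (normalized_body, warnings) with the settings stored under "parameters"."""
    normalized = copy.deepcopy(body)  # leave the caller's dict untouched
    warnings = []
    if "task_settings" in normalized:
        if "parameters" in normalized:
            # Assumption: the summary does not cover requests that send both objects.
            raise ValueError('send either "task_settings" or "parameters", not both')
        # The deprecated name is accepted, but the caller is asked to migrate.
        normalized["parameters"] = normalized.pop("task_settings")
        warnings.append(DEPRECATION_WARNING)
    normalized.setdefault("parameters", {})
    return normalized, warnings


def build_response_settings(parameters):
    """Mirror the same fields under both names, as the responses in this PR do."""
    return {
        "task_settings": copy.deepcopy(parameters),
        "parameters": copy.deepcopy(parameters),
    }
```

A request that uses the old name is normalized and produces one warning, while a request that already uses "parameters" passes through with no warning.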

TODO

  • We will need to update the clients
  • We will need to update the docs
    • The docs should state the information above and provide examples like the ones below.
  • Filter out empty task settings/parameters
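One possible reading of the last TODO item, sketched in Python for illustration only: drop the "task_settings" and "parameters" keys from a response when they are empty objects. The helper name `drop_empty_settings` is hypothetical, and whether the filtering should happen on the response or elsewhere is an assumption.

```python
def drop_empty_settings(endpoint):
    """Return a copy of an endpoint dict without empty settings objects."""
    settings_keys = ("task_settings", "parameters")
    return {
        key: value
        for key, value in endpoint.items()
        # Keep every key except a settings key whose value is an empty object.
        if not (key in settings_keys and value == {})
    }
```

Applied to an ELSER endpoint, which has no task settings, this would leave only the inference ID, task type, service, and service settings.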

Examples of create endpoint API:

Put ELSER

Request:

{
    "service": "elasticsearch",
    "service_settings": {
        "model_id": ".elser_model_2",
        "num_allocations": 1,
        "num_threads": 1
    }
}

Response:

{
    "inference_id": "elser_endpoint2",
    "task_type": "sparse_embedding",
    "service": "elasticsearch",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
        "model_id": ".elser_model_2"
    },
    "task_settings": {},
    "parameters": {}
}

Put cohere

Request (using the deprecated "task_settings"):

{
    "service": "cohere",
    "service_settings": {
        "model_id": "embed-english-v3.0",
        "api_key": "<REDACTED>"
    },
    "task_settings": {
        "input_type": "ingest"
    }
}

Response:

{
    "inference_id": "testss",
    "task_type": "text_embedding",
    "service": "cohere",
    "service_settings": {
        "similarity": "dot_product",
        "dimensions": 1024,
        "model_id": "embed-english-v3.0",
        "rate_limit": {
            "requests_per_minute": 10000
        },
        "embedding_type": "float"
    },
    "task_settings": {
        "input_type": "ingest"
    },
    "parameters": {
        "input_type": "ingest"
    }
}

Request (using "parameters"):

{
    "service": "cohere",
    "service_settings": {
        "model_id": "embed-english-v3.0",
        "api_key": "<REDACTED>"
    },
    "parameters": {
        "input_type": "ingest"
    }
}

Response:

{
    "inference_id": "tests2",
    "task_type": "text_embedding",
    "service": "cohere",
    "service_settings": {
        "similarity": "dot_product",
        "dimensions": 1024,
        "model_id": "embed-english-v3.0",
        "rate_limit": {
            "requests_per_minute": 10000
        },
        "embedding_type": "float"
    },
    "task_settings": {
        "input_type": "ingest"
    },
    "parameters": {
        "input_type": "ingest"
    }
}
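On the client side, the migration these responses enable could look something like the sketch below: read "parameters" when it is present and fall back to "task_settings" for responses from versions that predate this change. `read_parameters` is a hypothetical helper written for this example, not part of any Elasticsearch client.

```python
def read_parameters(endpoint):
    """Return the endpoint's settings, preferring the new "parameters" key."""
    if "parameters" in endpoint:
        return endpoint["parameters"]
    # Fall back to the deprecated name for responses that predate this change.
    return endpoint.get("task_settings", {})
```

Because both objects carry the same fields in the responses above, a client written this way keeps working whichever name the server returns.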

Get _inference/all

Response:

{
    "endpoints": [
        {
            "inference_id": ".elser-2",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_threads": 1,
                "model_id": ".elser_model_2",
                "adaptive_allocations": {
                    "enabled": true,
                    "min_number_of_allocations": 1,
                    "max_number_of_allocations": 8
                }
            },
            "task_settings": {},
            "parameters": {}
        },
        {
            "inference_id": "elser_endpoint1",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 1,
                "num_threads": 1,
                "model_id": ".elser_model_2"
            },
            "task_settings": {},
            "parameters": {}
        },
        {
            "inference_id": "testss",
            "task_type": "text_embedding",
            "service": "cohere",
            "service_settings": {
                "similarity": "dot_product",
                "dimensions": 1024,
                "model_id": "embed-english-v3.0",
                "rate_limit": {
                    "requests_per_minute": 10000
                },
                "embedding_type": "float"
            },
            "task_settings": {
                "input_type": "ingest"
            },
            "parameters": {
                "input_type": "ingest"
            }
        }
    ]
}
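A quick way to check the claim that both objects always carry the same fields is to scan a GET _inference/all response for endpoints where they differ. The Python sketch below assumes the response shape shown above (an "endpoints" list of objects); `mismatched_endpoints` is a hypothetical helper name.

```python
def mismatched_endpoints(response):
    """Return the inference_ids whose "task_settings" and "parameters" differ."""
    return [
        endpoint.get("inference_id")
        for endpoint in response.get("endpoints", [])
        # A missing object is treated as empty, so {} and an absent key compare equal.
        if endpoint.get("task_settings", {}) != endpoint.get("parameters", {})
    ]
```

For a response like the one above, where every endpoint mirrors the two objects, the result would be an empty list.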

@elasticsearchmachine
Collaborator

Hi @maxhniebergall, I've created a changelog YAML for you. Note that since this PR is labelled >deprecation, you need to update the changelog YAML to fill out the extended information sections.

Max Hniebergall added 4 commits October 7, 2024 15:13
@maxhniebergall maxhniebergall marked this pull request as ready for review October 7, 2024 21:28
@elasticsearchmachine elasticsearchmachine added the Team:ML (Meta label for the ML team) label Oct 7, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@maxhniebergall maxhniebergall changed the title from "[Inference API ] Add endpoint_version to deprecate task settings" to "[Inference API ] Deprecate task_settings in favour of parameters" Oct 8, 2024

Labels

>deprecation, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.16.0, v9.0.0

5 participants