@Jan-Kazlouski-elastic Jan-Kazlouski-elastic commented Sep 3, 2025

Updates the existing Google Vertex AI inference provider integration to support completion (both streaming and non-streaming) and chat_completion (streaming only) for Anthropic models hosted in Google Model Garden.

The changes were tested locally against the following Anthropic models:

  • claude-3-5-haiku

Create Completion Endpoint

Success:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}
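
The two URLs in the request above differ only in their `:rawPredict` / `:streamRawPredict` suffix. The following is a minimal sketch of how a test script might derive both URLs and the request body from a project, location and model name. The helper names `model_garden_urls` and `completion_endpoint_body` are hypothetical, the URL pattern is inferred from the URLs used in this description, and the service account value is a placeholder.

```python
import json
from typing import Optional


def model_garden_urls(project_id: str, location: str, model: str) -> dict:
    """Build both Anthropic Model Garden URLs (pattern inferred from the URLs in this PR)."""
    base = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/anthropic/models/{model}"
    )
    return {"url": f"{base}:rawPredict", "streaming_url": f"{base}:streamRawPredict"}


def completion_endpoint_body(service_account_json, urls: dict,
                             max_tokens: Optional[int] = None) -> dict:
    """Assemble the PUT body; task_settings is only added when max_tokens is given."""
    body = {
        "service": "googlevertexai",
        "service_settings": {
            "provider": "anthropic",
            # Placeholder: stands in for the {{service_account_config}} variable.
            "service_account_json": service_account_json,
            **urls,
        },
    }
    if max_tokens is not None:
        body["task_settings"] = {"max_tokens": max_tokens}
    return body


urls = model_garden_urls("elastic-cloud-dev", "us-east5", "claude-3-5-haiku")
body = completion_endpoint_body("<service account JSON>", urls)
print(json.dumps(body, indent=4))
```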

With max_tokens in task settings:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Unknown Provider:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "unknown",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: [service_settings] Invalid value [unknown] received. [provider] must be one of [anthropic];"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: [service_settings] Invalid value [unknown] received. [provider] must be one of [anthropic];"
    },
    "status": 400
}

No Provider + No Google Vertex AI parameters:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with uri and/or streaming_uri. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=null, url=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_url=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict.;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with url and/or streaming_url. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=null, url=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_url=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict.;"
    },
    "status": 400
}

No URL + No Streaming URL + No Google Vertex AI parameters:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}}
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with uri and/or streaming_uri. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=anthropic, url=null, streaming_url=null.;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with uri and/or streaming_uri. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=anthropic, url=null, streaming_url=null.;"
    },
    "status": 400
}
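
The three validation failures above follow the rule stated in the error message: Model Garden needs a provider together with at least one of url and streaming_url, while Google Vertex AI models need location, project_id and model. The sketch below restates that rule in Python for illustration only. It is not the Java implementation from this PR, the function name `classify_service_settings` is hypothetical, and the exact-match provider comparison is an assumption (case handling is not exercised by the tests above).

```python
SUPPORTED_PROVIDERS = {"anthropic"}


def classify_service_settings(settings: dict) -> str:
    """Return 'model_garden' or 'vertex_ai', or raise ValueError for invalid settings."""
    provider = settings.get("provider")
    # Assumption: exact match; the tests above only show lowercase input.
    if provider is not None and provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"Invalid value [{provider}] received. [provider] must be one of [anthropic]"
        )

    has_any_url = bool(settings.get("url") or settings.get("streaming_url"))
    if provider is not None and has_any_url:
        return "model_garden"

    # Without a usable Model Garden configuration, all three Vertex AI fields are needed.
    if all(settings.get(key) for key in ("location", "project_id", "model")):
        return "vertex_ai"

    raise ValueError(
        "Provide either provider with url and/or streaming_url, "
        "or location, project_id, and model."
    )


print(classify_service_settings({"provider": "anthropic", "url": "https://example.invalid"}))
```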

URL + No Streaming URL (URL is default for both streaming/non-streaming):

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict"
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

No URL + Streaming URL (Streaming URL is default for both streaming/non-streaming):

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}
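
The two cases above describe a fallback: when only one of url and streaming_url is configured, it is used for both streaming and non-streaming requests. A minimal sketch of that selection, assuming a hypothetical helper `resolve_request_url` (the actual implementation in this PR is in Java and may be structured differently):

```python
from typing import Optional


def resolve_request_url(url: Optional[str], streaming_url: Optional[str], stream: bool) -> str:
    """Pick the URL for a request, falling back to the other one when the preferred is unset."""
    preferred, fallback = (streaming_url, url) if stream else (url, streaming_url)
    chosen = preferred or fallback
    if chosen is None:
        # Mirrors the validation failure shown earlier: at least one URL is required.
        raise ValueError("either url or streaming_url must be set")
    return chosen


# Only url configured: it serves streaming requests too.
print(resolve_request_url("https://host/model:rawPredict", None, stream=True))
```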

Not Found:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict2",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-completion] status [404]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-completion] status [404]"
        }
    },
    "status": 400
}

Perform Completion

Success Non Streaming:

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's the opening line from William Gibson's groun"
        }
    ]
}

Success Streaming:

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion/_stream
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"This"},{"delta":" is"},{"delta":" the"},{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line of William"},{"delta":" Gibson's sem"}]}

event: message
data: [DONE]
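
For anyone reproducing this test, a client could stitch the streamed deltas back into a single string roughly as follows. This is a sketch that assumes the event shape shown in the capture above (`data:` lines holding a `completion` array of `delta` items, terminated by `[DONE]`); the function name `collect_completion_deltas` is hypothetical.

```python
import json


def collect_completion_deltas(sse_text: str) -> str:
    """Concatenate all delta strings from a completion stream, stopping at [DONE]."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        for item in json.loads(payload)["completion"]:
            parts.append(item.get("delta", ""))
    return "".join(parts)


sample = """event: message
data: {"completion":[{"delta":"This"},{"delta":" is"},{"delta":" the"},{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line of William"},{"delta":" Gibson's sem"}]}

event: message
data: [DONE]
"""

print(collect_completion_deltas(sample))  # This is the opening line of William Gibson's sem
```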

Success Non Streaming with task_settings max_tokens:

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
{
    "completion": [
        {
            "result": "That's the opening line from William Gibson's groun"
        }
    ]
}

Success Streaming with task_settings max_tokens:

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion/_stream
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
event: message
data: {"completion":[{"delta":"This"},{"delta":" is"},{"delta":" the"},{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line of William"},{"delta":" Gibson's sem"}]}

event: message
data: [DONE]

Create Chat Completion Endpoint

Success:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS:
{
    "inference_id": "google-model-garden-anthropic-chat-completion",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Success with task_settings max_tokens:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS:
{
    "inference_id": "google-model-garden-anthropic-chat-completion",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Unknown Provider:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "unknown",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: [service_settings] Invalid value [unknown] received. [provider] must be one of [anthropic];"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: [service_settings] Invalid value [unknown] received. [provider] must be one of [anthropic];"
    },
    "status": 400
}

No url/streaming_url:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}}
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with uri and/or streaming_uri. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=anthropic, url=null, streaming_url=null.;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Model Garden, you must provide either provider with uri and/or streaming_uri. For Google Vertex AI models, you must provide location, project_id, and model. Provided values: location=null, project_id=null, model=null, provider=anthropic, url=null, streaming_url=null.;"
    },
    "status": 400
}

Not found:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict2",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2"
    }
}
RS:
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-chat-completion] status [404]. Error message: [<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 404 (Not Found)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>404.</b> <ins>That’s an error.</ins>\n  <p>The requested URL <code>/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2</code> was not found on this server.  <ins>That’s all we know.</ins>\n]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-chat-completion] status [404]. Error message: [<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 404 (Not Found)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>404.</b> <ins>That’s an error.</ins>\n  <p>The requested URL <code>/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2</code> was not found on this server.  <ins>That’s all we know.</ins>\n]"
        }
    },
    "status": 400
}

No streaming_url (url is default for both streaming/non-streaming):

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict"
    }
}
RS:
{
    "inference_id": "google-model-garden-anthropic-chat-completion",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

No url (streaming_url is default for both streaming/non-streaming):

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ:
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS:
{
    "inference_id": "google-model-garden-anthropic-chat-completion",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Perform Chat Completion

Basic:

POST {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion/_stream
RQ
{
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 100
}
RS
event: message
data: {"id":"msg_vrtx_01X7rphiRiVbTsKBiEJ2ejjR","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning an"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"d artificial intelligence that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial neural networks with"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" multiple layers ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence \""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"deep\") to progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" higher-level features from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" raw input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Here"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" are key characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Core Components"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Neural networks"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" with multiple hidden"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Ability"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" to learn complex patterns from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" large datasets"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Inspired by the"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" structure of"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" human brain"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neurons\n\n2. Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Techniques:\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Convolutional Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Networks (CNNs"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":")\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]
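
The chat completion stream above can be folded into the full message text and the finish reason with a few lines of client code. The sketch below assumes the chunk shape shown in the capture (`choices[].delta.content` and `choices[].finish_reason`) and uses a shortened excerpt of it as input; the function name `summarize_chat_stream` is hypothetical.

```python
import json


def summarize_chat_stream(sse_text: str) -> dict:
    """Join content deltas and record the last finish_reason seen before [DONE]."""
    content = []
    finish_reason = None
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                content.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {"content": "".join(content), "finish_reason": finish_reason}


# Shortened excerpt of the capture above (raw string so the JSON stays untouched).
excerpt = r"""event: message
data: {"id":"msg_vrtx_01X7rphiRiVbTsKBiEJ2ejjR","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning an"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]
"""

print(summarize_chat_stream(excerpt))
```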


Complex:

POST {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion/_stream
RQ
{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is the price of scarf?"
                }
            ]
        }
    ],
    "max_completion_tokens": 100,
    "temperature": 0.2,
    "top_p": 0.2,
    "tools": [
        {
            "type": "auto",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "auto",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS
event: message
data: {"id":"msg_vrtx_01C6WmQ5QCsxv9WZvSwniHiw","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":2,"prompt_tokens":330,"total_tokens":332}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"I'll"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" help"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" you find the current price"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of a scarf."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" I"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"'ll"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" use the get_current"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"_price function"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" to"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" retrieve"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" this"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" information."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"tool_calls":[{"index":0,"id":"toolu_vrtx_01Dug9z7HJSRPfAExfssoQ1z","function":{"arguments":"{}","name":"get_current_price"},"type":null}]},"index":1}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":""},"type":null}]},"index":1}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""},"type":null}]},"index":1}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"item\": \""},"type":null}]},"index":1}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"scarf\"}"},"type":null}]},"index":1}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"tool_use","index":0}],"model":null,"object":null,"usage":{"completion_tokens":85,"prompt_tokens":0,"total_tokens":85}}

event: message
data: [DONE]

@elasticsearchmachine elasticsearchmachine added v9.2.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 3, 2025
…ntegration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiActionCreator.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequest.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/action/GoogleVertexAiUnifiedChatCompletionActionTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionModelTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/googlevertexai/request/completion/GoogleVertexAiUnifiedChatCompletionRequestTests.java
…ntegration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
…ntegration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
… to support new content block types and improve parsing logic
… parser and add unit tests for response validation
…ity to validate serialization of user fields
@Jan-Kazlouski-elastic Jan-Kazlouski-elastic marked this pull request as ready for review September 11, 2025 14:00
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 11, 2025
@Jan-Kazlouski-elastic
Contributor Author

Hello @jonathan-buttner @dan-rubinstein
Could you please take a look at this PR? It is no longer in draft state.

public static final TransportVersion ESQL_DOCUMENTS_FOUND_AND_VALUES_LOADED_8_19 = def(8_841_0_61);
public static final TransportVersion ESQL_PROFILE_INCLUDE_PLAN_8_19 = def(8_841_0_62);
public static final TransportVersion INITIAL_ELASTICSEARCH_8_19_4 = def(8_841_0_68);
public static final TransportVersion ML_INFERENCE_GOOGLE_MODEL_GARDEN_ADDED_8_19 = def(8_841_0_69);
Contributor Author

@Jan-Kazlouski-elastic Jan-Kazlouski-elastic Sep 11, 2025

Let me know if this needs to be removed. I haven't seen backports in a while, but Google Vertex AI has been there for quite some time, so we'd probably require one.

Contributor

Let's remove this, we won't be backporting the changes.

Contributor Author

Removed.

…ntegration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
…ntegration

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/googlevertexai/completion/GoogleVertexAiChatCompletionServiceSettings.java
@Jan-Kazlouski-elastic
Contributor Author

@jonathan-buttner your comments are addressed. Could you please take a look at the PR once more?

Contributor

@jonathan-buttner jonathan-buttner left a comment

Looking good, a couple more changes.

}

public GoogleVertexAiChatCompletionTaskSettings(StreamInput in) throws IOException {
thinkingConfig = new ThinkingConfig(in);
TransportVersion version = in.getTransportVersion();
if (GoogleVertexAiUtils.supportsModelGarden(version)) {
maxTokens = Objects.requireNonNullElse(in.readOptionalInt(), DEFAULT_MAX_TOKENS);
maxTokens = in.readOptionalInt();
Contributor

Can we use readOptionalVInt?

Contributor Author

Good thinking. Done.

@@ -124,7 +124,9 @@ public TransportVersion getMinimalSupportedVersion() {
@Override
public void writeTo(StreamOutput out) throws IOException {
thinkingConfig.writeTo(out);
out.writeOptionalInt(maxTokens);
if (GoogleVertexAiUtils.supportsModelGarden(out.getTransportVersion())) {
out.writeOptionalInt(maxTokens);
Contributor

Let's use writeOptionalVInt

Contributor Author

Done.
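
For context on the two VInt suggestions above: a variable-length encoding typically spends fewer bytes on small non-negative values such as `max_tokens` than a fixed four-byte int does. The following is a standalone sketch of that idea, assuming a presence byte followed by 7-bit groups; the class `OptionalVIntSketch` and its methods are hypothetical and are not the Elasticsearch `StreamInput`/`StreamOutput` implementation.

```java
import java.io.ByteArrayOutputStream;

/** Standalone sketch of an "optional variable-length int" format (hypothetical, for illustration only). */
public final class OptionalVIntSketch {

    /** Encodes a nullable non-negative int as a presence byte followed by 7-bit groups, low bits first. */
    public static byte[] encode(Integer value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            out.write(0); // value absent: a single byte on the wire
            return out.toByteArray();
        }
        out.write(1); // value present
        int v = value;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // more groups follow, so set the continuation bit
            v >>>= 7;
        }
        out.write(v); // last group, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes what encode() produced; returns null when the presence byte is 0. */
    public static Integer decode(byte[] bytes) {
        if (bytes[0] == 0) {
            return null;
        }
        int result = 0;
        int shift = 0;
        for (int i = 1; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: this was the last group
            }
            shift += 7;
        }
        return result;
    }
}
```

In this sketch a `max_tokens` of 10 takes two bytes (presence plus one group), whereas a presence byte plus a fixed-width int would take five.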

Comment on lines 189 to 204
delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(
null,
null,
null,
List.of(
new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(
0,
id,
new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
input != null ? input.toString() : null,
name
),
null
)
)
);
Contributor

For readability, this might be better as:

var function = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall.Function(
    input != null ? input.toString() : null,
    name
);
var toolCall = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta.ToolCall(0, id, function, null);
delta = new StreamingUnifiedChatCompletionResults.ChatCompletionChunk.Choice.Delta(null, null, null, List.of(toolCall));

Similar changes can be made in the parseContentBlockDelta() method.

Contributor Author

Thanks. Done!

…ntegration

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.2.csv
…ntegration

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.2.csv
@Jan-Kazlouski-elastic
Contributor Author

@jonathan-buttner @DonalEvans

Your comments are addressed. Could you please review the fixes?

Contributor

@jonathan-buttner jonathan-buttner left a comment

Thanks for the changes!

@Jan-Kazlouski-elastic
Contributor Author

Create Completion Endpoint

No Provider No URLs:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
    },
    "status": 400
}

Google Provider With URLs:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "google",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
    },
    "status": 400
}

Google Provider No URLs:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "google",
        "service_account_json": {{service_account_config}}
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=null, project_id=null, model_id=null;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=null, project_id=null, model_id=null;"
    },
    "status": 400
}

No URLs:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}}
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: Google Model Garden provider=anthropic selected. Either 'uri' or 'streaming_uri' must be provided;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: Google Model Garden provider=anthropic selected. Either 'uri' or 'streaming_uri' must be provided;"
    },
    "status": 400
}

Both URLs:


PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-1
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion-1",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 10
    }
}

Only Non-Streaming URL:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-2
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion-2",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 10
    }
}

Only Streaming URL:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-3
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion-3",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 10
    }
}

No Task Parameters:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-4
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-completion-4",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Not Found:

PUT {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-5
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict2",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2"
    },
    "task_settings": {
        "max_tokens": 10
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-completion-5] status [404]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-completion-5] status [404]"
        }
    },
    "status": 400
}
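
The creation responses above suggest a small decision table for `provider`, `url`, and `streaming_url`. The sketch below is inferred only from those responses and is not the PR's actual validation code; the class name, method signature, and error wording are hypothetical. It assumes, as the single-error responses suggest, that the URL check comes before the `location`/`project_id`/`model_id` check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the provider/URL rules implied by the responses above. */
public final class ProviderUrlValidationSketch {

    public enum Provider { GOOGLE, ANTHROPIC }

    /** Returns a list of validation errors; an empty list means the settings are accepted. */
    public static List<String> validate(
        Provider provider,
        String url,
        String streamingUrl,
        String location,
        String projectId,
        String modelId
    ) {
        List<String> errors = new ArrayList<>();
        boolean anyUrl = url != null || streamingUrl != null;

        if (provider == null || provider == Provider.GOOGLE) {
            // A missing provider is treated like GOOGLE: URLs are rejected.
            if (anyUrl) {
                errors.add("'url' and 'streaming_url' must not be provided for Google Vertex AI models");
            } else if (location == null || projectId == null || modelId == null) {
                errors.add("'location', 'project_id', and 'model_id' must be provided for Google Vertex AI models");
            }
        } else if (anyUrl == false) {
            // Model Garden providers need at least one of the two URLs.
            String name = provider.name().toLowerCase(Locale.ROOT);
            errors.add("provider=" + name + " requires 'url' or 'streaming_url'");
        }
        return errors;
    }
}
```

Under these assumptions, `validate(Provider.ANTHROPIC, null, "https://example.invalid/stream", null, null, null)` returns an empty list, matching the "Only Streaming URL" case.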

Perform Non-Streaming Completion

Non-Streaming Both URLs


POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-1
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
{
    "completion": [
        {
            "result": "This is the famous opening line from William Gibson's groundbreaking cyberpunk novel \"Neu"
        }
    ]
}

Non-Streaming Only Non-Streaming URL


POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-2
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
{
    "completion": [
        {
            "result": "This is the famous opening line from William Gibson's seminal cyberpunk novel \"Neurom"
        }
    ]
}

Non-Streaming Only Streaming URL


POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-3
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
{
    "completion": [
        {
            "result": "This is the famous opening line from William Gibson's groundbreaking cyberpunk novel \"Neu"
        }
    ]
}

Non-Streaming Without Task Settings

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-4
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "This is the famous opening line from William Gibson's groundbreaking cyberpunk novel \"Neuromancer,\" published in 1984. The line is notable for its poetic description of the sky using a technological metaphor, which was characteristic of Gibson's innovative writing style.\n\nIn the era when the book was written, a dead television channel would typically display static - a gray, fuzzy, slightly shifting monochromatic screen. So the line suggests a sky that is gray, bleak, and somewhat undefined, creating an immediate atmosphere of technological melancholy.\n\nThis sentence has become one of the most famous opening lines in science fiction literature, often cited as an example of the cyberpunk genre's aesthetic: a gritty, technological world where technology and human experience are deeply intertwined.\n\nWould you like me to discuss more about the novel, its context, or cyberpunk as a literary genre?"
        }
    ]
}
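
The non-streaming results above are consistent with the request-level `max_tokens` (20) taking precedence over the endpoint-level value (10), and with a default applying when neither is set. A minimal sketch of that precedence follows, assuming this reading of the outputs is correct; `MaxTokensResolutionSketch` is hypothetical, and the default is passed in as a parameter because its real value is not shown here.

```java
/** Hypothetical sketch: request task settings override endpoint task settings, which override a default. */
public final class MaxTokensResolutionSketch {

    public static int resolve(Integer requestMaxTokens, Integer endpointMaxTokens, int defaultMaxTokens) {
        if (requestMaxTokens != null) {
            return requestMaxTokens; // per-request override wins
        }
        if (endpointMaxTokens != null) {
            return endpointMaxTokens; // value stored with the inference endpoint
        }
        return defaultMaxTokens; // fallback when nothing was configured
    }
}
```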

Perform Streaming Completion

Streaming Both URLs


POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-1/_stream
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
event: message
data: {"completion":[{"delta":"This"}]}

event: message
data: {"completion":[{"delta":" is the"}]}

event: message
data: {"completion":[{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line of William"},{"delta":" Gibson"}]}

event: message
data: {"completion":[{"delta":"'s groun"}]}

event: message
data: {"completion":[{"delta":"dbreaking cyberpunk"}]}

event: message
data: {"completion":[{"delta":" novel \""}]}

event: message
data: {"completion":[{"delta":"Neurom"}]}

event: message
data: [DONE]


Streaming Only Non-Streaming URL

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-2/_stream
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
event: message
data: {"completion":[{"delta":"This is the"}]}

event: message
data: {"completion":[{"delta":" famous"},{"delta":" opening"},{"delta":" line from William"},{"delta":" Gibson"},{"delta":"'s sem"},{"delta":"inal cyb"},{"delta":"erpunk novel \""},{"delta":"Neurom"}]}

event: message
data: [DONE]


Streaming Only Streaming URL

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-3/_stream

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
        "max_tokens": 20
    }
}
RS
event: message
data: {"completion":[{"delta":"This is the"}]}

event: message
data: {"completion":[{"delta":" famous"},{"delta":" opening"},{"delta":" line from William"},{"delta":" Gibson"},{"delta":"'s sem"},{"delta":"inal cyb"},{"delta":"erpunk novel \""},{"delta":"Neurom"}]}

event: message
data: [DONE]


Streaming Without Task Settings

POST {{base-url}}/_inference/completion/google-model-garden-anthropic-completion-4/_stream

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"This"},{"delta":" is"},{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" opening"}]}

event: message
data: {"completion":[{"delta":" line of William"}]}

event: message
data: {"completion":[{"delta":" Gibson"},{"delta":"'s sem"}]}

event: message
data: {"completion":[{"delta":"inal cyb"},{"delta":"erpunk novel \""}]}

event: message
data: {"completion":[{"delta":"Neuromancer,\""}]}

event: message
data: {"completion":[{"delta":" published in 1984"},{"delta":". It's"}]}

event: message
data: {"completion":[{"delta":" a"},{"delta":" famous"}]}

event: message
data: {"completion":[{"delta":" an"}]}

event: message
data: {"completion":[{"delta":"d much"},{"delta":"-discusse"}]}

event: message
data: {"completion":[{"delta":"d first"}]}

event: message
data: {"completion":[{"delta":" sentence that creates"}]}

event: message
data: {"completion":[{"delta":" an"}]}

event: message
data: {"completion":[{"delta":" ev"},{"delta":"ocative, mo"}]}

event: message
data: {"completion":[{"delta":"ody image"},{"delta":"."}]}

event: message
data: {"completion":[{"delta":" \n\nIn"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" early"}]}

event: message
data: {"completion":[{"delta":" 1980s,"}]}

event: message
data: {"completion":[{"delta":" when"},{"delta":" the book was written,"},{"delta":" a"},{"delta":" television"},{"delta":" tu"}]}

event: message
data: {"completion":[{"delta":"ned to a dead channel"}]}

event: message
data: {"completion":[{"delta":" woul"}]}

event: message
data: {"completion":[{"delta":"d display"}]}

event: message
data: {"completion":[{"delta":" static"}]}

event: message
data: {"completion":[{"delta":" -"}]}

event: message
data: {"completion":[{"delta":" a gray"}]}

event: message
data: {"completion":[{"delta":", fuz"}]}

event: message
data: {"completion":[{"delta":"zy,"}]}

event: message
data: {"completion":[{"delta":" slightly"}]}

event: message
data: {"completion":[{"delta":" blu"}]}

event: message
data: {"completion":[{"delta":"ish-"},{"delta":"white noise"}]}

event: message
data: {"completion":[{"delta":". So"},{"delta":" the line"}]}

event: message
data: {"completion":[{"delta":" suggests"}]}

event: message
data: {"completion":[{"delta":" a"},{"delta":" sky"}]}

event: message
data: {"completion":[{"delta":" that"}]}

event: message
data: {"completion":[{"delta":" is"}]}

event: message
data: {"completion":[{"delta":" blank"}]}

event: message
data: {"completion":[{"delta":", neutral"}]}

event: message
data: {"completion":[{"delta":", somewhat"}]}

event: message
data: {"completion":[{"delta":" bl"},{"delta":"eak and technological"}]}

event: message
data: {"completion":[{"delta":" -"}]}

event: message
data: {"completion":[{"delta":" which"}]}

event: message
data: {"completion":[{"delta":" perfectly"},{"delta":" sets"}]}

event: message
data: {"completion":[{"delta":" the tone"}]}

event: message
data: {"completion":[{"delta":" for the cyb"},{"delta":"erpunk genre"}]}

event: message
data: {"completion":[{"delta":" Gibson"}]}

event: message
data: {"completion":[{"delta":" helpe"},{"delta":"d create"}]}

event: message
data: {"completion":[{"delta":"."}]}

event: message
data: {"completion":[{"delta":"\n\nThe"}]}

event: message
data: {"completion":[{"delta":" line"}]}

event: message
data: {"completion":[{"delta":" is"},{"delta":" considere"}]}

event: message
data: {"completion":[{"delta":"d a masterful"}]}

event: message
data: {"completion":[{"delta":" piece"},{"delta":" of descript"},{"delta":"ive writing"}]}

event: message
data: {"completion":[{"delta":","}]}

event: message
data: {"completion":[{"delta":" using"},{"delta":" a"}]}

event: message
data: {"completion":[{"delta":" then"}]}

event: message
data: {"completion":[{"delta":"-contemporary"},{"delta":" technological"}]}

event: message
data: {"completion":[{"delta":" reference"}]}

event: message
data: {"completion":[{"delta":" to create a po"}]}

event: message
data: {"completion":[{"delta":"etic an"}]}

event: message
data: {"completion":[{"delta":"d atmospheric"}]}

event: message
data: {"completion":[{"delta":" description"}]}

event: message
data: {"completion":[{"delta":" of the"}]}

event: message
data: {"completion":[{"delta":" sky"}]}

event: message
data: {"completion":[{"delta":". It immediately"}]}

event: message
data: {"completion":[{"delta":" establishes the novel"}]}

event: message
data: {"completion":[{"delta":"'s blen"},{"delta":"d of high"}]}

event: message
data: {"completion":[{"delta":"-tech imagery"}]}

event: message
data: {"completion":[{"delta":" and g"}]}

event: message
data: {"completion":[{"delta":"ritty, dystop"}]}

event: message
data: {"completion":[{"delta":"ian moo"}]}

event: message
data: {"completion":[{"delta":"d.\n\nWoul"}]}

event: message
data: {"completion":[{"delta":"d you like me to discuss"}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" novel"},{"delta":","}]}

event: message
data: {"completion":[{"delta":" the"}]}

event: message
data: {"completion":[{"delta":" line"}]}

event: message
data: {"completion":[{"delta":", or cyb"}]}

event: message
data: {"completion":[{"delta":"erpunk literature"},{"delta":" further"}]}

event: message
data: {"completion":[{"delta":"?"}]}

event: message
data: [DONE]
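
A client reading the streams above would typically collect each `data:` payload and stop at the `[DONE]` sentinel. The following is a minimal sketch of that framing step only, assuming the whole stream is already available as a string; `SseDataCollectorSketch` is a hypothetical helper, and JSON parsing of each payload is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: extract "data:" payloads from a server-sent event stream until "[DONE]". */
public final class SseDataCollectorSketch {

    public static List<String> collectData(String stream) {
        List<String> payloads = new ArrayList<>();
        for (String line : stream.split("\\R")) {
            if (line.startsWith("data:") == false) {
                continue; // skip "event:" lines and blank separators
            }
            String payload = line.substring("data:".length()).trim();
            if (payload.equals("[DONE]")) {
                break; // end-of-stream sentinel, not a JSON payload
            }
            payloads.add(payload);
        }
        return payloads;
    }
}
```

Each collected payload is one JSON object such as `{"completion":[{"delta":"This"}]}`; a single payload may carry several `delta` entries, as several of the events above do.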


@Jan-Kazlouski-elastic
Contributor Author

Jan-Kazlouski-elastic commented Sep 29, 2025

Create Chat Completion Endpoint

No Provider No URLs:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
    },
    "status": 400
}

Google Provider With URLs:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "google",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: 'provider' is either GOOGLE or null. For Google Vertex AI models 'uri' and 'streaming_uri' must not be provided. Remove 'url' and 'streaming_url' fields. Provided values: uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict, streaming_uri=https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict;"
    },
    "status": 400
}

Google Provider No URLs:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "google",
        "service_account_json": {{service_account_config}}
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=null, project_id=null, model_id=null;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=null, project_id=null, model_id=null;"
    },
    "status": 400
}

No URLs:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}}
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: Google Model Garden provider=anthropic selected. Either 'uri' or 'streaming_uri' must be provided;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: Google Model Garden provider=anthropic selected. Either 'uri' or 'streaming_uri' must be provided;"
    },
    "status": 400
}
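
The validation cases above can be summarized in a short sketch. This is an illustrative Python model of the rules implied by the error responses, not the actual implementation in this PR; the function name `validate_service_settings` and the shortened error strings are hypothetical. The Google branch reports only the URL error when URLs are present, mirroring the single-error responses shown above.

```python
from typing import List, Optional


def validate_service_settings(
    provider: Optional[str],
    url: Optional[str] = None,
    streaming_url: Optional[str] = None,
    location: Optional[str] = None,
    project_id: Optional[str] = None,
    model_id: Optional[str] = None,
) -> List[str]:
    """Return validation errors; an empty list means the settings pass.

    Hypothetical helper that models the rules implied by the logged
    responses. Error texts are shortened paraphrases.
    """
    errors: List[str] = []
    is_google = provider is None or provider.lower() == "google"

    if is_google:
        # Google Vertex AI models: URLs are rejected, and location,
        # project_id and model_id are required instead.
        if url is not None or streaming_url is not None:
            errors.append("'url' and 'streaming_url' must not be provided")
        elif location is None or project_id is None or model_id is None:
            errors.append("'location', 'project_id' and 'model_id' are required")
    else:
        # Model Garden providers such as anthropic: at least one URL is required.
        if url is None and streaming_url is None:
            errors.append("either 'url' or 'streaming_url' must be provided")

    return errors
```

Under these assumptions, `validate_service_settings("anthropic")` yields one error, while passing only `streaming_url` with the same provider yields none.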

Both URLs:


PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion-1
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-chat-completion-1",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 5
    }
}
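
The `url` and `streaming_url` values used in these requests share one base path and differ only in the `:rawPredict` / `:streamRawPredict` suffix. A minimal sketch for building both, assuming the URL layout seen in the requests above holds for other projects, locations and models; the helper name `model_garden_urls` is hypothetical and not part of this PR.

```python
from typing import Dict


def model_garden_urls(
    project: str, location: str, model: str, publisher: str = "anthropic"
) -> Dict[str, str]:
    """Build the non-streaming and streaming URLs for a Model Garden model.

    Hypothetical helper; the path layout is copied from the test requests
    in this PR and is assumed, not guaranteed, to apply elsewhere.
    """
    base = (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{location}"
        f"/publishers/{publisher}/models/{model}"
    )
    return {
        "url": f"{base}:rawPredict",
        "streaming_url": f"{base}:streamRawPredict",
    }


urls = model_garden_urls("elastic-cloud-dev", "us-east5", "claude-3-5-haiku")
```

The returned dictionary can be merged into `service_settings` alongside `provider` and `service_account_json`.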

Only Non-Streaming URL:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion-2
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-chat-completion-2",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 5
    }
}

Only Streaming URL:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion-3
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-chat-completion-3",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    },
    "task_settings": {
        "max_tokens": 5
    }
}

No Task Parameters:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion-4
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict"
    }
}
RS
{
    "inference_id": "google-model-garden-anthropic-chat-completion-4",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict",
        "provider": "ANTHROPIC",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Not Found:

PUT {{base-url}}/_inference/chat_completion/google-model-garden-anthropic-chat-completion-5
RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "provider": "anthropic",
        "service_account_json": {{service_account_config}},
        "url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:rawPredict2",
        "streaming_url": "https://us-east5-aiplatform.googleapis.com/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2"
    },
    "task_settings": {
        "max_tokens": 5
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "unified_chat_completion_exception",
                "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-chat-completion] status [404]. Error message: [<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 404 (Not Found)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>404.</b> <ins>That’s an error.</ins>\n  <p>The requested URL <code>/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2</code> was not found on this server.  <ins>That’s all we know.</ins>\n]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "unified_chat_completion_exception",
            "reason": "Received an unsuccessful status code for request from inference entity id [google-model-garden-anthropic-chat-completion] status [404]. Error message: [<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 404 (Not Found)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>404.</b> <ins>That’s an error.</ins>\n  <p>The requested URL <code>/v1/projects/elastic-cloud-dev/locations/us-east5/publishers/anthropic/models/claude-3-5-haiku:streamRawPredict2</code> was not found on this server.  <ins>That’s all we know.</ins>\n]"
        }
    },
    "status": 400
}

Streaming chat completion was also tested and confirmed to work.

@Jan-Kazlouski-elastic
Contributor Author

Perform Chat Completion

Both URLs

event: message
data: {"id":"msg_vrtx_01DtbJbxQxXDC2NM98ex3eUY","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":5,"prompt_tokens":0,"total_tokens":5}}

event: message
data: [DONE]
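
For reading streams like the one above, here is a minimal client-side sketch that joins the `delta.content` fragments and picks up the `finish_reason`. It assumes the `event:` / `data:` framing shown in these logs and a `[DONE]` sentinel at the end; the function name `collect_stream` is hypothetical and the sample lines are copied from the output above.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate streamed content and return it with the finish reason.

    Hypothetical helper; it only handles the fields visible in the logs.
    """
    parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        # Skip "event:" lines and blank separators.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample = [
    'event: message',
    'data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}',
    'event: message',
    'data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}',
    'event: message',
    'data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null}',
    'event: message',
    'data: [DONE]',
]
text, reason = collect_stream(sample)
```

With the sample lines, `text` holds the five-token completion and `reason` is `max_tokens`, matching the `max_tokens: 5` task setting used when the endpoint was created.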


Both URLs With Max Tokens in RQ

event: message
data: {"id":"msg_vrtx_01H9UMg6c8ey3rhtAsN6uomT","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks with multiple layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" \""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"deep\") to progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" higher"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"-level features"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" raw"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Here"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" are key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" points about deep learning:"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Core"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks with many"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Can"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" automatically"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" learn representations"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" from data\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Mim"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ics the way"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" human brain"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" processes information"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Capable"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of learning complex patterns an"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"d features\n\n2. Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Components"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n- Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" networks"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" with multiple"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" hidden"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]


Only Non-Streaming URL

event: message
data: {"id":"msg_vrtx_018Ci31jn9VXVh3F8SGuePA5","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":5,"prompt_tokens":0,"total_tokens":5}}

event: message
data: [DONE]


Only Non-Streaming URL With Max Tokens in RQ

event: message
data: {"id":"msg_vrtx_01KiN7pMLqYfXWGvsR7wfkzg","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial neural networks with"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" multiple layers ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" \""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"deep\") to progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract higher"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"-level features from raw"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Here"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" are key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" points about deep learning:"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Core"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks with many"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" hidden"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Can"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" automatically"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" learn features"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" from data\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Capable of handling"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" complex"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":", non-linear relationships"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Requires"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" large amounts of data"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" for"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" training\n\n2. Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Network"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Structure"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n- Input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layer"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]


Only Streaming URL

event: message
data: {"id":"msg_vrtx_01AwvZxPifsMPLhWyDCy92Zo","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":5,"prompt_tokens":0,"total_tokens":5}}

event: message
data: [DONE]


Only Streaming URL With Max Tokens in RQ

event: message
data: {"id":"msg_vrtx_01MGYZfdQf2LpreTA6xgJexk","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial neural networks with multiple"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence \"deep\") to"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" higher-level features from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" raw"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Here"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" are key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Core"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Components"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Multiple hidden"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Ability"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" to learn from large"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" amounts"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of data"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Mim"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ics human"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" brain"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"'s"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" processing"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n2. Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Techniques"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n- Con"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"volutional Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Networks (CNNs"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":")"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Rec"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]


Both URLs No task settings on creation

event: message
data: {"id":"msg_vrtx_014AvDDsgg3BhcwxoQxRwuDR","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial neural networks with"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" multiple layers ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" \""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"deep\") to progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" higher-level features from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" raw"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" include"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Network"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Architecture"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Consists"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of interconn"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ected layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neurons"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Multiple"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" hidden layers between"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input and output layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Capable"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of learning complex"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":","},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" non-linear relationships"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n2. Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Features"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Automatic"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" feature extraction"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Can"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" handle"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" un"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"structured data (images"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":", text, audio)"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Self"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"-learning and adaptive"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" capabilities"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n3. Common"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Applications\n- Image"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" recognition"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Speech"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" recognition\n- Natural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" language processing\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Autonomous"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" vehicles\n- Medical"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" diagnosis"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Predict"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ive analytics\n\n4"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Learning"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Techniques"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Supervised learning"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Unsuperv"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ised learning\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Reinforcement learning\n\n5"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Popular"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Architectures\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Convolutional Neural"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Networks (CNNs"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":")\n- Rec"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"urrent Neural Networks (R"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"NNs)"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Generative"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Adversarial Networks ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"GANs)"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Transform"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ers\n\n6"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Advantages\n- High"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" accuracy"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Can"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" handle complex, large"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"-scale datasets\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Reduces"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" nee"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"d for manual feature engineering"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n7"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Challenges\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Requires large"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" amounts"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of training"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" data\n- Comput"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ationally intensive\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Can"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" be difficult"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" to interpret"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\nDeep learning has"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" revolutionized artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" intelligence by"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" enabling"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" more"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" sophisticate"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"d an"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"d nu"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"anced machine"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" learning approaches"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"end_turn","index":0}],"model":null,"object":null,"usage":{"completion_tokens":300,"prompt_tokens":0,"total_tokens":300}}

event: message
data: [DONE]
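
For anyone reproducing these checks, here is a minimal Python sketch of how a client might reassemble the streamed chat completion above. It assumes the transcript shape shown in this comment (`event: message` lines, `data:` JSON chunks, and a final `data: [DONE]`). The helper name `read_chat_stream` and the shortened sample are hypothetical and not part of this PR.

```python
import json


def read_chat_stream(sse_text):
    """Collect delta content, finish_reason, and usage from an SSE transcript."""
    parts = []
    finish_reason = None
    usage = None
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]  # the last chunk that carries usage wins
    return "".join(parts), finish_reason, usage


# Shortened sample built from the last chunks of the transcript above.
sample = "\n".join([
    'event: message',
    'data: {"id":null,"choices":[{"delta":{"content":" learning approaches"},"index":0}],"model":null,"object":null}',
    '',
    'event: message',
    'data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}',
    '',
    'event: message',
    'data: {"id":null,"choices":[{"delta":{},"finish_reason":"end_turn","index":0}],"model":null,"object":null,"usage":{"completion_tokens":300,"prompt_tokens":0,"total_tokens":300}}',
    '',
    'event: message',
    'data: [DONE]',
])

text, reason, usage = read_chat_stream(sample)
print(text, reason, usage)
```

In this sketch the usage from the final chunk overwrites the one from the first chunk, so a caller that needs `prompt_tokens` (reported only in the first chunk in the transcript above) would have to merge the two instead.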


Both URLs, no task settings on creation, with max tokens in RQ

event: message
data: {"id":"msg_vrtx_018RE7gV36k9m8KGnKTeMnGa","choices":[{"delta":{"role":"assistant"},"index":0}],"model":"claude-3-5-haiku-20241022","object":null,"usage":{"completion_tokens":5,"prompt_tokens":12,"total_tokens":17}}

event: message
data: {"id":null,"choices":[{"delta":{"content":""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"Deep learning is a subset"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" of machine learning that uses"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks with multiple layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"hence"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" \""},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"deep\") to progress"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"ively extract"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" higher"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"-level features from"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" raw"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" input"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"."},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Here"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" are key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" characteristics"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n\n1"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Core"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Components"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n-"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Artificial"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" neural networks"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Multiple hidden layers"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Complex"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" pattern"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" recognition"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Ability"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" to learn from large"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" amounts of data\n\n2"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":". Key"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Techniques"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":":\n- Con"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"volutional Neural Networks ("},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"CNNs)"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":"\n- Recurrent"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{"content":" Neural Networks (R"},"index":0}],"model":null,"object":null}

event: message
data: {"id":null,"choices":[{"delta":{},"finish_reason":"max_tokens","index":0}],"model":null,"object":null,"usage":{"completion_tokens":100,"prompt_tokens":0,"total_tokens":100}}

event: message
data: [DONE]


@Jan-Kazlouski-elastic
Contributor Author

Regression Tests for Google Vertex AI.

Create Completion endpoint

Success


RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "model_id": "gemini-2.5-pro",
        "location": "us-central1",
        "project_id": "project_id"
    }
}
RS
{
    "inference_id": "google-vertex-ai-completion",
    "task_type": "completion",
    "service": "googlevertexai",
    "service_settings": {
        "project_id": "project_id",
        "location": "us-central1",
        "model_id": "gemini-2.5-pro",
        "provider": "GOOGLE",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

No model_id


RQ
{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "location": "us-central1",
        "project_id": "project_id"
    }
}

RS
{
    "error": {
        "root_cause": [
            {
                "type": "validation_exception",
                "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=us-central1, project_id=1014491842772, model_id=null;"
            }
        ],
        "type": "validation_exception",
        "reason": "Validation Failed: 1: For Google Vertex AI models, you must provide 'location', 'project_id', and 'model_id'. Provided values: location=us-central1, project_id=1014491842772, model_id=null;"
    },
    "status": 400
}
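
The rule behind this error can be illustrated with a small Python sketch. This is not the Elasticsearch implementation (which is Java); the function name `validate_google_settings` is hypothetical, and the sketch only mirrors the rule stated in the error message, namely that `location`, `project_id`, and `model_id` must all be provided for Google models.

```python
def validate_google_settings(service_settings):
    """Return an error message, or None when all required fields are present."""
    required = ("location", "project_id", "model_id")
    if all(service_settings.get(name) for name in required):
        return None
    # Python renders a missing value as None, where the real response prints null.
    provided = ", ".join(
        f"{name}={service_settings.get(name)}" for name in required
    )
    return (
        "For Google Vertex AI models, you must provide 'location', "
        f"'project_id', and 'model_id'. Provided values: {provided}"
    )


complete = {
    "location": "us-central1",
    "project_id": "project_id",
    "model_id": "gemini-2.5-pro",
}
missing_model = {"location": "us-central1", "project_id": "project_id"}

print(validate_google_settings(complete))
print(validate_google_settings(missing_model))
```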

Perform Non-Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That is the iconic opening line of William Gibson's 1984 debut novel, **Neuromancer**.\n\nIt is widely regarded as one of the most brilliant and effective opening sentences in modern literature, especially within science fiction. Here’s a breakdown of why it's so powerful:\n\n### 1. Instant World-Building\nIn a single sentence, Gibson establishes the entire mood and setting of his novel. This isn't a world of blue skies and natural beauty. It's a world where the man-made has superseded the natural to such an extent that the sky itself is described in terms of a technological failure. It immediately signals a gritty, polluted, and dystopian future.\n\n### 2. Subverting Poetic Language\nThe sentence structure is traditionally poetic (\"The sky above the port...\"), but the simile at the end is jarringly modern, ugly, and technical. This clash between a classic literary form and a piece of low-tech jargon perfectly encapsulates the \"high tech, low life\" ethos of the cyberpunk genre that Gibson was pioneering.\n\n### 3. Establishing Tone\nThe image is bleak and unsettling. A \"dead channel\" implies a lack of signal, an absence of information, a void. It's not a peaceful, uniform gray, but a staticky, lifeless, and oppressive color. This sets a noir-ish, melancholic, and anxious tone that persists throughout the novel.\n\n### 4. A Generational Image\nThe meaning of the line has evolved with technology, which is a fascinating aspect of its legacy:\n*   **In 1984:** A \"dead channel\" on an analog CRT television was a screen of flickering, staticky, light-gray noise. This is the image Gibson intended—a dynamic, unpleasant, and textured sky.\n*   **Today:** For younger readers, a \"dead channel\" might be a solid blue or black screen from a modern digital TV or monitor. While different from the original intent, this new interpretation—a flat, empty, digital void—still powerfully conveys a sense of technological alienation.\n\nWilliam Gibson himself has commented on this generational shift, acknowledging that he was thinking of the specific static of an old black-and-white TV, but finds the modern \"blue screen of death\" interpretation equally valid and interesting.\n\nIn short, that one sentence is a masterclass in economical writing. It establishes the setting, tone, and central themes of *Neuromancer* before the story has even begun."
        }
    ]
}

Perform Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"That's the legendary opening line of William Gibson's 1984 novel, ***Neuromancer***.\n\nIt is widely considered one of the most effective and iconic opening sentences in modern literature, especially within science fiction. It does a phenomenal amount of work in just 14 words.\n\nHere's"}]}

event: message
data: {"completion":[{"delta":" a breakdown of why it's so brilliant:\n\n### 1. It Establishes the Genre (Cyberpunk)\nIn a single stroke, the line marries the natural world with defunct technology.\n*   **The Natural:** \"The sky above the port\" is a classic, almost poetic, scenic description"}]}

event: message
data: {"completion":[{"delta":".\n*   **The Artificial & Decayed:** \"...the color of television tuned to a dead channel\" is jarring, modern, and specific. It's not the grey of a storm cloud; it's the specific, staticky, lifeless grey of technological failure.\n\nThis fusion of the natural world with technology ("}]}

event: message
data: {"completion":[{"delta":"often in a state of decay) is the absolute heart of the cyberpunk genre. Gibson didn't just describe a scene; he announced a new literary sensibility.\n\n### 2. It Sets the Mood (Tone)\nThe image is incredibly bleak. A \"dead channel\" evokes feelings of:\n*   **Em"}]}

event: message
data: {"completion":[{"delta":"ptiness:** A lack of signal, no information, a void.\n*   **Alienation:** It's an unnatural, disorienting sight.\n*   **Dystopia:** This is not a beautiful, clear blue sky. It's polluted, oppressive, and grim. The world the characters inhabit is broken"}]}

event: message
data: {"completion":[{"delta":", much like the television.\n\nThis is the \"low life\" part of cyberpunk's \"high tech, low life\" motto.\n\n### 3. It's a Masterclass in Imagery and Simile\nThe simile is potent and multi-sensory. When reading it, especially for the first time in"}]}

event: message
data: {"completion":[{"delta":" the 1980s, you don't just *see* the fuzzy, colorless static; you can almost *hear* the white-noise hiss that accompanies it. It's an image that's both visual and auditory, creating a powerful and immersive experience for the reader from the very first sentence"}]}

event: message
data: {"completion":[{"delta":".\n\n### The Evolution of its Meaning\nOne of the most fascinating aspects of this line is how its meaning has changed with technology.\n\n*   **In 1984:** An analog television tuned to a dead channel displayed a screen of flickering black-and-white static. This is the image Gibson intended"}]}

event: message
data: {"completion":[{"delta":".\n*   **Today:** A modern television or monitor \"tuned to a dead channel\" often displays a solid blue or black screen, perhaps with a \"No Signal\" message.\n\nWilliam Gibson himself has commented on this. He's joked that if he were writing it today, he might have to say the"}]}

event: message
data: {"completion":[{"delta":" sky was \"the color of a blue screen of death.\"\n\nThis technological shift doesn't diminish the line's power; it anchors it in a specific historical and technological moment. It makes the novel a kind of artifact of the very era it was critiquing, turning the sentence itself into a piece of retro"}]}

event: message
data: {"completion":[{"delta":"-tech. Even if a younger reader has to look up what \"television static\" looks like, the core idea—describing the natural world in terms of technological failure—remains as powerful as ever."}]}

event: message
data: [DONE]
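
Note that the streaming `completion` task type uses a different chunk shape from `chat_completion`: each `data:` payload carries a `completion` array whose items hold a plain `delta` string. A minimal Python sketch of reading that shape follows, assuming the transcript format shown above. The generator name `iter_completion_deltas` and the two-chunk sample are hypothetical.

```python
import json


def iter_completion_deltas(lines):
    """Yield each delta string from a streamed completion transcript."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore "event: message" and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        for item in json.loads(payload).get("completion", []):
            yield item.get("delta", "")


# Hypothetical, shortened sample in the same shape as the response above.
stream_lines = [
    "event: message",
    'data: {"completion":[{"delta":"It establishes the setting, "}]}',
    "",
    "event: message",
    'data: {"completion":[{"delta":"tone, and themes."}]}',
    "",
    "event: message",
    "data: [DONE]",
]

full_text = "".join(iter_completion_deltas(stream_lines))
print(full_text)
```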

Create Chat Completion endpoint
RQ

{
    "service": "googlevertexai",
    "service_settings": {
        "service_account_json": {{service_account_config}},
        "model_id": "gemini-2.5-pro",
        "location": "us-central1",
        "project_id": "project_id"
    }
}

RS

{
    "inference_id": "google-vertex-ai-chat-completion",
    "task_type": "chat_completion",
    "service": "googlevertexai",
    "service_settings": {
        "project_id": "project_id",
        "location": "us-central1",
        "model_id": "gemini-2.5-pro",
        "provider": "GOOGLE",
        "rate_limit": {
            "requests_per_minute": 1000
        }
    }
}

Perform Chat Completion

RQ
{
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 100
}
RS
event: message
data: {"id":"54zaaLahA-OcmecPmbjUCA","choices":[{"delta":{"content":"Of course! Here","role":"model"},"finish_reason":"MAX_TOKENS","index":0}],"model":"gemini-2.5-pro","object":"chat.completion.chunk","usage":{"completion_tokens":4,"prompt_tokens":5,"total_tokens":103}}

event: message
data: [DONE]


@Jan-Kazlouski-elastic
Contributor Author

@jonathan-buttner
Testing is finished. All good. We're ready to merge.

@jonathan-buttner jonathan-buttner enabled auto-merge (squash) September 29, 2025 14:06
@jonathan-buttner jonathan-buttner merged commit d18da3c into elastic:main Sep 29, 2025
36 checks passed

Labels

>enhancement, external-contributor (pull request authored by a developer outside the Elasticsearch team), :ml (Machine learning), Team:ML (meta label for the ML team), v9.2.0


5 participants