Implement OpenShift AI integration for chat completion, embeddings, and reranking #136624

Jan-Kazlouski-elastic · 2025-10-15T13:56:43Z

Adds a new OpenShift AI inference provider integration that allows the following tasks to be executed through the inference API with the openshift_ai provider:

- text_embedding
- completion (both streaming and non-streaming)
- chat_completion (streaming only)
- rerank

The changes were tested locally against the following models:

- gritlm-7b (text_embedding)
- llama-31-8b-instruct (completion and chat_completion)
- bge-reranker-v2-m3 (rerank); the return_documents parameter is defined in the API reference but is ignored by the model

Test results:

EMBEDDINGS

Create Embeddings Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-embeddings-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "inference_id": "openshift-ai-text-embedding",
    "task_type": "text_embedding",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "gritlm-7b",
        "url": "{{openshift-ai-embeddings-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        },
        "dimensions": 4096,
        "similarity": "dot_product",
        "dimensions_set_by_user": false
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
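
The RQ body above is shown without its request line. As a rough sketch, assuming the standard Elasticsearch inference API path PUT /_inference/{task_type}/{inference_id}, the request could be built with Java's HttpClient API as follows. The host (localhost:9200) and the class and method names are assumptions for illustration; the inference ID is taken from the RS above, and the url and api_key values are the same unresolved placeholders used in the RQ. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class CreateEndpointSketch {
    // Assumption: a local, unsecured Elasticsearch node; adjust host and auth for a real cluster.
    static final String ES_BASE = "http://localhost:9200";

    /** Builds (but does not send) the request that creates an inference endpoint. */
    static HttpRequest buildCreateRequest(String taskType, String inferenceId, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(ES_BASE + "/_inference/" + taskType + "/" + inferenceId))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        // Same body as the RQ above; the {{...}} placeholders are left unresolved.
        String body = "{"
            + "\"service\": \"openshift_ai\","
            + "\"service_settings\": {"
            + "\"url\": \"{{openshift-ai-embeddings-url}}\","
            + "\"api_key\": \"{{openshift-ai-embeddings-token}}\","
            + "\"model_id\": \"gritlm-7b\""
            + "}}";
        HttpRequest request = buildCreateRequest("text_embedding", "openshift-ai-text-embedding", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The task type in the path is what distinguishes the four endpoint kinds tested in this description; the body shape stays the same apart from the model and URL.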

Create Embeddings Endpoint (404)

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{invalid-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://gritlm-7b-elastic.apps.851f0d88-elastic.openshiftpartnerlabs.com/v1/embeddings2] for request from inference entity id [openshift-ai-text-embedding-2] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://gritlm-7b-elastic.apps.851f0d88-elastic.openshiftpartnerlabs.com/v1/embeddings2] for request from inference entity id [openshift-ai-text-embedding-2] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}

Perform Embeddings

RQ
{
    "input": [
        "The sky above the port was the color of television tuned to a dead channel.",
        "The sky above the port was the color of television tuned to a dead channel."
    ]
}
RS
{
    "text_embedding": [
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        },
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        }
    ]
}

COMPLETION

Create Completion Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-completion",
    "task_type": "completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Create Completion Endpoint (Redirection error)

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{invalid-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Unhandled redirection for request from inference entity id [openshift-ai-completion2] status [302]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Unhandled redirection for request from inference entity id [openshift-ai-completion2] status [302]"
        }
    },
    "status": 400
}

Perform Non-Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's a famous opening line from George Orwell's novel \"1984\". The full quote is:\n\n\"He gazed up at the grey sky, which was like the colour of television tuned to a dead channel.\"\n\nIn the novel, the sky is a perpetual grey, which is a metaphor for the bleak and oppressive atmosphere of the totalitarian society that Orwell describes. The comparison to a dead TV channel is also significant, as it suggests a lack of signal, a lack of information, and a lack of life.\n\nOrwell wrote \"1984\" in 1948-49, as a warning about the dangers of totalitarianism and the erosion of individual freedom. The novel has become a classic of dystopian literature and a powerful commentary on the human condition."
        }
    ]
}

Perform Streaming Completion

RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
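
The streamed RS above is a sequence of server-sent events: an "event:" line, a "data:" line, and a blank separator, ending with a [DONE] sentinel. A minimal sketch of how a client might collect the data payloads is shown below. The class and method names are hypothetical, and a real client would hand each payload to a JSON parser to read the delta values.

```java
import java.util.ArrayList;
import java.util.List;

class SseDataSketch {
    /**
     * Collects the payload of every "data:" line in a server-sent event stream,
     * stopping at the "[DONE]" sentinel that terminates the stream.
     */
    static List<String> dataPayloads(String stream) {
        List<String> payloads = new ArrayList<>();
        for (String line : stream.split("\\R")) {
            if (!line.startsWith("data:")) {
                continue; // skips "event:" lines and the blank lines between events
            }
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break; // nothing after the sentinel belongs to this response
            }
            payloads.add(payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        String stream = String.join("\n",
            "event: message",
            "data: {\"completion\":[{\"delta\":\"The\"},{\"delta\":\" quote\"}]}",
            "",
            "event: message",
            "data: [DONE]",
            "");
        dataPayloads(stream).forEach(System.out::println);
    }
}
```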

CHAT COMPLETION

Create Chat Completion Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-chat-completion",
    "task_type": "chat_completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Perform Basic Chat Completion

RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 10
}
RS
event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"**"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":40,"total_tokens":50}}

event: message
data: [DONE]

Perform Tool Call Chat Completion

RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS
event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]

RERANK

Create Rerank Endpoint

RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-rerank-url}}",
        "api_key": "{{openshift-ai-rerank-token}}",
        "model_id": "bge-reranker-v2-m3"
    }
}
RS
{
    "inference_id": "openshift-ai-rerank",
    "task_type": "rerank",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "bge-reranker-v2-m3",
        "url": "{{openshift-ai-rerank-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}

Perform Rerank

RQ
{
    "input": [
        "luke",
        "like",
        "leia",
        "chewy",
        "r2d2",
        "star",
        "wars"
    ],
    "query": "star wars main character",
    "top_n": 2
}
RS
{
    "rerank": [
        {
            "index": 0,
            "relevance_score": 0.28466797,
            "text": "luke"
        },
        {
            "index": 3,
            "relevance_score": 0.23522949,
            "text": "chewy"
        }
    ]
}



Jan-Kazlouski-elastic · 2025-10-21T08:38:25Z

Hello @jonathan-buttner @dan-rubinstein @DonalEvans
PR is out of draft state and ready to be reviewed.


Jan-Kazlouski-elastic · 2025-11-10T23:01:21Z

Hi @DonalEvans
Your comments are addressed. Could you please take another look at this PR?

DonalEvans

Just a few small changes, thanks for addressing all of the other comments!

DonalEvans · 2025-11-11T16:15:04Z

...est/java/org/elasticsearch/xpack/inference/services/openshiftai/OpenShiftAiServiceTests.java

    }

    public void testParseRequestConfig_CreatesAnEmbeddingsModelWhenChunkingSettingsProvided() throws IOException {
+        var chunkingSettingsMap = createRandomChunkingSettings();


This object is an instance of ChunkingSettings rather than a Map so the name is a little misleading.

Nice catch. Fixed.


DonalEvans · 2025-11-11T16:36:21Z

...elasticsearch/xpack/inference/services/openshiftai/action/OpenShiftAiActionCreatorTests.java

+            );
+            var action = actionCreator.create(
+                model,
+                new HashMap<>(Map.of(OpenShiftAiRerankTaskSettings.RETURN_DOCUMENTS, false, OpenShiftAiRerankTaskSettings.TOP_N, 1))


Could the values of false and 1 be extracted to local variables here so they can be reused on line 698?

Moved these values to constants with descriptive names.
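
The refactoring requested in this thread can be sketched roughly as follows. The constant names for the values are hypothetical, since the thread does not show the names that were finally chosen, and the string keys are assumed to match the return_documents and top_n fields used elsewhere in this PR.

```java
import java.util.HashMap;
import java.util.Map;

class RerankTaskSettingsSketch {
    // Assumption: key names mirror the request fields used elsewhere in this PR.
    static final String RETURN_DOCUMENTS = "return_documents";
    static final String TOP_N = "top_n";

    // Hypothetical constant names; the PR's actual names are not shown in this thread.
    static final boolean RETURN_DOCUMENTS_VALUE = false;
    static final int TOP_N_VALUE = 1;

    /** Builds the task settings map from the named constants instead of inline literals. */
    static Map<String, Object> taskSettings() {
        return new HashMap<>(Map.<String, Object>of(RETURN_DOCUMENTS, RETURN_DOCUMENTS_VALUE, TOP_N, TOP_N_VALUE));
    }

    public static void main(String[] args) {
        Map<String, Object> settings = taskSettings();
        // The same constants can be reused where the test later asserts on the request.
        System.out.println(settings.get(TOP_N).equals(TOP_N_VALUE));
        System.out.println(settings.get(RETURN_DOCUMENTS).equals(RETURN_DOCUMENTS_VALUE));
    }
}
```

Sharing the constants between setup and assertion means a changed test value only has to be updated in one place.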

DonalEvans · 2025-11-11T16:38:57Z

...elasticsearch/xpack/inference/services/openshiftai/action/OpenShiftAiActionCreatorTests.java

+
+            PlainActionFuture<InferenceServiceResults> listener = new PlainActionFuture<>();
+            action.execute(
+                new QueryAndDocsInputs(QUERY_VALUE, DOCUMENTS_VALUE, true, 2, false),


The values true and 2 can be extracted to local variables and reused on line 805.

Moved these values to constants with descriptive names.

DonalEvans · 2025-11-11T16:40:37Z

...elasticsearch/xpack/inference/services/openshiftai/action/OpenShiftAiActionCreatorTests.java

+
+            PlainActionFuture<InferenceServiceResults> listener = new PlainActionFuture<>();
+            action.execute(
+                new QueryAndDocsInputs(QUERY_VALUE, documents, false, 1, false),


false and 1 can be extracted to local variables and reused on line 849.

Moved these values to constants with descriptive names.

Also added a test case for an absent model, replaced the redundant documents local variables with constant values, and removed the forbidden String.formatted API calls.
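
The thread does not show what replaced the String.formatted calls. Assuming the restriction exists because String.formatted formats with the JVM's default locale, a locale-explicit call is one possible substitute, sketched below. The class and method names are hypothetical, and the message text is borrowed from the error responses in the PR description.

```java
import java.util.Locale;

class LocaleSafeFormatSketch {
    /** Formats with an explicit locale rather than relying on the JVM default. */
    static String describe(String inferenceId, int status) {
        return String.format(Locale.ROOT, "request from inference entity id [%s] status [%d]", inferenceId, status);
    }

    public static void main(String[] args) {
        System.out.println(describe("openshift-ai-completion2", 302));
    }
}
```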


Jan-Kazlouski-elastic · 2025-11-12T02:06:49Z

@DonalEvans your comments are addressed, and the PR is ready for another review. Could you also please add yourself as assignee and add the appropriate enhancement type label?


DonalEvans

Sorry, I was a bit premature in my approval. The InferenceGetServicesIT class needs to be updated to include OpenShift AI in the appropriate lists of expected services.

Jan-Kazlouski-elastic · 2025-11-12T18:29:40Z

Thanks @DonalEvans.
So every integration is to be added to these lists again. Noted.

Added openshift_ai to the lists and pushed my commit.
It also turns out that a change to the inner structure of EmbeddingsInput was merged recently.
I started the check task and am waiting for it to finish locally.
Let's wait for the PR checks and see whether the Serverless Checks go smoothly. They failed previously, but I don't have access to the Elasticsearch Serverless Checks failure from before the checks were retriggered. Perhaps you have access? What was the reason? They also failed for the Google Model Garden fix.

DonalEvans · 2025-11-12T18:41:29Z

What was the reason?

There's a known issue causing failures in serverless tests right now. The test in question should have been muted, but it might take a little while for the checks to start passing.


3c67123 Implement OpenShift AI integration for chat completion, embeddings, and reranking

elasticsearchmachine added the v9.3.0 and external-contributor labels Oct 15, 2025

Jan-Kazlouski-elastic and others added 22 commits October 15, 2025 17:13

fdb22ff Refactor OpenShift AI service settings to use underscores in constant names and add changelog
8ce569e Merge remote-tracking branch 'origin/main' into openshift-ai-integration
b268e08 Add constructor to OpenShiftAiChatCompletionServiceSettings for URL handling
9cae6b1 Add unit tests
f804331 [CI] Auto commit changes from spotless
af2fcd6 Add tests for UnifiedCompletionRequest model ID overrides in OpenShift AI chat completion
b98d8d6 Add unit tests for OpenShiftAiChatCompletionResponseHandler
b19342f Add unit tests for OpenShiftAiChatCompletionServiceSettings
6af168c Update request type description in OpenShiftAiCompletionResponseHandler
aadbfde Refactor OpenShiftAiEmbeddingsServiceSettings to improve validation logic and update dimensionsSetByUser handling
fc5c182 Update OpenShiftAiChatCompletionRequestEntity to use new method for max tokens and add unit tests for request creation and validation
d664644 Add unit tests for OpenShiftAiChatCompletionRequestEntity serialization
e6d4079 Add unit tests for OpenShiftAiEmbeddingsRequest and update model creation logic
e0ecbc7 Merge remote-tracking branch 'origin/main' into openshift-ai-integration (conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv)
fb31094 Add unit tests for OpenShiftAiEmbeddingsRequestEntity
8e78337 Fix Typo in OpenShiftAiRerankTaskSettings, add tests for request models
ce2cf92 Add unit tests for OpenShiftAiRerankServiceSettings and OpenShiftAiRerankTaskSettings
6c6dfe5 [CI] Auto commit changes from spotless
52d439f Add unit tests for OpenShiftAiRerankServiceSettings and OpenShiftAiRerankTaskSettings
d63f84f Refactor tests in OpenShiftAIRerankRequestEntityTests and OpenShiftAiServiceTests for improved readability and accuracy
63c2a58 Merge remote-tracking branch 'origin/main' into openshift-ai-integration
0a6da54 Enhance OpenShift AI service with detailed comments and utility class documentation

Jan-Kazlouski-elastic marked this pull request as ready for review October 21, 2025 08:37

elasticsearchmachine added the needs:triage label Oct 21, 2025

szybia added the :ml label and removed the needs:triage label Oct 21, 2025

Jan-Kazlouski-elastic added 7 commits November 10, 2025 15:38

30319bd Merge remote-tracking branch 'origin/main' into openshift-ai-integration (conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv)
1de3b06 Refactor OpenShift AI action creator tests for improved readability and consistency
9a7586f Add DIMENSIONS_SET_BY_USER constant and refactor variable names for clarity in OpenShift AI integration
bb54f4b Update OpenShift AI embeddings request tests to pass null for dimensions
5fd79a7 Refactor OpenShift AI test constants for improved clarity and consistency
6aa70a6 Refactor OpenShift AI test field names for clarity and consistency
e5c58b4 Add validation tests for invalid and empty URL in OpenShift AI settings

Jan-Kazlouski-elastic requested a review from DonalEvans November 10, 2025 23:00

4a0c7ba Merge remote-tracking branch 'origin/main' into openshift-ai-integration

DonalEvans reviewed Nov 11, 2025

0e1b14b Refactor OpenShift AI test constants for improved clarity and consistency

Jan-Kazlouski-elastic requested a review from DonalEvans November 12, 2025 02:03

f242b74 Merge remote-tracking branch 'origin/main' into openshift-ai-integration (conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv)

DonalEvans self-assigned this Nov 12, 2025

DonalEvans added the >enhancement label Nov 12, 2025

DonalEvans approved these changes Nov 12, 2025

DonalEvans requested changes Nov 12, 2025

Jan-Kazlouski-elastic added 3 commits November 12, 2025 20:12

a9974cc Merge remote-tracking branch 'origin/main' into openshift-ai-integration
70bebeb Add "openshift_ai" to various service lists in InferenceGetServicesIT
b30282d Fix embeddings input handling in OpenShiftAiActionCreator

DonalEvans approved these changes Nov 12, 2025


Jan-Kazlouski-elastic mentioned this pull request Nov 12, 2025

Add OpenShift AI integration specifications elastic/elasticsearch-specification#5662

Merged

2 tasks

6825e6c Merge remote-tracking branch 'origin/main' into openshift-ai-integration (conflicts: server/src/main/resources/transport/upper_bounds/9.3.csv)

Jan-Kazlouski-elastic mentioned this pull request Nov 13, 2025

Add OpenShift AI REST API specification #138025

Merged

6 tasks

DonalEvans merged commit f87b3c4 into elastic:main Nov 13, 2025
36 checks passed
