

@Jan-Kazlouski-elastic
Contributor

@Jan-Kazlouski-elastic commented Oct 15, 2025

Adds a new OpenShift AI inference provider integration that allows the following tasks to be executed through the Inference API with the openshift_ai service:

  • text_embedding
  • completion (both streaming and non-streaming)
  • chat_completion (streaming only)
  • rerank

The changes were tested locally against the following models:

  • gritlm-7b (text_embedding)
  • llama-31-8b-instruct (completion and chat_completion)
  • bge-reranker-v2-m3 (rerank) (the return_documents parameter is defined in the API reference but is ignored by the model)

Test results:

EMBEDDINGS
Create Embeddings Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-embeddings-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "inference_id": "openshift-ai-text-embedding",
    "task_type": "text_embedding",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "gritlm-7b",
        "url": "{{openshift-ai-embeddings-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        },
        "dimensions": 4096,
        "similarity": "dot_product",
        "dimensions_set_by_user": false
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}
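The RQ body above is sent to the Inference API's create endpoint, PUT /_inference/{task_type}/{inference_id}. A minimal sketch that only builds that request with the JDK's java.net.http types, assuming a cluster reachable at http://localhost:9200; the class and method names are hypothetical, authentication is omitted, and values are not JSON-escaped, so they must not contain quotes or backslashes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) the request creating an
// openshift_ai inference endpoint. Authentication headers are omitted.
final class OpenShiftAiCreateRequest {

    private OpenShiftAiCreateRequest() {}

    // Mirrors the RQ body above. Values are inserted verbatim (no JSON escaping),
    // so this sketch assumes they contain no quotes or backslashes.
    static String body(String url, String apiKey, String modelId) {
        return "{\"service\":\"openshift_ai\",\"service_settings\":{"
            + "\"url\":\"" + url + "\","
            + "\"api_key\":\"" + apiKey + "\","
            + "\"model_id\":\"" + modelId + "\"}}";
    }

    // Inference API create endpoint: PUT /_inference/{task_type}/{inference_id}
    static HttpRequest build(String esBaseUrl, String taskType, String inferenceId, String jsonBody) {
        URI uri = URI.create(esBaseUrl + "/_inference/" + taskType + "/" + inferenceId);
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

For example, build("http://localhost:9200", "text_embedding", "openshift-ai-text-embedding", body(...)) targets the inference ID shown in the RS above; the request could then be sent with java.net.http.HttpClient.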
Create Embeddings Endpoint (404)
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{invalid-url}}",
        "api_key": "{{openshift-ai-embeddings-token}}",
        "model_id": "gritlm-7b"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Resource not found at [https://gritlm-7b-elastic.apps.851f0d88-elastic.openshiftpartnerlabs.com/v1/embeddings2] for request from inference entity id [openshift-ai-text-embedding-2] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Resource not found at [https://gritlm-7b-elastic.apps.851f0d88-elastic.openshiftpartnerlabs.com/v1/embeddings2] for request from inference entity id [openshift-ai-text-embedding-2] status [404]. Error message: [{\"detail\":\"Not Found\"}]"
        }
    },
    "status": 400
}
Perform Embeddings
RQ
{
    "input": [
        "The sky above the port was the color of television tuned to a dead channel.",
        "The sky above the port was the color of television tuned to a dead channel."
    ]
}
RS
{
    "text_embedding": [
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        },
        {
            "embedding": [
                -0.001739502,
                -0.0077819824
            ]
        }
    ]
}
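Both inputs are the same sentence, so the two embeddings are identical; the log appears to show only the first two of the 4096 components reported in dimensions. Because the endpoint was created with similarity set to dot_product, vectors are compared by their plain dot product. A minimal Java sketch of that computation; the class name is hypothetical.

```java
// Hypothetical helper computing the dot_product similarity reported for the
// gritlm-7b endpoint. Accumulates in double to limit float rounding error.
final class DotProductSimilarity {

    private DotProductSimilarity() {}

    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }
}
```

Dot product only coincides with cosine similarity when both vectors have unit length.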
COMPLETION
Create Completion Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-completion",
    "task_type": "completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Create Completion Endpoint (Redirection error)
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{invalid-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Unhandled redirection for request from inference entity id [openshift-ai-completion2] status [302]"
            }
        ],
        "type": "status_exception",
        "reason": "Could not complete inference endpoint creation as validation call to service threw an exception.",
        "caused_by": {
            "type": "status_exception",
            "reason": "Unhandled redirection for request from inference entity id [openshift-ai-completion2] status [302]"
        }
    },
    "status": 400
}
Perform Non-Streaming Completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
{
    "completion": [
        {
            "result": "That's a famous opening line from George Orwell's novel \"1984\". The full quote is:\n\n\"He gazed up at the grey sky, which was like the colour of television tuned to a dead channel.\"\n\nIn the novel, the sky is a perpetual grey, which is a metaphor for the bleak and oppressive atmosphere of the totalitarian society that Orwell describes. The comparison to a dead TV channel is also significant, as it suggests a lack of signal, a lack of information, and a lack of life.\n\nOrwell wrote \"1984\" in 1948-49, as a warning about the dangers of totalitarianism and the erosion of individual freedom. The novel has become a classic of dystopian literature and a powerful commentary on the human condition."
        }
    ]
}
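The non-streaming request above and the streaming request that follows send the same body; what differs is the path. A minimal sketch, assuming the standard Inference API routes (POST /_inference/{task_type}/{inference_id}, with a /_stream suffix for streaming); the helper name is hypothetical, and the chat_completion check reflects the PR description's note that this task is streaming-only.

```java
// Hypothetical helper for the Inference API "perform" paths:
// POST /_inference/{task_type}/{inference_id} and, for streaming,
// POST /_inference/{task_type}/{inference_id}/_stream.
final class InferencePaths {

    private InferencePaths() {}

    static String perform(String taskType, String inferenceId, boolean stream) {
        // Per the PR description, chat_completion is only supported in streaming mode.
        if (!stream && taskType.equals("chat_completion")) {
            throw new IllegalArgumentException("chat_completion requires the _stream endpoint");
        }
        String path = "/_inference/" + taskType + "/" + inferenceId;
        return stream ? path + "/_stream" : path;
    }
}
```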
Perform Streaming Completion
RQ
{
    "input": "The sky above the port was the color of television tuned to a dead channel."
}
RS
event: message
data: {"completion":[{"delta":"The"},{"delta":" quote"}]}

event: message
data: {"completion":[{"delta":" \""},{"delta":"The"}]}

event: message
data: {"completion":[{"delta":" sky"},{"delta":" above"}]}

event: message
data: [DONE]
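Each streamed event carries a data line with a completion array of delta fragments, and the stream ends with data: [DONE]; joining the fragments in order reconstructs the generated text. A minimal client-side sketch, assuming the event format shown above; a regular expression stands in for a real JSON parser (so unicode escapes are not decoded), and the class name is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a client-side collector for the streamed completion above. It assumes
// each "data:" line holds {"completion":[{"delta":"..."}, ...]} and that the stream
// ends with "data: [DONE]". A regex stands in for a JSON parser.
final class CompletionStreamCollector {

    // Matches "delta":"<JSON string body>", allowing escaped characters such as \"
    private static final Pattern DELTA = Pattern.compile("\"delta\":\"((?:[^\"\\\\]|\\\\.)*)\"");

    private CompletionStreamCollector() {}

    static String collect(List<String> sseLines) {
        StringBuilder text = new StringBuilder();
        for (String line : sseLines) {
            if (!line.startsWith("data: ")) {
                continue; // skip "event:" lines and blank separators
            }
            String payload = line.substring("data: ".length());
            if (payload.equals("[DONE]")) {
                break; // end-of-stream marker
            }
            Matcher m = DELTA.matcher(payload);
            while (m.find()) {
                text.append(unescape(m.group(1)));
            }
        }
        return text.toString();
    }

    // Decodes n and t escapes and drops the backslash from any other escape
    // (for example an escaped quote or backslash). Unicode escapes are not handled.
    private static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                out.append(next == 'n' ? '\n' : next == 't' ? '\t' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applied to the events above, the collected text is the prefix The quote "The sky above.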
CHAT COMPLETION
Create Chat Completion Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-chat-completion-url}}",
        "api_key": "{{openshift-ai-chat-completion-token}}",
        "model_id": "llama-31-8b-instruct"
    }
}
RS
{
    "inference_id": "openshift-ai-chat-completion",
    "task_type": "chat_completion",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "llama-31-8b-instruct",
        "url": "{{openshift-ai-chat-completion-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Perform Basic Chat Completion
RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_completion_tokens": 10
}
RS
event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"**"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[{"delta":{"content":"Deep"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-9ad9a8b5952648339b54a5349e757153","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":40,"total_tokens":50}}

event: message
data: [DONE]
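The last chunk before [DONE] has an empty choices array and carries the token usage: completion_tokens is 10, matching max_completion_tokens in the request, and total_tokens is prompt plus completion tokens (40 + 10 = 50). A minimal sketch that extracts those counts, assuming the field order shown in the log; the class and record names are hypothetical, and a JSON parser would be the robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracts token usage from a chat.completion.chunk payload. The regex
// assumes the field order seen in the log above.
final class ChatUsage {

    record Usage(int completionTokens, int promptTokens, int totalTokens) {}

    private static final Pattern USAGE = Pattern.compile(
        "\"usage\":\\{\"completion_tokens\":(\\d+),\"prompt_tokens\":(\\d+),\"total_tokens\":(\\d+)\\}");

    private ChatUsage() {}

    // Returns empty for chunks that carry no usage block (the content deltas).
    static Optional<Usage> fromChunk(String dataPayload) {
        Matcher m = USAGE.matcher(dataPayload);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Usage(
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))));
    }
}
```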
Perform Tool Call Chat Completion
RQ
{
    "model": "llama-31-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the price of a scarf?"
                }
            ]
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_price",
                "description": "Get the current price of a item",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "item": {
                            "id": "123"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": {
        "type": "function",
        "function": {
            "name": "get_current_price"
        }
    }
}
RS
event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[{"delta":{"tool_calls":[{"index":0,"id":"chatcmpl-tool-e425f3a8f702434a80d3896bbe5cb36c","function":{"arguments":"{\"","name":"get_current_price"},"type":"function"}]},"index":0}],"model":"llama-31-8b-instruct","object":"chat.completion.chunk"}

event: message
data: {"id":"chatcmpl-174e269abeed4ba59208458ec8f1b22f","choices":[],"model":"llama-31-8b-instruct","object":"chat.completion.chunk","usage":{"completion_tokens":10,"prompt_tokens":172,"total_tokens":182}}

event: message
data: [DONE]
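In this OpenAI-style chunk format, a tool call's arguments arrive as string fragments that the client concatenates per tool-call index, while the id and function name typically appear only in the first fragment. The stream above shows only the first fragment ({"), so any continuation is hypothetical. A minimal accumulator sketch; all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical accumulator for streamed tool calls, keyed by the "index" field of
// each tool_calls entry. Argument fragments are appended in arrival order.
final class ToolCallAccumulator {

    static final class ToolCall {
        String id;
        String name;
        final StringBuilder arguments = new StringBuilder();
    }

    private final Map<Integer, ToolCall> calls = new TreeMap<>();

    // id and name are usually present only on the first fragment; pass null otherwise.
    void accept(int index, String id, String name, String argumentsFragment) {
        ToolCall call = calls.computeIfAbsent(index, i -> new ToolCall());
        if (id != null) {
            call.id = id;
        }
        if (name != null) {
            call.name = name;
        }
        if (argumentsFragment != null) {
            call.arguments.append(argumentsFragment);
        }
    }

    Map<Integer, ToolCall> calls() {
        return calls;
    }
}
```

Feeding the first fragment above followed by a hypothetical continuation such as item": "scarf"} would yield the complete arguments string {"item": "scarf"} for index 0.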
RERANK
Create Rerank Endpoint
RQ
{
    "service": "openshift_ai",
    "service_settings": {
        "url": "{{openshift-ai-rerank-url}}",
        "api_key": "{{openshift-ai-rerank-token}}",
        "model_id": "bge-reranker-v2-m3"
    }
}
RS
{
    "inference_id": "openshift-ai-rerank",
    "task_type": "rerank",
    "service": "openshift_ai",
    "service_settings": {
        "model_id": "bge-reranker-v2-m3",
        "url": "{{openshift-ai-rerank-url}}",
        "rate_limit": {
            "requests_per_minute": 3000
        }
    }
}
Perform Rerank
RQ
{
    "input": [
        "luke",
        "like",
        "leia",
        "chewy",
        "r2d2",
        "star",
        "wars"
    ],
    "query": "star wars main character",
    "top_n": 2
}
RS
{
    "rerank": [
        {
            "index": 0,
            "relevance_score": 0.28466797,
            "text": "luke"
        },
        {
            "index": 3,
            "relevance_score": 0.23522949,
            "text": "chewy"
        }
    ]
}
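Each rerank hit's index points back into the input array (index 0 is "luke", index 3 is "chewy"), top_n limits the number of hits to 2, and the hits come back in descending relevance_score order. A minimal sketch that maps hits back to their input texts; the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: each hit carries the position ("index") of a document in the original
// "input" array plus its relevance_score.
final class RerankHits {

    record Hit(int index, double relevanceScore) {}

    private RerankHits() {}

    // Returns the matching input texts, sorted defensively by descending score.
    static List<String> texts(List<String> input, List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.relevanceScore(), a.relevanceScore()));
        List<String> out = new ArrayList<>(sorted.size());
        for (Hit hit : sorted) {
            out.add(input.get(hit.index()));
        }
        return out;
    }
}
```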
  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

@elasticsearchmachine added the v9.3.0 and external-contributor (Pull request authored by a developer outside the Elasticsearch team) labels Oct 15, 2025
Jan-Kazlouski-elastic and others added 22 commits October 15, 2025 17:13
…ogic and update dimensionsSetByUser handling
…ax tokens and add unit tests for request creation and validation
# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.3.csv
…ServiceTests for improved readability and accuracy
@Jan-Kazlouski-elastic marked this pull request as ready for review October 21, 2025 08:37
@elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Oct 21, 2025
@Jan-Kazlouski-elastic
Contributor Author

Hello @jonathan-buttner @dan-rubinstein @DonalEvans
The PR is out of draft and ready for review.

@szybia added the :ml (Machine learning) label and removed the needs:triage (Requires assignment of a team area label) label Oct 21, 2025
@Jan-Kazlouski-elastic
Contributor Author

Hi @DonalEvans
Your comments have been addressed. Could you please take another look at this PR?

Contributor

@DonalEvans left a comment

Just a few small changes. Thanks for addressing all of the other comments!

}

public void testParseRequestConfig_CreatesAnEmbeddingsModelWhenChunkingSettingsProvided() throws IOException {
    var chunkingSettingsMap = createRandomChunkingSettings();
Contributor

This object is an instance of ChunkingSettings rather than a Map, so the name is a little misleading.

Contributor Author

Nice catch. Fixed.

);
var action = actionCreator.create(
    model,
    new HashMap<>(Map.of(OpenShiftAiRerankTaskSettings.RETURN_DOCUMENTS, false, OpenShiftAiRerankTaskSettings.TOP_N, 1))
Contributor

Could the values of false and 1 be extracted to local variables here so they can be reused on line 698?

Contributor Author

Moved them to constants with descriptive names.
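The refactor discussed in this thread replaces the literal false and 1 with named constants, so the request map and the later assertions share one source of truth. A minimal sketch; the string keys are hypothetical stand-ins for OpenShiftAiRerankTaskSettings.RETURN_DOCUMENTS and TOP_N, and the constant names are illustrative rather than the ones used in the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested refactor: literal task-setting values become named
// constants shared by the request map and the later assertions. The string keys
// are hypothetical stand-ins for OpenShiftAiRerankTaskSettings.RETURN_DOCUMENTS and TOP_N.
final class RerankTaskSettingsFixture {

    static final String RETURN_DOCUMENTS = "return_documents";
    static final String TOP_N = "top_n";

    // Descriptive names make the intent visible wherever the values are reused.
    static final boolean RETURN_DOCUMENTS_DISABLED = false;
    static final int TOP_N_SINGLE_RESULT = 1;

    private RerankTaskSettingsFixture() {}

    static Map<String, Object> taskSettings() {
        // Mutable copy, matching the HashMap wrapper in the test excerpt above.
        return new HashMap<>(Map.of(RETURN_DOCUMENTS, RETURN_DOCUMENTS_DISABLED, TOP_N, TOP_N_SINGLE_RESULT));
    }
}
```

An assertion later in the test can then compare against TOP_N_SINGLE_RESULT instead of repeating the literal 1.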


PlainActionFuture<InferenceServiceResults> listener = new PlainActionFuture<>();
action.execute(
    new QueryAndDocsInputs(QUERY_VALUE, DOCUMENTS_VALUE, true, 2, false),
Contributor

The values true and 2 can be extracted to local variables and reused on line 805.

Contributor Author

Moved them to constants with descriptive names.


PlainActionFuture<InferenceServiceResults> listener = new PlainActionFuture<>();
action.execute(
    new QueryAndDocsInputs(QUERY_VALUE, documents, false, 1, false),
Contributor

false and 1 can be extracted to local variables and reused on line 849.

Contributor Author

Moved them to constants with descriptive names.

Also added a test case for an absent model, replaced the redundant documents local variables with constant values, and removed calls to the forbidden String.formatted API.
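String.formatted(args) behaves like String.format(this, args), so it formats with the JVM's default locale and can produce different output on differently configured machines. That locale dependence is the likely reason such calls are banned by forbidden-API checks, though the exact rule is an assumption here. A minimal sketch of a locale-pinned alternative using only the JDK; the class and methods are hypothetical.

```java
import java.util.Locale;

// Sketch: pin Locale.ROOT instead of relying on the default locale that
// String.formatted(...) uses, so formatted test strings are stable everywhere.
final class LocaleSafeFormat {

    private LocaleSafeFormat() {}

    // Hypothetical message format modeled on the error reasons in the PR description.
    static String statusReason(String inferenceId, int status) {
        return String.format(Locale.ROOT, "inference entity id [%s] status [%d]", inferenceId, status);
    }

    static String twoDecimals(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }
}
```

Under a German default locale, "%.2f".formatted(0.28466797) yields 0,28, while twoDecimals(0.28466797) still yields 0.28.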

@Jan-Kazlouski-elastic
Contributor Author

@DonalEvans your comments have been addressed and the PR is ready for another review. Could you also please add yourself as assignee and add the appropriate enhancement type label?

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.3.csv
@DonalEvans self-assigned this Nov 12, 2025
Contributor

@DonalEvans left a comment

Sorry, I was a bit premature in my approval: the InferenceGetServicesIT class needs to be updated to include OpenShift AI in the appropriate lists of expected services.

@Jan-Kazlouski-elastic
Contributor Author

Jan-Kazlouski-elastic commented Nov 12, 2025

Thanks @DonalEvans.
So every new integration also has to be added to these lists. Noted.

Added openshift_ai to the lists and pushed my commit.
It also turns out that a change to the inner structure of EmbeddingsInput was merged recently.
I started the check task and am waiting for it to finish locally.
Let's wait for the PR checks and see whether the Serverless Checks pass. They failed previously, but I don't have access to the Elasticsearch Serverless Checks failure from before the checks were retriggered. Perhaps you have access? What was the reason? They also failed for the Google Model Garden fix.

@DonalEvans
Contributor

What was the reason?

There's a known issue causing failures in serverless tests right now. The test in question should have been muted, but it might take a little while for the checks to start passing.

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.3.csv
@DonalEvans merged commit f87b3c4 into elastic:main Nov 13, 2025
36 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 13, 2025
…-json

* upstream/main: (158 commits)
  Cleanup files from repo root folder (elastic#138030)
  Implement OpenShift AI integration for chat completion, embeddings, and reranking (elastic#136624)
  Optimize AsyncSearchErrorTraceIT to avoid failures (elastic#137716)
  Removes support for null TransportService in RemoteClusterService (elastic#137939)
  Mute org.elasticsearch.index.mapper.DateFieldMapperTests testSortShortcuts elastic#138018
  rest-api-spec: fix type of enums (elastic#137521)
  Update Gradle wrapper to 9.2.0 (elastic#136155)
  Add RCS Strong Verification Documentation (elastic#137822)
  Use docvalue skippers on dimension fields (elastic#137029)
  Introduce INDEX_SHARD_COUNT_FORMAT (elastic#137210)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesChatCompletion_AndThenCreatesTextEmbedding elastic#138012
  Fix ES|QL search context creation to use correct results type (elastic#137994)
  Improve Snapshot Logging (elastic#137470)
  Support extra output field in TOP function (elastic#135434)
  Remove NumericDoubleValues class (elastic#137884)
  [ML] Fix ML calendar event update scalability issues (elastic#136886)
  Task may be unregistered outside of the trace context in exceptional cases. (elastic#137865)
  Refine workaround for S3 repo analysis known issue (elastic#138000)
  Additional DEBUG logging on authc failures (elastic#137941)
  Cleanup index resolution (elastic#137867)
  ...
Labels

>enhancement, external-contributor (Pull request authored by a developer outside the Elasticsearch team), :ml (Machine learning), Team:ML (Meta label for the ML team), v9.3.0

4 participants