Add Llama support to Inference Plugin #130092
Merged: jonathan-buttner merged 51 commits into elastic:main from Jan-Kazlouski-elastic:feature/llama-embeding-completion on Jul 18, 2025.
Commits (the file view below shows changes from 46 of the 51 commits):
85478cf Refactor Hugging Face service settings and completion request methods… (Jan-Kazlouski-elastic)
727fd8e Add Llama model support for embeddings and chat completions (Jan-Kazlouski-elastic)
cc14b18 Refactor Llama request classes to improve secret settings handling (Jan-Kazlouski-elastic)
ceef95a Refactor DeltaParser in LlamaStreamingProcessor to improve argument h… (Jan-Kazlouski-elastic)
55d9014 Enhance Llama streaming processing by adding support for nullable obj… (Jan-Kazlouski-elastic)
a83b0b7 [CI] Auto commit changes from spotless
852aa19 Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
d6b53c3 Fix error messages in LlamaActionCreator (Jan-Kazlouski-elastic)
6ce4b09 [CI] Auto commit changes from spotless
8e7ca13 Add detailed Javadoc comments to Llama classes for improved documenta… (Jan-Kazlouski-elastic)
604d441 Enhance LlamaChatCompletionResponseHandler to support mid-stream erro… (Jan-Kazlouski-elastic)
74fd6e8 Add Javadoc comments to Llama classes for improved documentation and … (Jan-Kazlouski-elastic)
ac161fa Fix checkstyle (Jan-Kazlouski-elastic)
a13020c Update LlamaEmbeddingsRequest to use mediaTypeWithoutParameters for c… (Jan-Kazlouski-elastic)
4eade05 Add unit tests for LlamaActionCreator and related models (Jan-Kazlouski-elastic)
39c5787 Add unit tests for LlamaChatCompletionServiceSettings to validate con… (Jan-Kazlouski-elastic)
6a135c5 Add unit tests for LlamaEmbeddingsServiceSettings to validate configu… (Jan-Kazlouski-elastic)
c6fc56f Add unit tests for LlamaEmbeddingsServiceSettings to validate various… (Jan-Kazlouski-elastic)
e2dce7c Add unit tests for LlamaChatCompletionResponseHandler to validate err… (Jan-Kazlouski-elastic)
41591ae Refactor Llama embedding and chat completion tests for consistency an… (Jan-Kazlouski-elastic)
4d2a5dd Add unit tests for LlamaChatCompletionRequestEntity to validate messa… (Jan-Kazlouski-elastic)
1573d53 Add unit tests for LlamaEmbeddingsRequest to validate request creatio… (Jan-Kazlouski-elastic)
da55903 Add unit tests for LlamaEmbeddingsRequestEntity to validate XContent … (Jan-Kazlouski-elastic)
8cc8958 Add unit tests for LlamaErrorResponse to validate error handling from… (Jan-Kazlouski-elastic)
9573a48 Add unit tests for LlamaChatCompletionServiceSettings to validate con… (Jan-Kazlouski-elastic)
36ff4cd Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
c193ecf Add tests for LlamaService request configuration validation and error… (Jan-Kazlouski-elastic)
c3baecf Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
a7e342b Fix error message formatting in LlamaServiceTests for better localiza… (Jan-Kazlouski-elastic)
15c14d7 Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
75cbf85 Refactor Llama model classes to implement accept method for action vi… (Jan-Kazlouski-elastic)
e06653b Hide Llama service from configuration API to enhance security and red… (Jan-Kazlouski-elastic)
fe6173e Refactor Llama model classes to remove modelId and update embedding r… (Jan-Kazlouski-elastic)
ad009c6 Refactor Llama request classes to use pattern matching for secret set… (Jan-Kazlouski-elastic)
18ee182 Update embeddings handler to use HuggingFace response entity (Jan-Kazlouski-elastic)
c2621e7 Refactor Mistral model classes to remove modelId and update rate limi… (Jan-Kazlouski-elastic)
eb60dfa Refactor Mistral action classes to remove taskSettings parameter and … (Jan-Kazlouski-elastic)
76ddf99 Refactor Llama and Mistral models to remove taskSettings parameter an… (Jan-Kazlouski-elastic)
9100f69 Refactor Llama service tests to use Model instead of CustomModel and … (Jan-Kazlouski-elastic)
5fb9dad Remove unused tests and imports from LlamaServiceTests (Jan-Kazlouski-elastic)
47c9cc6 Add chunking settings support to Llama embeddings model tests (Jan-Kazlouski-elastic)
34e21de Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
1c1ba1d Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
c267269 Add changelog (Jan-Kazlouski-elastic)
098849f Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
06c7bd1 Add support for version checks in Llama settings and define new trans… (Jan-Kazlouski-elastic)
33da7a9 Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
28335ef Refactor Llama model assertions and remove unused version support met… (Jan-Kazlouski-elastic)
d43d1e9 Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
528a9d9 Refactor Llama service constructors to include ClusterService and imp… (Jan-Kazlouski-elastic)
c879b96 Merge remote-tracking branch 'origin/main' into feature/llama-embedin… (Jan-Kazlouski-elastic)
Changelog entry (new file, @@ -0,0 +1,5 @@):
pr: 130092
summary: "Added Llama provider support to the Inference Plugin"
area: Machine Learning
type: enhancement
issues: []
.../inference/src/main/java/org/elasticsearch/xpack/inference/services/llama/LlamaModel.java (new file, 89 additions, 0 deletions, @@ -0,0 +1,89 @@):
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.services.llama;

import org.elasticsearch.inference.EmptySecretSettings;
import org.elasticsearch.inference.ModelConfigurations;
import org.elasticsearch.inference.ModelSecrets;
import org.elasticsearch.inference.SecretSettings;
import org.elasticsearch.inference.ServiceSettings;
import org.elasticsearch.xpack.inference.external.action.ExecutableAction;
import org.elasticsearch.xpack.inference.services.RateLimitGroupingModel;
import org.elasticsearch.xpack.inference.services.llama.action.LlamaActionVisitor;
import org.elasticsearch.xpack.inference.services.settings.DefaultSecretSettings;
import org.elasticsearch.xpack.inference.services.settings.RateLimitSettings;

import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.Objects;

/**
 * Abstract class representing a Llama model for inference.
 * This class extends RateLimitGroupingModel and provides common functionality for Llama models.
 */
public abstract class LlamaModel extends RateLimitGroupingModel {
    protected URI uri;
    protected RateLimitSettings rateLimitSettings;

    /**
     * Constructor for creating a LlamaModel with the specified configurations and secrets.
     *
     * @param configurations the model configurations
     * @param secrets the secret settings for the model
     */
    protected LlamaModel(ModelConfigurations configurations, ModelSecrets secrets) {
        super(configurations, secrets);
    }

    /**
     * Constructor for creating a LlamaModel from an existing model and the specified service settings.
     *
     * @param model the existing model to derive from
     * @param serviceSettings the settings for the inference service
     */
    protected LlamaModel(RateLimitGroupingModel model, ServiceSettings serviceSettings) {
        super(model, serviceSettings);
    }

    public URI uri() {
        return this.uri;
    }

    @Override
    public RateLimitSettings rateLimitSettings() {
        return this.rateLimitSettings;
    }

    @Override
    public int rateLimitGroupingHash() {
        return Objects.hash(getServiceSettings().modelId(), uri, getSecretSettings());
    }

    // Needed for testing only
    public void setURI(String newUri) {
        try {
            this.uri = new URI(newUri);
        } catch (URISyntaxException e) {
            // test-only helper: ignore an invalid URI and keep the previous value
        }
    }

    /**
     * Retrieves the secret settings from the provided map of secrets.
     * If the map is empty, it returns an instance of EmptySecretSettings, because Llama models do not come with
     * security settings out of the box and can be used without authentication.
     * Otherwise, including when the map is null, parsing is delegated to DefaultSecretSettings.fromMap.
     *
     * @param secrets the map containing secret settings
     * @return an instance of SecretSettings
     */
    protected static SecretSettings retrieveSecretSettings(Map<String, Object> secrets) {
        return (secrets != null && secrets.isEmpty()) ? EmptySecretSettings.INSTANCE : DefaultSecretSettings.fromMap(secrets);
    }

    protected abstract ExecutableAction accept(LlamaActionVisitor creator);
}
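To make the secret-handling and rate-limit grouping rules in LlamaModel concrete, here is a minimal standalone sketch (Java 16+, for records). It deliberately avoids the real Elasticsearch classes: `Secrets`, `EmptySecrets`, and `ApiKeySecrets` are hypothetical stand-ins for `SecretSettings`, `EmptySecretSettings`, and `DefaultSecretSettings`, and the `api_key` field name, the null-in/null-out behavior of `fromMap`, the endpoint URL, and the model ID are all assumptions of the sketch. Only the condition in `retrieveSecretSettings` and the inputs to `Objects.hash` are copied from the file.

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;

// Standalone sketch: Secrets, EmptySecrets and ApiKeySecrets are hypothetical stand-ins
// for SecretSettings, EmptySecretSettings and DefaultSecretSettings.
public class LlamaModelSketch {

    public interface Secrets {}

    // Singleton meaning "no authentication configured".
    public enum EmptySecrets implements Secrets { INSTANCE }

    // Wraps an API key; the "api_key" field name is an assumption of this sketch.
    public record ApiKeySecrets(String apiKey) implements Secrets {
        public static ApiKeySecrets fromMap(Map<String, Object> map) {
            if (map == null) {
                return null; // assumed: no map at all means no secret settings
            }
            Object key = map.get("api_key");
            if (key == null) {
                throw new IllegalArgumentException("[api_key] is required when secrets are provided");
            }
            return new ApiKeySecrets(key.toString());
        }
    }

    // Same condition as LlamaModel.retrieveSecretSettings: only an empty, non-null map
    // yields the empty singleton; everything else is delegated to fromMap.
    public static Secrets retrieveSecretSettings(Map<String, Object> secrets) {
        return (secrets != null && secrets.isEmpty()) ? EmptySecrets.INSTANCE : ApiKeySecrets.fromMap(secrets);
    }

    // Same inputs as LlamaModel.rateLimitGroupingHash: model ID, URI and secrets.
    public static int rateLimitGroupingHash(String modelId, URI uri, Secrets secrets) {
        return Objects.hash(modelId, uri, secrets);
    }

    public static void main(String[] args) {
        // Hypothetical self-hosted endpoint and model ID.
        URI uri = URI.create("http://localhost:8080/v1/embeddings");
        String modelId = "my-llama-model";

        System.out.println(retrieveSecretSettings(Map.of()) == EmptySecrets.INSTANCE); // true
        System.out.println(retrieveSecretSettings(null)); // null

        Secrets keyA = retrieveSecretSettings(Map.of("api_key", "key-a"));
        Secrets sameKeyA = retrieveSecretSettings(Map.of("api_key", "key-a"));
        Secrets keyB = retrieveSecretSettings(Map.of("api_key", "key-b"));

        // Equal model ID, URI and key give equal hashes, so the two models share one rate-limit group.
        System.out.println(rateLimitGroupingHash(modelId, uri, keyA) == rateLimitGroupingHash(modelId, uri, sameKeyA)); // true
        // A different key normally yields a different hash (barring collisions), i.e. a separate group.
        System.out.println(rateLimitGroupingHash(modelId, uri, keyA) == rateLimitGroupingHash(modelId, uri, keyB));
    }
}
```

Because the secrets are part of the hash, two endpoints that differ only in API key presumably end up rate-limited independently, while an unauthenticated deployment (empty secrets map) still gets a stable group through the shared empty singleton.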
Sorry, I forgot to mention this in the previous review: we won't be backporting this to 8.x, so we can remove this transport version.
Removed.