Merged

51 commits
85478cf
Refactor Hugging Face service settings and completion request methods…
Jan-Kazlouski-elastic Jun 23, 2025
727fd8e
Add Llama model support for embeddings and chat completions
Jan-Kazlouski-elastic Jun 26, 2025
cc14b18
Refactor Llama request classes to improve secret settings handling
Jan-Kazlouski-elastic Jun 26, 2025
ceef95a
Refactor DeltaParser in LlamaStreamingProcessor to improve argument h…
Jan-Kazlouski-elastic Jun 29, 2025
55d9014
Enhance Llama streaming processing by adding support for nullable obj…
Jan-Kazlouski-elastic Jul 1, 2025
a83b0b7
[CI] Auto commit changes from spotless
Jul 1, 2025
852aa19
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 1, 2025
d6b53c3
Fix error messages in LlamaActionCreator
Jan-Kazlouski-elastic Jul 1, 2025
6ce4b09
[CI] Auto commit changes from spotless
Jul 1, 2025
8e7ca13
Add detailed Javadoc comments to Llama classes for improved documenta…
Jan-Kazlouski-elastic Jul 1, 2025
604d441
Enhance LlamaChatCompletionResponseHandler to support mid-stream erro…
Jan-Kazlouski-elastic Jul 1, 2025
74fd6e8
Add Javadoc comments to Llama classes for improved documentation and …
Jan-Kazlouski-elastic Jul 1, 2025
ac161fa
Fix checkstyle
Jan-Kazlouski-elastic Jul 1, 2025
a13020c
Update LlamaEmbeddingsRequest to use mediaTypeWithoutParameters for c…
Jan-Kazlouski-elastic Jul 2, 2025
4eade05
Add unit tests for LlamaActionCreator and related models
Jan-Kazlouski-elastic Jul 2, 2025
39c5787
Add unit tests for LlamaChatCompletionServiceSettings to validate con…
Jan-Kazlouski-elastic Jul 2, 2025
6a135c5
Add unit tests for LlamaEmbeddingsServiceSettings to validate configu…
Jan-Kazlouski-elastic Jul 2, 2025
c6fc56f
Add unit tests for LlamaEmbeddingsServiceSettings to validate various…
Jan-Kazlouski-elastic Jul 2, 2025
e2dce7c
Add unit tests for LlamaChatCompletionResponseHandler to validate err…
Jan-Kazlouski-elastic Jul 3, 2025
41591ae
Refactor Llama embedding and chat completion tests for consistency an…
Jan-Kazlouski-elastic Jul 3, 2025
4d2a5dd
Add unit tests for LlamaChatCompletionRequestEntity to validate messa…
Jan-Kazlouski-elastic Jul 3, 2025
1573d53
Add unit tests for LlamaEmbeddingsRequest to validate request creatio…
Jan-Kazlouski-elastic Jul 3, 2025
da55903
Add unit tests for LlamaEmbeddingsRequestEntity to validate XContent …
Jan-Kazlouski-elastic Jul 3, 2025
8cc8958
Add unit tests for LlamaErrorResponse to validate error handling from…
Jan-Kazlouski-elastic Jul 3, 2025
9573a48
Add unit tests for LlamaChatCompletionServiceSettings to validate con…
Jan-Kazlouski-elastic Jul 4, 2025
36ff4cd
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 4, 2025
c193ecf
Add tests for LlamaService request configuration validation and error…
Jan-Kazlouski-elastic Jul 5, 2025
c3baecf
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 7, 2025
a7e342b
Fix error message formatting in LlamaServiceTests for better localiza…
Jan-Kazlouski-elastic Jul 7, 2025
15c14d7
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 10, 2025
75cbf85
Refactor Llama model classes to implement accept method for action vi…
Jan-Kazlouski-elastic Jul 10, 2025
e06653b
Hide Llama service from configuration API to enhance security and red…
Jan-Kazlouski-elastic Jul 10, 2025
fe6173e
Refactor Llama model classes to remove modelId and update embedding r…
Jan-Kazlouski-elastic Jul 10, 2025
ad009c6
Refactor Llama request classes to use pattern matching for secret set…
Jan-Kazlouski-elastic Jul 10, 2025
18ee182
Update embeddings handler to use HuggingFace response entity
Jan-Kazlouski-elastic Jul 10, 2025
c2621e7
Refactor Mistral model classes to remove modelId and update rate limi…
Jan-Kazlouski-elastic Jul 10, 2025
eb60dfa
Refactor Mistral action classes to remove taskSettings parameter and …
Jan-Kazlouski-elastic Jul 10, 2025
76ddf99
Refactor Llama and Mistral models to remove taskSettings parameter an…
Jan-Kazlouski-elastic Jul 10, 2025
9100f69
Refactor Llama service tests to use Model instead of CustomModel and …
Jan-Kazlouski-elastic Jul 11, 2025
5fb9dad
Remove unused tests and imports from LlamaServiceTests
Jan-Kazlouski-elastic Jul 11, 2025
47c9cc6
Add chunking settings support to Llama embeddings model tests
Jan-Kazlouski-elastic Jul 11, 2025
34e21de
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 11, 2025
1c1ba1d
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 14, 2025
c267269
Add changelog
Jan-Kazlouski-elastic Jul 14, 2025
098849f
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 15, 2025
06c7bd1
Add support for version checks in Llama settings and define new trans…
Jan-Kazlouski-elastic Jul 15, 2025
33da7a9
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 16, 2025
28335ef
Refactor Llama model assertions and remove unused version support met…
Jan-Kazlouski-elastic Jul 16, 2025
d43d1e9
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 18, 2025
528a9d9
Refactor Llama service constructors to include ClusterService and imp…
Jan-Kazlouski-elastic Jul 18, 2025
c879b96
Merge remote-tracking branch 'origin/main' into feature/llama-embedin…
Jan-Kazlouski-elastic Jul 18, 2025
docs/changelog/130092.yaml (5 changes: 5 additions & 0 deletions)
@@ -0,0 +1,5 @@
pr: 130092
summary: "Added Llama provider support to the Inference Plugin"
area: Machine Learning
type: enhancement
issues: []
TransportVersions.java
@@ -212,6 +212,7 @@ static TransportVersion def(int id) {
public static final TransportVersion ESQL_PROFILE_INCLUDE_PLAN_8_19 = def(8_841_0_62);
public static final TransportVersion ESQL_SPLIT_ON_BIG_VALUES_8_19 = def(8_841_0_63);
public static final TransportVersion ESQL_FIXED_INDEX_LIKE_8_19 = def(8_841_0_64);
public static final TransportVersion ML_INFERENCE_LLAMA_ADDED_8_19 = def(8_841_0_65);
Contributor:
Sorry I forgot to mention this in the previous review, we won't be backporting this to 8.x so we can remove this transport version.

Contributor Author:
Removed.
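
For context, the only consumer of this 8.19 constant in the PR is the supportsVersion override that appears later in the service settings classes, which accepts either the 9.x version or the 8.19 patch version. A minimal sketch of that gating pattern, reproduced from the diff further down (only the override is shown; nothing beyond it is implied):

// Only needed while an 8.19 backport constant exists; with no 8.x backport,
// both this override and ML_INFERENCE_LLAMA_ADDED_8_19 can go away.
@Override
public boolean supportsVersion(TransportVersion version) {
    return version.onOrAfter(TransportVersions.ML_INFERENCE_LLAMA_ADDED)
        || version.isPatchFrom(TransportVersions.ML_INFERENCE_LLAMA_ADDED_8_19);
}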

public static final TransportVersion V_9_0_0 = def(9_000_0_09);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
@@ -334,7 +335,13 @@ static TransportVersion def(int id) {
public static final TransportVersion PROJECT_STATE_REGISTRY_RECORDS_DELETIONS = def(9_113_0_00);
public static final TransportVersion ESQL_SERIALIZE_TIMESERIES_FIELD_TYPE = def(9_114_0_00);
public static final TransportVersion ML_INFERENCE_IBM_WATSONX_COMPLETION_ADDED = def(9_115_0_00);
public static final TransportVersion ML_INFERENCE_LLAMA_ADDED = def(9_116_0_00);
public static final TransportVersion ESQL_SPLIT_ON_BIG_VALUES = def(9_116_0_00);
public static final TransportVersion ESQL_LOCAL_RELATION_WITH_NEW_BLOCKS = def(9_117_0_00);
public static final TransportVersion ML_INFERENCE_CUSTOM_SERVICE_EMBEDDING_TYPE = def(9_118_0_00);
public static final TransportVersion ESQL_FIXED_INDEX_LIKE = def(9_119_0_00);
public static final TransportVersion LOOKUP_JOIN_CCS = def(9_120_0_00);
public static final TransportVersion NODE_USAGE_STATS_FOR_THREAD_POOLS_IN_CLUSTER_INFO = def(9_121_0_00);
public static final TransportVersion ML_INFERENCE_LLAMA_ADDED = def(9_122_0_00);

/*
* STOP! READ THIS FIRST! No, really,
LlamaModel.java
@@ -12,7 +12,9 @@
import org.elasticsearch.inference.ModelSecrets;
import org.elasticsearch.inference.SecretSettings;
import org.elasticsearch.inference.ServiceSettings;
import org.elasticsearch.xpack.inference.external.action.ExecutableAction;
import org.elasticsearch.xpack.inference.services.RateLimitGroupingModel;
import org.elasticsearch.xpack.inference.services.llama.action.LlamaActionVisitor;
import org.elasticsearch.xpack.inference.services.settings.DefaultSecretSettings;
import org.elasticsearch.xpack.inference.services.settings.RateLimitSettings;

@@ -26,7 +28,6 @@
* This class extends RateLimitGroupingModel and provides common functionality for Llama models.
*/
public abstract class LlamaModel extends RateLimitGroupingModel {
protected String modelId;
protected URI uri;
protected RateLimitSettings rateLimitSettings;

@@ -49,10 +50,6 @@ protected LlamaModel(RateLimitGroupingModel model, ServiceSettings serviceSettin
super(model, serviceSettings);
}

public String model() {
return this.modelId;
}

public URI uri() {
return this.uri;
}
@@ -64,7 +61,7 @@ public RateLimitSettings rateLimitSettings() {

@Override
public int rateLimitGroupingHash() {
return Objects.hash(modelId, uri, getSecretSettings());
return Objects.hash(getServiceSettings().modelId(), uri, getSecretSettings());
}

// Needed for testing only
@@ -87,4 +84,6 @@ public void setURI(String newUri) {
protected static SecretSettings retrieveSecretSettings(Map<String, Object> secrets) {
return (secrets != null && secrets.isEmpty()) ? EmptySecretSettings.INSTANCE : DefaultSecretSettings.fromMap(secrets);
}

protected abstract ExecutableAction accept(LlamaActionVisitor creator);
}
LlamaService.java
@@ -106,11 +106,8 @@ protected void doInfer(
ActionListener<InferenceServiceResults> listener
) {
var actionCreator = new LlamaActionCreator(getSender(), getServiceComponents());

if (model instanceof LlamaEmbeddingsModel llamaEmbeddingsModel) {
llamaEmbeddingsModel.accept(actionCreator).execute(inputs, timeout, listener);
} else if (model instanceof LlamaChatCompletionModel llamaChatCompletionModel) {
llamaChatCompletionModel.accept(actionCreator).execute(inputs, timeout, listener);
if (model instanceof LlamaModel llamaModel) {
llamaModel.accept(actionCreator).execute(inputs, timeout, listener);
} else {
listener.onFailure(createInvalidModelException(model));
}
@@ -127,7 +124,6 @@ protected void validateInputType(InputType inputType, Model model, ValidationExc
* @param inferenceId the unique identifier for the inference entity
* @param taskType the type of task this model is designed for
* @param serviceSettings the settings for the inference service
* @param taskSettings the settings specific to the task
* @param chunkingSettings the settings for chunking, if applicable
* @param secretSettings the secret settings for the model, such as API keys or tokens
* @param failureMessage the message to use in case of failure
@@ -138,24 +134,14 @@ protected LlamaModel createModel(
String inferenceId,
TaskType taskType,
Map<String, Object> serviceSettings,
Map<String, Object> taskSettings,
ChunkingSettings chunkingSettings,
Map<String, Object> secretSettings,
String failureMessage,
ConfigurationParseContext context
) {
switch (taskType) {
case TEXT_EMBEDDING:
return new LlamaEmbeddingsModel(
inferenceId,
taskType,
NAME,
serviceSettings,
taskSettings,
chunkingSettings,
secretSettings,
context
);
return new LlamaEmbeddingsModel(inferenceId, taskType, NAME, serviceSettings, chunkingSettings, secretSettings, context);
case CHAT_COMPLETION, COMPLETION:
return new LlamaChatCompletionModel(inferenceId, taskType, NAME, serviceSettings, secretSettings, context);
default:
@@ -168,7 +154,7 @@ public Model updateModelWithEmbeddingDetails(Model model, int embeddingSize) {
if (model instanceof LlamaEmbeddingsModel embeddingsModel) {
var serviceSettings = embeddingsModel.getServiceSettings();
var similarityFromModel = serviceSettings.similarity();
var similarityToUse = similarityFromModel == null ? SimilarityMeasure.COSINE : similarityFromModel;
var similarityToUse = similarityFromModel == null ? SimilarityMeasure.DOT_PRODUCT : similarityFromModel;

var updatedServiceSettings = new LlamaEmbeddingsServiceSettings(
serviceSettings.modelId(),
@@ -283,7 +269,6 @@ public void parseRequestConfig(
modelId,
taskType,
serviceSettingsMap,
taskSettingsMap,
chunkingSettings,
serviceSettingsMap,
TaskType.unsupportedTaskTypeErrorMsg(taskType, NAME),
@@ -308,7 +293,7 @@ public Model parsePersistedConfigWithSecrets(
Map<String, Object> secrets
) {
Map<String, Object> serviceSettingsMap = removeFromMapOrThrowIfNull(config, ModelConfigurations.SERVICE_SETTINGS);
Map<String, Object> taskSettingsMap = removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
Map<String, Object> secretSettingsMap = removeFromMapOrDefaultEmpty(secrets, ModelSecrets.SECRET_SETTINGS);

ChunkingSettings chunkingSettings = null;
@@ -320,7 +305,6 @@
modelId,
taskType,
serviceSettingsMap,
taskSettingsMap,
chunkingSettings,
secretSettingsMap,
parsePersistedConfigErrorMsg(modelId, NAME)
@@ -331,7 +315,6 @@ private LlamaModel createModelFromPersistent(
String inferenceEntityId,
TaskType taskType,
Map<String, Object> serviceSettings,
Map<String, Object> taskSettings,
ChunkingSettings chunkingSettings,
Map<String, Object> secretSettings,
String failureMessage
@@ -340,7 +323,6 @@ private LlamaModel createModelFromPersistent(
inferenceEntityId,
taskType,
serviceSettings,
taskSettings,
chunkingSettings,
secretSettings,
failureMessage,
@@ -351,7 +333,7 @@ private LlamaModel createModelFromPersistent(
@Override
public Model parsePersistedConfig(String modelId, TaskType taskType, Map<String, Object> config) {
Map<String, Object> serviceSettingsMap = removeFromMapOrThrowIfNull(config, ModelConfigurations.SERVICE_SETTINGS);
Map<String, Object> taskSettingsMap = removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);
removeFromMapOrDefaultEmpty(config, ModelConfigurations.TASK_SETTINGS);

ChunkingSettings chunkingSettings = null;
if (TaskType.TEXT_EMBEDDING.equals(taskType)) {
@@ -362,7 +344,6 @@ public Model parsePersistedConfig(String modelId, TaskType taskType, Map<String,
modelId,
taskType,
serviceSettingsMap,
taskSettingsMap,
chunkingSettings,
null,
parsePersistedConfigErrorMsg(modelId, NAME)
@@ -374,6 +355,12 @@ public TransportVersion getMinimalSupportedVersion() {
return TransportVersions.ML_INFERENCE_LLAMA_ADDED;
}

@Override
public boolean hideFromConfigurationApi() {
// The Llama service is very configurable so we're going to hide it from being exposed in the service API.
return true;
}

/**
* Configuration class for the Llama inference service.
* It provides the settings and configurations required for the service.
LlamaActionCreator.java
@@ -18,13 +18,13 @@
import org.elasticsearch.xpack.inference.external.http.sender.Sender;
import org.elasticsearch.xpack.inference.external.http.sender.UnifiedChatInput;
import org.elasticsearch.xpack.inference.services.ServiceComponents;
import org.elasticsearch.xpack.inference.services.huggingface.response.HuggingFaceEmbeddingsResponseEntity;
import org.elasticsearch.xpack.inference.services.llama.completion.LlamaChatCompletionModel;
import org.elasticsearch.xpack.inference.services.llama.completion.LlamaCompletionResponseHandler;
import org.elasticsearch.xpack.inference.services.llama.embeddings.LlamaEmbeddingsModel;
import org.elasticsearch.xpack.inference.services.llama.embeddings.LlamaEmbeddingsResponseHandler;
import org.elasticsearch.xpack.inference.services.llama.request.completion.LlamaChatCompletionRequest;
import org.elasticsearch.xpack.inference.services.llama.request.embeddings.LlamaEmbeddingsRequest;
import org.elasticsearch.xpack.inference.services.llama.response.embeddings.LlamaEmbeddingsResponseEntity;
import org.elasticsearch.xpack.inference.services.openai.response.OpenAiChatCompletionResponseEntity;

import java.util.Objects;
@@ -44,7 +44,7 @@ public class LlamaActionCreator implements LlamaActionVisitor {

private static final ResponseHandler EMBEDDINGS_HANDLER = new LlamaEmbeddingsResponseHandler(
"llama text embedding",
LlamaEmbeddingsResponseEntity::fromResponse
HuggingFaceEmbeddingsResponseEntity::fromResponse
);
private static final ResponseHandler COMPLETION_HANDLER = new LlamaCompletionResponseHandler(
"llama completion",
LlamaChatCompletionModel.java
@@ -68,7 +68,7 @@ public LlamaChatCompletionModel(
SecretSettings secrets
) {
super(
new ModelConfigurations(inferenceEntityId, taskType, service, serviceSettings, new EmptyTaskSettings()),
new ModelConfigurations(inferenceEntityId, taskType, service, serviceSettings, EmptyTaskSettings.INSTANCE),
new ModelSecrets(secrets)
);
setPropertiesFromServiceSettings(serviceSettings);
@@ -105,7 +105,6 @@ public static LlamaChatCompletionModel of(LlamaChatCompletionModel model, Unifie
}

private void setPropertiesFromServiceSettings(LlamaChatCompletionServiceSettings serviceSettings) {
this.modelId = serviceSettings.modelId();
this.uri = serviceSettings.uri();
this.rateLimitSettings = serviceSettings.rateLimitSettings();
}
@@ -126,6 +125,7 @@ public LlamaChatCompletionServiceSettings getServiceSettings() {
* @param creator the visitor that creates the executable action
* @return an ExecutableAction representing this model
*/
@Override
public ExecutableAction accept(LlamaActionVisitor creator) {
return creator.create(this);
}
LlamaChatCompletionServiceSettings.java
@@ -116,9 +116,16 @@ public String getWriteableName() {

@Override
public TransportVersion getMinimalSupportedVersion() {
assert false : "should never be called when supportsVersion is used";
Contributor:
I believe we can remove this line now because we won't need to backport to 8.x

Contributor Author:
Removed.

return TransportVersions.ML_INFERENCE_LLAMA_ADDED;
}

@Override
public boolean supportsVersion(TransportVersion version) {
Contributor:
Let's remove this override.

Contributor Author:
Removed.

return version.onOrAfter(TransportVersions.ML_INFERENCE_LLAMA_ADDED)
|| version.isPatchFrom(TransportVersions.ML_INFERENCE_LLAMA_ADDED_8_19);
}
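
With both comments addressed, a plausible post-review shape of the version handling here is just the minimal-version lookup, assuming the interface's default supportsVersion check falls back to comparing against getMinimalSupportedVersion (sketch only, not the final diff):

@Override
public TransportVersion getMinimalSupportedVersion() {
    // No assert and no supportsVersion override once the 8.19 backport constant is removed.
    return TransportVersions.ML_INFERENCE_LLAMA_ADDED;
}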

@Override
public String modelId() {
return this.modelId;
LlamaEmbeddingsModel.java
@@ -12,7 +12,6 @@
import org.elasticsearch.inference.ModelConfigurations;
import org.elasticsearch.inference.ModelSecrets;
import org.elasticsearch.inference.SecretSettings;
import org.elasticsearch.inference.TaskSettings;
import org.elasticsearch.inference.TaskType;
import org.elasticsearch.xpack.inference.external.action.ExecutableAction;
import org.elasticsearch.xpack.inference.services.ConfigurationParseContext;
Expand Down Expand Up @@ -42,7 +41,6 @@ public LlamaEmbeddingsModel(
TaskType taskType,
String service,
Map<String, Object> serviceSettings,
Map<String, Object> taskSettings,
ChunkingSettings chunkingSettings,
Map<String, Object> secrets,
ConfigurationParseContext context
@@ -52,7 +50,6 @@
taskType,
service,
LlamaEmbeddingsServiceSettings.fromMap(serviceSettings, context),
EmptyTaskSettings.INSTANCE, // no task settings for Llama embeddings
chunkingSettings,
retrieveSecretSettings(secrets)
);
@@ -75,7 +72,6 @@ public LlamaEmbeddingsModel(LlamaEmbeddingsModel model, LlamaEmbeddingsServiceSe
* @param serviceSettings the service settings to extract properties from
*/
private void setPropertiesFromServiceSettings(LlamaEmbeddingsServiceSettings serviceSettings) {
this.modelId = serviceSettings.modelId();
this.uri = serviceSettings.uri();
this.rateLimitSettings = serviceSettings.rateLimitSettings();
}
@@ -87,7 +83,6 @@ private void setPropertiesFromServiceSettings(LlamaEmbeddingsServiceSettings ser
* @param taskType the type of task this model is designed for
* @param service the name of the inference service
* @param serviceSettings the settings for the inference service, specific to embeddings
* @param taskSettings the task settings for the model
* @param chunkingSettings the chunking settings for processing input data
* @param secrets the secret settings for the model, such as API keys or tokens
*/
@@ -96,7 +91,6 @@ public LlamaEmbeddingsModel(
TaskType taskType,
String service,
LlamaEmbeddingsServiceSettings serviceSettings,
TaskSettings taskSettings,
ChunkingSettings chunkingSettings,
SecretSettings secrets
) {
@@ -118,6 +112,7 @@ public LlamaEmbeddingsServiceSettings getServiceSettings() {
* @param creator the visitor that creates the executable action
* @return an ExecutableAction representing the Llama embeddings model
*/
@Override
public ExecutableAction accept(LlamaActionVisitor creator) {
return creator.create(this);
}
LlamaEmbeddingsServiceSettings.java
@@ -154,9 +154,16 @@ public String getWriteableName() {

@Override
public TransportVersion getMinimalSupportedVersion() {
assert false : "should never be called when supportsVersion is used";
Contributor:
Let's remove this.

return TransportVersions.ML_INFERENCE_LLAMA_ADDED;
}

@Override
Contributor:
Let's remove this method override.

Contributor Author:
Removed.

public boolean supportsVersion(TransportVersion version) {
return version.onOrAfter(TransportVersions.ML_INFERENCE_LLAMA_ADDED)
|| version.isPatchFrom(TransportVersions.ML_INFERENCE_LLAMA_ADDED_8_19);
}

@Override
public String modelId() {
return this.modelId;
LlamaChatCompletionRequest.java
@@ -60,8 +60,7 @@ public HttpRequest createHttpRequest() {
httpPost.setEntity(byteEntity);

httpPost.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaTypeWithoutParameters());
if (model.getSecretSettings() instanceof DefaultSecretSettings) {
var secretSettings = (DefaultSecretSettings) model.getSecretSettings();
if (model.getSecretSettings() instanceof DefaultSecretSettings secretSettings) {
httpPost.setHeader(createAuthBearerHeader(secretSettings.apiKey()));
}

LlamaEmbeddingsRequest.java
@@ -60,13 +60,13 @@ public HttpRequest createHttpRequest() {
HttpPost httpPost = new HttpPost(this.uri);

ByteArrayEntity byteEntity = new ByteArrayEntity(
Strings.toString(new LlamaEmbeddingsRequestEntity(model.model(), truncationResult.input())).getBytes(StandardCharsets.UTF_8)
Strings.toString(new LlamaEmbeddingsRequestEntity(model.getServiceSettings().modelId(), truncationResult.input()))
.getBytes(StandardCharsets.UTF_8)
);
httpPost.setEntity(byteEntity);

httpPost.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaTypeWithoutParameters());
if (model.getSecretSettings() instanceof DefaultSecretSettings) {
var secretSettings = (DefaultSecretSettings) model.getSecretSettings();
if (model.getSecretSettings() instanceof DefaultSecretSettings secretSettings) {
httpPost.setHeader(createAuthBearerHeader(secretSettings.apiKey()));
}
