explore-analyze/elastic-inference/eis.md
applies_to:
navigation_title: Elastic Inference Service (EIS)
---

# Elastic {{infer-cap}} Service [elastic-inference-service-eis]

The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster.
With EIS, you don't need to manage the infrastructure and resources required for large language models (LLMs) by adding, configuring, and scaling {{ml}} nodes.
Instead, you can use {{ml}} models in high-throughput, low-latency scenarios independently of your {{es}} infrastructure.

Currently, you can perform chat completion tasks through EIS using the {{infer}} API.

% TO DO: Link to the EIS inference endpoint reference docs when it's added to the OpenAPI spec. (Coming soon) %

## Default EIS endpoints [default-eis-inference-endpoints]

Your {{es}} deployment includes a preconfigured EIS endpoint, making it easier to use chat completion via the {{infer}} API:

* `.rainbow-sprinkles-elastic`: uses Anthropic's Claude 3.5 Sonnet model for chat completion {{infer}} tasks.

::::{note}

* The model appears as `Elastic LLM` in the AI Assistant, Attack Discovery UI, preconfigured connectors list, and the Search Playground.
* When tuning prompts sent to `.rainbow-sprinkles-elastic`, optimize them for Claude 3.5 Sonnet.

::::

% TO DO: Link to the AI assistant documentation in the different solutions and possibly connector docs. %

## Regions [eis-regions]

EIS is currently available on AWS in the following regions:

* `us-east-1`
* `us-west-2`

For more details on AWS regions, refer to the [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) and the [supported cross-region {{infer}} profiles](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) documentation.

## LLM hosts [llm-hosts]

The LLM used with EIS is hosted by [Amazon Bedrock](https://aws.amazon.com/bedrock/).

## Examples [eis-examples]

The following example demonstrates how to perform a `chat_completion` task through EIS using the `.rainbow-sprinkles-elastic` default {{infer}} endpoint.

```json
POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "Say yes if it works."
        }
    ],
    "temperature": 0.7,
    "max_completion_tokens": 300
}
```
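
Requests can also carry multi-turn context. The following is a minimal sketch that replays a short conversation through the same endpoint; the message contents are illustrative, and only the `messages` array with `role` and `content` fields is taken from the example above:

```json
POST /_inference/chat_completion/.rainbow-sprinkles-elastic/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "What is Elastic?"
        },
        {
            "role": "assistant",
            "content": "Elastic is the company behind Elasticsearch."
        },
        {
            "role": "user",
            "content": "Summarize that in three words."
        }
    ]
}
```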

Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` service.


## {{api-request-title}} [infer-service-elastic-api-request]

`PUT /_inference/<task_type>/<inference_id>`
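
For instance, creating a `chat_completion` endpoint under the hypothetical ID `my-eis-chat` would use the request line below (the full request body is shown in the examples at the end of this page):

```json
PUT /_inference/chat_completion/my-eis-chat
```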


## {{api-path-parms-title}} [infer-service-elastic-api-path-params]

`<inference_id>`
* `chat_completion`,
* `sparse_embedding`.


::::{note}
The `chat_completion` task type only supports streaming and only through the `_stream` API.

For more information on how to use the `chat_completion` task type, refer to the [chat completion documentation](chat-completion-inference-api.md).

::::
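
For instance, once a `chat_completion` endpoint exists (such as `chat-completion-endpoint`, created in the example at the end of this page), a minimal streaming request looks like this sketch; the message content is illustrative:

```json
POST /_inference/chat_completion/chat-completion-endpoint/_stream
{
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        }
    ]
}
```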



## {{api-request-body-title}} [infer-service-elastic-api-request-body]

`max_chunk_size`
`service_settings`
: (Required, object) Settings used to install the {{infer}} model.


`model_id`
: (Required, string) The name of the model to use for the {{infer}} task.

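Within the request body, these settings nest under `service_settings`; a minimal sketch, using the chat completion model ID from the examples below:

```json
"service_settings": {
    "model_id": "rainbow-sprinkles"
}
```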



## Elastic {{infer-cap}} Service example [inference-example-elastic]

The following example shows how to create an {{infer}} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.
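
A minimal sketch of that request, assuming the ELSER model is selected with the `model_id` value `elser` (the model identifier is an assumption; only the endpoint name and task type come from the sentence above):

```json
PUT /_inference/sparse_embedding/elser-model-eis
{
    "service": "elastic",
    "service_settings": {
        "model_id": "elser"
    }
}
```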

The following example shows how to create an {{infer}} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type:

```json
PUT /_inference/chat_completion/chat-completion-endpoint
{
    "service": "elastic",
    "service_settings": {
        "model_id": "rainbow-sprinkles"
    }
}
```