diff --git a/deploy-manage/cloud-organization/billing/elastic-observability-billing-dimensions.md b/deploy-manage/cloud-organization/billing/elastic-observability-billing-dimensions.md
index 8188db28d0..343fbbe5d7 100644
--- a/deploy-manage/cloud-organization/billing/elastic-observability-billing-dimensions.md
+++ b/deploy-manage/cloud-organization/billing/elastic-observability-billing-dimensions.md
@@ -22,11 +22,4 @@ Data volumes for ingest and retention are based on the fully enriched normalized
 
 [Synthetic monitoring](../../../solutions/observability/apps/synthetic-monitoring.md) is an optional add-on to Observability Serverless projects that allows you to periodically check the status of your services and applications. In addition to the core ingest and retention dimensions, there is a charge to execute synthetic monitors on our testing infrastructure. Browser (journey) based tests are charged per-test-run, and ping (lightweight) tests have an all-you-can-use model per location used.
 
-## Elastic Inference Service [EIS-billing]
-[Elastic Inference Service (EIS)](../../../explore-analyze/elastic-inference/eis.md) enables you to leverage AI-powered search as a service without deploying a model in your serverless project. EIS is configured as a default LLM for use with the Observability AI Assistant (for all observability projects).
-
-:::{note}
-Use of the Observability AI Assistant uses EIS tokens and incurs related token-based add-on billing for your serverless project.
-:::
-
 Refer to [Serverless billing dimensions](serverless-project-billing-dimensions.md) and the [{{ecloud}} pricing table](https://cloud.elastic.co/cloud-pricing-table?productType=serverless&project=observability) for more details about {{obs-serverless}} billing dimensions and rates.
diff --git a/explore-analyze/elastic-inference.md b/explore-analyze/elastic-inference.md
index debda95960..9b19f2a4cd 100644
--- a/explore-analyze/elastic-inference.md
+++ b/explore-analyze/elastic-inference.md
@@ -9,6 +9,5 @@ navigation_title: Elastic Inference
 
 There are several ways to perform {{infer}} in the {{stack}}. This page provides a brief overview of the different methods:
 
-* [Using EIS (Elastic Inference Service)](elastic-inference/eis.md)
 * [Using the {{infer}} API](elastic-inference/inference-api.md)
 * [Trained models deployed in your cluster](machine-learning/nlp/ml-nlp-overview.md)
diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md
deleted file mode 100644
index 9a28237447..0000000000
--- a/explore-analyze/elastic-inference/eis.md
+++ /dev/null
@@ -1,10 +0,0 @@
----
-applies_to:
-  stack: ga
-  serverless: ga
-navigation_title: Elastic Inference Service (EIS)
----
-
-# Elastic {{infer-cap}} Service
-
-This is the documentation of the Elastic Inference Service.
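With the EIS overview pages removed, `explore-analyze/elastic-inference.md` still points readers to the {{infer}} API with models deployed in the cluster. The following is a minimal sketch of that remaining path, assuming the `elasticsearch` service and the built-in ELSER model; the endpoint name and allocation settings are illustrative only:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}
```

The request shape mirrors the `elastic` service examples removed below, with the service and model swapped for a locally deployed one.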
diff --git a/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md b/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md
deleted file mode 100644
index 3c30ebb6d7..0000000000
--- a/explore-analyze/elastic-inference/inference-api/elastic-inference-service-eis.md
+++ /dev/null
@@ -1,107 +0,0 @@
----
-navigation_title: "Elastic Inference Service"
-mapped_pages:
-  - https://www.elastic.co/guide/en/elasticsearch/reference/master/infer-service-elastic.html
-applies_to:
-  stack:
-  serverless:
----
-
-# Elastic Inference Service (EIS) [infer-service-elastic]
-
-:::{tip} Inference API reference
-Refer to the [{{infer-cap}} APIs](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) for further information.
-:::
-
-Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` service.
-
-
-## {{api-request-title}} [infer-service-elastic-api-request]
-
-`PUT /_inference/<task_type>/<inference_id>`
-
-
-## {{api-path-parms-title}} [infer-service-elastic-api-path-params]
-
-`<inference_id>`
-: (Required, string) The unique identifier of the {{infer}} endpoint.
-
-`<task_type>`
-: (Required, string) The type of the {{infer}} task that the model will perform.
-
-    Available task types:
-
-    * `chat_completion`,
-    * `sparse_embedding`.
-
-::::{note}
-The `chat_completion` task type only supports streaming and only through the `_stream` API.
-
-For more information on how to use the `chat_completion` task type, refer to the [chat completion documentation](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-stream-inference).
-::::
-
-
-## {{api-request-body-title}} [infer-service-elastic-api-request-body]
-
-`max_chunk_size`
-: (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to `250`. This value cannot be higher than `300` or lower than `20` (for `sentence` strategy) or `10` (for `word` strategy).
-
-`overlap`
-: (Optional, integer) Only for `word` chunking strategy. Specifies the number of overlapping words for chunks. Defaults to `100`. This value cannot be higher than half of `max_chunk_size`.
-
-`sentence_overlap`
-: (Optional, integer) Only for `sentence` chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either `1` or `0`. Defaults to `1`.
-
-`strategy`
-: (Optional, string) Specifies the chunking strategy. It can be either `sentence` or `word`.
-
-`service`
-: (Required, string) The type of service supported for the specified task type. In this case, `elastic`.
-
-`service_settings`
-: (Required, object) Settings used to install the {{infer}} model.
-
-`model_id`
-: (Required, string) The name of the model to use for the {{infer}} task.
-
-`rate_limit`
-: (Optional, object) By default, the `elastic` service sets the number of requests allowed per minute to `1000` in case of `sparse_embedding` and `240` in case of `chat_completion`. This helps to minimize the number of rate limit errors returned. To modify this, set the `requests_per_minute` setting of this object in your service settings:
-
-    ```text
-    "rate_limit": {
-        "requests_per_minute": <number_of_requests>
-    }
-    ```
-
-
-## Elastic {{infer-cap}} Service example [inference-example-elastic]
-
-The following example shows how to create an {{infer}} endpoint called `elser-model-eis` to perform a `sparse_embedding` task type.
-
-```console
-PUT _inference/sparse_embedding/elser-model-eis
-{
-    "service": "elastic",
-    "service_settings": {
-        "model_id": "elser"
-    }
-}
-```
-
-The following example shows how to create an {{infer}} endpoint called `chat-completion-endpoint` to perform a `chat_completion` task type.
-
-```console
-PUT /_inference/chat_completion/chat-completion-endpoint
-{
-    "service": "elastic",
-    "service_settings": {
-        "model_id": "model-1"
-    }
-}
-```
-
diff --git a/explore-analyze/toc.yml b/explore-analyze/toc.yml
index 5685008ace..ba04dba75b 100644
--- a/explore-analyze/toc.yml
+++ b/explore-analyze/toc.yml
@@ -118,10 +118,7 @@ toc:
       - file: transforms/transform-limitations.md
   - file: elastic-inference.md
     children:
-      - file: elastic-inference/eis.md
       - file: elastic-inference/inference-api.md
-        children:
-          - file: elastic-inference/inference-api/elastic-inference-service-eis.md
   - file: machine-learning.md
     children:
       - file: machine-learning/setting-up-machine-learning.md
diff --git a/solutions/observability/observability-ai-assistant.md b/solutions/observability/observability-ai-assistant.md
index 5c3b794d4c..4afeb6df99 100644
--- a/solutions/observability/observability-ai-assistant.md
+++ b/solutions/observability/observability-ai-assistant.md
@@ -16,7 +16,7 @@ You can [interact with the AI Assistant](#obs-ai-interact) in two ways:
 * **Contextual insights**: Embedded assistance throughout Elastic UIs that explains errors and messages with suggested remediation steps.
 * **Chat interface**: A conversational experience where you can ask questions and receive answers about your data. The assistant uses function calling to request, analyze, and visualize information based on your needs.
 
-By default, AI Assistant uses a [preconfigured LLM](#preconfigured-llm-ai-assistant) connector that works out of the box. You can also connect to third-party LLM providers.
+The AI Assistant integrates with your large language model (LLM) provider through our supported {{stack}} connectors.
 
 ## Use cases
 
@@ -28,11 +28,6 @@ The {{obs-ai-assistant}} helps you:
 * **Build and execute queries**: Build Elasticsearch queries from natural language, convert Query DSL to ES|QL syntax, and execute queries directly from the chat interface
 * **Visualize data**: Create time-series charts and distribution graphs from your Elasticsearch data
 
-## Preconfigured LLM [preconfigured-llm-ai-assistant]
-
-:::{include} ../_snippets/elastic-llm.md
-:::
-
 ## Requirements [obs-ai-requirements]
 
 The AI assistant requires the following:
@@ -45,7 +40,7 @@
 
 - To run {{obs-ai-assistant}} on a self-hosted Elastic stack, you need an [appropriate license](https://www.elastic.co/subscriptions).
 
-- If not using the [default preconfigured LLM](#preconfigured-llm-ai-assistant), you need an account with a third-party generative AI provider that preferably supports function calling. If your provider does not support function calling, you can configure AI Assistant settings under **Stack Management** to simulate function calling, but this might affect performance.
+- An account with a third-party generative AI provider that preferably supports function calling. If your AI provider does not support function calling, you can configure AI Assistant settings under **Stack Management** to simulate function calling, but this might affect performance.
 
 - The free tier offered by third-party generative AI provider may not be sufficient for the proper functioning of the AI assistant.
 In most cases, a paid subscription to one of the supported providers is required.
@@ -76,10 +71,6 @@ It's important to understand how your data is handled when using the AI Assistan
 
 ## Set up the AI Assistant [obs-ai-set-up]
 
-:::{note}
-If you use [the preconfigured LLM](#preconfigured-llm-ai-assistant) connector, you can skip this step. Your LLM connector is ready to use.
-:::
-
 The AI Assistant connects to one of these supported LLM providers:
 
 | Provider | Configuration | Authentication |
diff --git a/solutions/search/rag/playground.md b/solutions/search/rag/playground.md
index 9f5c13161c..bec09040f2 100644
--- a/solutions/search/rag/playground.md
+++ b/solutions/search/rag/playground.md
@@ -59,11 +59,6 @@ Here’s a simpified overview of how Playground works:
 
 * User can also **Download the code** to integrate into application
 
-## Elastic LLM [preconfigured-llm-playground]
-
-:::{include} ../../_snippets/elastic-llm.md
-:::
-
 ## Availability and prerequisites [playground-availability-prerequisites]
 
 For Elastic Cloud and self-managed deployments Playground is available in the **Search** space in {{kib}}, under **Content** > **Playground**.
@@ -77,7 +72,7 @@ To use Playground, you’ll need the following:
 
     * See [ingest data](playground.md#playground-getting-started-ingest) if you’d like to ingest sample data.
 
-3. If not using the default preconfigured LLM connector, you will need an account with a supported LLM provider:
+3. An account with a **supported LLM provider**. Playground supports the following:
 
     * **Amazon Bedrock**
@@ -119,11 +114,6 @@ You can also use locally hosted LLMs that are compatible with the OpenAI SDK. On
 
 ### Connect to LLM provider [playground-getting-started-connect]
 
-:::{note}
-If you use [the preconfigured LLM](#preconfigured-llm-playground) connector, you can skip this step. Your LLM connector is ready to use.
-
-:::
-
 To get started with Playground, you need to create a [connector](../../../deploy-manage/manage-connectors.md) for your LLM provider. You can also connect to [locally hosted LLMs](playground.md#playground-local-llms) which are compatible with the OpenAI API, by using the OpenAI connector.
 
 To connect to an LLM provider, follow these steps on the Playground landing page:
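The EIS reference page deleted above notes that the `chat_completion` task type is invoked only through the `_stream` API. A minimal sketch of such a call, assuming an endpoint named `chat-completion-endpoint` (as in the deleted example) and the standard `messages` request shape:

```console
POST _inference/chat_completion/chat-completion-endpoint/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Summarize the most common errors in the last hour of logs."
    }
  ]
}
```

The response is returned as a stream of server-sent events rather than a single JSON body.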
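Both the {{obs-ai-assistant}} and Playground pages above now direct users to create a {{stack}} connector for their LLM provider. Besides the UI flow, a connector can also be created through the Kibana actions API. The sketch below assumes the OpenAI connector type (`.gen-ai`), illustrative values for the name, URL, model, and API key, and a Dev Tools console that supports the `kbn:` prefix (otherwise use any HTTP client with a `kbn-xsrf` header); the exact config fields may differ between versions:

```console
POST kbn:/api/actions/connector
{
  "name": "openai-chat",
  "connector_type_id": ".gen-ai",
  "config": {
    "apiProvider": "OpenAI",
    "apiUrl": "https://api.openai.com/v1/chat/completions",
    "defaultModel": "gpt-4o"
  },
  "secrets": {
    "apiKey": "<your-api-key>"
  }
}
```

For a locally hosted, OpenAI-compatible model, the same connector type can point `apiUrl` at the local server instead.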