9 changes: 7 additions & 2 deletions solutions/observability/connect-to-own-local-llm.md
@@ -11,12 +11,17 @@ products:

# Connect to your own local LLM

:::{important}
Elastic doesn’t support the setup and configuration of local LLMs. The example provided is for reference only.
Before using a local LLM, evaluate its performance according to the [LLM performance matrix](./llm-performance-matrix.md#evaluate-your-own-model).
:::

This page provides instructions for setting up a connector to a large language model (LLM) of your choice using LM Studio. This allows you to use your chosen model within the {{obs-ai-assistant}}. You’ll first need to set up LM Studio, then download and deploy a model via LM Studio, and finally configure the connector in your Elastic deployment.
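
For example, once LM Studio’s local server is running, you can exercise the OpenAI-compatible endpoint that the connector will point to. This is a minimal sketch, assuming LM Studio’s default local server port of 1234 and the model loaded later in this guide:

```bash
# Send a test chat completion to LM Studio's OpenAI-compatible API
# (assumes the default local server port of 1234)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```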

::::{note}
If your Elastic deployment is not on the same network as LM Studio, you must configure an Nginx reverse proxy to authenticate with Elastic. Refer to [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.

You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::

::::{note}
@@ -85,7 +90,7 @@ Once you’ve downloaded a model, use the following commands in your CLI:
4. Load a model: `lms load llama-3.3-70b-instruct --context-length 64000 --gpu max`.

::::{important}
When loading a model, use the `--context-length` flag with a context window of 64,000 or higher.
Optionally, you can set how much of the model to offload to the GPU by using the `--gpu` flag. `--gpu max` offloads all layers to the GPU.
::::
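
For reference, here is a minimal sketch of loading the model and serving it with the `lms` CLI, assuming the model from step 4 is already downloaded and that the local server uses LM Studio’s default port of 1234:

```bash
# Load the model with a 64,000-token context window, offloading all layers to the GPU
lms load llama-3.3-70b-instruct --context-length 64000 --gpu max

# Start LM Studio's local server, which exposes an OpenAI-compatible API (default port 1234)
lms server start

# Confirm the model is listed by the server
curl http://localhost:1234/v1/models
```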

4 changes: 4 additions & 0 deletions solutions/observability/observability-ai-assistant.md
@@ -102,6 +102,10 @@ While the {{obs-ai-assistant}} is compatible with many different models, refer t
:::

### Connect to a custom local LLM
```{applies_to}
serverless: ga
stack: ga 9.2
```

[Connect to LM Studio](/solutions/observability/connect-to-own-local-llm.md) to use a custom LLM deployed and managed by you.
