
Commit 6c7d7ba

[Obs AI Assistant] Adds docs for connecting to a local LLM with the Obs AI Assistant (#2536)
Closes elastic/obs-ai-assistant-team#322

This PR adds documentation about how to connect to a local LLM with the Observability AI Assistant.

---------

Co-authored-by: Mike Birnstiehl <[email protected]>
1 parent 88713a6 commit 6c7d7ba

8 files changed: +146 −4 lines changed

5 binary image files added under solutions/images/ (596 KB, 809 KB, 30.6 KB, 110 KB, 36.7 KB)

solutions/observability/connect-to-own-local-llm.md

Lines changed: 140 additions & 0 deletions

@@ -0,0 +1,140 @@
---
navigation_title: Connect to a local LLM
mapped_pages:
  - https://www.elastic.co/guide/en/observability/current/connect-to-local-llm.html
applies_to:
  stack: ga 9.2
  serverless: ga
products:
  - id: observability
---

# Connect to your own local LLM

This page provides instructions for setting up a connector to a large language model (LLM) of your choice using LM Studio, which allows you to use your chosen model within the {{obs-ai-assistant}}. You'll first set up LM Studio, then download and deploy a model through LM Studio, and finally configure the connector in your Elastic deployment.

::::{note}
If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy to authenticate with Elastic. Refer to [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.

You do not have to set up a proxy if LM Studio is running locally, or on the same network as your Elastic deployment.
::::

This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.
23+
24+
### Already running LM Studio? [skip-if-already-running]
25+
26+
If you've already installed LM Studio, the server is running, and you have a model loaded (with a context window of at least 64K tokens), skip directly to [Configure the connector in your Elastic deployment](#configure-the-connector-in-your-elastic-deployment).
27+
28+
## Configure LM Studio and download a model [configure-lm-studio-and-download-a-model]
29+
30+
LM Studio supports the OpenAI SDK, which makes it compatible with Elastic’s OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.
31+
32+
To get started with LM Studio:
33+
34+
1. Install [LM Studio](https://lmstudio.ai/).
35+
2. You must launch the application using its GUI before being able to use the CLI. Depending on where you're deploying, use one of the following methods:
36+
* **Local deployments**: Launch LM Studio using the GUI.
37+
* **GCP deployments**: Launch using Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine).
38+
* **Other cloud platform deployments**: Launch using any secure remote desktop (RDP, VNC over SSH tunnel, or X11 forwarding) as long as you can open the LM Studio GUI once.
39+
3. After you’ve opened the application for the first time using the GUI, start the server using `sudo lms server start` in the [CLI](https://lmstudio.ai/docs/cli/server-start).
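
For reference, a minimal sketch of that first CLI interaction follows. Exact output and flags can vary between LM Studio versions, so check the LM Studio CLI documentation if a command is not recognized.

```bash
# Start the LM Studio server from the CLI (requires one prior GUI launch).
# By default the server listens on port 1234.
sudo lms server start

# Confirm the CLI can reach LM Studio and that the server is running.
lms status
```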

Once you've launched LM Studio:

1. Go to LM Studio's Discover window.
2. Search for an LLM (for example, `Llama 3.3`). Your chosen model must include `instruct` in its name (specified in the download options) to work with Elastic.
3. We recommend using models published by trusted sources or verified authors (indicated by the purple verification badge next to the model name).
4. After you find a model, view the download options and select a recommended option (green). For best performance, select one with the thumbs-up icon, which indicates good performance on your hardware.
5. Download one or more models.

::::{important}
For security reasons, before downloading a model, verify that it is from a trusted source or by a verified author. It can be helpful to review community feedback on the model (for example, using a site like Hugging Face).
::::

:::{image} /solutions/images/observability-ai-assistant-lms-model-selection.png
:alt: The LM Studio model selection interface with download options
:::

Throughout this documentation, we use [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b). It has 70B total parameters, a 128,000-token context window, and uses GGUF [quantization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model attributes, refer to the following table.

| Attribute | Description |
| --- | --- |
| **Model Name** | LLM model name, sometimes with a version number (for example, Llama, Mistral). |
| **Parameter Size** | Number of parameters, which measures the size and complexity of a model (more parameters = more data it can process, learn from, generate, and predict). |
| **Tokens / Context Window** | Tokens are small chunks of input information that don't necessarily correspond to characters. Use the [Tokenizer](https://platform.openai.com/tokenizer) to estimate how many tokens a prompt contains. The context window defines how much information the model can process at once. If the number of input tokens exceeds this limit, the input is truncated. |
| **Quantization Format** | Type of quantization applied. Quantization reduces overall parameters and increases model speed, but reduces accuracy. Most models now support GPU offloading rather than CPU offloading. |

::::{important}
The {{obs-ai-assistant}} requires a model with at least a 64,000-token context window.
::::

## Load a model in LM Studio [load-a-model-in-lm-studio]

After downloading a model, load it using LM Studio's [CLI tool](https://lmstudio.ai/docs/cli/load) or the GUI.

### Option 1: Load a model using the CLI (Recommended) [option-1-load-a-model-using-the-cli-recommended]

Once you've downloaded a model, use the following commands in your CLI (a consolidated example follows the note below):

1. Verify LM Studio is installed: `lms`
2. Check LM Studio's status: `lms status`
3. List all downloaded models: `lms ls`
4. Load a model: `lms load llama-3.3-70b-instruct --context-length 64000 --gpu max`

::::{important}
When loading a model, use the `--context-length` flag with a context window of 64,000 or higher.
Optionally, you can set how much to offload to the GPU with the `--gpu` flag. `--gpu max` offloads all layers to the GPU.
::::
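
Putting those commands together, a typical session looks roughly like the following sketch. The model name matches the `llama-3.3-70b-instruct` example used throughout this page; substitute your own model as needed.

```bash
# Confirm the LM Studio CLI is installed and the server is running.
lms
lms status

# List the models you have downloaded.
lms ls

# Load the model with a 64,000-token context window, offloading all layers to the GPU.
lms load llama-3.3-70b-instruct --context-length 64000 --gpu max

# Verify which model is currently loaded.
lms ps
```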

After the model loads, you should see the message `Model loaded successfully` in the CLI.

:::{image} /solutions/images/observability-ai-assistant-model-loaded.png
:alt: The CLI message that appears after a model loads
:::

To verify which model is loaded, use the `lms ps` command.

:::{image} /solutions/images/observability-ai-assistant-lms-ps-command.png
:alt: The CLI message that appears after running lms ps
:::

If your machine uses NVIDIA drivers, you can check GPU performance with the `sudo nvidia-smi` command.

### Option 2: Load a model using the GUI [option-2-load-a-model-using-the-gui]

Once the model is downloaded, you'll find it in the **My Models** window in LM Studio.

1. Navigate to the **Developer** window.
2. Turn on the **Start server** toggle on the top left. Once the server is started, you'll see the address and port of the server. The default port is `1234`.
3. Click **Select a model to load** and pick your model from the dropdown.
4. Select the **Load** tab on the right side of the LM Studio GUI, and adjust the **Context Length** to 64,000. Reload the model to apply the changes.

::::{note}
To enable other devices on the same network to access the server, go to **Settings** and turn on **Serve on Local Network**. A quick reachability check is sketched after the screenshot below.
::::

:::{image} /solutions/images/observability-ai-assistant-lm-studio-load-model-gui.png
:alt: Loading a model in the LM Studio Developer tab
:::
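
To confirm the server is reachable, for example from another device after enabling **Serve on Local Network**, you can query LM Studio's OpenAI-compatible models endpoint. This is a sketch that assumes the default port `1234` and uses a placeholder IP address.

```bash
# Placeholder IP: replace 192.0.2.10 with the address of the machine running LM Studio.
# A JSON response listing your loaded model indicates the server is reachable.
curl http://192.0.2.10:1234/v1/models
```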

## Configure the connector in your Elastic deployment [configure-the-connector-in-your-elastic-deployment]

Finally, configure the connector:

1. Log in to your Elastic deployment.
2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector** and select **OpenAI**. The OpenAI connector works for this use case because LM Studio uses the OpenAI SDK.
3. Name your connector to help keep track of the model version you are using.
4. Under **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
5. Under **URL**, enter the host's IP address and port, followed by `/v1/chat/completions` (a quick way to test this endpoint is sketched below). If you have a reverse proxy set up, enter the domain name specified in your Nginx configuration file, followed by `/v1/chat/completions`.
6. Under **Default model**, enter `llama-3.3-70b-instruct`.
7. Under **API key**, enter any value. (If you have a reverse proxy set up, enter the secret token specified in your Nginx configuration file.)
8. Click **Save**.

:::{image} /solutions/images/observability-ai-assistant-local-llm-connector-setup.png
:alt: The OpenAI create connector flyout
:::
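
Optionally, before testing the connector in the {{obs-ai-assistant}}, you can send a quick request to the same endpoint the connector will call. This sketch assumes the default port `1234`, a placeholder host address, and the model loaded earlier; when no reverse proxy is in place, the API key value is arbitrary.

```bash
# Placeholder host: replace 192.0.2.10 with your LM Studio host
# (or your reverse proxy domain, with its secret token as the Bearer value).
curl http://192.0.2.10:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer anything" \
  -d '{
        "model": "llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "Reply with the word ready."}]
      }'
```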

Setup is now complete. You can use the model you've loaded in LM Studio to power Elastic's generative AI features.

::::{note}
While local (open-weight) LLMs offer greater privacy and control, they generally do not match the raw performance and advanced reasoning capabilities of the proprietary models from the LLM providers mentioned in [Set up the AI Assistant](/solutions/observability/observability-ai-assistant.md#obs-ai-set-up).
::::

solutions/observability/observability-ai-assistant.md

Lines changed: 4 additions & 4 deletions
@@ -91,15 +91,15 @@ The AI Assistant connects to one of these supported LLM providers:
   - The provider's API endpoint URL
   - Your authentication key or secret
 
-::::{important}
-{{obs-ai-assistant}} doesn't support connecting to a private LLM. Elastic doesn't recommend using private LLMs with the AI Assistant.
-::::
-
 ### Elastic Managed LLM [elastic-managed-llm-obs-ai-assistant]
 
 :::{include} ../_snippets/elastic-managed-llm.md
 :::
 
+### Connect to a custom local LLM
+
+[Connect to LM Studio](/solutions/observability/connect-to-own-local-llm.md) to use a custom LLM deployed and managed by you.
+
 ## Add data to the AI Assistant knowledge base [obs-ai-add-data]
 
 The AI Assistant uses one of the following text embedding models to run semantic search against the internal knowledge base index. The top results are passed to the LLM as context (retrieval‑augmented generation), producing more accurate and grounded responses:

solutions/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -463,6 +463,8 @@ toc:
 - file: observability/incident-management/create-an-slo.md
 - file: observability/data-set-quality-monitoring.md
 - file: observability/observability-ai-assistant.md
+  children:
+    - file: observability/connect-to-own-local-llm.md
 - file: observability/observability-serverless-feature-tiers.md
 - file: security.md
   children:
