Commit 44ba9ef

Address PR comments
1 parent f1146ce commit 44ba9ef

File tree: 1 file changed (+17, -19 lines)

solutions/observability/connect-to-own-local-llm.md (17 additions & 19 deletions)
@@ -14,9 +14,9 @@ products:
This page provides instructions for setting up a connector to a large language model (LLM) of your choice using LM Studio. This allows you to use your chosen model within the {{obs-ai-assistant}}. You’ll first need to set up LM Studio, then download and deploy a model via LM Studio, and finally configure the connector in your Elastic deployment.

::::{note}
-You do not have to set up a proxy if LM studio is configured on the same network as your Elastic deployment or locally on your machine.
+If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy to authenticate with Elastic. Refer to [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.

-If your Elastic deployment is not on the same network, you would need to configure a reverse proxy using Nginx to authenticate with Elastic. Refer [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.
+You do not have to set up a proxy if LM Studio is running locally or on the same network as your Elastic deployment.
::::

This example uses a server hosted in GCP to configure LM Studio with the [Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) model.
@@ -29,23 +29,21 @@ If LM Studio is already installed, the server is running, and you have a model l

LM Studio supports the OpenAI SDK, which makes it compatible with Elastic’s OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.

-As the first step, install [LM Studio](https://lmstudio.ai/).
+First, install [LM Studio](https://lmstudio.ai/).

-You must launch the application using its GUI before being able to use the CLI.
+You must launch the application using its GUI before being able to use the CLI. Depending on where you're deploying, use one of the following methods:

-::::{note}
-For local/on‑prem desktop: Launch LM studio directly.
-For GCP, Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine) can be used for this purpose.
-For other cloud platforms: Any secure remote desktop (RDP, VNC over SSH tunnel, or X11 forwarding) works as long as you can open the LM Studio GUI once.
-::::
+* Local deployments: Launch LM Studio using the GUI.
+* GCP deployments: Launch using Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine).
+* Other cloud platform deployments: Launch using any secure remote desktop (RDP, VNC over SSH tunnel, or X11 forwarding) as long as you can open the LM Studio GUI once.

After you’ve opened the application for the first time using the GUI, you can start the server by using `sudo lms server start` in the [CLI](https://lmstudio.ai/docs/cli/server-start).
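
The server exposes an OpenAI-compatible HTTP API. As an optional sanity check that isn't part of the steps above, a minimal Python sketch such as the following can confirm the server is reachable. It assumes the default port `1234` and that the script runs on the same machine as LM Studio.

```python
# Minimal sketch (assumption: LM Studio server started with `lms server start`
# on the default port 1234, reachable from this machine).
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=10)
resp.raise_for_status()

# The OpenAI-compatible endpoint returns {"data": [...]}; the list stays empty
# until you have downloaded at least one model.
for model in resp.json().get("data", []):
    print(model["id"])
```
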
Once you’ve launched LM Studio:

1. Go to LM Studio’s Discover window.
2. Search for an LLM (for example, `Llama 3.3`). Your chosen model must include `instruct` in its name (specified in download options) in order to work with Elastic.
-3. When selecting a model, models published by verified authors are recommended (indicated by the purple verification badge next to the model name).
+3. We recommend you use models published by trusted sources or verified authors (indicated by the purple verification badge next to the model name).
4. After you find a model, view download options and select a recommended option (green). For best performance, select one with the thumbs-up icon that indicates good performance on your hardware.
5. Download one or more models.

@@ -57,15 +55,15 @@ For security reasons, before downloading a model, verify that it is from a trust
:alt: The LM Studio model selection interface with download options
:::

-This [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b) model used in this example has 70B total parameters, a 128,000 token context window, and uses GGUF [quanitization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model names and format information, refer to the following table.
+In this example, we use [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b). It has 70B total parameters, a 128,000 token context window, and uses GGUF [quantization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model names and formats, refer to the following table.

| Model Name | Parameter Size | Tokens/Context Window | Quantization Format |
| --- | --- | --- | --- |
| Name of model, sometimes with a version number. | LLMs are often compared by their number of parameters — higher numbers mean more powerful models. | Tokens are small chunks of input information. Tokens do not necessarily correspond to characters. You can use [Tokenizer](https://platform.openai.com/tokenizer) to see how many tokens a given prompt might contain. | Quantization reduces overall parameters and helps the model to run faster, but reduces accuracy. |
| Examples: Llama, Mistral. | The number of parameters is a measure of the size and the complexity of the model. The more parameters a model has, the more data it can process, learn from, generate, and predict. | The context window defines how much information the model can process at once. If the number of input tokens exceeds this limit, input gets truncated. | Specific formats for quantization vary, most models now support GPU rather than CPU offloading. |

::::{important}
-The {{obs-ai-assistant}} requires a model with at least 64,000 token context window.
+The {{obs-ai-assistant}} requires a model with at least a 64,000 token context window.
::::
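
For a concrete sense of how the token counts in this table relate to text length, here is a small illustrative sketch. It uses OpenAI's `tiktoken` library as an approximation; Llama and Mistral models use their own tokenizers, so the exact counts will differ.

```python
# Illustrative only: tiktoken's cl100k_base encoding approximates, but does not
# match, the tokenizers used by Llama or Mistral models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the last hour of error logs for the checkout service."
tokens = enc.encode(prompt)

print(f"{len(prompt)} characters -> {len(tokens)} tokens")
# The AI Assistant requires a context window of at least 64,000 such tokens.
```
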
## Load a model in LM Studio [load-a-model-in-lm-studio]
@@ -106,19 +104,19 @@ If your model uses NVIDIA drivers, you can check the GPU performance with the `s

### Option 2: Load a model using the GUI [option-2-load-a-model-using-the-gui]

-Once the model is downloaded, it will appear in the "My Models" window in LM Studio.
+Once the model is downloaded, it will appear in the **My Models** window in LM Studio.

:::{image} /solutions/images/observability-ai-assistant-lm-studio-my-models.png
:alt: My Models window in LM Studio with downloaded models
:::

-1. Navigate to the Developer window.
-2. Click on the "Start server" toggle on the top left. Once the server is started, you'll see the address and port of the server. The port will be defaulted to 1234.
-3. Click on "Select a model to load" and pick the model `Llama 3.3 70B Instruct` from the dropdown menu.
-4. Navigate to the "Load" on the right side of the LM Studio window, to adjust the context window to 64,000. Reload the model to apply the changes.
+1. Navigate to the **Developer** window.
+2. Click on the **Start server** toggle on the top left. Once the server is started, you'll see the address and port of the server. The default port is `1234`.
+3. Click on **Select a model to load** and pick the model `Llama 3.3 70B Instruct` from the dropdown menu.
+4. Navigate to **Load** on the right side of the LM Studio window and adjust the context window to 64,000. Reload the model to apply the changes.

::::{note}
-To enable other devices in the same network access the server, turn on "Serve on Local Network" via Settings.
+To enable other devices on the same network to access the server, go to **Settings** and turn on **Serve on Local Network**.
::::
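
Because LM Studio serves an OpenAI-compatible API, you can optionally verify that the loaded model responds before configuring the connector in Elastic. The following is a minimal sketch, assuming the server runs on `localhost:1234` and that LM Studio reports the model id as `llama-3.3-70b-instruct` (check the actual id in the Developer window or via `/v1/models`).

```python
# Minimal verification sketch (assumptions: server on localhost:1234,
# model id "llama-3.3-70b-instruct" as reported by LM Studio).
from openai import OpenAI

# LM Studio does not require an API key, but the client expects a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # replace with the id LM Studio reports
    messages=[{"role": "user", "content": "Reply with a short greeting."}],
)
print(response.choices[0].message.content)
```

If this call succeeds, the same endpoint will work for the connector configured below.
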
:::{image} /solutions/images/observability-ai-assistant-lm-studio-load-model-gui.png.png
@@ -130,7 +128,7 @@ To enable other devices in the same network access the server, turn on "Serve on
Finally, configure the connector:

1. Log in to your Elastic deployment.
-2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector**, and select **OpenAI**. The OpenAI connector is compatible for this use case because LM Studio uses the OpenAI SDK.
+2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector**, and select **OpenAI**. The OpenAI connector works for this use case because LM Studio uses the OpenAI SDK.
3. Name your connector to help keep track of the model version you are using.
4. Under **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
5. Under **URL**, enter the host's IP address and port, followed by `/v1/chat/completions`. (If you have a reverse proxy set up, enter the domain name specified in your Nginx configuration file followed by `/v1/chat/completions`.)
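
To double-check the URL before saving the connector, you can send a request shaped like the one the connector will send. This is a hedged sketch with placeholder values; the host address, port, and model id are assumptions to replace with your own (if you use a reverse proxy, point the URL at the domain from your Nginx configuration instead).

```python
# Placeholder values: replace HOST with your LM Studio server's IP address,
# or with your reverse-proxy domain if you configured one.
import requests

HOST = "192.0.2.10"  # documentation placeholder address
url = f"http://{HOST}:1234/v1/chat/completions"

payload = {
    "model": "llama-3.3-70b-instruct",  # use the id LM Studio reports
    "messages": [{"role": "user", "content": "Hello"}],
}
resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```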
