Commit 44ba9ef

Address PR comments
1 parent f1146ce commit 44ba9ef

File tree: 1 file changed (+17, -19 lines)

solutions/observability/connect-to-own-local-llm.md (17 additions & 19 deletions)
@@ -14,9 +14,9 @@ products:
This page provides instructions for setting up a connector to a large language model (LLM) of your choice using LM Studio. This allows you to use your chosen model within the {{obs-ai-assistant}}. You’ll first need to set up LM Studio, then download and deploy a model via LM Studio, and finally configure the connector in your Elastic deployment.

::::{note}
-You do not have to set up a proxy if LM studio is configured on the same network as your Elastic deployment or locally on your machine.
+If your Elastic deployment is not on the same network, you must configure an Nginx reverse proxy to authenticate with Elastic. Refer to [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.

-If your Elastic deployment is not on the same network, you would need to configure a reverse proxy using Nginx to authenticate with Elastic. Refer [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for more detailed instructions.
+You do not have to set up a proxy if LM Studio is running locally or on the same network as your Elastic deployment.
::::

This example uses a server hosted in GCP to configure LM Studio with the [Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) model.
@@ -29,23 +29,21 @@ If LM Studio is already installed, the server is running, and you have a model l

LM Studio supports the OpenAI SDK, which makes it compatible with Elastic’s OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.

-As the first step, install [LM Studio](https://lmstudio.ai/).
+First, install [LM Studio](https://lmstudio.ai/).

-You must launch the application using its GUI before being able to use the CLI.
+You must launch the application using its GUI before being able to use the CLI. Depending on where you're deploying, use one of the following methods:

-::::{note}
-For local/on‑prem desktop: Launch LM studio directly.
-For GCP, Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine) can be used for this purpose.
-For other cloud platforms: Any secure remote desktop (RDP, VNC over SSH tunnel, or X11 forwarding) works as long as you can open the LM Studio GUI once.
-::::
+* Local deployments: Launch LM Studio using the GUI.
+* GCP deployments: Launch using Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine).
+* Other cloud platform deployments: Launch using any secure remote desktop (RDP, VNC over SSH tunnel, or X11 forwarding) as long as you can open the LM Studio GUI once.

After you’ve opened the application for the first time using the GUI, you can start the server by using `sudo lms server start` in the [CLI](https://lmstudio.ai/docs/cli/server-start).
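
The server exposes an OpenAI-compatible HTTP API. As an optional sanity check that isn't part of the steps above, a minimal Python sketch such as the following can confirm the server is reachable. It assumes the default port `1234` and that the script runs on the same machine as LM Studio.

```python
# Minimal sketch (assumption: LM Studio server started with `lms server start`
# on the default port 1234, reachable from this machine).
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=10)
resp.raise_for_status()

# The OpenAI-compatible endpoint returns {"data": [...]}; the list stays empty
# until you have downloaded at least one model.
for model in resp.json().get("data", []):
    print(model["id"])
```
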
Once you’ve launched LM Studio:

1. Go to LM Studio’s Discover window.
2. Search for an LLM (for example, `Llama 3.3`). Your chosen model must include `instruct` in its name (specified in download options) in order to work with Elastic.
-3. When selecting a model, models published by verified authors are recommended (indicated by the purple verification badge next to the model name).
+3. We recommend you use models published by trusted sources or verified authors (indicated by the purple verification badge next to the model name).
4. After you find a model, view download options and select a recommended option (green). For best performance, select one with the thumbs-up icon that indicates good performance on your hardware.
5. Download one or more models.

@@ -57,15 +55,15 @@ For security reasons, before downloading a model, verify that it is from a trust
:alt: The LM Studio model selection interface with download options
:::

-This [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b) model used in this example has 70B total parameters, a 128,000 token context window, and uses GGUF [quanitization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model names and format information, refer to the following table.
+In this example, we use [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b). It has 70B total parameters, a 128,000 token context window, and uses GGUF [quantization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model names and formats, refer to the following table.

| Model Name | Parameter Size | Tokens/Context Window | Quantization Format |
| --- | --- | --- | --- |
| Name of model, sometimes with a version number. | LLMs are often compared by their number of parameters — higher numbers mean more powerful models. | Tokens are small chunks of input information. Tokens do not necessarily correspond to characters. You can use [Tokenizer](https://platform.openai.com/tokenizer) to see how many tokens a given prompt might contain. | Quantization reduces overall parameters and helps the model to run faster, but reduces accuracy. |
| Examples: Llama, Mistral. | The number of parameters is a measure of the size and the complexity of the model. The more parameters a model has, the more data it can process, learn from, generate, and predict. | The context window defines how much information the model can process at once. If the number of input tokens exceeds this limit, input gets truncated. | Specific formats for quantization vary, most models now support GPU rather than CPU offloading. |

::::{important}
-The {{obs-ai-assistant}} requires a model with at least 64,000 token context window.
+The {{obs-ai-assistant}} requires a model with at least a 64,000 token context window.
::::
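
For a concrete sense of how the token counts in this table relate to text length, here is a small illustrative sketch. It uses OpenAI's `tiktoken` library as an approximation; Llama and Mistral models use their own tokenizers, so the exact counts will differ.

```python
# Illustrative only: tiktoken's cl100k_base encoding approximates, but does not
# match, the tokenizers used by Llama or Mistral models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the last hour of error logs for the checkout service."
tokens = enc.encode(prompt)

print(f"{len(prompt)} characters -> {len(tokens)} tokens")
# The AI Assistant requires a context window of at least 64,000 such tokens.
```
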
## Load a model in LM Studio [load-a-model-in-lm-studio]
@@ -106,19 +104,19 @@ If your model uses NVIDIA drivers, you can check the GPU performance with the `s

### Option 2: Load a model using the GUI [option-2-load-a-model-using-the-gui]

-Once the model is downloaded, it will appear in the "My Models" window in LM Studio.
+Once the model is downloaded, it will appear in the **My Models** window in LM Studio.

:::{image} /solutions/images/observability-ai-assistant-lm-studio-my-models.png
:alt: My Models window in LM Studio with downloaded models
:::

-1. Navigate to the Developer window.
-2. Click on the "Start server" toggle on the top left. Once the server is started, you'll see the address and port of the server. The port will be defaulted to 1234.
-3. Click on "Select a model to load" and pick the model `Llama 3.3 70B Instruct` from the dropdown menu.
-4. Navigate to the "Load" on the right side of the LM Studio window, to adjust the context window to 64,000. Reload the model to apply the changes.
+1. Navigate to the **Developer** window.
+2. Click on the **Start server** toggle on the top left. Once the server is started, you'll see the address and port of the server. The default port is `1234`.
+3. Click on **Select a model to load** and pick the model `Llama 3.3 70B Instruct` from the dropdown menu.
+4. Navigate to **Load** on the right side of the LM Studio window and adjust the context window to 64,000. Reload the model to apply the changes.

::::{note}
-To enable other devices in the same network access the server, turn on "Serve on Local Network" via Settings.
+To enable other devices on the same network to access the server, go to **Settings** and turn on **Serve on Local Network**.
::::
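
Because LM Studio serves an OpenAI-compatible API, you can optionally verify that the loaded model responds before configuring the connector in Elastic. The following is a minimal sketch, assuming the server runs on `localhost:1234` and that LM Studio reports the model id as `llama-3.3-70b-instruct` (check the actual id in the Developer window or via `/v1/models`).

```python
# Minimal verification sketch (assumptions: server on localhost:1234,
# model id "llama-3.3-70b-instruct" as reported by LM Studio).
from openai import OpenAI

# LM Studio does not require an API key, but the client expects a non-empty value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # replace with the id LM Studio reports
    messages=[{"role": "user", "content": "Reply with a short greeting."}],
)
print(response.choices[0].message.content)
```

If this call succeeds, the same endpoint will work for the connector configured below.
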
:::{image} /solutions/images/observability-ai-assistant-lm-studio-load-model-gui.png.png
@@ -130,7 +128,7 @@ To enable other devices in the same network access the server, turn on "Serve on
Finally, configure the connector:

1. Log in to your Elastic deployment.
-2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector**, and select **OpenAI**. The OpenAI connector is compatible for this use case because LM Studio uses the OpenAI SDK.
+2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector**, and select **OpenAI**. The OpenAI connector works for this use case because LM Studio uses the OpenAI SDK.
3. Name your connector to help keep track of the model version you are using.
4. Under **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
5. Under **URL**, enter the host's IP address and port, followed by `/v1/chat/completions`. (If you have a reverse proxy set up, enter the domain name specified in your Nginx configuration file followed by `/v1/chat/completions`.)
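
To double-check the URL before saving the connector, you can send a request shaped like the one the connector will send. This is a hedged sketch with placeholder values; the host address, port, and model id are assumptions to replace with your own (if you use a reverse proxy, point the URL at the domain from your Nginx configuration instead).

```python
# Placeholder values: replace HOST with your LM Studio server's IP address,
# or with your reverse-proxy domain if you configured one.
import requests

HOST = "192.0.2.10"  # documentation placeholder address
url = f"http://{HOST}:1234/v1/chat/completions"

payload = {
    "model": "llama-3.3-70b-instruct",  # use the id LM Studio reports
    "messages": [{"role": "user", "content": "Hello"}],
}
resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```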
