---
navigation_title: Connect to a local LLM
mapped_pages:
  - https://www.elastic.co/guide/en/observability/current/connect-to-local-llm.html
applies_to:
  stack: ga 9.2
  serverless: ga
products:
  - id: observability
---

# Connect to your own local LLM

This page describes how to set up a connector to a large language model (LLM) of your choice using LM Studio, so you can use your chosen model with the {{obs-ai-assistant}}. You'll first set up LM Studio, then download and deploy a model through it, and finally configure the connector in your Elastic deployment.

::::{note}
If LM Studio is not running locally or on the same network as your Elastic deployment, you must configure an Nginx reverse proxy to authenticate with Elastic. Refer to [Configure your reverse proxy](https://www.elastic.co/docs/solutions/security/ai/connect-to-own-local-llm#_configure_your_reverse_proxy) for detailed instructions.

You do not need a proxy if LM Studio is running locally or on the same network as your Elastic deployment.
::::

This example uses a server hosted in GCP to configure LM Studio with the [Llama-3.3-70B-Instruct](https://huggingface.co/lmstudio-community/Llama-3.3-70B-Instruct-GGUF) model.

### Already running LM Studio? [skip-if-already-running]

If you've already installed LM Studio, the server is running, and you have a model loaded (with a context window of at least 64,000 tokens), skip directly to [Configure the connector in your Elastic deployment](#configure-the-connector-in-your-elastic-deployment).

## Configure LM Studio and download a model [configure-lm-studio-and-download-a-model]

LM Studio serves an OpenAI-compatible API, which makes it work with Elastic's OpenAI connector and allows you to connect to any model available in the LM Studio marketplace.

To get started with LM Studio:

1. Install [LM Studio](https://lmstudio.ai/).
2. Launch the application from its GUI once before using the CLI. Depending on where you're deploying, use one of the following methods:
    * **Local deployments**: Launch LM Studio using the GUI.
    * **GCP deployments**: Launch using Chrome RDP with an [X Window System](https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine).
    * **Other cloud platform deployments**: Launch using any secure remote desktop method (RDP, VNC over an SSH tunnel, or X11 forwarding), as long as you can open the LM Studio GUI once.
3. After you've opened the application for the first time using the GUI, start the server using `sudo lms server start` in the [CLI](https://lmstudio.ai/docs/cli/server-start), as shown in the sketch after this list.
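
A minimal start-and-verify session might look like the following sketch (port `1234` is LM Studio's default; adjust it if you've changed yours):

```bash
# Start the LM Studio server (the GUI must have been opened once first)
sudo lms server start

# Confirm the server is up by listing the models it serves
# over its OpenAI-compatible API
curl http://localhost:1234/v1/models
```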

Once you've launched LM Studio:

1. Go to LM Studio's Discover window.
2. Search for an LLM (for example, `Llama 3.3`). Your chosen model must include `instruct` in its name (specified in the download options) to work with Elastic.
3. Choose models published by a trusted source or verified authors (indicated by the purple verification badge next to the model name).
4. After you find a model, view the download options and select a recommended option (shown in green). For best performance, select one with the thumbs-up icon, which indicates good performance on your hardware.
5. Download one or more models.
::::{important}
For security reasons, before downloading a model, verify that it is from a trusted source or by a verified author. It can be helpful to review community feedback on the model (for example, on a site like Hugging Face).
::::
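
If you prefer the terminal, LM Studio's CLI can also search for and download models. A sketch, assuming the model key `llama-3.3-70b-instruct` resolves in LM Studio's catalog:

```bash
# Download a model without using the GUI
lms get llama-3.3-70b-instruct
```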

:::{image} /solutions/images/observability-ai-assistant-lms-model-selection.png
:alt: The LM Studio model selection interface with download options
:::

Throughout this documentation, we use [`llama-3.3-70b-instruct`](https://lmstudio.ai/models/meta/llama-3.3-70b). It has 70B total parameters, a 128,000-token context window, and uses GGUF [quantization](https://huggingface.co/docs/transformers/main/en/quantization/overview). For more information about model names and formats, refer to the following table.

| Attribute | Description |
| --- | --- |
| **Model Name** | LLM model name, sometimes with a version number (e.g., Llama, Mistral). |
| **Parameter Size** | Number of parameters, which measures the size and complexity of a model (more parameters = more data it can process, learn from, generate, and predict). |
| **Tokens / Context Window** | Tokens are small chunks of input information that don't necessarily correspond to characters. Use the [Tokenizer](https://platform.openai.com/tokenizer) to estimate how many tokens a prompt contains. The context window defines how much information the model can process at once. If the number of input tokens exceeds this limit, the input is truncated. |
| **Quantization Format** | Type of quantization applied. Quantization reduces the precision of a model's weights, shrinking its size and increasing speed at some cost to accuracy. Most models now support GPU offloading rather than CPU offloading. |

::::{important}
The {{obs-ai-assistant}} requires a model with at least a 64,000-token context window.
::::

## Load a model in LM Studio [load-a-model-in-lm-studio]

After downloading a model, load it using LM Studio's [CLI tool](https://lmstudio.ai/docs/cli/load) or the GUI.

### Option 1: Load a model using the CLI (Recommended) [option-1-load-a-model-using-the-cli-recommended]

Once you've downloaded a model, use the following commands in your CLI:

1. Verify LM Studio is installed: `lms`
2. Check LM Studio's status: `lms status`
3. List all downloaded models: `lms ls`
4. Load a model: `lms load llama-3.3-70b-instruct --context-length 64000 --gpu max`

::::{important}
When loading a model, use the `--context-length` flag with a context window of 64,000 or higher.
Optionally, you can set how much to offload to the GPU by using the `--gpu` flag. `--gpu max` offloads all layers to the GPU.
::::
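
Put together, a typical load-and-verify session might look like this sketch (substitute the model key reported by `lms ls` for your download):

```bash
# List downloaded models and note the model key
lms ls

# Load the model with a 64,000-token context window, offloading all layers to the GPU
lms load llama-3.3-70b-instruct --context-length 64000 --gpu max

# Confirm which model is currently loaded
lms ps
```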

After the model loads, you should see the message `Model loaded successfully` in the CLI.

:::{image} /solutions/images/observability-ai-assistant-model-loaded.png
:alt: The CLI message that appears after a model loads
:::

To verify which model is loaded, use the `lms ps` command.

:::{image} /solutions/images/observability-ai-assistant-lms-ps-command.png
:alt: The CLI message that appears after running lms ps
:::

If your machine has an NVIDIA GPU, you can check GPU performance with the `sudo nvidia-smi` command.

### Option 2: Load a model using the GUI [option-2-load-a-model-using-the-gui]

Once the model is downloaded, you'll find it in the **My Models** window in LM Studio.

1. Navigate to the **Developer** window.
2. Turn on the **Start server** toggle in the top left. Once the server has started, you'll see its address and port. The default port is `1234`.
3. Click **Select a model to load** and pick your model from the dropdown.
4. Select the **Load** tab on the right side of the LM Studio GUI and adjust the **Context Length** to 64,000. Reload the model to apply the changes.

::::{note}
To enable other devices on the same network to access the server, go to **Settings** and turn on **Serve on Local Network**.
::::
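
To confirm that the server is reachable from another device, you can query the models endpoint over the network. A sketch, assuming the LM Studio host is at `192.168.1.50` (substitute your host's address) and uses the default port:

```bash
# Run from a different machine on the same network
curl http://192.168.1.50:1234/v1/models
```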

:::{image} /solutions/images/observability-ai-assistant-lm-studio-load-model-gui.png
:alt: Loading a model in LM Studio's Developer tab
:::

## Configure the connector in your Elastic deployment [configure-the-connector-in-your-elastic-deployment]

Finally, configure the connector:

1. Log in to your Elastic deployment.
2. Find the **Connectors** page in the navigation menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). Then click **Create Connector** and select **OpenAI**. The OpenAI connector works for this use case because LM Studio serves an OpenAI-compatible API.
3. Name your connector to help keep track of the model version you are using.
4. Under **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
5. Under **URL**, enter the host's IP address and port, followed by `/v1/chat/completions`. (If you have a reverse proxy set up, enter the domain name specified in your Nginx configuration file, followed by `/v1/chat/completions`.)
6. Under **Default model**, enter `llama-3.3-70b-instruct`.
7. Under **API key**, enter any value. (If you have a reverse proxy set up, enter the secret token specified in your Nginx configuration file.)
8. Click **Save**.

:::{image} /solutions/images/observability-ai-assistant-local-llm-connector-setup.png
:alt: The OpenAI create connector flyout
:::
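
Before relying on the connector, you can sanity-check the endpoint it calls with a minimal chat completion request. A sketch, assuming LM Studio is reachable at `192.168.1.50:1234` (substitute your host and port, or your proxy's domain plus its bearer token):

```bash
# Send a minimal request to the same endpoint the connector uses
curl http://192.168.1.50:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```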

Setup is now complete. You can use the model you've loaded in LM Studio to power Elastic's generative AI features.

::::{note}
While local (open-weight) LLMs offer greater privacy and control, they generally do not match the raw performance and advanced reasoning of the proprietary models from the LLM providers mentioned in [Set up the AI Assistant](/solutions/observability/observability-ai-assistant.md#obs-ai-set-up).
::::