Skip to content

Commit a8d2107

Browse files
benironsidemergify[bot]
authored andcommitted
[ESS] [Serverless] Updates BYO LLM page (#6326)
* updates BYO LLM page * fix link error * fixes broken serverless link * troubleshoot * fixes another broken serverless link * updates images and video * Update docs/AI-for-security/connect-to-byo.asciidoc Co-authored-by: Nastasha Solomon <[email protected]> * Update docs/serverless/AI-for-security/connect-to-byo-llm.asciidoc * Update docs/AI-for-security/connect-to-byo.asciidoc --------- Co-authored-by: Nastasha Solomon <[email protected]> (cherry picked from commit 900040b) # Conflicts: # docs/serverless/AI-for-security/connect-to-byo-llm.asciidoc # docs/serverless/AI-for-security/images/lms-cli-welcome.png # docs/serverless/AI-for-security/images/lms-model-select.png # docs/serverless/AI-for-security/images/lms-ps-command.png # docs/serverless/AI-for-security/images/lms-studio-model-loaded-msg.png # docs/serverless/AI-for-security/llm-connector-guides.asciidoc
1 parent 8d034bc commit a8d2107

File tree

11 files changed

+232
-16
lines changed

11 files changed

+232
-16
lines changed

docs/AI-for-security/connect-to-byo.asciidoc

Lines changed: 20 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ This page provides instructions for setting up a connector to a large language m
1010

1111
This example uses a single server hosted in GCP to run the following components:
1212

13-
* LM Studio with the https://mistral.ai/technology/#models[Mixtral-8x7b] model
13+
* LM Studio with the https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[Mistral-Nemo-Instruct-2407] model
1414
* A reverse proxy using Nginx to authenticate to Elastic Cloud
1515

1616
image::images/lms-studio-arch-diagram.png[Architecture diagram for this guide]
@@ -20,7 +20,7 @@ NOTE: For testing, you can use alternatives to Nginx such as https://learn.micro
2020
[discrete]
2121
== Configure your reverse proxy
2222

23-
NOTE: If your Elastic instance is on the same host as LM Studio, you can skip this step.
23+
NOTE: If your Elastic instance is on the same host as LM Studio, you can skip this step. Also, check out our https://www.elastic.co/blog/herding-llama-3-1-with-elastic-and-lm-studio[blog post] that walks through the whole process of setting up a single-host implementation.
2424

2525
You need to set up a reverse proxy to enable communication between LM Studio and Elastic. For more complete instructions, refer to a guide such as https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-reverse-proxy-on-ubuntu-22-04[this one].
2626

@@ -74,7 +74,14 @@ server {
7474
}
7575
--------------------------------------------------
7676

77-
IMPORTANT: If using the example configuration file above, you must replace several values: Replace `<secret token>` with your actual token, and keep it safe since you'll need it to set up the {elastic-sec} connector. Replace `<yourdomainname.com>` with your actual domain name. Update the `proxy_pass` value at the bottom of the configuration if you decide to change the port number in LM Studio to something other than 1234.
77+
[IMPORTANT]
78+
====
79+
If using the example configuration file above, you must replace several values:
80+
81+
* Replace `<secret token>` with your actual token, and keep it safe since you'll need it to set up the {elastic-sec} connector.
82+
* Replace `<yourdomainname.com>` with your actual domain name.
83+
* Update the `proxy_pass` value at the bottom of the configuration if you decide to change the port number in LM Studio to something other than 1234.
84+
====
7885

7986
[discrete]
8087
=== (Optional) Set up performance monitoring for your reverse proxy
@@ -85,23 +92,20 @@ You can use Elastic's {integrations-docs}/nginx[Nginx integration] to monitor pe
8592

8693
First, install https://lmstudio.ai/[LM Studio]. LM Studio supports the OpenAI SDK, which makes it compatible with Elastic's OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.
8794

88-
One current limitation of LM Studio is that when it is installed on a server, you must launch the application using its GUI before doing so using the CLI. For example, by using Chrome RDP with an https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine[X Window System]. After you've opened the application the first time using the GUI, you can start it by using `sudo lms server start` in the CLI.
95+
You must launch the application using its GUI before doing so using the CLI. For example, use Chrome RDP with an https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine[X Window System]. After you've opened the application the first time using the GUI, you can start it by using `sudo lms server start` in the CLI.
8996

9097
Once you've launched LM Studio:
9198

9299
1. Go to LM Studio's Search window.
93-
2. Search for an LLM (for example, `Mixtral-8x7B-instruct`). Your chosen model must include `instruct` in its name in order to work with Elastic.
94-
3. Filter your search for "Compatibility Guess" to optimize results for your hardware. Results will be color coded:
95-
* Green means "Full GPU offload possible", which yields the best results.
96-
* Blue means "Partial GPU offload possible", which may work.
97-
* Red for "Likely too large for this machine", which typically will not work.
100+
2. Search for an LLM (for example, `Mistral-Nemo-Instruct-2407`). Your chosen model must include `instruct` in its name in order to work with Elastic.
101+
3. After you find a model, view download options and select a recommended version (green). For best performance, select one with the thumbs-up icon that indicates good performance on your hardware.
98102
4. Download one or more models.
99103

100104
IMPORTANT: For security reasons, before downloading a model, verify that it is from a trusted source. It can be helpful to review community feedback on the model (for example using a site like Hugging Face).
101105

102106
image::images/lms-model-select.png[The LM Studio model selection interface]
103107

104-
In this example we used https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF[`TheBloke/Mixtral-8x7B-Instruct-v0.1.Q3_K_M.gguf`]. It has 46.7B total parameters, a 32,000 token context window, and uses GGUF https://huggingface.co/docs/transformers/main/en/quantization/overview[quanitization]. For more information about model names and format information, refer to the following table.
108+
In this example we used https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[`mistralai/Mistral-Nemo-Instruct-2407`]. It has 12B total parameters, a 128,000 token context window, and uses GGUF https://huggingface.co/docs/transformers/main/en/quantization/overview[quanitization]. For more information about model names and format information, refer to the following table.
105109

106110
[cols="1,1,1,1", options="header"]
107111
|===
@@ -124,18 +128,18 @@ After downloading a model, load it in LM Studio using the GUI or LM Studio's htt
124128
[discrete]
125129
=== Option 1: load a model using the CLI (Recommended)
126130

127-
It is a best practice to download models from the marketplace using the GUI, and then load or unload them using the CLI. The GUI allows you to search for models, whereas the CLI only allows you to import specific paths, but the CLI provides a good interface for loading and unloading.
131+
It is a best practice to download models from the marketplace using the GUI, and then load or unload them using the CLI. The GUI allows you to search for models, whereas the CLI allows you to use `lms get` to search for models. The CLI provides a good interface for loading and unloading.
128132

129-
Use the following commands in your CLI:
133+
Once you've downloaded a model, use the following commands in your CLI:
130134

131135
1. Verify LM Studio is installed: `lms`
132136
2. Check LM Studio's status: `lms status`
133137
3. List all downloaded models: `lms ls`
134-
4. Load a model: `lms load`
138+
4. Load a model: `lms load`.
135139

136140
image::images/lms-cli-welcome.png[The CLI interface during execution of initial LM Studio commands]
137141

138-
After the model loads, you should see a `Model loaded successfully` message in the CLI.
142+
After the model loads, you should see a `Model loaded successfully` message in the CLI.
139143

140144
image::images/lms-studio-model-loaded-msg.png[The CLI message that appears after a model loads]
141145

@@ -156,8 +160,8 @@ Refer to the following video to see how to load a model using LM Studio's GUI. Y
156160
<img
157161
style="width: 100%; margin: auto; display: block;"
158162
class="vidyard-player-embed"
159-
src="https://play.vidyard.com/FMx2wxGQhquWPVhGQgjkyM.jpg"
160-
data-uuid="FMx2wxGQhquWPVhGQgjkyM"
163+
src="https://play.vidyard.com/c4AxH8d9tWMnwNp5J6bcfX.jpg"
164+
data-uuid="c4AxH8d9tWMnwNp5J6bcfX"
161165
data-v="4"
162166
data-type="inline"
163167
/>
344 KB
Loading
-454 KB
Loading
197 KB
Loading
-128 Bytes
Loading
Lines changed: 197 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,197 @@
1+
[[connect-to-byo-llm]]
2+
= Connect to your own local LLM
3+
4+
:frontmatter-description: Set up a connector to LM Studio so you can use a local model with AI Assistant.
5+
:frontmatter-tags-products: [security]
6+
:frontmatter-tags-content-type: [guide]
7+
:frontmatter-tags-user-goals: [get-started]
8+
9+
This page provides instructions for setting up a connector to a large language model (LLM) of your choice using LM Studio. This allows you to use your chosen model within {elastic-sec}. You'll first need to set up a reverse proxy to communicate with {elastic-sec}, then set up LM Studio on a server, and finally configure the connector in your Elastic deployment. https://www.elastic.co/blog/ai-assistant-locally-hosted-models[Learn more about the benefits of using a local LLM].
10+
11+
This example uses a single server hosted in GCP to run the following components:
12+
13+
* LM Studio with the https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[Mistral-Nemo-Instruct-2407] model
14+
* A reverse proxy using Nginx to authenticate to Elastic Cloud
15+
16+
image::images/lms-studio-arch-diagram.png[Architecture diagram for this guide]
17+
18+
NOTE: For testing, you can use alternatives to Nginx such as https://learn.microsoft.com/en-us/azure/developer/dev-tunnels/overview[Azure Dev Tunnels] or https://ngrok.com/[Ngrok], but using Nginx makes it easy to collect additional telemetry and monitor its status by using Elastic's native Nginx integration. While this example uses cloud infrastructure, it could also be replicated locally without an internet connection.
19+
20+
[discrete]
21+
== Configure your reverse proxy
22+
23+
NOTE: If your Elastic instance is on the same host as LM Studio, you can skip this step. Also, check out our https://www.elastic.co/blog/herding-llama-3-1-with-elastic-and-lm-studio[blog post] that walks through the whole process of setting up a single-host implementation.
24+
25+
You need to set up a reverse proxy to enable communication between LM Studio and Elastic. For more complete instructions, refer to a guide such as https://www.digitalocean.com/community/tutorials/how-to-configure-nginx-as-a-reverse-proxy-on-ubuntu-22-04[this one].
26+
27+
The following is an example Nginx configuration file:
28+
29+
[source,txt]
30+
--------------------------------------------------
31+
server {
32+
listen 80;
33+
listen [::]:80;
34+
server_name <yourdomainname.com>;
35+
server_tokens off;
36+
add_header x-xss-protection "1; mode=block" always;
37+
add_header x-frame-options "SAMEORIGIN" always;
38+
add_header X-Content-Type-Options "nosniff" always;
39+
return 301 https://$server_name$request_uri;
40+
}
41+
42+
server {
43+
44+
listen 443 ssl http2;
45+
listen [::]:443 ssl http2;
46+
server_name <yourdomainname.com>;
47+
server_tokens off;
48+
ssl_certificate /etc/letsencrypt/live/<yourdomainname.com>/fullchain.pem;
49+
ssl_certificate_key /etc/letsencrypt/live/<yourdomainname.com>/privkey.pem;
50+
ssl_session_timeout 1d;
51+
ssl_session_cache shared:SSL:50m;
52+
ssl_session_tickets on;
53+
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
54+
ssl_protocols TLSv1.3 TLSv1.2;
55+
ssl_prefer_server_ciphers on;
56+
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
57+
add_header x-xss-protection "1; mode=block" always;
58+
add_header x-frame-options "SAMEORIGIN" always;
59+
add_header X-Content-Type-Options "nosniff" always;
60+
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
61+
ssl_stapling on;
62+
ssl_stapling_verify on;
63+
ssl_trusted_certificate /etc/letsencrypt/live/<yourdomainname.com>/fullchain.pem;
64+
resolver 1.1.1.1;
65+
location / {
66+
67+
if ($http_authorization != "Bearer <secret token>") {
68+
return 401;
69+
}
70+
71+
proxy_pass http://localhost:1234/;
72+
}
73+
74+
}
75+
--------------------------------------------------
76+
77+
[IMPORTANT]
78+
====
79+
If using the example configuration file above, you must replace several values:
80+
81+
* Replace `<secret token>` with your actual token, and keep it safe since you'll need it to set up the {elastic-sec} connector.
82+
* Replace `<yourdomainname.com>` with your actual domain name.
83+
* Update the `proxy_pass` value at the bottom of the configuration if you decide to change the port number in LM Studio to something other than 1234.
84+
====
85+
86+
[discrete]
87+
=== (Optional) Set up performance monitoring for your reverse proxy
88+
You can use Elastic's {integrations-docs}/nginx[Nginx integration] to monitor performance and populate monitoring dashboards in the {security-app}.
89+
90+
[discrete]
91+
== Configure LM Studio and download a model
92+
93+
First, install https://lmstudio.ai/[LM Studio]. LM Studio supports the OpenAI SDK, which makes it compatible with Elastic's OpenAI connector, allowing you to connect to any model available in the LM Studio marketplace.
94+
95+
You must launch the application using its GUI before doing so using the CLI. For example, use Chrome RDP with an https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine[X Window System]. After you've opened the application the first time using the GUI, you can start it by using `sudo lms server start` in the CLI.
96+
97+
Once you've launched LM Studio:
98+
99+
1. Go to LM Studio's Search window.
100+
2. Search for an LLM (for example, `Mistral-Nemo-Instruct-2407`). Your chosen model must include `instruct` in its name in order to work with Elastic.
101+
3. After you find a model, view download options and select a recommended version (green). For best performance, select one with the thumbs-up icon that indicates good performance on your hardware.
102+
4. Download one or more models.
103+
104+
IMPORTANT: For security reasons, before downloading a model, verify that it is from a trusted source. It can be helpful to review community feedback on the model (for example using a site like Hugging Face).
105+
106+
image::images/lms-model-select.png[The LM Studio model selection interface]
107+
108+
In this example we used https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407[`mistralai/Mistral-Nemo-Instruct-2407`]. It has 12B total parameters, a 128,000 token context window, and uses GGUF https://huggingface.co/docs/transformers/main/en/quantization/overview[quanitization]. For more information about model names and format information, refer to the following table.
109+
110+
[cols="1,1,1,1", options="header"]
111+
|===
112+
| Model Name | Parameter Size | Tokens/Context Window | Quantization Format
113+
| Name of model, sometimes with a version number.
114+
| LLMs are often compared by their number of parameters — higher numbers mean more powerful models.
115+
| Tokens are small chunks of input information. Tokens do not necessarily correspond to characters. You can use https://platform.openai.com/tokenizer[Tokenizer] to see how many tokens a given prompt might contain.
116+
| Quantization reduces overall parameters and helps the model to run faster, but reduces accuracy.
117+
| Examples: Llama, Mistral, Phi-3, Falcon.
118+
| The number of parameters is a measure of the size and the complexity of the model. The more parameters a model has, the more data it can process, learn from, generate, and predict.
119+
| The context window defines how much information the model can process at once. If the number of input tokens exceeds this limit, input gets truncated.
120+
| Specific formats for quantization vary, most models now support GPU rather than CPU offloading.
121+
|===
122+
123+
[discrete]
124+
== Load a model in LM Studio
125+
126+
After downloading a model, load it in LM Studio using the GUI or LM Studio's https://lmstudio.ai/blog/lms[CLI tool].
127+
128+
[discrete]
129+
=== Option 1: load a model using the CLI (Recommended)
130+
131+
It is a best practice to download models from the marketplace using the GUI, and then load or unload them using the CLI. The GUI allows you to search for models, whereas the CLI allows you to use `lms get` to search for models. The CLI provides a good interface for loading and unloading.
132+
133+
Once you've downloaded a model, use the following commands in your CLI:
134+
135+
1. Verify LM Studio is installed: `lms`
136+
2. Check LM Studio's status: `lms status`
137+
3. List all downloaded models: `lms ls`
138+
4. Load a model: `lms load`.
139+
140+
image::images/lms-cli-welcome.png[The CLI interface during execution of initial LM Studio commands]
141+
142+
After the model loads, you should see a `Model loaded successfully` message in the CLI.
143+
144+
image::images/lms-studio-model-loaded-msg.png[The CLI message that appears after a model loads]
145+
146+
To verify which model is loaded, use the `lms ps` command.
147+
148+
image::images/lms-ps-command.png[The CLI message that appears after running lms ps]
149+
150+
If your model uses NVIDIA drivers, you can check the GPU performance with the `sudo nvidia-smi` command.
151+
152+
[discrete]
153+
=== Option 2: load a model using the GUI
154+
155+
Refer to the following video to see how to load a model using LM Studio's GUI. You can change the **port** setting, which is referenced in the Nginx configuration file. Note that the **GPU offload** was set to **Max**.
156+
157+
=======
158+
++++
159+
<script type="text/javascript" async src="https://play.vidyard.com/embed/v4.js"></script>
160+
<img
161+
style="width: 100%; margin: auto; display: block;"
162+
class="vidyard-player-embed"
163+
src="https://play.vidyard.com/c4AxH8d9tWMnwNp5J6bcfX.jpg"
164+
data-uuid="c4AxH8d9tWMnwNp5J6bcfX"
165+
data-v="4"
166+
data-type="inline"
167+
/>
168+
</br>
169+
++++
170+
=======
171+
172+
[discrete]
173+
== (Optional) Collect logs using Elastic's Custom Logs integration
174+
175+
You can monitor the performance of the host running LM Studio using Elastic's {integrations-docs}/log[Custom Logs integration]. This can also help with troubleshooting. Note that the default path for LM Studio logs is `/tmp/lmstudio-server-log.txt`, as in the following screenshot:
176+
177+
image::images/lms-custom-logs-config.png[The configuration window for the custom logs integration]
178+
179+
[discrete]
180+
== Configure the connector in your Elastic deployment
181+
182+
Finally, configure the connector:
183+
184+
1. Log in to your Elastic deployment.
185+
2. Find the **Connectors** page in the navigation menu or use the {kibana-ref}/introduction.html#kibana-navigation-search[global search field]. Then click **Create Connector**, and select **OpenAI**. The OpenAI connector enables this use case because LM Studio uses the OpenAI SDK.
186+
3. Name your connector to help keep track of the model version you are using.
187+
4. Under **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
188+
5. Under **URL**, enter the domain name specified in your Nginx configuration file, followed by `/v1/chat/completions`.
189+
6. Under **Default model**, enter `local-model`.
190+
7. Under **API key**, enter the secret token specified in your Nginx configuration file.
191+
8. Click **Save**.
192+
193+
image::images/lms-edit-connector.png[The Edit connector page in the {security-app}, with appropriate values populated]
194+
195+
Setup is now complete. You can use the model you've loaded in LM Studio to power Elastic's generative AI features. You can test a variety of models as you interact with AI Assistant to see what works best without having to update your connector.
196+
197+
NOTE: While local models work well for <<security-ai-assistant, AI Assistant>>, we recommend you use one of <<security-llm-performance-matrix, these models>> for interacting with <<attack-discovery, Attack discovery>>. As local models become more performant over time, this is likely to change.
921 KB
Loading
210 KB
Loading
266 KB
Loading
148 KB
Loading

0 commit comments

Comments
 (0)