Conversation


@benironside benironside commented Nov 7, 2025

Resolves #3474 by creating a tutorial for how to connect a custom LLM running in vLLM to Elastic.

Technical reviewers, I left a few questions for you in comments. Also:

  • Has this been tested with the Obs/Search Assistant, or are these instructions security-only?
  • Is this supported in v9.0+?
  • @dhru42 I could use some insight into how the use-case for this guide differs from the existing self-managed LLM guide

@benironside benironside self-assigned this Nov 7, 2025

github-actions bot commented Nov 7, 2025


1. Configure your host server with the necessary GPU resources.
2. Run the desired model in a vLLM container.
3. Use a reverse proxy like Nginx to securely expose the endpoint to {{ecloud}}.
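
To make step 3 more concrete, here is a minimal sketch of an Nginx server block, assuming vLLM is listening on localhost port 8000 and TLS certificates are already in place. The hostname, certificate paths, and API-key check are placeholders for illustration, not part of the official guide:

```nginx
# /etc/nginx/conf.d/vllm.conf (illustrative only; adjust names, paths, and auth to your environment)
server {
    listen 443 ssl;
    server_name llm.example.com;                          # placeholder hostname

    ssl_certificate     /etc/nginx/certs/fullchain.pem;   # placeholder certificate paths
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location /v1/ {
        # Forward requests to the vLLM OpenAI-compatible server on the same host
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Optional: require a shared secret so only your Elastic deployment can reach the endpoint
        # if ($http_x_api_key != "YOUR_SECRET") { return 401; }
    }
}
```
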
benironside (Contributor Author) commented:
Is it just Elastic Cloud that this works with? Not other deployment types?

1. When you want to invoke a tool, never describe the call in text.
2. Always return the invocation in the `tool_calls` field.
3. The `content` field must remain empty for any assistant message that performs a tool call.
4. Only use tool calls defined in the "tools" parameter.
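
To illustrate rules 2 and 3, this is roughly what a conforming assistant message looks like in the OpenAI-compatible chat format that vLLM serves; the tool name and arguments here are made up for the example:

```json
{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "example_search_tool",
        "arguments": "{\"query\": \"failed logins in the last 24 hours\"}"
      }
    }
  ]
}
```
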
benironside (Contributor Author) commented:

Note to self: Following https://github.com/elastic/sdh-security-team/issues/1417 to confirm if this system prompt fix works

A reviewer replied:
Since 9.1.7 it seems this is no longer needed, but we can keep it until we change the recommended model. More important in this case is to make sure they add

feature_flags.overrides:
  securitySolution.inferenceChatModelDisabled: true

to config/kibana.yml, otherwise Mistral is not going to work with the Security Assistant (more details in the linked SDH above).
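
For anyone following along, this is how the override might sit in config/kibana.yml (shown on its own here; merge it with any existing feature_flags.overrides block in your file):

```yaml
# config/kibana.yml
# Disables the inference chat model path so the Security Assistant works with Mistral
# (see the linked SDH issue for details)
feature_flags.overrides:
  securitySolution.inferenceChatModelDisabled: true
```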

dhru42 commented Nov 10, 2025

> @dhru42 I could use some insight into how the use-case for this guide differs from the existing self-managed LLM guide

Can we make the existing page generic, then link to two methods?

  1. Connect to your own local LLM with LM Studio (exists already)
  2. Connect to your own local LLM with vLLM (the Google doc I shared)

benironside (Contributor Author) replied:

> @dhru42 I could use some insight into how the use-case for this guide differs from the existing self-managed LLM guide
>
> Can we make the existing page generic, then link to two methods?
>
> 1. Connect to your own local LLM with LM Studio (exists already)
> 2. Connect to your own local LLM with vLLM ([the Google doc I shared](https://docs.google.com/document/d/1pGKBECl6T4LdFhctAWURRZJrdEC8qKRVWz0bWnylN4s/edit?usp=sharing))

Yeah, that makes sense. @dhru42 I'm still curious to better understand the different use cases that each option addresses. How should users pick which one to set up?

After docs on-week (this week), I'll work on it and sync up with Patryk and/or Garrett to discuss details.

dhru42 commented Nov 12, 2025

@benironside there's a formatting issue. Could you ensure that all the steps are reflected as shown in the docs? Otherwise it LGTM.

[screenshot showing the formatting issue]

@benironside benironside requested a review from spong November 14, 2025 23:08
@benironside benironside marked this pull request as ready for review November 19, 2025 21:20
@benironside benironside requested review from a team as code owners November 19, 2025 21:20
2. Run the following terminal command to start the vLLM server, download the model, and expose it on port 8000:

```bash
docker run --name Mistral-Small-3.2-24B --gpus all \
```
A reviewer commented:
Is this something we will be able to update shortly? I mean, we should avoid recommending Mistral-Small-3.2-24B as it has a lot of issues with Security Assistant tool calling.

benironside (Contributor Author) replied:
We can update this any time. For now, since this model isn't recommended, I replaced it with [YOUR_MODEL_ID]. Make sense to you?

The reviewer replied:
I think it's going to be less confusing if we stick with the previous version and just update it with a new model, because the list of params depends on the model ID.
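
For reference, here is a sketch of what the full command could look like once a recommended model is settled on. The container name, port, volume mount, and model ID below are illustrative placeholders, and the exact flags will depend on the chosen model:

```bash
# Illustrative only: substitute the recommended model ID and any model-specific flags
docker run --name vllm-server --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HUGGING_FACE_HUB_TOKEN=YOUR_HF_TOKEN \
  vllm/vllm-openai:latest \
  --model YOUR_MODEL_ID

# Quick check that the OpenAI-compatible endpoint is up
curl http://localhost:8000/v1/models
```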


github-actions bot commented Nov 24, 2025

Vale Linting Results

Summary: 3 suggestions found

💡 Suggestions (3)
| File | Line | Rule | Message |
|------|------|------|---------|
| solutions/security/ai/connect-to-own-local-llm.md | 14 | Elastic.Capitalization | 'Connect to your own local LLM using LM Studio' should use sentence-style capitalization. |
| solutions/security/ai/connect-to-vLLM.md | 21 | Elastic.Capitalization | 'Connect vLLM to' should use sentence-style capitalization. |
| solutions/security/ai/connect-to-vLLM.md | 93 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |

@bmorelli25 bmorelli25 (Member) left a comment:

This is a good start, but right now the page feels like it's trying to be a guide and an example. If you pick a single type of content, it'll be more useful and easier to follow. I think you should structure this page as a guide, similar to these:

The example content, like the Server info, is probably still useful, but could be folded into the relevant step.

Thoughts?

@bmorelli25 bmorelli25 (Member) left a comment:

This looks great! Just two more small comments for your consideration.

🚢 🚢

@benironside benironside enabled auto-merge (squash) November 25, 2025 21:39
@benironside benironside disabled auto-merge November 25, 2025 21:39
@benironside benironside enabled auto-merge (squash) November 25, 2025 21:57
@benironside benironside merged commit 59c7441 into main Nov 25, 2025
7 of 8 checks passed
@benironside benironside deleted the 3474-vLLM-guide branch November 25, 2025 21:59