vLLM custom connector setup guide #3858
Conversation
1. Configure your host server with the necessary GPU resources.
2. Run the desired model in a vLLM container.
3. Use a reverse proxy like Nginx to securely expose the endpoint to {{ecloud}}.
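For step 3, a minimal Nginx sketch might look like the following. This is illustrative only, not from the guide under review: the hostname, certificate paths, and upstream port are placeholder assumptions (vLLM listening on localhost:8000).

```nginx
# Minimal sketch: terminate TLS and proxy requests to a local vLLM server.
# server_name, cert paths, and the upstream address are placeholders.
server {
    listen 443 ssl;
    server_name vllm.example.com;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location /v1/ {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_read_timeout 300s;  # generation responses can take a while
    }
}
```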
Is it just Elastic Cloud that this works with? Not other deployment types?
1. When you want to invoke a tool, never describe the call in text.
2. Always return the invocation in the `tool_calls` field.
3. The `content` field must remain empty for any assistant message that performs a tool call.
4. Only use tool calls defined in the "tools" parameter.
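To make the quoted system-prompt rules concrete, here is a small sketch of a validator that checks whether an assistant message follows them. The function name and message shape are my own illustration (OpenAI-style chat message dicts), not code from the PR.

```python
# Hypothetical illustration of the system-prompt rules above:
# a tool-calling assistant message must carry the invocation in
# `tool_calls`, leave `content` empty, and only use allowed tools.
def is_valid_tool_call_message(msg: dict, allowed_tools: set) -> bool:
    calls = msg.get("tool_calls") or []
    if not calls:                # rule 2: invocation must be in tool_calls
        return False
    if msg.get("content"):       # rules 1 and 3: no text alongside a call
        return False
    # rule 4: every call must name a tool from the "tools" parameter
    return all(c["function"]["name"] in allowed_tools for c in calls)
```

A message like `{"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "search"}}]}` passes, while one that narrates the call in `content` fails.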
Note to self: Following https://github.com/elastic/sdh-security-team/issues/1417 to confirm if this system prompt fix works
Since 9.1.7 it seems this is no longer needed, but we can keep it until we change the recommended model. More important in this case is to make sure they add the following to `config/kibana.yml`, otherwise Mistral is not going to work with Security Assistant (more details in the linked SDH above):

```yaml
feature_flags.overrides:
  securitySolution.inferenceChatModelDisabled: true
```
Can we make the existing page generic, then link to two methods:
Yeah, that makes sense. @dhru42 I'm still curious to better understand the different use cases each option addresses. How should users pick which one to set up? After docs on-week (this week), I'll work on it and sync up with Patryk and/or Garrett to discuss details.
@benironside there's a formatting issue. Could you ensure that all the steps are reflected as shown in the docs? Otherwise it LGTM.
2. Run the following terminal command to start the vLLM server, download the model, and expose it on port 8000:

   ```bash
   docker run --name Mistral-Small-3.2-24B --gpus all \
   ```
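The quoted diff excerpt is truncated after the first line of the `docker run` command. For context, a typical vLLM OpenAI-compatible server invocation looks roughly like the following; the model ID, port mapping, and API key are placeholders, not the values from the guide under review.

```shell
# Illustrative only: a common shape for running vLLM's
# OpenAI-compatible server in Docker. All values are placeholders.
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model YOUR_MODEL_ID \
  --api-key YOUR_API_KEY
```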
Is this something we will be able to update shortly? I mean, we should avoid recommending Mistral-Small-3.2-24B, as it has a lot of issues with Security Assistant tool calling.
We can update this any time. For now, since this model isn't recommended, I replaced it with [YOUR_MODEL_ID]. Make sense to you?
I think it's going to be less confusing if we stick with the previous version and just update it with a new model, because the list of params depends on the model ID.
Vale Linting Results: 3 suggestions found 💡
bmorelli25 left a comment:
This is a good start, but right now the page feels like it's trying to be a guide and an example. If you pick a single type of content, it'll be more useful and easier to follow. I think you should structure this page as a guide, similar to these:
- https://www.elastic.co/docs/solutions/search/get-started/semantic-search
- https://www.elastic.co/docs/manage-data/data-store/data-streams/quickstart-tsds
The example content, like the server info, is probably still useful, but could be folded into the relevant step.
Thoughts?
bmorelli25 left a comment:
This looks great! Just two more small comments for your consideration.
🚢 🚢
Co-authored-by: Brandon Morelli <[email protected]>
Resolves #3474 by creating a tutorial for how to connect a custom LLM running in vLLM to Elastic.
Technical reviewers, I left a few questions for you in comments. Also: