Minor docs fixes #3565
Merged
**Commits (17, all by dtrawins):**
- `a3c9fbe` minor fixed in demos and docs
- `797484e` fix export script for new params
- `99d2a9e` fix sdl
- `932b412` spelling
- `58ab619` spelling
- `0b0364a` Apply suggestions from code review
- `5991053` fix model export for image generation
- `cabf81b` Merge branch 'minor-docs-fixes' of https://github.com/openvinotoolkit…
- `314b60d` add auto_gptq to export script
- `98a4c8c` fix installing auto_gptq
- `44b1e36` fix prepare model on windows
- `739719f` restore quen3 tool call type
- `fc75c8c` fix embedding demo command
- `44fce55` fix building python
- `49412f9` fix docs automation
- `d073fb0` refresh command help in readme
- `64e883b` spelling
File renamed without changes.
```diff
@@ -1,5 +1,7 @@
 # Agentic AI with OpenVINO Model Server {#ovms_demos_continuous_batching_agent}

+This demo requires OVMS version 2025.3. Until that release is published, build it from [source](../../../docs/build_from_source.md).
+
 OpenVINO Model Server can be used to serve language models for AI agents. It supports the use of tools in the context of content generation.
 It can be integrated with MCP servers and AI agent frameworks.
 You can learn more about [tool calling based on the OpenAI API](https://platform.openai.com/docs/guides/function-calling?api-mode=responses)
```
```diff
@@ -10,10 +12,14 @@ Here are presented required steps to deploy language models trained for tools su
 The application, which employs the OpenAI Agents SDK, uses an MCP server equipped with a set of tools that provide context for content generation.
 The tools can also be used for automation based on input in text format.

 ## Export LLM model
 Currently supported models:
 - Qwen/Qwen3-8B
 - Qwen/Qwen3-4B
 - meta-llama/Llama-3.1-8B-Instruct
 - meta-llama/Llama-3.2-3B-Instruct
 - NousResearch/Hermes-3-Llama-3.1-8B
 - microsoft/Phi-4-mini-instruct
```
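As background for how an agent framework consumes a tools-capable model, here is a minimal sketch of the tool-call round trip defined by the OpenAI API: the model emits a `tool_calls` entry, the application runs the named function, and the result goes back as a `tool` role message. The `get_weather` tool is an illustrative assumption, not part of the demo.

```python
import json

# Hypothetical tool an MCP server might expose; name and behavior are
# illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one OpenAI-style tool call and build the reply message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    # The result is returned to the model as a 'tool' role message.
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}

# A tool call as it would appear in a model response following the OpenAI API.
call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Warsaw"}'},
}
print(dispatch_tool_call(call))
```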
```diff
@@ -23,7 +29,7 @@ The model response with tool call follow a specific syntax which is process by a
 Download the export script, install its dependencies, and create a directory for the models:
 ```console
 curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
-pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
+pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/export_models/requirements.txt
 mkdir models
 ```
```

> **Review comment:** if demo requires 2025.3 ovms version shouldn't we use export_models from main?

Run the `export_model.py` script to download and quantize the model:
```diff
@@ -47,7 +53,13 @@ python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-for
 ::::

 You can use similar commands for different models. Change the source_model and the tools_model_type (note that as of today the available types are: `[phi4, llama3, qwen3, hermes3]`).
-> **Note:** The tuned chat template will be copied to the model folder as template.jinja and the response parser will be set in the graph.pbtxt
+> **Note:** Some models give more reliable responses with a tuned chat template. Copy a custom template to the model folder as shown below:
+```
+curl -L -o models/meta-llama/Llama-3.1-8B-Instruct/template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples/tool_chat_template_llama3.1_json.jinja
+curl -L -o models/meta-llama/Llama-3.2-3B-Instruct/template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples/tool_chat_template_llama3.2_json.jinja
+curl -L -o models/NousResearch/Hermes-3-Llama-3.1-8B/template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples/tool_chat_template_hermes.jinja
+curl -L -o models/microsoft/Phi-4-mini-instruct/template.jinja https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples/tool_chat_template_phi4_mini.jinja
+```
```
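The four `curl` commands in the diff above follow one pattern: fetch a vLLM tool-chat template and save it as `template.jinja` inside the model folder. A sketch of that mapping, with the URLs copied from those commands:

```python
from pathlib import Path

# vLLM chat-template URLs from the curl commands, keyed by model name.
VLLM = "https://raw.githubusercontent.com/vllm-project/vllm/refs/tags/v0.9.0/examples"
TEMPLATES = {
    "meta-llama/Llama-3.1-8B-Instruct": f"{VLLM}/tool_chat_template_llama3.1_json.jinja",
    "meta-llama/Llama-3.2-3B-Instruct": f"{VLLM}/tool_chat_template_llama3.2_json.jinja",
    "NousResearch/Hermes-3-Llama-3.1-8B": f"{VLLM}/tool_chat_template_hermes.jinja",
    "microsoft/Phi-4-mini-instruct": f"{VLLM}/tool_chat_template_phi4_mini.jinja",
}

def template_destination(model: str, models_dir: str = "models") -> Path:
    # Each template lands next to the exported model as template.jinja.
    return Path(models_dir) / model / "template.jinja"

for model in TEMPLATES:
    print(template_destination(model))
```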
```diff
 ## Start OVMS

@@ -74,7 +86,7 @@ In case you want to use GPU device to run the generation, add extra docker param
 to `docker run` command, use the image with GPU support. Export the models with precision matching the GPU capacity and adjust the pipeline configuration.
 It can be applied using the commands below:
 ```bash
-docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:ro openvino/model_server:2025.2-gpu \
+docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:ro openvino/model_server:latest-gpu \
 --rest_port 8000 --model_path /models/Qwen/Qwen3-8B --model_name Qwen/Qwen3-8B
 ```
 :::
```
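Once the server is up, a client exercises tool calling through the OpenAI-compatible chat completions API. A sketch of such a request payload, built but not sent (the `get_weather` tool schema is an illustrative assumption; only the model name comes from the commands above):

```python
import json

# OpenAI-style chat completions payload with one tool definition.
# The model name matches the --model_name used when starting OVMS.
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is the weather in Warsaw?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not part of the demo
            "description": "Return current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(payload, indent=2))
```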
> **Review comment:** demos/image_generation/README.md also uses 2025.2 ovms version. maybe update there too?