-*Note:*
-This functionality is a work in progress
-
-# Pulling the models {#ovms_pul}
-
-There is a special mode to make OVMS pull the model from Hugging Face before starting the service:
-
-```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --task <task> --task_params <task_params>
-```
-
-| option | description |
-|---------------------------|-----------------------------------------------------------------------------------------------|
-| `--pull` | Instructs the server to run in pulling mode to get the model from the Hugging Face repository |
-| `--source_model` | Specifies the model name in the Hugging Face model repository (optional - if empty model_name is used) |
-| `--model_repository_path` | Directory where all required model files will be saved |
-| `--model_name` | Name of the model as exposed externally by the server |
-| `--task` | Defines the task the model will support (e.g., text_generation/embedding, rerank, etc.) |
-| `--task_params` | Task-specific parameters in a format to be determined (TBD FIXME) |
-
+# OVMS Pull mode {#ovms_docs_pull}

-It will prepare all needed configuration files to support LLMS with OVMS in model repository
+This document describes how to leverage the OpenVINO Model Server (OVMS) pull feature to automate deployment configuration for Generative AI models from the [OpenVINO organization](https://huggingface.co/OpenVINO) on Hugging Face (HF). If the model is not from that organization, follow the steps described in [this document](../demos/common/export_models/README.md).

-# Starting the mediapipe graph or LLM models
-Now you can start server with single mediapipe graph, or LLM model that is already present in local filesystem with:
+## Pulling the models

-```
-docker run -d --rm -v <model_repository_path>:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
---model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 8000
-```
-
-Server will detect the type of requested servable (model or mediapipe graph) and load it accordingly. This detection is based on the presence of a `.pbtxt` file, which defines the Mediapipe graph structure.
-
-*Note*: There is no online model modification nor versioning capability as of now for graphs, LLM like models.
-
-# Starting the LLM model from HF directly
+There is a special mode to make OVMS pull the model from Hugging Face before starting the service:

-In case you do not want to prepare model repository before starting the server in one command you can run OVMS with:
+::::{tab-set}
+:::{tab-item} With Docker
+:sync: docker
+**Required:** Docker Engine installed

+```text
+docker run --user $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
 ```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest --source_model <model_name_in_HF> --model_repository_path /models --model_name <ovms_servable_name> --task <task> --task_params <task_params>
-```
-
-It will download required model files, prepare configuration for OVMS and start serving the model.
-
-# Starting the LLM model from local storage
+:::

-In case you have predownloaded the model files from HF but you lack OVMS configuration files you can start OVMS with
-```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest --source_model <model_name_in_HF> --model_repository_path <path_where_to_store_ovms_config_files> --model_name <external_model_name> --task <task> --task_params <task_params>
-```
-This command will create graph.pbtxt in the ```model_repository_path/source_model``` path.
-
-# Simplified mediapipe graphs and LLM models loading
-
-Now there is an easier way to specify LLM configurations in `config.json`. In the `model_config` section, it is sufficient to specify `model_name` and `base_path`, and the server will detect if there is a graph configuration file (`.pbtxt`) present and load the servable accordingly.
-
-For example, the `model_config` section in `config.json` could look like this:
-
-```json
-{
-    "model_config_list": [
-        {
-            "config": {
-                "name": "text_generation_model",
-                "base_path": "/models/text_generation_model"
-            }
-        },
-        {
-            "config": {
-                "name": "embedding_model",
-                "base_path": "/models/embedding_model"
-            }
-        },
-        {
-            "config": {
-                "name": "mediapipe_graph",
-                "base_path": "/models/mediapipe_graph"
-            }
-        }
-    ]
-}
-```
-# List models
+:::{tab-item} On Baremetal Host
+:sync: baremetal
+**Required:** OpenVINO Model Server package - see [deployment instructions](../deploying_server_baremetal.md) for details.

-To check what models are servable from specified model repository:
-```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest \
---model_repository_path /models --list_models
+```text
+ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]
 ```
+:::
+::::

-For following directory structure:
-```
-/models
-├── meta
-│   ├── llama4
-│   │   └── graph.pbtxt
-│   ├── llama3.1
-│   │   └── graph.pbtxt
-├── LLama3.2
-│   └── graph.pbtxt
-└── resnet
-    └── 1
-        └── saved_model.pb
-```
+Example for pulling `OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov`:

-The output would be:
-```
-meta/llama4
-meta/llama3.1
-LLama3.2
-resnet
+```text
+ovms --pull --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --target_device CPU --task text_generation
 ```
+::::{tab-set}
+:::{tab-item} With Docker
+:sync: docker
+**Required:** Docker Engine installed

-# Enable model
-
-To add model to ovms configuration file with specific model use either:
-
+```text
+docker run --user $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation
 ```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest \
---model_repository_path /models/<model_path> --add_to_config <config_file_directory_path> --model_name <name>
-```
-
-When model is directly inside `/models`.
+:::

-Or
+:::{tab-item} On Baremetal Host
+:sync: baremetal
+**Required:** OpenVINO Model Server package - see [deployment instructions](../deploying_server_baremetal.md) for details.

+```text
+ovms --pull --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation
 ```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest \
---add_to_config <config_file_directory_path> --model_name <name> --model_path <model_path>
-```
-when there is no model_repository specified.
+:::
+::::

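+After a successful pull, the model repository contains the downloaded model files together with the generated serving configuration. As a rough sketch (illustrative only; the exact set of files depends on the model and the task, and `graph.pbtxt` generation is described earlier in this repository's docs), the repository could look like:
+
+```text
+/models
+└── Phi-3-mini-FastDraft-50M-int8-ov
+    ├── graph.pbtxt
+    ├── openvino_model.xml
+    ├── openvino_model.bin
+    └── ...
+```
+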
-## TIP: Use relative paths to make the config.json transferable in model_repository across ovms instances.
-For example:
-```
-cd model_repository_path
-ovms --add_to_config . --model_name OpenVINO/DeepSeek-R1-Distill-Qwen-1.5B-int4-ov --model_repository_path .
-```

-# Disable model
+It will prepare all the configuration files needed to serve LLMs with OVMS in the model repository. Check the [parameters page](./parameters.md) for detailed descriptions of configuration options and parameter usage.

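+For illustration, the generated `config.json` may look similar to the sketch below (it follows the simplified `model_config` format; the actual content depends on the pulled model and the selected task):
+
+```json
+{
+    "model_config_list": [
+        {
+            "config": {
+                "name": "Phi-3-mini-FastDraft-50M-int8-ov",
+                "base_path": "/models/Phi-3-mini-FastDraft-50M-int8-ov"
+            }
+        }
+    ]
+}
+```
+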
-If you want to remove model from configuration file you can do it either manually or use command:
+In case you want to set up the model and start the server in one step, follow the instructions on [this page](./starting_server.md).

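+As a rough sketch, that one-step flow reuses the same parameters without `--pull` and adds the serving port (see the linked page for the authoritative form):
+
+```text
+docker run --user $(id -u):$(id -g) -d --rm -v <model_repository_path>:/models:rw -p 8000:8000 openvino/model_server:latest --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation --rest_port 8000
+```
+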
-```
-docker run -d --rm -v <model_repository_path>:/models openvino/model_server:latest \
---remove_from_config <config_file_directory_path> --model_name <name>
-```
-
-FIXME TODO TBD
-- adjust existing documentation to link with this doc
-- task, task_params to be updated explained
+*Note:*
+When using pull mode, you need both read and write access rights to the model repository.
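+
+For example, one way to make sure the host directory is writable before mounting it (illustrative; adjust ownership and permissions to your environment):
+
+```text
+mkdir -p <model_repository_path>
+chmod -R u+rw <model_repository_path>
+```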