
Commit 16e7fa4

dtrawins and ngrozae authored

updates in demos and readme (#3408)

Co-authored-by: ngrozae <[email protected]>

1 parent 48a270c · commit 16e7fa4

File tree

6 files changed: +102 −62 lines changed

Dockerfile.redhat

Lines changed: 7 additions & 7 deletions

```diff
@@ -14,13 +14,13 @@
 # limitations under the License.
 #

-ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.5
+ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6
 ARG BUILD_IMAGE=build
 ARG PKG_IMAGE=pkg
-ARG RELEASE_BASE_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:9.5
+ARG RELEASE_BASE_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:9.6

 FROM $BASE_IMAGE as base_build
-ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.5
+ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6

 SHELL ["/bin/bash", "-xo", "pipefail", "-c"]

@@ -130,7 +130,7 @@ git clone https://github.com/git-lfs/git-lfs && \

 RUN python3 --version && python3 -m pip install numpy==1.21.0 --no-cache-dir

-ARG INSTALL_DRIVER_VERSION="23.22.26516"
+ARG INSTALL_DRIVER_VERSION="24.52.32224"
 # GPU testing in build img & remote tensors dependencies
 WORKDIR /usr/lib64/
 RUN ln -s libOpenCL.so.1 libOpenCL.so
@@ -158,7 +158,7 @@ RUN dnf install -y https://github.com/linux-test-project/lcov/releases/download

 ENV TF_SYSTEM_LIBS="curl"
 ENV TEST_LOG="/root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/bazel-out/test.log"
-ARG ov_source_branch=master
+ARG ov_source_branch=c01cd93e24d1cd78bfbb401eed51c08fb93e0816
 ARG ov_contrib_branch=master
 ARG ov_source_org=openvinotoolkit
 ARG ov_contrib_org=openvinotoolkit
@@ -221,7 +221,7 @@ ENV OpenVINO_DIR=/opt/intel/openvino/runtime/cmake
 ENV OPENVINO_TOKENIZERS_PATH_GENAI=/opt/intel/openvino/runtime/lib/intel64/libopenvino_tokenizers.so
 ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/runtime/lib/intel64/:/opt/opencv/lib/:/opt/intel/openvino/runtime/3rdparty/tbb/lib/

-ARG ov_tokenizers_branch=master
+ARG ov_tokenizers_branch=85be884a69f10270703f81f970a5ee596a4c8df7
 # hadolint ignore=DL3003
 RUN git clone https://github.com/openvinotoolkit/openvino_tokenizers.git /openvino_tokenizers && cd /openvino_tokenizers && git checkout $ov_tokenizers_branch && git submodule update --init --recursive
 WORKDIR /openvino_tokenizers/build
@@ -364,7 +364,7 @@ LABEL "summary"="OpenVINO(TM) Model Server"
 LABEL "description"="OpenVINO(TM) Model Server is a solution for serving AI models"
 LABEL "maintainer"="[email protected]"
 ARG INSTALL_RPMS_FROM_URL=
-ARG INSTALL_DRIVER_VERSION="23.22.26516"
+ARG INSTALL_DRIVER_VERSION="24.52.32224"
 ARG GPU=0
 ARG debug_bazel_flags=
 LABEL bazel-build-flags=${debug_bazel_flags}
```
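The version bumps in Dockerfile.redhat are all plain `ARG` defaults, so they take effect automatically but can still be pinned or overridden at build time. A minimal sketch of such an invocation, where the `ovms:redhat` tag and the `.` build context are assumptions and not part of this commit (the `--build-arg` flags merely restate the new defaults):

```shell
# Hypothetical build command; tag name and context path are assumptions.
# Overriding the ARGs is optional -- the commit already makes
# ubi9:9.6 and driver 24.52.32224 the defaults.
docker build -f Dockerfile.redhat \
  --build-arg BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6 \
  --build-arg INSTALL_DRIVER_VERSION="24.52.32224" \
  -t ovms:redhat .
```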

README.md

Lines changed: 10 additions & 9 deletions

```diff
@@ -11,22 +11,23 @@ Model Server hosts models and makes them accessible to software components over

 ![OVMS diagram](docs/ovms_diagram.png)

-OpenVINO&trade; Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as [TensorFlow Serving](https://github.com/tensorflow/serving) and [KServe](https://github.com/kserve/kserve) while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.
-
-In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./docs/clients_genai.md).
+OpenVINO&trade; Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the [generative API](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_genai.html) like OpenAI and Cohere, [KServe](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_kfs.html) and [TensorFlow Serving](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_tfs.html) and while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

 ![OVMS picture](docs/ovms_high_level.png)

-The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to [Preparing Model Repository](docs/models_repository.md) documentation. Model server works inside [Docker containers](docs/deploying_server.md#deploying-model-server-in-docker-container), on [Bare Metal](docs/deploying_server.md#deploying-model-server-on-baremetal-without-container), and in [Kubernetes environment](docs/deploying_server.md#deploying-model-server-in-kubernetes).
-Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./docs/llm/quickstart.md).
+The models used by the server can be stored locally, hosted remotely by object storage services or pulled from HuggingFace Hub. For more details, refer to [Preparing Model Repository](https://docs.openvino.ai/2025/model-server/ovms_docs_models_repository.html) and [Deployment](https://docs.openvino.ai/2025/model-server/ovms_docs_deploying_server.html) documentation.
+Model server works inside Docker containers, Bare Metal and in Kubernetes environment.
+
+Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](https://docs.openvino.ai/2025/model-server/ovms_docs_quick_start_guide.html) or [LLM QuickStart guide](https://docs.openvino.ai/2025/model-server/ovms_docs_llm_quickstart.html).

 Read [release notes](https://github.com/openvinotoolkit/model_server/releases) to find out what’s new.

 ### Key features:
-- **[NEW]** Native Windows support. Check updated [deployment guide](./docs/deploying_server.md)
-- **[NEW]** [Text Embeddings compatible with OpenAI API](demos/embeddings/README.md)
-- **[NEW]** [Reranking compatible with Cohere API](demos/rerank/README.md)
-- **[NEW]** [Efficient Text Generation via OpenAI API](demos/continuous_batching/README.md)
+- **[NEW]** [Image generation compatible with OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_image_generation.html)
+- **[NEW]** Native Windows support. Check updated [deployment guide](https://docs.openvino.ai/2025/model-server/ovms_docs_deploying_server_baremetal.html)
+- **[NEW]** [Text Embeddings compatible with OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_embeddings.html)
+- **[NEW]** [Reranking compatible with Cohere API](https://docs.openvino.ai/2025/model-server/ovms_demos_rerank.html)
+- **[NEW]** [Efficient Text Generation via OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html)
 - [Python code execution](docs/python_support/reference.md)
 - [gRPC streaming](docs/streaming_endpoints.md)
 - [MediaPipe graphs serving](docs/mediapipe.md)
```

demos/continuous_batching/agentic_ai/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-for

 :::{tab-item} GPU
 ```console
-python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-format int4 --config_file_path models/config.json --model_repository_path models --tools_model_type qwen3 --target_device GPU --enable_prefix_caching
+python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-format int8 --config_file_path models/config.json --model_repository_path models --tools_model_type qwen3 --target_device GPU --enable_prefix_caching --cache_size 2
 ```
 :::
````
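The added `--cache_size 2` flag bounds the server-side KV cache for continuous batching (in OVMS this value is, to my understanding, expressed in GB). As a rough sanity check on why a budget of a few GB matters on GPU, a back-of-the-envelope KV-cache estimate can be sketched; every model dimension below is an illustrative assumption, not read from the real Qwen3-8B config:

```shell
# KV-cache back-of-envelope: 2 (K and V) * layers * kv_heads * head_dim
# * bytes_per_element * cached tokens. All dimensions are illustrative
# assumptions: 36 layers, 8 KV heads of dim 128, fp16 cache, 32k tokens.
layers=36
kv_heads=8
head_dim=128
bytes_per_elem=2
tokens=32768
total=$(( 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens ))
echo "$total bytes"
# Convert to GiB with two decimals via awk.
awk -v b="$total" 'BEGIN { printf "%.2f GiB\n", b / (1024 ^ 3) }'
```

Under these hypothetical numbers a single 32k-token sequence already exceeds a 2 GB cap, which is why the cache budget is worth setting explicitly in the demo command.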

demos/continuous_batching/agentic_ai/openai_agent.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -45,7 +45,6 @@ async def run(query, agent, OVMS_MODEL_PROVIDER):
     await weather_server.connect()
     print(f"\n\nRunning: {query}")
     result = await Runner.run(starting_agent=agent, input=query, run_config=RunConfig(model_provider=OVMS_MODEL_PROVIDER, tracing_disabled=True))
-    print(result.raw_responses, dir(result))
     print(result.final_output)

```
