
Commit 16e7fa4

dtrawins and ngrozae authored

updates in demos and readme (#3408)

Co-authored-by: ngrozae <[email protected]>

1 parent 48a270c · commit 16e7fa4

File tree

6 files changed: +102 −62 lines changed

Dockerfile.redhat

Lines changed: 7 additions & 7 deletions

```diff
@@ -14,13 +14,13 @@
 # limitations under the License.
 #

-ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.5
+ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6
 ARG BUILD_IMAGE=build
 ARG PKG_IMAGE=pkg
-ARG RELEASE_BASE_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:9.5
+ARG RELEASE_BASE_IMAGE=registry.access.redhat.com/ubi9/ubi-minimal:9.6

 FROM $BASE_IMAGE as base_build
-ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.5
+ARG BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6

 SHELL ["/bin/bash", "-xo", "pipefail", "-c"]

@@ -130,7 +130,7 @@ git clone https://github.com/git-lfs/git-lfs && \

 RUN python3 --version && python3 -m pip install numpy==1.21.0 --no-cache-dir

-ARG INSTALL_DRIVER_VERSION="23.22.26516"
+ARG INSTALL_DRIVER_VERSION="24.52.32224"
 # GPU testing in build img & remote tensors dependencies
 WORKDIR /usr/lib64/
 RUN ln -s libOpenCL.so.1 libOpenCL.so
@@ -158,7 +158,7 @@ RUN dnf install -y https://github.com/linux-test-project/lcov/releases/download

 ENV TF_SYSTEM_LIBS="curl"
 ENV TEST_LOG="/root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/bazel-out/test.log"
-ARG ov_source_branch=master
+ARG ov_source_branch=c01cd93e24d1cd78bfbb401eed51c08fb93e0816
 ARG ov_contrib_branch=master
 ARG ov_source_org=openvinotoolkit
 ARG ov_contrib_org=openvinotoolkit
@@ -221,7 +221,7 @@ ENV OpenVINO_DIR=/opt/intel/openvino/runtime/cmake
 ENV OPENVINO_TOKENIZERS_PATH_GENAI=/opt/intel/openvino/runtime/lib/intel64/libopenvino_tokenizers.so
 ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/runtime/lib/intel64/:/opt/opencv/lib/:/opt/intel/openvino/runtime/3rdparty/tbb/lib/

-ARG ov_tokenizers_branch=master
+ARG ov_tokenizers_branch=85be884a69f10270703f81f970a5ee596a4c8df7
 # hadolint ignore=DL3003
 RUN git clone https://github.com/openvinotoolkit/openvino_tokenizers.git /openvino_tokenizers && cd /openvino_tokenizers && git checkout $ov_tokenizers_branch && git submodule update --init --recursive
 WORKDIR /openvino_tokenizers/build
@@ -364,7 +364,7 @@ LABEL "summary"="OpenVINO(TM) Model Server"
 LABEL "description"="OpenVINO(TM) Model Server is a solution for serving AI models"
 LABEL "maintainer"="[email protected]"
 ARG INSTALL_RPMS_FROM_URL=
-ARG INSTALL_DRIVER_VERSION="23.22.26516"
+ARG INSTALL_DRIVER_VERSION="24.52.32224"
 ARG GPU=0
 ARG debug_bazel_flags=
 LABEL bazel-build-flags=${debug_bazel_flags}
```
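The version bumps in Dockerfile.redhat are all plain `ARG` defaults, so they take effect automatically but can still be pinned or overridden at build time. A minimal sketch of such an invocation, where the `ovms:redhat` tag and the `.` build context are assumptions and not part of this commit (the `--build-arg` flags merely restate the new defaults):

```shell
# Hypothetical build command; tag name and context path are assumptions.
# Overriding the ARGs is optional -- the commit already makes
# ubi9:9.6 and driver 24.52.32224 the defaults.
docker build -f Dockerfile.redhat \
  --build-arg BASE_IMAGE=registry.access.redhat.com/ubi9/ubi:9.6 \
  --build-arg INSTALL_DRIVER_VERSION="24.52.32224" \
  -t ovms:redhat .
```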

README.md

Lines changed: 10 additions & 9 deletions

```diff
@@ -11,22 +11,23 @@ Model Server hosts models and makes them accessible to software components over

 ![OVMS diagram](docs/ovms_diagram.png)

-OpenVINO&trade; Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as [TensorFlow Serving](https://github.com/tensorflow/serving) and [KServe](https://github.com/kserve/kserve) while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.
-
-In addition, there are included endpoints for generative use cases compatible with [OpenAI API and Cohere API](./docs/clients_genai.md).
+OpenVINO&trade; Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the [generative API](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_genai.html) like OpenAI and Cohere, [KServe](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_kfs.html) and [TensorFlow Serving](https://docs.openvino.ai/2025/model-server/ovms_docs_clients_tfs.html) and while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

 ![OVMS picture](docs/ovms_high_level.png)

-The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to [Preparing Model Repository](docs/models_repository.md) documentation. Model server works inside [Docker containers](docs/deploying_server.md#deploying-model-server-in-docker-container), on [Bare Metal](docs/deploying_server.md#deploying-model-server-on-baremetal-without-container), and in [Kubernetes environment](docs/deploying_server.md#deploying-model-server-in-kubernetes).
-Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](docs/ovms_quickstart.md) or [LLM QuickStart guide](./docs/llm/quickstart.md).
+The models used by the server can be stored locally, hosted remotely by object storage services or pulled from HuggingFace Hub. For more details, refer to [Preparing Model Repository](https://docs.openvino.ai/2025/model-server/ovms_docs_models_repository.html) and [Deployment](https://docs.openvino.ai/2025/model-server/ovms_docs_deploying_server.html) documentation.
+Model server works inside Docker containers, Bare Metal and in Kubernetes environment.
+
+Start using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](https://docs.openvino.ai/2025/model-server/ovms_docs_quick_start_guide.html) or [LLM QuickStart guide](https://docs.openvino.ai/2025/model-server/ovms_docs_llm_quickstart.html).

 Read [release notes](https://github.com/openvinotoolkit/model_server/releases) to find out what’s new.

 ### Key features:
-- **[NEW]** Native Windows support. Check updated [deployment guide](./docs/deploying_server.md)
-- **[NEW]** [Text Embeddings compatible with OpenAI API](demos/embeddings/README.md)
-- **[NEW]** [Reranking compatible with Cohere API](demos/rerank/README.md)
-- **[NEW]** [Efficient Text Generation via OpenAI API](demos/continuous_batching/README.md)
+- **[NEW]** [Image generation compatible with OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_image_generation.html)
+- **[NEW]** Native Windows support. Check updated [deployment guide](https://docs.openvino.ai/2025/model-server/ovms_docs_deploying_server_baremetal.html)
+- **[NEW]** [Text Embeddings compatible with OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_embeddings.html)
+- **[NEW]** [Reranking compatible with Cohere API](https://docs.openvino.ai/2025/model-server/ovms_demos_rerank.html)
+- **[NEW]** [Efficient Text Generation via OpenAI API](https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html)
 - [Python code execution](docs/python_support/reference.md)
 - [gRPC streaming](docs/streaming_endpoints.md)
 - [MediaPipe graphs serving](docs/mediapipe.md)
```

demos/continuous_batching/agentic_ai/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-for

 :::{tab-item} GPU
 ```console
-python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-format int4 --config_file_path models/config.json --model_repository_path models --tools_model_type qwen3 --target_device GPU --enable_prefix_caching
+python export_model.py text_generation --source_model Qwen/Qwen3-8B --weight-format int8 --config_file_path models/config.json --model_repository_path models --tools_model_type qwen3 --target_device GPU --enable_prefix_caching --cache_size 2
 ```
 :::
````
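The added `--cache_size 2` flag bounds the server-side KV cache for continuous batching (in OVMS this value is, to my understanding, expressed in GB). As a rough sanity check on why a budget of a few GB matters on GPU, a back-of-the-envelope KV-cache estimate can be sketched; every model dimension below is an illustrative assumption, not read from the real Qwen3-8B config:

```shell
# KV-cache back-of-envelope: 2 (K and V) * layers * kv_heads * head_dim
# * bytes_per_element * cached tokens. All dimensions are illustrative
# assumptions: 36 layers, 8 KV heads of dim 128, fp16 cache, 32k tokens.
layers=36
kv_heads=8
head_dim=128
bytes_per_elem=2
tokens=32768
total=$(( 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens ))
echo "$total bytes"
# Convert to GiB with two decimals via awk.
awk -v b="$total" 'BEGIN { printf "%.2f GiB\n", b / (1024 ^ 3) }'
```

Under these hypothetical numbers a single 32k-token sequence already exceeds a 2 GB cap, which is why the cache budget is worth setting explicitly in the demo command.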

demos/continuous_batching/agentic_ai/openai_agent.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -45,7 +45,6 @@ async def run(query, agent, OVMS_MODEL_PROVIDER):
     await weather_server.connect()
     print(f"\n\nRunning: {query}")
     result = await Runner.run(starting_agent=agent, input=query, run_config=RunConfig(model_provider=OVMS_MODEL_PROVIDER, tracing_disabled=True))
-    print(result.raw_responses, dir(result))
     print(result.final_output)

```
