Commit 106e1e2

odh-31309 move model-serving runtime ref info to kbase article (#902)

1 parent 45de92a, commit 106e1e2

File tree

3 files changed: +7 / -96 lines

modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc

Lines changed: 2 additions & 2 deletions
@@ -365,8 +365,8 @@ The *Serving runtimes* page opens and shows the updated list of runtimes that ar
 [role='_additional-resources']
 .Additional resources
 ifndef::upstream[]
-* link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#tested-and-verified-model-serving-runtimes_serving-large-models[Tested and verified model-serving runtimes]
-endif::[]
+* link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#tested-verified-runtimes_serving-large-models[Tested and verified model-serving runtimes]
+endif::[]
 ifdef::upstream[]
 * link:{odhdocshome}/serving-models/#tested-verified-runtimes_serving-large-models[Tested and verified model-serving runtimes]
 endif::[]

modules/ref-supported-runtimes.adoc

Lines changed: 3 additions & 68 deletions
@@ -4,8 +4,11 @@
 = Supported model-serving runtimes

 [role='_abstract']
+
 {productname-short} includes several preinstalled model-serving runtimes. You can use preinstalled model-serving runtimes to start serving models without modifying or defining the runtime yourself. You can also add a custom runtime to support a model.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of the supported model-serving runtimes and deployment requirements.
+
 ifdef::upstream[]
 For help adding a custom runtime, see link:{odhdocshome}/serving-models/#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
 endif::[]
@@ -14,74 +17,6 @@ ifndef::upstream[]
 For help adding a custom runtime, see link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
 endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe (1)| A composite runtime for serving models in the Caikit format | Caikit Text Generation
-
-| Caikit Standalone ServingRuntime for KServe (2) | A runtime for serving models in the Caikit embeddings format for embeddings tasks | Caikit Embeddings
-
-| OpenVINO Model Server | A scalable, high-performance runtime for serving models that are optimized for Intel architectures | PyTorch, TensorFlow, OpenVINO IR, PaddlePaddle, MXNet, Caffe, Kaldi
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe (3) | A runtime for serving TGI-enabled models | PyTorch Model Formats
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime for large language models that supports NVIDIA GPU accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports Intel Gaudi accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM AMD GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports AMD GPU accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM CPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports IBM Power (ppc64le) and IBM Z (s390x).| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-|===
-
-ifdef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-ifndef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additonal protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| Caikit Standalone ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| OpenVINO Model Server | REST | None | Yes | Yes | Raw and serverless
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe | gRPC | None | No | Yes | Raw and serverless
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM AMD GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM CPU ServingRuntime for KServe[1] | REST | None | No | Yes | Raw
-
-|===
-
-footnote:[vLLM CPU ServingRuntime for KServe] If you are using IBM Z and IBM Power architecture, you can only deploy models in standard deployment mode.
-
 [role="_additional-resources"]
 .Additional resources
 ifdef::upstream[]
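
The tables removed above listed ServingRuntime definitions for KServe. For orientation, the following is a minimal, hypothetical sketch of a ServingRuntime custom resource of the kind those tables described; the runtime name, container image, arguments, and model format are illustrative assumptions, not values taken from this commit or shipped with the product.

[source,yaml]
----
# Hypothetical example for illustration only; the runtime name, container image,
# arguments, and model format are placeholders, not product defaults.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-custom-runtime            # illustrative runtime name
spec:
  supportedModelFormats:
    - name: onnx                          # model format this runtime advertises
      autoSelect: true
  protocolVersions:
    - v2                                  # inference protocol exposed by the runtime
  containers:
    - name: kserve-container
      image: registry.example.com/example/runtime:latest   # placeholder image
      args:
        - --model_path=/mnt/models        # placeholder argument
      ports:
        - containerPort: 8080
          protocol: TCP
----

Adding a custom runtime of this form is covered by the "Adding a custom model-serving runtime for the single-model serving platform" sections linked in the diff above.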

modules/ref-tested-verified-runtimes.adoc

Lines changed: 2 additions & 26 deletions
@@ -9,6 +9,8 @@ Tested and verified runtimes are community versions of model-serving runtimes th

 {org-name} tests the current version of a tested and verified runtime each time there is a new version of {productname-short}. If a new version of a tested and verified runtime is released in the middle of an {productname-short} release cycle, it will be tested and verified in an upcoming release.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of tested and verified runtimes in {productname-short}.
+
 [NOTE]
 --
 Tested and verified runtimes are not directly supported by {org-name}. You are responsible for ensuring that you are licensed to use any tested and verified runtimes that you add, and for correctly configuring and maintaining them.
@@ -18,33 +20,7 @@ ifndef::upstream[]
 For more information, see link:https://access.redhat.com/articles/7089743[Tested and verified runtimes in {productname-short}].
 endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| NVIDIA Triton Inference Server | An open-source inference-serving software for fast and scalable AI in applications. | TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more
-| Seldon MLServer | An open-source inference server designed to simplify the deployment of machine learning models. | Scikit-Learn (sklearn), XGBoost, LightGBM, CatBoost, HuggingFace and MLflow
-
-|===
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additional protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| NVIDIA Triton Inference Server | gRPC | REST | Yes | Yes | Raw and serverless
-| Seldon MLServer | gRPC | REST | No | Yes | Raw and serverless
-
-|===
-
-
-[NOTE]
---
-The `alibi-detect` and `alibi-explain` libraries from Seldon are under the Business Source License 1.1 (BSL 1.1). These libraries are not tested, verified, or supported by {org-name} as part of the certified *Seldon MLServer* runtime. It is not recommended that you use these libraries in production environments with the runtime.
---

-[role="_additional-resources"]
 .Additional resources
 ifdef::upstream[]
 * link:{odhdocshome}/serving-models/#inference-endpoints_serving-large-models[Inference endpoints]
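
For context on how the runtimes described in the removed tables are consumed, a model is deployed against a runtime through a KServe InferenceService. The following is a minimal sketch, assuming a runtime named `example-custom-runtime` (such as the hypothetical one above) has already been added and the model is stored in an S3-compatible bucket; the resource name, model format, runtime name, and storage URI are all hypothetical.

[source,yaml]
----
# Hypothetical example for illustration only; names and the storage URI are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                       # format of the exported model
      runtime: example-custom-runtime    # ServingRuntime to serve the model with
      storageUri: s3://example-bucket/models/example-model   # placeholder model location
----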
