Commit 106e1e2

odh-31309 move model-serving runtime ref info to kbase article (#902)

1 parent 45de92a, commit 106e1e2

File tree

3 files changed: +7 / -96 lines

modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc

Lines changed: 2 additions & 2 deletions
@@ -365,8 +365,8 @@ The *Serving runtimes* page opens and shows the updated list of runtimes that ar
 [role='_additional-resources']
 .Additional resources
 ifndef::upstream[]
-* link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#tested-and-verified-model-serving-runtimes_serving-large-models[Tested and verified model-serving runtimes]
-endif::[]
+* link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#tested-verified-runtimes_serving-large-models[Tested and verified model-serving runtimes]
+endif::[]
 ifdef::upstream[]
 * link:{odhdocshome}/serving-models/#tested-verified-runtimes_serving-large-models[Tested and verified model-serving runtimes]
 endif::[]

modules/ref-supported-runtimes.adoc

Lines changed: 3 additions & 68 deletions
@@ -4,8 +4,11 @@
 = Supported model-serving runtimes

 [role='_abstract']
+
 {productname-short} includes several preinstalled model-serving runtimes. You can use preinstalled model-serving runtimes to start serving models without modifying or defining the runtime yourself. You can also add a custom runtime to support a model.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of the supported model-serving runtimes and deployment requirements.
+
 ifdef::upstream[]
 For help adding a custom runtime, see link:{odhdocshome}/serving-models/#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
 endif::[]
@@ -14,74 +17,6 @@ ifndef::upstream[]
 For help adding a custom runtime, see link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
 endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe (1)| A composite runtime for serving models in the Caikit format | Caikit Text Generation
-
-| Caikit Standalone ServingRuntime for KServe (2) | A runtime for serving models in the Caikit embeddings format for embeddings tasks | Caikit Embeddings
-
-| OpenVINO Model Server | A scalable, high-performance runtime for serving models that are optimized for Intel architectures | PyTorch, TensorFlow, OpenVINO IR, PaddlePaddle, MXNet, Caffe, Kaldi
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe (3) | A runtime for serving TGI-enabled models | PyTorch Model Formats
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime for large language models that supports NVIDIA GPU accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports Intel Gaudi accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM AMD GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports AMD GPU accelerators| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM CPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports IBM Power (ppc64le) and IBM Z (s390x).| link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-|===
-
-ifdef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-ifndef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additonal protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| Caikit Standalone ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| OpenVINO Model Server | REST | None | Yes | Yes | Raw and serverless
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe | gRPC | None | No | Yes | Raw and serverless
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM AMD GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM CPU ServingRuntime for KServe[1] | REST | None | No | Yes | Raw
-
-|===
-
-footnote:[vLLM CPU ServingRuntime for KServe] If you are using IBM Z and IBM Power architecture, you can only deploy models in standard deployment mode.
-
 [role="_additional-resources"]
 .Additional resources
 ifdef::upstream[]
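
The tables removed above listed ServingRuntime definitions for KServe. For orientation, the following is a minimal, hypothetical sketch of a ServingRuntime custom resource of the kind those tables described; the runtime name, container image, arguments, and model format are illustrative assumptions, not values taken from this commit or shipped with the product.

[source,yaml]
----
# Hypothetical example for illustration only; the runtime name, container image,
# arguments, and model format are placeholders, not product defaults.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-custom-runtime            # illustrative runtime name
spec:
  supportedModelFormats:
    - name: onnx                          # model format this runtime advertises
      autoSelect: true
  protocolVersions:
    - v2                                  # inference protocol exposed by the runtime
  containers:
    - name: kserve-container
      image: registry.example.com/example/runtime:latest   # placeholder image
      args:
        - --model_path=/mnt/models        # placeholder argument
      ports:
        - containerPort: 8080
          protocol: TCP
----

Adding a custom runtime of this form is covered by the "Adding a custom model-serving runtime for the single-model serving platform" sections linked in the diff above.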

modules/ref-tested-verified-runtimes.adoc

Lines changed: 2 additions & 26 deletions
@@ -9,6 +9,8 @@ Tested and verified runtimes are community versions of model-serving runtimes th

 {org-name} tests the current version of a tested and verified runtime each time there is a new version of {productname-short}. If a new version of a tested and verified runtime is released in the middle of an {productname-short} release cycle, it will be tested and verified in an upcoming release.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of tested and verified runtimes in {productname-short}.
+
 [NOTE]
 --
 Tested and verified runtimes are not directly supported by {org-name}. You are responsible for ensuring that you are licensed to use any tested and verified runtimes that you add, and for correctly configuring and maintaining them.
@@ -18,33 +20,7 @@ ifndef::upstream[]
 For more information, see link:https://access.redhat.com/articles/7089743[Tested and verified runtimes in {productname-short}].
 endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| NVIDIA Triton Inference Server | An open-source inference-serving software for fast and scalable AI in applications. | TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more
-| Seldon MLServer | An open-source inference server designed to simplify the deployment of machine learning models. | Scikit-Learn (sklearn), XGBoost, LightGBM, CatBoost, HuggingFace and MLflow
-
-|===
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additional protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| NVIDIA Triton Inference Server | gRPC | REST | Yes | Yes | Raw and serverless
-| Seldon MLServer | gRPC | REST | No | Yes | Raw and serverless
-
-|===
-
-
-[NOTE]
---
-The `alibi-detect` and `alibi-explain` libraries from Seldon are under the Business Source License 1.1 (BSL 1.1). These libraries are not tested, verified, or supported by {org-name} as part of the certified *Seldon MLServer* runtime. It is not recommended that you use these libraries in production environments with the runtime.
---

-[role="_additional-resources"]
 .Additional resources
 ifdef::upstream[]
 * link:{odhdocshome}/serving-models/#inference-endpoints_serving-large-models[Inference endpoints]
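
For context on how the runtimes described in the removed tables are consumed, a model is deployed against a runtime through a KServe InferenceService. The following is a minimal sketch, assuming a runtime named `example-custom-runtime` (such as the hypothetical one above) has already been added and the model is stored in an S3-compatible bucket; the resource name, model format, runtime name, and storage URI are all hypothetical.

[source,yaml]
----
# Hypothetical example for illustration only; names and the storage URI are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                       # format of the exported model
      runtime: example-custom-runtime    # ServingRuntime to serve the model with
      storageUri: s3://example-bucket/models/example-model   # placeholder model location
----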
