modules/ref-supported-runtimes.adoc (3 additions, 68 deletions)
@@ -4,8 +4,11 @@
= Supported model-serving runtimes

[role='_abstract']
+
{productname-short} includes several preinstalled model-serving runtimes. You can use preinstalled model-serving runtimes to start serving models without modifying or defining the runtime yourself. You can also add a custom runtime to support a model.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of the supported model-serving runtimes and deployment requirements.
+
ifdef::upstream[]
For help adding a custom runtime, see link:{odhdocshome}/serving-models/#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
endif::[]
@@ -14,74 +17,6 @@ ifndef::upstream[]
For help adding a custom runtime, see link:{rhoaidocshome}{default-format-url}/serving_models/serving-large-models_serving-large-models#adding-a-custom-model-serving-runtime-for-the-single-model-serving-platform_serving-large-models[Adding a custom model-serving runtime for the single-model serving platform].
endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe (1) | A composite runtime for serving models in the Caikit format | Caikit Text Generation
-
-| Caikit Standalone ServingRuntime for KServe (2) | A runtime for serving models in the Caikit embeddings format for embeddings tasks | Caikit Embeddings
-
-| OpenVINO Model Server | A scalable, high-performance runtime for serving models that are optimized for Intel architectures | PyTorch, TensorFlow, OpenVINO IR, PaddlePaddle, MXNet, Caffe, Kaldi
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe (3) | A runtime for serving TGI-enabled models | PyTorch Model Formats
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime for large language models that supports NVIDIA GPU accelerators | link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports Intel Gaudi accelerators | link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM AMD GPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports AMD GPU accelerators | link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-| vLLM CPU ServingRuntime for KServe | A high-throughput and memory-efficient inference and serving runtime that supports IBM Power (ppc64le) and IBM Z (s390x) | link:https://docs.vllm.ai/en/latest/models/supported_models.html[Supported models^]
-
-|===
-
-ifdef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-ifndef::upstream[]
-
-. The composite Caikit-TGIS runtime is based on link:https://github.com/opendatahub-io/caikit[Caikit^] and link:https://github.com/IBM/text-generation-inference[Text Generation Inference Server (TGIS)^]. To use this runtime, you must convert your models to Caikit format. For an example, see link:https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/built-tip.md#bootstrap-process[Converting Hugging Face Hub models to Caikit format^] in the link:https://github.com/opendatahub-io/caikit-tgis-serving/tree/main[caikit-tgis-serving^] repository.
-
-. The Caikit Standalone runtime is based on link:https://github.com/caikit/caikit-nlp/tree/main[Caikit NLP^]. To use this runtime, you must convert your models to the Caikit embeddings format. For an example, see link:https://github.com/caikit/caikit-nlp/blob/main/tests/modules/text_embedding/test_embedding.py[Tests for text embedding module^].
-
-. The *Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe* is deprecated. For more information, see link:{rhoaidocshome}{default-format-url}/release_notes/index[{productname-long} release notes].
-
-endif::[]
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additional protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| Caikit Standalone ServingRuntime for KServe | REST | gRPC | No | Yes | Raw and serverless
-
-| OpenVINO Model Server | REST | None | Yes | Yes | Raw and serverless
-
-| [Deprecated] Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe | gRPC | None | No | Yes | Raw and serverless
-
-| vLLM NVIDIA GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM Intel Gaudi Accelerator ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM AMD GPU ServingRuntime for KServe | REST | None | No | Yes | Raw and serverless
-
-| vLLM CPU ServingRuntime for KServe[1] | REST | None | No | Yes | Raw
-
-|===
-
-footnote:[vLLM CPU ServingRuntime for KServe] If you are using the IBM Z and IBM Power architectures, you can deploy models only in standard deployment mode.
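The deployment-mode column in the removed table distinguishes raw (standard) deployments from serverless ones. As a hedged illustration only, not part of this change, the following minimal KServe `InferenceService` sketch shows how a deployment mode is typically selected; the service name, runtime name, and storage URI are hypothetical placeholders.

[source,yaml]
----
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                      # hypothetical service name
  annotations:
    # Request raw (standard) deployment; omit to use the platform default.
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                       # format served by the vLLM runtimes above
      runtime: vllm-runtime              # assumed name of an installed ServingRuntime
      storageUri: s3://example-bucket/example-llm   # placeholder model location
----

This is the mechanism behind the footnote above: on IBM Z and IBM Power, the vLLM CPU runtime supports only the raw (standard) deployment mode.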
modules/ref-tested-verified-runtimes.adoc (2 additions, 26 deletions)
@@ -9,6 +9,8 @@ Tested and verified runtimes are community versions of model-serving runtimes th
{org-name} tests the current version of a tested and verified runtime each time there is a new version of {productname-short}. If a new version of a tested and verified runtime is released in the middle of an {productname-short} release cycle, it will be tested and verified in an upcoming release.

+See link:https://access.redhat.com/articles/rhoai-supported-configs[Supported configurations] for a list of tested and verified runtimes in {productname-short}.
+
[NOTE]
--
Tested and verified runtimes are not directly supported by {org-name}. You are responsible for ensuring that you are licensed to use any tested and verified runtimes that you add, and for correctly configuring and maintaining them.
@@ -18,33 +20,7 @@ ifndef::upstream[]
For more information, see link:https://access.redhat.com/articles/7089743[Tested and verified runtimes in {productname-short}].
endif::[]

-.Model-serving runtimes
-
-|===
-| Name | Description | Exported model format
-
-| NVIDIA Triton Inference Server | An open-source inference-serving software for fast and scalable AI in applications | TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more
-| Seldon MLServer | An open-source inference server designed to simplify the deployment of machine learning models | Scikit-Learn (sklearn), XGBoost, LightGBM, CatBoost, HuggingFace, and MLflow
-
-|===
-
-.Deployment requirements
-
-|===
-| Name | Default protocol | Additional protocol | Model mesh support | Single node OpenShift support | Deployment mode
-
-| NVIDIA Triton Inference Server | gRPC | REST | Yes | Yes | Raw and serverless
-| Seldon MLServer | gRPC | REST | No | Yes | Raw and serverless
-
-|===
-
-
-[NOTE]
---
-The `alibi-detect` and `alibi-explain` libraries from Seldon are under the Business Source License 1.1 (BSL 1.1). These libraries are not tested, verified, or supported by {org-name} as part of the certified *Seldon MLServer* runtime. It is not recommended that you use these libraries in production environments with the runtime.
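Tested and verified runtimes such as NVIDIA Triton are added to {productname-short} rather than preinstalled, typically by defining a `ServingRuntime` resource. The sketch below is a minimal, assumption-heavy example: the runtime name, image tag, model format, and ports are placeholders, not values taken from this PR.

[source,yaml]
----
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-runtime                   # hypothetical runtime name
spec:
  supportedModelFormats:
    - name: onnx                         # one of the formats listed in the removed table
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      image: nvcr.io/nvidia/tritonserver:<version>  # pin a tag you have verified
      args:
        - tritonserver
        - --model-store=/mnt/models
        - --grpc-port=9000               # gRPC is Triton's default protocol per the table
        - --http-port=8080               # REST as the additional protocol
      ports:
        - containerPort: 9000
          protocol: TCP
----

An `InferenceService` can then reference the runtime by name, as in the earlier sketch.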