 
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
-**LATEST RELEASE: You are currently on the main branch which tracks
-under-development progress towards the next release. The current release branch
-is [r24.01](https://github.com/triton-inference-server/vllm_backend/tree/r24.01)
-and which corresponds to the 24.01 container release on
-[NVIDIA GPU Cloud (NGC)](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver).**
-
 # vLLM Backend
 
 The Triton backend for [vLLM](https://github.com/vllm-project/vllm)
@@ -81,7 +75,14 @@ script.
 
 A sample command to build a Triton Server container with all options enabled is shown below. Feel free to customize flags according to your needs.
 
+Please use the [NGC registry](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags)
+to get the latest version of the Triton vLLM container, which corresponds to the
+latest YY.MM (year.month) [Triton release](https://github.com/triton-inference-server/server/releases).
+
+
 ```
+# YY.MM is the version of Triton.
+export TRITON_CONTAINER_VERSION=<YY.MM>
 ./build.py -v --enable-logging
               --enable-stats
               --enable-tracing
@@ -96,9 +97,9 @@ A sample command to build a Triton Server container with all options enabled is
               --endpoint=grpc
               --endpoint=sagemaker
               --endpoint=vertex-ai
-              --upstream-container-version=24.01
-              --backend=python:r24.01
-              --backend=vllm:r24.01
+              --upstream-container-version=${TRITON_CONTAINER_VERSION}
+              --backend=python:r${TRITON_CONTAINER_VERSION}
+              --backend=vllm:r${TRITON_CONTAINER_VERSION}
 ```
 
 ### Option 3. Add the vLLM Backend to the Default Triton Container
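
The NGC note introduced in the diff above points at versioned Triton vLLM images. As a concrete illustration of the tag scheme, pulling one release looks like the following; 24.01 is only an example version, so substitute whichever YY.MM tag the registry lists:

```
# Pull the prebuilt Triton vLLM container for a specific YY.MM release.
# 24.01 is an example; check the NGC tags page for the latest version.
docker pull nvcr.io/nvidia/tritonserver:24.01-vllm-python-py3
```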
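After `build.py` completes, the resulting image runs like any other Triton container. A minimal launch sketch, assuming the build tagged the image `tritonserver:latest` and that a vLLM model repository exists at `./model_repository` (both names are assumptions, not taken from the diff):

```
# Launch Triton from the locally built image, exposing the HTTP (8000),
# gRPC (8001), and metrics (8002) endpoints and mounting a model repository.
# The image tag and repository path below are placeholder assumptions.
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/model_repository:/models \
  tritonserver:latest tritonserver --model-repository=/models
```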