
Commit d3ee577

mc-nv, pvijayakrish, nv-kmcgill53, nv-tusharma, and nvda-mesharma authored
Update default branch post 25.01 (#7983)
Co-authored-by: Pavithra Vijayakrishnan <[email protected]>
Co-authored-by: Kyle McGill <[email protected]>
Co-authored-by: Tushar Sharma <[email protected]>
Co-authored-by: Meenakshi Sharma <[email protected]>
1 parent db6b3a3 commit d3ee577

26 files changed, +369 -154 lines changed


Dockerfile.sdk

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright 2019-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2019-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -29,7 +29,7 @@
 #
 
 # Base image on the minimum Triton container
-ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:24.12-py3-min
+ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:25.01-py3-min
 
 ARG TRITON_CLIENT_REPO_SUBDIR=clientrepo
 ARG TRITON_PA_REPO_SUBDIR=perfanalyzerrepo

README.md

Lines changed: 13 additions & 18 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -30,11 +30,6 @@
 
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
->[!WARNING]
->You are currently on the `main` branch which tracks under-development progress
->towards the next release. The current release is version [2.53.0](https://github.com/triton-inference-server/server/releases/latest)
->and corresponds to the 24.12 container release on NVIDIA GPU Cloud (NGC).
-
 Triton Inference Server is an open source inference serving software that
 streamlines AI inferencing. Triton enables teams to deploy any AI model from
 multiple deep learning and machine learning frameworks, including TensorRT,
@@ -62,7 +57,7 @@ Major features include:
 - Provides [Backend API](https://github.com/triton-inference-server/backend) that
   allows adding custom backends and pre/post processing operations
 - Supports writing custom backends in python, a.k.a.
-  [Python-based backends.](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends)
+  [Python-based backends.](https://github.com/triton-inference-server/backend/blob/r25.01/docs/python_based_backends.md#python-based-backends)
 - Model pipelines using
   [Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
   Logic Scripting
@@ -91,16 +86,16 @@ Inference Server with the
 
 ```bash
 # Step 1: Create the example model repository
-git clone -b r24.12 https://github.com/triton-inference-server/server.git
+git clone -b r25.01 https://github.com/triton-inference-server/server.git
 cd server/docs/examples
 ./fetch_models.sh
 
 # Step 2: Launch triton from the NGC Triton container
-docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.12-py3 tritonserver --model-repository=/models
+docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.01-py3 tritonserver --model-repository=/models
 
 # Step 3: Sending an Inference Request
 # In a separate console, launch the image_client example from the NGC Triton SDK container
-docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.12-py3-sdk
+docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:25.01-py3-sdk
 /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
 
 # Inference should return the following
@@ -175,10 +170,10 @@ configuration](docs/user_guide/model_configuration.md) for the model.
   [Python](https://github.com/triton-inference-server/python_backend), and more
 - Not all the above backends are supported on every platform supported by Triton.
   Look at the
-  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
+  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/r25.01/docs/backend_platform_support_matrix.md)
   to learn which backends are supported on your target platform.
 - Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
+  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/r25.01/README.md)
   and
   [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
 - Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
@@ -192,14 +187,14 @@ A Triton *client* application sends inference and other requests to Triton. The
 [Python and C++ client libraries](https://github.com/triton-inference-server/client)
 provide APIs to simplify this communication.
 
-- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/examples),
-  [Python](https://github.com/triton-inference-server/client/blob/main/src/python/examples),
-  and [Java](https://github.com/triton-inference-server/client/blob/main/src/java/src/main/java/triton/client/examples)
+- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/r25.01/src/c%2B%2B/examples),
+  [Python](https://github.com/triton-inference-server/client/blob/r25.01/src/python/examples),
+  and [Java](https://github.com/triton-inference-server/client/blob/r25.01/src/java/src/main/java/triton/client/examples)
 - Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
   and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
   client options
 - Send input data (e.g. a jpeg image) directly to Triton in the [body of an HTTP
-  request without any additional metadata](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md#raw-binary-request)
+  request without any additional metadata](https://github.com/triton-inference-server/server/blob/r25.01/docs/protocol/extension_binary_data.md#raw-binary-request)
 
 ### Extend Triton
 
@@ -208,7 +203,7 @@ designed for modularity and flexibility
 
 - [Customize Triton Inference Server container](docs/customization_guide/compose.md) for your use case
 - [Create custom backends](https://github.com/triton-inference-server/backend)
-  in either [C/C++](https://github.com/triton-inference-server/backend/blob/main/README.md#triton-backend-api)
+  in either [C/C++](https://github.com/triton-inference-server/backend/blob/r25.01/README.md#triton-backend-api)
   or [Python](https://github.com/triton-inference-server/python_backend)
 - Create [decoupled backends and models](docs/user_guide/decoupled_models.md) that can send
   multiple responses for a request or not send any responses for a request
@@ -217,7 +212,7 @@ designed for modularity and flexibility
   decryption, or conversion
 - Deploy Triton on [Jetson and JetPack](docs/user_guide/jetson.md)
 - [Use Triton on AWS
-  Inferentia](https://github.com/triton-inference-server/python_backend/tree/main/inferentia)
+  Inferentia](https://github.com/triton-inference-server/python_backend/tree/r25.01/inferentia)
 
 ### Additional Documentation
 

TRITON_VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.54.0dev
+2.55.0dev

build.py

Lines changed: 70 additions & 18 deletions
@@ -71,12 +71,12 @@
 #
 
 DEFAULT_TRITON_VERSION_MAP = {
-    "release_version": "2.54.0dev",
-    "triton_container_version": "25.01dev",
-    "upstream_container_version": "24.12",
+    "release_version": "2.55.0dev",
+    "triton_container_version": "25.02dev",
+    "upstream_container_version": "25.01",
     "ort_version": "1.20.1",
-    "ort_openvino_version": "2024.4.0",
-    "standalone_openvino_version": "2024.4.0",
+    "ort_openvino_version": "2024.5.0",
+    "standalone_openvino_version": "2024.5.0",
     "dcgm_version": "3.3.6",
     "vllm_version": "0.6.3.post1",
     "rhel_py_version": "3.12.3",
@@ -962,7 +962,6 @@ def create_dockerfile_buildbase_rhel(ddir, dockerfile_name, argmap):
     libcurl-devel \\
     libb64-devel \\
     gperftools-devel \\
-    patchelf \\
     python3-pip \\
     python3-setuptools \\
     rapidjson-devel \\
@@ -990,7 +989,8 @@ def create_dockerfile_buildbase_rhel(ddir, dockerfile_name, argmap):
     wheel \\
     setuptools \\
     docker \\
-    virtualenv
+    virtualenv \\
+    patchelf==0.17.2
 
 # Install boost version >= 1.78 for boost::span
 # Current libboost-dev apt packages are < 1.78, so install from tar.gz
@@ -1089,7 +1089,6 @@ def create_dockerfile_buildbase(ddir, dockerfile_name, argmap):
     libcurl4-openssl-dev \\
     libb64-dev \\
     libgoogle-perftools-dev \\
-    patchelf \\
     python3-dev \\
     python3-pip \\
     python3-wheel \\
@@ -1110,7 +1109,8 @@ def create_dockerfile_buildbase(ddir, dockerfile_name, argmap):
 RUN pip3 install --upgrade \\
     build \\
     docker \\
-    virtualenv
+    virtualenv \\
+    patchelf==0.17.2
 
 # Install boost version >= 1.78 for boost::span
 # Current libboost-dev apt packages are < 1.78, so install from tar.gz
@@ -1354,11 +1354,12 @@ def dockerfile_prepare_container_linux(argmap, backends, enable_gpu, target_mach
     libcurl-devel \\
     libb64-devel \\
     gperftools-devel \\
-    patchelf \\
     wget \\
     python3-pip \\
     numactl-devel
 
+RUN pip3 install patchelf==0.17.2
+
 """
 else:
     df += """
@@ -1467,12 +1468,31 @@ def dockerfile_prepare_container_linux(argmap, backends, enable_gpu, target_mach
 """
 
 if "vllm" in backends:
-    df += """
-# vLLM needed for vLLM backend
-RUN pip3 install vllm=={}
-""".format(
-    FLAGS.vllm_version
-)
+    df += f"""
+ARG BUILD_PUBLIC_VLLM="true"
+ARG VLLM_INDEX_URL
+ARG PYTORCH_TRITON_URL
+
+RUN --mount=type=secret,id=req,target=/run/secrets/requirements \\
+    if [ "$BUILD_PUBLIC_VLLM" = "false" ]; then \\
+        pip3 install --no-cache-dir \\
+            mkl==2021.1.1 \\
+            mkl-include==2021.1.1 \\
+            mkl-devel==2021.1.1 \\
+        && pip3 install --no-cache-dir --progress-bar on --index-url $VLLM_INDEX_URL -r /run/secrets/requirements \\
+        # Need to install in-house build of pytorch-triton to support triton_key definition used by torch 2.5.1
+        && cd /tmp \\
+        && wget $PYTORCH_TRITON_URL \\
+        && pip install --no-cache-dir /tmp/pytorch_triton-*.whl \\
+        && rm /tmp/pytorch_triton-*.whl; \\
+    else \\
+        # public vLLM needed for vLLM backend
+        pip3 install vllm=={DEFAULT_TRITON_VERSION_MAP["vllm_version"]}; \\
+    fi
+
+ARG PYVER=3.12
+ENV LD_LIBRARY_PATH /usr/local/lib:/usr/local/lib/python${{PYVER}}/dist-packages/torch/lib:${{LD_LIBRARY_PATH}}
+"""
 
 if "dali" in backends:
     df += """
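The `RUN --mount=type=secret,id=req,...` instruction in the generated Dockerfile relies on Docker BuildKit secrets, so the requirements file is only visible during that build step rather than being baked into an image layer. A minimal sketch of how such a stage could be built by hand, assuming BuildKit is available; the requirements path and both URLs below are placeholders, not values from this commit:

```bash
# Hypothetical direct invocation; in practice build.py generates and runs this.
# id=req matches the --mount=type=secret,id=req in the Dockerfile snippet above.
DOCKER_BUILDKIT=1 docker build \
    --secret id=req,src=./requirements.txt \
    --build-arg BUILD_PUBLIC_VLLM=false \
    --build-arg VLLM_INDEX_URL=https://pypi.example.com/simple \
    --build-arg PYTORCH_TRITON_URL=https://example.com/pytorch_triton.whl \
    -t tritonserver \
    -f Dockerfile .
```

When `BUILD_PUBLIC_VLLM` is left at its default of `"true"`, the secret and both URLs go unused and the public `vllm` wheel is installed instead.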
@@ -1543,7 +1563,8 @@ def add_cpu_libs_to_linux_dockerfile(backends, target_machine):
 
 # patchelf is needed to add deps of libcublasLt.so.12 to libtorch_cuda.so
 RUN apt-get update \\
-    && apt-get install -y --no-install-recommends openmpi-bin patchelf
+    && apt-get install -y --no-install-recommends openmpi-bin
+RUN pip3 install patchelf==0.17.2
 
 ENV LD_LIBRARY_PATH /usr/local/cuda/targets/{cuda_arch}-linux/lib:/usr/local/cuda/lib64/stubs:${{LD_LIBRARY_PATH}}
 """.format(
@@ -1846,13 +1867,21 @@ def create_docker_build_script(script_name, container_install_dir, container_ci_
     finalargs = [
         "docker",
         "build",
+    ]
+    if secrets != "":
+        finalargs += [
+            f"--secret id=req,src={requirements}",
+            f"--build-arg VLLM_INDEX_URL={vllm_index_url}",
+            f"--build-arg PYTORCH_TRITON_URL={pytorch_triton_url}",
+            f"--build-arg BUILD_PUBLIC_VLLM={build_public_vllm}",
+        ]
+    finalargs += [
         "-t",
         "tritonserver",
         "-f",
         os.path.join(FLAGS.build_dir, "Dockerfile"),
         ".",
     ]
-
     docker_script.cwd(THIS_SCRIPT_DIR)
     docker_script.cmd(finalargs, check_exitcode=True)
 
@@ -2697,6 +2726,19 @@ def enable_all():
         default=DEFAULT_TRITON_VERSION_MAP["rhel_py_version"],
         help="This flag sets the Python version for RHEL platform of Triton Inference Server to be built. Default: the latest supported version.",
     )
+    parser.add_argument(
+        "--build-secret",
+        action="append",
+        required=False,
+        nargs=2,
+        metavar=("key", "value"),
+        help="Add build secrets in the form of <key> <value>. These secrets are used during the build process for vllm. The secrets are passed to the Docker build step as `--secret id=<key>`. The following keys are expected and their purposes are described below:\n\n"
+        " - 'req': A file containing a list of dependencies for pip (e.g., requirements.txt).\n"
+        " - 'vllm_index_url': The index URL for the pip install.\n"
+        " - 'pytorch_triton_url': The location of the PyTorch wheel to download.\n"
+        " - 'build_public_vllm': A flag (default is 'true') indicating whether to build the public VLLM version.\n\n"
+        "Ensure that the required environment variables for these secrets are set before running the build.",
+    )
     FLAGS = parser.parse_args()
 
     if FLAGS.image is None:
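Putting the new flag together with the docker-build-script hunk shown earlier, a hypothetical build.py invocation might look like the sketch below; the file path and URLs are placeholders, and the surrounding `--enable-gpu`/`--backend vllm` flags are simply the usual way a vLLM-enabled build is requested:

```bash
# Sketch only: each --build-secret is a <key> <value> pair. 'req' becomes
# --secret id=req,src=<value> on the docker build command line; the other
# keys are forwarded as --build-arg values.
python3 build.py --enable-gpu --backend vllm \
    --build-secret req ./requirements.txt \
    --build-secret vllm_index_url https://pypi.example.com/simple \
    --build-secret pytorch_triton_url https://example.com/pytorch_triton.whl \
    --build-secret build_public_vllm false
```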
@@ -2723,6 +2765,8 @@ def enable_all():
         FLAGS.override_backend_cmake_arg = []
     if FLAGS.extra_backend_cmake_arg is None:
         FLAGS.extra_backend_cmake_arg = []
+    if FLAGS.build_secret is None:
+        FLAGS.build_secret = []
 
     # if --enable-all is specified, then update FLAGS to enable all
     # settings, backends, repo-agents, caches, file systems, endpoints, etc.
@@ -2816,6 +2860,14 @@ def enable_all():
     )
     backends["python"] = backends["vllm"]
 
+    secrets = dict(getattr(FLAGS, "build_secret", []))
+    if secrets is not None:
+        requirements = secrets.get("req", "")
+        vllm_index_url = secrets.get("vllm_index_url", "")
+        pytorch_triton_url = secrets.get("pytorch_triton_url", "")
+        build_public_vllm = secrets.get("build_public_vllm", "true")
+        log('Build Arg for BUILD_PUBLIC_VLLM: "{}"'.format(build_public_vllm))
+
     # Initialize map of repo agents to build and repo-tag for each.
     repoagents = {}
     for be in FLAGS.repoagent:

deploy/aws/values.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2025, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.12-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.01-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: s3://triton-inference-server-repository/model_repository
   numGpus: 1

deploy/fleetcommand/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2025, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -26,7 +26,7 @@
 
 apiVersion: v1
 # appVersion is the Triton version; update when changing release
-appVersion: "2.53.0"
+appVersion: "2.54.0"
 description: Triton Inference Server (Fleet Command)
 name: triton-inference-server
 # version is the Chart version; update when changing anything in the chart

deploy/fleetcommand/values.yaml

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2025, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.12-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.01-py3
   pullPolicy: IfNotPresent
   numGpus: 1
   serverCommand: tritonserver
@@ -47,13 +47,13 @@ image:
   #
   # To set model control mode, uncomment and configure below
   # TODO: Fix the following url, it is invalid
-  # See https://github.com/triton-inference-server/server/blob/r24.12/docs/model_management.md
+  # See https://github.com/triton-inference-server/server/blob/r25.01/docs/model_management.md
   # for more details
   #- --model-control-mode=explicit|poll|none
   #
   # Additional server args
   #
-  # see https://github.com/triton-inference-server/server/blob/r24.12/README.md
+  # see https://github.com/triton-inference-server/server/blob/r25.01/README.md
   # for more details
 
 service:

deploy/gcp/values.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2025, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.12-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.01-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: gs://triton-inference-server-repository/model_repository
   numGpus: 1

deploy/gke-marketplace-app/benchmark/perf-analyzer-script/triton_client.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -33,7 +33,7 @@ metadata:
   namespace: default
 spec:
   containers:
-  - image: nvcr.io/nvidia/tritonserver:24.12-py3-sdk
+  - image: nvcr.io/nvidia/tritonserver:25.01-py3-sdk
     imagePullPolicy: Always
    name: nv-triton-client
    securityContext:

0 commit comments
