
Commit b4cbfa2

mc-nv, krishung5, kthui, and GuanLuo authored

post-25.08: Update default branch (#8366)

Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
1 parent d3817a1 commit b4cbfa2


59 files changed: +347, -453 lines

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ proposed change so that the Triton team can provide feedback.
   documentation for instructions on running these tests.

 - Triton Inference Server's default build assumes recent versions of
-  dependencies (CUDA, TensorFlow, PyTorch, TensorRT,
+  dependencies (CUDA, PyTorch, TensorRT,
   etc.). Contributions that add compatibility with older versions of
   those dependencies will be considered, but NVIDIA cannot guarantee
   that all possible build configurations work, are not broken by

Dockerfile.sdk

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,7 @@
 #

 # Base image on the minimum Triton container
-ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:25.07-py3-min
+ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:25.08-py3-min

 ARG TRITON_CLIENT_REPO_SUBDIR=clientrepo
 ARG TRITON_PA_REPO_SUBDIR=perfanalyzerrepo
@@ -43,7 +43,7 @@ ARG JAVA_BINDINGS_MAVEN_VERSION=3.8.4
 ARG JAVA_BINDINGS_JAVACPP_PRESETS_TAG=1.5.8
 ARG TRITON_PERF_ANALYZER_BUILD=1
 # DCGM version to install for Model Analyzer
-ARG DCGM_VERSION=4.2.3-2
+ARG DCGM_VERSION=4.4.0-1

 ARG NVIDIA_TRITON_SERVER_SDK_VERSION=unknown
 ARG NVIDIA_BUILD_ID=unknown
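
Both bumped values are plain `ARG`s, so a local SDK build can pin them without editing the file; a minimal sketch, run from the repository root (the output tag is an arbitrary example):

```bash
# Hedged sketch: override the updated build args at build time; the tag
# "tritonserver-sdk:local" is a placeholder, not an official image name.
docker build -f Dockerfile.sdk \
  --build-arg BASE_IMAGE=nvcr.io/nvidia/tritonserver:25.08-py3-min \
  --build-arg DCGM_VERSION=4.4.0-1 \
  -t tritonserver-sdk:local .
```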

README.md

Lines changed: 6 additions & 7 deletions
@@ -29,15 +29,15 @@

 >[!WARNING]
 >You are currently on the `main` branch which tracks under-development progress
->towards the next release. The current release is version [2.59.1](https://github.com/triton-inference-server/server/releases/latest)
->and corresponds to the 25.07 container release on NVIDIA GPU Cloud (NGC).
+>towards the next release. The current release is version [2.60.0](https://github.com/triton-inference-server/server/releases/latest)
+>and corresponds to the 25.08 container release on NVIDIA GPU Cloud (NGC).

 # Triton Inference Server

 Triton Inference Server is an open source inference serving software that
 streamlines AI inferencing. Triton enables teams to deploy any AI model from
 multiple deep learning and machine learning frameworks, including TensorRT,
-TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
+PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
 Inference Server supports inference across cloud, data center, edge and embedded
 devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton Inference
 Server delivers optimized performance for many query types, including real time,
@@ -90,16 +90,16 @@ Inference Server with the

 ```bash
 # Step 1: Create the example model repository
-git clone -b r25.07 https://github.com/triton-inference-server/server.git
+git clone -b r25.08 https://github.com/triton-inference-server/server.git
 cd server/docs/examples
 ./fetch_models.sh

 # Step 2: Launch triton from the NGC Triton container
-docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.07-py3 tritonserver --model-repository=/models --model-control-mode explicit --load-model densenet_onnx
+docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models --model-control-mode explicit --load-model densenet_onnx

 # Step 3: Sending an Inference Request
 # In a separate console, launch the image_client example from the NGC Triton SDK container
-docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:25.07-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
+docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:25.08-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

 # Inference should return the following
 Image '/workspace/images/mug.jpg':
@@ -166,7 +166,6 @@ configuration](docs/user_guide/model_configuration.md) for the model.
 - Triton supports multiple execution engines, called
   [backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton), including
   [TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
-  [TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
   [PyTorch](https://github.com/triton-inference-server/pytorch_backend),
   [ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
   [OpenVINO](https://github.com/triton-inference-server/openvino_backend),
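
Between Steps 2 and 3 it can be useful to confirm the 25.08 server container is actually serving; this check uses Triton's standard HTTP readiness endpoint on the default port 8000:

```bash
# Returns HTTP 200 once the server is up and ready to accept inference requests.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```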

TRITON_VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.60.0dev
+2.61.0dev

build.py

Lines changed: 17 additions & 16 deletions
@@ -71,14 +71,14 @@
 #

 DEFAULT_TRITON_VERSION_MAP = {
-    "release_version": "2.60.0dev",
-    "triton_container_version": "25.08dev",
-    "upstream_container_version": "25.07",
-    "ort_version": "1.22.0",
+    "release_version": "2.61.0dev",
+    "triton_container_version": "25.09dev",
+    "upstream_container_version": "25.08",
+    "ort_version": "1.23.0",
     "ort_openvino_version": "2025.2.0",
     "standalone_openvino_version": "2025.2.0",
-    "dcgm_version": "4.2.3-2",
-    "vllm_version": "0.9.0.1",
+    "dcgm_version": "4.4.0-1",
+    "vllm_version": "0.9.2",
     "rhel_py_version": "3.12.3",
 }

@@ -1259,7 +1259,7 @@ def create_dockerfile_linux(
     # stage of the PyTorch backend
     if not FLAGS.enable_gpu and ("pytorch" in backends):
         df += """
-RUN patchelf --add-needed /usr/local/cuda/lib64/stubs/libcublasLt.so.12 backends/pytorch/libtorch_cuda.so
+RUN patchelf --add-needed /usr/local/cuda/lib64/stubs/libcublasLt.so.13 backends/pytorch/libtorch_cuda.so
 """
     if "tensorrtllm" in backends:
         df += """
@@ -1494,7 +1494,7 @@ def dockerfile_prepare_container_linux(argmap, backends, enable_gpu, target_mach
 cp -r nvpl_slim_24.04/include/* /usr/local/include && \\
 rm -rf nvpl_slim_24.04.tar nvpl_slim_24.04; \\
 fi \\
-&& pip3 install --no-cache-dir --progress-bar on --index-url $VLLM_INDEX_URL -r /run/secrets/requirements \\
+&& pip3 install --no-cache-dir --extra-index-url $VLLM_INDEX_URL -r /run/secrets/requirements \\
 # Need to install in-house build of pytorch-triton to support triton_key definition used by torch 2.5.1
 && cd /tmp \\
 && wget $PYTORCH_TRITON_URL \\
@@ -1554,18 +1554,19 @@ def add_cpu_libs_to_linux_dockerfile(backends, target_machine):
     df += """
 RUN mkdir -p /usr/local/cuda/lib64/stubs
 COPY --from=min_container /usr/local/cuda/lib64/stubs/libcusparse.so /usr/local/cuda/lib64/stubs/libcusparse.so.12
-COPY --from=min_container /usr/local/cuda/lib64/stubs/libcusolver.so /usr/local/cuda/lib64/stubs/libcusolver.so.11
+COPY --from=min_container /usr/local/cuda/lib64/stubs/libcusolver.so /usr/local/cuda/lib64/stubs/libcusolver.so.12
 COPY --from=min_container /usr/local/cuda/lib64/stubs/libcurand.so /usr/local/cuda/lib64/stubs/libcurand.so.10
-COPY --from=min_container /usr/local/cuda/lib64/stubs/libcufft.so /usr/local/cuda/lib64/stubs/libcufft.so.11
-COPY --from=min_container /usr/local/cuda/lib64/stubs/libcublas.so /usr/local/cuda/lib64/stubs/libcublas.so.12
-COPY --from=min_container /usr/local/cuda/lib64/stubs/libcublasLt.so /usr/local/cuda/lib64/stubs/libcublasLt.so.12
-COPY --from=min_container /usr/local/cuda/lib64/stubs/libcublasLt.so /usr/local/cuda/lib64/stubs/libcublasLt.so.11
+COPY --from=min_container /usr/local/cuda/lib64/stubs/libcufft.so /usr/local/cuda/lib64/stubs/libcufft.so.12
+COPY --from=min_container /usr/local/cuda/lib64/stubs/libcublas.so /usr/local/cuda/lib64/stubs/libcublas.so.13
+COPY --from=min_container /usr/local/cuda/lib64/stubs/libcublasLt.so /usr/local/cuda/lib64/stubs/libcublasLt.so.13

 RUN mkdir -p /usr/local/cuda/targets/{cuda_arch}-linux/lib
-COPY --from=min_container /usr/local/cuda/lib64/libcudart.so.12 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
-COPY --from=min_container /usr/local/cuda/lib64/libcupti.so.12 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
-COPY --from=min_container /usr/local/cuda/lib64/libnvJitLink.so.12 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
+COPY --from=min_container /usr/local/cuda/lib64/libcudart.so.13 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
+COPY --from=min_container /usr/local/cuda/lib64/libcupti.so.13 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
+COPY --from=min_container /usr/local/cuda/lib64/libnvJitLink.so.13 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
 COPY --from=min_container /usr/local/cuda/lib64/libcufile.so.0 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
+COPY --from=min_container /usr/local/cuda/lib64/libnvrtc.so.13 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
+COPY --from=min_container /usr/local/cuda/lib64/libcusparseLt.so.0 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.

 RUN mkdir -p /opt/hpcx/ucc/lib/ /opt/hpcx/ucx/lib/
 COPY --from=min_container /opt/hpcx/ucc/lib/libucc.so.1 /opt/hpcx/ucc/lib/libucc.so.1
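
The version map at the top only sets defaults; the later hunks gate on `FLAGS.enable_gpu` and the `backends` list, so a CPU-only PyTorch build is the path that hits the patchelf line updated to `libcublasLt.so.13`. A minimal sketch (flag spellings are assumptions inferred from the FLAGS fields, verify with `./build.py --help`):

```bash
# Hedged sketch: a CPU-only build with the PyTorch backend, i.e. the
# "not FLAGS.enable_gpu and 'pytorch' in backends" branch shown above.
python3 build.py --backend pytorch
```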

compose.py

Lines changed: 1 addition & 1 deletion
@@ -298,7 +298,7 @@ def create_argmap(images, skip_pull):
     dcgm_ver = re.search("DCGM_VERSION=([\S]{4,}) ", vars)
     dcgm_version = ""
     if dcgm_ver is None:
-        dcgm_version = "4.2.3-2"
+        dcgm_version = "4.4.0-1"
         log(
             "WARNING: DCGM version not found from image, installing the earlierst version {}".format(
                 dcgm_version
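
The regex's `NAME=value ` shape suggests `vars` holds the image's environment string, so the fallback to 4.4.0-1 only fires when the env lacks `DCGM_VERSION`. A hedged way to inspect what is being scanned (the image tag is an example):

```bash
# Dump the image's env vars and look for DCGM_VERSION; an empty grep is the
# case where compose.py now falls back to "4.4.0-1".
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
    nvcr.io/nvidia/tritonserver:25.08-py3 | grep DCGM_VERSION
```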

deploy/aws/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 replicaCount: 1

 image:
-  imageName: nvcr.io/nvidia/tritonserver:25.07-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.08-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: s3://triton-inference-server-repository/model_repository
   numGpus: 1
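
The same one-line image bump recurs in the fleetcommand and gcp charts below. Since `image.imageName` is an ordinary Helm value, the tag can also be pinned per install without editing values.yaml; a minimal sketch, assuming the chart is installed from its directory in this repo ("example-triton" is a placeholder release name):

```bash
# Hedged sketch: install the AWS chart while overriding the image tag from the
# CLI; cluster prerequisites (GPU nodes, S3 credentials) are assumed in place.
helm install example-triton deploy/aws \
  --set image.imageName=nvcr.io/nvidia/tritonserver:25.08-py3
```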

deploy/fleetcommand/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@

 apiVersion: v1
 # appVersion is the Triton version; update when changing release
-appVersion: "2.59.1"
+appVersion: "2.60.0"
 description: Triton Inference Server (Fleet Command)
 name: triton-inference-server
 # version is the Chart version; update when changing anything in the chart

deploy/fleetcommand/values.yaml

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@
 replicaCount: 1

 image:
-  imageName: nvcr.io/nvidia/tritonserver:25.07-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.08-py3
   pullPolicy: IfNotPresent
   numGpus: 1
   serverCommand: tritonserver
@@ -47,13 +47,13 @@ image:
 #
 # To set model control mode, uncomment and configure below
 # TODO: Fix the following url, it is invalid
-# See https://github.com/triton-inference-server/server/blob/r25.07/docs/user_guide/model_management.md
+# See https://github.com/triton-inference-server/server/blob/r25.08/docs/user_guide/model_management.md
 # for more details
 #- --model-control-mode=explicit|poll|none
 #
 # Additional server args
 #
-# see https://github.com/triton-inference-server/server/blob/r25.07/README.md
+# see https://github.com/triton-inference-server/server/blob/r25.08/README.md
 # for more details

 service:

deploy/gcp/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 replicaCount: 1

 image:
-  imageName: nvcr.io/nvidia/tritonserver:25.07-py3
+  imageName: nvcr.io/nvidia/tritonserver:25.08-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: gs://triton-inference-server-repository/model_repository
   numGpus: 1
