
Commit 9818033

Merge branch 'main' of github.com:triton-inference-server/server into yinggeh-DLIS-7061-add-vllm-metrics

2 parents e3b8df0 + f284101

25 files changed (+317 -122 lines)

Dockerfile.sdk

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:24.07-py3-min
 
 ARG TRITON_CLIENT_REPO_SUBDIR=clientrepo
 ARG TRITON_PA_REPO_SUBDIR=perfanalyzerrepo
+ARG TRITON_REPO_ORGANIZATION=http://github.com/triton-inference-server
 ARG TRITON_COMMON_REPO_TAG=main
 ARG TRITON_CORE_REPO_TAG=main
 ARG TRITON_CLIENT_REPO_TAG=main

@@ -217,6 +218,7 @@ WORKDIR /workspace
 COPY TRITON_VERSION .
 COPY NVIDIA_Deep_Learning_Container_License.pdf .
 COPY --from=sdk_build /workspace/client/ client/
+COPY --from=sdk_build /workspace/perf_analyzer/ perf_analyzer/
 COPY --from=sdk_build /workspace/install/ install/
 RUN cd install && \
     export VERSION=`cat /workspace/TRITON_VERSION` && \
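
As a hedged aside (not part of this commit), the new and existing SDK build arguments can be overridden at image-build time with standard `--build-arg` flags; the fork URL below is a placeholder, and the `docker build` invocation mirrors the one shown later in docs/customization_guide/test.md:

```
# Illustrative only: override the SDK build arguments when building the image.
# The organization URL is a placeholder for a fork; perfanalyzerrepo matches
# the default subdirectory declared above.
docker build -t tritonserver_sdk -f Dockerfile.sdk \
    --build-arg TRITON_REPO_ORGANIZATION=https://github.com/my-fork-org \
    --build-arg TRITON_PA_REPO_SUBDIR=perfanalyzerrepo \
    .
```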

README.md

Lines changed: 12 additions & 1 deletion
@@ -28,6 +28,17 @@
 
 # Triton Inference Server
 
+📣 **vLLM x Triton Meetup at Fort Mason on Sept 9th 4:00 - 9:00 pm**
+
+We are excited to announce that we will be hosting our Triton user meetup with the vLLM team at
+[Fort Mason](https://maps.app.goo.gl/9Lr3fxRssrpQCGK58) on Sept 9th 4:00 - 9:00 pm. Join us for this
+exclusive event where you will learn about the newest vLLM and Triton features, get a
+glimpse into the roadmaps, and connect with fellow users, the NVIDIA Triton and vLLM teams. Seating is limited and registration confirmation
+is required to attend - please register [here](https://lu.ma/87q3nvnh) to join
+the meetup.
+
+___
+
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
 [!WARNING]

@@ -179,7 +190,7 @@ configuration](docs/user_guide/model_configuration.md) for the model.
 [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
 to learn which backends are supported on your target platform.
 - Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 and
 [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
 - Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in

build.py

Lines changed: 4 additions & 0 deletions
@@ -1647,6 +1647,10 @@ def core_build(
             os.path.join(repo_install_dir, "bin", "tritonserver.dll"),
             os.path.join(install_dir, "bin"),
         )
+        cmake_script.cp(
+            os.path.join(repo_install_dir, "lib", "tritonserver.lib"),
+            os.path.join(install_dir, "bin"),
+        )
     else:
         cmake_script.mkdir(os.path.join(install_dir, "bin"))
         cmake_script.cp(

deploy/gke-marketplace-app/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

@@ -172,7 +172,7 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w
 ![Locust Client Chart](client.png)
 
 Alternatively, user can opt to use
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 to profile and study the performance of Triton Inference Server. Here we also
 provide a
 [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)

deploy/k8s-onprem/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

@@ -295,7 +295,7 @@ Image 'images/mug.jpg':
 After you have confirmed that your Triton cluster is operational and can perform inference,
 you can test the load balancing and autoscaling features by sending a heavy load of requests.
 One option for doing this is using the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 application.
 
 You can apply a progressively increasing load with a command like:

docs/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

@@ -173,7 +173,7 @@ Understanding Inference performance is key to better resource utilization. Use T
 - [Performance Tuning Guide](user_guide/performance_tuning.md)
 - [Optimization](user_guide/optimization.md)
 - [Model Analyzer](user_guide/model_analyzer.md)
-- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+- [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 - [Inference Request Tracing](user_guide/trace.md)
 ### Jetson and JetPack
 Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).

@@ -185,7 +185,7 @@ The following resources are recommended to explore the full suite of Triton Infe
 
 - **Configuring Deployment**: Triton comes with three tools which can be used to configure deployment setting, measure performance and recommend optimizations.
 - [Model Analyzer](https://github.com/triton-inference-server/model_analyzer) Model Analyzer is CLI tool built to recommend deployment configurations for Triton Inference Server based on user's Quality of Service Requirements. It also generates detailed reports about model performance to summarize the benefits and trade offs of different configurations.
-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md):
 Perf Analyzer is a CLI application built to generate inference requests and
 measures the latency of those requests and throughput of the model being
 served.

docs/contents.md

Lines changed: 19 additions & 12 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

@@ -119,17 +119,24 @@ client/src/grpc_generated/java/README
 :maxdepth: 1
 :caption: Performance Analyzer
 
-client/src/c++/perf_analyzer/README
-client/src/c++/perf_analyzer/docs/README
-client/src/c++/perf_analyzer/docs/install
-client/src/c++/perf_analyzer/docs/quick_start
-client/src/c++/perf_analyzer/docs/cli
-client/src/c++/perf_analyzer/docs/inference_load_modes
-client/src/c++/perf_analyzer/docs/input_data
-client/src/c++/perf_analyzer/docs/measurements_metrics
-client/src/c++/perf_analyzer/docs/benchmarking
-client/src/c++/perf_analyzer/genai-perf/README
-client/src/c++/perf_analyzer/genai-perf/examples/tutorial
+perf_analyzer/README
+perf_analyzer/docs/README
+perf_analyzer/docs/install
+perf_analyzer/docs/quick_start
+perf_analyzer/docs/cli
+perf_analyzer/docs/inference_load_modes
+perf_analyzer/docs/input_data
+perf_analyzer/docs/measurements_metrics
+perf_analyzer/docs/benchmarking
+perf_analyzer/genai-perf/README
+perf_analyzer/genai-perf/docs/compare
+perf_analyzer/genai-perf/docs/embeddings
+perf_analyzer/genai-perf/docs/files
+perf_analyzer/genai-perf/docs/lora
+perf_analyzer/genai-perf/docs/multi_modal
+perf_analyzer/genai-perf/docs/rankings
+perf_analyzer/genai-perf/docs/tutorial
+perf_analyzer/genai-perf/examples/tutorial
 ```
 
 ```{toctree}

docs/customization_guide/build.md

Lines changed: 6 additions & 6 deletions
@@ -331,13 +331,13 @@ invocation builds all features and backends available on windows.
 python build.py --cmake-dir=<path/to/repo>/build --build-dir=/tmp/citritonbuild --no-container-pull --image=base,win10-py3-min --enable-logging --enable-stats --enable-tracing --enable-gpu --endpoint=grpc --endpoint=http --repo-tag=common:<container tag> --repo-tag=core:<container tag> --repo-tag=backend:<container tag> --repo-tag=thirdparty:<container tag> --backend=ensemble --backend=tensorrt:<container tag> --backend=onnxruntime:<container tag> --backend=openvino:<container tag>
 ```
 
-If you are building on *main* branch then '<container tag>' will
+If you are building on *main* branch then `<container tag>` will
 default to "main". If you are building on a release branch then
-'<container tag>' will default to the branch name. For example, if you
-are building on the r24.07 branch, '<container tag>' will default to
-r24.07. Therefore, you typically do not need to provide '<container
-tag>' at all (nor the preceding colon). You can use a different
-'<container tag>' for a component to instead use the corresponding
+`<container tag>` will default to the branch name. For example, if you
+are building on the r24.07 branch, `<container tag>` will default to
+r24.07. Therefore, you typically do not need to provide `<container
+tag>` at all (nor the preceding colon). You can use a different
+`<container tag>` for a component to instead use the corresponding
 branch/tag in the build. For example, if you have a branch called
 "mybranch" in the
 [onnxruntime_backend](https://github.com/triton-inference-server/onnxruntime_backend)
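
As a minimal sketch of the per-component tag behavior described above (assuming the onnxruntime example the paragraph introduces), a release-branch build could pin just that backend to a different branch while every other component keeps the branch default; the flags are copied from the full invocation shown earlier and trimmed for brevity:

```
# Sketch only: on the r24.07 branch, common/core/backend/thirdparty default to
# r24.07, while the onnxruntime backend is built from the hypothetical
# "mybranch" branch of the onnxruntime_backend repo.
python build.py --cmake-dir=<path/to/repo>/build --build-dir=/tmp/citritonbuild \
    --no-container-pull --enable-logging --enable-stats --enable-gpu \
    --endpoint=grpc --endpoint=http \
    --backend=ensemble --backend=onnxruntime:mybranch
```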

docs/customization_guide/test.md

Lines changed: 8 additions & 5 deletions
@@ -48,7 +48,7 @@ $ ./gen_qa_model_repository
 $ ./gen_qa_custom_ops
 ```
 
-This will create multiple model repositories in /tmp/<version>/qa_*
+This will create multiple model repositories in /tmp/\<version\>/qa_*
 (for example /tmp/24.07/qa_model_repository). The TensorRT models
 will be created for the GPU on the system that CUDA considers device 0
 (zero). If you have multiple GPUs on your system see the documentation

@@ -57,14 +57,17 @@ in the scripts for how to target a specific GPU.
 ## Build SDK Image
 
 Build the *tritonserver_sdk* image that contains the client
-libraries, model analyzer, and examples using the following
-commands. You must first checkout the <client branch> branch of the
-*client* repo into the clientrepo/ subdirectory. Typically you want to
-set <client branch> to be the same as your current server branch.
+libraries, model analyzer, perf analyzer and examples using the following
+commands. You must first checkout the `<client branch>` branch of the
+*client* repo into the clientrepo/ subdirectory and the `<perf analyzer branch>`
+branch of the *perf_analyzer* repo into the perfanalyzerrepo/ subdirectory
+respectively. Typically you want to set both `<client branch>` and `<perf analyzer branch>`
+to be the same as your current server branch.
 
 ```
 $ cd <server repo root>
 $ git clone --single-branch --depth=1 -b <client branch> https://github.com/triton-inference-server/client.git clientrepo
+$ git clone --single-branch --depth=1 -b <perf analyzer branch> https://github.com/triton-inference-server/perf_analyzer.git perfanalyzerrepo
 $ docker build -t tritonserver_sdk -f Dockerfile.sdk .
 ```
 
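As a concrete, purely illustrative instance of the placeholders above, a build from the *main* server branch would clone the *main* branch of both repos:

```
# Sketch assuming a main-branch server checkout; substitute your release
# branch (e.g. r24.07) for "main" on both clones if you are on one.
$ cd <server repo root>
$ git clone --single-branch --depth=1 -b main https://github.com/triton-inference-server/client.git clientrepo
$ git clone --single-branch --depth=1 -b main https://github.com/triton-inference-server/perf_analyzer.git perfanalyzerrepo
$ docker build -t tritonserver_sdk -f Dockerfile.sdk .
```
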
docs/examples/jetson/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions

@@ -53,7 +53,7 @@ Inference Server as a shared library.
 ## Part 2. Analyzing model performance with perf_analyzer
 
 To analyze model performance on Jetson,
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 tool is used. The `perf_analyzer` is included in the release tar file or can be
 compiled from source.
 

@@ -65,4 +65,4 @@ From this directory of the repository, execute the following to evaluate model p
 
 In the example above we saved the results as a `.csv` file. To visualize these
 results, follow the steps described
-[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[here](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
