This example compiles a pretrained ResNet-18 ONNX model to a TensorRT engine during the build phase using `trt_compile`, then serves it on Triton Inference Server.
During build, `tsbk` will:
1. Download the ONNX model artifact from MLflow
2. Compile it to a `.plan` file using `trtexec` with fp16 precision (via Docker or a Kubernetes Job)
3. Set the backend to `tensorrt` in the generated `config.pbtxt`
4. Cache the compiled engine locally so subsequent builds skip compilation
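Concretely, step 2 boils down to an invocation along the lines of `trtexec --onnx=model.onnx --saveEngine=model.plan --fp16`, and step 3 writes a Triton model config. The fragment below is a sketch of what that generated `config.pbtxt` might look like — the model name, batch size, and tensor names/shapes are illustrative assumptions, not taken from the example:

```
name: "resnet18"
backend: "tensorrt"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```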
## Prerequisites
- Install example requirements:
```bash
pip install -r requirements.txt
```
- **Docker with GPU access** (for local compilation), or
- **Kubernetes cluster with GPU nodes** + `TSBK_S3_PREFIX` env var set (for remote compilation)
## Setup
Export a pretrained ResNet-18 to ONNX and register it with MLflow:
## Build and Run (local GPU)

With a local GPU and Docker available, run `python server.py --test`. This will:
- Build the model repo, compiling the ONNX model to TensorRT with fp16 precision
- Launch Triton server in a Docker container with GPU access
- Run the MLflow registered input example as a test case
- Stop the server
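The Triton launch step above likely resembles the standard containerized invocation sketched below — the image tag and repository path are placeholders, not values from the example:

```bash
docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$PWD/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:24.05-py3 \
  tritonserver --model-repository=/models
```

Ports 8000, 8001, and 8002 expose Triton's HTTP, gRPC, and metrics endpoints respectively.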
## Build and Run (remote GPU via Kubernetes)
If you don't have a local GPU but have access to a Kubernetes cluster with GPU nodes, pass `--gpu-name` to target a specific GPU type via Karpenter:
```bash
export TSBK_S3_PREFIX=s3://your-bucket/tsbk-cache
python server.py --test --gpu-name a10g
```
The `--gpu-name` value maps to a Karpenter node selector (`karpenter.k8s.aws/instance-gpu-name`) so the compilation job is scheduled on the correct hardware.
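For reference, the node selector wired into the compilation Job would look something like this fragment — the surrounding Job spec and image are a sketch, not `tsbk`'s actual manifest:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: trt-compile
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        # Matches --gpu-name: Karpenter's well-known label for the GPU model.
        karpenter.k8s.aws/instance-gpu-name: a10g
      containers:
        - name: trtexec
          image: nvcr.io/nvidia/tensorrt:24.05-py3  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1
```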