Commit 53c2a03 (parent 647764b)

triton-gpu-oke

Adds a complete walkthrough for deploying the Triton Inference Server with the TensorRT-LLM backend on a GPU node within the OKE (Oracle Kubernetes Engine) service.

17 files changed: +2826 −0 lines changed
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+FROM <region-key>.ocir.io/<tenancy-namespace>/triton_llm:triton_trt_llm_23.12_manual_build
+
+COPY output_llama_hf /app/output_llama_hf
+
+COPY model_repo /app/model_repo
+
+RUN mkdir /app/cache
+RUN chmod 777 /app/cache
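Once the placeholders in the Dockerfile above are filled in, the image can be built and pushed back to OCIR. A minimal sketch follows; the `:deploy` tag is an assumption for this example, and it presumes you have already authenticated with `docker login <region-key>.ocir.io` using an OCI auth token:

```shell
# Build the image defined by the Dockerfile above (run from the directory
# containing the Dockerfile, output_llama_hf/ and model_repo/).
# <region-key> and <tenancy-namespace> are the same placeholders used in the
# FROM line; the :deploy tag is hypothetical, not taken from the repository.
docker build -t <region-key>.ocir.io/<tenancy-namespace>/triton_llm:deploy .

# Push the built image to the OCI Container Registry so OKE nodes can pull it.
docker push <region-key>.ocir.io/<tenancy-namespace>/triton_llm:deploy
```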

cloud-infrastructure/ai-infra-gpu/GPU/triton-gpu-oke/README.md

Lines changed: 375 additions & 0 deletions
(151 KB binary file; preview not shown)
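The README itself (375 added lines, not rendered here) walks through the OKE deployment. As a hedged illustration of what the final step of such a walkthrough typically looks like, the sketch below pulls the private OCIR image onto a GPU node; every name, label, and the choice of one GPU are assumptions for this example, not taken from the repository:

```shell
# Hypothetical sketch (secret name, deployment name, and labels are assumptions).
# 1) Let OKE pull from the private OCIR repository.
kubectl create secret docker-registry ocir-secret \
  --docker-server=<region-key>.ocir.io \
  --docker-username='<tenancy-namespace>/<oci-username>' \
  --docker-password='<auth-token>'

# 2) Run Triton on a GPU node, exposing its standard HTTP/gRPC/metrics ports.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-trt-llm
spec:
  replicas: 1
  selector:
    matchLabels: { app: triton-trt-llm }
  template:
    metadata:
      labels: { app: triton-trt-llm }
    spec:
      imagePullSecrets:
        - name: ocir-secret
      containers:
        - name: triton
          image: <region-key>.ocir.io/<tenancy-namespace>/triton_llm:triton_trt_llm_23.12_manual_build
          ports:
            - containerPort: 8000   # HTTP inference endpoint
            - containerPort: 8001   # gRPC inference endpoint
            - containerPort: 8002   # Prometheus metrics
          resources:
            limits:
              nvidia.com/gpu: 1     # schedule onto a GPU node
EOF
```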
