docs/sample_blueprints/offline-inference-infra/README.md
2 lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-Here’s your rewritten `README.md`, styled similarly to the CPU inference blueprint but focused on **offline GPU inference using the SGLang backend**.
-
 # Offline Inference Blueprint - Infra (SGLang + vLLM)
 
 This blueprint provides a configurable framework to run **offline LLM inference benchmarks** using either the SGLang or vLLM backends. It is designed for cloud GPU environments and supports automated performance benchmarking with MLflow logging.