Commit d40e08f

Add high-throughput batch processing cookbook for Nemotron 3 Super
Demonstrates offline batch inference, concurrent server requests, bulk document classification, and throughput benchmarking with vLLM.

Signed-off-by: Matt Van Horn <matt@mvanhorn.com>
1 parent 6fad2df commit d40e08f

File tree

2 files changed: +756 −0 lines changed


usage-cookbook/Nemotron-3-Super/README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@ These notebooks provide end-to-end recipes for deploying and customizing Nemotro
 - **[vllm_cookbook.ipynb](vllm_cookbook.ipynb)** — Deploy Nemotron-3-Super with vLLM.
 - **[sglang_cookbook.ipynb](sglang_cookbook.ipynb)** — Deploy Nemotron-3-Super with SGLang.
 - **[trtllm_cookbook.ipynb](trtllm_cookbook.ipynb)** — Deploy Nemotron-3-Super with TensorRT-LLM.
+- **[batch_throughput_cookbook.ipynb](batch_throughput_cookbook.ipynb)** — High-throughput batch processing with vLLM: offline batch inference, concurrent server requests, bulk document classification, and throughput benchmarking.
 - **{doc}`AdvancedDeploymentGuide <AdvancedDeploymentGuide/README>`** — Production deployment configurations for vLLM, SGLang, and TRT-LLM across GPU topologies (GB200, B200, DGX Spark), including MTP speculative decoding, expert parallelism, and tuning guidance.
 
 ### Fine-Tuning
```
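The concurrent-server-requests and throughput-benchmarking topics the new cookbook covers can be sketched with a minimal semaphore-bounded pattern like the one below. This is not the notebook's actual code: `fake_completion`, `MAX_CONCURRENCY`, and the token accounting are illustrative assumptions, and in practice the stub would be replaced by an HTTP call to a running vLLM server's OpenAI-compatible endpoint.

```python
# Sketch of bounded-concurrency batch requests plus a tokens/sec measurement,
# roughly the shape of a client-side throughput benchmark against a vLLM server.
import asyncio
import time

MAX_CONCURRENCY = 8  # cap on in-flight requests (illustrative; tune per server)


async def fake_completion(prompt: str) -> int:
    """Stand-in for a real server call (e.g. POST /v1/completions).

    Returns a token count; here we simply pretend the output token
    count equals the number of words in the prompt.
    """
    await asyncio.sleep(0.01)  # simulate network + decode latency
    return len(prompt.split())


async def run_batch(prompts: list[str]) -> tuple[int, float]:
    """Send all prompts with at most MAX_CONCURRENCY in flight at once."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(p: str) -> int:
        async with sem:  # blocks while MAX_CONCURRENCY requests are active
            return await fake_completion(p)

    start = time.perf_counter()
    token_counts = await asyncio.gather(*(bounded(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return sum(token_counts), elapsed


if __name__ == "__main__":
    prompts = [f"Classify document number {i}" for i in range(32)]
    total_tokens, elapsed = asyncio.run(run_batch(prompts))
    print(f"{total_tokens} tokens in {elapsed:.2f}s "
          f"({total_tokens / elapsed:.0f} tok/s)")
```

The semaphore is the key design choice: unbounded `asyncio.gather` can overwhelm a serving endpoint, while a fixed in-flight cap keeps the server's batch scheduler saturated without queue blowup.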
