docs: Clean up incomplete recipes and clarify Kubernetes-only focus #4159

BenHamm · 2025-11-06T17:21:18Z

🎯 Objective

Clean up the recipes/ directory to focus exclusively on production-ready Kubernetes deployments. Remove incomplete configurations and clarify that benchmark recipes are tools for users, not published performance results.

Related PR to release/0.6.1: #4145

📊 Summary of Changes

34 files changed: +210 insertions, -1,587 deletions

Deleted Content

- 5 incomplete model directories (no K8s deploy.yaml manifests):
  - deepseek-r1-distill-llama-8b/
  - gemma3/
  - llama4/
  - qwen2-vl-7b-instruct/
  - qwen3/ (kept qwen3-32b-fp8/ which is complete)
- run.sh script (228 lines): non-K8s automation tool
- 12 standalone engine config YAMLs from deepseek-r1/trtllm/:
  - agg/simple/, agg/mtp/, agg/wide_ep/ (all subdirs)
  - disagg/simple/, disagg/mtp/
  - disagg/wide_ep/*.yaml (standalone configs only)
  - Kept: disagg/wide_ep/gb200/ (has complete K8s manifests)

Added/Modified Content

- Comprehensive new recipes/README.md with:
  - Complete recipe table showing Deployment and Benchmark Recipe availability
  - Prerequisites section linking to correct K8s installation guides
  - Step-by-step deployment instructions
  - Troubleshooting section
- gpt-oss-120b/trtllm/disagg/README.md: documents incomplete recipe status

📦 What Remains

4 Models | 10 Complete Deployments | 7 with Benchmark Recipes

Model	Framework	Recipes	Status
llama-3-70b	vLLM	agg, disagg-single-node, disagg-multi-node	✅ All with deploy.yaml + perf.yaml
qwen3-32b-fp8	TensorRT-LLM	agg, disagg	✅ All with deploy.yaml + perf.yaml
gpt-oss-120b	TensorRT-LLM	agg, disagg*	✅ agg complete; disagg documented as incomplete
deepseek-r1	SGLang + TensorRT-LLM	sglang/disagg-8gpu, sglang/disagg-16gpu, trtllm/disagg/wide_ep/gb200	✅ 2 functional, 1 complete
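
The completeness criterion used in this table (a recipe directory counts as a deployment when it contains a deploy.yaml, and as benchmarkable when it also contains a perf.yaml) could be checked mechanically. The following is a minimal sketch and not part of this PR: the function names `recipe_status` and `summarize` are hypothetical, and it assumes each recipe variant lives in its own directory under recipes/.

```python
from pathlib import Path


def recipe_status(recipes_root: Path) -> dict[str, dict[str, bool]]:
    """Report which directories under recipes_root hold deploy.yaml / perf.yaml.

    Assumption: one recipe variant per directory, using the file names from
    this PR description (deploy.yaml = manifest, perf.yaml = benchmark recipe).
    """
    status: dict[str, dict[str, bool]] = {}
    for directory in sorted(p for p in recipes_root.rglob("*") if p.is_dir()):
        has_deploy = (directory / "deploy.yaml").is_file()
        has_perf = (directory / "perf.yaml").is_file()
        # Skip directories that hold neither file (model-cache dirs, parents, etc.).
        if has_deploy or has_perf:
            key = directory.relative_to(recipes_root).as_posix()
            status[key] = {"deploy": has_deploy, "perf": has_perf}
    return status


def summarize(status: dict[str, dict[str, bool]]) -> tuple[int, int]:
    """Return (deployments, deployments that also ship a benchmark recipe)."""
    deployments = sum(1 for s in status.values() if s["deploy"])
    with_benchmark = sum(1 for s in status.values() if s["deploy"] and s["perf"])
    return deployments, with_benchmark
```

Calling `summarize(recipe_status(Path("recipes")))` from the repository root would give two counts to compare against the "10 Complete Deployments | 7 with Benchmark Recipes" line, provided the directory layout matches the assumption above.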

Summary by CodeRabbit

Documentation
- Comprehensively rewrote core recipe documentation with structured guidance on prerequisites, deployment steps, customization, and troubleshooting
- Added documentation for GPT-OSS-120B disaggregated deployment mode
Chores
- Removed multiple configuration files across model recipes (deepseek, gemma3, llama4, qwen variants)
- Removed automated deployment orchestration script

Remove incomplete model directories and non-Kubernetes configurations to streamline the recipes directory for production Kubernetes deployments.

Changes:
- Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b, gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper Kubernetes deployment manifests
- Delete run.sh script (non-Kubernetes automation tool)
- Remove standalone engine config YAMLs from deepseek-r1/trtllm that were not wrapped in Kubernetes manifests
- Document incomplete gpt-oss-120b disagg recipe with README explaining missing components

README improvements:
- Restructure Available Recipes table with 'Deployment' and 'Benchmark Recipe' columns to clarify that perf.yaml files are tools for users to run benchmarks, not published performance results
- Add comprehensive quick start guide with prerequisites
- Link to correct Kubernetes deployment guides
- Add troubleshooting section
- Remove extraneous links (docs.nvidia.com, license section)

Result: 4 models with 10 complete deployment recipes (7 with benchmark scripts), focused exclusively on Kubernetes deployments.

Signed-off-by: Ben Hamm <[email protected]>

Address feedback that the README looked AI-generated: remove decorative emojis from section headings while keeping status indicators (✅ ❌) in tables and content.

Signed-off-by: Ben Hamm <[email protected]>

coderabbitai · 2025-11-06T17:25:06Z

Walkthrough

This PR restructures the recipes directory by rewriting the main README with comprehensive documentation, removing numerous model-specific TensorRT-LLM configuration files across multiple recipe variants (aggregation, disaggregation, speculative decoding), adding a disaggregated mode README for GPT-OSS-120B, and deleting the orchestration run.sh script.

Changes

Cohort / File(s)	Summary
Documentation Updates `recipes/README.md`, `recipes/gpt-oss-120b/trtllm/disagg/README.md`	Rewrote main README with comprehensive structure including Recipe Structure, Quick Start, Prerequisites, Deployment, Benchmarking, and Troubleshooting sections. Added new README for GPT-OSS-120B disaggregated mode with status and contribution guidelines.
DeepSeek-R1 Distill TensorRT-LLM Configs Removed `recipes/deepseek-r1-distill-llama-8b/trtllm/{agg,decode,prefill}.yaml`	Deleted three model configuration files containing parallelism, batch, token limits, KV cache, and CUDA graph settings.
DeepSeek-R1 Aggregation Configs Removed `recipes/deepseek-r1/trtllm/agg/{mtp/mtp_agg,simple/agg,wide_ep/dep16_agg,wide_ep/eplb,wide_ep/wide_ep_agg}.yaml`	Deleted five aggregation configuration files spanning MTP, simple, and WideEP modes with associated parallelism and memory tuning.
DeepSeek-R1 Disaggregation Configs Removed `recipes/deepseek-r1/trtllm/disagg/{mtp/mtp_decode,mtp/mtp_prefill,simple/decode,simple/prefill,wide_ep/eplb,wide_ep/wide_ep_decode,wide_ep/wide_ep_prefill}.yaml`	Deleted seven disaggregation configuration files for decode, prefill, and load balancing across MTP, simple, and WideEP variants.
Gemma3 VSWA Configs Removed `recipes/gemma3/trtllm/{vswa_agg,vswa_decode,vswa_prefill}.yaml`	Deleted three VSWA (Variable Sliding Window Attention) configuration files with tensor parallelism and attention window settings.
Llama4 Eagle Configs Removed `recipes/llama4/trtllm/eagle/{eagle_agg.yml,eagle_decode.yaml,eagle_prefill.yaml}`	Deleted three Eagle speculative decoding configuration files with draft generation and cache settings.
Llama4 Multimodal Configs Removed `recipes/llama4/trtllm/multimodal/{agg,decode,prefill}.yaml`	Deleted three multimodal model configuration files with parallelism and chunked prefill settings.
Qwen2-VL Configs Removed `recipes/qwen2-vl-7b-instruct/trtllm/{agg,decode,encode,prefill}.yaml`	Deleted four model configuration files for aggregated and disaggregated inference modes.
Qwen3 Configs Removed `recipes/qwen3/trtllm/{agg,decode,prefill}.yaml`	Deleted three model configuration files for aggregated and disaggregated inference.
Orchestration Script Removed `recipes/run.sh`	Deleted Bash orchestration script handling model downloads, Kubernetes deployments, GAIE integration, and benchmark job management.
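
The brace-grouped paths in the table above can be cross-checked against the "34 files changed" figure in the PR description. The sketch below is illustrative only: `expand_braces` is a hypothetical helper that handles simple, non-nested `{a,b,c}` groups, and the pattern list is transcribed from the table, with the Llama4 Eagle extensions spelled out per file because `eagle_agg` uses `.yml`.

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b,c} groups in a path pattern, left to right."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace group in the tail is expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


# Transcribed from the Changes table; not read from the repository.
COHORT_PATTERNS = [
    "recipes/README.md",
    "recipes/gpt-oss-120b/trtllm/disagg/README.md",
    "recipes/deepseek-r1-distill-llama-8b/trtllm/{agg,decode,prefill}.yaml",
    "recipes/deepseek-r1/trtllm/agg/{mtp/mtp_agg,simple/agg,wide_ep/dep16_agg,wide_ep/eplb,wide_ep/wide_ep_agg}.yaml",
    "recipes/deepseek-r1/trtllm/disagg/{mtp/mtp_decode,mtp/mtp_prefill,simple/decode,simple/prefill,wide_ep/eplb,wide_ep/wide_ep_decode,wide_ep/wide_ep_prefill}.yaml",
    "recipes/gemma3/trtllm/{vswa_agg,vswa_decode,vswa_prefill}.yaml",
    "recipes/llama4/trtllm/eagle/{eagle_agg.yml,eagle_decode.yaml,eagle_prefill.yaml}",
    "recipes/llama4/trtllm/multimodal/{agg,decode,prefill}.yaml",
    "recipes/qwen2-vl-7b-instruct/trtllm/{agg,decode,encode,prefill}.yaml",
    "recipes/qwen3/trtllm/{agg,decode,prefill}.yaml",
    "recipes/run.sh",
]

all_files = [path for pattern in COHORT_PATTERNS for path in expand_braces(pattern)]
print(len(all_files))  # 34, matching the "34 files changed" summary
```

The per-cohort counts (2 + 3 + 5 + 7 + 3 + 3 + 3 + 4 + 3 + 1) add up to the same total, so the table accounts for every changed file.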

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Areas requiring extra attention:

- recipes/README.md: Verify new documentation structure is complete and accurate; confirm all deployment steps are correctly sequenced and prerequisites are comprehensive
- Configuration file deletions: Confirm these removals do not break existing deployments; validate whether any configs should be retained for backward compatibility
- recipes/run.sh removal: Verify orchestration workflow has been replaced by documented manual steps or an alternative deployment mechanism in the new README
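
One way to approach the "do not break existing deployments" check is to search the remaining tree for references to the deleted paths. This is a rough sketch under stated assumptions, not a tool from the repository: `find_dangling_references` is a hypothetical name, matching is plain substring search, and only a handful of text file types are scanned.

```python
from pathlib import Path


def find_dangling_references(
    repo_root: Path,
    removed_paths: list[str],
    suffixes: tuple[str, ...] = (".md", ".yaml", ".yml", ".sh"),
) -> list[tuple[str, int, str]]:
    """Return (file, line number, removed path) for each mention of a removed path."""
    hits: list[tuple[str, int, str]] = []
    for file in sorted(repo_root.rglob("*")):
        if not file.is_file() or file.suffix not in suffixes:
            continue
        text = file.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for removed in removed_paths:
                # Plain substring match: cheap, but may over-report on prefixes.
                if removed in line:
                    hits.append((file.relative_to(repo_root).as_posix(), line_number, removed))
    return hits
```

For example, `find_dangling_references(Path("."), ["recipes/run.sh", "recipes/qwen3/trtllm/agg.yaml"])` would list any documentation or manifest that still points at those files. An empty result suggests nothing in the scanned files references them, though it cannot prove that external users do not depend on them.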

Poem

🐰 hop hop ✨

Old configs took a leap away,
Fresh docs light the way,
Run script's role now clear,
Recipes simplified here! 🎉

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and concisely summarizes the main changes: cleanup of incomplete recipes and emphasis on Kubernetes-only focus, which aligns with the substantive file deletions and README updates in the changeset.
Description check	✅ Passed	The PR description comprehensively covers all required sections: Overview with clear objective, detailed Summary of Changes with specific file counts and categorization, and explicit documentation of what remains. However, it does not include a 'Where should the reviewer start?' section or explicit 'Related Issues' with action keywords as specified in the template.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 209783d and 4947aaf.

📒 Files selected for processing (34)

recipes/README.md (1 hunks)
recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml (0 hunks)
recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml (0 hunks)
recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml (0 hunks)
recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml (0 hunks)
recipes/deepseek-r1/trtllm/agg/simple/agg.yaml (0 hunks)
recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml (0 hunks)
recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml (0 hunks)
recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml (0 hunks)
recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml (0 hunks)
recipes/gemma3/trtllm/vswa_agg.yaml (0 hunks)
recipes/gemma3/trtllm/vswa_decode.yaml (0 hunks)
recipes/gemma3/trtllm/vswa_prefill.yaml (0 hunks)
recipes/gpt-oss-120b/trtllm/disagg/README.md (1 hunks)
recipes/llama4/trtllm/eagle/eagle_agg.yml (0 hunks)
recipes/llama4/trtllm/eagle/eagle_decode.yaml (0 hunks)
recipes/llama4/trtllm/eagle/eagle_prefill.yaml (0 hunks)
recipes/llama4/trtllm/multimodal/agg.yaml (0 hunks)
recipes/llama4/trtllm/multimodal/decode.yaml (0 hunks)
recipes/llama4/trtllm/multimodal/prefill.yaml (0 hunks)
recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml (0 hunks)
recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml (0 hunks)
recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml (0 hunks)
recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml (0 hunks)
recipes/qwen3/trtllm/agg.yaml (0 hunks)
recipes/qwen3/trtllm/decode.yaml (0 hunks)
recipes/qwen3/trtllm/prefill.yaml (0 hunks)
recipes/run.sh (0 hunks)

💤 Files with no reviewable changes (32)

recipes/qwen3/trtllm/decode.yaml
recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml
recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml
recipes/llama4/trtllm/multimodal/agg.yaml
recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml
recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml
recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml
recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml
recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml
recipes/run.sh
recipes/llama4/trtllm/eagle/eagle_prefill.yaml
recipes/llama4/trtllm/eagle/eagle_decode.yaml
recipes/gemma3/trtllm/vswa_prefill.yaml
recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml
recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml
recipes/llama4/trtllm/multimodal/decode.yaml
recipes/gemma3/trtllm/vswa_decode.yaml
recipes/llama4/trtllm/eagle/eagle_agg.yml
recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml
recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml
recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml
recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml
recipes/llama4/trtllm/multimodal/prefill.yaml
recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml
recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml
recipes/qwen3/trtllm/agg.yaml
recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml
recipes/gemma3/trtllm/vswa_agg.yaml
recipes/deepseek-r1/trtllm/agg/simple/agg.yaml
recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml
recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml
recipes/qwen3/trtllm/prefill.yaml

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

recipes/gpt-oss-120b/trtllm/disagg/README.md
recipes/README.md

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4159/merge) by BenHamm.

recipes/README.md

[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.

[error] 1-1: Pre-commit hook 'trailing-whitespace' failed and modified the file; please re-run pre-commit to verify.
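
To see which lines tripped the check before re-running pre-commit, something like the following could be used locally. It is a minimal approximation of what a trailing-whitespace fixer does, not the hook's actual implementation: `strip_trailing_whitespace` is a hypothetical helper, and it only considers spaces and tabs at the end of LF-terminated lines.

```python
def strip_trailing_whitespace(text: str) -> tuple[str, list[int]]:
    """Return the cleaned text and the 1-based numbers of lines that were changed."""
    lines = text.split("\n")
    offending = [
        number
        for number, line in enumerate(lines, start=1)
        if line != line.rstrip(" \t")
    ]
    cleaned = "\n".join(line.rstrip(" \t") for line in lines)
    return cleaned, offending


cleaned, offending = strip_trailing_whitespace("# Recipes  \n\nDeploy with kubectl\t\n")
print(offending)  # [1, 3]
```

Reading recipes/README.md, passing its contents through this function, and inspecting the returned line numbers would show where the whitespace was, since the hook output above only reports line 1-1.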

🪛 markdownlint-cli2 (0.18.1)

recipes/README.md

31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

47-47: Emphasis used instead of a heading