-
Notifications
You must be signed in to change notification settings - Fork 674
docs: Clean up incomplete recipes and clarify Kubernetes-only focus #4159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Remove incomplete model directories and non-Kubernetes configurations to streamline the recipes directory for production Kubernetes deployments. Changes: - Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b, gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper Kubernetes deployment manifests - Delete run.sh script (non-Kubernetes automation tool) - Remove standalone engine config YAMLs from deepseek-r1/trtllm that were not wrapped in Kubernetes manifests - Document incomplete gpt-oss-120b disagg recipe with README explaining missing components README improvements: - Restructure Available Recipes table with 'Deployment' and 'Benchmark Recipe' columns to clarify that perf.yaml files are tools for users to run benchmarks, not published performance results - Add comprehensive quick start guide with prerequisites - Link to correct Kubernetes deployment guides - Add troubleshooting section - Remove extraneous links (docs.nvidia.com, license section) Result: 4 models with 10 complete deployment recipes (7 with benchmark scripts), focused exclusively on Kubernetes deployments. Signed-off-by: Ben Hamm <[email protected]>
Address feedback to make the README less AI-generated looking by removing decorative emojis from section headings while keeping status indicators (✅ ❌) in tables and content. Signed-off-by: Ben Hamm <[email protected]>
WalkthroughThis PR restructures the recipes directory by rewriting the main README with comprehensive documentation, removing numerous model-specific TensorRT-LLM configuration files across multiple recipe variants (aggregation, disaggregation, speculative decoding), adding a disaggregated mode README for GPT-OSS-120B, and deleting the orchestration run.sh script. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Areas requiring extra attention:
Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
recipes/README.md(1 hunks)recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml(0 hunks)recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml(0 hunks)recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml(0 hunks)recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml(0 hunks)recipes/deepseek-r1/trtllm/agg/simple/agg.yaml(0 hunks)recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml(0 hunks)recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml(0 hunks)recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml(0 hunks)recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml(0 hunks)recipes/gemma3/trtllm/vswa_agg.yaml(0 hunks)recipes/gemma3/trtllm/vswa_decode.yaml(0 hunks)recipes/gemma3/trtllm/vswa_prefill.yaml(0 hunks)recipes/gpt-oss-120b/trtllm/disagg/README.md(1 hunks)recipes/llama4/trtllm/eagle/eagle_agg.yml(0 hunks)recipes/llama4/trtllm/eagle/eagle_decode.yaml(0 hunks)recipes/llama4/trtllm/eagle/eagle_prefill.yaml(0 hunks)recipes/llama4/trtllm/multimodal/agg.yaml(0 hunks)recipes/llama4/trtllm/multimodal/decode.yaml(0 hunks)recipes/llama4/trtllm/multimodal/prefill.yaml(0 hunks)recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml(0 hunks)recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml(0 hunks)recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml(0 hunks)recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml(0 hunks)recipes/qwen3/trtllm/agg.yaml(0 hunks)recipes/qwen3/trtllm/decode.yaml(0 hunks)recipes/qwen3/trtllm/prefill.yaml(0 hunks)recipes/run.sh(0 hunks)
💤 Files with no reviewable changes (32)
- recipes/qwen3/trtllm/decode.yaml
- recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml
- recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml
- recipes/llama4/trtllm/multimodal/agg.yaml
- recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml
- recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml
- recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml
- recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml
- recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml
- recipes/run.sh
- recipes/llama4/trtllm/eagle/eagle_prefill.yaml
- recipes/llama4/trtllm/eagle/eagle_decode.yaml
- recipes/gemma3/trtllm/vswa_prefill.yaml
- recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml
- recipes/llama4/trtllm/multimodal/decode.yaml
- recipes/gemma3/trtllm/vswa_decode.yaml
- recipes/llama4/trtllm/eagle/eagle_agg.yml
- recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml
- recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml
- recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml
- recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml
- recipes/llama4/trtllm/multimodal/prefill.yaml
- recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml
- recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml
- recipes/qwen3/trtllm/agg.yaml
- recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml
- recipes/gemma3/trtllm/vswa_agg.yaml
- recipes/deepseek-r1/trtllm/agg/simple/agg.yaml
- recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml
- recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml
- recipes/qwen3/trtllm/prefill.yaml
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
recipes/gpt-oss-120b/trtllm/disagg/README.mdrecipes/README.md
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4159/merge) by BenHamm.
recipes/README.md
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.
[error] 1-1: Pre-commit hook 'trailing-whitespace' failed and modified the file; please re-run pre-commit to verify.
🪛 markdownlint-cli2 (0.18.1)
recipes/README.md
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
47-47: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
54-54: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
61-61: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
75-75: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
90-90: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
103-103: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
118-118: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
137-137: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
recipes/gpt-oss-120b/trtllm/disagg/README.md (2)
1-25: Documentation structure and messaging are clear and appropriate.The file effectively communicates the incomplete status of this recipe:
- The
⚠️ warning banner prominently flags incompleteness- Sections are well-organized (Current Status → Missing Components → Alternative → Contributing)
- Relative links to aggregated mode and contribution guidelines are helpful
- Content aligns with the PR objective to document incomplete recipes
1-25: All referenced files and paths verified—no issues found.The README accurately references existing files and directories with correct relative paths. The warning indicator and incomplete status are appropriate, and the suggested alternative link to the aggregated mode is valid.
recipes/README.md (1)
1-270: Remove trailing whitespace detected by pre-commit hook.The pre-commit hook
trailing-whitespacedetected and fixed trailing whitespace in this file. Re-runpre-commit run --all-fileslocally to identify and remove any remaining trailing whitespace before pushing.⛔ Skipped due to learnings
Learnt from: biswapanda Repo: ai-dynamo/dynamo PR: 3858 File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32 Timestamp: 2025-10-24T04:21:08.751Z Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.Learnt from: biswapanda Repo: ai-dynamo/dynamo PR: 2872 File: examples/multimodal/deploy/agg_qwen.yaml:53-60 Timestamp: 2025-09-04T19:03:06.643Z Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment about hyperlinks but otherwise LGTM
|
✅ Addressed feedback:
|
|
/ok to test e0592e6 |
🎯 Objective
Clean up the
recipes/directory to focus exclusively on production-ready Kubernetes deployments. Remove incomplete configurations and clarify that benchmark recipes are tools for users, not published performance results.Related PR to release/0.6.1: #4145
📊 Summary of Changes
34 files changed: +210 insertions, -1,587 deletions
Deleted Content
5 incomplete model directories (no K8s
deploy.yamlmanifests):deepseek-r1-distill-llama-8b/gemma3/llama4/qwen2-vl-7b-instruct/qwen3/(keptqwen3-32b-fp8/which is complete)run.shscript (228 lines) - non-K8s automation tool12 standalone engine config YAMLs from
deepseek-r1/trtllm/:agg/simple/,agg/mtp/,agg/wide_ep/(all subdirs)disagg/simple/,disagg/mtp/disagg/wide_ep/*.yaml(standalone configs only)disagg/wide_ep/gb200/(has complete K8s manifests)Added/Modified Content
Comprehensive new
recipes/README.mdwith:gpt-oss-120b/trtllm/disagg/README.md- Documents incomplete recipe status📦 What Remains
4 Models | 10 Complete Deployments | 7 with Benchmark Recipes
Summary by CodeRabbit
Documentation
Chores