@BenHamm BenHamm commented Nov 6, 2025

🎯 Objective

Clean up the recipes/ directory to focus exclusively on production-ready Kubernetes deployments. Remove incomplete configurations and clarify that benchmark recipes are tools for users, not published performance results.

Related PR to release/0.6.1: #4145


📊 Summary of Changes

34 files changed: +210 insertions, -1,587 deletions

Deleted Content

  • 5 incomplete model directories (no K8s deploy.yaml manifests):

    • deepseek-r1-distill-llama-8b/
    • gemma3/
    • llama4/
    • qwen2-vl-7b-instruct/
    • qwen3/ (kept qwen3-32b-fp8/ which is complete)
  • run.sh script (228 lines) - non-K8s automation tool

  • 12 standalone engine config YAMLs from deepseek-r1/trtllm/:

    • agg/simple/, agg/mtp/, agg/wide_ep/ (all subdirs)
    • disagg/simple/, disagg/mtp/
    • disagg/wide_ep/*.yaml (standalone configs only)
    • Kept: disagg/wide_ep/gb200/ (has complete K8s manifests)
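The completeness rule used above (a model directory counts as incomplete when it has no K8s deploy.yaml manifest) can be checked mechanically. Below is a minimal sketch, not part of this PR: the function name `incomplete_recipe_dirs` is hypothetical, and it assumes that a single deploy.yaml anywhere beneath a model directory is enough to keep that directory.

```python
from pathlib import Path


def incomplete_recipe_dirs(recipes_root):
    """Return names of model directories with no deploy.yaml anywhere beneath them.

    Assumption (not from the PR): one deploy.yaml at any depth marks the
    whole model directory as deployable.
    """
    root = Path(recipes_root)
    missing = []
    # Only top-level directories are treated as model recipes; files such as
    # README.md are ignored.
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob walks every nested framework/mode subdirectory.
        if not any(model_dir.rglob("deploy.yaml")):
            missing.append(model_dir.name)
    return missing


if __name__ == "__main__":
    for name in incomplete_recipe_dirs("recipes"):
        print(f"no deploy.yaml under recipes/{name}/")
```

Because the check is per model directory, a recipe such as gpt-oss-120b, where only one mode is incomplete, would not be flagged; a per-mode check would need to know which subdirectories are expected to be deployable.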

Added/Modified Content

  • Comprehensive new recipes/README.md with:

    • Complete recipe table showing Deployment and Benchmark Recipe availability
    • Prerequisites section linking to correct K8s installation guides
    • Step-by-step deployment instructions
    • Troubleshooting section
  • gpt-oss-120b/trtllm/disagg/README.md - Documents incomplete recipe status


📦 What Remains

4 Models | 10 Complete Deployments | 7 with Benchmark Recipes

| Model | Framework | Recipes | Status |
|---|---|---|---|
| llama-3-70b | vLLM | agg, disagg-single-node, disagg-multi-node | ✅ All with deploy.yaml + perf.yaml |
| qwen3-32b-fp8 | TensorRT-LLM | agg, disagg | ✅ All with deploy.yaml + perf.yaml |
| gpt-oss-120b | TensorRT-LLM | agg, disagg* | ✅ agg complete; disagg documented as incomplete |
| deepseek-r1 | SGLang + TensorRT-LLM | sglang/disagg-8gpu, sglang/disagg-16gpu, trtllm/disagg/wide_ep/gb200 | ✅ 2 functional, 1 complete |

Summary by CodeRabbit

  • Documentation

    • Comprehensively rewrote core recipe documentation with structured guidance on prerequisites, deployment steps, customization, and troubleshooting
    • Added documentation for GPT-OSS-120B disaggregated deployment mode
  • Chores

    • Removed multiple configuration files across model recipes (deepseek, gemma3, llama4, qwen variants)
    • Removed automated deployment orchestration script

Remove incomplete model directories and non-Kubernetes configurations to
streamline the recipes directory for production Kubernetes deployments.

Changes:
- Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b,
  gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper
  Kubernetes deployment manifests
- Delete run.sh script (non-Kubernetes automation tool)
- Remove standalone engine config YAMLs from deepseek-r1/trtllm that
  were not wrapped in Kubernetes manifests
- Document incomplete gpt-oss-120b disagg recipe with README explaining
  missing components

README improvements:
- Restructure Available Recipes table with 'Deployment' and 'Benchmark
  Recipe' columns to clarify that perf.yaml files are tools for users
  to run benchmarks, not published performance results
- Add comprehensive quick start guide with prerequisites
- Link to correct Kubernetes deployment guides
- Add troubleshooting section
- Remove extraneous links (docs.nvidia.com, license section)

Result: 4 models with 10 complete deployment recipes (7 with benchmark
scripts), focused exclusively on Kubernetes deployments.

Signed-off-by: Ben Hamm <[email protected]>

Address feedback to make the README look less AI-generated by removing
decorative emojis from section headings while keeping status indicators
(✅ ❌) in tables and content.

Signed-off-by: Ben Hamm <[email protected]>

@BenHamm BenHamm requested review from a team as code owners November 6, 2025 17:21
@github-actions github-actions bot added the docs label Nov 6, 2025

coderabbitai bot commented Nov 6, 2025

Walkthrough

This PR restructures the recipes directory by rewriting the main README with comprehensive documentation, removing numerous model-specific TensorRT-LLM configuration files across multiple recipe variants (aggregation, disaggregation, speculative decoding), adding a disaggregated mode README for GPT-OSS-120B, and deleting the orchestration run.sh script.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: `recipes/README.md`, `recipes/gpt-oss-120b/trtllm/disagg/README.md` | Rewrote main README with comprehensive structure including Recipe Structure, Quick Start, Prerequisites, Deployment, Benchmarking, and Troubleshooting sections. Added new README for GPT-OSS-120B disaggregated mode with status and contribution guidelines. |
| DeepSeek-R1 Distill TensorRT-LLM Configs Removed: `recipes/deepseek-r1-distill-llama-8b/trtllm/{agg,decode,prefill}.yaml` | Deleted three model configuration files containing parallelism, batch, token limits, KV cache, and CUDA graph settings. |
| DeepSeek-R1 Aggregation Configs Removed: `recipes/deepseek-r1/trtllm/agg/{mtp/mtp_agg,simple/agg,wide_ep/dep16_agg,wide_ep/eplb,wide_ep/wide_ep_agg}.yaml` | Deleted five aggregation configuration files spanning MTP, simple, and WideEP modes with associated parallelism and memory tuning. |
| DeepSeek-R1 Disaggregation Configs Removed: `recipes/deepseek-r1/trtllm/disagg/{mtp/mtp_decode,mtp/mtp_prefill,simple/decode,simple/prefill,wide_ep/eplb,wide_ep/wide_ep_decode,wide_ep/wide_ep_prefill}.yaml` | Deleted seven disaggregation configuration files for decode, prefill, and load balancing across MTP, simple, and WideEP variants. |
| Gemma3 VSWA Configs Removed: `recipes/gemma3/trtllm/{vswa_agg,vswa_decode,vswa_prefill}.yaml` | Deleted three VSWA (Variable Sliding Window Attention) configuration files with tensor parallelism and attention window settings. |
| Llama4 Eagle Configs Removed: `recipes/llama4/trtllm/eagle/{eagle_agg.yml,eagle_decode.yaml,eagle_prefill.yaml}` | Deleted three Eagle speculative decoding configuration files with draft generation and cache settings. |
| Llama4 Multimodal Configs Removed: `recipes/llama4/trtllm/multimodal/{agg,decode,prefill}.yaml` | Deleted three multimodal model configuration files with parallelism and chunked prefill settings. |
| Qwen2-VL Configs Removed: `recipes/qwen2-vl-7b-instruct/trtllm/{agg,decode,encode,prefill}.yaml` | Deleted four model configuration files for aggregated and disaggregated inference modes. |
| Qwen3 Configs Removed: `recipes/qwen3/trtllm/{agg,decode,prefill}.yaml` | Deleted three model configuration files for aggregated and disaggregated inference. |
| Orchestration Script Removed: `recipes/run.sh` | Deleted Bash orchestration script handling model downloads, Kubernetes deployments, GAIE integration, and benchmark job management. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Areas requiring extra attention:

  • recipes/README.md: Verify new documentation structure is complete and accurate; confirm all deployment steps are correctly sequenced and prerequisites are comprehensive
  • Configuration file deletions: Confirm these removals do not break existing deployments; validate whether any configs should be retained for backward compatibility
  • recipes/run.sh removal: Verify orchestration workflow has been replaced by documented manual steps or an alternative deployment mechanism in the new README

Poem

🐰 hop hop

Old configs took a leap away,
Fresh docs light the way,
Run script's role now clear,
Recipes simplified here! 🎉

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the main changes: cleanup of incomplete recipes and emphasis on Kubernetes-only focus, which aligns with the substantive file deletions and README updates in the changeset. |
| Description check | ✅ Passed | The PR description comprehensively covers all required sections: Overview with clear objective, detailed Summary of Changes with specific file counts and categorization, and explicit documentation of what remains. However, it does not include a 'Where should the reviewer start?' section or explicit 'Related Issues' with action keywords as specified in the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 209783d and 4947aaf.

📒 Files selected for processing (34)
  • recipes/README.md (1 hunks)
  • recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml (0 hunks)
  • recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml (0 hunks)
  • recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/agg/simple/agg.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml (0 hunks)
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml (0 hunks)
  • recipes/gemma3/trtllm/vswa_agg.yaml (0 hunks)
  • recipes/gemma3/trtllm/vswa_decode.yaml (0 hunks)
  • recipes/gemma3/trtllm/vswa_prefill.yaml (0 hunks)
  • recipes/gpt-oss-120b/trtllm/disagg/README.md (1 hunks)
  • recipes/llama4/trtllm/eagle/eagle_agg.yml (0 hunks)
  • recipes/llama4/trtllm/eagle/eagle_decode.yaml (0 hunks)
  • recipes/llama4/trtllm/eagle/eagle_prefill.yaml (0 hunks)
  • recipes/llama4/trtllm/multimodal/agg.yaml (0 hunks)
  • recipes/llama4/trtllm/multimodal/decode.yaml (0 hunks)
  • recipes/llama4/trtllm/multimodal/prefill.yaml (0 hunks)
  • recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml (0 hunks)
  • recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml (0 hunks)
  • recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml (0 hunks)
  • recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml (0 hunks)
  • recipes/qwen3/trtllm/agg.yaml (0 hunks)
  • recipes/qwen3/trtllm/decode.yaml (0 hunks)
  • recipes/qwen3/trtllm/prefill.yaml (0 hunks)
  • recipes/run.sh (0 hunks)
💤 Files with no reviewable changes (32)
  • recipes/qwen3/trtllm/decode.yaml
  • recipes/deepseek-r1/trtllm/agg/wide_ep/dep16_agg.yaml
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/eplb.yaml
  • recipes/llama4/trtllm/multimodal/agg.yaml
  • recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml
  • recipes/deepseek-r1/trtllm/disagg/mtp/mtp_decode.yaml
  • recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml
  • recipes/deepseek-r1/trtllm/disagg/simple/decode.yaml
  • recipes/deepseek-r1/trtllm/disagg/simple/prefill.yaml
  • recipes/run.sh
  • recipes/llama4/trtllm/eagle/eagle_prefill.yaml
  • recipes/llama4/trtllm/eagle/eagle_decode.yaml
  • recipes/gemma3/trtllm/vswa_prefill.yaml
  • recipes/deepseek-r1/trtllm/disagg/mtp/mtp_prefill.yaml
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml
  • recipes/llama4/trtllm/multimodal/decode.yaml
  • recipes/gemma3/trtllm/vswa_decode.yaml
  • recipes/llama4/trtllm/eagle/eagle_agg.yml
  • recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml
  • recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml
  • recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml
  • recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml
  • recipes/llama4/trtllm/multimodal/prefill.yaml
  • recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml
  • recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml
  • recipes/qwen3/trtllm/agg.yaml
  • recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml
  • recipes/gemma3/trtllm/vswa_agg.yaml
  • recipes/deepseek-r1/trtllm/agg/simple/agg.yaml
  • recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml
  • recipes/qwen2-vl-7b-instruct/trtllm/encode.yaml
  • recipes/qwen3/trtllm/prefill.yaml
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

  • recipes/gpt-oss-120b/trtllm/disagg/README.md
  • recipes/README.md
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4159/merge) by BenHamm.
recipes/README.md

[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'.


[error] 1-1: Pre-commit hook 'trailing-whitespace' failed and modified the file; please re-run pre-commit to verify.

🪛 markdownlint-cli2 (0.18.1)
recipes/README.md

  • 31-31: Fenced code blocks should have a language specified (MD040, fenced-code-language)
  • 47-47: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 54-54: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 61-61: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 75-75: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 90-90: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 103-103: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 118-118: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
  • 137-137: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)
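
For readers unfamiliar with the two rules reported above, the sketch below approximates what they look for: an opening code fence with no language after it (MD040) and a line made up entirely of bold or italic text standing in for a heading (MD036). It is a simplified illustration, not markdownlint's implementation; the function name `lint_markdown` and both regular expressions are assumptions, and the real rules apply extra conditions (for example, MD036 also looks at trailing punctuation).

```python
import re

# A line that is nothing but one emphasized run, e.g. **Prerequisites**.
EMPHASIS_ONLY = re.compile(r"^\s*(\*\*|__|\*|_)(?=\S)(.+?)(?<=\S)\1\s*$")
# An opening or closing code fence, with any info string captured in group 2.
FENCE = re.compile(r"^\s*(`{3,}|~{3,})(.*)$")


def lint_markdown(text):
    """Return (line_number, rule) pairs for simplified MD036 and MD040 checks."""
    findings = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        fence = FENCE.match(line)
        if fence:
            # Only an opening fence needs a language; a closing fence has none.
            if not in_fence and not fence.group(2).strip():
                findings.append((number, "MD040"))
            in_fence = not in_fence
            continue
        # Emphasis inside a code block is literal text, so skip it.
        if not in_fence and EMPHASIS_ONLY.match(line):
            findings.append((number, "MD036"))
    return findings
```

The usual fixes are to add a language such as bash or yaml after the opening fence, and to turn the emphasized line into a real heading at the appropriate level.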

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
recipes/gpt-oss-120b/trtllm/disagg/README.md (2)

1-25: Documentation structure and messaging are clear and appropriate.

The file effectively communicates the incomplete status of this recipe:

  • The ⚠️ warning banner prominently flags incompleteness
  • Sections are well-organized (Current Status → Missing Components → Alternative → Contributing)
  • Relative links to aggregated mode and contribution guidelines are helpful
  • Content aligns with the PR objective to document incomplete recipes

1-25: All referenced files and paths verified—no issues found.

The README accurately references existing files and directories with correct relative paths. The warning indicator and incomplete status are appropriate, and the suggested alternative link to the aggregated mode is valid.

recipes/README.md (1)

1-270: Remove trailing whitespace detected by pre-commit hook.

The pre-commit hook `trailing-whitespace` detected and fixed trailing whitespace in this file. Re-run `pre-commit run --all-files` locally to identify and remove any remaining trailing whitespace before pushing.
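
To see which lines trip this check before re-running the hook, a small stand-alone sketch is shown below. It is not the hook's own code: both function names are hypothetical, and it treats only spaces and tabs at the end of a line as trailing whitespace.

```python
def lines_with_trailing_whitespace(text):
    """Return 1-based numbers of lines that end in a space or a tab."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text):
    """Return text with trailing spaces and tabs removed from every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


if __name__ == "__main__":
    with open("recipes/README.md", encoding="utf-8") as handle:
        content = handle.read()
    for number in lines_with_trailing_whitespace(content):
        print(f"recipes/README.md:{number}: trailing whitespace")
```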

⛔ Skipped due to learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

@BenHamm BenHamm enabled auto-merge (squash) November 6, 2025 18:08

@athreesh athreesh left a comment

Added a comment about hyperlinks but otherwise LGTM

BenHamm commented Nov 6, 2025

✅ Addressed feedback:

  • Added hyperlinks to the Recipes table (all model names now link to their recipe directories)
  • Fixed trailing whitespace issue
  • Ready for review

@dagil-nvidia

/ok to test e0592e6
