feat: add PDL discount factor for DeepSeek model on GB200 #616
Draft
liyuanzhe1991 wants to merge 1 commit into ai-dynamo:main from
Conversation
Add `sm_version` parameter to `get_model()` and pass it from the database `system_spec` through `pareto_analysis` and `inference_session` call sites. `DeepSeekModel` and `TrtllmWideEPDeepSeekModel` now both use a unified conditional PDL factor: 0.9 for SM >= 100 (Blackwell/GB200), 1.0 otherwise. This applies the PDL discount to all generation-phase ops in the cutlass MoE path, matching the existing WideEP behavior.

Signed-off-by: Yuanzhe Li <yuanli@nvidia.com>
Made-with: Cursor
Overview:

Previously, only `TrtllmWideEPDeepSeekModel` had a PDL (Programmatic Dependent Launch) discount factor (hardcoded 0.9), while `DeepSeekModel` (cutlass MoE backend) had none. This meant GB200 non-WideEP DeepSeek configurations did not benefit from the PDL latency reduction in generation-phase ops.

This PR adds the same PDL discount to `DeepSeekModel` and unifies both models to use a conditional factor based on SM version, making the behavior consistent and architecture-aware.
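The unified conditional factor can be sketched as follows. This is a hypothetical minimal sketch, not the actual aiconfigurator code: the class internals and the `generation_op_scale` helper are simplified stand-ins, and only the attribute names (`_pdl_factor`, `_num_layers`, `_mtp_scale_factor`) and the `sm_version >= 100` condition come from the PR description.

```python
# Minimal sketch of the unified PDL discount (assumed simplification of
# models.py; real generation ops and layer structure are omitted).

class DeepSeekModel:
    def __init__(self, num_layers: int, mtp_scale_factor: float = 1.0,
                 sm_version: int = 0):
        self._num_layers = num_layers
        self._mtp_scale_factor = mtp_scale_factor
        # PDL discount: 0.9 on SM >= 100 (Blackwell/GB200), none otherwise.
        self._pdl_factor = 0.9 if sm_version >= 100 else 1.0

    def generation_op_scale(self) -> float:
        # Scale applied to generation-phase op latencies; in the real model,
        # embedding, logits_gemm, and p2p are excluded from this factor.
        return self._num_layers * self._mtp_scale_factor * self._pdl_factor


print(DeepSeekModel(num_layers=61, sm_version=100).generation_op_scale())  # GB200
print(DeepSeekModel(num_layers=61, sm_version=90).generation_op_scale())   # pre-Blackwell
```

On SM >= 100 every generation-phase op latency is multiplied by an extra 0.9, while older architectures are unaffected, which is the behavior the PR describes.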
Details:

- `models.py`: `get_model()` gains an optional `sm_version: int = 0` parameter, propagated from the caller's database.
- `DeepSeekModel.__init__` gains an `sm_version=0` keyword arg and sets `self._pdl_factor = 0.9 if sm_version >= 100 else 1.0`. All generation-phase ops now multiply `self._num_layers * self._mtp_scale_factor * self._pdl_factor` (embedding, logits_gemm, and p2p are excluded, matching the WideEP convention).
- `TrtllmWideEPDeepSeekModel.__init__` is updated from the hardcoded `self._pdl_factor = 0.9` to the same conditional pattern `0.9 if sm_version >= 100 else 1.0`, with `sm_version` passed from `get_model()`.
- `get_model()` passes `sm_version=sm_version` when constructing both `DeepSeekModel` and `TrtllmWideEPDeepSeekModel`.
- `pareto_analysis.py`: the `agg_pareto()` call to `get_model()` now passes `sm_version=database.system_spec["gpu"]["sm_version"]`.
- `inference_session.py`: `models.get_model()` call sites (`_get_disagg_summary_df` prefill/decode, `get_worker_candidates`) now pass `sm_version` extracted from the corresponding database's `system_spec`.
- `test_inference_session.py`: the `_fake_get_model` signature is updated to accept `sm_version=0` for compatibility.
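The factory and call-site wiring listed above can be sketched as below. This is an illustrative sketch under assumptions: the `get_model` name, the `sm_version` keyword, and the `system_spec["gpu"]["sm_version"]` layout follow the PR text, but the model classes, the `model_name` dispatch, and the dict-based database are simplified stand-ins for the real aiconfigurator objects.

```python
# Sketch of the sm_version plumbing (assumed shapes; the real get_model
# factory and database objects in aiconfigurator differ).

class DeepSeekModel:
    def __init__(self, sm_version: int = 0):
        # Unified conditional PDL factor, per the PR.
        self._pdl_factor = 0.9 if sm_version >= 100 else 1.0

class TrtllmWideEPDeepSeekModel(DeepSeekModel):
    pass

def get_model(model_name: str, sm_version: int = 0):
    # Both DeepSeek variants receive the SM version so each derives its
    # own conditional PDL factor (hypothetical dispatch key).
    if model_name == "deepseek_wideep":
        return TrtllmWideEPDeepSeekModel(sm_version=sm_version)
    return DeepSeekModel(sm_version=sm_version)

# Call site, as in pareto_analysis.agg_pareto(): the SM version comes
# from the database's GPU system spec.
system_spec = {"gpu": {"sm_version": 100}}  # GB200
model = get_model("deepseek", sm_version=system_spec["gpu"]["sm_version"])
print(model._pdl_factor)
```

The point of the wiring is that only the database knows the target GPU, so the SM version flows from `system_spec` through the factory into each model constructor rather than being hardcoded.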
Where should the reviewer start?

- `src/aiconfigurator/sdk/models.py` — the core change: `DeepSeekModel.__init__` (line ~1079) for the new `sm_version` parameter and the `_pdl_factor` conditional, and the generation ops block (lines ~1286–1440) where `self._pdl_factor` is now applied. Also `TrtllmWideEPDeepSeekModel.__init__` (line ~1465) for the unified conditional.
- `src/aiconfigurator/sdk/models.py` — `get_model()` (line ~152) for the new parameter and factory wiring.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)