feat: add PDL discount factor for DeepSeek model on GB200 #616

Draft
liyuanzhe1991 wants to merge 1 commit into ai-dynamo:main from liyuanzhe1991:feat/cutlass-moe-pdl-factor

Conversation

@liyuanzhe1991
Contributor

Overview:

Previously, only TrtllmWideEPDeepSeekModel had a PDL (Programmatic Dependent Launch) discount factor (hardcoded 0.9), while DeepSeekModel (cutlass MoE backend) had none. This meant GB200 non-WideEP DeepSeek configurations did not benefit from the PDL latency reduction in generation-phase ops.
This PR adds the same PDL discount to DeepSeekModel and unifies both models to use a conditional factor based on SM version, making the behavior consistent and architecture-aware.
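The unified conditional described above can be sketched as a single helper; the function name here is illustrative (the PR sets an instance attribute rather than calling a standalone function), and the SM-version threshold is taken from the PR text:

```python
# Sketch of the unified PDL discount from this PR (hypothetical helper name).
# SM version >= 100 corresponds to Blackwell-class GPUs such as GB200, where
# Programmatic Dependent Launch (PDL) reduces generation-phase latency.

def pdl_factor(sm_version: int) -> float:
    """Return the PDL discount applied to generation-phase op latencies."""
    return 0.9 if sm_version >= 100 else 1.0
```

Older architectures (SM < 100) get a factor of 1.0, i.e. no discount, which preserves their previous modeled latencies.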

Details:

  • models.py:
    • get_model() gains an optional sm_version: int = 0 parameter, propagated from the caller's database.
    • DeepSeekModel.__init__ gains sm_version=0 keyword arg; sets self._pdl_factor = 0.9 if sm_version >= 100 else 1.0. All generation-phase ops now multiply self._num_layers * self._mtp_scale_factor * self._pdl_factor (embedding, logits_gemm, and p2p are excluded, matching WideEP convention).
    • TrtllmWideEPDeepSeekModel.__init__ updated from hardcoded self._pdl_factor = 0.9 to the same conditional pattern 0.9 if sm_version >= 100 else 1.0, with sm_version passed from get_model().
    • Factory function get_model() passes sm_version=sm_version when constructing both DeepSeekModel and TrtllmWideEPDeepSeekModel.
  • pareto_analysis.py:
    • agg_pareto() call to get_model() now passes sm_version=database.system_spec["gpu"]["sm_version"].
  • inference_session.py:
    • All three models.get_model() call sites (_get_disagg_summary_df prefill/decode, get_worker_candidates) now pass sm_version extracted from the corresponding database's system_spec.
  • test_inference_session.py:
    • Mock _fake_get_model signature updated to accept sm_version=0 for compatibility.
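The model and factory changes above can be sketched as follows. This is a hedged, minimal sketch: the real classes in `src/aiconfigurator/sdk/models.py` take many more parameters, and the layer count and MTP scale used here are illustrative placeholders, not the real defaults:

```python
# Minimal sketch of the PR's wiring; real signatures differ in detail.

class DeepSeekModel:
    def __init__(self, sm_version: int = 0) -> None:
        # New in this PR: architecture-aware PDL discount.
        self._pdl_factor = 0.9 if sm_version >= 100 else 1.0
        self._num_layers = 61          # illustrative value only
        self._mtp_scale_factor = 1.0   # illustrative value only

    def generation_op_scale(self) -> float:
        # Generation-phase ops now multiply in the PDL factor; per the PR,
        # embedding, logits_gemm, and p2p are excluded in the real code.
        return self._num_layers * self._mtp_scale_factor * self._pdl_factor


def get_model(name: str, sm_version: int = 0) -> DeepSeekModel:
    # The factory now forwards sm_version when constructing the model
    # (both DeepSeekModel and TrtllmWideEPDeepSeekModel in the real code).
    return DeepSeekModel(sm_version=sm_version)
```

With `sm_version=100` the generation-phase scale shrinks by 10%; with the default `sm_version=0` behavior is unchanged, which is why the test mock only needs to accept the new keyword.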

Where should the reviewer start?

  • src/aiconfigurator/sdk/models.py — the core change: DeepSeekModel.__init__ (line ~1079) for the new sm_version parameter and _pdl_factor conditional, and the generation ops block (lines ~1286–1440) where self._pdl_factor is now applied. Also TrtllmWideEPDeepSeekModel.__init__ (line ~1465) for the unified conditional.
  • src/aiconfigurator/sdk/models.py get_model() (line ~152) for the new parameter and factory wiring.

Related Issues: none linked.

Add sm_version parameter to get_model() and pass it from database
system_spec through pareto_analysis and inference_session call sites.

DeepSeekModel and TrtllmWideEPDeepSeekModel now both use a unified
conditional PDL factor: 0.9 for SM>=100 (Blackwell/GB200), 1.0
otherwise. This applies the PDL discount to all generation-phase ops
in the cutlass MoE path, matching the existing WideEP behavior.
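The call-site propagation can be sketched like this; `FakeDatabase` is a stand-in for the real database object (only `system_spec` is modeled), and `get_model` here returns just the derived factor rather than a model:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    # Stand-in for the real database; the PR reads
    # database.system_spec["gpu"]["sm_version"] at each call site.
    system_spec: dict = field(
        default_factory=lambda: {"gpu": {"sm_version": 100}}
    )


def get_model(sm_version: int = 0) -> float:
    # Stand-in for models.get_model(); returns only the PDL factor.
    return 0.9 if sm_version >= 100 else 1.0


db = FakeDatabase()
sm_version = db.system_spec["gpu"]["sm_version"]
pdl = get_model(sm_version=sm_version)
```

Because `sm_version` defaults to 0 (factor 1.0), call sites that are not yet updated keep their old behavior.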

Signed-off-by: Yuanzhe Li <yuanli@nvidia.com>
Made-with: Cursor
@copy-pr-bot

copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
