feat: add PDL discount factor for DeepSeek model on GB200 #616
Draft
liyuanzhe1991 wants to merge 1 commit into ai-dynamo:main from
Conversation
Add `sm_version` parameter to `get_model()` and pass it from the database `system_spec` through `pareto_analysis` and `inference_session` call sites. `DeepSeekModel` and `TrtllmWideEPDeepSeekModel` now both use a unified conditional PDL factor: 0.9 for SM >= 100 (Blackwell/GB200), 1.0 otherwise. This applies the PDL discount to all generation-phase ops in the cutlass MoE path, matching the existing WideEP behavior.

Signed-off-by: Yuanzhe Li <yuanli@nvidia.com>
Made-with: Cursor
Overview:

Previously, only `TrtllmWideEPDeepSeekModel` had a PDL (Programmatic Dependent Launch) discount factor (hardcoded 0.9), while `DeepSeekModel` (cutlass MoE backend) had none. This meant GB200 non-WideEP DeepSeek configurations did not benefit from the PDL latency reduction in generation-phase ops.

This PR adds the same PDL discount to `DeepSeekModel` and unifies both models to use a conditional factor based on SM version, making the behavior consistent and architecture-aware.
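The unified conditional factor can be sketched as follows. This is a hypothetical minimal sketch, not the actual aiconfigurator code: the class internals and the `generation_op_scale` helper are simplified stand-ins, and only the attribute names (`_pdl_factor`, `_num_layers`, `_mtp_scale_factor`) and the `sm_version >= 100` condition come from the PR description.

```python
# Minimal sketch of the unified PDL discount (assumed simplification of
# models.py; real generation ops and layer structure are omitted).

class DeepSeekModel:
    def __init__(self, num_layers: int, mtp_scale_factor: float = 1.0,
                 sm_version: int = 0):
        self._num_layers = num_layers
        self._mtp_scale_factor = mtp_scale_factor
        # PDL discount: 0.9 on SM >= 100 (Blackwell/GB200), none otherwise.
        self._pdl_factor = 0.9 if sm_version >= 100 else 1.0

    def generation_op_scale(self) -> float:
        # Scale applied to generation-phase op latencies; in the real model,
        # embedding, logits_gemm, and p2p are excluded from this factor.
        return self._num_layers * self._mtp_scale_factor * self._pdl_factor


print(DeepSeekModel(num_layers=61, sm_version=100).generation_op_scale())  # GB200
print(DeepSeekModel(num_layers=61, sm_version=90).generation_op_scale())   # pre-Blackwell
```

On SM >= 100 every generation-phase op latency is multiplied by an extra 0.9, while older architectures are unaffected, which is the behavior the PR describes.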
Details:

- `models.py`: `get_model()` gains an optional `sm_version: int = 0` parameter, propagated from the caller's database.
- `DeepSeekModel.__init__` gains an `sm_version=0` keyword arg and sets `self._pdl_factor = 0.9 if sm_version >= 100 else 1.0`. All generation-phase ops now multiply `self._num_layers * self._mtp_scale_factor * self._pdl_factor` (embedding, logits_gemm, and p2p are excluded, matching the WideEP convention).
- `TrtllmWideEPDeepSeekModel.__init__` is updated from the hardcoded `self._pdl_factor = 0.9` to the same conditional pattern `0.9 if sm_version >= 100 else 1.0`, with `sm_version` passed from `get_model()`.
- `get_model()` passes `sm_version=sm_version` when constructing both `DeepSeekModel` and `TrtllmWideEPDeepSeekModel`.
- `pareto_analysis.py`: the `agg_pareto()` call to `get_model()` now passes `sm_version=database.system_spec["gpu"]["sm_version"]`.
- `inference_session.py`: `models.get_model()` call sites (`_get_disagg_summary_df` prefill/decode, `get_worker_candidates`) now pass `sm_version` extracted from the corresponding database's `system_spec`.
- `test_inference_session.py`: the `_fake_get_model` signature is updated to accept `sm_version=0` for compatibility.
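The factory and call-site wiring listed above can be sketched as below. This is an illustrative sketch under assumptions: the `get_model` name, the `sm_version` keyword, and the `system_spec["gpu"]["sm_version"]` layout follow the PR text, but the model classes, the `model_name` dispatch, and the dict-based database are simplified stand-ins for the real aiconfigurator objects.

```python
# Sketch of the sm_version plumbing (assumed shapes; the real get_model
# factory and database objects in aiconfigurator differ).

class DeepSeekModel:
    def __init__(self, sm_version: int = 0):
        # Unified conditional PDL factor, per the PR.
        self._pdl_factor = 0.9 if sm_version >= 100 else 1.0

class TrtllmWideEPDeepSeekModel(DeepSeekModel):
    pass

def get_model(model_name: str, sm_version: int = 0):
    # Both DeepSeek variants receive the SM version so each derives its
    # own conditional PDL factor (hypothetical dispatch key).
    if model_name == "deepseek_wideep":
        return TrtllmWideEPDeepSeekModel(sm_version=sm_version)
    return DeepSeekModel(sm_version=sm_version)

# Call site, as in pareto_analysis.agg_pareto(): the SM version comes
# from the database's GPU system spec.
system_spec = {"gpu": {"sm_version": 100}}  # GB200
model = get_model("deepseek", sm_version=system_spec["gpu"]["sm_version"])
print(model._pdl_factor)
```

The point of the wiring is that only the database knows the target GPU, so the SM version flows from `system_spec` through the factory into each model constructor rather than being hardcoded.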
Where should the reviewer start?

- `src/aiconfigurator/sdk/models.py` — the core change: `DeepSeekModel.__init__` (line ~1079) for the new `sm_version` parameter and the `_pdl_factor` conditional, and the generation ops block (lines ~1286–1440) where `self._pdl_factor` is now applied. Also `TrtllmWideEPDeepSeekModel.__init__` (line ~1465) for the unified conditional.
- `src/aiconfigurator/sdk/models.py` — `get_model()` (line ~152) for the new parameter and factory wiring.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)