[BugFix] fix paddle optional get assert in sm103 #7816
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-14 17:12:09
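📋 Review Summary
PR overview: Adds the correct gencode flags and the -DNDEBUG compile option for the SM103 architecture (a Blackwell variant), fixing the paddle optional assertion failure
Scope of changes: custom_ops/setup_ops.py
Impact tag: [OP]
📝 PR Convention Check
The Usage or Command and Accuracy Tests sections contain only HTML comment placeholders (<!-- ... -->); per the conventions they should be filled in with actual content or N/A.
Suggested title (can be copied directly):
[BugFix] fix paddle optional get assert in sm103 (the existing title already follows the format; no change needed)
Suggested PR description (can be copied directly):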
## Motivation
Fix paddle optional get assert in sm103, as follows:
`paddle::optional<T>::get() const [with T = paddle::Tensor]: Assertion 'this->is_initialized()' failed` (in `paddle/utils/optional.h:563`)
## Modifications
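In `custom_ops/setup_ops.py`, make the following changes for the SM103 architecture:
1. Add a `cc_val == 103` branch in `get_gencode_flags()` that uses the `arch=compute_103a,code=sm_103a` gencode flags
2. Add the detection variable `has_sm103 = 103 in sm_versions and nvcc_version >= 13.0` (requires NVCC ≥ 13.0)
3. Update `has_generic_fp8` to exclude SM103 so that it does not take the generic FP8 path
4. Add SM103 to the `-O3 -DNDEBUG` compile-argument block (fixes the root cause of the paddle optional assert)
5. Place SM103 in the SM100 Blackwell configuration block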
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
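- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Issues
| Severity | File | Summary |
|---|---|---|
| 📝 PR convention | PR description | The Usage or Command and Accuracy Tests sections contain only HTML comments; fill in N/A |
Overall Assessment
The SM103 compile-support logic is clear and consistent with the existing SM90/SM100 handling, and the -DNDEBUG fix follows project conventions. Only the two empty sections of the PR description need N/A; there are no blocking technical issues.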
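For context on the assertion quoted in the suggested Motivation, the following is a small Python analogue of the failure pattern, not Paddle's C++ code. `OptionalValue`, `read_scale_unchecked`, and `read_scale_checked` are hypothetical names made up for this illustration. The sketch assumes the C++ check behaves like a standard `assert`, which is compiled out when `NDEBUG` is defined; Python's `assert` is similarly skipped under `python -O`.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class OptionalValue(Generic[T]):
    """Toy stand-in for paddle::optional<T> (hypothetical, illustration only)."""

    def __init__(self, value: Optional[T] = None) -> None:
        self._value = value

    def is_initialized(self) -> bool:
        return self._value is not None

    def get(self) -> T:
        # Debug-style check, analogous to the assertion in the failure message.
        assert self.is_initialized(), "this->is_initialized()"
        return self._value


def read_scale_unchecked(scales: OptionalValue[float]) -> float:
    # Pattern described in the PR: get() is called without checking first.
    return scales.get()


def read_scale_checked(scales: OptionalValue[float]) -> Optional[float]:
    # Source-level alternative: test for a value before calling get().
    return scales.get() if scales.is_initialized() else None
```

Disabling the assertion at build time hides the symptom for every call site at once, whereas the checked form has to be applied kernel by kernel.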
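CI report generated from the following code (updated every 30 minutes):
1 Task overview
⏳ Required tasks in progress: 4/10 passed, 6 running; please wait for the results. 2 optional tasks failed (they do not block merging).
2 Task status summary
2.1 Required tasks: 4/10 passed
2.2 Optional tasks: 23/27 passed
3 Failure details (required only)
No required tasks failed.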
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7816 +/- ##
==========================================
Coverage ? 63.13%
==========================================
Files ? 461
Lines ? 64083
Branches ? 9806
==========================================
Hits ? 40457
Misses ? 20852
Partials ? 2774
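Motivation
Root cause:
gqa_rope_write_cache is called by FA4. For arguments of type paddle::optional<paddle::Tensor>, such as cache_k_dequant_scales, its implementation calls cache_k_dequant_scales.get() directly without first checking whether the value is null. This problem already surfaced in earlier testing and was fixed for this kernel on the 2.5 branch (https://github.com/PaddlePaddle/FastDeploy/pull/7311/changes), but the fix was not synced to the dev/2.6 branches.
Fix considerations:
As with issue 1, there may be many other kernels that call paddle::optional<paddle::Tensor> values in a non-conforming way. Changing them could have a wide impact and requires further screening and modification. Completing the compile-option adaptation for sm103 first also resolves the problem.
Modifications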
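As a rough sketch of what the SM103 build adaptation could look like, the snippet below combines the `cc_val == 103` gencode branch, the `has_sm103` condition, and the `-O3 -DNDEBUG` flags named in the review. The function names `gencode_flags`, `detect_sm103`, and `sm103_compile_args` are hypothetical, the fallback pattern for other architectures is an assumption, and the real `get_gencode_flags()` in `custom_ops/setup_ops.py` may be structured differently.

```python
from typing import List


def gencode_flags(cc_val: int) -> List[str]:
    """Return nvcc -gencode flags for one compute capability (hypothetical helper)."""
    if cc_val == 103:
        # Branch described in the review: SM103 uses the 103a target.
        arch = "103a"
    else:
        # Assumption: other architectures use the plain numeric target.
        arch = str(cc_val)
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def detect_sm103(sm_versions: List[int], nvcc_version: float) -> bool:
    """Mirror the has_sm103 condition: SM103 requested and NVCC >= 13.0."""
    return 103 in sm_versions and nvcc_version >= 13.0


def sm103_compile_args(sm_versions: List[int], nvcc_version: float) -> List[str]:
    """Extra nvcc arguments added only when SM103 support is enabled."""
    if not detect_sm103(sm_versions, nvcc_version):
        return []
    # -DNDEBUG is the flag this PR adds for sm103.
    return gencode_flags(103) + ["-O3", "-DNDEBUG"]


if __name__ == "__main__":
    print(sm103_compile_args([90, 103], 13.0))
```

Keeping the version check in one predicate means an older NVCC simply skips the SM103 arguments instead of failing on an unknown architecture.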
Add compile adaptation for sm103:
"-DNDEBUG"
Usage or Command
Accuracy Tests