[BugFix] fix paddle optional get assert in sm103#7816

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from zoooo0820:fix_sm103_fa4
May 14, 2026

@zoooo0820 (Collaborator) commented May 14, 2026

Motivation

  • Fix the paddle optional assert failure that occurs when FA4 is enabled on sm103:
/usr/local/lib/python3.12/site-packages/paddle/include/paddle/utils/optional.h:563: paddle::optional<T>::reference_const_type paddle::optional<T>::get() const [with T = paddle::Tensor; reference_const_type = const paddle::Tensor&]: Assertion `this->is_initialized()' failed.

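Root cause:

  1. gqa_rope_write_cache is called by FA4. Its implementation calls cache_k_dequant_scales.get() directly on cache_k_dequant_scales, which is of type paddle::optional<paddle::Tensor>, without first checking whether it is null. This problem had already surfaced in earlier testing and was fixed for this kernel on the 2.5 branch (https://github.com/PaddlePaddle/FastDeploy/pull/7311/changes), but the fix was not synced to the dev/2.6 branch.
  2. Why the problem only appears on sm103: builds for sm90/100 pass the "-DNDEBUG" option, but it was not added for sm103. That option removes standard assert checks, so the problem was not encountered on other devices.
  3. When several target architectures are built together, e.g. [90,100,103], the option is already added on behalf of 90/100, so running on sm103 also works. Running build.sh on its own, or specifying only the architecture [103], triggers the bug.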
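
The interaction between the architecture list and "-DNDEBUG" described in points 2 and 3 can be illustrated with a minimal Python sketch. The function name extra_compile_flags and the ndebug_archs parameter are hypothetical and are not taken from custom_ops/setup_ops.py; the sketch only assumes, as the description states, that the flag ends up on the whole build whenever any requested architecture enables it.

```python
def extra_compile_flags(sm_versions, ndebug_archs):
    """Return extra compiler flags for a build targeting sm_versions.

    Hypothetical model: "-DNDEBUG" is added for the whole build as soon as
    one requested architecture belongs to ndebug_archs.
    """
    flags = []
    if any(sm in ndebug_archs for sm in sm_versions):
        flags += ["-O3", "-DNDEBUG"]
    return flags


# Before the fix (assumed): only sm90 and sm100 enable the flag.
before_fix = {90, 100}
# After the fix (assumed): sm103 enables it as well.
after_fix = {90, 100, 103}

print(extra_compile_flags([90, 100, 103], before_fix))  # multi-arch build
print(extra_compile_flags([103], before_fix))           # sm103-only build
print(extra_compile_flags([103], after_fix))            # sm103-only build, fixed
```

Under this model, an sm103-only build before the fix is the one configuration in which assert stays active, which matches the failure reported above.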

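Fix considerations:

As with cause 1, quite a few other kernels may also access paddle::optional<paddle::Tensor> values without the proper check. Fixing all of them could have a wide impact and requires further screening and changes. For now, completing the compile-option support for sm103 also resolves the problem.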
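
One possible starting point for the screening mentioned above is a rough text scan that lists candidate call sites for manual review. This is only a heuristic sketch: the helper find_optional_get_calls and its regular expressions are hypothetical, it matches only names declared as paddle::optional<paddle::Tensor> in the same source text, and it cannot tell whether a call is already guarded by a check.

```python
import re

# Matches declarations such as
#   const paddle::optional<paddle::Tensor>& cache_k_dequant_scales
OPTIONAL_DECL = re.compile(r"paddle::optional<\s*paddle::Tensor\s*>\s*&?\s*(\w+)")


def find_optional_get_calls(source):
    """Return (line_number, name) pairs where an optional tensor calls .get().

    Heuristic only: every hit still needs a manual check for a missing guard.
    """
    names = sorted(set(OPTIONAL_DECL.findall(source)))
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in names:
            if re.search(rf"\b{re.escape(name)}\.get\(\)", line):
                hits.append((lineno, name))
    return hits


# Made-up C++ snippet used only to exercise the scan.
sample = """void f(const paddle::optional<paddle::Tensor>& cache_k_dequant_scales) {
  const auto& scales = cache_k_dequant_scales.get();
}"""

for lineno, name in find_optional_get_calls(sample):
    print(f"line {lineno}: {name}.get() needs a manual check")
```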

Modifications

Add the "-DNDEBUG" compile option for sm103.

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot Bot commented May 14, 2026

Thanks for your contribution!

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-14 17:12:09

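📋 Review summary

PR overview: Adds the correct gencode flags and the -DNDEBUG compile option for the SM103 architecture (a Blackwell variant), fixing the paddle optional assertion failure
Scope of changes: custom_ops/setup_ops.py
Impact tag: [OP]

📝 PR convention check

The Usage or Command and Accuracy Tests sections contain only HTML comment placeholders (<!-- ... -->); per the convention they should be filled in with actual content or N/A.

Suggested title (can be copied as-is):

  • [BugFix] fix paddle optional get assert in sm103 (the current title already follows the format; no change needed)

Suggested PR description (can be copied as-is):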

## Motivation
Fix paddle optional get assert in sm103, as follows:
`paddle::optional<T>::get() const [with T = paddle::Tensor]: Assertion 'this->is_initialized()' failed` (in `paddle/utils/optional.h:563`)

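## Modifications
In `custom_ops/setup_ops.py`, make the following changes for the SM103 architecture:
1. Add a `cc_val == 103` branch to `get_gencode_flags()` that uses the `arch=compute_103a,code=sm_103a` gencode flags
2. Add the detection variable `has_sm103 = 103 in sm_versions and nvcc_version >= 13.0` (requires NVCC ≥ 13.0)
3. Update `has_generic_fp8` to exclude SM103 so that it does not take the generic FP8 path
4. Add SM103 to the `-O3 -DNDEBUG` compile-flag block (the root-cause fix for the paddle optional assert)
5. Group SM103 into the SM100 Blackwell configuration block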

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

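Issues

| Severity | File | Summary |
|---|---|---|
| 📝 PR convention | PR description | The Usage or Command and Accuracy Tests sections contain only HTML comments and should be filled in with N/A |

Overall assessment

The SM103 build-support logic is clear and consistent with how SM90/SM100 are already handled, and the -DNDEBUG fix follows project practice. Only the two empty sections of the PR description need N/A filled in; there are no blocking technical issues.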
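
The build-script changes listed in the review can be sketched in Python as follows. The gencode string and the has_sm103 condition are quoted from the review summary; the helper names gencode_flags_for and detect_sm103, and the generic fallback branch, are assumptions for illustration rather than the actual contents of custom_ops/setup_ops.py.

```python
def gencode_flags_for(cc_val):
    """Return nvcc -gencode arguments for one compute capability value."""
    if cc_val == 103:
        # Architecture-specific target quoted in the review summary.
        return ["-gencode", "arch=compute_103a,code=sm_103a"]
    # Assumed generic fallback; the real script may treat other values differently.
    return ["-gencode", f"arch=compute_{cc_val},code=sm_{cc_val}"]


def detect_sm103(sm_versions, nvcc_version):
    """Mirror the quoted condition: SM103 is requested and NVCC is at least 13.0."""
    return 103 in sm_versions and nvcc_version >= 13.0


print(gencode_flags_for(103))
print(detect_sm103([90, 100, 103], 13.0))
print(detect_sm103([103], 12.8))
```

Keeping the NVCC version check inside the detection helper means an older toolchain simply skips the SM103-specific settings in this sketch.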

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-14 17:16:42

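This CI report was generated from the following code (updated every 30 minutes):

1 Task overview

⏳ Required tasks in progress: 4/10 passed, 6 running; please wait for the results. 2 optional tasks failed (they do not block merging).

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 37(0) | 37 | 27 | 2 | 7 | 1 | 0 |

2 Task status summary

2.1 Required tasks: 4/10 passed

Required tasks block merging; failures should be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| | xpu_4cards_case_test / run_xpu_4cards_cases | - | Running | - | Job | - |
| | xpu_8cards_case_test / run_xpu_8cards_cases | - | Running | - | Job | - |
| | Extracted partial CE model tasks to run in CI. / run_ce_cases | - | Running | - | Job | - |
| | Run Base Tests / base_tests | - | Running | - | Job | - |
| | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | - | Running | - | Job | - |
| | Run Four Cards Tests / run_4_cards_tests | - | Running | - | Job | - |
| | The other 4 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 23/27 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| | Check PR Template | 10s | Job | - |
| | Trigger Jenkins for PR | 16m45s | Job | - |
| | Run iluvatar Tests / run_iluvatar_cases | - | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| | The other 23 optional tasks passed | - | - | - |

3 Failure details (required only)

No required tasks failed.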

codecov-commenter commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@c2df4c6).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7816   +/-   ##
==========================================
  Coverage           ?   63.13%           
==========================================
  Files              ?      461           
  Lines              ?    64083           
  Branches           ?     9806           
==========================================
  Hits               ?    40457           
  Misses             ?    20852           
  Partials           ?     2774           
| Flag | Coverage Δ |
|---|---|
| GPU | 72.25% <ø> (?) |
| XPU | 7.14% <ø> (?) |

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit be8a72a into PaddlePaddle:develop May 14, 2026
39 of 42 checks passed
@zoooo0820 zoooo0820 deleted the fix_sm103_fa4 branch May 14, 2026 11:42

4 participants