[XPU] Fix precision for paddle.Tensor.bmm by YqGe585 · Pull Request #78951 · PaddlePaddle/Paddle

YqGe585 · 2026-05-11T12:07:10Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

On XRE5 hardware, the XPU bmm kernel defaults to FC_TF32 (tfloat32, which keeps only 10 mantissa bits) for float32 inputs, while the GPU path uses CUBLAS_COMPUTE_32F (full fp32, 23 mantissa bits). The resulting precision discrepancy grows with the matrix dimensions: the original failing case had max_abs_diff=0.000183254 and max_rel_diff=0.394162.
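To illustrate why dropping from 23 to 10 mantissa bits matters, here is a minimal Python sketch that emulates TF32 input rounding in software and measures the error it introduces into a single dot product of length K (one output element of a bmm). It is not the XPU or cuBLAS implementation: the helper names `to_fp32`, `to_tf32`, and `tf32_dot_error` are hypothetical, the inputs are random, and the sums are accumulated in Python floats (fp64) so that only the effect of input rounding is visible.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Round to float32, then to a 10-bit mantissa (round-to-nearest-even).

    Software emulation for illustration only; intended for finite inputs.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Drop the low 13 of float32's 23 mantissa bits, rounding ties to even.
    bits += 0x0FFF + ((bits >> 13) & 1)
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def tf32_dot_error(k: int, seed: int = 0) -> float:
    """Absolute error of a length-k dot product caused by TF32 input rounding."""
    rng = random.Random(seed)
    a = [to_fp32(rng.uniform(-1.0, 1.0)) for _ in range(k)]
    b = [to_fp32(rng.uniform(-1.0, 1.0)) for _ in range(k)]
    # Reference: float32 inputs, products and sum kept in fp64.
    exact = sum(x * y for x, y in zip(a, b))
    # Same computation with every input first rounded to TF32.
    approx = sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))
    return abs(approx - exact)


for k in (16, 256, 4096):
    print(f"K={k:5d}  abs error from TF32 input rounding: {tf32_dot_error(k):.3e}")
```

Each rounded input carries a relative error of up to 2^-11, so each product can be off by roughly 2^-10 of its magnitude, and these per-term errors add up across the K terms of the reduction. That is consistent with the discrepancy growing as the inner dimension gets larger.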

Fix

Override FCCalcType from FC_TF32 to FC_FLOAT when the input type is float32, in the bmm forward kernel, backward kernel, and batched FC utility path. This ensures full fp32 accumulation matching GPU behavior.
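The selection rule described above can be modeled in a few lines. This is a hedged Python sketch of the decision logic only, not the actual C++ change in the kernels: the `FCCalcType` enum here is a stand-in that reuses the two identifiers named in this PR, while `FC_OTHER` and the function `bmm_calc_type` are hypothetical names introduced for illustration.

```python
from enum import Enum


class FCCalcType(Enum):
    """Stand-in for the XPU calc-type enum; only the PR's two names are real."""
    FC_TF32 = "tf32"
    FC_FLOAT = "fp32"
    FC_OTHER = "other"  # hypothetical placeholder for any other calc type


def bmm_calc_type(input_dtype: str, default: FCCalcType) -> FCCalcType:
    """Return the calc type bmm should use for the given input dtype.

    float32 inputs are promoted from TF32 to full fp32; every other
    combination keeps whatever default was selected.
    """
    if input_dtype == "float32" and default is FCCalcType.FC_TF32:
        return FCCalcType.FC_FLOAT
    return default


print(bmm_calc_type("float32", FCCalcType.FC_TF32))
print(bmm_calc_type("float16", FCCalcType.FC_OTHER))
```

Applying the same rule in the forward, backward, and batched FC paths keeps the three code paths consistent, so gradients are computed at the same precision as the forward result.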

Modified files

paddle/phi/kernels/xpu/bmm_kernel.cc — Forward kernel override
paddle/phi/kernels/xpu/bmm_grad_kernel.cc — Backward kernel override
paddle/phi/kernels/xpu/bmm_xpu_utils.h — Batched FC utility override

Verification

All 19 test cases from all_config.txt now pass, with max_abs_diff in the range 1.04e-07 to 2.98e-07 (well within the atol=1e-4 tolerance).
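For readers unfamiliar with the two metrics quoted in this PR, the sketch below shows one plausible way to compute them and to apply an absolute tolerance. The exact formulas used by the test harness are not given here, so the definitions, the function names, and the sample values are all assumptions chosen for illustration.

```python
def max_abs_diff(actual, expected):
    """Largest element-wise absolute difference."""
    return max(abs(a - e) for a, e in zip(actual, expected))


def max_rel_diff(actual, expected, eps=1e-12):
    """Largest element-wise difference relative to the expected magnitude.

    eps guards against division by zero (an assumption of this sketch).
    """
    return max(abs(a - e) / max(abs(e), eps) for a, e in zip(actual, expected))


def within_atol(actual, expected, atol=1e-4):
    """True when every element is within the absolute tolerance."""
    return max_abs_diff(actual, expected) <= atol


# Hypothetical values: one output element close to zero, one of order 1.
expected = [4.0e-4, 1.0]
low_precision = [5.8e-4, 1.0]            # off by 1.8e-4 on the small element
full_precision = [4.0e-4 + 2.0e-7, 1.0]  # off by 2.0e-7 on the small element

for name, actual in (("low", low_precision), ("full", full_precision)):
    print(name,
          f"abs={max_abs_diff(actual, expected):.3e}",
          f"rel={max_rel_diff(actual, expected):.3e}",
          f"pass={within_atol(actual, expected)}")
```

The sample values show how a small absolute difference on an element near zero can still produce a large relative difference, which is one way a pair such as max_abs_diff=0.000183254 and max_rel_diff=0.394162 can arise.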

Does this PR introduce a precision change?

Yes — XPU precision corrected to align with GPU (TF32 accumulation → full fp32 accumulation for bmm float32).

… for float32 accumulation

GPU uses CUBLAS_COMPUTE_32F (full fp32) for float32 bmm, while XPU defaults to FC_TF32 (tfloat32 with only 10 mantissa bits), causing precision discrepancies that scale with matrix dimensions. Override FC_TF32 to FC_FLOAT for float32 bmm in forward, backward, and batched FC paths to match GPU precision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

paddle-bot · 2026-05-11T12:07:16Z

Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/PAD-188-xpu-precision
