
[XPU] Fix precision for paddle.Tensor.bmm#78951

Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/PAD-188-xpu-precision


YqGe585 (Member) commented May 11, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

On XRE5 hardware, the XPU bmm kernel defaults to FC_TF32 (tfloat32, with only 10 mantissa bits) for float32 inputs, whereas GPU uses CUBLAS_COMPUTE_32F (full fp32, with 23 mantissa bits). The resulting precision discrepancies grow with the matrix dimensions. In the original failing case, max_abs_diff was 0.000183254 and max_rel_diff was 0.394162.
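To make the mantissa-width argument concrete, here is a minimal CPU sketch. It emulates TF32 by rounding each float32 input to 10 mantissa bits, then multiplies and accumulates in float32, and compares the result with plain float32 and with a double-precision reference. It assumes FC_TF32 behaves like TF32 on NVIDIA tensor cores (reduced-precision inputs, fp32 accumulation). The PR does not describe the actual XRE5 FC_TF32 internals. RoundToTF32, NextUniform, and CompareDots are hypothetical helpers, not Paddle or XPU APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulates TF32 input rounding on the CPU (assumption: TF32 only affects the
// inputs). Keeps the float32 sign and 8-bit exponent and rounds the 23-bit
// mantissa to 10 bits (round to nearest, ties to even).
float RoundToTF32(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  if ((bits & 0x7f800000u) == 0x7f800000u) return x;  // leave Inf/NaN untouched
  const std::uint32_t kept_lsb = (bits >> 13) & 1u;    // lowest surviving mantissa bit
  bits += 0x0fffu + kept_lsb;                          // round to nearest, ties to even
  bits &= 0xffffe000u;                                 // clear the 13 dropped bits
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

// Deterministic LCG returning floats in [-1, 1), so the example does not
// depend on a particular standard-library random distribution.
float NextUniform(std::uint32_t& state) {
  state = state * 1664525u + 1013904223u;
  return static_cast<float>(state >> 8) / 16777216.0f * 2.0f - 1.0f;
}

struct DotErrors {
  double fp32_abs_err;  // worst |float32 result - double reference|
  double tf32_abs_err;  // worst |TF32-input result - double reference|
};

// Evaluates `count` independent length-k dot products (think of them as the
// output elements of one bmm) three ways and returns the worst-case errors,
// in the spirit of a max_abs_diff metric.
DotErrors CompareDots(int count, int k, std::uint32_t seed) {
  assert(count > 0 && k > 0);
  std::uint32_t state = seed;
  DotErrors worst{0.0, 0.0};
  for (int n = 0; n < count; ++n) {
    double ref = 0.0;   // double-precision reference
    float fp32 = 0.0f;  // float32 inputs, float32 accumulation
    float tf32 = 0.0f;  // TF32-rounded inputs, float32 accumulation
    for (int i = 0; i < k; ++i) {
      const float a = NextUniform(state);
      const float b = NextUniform(state);
      ref += static_cast<double>(a) * static_cast<double>(b);
      fp32 += a * b;
      tf32 += RoundToTF32(a) * RoundToTF32(b);
    }
    worst.fp32_abs_err = std::fmax(worst.fp32_abs_err, std::fabs(fp32 - ref));
    worst.tf32_abs_err = std::fmax(worst.tf32_abs_err, std::fabs(tf32 - ref));
  }
  return worst;
}
```

Calling CompareDots with the same count and seed for a growing inner dimension (for example k = 64 and then k = 4096) should show the TF32-emulated error growing with k and staying well above the float32 error. The exact values depend on the inputs, so none are quoted here.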

Fix

When the input type is float32, override the FCCalcType from FC_TF32 to FC_FLOAT in the bmm forward kernel, the backward kernel, and the batched FC utility path. This ensures full fp32 accumulation, matching GPU behavior.
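The override itself is a type-dependent adjustment of the compute type. The sketch below uses a stand-in FCCalcType enum that contains only the two values named in this PR, plus a hypothetical OverrideFCCalcTypeForFloat32 helper. It does not reproduce the real XPU enum, its other members, or the way the kernels obtain their default compute type.

```cpp
#include <type_traits>

// Stand-in for the XPU FC compute-type enum. Only FC_FLOAT and FC_TF32 are
// taken from the PR description; the real enum has its own name and members.
enum class FCCalcType { FC_FLOAT, FC_TF32 };

// Hypothetical helper: keep the default compute type, except that float32
// inputs defaulting to FC_TF32 are promoted to full-precision FC_FLOAT.
template <typename T>
constexpr FCCalcType OverrideFCCalcTypeForFloat32(FCCalcType default_type) {
  return (std::is_same<T, float>::value && default_type == FCCalcType::FC_TF32)
             ? FCCalcType::FC_FLOAT
             : default_type;
}

// Compile-time checks of the intended behavior.
static_assert(OverrideFCCalcTypeForFloat32<float>(FCCalcType::FC_TF32) ==
                  FCCalcType::FC_FLOAT,
              "float32 must not use TF32");
static_assert(OverrideFCCalcTypeForFloat32<double>(FCCalcType::FC_TF32) ==
                  FCCalcType::FC_TF32,
              "any non-float32 type keeps its default");
```

In this sketch, routing the forward, backward, and batched FC paths through one helper keeps them consistent, so a float32 bmm cannot fall back to TF32 in only one of them. Other input types keep whatever default was selected.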

Modified files

  • paddle/phi/kernels/xpu/bmm_kernel.cc — Forward kernel override
  • paddle/phi/kernels/xpu/bmm_grad_kernel.cc — Backward kernel override
  • paddle/phi/kernels/xpu/bmm_xpu_utils.h — Batched FC utility override

Verification

All 19 test cases from all_config.txt now pass, with max_abs_diff ranging from 1.04e-07 to 2.98e-07 (well within the atol=1e-4 tolerance).
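The pass/fail check can be sketched as follows. The PR gives only the metric names (max_abs_diff, max_rel_diff) and atol=1e-4. The relative-difference definition, the rule that skips zero reference elements, and the rtol parameter are assumptions. CompareOutputs is a hypothetical helper, not the harness that ran all_config.txt. With this definition, reference elements close to zero can produce a large max_rel_diff even when max_abs_diff is small.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DiffReport {
  double max_abs_diff = 0.0;
  double max_rel_diff = 0.0;
  bool within_tolerance = true;
};

// Compares an XPU output against a reference (e.g. GPU) element by element.
// Assumptions: rel diff = |actual - expected| / |expected|, skipped when
// expected == 0; the pass test uses the numpy.allclose form
// |actual - expected| <= atol + rtol * |expected|.
DiffReport CompareOutputs(const std::vector<float>& actual,
                          const std::vector<float>& expected,
                          double atol, double rtol) {
  assert(actual.size() == expected.size());
  DiffReport report;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double a = actual[i];
    const double e = expected[i];
    const double abs_diff = std::fabs(a - e);
    report.max_abs_diff = std::max(report.max_abs_diff, abs_diff);
    if (e != 0.0) {
      report.max_rel_diff = std::max(report.max_rel_diff, abs_diff / std::fabs(e));
    }
    // Written as !(x <= limit) so that a NaN difference counts as a failure.
    if (!(abs_diff <= atol + rtol * std::fabs(e))) report.within_tolerance = false;
  }
  return report;
}
```

Feeding the XPU and GPU outputs of one bmm call with atol = 1e-4 mirrors the check reported above. Set rtol to whatever the real test configuration uses.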

Does this PR introduce a precision change?

Yes. XPU precision for float32 bmm is corrected to align with GPU (TF32 accumulation → full fp32 accumulation).

… for float32 accumulation

GPU uses CUBLAS_COMPUTE_32F (full fp32) for float32 bmm, while XPU defaults
to FC_TF32 (tfloat32 with only 10 mantissa bits), causing precision discrepancies
that scale with matrix dimensions. Override FC_TF32 to FC_FLOAT for float32 bmm
in forward, backward, and batched FC paths to match GPU precision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
paddle-bot commented May 11, 2026

Your PR has been submitted successfully. Thank you for contributing to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.
