
[XPU] Fix precision for paddle.Tensor.__truediv__ complex number operations#78950

Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/PAD-187-xpu-precision

Conversation

Member

@YqGe585 YqGe585 commented May 11, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Fix XPU precision for complex-number division in paddle.Tensor.__truediv__. Three cases that previously failed on XPU are now resolved:

  1. bool / complex(0,2) — AccuracyError where XPU output differed from GPU
  2. float64 / complex64 — PaddleError where XPU kernel threw error
  3. complex64 / float64 — PaddleError where XPU kernel threw error

Root cause: XPU lacked kernel support for complex number division and casting operations. Three fixes were applied:

DivideKernel<complex64,XPUContext>

Implemented complex division via real/imag decomposition using float-level XPU APIs, since XPU lacks native broadcast_div<complex64>. Formula: (a+bi)/(c+di) = ((ac+bd)/(c²+d²)) + ((bc-ad)/(c²+d²))i. This fixes the float64/complex64 and complex64/float64 PaddleError cases.

DivideGradKernel<complex64,XPUContext>

Implemented complex division gradients (dx = dout/conj(y), dy = -dout*conj(out/y)) via real/imag decomposition with ExpandGradKernel for broadcasting. This fixes the bool/complex(0,2) AccuracyError case.

CastKernel<complex128,XPUContext> + COMPLEX128 case

Enabled float64→complex128 and float32→complex128 casting, needed for type promotion when float64 and complex64 operands are combined. Added both the COMPLEX128 case in the CastKernel switch and a CastKernel<complex128> specialization that extracts the real part and delegates to the real-valued CastKernel.

All changes are gated by PADDLE_WITH_XPU_FFT compile flag, following the pattern established by existing complex64 specializations in add/multiply kernels.

Modified files:

  • paddle/phi/kernels/xpu/elementwise_divide_kernel.cc (+57 lines)
  • paddle/phi/kernels/xpu/elementwise_divide_grad_kernel.cc (+158 lines)
  • paddle/phi/kernels/xpu/cast_kernel.cc (+40 lines)

Does this PR introduce a precision change?

Yes — XPU precision for complex number division operations now aligns with GPU/CPU behavior.

…ide/divide_grad and complex128 cast kernels

Add XPU kernel support for complex number division operations:

1. DivideKernel<complex64,XPUContext>: implement complex division via
   real/imag decomposition using float-level XPU APIs, since XPU lacks
   native complex64 broadcast_div.

2. DivideGradKernel<complex64,XPUContext>: implement complex division
   gradients (dx = dout/conj(y), dy = -dout*conj(out/y)) using
   real/imag decomposition with ExpandGradKernel for broadcasting.

3. CastKernel<complex128,XPUContext> and COMPLEX128 case in CastKernel
   switch: enable float64->complex128 and float32->complex128 casting
   needed for type promotion when float64 and complex64 are combined.

All changes gated by PADDLE_WITH_XPU_FFT compile flag, following the
pattern established by existing complex64 specializations in add/multiply
kernels.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

paddle-bot Bot commented May 11, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.
