[Fix] fix IndexElementwiseGet kernel CUDA error(700) on 0-size input #78251

Open
DanielSun11 wants to merge 1 commit into PaddlePaddle:develop from DanielSun11:fix/index-elementwise-get-0size
DanielSun11 (Contributor) commented Mar 10, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

Background

Applying __getitem__ with advanced integer indexing (a list of lists) to a Tensor whose first dimension is 0 triggers CUDA error(700) (illegal memory access):

import paddle
x = paddle.zeros([0, 5, 4, 3], dtype='complex128')
out = x[[[2, 3, 4], [1, 2, 5]]]  # CUDA error(700)

The error occurs in the GPU implementation of IndexElementwiseGetKernel.

Root cause analysis

Call chain: __getitem__ → tensor__getitem_dygraph → ApplyGetitem → AdvancedIndex → index_elementwise_get_ad_func → IndexElementwiseGetKernel

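The AdvancedIndex constructor replaces the indexed dimension with the index shape to obtain src_sizes. For example, indexing dimension 0 of x.shape=[0,5,4,3] with [[2,3,4],[1,2,5]] (shape=[2,3]) gives src_sizes = [2, 3, 5, 4, 3] (numel=360).

As a result, inside the kernel:

  • out->numel() = 360 != 0, so the existing check if (out->numel() == 0) return; does not fire
  • x.numel() = 0, so x.data<T>() returns nullptr
  • The GPU kernel accesses nullptr + offset (offset = index_val × stride) → CUDA error(700)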
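The shape arithmetic above can be reproduced with a small pure-Python model. This is only a sketch under stated assumptions: the helper names src_sizes_after_index and contiguous_strides are hypothetical, only a single indexed dimension is modeled, and the row-major stride rule for a 0-size tensor is assumed rather than taken from Paddle's AdvancedIndex code.

```python
from math import prod


def src_sizes_after_index(x_shape, index_shape, dim):
    """Replace the indexed dimension `dim` with the index shape (simplified model)."""
    return list(x_shape[:dim]) + list(index_shape) + list(x_shape[dim + 1:])


def contiguous_strides(shape):
    """Row-major strides in elements (assumed layout, not read from Paddle)."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return strides[::-1]


x_shape = [0, 5, 4, 3]
index = [[2, 3, 4], [1, 2, 5]]
index_shape = [len(index), len(index[0])]

out_shape = src_sizes_after_index(x_shape, index_shape, dim=0)
out_numel = prod(out_shape)   # number of output elements the kernel will produce
x_numel = prod(x_shape)       # number of input elements actually allocated

# Element offset each index value would request along dim 0.
stride0 = contiguous_strides(x_shape)[0]
offsets = [v * stride0 for row in index for v in row]

print(out_shape, out_numel, x_numel)
print(offsets)
```

In this model the output is non-empty while the input holds no elements, so a guard on the output size alone cannot prevent reads at nonzero offsets from an empty buffer.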

The backward kernel (IndexElementwiseGetGradKernel) has the same problem: x_grad has the same shape as x (numel=0), so writing through x_grad->data<T>() = nullptr also triggers an illegal access.

Fix

Add an early-return check for empty input in three kernel files:

  1. GPU forward (index_elementwise_get_kernel.cu): when x.numel() == 0, zero-fill the output with GpuMemsetAsync and return
  2. CPU forward (index_elementwise_get_kernel.cc): when x.numel() == 0, zero-fill the output with memset and return
  3. GPU backward (index_elementwise_get_grad_kernel.cu): when x_grad->numel() == 0 (i.e. x.numel() == 0), return immediately
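The guard logic in the three steps above can be sketched in pure Python. This is a simplified reference model, not the kernel code: index_get_forward and index_get_backward are hypothetical names, flat Python lists stand in for device buffers, only dimension 0 is indexed, and the index is passed as a flattened list.

```python
from math import prod


def index_get_forward(x_data, x_shape, index, out_shape):
    """Simplified forward over flat lists, indexing dim 0 only."""
    out_numel = prod(out_shape)
    if out_numel == 0:
        return []                      # pre-existing guard on the output
    if prod(x_shape) == 0:
        return [0] * out_numel         # new guard: zero-fill, never read x_data
    inner = prod(x_shape[1:])          # elements per slice along dim 0
    out = []
    for i in index:
        i = i + x_shape[0] if i < 0 else i   # wrap negative indices
        out.extend(x_data[i * inner:(i + 1) * inner])
    return out


def index_get_backward(out_grad, x_shape, index):
    """Simplified backward: scatter-add out_grad into x_grad."""
    x_numel = prod(x_shape)
    x_grad = [0] * x_numel
    if x_numel == 0:
        return x_grad                  # new guard: nothing to write
    inner = prod(x_shape[1:])
    for k, i in enumerate(index):
        i = i + x_shape[0] if i < 0 else i
        for j in range(inner):
            x_grad[i * inner + j] += out_grad[k * inner + j]
    return x_grad


# 0-size input from the bug report: the output is zero-filled, the gradient is empty.
flat_index = [2, 3, 4, 1, 2, 5]
out = index_get_forward([], [0, 5, 4, 3], flat_index, [2, 3, 5, 4, 3])
grad = index_get_backward([1] * 360, [0, 5, 4, 3], flat_index)
print(len(out), set(out), grad)
```

The zero-fill in the forward model mirrors the role of GpuMemsetAsync and memset in the description, and the backward model returns before any write because x_grad has no elements.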

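New unit tests

  • test/legacy_test/test_index_elementwise.py: adds TestIndexElementwiseGet0SizeInput, covering dtypes including complex128, bool, float32, float64, int64, and float16, with scenarios for positive and negative indices and 1-D indices (9 test methods)
  • test/legacy_test/test_index_elementwise_grad.py: adds TestIndexElementwiseGet0SizeInputGrad, covering the backward pass for float32, float64, and negative indices (3 test methods)

Does this change affect precision?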

paddle-bot bot commented Mar 10, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter
Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@ae907b8). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...le/phi/kernels/cpu/index_elementwise_get_kernel.cc | 33.33% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (33.33%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #78251   +/-   ##
==========================================
  Coverage           ?   33.33%           
==========================================
  Files              ?        1           
  Lines              ?        3           
  Branches           ?        0           
==========================================
  Hits               ?        1           
  Misses             ?        2           
  Partials           ?        0           
