[Fix] fix IndexElementwiseGet kernel CUDA error(700) on 0-size input #78251
Open
DanielSun11 wants to merge 1 commit into PaddlePaddle:develop
Your PR was submitted successfully. Thank you for contributing to the open-source project!
Codecov Report
❌ Your patch status has failed because the patch coverage (33.33%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #78251 +/- ##
==========================================
Coverage ? 33.33%
==========================================
Files ? 1
Lines ? 3
Branches ? 0
==========================================
Hits ? 1
Misses ? 2
Partials ? 0
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
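Background
Calling __getitem__ with advanced integer indexing (list-of-list) on a Tensor whose first dimension is 0 triggers CUDA error(700) (illegal memory access). The error occurs in the GPU implementation of IndexElementwiseGetKernel.

Root cause analysis
Call chain:
__getitem__ → tensor__getitem_dygraph → ApplyGetitem → AdvancedIndex → index_elementwise_get_ad_func → IndexElementwiseGetKernel
The AdvancedIndex constructor replaces the indexed dimension with the index shape to obtain src_sizes. For example, indexing dimension 0 of x.shape=[0,5,4,3] with [[2,3,4],[1,2,5]] (shape=[2,3]) gives src_sizes = [2, 3, 5, 4, 3] (numel=360). As a result, inside the kernel:
out->numel() = 360 != 0, so the existing check if (out->numel() == 0) return; does not fire
x.numel() = 0, so x.data<T>() returns nullptr
the kernel reads from nullptr + offset (offset = index_val × stride) → CUDA error(700)
The backward kernel (IndexElementwiseGetGradKernel) has the same problem: x_grad has the same shape as x (numel=0), so writing through x_grad->data<T>() = nullptr also triggers an illegal access.

Fix
Add an early-exit check for empty input in three kernel files:
index_elementwise_get_kernel.cu: when x.numel() == 0, zero-fill the output with GpuMemsetAsync and return
index_elementwise_get_kernel.cc: when x.numel() == 0, zero-fill the output with memset and return
index_elementwise_get_grad_kernel.cu: when x_grad->numel() == 0 (that is, x.numel() == 0), return immediately

New unit tests
test/legacy_test/test_index_elementwise.py: adds TestIndexElementwiseGet0SizeInput, covering dtypes including complex128, bool, float32, float64, int64, and float16, with positive and negative indices, one-dimensional indices, and similar scenarios (9 test methods)
test/legacy_test/test_index_elementwise_grad.py: adds TestIndexElementwiseGet0SizeInputGrad, covering the backward pass for float32, float64, and negative indices (3 test methods)

Does this change numerical precision?
No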
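
The shape arithmetic behind the root cause can be checked with a small plain-Python sketch. It does not call Paddle; the helper names advanced_index_out_shape and needs_empty_input_guard are hypothetical and exist only for illustration. It assumes, as the description states, that the indexed dimension is replaced by the index shape.

```python
from math import prod


def advanced_index_out_shape(x_shape, index_shape, dim):
    """Replace dimension `dim` of x_shape with index_shape (hypothetical helper)."""
    return list(x_shape[:dim]) + list(index_shape) + list(x_shape[dim + 1:])


def needs_empty_input_guard(x_shape, out_shape):
    """True when the output is non-empty although the input holds no elements.

    This is the case an `out->numel() == 0` check alone does not catch.
    """
    return prod(x_shape) == 0 and prod(out_shape) != 0


x_shape = [0, 5, 4, 3]   # input with a 0-size first dimension
index_shape = [2, 3]     # shape of the index [[2, 3, 4], [1, 2, 5]]

out_shape = advanced_index_out_shape(x_shape, index_shape, dim=0)

print("out shape:", out_shape)
print("out numel:", prod(out_shape))
print("x numel:", prod(x_shape))
print("guard needed:", needs_empty_input_guard(x_shape, out_shape))
```

Because the output element count is nonzero while the input is empty, a kernel launched over the output would dereference a null input pointer, which is why the guard has to test the input size rather than only the output size.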