[WIP][API-Compat] Add paddle.compat.min/max and new PHI kernel (min/max_with_index) #74512
+1,920
−6
PR Category
Operator Mechanism
PR Types
New features
Description
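This PR is a reopened version of #74495, rebased onto a newer base, with the conflicts against #74506 resolved. #74495 was only used to surface problems in CI. This PR attempts to fix the issue found there, namely that the amin/amax backward op was shared even though amin/amax do not support certain integer types; the fix relies on SFINAE plus checks on the Python side. Until #74506 is merged, the diff of this PR will look larger than it really is, because it includes part of the changes from the preceding PR. Once the preceding PR is merged, this should resolve automatically.
This PR is not finished yet: the corresponding unit tests are missing (testing has been done, see the TODO at the end), and it depends on a prerequisite PR, #74446, for its ForbidKeywordsDecorator decorator. The prerequisite PR has not been merged yet; the information in this PR will be updated once it is.
New features in this PR: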
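(min/max)_with_index_grad: behaves consistently with torch; adapted from the gradient op of amin/amax.
paddle.compat.min, paddle.compat.max: aligned with the behavior of torch. The input/output relationship of torch.min/torch.max is complicated (a single API bundles too many features); one of its cases is consistent with minimum/maximum.
Apart from Case 2 above, which calls (min/max)_with_index on the CUDA GPU backend, all other cases obtain their results by calling _C_ops.xxx from Python. Cases 1/2/3 should all perform well on the CUDA GPU backend (no composition is involved; each is served by a single operator call). On other backends, Case 2 is implemented by composing argmin/argmax with take_along_axis (together with a squeeze_ operation). This is not the best-performing approach, but it should give a good return on development effort.
TODO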
test_compat_minmax.py: operator unit tests that meet the unit-test coverage requirement.
Pcard-89620
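For reference, the non-CUDA fallback for Case 2 and the index-based gradient can be sketched roughly as follows. This is only an illustration written with NumPy as a stand-in for the Paddle ops: the function names min_with_index and min_with_index_grad below are hypothetical Python helpers, not the PHI kernels from this PR, and the sketch assumes the gradient flows only to the selected index along the reduced axis (which is how torch documents min/max with a dim argument, in contrast to amin/amax).

```python
import numpy as np


def min_with_index(x, axis, keepdim=False):
    """Hypothetical helper: (values, indices) of the minimum along `axis`,
    composed from argmin + take_along_axis + squeeze."""
    # Keep the reduced axis so the indices can drive take_along_axis.
    idx = np.expand_dims(np.argmin(x, axis=axis), axis)
    vals = np.take_along_axis(x, idx, axis=axis)
    if not keepdim:
        vals = np.squeeze(vals, axis=axis)
        idx = np.squeeze(idx, axis=axis)
    return vals, idx


def min_with_index_grad(x_shape, idx, grad_out, axis, keepdim=False):
    """Hypothetical helper: route the upstream gradient to the selected
    index only (assumed behavior; amin/amax would instead spread the
    gradient across tied elements)."""
    if not keepdim:
        idx = np.expand_dims(idx, axis)
        grad_out = np.expand_dims(grad_out, axis)
    grad_x = np.zeros(x_shape, dtype=grad_out.dtype)
    np.put_along_axis(grad_x, idx, grad_out, axis=axis)
    return grad_x


x = np.array([[3.0, 1.0, 2.0],
              [0.5, 0.5, 4.0]])
values, indices = min_with_index(x, axis=1)
grad = min_with_index_grad(x.shape, indices, np.ones_like(values), axis=1)
```

In the second row the minimum is tied; NumPy's argmin picks the first occurrence, so only that position receives gradient. Tie-breaking in the real kernels may differ.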