forked from PaddlePaddle/Paddle
Test #1
Open
zyfncg
wants to merge
1,004
commits into
zhangyuqin1998:dev/flashep
from
zyfncg:dev/flashep
* fix comparison warning * fix
…e#75665) * [CUDA Kernel No.39] Fix the collect_fpn_proposals operator kernel * fix index path
* Add moe_unpermute_kernel.h * fix typo
* refactor & fix moe_permute * refactor
* fix: prevent memcpy over-read in im2col_sh1sw1dh1dw1ph1pw1 NCHW branches
  - Add bounds clamping for all memcpy operations in the specialized fast path
  - Add zero-fill for shortfall cases to ensure complete output tensor coverage
  - Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary
* fix: prevent memcpy over-read in filter_width==1 case of im2col_sh1sw1dh1dw1ph1pw1
  - Fix unsafe memcpy in NCHW path when filter_width == 1
  - Prevent negative size_t conversion when output_width < plw + prw
  - Clamp copy size to available source span (im_width) to avoid over-read
  - Add zero-fill for shortfall cases to ensure complete output coverage
* fix: enhance im2col_common to prevent overflow in arithmetic operations
  - Convert dimensions to 64-bit integers to avoid overflow during calculations
  - Update index calculations for col and im arrays to use 64-bit arithmetic
  - Ensure safe access to tensor data by checking bounds before indexing
--------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Copilot <[email protected]>
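The im2col commit message above names three techniques: clamping a memcpy to the available source span, zero-filling any shortfall, and computing indices in 64-bit arithmetic. The following is only a minimal sketch of those ideas, not the code from the commit. The helper names `CopyClampedZeroFill` and `FlatIndex` and their parameters are hypothetical and do not come from Paddle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a Paddle API): copy `want` floats starting at
// src[src_off] into dst, clamping the copy to what the source row actually
// holds and zero-filling whatever could not be copied.
inline void CopyClampedZeroFill(float* dst, const float* src,
                                int64_t src_len, int64_t src_off,
                                int64_t want) {
  assert(dst != nullptr && src != nullptr);
  // Guard in signed arithmetic first, so a negative count is never
  // converted to a huge size_t value.
  if (want <= 0) return;
  src_off = std::max<int64_t>(src_off, 0);
  const int64_t avail = src_off < src_len ? src_len - src_off : 0;
  const int64_t n = std::min(want, avail);
  if (n > 0) {
    std::memcpy(dst, src + src_off, static_cast<size_t>(n) * sizeof(float));
  }
  // Zero-fill the shortfall so every requested output element is written.
  std::fill(dst + n, dst + want, 0.0f);
}

// Hypothetical helper: flatten a (c, h, w) index using 64-bit arithmetic so
// the product cannot overflow a 32-bit int for large tensors.
inline int64_t FlatIndex(int c, int h, int w, int height, int width) {
  return (static_cast<int64_t>(c) * height + h) * width + w;
}
```

In this sketch the fast memcpy path is still taken whenever the source span covers the request; only the uncovered tail is written element by element.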
…addle#75747) --------- Co-authored-by: Nyakku Shigure <[email protected]>
…5724) * add log * support dynamic_shape
* clean py3.8 in dockerfile - part * fix
* fix: using latest API * switch check_prim_pir ON * fix: Code Style Issue * remove: useless whitelist. * fix: code-style issue. * Update test/legacy_test/test_dropout_op.py Co-authored-by: Nyakku Shigure <[email protected]> * fix: code-style issue. --------- Co-authored-by: Nyakku Shigure <[email protected]>
* fix * fix * fix dcu
* feat: debugging info * fix: non-cuda device’s logging error. * remove: cuda version checking useless * fix: syntax error * fix: code-style issue. * fix: build error * fix: syntax error * feat: ctcloss.zero_infinity * Remove zero_infinity parameter from ctc_loss Removed the 'zero_infinity' parameter from the ctc_loss function call. * fix: code-style issue. * fix: code-style issue. ? * fix: code-style issue.
* support hf checkpoint fix support cast add id macro fix * add test and fix some bug * fix full param bug * add full param cast test --------- Co-authored-by: xingmingyyj <[email protected]>
…75642) * Add partial_concat_grad_kernel.h * Change to gpu * Change directory * Fix
* sharding stage3 bugfix * sharding stage3 bugfix * sharding stage3 bugfix * sharding stage3 bugfix * sharding stage3 bugfix * sharding stage3 bugfix
…4284)" (PaddlePaddle#76090) This reverts commit e2a8155.
…Paddle#74284)" (PaddlePaddle#76090)" This reverts commit e5f8345.
…into dev/flashep
PR Category
PR Types
Description