@zyfncg zyfncg commented Nov 12, 2025

PR Category

PR Types

Description

Le-soleile and others added 30 commits October 10, 2025 20:12
* fix comparison warning

* fix
…e#75665)

* [CUDA Kernel No.39] Fix the collect_fpn_proposals operator kernel

* fix index path
* refactor & fix moe_permute

* refactor
* fix: prevent memcpy over-read in im2col_sh1sw1dh1dw1ph1pw1 NCHW branches

- Add bounds clamping for all memcpy operations in the specialized fast path
- Add zero-fill for shortfall cases to ensure complete output tensor coverage
- Maintain performance by using memcpy when safe, falling back to element-wise operations only when necessary

* fix: prevent memcpy over-read in filter_width==1 case of im2col_sh1sw1dh1dw1ph1pw1

- Fix unsafe memcpy in NCHW path when filter_width == 1
- Prevent negative size_t conversion when output_width < plw + prw
- Clamp copy size to available source span (im_width) to avoid over-read
- Add zero-fill for shortfall cases to ensure complete output coverage
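
The clamping and zero-fill described in the two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ClampedRowCopy` and its parameters `want` and `avail` are hypothetical names, with `want` standing for the number of elements the output row expects (for example `output_width - plw - prw`) and `avail` for the number of elements actually readable from the source row (for example `im_width`).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Paddle's actual implementation).
// Copies at most `avail` floats from `src` into a destination row that
// expects `want` floats, then zero-fills any part that could not be copied.
inline void ClampedRowCopy(float* dst, const float* src,
                           int64_t want, int64_t avail) {
  // A non-positive count must never reach memcpy: converting a negative
  // value to size_t would wrap around to a huge length.
  if (want <= 0) return;

  // Clamp the copy length to what the source can actually provide.
  const int64_t n = std::min(want, std::max<int64_t>(avail, 0));
  if (n > 0) {
    std::memcpy(dst, src, static_cast<size_t>(n) * sizeof(float));
  }

  // Shortfall: zero-fill the remainder so the whole output row is written.
  if (n < want) {
    std::fill(dst + n, dst + want, 0.0f);
  }
}
```

Keeping the signed arithmetic in `int64_t` until after the checks is what keeps the fast `memcpy` path safe; the element-wise fill only runs when the source is shorter than the destination.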

* fix: enhance im2col_common to prevent overflow in arithmetic operations

- Convert dimensions to 64-bit integers to avoid overflow during calculations
- Update index calculations for col and im arrays to use 64-bit arithmetic
- Ensure safe access to tensor data by checking bounds before indexing
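
As a rough illustration of the 64-bit index arithmetic described above, the sketch below widens the first operand before multiplying so the product cannot overflow a 32-bit `int`, and pairs it with a bounds check to run before indexing. `FlatIndexCHW` and `InBounds` are hypothetical helpers introduced for this example and are assumed, not taken from im2col_common itself.

```cpp
#include <cstdint>

// Hypothetical helper: flat offset of element (c, h, w) in a CHW tensor.
// Casting `c` to int64_t first forces the whole expression to be evaluated
// in 64-bit, so large tensors do not overflow during the multiplication.
inline int64_t FlatIndexCHW(int c, int h, int w, int height, int width) {
  return (static_cast<int64_t>(c) * height + h) *
             static_cast<int64_t>(width) + w;
}

// Hypothetical helper: true when `idx` is a valid offset into a buffer
// holding `numel` elements. Intended to be checked before indexing.
inline bool InBounds(int64_t idx, int64_t numel) {
  return idx >= 0 && idx < numel;
}
```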
---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Copilot <[email protected]>
* clean py3.8 in dockerfile - part

* fix
* fix: using latest API

* switch check_prim_pir ON

* fix: Code Style Issue

* remove: useless whitelist.

* fix: code-style issue.

* Update test/legacy_test/test_dropout_op.py

Co-authored-by: Nyakku Shigure <[email protected]>

* fix: code-style issue.

---------

Co-authored-by: Nyakku Shigure <[email protected]>
aztice and others added 30 commits October 27, 2025 17:25
* feat: debugging info

* fix: logging error on non-CUDA devices

* remove: cuda version checking

useless

* fix: syntax error

* fix: code-style issue.

* fix: build error

* fix: syntax error

* feat: ctcloss.zero_infinity

* Remove zero_infinity parameter from ctc_loss

Removed the 'zero_infinity' parameter from the ctc_loss function call.

* fix: code-style issue.

* fix: code-style issue.

?

* fix: code-style issue.
* support hf checkpoint

fix

support cast

add id macro

fix

* add tests and fix some bugs

* fix full param bug

* add full param cast test

---------

Co-authored-by: xingmingyyj <[email protected]>
…75642)

* Add partial_concat_grad_kernel.h

* Change to gpu

* Change directory

* Fix
* sharding stage3 bugfix

* sharding stage3 bugfix

* sharding stage3 bugfix

* sharding stage3 bugfix

* sharding stage3 bugfix

* sharding stage3 bugfix