1004 commits
0f34ab6
【CUDA Kernel No.122】Fix expand_modality_expert_id operator kernel -part (#75708)
Le-soleile Oct 10, 2025
451814c
【CUDA Kernel No.57】Fix global_scatter operator kernel -part (#75699)
Le-soleile Oct 10, 2025
129fab3
[XPU] Auto bump XHPC to 20251007 (#75688)
paddle-xpu-bot Oct 10, 2025
ee159d0
【CUDA Kernel No.53】Fix fused_token_prune operator kernel -part (#75701)
Le-soleile Oct 10, 2025
d0c2788
del deprecated uts part2 (#75726)
swgu98 Oct 10, 2025
3c407fa
[Test] Remove deprecated uts (part3) (#75730)
SigureMo Oct 11, 2025
474d1ab
fix comparison warning (#75652)
co63oc Oct 11, 2025
3dd52c5
【CUDA Kernel No.39】Fix collect_fpn_proposals operator kernel -part (#75665)
youge325 Oct 11, 2025
d30a353
【CUDA Kernel No.81】Fix moe_unpermute operator kernel -part (#75644)
Le-soleile Oct 11, 2025
abb153b
python2.7 change to python in pyCov_multithreading (#75669)
co63oc Oct 11, 2025
fabaa95
add python3.13 in build_utils.sh (#75723)
co63oc Oct 11, 2025
fe2a8fc
【CUDA Kernel No.132】Fix moe_gate_dispatch_permute_grad operator kernel -part (#…
Le-soleile Oct 11, 2025
ea2cc97
refractor & fix moe_permute (#75725)
A-nnonymous Oct 11, 2025
f556d04
[XPU] support index_elementwise_get kernel (#75486)
cqulilujia Oct 11, 2025
91a4c15
Fix im2col cpu (#75731)
scyyh11 Oct 11, 2025
4f3effe
rename test_mkldnn_matmul_elementwise_add_fuse_pass [fluid_ops] (#75…
co63oc Oct 11, 2025
2c02b6c
[Test] Move cpp unittests to test directory (#75632)
SigureMo Oct 11, 2025
6759447
Replace `mkldnn` with `onednn` in `test_build_strategy.py` (#75746)
co63oc Oct 12, 2025
08fe857
[SOT] Support builtin dispatch for `is_compiled_with_onednn` (#75747)
co63oc Oct 12, 2025
cf92c0c
[CI] Add Report Preview URLs Workflow (#75687)
ooooo-create Oct 12, 2025
fcf3c3f
Disable CUBLAS TF32 for default for better precision. (#75476)
A-nnonymous Oct 13, 2025
8224888
【pipeparellal】 PipelineParallel support dynamic_shape (#75724)
xiaoguoguo626807 Oct 13, 2025
402b977
[XPU] Auto bump XHPC to 20251010 (#75751)
paddle-xpu-bot Oct 13, 2025
7975faf
cuda13 almalinux trt (#75695)
swgu98 Oct 13, 2025
7a86836
replace mkldnn to onednn in strings (#75745)
co63oc Oct 13, 2025
5beed39
clean py3.8 in dockerfile (#75732)
co63oc Oct 13, 2025
290c4da
time string format in progress bar (#75736)
MayYouBeProsperous Oct 13, 2025
e9f2910
【UnitTestFix No.4】Fix unittest `test_dropout_op` (#75729)
aztice Oct 13, 2025
1990bcc
[DeepEP] support M2N (#75582)
zhoutianzi666 Oct 13, 2025
a02d1aa
[Depth Alignment] dot (#75717)
cszdrg Oct 13, 2025
31f801d
fix (#75605)
cszdrg Oct 13, 2025
169e64c
【UnitTestFix No.1】fix test_activation_op.py (#75553)
scyyh11 Oct 13, 2025
0af06ea
add comment for unused variables (#75489)
co63oc Oct 13, 2025
4edb367
[Compat] add device.XXX and cuda.XXX (#75692)
fxyfxy777 Oct 13, 2025
5efc7b7
[Stride] Add new stride op into list (#75719)
Eddie-Wang1120 Oct 13, 2025
721ed68
rename mkldnn to onednn in paddle/fluid/inference/goapi/ (#75604)
co63oc Oct 13, 2025
1e7d6bf
fix field_name.compare (#75681)
co63oc Oct 13, 2025
b38cd1e
clean IS_TRT_VERSION_GE(6000) - part (#75735)
co63oc Oct 13, 2025
fb6e6ac
clean IS_TRT_VERSION_GE(6000) - part (#75734)
co63oc Oct 13, 2025
6cbb11f
clean IS_TRT_VERSION_GE(6000) in paddle/fluid/platform/tensorrt (#75733)
co63oc Oct 13, 2025
44222df
clean some IS_TRT_VERSION_GE (#75682)
co63oc Oct 13, 2025
11e3a28
clean IS_TRT_VERSION_GE(6000) (#75683)
co63oc Oct 13, 2025
cc367e8
【UnitTestFix No.15】Fix test_allgather unit test (#75748)
tjujingzong Oct 13, 2025
67f1062
remove unused variable in activation.py (#75795)
co63oc Oct 13, 2025
01c1b09
Add cross alias (#75743)
aztice Oct 13, 2025
0ee9730
4th-batch-13-Statistics code logic error (#75753)
ApricityXX Oct 13, 2025
deed9d3
[Precision Depth Alignment] fix beta and threshold of paddle.nn.func…
zhengshengning Oct 13, 2025
2171de2
up_grade fc (#75613)
xingmingyyj Oct 13, 2025
bac79fe
【stride】Set value when dstplace != srcplace and one tenosr is not con…
xiaoguoguo626807 Oct 14, 2025
4f4f4ed
4th-batch-14-Conditional check logic error (#75754)
ApricityXX Oct 14, 2025
938be7a
4th-batch-39-Code uses a deprecated function (#75777)
ApricityXX Oct 14, 2025
7cf8540
4th-batch-70-Numerical precision error in code (#75788)
ApricityXX Oct 14, 2025
e2c1180
4th-batch-72-Description text error (#75789)
ApricityXX Oct 14, 2025
8c646f5
4th-batch-74-Links pose security and staleness risks (#75791)
ApricityXX Oct 14, 2025
1562704
4th-batch-75-Spelling errors in code (#75792)
ApricityXX Oct 14, 2025
b15dcca
4th-batch-132to133-Out-of-bounds memory access risk (#75750)
ApricityXX Oct 14, 2025
7d65bd1
4th-batch-128-Add return value check (#75767)
ApricityXX Oct 14, 2025
4b8a279
4th-batch-125-Multithreaded destruction issue (#75769)
ApricityXX Oct 14, 2025
06aa086
4th-batch-29-Variable name misspelling (#75765)
ApricityXX Oct 14, 2025
492c28f
4th-batch-41-Arithmetic logic error in code (#75780)
ApricityXX Oct 14, 2025
19de988
4th-batch-30-Code checks the wrong object (#75766)
ApricityXX Oct 14, 2025
bb2460b
4th-batch-53-Multiprocess handling error (#75785)
ApricityXX Oct 14, 2025
331a6a6
4th-batch-50-Distributed training initialization consistency issue (#75784)
ApricityXX Oct 14, 2025
c376041
4th-batch-47-Multiprocess independence issue in code (#75782)
ApricityXX Oct 14, 2025
89f4bd9
4th-batch-49-Gradient stash and restore issue in code (#75783)
ApricityXX Oct 14, 2025
0a235a3
[Auto Parallel] Add co_shard spmd_rule for tile (#75246)
ooooo-create Oct 14, 2025
f9beaa3
4th-batch-38-Log message error (#75776)
ApricityXX Oct 14, 2025
35d684e
4th-batch-15-Missing condition check when fetching parameters may cause errors (#75755)
ApricityXX Oct 14, 2025
6364dc7
clean some CUDA_VERSION >= 10020 (#75815)
co63oc Oct 14, 2025
7de0256
4th-batch-37-Code logic error (#75775)
ApricityXX Oct 14, 2025
e499bfb
4th-batch-36-Improper method usage in code (#75772)
ApricityXX Oct 14, 2025
1a45628
clean IS_TRT_VERSION_GE(6000) (#75809)
co63oc Oct 14, 2025
d49968f
Clean up deprecated functions 1013 (#75802)
ApricityXX Oct 14, 2025
16e0275
[Bug fix] Fix missing instantiation of isfinite/isinf/isnan kernels o…
youge325 Oct 14, 2025
a17b4a3
4th-batch-73-Missing logic in code (#75790)
ApricityXX Oct 14, 2025
ae8f3fc
4th-batch-117-Possible silent failure (#75768)
ApricityXX Oct 14, 2025
f285cfa
4th-batch-111to113-Fix some code logic issues (#75771)
ApricityXX Oct 14, 2025
11ca894
4th-batch-80-Key mapping may be missing when looking up gradients in backward pass (#75798)
ApricityXX Oct 14, 2025
8c5f78d
4th-batch-17-Code restricts multi-device scenarios (#75757)
ApricityXX Oct 14, 2025
352133c
4th-batch-28-Variable assignment logic error (#75764)
ApricityXX Oct 14, 2025
903f7c7
[SOT] Allow user specify a region to safe capture control flow (#75548)
SigureMo Oct 14, 2025
098a840
4th-batch-106-Ineffective cache mechanism wastes performance (#75779)
ApricityXX Oct 14, 2025
fb24b38
4th-batch-96-May access a nonexistent attribute and raise a runtime exception (#75819)
ApricityXX Oct 14, 2025
fc6250a
4th-batch-31-Code checks the wrong object (#75770)
ApricityXX Oct 14, 2025
47699dd
4th-batch-43-Conflicting parameter definitions in code (#75781)
ApricityXX Oct 14, 2025
a19482d
[CINN] Fix get static value for arange strategy (#75837)
SigureMo Oct 15, 2025
3d7a91a
Implement `__cuda_stream__` protocol (#75854)
SigureMo Oct 15, 2025
04ae617
fix memory leak bugs (#75852)
sneaxiy Oct 15, 2025
d1af165
[Compat] add device.XXX and cuda.XXX (#75744)
fxyfxy777 Oct 15, 2025
4971c3c
【FlexCheckpoint】fix_the_layer_id_macro (#75556)
zty-king Oct 15, 2025
a103d8c
4th-batch-21-Code does not validate variable correctly (#75762)
ApricityXX Oct 15, 2025
13d9cf7
fix typos plateform platform (#75806)
co63oc Oct 15, 2025
8f2955d
4th-batch-139-Incorrect comparison of container return value (#75749)
ApricityXX Oct 15, 2025
8e37ed6
replace Mkldnn to Onednn in compute_propagate_scales_onednn_pass (#75…
co63oc Oct 15, 2025
6f808ba
[DLPack] Bump DLPack to v1.2 and implement C functions exchange API (…
SigureMo Oct 15, 2025
0729a58
fix test_sum_op (#75849)
YqGe585 Oct 15, 2025
b7e55f5
clean get_cuda_version() < 11020 in tests - part (#75839)
co63oc Oct 15, 2025
f734c6f
clean TENSORRT_MAJOR_VERSION EQUAL 7 check (#75844)
co63oc Oct 15, 2025
02bdf9a
clean CUDA_VERSION >= 11020 in cusparseLt.h (#75814)
co63oc Oct 15, 2025
5b2b185
update trt_version in tensorrt linalg.py (#75793)
co63oc Oct 15, 2025
3303cf5
update trt_version in conv.py (#75591)
co63oc Oct 15, 2025
7a56f63
[Bug fix] Fix isinf misidentifying NaN as Inf in bfloat16.h (#75807)
youge325 Oct 15, 2025
be8a65c
use getTensorIOMode to fix bind_index (#75833)
co63oc Oct 15, 2025
2717d4f
API、Tensor and GradNode support unique name (#75752)
DanielSun11 Oct 15, 2025
c7d4e23
4th-batch-98-Incorrect dictionary syntax usage (#75822)
ApricityXX Oct 15, 2025
254cf3a
Add new/malloc operation to release memory check (#75875)
swgu98 Oct 15, 2025
bd295b5
[CppExtension] Keep hirachey in build directory (#75866)
SigureMo Oct 16, 2025
e58582b
4th-batch-89-Elements are not checked (#75803)
ApricityXX Oct 16, 2025
038cc70
4th-batch-93to94-Data validity not verified (#75813)
ApricityXX Oct 16, 2025
e27e524
fix enable_custom_device (#75873)
YqGe585 Oct 16, 2025
dce6f6c
[Bug Fix] Allow float16/bfloat16 Scalar to be converted to complex ty…
youge325 Oct 16, 2025
1aa4a86
【CUDA Kernel No.103】Fix seed operator kernel -part (#75577)
Le-soleile Oct 16, 2025
5225a06
【CUDA Kernel No.60】Add gru_kernel.h -part (#75845)
algorithm1832 Oct 16, 2025
cf1335f
[AutoParallel] Adapt auto parallel for double grad and triple grad (#…
waliwali777 Oct 16, 2025
6ca20eb
[Auto-Paralllel] fix shard_dataloader with no-tensor (#75252)
Xing-lil Oct 16, 2025
fd95aba
[Compat] Add missing interfaces for PyTorch compat (#75874)
SigureMo Oct 16, 2025
5dbecdc
[XPU] Update XHPC to 20251014 and add some dim check in FlashAttnKern…
ZibinGuo Oct 16, 2025
9999342
fix unused variable (#75862)
co63oc Oct 17, 2025
c5e8259
Support `model.to` with `device=tensor.place` (#75867)
HydrogenSulfate Oct 17, 2025
c34161d
[Compat] Fix `transpose` implementation and add negative indexing sup…
SigureMo Oct 17, 2025
1f1b56d
set MKL_NUM_THREADS (#75880)
zhengshengning Oct 17, 2025
57d59ba
[Auto Parallel] Add co_shard spmd_rule for bmm (#75555)
ooooo-create Oct 17, 2025
45f3410
fix value_grad ele_size error (#75903)
changeyoung98 Oct 17, 2025
0481988
clean some CUDA_VERSION >= 11020 (#75865)
co63oc Oct 17, 2025
ceeaeaa
clean some IS_TRT_VERSION_GE(7200) (#75864)
co63oc Oct 17, 2025
01cfb0c
clean some IS_TRT_VERSION_GE(7000) (#75863)
co63oc Oct 17, 2025
dd19cfb
update test/cpp/inference/infer_ut/CMakeLists.txt (#75858)
co63oc Oct 17, 2025
66483e0
replace trt_version in tensorrt/impls (#75826)
co63oc Oct 17, 2025
6eb5588
clean get_cuda_version() < 11020 in tests (#75811)
co63oc Oct 17, 2025
37f7dbe
[Precision Depth Alignment] paddle.log_sigmoid (#75898)
zhengshengning Oct 17, 2025
984aee4
fix errors caused by gpu in conditions (#75551)
YqGe585 Oct 17, 2025
b91c61d
Fix dnn related tests for custom device (#75609)
YqGe585 Oct 17, 2025
3289717
[TVM FFI] Bump tvm ffi to `0.1.0b20` in unittests (#75902)
SigureMo Oct 17, 2025
8e58cb9
[Precision Depth Alignment] paddle.log aligns with torch precision (…
zhengshengning Oct 17, 2025
d2f4afd
[Precision Depth Alignment] fix eps of paddle.logit from float to dou…
zhengshengning Oct 17, 2025
5458524
4th-batch-16-Function variable unused (#75756)
ApricityXX Oct 18, 2025
945ea69
Revert "Disable NVIDIA_TF32_OVERRIDE by default for better precision.…
A-nnonymous Oct 19, 2025
33eff52
4th-batch-55-Log print message error (#75786)
ApricityXX Oct 20, 2025
0017da8
【UnitTestFix No.16】Fix test_reducescatter unit test (#75886)
tjujingzong Oct 20, 2025
29f9ea2
Disable PaddleX in DCU/NPU (#75958)
tianshuo78520a Oct 20, 2025
ab79f8c
[big tensor] Paddle/paddle/phi/kernels/funcs gpuBigtensor (#75856)
cszdrg Oct 20, 2025
0a58d74
[XPU] use xpudnn interface for pool2d and pool2d_grad (#75630)
cqulilujia Oct 20, 2025
a1345b2
[Precision Depth Alignment] Modify the negative_slope parameter of th…
zrr1999 Oct 20, 2025
696c6c6
[AutoParallel] Add dense2dist in op_ad_func (#75691)
waliwali777 Oct 20, 2025
8f6b9df
fix: Enhance matmul_grad to ensure output shape matches original inpu…
scyyh11 Oct 21, 2025
a59ca03
【CUDA Kernel No.46】Fix cvm operator kernel -part (#75703)
Le-soleile Oct 21, 2025
d8be320
4th-batch-101-Lax check may cause inconsistent sharding behavior (#75823)
ApricityXX Oct 21, 2025
3c66d04
4th-batch-97-Variable naming mistake (#75820)
ApricityXX Oct 21, 2025
52c493c
4th-batch-92-Prevent gradient buffer mismatch caused by misconfiguration or out-of-sync state (#75812)
ApricityXX Oct 21, 2025
9f7f2ba
4th-batch-109-Add length check (#75774)
ApricityXX Oct 21, 2025
b381231
[CUDAGraph] Remove CUDAGraph replay after capture and use the same de…
DrRyanHuang Oct 21, 2025
0378119
fix custom device save error (#75961)
YqGe585 Oct 21, 2025
e106142
fix blas for custom device (#75969)
YqGe585 Oct 21, 2025
011e42d
Revert "Revert "Disable NVIDIA_TF32_OVERRIDE by default for better pr…
A-nnonymous Oct 21, 2025
1f00e21
[Compat] Define the macro `CHECK` only when it is not already defined…
co63oc Oct 21, 2025
3836c2d
[DLPack] Implement dtype and device exchange protocol (#75973)
SigureMo Oct 21, 2025
4b0215a
[CppExtension] Support `os.PathLike` in `CppExtension`/`CUDAExtension…
SigureMo Oct 22, 2025
449eb7e
Support md5 checksum for API output tensor (#75835)
DanielSun11 Oct 22, 2025
d05a775
fix shape=int for size_args_decorator (#75983)
HydrogenSulfate Oct 22, 2025
c46e8c7
fix typo disable_loggling -> disable_logging (#75978)
co63oc Oct 22, 2025
bd5058a
fix _get_arch_info (#75921)
co63oc Oct 22, 2025
7a31a7e
clean some IS_TRT_VERSION_GE(5130) (#75946)
co63oc Oct 22, 2025
05af9c2
clean some IS_TRT_VERSION_GE(8000) (#75944)
co63oc Oct 22, 2025
5f1ea8a
clean some IS_TRT_VERSION_LT(8000) (#75919)
co63oc Oct 22, 2025
c09079f
clean get_cuda_version < 8100 (#75895)
co63oc Oct 22, 2025
19e5388
clean get_cuda_version() < 11020 - part (#75618)
co63oc Oct 22, 2025
4648625
clean get_cuda_version() < 11020 in test_variable_length_memory_effic…
co63oc Oct 22, 2025
ab39b91
clean IS_TRT_VERSION_LT(8000) in tensorrt plugin (#75920)
co63oc Oct 22, 2025
e006936
fix test_dynamic_engine (#75943)
co63oc Oct 22, 2025
c7a658d
[Bug Fix] Fix missing header include in activation_offloader.h (#75936)
youge325 Oct 22, 2025
31870e6
revert_mkl_num_threads (#75985)
zhengshengning Oct 22, 2025
e1ffaed
[Bug Fix] Improve error handling and compatibility in TensorRT engine…
youge325 Oct 22, 2025
05439b1
4th-batch-68-Gradient computation error in code (#75787)
ApricityXX Oct 22, 2025
3e31bf4
Revert test_activation_op.py to fix bug caused by commit deed9d360d (…
scyyh11 Oct 22, 2025
5ec5c07
4th-batch-19-Incorrect call in code (#75759)
ApricityXX Oct 22, 2025
d2f87b7
4th-batch-17-Code restricts multi-device scenarios (follow-up fix) (#75959)
ApricityXX Oct 22, 2025
2bb1097
【UnitTestFix No.3】fix test_conv3d_transpose_op.py (#75945)
scyyh11 Oct 22, 2025
a84cc0e
[Bug Fix] add missing header include in ir_context.h (#75927)
youge325 Oct 22, 2025
1f84292
add tensorrt 10 support int64 (#75951)
co63oc Oct 22, 2025
2b9ba85
[Compat] Try import `tvm_ffi` when enable torch proxy (#75991)
SigureMo Oct 22, 2025
89931f0
clean pip3.8 in Dockerfile.develop.npu (#75893)
co63oc Oct 23, 2025
ff34cae
fix masked_fill_grad value_grad bug (#75988)
changeyoung98 Oct 23, 2025
4263da4
4th-batch-20-Unused variables in code (#75761)
ApricityXX Oct 23, 2025
b65dadd
use op_test.get_cuda_version (#75994)
co63oc Oct 23, 2025
89d92c3
merge ifdef PADDLE_WITH_CUDA in build_strategy.cc (#75962)
co63oc Oct 23, 2025
6692ccb
[Cherry-pick] Optimize FlashMask v3 performance (#75737) (#75984)
umiswing Oct 23, 2025
481a88f
[Stride] Disable Split Stride Kernel (#75987)
Eddie-Wang1120 Oct 23, 2025
9fe6225
[Bug Fix] Fix NaN/Inf check to support float16, bfloat16, and complex…
youge325 Oct 23, 2025
74f6ea8
[Stride] Optimizing H2D Copy by TensorIterator and OpenMP (#75192)
Eddie-Wang1120 Oct 23, 2025
83d4454
[Precision Depth Alignment] implement torch compatible max_pool2d gra…
zrr1999 Oct 23, 2025
189706c
fix to_tensor bug (#76000)
wanghuancoder Oct 23, 2025
2db3061
[CINN] Fix bug of infer_symbol_shape for crop op (#75992)
zyfncg Oct 23, 2025
ca3f6ef
【CUDA Kernel No.93】Fix psroi_pool_grad_kernel operator (#75938)
xxiu1 Oct 23, 2025
246c4a9
fix win32 rms_norm. (#76007)
A-nnonymous Oct 23, 2025
c85fc97
Update check_approval.sh (#76012)
luotao1 Oct 24, 2025
a799f8d
[Fix] log sigmoid complex (#75953)
scyyh11 Oct 24, 2025
7d3ae36
[PHI] Flash Attention V3 128B aligned chunking load/store (#76003)
Enigmatisms Oct 24, 2025
fafb525
[Slice] Fix big tensor (#76004)
Eddie-Wang1120 Oct 24, 2025
cdfc18c
fix python version in ci/utils.sh (#75997)
co63oc Oct 24, 2025
ed8e5e3
clean pip3.8 in Dockerfile.develop.dtk (#75738)
co63oc Oct 24, 2025
eee3605
fix repeat IS_TRT_VERSION_GE (#75975)
co63oc Oct 24, 2025
f0747d3
clean IS_TRT_VERSION_GE(5000) (#75990)
co63oc Oct 24, 2025
4887335
[Bug Fix] Fix CastDataTypeFunctor for low-precision floats to complex…
youge325 Oct 24, 2025
70b14ac
[Bug Fix] Fix CastKernel for low-precision to complex type conversion…
youge325 Oct 24, 2025
fbe99bc
[Storage]Add deleter for mmap_storage get_slice (#75966)
changeyoung98 Oct 24, 2025
f29f693
[Bug Fix] Support isfinite/isnan/isinf for float16/bfloat16 on CUDA/H…
youge325 Oct 24, 2025
69887cd
Add LOG Guard and optimize the PyLayer LOG (#76010)
DanielSun11 Oct 24, 2025
915abce
[API] Remove dtype check in static branch to allow pass bf16 data to …
SigureMo Oct 24, 2025
b4a5484
[API] Support tensor shape in `reshape` with compatible API (#76025)
SigureMo Oct 24, 2025
bb79a54
[Precision Depth Alignment] Change the pad_value parameter of pad3d f…
zhengshengning Oct 24, 2025
5772ceb
[XPU] Auto bump XHPC to 20251024 (#76035)
paddle-xpu-bot Oct 26, 2025
f8eb896
use add_executable to replace cuda_add_executable (#75998)
co63oc Oct 27, 2025
ed0e828
[Bug Fix] Fix compilation issues in enforce_test.cc for different cuF…
youge325 Oct 27, 2025
f516d97
[Bug Fix] Fix C2593 error on MSVC for bfloat16 in RemainderGradDy (#7…
youge325 Oct 27, 2025
02f6932
[Bug Fix] Fix warnings related to deprecated std::iterator usage in e…
youge325 Oct 27, 2025
92e2a00
[XPU]Modify the independent XPU memory monitoring module (#76018)
qw86972190 Oct 27, 2025
8be343c
Update coverage (#76045)
tianshuo78520a Oct 27, 2025
72ace2f
fix some tests (#75956)
1184319564 Oct 27, 2025
e54142a
fix typo blockDim (#76016)
co63oc Oct 27, 2025
6d05ec9
【CUDA Kernel No.88】Fix partial_allgather operator kernel -part (#75643)
Le-soleile Oct 27, 2025
e6f0eca
Advance Logging for `place.cc` (#75888)
aztice Oct 27, 2025
1fd2b5a
Pr support load hf checkpoint (#75928)
zty-king Oct 27, 2025
70a0660
【CUDA Kernel No.89】Fix partial_concat_grad operator kernel -part (#75642)
Le-soleile Oct 27, 2025
e05b3b1
add SetDataType INT64 (#76017)
co63oc Oct 27, 2025
efc6b44
Fix ComparePriority to satisfy strict weak ordering for std::sort (#7…
feixi21 Oct 27, 2025
feeef7e
Temporary fix of moe_gat_dispatch_w_permute optest. (#76039)
A-nnonymous Oct 27, 2025
310e746
fix test_incubate_fused_loss (#76068)
zhengshengning Oct 28, 2025
d768c1a
clean CUDA_ARCH_FP16_SUPPORTED - part (#76024)
co63oc Oct 28, 2025
9f19eef
clean CUDA_ARCH_FP16_SUPPORTED - part (#76022)
co63oc Oct 28, 2025
ccdfb90
clean CUDA_ARCH_FP16_SUPPORTED(__CUDA_ARCH__) - part (#76021)
co63oc Oct 28, 2025
168742e
clean CUDA_VERSION >= 7050 (#76020)
co63oc Oct 28, 2025
3dbac78
fix typo load_static_dict (#75739)
co63oc Oct 28, 2025
3313952
Fix some tests for custom device (#76063)
1184319564 Oct 28, 2025
5c2e29e
sharding stage3 bugfix (#76005)
AlAuAu Oct 28, 2025
f07eb72
[Dy2St] Remove import of ast2 in `gast.py` (#76057)
co63oc Oct 28, 2025
8e10916
fix cinn 0size dynshape bug (#76093)
DrRyanHuang Oct 29, 2025
e2a8155
Revert "Update deep_ep intranode & internode kernels (#74284)" (#76090)
lshpku Oct 29, 2025
7263266
Revert "clean CUDA_ARCH_FP16_SUPPORTED - part (#76022)" (#76084)
co63oc Oct 29, 2025
316ec54
[CUDAGraph] Remove CUDAGraph legacy unitest (#76043)
DrRyanHuang Oct 29, 2025
7072d8a
add notify_dispatch api in deepep
zyfncg Oct 17, 2025
10684bf
add python api in buffer
zyfncg Oct 20, 2025
3896d23
fix param
zyfncg Oct 20, 2025
873e4db
add test file
zyfncg Oct 20, 2025
8edacce
modify nvshmem
zyfncg Oct 29, 2025
6e0a5a1
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
zyfncg Oct 29, 2025
e5f8345
Reapply "Update deep_ep intranode & internode kernels (#74284)" (#76090)
zyfncg Oct 29, 2025
0b9ca97
Add kernel of notify_combine
zyfncg Nov 11, 2025
a1c6383
Revert "Reapply "Update deep_ep intranode & internode kernels (#74284…
zyfncg Nov 12, 2025
1c3a399
update code
zyfncg Nov 12, 2025
36be241
Merge branch 'dev/flashep' of https://github.com/zhangyuqin1998/Paddl…
zyfncg Nov 12, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
6 changes: 3 additions & 3 deletions .github/actions/check-bypass/action.yml
@@ -18,11 +18,11 @@ runs:
     - id: check-bypass
       name: Check Bypass
       env:
-        CI_TEAM_MEMBERS: '["SigureMo", "risemeup1", "tianshuo78520a", "0x3878f", "swgu98", "luotao1", "XieYunshen"]'
-      uses: PFCCLab/ci-bypass@v1
+        CI_TEAM_MEMBERS: '["tianshuo78520a", "swgu98", "risemeup1", "XieYunshen","luotao1"]'
+      uses: PFCCLab/ci-bypass@v2
       with:
         github-token: ${{ inputs.github-token }}
-        non-pull-request-event-strategy: 'always-skipped'
+        non-pull-request-event-strategy: 'never-skipped'
         type: 'composite'
         composite-rule: |
           {
87 changes: 87 additions & 0 deletions .github/workflows/Api-Benchmark-baseline.yml
@@ -0,0 +1,87 @@
+name: Api-benchmark-baseline
+
+on:
+  workflow_dispatch:
+    inputs:
+      PR_ID:
+        required: false
+        type: string
+      COMMIT_ID:
+        required: false
+        type: string
+      job-name:
+        required: true
+        default: 'api-benchmark'
+        type: choice
+        options:
+          - api-benchmark
+          - others
+  schedule:
+    - cron: '0 21 * * *'
+
+permissions: read-all
+
+defaults:
+  run:
+    shell: bash
+
+jobs:
+  clone:
+    name: Api benchmark clone
+    uses: ./.github/workflows/_Clone-linux.yml
+    with:
+      clone_dir: Paddle-build
+      is_pr: 'false'
+
+  build-docker:
+    name: Api benchmark build docker
+    needs: clone
+    uses: ./.github/workflows/docker.yml
+    with:
+      clone_dir: Paddle-build
+      task: build
+
+  build:
+    name: Api benchmark build
+    if: github.event_name == 'schedule' && github.event.schedule == '0 21 * * *'
+    needs: [clone, build-docker]
+    uses: ./.github/workflows/_Linux-build.yml
+    with:
+      docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}
+      is_pr: 'false'
+
+  api-benchmark-baseline-schedule:
+    name: Api benchmark baseline with schedule
+    strategy:
+      matrix:
+        run-labels: [api-bm-20, api-bm-27]
+    uses: ./.github/workflows/_Api-Benchmark.yml
+    needs: [clone, build-docker, build]
+    with:
+      docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}
+      baseline: 'true'
+      run-labels: ${{ matrix.run-labels }}
+
+  api-benchmark-baseline-pr-20:
+    name: Api benchmark baseline with PR on 20
+    if: github.event_name == 'workflow_dispatch' && github.event.inputs.job-name == 'api-benchmark'
+    uses: ./.github/workflows/_Api-Benchmark.yml
+    needs: [clone, build-docker]
+    with:
+      docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}
+      baseline: 'true'
+      MANUALLY_PR_ID: ${{ inputs.PR_ID }}
+      MANUALLY_COMMIT_ID: ${{ inputs.COMMIT_ID }}
+      run-labels: api-bm-20
+
+  api-benchmark-baseline-pr-27:
+    name: Api benchmark baseline with PR on 27
+    if: github.event_name == 'workflow_dispatch' && github.event.inputs.job-name == 'api-benchmark'
+    uses: ./.github/workflows/_Api-Benchmark.yml
+    needs: [clone, build-docker]
+    with:
+      docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}
+      baseline: 'true'
+      MANUALLY_PR_ID: ${{ inputs.PR_ID }}
+      MANUALLY_COMMIT_ID: ${{ inputs.COMMIT_ID }}
+      run-labels: api-bm-27
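Review note: the trigger gating in Api-Benchmark-baseline.yml above can be sketched as a small Python model. This is only an illustration, not code from the PR. The function name `jobs_that_run` and its parameters are hypothetical. It assumes every job that starts succeeds, and it relies on the GitHub Actions default that a job is skipped when a job it `needs` was skipped.

```python
from typing import List, Optional

# Cron string copied from the workflow's schedule trigger.
NIGHTLY_CRON = "0 21 * * *"


def jobs_that_run(event_name: str,
                  schedule: Optional[str] = None,
                  job_name: str = "api-benchmark") -> List[str]:
    """Return the job ids expected to run (hypothetical helper, not Paddle code)."""
    # clone and build-docker carry no `if:` condition, so they always start.
    jobs = ["clone", "build-docker"]

    # build is gated on the nightly schedule trigger.
    build_runs = event_name == "schedule" and schedule == NIGHTLY_CRON
    if build_runs:
        jobs.append("build")
        # The schedule job needs build, so it only runs when build runs;
        # the matrix fans it out over both runner labels.
        jobs += [f"api-benchmark-baseline-schedule ({label})"
                 for label in ("api-bm-20", "api-bm-27")]

    # The two PR baseline jobs share one condition and differ only in runner label.
    if event_name == "workflow_dispatch" and job_name == "api-benchmark":
        jobs += ["api-benchmark-baseline-pr-20", "api-benchmark-baseline-pr-27"]

    return jobs
```

Under this model, a manual dispatch with `job-name` set to `others` runs only `clone` and `build-docker`, since neither the build path nor the PR baseline jobs match.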
23 changes: 12 additions & 11 deletions .github/workflows/CI-Build.yml
@@ -3,7 +3,7 @@ name: CI-Build
 on:
   pull_request:
     types: [opened, synchronize]
-    branches: [develop, release/**]
+    branches: [develop, release/**, fleety_*]

 permissions: read-all

@@ -21,6 +21,7 @@ jobs:
     uses: ./.github/workflows/_Clone-linux.yml
     with:
       clone_dir: Paddle-build
+      workflow-name: 'CI-build'

   build-docker:
     name: build docker images
@@ -33,74 +34,74 @@ jobs:
   inference:
     name: PR-CI-Inference
     uses: ./.github/workflows/_Inference.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_inference_image: ${{ needs.build-docker.outputs.docker_build_image }}
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}

   build:
     name: Linux-build
     uses: ./.github/workflows/_Linux-build.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   static-check:
     name: Static-Check
     uses: ./.github/workflows/_Static-Check.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   ce-framework:
     name: CE-Framework
     uses: ./.github/workflows/_CE-Framework.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   ce-cinn-framework:
     name: CE-CINN-Framework
     uses: ./.github/workflows/_CE-CINN-Framework.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   api-benchmark:
     name: Api-Benchmark
     uses: ./.github/workflows/_Api-Benchmark.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   auto-parallel:
     name: Auto-Parallel
     uses: ./.github/workflows/_Auto-Parallel.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   model-benchmark:
     name: Model-Benchmark
     uses: ./.github/workflows/_Model-Benchmark.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_build_image: ${{ needs.build-docker.outputs.docker_build_image }}

   doc-preview:
     name: Doc-Preview
     uses: ./.github/workflows/_Doc-Preview.yml
-    needs: [build-docker, build]
+    needs: [clone, build-docker, build]
     with:
       can-skip: ${{ needs.build.outputs.can-skip }}
       docker_doc_image: ${{ needs.build-docker.outputs.docker_doc_image }}
-

   slice:
     name: Slice
     uses: ./.github/workflows/_Slice.yml
4 changes: 2 additions & 2 deletions .github/workflows/CI-Windows.yml
@@ -3,12 +3,12 @@ name: CI-Windows
 on:
   pull_request:
     types: [opened, synchronize]
-    branches: [develop, release/**]
+    branches: [develop, release/**, fleety_*]

 permissions: read-all

 concurrency:
-  group: ${{ github.event.pull_request.number }}-Windows
+  group: ${{ github.event.pull_request.number }}-${{ github.workflow }}
   cancel-in-progress: true

 env:
22 changes: 15 additions & 7 deletions .github/workflows/CI.yml
@@ -3,7 +3,7 @@ name: CI
 on:
   pull_request:
     types: [opened, synchronize]
-    branches: [develop, release/**]
+    branches: [develop, release/**, fleety_*]

 permissions: read-all

@@ -19,6 +19,8 @@ jobs:
   clone:
     name: Clone-linux
     uses: ./.github/workflows/_Clone-linux.yml
+    with:
+      workflow-name: 'CI'

   build-docker:
     name: build docker images
@@ -28,47 +30,53 @@ jobs:
   sot:
     name: PR-CI-SOT
     uses: ./.github/workflows/_SOT.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_cpu_image: ${{ needs.build-docker.outputs.docker_cpu_image }}
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}

   mac:
     name: Mac-CPU
     uses: ./.github/workflows/_Mac.yml
     needs: clone
+    with:
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}

   xpu:
     name: Linux-XPU
     uses: ./.github/workflows/_Linux-XPU.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_xpu_image: ${{ needs.build-docker.outputs.docker_xpu_image }}
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}

   dcu:
     name: Linux-DCU
     uses: ./.github/workflows/_Linux-DCU.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_dcu_image: ${{ needs.build-docker.outputs.docker_dcu_image }}
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}

   cpu:
     name: Linux-CPU
     uses: ./.github/workflows/_Linux-CPU.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_cpu_image: ${{ needs.build-docker.outputs.docker_cpu_image }}

   npu:
     name: Linux-NPU
     uses: ./.github/workflows/_Linux-NPU.yml
-    needs: [cpu, build-docker]
+    needs: [clone, cpu, build-docker]
     with:
       can-skip: ${{ needs.cpu.outputs.can-skip }}
       docker_npu_image: ${{ needs.build-docker.outputs.docker_npu_image }}

   distribute:
     name: Distribute-stable
     uses: ./.github/workflows/_Distribute-stable.yml
-    needs: build-docker
+    needs: [clone, build-docker]
     with:
       docker_distribute_image: ${{ needs.build-docker.outputs.docker_distribute_image }}
+      clone-can-skip: ${{ needs.clone.outputs.can-skip }}
8 changes: 8 additions & 0 deletions .github/workflows/CheckPRTemplate.yml
@@ -16,7 +16,15 @@ jobs:
       - name: Clone paddle
         uses: actions/checkout@v4

+      - name: Check bypass
+        id: check-bypass
+        uses: ./.github/actions/check-bypass
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          workflow-name: template
+
       - name: Check PR Template
+        if: steps.check-bypass.outputs.can-skip != 'true'
         env:
           AGILE_PULL_ID: ${{ github.event.pull_request.number }}
           AGILE_COMPILE_BRANCH: ${{ github.base_ref }}
4 changes: 0 additions & 4 deletions .github/workflows/Coverage.yml
@@ -68,7 +68,6 @@ jobs:
 PADDLE_VERSION: 0.0.0
 CUDA_VISIBLE_DEVICES: 0,1
 WITH_DISTRIBUTE: "ON"
-PRECISION_TEST: "ON"
 WITH_PIP_CUDA_LIBRARIES: "OFF"
 WITH_FLAGCX: "ON"
 LITE_GIT_TAG: develop
@@ -114,7 +113,6 @@ jobs:
 -e COVERALLS_UPLOAD \
 -e PADDLE_VERSION \
 -e WITH_DISTRIBUTE \
--e PRECISION_TEST \
 -e WITH_PIP_CUDA_LIBRARIES \
 -e WITH_FLAGCX \
 -e LITE_GIT_TAG \
@@ -272,7 +270,6 @@ jobs:
 COVERALLS_UPLOAD: "ON"
 PADDLE_VERSION: 0.0.0
 WITH_DISTRIBUTE: "ON"
-PRECISION_TEST: "ON"
 WITH_UNITY_BUILD: "ON"
 PY_VERSION: 3.9
 WITH_SHARED_PHI: "ON"
@@ -315,7 +312,6 @@ jobs:
 -e COVERALLS_UPLOAD \
 -e PADDLE_VERSION \
 -e WITH_DISTRIBUTE \
--e PRECISION_TEST \
 -e WITH_UNITY_BUILD \
 -e PY_VERSION \
 -e WITH_SHARED_PHI \