Add Pad Reflect 1D CUDA support #14659

YavorGIvanov · 2025-07-13T00:46:13Z

No description provided.

JohannesGaessler

Please tell me whether you want to address the comment regarding the loop in this PR.

ggml/src/ggml-cuda/pad_reflect_1d.cu

JohannesGaessler · 2025-07-15T09:17:00Z

+    const char * src0_ptr = (const char *)src0 + i3*nb03 + i2*nb02 + i1*nb01;
+    char * dst_ptr = (char *)dst + i3*nb3 + i2*nb2 + i1*nb1;
+
+    for (int64_t i0 = threadIdx.x; i0 < ne0; i0 += blockDim.x) {


This is going to produce correct results, but generally speaking you will get much better performance if each thread works on a single value instead of looping over ne0. However, it would also be fine to merge it as-is and change this later if it ever becomes relevant for end-to-end performance.
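To make the two indexing schemes in this comment concrete, here is a minimal host-side C++ sketch, not the code from this PR. It emulates the strided loop from the diff and a one-element-per-thread variant on the CPU for a single contiguous float row. The names `reflect_index`, `pad_strided`, `pad_one_per_thread`, `p0`, `p1` and `block_dim` are hypothetical, and the reflect formula assumes both paddings are smaller than the row length. It only shows that both mappings write the same output; it says nothing about performance.

```cpp
#include <cstdint>
#include <vector>

// Source index for output position i0 under reflect padding.
// Assumption: 0 <= p0 < ne00, and the right padding is also smaller than ne00.
static int64_t reflect_index(int64_t i0, int64_t ne00, int64_t p0) {
    const int64_t rel = i0 - p0;                 // position relative to the start of the source row
    if (rel < 0)     return -rel;                // left padding mirrors around element 0
    if (rel >= ne00) return 2*(ne00 - 1) - rel;  // right padding mirrors around the last element
    return rel;                                  // interior: plain copy
}

// Emulates the loop in the diff: each "thread" tid strides over the row by block_dim.
static std::vector<float> pad_strided(const std::vector<float> & src, int64_t p0, int64_t p1, int64_t block_dim) {
    const int64_t ne00 = (int64_t) src.size();
    const int64_t ne0  = ne00 + p0 + p1;
    std::vector<float> dst(ne0);
    for (int64_t tid = 0; tid < block_dim; ++tid) {
        for (int64_t i0 = tid; i0 < ne0; i0 += block_dim) {
            dst[i0] = src[reflect_index(i0, ne00, p0)];
        }
    }
    return dst;
}

// Emulates the suggested scheme: enough blocks to cover ne0, one output value per "thread".
static std::vector<float> pad_one_per_thread(const std::vector<float> & src, int64_t p0, int64_t p1, int64_t block_dim) {
    const int64_t ne00     = (int64_t) src.size();
    const int64_t ne0      = ne00 + p0 + p1;
    const int64_t n_blocks = (ne0 + block_dim - 1) / block_dim;  // ceil(ne0 / block_dim)
    std::vector<float> dst(ne0);
    for (int64_t bid = 0; bid < n_blocks; ++bid) {
        for (int64_t tid = 0; tid < block_dim; ++tid) {
            const int64_t i0 = bid*block_dim + tid;  // in a kernel: blockIdx.x*blockDim.x + threadIdx.x
            if (i0 >= ne0) {
                continue;  // in a kernel this would be an early return
            }
            dst[i0] = src[reflect_index(i0, ne00, p0)];
        }
    }
    return dst;
}
```

For a source row `{1, 2, 3, 4}` with `p0 = p1 = 2`, both functions produce `{3, 2, 1, 2, 3, 4, 3, 2}`. In an actual kernel, the one-per-thread variant would presumably need the row indices (`i1`, `i2`, `i3`) mapped to other grid dimensions; that layout is an assumption here and is not taken from the PR.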

CISC · 2025-07-31T12:30:59Z

@YavorGIvanov Shall we merge as-is or are you looking into the review comment by @JohannesGaessler?

CISC · 2025-08-22T07:55:53Z

@YavorGIvanov ping

YavorGIvanov · 2025-08-22T11:02:15Z

Let's merge as is. @CISC

* Add Pad Reflect 1D CUDA support
* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

Add Pad Reflect 1D CUDA support

12f5f7c

github-actions bot added the Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Jul 13, 2025

am17an requested a review from JohannesGaessler July 13, 2025 14:49

JohannesGaessler approved these changes Jul 15, 2025

Update ggml/src/ggml-cuda/pad_reflect_1d.cu

f908dce

Co-authored-by: Johannes Gäßler <[email protected]>

CISC mentioned this pull request Jul 29, 2025

Feature Request: Implement missing ops from backends #14909

Open

4 tasks

CISC merged commit b1ab918 into ggml-org:master Aug 22, 2025
47 checks passed

YavorGIvanov deleted the feature/pad-reflect-cuda-support branch August 22, 2025 11:47

bugparty mentioned this pull request Sep 13, 2025

CUDA: Optimize PAD_REFLECT_1D #15957

Merged
