@YavorGIvanov
Contributor

No description provided.

@github-actions (bot) added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jul 13, 2025
@am17an requested a review from JohannesGaessler on July 13, 2025 14:49
Collaborator

@JohannesGaessler left a comment

Please tell me whether you want to address the comment regarding the loop in this PR.

const char * src0_ptr = (const char *)src0 + i3*nb03 + i2*nb02 + i1*nb01;
char * dst_ptr = (char *)dst + i3*nb3 + i2*nb2 + i1*nb1;

for (int64_t i0 = threadIdx.x; i0 < ne0; i0 += blockDim.x) {
Collaborator

This is going to produce correct results but generally speaking you will get much better performance if each thread just works on a single value instead of looping over ne0. However, it would also be fine to just merge it as-is and maybe change this later if it ever becomes relevant for end-to-end performance.
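To illustrate the two launch strategies discussed in this comment, here is a minimal sketch. It is not the kernel from this PR: it assumes contiguous `float` rows instead of ggml's byte strides (`nb01`, `nb1`, ...), assumes both pads are smaller than the row length, and the names `reflect_index`, `pad_reflect_loop`, `pad_reflect_single` and `pad_reflect_variants_agree` are hypothetical. It only checks that both variants produce the same output and makes no performance measurement.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Hypothetical helper: map output index i0 to a source index for reflect
// padding (the edge value is not repeated). Assumes p0 < ne00 and p1 < ne00.
__host__ __device__ inline int64_t reflect_index(int64_t i0, int64_t ne00, int64_t p0) {
    if (i0 < p0)        return p0 - i0;            // left pad, mirrored
    if (i0 < p0 + ne00) return i0 - p0;            // interior copy
    return 2*ne00 - 2 - (i0 - p0);                 // right pad, mirrored
}

// Variant A: one block per row, each thread strides over ne0
// (same loop shape as the reviewed snippet).
__global__ void pad_reflect_loop(const float * src, float * dst,
                                 int64_t ne00, int64_t ne0, int64_t p0) {
    const float * src_row = src + (int64_t) blockIdx.x * ne00;
    float       * dst_row = dst + (int64_t) blockIdx.x * ne0;
    for (int64_t i0 = threadIdx.x; i0 < ne0; i0 += blockDim.x) {
        dst_row[i0] = src_row[reflect_index(i0, ne00, p0)];
    }
}

// Variant B: one thread per output value; blockIdx.y selects the row.
// Assumes the row count fits in gridDim.y (at most 65535).
__global__ void pad_reflect_single(const float * src, float * dst,
                                   int64_t ne00, int64_t ne0, int64_t p0) {
    const int64_t i0 = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i0 >= ne0) {
        return; // the last block of a row may be only partially used
    }
    const float * src_row = src + (int64_t) blockIdx.y * ne00;
    float       * dst_row = dst + (int64_t) blockIdx.y * ne0;
    dst_row[i0] = src_row[reflect_index(i0, ne00, p0)];
}

// Runs both variants and compares them against a CPU reference.
bool pad_reflect_variants_agree(int64_t ne00, int64_t nrows, int64_t p0, int64_t p1) {
    const int64_t ne0 = ne00 + p0 + p1;
    std::vector<float> h_src(ne00 * nrows), h_a(ne0 * nrows), h_b(ne0 * nrows);
    for (size_t i = 0; i < h_src.size(); ++i) {
        h_src[i] = (float) i;
    }

    float * d_src = nullptr;
    float * d_a   = nullptr;
    float * d_b   = nullptr;
    cudaMalloc(&d_src, h_src.size() * sizeof(float));
    cudaMalloc(&d_a,   h_a.size()   * sizeof(float));
    cudaMalloc(&d_b,   h_b.size()   * sizeof(float));
    cudaMemcpy(d_src, h_src.data(), h_src.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256;
    pad_reflect_loop<<<(unsigned) nrows, block>>>(d_src, d_a, ne00, ne0, p0);
    const dim3 grid((unsigned) ((ne0 + block - 1) / block), (unsigned) nrows);
    pad_reflect_single<<<grid, block>>>(d_src, d_b, ne00, ne0, p0);

    // cudaMemcpy on the default stream waits for the kernels above to finish.
    bool ok = cudaMemcpy(h_a.data(), d_a, h_a.size() * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(h_b.data(), d_b, h_b.size() * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_src);
    cudaFree(d_a);
    cudaFree(d_b);

    for (int64_t r = 0; ok && r < nrows; ++r) {
        for (int64_t i0 = 0; i0 < ne0; ++i0) {
            const float expected = h_src[r * ne00 + reflect_index(i0, ne00, p0)];
            ok = ok && h_a[r * ne0 + i0] == expected && h_b[r * ne0 + i0] == expected;
        }
    }
    return ok;
}
```

Calling `pad_reflect_variants_agree(1000, 3, 5, 7)` from a host `main` exercises both kernels on three rows of length 1000. The only structural difference is where the iteration over `ne0` happens: in a per-thread loop (variant A) or in the grid dimensions (variant B).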

@CISC
Collaborator

CISC commented Jul 31, 2025

@YavorGIvanov Shall we merge as-is or are you looking into the review comment by @JohannesGaessler?

@CISC
Collaborator

CISC commented Aug 22, 2025

@YavorGIvanov ping

@YavorGIvanov
Contributor Author

Let's merge as is. @CISC

@CISC merged commit b1ab918 into ggml-org:master on Aug 22, 2025
47 checks passed
@YavorGIvanov deleted the feature/pad-reflect-cuda-support branch on August 22, 2025 11:47
qnixsynapse pushed a commit to janhq/llama.cpp that referenced this pull request Aug 25, 2025
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>