CUDA: add set #14980
Part of #14909
Hi @JohannesGaessler, could you please review the changes when you have a chance? Thank you in advance!
One of my current goals is to consolidate and deduplicate the code around copying data in the CUDA backend. As such, I think that rather than adding new kernels here it would be better to re-use the existing code. If the operation is not in place, you can use cudaMemcpyAsync to initialize dst with the contents of src0. Afterwards you can use ggml_cpy_flt_cuda in cpy.cu to copy src1 into dst. That kernel does not have an argument for the offset, but none is needed since you can simply apply the offset to the dst pointer in host code.
I have already used
If I’m wrong, please correct me. Thanks for your help!
Hi @am17an, thanks again for your previous review. Since @JohannesGaessler hasn’t had a chance to respond for a few weeks, would it be possible to ask another maintainer or contributor to review this as well? I’d really appreciate any additional feedback to help move this forward.
Sorry, I forgot about this PR. The code in
Set both types to float.
Use the same shape twice.