
Add GPU kernels #28

Merged
amontoison merged 33 commits into master from vecchia_mul_kernel
Feb 22, 2025

@amontoison (Member)

@amontoison changed the title from "Add a kernel vecchia_mul_kernel!" to "Add GPU kernels" on Feb 21, 2025
@amontoison marked this pull request as draft on February 21, 2025, 06:00

```julia
pos = colptrL[index]
offset = offsets[index]
mj = m[index]
```

@michel2323 (Member), Feb 21, 2025

What may be faster is computing `mj = maximum(m)` outside the kernel, passing it as `Val(mj)`, and then guarding the loop body with `if j <= mj`. Same for `i`. Just a suggestion in case this is too slow.
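
A minimal CPU-only sketch of the suggestion, assuming each `index` owns a contiguous run of `m[index]` entries starting right after `offsets[index]`. The helper `block_sum` and the data are hypothetical and do not reproduce the PR's kernel; they only show the pattern of a compile-time loop bound (`Val`) combined with a runtime guard.

```julia
# Hypothetical per-"thread" worker. MJ = maximum(m) is a type parameter, so the
# loop trip count is a compile-time constant; the real block size mj = m[index]
# is handled by a runtime guard that skips the padding iterations.
function block_sum(::Val{MJ}, x::Vector{Float64}, offsets::Vector{Int},
                   m::Vector{Int}, index::Int) where {MJ}
    offset = offsets[index]   # start of this block (exclusive)
    mj = m[index]             # runtime size of this block
    acc = 0.0
    for j in 1:MJ             # fixed trip count for every index
        if j <= mj            # only the first mj iterations do work
            acc += x[offset + j]
        end
    end
    return acc
end

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m = [2, 3, 1]
offsets = [0, 2, 5]
mjmax = maximum(m)            # computed once, outside the "kernel"
sums = [block_sum(Val(mjmax), x, offsets, m, k) for k in eachindex(m)]
```

On the CPU this is equivalent to looping over `1:mj` directly; the possible gain is on the GPU, where a constant trip count may let the compiler unroll the loop, at the cost of one specialization per distinct `maximum(m)`.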

Member Author

The code is not working right now, @michel2323, and I don't understand why yet. :(

```julia
for s in 1:mj
    for t in s:mj
        acc = 0.0
        for i = 1:r
```

@michel2323 (Member), Feb 21, 2025

Couldn't this inner loop be moved out to become the outer loop?

Member Author

We can't, because it computes a dot product between `samples[s]` and `samples[t]`.
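
A minimal sketch of the loop nest discussed above, assuming the `mj` samples of one block are stored as the columns of an `r × mj` matrix (the PR's actual storage layout may differ, and `gram_upper` is a hypothetical name). It fills the upper triangle of the Gram matrix, `G[s, t] = dot(samples[:, s], samples[:, t])` for `s <= t`.

```julia
# Upper triangle of the Gram matrix of the columns of `samples`.
# The loop over i is the dot-product reduction for one (s, t) pair.
function gram_upper(samples::Matrix{Float64})
    r, mj = size(samples)
    G = zeros(mj, mj)
    for s in 1:mj
        for t in s:mj
            acc = 0.0                 # running sum for this (s, t) pair only
            for i = 1:r               # reduction over the r components
                acc += samples[i, s] * samples[i, t]
            end
            G[s, t] = acc
        end
    end
    return G
end

S = [1.0 0.0 2.0;
     2.0 1.0 0.0]
G = gram_upper(S)
```

The accumulator `acc` is reset for every `(s, t)` pair, which is why the `i` loop sits innermost in this form; with column-major storage it also walks contiguous memory.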

@amontoison marked this pull request as ready for review on February 22, 2025, 21:49
@amontoison merged commit d4e745c into master on Feb 22, 2025
9 checks passed
@amontoison deleted the vecchia_mul_kernel branch on February 22, 2025, 21:49

@amontoison (Member Author)

@michel2323 @CalebDerrickson
I was using the sparsity pattern of L (the Cholesky factor) instead of the sparsity pattern of the Hessian of the objective...
All tests passed on GPU. 💪
