Hess tri structure gpu implementation#31

Merged
CalebDerrickson merged 3 commits into master from hess_tri_structure_gpu_implementation
Feb 24, 2025

@CalebDerrickson
Collaborator

First version of the kernel implementation. There might be a way to flatten the kernel further, allowing for more parallelization, but this should be fine for the moment.

@CalebDerrickson CalebDerrickson merged commit 1c0dc1c into master Feb 24, 2025
9 checks passed
@CalebDerrickson CalebDerrickson deleted the hess_tri_structure_gpu_implementation branch February 24, 2025 22:37
backend = KA.get_backend(hrows)
kernel = vecchia_generate_hess_tri_structure_kernel!(backend)

f(x) = (x * (x+1)) ÷ 2
Member

@amontoison amontoison Feb 24, 2025

OK, you use this function later, but you could compute it directly in the kernel.
That's what I do here:
https://github.com/exanauts/VecchiaMLE.jl/blob/master/src/VecchiaMLE_kernels.jl#L90-L93
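
To illustrate the suggestion, here is a minimal plain-Julia sketch of computing the triangular number inline instead of through a separate helper `f`. It is not the code at the linked lines: the name `tri_counts!` is hypothetical, the loop only stands in for the per-thread body of a kernel, and it assumes `colptr` is a CSC-style column pointer whose consecutive differences give the per-column counts.

```julia
# Hypothetical stand-in for a kernel body: each loop iteration plays the
# role of one thread handling column j.
function tri_counts!(out::AbstractVector{Int}, colptr::AbstractVector{Int})
    @assert length(colptr) == length(out) + 1
    for j in eachindex(out)
        m = colptr[j + 1] - colptr[j]   # number of entries in column j (assumed meaning)
        out[j] = (m * (m + 1)) ÷ 2      # triangular number computed inline, no helper f
    end
    return out
end

counts = tri_counts!(zeros(Int, 3), [1, 3, 6, 7])
```

Keeping the arithmetic inside the kernel body avoids a separate broadcast of `f` over the column differences on the host side.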

Comment on lines +18 to +19
#include("test_cpu_diagnostics.jl")
#include("test_memory_allocation_outliers_cpu.jl")
Member

@amontoison amontoison Feb 24, 2025

Why are these commented out?

Collaborator Author

I was using the VecchiaMLE test to check my kernel and forgot to uncomment them. Oops!

Comment on lines +152 to +156
carry_offsets = CUDA.ones(Int, n)
view(carry_offsets, 2:n) .+= cumsum(f.(view(colptr_diff, 1:n-1)))

idx_offsets = CUDA.ones(Int, n)
view(idx_offsets, 2:n) .+= cumsum(view(colptr_diff, 1:n-1))
Member

We don't want to add new allocations :)
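
One way to read this comment: the two `CUDA.ones` calls and the temporaries created by `cumsum` and `f.(...)` could be replaced by writes into buffers that are allocated once and reused. The sketch below shows only that bookkeeping on ordinary CPU arrays. The name `fill_offsets!` and the idea of a caller-owned buffer are assumptions, not part of the PR, and a GPU version would need an in-place scan on a preallocated device buffer rather than this sequential loop.

```julia
# Triangular-number helper, as in the PR.
f(x) = (x * (x + 1)) ÷ 2

# Hypothetical in-place variant: `offsets` is allocated once by the caller
# and overwritten on every call, so no new array is created here.
# Result: offsets[1] = 1 and offsets[j] = 1 + sum(g(colptr_diff[k]) for k < j).
function fill_offsets!(offsets::AbstractVector{Int},
                       colptr_diff::AbstractVector{Int}, g)
    acc = 1
    offsets[1] = acc
    for j in 2:length(offsets)
        acc += g(colptr_diff[j - 1])   # running sum replaces the cumsum temporary
        offsets[j] = acc
    end
    return offsets
end

colptr_diff   = [2, 3, 1, 4]
carry_offsets = fill_offsets!(zeros(Int, 4), colptr_diff, f)
idx_offsets   = fill_offsets!(zeros(Int, 4), colptr_diff, identity)
```

Passing `f` gives the same values as the `carry_offsets` lines above, and passing `identity` gives the `idx_offsets` values.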

Comment on lines +211 to +213
println("CPU:\n")
println(hrows)
println(hcols)
Member

@CalebDerrickson You can't keep these debug prints when you merge your modifications into the main branch.
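
If the output is still useful during development, one possible alternative to `println` is Julia's standard `Logging` module, whose `@debug` messages are hidden by default. This is a sketch, not something proposed in the PR, and the function name `report_structure` is hypothetical.

```julia
using Logging

# Hypothetical helper: emits the structure at debug level only,
# so nothing is printed under the default logger settings.
function report_structure(hrows, hcols)
    @debug "CPU Hessian structure" hrows hcols
    return length(hrows)
end

nnz_count = report_structure([1, 2, 2], [1, 1, 2])
```

The messages can be switched on when needed, for example by wrapping the call in `with_logger(ConsoleLogger(stderr, Logging.Debug))` or by setting the `JULIA_DEBUG` environment variable.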

@amontoison
Member

@michel2323 Can you review the PRs of @CalebDerrickson during this week?

@CalebDerrickson CalebDerrickson restored the hess_tri_structure_gpu_implementation branch February 24, 2025 22:54
@amontoison
Member

Off-topic: they rebooted moonshot, so we have CI with GPUs again.
