-
Notifications
You must be signed in to change notification settings - Fork 12
Description
We wish to update the LHS matrix in the Poisson/Helmoltz solver used in gyrokinetics (fem_poisson_perp). Here's a path to do it.
On the CPU
We simply use the mat_triples approach used in the fem_poisson_perp_new method. The only unresolved issue is that we need to experiment with doing this without destroying and creating (again) the SuperMatrix A. I think it should be possible, but I see some comments in the source code saying that I tried this some years ago, but it errored out.
On the GPU
The incoming epsilon and kSq arrays are assumed to be on GPU memory. We can't use the mat_triples approach because this struct is not defined on GPUs, and it'll take a lot of work to move all that stuff to GPU. So here we propose the following approach: fill the (flat) array of nonzero values csr_val_cu that cudss.cu uses to populate the LHS matrix.
The challenge here is that we have kernels (l2g) that compute the global i,j indices of a given nonzero-value contribution in the LHS matrix. We do not now a priori how the i,j entry maps to the flat array of nonzero values csr_val_cu (which has size nprob*nnz, and here nprob is just the number of cells along z, and nnz is the number of nonzero values of the problem in a single z-slab).
So we need the mapping
global i,j -> linear index in csr_val_cu
In order to get this mapping:
- We continue to populate the
mat_tripleson the CPU infem_poisson_perp_new. - Once we know the list of
mat_triples, we loop through the grid and:
2a) At each cell, we use thel2gkernel to get the global indicesi,j.
2b) Each cell makesnum_basis^2contributions. For each contribution we loop through the list ofmat_triples, and we find the entry with the same row and column indices asi,j. We record the (linear) location of this entry in thecsr_val_idxarray.
Then, in the fem_poisson_perp_update_lhs method, we can call the lhs_stencil kernel but rather than filling a mat_triples, we will have it directly insert a value in/accumulate a value to cudss_ops->csr_val_cu[csr_val_idx[k]].
NOTE: one ugly thing is that step 2b above involves a search, and we do this search for each cell and each of num_basis^2 contributions. To make matters worse, at the moment we are doing a brute-force search. So this is expensive. But we only do this at t=0, so hopefully is ok.
This approach is prototyped in the fem_poisson_update-prototyping branch.