Major refactor in spectral helper + 3D Rayleigh Benard #563
Conversation
Not quite sure why the mirror fails.
tlunet left a comment:
Looks good in general. Just a few comments here and there, if you can address those.
```python
u_pad = self.itransform(u_hat, padding=padding).real
# ...
fexpl_pad = self.xp.zeros_like(u_pad)
fexpl_pad[iu][:] = -(u_pad[iu] * Dx_u_pad[iu] + u_pad[iv] * Dy_u_pad[iu] + u_pad[iw] * Dz_u_pad[iu])
```
Do you really need the `[:]` at the end?
Also, try to use np.copyto instead of the = sign; it can be faster in some situations (and does exactly the same thing).
Finally, try to decompose operations on big vectors, e.g. turn

```python
out = a*u1 + b*u2 + c*u3
```

into

```python
np.copyto(out, u1)
out *= a
out += b*u2
out += c*u3
```

Yes, it's a bit cumbersome, but it avoids allocating a lot of new variables during computation, which can be expensive in both computation time and memory consumption. And you can still keep the non-optimized, more readable code commented on top.
Remember that every time you do an operation that is not in-place with numpy arrays, you implicitly allocate a temporary one. Ideally, the eval_f method should only work with pre-allocated temporary arrays, trying to minimize the required number of temporaries, and only do in-place operations or copies. I suspect this has a big impact on performance for large problem sizes.
PS: this comment also applies to matrix-vector multiplication. If possible, try to use functions that write the result into an existing out array, as can be done with np.dot.
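As an illustration of the pattern suggested above (a minimal sketch with made-up array names and sizes, not code from this PR), linear combinations and matrix-vector products can be kept allocation-free by reusing pre-allocated work arrays:

```python
import numpy as np

N = 1024
A = np.random.rand(N, N)
u1, u2, u3 = (np.random.rand(N) for _ in range(3))
a, b, c = 2.0, -1.0, 0.5

# work arrays allocated once and reused on every evaluation
out = np.empty(N)
tmp = np.empty(N)

# out = a*u1 + b*u2 + c*u3 without implicit temporaries
np.copyto(out, u1)
out *= a
np.multiply(b, u2, out=tmp)
out += tmp
np.multiply(c, u3, out=tmp)
out += tmp

# matrix-vector product written directly into an existing array
np.dot(A, out, out=tmp)
```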
Sounds like a good idea. Will do!
I did some memory profiling and it turned out that storing big matrices of size (5N^3)^2, with one row and column for every component at every point in space, requires loads of memory. I was storing a few of them because it made computations, in particular on GPU, faster. However, if the code is faster in theory but we can't fit it on the device, we gain nothing. So I removed all unnecessary big matrices and apply smaller ones more often instead.
Also, I implemented some of the optimizations suggested by Thibaut above.
Each of these big matrices seemed to be roughly the size of eight solution-sized objects that are kept in memory, and I removed five of them, which should have a big impact. I'll have a look at the solver and then see how much more I can now fit into memory.
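To sketch the trade-off described above (the names and sizes here are hypothetical, not taken from the PR): instead of assembling one large operator over all components and grid points, a small 1D operator can be applied along each axis of the field whenever it is needed.

```python
import numpy as np

N = 32                      # hypothetical resolution per direction
ncomp = 5                   # e.g. u, v, w, T, p
u = np.random.rand(ncomp, N, N, N)

# small per-direction operator, e.g. a 1D differentiation matrix
D = np.random.rand(N, N)

# apply D along the x-axis of every component; shape stays (ncomp, N, N, N)
Dx_u = np.einsum('ij,cjkl->cikl', D, u)

# the equivalent assembled matrix would act on the flattened state of
# size 5*N**3, i.e. it would have (5*N**3)**2 entries, which quickly
# becomes prohibitive in 3D
```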
Monodomain project fixed in #566 and upstream.
Anything else here? Ready to merge? Or do you want to modify this further?
Let's merge this for now, and eventually improve later. The PR is already big enough 😅
After figuring out how to do DCT on GPUs with mpi4py-fft, I refactored a lot. The main goal was to get rid of the implementation of the DCT via FFTs in pySDC and leave that to the FFT libraries. One advantage is that this enables complex-to-complex DCT, whereas the previous implementation could only do real-to-real. Furthermore, all the transform code now lives in the 1D basis implementations and is not duplicated in the ND discretization. Overall, this implementation is a lot more straightforward and easier to follow than before.
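For context (a minimal sketch of the underlying fact, not the PR's code): because the DCT is a real, linear transform, a complex-to-complex DCT can always be built from a real-to-real routine by transforming the real and imaginary parts separately, e.g. with scipy:

```python
import numpy as np
from scipy.fft import dct

def dct_complex(u, **kwargs):
    """Complex-to-complex DCT assembled from a real-to-real routine."""
    return dct(u.real, **kwargs) + 1j * dct(u.imag, **kwargs)

u = np.random.rand(16) + 1j * np.random.rand(16)
u_hat = dct_complex(u, type=2, norm='ortho')
```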
Another reason for the refactor was going 3D. I had a version of 3D Rayleigh-Benard working with the previous implementation, as long as I used the same resolution in each direction. The problem was figuring out the global size of the array with padding when you don't know which axes have been transformed and which are distributed, in particular because there are multiple choices for distribution in more than two dimensions.
Since this version makes a lot more direct use of mpi4py-fft, this is now left to mpi4py-fft. The interface is pretty generic and I implemented a bunch of tests for the 3D stuff with padding.
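As a rough illustration of what mpi4py-fft takes care of here (a minimal sketch based on its documented PFFT/newDistArray interface, not code from this PR):

```python
import numpy as np
from mpi4py import MPI
from mpi4py_fft import PFFT, newDistArray

N = (32, 32, 32)
# 3/2 dealiasing padding in every direction; mpi4py-fft works out the
# padded local and global shapes for whatever distribution it chooses
fft = PFFT(MPI.COMM_WORLD, N, axes=(0, 1, 2), dtype=np.float64,
           padding=[1.5, 1.5, 1.5])

u = newDistArray(fft, forward_output=False)   # padded physical-space array
u[:] = np.random.rand(*u.shape)
u_hat = fft.forward(u)                        # coefficients on the unpadded grid
u2 = fft.backward(u_hat)                      # back to the padded physical grid
```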
Finally, the greater goal was to do 3D Rayleigh-Benard, and this is also part of this PR, including tests for simple configurations where the result of the solver and of the right-hand side evaluation are known.
It's a large PR and I do apologize. But I didn't see a way of merging the refactor in smaller chunks.