[ISSUE-815] Generate Random Numbers Asynchronously on the GPU #859
base: SharedDevelopment
Conversation
…g to #include <cuda_runtime.h>
stiber left a comment
Minor cleanup.
I have implemented all the changes you requested and renamed the stream used by all the synchronous kernels to simulationStream (simulationStream_ as a member variable). I have also added new documentation to the developer docs and linked it into index.md. The old MersenneTwister files are still there in case anyone wants to try them out again, but I can remove them if you'd like.
stiber left a comment
Looks great; I will merge this.
I take it back; this needs to have SharedDevelopment merged into it. There may be a conflict with the changes to device memory allocation/deallocation being moved to
stiber left a comment
Besides the comments below, we need to examine GPUModel::allocEdgeIndexMap() and GPUModel::copyCPUtoGPU() to see whether they need rewrites because of DeviceVector.
…g the diff and reviewing changes to resolve conflicts.
I reverted and remerged SharedDevelopment into AndrewDevelopment, manually reviewing the changes and deleting old, unnecessary code from the SharedDevelopment commit before Ben merged his code in. This addressed many of the changes you requested, but I am unsure whether the OperationManager should execute copyCPUtoGPU, so you may want to ask Ben. I also moved the AsyncGenerator deletion as you requested.
Closes #815
Description
Replaced the custom Mersenne Twister GPU kernel with an AsyncPhilox_d class that asynchronously fills GPU buffers with random noise using cuRAND's Philox generator. The class supports double-buffering and is designed for concurrent execution.
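As a rough sketch (not the exact PR code), the shape of such a class might look like the following; member names other than loadPhilox(), requestSegment(), and fillBuffer() are illustrative assumptions:

```cpp
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Illustrative sketch of AsyncPhilox_d; field and parameter names are assumptions.
class AsyncPhilox_d {
public:
   // Initialize the Philox device states and pre-fill both buffers.
   void loadPhilox(int numVertices, unsigned long long seed);

   // Return a pointer to the next slice (one float per vertex) in the
   // active buffer, triggering a refill of the other buffer when needed.
   float* requestSegment();

   // Launch the asynchronous fill kernel on the given buffer.
   void fillBuffer(int bufferIndex);

private:
   curandStatePhilox4_32_10_t* devStates_ = nullptr;  // per-thread generator states
   float* buffers_[2] = {nullptr, nullptr};           // device-side double buffer
   size_t bufferSize_ = 0;                            // floats per buffer
   size_t sliceFloats_ = 0;                           // floats handed out per requestSegment()
   size_t readOffset_ = 0;                            // consumed floats in the active buffer
   int activeBuffer_ = 0;                             // buffer currently serving slices
   cudaStream_t fillStream_ = nullptr;                // internal stream for fill kernels
};
```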
GPUModel initializes Philox states and fills two initial buffers via loadPhilox() on a member AsyncPhilox_d instance. During each advance() call, requestSegment() retrieves a float* slice from the currently active buffer, sized appropriately for each vertex and ready to be used in advanceVertices().
Once a buffer is consumed, fillBuffer() is triggered on the other buffer while the current one continues to serve slices. This ensures continuous data availability through double-buffering.
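Continuing the sketch above, the swap logic could look roughly like this (a simplification; the real synchronization details live in the PR):

```cpp
// Serve slices from the active buffer; once it is exhausted, swap buffers
// and asynchronously refill the one just consumed so it is ready again later.
float* AsyncPhilox_d::requestSegment()
{
   if (readOffset_ + sliceFloats_ > bufferSize_) {
      // Make sure the background fill of the other buffer has finished.
      cudaStreamSynchronize(fillStream_);
      int consumed = activeBuffer_;
      activeBuffer_ = 1 - activeBuffer_;
      readOffset_ = 0;
      // NOTE: a real implementation must also ensure that kernels still
      // reading the consumed buffer have completed before overwriting it.
      fillBuffer(consumed);
   }
   float* slice = buffers_[activeBuffer_] + readOffset_;
   readOffset_ += sliceFloats_;
   return slice;
}
```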
AsyncPhilox_d uses its own internal CUDA stream to launch fill kernels asynchronously. To get true concurrency, all of the other compute kernels also had to move off the default stream: stream 0 implicitly synchronizes with the other streams, so kernel launches end up serialized even when they could run in parallel.
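For illustration, a fill kernel launched on a non-default stream could look like the sketch below, continuing the same sketch (kernel name, distribution, and launch configuration are assumptions; the PR may draw from a different distribution):

```cpp
// Each thread draws from its own Philox state and writes into the buffer
// using a grid-stride loop, then persists the advanced state.
__global__ void fillNoiseKernel(curandStatePhilox4_32_10_t* states,
                                float* buffer, size_t n)
{
   size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
   size_t stride = (size_t)gridDim.x * blockDim.x;
   curandStatePhilox4_32_10_t local = states[tid];
   for (size_t i = tid; i < n; i += stride) {
      buffer[i] = curand_uniform(&local);
   }
   states[tid] = local;
}

void AsyncPhilox_d::fillBuffer(int bufferIndex)
{
   // Launching on fillStream_ (not stream 0) lets this fill overlap with
   // simulation kernels running on simulationStream_. Assumes devStates_
   // holds one state per launched thread.
   fillNoiseKernel<<<64, 256, 0, fillStream_>>>(devStates_, buffers_[bufferIndex],
                                                bufferSize_);
}
```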
Checklist (Mandatory for new features)
Testing (Mandatory for all changes)
test-medium-connected.xml: Passed
test-large-long.xml: Passed