
Conversation

@AndrewBMadison (Contributor) commented Jun 10, 2025

Closes #815

Description

Replaced the custom Mersenne Twister GPU kernel with an AsyncPhilox_d class that asynchronously fills GPU buffers with random noise using cuRAND's Philox generator. The class supports double-buffering and is designed for concurrent execution.
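Based on the description above, the class surface might look like the following sketch. Everything beyond the names taken from this PR (AsyncPhilox_d, loadPhilox(), fillBuffer(), requestSegment()) — member names, signatures, and the kernel — is an assumption for illustration, not the actual code:

```cpp
// Hypothetical sketch of the AsyncPhilox_d surface (assumed signatures).
#include <curand_kernel.h>

class AsyncPhilox_d {
public:
   // Allocate/initialize Philox states on the device and pre-fill both buffers.
   void loadPhilox(int numStates, size_t bufferSize);
   // Launch the fill kernel asynchronously on stream_ for the given buffer.
   void fillBuffer(int which);
   // Hand out the next `count`-float slice of the currently active buffer.
   float* requestSegment(size_t count);

private:
   curandStatePhilox4_32_10_t* states_;  // one generator state per thread
   float* buffers_[2];                   // device double buffers
   int active_ = 0;                      // which buffer serves slices
   size_t cursor_ = 0, capacity_ = 0;
   cudaStream_t stream_;                 // internal stream for fill kernels
};

// Fill kernel sketch: each thread draws uniforms with its own Philox state.
__global__ void fillKernel(curandStatePhilox4_32_10_t* states,
                           float* buffer, size_t n) {
   size_t id = blockIdx.x * blockDim.x + threadIdx.x;
   curandStatePhilox4_32_10_t local = states[id];
   for (size_t i = id; i < n; i += (size_t)gridDim.x * blockDim.x)
      buffer[i] = curand_uniform(&local);
   states[id] = local;  // persist state so the next fill continues the sequence
}
```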

GPUModel initializes Philox states and fills two initial buffers via loadPhilox() on a member AsyncPhilox_d instance. During each advance() call, requestSegment() retrieves a float* slice from the currently active buffer, sized appropriately for each vertex and ready to be used in advanceVertices().

Once a buffer is consumed, requests switch to the other (already-filled) buffer, and fillBuffer() is launched asynchronously to replenish the exhausted one. This double-buffering keeps random data continuously available.

AsyncPhilox_d uses its own internal CUDA stream to launch fill kernels asynchronously. To enable true concurrency, all other compute kernels also needed to use non-default streams: stream 0 (the legacy default stream) implicitly synchronizes with all other streams, which would serialize kernel launches even when they could run in parallel.
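The stream change amounts to something like the following fragment. Kernel and variable names here are illustrative (the PR's compute stream later became the simulationStream_ member); only the use of non-blocking, non-default streams is the point:

```cpp
// Illustrative fragment (assumed kernel/variable names): give the fill and
// compute work their own non-blocking streams so neither synchronizes with
// the legacy default stream 0.
cudaStream_t simulationStream;  // used by all compute kernels
cudaStream_t noiseStream;       // AsyncPhilox_d's internal stream
cudaStreamCreateWithFlags(&simulationStream, cudaStreamNonBlocking);
cudaStreamCreateWithFlags(&noiseStream, cudaStreamNonBlocking);

// These launches can now overlap on the device:
fillKernel<<<grid, block, 0, noiseStream>>>(states, idleBuffer, n);
advanceVerticesKernel<<<grid, block, 0, simulationStream>>>(/* ... */);

// Block only on the noise stream before serving slices from the new buffer.
cudaStreamSynchronize(noiseStream);
```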

Checklist (Mandatory for new features)

  • Added Documentation
  • Added Unit Tests

Testing (Mandatory for all changes)

  • GPU Test: test-medium-connected.xml Passed
  • GPU Test: test-large-long.xml Passed

@AndrewBMadison AndrewBMadison self-assigned this Jun 11, 2025
@AndrewBMadison AndrewBMadison requested a review from stiber June 11, 2025 01:25

@stiber stiber left a comment


Minor cleanup.

@AndrewBMadison (Contributor, Author)

> Minor cleanup.

I have implemented all the changes you requested and renamed the stream used by all the synchronous kernels to simulationStream (simulationStream_ as a member variable). I have also added new documentation to the developer docs and linked it into index.md. The old MersenneTwister files are still in place in case anyone wants to try that generator again, but I can remove them if you'd like.

@AndrewBMadison AndrewBMadison requested a review from stiber June 23, 2025 17:04
stiber
stiber previously approved these changes Jun 30, 2025

@stiber stiber left a comment


Looks great; I will merge this.

@stiber

stiber commented Jun 30, 2025

I take it back; this needs to have SharedDevelopment merged into it. There may be a conflict with the changes to device memory allocation/deallocation being moved to OperationManager. We will need to let all the tests run, plus I will add a comment to one location in the code.

@stiber stiber left a comment


Besides the comments below, we need to examine GPUModel::allocEdgeIndexMap() and GPUModel::copyCPUtoGPU() to see whether they need rewrites because of DeviceVector.

@AndrewBMadison (Contributor, Author)

I reverted and re-merged SharedDevelopment into AndrewDevelopment, manually reviewing the changes and deleting old, unnecessary code from the SharedDevelopment commit made before Ben merged his code in. This resolved many of the changes you requested, but I am unsure whether OperationManager should execute copyCPUtoGPU, so you may want to ask Ben. I also moved the AsyncGenerator deletion as you requested.

@AndrewBMadison AndrewBMadison requested a review from stiber July 2, 2025 20:08
@stiber stiber requested review from NicolasJPosey and removed request for stiber September 5, 2025 16:10