[ISSUE-815] Generate Random Numbers Asynchronously on the GPU #859
base: SharedDevelopment
Conversation
…g to #include <cuda_runtime.h>
stiber left a comment
Minor cleanup.
I have implemented all the changes you requested and renamed the stream used by all the synchronous kernels to simulationStream (simulationStream_ as a member variable). I have also added new documentation to the developer docs and linked it into index.md. The old MersenneTwister files are still there in case anyone wants to try them out again, but I can remove them if you'd like.
stiber left a comment
Looks great; I will merge this.
I take it back; this needs to have SharedDevelopment merged into it. There may be a conflict with the changes to device memory allocation/deallocation being moved to
stiber left a comment
Besides the comments below, we need to examine GPUModel::allocEdgeIndexMap() and GPUModel::copyCPUtoGPU() to see whether they need rewrites because of DeviceVector.
…g the diff and reviewing changes to resolve conflicts.
I reverted and remerged SharedDevelopment into AndrewDevelopment, manually reviewing the changes and deleting old, unnecessary code from the SharedDevelopment commit before Ben merged his code in. This addressed many of the changes you requested, but I am unsure whether the OperationManager should execute copyCPUtoGPU, so you may want to ask Ben. I also moved the AsyncGenerator deletion as you requested.
Closes #815
Description
Replaced the custom Mersenne Twister GPU kernel with an AsyncPhilox_d class that asynchronously fills GPU buffers with random noise using cuRAND's Philox generator. The class supports double-buffering and is designed for concurrent execution.
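As a rough sketch (not the exact PR code), the shape of such a class might look like the following; member names other than loadPhilox(), requestSegment(), and fillBuffer() are illustrative assumptions:

```cpp
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Illustrative sketch of AsyncPhilox_d; field and parameter names are assumptions.
class AsyncPhilox_d {
public:
   // Initialize the Philox device states and pre-fill both buffers.
   void loadPhilox(int numVertices, unsigned long long seed);

   // Return a pointer to the next slice (one float per vertex) in the
   // active buffer, triggering a refill of the other buffer when needed.
   float* requestSegment();

   // Launch the asynchronous fill kernel on the given buffer.
   void fillBuffer(int bufferIndex);

private:
   curandStatePhilox4_32_10_t* devStates_ = nullptr;  // per-thread generator states
   float* buffers_[2] = {nullptr, nullptr};           // device-side double buffer
   size_t bufferSize_ = 0;                            // floats per buffer
   size_t sliceFloats_ = 0;                           // floats handed out per requestSegment()
   size_t readOffset_ = 0;                            // consumed floats in the active buffer
   int activeBuffer_ = 0;                             // buffer currently serving slices
   cudaStream_t fillStream_ = nullptr;                // internal stream for fill kernels
};
```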
GPUModel initializes Philox states and fills two initial buffers via loadPhilox() on a member AsyncPhilox_d instance. During each advance() call, requestSegment() retrieves a float* slice from the currently active buffer, sized appropriately for each vertex and ready to be used in advanceVertices().
Once a buffer is consumed, fillBuffer() is triggered on the other buffer while the current one continues to serve slices. This ensures continuous data availability through double-buffering.
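Continuing the sketch above, the swap logic could look roughly like this (a simplification; the real synchronization details live in the PR):

```cpp
// Serve slices from the active buffer; once it is exhausted, swap buffers
// and asynchronously refill the one just consumed so it is ready again later.
float* AsyncPhilox_d::requestSegment()
{
   if (readOffset_ + sliceFloats_ > bufferSize_) {
      // Make sure the background fill of the other buffer has finished.
      cudaStreamSynchronize(fillStream_);
      int consumed = activeBuffer_;
      activeBuffer_ = 1 - activeBuffer_;
      readOffset_ = 0;
      // NOTE: a real implementation must also ensure that kernels still
      // reading the consumed buffer have completed before overwriting it.
      fillBuffer(consumed);
   }
   float* slice = buffers_[activeBuffer_] + readOffset_;
   readOffset_ += sliceFloats_;
   return slice;
}
```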
AsyncPhilox_d uses its own internal CUDA stream to launch fill kernels asynchronously. To get true concurrency, all of the other compute kernels also had to move off the default stream: stream 0 implicitly synchronizes with the other streams, so kernel launches end up serialized even when they could run in parallel.
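For illustration, a fill kernel launched on a non-default stream could look like the sketch below, continuing the same sketch (kernel name, distribution, and launch configuration are assumptions; the PR may draw from a different distribution):

```cpp
// Each thread draws from its own Philox state and writes into the buffer
// using a grid-stride loop, then persists the advanced state.
__global__ void fillNoiseKernel(curandStatePhilox4_32_10_t* states,
                                float* buffer, size_t n)
{
   size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
   size_t stride = (size_t)gridDim.x * blockDim.x;
   curandStatePhilox4_32_10_t local = states[tid];
   for (size_t i = tid; i < n; i += stride) {
      buffer[i] = curand_uniform(&local);
   }
   states[tid] = local;
}

void AsyncPhilox_d::fillBuffer(int bufferIndex)
{
   // Launching on fillStream_ (not stream 0) lets this fill overlap with
   // simulation kernels running on simulationStream_. Assumes devStates_
   // holds one state per launched thread.
   fillNoiseKernel<<<64, 256, 0, fillStream_>>>(devStates_, buffers_[bufferIndex],
                                                bufferSize_);
}
```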
Checklist (Mandatory for new features)
Testing (Mandatory for all changes)
test-medium-connected.xml: Passed
test-large-long.xml: Passed