
[SYSTEMDS-3556] Counter-based random number generator #2260

Merged
Baunsgaard merged 1 commit into apache:main from Baunsgaard:CounterBR on May 13, 2025

@Baunsgaard
Contributor

This commit adds a new random number generator to SystemDS that improves the speed at which we generate random matrices. The algorithm uses Philox4x64_10 to generate batches of random double values from 64-bit random words. The implementation is based on the one in openRAND and was verified with various statistical tests.
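For readers unfamiliar with Philox, the following is a minimal, unoptimised sketch of the Philox4x64-10 bijection as published with the Random123 library: ten rounds that multiply two of the four 64-bit counter words by fixed constants, mix the high and low halves with the key, and bump the key by Weyl increments between rounds. The class and method names (`Philox4x64Sketch`, `philox4x64_10`, `toUnitDouble`) are hypothetical and do not reflect the API of SystemDS's `PhiloxUniformCBPRNGenerator`. Mapping a word to a double via its top 53 bits is a common convention and an assumption here, since the PR does not say how its 64-bit words become doubles. A real port should also be checked against Random123's known-answer test vectors.

```java
/**
 * Hypothetical sketch of Philox4x64-10 (constants from Random123's philox.h).
 * Not the SystemDS implementation; for illustration only.
 */
final class Philox4x64Sketch {
    // Round multipliers
    static final long M0 = 0xD2E7470EE14C6C93L;
    static final long M1 = 0xCA5A826395121157L;
    // Weyl increments applied to the key between rounds
    static final long W0 = 0x9E3779B97F4A7C15L;
    static final long W1 = 0xBB67AE8584CAA73BL;

    /** High 64 bits of the unsigned 128-bit product a * b (Math.multiplyHigh is signed, Java 9+). */
    static long mulHiUnsigned(long a, long b) {
        return Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a);
    }

    /** Maps a 256-bit counter (four words) and a 128-bit key (two words) to four random 64-bit words. */
    static long[] philox4x64_10(long[] ctr, long[] key) {
        long c0 = ctr[0], c1 = ctr[1], c2 = ctr[2], c3 = ctr[3];
        long k0 = key[0], k1 = key[1];
        for (int round = 0; round < 10; round++) {
            if (round > 0) { // bump the key before every round except the first
                k0 += W0;
                k1 += W1;
            }
            long hi0 = mulHiUnsigned(M0, c0), lo0 = M0 * c0;
            long hi1 = mulHiUnsigned(M1, c2), lo1 = M1 * c2;
            long n0 = hi1 ^ c1 ^ k0; // uses the old c1
            long n2 = hi0 ^ c3 ^ k1; // uses the old c3
            c0 = n0;
            c1 = lo1;
            c2 = n2;
            c3 = lo0;
        }
        return new long[] {c0, c1, c2, c3};
    }

    /** Uniform double in [0, 1) built from the top 53 bits of a 64-bit word (assumed convention). */
    static double toUnitDouble(long bits) {
        return (bits >>> 11) * 0x1.0p-53;
    }
}
```

For example, `philox4x64_10(new long[] {i, 0, 0, 0}, new long[] {seed, 0})` yields four words for counter `i`, i.e. a batch of four doubles after `toUnitDouble`. Because the function holds no mutable state, any thread or GPU kernel that evaluates the same (counter, key) pair gets the same words, which is the property that lets a Java and a CUDA implementation produce identical sequences.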

  • Quality: Where java.util.Random produces at most 32 bits per step, Philox4x64 produces 64 bits of randomness per output word. java.util.Random has a period of only 2^48, while the period of Philox4x64 is 2^256 - 1.
  • Speed: The Java version of Philox4x64_10 is only about half as fast as java.util.Random, but a CUDA kernel version is available that produces exactly the same sequence of random numbers, so the CUDA and Java versions can be used interchangeably. If a system supports CUDA, the kernel is used; otherwise, the Java version serves as a fallback. The kernel version is around 200 times faster than java.util.Random, and faster still if the results are kept in GPU memory instead of being copied to the CPU.
  • Parallelisation: With state-based PRNGs, it is impossible to generate the same random matrix when the block size changes. With counter-based PRNGs, the block size can change while the same random matrix is still computed, by using the global index (row * row_size + col, where row_size is the number of columns) as the counter.
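To illustrate the parallelisation point, the sketch below fills a matrix block by block in two ways. It uses a SplitMix64-style mixing function as a stand-in for Philox4x64_10 (any stateless function of (seed, counter) behaves the same way here), and the per-block seeding in `stateBased` is a hypothetical scheme chosen only for contrast. Neither method reflects the actual code in SystemDS's `LibMatrixDatagen`.

```java
import java.util.Random;

/** Hypothetical sketch: counter-based vs. state-based generation under different block sizes. */
final class BlockInvarianceSketch {
    /** Stand-in counter-based generator (SplitMix64-style finalizer), NOT Philox. */
    static long mix(long seed, long counter) {
        long z = seed + counter * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Fills a rows x cols matrix block by block; each cell uses its global index as the counter. */
    static double[] counterBased(int rows, int cols, int blen, long seed) {
        double[] out = new double[rows * cols];
        for (int br = 0; br < rows; br += blen)
            for (int bc = 0; bc < cols; bc += blen)
                for (int r = br; r < Math.min(br + blen, rows); r++)
                    for (int c = bc; c < Math.min(bc + blen, cols); c++) {
                        long idx = (long) r * cols + c; // global index = row * row_size + col
                        out[(int) idx] = (mix(seed, idx) >>> 11) * 0x1.0p-53;
                    }
        return out;
    }

    /** Same traversal, but each block draws sequentially from its own java.util.Random state. */
    static double[] stateBased(int rows, int cols, int blen, long seed) {
        double[] out = new double[rows * cols];
        int blockId = 0;
        for (int br = 0; br < rows; br += blen)
            for (int bc = 0; bc < cols; bc += blen, blockId++) {
                Random rng = new Random(seed + blockId); // hypothetical per-block seeding
                for (int r = br; r < Math.min(br + blen, rows); r++)
                    for (int c = bc; c < Math.min(bc + blen, cols); c++)
                        out[r * cols + c] = rng.nextDouble();
            }
        return out;
    }
}
```

With the counter-based variant, `counterBased(7, 7, 2, 42)` and `counterBased(7, 7, 5, 42)` return identical arrays because every cell's value depends only on the seed and its global index. The state-based variant ties each value to its position in a block's sequence, so changing the block size reshuffles which draw lands in which cell.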

Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>

Closes #2186

@codecov

codecov bot commented May 13, 2025

Codecov Report

Attention: Patch coverage is 86.25954% with 18 lines in your changes missing coverage. Please review.

Project coverage is 72.74%. Comparing base (24e44fd) to head (c6f6137).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ysds/runtime/util/PhiloxUniformCBPRNGenerator.java | 90.58% | 4 Missing and 4 partials ⚠️ |
| ...he/sysds/runtime/matrix/data/LibMatrixDatagen.java | 68.42% | 3 Missing and 3 partials ⚠️ |
| ...sysds/runtime/util/PhiloxNormalCBPRNGenerator.java | 80.00% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##               main    #2260      +/-   ##
============================================
+ Coverage     72.71%   72.74%   +0.02%
- Complexity    45812    45848      +36
============================================
  Files          1474     1477       +3
  Lines        171903   172025     +122
  Branches      33597    33613      +16
============================================
+ Hits         124996   125131     +135
+ Misses        37467    37453      -14
- Partials       9440     9441       +1
```



Closes apache#2186
Closes apache#2260

Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>
Baunsgaard merged commit c6f6137 into apache:main on May 13, 2025
44 checks passed
github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue on May 13, 2025