[SYSTEMDS-3556] Counter based random number generator #2186

Closed
ichbinstudent wants to merge 13 commits into apache:main from chris-1187:counter-based-rng

Conversation

@ichbinstudent
Contributor

Summary

Adds a counter-based RNG that improves the quality and speed of random number generation.

Details

The PRNG uses Philox4x64_10 to generate batches of random double values with 64 bits of randomness. The algorithm was tested for correctness by comparing its output for certain counters and keys against an existing implementation in C.
This reference implementation, openRAND, was tested by its authors using various statistical methods described here.
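
As an illustration of how a batch of 64-bit generator outputs can become double values, here is a minimal sketch of one common conversion: keep the top 53 bits of each word (the width of a double's significand) and scale by 2^-53. The class and method names are hypothetical, and this is not necessarily the conversion used in this PR.

```java
final class UniformDoubles {
    /** 2^-53, the spacing of the doubles produced below. */
    private static final double SCALE = 0x1.0p-53;

    /** Maps a 64-bit word to [0, 1) using its top 53 bits. */
    static double toUnitInterval(long word) {
        return (word >>> 11) * SCALE;
    }

    /** Converts a batch of 64-bit words to doubles in [0, 1). */
    static double[] toUnitInterval(long[] words) {
        double[] out = new double[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = toUnitInterval(words[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Smallest word, a word with only the top bit set, and the largest word.
        long[] words = {0L, 1L << 63, -1L};
        for (double d : toUnitInterval(words)) {
            System.out.println(d);
        }
    }
}
```

With this mapping the result never reaches 1.0, because the largest input yields 1 - 2^-53.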

Advantages

  • Quality: Instead of the 32 bits in java.util.Random, Philox4x64_10 produces 64 bits of randomness. java.util.Random has a period of only 2^48, while the period of Philox4x64_10 is 2^256 - 1.
  • Speed: While the Java version of Philox4x64_10 is only about half as fast as java.util.Random, a CUDA kernel version is available that produces the exact same sequence of random numbers, so the CUDA and Java versions can be used interchangeably. If a system supports CUDA, the kernel is used; if not, the Java version serves as a fallback. The kernel version is around 200 times faster than java.util.Random, and even faster if the results are kept in the GPU's memory instead of being copied to the CPU.
  • Parallelisation: With state-based PRNGs, it is impossible to generate the same random matrix when changing the block size. With counter-based PRNGs, the block size can change while still producing the same random matrix, by using the global index (row * row_size + col) as the counter.
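
The block-size argument in the last bullet can be sketched as follows. This is a hedged illustration only: the `mix` function is a stand-in stateless mixer (the SplitMix64 finalizer), not Philox, and all names are hypothetical. The point is that each cell's value depends only on the key and the global index, so the blocking scheme does not affect the result.

```java
import java.util.Arrays;

final class BlockIndependentRand {
    /** Stand-in stateless mixer (SplitMix64 finalizer); NOT Philox. */
    static long mix(long key, long counter) {
        long z = key + counter * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Fills a rows x cols matrix (row-major) one block at a time. */
    static double[] generate(int rows, int cols, int blockRows, int blockCols, long key) {
        double[] out = new double[rows * cols];
        for (int br = 0; br < rows; br += blockRows) {
            for (int bc = 0; bc < cols; bc += blockCols) {
                int rEnd = Math.min(br + blockRows, rows);
                int cEnd = Math.min(bc + blockCols, cols);
                for (int r = br; r < rEnd; r++) {
                    for (int c = bc; c < cEnd; c++) {
                        // Global cell index used as the counter: row * row_size + col.
                        long counter = (long) r * cols + c;
                        out[(int) counter] = (mix(key, counter) >>> 11) * 0x1.0p-53;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = generate(7, 5, 2, 2, 42L); // 2x2 blocks
        double[] b = generate(7, 5, 3, 4, 42L); // 3x4 blocks
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```

Because no state is carried from one cell to the next, the blocks could also be filled in any order or in parallel without changing the matrix.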

chris-1187 and others added 2 commits January 20, 2025 19:02
…hbinstudent@users.noreply.github.com>

Signed-off-by: chris-1187 <christian.munz@posteo.de>
@codecov

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 86.25954% with 18 lines in your changes missing coverage. Please review.

Project coverage is 72.47%. Comparing base (9484f11) to head (fe8a986).
Report is 117 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ysds/runtime/util/PhiloxUniformCBPRNGenerator.java | 90.58% | 4 Missing and 4 partials ⚠️ |
| ...he/sysds/runtime/matrix/data/LibMatrixDatagen.java | 68.42% | 3 Missing and 3 partials ⚠️ |
| ...sysds/runtime/util/PhiloxNormalCBPRNGenerator.java | 80.00% | 2 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2186      +/-   ##
============================================
+ Coverage     71.84%   72.47%   +0.63%     
- Complexity    44596    45465     +869     
============================================
  Files          1448     1472      +24     
  Lines        168967   170995    +2028     
  Branches      32934    33331     +397     
============================================
+ Hits         121393   123937    +2544     
+ Misses        38244    37641     -603     
- Partials       9330     9417      +87     

Contributor

@Baunsgaard Baunsgaard left a comment

Many good things here. Please address the minor comments and describe the performance differences in the PR.

The next step is to integrate it into the compiler as an option. I would suggest doing it via arguments to random; "CB_uniform" could be the argument, for instance.

@Baunsgaard
Contributor

If you resolve some of the comments, please feel free to mark them as such.

ichbinstudent and others added 3 commits January 26, 2025 14:21
Signed-off-by: chris-1187 <christian.munz@posteo.de>
if (valuePRNG instanceof PRNGenerator) {
    rngStream = Stream.generate(() -> min + (range * ((PRNGenerator) valuePRNG).nextDouble())).iterator();
} else if (valuePRNG instanceof CounterBasedPRNGenerator) {
    rngStream = Arrays.stream(((CounterBasedPRNGenerator) valuePRNG).getDoubles(ctr, blockrows * blockcols)).map(i -> min + (range * i)).iterator();
Contributor

Yup, the stream looks nice. However, since it modifies old code, we need to be careful that the new code returns the same values as the previous code.
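
One way to check the concern above is a small equivalence test that draws the same number of values through a plain loop and through a `Stream.generate(...)` iterator, then compares them. The sketch below is an assumption-laden illustration: `java.util.Random` stands in for the SystemDS `PRNGenerator`, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Random;
import java.util.stream.Stream;

final class StreamEquivalenceCheck {
    /** Legacy-style generation: one nextDouble() call per cell in a loop. */
    static double[] viaLoop(long seed, int n, double min, double range) {
        Random rng = new Random(seed); // stand-in for the real generator
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = min + (range * rng.nextDouble());
        }
        return out;
    }

    /** Stream-style generation, mirroring the shape of the snippet under review. */
    static double[] viaStream(long seed, int n, double min, double range) {
        Random rng = new Random(seed); // same seed as the loop version
        Iterator<Double> it = Stream.generate(() -> min + (range * rng.nextDouble())).iterator();
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = viaLoop(7L, 1000, -1.0, 2.0);
        double[] b = viaStream(7L, 1000, -1.0, 2.0);
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```

One detail worth keeping in mind with this pattern: calling `hasNext()` on a stream-backed iterator can draw one value ahead of time, so mixing iterator use with direct calls on the same generator may shift the sequence.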

Signed-off-by: chris-1187 <christian.munz@posteo.de>
@Baunsgaard Baunsgaard changed the title from "SYSTEMDS-3556 Counter based random number generator" to "[SYSTEMDS-3556] Counter based random number generator" Feb 5, 2025
Removed because it can be compiled from the provided sources
@ichbinstudent
Contributor Author

Ok, deleted the compiled kernel file.

Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request May 13, 2025
This commit adds a new random generator for SystemDS that improves the
speed at which we generate random matrices. The algorithm uses Philox4x64_10
to generate batches of random double values with 64 bits of randomness.
The implementation is based on an implementation in openRAND and was
verified using various statistical methods.

- Quality: Instead of the 32 bits in java.util.Random, Philox4x64_10
produces 64 bits of randomness. java.util.Random has a period of only
2^48, while the period of Philox4x64_10 is 2^256 - 1.
- Speed: While the Java version of Philox4x64_10 is only about half as
fast as java.util.Random, there is a CUDA kernel version available
producing the exact same sequence of random numbers. This means that the
CUDA and Java versions can be used interchangeably. If a system has
support for CUDA, the kernel is used; if not, the Java version can be
used as a fallback. The kernel version is around 200 times faster than
java.util.Random, and even faster if the results are not copied to the
CPU but kept in the GPU's memory.
- Parallelisation: When using state-based PRNGs, it is impossible to
generate the same random matrix when changing the block size. With
counter-based PRNGs it is possible to change the block size but still
compute the same random matrix by using the global index
(row * row_size + col) as the counter.

Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>

Closes apache#2186
@Baunsgaard
Contributor

Hi @ichbinstudent and @chris-1187

I will take this PR over and get it merged. See #2260

Thanks for the contribution.

@Baunsgaard Baunsgaard closed this May 13, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue May 13, 2025
Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request May 13, 2025
This commit adds a new random generator for SystemDS that improves the
speed at which we generate random matrices. The algorithm uses Philox4x64_10
to generate batches of random double values with 64 bits of randomness.
The implementation is based on an implementation in openRAND and was
verified using various statistical methods.

- Quality: Instead of the 32 bits in java.util.Random, Philox4x64_10
produces 64 bits of randomness. java.util.Random has a period of only
2^48, while the period of Philox4x64_10 is 2^256 - 1.
- Speed: While the Java version of Philox4x64_10 is only about half as
fast as java.util.Random, there is a CUDA kernel version available
producing the exact same sequence of random numbers. This means that the
CUDA and Java versions can be used interchangeably. If a system has
support for CUDA, the kernel is used; if not, the Java version can be
used as a fallback. The kernel version is around 200 times faster than
java.util.Random, and even faster if the results are not copied to the
CPU but kept in the GPU's memory.
- Parallelisation: When using state-based PRNGs, it is impossible to
generate the same random matrix when changing the block size. With
counter-based PRNGs it is possible to change the block size but still
compute the same random matrix by using the global index
(row * row_size + col) as the counter.

Closes apache#2186
Closes apache#2260

Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>