[SYSTEMDS-3556] Counter based random number generator #2186
ichbinstudent wants to merge 13 commits into apache:main
…hbinstudent@users.noreply.github.com> Signed-off-by: chris-1187 <christian.munz@posteo.de>
Codecov Report
@@             Coverage Diff              @@
##               main    #2186      +/-   ##
============================================
+ Coverage     71.84%   72.47%   +0.63%
- Complexity    44596    45465     +869
============================================
  Files          1448     1472      +24
  Lines        168967   170995    +2028
  Branches      32934    33331     +397
============================================
+ Hits         121393   123937    +2544
+ Misses        38244    37641     -603
- Partials       9330     9417      +87
Baunsgaard left a comment
Many good things here. Please address the minor comments and describe the performance differences in the PR.
The next step is to integrate it into the compiler as an option. I would suggest doing it via arguments to random; "CB_uniform" could be the argument, for instance.
src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixDatagen.java
src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixDatagen.java
src/main/java/org/apache/sysds/runtime/matrix/data/RandomMatrixGenerator.java
src/test/java/org/apache/sysds/test/component/matrix/LibMatrixDatagenTest.java
src/test/java/org/apache/sysds/test/component/matrix/LibMatrixDatagenTest.java
If you resolve some of the comments, please feel free to mark them as such.
Signed-off-by: chris-1187 <christian.munz@posteo.de>
if (valuePRNG instanceof PRNGenerator) {
    rngStream = Stream.generate(() -> min + (range * ((PRNGenerator) valuePRNG).nextDouble())).iterator();
} else if (valuePRNG instanceof CounterBasedPRNGenerator) {
    rngStream = Arrays.stream(((CounterBasedPRNGenerator) valuePRNG).getDoubles(ctr, blockrows * blockcols)).map(i -> min + (range * i)).iterator();
Yup, the stream looks nice. However, since it modifies old code, we need to be careful that the new code returns the same values as the previous code.
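One way to check the concern above is a small equivalence test: fill an array with a plain loop and with a stream-backed iterator from the same seed, then compare. The sketch below is only an illustration under stated assumptions. It uses java.util.Random as a stand-in for the PR's PRNGenerator, and the class and method names (StreamEquivalenceSketch, fillWithLoop, fillWithStream) are hypothetical, not taken from LibMatrixDatagen.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Random;
import java.util.stream.Stream;

// Hypothetical sketch: java.util.Random stands in for the PR's PRNGenerator.
final class StreamEquivalenceSketch {

    // Loop-based generation: one nextDouble() call per cell, scaled to [min, max).
    static double[] fillWithLoop(long seed, int n, double min, double max) {
        Random rng = new Random(seed);
        double range = max - min;
        double[] out = new double[n];
        for (int i = 0; i < n; i++)
            out[i] = min + (range * rng.nextDouble());
        return out;
    }

    // Stream-based generation: same arithmetic, but values are pulled from an iterator.
    static double[] fillWithStream(long seed, int n, double min, double max) {
        Random rng = new Random(seed);
        double range = max - min;
        Iterator<Double> it = Stream.generate(() -> min + (range * rng.nextDouble())).iterator();
        double[] out = new double[n];
        for (int i = 0; i < n; i++)
            out[i] = it.next();
        return out;
    }

    public static void main(String[] args) {
        double[] a = fillWithLoop(42L, 1000, -1.0, 1.0);
        double[] b = fillWithStream(42L, 1000, -1.0, 1.0);
        // Both paths consume the generator in the same order, so the arrays should match.
        System.out.println("identical: " + Arrays.equals(a, b));
    }
}
```

A test of this shape against the real generator classes, with fixed seeds and several block shapes, would show whether the refactoring changed any generated values.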
Signed-off-by: chris-1187 <christian.munz@posteo.de>
Removed because can be compiled from the provided sources
Ok, deleted the compiled kernel file.
This commit adds a new random generator for SystemDS that improves the speed at which we generate random matrices. The algorithm uses Philox4x64_10 to generate batches of random double values with 64 bits of randomness. The implementation is based on an implementation in openRAND and was verified with various statistical methods.
- Quality: java.util.Random produces 32 bits of randomness, whereas Philox4x64_10 produces 64 bits. java.util.Random has a period of only 2^48, while the period of Philox4x64_10 is 2^256 - 1.
- Speed: While the Java version of Philox4x64_10 is only about half as fast as java.util.Random, there is a CUDA kernel version available that produces the exact same sequence of random numbers. This means that the CUDA and Java versions can be used interchangeably. If a system supports CUDA, the kernel is used; if not, the Java version can be used as a fallback. The kernel version is around 200 times faster than java.util.Random, and even faster if the results are not copied to the CPU but kept in the GPU's memory.
- Parallelisation: With state-based PRNGs, it is impossible to generate the same random matrix after changing the block size. With counter-based PRNGs, it is possible to change the block size and still compute the same random matrix by using the global index (row * row_size + col) as the counter.
Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>
Closes apache#2186
Hi @ichbinstudent and @chris-1187, I will take this PR over and get it merged. See #2260. Thanks for the contribution.
This commit adds a new random generator for SystemDS that improves the speed at which we generate random matrices. The algorithm uses Philox4x64_10 to generate batches of random double values with 64 bits of randomness. The implementation is based on an implementation in openRAND and was verified with various statistical methods.
- Quality: java.util.Random produces 32 bits of randomness, whereas Philox4x64_10 produces 64 bits. java.util.Random has a period of only 2^48, while the period of Philox4x64_10 is 2^256 - 1.
- Speed: While the Java version of Philox4x64_10 is only about half as fast as java.util.Random, there is a CUDA kernel version available that produces the exact same sequence of random numbers. This means that the CUDA and Java versions can be used interchangeably. If a system supports CUDA, the kernel is used; if not, the Java version can be used as a fallback. The kernel version is around 200 times faster than java.util.Random, and even faster if the results are not copied to the CPU but kept in the GPU's memory.
- Parallelisation: With state-based PRNGs, it is impossible to generate the same random matrix after changing the block size. With counter-based PRNGs, it is possible to change the block size and still compute the same random matrix by using the global index (row * row_size + col) as the counter.
Closes apache#2186
Closes apache#2260
Co-authored-by: ichbinstudent <45435943+ichbinstudent@users.noreply.github.com>
Co-authored-by: chris-1187 <christian.munz@posteo.de>
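The parallelisation point in the commit message can be illustrated with a small sketch: when each cell's value depends only on a key and the cell's global index, the block size used to traverse the matrix does not affect the result. The code below is a hedged illustration, not the PR's implementation. It uses the SplitMix64 finalizer as a stand-in mixing function instead of Philox4x64_10, and all names (CounterBasedBlocksSketch, mix64, valueAt, generate) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of block-size-independent generation with a counter-based scheme.
final class CounterBasedBlocksSketch {

    // SplitMix64 finalizer, used here only as a stand-in stateless mixer (NOT Philox4x64_10).
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // The value for a cell is a pure function of (key, counter); no generator state is carried.
    static double valueAt(long key, long counter) {
        long bits = mix64(key + counter * 0x9E3779B97F4A7C15L);
        // Keep the top 53 bits to obtain a double in [0, 1).
        return (bits >>> 11) * 0x1.0p-53;
    }

    // Fill the matrix block by block; the counter is the global index row * cols + col.
    static double[][] generate(long key, int rows, int cols, int blockSize) {
        double[][] m = new double[rows][cols];
        for (int br = 0; br < rows; br += blockSize)
            for (int bc = 0; bc < cols; bc += blockSize)
                for (int r = br; r < Math.min(br + blockSize, rows); r++)
                    for (int c = bc; c < Math.min(bc + blockSize, cols); c++)
                        m[r][c] = valueAt(key, (long) r * cols + c);
        return m;
    }

    public static void main(String[] args) {
        double[][] a = generate(7L, 10, 13, 4);
        double[][] b = generate(7L, 10, 13, 5);
        // Different block sizes visit cells in a different order but produce the same matrix.
        System.out.println("same matrix: " + Arrays.deepEquals(a, b));
    }
}
```

Because valueAt carries no state between calls, blocks could also be filled by different threads or devices in any order, which is the property the commit message relies on for interchangeable Java and CUDA paths.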
Summary
Adds a counter-based RNG that improves the quality and speed of random number generation.
Details
The PRNG uses Philox4x64_10 to generate batches of random double values with 64 bits of randomness. The algorithm was tested for correctness by comparing its output for certain counters and keys with an existing implementation in C.
This reference implementation, called openRAND, was tested by its authors using various statistical methods described here.
Advantages