Hi,
We recently explored different linear algebra libraries for our project. EJML showed really good results and we went with it originally, but we then noticed severe performance degradation on bigger matrices. I've created a sample project here: https://github.com/anatoliy-balakirev/ejml-nd4j-benchmark which is basically a small JMH benchmark running matrix multiplication with EJML and ND4J (https://github.com/deeplearning4j/deeplearning4j).
I used the following command line (which may be a bit naive, as there is only one warmup run and three iterations, but it should be good enough to highlight the issue):
./mvnw jmh:benchmark -Djmh.f=1 -Djmh.wi=1 -Djmh.i=3 -Djmh.bm=avgt
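For context, the EJML side of the benchmark is essentially a plain JMH measurement around a single multiply. The sketch below shows the shape of it; the class name, field names, and the fixed seed are illustrative rather than the exact code from the linked repo:

```java
import java.util.Random;

import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.ejml.dense.row.RandomMatrices_DDRM;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class MatrixMultiplicationSketch {

    // Square sizes from the results below; the real benchmark also
    // covers the rectangular 155x9441;9441x9441 case.
    @Param({"3000", "4000", "9441"})
    int n;

    DMatrixRMaj a, b, c;

    @Setup(Level.Trial)
    public void setup() {
        Random rand = new Random(42);
        a = RandomMatrices_DDRM.rectangle(n, n, rand);
        b = RandomMatrices_DDRM.rectangle(n, n, rand);
        c = new DMatrixRMaj(n, n);
    }

    @Benchmark
    public void multiplyEjml(Blackhole bh) {
        // Dense double row-major multiply: c = a * b
        CommonOps_DDRM.mult(a, b, c);
        bh.consume(c);
    }
}
```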
The results are as follows:
Benchmark                                                (matrixDimensions)  Mode  Cnt    Score    Error  Units
MatrixOperationBenchmark.testMatrixMultiplicationEjml    155x9441;9441x9441  avgt    3    2.410 ±  0.817   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml   3000x3000;3000x3000  avgt    3    3.843 ±  3.063   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml   3300x3300;3300x3300  avgt    3    5.089 ±  1.766   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml   3500x3500;3500x3500  avgt    3    6.314 ±  4.315   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml   4000x4000;4000x4000  avgt    3    9.395 ±  1.378   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml   9441x9441;9441x9441  avgt    3  133.552 ± 92.515   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J    155x9441;9441x9441  avgt    3    0.680 ±  0.511   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J   3000x3000;3000x3000  avgt    3    0.661 ±  0.396   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J   3300x3300;3300x3300  avgt    3    0.793 ±  0.889   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J   3500x3500;3500x3500  avgt    3    0.890 ±  0.573   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J   4000x4000;4000x4000  avgt    3    1.301 ±  0.620   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J   9441x9441;9441x9441  avgt    3   13.279 ±  4.434   s/op
As you can see, on those sizes EJML is roughly 6-10 times slower than ND4J. On smaller sizes (see the commented-out cases, which you can uncomment and try: https://github.com/anatoliy-balakirev/ejml-nd4j-benchmark/blob/main/src/test/java/benchmark/MatrixOperationBenchmark.java#L88-L108), EJML is actually faster than ND4J.
The full log (where you can also see some hardware details, logged by ND4J) is here:
benchmark.log
For now we have ended up using EJML up to a certain matrix-multiplication size and switching to ND4J beyond it (we have a lot of matrices of those bigger sizes, so the execution time piles up). Is there any way to make EJML's performance on par with ND4J for those sizes, or is this roughly the maximum we can get from a pure Java implementation?
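For illustration, our current workaround is essentially a size-based dispatch like the sketch below. The cutoff value and class name are made up for the example (the real threshold is tuned from the numbers above), and the copies into and out of ND4J are part of the cost we pay:

```java
import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public final class HybridMultiply {

    // Hypothetical cutoff; in practice tuned per machine from benchmarks.
    private static final int SWITCH_DIM = 3000;

    public static DMatrixRMaj mult(DMatrixRMaj a, DMatrixRMaj b) {
        int maxDim = Math.max(Math.max(a.numRows, a.numCols), b.numCols);
        if (maxDim < SWITCH_DIM) {
            // Small enough: stay in pure Java EJML.
            DMatrixRMaj c = new DMatrixRMaj(a.numRows, b.numCols);
            CommonOps_DDRM.mult(a, b, c);
            return c;
        }
        // Big matrices: copy to ND4J (both libraries are row-major),
        // multiply with its native BLAS backend, then copy the result back.
        INDArray na = Nd4j.create(a.getData(), new int[]{a.numRows, a.numCols});
        INDArray nb = Nd4j.create(b.getData(), new int[]{b.numRows, b.numCols});
        INDArray nc = na.mmul(nb);
        DMatrixRMaj c = new DMatrixRMaj(a.numRows, b.numCols);
        c.setData(nc.data().asDouble());
        return c;
    }
}
```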