Update README.md

Atry · web-flow · commit dc522b0a272a · 2018-03-30T15:14:29.000+08:00
ND4J's reduced sum is actually slower than Compute.scala on large arrays
diff --git a/README.md b/README.md
@@ -340,7 +340,6 @@ Some information can be found in the benchmark result:
  * Apparently, Compute.scala supports both NVIDIA GPU and AMD GPU, while ND4J does not support AMD GPU.
  * Compute.scala is faster than ND4J on large arrays or complex expressions.
  * ND4J is faster than Compute.scala when performing one simple primary operation on very small arrays.
- * ND4J's reduced sum is faster than Compute.scala.
  * ND4J's `permute` and `broadcast` are extremely slow, causing very low score in the convolution benchmark (unlike this benchmark, Deeplearning4j's convolution operation internally uses some undocumented variant of `permute` and `broadcast` in ND4J, which are not extremely slow).
 
 ## Future work
@@ -351,4 +350,4 @@ Now this project is only a minimum viable product. Many important features are s
 * Add more OpenCL math functions ([#101](https://github.com/ThoughtWorksInc/Compute.scala/issues/101)).
 * Further optimization of performance ([#62, #103](https://github.com/ThoughtWorksInc/Compute.scala/labels/performance)).
 
-Contribution is welcome. Check [good first issues](https://github.com/ThoughtWorksInc/Compute.scala/labels/good%20first%20issue) to start hacking.
+Contribution is welcome. Check [good first issues](https://github.com/ThoughtWorksInc/Compute.scala/labels/good%20first%20issue) to start hacking.