Commit 2c737bc

[SYSTEMDS-3896] Improved SIMD Vectorized Counting NNZ
This patch makes an additional performance improvement that further reduces the runtime on an 8GB matrix from 850ms to 770ms (non-vectorized: 1100ms) by avoiding unnecessary scalar operations. Furthermore, we replace the hard-coded AVX512 vector size with the general vector length (the hard-coded size failed on non-Intel hardware in GitHub Actions).
1 parent 7b34a67 commit 2c737bc
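
The stride fix described in the message can be illustrated with a small scalar emulation. This is only a sketch under stated assumptions, not the SystemDS code: the class `StrideSketch`, its method, and the `lanes`/`stride` parameters are hypothetical, and no Vector API is involved. It assumes that on narrower hardware a vector holds fewer than 8 doubles (for example 4 lanes for 256-bit vectors), so a loop that advances by a fixed 8 but loads only `lanes` elements would skip part of the array.

```java
// Hypothetical scalar emulation of the loop structure; not the SystemDS code.
final class StrideSketch {
    // Counts non-zeros with a scalar head of (len % lanes) elements, followed by
    // blocks of `lanes` elements that are visited every `stride` elements.
    // Assumption for this demo: lanes is 2, 4, or 8, so every block stays in bounds.
    static int countNnz(double[] a, int ai, int len, int lanes, int stride) {
        final int end = ai + len;
        final int rest = len % lanes;
        int nnz = 0;
        for (int i = ai; i < ai + rest; i++)
            nnz += (a[i] != 0.0) ? 1 : 0;
        for (int i = ai + rest; i < end; i += stride) {
            // Emulates one vector load of `lanes` elements and a count of zero lanes.
            int zeros = 0;
            for (int j = i; j < i + lanes; j++)
                zeros += (a[j] == 0.0) ? 1 : 0;
            nnz += lanes - zeros;
        }
        return nnz;
    }

    public static void main(String[] args) {
        double[] ones = new double[16];
        java.util.Arrays.fill(ones, 1.0);
        System.out.println(countNnz(ones, 0, 16, 8, 8)); // 8 lanes, stride 8
        System.out.println(countNnz(ones, 0, 16, 4, 8)); // 4 lanes, hard-coded stride 8
        System.out.println(countNnz(ones, 0, 16, 4, 4)); // 4 lanes, stride = lanes
    }
}
```

For an array of 16 ones, the emulation returns 16 when lanes and stride match (8/8 or 4/4) but only 8 with 4 lanes and a stride of 8, because every second block is never inspected. Advancing by the vector length, as the patch does with `vLen`, keeps the two in step regardless of the hardware.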

1 file changed: +6 -4 lines changed

src/main/java/org/apache/sysds/runtime/util/UtilFunctions.java

Lines changed: 6 additions & 4 deletions
@@ -880,15 +880,17 @@ public static boolean isNonZero(Object obj) {
 	}
 
 	public static int computeNnz(final double[] a, final int ai, final int len) {
-		int lnnz = 0;
 		final int end = ai + len;
 		final int rest = (end - ai) % vLen;
+		int lnnz = len;
 
+		//start from len and subtract number of zeros because
+		//DoubleVector defines an eq but no neq operation
 		for(int i = ai; i < ai + rest; i++)
-			lnnz += (a[i] != 0.0) ? 1 : 0;
-		for(int i = ai + rest; i < end; i += 8) {
+			lnnz -= (a[i] == 0.0) ? 1 : 0;
+		for(int i = ai + rest; i < end; i += vLen) {
 			DoubleVector aVec = DoubleVector.fromArray(SPECIES, a, i);
-			lnnz += vLen-aVec.eq(0).trueCount();
+			lnnz -= aVec.eq(0).trueCount();
 		}
 		return lnnz;
 	}
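
The count-down formulation in this diff (start at `len`, subtract the zeros) can be checked against the straightforward count-up version with a scalar sketch. Everything here is an assumption for illustration: `CountDownSketch`, `LANES`, and `zerosInBlock` are hypothetical names, `LANES = 4` stands in for `vLen`, and `zerosInBlock` emulates what `aVec.eq(0).trueCount()` would return for one block, assuming primitive `==` semantics per lane.

```java
// Hypothetical scalar sketch of the count-down scheme; not the SystemDS code.
final class CountDownSketch {
    static final int LANES = 4; // stand-in for vLen (assumed value)

    // Reference: count non-zeros directly.
    static int countUp(double[] a, int ai, int len) {
        int nnz = 0;
        for (int i = ai; i < ai + len; i++)
            nnz += (a[i] != 0.0) ? 1 : 0;
        return nnz;
    }

    // Count-down: start from len and subtract zeros, scalar head first,
    // then full blocks of LANES elements.
    static int countDown(double[] a, int ai, int len) {
        final int end = ai + len;
        final int rest = len % LANES;
        int nnz = len;
        for (int i = ai; i < ai + rest; i++)
            nnz -= (a[i] == 0.0) ? 1 : 0;
        for (int i = ai + rest; i < end; i += LANES)
            nnz -= zerosInBlock(a, i);
        return nnz;
    }

    // Emulates a per-block zero count over LANES consecutive elements.
    static int zerosInBlock(double[] a, int off) {
        int zeros = 0;
        for (int j = off; j < off + LANES; j++)
            zeros += (a[j] == 0.0) ? 1 : 0;
        return zeros;
    }
}
```

Under primitive comparison, `-0.0 == 0.0` is true and `NaN == 0.0` is false, so both formulations treat `-0.0` as a zero and `NaN` as a non-zero. The count-down version saves the `vLen - ...` subtraction in each iteration, which matches the "unnecessary scalar ops" mentioned in the commit message.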
