
Files differ when oblas compiled with SSE #24

@osresearch


Building on x86-64 (AMD Ryzen 7) with gcc 9.4.0-1ubuntu1~20.04.2, the AVX build does not compile cleanly because the _mm256_loadu2_m128i intrinsic is missing:

cc -D_DEFAULT_SOURCE -O3 -g -std=c99 -Wall -march=native  -funroll-loops -ftree-vectorize -fno-inline  -DOBLAS_AVX -DOCTMAT_ALIGN=32  -c -o oblas.o oblas.c
In file included from oblas.c:18:
oblas_avx.c: In function ‘oaxpy’:
oblas_avx.c:56:7: warning: implicit declaration of function ‘_mm256_loadu2_m128i’; did you mean ‘_mm256_loadu_si256’? [-Wimplicit-function-declaration]
   56 |       _mm256_loadu2_m128i((__m128i *)OCT_MUL_HI[u], (__m128i *)OCT_MUL_HI[u]);
      |       ^~~~~~~~~~~~~~~~~~~
      |       _mm256_loadu_si256
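
For anyone hitting the same warning, one possible workaround is a local stand-in built from intrinsics that gcc 9 does ship. This is only a sketch and not something taken from oblas: the names compat_loadu2_m128i and load_pair_bytes are hypothetical, and it assumes the usual Intel semantics for _mm256_loadu2_m128i (first argument goes to the upper 128 bits, second to the lower 128 bits). The target("avx") attribute is a GCC/Clang extension and is only there so the snippet compiles without -mavx; with -march=native on an AVX machine it should not be needed.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical replacement for _mm256_loadu2_m128i(hi, lo):
 * load two unaligned 128-bit values and combine them into one
 * 256-bit vector, with lo in bits 0..127 and hi in bits 128..255. */
__attribute__((target("avx")))
static inline __m256i compat_loadu2_m128i(const __m128i *hi, const __m128i *lo)
{
    /* Widen the low half; the upper lane is undefined until the insert. */
    __m256i v = _mm256_castsi128_si256(_mm_loadu_si128(lo));
    /* Overwrite the upper 128-bit lane with the high half. */
    return _mm256_insertf128_si256(v, _mm_loadu_si128(hi), 1);
}

/* Small byte-oriented wrapper (also hypothetical) so the result can be
 * inspected without passing __m256i values across function boundaries. */
__attribute__((target("avx")))
void load_pair_bytes(const uint8_t *hi, const uint8_t *lo, uint8_t out[32])
{
    __m256i v = compat_loadu2_m128i((const __m128i *)hi, (const __m128i *)lo);
    _mm256_storeu_si256((__m256i *)out, v);
}
```

Whether swapping this in at oblas_avx.c:56 is the right fix is untested here; it only removes the dependency on the missing intrinsic.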

Compiling oblas with SSE (after cleaning the old partial compile) completes:

 deps/oblas/liboblas.a:
-       $(MAKE) -C deps/oblas CPPFLAGS+="-DOBLAS_AVX -DOCTMAT_ALIGN=32"
+       $(MAKE) -C deps/oblas CPPFLAGS+="-DOBLAS_SSE -DOCTMAT_ALIGN=32"

However, the decoded peace_and_war.txt file is corrupt from offset 0x280 to 0x4ff, from 0x780 to 0x9ff, and so on. Judging by these offsets, the second 0x280 bytes of every 0x500-byte span are affected.

Compiling in classic mode works fine:

 deps/oblas/liboblas.a:
-       $(MAKE) -C deps/oblas CPPFLAGS+="-DOBLAS_AVX -DOCTMAT_ALIGN=32"
+       $(MAKE) -C deps/oblas CPPFLAGS+="-DOCTMAT_ALIGN=32"

Diffing the data.rq files between the classic and SSE versions shows that the corruption is likely happening in the encode process, since the first block already contains the corrupted data.
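
To reproduce offset ranges like the ones quoted above, a small helper along these lines can list each contiguous run of differing bytes between two buffers (for example the classic and SSE outputs once read into memory). It is a minimal sketch; report_diff_ranges is a hypothetical name and is not part of oblas or of the tool being built.

```c
#include <stddef.h>
#include <stdio.h>

/* Print every contiguous run of differing bytes between a and b
 * as an inclusive "0xSTART-0xEND" range, one per line.
 * Returns the number of runs found. */
size_t report_diff_ranges(const unsigned char *a, const unsigned char *b,
                          size_t n, FILE *out)
{
    size_t runs = 0;
    size_t i = 0;

    while (i < n) {
        if (a[i] == b[i]) {
            i++;
            continue;
        }
        /* Start of a differing run: advance until the bytes match again. */
        size_t start = i;
        while (i < n && a[i] != b[i])
            i++;
        fprintf(out, "0x%zx-0x%zx\n", start, i - 1);
        runs++;
    }
    return runs;
}
```

Note that a corrupted byte which happens to equal the original splits a run in two, so the printed ranges can be more fragmented than the underlying damage.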
