Wrong Code Generation With AVX512 + LTO + O3 #80494

@Disservin

Hi,

The smaller reproductions below were provided by dzaima following
my initial issue report (see the end of this post).

Clang generates wrong code for the following 2 (3) reproductions, which
leaves squares[0] holding a garbage value after a do-while loop that should have written it.

Godbolt 1
Godbolt 2

Reproduction 1

#include <cstdlib>
#include <cstdint>
#include <cstdio>

__attribute__((noinline,optnone))
void print_failure(int* f, int z) {
    printf("FAIL! ptr=%p, squares[0] =? %d\n", (void*)f, z);
    for (int i = 0; i < 7; i++) {
        printf("squares[%d] = %d\n", i, f[i]);
    }
    printf("\n");
}

__attribute__((noinline))
int do_table(uint64_t* board, int cond) {
    int squares[7];
    int size = 0;

    if (cond==123) {
        printf("the untaken if\n");
        uint64_t b = *board;
        do { squares[size++] = 123; b&=b-1; } while (b);
        squares[0] = squares[1];
    }

    uint64_t b = *board;
    do { squares[size++] = 123; b&=b-1; } while (b);
    
    if (squares[0] < 0) {
        print_failure(squares, squares[0]);
        exit(20);
    }

    for (int i = 0; i < size; ++i) squares[i]^= 123;

    return 0;
}


__attribute__((optnone))
int main(int argc, char* argv[]) {
    uint64_t v = 0x20202;
    int cond = -12345;
    do_table(&v, cond);
    printf("didn't fail");
}
// -O3 -march=znver4 -flto=full

Reproduction 2

#include <cstdlib>
#include <cstring>
#include <cstdint>
#include <cstdio>

__attribute__((noinline,optnone))
void print_failure(int* f, int z) {
    printf("FAIL! ptr=%p, squares[0] =? %d\n", (void*)f, z);
    for (int i = 0; i < 7; i++) {
        printf("squares[%d] = %d\n", i, f[i]);
    }
    printf("\n");
}

__attribute__((noinline))
int do_table(uint64_t* board, int cond) {
    int squares[7];
    int size = 0;

    if (cond==123) {
        printf("the untaken if\n");
        uint64_t b = *board;
        do { squares[size++] = 123; b&=b-1; } while (b);
        squares[0] = squares[1];
    }

    uint64_t b = *board;
    do { squares[size++] = 123; b&=b-1; } while (b);

    if (squares[0] < 0) {
        print_failure(squares, squares[0]);
        exit(20);
    }

    #pragma clang loop vectorize_width(16)
    for (int i = 0; i < size; ++i) squares[i]^= 123;

    return 0;
}


__attribute__((optnone))
int main(int argc, char* argv[]) {
    uint64_t v = 0x20202;
    int cond = -12345;
    do_table(&v, cond);
    printf("didn't fail");
}

// -O3 -flto=full

Initial issue description

I'm sorry in advance that I can't provide a smaller reproduction at the moment; it's hard for us to diagnose why this behaves the way it does.

Below I explain the issue that we ran into in official-stockfish/Stockfish#4450.
We later merged a temporary workaround, but ultimately we believe that we've come across a compiler bug.

Prerequisites:

  • AVX512 CPU or Intel SDE

  • clone https://github.com/Disservin/Stockfish.git and git checkout minimal-repo

Reproduction:

  1. cd src && make -j build ARCH=x86-64-avx512 COMP=clang CXX=clang++-18 EXTRACXXFLAGS="-g3 -fno-omit-frame-pointer" (or clang 17)

  2. Run
    ./stockfish
    or when using SDE
    sde -spr -- ./stockfish

➜  src git:(minimal-repo) ✗ sde -spr -- ./stockfish 
info string Found 1 tablebases
[1]    708806 segmentation fault (core dumped)  sde -spr -- ./stockfish

We are currently under the impression that this might be a compiler bug in clang.

What we have tested so far:

  • does not crash with -O1
  • does not crash with debug=yes or optimize=no
  • does not crash if LTO (link-time optimization) is disabled.
  • does not crash when compiled with gcc 12.2.0 (LTO enabled).
  • does not reproduce under most sanitizers (excluding -fsanitize=nullability-assign)

Only the architectures below are problematic; others do not crash.
x86-64-vnni512
x86-64-avx512

What we have so far diagnosed:

diff --git a/src/syzygy/tbprobe.cpp b/src/syzygy/tbprobe.cpp
index ad15e751..95aefbfe 100644
--- a/src/syzygy/tbprobe.cpp
+++ b/src/syzygy/tbprobe.cpp
@@ -730,8 +730,8 @@ Ret do_probe_table(const Position& pos, T* entry, WDLScore wdl, ProbeState* resu
 
 
     // THIS EXITS??!!
-    // if (squares[0] < 0)
-    //     _exit(20);
+    if (squares[0] < 0)
+        _exit(20);
 
     d = entry->get(stm, tbFile);
  • The exit here is triggered: squares[0] seems to have a garbage value at this point, but we are not sure why.
    The do-while loop should have executed and set it to at least some positive, non-garbage value.

  • Running sde with the align checker reported the following, though I'm not sure whether the two things are related.

 TID: 0 executed instruction with an unaligned memory reference to address 0x7ffe90e6adad INSTR: 0x7f6ade5923dc: IFORM: VMOVDQU64_YMMu64_MASKmskw_MEMu64_AVX512 :: vmovdqu64 ymm17, ymmword ptr [rdi]
    IMAGE:    /lib/x86_64-linux-gnu/libc.so.6
    FUNCTION: __strrchr_evex
    FUNCTION ADDR: 0x7f6ade5923c0
# $eof
  • Running sde -spr -null_check 1 -ptr-check 1 -- ./stockfish returned:
➜  src git:(minimal-repo) ✗ sde -spr -null_check 1 -ptr-check 1 -- ./stockfish 
info string Found 1 tablebases
SDE ERROR: DEREFERENCING BAD MEMORY POINTER PC=0x559415e9f021 MEMEA=0x55931c9ec8f4 mov eax, dword ptr [r8+rax*4]
Image: /home/max/Documents/Github/Stockfish/src/stockfish+0x6021 (in multi-region image, region# 1)
Function: main
  • The crash happens when using squares[0] as an index:
Program received signal SIGSEGV, Segmentation fault.
0x000055b3c3aca021 in Stockfish::(anonymous namespace)::do_probe_table<Stockfish::(anonymous namespace)::TBTable<(Stockfish::(anonymous namespace)::TBType)0>, Stockfish::Tablebases::WDLScore> (pos=..., 
    wdl=Stockfish::Tablebases::WDLDraw, entry=<optimized out>, result=<optimized out>) at syzygy/tbprobe.cpp:825
825                 idx = (MapA1D1D4[squares[0]] * 63 + (squares[1] - adjust1)) * 62 + squares[2] - adjust2;
  • entry->hasPawns should be false, and inspecting it in the debugger (gdb) confirms that it is false.
  • Making arbitrary changes to the body of the if (entry->hasPawns) statement (e.g. commenting something out) also fixes it, even though this branch is not taken.
  • Initializing Square squares[TBPIECES] to some value, e.g. = {};, seems to fix it. However, squares shouldn't need this initialization since it is never accessed unsafely; this also looks like an arbitrary change, similar to the one mentioned above.
  • The max_element function mentioned in the issue is probably unrelated because this branch isn't taken, but perhaps the branch triggers unusual optimizations.
  • The repo mentioned above is a reduced version of the official one. Our master branch currently has a workaround that disables optimizations for that function: https://github.com/official-stockfish/Stockfish/blob/master/src/syzygy/tbprobe.cpp#L711
  • We have a somewhat smaller reproduction on godbolt; however, it (unfortunately) still includes a null pointer dereference, which our code does not have. Maybe it is still of help: https://godbolt.org/z/MqxcW671j
  • With the trimmed-down repo, I was able to reproduce it with clang 16 too after removing -fno-omit-frame-pointer; the original issue mentioned clang 15 as well.

I will keep trying to come up with a smaller reproduction in case this is too vague.
Any help is much appreciated :D

You can also reproduce this on the master repository if you want, by doing the following:

Reproduction:

  1. Remove the current workaround CLANG_AVX512_BUG_FIX from do_probe_table in src/syzygy/tbprobe.cpp (L711).

  2. cd src && make -j build ARCH=x86-64-avx512 COMP=clang CXX=clang++-17 EXTRACXXFLAGS="-g3" (or clang 15-18)

  3. Create a text file (named input in the commands below) with this content, replacing PATH with the path to your Syzygy tablebases directory:

setoption name SyzygyPath value PATH
position fen 8/8/3K4/1r6/8/8/4k3/2R5 b - - 0 18
go
ucinewgame
  4. Run
    ./stockfish < input
    or when using SDE
    sde -spr -- ./stockfish < input
➜  src git:(master) ✗ ./stockfish < input 
Stockfish dev-20240126-fcbb02ff by the Stockfish developers (see AUTHORS file)
info string Found 145 tablebases
info string NNUE evaluation using nn-baff1ede1f90.nnue
info string NNUE evaluation using nn-baff1edbea57.nnue
[1]    291189 segmentation fault (core dumped) ./stockfish < input
