Some routine inside `evolution.eigensystem` triggers a segfault when run inside the profiler. The behaviour seems to depend on `numba`: removing all the `numba.njit` decorators makes all the code work just fine, with no Python exceptions and no indication of any fault. The segfault occurs with both versions 0.39 and 0.40 of `numba`, using Python 3.6.5, `numpy` 1.15 and `scipy` 1.1.0.
The behaviour appears on both macOS (High Sierra) and Linux (the `cx1` cluster at Imperial).
Example to reproduce: rabi.py.txt
When attaching a debugger to `python` and running the example file (or any similar reproducer), I get the message

```
stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
```

(the address is always the same, it seems), and the offending disassembled instruction is

```
0x1000540f1: mov rax, qword ptr [rdi + 0x10]
```
This makes me think that the `rdi` register is either being zeroed, or not written to before its use. It does not look like an out-of-bounds array access, but more like a null-pointer dereference (in this case with a 16-byte offset, matching the constant `0x10`). The same behaviour occurs whether I'm using `lldb` on my Mac or `gdb` on the cluster. On the cluster, the stack size is `unlimited` (output of `ulimit -s`), so I don't think stack size is the problem (and anyway, all allocations should be on the heap).
I don't have the debugging symbols for Python/numpy/numba installed (and they aren't available through `conda`), so I haven't yet hunted down the issue well enough to determine whose fault it is.
I also tried setting the environment variables `OMP_NUM_THREADS`, `NUMBA_NUM_THREADS` and `MKL_NUM_THREADS` all to 1 to rule out threading issues. This does not appear to have any effect, and in any case, only one thread appears to be active at the critical point when running in the debugger.
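For reference, the single-threaded run looked roughly like this (a sketch: the profiler invocation shown is `cProfile`, which is an assumption, and `rabi.py` refers to the attached reproducer):

```shell
# Pin every threading layer to a single thread to rule out race conditions.
export OMP_NUM_THREADS=1
export NUMBA_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Then run the reproducer under the profiler, e.g.:
#   python -m cProfile -o rabi.prof rabi.py
# The segfault still occurs with these settings in place.
```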
Possibly related to issue numba/numba#3229?
At any rate, this doesn't seem to be a blocking problem because it only manifests itself when using the profiler.