Fix another libgap segmentation fault #40728
Open
Follow-up to #40613.
Save the following into `b.sage`, then run `sage b.sage`. It should hit a segmentation fault soon.

The cause
As is well known, if Cython generates code of the following form

then the `__Pyx_XDECREF(__pyx_t_8)` gets optimized to `if (__pyx_t_8 != 0) SEGMENTATION_FAULT;`, and if a GAP error is raised, we get a null pointer dereference.
Before #40613,

- `KeyboardInterrupt` or `AlarmInterrupt` jumps to `sig_on()`,
- when `sig_GAP_Enter()` is used, a GAP error first jumps to `GAP_Enter()`, then a `sig_error()` call jumps to `sig_on()` to raise a Python exception,
- when `GAP_Enter()` is used, a GAP error jumps to `GAP_Enter()`, which then raises a Python exception.

After #40613, in all cases, the outermost `GAP_Enter()` is jumped to.

Debug guide
There are 3 parts of debugging:

- finding the variable name, `__pyx_t_8`,
- finding the call, `(B)`,
- finding the assignment instruction, `(A)`.

Finding the variable name is easy: it suffices to run `sage --gdb`, then run the script.

When it hits the segmentation fault, look at the instruction that causes it. In this case, it is

therefore the variable is `[rbp-0x90]`.

How to map back from assembly to `.c` file

- find in `compile_commands.json` the command that builds `meson-generated_src_sage_libs_gap_element.pyx.c.o` from `gap/element.pyx.c`,
- change `.o` to `.s`, then add `-S -fverbose-asm -masm=intel` and recompile, then read the `.s` file generated,
- alternatively, I believe adding `-g` at appropriate places also gives you this.

How to find the call
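For the recompile step described under "How to map back from assembly to `.c` file", the edit to the compile command can be scripted. The following is a minimal sketch, not part of this PR: it assumes each `compile_commands.json` entry carries a `command` string and a `file` key, and that only the `-o` argument needs its `.o` suffix changed. The helper names (`to_asm_command`, `find_entry`, `load_entries`) and the shortened example entry are hypothetical.

```python
import json
import shlex

# Flags quoted in the description above.
ASM_FLAGS = ["-S", "-fverbose-asm", "-masm=intel"]


def load_entries(path):
    """Load the list of entries from a compile_commands.json file."""
    with open(path) as f:
        return json.load(f)


def find_entry(entries, suffix):
    """Return the first entry whose source file path ends with `suffix`."""
    for entry in entries:
        if entry["file"].endswith(suffix):
            return entry
    raise LookupError(suffix)


def to_asm_command(command):
    """Rewrite a compile command so it writes annotated assembly to a .s file."""
    args = shlex.split(command)
    for i in range(len(args) - 1):
        # Only the output file named after -o is renamed from .o to .s.
        if args[i] == "-o" and args[i + 1].endswith(".o"):
            args[i + 1] = args[i + 1][:-2] + ".s"
    return shlex.join(args + ASM_FLAGS)


# Hypothetical, heavily shortened entry; a real entry has many more flags.
example = [{
    "directory": "/tmp/build",
    "command": "cc -Isrc -O2 -o element.pyx.c.o -c element.pyx.c",
    "file": "element.pyx.c",
}]
entry = find_entry(example, "element.pyx.c")
print(to_asm_command(entry["command"]))
```

With a real build tree, pass the result of `load_entries("compile_commands.json")` to `find_entry`, then run the printed command from the entry's `directory`.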
Break at either `__pyx_f_4sage_4libs_3gap_4util_gap_interrupt_asap`, `GAP_THROW`, or `__longjmp_chk`.

Then `CYSIGNALS_CRASH_QUIET=1 rr record` it, `rr replay -e`, and `reverse-continue` from the point of the segmentation fault. Relevant backtrace:

Run `frame 48` and see the assembly is

which corresponds to (look at `element.pyx.c:8996` and scroll a few lines up)

We determine this is the call `(B)`.

How to find the assignment instruction
We want to find `(A)`.

Assume we're in `rr`. Run `continue`, then `reverse-stepi` a few times until the `cmp` instruction is executed, then run

where the argument to `watch` is copied from the output of `p`.

Then `reverse-continue`. Whichever instruction last set the memory address is the instruction `(A)`.

Turns out it's the `__Pyx_GetModuleGlobalName` (to fetch the `libgap` Python global variable).

Conclusion
Don't access any Python object between `GAP_Enter` ... `GAP_Leave`. Not even the Python global variable `libgap`.

Side note: if I add `cdef libgap` right before the `from ... import libgap` line, then this particular cause is eliminated, but it still segmentation faults because of the `enumerate()` plus some complex combination of compiler optimizations.

This fix creates a temporary variable, but I think the overhead of creating a Python `tuple` is small.

We're still accessing `gap_elements[i]`, which is a Python API call (!!!), but as it happens, `GAP_AssList` never throws, so we're safe.

📝 Checklist
⌛ Dependencies