Debugging tips
- Cython provides a Cython-aware gdb frontend, cygdb.
- However, gdb and cygdb are only practical on Linux, because gdb does not work well on newer versions of macOS.
- Checking the version of libtiledb loaded in the running Python process:
  - In Python, get the pid: `import os; os.getpid()`
  - In a shell: `pmap <pid> | grep libtiledb`
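The same check can be done from inside the process, without a second shell. The following is a minimal sketch, assuming Linux, where `/proc/self/maps` holds the mappings that `pmap` reports. The helper names `loaded_libraries` and `libraries_in_this_process` are illustrative and not part of TileDB-Py.

```python
import os


def loaded_libraries(maps_text, pattern="libtiledb"):
    """Return the sorted, de-duplicated file paths from a /proc/<pid>/maps dump
    whose file name contains `pattern`."""
    paths = set()
    for line in maps_text.splitlines():
        # Each line: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and pattern in os.path.basename(fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def libraries_in_this_process(pattern="libtiledb"):
    """Linux only: inspect the mappings of the current process."""
    with open("/proc/self/maps") as f:
        return loaded_libraries(f.read(), pattern)
```

After `import tiledb`, calling `libraries_in_this_process()` should list the path of each mapped libtiledb shared library, which shows which build was actually loaded.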
-
-
It is reasonably practical to single-step debug small sections of the Cython-generated C++ code. Some familiarity with the CPython object model is very helpful here.
- The Cython option `Cython.Compiler.Options.emit_code_comments` controls whether Cython emits a copy of the source code into the output C++ file; it is on by default and should stay enabled for debugging. Each line of C++ code will be preceded by a commented-out version of the source Cython code.
- Each context block in the generated C++ carries the corresponding line number in the original Cython code. So, start from a Cython line number, find that block, and set a breakpoint at the line below the context comment in the generated `libtiledb.cpp`.
- To see all of the Python code corresponding to the C++ code while single-stepping, increase the lldb code-listing verbosity:

      (lldb) settings set stop-line-count-before 8
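Finding the context comment for a given Cython line by hand is tedious in a large generated file. The sketch below automates the lookup. It assumes context comments open with `/* "path/to/file.pyx":LINENO`, which is the shape Cython-generated files commonly have but should be checked against your own `libtiledb.cpp`. The function name `find_breakpoint_lines` is illustrative.

```python
import re

# Assumed shape of the first line of a context comment:
#   /* "tiledb/libtiledb.pyx":123
CONTEXT_RE = re.compile(r'/\*\s*"([^"]+)":(\d+)')


def find_breakpoint_lines(cpp_text, pyx_name, pyx_lineno):
    """Return 1-based C++ line numbers directly below each context comment
    that refers to line `pyx_lineno` of a file ending in `pyx_name`."""
    lines = cpp_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        m = CONTEXT_RE.search(line)
        if not m:
            continue
        if not m.group(1).endswith(pyx_name) or int(m.group(2)) != pyx_lineno:
            continue
        # Walk forward to the line that closes the multi-line comment.
        j = i
        while j < len(lines) and "*/" not in lines[j]:
            j += 1
        if j < len(lines):
            # j is the 0-based index of the closing line, so the line
            # below it has 1-based number j + 2.
            hits.append(j + 2)
    return hits
```

For example, `find_breakpoint_lines(open("libtiledb.cpp").read(), "libtiledb.pyx", 1234)` returns candidate values for LINENO. One Cython line can produce several context comments, so more than one candidate may come back.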
- Start the Python interpreter under lldb and run a command which will invoke the targeted section of Cython/C++ code.
  - Or run a script (potentially with arguments). Assuming LINENO in `libtiledb.cpp` as per above:

        $ lldb -- python -i MYSCRIPT.py
        (lldb) b libtiledb.cpp:LINENO
        (lldb) run
        >>> import tiledb
        >>> [run command to trigger breakpoint, then step, view values, etc.]
- To print Cython `PyObject*` variables in the debugger, install the following LLDB script: https://github.com/malor/cpython-lldb
- Then, within a `libtiledb.cpp` frame:
  - individual `PyObject*` variables should pretty-print with `p`, for example: `p __pyx_v_uri`
  - the LLDB command `frame variable` will show known variables in the frame
- Ideally, the Cython code will have primitive types which can be printed with the usual lldb `p(rint)` command. However, to print the contents of a `PyObject*` inside the debugger, see the following discussion; these commands may be called in the debugger:
  - https://stackoverflow.com/questions/5356773/python-get-string-representation-of-pyobject
- Checking the version of libtiledb loaded in the running Python process:
  - In Python, get the pid: `import os; os.getpid()`
  - In a shell: `vmmap -p <pid> | grep libtiledb`
- Given a memory address, ADDR, `ctypes` may be used to read value(s) from that address:

      >>> import ctypes
      >>> p = ctypes.cast(ADDR, ctypes.POINTER(ctypes.c_uint64))
      >>> p[0], p[1]   # equivalent to *p, *(p+1), etc. in C
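To try the technique without a debugger session, the sketch below builds its own buffer and uses its address in place of ADDR. The buffer and its contents are stand-ins for illustration; in a real session ADDR comes from the debugger.

```python
import ctypes

# Stand-in for memory found in the debugger: three uint64 values.
buf = (ctypes.c_uint64 * 3)(10, 20, 30)
addr = ctypes.addressof(buf)  # plays the role of ADDR

# Interpret the address as a pointer to uint64 and index it like a C array.
p = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint64))
first_two = (p[0], p[1])  # like *p and *(p+1) in C

# The same bytes can also be copied out as a raw bytes object.
raw = ctypes.string_at(addr, 8)  # the first 8 bytes at addr
```

Reading from an address that is not mapped, or indexing past the end of the real buffer, will crash the interpreter or return garbage, so only index as far as the object is known to extend.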
- Defining the following stand-in for `self` allows most tests from `test_libtiledb.py` to be copy-pasted into the REPL and run directly:

      >>> import os, tiledb, numpy as np
      >>> self = lambda: None; self.path = lambda x: os.path.join("/tmp", x)
      >>> [paste non-indented test block, and run]
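A slightly more explicit variant of the same trick is sketched below. It assumes the pasted tests only use `self.path(...)`, and it writes into a fresh temporary directory rather than directly into `/tmp`. The name `make_fake_self` is illustrative.

```python
import os
import tempfile
from types import SimpleNamespace


def make_fake_self(root=None):
    """Return an object whose .path(name) joins `name` onto a scratch directory."""
    root = root or tempfile.mkdtemp(prefix="tiledb-repl-")
    return SimpleNamespace(root=root, path=lambda name: os.path.join(root, name))


# Bind the stand-in to the name the pasted test bodies expect.
self = make_fake_self()
```

Using a fresh directory for each session avoids collisions with arrays left behind by an earlier run.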
- Install gdb from Homebrew
- Follow signing instructions to give sufficient access to gdb:
- [TBD: so far unsuccessful. unclear as of 2019/3, (link 1) (link 2), whether any gdb version supports macOS 10.14]
TileDB-Py's `setup.py` supports a command line argument `--modular` which enables a modular build. By default, code in separate `.pyx` files is sourced into the main `libtiledb.pyx` file using the Cython `include` statement. When `setup.py` is run with `--modular`, the Cython compile-time constant `TILEDBPY_MODULAR` is set to `True`, and all files listed in `MODULAR_SOURCES` within `setup.py` are built as separate Cython modules (initially the only modular file is `np2buf.pyx`). When `TILEDBPY_MODULAR` is set, `import` is used to make the necessary function definitions available in `libtiledb.pyx`. The goal of this mechanism is to reduce compilation time by limiting the size of the pyx file. For more details and a usage example, see the following commits:
- Modularization: https://github.com/TileDB-Inc/TileDB-Py/commit/11dcba6d1dc49f72c604fc49ab225f85983f9c78
- Usage: https://github.com/TileDB-Inc/TileDB-Py/commit/a898f7e7f58760a923cfc694e409f0fda46a9a61
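The overall shape of such a switch can be sketched as follows. This is a hypothetical illustration, not TileDB-Py's actual `setup.py`: the function `build_plan` and the source paths are assumptions, and only the names `--modular`, `MODULAR_SOURCES` and `TILEDBPY_MODULAR` come from the description above.

```python
# Hypothetical sketch of a --modular switch; the paths below are assumed.
MODULAR_SOURCES = ["tiledb/np2buf.pyx"]
MAIN_SOURCE = "tiledb/libtiledb.pyx"


def build_plan(argv):
    """Split argv into (remaining args, sources to compile, compile-time env)."""
    modular = "--modular" in argv
    # Strip the custom flag so the build tooling never sees it.
    remaining = [arg for arg in argv if arg != "--modular"]
    sources = [MAIN_SOURCE]
    if modular:
        # Each modular file becomes its own Cython module.
        sources += MODULAR_SOURCES
    # Visible inside .pyx files as a compile-time constant.
    compile_time_env = {"TILEDBPY_MODULAR": modular}
    return remaining, sources, compile_time_env
```

The returned dictionary is the kind of value that can be passed to Cython's `cythonize(..., compile_time_env=...)`, so that the `.pyx` source can choose between `include` and `import` at compile time.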
Given a function (in pure Python) which creates a DenseArray:

    def foo():
        arr = tiledb.DenseArray(...)
        import pdb; pdb.set_trace()

Entering pdb at this point, we can print out the array:

    (Pdb) p arr
    <tiledb.libtiledb.DenseArray object at 0x000000123456789>

Copy the address!

Now, set a breakpoint (or repeat `pdb.set_trace()`) in a location where we expect the refcount of `arr` to be zero, for example some location after the function returns. At that point we can check the refcount and referrers as follows:

    (Pdb) import ctypes, gc, sys
    (Pdb) o = ctypes.cast(0x000000123456789, ctypes.py_object)
    (Pdb) o
    py_object(<tiledb.libtiledb.DenseArray object at 0x000000123456789>)
    (Pdb) sys.getrefcount(o.value)
    ?
    (Pdb) gc.get_referrers(o.value)
    [...]

(Note that `ctypes.cast(<addr>, ctypes.py_object)` does not increase the refcount of the target object, which can be verified by assigning a second variable to the identical `ctypes.cast` call.)
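The same inspection can be rehearsed outside pdb. The sketch below uses a plain class as a stand-in for `tiledb.DenseArray` and deliberately keeps one lingering reference so that there is something to find. `Dummy`, `make_and_leak` and `registry` are illustrative names.

```python
import ctypes
import gc
import sys


class Dummy:
    """Stand-in for tiledb.DenseArray; any Python object works here."""


def make_and_leak(registry):
    arr = Dummy()
    registry.append(arr)  # a lingering reference, the kind we want to find
    return id(arr)        # in CPython, id() is the object's address


registry = []
addr = make_and_leak(registry)

# Recover the object from its address, as in the pdb session.
o = ctypes.cast(addr, ctypes.py_object)
before = sys.getrefcount(o.value)

# A second cast of the same address leaves the refcount unchanged.
o2 = ctypes.cast(addr, ctypes.py_object)
after = sys.getrefcount(o.value)

# The list that still holds the object shows up among its referrers.
referrers = gc.get_referrers(o.value)
held_by_registry = any(r is registry for r in referrers)
```

This is only safe while something still references the object. If the refcount really has dropped to zero, the object has been freed, the address is dangling, and reading `o.value` may crash the interpreter.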
TileDB-Py can be run with libtiledb compiled against address sanitizer, by using the `--enable-sanitizer=address` TileDB bootstrap option and then preloading the ASAN library before running TileDB-Py:

    export LD_PRELOAD=/usr/lib64/libasan.so.4.0.0

(The path above is for CentOS 7 / AL2; paths will vary by Linux distribution.)

TileDB-Py itself may be built with address sanitizer support by exporting the following before running `setup.py`:

    export LFLAGS="-fsanitize=address"
    export CXXFLAGS="-fsanitize=address -g -fno-omit-frame-pointer"