
Debugging tips

Isaiah Norton edited this page Nov 19, 2019 · 14 revisions

Cython debugging for TileDB-Py

Debugging on Linux

Debugging on macOS

  • It is reasonably practical to single-step debug small sections of the Cython-generated C++ code. Some familiarity with the CPython object model is very helpful here.

  • The Cython option Cython.Compiler.Options.emit_code_comments controls whether Cython copies the original source code into the generated C++ file as comments. It is on by default and should remain enabled for debugging. With it on, each block of generated C++ code is preceded by a comment containing the Cython source it was generated from.

  • Each context comment in the generated C++ also gives the line number of the corresponding code in the original Cython file. So, starting from a Cython line number, find the matching comment block in the generated libtiledb.cpp and set a breakpoint on the first line of C++ code below it.

  • To keep the Cython source comments above the current C++ line visible while single-stepping, increase the number of source lines lldb lists before the current line:

    (lldb) settings set stop-line-count-before 8
    
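  • Start the Python interpreter under lldb, set the breakpoint, and run a command that invokes the targeted section of Cython/C++ code.

    • Alternatively, run a script (optionally with arguments) using python -i, which leaves the REPL open after the script finishes. Assuming LINENO in libtiledb.cpp was found as described above (lldb reports the breakpoint as pending until the tiledb extension module is loaded, which is expected):
    $ lldb -- python -i MYSCRIPT.py
    (lldb) b libtiledb.cpp:LINENO
    (lldb) run
    >>> import tiledb
    >>> [run command to trigger breakpoint, then step, view values, etc.]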
    
    • To pretty-print PyObject* variables in the debugger, install the cpython-lldb extension for LLDB: https://github.com/malor/cpython-lldb

    • Then, within a libtiledb.cpp frame:

      • individual PyObject* variables should pretty-print with p, for example: p __pyx_v_uri
      • the LLDB command frame variable lists the arguments and local variables of the current frame

  • Ideally, the Cython code will use primitive C types, which can be printed with the usual lldb p(rint) command. To print the contents of a PyObject* without the extension above, see the following discussion; the C-API functions it describes can be called from within the debugger: https://stackoverflow.com/questions/5356773/python-get-string-representation-of-pyobject
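The lookup described above (Cython line number, then context comment, then C++ breakpoint line) can be scripted. This is a minimal sketch, not part of TileDB-Py: the helper name find_breakpoint_lines is hypothetical, and it assumes each context comment opens with a line of the form /* "path/libtiledb.pyx":LINE and closes with */ (check this against your own generated file).

```python
import re

# Assumed shape of the context comment Cython writes before each generated
# block when emit_code_comments is on, e.g.:  /* "tiledb/libtiledb.pyx":1234
CONTEXT_RE = re.compile(r'/\* "(?P<src>[^"]+)":(?P<lineno>\d+)')


def find_breakpoint_lines(cpp_text, pyx_lineno, pyx_name="libtiledb.pyx"):
    """Return the 1-based C++ line numbers just after each context comment
    that refers to pyx_name:pyx_lineno. One Cython line can be expanded into
    several generated blocks, so more than one hit is possible."""
    lines = cpp_text.splitlines()
    hits = []
    i = 0
    while i < len(lines):
        m = CONTEXT_RE.search(lines[i])
        if (m and m.group("src").endswith(pyx_name)
                and int(m.group("lineno")) == pyx_lineno):
            # Advance to the line that closes the comment block ("*/").
            j = i
            while j < len(lines) and "*/" not in lines[j]:
                j += 1
            if j < len(lines):
                # 0-based index j is 1-based line j + 1; the code starts on the next line.
                hits.append(j + 2)
            i = j + 1
        else:
            i += 1
    return hits
```

For example, find_breakpoint_lines(open("libtiledb.cpp").read(), 1234) returns candidate values of LINENO for b libtiledb.cpp:LINENO. Returning a list rather than a single number reflects that Cython may emit the same source line in several places (for instance, in setup and cleanup code).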

Misc debugging

  • Given a memory address, ADDR, ctypes may be used to read value(s) from that address:

    >>> import ctypes
    >>> p = ctypes.cast(ADDR, ctypes.POINTER(ctypes.c_uint64))
    >>> p[0], p[1]   # equivalent to *p, *(p+1) in C
    
  • Defining the following stub self object allows most tests from test_libtiledb.py to be copy-pasted into the REPL and run directly:

    >>> import os, tiledb, numpy as np
    >>> self = lambda: None; self.path = lambda x: os.path.join("/tmp", x)
    >>> [paste the test body, de-indented, and run it]
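The ctypes address-reading technique can be tried without a debugger session. In this sketch the address is fabricated with ctypes.addressof on a small ctypes array so that it is known to be valid; in a real session ADDR would be copied from lldb or pdb output instead. Reading an address that does not point to valid memory of the assumed type can crash the interpreter.

```python
import ctypes

# Stand-in for memory found in the debugger: two uint64 values.
buf = (ctypes.c_uint64 * 2)(7, 42)
ADDR = ctypes.addressof(buf)  # a plain integer address, as copied from a debugger

# Reinterpret the integer address as a pointer to uint64 and read through it.
p = ctypes.cast(ADDR, ctypes.POINTER(ctypes.c_uint64))
values = (p[0], p[1])  # same as *p and *(p + 1) in C
print(values)
```

For a NumPy array, arr.ctypes.data gives the integer address of its data buffer, which can be used as ADDR in the same way.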
    

Debugging on macOS with gdb (note: does not currently work):

Modular compilation

TileDB-Py's setup.py supports a command-line argument, --modular, which enables a modular build. By default, code in separate .pyx files is pulled into the main libtiledb.pyx file with the Cython include statement. When setup.py is run with --modular, the Cython compile-time constant TILEDBPY_MODULAR is set to True, and each file listed in MODULAR_SOURCES in setup.py is built as a separate Cython module (initially, np2buf.pyx is the only such file). When TILEDBPY_MODULAR is set, import is used instead to make the necessary function definitions available in libtiledb.pyx. The goal of this mechanism is to reduce compilation time by limiting the size of the main .pyx file. For more details and usage examples, see the following commits:
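The --modular switch described above can be sketched as follows. This is an illustrative reconstruction, not TileDB-Py's actual setup.py: pop_modular_flag and build_plan are hypothetical helpers. The flag is removed from argv because setuptools rejects options it does not recognize. The resulting compile-time environment would then be handed to Cython (for instance through cythonize's compile_time_env option) so that compile-time conditionals on TILEDBPY_MODULAR could choose between include and import.

```python
# Hypothetical mirror of the MODULAR_SOURCES list described above.
MODULAR_SOURCES = ["np2buf.pyx"]


def pop_modular_flag(argv):
    """Remove '--modular' from argv in place; return True if it was present."""
    if "--modular" in argv:
        argv.remove("--modular")
        return True
    return False


def build_plan(modular):
    """Return (compile-time env for Cython, .pyx files built as separate modules)."""
    env = {"TILEDBPY_MODULAR": modular}
    separate = list(MODULAR_SOURCES) if modular else []
    return env, separate


argv = ["setup.py", "build_ext", "--inplace", "--modular"]
modular = pop_modular_flag(argv)
env, separate = build_plan(modular)
print(argv)           # ['setup.py', 'build_ext', '--inplace']
print(env, separate)  # {'TILEDBPY_MODULAR': True} ['np2buf.pyx']
```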

Analyzing reference count problems

Given a function (in pure python) which creates a DenseArray:

def foo():
  arr = tiledb.DenseArray(...)
  import pdb; pdb.set_trace()

Entering pdb at this point, we can print out the array:

(Pdb) p arr
<tiledb.libtiledb.DenseArray object at 0x000000123456789>

Copy the address!

Now, set a breakpoint (or call pdb.set_trace() again) at a point where we expect the refcount of arr to have dropped to zero, for example somewhere after foo() returns. At that point, we can check the refcount and referrers as follows:

(Pdb) import ctypes, gc, sys
(Pdb) o = ctypes.cast(0x000000123456789, ctypes.py_object)
(Pdb) o
py_object(<tiledb.libtiledb.DenseArray object at 0x000000123456789>)
(Pdb) sys.getrefcount(o.value)
?
(Pdb) gc.get_referrers(o.value)
[...]

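(Note that ctypes.cast(<addr>, ctypes.py_object) does not increase the refcount of the target object; this can be verified by assigning a second variable to the identical ctypes.cast call and checking that sys.getrefcount is unchanged. Because no reference is held, if the object really has been deallocated, the address is dangling: dereferencing it may crash the interpreter or show an unrelated object that has reused the memory. The technique is therefore most useful for finding objects that are unexpectedly still alive.)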
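The pdb workflow above can be rehearsed in a plain script. In this sketch, the class Payload is a hypothetical stand-in for tiledb.DenseArray, and id() supplies the address that pdb would print (in CPython, id() returns the object's memory address, which is also what the default repr shows). A list plays the role of an unexpected owner, so the object is still alive and safe to inspect.

```python
import ctypes
import gc
import sys


class Payload:
    """Hypothetical stand-in for tiledb.DenseArray."""


holder = []              # plays the role of an unexpected owner (a leak)
obj = Payload()
holder.append(obj)
addr = id(obj)           # CPython: the address shown in the object's repr
del obj                  # drop our own name; only `holder` keeps the object alive

o = ctypes.cast(addr, ctypes.py_object)
before = sys.getrefcount(o.value)
o2 = ctypes.cast(addr, ctypes.py_object)  # second cast of the same address
after = sys.getrefcount(o.value)

# Expected: True, because the casts borrow the object without taking a reference.
print(before == after)
# Expected: True, because the leaking list shows up among the object's referrers.
print(any(r is holder for r in gc.get_referrers(o.value)))
```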
