
Commit 9802810

[Python] Implement NumPy array lifetime management for RDataFrame in C++
Implement NumPy array lifetime management for RDataFrame purely in C++, so that we don't need to call back into Python. This is more stable, because the Python callback might be destroyed before the RDataSource when the Python interpreter shuts down.

Follows up on #15031. We can follow up on this further by moving the Python object deleter callback into the Cppyy API, so that we don't have to use `gInterpreter.Declare`.

Closes #19706.
1 parent 287a6bf commit 9802810
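
The ownership scheme described in the commit message (take one reference when the deleter is created, release it when the deleter runs) can be tried out without ROOT. The following is only a sketch under assumptions: it is CPython-specific, `make_py_deleter` is a hypothetical Python analogue of the C++ `MakePyDeleter` helper and is not part of ROOT, and it uses `ctypes.pythonapi` to reach `Py_IncRef` and `Py_DecRef`, the function forms of the `Py_INCREF` and `Py_DECREF` macros.

```python
import ctypes
import sys

# Py_IncRef and Py_DecRef are the exported function forms of the
# Py_INCREF and Py_DECREF macros in the CPython C API.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.c_void_p]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.c_void_p]
_decref.restype = None


def make_py_deleter(ptr):
    """Hypothetical Python analogue of the C++ MakePyDeleter helper."""
    _incref(ptr)  # take ownership of one reference to the object at ptr

    def deleter():
        _decref(ptr)  # release the reference taken above

    return deleter


data = {"x": [1.0, 2.0, 3.0]}
count_before = sys.getrefcount(data)

# In CPython, id() returns the address of the object.
deleter = make_py_deleter(id(data))
assert sys.getrefcount(data) == count_before + 1

# The deleter must run exactly once, like the RDataSource destructor.
deleter()
assert sys.getrefcount(data) == count_before
```

Because the deleter only captures an integer address, it holds no Python-level reference of its own; the object stays alive solely through the extra reference count, which mirrors what the C++ lambda does with its captured `PyObject *`.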

bindings/pyroot/pythonizations/python/ROOT/_pythonization/_rdataframe.py

Lines changed: 33 additions & 21 deletions
@@ -540,11 +540,6 @@ def _make_name_rvec_pair(key, value):
     return ROOT.std.pair["std::string", type(pyvec)](key, ROOT.std.move(pyvec))
 
 
-# For references to keep alive the NumPy arrays that are read by
-# MakeNumpyDataFrame.
-_numpy_data = {}
-
-
 def _MakeNumpyDataFrame(np_dict):
     r"""
     Make an RDataFrame from a dictionary of numpy arrays
@@ -567,23 +562,40 @@ def _MakeNumpyDataFrame(np_dict):
 
     # How we keep the NumPy arrays around as long as the RDataSource is alive:
     #
-    # 1. Cache a container with references to the NumPy arrays in a global
-    #    dictionary. Note that we use a copy of the original dict as the
-    #    container, because otherwise the caller of _MakeNumpyDataFrame can
-    #    invalidate our cache by mutating the np_dict after the call.
-    #
-    # 2. Together with the array data, store a deleter function to delete the
-    #    cache element in the cache itself.
-    #
-    # 3. The C++ side gets a reference to the deleter function via
-    #    std::function. Note that the C++ side can only get a non-owning
-    #    reference to the Python function, which is the reason why we have to
-    #    keep the deleter alive in the cache itself.
+    # 1. Create a new dict with references to the NumPy arrays and take
+    #    ownership of it on the C++ side (Py_INCREF). We use a copy of the
+    #    original dict, because otherwise the caller of _MakeNumpyDataFrame
+    #    can invalidate our cache by mutating the np_dict after the call.
     #
-    # 4. The RDataSource calls the deleter in its destructor.
+    # 2. The C++ side gets a deleter std::function, calling Py_DECREF when the
+    #    RDataSource is destroyed.
+
+    def _ensure_deleter_declared():
+        # If the function is already known to ROOT, skip declaring again
+        try:
+            ROOT.__ROOT_Internal.MakePyDeleter
+            return
+        except AttributeError:
+            pass
+
+        ROOT.gInterpreter.Declare(
+            r"""
+        #include <Python.h>
+
+        namespace __ROOT_Internal {
+
+        inline std::function<void()> MakePyDeleter(std::intptr_t ptr) {
+            PyObject *obj = reinterpret_cast<PyObject*>(ptr);
+            Py_INCREF(obj);
+            return [obj](){ Py_DECREF(obj); };
+        }
+
+        } // namespace __ROOT_Internal
+        """
+        )
+
+    _ensure_deleter_declared()
 
     np_dict_copy = dict(**np_dict)
     key = id(np_dict_copy)
-    _numpy_data[key] = (lambda: _numpy_data.pop(key), np_dict_copy)
-    deleter = ROOT.std.function["void()"](_numpy_data[key][0])
-    return ROOT.Internal.RDF.MakeRVecDataFrame(deleter, *args)
+    return ROOT.Internal.RDF.MakeRVecDataFrame(ROOT.__ROOT_Internal.MakePyDeleter(key), *args)
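
The reason for copying the dict, given in step 1 of the code comment, can be illustrated without ROOT or NumPy. This is a minimal sketch only: plain lists stand in for the arrays, and `hold_references` is a hypothetical helper that mimics the copy step, not a function from the commit.

```python
def hold_references(np_dict):
    """Hypothetical stand-in for the copy step in _MakeNumpyDataFrame."""
    # Shallow copy: a new dict object whose values are the same objects.
    # Note that dict(**mapping) requires all keys to be strings.
    return dict(**np_dict)


arrays = {"x": [1, 2, 3], "y": [4, 5, 6]}
x_original = arrays["x"]
held = hold_references(arrays)

# The caller mutates its own dict after the call.
del arrays["x"]
arrays["y"] = None

# The copy still references the original objects, so they stay alive.
assert held["x"] is x_original
assert held["y"] == [4, 5, 6]
```

The copy is shallow on purpose: only the container is duplicated, so the array buffers themselves are shared and no data is copied.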
