Commit 523f64b

jcrist and hcho3 authored
Support serializing models larger than 2**31 - 1 (#624)
* Support serializing models larger than 2**31 - 1

  Previously `treelite` used `ctypes.string_at` to copy the serialized bytes return value to a new python `bytes` object. This method takes a pointer and a length (expressed as an `int`). Python `bytes` objects have a max capacity of `Py_ssize_t`, not `int`. This meant that serializing very large models could error as the `size` parameter would overflow an `int`.

  We now use `PyBytes_FromStringAndSize` directly, avoiding this issue.

  It's hard to write a sane test for this that can run on CI, but I've verified that things are working locally.

* Fix tests

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Hyunsu Cho <phcho@nvidia.com>
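The overflow described in the commit message can be illustrated with a small sketch. It is not treelite code: it assumes a platform where C `int` is 32 bits and `Py_ssize_t` is 64 bits (typical for 64-bit builds), and it only shows how a length of `2**31` wraps when narrowed to `int`, not what `ctypes.string_at` does internally.

```python
import ctypes

size = 2**31  # one past the largest 32-bit signed value (2**31 - 1)

# ctypes integer types perform no overflow checking, so the value wraps
# the same way it would if narrowed to a C `int` (assumed 32-bit here).
as_int = ctypes.c_int(size).value

# `Py_ssize_t` is pointer-sized, so on a 64-bit build the value is preserved.
as_ssize_t = ctypes.c_ssize_t(size).value

print("as int:     ", as_int)
print("as ssize_t: ", as_ssize_t)
```

On such a platform the first value comes out negative while the second is unchanged, which is why a length parameter typed as `ssize_t` is the safer choice for very large buffers.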
1 parent b457e5d commit 523f64b

2 files changed: 15 additions, 2 deletions

python/treelite/model.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@

 from . import compat
 from .core import _LIB, _check_call
-from .util import c_array, c_str, py_str
+from .util import bytes_from_string_and_size, c_array, c_str, py_str


 class Model:
@@ -345,7 +345,7 @@ def serialize_bytes(self) -> bytes:
                 self.handle, ctypes.byref(out_bytes), ctypes.byref(out_bytes_len)
             )
         )
-        return ctypes.string_at(out_bytes, out_bytes_len.value)
+        return bytes_from_string_and_size(out_bytes, out_bytes_len.value)

     @classmethod
     def deserialize(cls, filename: Union[str, pathlib.Path]) -> Model:

python/treelite/util.py

Lines changed: 13 additions & 0 deletions
@@ -20,6 +20,11 @@
 _NUMPY_TYPE_TABLE = {"uint32": np.uint32, "float32": np.float32, "float64": np.float64}


+_PyBytes_FromStringAndSize = ctypes.pythonapi.PyBytes_FromStringAndSize
+_PyBytes_FromStringAndSize.argtypes = (ctypes.c_char_p, ctypes.c_ssize_t)
+_PyBytes_FromStringAndSize.restype = ctypes.py_object
+
+
 def typestr_to_ctypes_type(type_info):
     """Obtain ctypes type corresponding to a given Type str"""
     return _CTYPES_TYPE_TABLE[type_info]
@@ -35,6 +40,14 @@ def c_str(string):
     return ctypes.c_char_p(string.encode("utf-8"))


+def bytes_from_string_and_size(ptr, size):
+    """Copy `size` bytes from `ptr` to create a new python `bytes` object"""
+    # Theoretically `ctypes.string_at` does this, but the `size` argument
+    # there only takes an `int`, while python bytes object can support up to a
+    # `ssize_t` in size.
+    return _PyBytes_FromStringAndSize(ptr, size)
+
+
 def py_str(string):
     """Convert C string back to Python string"""
     return string.decode("utf-8")
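A standalone sketch of the same binding technique, runnable without treelite, may help show what the new helper does. The names `copy_bytes` and `buf` are hypothetical and exist only for this illustration; the small buffer stands in for the pointer and length that the serializer would return.

```python
import ctypes

# Bind the CPython C-API function through ctypes.pythonapi, declaring the
# length as ssize_t so it is not narrowed to a C int.
_from_string_and_size = ctypes.pythonapi.PyBytes_FromStringAndSize
_from_string_and_size.argtypes = (ctypes.c_char_p, ctypes.c_ssize_t)
_from_string_and_size.restype = ctypes.py_object


def copy_bytes(ptr, size):
    """Copy `size` bytes starting at `ptr` into a new `bytes` object (illustrative helper)."""
    return _from_string_and_size(ptr, size)


# A small buffer with an embedded NUL byte stands in for serialized model data.
buf = ctypes.create_string_buffer(b"tree\x00lite")
result = copy_bytes(buf, 9)

# Because the length is explicit, the copy does not stop at the NUL byte.
print(result)
```

For small inputs this should agree with `ctypes.string_at(buf, 9)`; the difference only matters once the length no longer fits in an `int`.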
