Support serializing models larger than 2**31 - 1 #624

Merged
hcho3 merged 3 commits into dmlc:mainline from jcrist:support-larger-serialized-messages on Sep 30, 2025

jcrist (Contributor) commented Sep 21, 2025

Previously `treelite` used `ctypes.string_at` to copy the serialized bytes return value into a new Python `bytes` object. This function takes a pointer and a length (expressed as a C `int`). Python `bytes` objects have a maximum capacity of `Py_ssize_t`, not `int`. This meant that serializing very large models (more than `2**31 - 1` bytes) could error, as the `size` parameter would overflow an `int`.
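To make the size limit concrete, here is a minimal sketch (not code from the PR) that compares a byte count against the range of a C `int` and of a `Py_ssize_t`. The helpers `fits_in_c_int` and `as_c_int` are hypothetical names used only for illustration. Per the `ctypes` documentation, fundamental types do no overflow checking, so storing a value one past `2**31 - 1` in a `c_int` wraps rather than raising; the exact failure seen inside `ctypes.string_at` may differ.

```python
import ctypes

# Largest values representable by a C int and a C Py_ssize_t on this platform.
INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1
SSIZE_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_ssize_t) - 1) - 1


def fits_in_c_int(length: int) -> bool:
    """Hypothetical helper: can this byte count be passed as a C int?"""
    return 0 <= length <= INT_MAX


def as_c_int(length: int) -> int:
    """Hypothetical helper: store a length in a C int and read it back.
    ctypes does no overflow checking, so out-of-range values wrap."""
    return ctypes.c_int(length).value


too_big = 2**31  # one byte past the limit in the PR title
print(fits_in_c_int(too_big))  # False
print(as_c_int(too_big))       # negative when int is 32 bits
print(too_big <= SSIZE_MAX)    # True on 64-bit builds
```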

We now use `PyBytes_FromStringAndSize` directly, avoiding this issue.
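A minimal sketch of the approach, assuming the C-API function is bound through `ctypes.pythonapi`. The PR description does not show its code, so this binding is an illustration rather than treelite's implementation, and `bytes_from_buffer` is a hypothetical helper name. The key point is declaring the length argument as `ctypes.c_ssize_t`, which can carry any size a `bytes` object can hold.

```python
import ctypes

# Bind CPython's C-API function. ctypes.pythonapi does not release the GIL
# during the call and checks the Python error flag afterwards.
_bytes_from_string_and_size = ctypes.pythonapi.PyBytes_FromStringAndSize
_bytes_from_string_and_size.restype = ctypes.py_object
# The length is a Py_ssize_t, not an int, so sizes above 2**31 - 1 fit.
_bytes_from_string_and_size.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]


def bytes_from_buffer(address: int, length: int) -> bytes:
    """Hypothetical helper: copy `length` bytes starting at `address`
    into a new Python bytes object."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return _bytes_from_string_and_size(address, length)


# Usage: a small buffer standing in for a serialized model. The embedded NUL
# is preserved because the length is passed explicitly.
payload = b"serialized\x00model"
buf = ctypes.create_string_buffer(payload, len(payload))
copied = bytes_from_buffer(ctypes.addressof(buf), len(payload))
print(copied == payload)  # True
```

Passing an explicit length (rather than relying on a NUL terminator) matters for serialized models, since binary payloads can contain zero bytes anywhere.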

It's hard to write a sane test for this that can run on CI, but I've verified that things are working locally.

codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.77%. Comparing base (b457e5d) to head (dd25f15).
⚠️ Report is 17 commits behind head on mainline.

Additional details and impacted files
@@             Coverage Diff              @@
##           mainline     #624      +/-   ##
============================================
+ Coverage     84.76%   84.77%   +0.01%     
============================================
  Files            77       77              
  Lines          6747     6752       +5     
  Branches        531      531              
============================================
+ Hits           5719     5724       +5     
  Misses         1028     1028              


hcho3 merged commit 523f64b into dmlc:mainline on Sep 30, 2025
20 checks passed
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Nov 14, 2025
Update Treelite to 4.6.1 to incorporate the following improvements:
* **Support XGBoost 3.1** (dmlc/treelite#638)
* Support scikit-learn 1.7 (dmlc/treelite#611)
* Fix FIL prediction for isolation forest (dmlc/treelite#617, dmlc/treelite#620)
* Support serializing models larger than `2**31 - 1` (dmlc/treelite#624)

Closes #7370
Closes #7368 
Closes #3838

In addition, update FIL to support vector bias from Treelite models.

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - Robert Maynard (https://github.com/robertmaynard)
  - Simon Adorf (https://github.com/csadorf)

URL: #7471