Support serializing models larger than 2**31 - 1 #624
Merged
hcho3 merged 3 commits into dmlc:mainline on Sep 30, 2025
Previously, `treelite` used `ctypes.string_at` to copy the serialized bytes return value into a new Python `bytes` object. That function takes a pointer and a length, and the length is passed as a C `int`. Python `bytes` objects, however, are sized by `Py_ssize_t`, not `int`, so they can hold more than `2**31 - 1` bytes. As a result, serializing a very large model could fail because the `size` argument overflowed an `int`.

We now call `PyBytes_FromStringAndSize` directly, which takes the length as a `Py_ssize_t` and avoids the issue.

It's hard to write a sane test for this that can run on CI, but I've verified locally that it works.
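For readers who want to see the idea in isolation, here is a minimal sketch of calling `PyBytes_FromStringAndSize` through `ctypes.pythonapi` with the length declared as `Py_ssize_t`. It is not the code from this PR: the helper name `copy_to_bytes` and the small demo buffer are assumptions made for illustration only.

```python
import ctypes

# Declare the CPython C-API function with an explicit signature so that the
# length is marshalled as Py_ssize_t rather than a C int.
_from_string_and_size = ctypes.pythonapi.PyBytes_FromStringAndSize
_from_string_and_size.restype = ctypes.py_object
_from_string_and_size.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t]


def copy_to_bytes(ptr, size):
    """Hypothetical helper: copy `size` bytes at address `ptr` into a new bytes object."""
    return _from_string_and_size(ptr, size)


# Small demo buffer standing in for a serialized model (illustrative only).
# The embedded NUL shows that the copy is length-based, not NUL-terminated.
buf = ctypes.create_string_buffer(b"tree\x00lite")
out = copy_to_bytes(ctypes.addressof(buf), 9)

# What happens to a length of 2**31 when it is squeezed into a C int:
# ctypes does no overflow checking, so the value wraps around.
wrapped = ctypes.c_int(2**31).value
```

Because `ctypes.pythonapi` holds the GIL during the call and `restype` is `ctypes.py_object`, the returned object can be used like any other `bytes` value.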
hcho3 approved these changes on Sep 30, 2025
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##           mainline     #624   +/-   ##
============================================
+ Coverage     84.76%   84.77%   +0.01%
============================================
  Files            77       77
  Lines          6747     6752       +5
  Branches        531      531
============================================
+ Hits           5719     5724       +5
  Misses         1028     1028
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request on Nov 14, 2025

Update Treelite to 4.6.1 to incorporate the following improvements:

* **Support XGBoost 3.1** (dmlc/treelite#638)
* Support scikit-learn 1.7 (dmlc/treelite#611)
* Fix FIL prediction for isolation forest (dmlc/treelite#617, dmlc/treelite#620)
* Support serializing models larger than `2**31 - 1` (dmlc/treelite#624)

Closes #7370
Closes #7368
Closes #3838

In addition, update FIL to support vector bias from Treelite models.

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Jim Crist-Harif (https://github.com/jcrist)
- Robert Maynard (https://github.com/robertmaynard)
- Simon Adorf (https://github.com/csadorf)

URL: #7471