feat: message chunking #80
Pull request overview
This pull request centralizes message chunking logic into the SDK to eliminate duplication across worker-cli and orb-discovery. The implementation provides a configurable chunk size parameter with a default of 3.0 MB and uses greedy bin-packing to split entities into appropriately-sized chunks for gRPC ingestion.
Changes:
- Added chunking module with `create_message_chunks()` and `estimate_message_size()` functions
- Exported chunking functions from the SDK's public API
- Added comprehensive test suite covering edge cases including empty lists, single/multiple chunks, custom chunk sizes, order preservation, and large entities
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `netboxlabs/diode/sdk/chunking.py` | New module implementing greedy bin-packing chunking algorithm with size estimation |
| `netboxlabs/diode/sdk/__init__.py` | Exports chunking functions to SDK's public API |
| `tests/test_chunking.py` | Comprehensive test suite with 10 test cases covering various chunking scenarios |
mfiedorowicz
left a comment
Thanks @ldrozdz93, generally it's advisable to create a GitHub issue first so we can track, triage and prioritise issues properly. Nevertheless, I see this as valuable; the main ask here is to document chunking in the README. Additionally, we would like to have functional parity in the Diode Go SDK too, so could you please add a feature request issue there?
Mind failing linting issues:
Thanks for the feedback @mfiedorowicz. I'm aware this is kind of a shortcut. A customer is actively blocked by this and I wanted to make it quick. I'll create a FR for Go SDK and add docs.
@mfiedorowicz ready for review
@mfiedorowicz I've created a feature request for diode-sdk-go. Let me know if that's not what you meant.
🎉 This PR is included in version 1.9.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Problem
Message chunking logic was duplicated across worker-cli and orb-discovery, using a heuristic algorithm that turned out to be insufficient for a customer's dataset.
Solution
Centralized chunking logic in the SDK with a configurable chunk size parameter (default: 3.0 MB). The implementation uses greedy bin-packing that accumulates entities until adding the next entity would exceed the size limit, then starts a new chunk.
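
The greedy accumulation described above can be sketched roughly as follows. This is an illustration only, not the SDK's code: `greedy_chunks`, `size_of`, and `max_bytes` are hypothetical names, sizes are measured with a caller-supplied function rather than the SDK's `estimate_message_size()`, and placing an oversized item in a chunk of its own is an assumption about how large entities are handled.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def greedy_chunks(
    items: Iterable[T],
    size_of: Callable[[T], int],
    max_bytes: int,
) -> List[List[T]]:
    """Split items into ordered chunks whose summed size stays within max_bytes.

    Hypothetical helper for illustration; not part of the Diode SDK.
    """
    chunks: List[List[T]] = []
    current: List[T] = []
    current_size = 0

    for item in items:
        item_size = size_of(item)
        # Close the current chunk if adding this item would exceed the limit.
        # The `current` check avoids emitting an empty chunk when a single
        # item is itself larger than max_bytes (assumed behavior).
        if current and current_size + item_size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(item)
        current_size += item_size

    # Flush whatever is left after the last item.
    if current:
        chunks.append(current)
    return chunks


# Usage: byte strings stand in for serialized entities, with a 10-byte limit.
payloads = [b"a" * 4, b"b" * 4, b"c" * 4, b"d" * 20]
chunks = greedy_chunks(payloads, len, max_bytes=10)
```

Because items are only ever appended to the current chunk, the original entity order is preserved across chunks. A single pass like this does not search for the tightest possible packing.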
Changes
- `netboxlabs/diode/sdk/chunking.py` with `create_message_chunks()` and `estimate_message_size()`
- Test suite (`tests/test_chunking.py`) with 10 test cases covering edge cases