allow configuring message sizes #207
Conversation
src/flight_service/do_get.rs
Outdated
```rust
    encoder_builder = encoder_builder.with_max_flight_data_size(max_message_size);
}

let stream = encoder_builder
    .with_max_flight_data_size(usize::MAX)
    .build(stream.map_err(|err| {
```
---
🤔 I think there's something wrong here. See how on line 140 you set the user-provided max_message_size, only for it to be unconditionally replaced with usize::MAX on line 145.
---
What do you think about leaving this as:

```rust
let stream = FlightDataEncoderBuilder::new()
    .with_schema(stream.schema().clone())
    .with_max_flight_data_size(self.max_message_size)
    .build(stream.map_err(|err| {
        FlightError::Tonic(Box::new(datafusion_error_to_tonic_status(&err)))
    }));
```

and making self.max_message_size a reasonable non-optional usize instead of Option<usize>? Maybe 128 * 1024 * 1024? (2 * 1024 * 1024 would match the current FlightDataEncoderBuilder defaults, but I imagine we probably need something bigger.)
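A minimal sketch of the non-optional default being proposed. The `Endpoint` struct, its field, and the constant name are illustrative stand-ins, not the crate's actual API; only the pattern (a concrete `usize` with a built-in default instead of `Option<usize>`) is the point:

```rust
// Hypothetical default; 128 MiB as suggested in the discussion above.
const DEFAULT_MAX_MESSAGE_SIZE: usize = 128 * 1024 * 1024;

struct Endpoint {
    // Always a concrete limit; no Option<usize> to unwrap at call sites.
    max_message_size: usize,
}

impl Default for Endpoint {
    fn default() -> Self {
        Self { max_message_size: DEFAULT_MAX_MESSAGE_SIZE }
    }
}

impl Endpoint {
    /// Callers can still override the default explicitly.
    fn with_max_message_size(mut self, size: usize) -> Self {
        self.max_message_size = size;
        self
    }
}

fn main() {
    let default = Endpoint::default();
    assert_eq!(default.max_message_size, 128 * 1024 * 1024);

    let custom = Endpoint::default().with_max_message_size(4 * 1024 * 1024);
    assert_eq!(custom.max_message_size, 4 * 1024 * 1024);
}
```

This way the encoder builder can always call `with_max_flight_data_size(self.max_message_size)` unconditionally, which removes the branch that caused the bug above.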
---
If we choose to have a higher default, it would probably also make sense to add a

```rust
impl ArrowFlightEndpoint {
    fn into_flight_server(self) -> FlightServiceServer
}
```

where we drive the message sizes ourselves.
What do you think?
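For illustration, the proposed helper might look roughly like this. `FlightServiceServer` here is a plain stand-in struct so the sketch compiles on its own; the real tonic-generated type takes the inner service as a generic parameter and exposes `max_decoding_message_size` / `max_encoding_message_size` as builder methods:

```rust
struct ArrowFlightEndpoint {
    max_message_size: usize,
}

/// Stand-in for tonic's generated FlightServiceServer.
struct FlightServiceServer {
    max_decoding: usize,
    max_encoding: usize,
}

impl ArrowFlightEndpoint {
    /// Convert the endpoint into a server with both message-size
    /// limits applied, so callers cannot forget to configure them.
    fn into_flight_server(self) -> FlightServiceServer {
        FlightServiceServer {
            // Drive both directions from the endpoint's single knob.
            max_decoding: self.max_message_size,
            max_encoding: self.max_message_size,
        }
    }
}

fn main() {
    let server = ArrowFlightEndpoint { max_message_size: 128 * 1024 * 1024 }
        .into_flight_server();
    assert_eq!(server.max_decoding, server.max_encoding);
}
```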
---
I thought about that; it feels to me like it pushes datafusion-distributed toward being more of a "package" than a "library". It's also a footgun as long as you can still create your own incorrectly configured FlightServiceServer. So I'm split: if you want to go that way I'm happy to do so, but let me fix up the current thing first.
---
Let me do some quick benchmarks. If it's pretty obvious that saner defaults improve performance, let's go for it.
---
```
==== Comparison with the previous benchmark from 2025-10-27 15:32:35 UTC ====
os: macos
arch: aarch64
cpu cores: 16
threads: 2 -> 2
workers: 8 -> 8
=============================================================================
Query 1:  prev=2672 ms, new=2658 ms, diff=1.01 faster ✔
Query 2:  prev= 359 ms, new= 383 ms, diff=1.07 slower ✖
Query 3:  prev=1067 ms, new=1088 ms, diff=1.02 slower ✖
Query 4:  prev=1126 ms, new=1010 ms, diff=1.11 faster ✔
Query 5:  prev=1587 ms, new=1553 ms, diff=1.02 faster ✔
Query 6:  prev=1127 ms, new=1200 ms, diff=1.06 slower ✖
Query 7:  prev=2717 ms, new=2532 ms, diff=1.07 faster ✔
Query 8:  prev=2486 ms, new=2495 ms, diff=1.00 slower ✖
Query 9:  prev=3106 ms, new=2963 ms, diff=1.05 faster ✔
Query 10: prev=1589 ms, new=1513 ms, diff=1.05 faster ✔
Query 11: prev= 585 ms, new= 491 ms, diff=1.19 faster ✔
Query 12: prev=1674 ms, new=1641 ms, diff=1.02 faster ✔
Query 13: prev=2273 ms, new=1927 ms, diff=1.18 faster ✔
Query 14: prev=1528 ms, new=1154 ms, diff=1.32 faster ✅
Query 15: prev=1323 ms, new=1176 ms, diff=1.12 faster ✔
Query 16: prev= 258 ms, new= 256 ms, diff=1.01 faster ✔
Query 17: prev=4174 ms, new=3533 ms, diff=1.18 faster ✔
Query 18: prev=4411 ms, new=4304 ms, diff=1.02 faster ✔
Query 19: prev=2539 ms, new=2760 ms, diff=1.09 slower ✖
Query 20: prev=1558 ms, new=2006 ms, diff=1.29 slower ❌
Query 21: prev=4545 ms, new=4137 ms, diff=1.10 faster ✔
Query 22: prev= 385 ms, new= 457 ms, diff=1.19 slower ✖
```
It has always been a bit flaky to run these benchmarks on our own laptops, so there may be a fair amount of noise here. However, I do see far more green checks than red ones, so I'm inclined to go for higher defaults.
examples/in_memory_cluster.rs
Outdated
```diff
-        .add_service(FlightServiceServer::new(endpoint))
+        .add_service(
+            FlightServiceServer::new(endpoint)
+                .max_decoding_message_size(MAX_MESSAGE_SIZE)
```
---
🤔 I see that this value is 4 MB by default, which seems fine for a normal day-to-day CRUD API, but fairly low for a distributed query engine.
I'd advocate for taking this PR one step further and actually providing more reasonable defaults in addition to letting people configure them.
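A back-of-the-envelope sketch of why 4 MiB is tight for a query engine. The bytes-per-row figures are assumptions for illustration, not measurements from this project:

```rust
// tonic's default max_decoding_message_size is 4 MiB.
const GRPC_DEFAULT_LIMIT: usize = 4 * 1024 * 1024;

/// How many rows of a given width fit in one message under the default limit.
fn rows_per_message(bytes_per_row: usize) -> usize {
    GRPC_DEFAULT_LIMIT / bytes_per_row
}

fn main() {
    // A narrow batch (e.g. 4 int64 columns, ~32 bytes/row) fits comfortably:
    // over 100k rows per message.
    assert!(rows_per_message(32) > 100_000);

    // A wide row (e.g. several long strings, ~4 KiB/row) caps a message at
    // ~1024 rows, well below a typical 8192-row Arrow batch.
    assert!(rows_per_message(4096) < 8192);
}
```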
---
I do think if we have a default it should be very high if not unlimited. I'll let you make a decision based on #207 (comment) and then happy to implement either way.
---
I'd say let's go for it
---
let me know what you think of b653b64
Force-pushed from 7024098 to b653b64.
gabotechs left a comment:
---
💯 thanks for all the extensive docs and comments!
Closes #206
Open to opinions, but I think there's little point in limiting message sizes in this application? Should we just go with usize::MAX as the recommendation?

---

I don't think we can set a default value like usize::MAX as the library default, because if you don't also configure your gRPC client you'd run into serious issues that appear seemingly at random (i.e. they won't show up until you put it into prod, possibly).