fix: [Geneva uploader] Optimize upload path by using Bytes to eliminate unnecessary clones (specifically for retry scenario) #470
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes
Replaces
Vec<u8>
withbytes::Bytes
for compressed telemetry data to eliminate expensive memory clones in the upload path, especially critical for retry scenarios.Problem
The
upload_batch()
method was cloning the entire compressed payload (potentially MBs) on every upload attempt:Impact for a 1 MB batch with 3 retry attempts:
Solution
Changed EncodedBatch.data from Vec to bytes::Bytes:
How it works:
Bytes::from(Vec<u8>)
does NOT copy - it takes ownership of theVec
's allocation and wraps it in a reference-counted container (zero-copy conversion)Bytes::clone()
just increments an atomic refcount - no data is copiedWhy Bytes instead of Arc?
While
Arc<Vec<u8>>
would also provide cheap cloning,Bytes
is the better choice for our HTTP upload use case:reqwest
integration:reqwest::Body
hasimpl From<Bytes>
that wraps the buffer directly without copying (used at uploader.rs:242). WithArc
, we'd need topass .as_ref()
(giving &[u8]), which causesreqwest
to allocate internally when converting to Body.Bytes
is the standard type for byte buffers in Rust's async ecosystem (tokio
,hyper
,reqwest
), ensuring optimal integration.Bytes::slice()
provides zero-copy views into the buffer, which could be useful for - Optimizing Bond encoding pipeline (bond_encoder.rs
,central_blob.rs
) if we identify unnecessary intermediate copies during schema/event serialization.Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes