Skip to content

Conversation

@fvaleye
Copy link
Collaborator

@fvaleye fvaleye commented Oct 8, 2025

Description

I spent some time identifying areas where we could reduce memory usage, such as avoiding unnecessary cloning or using pre-allocated memory. I also analyzed CPU-intensive operations to look for potential improvements.

This work is mainly intended to spark discussion on whether small code changes could lead to performance benefits.

1. JSON parsing:

  • Reduce String allocation by using StreamDeserializer

2. ShareableBuffer with BytesMut

  • Mostly to use zero-copy operations (previously with intermediate to_vec() operations)

3. Partition processing

  • Pre-compute sorted columns once, then use slice() instead of take() per partition

4. HashMap pre-allocation

  • Use more HashMap::with_capacity(estimated_capacity) when known

5. String clone reduction

6. Memory take optimization

  • Use std::mem::take() to move files instead of cloning

fvaleye added 13 commits October 8, 2025 14:49
…le zero-copy buffer sharing

Signed-off-by: Florian Valeye <[email protected]>
@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Oct 8, 2025
@codecov
Copy link

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 78.30189% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.38%. Comparing base (52e6fa2) to head (5440789).

Files with missing lines Patch % Lines
crates/core/src/writer/utils.rs 52.63% 9 Missing ⚠️
crates/core/src/logstore/mod.rs 86.11% 0 Missing and 5 partials ⚠️
crates/core/src/writer/record_batch.rs 66.66% 2 Missing and 3 partials ⚠️
crates/core/src/writer/json.rs 33.33% 2 Missing ⚠️
crates/core/src/operations/vacuum.rs 66.66% 0 Missing and 1 partial ⚠️
crates/core/src/operations/write/execution.rs 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3826      +/-   ##
==========================================
- Coverage   74.40%   74.38%   -0.02%     
==========================================
  Files         147      147              
  Lines       39544    39565      +21     
  Branches    39544    39565      +21     
==========================================
+ Hits        29421    29431      +10     
- Misses       8737     8744       +7     
- Partials     1386     1390       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ion-elgreco
Copy link
Collaborator

I think it would be more practical if we have each section in a separate pr and then some benchmarks of before and after, then it's easier to see the added benefit

@fvaleye fvaleye force-pushed the fix/several-performance-optimizations branch from f7a67fc to 5440789 Compare October 8, 2025 15:51
@fvaleye fvaleye marked this pull request as draft October 8, 2025 16:39
@fvaleye
Copy link
Collaborator Author

fvaleye commented Oct 8, 2025

I think it would be more practical if we have each section in a separate pr and then some benchmarks of before and after, then it's easier to see the added benefit

Sure! I’d like to use this PR as an opportunity to discuss.
I'm hoping to have tracing capabilities before enabling any changes.
I’ll also add some local, isolated benchmarks to evaluate the potential benefits.

@roeap
Copy link
Collaborator

roeap commented Oct 8, 2025

I do remember that @rtyler recommended a crate/allocator that helps with tracking memory consumption.

ion-elgreco pushed a commit that referenced this pull request Oct 12, 2025
# Description
Slight change in the find_files by only cloning the string for the path
(from this [PR](#3826))

# Benchmark
- Before optimization:
Time: 668.73 µs
Clones all fields: path, partition_values HashMap, stats strings, etc.

- After optimization:
Time: 366.62 µs
Only clones the path String once
Moves the Add struct instead of deep cloning

Signed-off-by: Florian Valeye <[email protected]>
@rtyler
Copy link
Member

rtyler commented Oct 19, 2025

If you have a moment to resolve the conflicts @fvaleye I am comfortable merging this change as is

@fvaleye
Copy link
Collaborator Author

fvaleye commented Oct 20, 2025

If you have a moment to resolve the conflicts @fvaleye I am comfortable merging this change as is

I’m splitting the work into separate PRs to better isolate the performance improvements.
I’ve already implemented the most impactful ones, and once everything is complete, I’ll close this PR 😄
I’m keeping it open for now to track the changes I’ve investigated.

ion-elgreco pushed a commit that referenced this pull request Oct 23, 2025
…wn (#3895)

# Description
Following [this](#3826), this
work implements a first round of preallocating collections when their
size is known, to avoid unnecessary reallocations. This applies to
`Vec`, `HashMap` (with_capacity), and other dynamically sized
collections.
Preallocating memory improves efficiency and reduces overhead when the
final size is known.

Signed-off-by: Florian Valeye <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binding/rust Issues for the Rust crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants