Contributor

@mkleen mkleen commented Jan 2, 2026

Which issue does this PR close?

Relates to #19052 (comment)

Rationale for this change

This adds heap memory estimation to statistics.

What changes are included in this PR?

NA.

Are these changes tested?

Yes

Are there any user-facing changes?

Adds a new HeapSize trait and implementations for all relevant types used in memory estimation. The trait is taken from arrow-rs, where it is currently private, and is intended as a temporary solution until arrow-rs is updated.
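The trait itself is not shown in this conversation. As a rough sketch of what a HeapSize-style trait could look like, assuming a single heap_size method that counts only heap allocations and excludes size_of::<Self>() (the impls below are illustrative assumptions, not copied from arrow-rs or from this PR):

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Bytes a value owns on the heap, NOT counting `size_of::<Self>()`.
/// (Sketch only; not the arrow-rs or DataFusion definition.)
pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

// Plain integers own no heap memory.
impl HeapSize for usize {
    fn heap_size(&self) -> usize {
        0
    }
}

// A String owns one buffer of `capacity` bytes.
impl HeapSize for String {
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

// A Vec owns a buffer of `capacity` slots, plus whatever each element owns.
impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        self.capacity() * size_of::<T>()
            + self.iter().map(|item| item.heap_size()).sum::<usize>()
    }
}

// An Arc puts the pointee itself on the heap, so its inline size counts too.
// Assumption: reference-count overhead is ignored, and shared data is
// counted once per handle.
impl<T: HeapSize> HeapSize for Arc<T> {
    fn heap_size(&self) -> usize {
        size_of::<T>() + (**self).heap_size()
    }
}
```

With this convention a caller that wants the full footprint of a value adds size_of of the value to its heap_size(), which keeps inline fields from being counted twice.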

@github-actions github-actions bot added the common (Related to common crate) and execution (Related to the execution crate) labels Jan 2, 2026
pub fn heap_size(&self) -> usize {
    // column_statistics + num_rows + total_byte_size
    self.column_statistics.capacity() * size_of::<ColumnStatistics>()
        + size_of::<Precision<usize>>() * 2
Member

Here Precision<usize> is an enum and does not have any heap-allocated fields, so it is allocated on the stack.

Contributor

I think these things are usually Arc'ed - so everything should be moved to the heap, right?

Contributor Author

So size_of::<Precision<usize>>() * 2 should be removed?

Contributor

I think so, if we want to follow the trait in arrow, which I think was the conclusion about the next step in #19599 (comment). Do you plan on pushing a commit to do so?

Contributor Author

Yes, I am on it. PR coming up soon.
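To illustrate the point of this thread: fields such as num_rows and total_byte_size are stored inline in the struct, so they are already covered by size_of of the struct itself, wherever that struct ends up living. A minimal sketch, assuming simplified stand-in types (Precision, ColumnStats, Stats and total_size here are hypothetical and are not the DataFusion definitions):

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the types discussed in this thread.
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

pub struct ColumnStats {
    pub null_count: Precision<usize>,
}

pub struct Stats {
    pub num_rows: Precision<usize>,
    pub total_byte_size: Precision<usize>,
    pub column_statistics: Vec<ColumnStats>,
}

impl Stats {
    /// Heap bytes owned by this value: only the Vec's backing buffer.
    /// The two Precision<usize> fields live inline and add nothing here.
    pub fn heap_size(&self) -> usize {
        self.column_statistics.capacity() * size_of::<ColumnStats>()
    }

    /// Footprint if the whole struct is itself placed on the heap
    /// (for example behind an Arc): inline size plus owned heap bytes.
    pub fn total_size(&self) -> usize {
        size_of::<Self>() + self.heap_size()
    }
}
```

Under this split, adding size_of::<Precision<usize>>() * 2 inside heap_size would count those two fields twice whenever a caller also adds size_of::<Stats>() for an Arc'ed value.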

Contributor

We need to be able to get the heap size of arrays to implement it for Statistics? What's the chain of fields that takes us there?

Contributor Author

Contributor

Bummer. Aren't there ways to get the size of an array in memory? E.g. Array::get_array_memory_size?

Contributor Author

This may actually work. Thanks, I will try that.
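A sketch of how that suggestion could be wired in, assuming a statistics value can wrap a reference-counted array. To keep the example self-contained, FakeArray is a hypothetical stand-in that only borrows the method name get_array_memory_size from the comment above, and Scalar is likewise illustrative rather than the real scalar type:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array (not the arrow-rs type).
pub struct FakeArray {
    values: Vec<i64>,
}

impl FakeArray {
    pub fn new(values: Vec<i64>) -> Self {
        Self { values }
    }

    /// Assumed behavior for this sketch: the struct itself plus its buffer.
    pub fn get_array_memory_size(&self) -> usize {
        size_of::<Self>() + self.values.capacity() * size_of::<i64>()
    }
}

/// Hypothetical scalar value that may hold an array behind an Arc.
pub enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
    List(Arc<FakeArray>),
}

impl Scalar {
    pub fn heap_size(&self) -> usize {
        match self {
            // Stored inline, nothing on the heap.
            Scalar::Int64(_) => 0,
            // A string owns its byte buffer.
            Scalar::Utf8(text) => text.as_ref().map_or(0, |t| t.capacity()),
            // The array sits behind an Arc, so all of it is heap memory.
            Scalar::List(array) => array.get_array_memory_size(),
        }
    }
}
```

Delegating to the array's own size accounting avoids having to walk its internal buffers by hand.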

This adds a heap_size method returning the amount of memory a statistics
struct allocates on the heap.
@github-actions github-actions bot removed the execution (Related to the execution crate) label Jan 10, 2026
@mkleen mkleen force-pushed the stats-limit branch 3 times, most recently from 17a4cd1 to 153d1ad January 10, 2026 06:12
fn heap_size(&self) -> usize {
    self.num_rows.heap_size()
        + self.total_byte_size.heap_size()
        + self
Contributor Author

@mkleen mkleen Jan 10, 2026

num_rows and total_byte_size both have a heap size of 0, so these two terms are included only for consistency and could also be omitted.
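A minimal sketch of why those two terms evaluate to 0, assuming Precision<T> simply delegates to the value it wraps. All types here are simplified stand-ins, and the column_statistics terms are a guess, since the excerpt above is cut off after "+ self":

```rust
use std::mem::size_of;

pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

// A usize is stored inline and owns no heap memory.
impl HeapSize for usize {
    fn heap_size(&self) -> usize {
        0
    }
}

// Hypothetical stand-in for the Precision enum.
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

// The enum has no allocation of its own; it forwards to the wrapped value.
impl<T: HeapSize> HeapSize for Precision<T> {
    fn heap_size(&self) -> usize {
        match self {
            Precision::Exact(value) | Precision::Inexact(value) => value.heap_size(),
            Precision::Absent => 0,
        }
    }
}

pub struct ColumnStats {
    pub null_count: Precision<usize>,
}

impl HeapSize for ColumnStats {
    fn heap_size(&self) -> usize {
        self.null_count.heap_size()
    }
}

pub struct Stats {
    pub num_rows: Precision<usize>,
    pub total_byte_size: Precision<usize>,
    pub column_statistics: Vec<ColumnStats>,
}

impl HeapSize for Stats {
    fn heap_size(&self) -> usize {
        // The first two terms are always 0 for Precision<usize>.
        self.num_rows.heap_size()
            + self.total_byte_size.heap_size()
            // Assumed remainder: the Vec buffer plus what each element owns.
            + self.column_statistics.capacity() * size_of::<ColumnStats>()
            + self.column_statistics.iter().map(|c| c.heap_size()).sum::<usize>()
    }
}
```

Keeping the zero terms costs nothing at runtime and means the method stays correct if a field's type later changes to one that does allocate.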

@mkleen mkleen requested review from adriangb and martin-g January 10, 2026 06:22
@mkleen mkleen changed the title Add heap_size to statistics Add support for HeapSize to statistics Jan 10, 2026
@mkleen mkleen changed the title Add support for HeapSize to statistics Add heap memory estimation to statistics Jan 10, 2026
@mkleen mkleen changed the title Add heap memory estimation to statistics Add heap memory estimation for statistics Jan 10, 2026
@github-actions github-actions bot added the execution (Related to the execution crate) label Jan 11, 2026
Labels

common (Related to common crate), execution (Related to the execution crate)

3 participants