perf(arrow): Reduce the amount of allocated objects#645
perf(arrow): Reduce the amount of allocated objects#645zeroshade merged 5 commits intoapache:mainfrom
Conversation
zeroshade
left a comment
There was a problem hiding this comment.
Thanks for this, just a couple questions / nit picks
|
Thank you for taking a look @zeroshade! I have changed a few things:
Each Loki query can touch hundreds of streams that are spread over thousands of data objects, which results in thousands of arrow records being processed. Small functions such as I find it a bit tricky with |
|
Most of our patterns is like following:
We already build fields for one time use and do not modify them later. So it feels inefficient to clone them right after Access to |
I'm confused here, why would yield allocate an entire record on the heap as opposed to just creating a copy of the func (s *Schema) FieldIter() iter.Seq[Field] {
return slices.Values(s.fields)
}and would then be used like: for f := range sc.FieldIter() {
// do stuff with f
}So I'm curious what your code path is like such that this would result in allocating the entire record on the heap? At worst it would allocate the Schema object on the heap? Unless I'm missing something? |
|
The rest of the changes look good to me! |
|
That is a great question! I also was confused in the beginning, because I saw those objects being allocated in a memory profile report. I did some research and I think that it happens sometimes because the compiler is conservative. If it is not 100% sure in the result of escape analysis (the value is indeed passed into another func that passes it into another func) or a structure is too large it falls back to allocate it on the heap instead of stack. I did build a sandbox example https://godbolt.org/z/h19PnWM7T which shows |
|
Super weird. Though I'd expect records on the heap in general because they are interface types. That said, could you avoid the heap alloc by making it a value receiver instead of a pointer receiver? The schema object is fairly small, so a value receiver would avoid the escape, right? |
|
Okay, I think we can get the current version of this in for now while we discuss ways to improve the |
|
@zeroshade hey! I tried to play with value and pointer receivers, like you suggested, but have not got any improvements. Let's keep this version for now, we can find another solution for Fields later. Thank you! |
Rationale for this change
apache/arrow-gois used in the new query engine in Grafana Loki and we are actively working on improving its performance. Here are few low hanging fruits that reduce allocations by many gigabytes when used on the hot path.What changes are included in this PR?
This PR reduces amount of allocated objects.
arrow/datatype_binary.go: straightforward, thoseLayoutobjects are allocated on each call currently.arrow/schema.go:Schema.Fields()does not return a clone anymore andNewSchemaWithEndian()does not clone inputs anymore. All call sites are func-signature compatible, but should not modify those args/results anymore. In our opinion this is a reasonable trade-off for higher performance. I am happy to discuss other ways to implement that.Are these changes tested?
apache/arrow-gounit test suite.grafana/lokitest suite for the query engine.Are there any user-facing changes?
Yes. Explained above.