Introduce growInRange to reduce array overallocation #12844
Conversation
            + minLength
            + " is larger than requested maximum array length "
            + maxLength);
}
Should we validate this first, and only then check if the array is already large enough?
Initially, I had placed the validation first. I changed it to cover cases where INITIAL_CAPACITY >= minSize > maxSize. In these cases, we've already allocated beyond maxSize, so we might as well use that memory. Do you think that ends up being more confusing than validating first?
I think it would be strange that we allow minSize to be > maxSize, that could introduce latent bugs.
I thought about this some more and I applied your suggestion in the latest commit. I still think there's a case where checking first will be inconvenient, e.g. if we don't have information about maxLength where we initialize the array and we end up initializing to a capacity larger than maxLength. But this is hypothetical, maybe we don't have any cases like that in the code base. At this time, I agree the proposal you two made is better.
  */
- public class NeighborArray {
+ public class NeighborArray implements Accountable {
+   private static final int INITIAL_CAPACITY = 10;
@msokolov @benwtrent Can you double check it's ok to grow this array on demand vs. pre-sizing like today? It looks like it could save significant amounts of memory?
@zhaih as well, since there are new concurrency concerns on some of these things.
I can read through here to see if anything stands out to me in the NeighborArray.
We used to grow on demand, then we found arrays were always fully-populated and switched to preallocating, but the diversity check changed and they no longer are! So as far as that goes, growing on demand seems good to me, although I agree we need to check carefully due to concurrent updates to this data structure - that could be a deal-breaker or require locking we might not want? I'll read
OK I believe it's safe to grow on demand because the concurrent impl is used only when merging, and in that case we add the initial set of nodes on one thread and then NeighborArray is only updated when adding reciprocal neighbors (in HnswGraphBuilder.addDiverseNeighbors), where we acquire a write lock that should prevent concurrent updates. Basically - we already handle synchronization around these arrays for the purpose of adding/removing entries, so we should be free to resize (and replace the array pointer) in those cases.
It's ok to grow because it's under lock, as @msokolov pointed out. However, I think we previously avoided growing because we didn't want to spend extra time on resizing and copying?
Could you run the knn benchmark to check that the performance is ok? (Maybe better with the multithreaded merge version, to also check that the resizing is safe.)
    return (long) node.length * (Integer.BYTES + Float.BYTES)
        + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2L
        + RamUsageEstimator.NUM_BYTES_OBJECT_REF * 2L
        + Integer.BYTES * 5;
Let's use the RamUsageEstimator utility functions instead of estimating RAM usage manually? (shallowSizeOfInstance and sizeOf specifically)
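For illustration, a minimal sketch of what that could look like, assuming the `node` and `score` fields shown in the diff above (not the actual patch):

```java
import org.apache.lucene.util.RamUsageEstimator;

// Sketch only: a NeighborArray-like holder whose ramBytesUsed() leans on
// RamUsageEstimator instead of counting bytes by hand. Field names are assumed.
class NeighborArrayRamSketch {
  private static final long SHALLOW_SIZE =
      RamUsageEstimator.shallowSizeOfInstance(NeighborArrayRamSketch.class);

  int size;
  int[] node = new int[10];
  float[] score = new float[10];

  long ramBytesUsed() {
    // shallowSizeOfInstance covers the object header plus its fields/references;
    // sizeOf adds each backing array, including the array header.
    return SHALLOW_SIZE + RamUsageEstimator.sizeOf(node) + RamUsageEstimator.sizeOf(score);
  }
}
```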
    for (NeighborArray[] neighborArraysPerNode : graph) {
      if (neighborArraysPerNode != null) {
        for (NeighborArray neighborArrayPerNodeAndLevel : neighborArraysPerNode) {
          total += neighborArrayPerNodeAndLevel.ramBytesUsed();
Hmm, this can make this method potentially a bit slow. It might be ok since normally nobody will call it in any critical code path, I guess, but it would still be better if we could have an estimation that runs faster rather than the very accurate accounting?
    }

    int potentialLength = oversize(minLength, Integer.BYTES);
    if (potentialLength > maxLength) {
I think growExact(array, Math.min(potentialLength, maxLength)) would be clearer?
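As a sketch, here is roughly what the method body looks like with that suggestion applied, reusing the `oversize`/`growExact` helpers from `ArrayUtil` (illustrative, not the committed code):

```java
import org.apache.lucene.util.ArrayUtil;

class GrowInRangeSketch {
  /** Grows {@code array} to hold at least {@code minLength} items, never past {@code maxLength}. */
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    assert minLength >= 0
        : "length must be positive (got " + minLength + "): likely integer overflow?";
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "requested minimum array length "
              + minLength
              + " is larger than requested maximum array length "
              + maxLength);
    }
    if (array.length >= minLength) {
      return array; // already large enough
    }
    // Over-allocate exponentially, but cap the result at the caller-provided maximum.
    int potentialLength = ArrayUtil.oversize(minLength, Integer.BYTES);
    return ArrayUtil.growExact(array, Math.min(potentialLength, maxLength));
  }
}
```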
  /**
   * Returns an array whose size is at least {@code minSize}, generally over-allocating
   * exponentially
Would it make sense to delegate the existing grow(int[], int) to the new one, to avoid having a duplicate code path? Something like return growInRange(array, minSize, Integer.MAX_VALUE) would work, I guess.
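That delegation would be a one-liner along these lines (a sketch; placement on `ArrayUtil` assumed):

```java
/** Existing grow(int[], int) as a thin wrapper over the new method: no effective upper bound. */
public static int[] grow(int[] array, int minLength) {
  return growInRange(array, minLength, Integer.MAX_VALUE);
}
```

Since `oversize` already caps its result below the maximum array length the JVM allows, the `Math.min` against `Integer.MAX_VALUE` is a no-op and the existing behavior should be unchanged.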
I ran knnPerfTest in luceneutil against this PR, with 100k Cohere vectors. Flushing depended on memory usage (do we check the RAM usage of OnHeapHnswGraph during indexing to determine when to flush?).

main branch:

This branch:

Indexing is about 3.5x slower with this change. I don't know if it's due to the OnHeapHnswGraph memory estimation being slow or the node resizing. I am gonna run a profiler to see what's up. Good news is that search latency and recall are unchanged. Forced merging time seems about the same as well :).
Yeah,
Thank you for running the benchmarks @benwtrent, you were quicker than I was. I was worried the modified
@benwtrent Thanks for running the benchmark. I looked at the profile and I think we call @stefanvodita I can think of two options here:
Thanks for the suggestions @zhaih! I have to think about option 2 a bit more. If we change

Benchmark config:

main:

This PR:

This PR with the old memory estimation:
I thought some more about option 2. It does seem quite tricky. OnHeapHnswGraph only knows about NeighborArrays being created, but it doesn't know about nodes being added to the arrays - HnswGraphBuilder handles that. I don't know this code well, so I could be missing something. I'm not sure if a more precise estimate is worth pursuing or not. I've pushed the version where we keep memory estimation like it was and I've added @dungba88's suggestions. I'll try to do a few more benchmark runs to see if there is a measurable slow-down and if we can reduce it by increasing INITIAL_CAPACITY.
Yeah, I don't think we need to spend too much effort on option 2, especially in this PR, because providing an upper bound on memory usage is good enough for me. Plus, we'll never be able to precisely account for how much memory is going to be used. Maybe leaving a TODO or just mentioning in the comment that this estimation is not accurate seems ok to me.
I did 5 benchmark runs for 4 configurations. To avoid making this comment way too large, I'll just report the averages across the 5 runs per configuration.
It would be good to know the actual memory savings. I don't know how to measure the trade-off without knowing what we are trading off for.
  public NeighborArray(int maxSize, boolean descOrder) {
-   node = new int[maxSize];
-   score = new float[maxSize];
+   node = new int[INITIAL_CAPACITY];
I was new to this class, just curious what happens if INITIAL_CAPACITY > maxSize. I thought maxSize was the maximum size we would ever need.
But looking at the previous code, it seems the array could still be expanded even after we fully pre-allocated it with maxSize, from the line node = ArrayUtil.grow(node);. Is it just because it was initially grow-on-demand, then changed to full preallocation, and that wasn't cleaned up?
If it's indeed the maximum size that we ever need, maybe Math.min(INITIAL_CAPACITY, maxSize) would be better.
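A minimal sketch of that change in the constructor, with field names assumed from the diff fragments in this thread:

```java
// Sketch: never allocate past maxSize up front, even when INITIAL_CAPACITY is
// larger than the maximum number of neighbors this node can ever hold.
public NeighborArray(int maxSize, boolean descOrder) {
  int initialCapacity = Math.min(INITIAL_CAPACITY, maxSize);
  node = new int[initialCapacity];
  score = new float[initialCapacity];
  // ... remaining field assignments unchanged ...
}
```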
This seems to come from the M parameter, which is the maximum number of connections (neighbors) a node can have: M on the upper levels and M*2 on level 0.
We should never ever allow INITIAL_CAPACITY to be used to create these arrays when it is larger than maxSize. This would be a serious bug IMO.
I think it's some leftover stuff from before, yes. Although it's also possible we have a bug in our maxSize initialization? I don't think we do though.
I really don't understand why the candidate with INITIAL_CAPACITY == 1000 would be slower. Is it GC-related? Have you been able to look at JFR at all?
Could it be due to the over-allocation, since every node now needs to allocate a 1000-element array whether it needs it or not?
> what happens if INITIAL_CAPACITY > maxSize

Great point. I fixed this and tested with INITIAL_CAPACITY == 1000 again. In the profiler, it looks the same as running the baseline. I assume nodes tend not to have 1000 neighbors.
@stefanvodita with a maxConn of 64, nodes will have at most 128 neighbors (maybe 129, since when we exceed the accounted size, we then remove one before storing).
Oh, I see. Is there some recommended value or normal range for maxConn?
I think it's just due to consuming more memory -> more GC cycles needed -> higher latency.
So I rechecked the code: when we insert a node, we first collect beamWidth candidates and then try to diversely add those candidates to the NeighborArray. So I think (see the sketch below):

- in case beamWidth > maxSize, we can just init this with maxSize and be done, because in a larger graph the first fill will likely fill the NeighborArray completely, so there's no point in resizing it from any initial size;
- in case beamWidth < maxSize, we can init this with beamWidth, so that the initial fill will likely leave the array in a nearly full state?
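A rough sketch of that heuristic as a hypothetical helper (the `beamWidth`/`maxSize` names come from the discussion; where the builder would call this is assumed):

```java
// Combine both cases above into one expression:
//  - beamWidth >= maxSize: allocate maxSize, since the first diverse fill will
//    likely fill the NeighborArray completely and resizing buys nothing;
//  - beamWidth <  maxSize: allocate beamWidth, so the first fill leaves the
//    array nearly full without over-allocating for nodes that stay small.
static int initialNeighborCapacity(int beamWidth, int maxSize) {
  return Math.min(beamWidth, maxSize);
}
```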
I did some memory profiling and it doesn't look promising. Let's take initial capacity 100 as an example. Total allocation is 3.41GB compared to 1.29GB on the baseline, and peak heap usage is 946MiB compared to 547MiB on baseline. It's worse for smaller initial capacities, and larger capacities devolve into the baseline behavior. I might try a few runs on a different dataset, but maybe these neighbor arrays tend to be mostly used in the general case.
Baseline: baseline.jfr.zip
Candidate: candidate-100.jfr.zip
Unless anyone has other ideas, I'll revert the changes to
  } else {
    indexesMissingFromCache =
-       ArrayUtil.grow(indexesMissingFromCache, numberOfMissingFromCache + 1);
+       ArrayUtil.growInRange(
I think we could add a comment here saying that indexesMissingFromCache cannot grow beyond categoryPaths.
I was also thinking whether an assert is needed here, but the check in growInRange should be enough.
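For illustration, the call site with such a comment could look like this (variable names follow the diff above; I'm assuming `categoryPaths` is the array whose length bounds the growth):

```java
// indexesMissingFromCache tracks positions of categoryPaths entries that were
// not found in the cache, so it never needs more than categoryPaths.length
// slots; growInRange caps the over-allocation at that bound.
indexesMissingFromCache =
    ArrayUtil.growInRange(
        indexesMissingFromCache, numberOfMissingFromCache + 1, categoryPaths.length);
```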
@zhaih - I also tried your idea about In the profiler, it doesn't look very different from baseline. I didn't see any resizing. If I understood you correctly, you were expecting there to be occasional resizing of arrays when The latency isn't measurably better either. Results are averaged from 5 runs (baseline is here).
I reverted the changes to
    assert minLength >= 0
        : "length must be positive (got " + minLength + "): likely integer overflow?";

    if (minLength > maxLength) {
I think this can be an assert instead, as this class is marked as internal.
Let's contrast this with the assertion above. A negative minLength is probably not intended, but we can technically handle it, so we don't stop execution unless assertions are enabled. On the other hand, minLength > maxLength is not a case we can handle while obeying the contract of this method. I guess the contrast is unintended input vs invalid input. That's why I prefer the current arrangement, but let me know if you strongly feel otherwise.
That's interesting, I didn't realize that.
I just learnt recently from #12624 that assert is used on internal code paths to catch bugs, while throwing exceptions is used on code paths that users can directly control. In Lucene tests we always have assertions enabled, so an assert would surely throw there. This class is marked as internal and not meant to be used by users, so I thought it would be fine to just use assert here.
Anyhow, I'm fine with either. Maybe other people could have more thoughts here.
      int node = nodesOnLevel.nextInt();
      NeighborArray neighbors = hnsw.getNeighbors(level, node);
      long maxNeighborsSize;
      if (level == 0) {
I think generally we would rather avoid having a test duplicate assumptions or logic from the production path? This seems to be a specific implementation decision that could change independently. I couldn't think of a better way though.
But I'm unsure about the need for this newly added code. It seems we only compute it in a single test, and we want to have a better estimation? The test seems to verify that our over-estimation cannot be off by more than 30% of the actual size. If we provide a better estimation, maybe we can lower the tolerance threshold?
Still, it seems strange that we would truly need this better estimation, but only in a test.
Sorry, this addition made sense with the changes to OnHeapHnswGraph. I forgot to remove it when I reverted those changes. Reverted now.
zhaih left a comment
LGTM
@stefanvodita Could you move the change entry to 9.10? Then I can merge it |
In cases where we know there is an upper limit to the potential size of an array, we can use `growInRange` to avoid allocating beyond that limit.
69c7a2f to 85e87f5
In cases where we know there is an upper limit to the potential size of an array, we can use `growInRange` to avoid allocating beyond that limit. We address such cases in `DirectoryTaxonomyReader` and `NeighborArray`.

Closes #12839