
Conversation

@dungba88 (Contributor) commented Oct 31, 2023

Description

Fix #12714

First attempt to introduce a value-based LRU cache in NodeHash. There are some inefficiencies, but the functionality works.

See this comment for the main idea

> I think we can use ByteBlockPool to store the byte[] slices, just appending a new byte[] slice when we store a new suffix. We never delete individual suffixes, but rather discard the entire "secondary" hash map in the double barrel cache, so we could just drop/recycle the ByteBlockPool at that point too.
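
For illustration, here is a minimal sketch of that idea (hypothetical names, with a plain ByteArrayOutputStream standing in for ByteBlockPool; not the actual NodeHash code): each generation of the double-barrel cache owns its own pool, so discarding the fallback generation frees all of its copied suffixes at once.

  import java.io.ByteArrayOutputStream;
  import java.util.HashMap;
  import java.util.Map;

  class DoubleBarrelNodeCacheSketch {
    // node hash -> offset of the copied suffix inside that generation's pool
    private Map<Long, Integer> primary = new HashMap<>();
    private Map<Long, Integer> fallback = new HashMap<>();
    private ByteArrayOutputStream primaryPool = new ByteArrayOutputStream();
    private ByteArrayOutputStream fallbackPool = new ByteArrayOutputStream();
    private final int maxPrimaryEntries;

    DoubleBarrelNodeCacheSketch(int maxPrimaryEntries) {
      this.maxPrimaryEntries = maxPrimaryEntries;
    }

    void put(long nodeHash, byte[] nodeBytes) {
      if (primary.size() >= maxPrimaryEntries) {
        // individual suffixes are never deleted; the old fallback map and its
        // entire pool are dropped (or recycled) in one shot
        fallback = primary;
        fallbackPool = primaryPool;
        primary = new HashMap<>();
        primaryPool = new ByteArrayOutputStream();
      }
      primary.put(nodeHash, primaryPool.size());          // remember where this suffix starts
      primaryPool.write(nodeBytes, 0, nodeBytes.length);  // append the copied suffix
    }
  }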

@dungba88 marked this pull request as draft on October 31, 2023 07:48
@mikemccand (Member) left a comment

I love this approach! Thank you for tackling it so quickly!

It's wonderful that it passes all tests :)

We just need to fix the nocommits (switch to ByteBlockPool to hold the copied per-node byte[] values)!

Thanks @dungba88

throws IOException {
// nocommit: this is non-optimal, we should have a BytesReader that wraps and read the
// ByteBlockPool directly
byte[] bytes = table.getBytes(address);

Member:

I think we can change address from the "global" address (in the FST's growing append-only byte[]), to the more compact address of the ByteBlockPool belonging to each of two (primary and fallback) hash sets?

Contributor (author):

I ended up storing the relative difference between the global address and the ByteBlockPool address. It retains the existing behavior of FST operations (which always rely on the global address). See the ByteBlockPoolReverseBytesReader.

Member:

Ahhh that's right, the hash map must retain the true FST offset since that's what future added nodes must link to!
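
To illustrate that translation (a toy sketch with made-up names, not the PR's ByteBlockPoolReverseBytesReader): the FST code keeps asking for bytes by their global address, and the reader subtracts a per-node delta to land on the locally copied bytes, walking backwards because FST nodes are stored reversed.

  class ReverseCopiedNodeReaderSketch {
    private final byte[] copiedNode;  // this node's bytes as copied into the hash's local storage
    private final long delta;         // global end address minus local end offset for this node
    private long pos;                 // current position in the global FST address space

    ReverseCopiedNodeReaderSketch(byte[] copiedNode, long delta) {
      this.copiedNode = copiedNode;
      this.delta = delta;
    }

    void setPosition(long globalPos) {
      this.pos = globalPos;
    }

    byte readByte() {
      // translate the global address to a local index, then step backwards
      byte b = copiedNode[(int) (pos - delta)];
      pos--;
      return b;
    }
  }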

@dungba88 (author) commented Oct 31, 2023

I solved most of the nocommits (only 1 left). But I ended up using a List<byte[]> where each item is a node instead of ByteBlockPool due to the following reasons:

  • With ByteBlockPool we have 2 unavoidable double byte-copies: (1) when writing from the BytesStore to the primary table and (2) when promoting an entry from the fallback table to the primary table. In both situations we need to first write into a temporary byte[].
  • Some additional side benefits:
    • The fallback and primary tables can share some of the copied nodes (when a node is promoted), thus reducing memory usage
    • We automatically get the length of the node without any traversing
    • When creating the BytesReader for FST operations, we just use the byte[] as is and pass it to the ReverseBytesReader, no copy needed here

The downside is that it's limited to 2 billion nodes (which I think should suffice). This can also be overcome by using a nested list with a fixed inner list size.
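
A minimal sketch of that List<byte[]> layout (hypothetical names, not the PR code): each node is its own array, so promotion can reuse the same reference and the node length comes for free.

  import java.util.ArrayList;
  import java.util.List;

  class ListBackedNodeStoreSketch {
    private final List<byte[]> nodes = new ArrayList<>();  // int index => ~2 billion node limit

    // store a copied node and return its index
    int add(byte[] nodeBytes) {
      nodes.add(nodeBytes);
      return nodes.size() - 1;
    }

    // the primary and fallback tables share the same byte[]; no copy needed on promotion
    int promoteFrom(ListBackedNodeStoreSketch fallback, int fallbackIndex) {
      return add(fallback.nodes.get(fallbackIndex));
    }

    // known without traversing the node's arcs
    int nodeLength(int index) {
      return nodes.get(index).length;
    }
  }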

@dungba88 (author) left a comment

Added some explanation


public PagedGrowableHash() {
entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
copiedOffsets = new PagedGrowableWriter(32, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);

Contributor (author):

We are doubling the size because we need to write both the node's global address and the local copiedNodes offset (in adjacent positions). The global address is written at the even position.
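
As an illustration of that layout (class and method names are assumptions; only the PagedGrowableWriter constructor mirrors the diff above): hash slot i keeps the global FST address at index 2*i and the local copiedNodes offset at index 2*i+1, which is why the writer holds twice as many values as there are slots.

  import org.apache.lucene.util.packed.PackedInts;
  import org.apache.lucene.util.packed.PagedGrowableWriter;

  class InterleavedOffsetsSketch {
    private final PagedGrowableWriter copiedOffsets;

    InterleavedOffsetsSketch(long numSlots, int blockSizeBytes) {
      // twice as many values as hash slots, hence the doubled initial size in the diff
      copiedOffsets = new PagedGrowableWriter(2 * numSlots, blockSizeBytes, 8, PackedInts.COMPACT);
    }

    void store(long hashSlot, long globalAddress, long localOffset) {
      copiedOffsets.set(2 * hashSlot, globalAddress);    // even position: global FST address
      copiedOffsets.set(2 * hashSlot + 1, localOffset);  // odd position: offset into copiedNodes
    }
  }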

Member:

Hmm, I see: we need two long values stored per entry? One for the true FST appending byte[] offset, and another for the local pool'd copy of just the byte[] for this node? Maybe rename entries to fstNodeAddress and then copiedNodeAddress for the pool'd offsets?

@dungba88 marked this pull request as ready for review on November 1, 2023 04:52
@dungba88 (author) commented Nov 1, 2023

Ok, it's ready for review. I'll add the CHANGES.txt entry once it's approved.

@mikemccand (Member) commented:

> But I ended up using a List<byte[]> where each item is a node instead of ByteBlockPool due to the following reasons:

Hmm -- this is sizable added RAM overhead per entry. Added array header (16 or 20 bytes), added pointers to these arrays (4 or 8 bytes), tiny objects for GC to crawl, though maybe they mostly die young if the RAM budget is low enough. Yes, we can share some of these byte[] with both hashes but I suspect that's not a net/net win.

> We automatically get the length of the node without any traversing

We are paying 4 bytes per entry for this :) And it seems a shame since FST's node encoding is already self-delimiting, and, as a side effect of looking up that node we will already have computed that length.

> With ByteBlockPool we have 2 unavoidable double byte-copies: (1) when writing from the BytesStore to the primary table and (2) when promoting an entry from the fallback table to the primary table. In both situations we need to first write into a temporary byte[].

I don't think this added cost is so much. We freeze the node into the true FST appending byte store, then, we copy those "last N bytes" over to primary hash. Eventually it's moved to fallback, and, maybe it never gets promoted back (single copy), or, maybe it does (+1 copy) but that's "worth it" since we achieve some minimization, i.e. the cost correlates nicely with the win (the whole point of this double LRU hash).

@dungba88 (author) commented Nov 1, 2023

> Eventually it's moved to fallback, and, maybe it never gets promoted back (single copy), or, maybe it does (+1 copy)

There is actually already one copy before this, where we read from the BytesStore into a temporary byte[]. So if the node never gets promoted it's 2 copies, and if it does, it's 4 copies (one more to read from the fallback table into a temporary byte[] and one more to write to the primary table).

When it gets promoted, we also get one side benefit: the byte[] can be shared between the primary and fallback tables.

I see there is a tradeoff here. If we don't care too much about CPU then we can use BytesRefArray like @gf2121 suggested.

@mikemccand (Member) left a comment

I like where this is going! But I sure hope we can pack each copied node's byte[] into a single ByteBlockPool instead of separate byte[] ... the added RAM overhead of the latter is high.


PackedInts.bitsRequired(lastNodeAddress),
PackedInts.COMPACT);
mask = size - 1;
offsetMask = size * 2 - 1;

Member:

Hmm -- why are entries and copiedOffsets not simply side-by-side values arrays for the hash map? Why is copiedOffsets 2X the size? It seems like we could have them precisely match (side by side value arrays for the same hash entry)?

Contributor (author):

The index of entries is the hash of the node arcs, while the index of copiedOffsets is the hash of the address, hence their positions do not match.

long pos = Long.hashCode(pointer) & mask;
// find an empty slot
while (fstNodeAddress.get(pos) != 0) {
pos = (pos + 1) & mask;

Member:

Hmm we found linear probing to be slower than quadratic in a prior PR. But if we consolidate down to the two parallel PagedGrowableWriter we can just use the quadratic hash we already use.

Contributor (author):

Is it this PR (#12716)? That's interesting. I also got the linear probing from that PR. Lemme change it to quadratic.
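
For reference, a sketch of the quadratic (triangular-stride) probe next to the linear one in the diff (findSlot is a hypothetical helper; it assumes a power-of-two table where an empty slot holds 0):

  import org.apache.lucene.util.packed.PagedGrowableWriter;

  class QuadraticProbeSketch {
    static long findSlot(PagedGrowableWriter fstNodeAddress, long mask, long pointer) {
      long pos = Long.hashCode(pointer) & mask;
      int probe = 0;
      while (fstNodeAddress.get(pos) != 0) {
        // triangular stride: +1, +2, +3, ... instead of the linear +1 step
        pos = (pos + (++probe)) & mask;
      }
      return pos;
    }
  }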

* (long), returning the local copiedNodes start address if the two nodes are matched, or -1
* otherwise
*/
private int getMatchedNodeLength(FSTCompiler.UnCompiledNode<T> node, long address)

Member:

Is this duplicating the nodesEqual code? Instead of that, could we have an instance variable that sets the length as a side effect of nodeEquals? A bit messy, but ... I think worth it?

@dungba88 (author) commented Nov 1, 2023

This is actually renamed from nodesEqual (it was removed), so there is no duplication. The old behavior is essentially getMatchedNodeLength != -1

@mikemccand (Member) commented:

Thanks @dungba88 -- I will review!

But first I tried running IndexToFST (recently born helper tool, now in luceneutil) on a wikimediumall index, creating the FST from all of its body field terms, but hit this exciting doozie:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 350                                                                                                             
        at org.apache.lucene.util.ByteBlockPool.readBytes(ByteBlockPool.java:250)                                                                                                                                      
        at org.apache.lucene.util.fst.NodeHash$PagedGrowableHash.getBytes(NodeHash.java:248)                                                                                                                           
        at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:121)                                                                                                                                                  
        at org.apache.lucene.util.fst.FSTCompiler.compileNode(FSTCompiler.java:294)                                                                                                                                    
        at org.apache.lucene.util.fst.FSTCompiler.freezeTail(FSTCompiler.java:702)                                                                                                                                     
        at org.apache.lucene.util.fst.FSTCompiler.add(FSTCompiler.java:776)                                                                                                                                            
        at IndexToFST.main(IndexToFST.java:65)

Seems like a lastFallbackNodeLength was -1 (node not found in fallback) sort of situation? Not sure...

@dungba88 (author) commented Nov 2, 2023

Yes, I just noticed that and pushed out a fix.

Seems like I was using the primary table pos instead of the fallback pos. I added an assertion to catch it earlier.

Let me also re-run the test with my local GitHub.

@mikemccand (Member) left a comment

This is looking closer! I left a bunch of small comments. Thanks @dungba88!

// at 0:
assert node != 0;
assert node != FST.FINAL_END_NODE && node != FST.NON_FINAL_END_NODE;
byte[] buf = new byte[Math.toIntExact(node - startAddress + 1)];

Member:

Hmm why the +1 here?

Contributor (author):

node is the last address of the node and startAddress is its starting address, hence we compute the length as end - start + 1 (e.g. if end == start then the length must be 1).

Member:

Brain hurts! Reverse ob1 error in my brain :)

What confuses me here is if fstCompiler.addNode writes 3 bytes, won't we compute a length of 4?

Oh, I see! FSTCompiler#addNode has this at the end:

    final long thisNodeAddress = bytes.getPosition() - 1;
    bytes.reverse(startAddress, thisNodeAddress);
    nodeCount++;
    return thisNodeAddress;

So that last byte address it returns is inclusive, since it did the -1, so we have to +1 to undo that. OK, I think it makes sense ;)
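
A tiny worked example of that off-by-one, assuming (as described above) that addNode returns the inclusive address of the node's last byte:

  class NodeLengthExample {
    public static void main(String[] args) {
      long startAddress = 10;  // address of the node's first byte in the FST byte store
      long node = 12;          // inclusive address of its last byte (3 bytes: 10, 11, 12)
      byte[] buf = new byte[Math.toIntExact(node - startAddress + 1)];
      System.out.println(buf.length);  // 3, not 4
    }
  }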

@mikemccand (Member) left a comment

Very close!

}

@Override
public boolean reversed() {

Member:

I wonder why FST.BytesReader even has this method? It might be a holdover (now dead?) from the pack days (long ago removed). But we should not try to fix it here ... this change is awesome enough already!

Member:

I'll open a spinoff issue for this -- it seems at quick glance to be dead/pointless code.

fstNodeAddress.set(hashSlot, nodeAddress);
count++;
copiedNodes.append(bytes);
copiedBytes += bytes.length;

Member:

Hmm couldn't we just use copiedBytes.getPosition() instead? Isn't that the same?

Contributor (author):

I didn't see this method in ByteBlockPool, maybe it got removed? It's also strange that this class does not have a method to get its size, although we might be able to compute it with bufferUpTo and byteUpTo.
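
A rough sketch of how such a position could be derived from those two counters (a toy stand-in; the field names follow the comment above and are assumptions about ByteBlockPool's internals, not its actual API):

  class PoolPositionSketch {
    static final int BLOCK_SIZE = 1 << 15;  // ByteBlockPool.BYTE_BLOCK_SIZE (32768)
    int bufferUpTo;                         // number of blocks already filled
    int byteUpTo;                           // write offset inside the current block

    long getPosition() {
      return (long) bufferUpTo * BLOCK_SIZE + byteUpTo;
    }
  }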

Contributor (author):

I could add that method in ByteBlockPool as well.

Member:

Huh, it was just there, in 9.x, and disappeared when I git pull'd! Something must've just removed it? Found it:

LUCENE-10560: Faster merging of TermsEnum (#1052)

Maybe it was just dead code that @jpountz removed? But let's bring it back from the dead here?

Contributor:

@mikemccand This change did not touch ByteBlockPool? Or did I misunderstand what you said?

Member:

> @mikemccand This change did not touch ByteBlockPool? Or did I misunderstand what you said?

Whoops, sorry, you are right! Your commit did not touch ByteBlockPool. Now I am really confused about what I thought I saw early this AM. I must've been hallucinating. Maybe I am just an LLM.

Sorry for the false accusation!

Member:

> I could add that method in ByteBlockPool as well.

+1 to add it LOL.

Contributor (author):

Yeah, I added it, and found there is a potential inconsistency in block size between the Allocator and ByteBlockPool; I added a TODO to fix that later.

copiedBytes += bytes.length;
// write the offset, which points to the last byte of the node we copied since we later read
// this node in reverse
copiedNodeAddress.set(hashSlot, copiedBytes - 1);

Member:

Can we assert this slot is 0 before we set it?

@mikemccand (Member) commented:

Thanks @dungba88!

I confirmed that IndexToFST now works again, and, when given "up to" inf RAM to use, it produces the same sized minimal fst.bin as main at 367244208 bytes.

It's a bit slower than main, but that's to be expected and I think OK. We can optimize later ... being able to fully cap RAM usage, no matter how big an FST you produce, is worth this tradeoff. (Note: we still need to fix FST writing to spool to disk to achieve RAM capping, the other PR @dungba88 is tackling -- thank you!)

main:

  saved FST to "fst.bin": 367244208 bytes; 47.758 sec
  saved FST to "fst.bin": 367244208 bytes; 48.134 sec

This PR:

  saved FST to "fst.bin": 367244208 bytes; 54.765 sec
  saved FST to "fst.bin": 367244208 bytes; 58.616 sec

@mikemccand (Member) commented:

Test2BFST is happy, yay!

BUILD SUCCESSFUL in 56m 36s

public abstract static class Allocator {
// TODO: ByteBlockPool assume the blockSize is always {@link BYTE_BLOCK_SIZE}, but this class
// allow arbitrary value of blockSize. We should make them consistent.
protected final int blockSize;

Member:

Hmm do we have any if or assert that confirms Allocator's blockSize == ByteBlockPool.BYTE_BLOCK_SIZE when passed to ByteBlockPool?

Contributor (author):

We don't, and this Allocator seems to be used only by ByteBlockPool, so maybe we don't need it to support a custom block size?

totalBytes += size;

// make sure we report the correct position
assertEquals(totalBytes, pool.getPosition());

Member:

Thanks!

@mikemccand (Member) left a comment

This change looks great to me! I think it's ready! I'll merge today.

We'll let this bake in main for a few days, and given that the prior big change (capping RAM used by NodeHash) seems to have baked successfully on main as well, let's backport these recent FST improvements next week?

@mikemccand merged commit b8a9b0a into apache:main on Nov 4, 2023
@mikemccand (Member) commented:

I merged to main, thank you @dungba88 for the fast iterations! I could barely keep up just reviewing :)

After all this FST dust settles let's remember to add your CHANGES.txt entry summarizing all the progress with capping RAM usage of FSTs. I think we can make one entry (something like "FST Compiler can now build arbitrarily large FSTs with capped/controllable RAM usage"), linking to the N GitHub issues/PRs it took to accomplish. Maybe after we backport to 9.x?

@dungba88 (author) commented Nov 4, 2023

Thank you @mikemccand! Agreed, we should have a single CHANGES.txt entry summarizing all the different PRs.

mikemccand pushed a commit that referenced this pull request Nov 20, 2023
* Use value-based LRU cache in NodeHash (#12714)

* tidy code

* Add a nocommit about OffsetAndLength

* Fix the readBytes method

* Use List<byte[]> instead of ByteBlockPool

* Move nodesEqual to PagedGrowableHash

* Add generic type

* Fix the count variable

* Fix the RAM usage measurement

* Use PagedGrowableWriter instead of HashMap

* Remove unused generic type

* Update the ramBytesUsed formula

* Retain the FSTCompiler.addNode signature

* Switch back to ByteBlockPool

* Remove the unnecessary assertion

* Remove fstHashAddress

* Add some javadoc

* Fix the address offset when reading from fallback table

* tidy code

* Address comments

* Add assertions
Merging this pull request may close: FSTCompiler's NodeHash should fully duplicate byte[] slices from the growing FST