Skip to content

Conversation

@dungba88
Copy link
Contributor

@dungba88 dungba88 commented Dec 6, 2023

Description

This is the same with #12624, except for a slight change in implementation of ReadWriteDataOutput. This PR keeps the original implementation that BytesStore did (just the getReverseBytesReader() part), to make sure there is no regression. My theory is that ReverseRandomAccessReader need to seek the correct buffer on every readByte, and reading from the ByteBuffer has some (small) overhead comparing to accessing from the byte array directly. But these should have insignificant impact on the performance. More extensive benchmark is needed.

See the diff between 2 approaches at dungba88#13

Using Test2BFST, there is some improvement over the ByteBuffersDataInput:

ReverseRandomAccessReader with ByteBuffersDataInput

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 27 seconds
  1> 2000000...: took 54 seconds
  1> 3000000...: took 82 seconds
  1> 4000000...: took 109 seconds
  1> 5000000...: took 137 seconds
  1> 6000000...: took 165 seconds
  1> 7000000...: took 192 seconds
  1> 8000000...: took 219 seconds
  1> 9000000...: took 247 seconds
  1> 10000000...: took 275 seconds
  1> 11000000...: took 300 seconds

This PR

  1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; arcCount=2264078585]
  1> 0...: took 0 seconds
  1> 1000000...: took 22 seconds
  1> 2000000...: took 44 seconds
  1> 3000000...: took 66 seconds
  1> 4000000...: took 89 seconds
  1> 5000000...: took 111 seconds
  1> 6000000...: took 133 seconds
  1> 7000000...: took 155 seconds
  1> 8000000...: took 178 seconds
  1> 9000000...: took 200 seconds
  1> 10000000...: took 222 seconds
  1> 11000000...: took 245 seconds

I ran the test several times and the results are consistent.

@dungba88
Copy link
Contributor Author

dungba88 commented Dec 6, 2023

I think we'll likely go with the approach in #12624 (simpler despite some potential regression). But this approach is to have some comparison, and in case someone has strong argument for favoring the latency.

To reviewer: Discussion outside of ReadWriteDataOutput should also go to #12624 instead.

@dungba88 dungba88 marked this pull request as draft December 6, 2023 09:22
@dungba88
Copy link
Contributor Author

Note: We already merged #12624 , but this PR can be used as benchmark candidate for #12884

@mikemccand
Copy link
Member

Actually 300 -> 245 seconds is quite a performance gain. It seems FST's own store is quite a bit faster than the ByteBuffer backed store. I think we should make this change? I'll try to review soon.

@dungba88 dungba88 changed the title Allow FST builder to use different writer (alternative reverse BytesReader) Optimize FST on-heap BytesReader Dec 12, 2023
return new ReverseBytesReader(byteBuffers.get(0).array());
}
return new FST.BytesReader() {
private byte[] current = byteBuffers.get(0).array();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array() returns the internal array and there's no byte-copy here

// use a faster implementation for single-block case
return new ReverseBytesReader(byteBuffers.get(0).array());
}
return new FST.BytesReader() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do tests exercise this path sometimes? We would need tests to either make large FSTs, or, randomize the block size so sometimes it's tiny?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe testRealFST did this sometimes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm still worried about test coverage of this. Conditionals like this are dangerous for Lucene, since our tests only test tiny FSTs, this code path would be rarely/never executed, likely even by our nightly benchmarks. Ideally we would find a way to randomize the page size in many tests, or at least, one test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant TestFSTs.testRealTerms. The default block size is 32KB and the FST is 55-80KB, thus 2-3 blocks.

return new FST.BytesReader() {
private byte[] current = byteBuffers.get(0).array();
private int nextBuffer = -1;
private int nextRead = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the = 0 -- it's javac's default already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've removed it

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments -- I think this is nearly ready? (though it it still marked as draft?) -- given the gains of this optimization I think it makes sense to specialize the single buffer vs multiple buffer cases.

@dungba88 dungba88 marked this pull request as ready for review December 23, 2023 06:58
assert byteBuffers != null; // freeze() must be called first
if (byteBuffers.size() == 1) {
// use a faster implementation for single-block case
return new ReverseBytesReader(byteBuffers.get(0).array());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert hasArray() before this? .array() is optional (only valid for DirectByteBuffer I think).

Copy link
Contributor Author

@dungba88 dungba88 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we call toWriteableBufferList the array should be accessible. I think if the array is somehow not accessible HeapByteBuffer (the one being used) would throw an exception and thus would be caught by tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I realize the code is safe but it takes some thinking to confirm it -- maybe add the assert with your awesome above explanation? So future code readers know why this ByteBuffer is for sure backed by an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I've added the assertion. I decided to put it right after the ByteBuffer list is created (in freeze()) and assert all ByteBuffer in the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks, looks great @dungba88!

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dungba88 -- I plan to merge some time today.

@mikemccand mikemccand merged commit 4c883a4 into apache:main Jan 6, 2024
mikemccand pushed a commit that referenced this pull request Jan 6, 2024
* Move size() to FSTStore

* Remove size() completely

* Allow FST builder to use different DataOutput

* access BytesStore byte[] directly for copying

* Rename BytesStore

* Change class to final

* Reorder methods

* Remove unused methods

* Rename truncate to setPosition() and remove skipBytes()

* Simplify the writing operations

* Update comment

* remove unused parameter

* Simplify BytesStore operation

* tidy code

* Rename copyBytes to writeTo

* Simplify BytesStore operations

* Embed writeBytes() to FSTCompiler

* Fix the write bytes method

* Remove the default block bits constant

* add assertion

* Rename method parameter names

* Move reverse to FSTCompiler

* Revert setPosition call

* Address comments

* Return immediately when writing 0 bytes

* Add comment &

* Rename variables

* Fix the compile error

* Remove isReadable()

* Remove isReadable()

* Optimize ReadWriteDataOutput

* tidy code

* Freeze the DataOutput once finished()

* Refactor

* freeze the DataOutput before use

* Improvement of ReadWriteDataOutput

* tidy code

* Address comments and add off-heap FST tests

* Remove the hardcoded random

* Ignore the Test2BFSTOffHeap test

* Simplify ReadWriteDataOutput

* Do not expose blockBits

* tidy code

* Remove 0 initialization

* Add assertion and comment
@mikemccand mikemccand added this to the 9.10.0 milestone Jan 6, 2024
@dungba88
Copy link
Contributor Author

dungba88 commented Jan 8, 2024

Thanks @mikemccand for merging

slow-J pushed a commit to slow-J/lucene that referenced this pull request Jan 16, 2024
* Move size() to FSTStore

* Remove size() completely

* Allow FST builder to use different DataOutput

* access BytesStore byte[] directly for copying

* Rename BytesStore

* Change class to final

* Reorder methods

* Remove unused methods

* Rename truncate to setPosition() and remove skipBytes()

* Simplify the writing operations

* Update comment

* remove unused parameter

* Simplify BytesStore operation

* tidy code

* Rename copyBytes to writeTo

* Simplify BytesStore operations

* Embed writeBytes() to FSTCompiler

* Fix the write bytes method

* Remove the default block bits constant

* add assertion

* Rename method parameter names

* Move reverse to FSTCompiler

* Revert setPosition call

* Address comments

* Return immediately when writing 0 bytes

* Add comment &

* Rename variables

* Fix the compile error

* Remove isReadable()

* Remove isReadable()

* Optimize ReadWriteDataOutput

* tidy code

* Freeze the DataOutput once finished()

* Refactor

* freeze the DataOutput before use

* Improvement of ReadWriteDataOutput

* tidy code

* Address comments and add off-heap FST tests

* Remove the hardcoded random

* Ignore the Test2BFSTOffHeap test

* Simplify ReadWriteDataOutput

* Do not expose blockBits

* tidy code

* Remove 0 initialization

* Add assertion and comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants