Try out Tantivy's term dictionary format #12513

### Description

Hello! 

I've been working on a [benchmark](https://tony-x.github.io/search-benchmark-game/) for a while to compare the features and performance of Lucene and [Tantivy](https://github.com/quickwit-oss/tantivy), a Rust search engine library heavily inspired by Lucene.

The benchmark uses the corpus and queries from luceneutil (the framework behind Lucene's nightly benchmarks). Since Tantivy does not support all query types, the benchmark currently focuses on TermQuery, BooleanQuery and PhraseQuery. So far Tantivy has generally shown a performance advantage, which motivated me to understand why.

I documented the two engines' inverted index implementations as I understand them. Here is the [wiki](https://github.com/Tony-X/search-benchmark-game/wiki/Inverted-index-deep-dive). Both engines use an FST to speed up term lookup, but they use it in quite different ways. In summary, Lucene uses the FST to map term prefixes to on-disk blocks of terms, which are then scanned. Tantivy uses the FST to map every term to its ordinal, and uses that ordinal to locate and decode at most one full block.
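
To make the Tantivy-style lookup concrete, here is a minimal sketch, assuming heavy simplification: a plain `dict` stands in for the FST (term to ordinal), and term metadata (a fake postings offset) is stored in fixed-size blocks addressed by ordinal. The class name `OrdinalTermDict`, `BLOCK_SIZE` and the offsets are hypothetical and not taken from Tantivy's code; Python is used only for brevity.

```python
BLOCK_SIZE = 4  # hypothetical block size, chosen for illustration only


class OrdinalTermDict:
    """Simplified ordinal-based term dictionary (loosely Tantivy-style).

    A dict stands in for the FST and maps every term to its ordinal.
    Per-term metadata lives in blocks of BLOCK_SIZE entries, addressed by ordinal.
    """

    def __init__(self, sorted_terms, postings_offsets):
        self.term_to_ord = {t: i for i, t in enumerate(sorted_terms)}
        self.blocks = [
            postings_offsets[i:i + BLOCK_SIZE]
            for i in range(0, len(postings_offsets), BLOCK_SIZE)
        ]
        self.blocks_decoded = 0

    def _decode_block(self, block_index):
        # A real implementation would decompress bytes here; we only count it.
        self.blocks_decoded += 1
        return self.blocks[block_index]

    def lookup(self, term):
        ord_ = self.term_to_ord.get(term)
        if ord_ is None:
            return None  # absent term: decided by the "FST" alone, no block decoded
        block = self._decode_block(ord_ // BLOCK_SIZE)
        return block[ord_ % BLOCK_SIZE]


terms = ["apple", "banana", "cherry", "date", "elder", "fig"]
offsets = [0, 10, 25, 31, 40, 52]
d = OrdinalTermDict(terms, offsets)
print(d.lookup("elder"))  # 40
print(d.lookup("grape"))  # None
print(d.blocks_decoded)   # 1
```

Note that the missing term `grape` never touches a block, and the hit on `elder` decodes exactly one block.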

The proposal is to try out Tantivy's term dictionary format, which I think has some advantages:
1. It can determine that a term does not exist using only FST operations.
2. It decodes fewer terms in the worst case (e.g. a term that falls within a large gap between two prefixes).
3. It is simpler? (This might be subjective, but it took me days to digest [Lucene90BlockTreeTermsWriter](https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.html) and I'm still not sure I got every bit right...)
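
Points 1 and 2 can be illustrated with a toy comparison that counts how many term entries each approach decodes. This is a deliberately simplified model, not Lucene's actual BlockTree: real blocks have minimum/maximum sizes, floor blocks and nested sub-blocks, and the data, function names and `BLOCK_SIZE` here are hypothetical. Dicts stand in for both FSTs.

```python
def prefix_block_lookup(index, term):
    """Simplified prefix-index lookup (loosely BlockTree-like).

    `index` maps a prefix to a sorted list of (term, offset) entries.
    Returns (offset or None, number of entries decoded).
    """
    # Find the longest prefix of `term` that owns a block (stand-in for the FST walk).
    for end in range(len(term), -1, -1):
        block = index.get(term[:end])
        if block is not None:
            break
    else:
        return None, 0  # no block can contain the term
    decoded = 0
    for candidate, offset in block:  # sequential scan of the block
        decoded += 1
        if candidate == term:
            return offset, decoded
        if candidate > term:
            break  # block is sorted, so the term is absent
    return None, decoded


def ordinal_lookup(term_to_ord, blocks, block_size, term):
    """Simplified ordinal lookup: returns (offset or None, entries decoded)."""
    ord_ = term_to_ord.get(term)
    if ord_ is None:
        return None, 0  # absence decided by the "FST" alone
    block = blocks[ord_ // block_size]
    return block[ord_ % block_size], len(block)  # decode one full block


terms = ["cab", "cad", "cam", "can", "cap", "car", "cat", "caw"]
offsets = [i * 10 for i in range(len(terms))]
BLOCK_SIZE = 4
index = {"ca": list(zip(terms, offsets))}  # one block of 8 terms under prefix "ca"
term_to_ord = {t: i for i, t in enumerate(terms)}
blocks = [offsets[i:i + BLOCK_SIZE] for i in range(0, len(offsets), BLOCK_SIZE)]

for q in ["cab", "caw", "cay"]:
    print(q, prefix_block_lookup(index, q), ordinal_lookup(term_to_ord, blocks, BLOCK_SIZE, q))
# cab (0, 1) (0, 4)
# caw (70, 8) (70, 4)
# cay (None, 8) (None, 0)
```

In this toy model the prefix scan can win for terms near the start of a block (`cab`), while the ordinal approach bounds the work at one block for hits and skips block decoding entirely for misses (`cay`).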


What do you think?  


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Try out a tantivy's term dictionary format #12513

Description

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Try out a tantivy's term dictionary format #12513

Description

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions