68 | 68 | import static org.apache.lucene.backward_codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.TERMS_META_EXTENSION; |
69 | 69 | import static org.apache.lucene.util.fst.FSTCompiler.getOnHeapReaderWriter; |
70 | 70 |
71 | | -/* |
72 | | - TODO: |
73 | | -
74 | | - - Currently there is a one-to-one mapping of indexed |
75 | | - term to term block, but we could decouple the two, ie, |
76 | | - put more terms into the index than there are blocks. |
77 | | - The index would take up more RAM but then it'd be able |
78 | | - to avoid seeking more often and could make PK/FuzzyQ |
79 | | - faster if the additional indexed terms could store |
80 | | - the offset into the terms block. |
81 | | -
82 | | - - The blocks are not written in true depth-first |
83 | | - order, meaning if you just next() the file pointer will |
84 | | - sometimes jump backwards. For example, block foo* will |
85 | | - be written before block f* because it finished before. |
86 | | - This could possibly hurt performance if the terms dict is |
87 | | - not hot, since OSs anticipate sequential file access. We |
88 | | - could fix the writer to re-order the blocks as a 2nd |
89 | | - pass. |
90 | | -
91 | | - - Each block encodes the term suffixes packed |
92 | | - sequentially using a separate vInt per term, which is |
93 | | - 1) wasteful and 2) slow (must linear scan to find a |
94 | | - particular suffix). We should instead 1) make |
95 | | - random-access array so we can directly access the Nth |
96 | | - suffix, and 2) bulk-encode this array using bulk int[] |
97 | | - codecs; then at search time we can binary search when |
98 | | - we seek a particular term. |
99 | | -*/ |
100 | | - |
101 | 71 | /** |
| 72 | + * Required for the writing side of ES812PostingsFormat. Based on Lucene 9.0 postings format, which encodes postings |
| 73 | + * in packed integer blocks for fast decode. |
| 74 | + * |
102 | 75 | * Block-based terms index and dictionary writer. |
103 | 76 | * |
104 | 77 | * <p>Writes terms dict and index, block-encoding (column stride) each term's metadata for each set |