
Commit afa5b34

comment
1 parent f1c4a51 commit afa5b34


1 file changed (+3 -30 lines changed)


server/src/main/java/org/elasticsearch/index/codec/postings/Lucene90BlockTreeTermsWriter.java

Lines changed: 3 additions & 30 deletions
@@ -68,37 +68,10 @@
 import static org.apache.lucene.backward_codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.TERMS_META_EXTENSION;
 import static org.apache.lucene.util.fst.FSTCompiler.getOnHeapReaderWriter;
 
-/*
-TODO:
-
-  - Currently there is a one-to-one mapping of indexed
-    term to term block, but we could decouple the two, ie,
-    put more terms into the index than there are blocks.
-    The index would take up more RAM but then it'd be able
-    to avoid seeking more often and could make PK/FuzzyQ
-    faster if the additional indexed terms could store
-    the offset into the terms block.
-
-  - The blocks are not written in true depth-first
-    order, meaning if you just next() the file pointer will
-    sometimes jump backwards. For example, block foo* will
-    be written before block f* because it finished before.
-    This could possibly hurt performance if the terms dict is
-    not hot, since OSs anticipate sequential file access. We
-    could fix the writer to re-order the blocks as a 2nd
-    pass.
-
-  - Each block encodes the term suffixes packed
-    sequentially using a separate vInt per term, which is
-    1) wasteful and 2) slow (must linear scan to find a
-    particular suffix). We should instead 1) make
-    random-access array so we can directly access the Nth
-    suffix, and 2) bulk-encode this array using bulk int[]
-    codecs; then at search time we can binary search when
-    we seek a particular term.
-*/
-
 /**
+ * Required for the writing side of ES812PostingsFormat. Based on Lucene 9.0 postings format, which encodes postings
+ * in packed integer blocks for fast decode.
+ *
  * Block-based terms index and dictionary writer.
  *
  * <p>Writes terms dict and index, block-encoding (column stride) each term's metadata for each set
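The second removed TODO notes that blocks are written in the order they finish, so block `foo*` lands on disk before block `f*`. The sketch below is a hypothetical model of that ordering, not the writer's actual code: the `Block` record and the `writeOrder` / `depthFirstOrder` helpers are illustrative names. It contrasts the single-pass emission order (children before their parent, i.e. post-order) with the parent-first order that a second re-ordering pass, as the TODO suggests, could produce.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of block write order; not the real BlockTree writer. */
public class BlockOrderSketch {

    /** Hypothetical block: a shared prefix plus the sub-blocks nested under it. */
    record Block(String prefix, List<Block> children) {
        static Block of(String prefix, Block... children) {
            return new Block(prefix, List.of(children));
        }
    }

    /** Single-pass order: a block is emitted only once all of its children have finished (post-order). */
    static List<String> writeOrder(Block root) {
        List<String> out = new ArrayList<>();
        postOrder(root, out);
        return out;
    }

    private static void postOrder(Block b, List<String> out) {
        for (Block child : b.children()) {
            postOrder(child, out);
        }
        out.add(b.prefix() + "*");
    }

    /** Order a second re-ordering pass could produce: each parent precedes its children (pre-order). */
    static List<String> depthFirstOrder(Block root) {
        List<String> out = new ArrayList<>();
        preOrder(root, out);
        return out;
    }

    private static void preOrder(Block b, List<String> out) {
        out.add(b.prefix() + "*");
        for (Block child : b.children()) {
            preOrder(child, out);
        }
    }

    public static void main(String[] args) {
        Block f = Block.of("f", Block.of("fa"), Block.of("fo", Block.of("foo")));
        System.out.println(writeOrder(f));      // [fa*, foo*, fo*, f*]
        System.out.println(depthFirstOrder(f)); // [f*, fa*, fo*, foo*]
    }
}
```

In this simplified model, enumerating terms in sorted order loads `f*` before descending into its sub-blocks, so with the post-order layout each descent seeks backwards in the file, while the pre-order layout lets the same scan move only forward.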

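The third removed TODO contrasts the current suffix layout (one vInt length per term, packed back to back, so reaching the Nth suffix needs a linear scan) with a random-access array that allows direct access and binary search. The following sketch illustrates both layouts under simplifying assumptions: `SuffixLayoutSketch`, `RandomAccessBlock`, and every helper method are hypothetical names, the offsets are kept as a plain `int[]` instead of being bulk-encoded with an int[] codec, and suffixes are compared as unsigned bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical comparison of the two suffix layouts described in the removed TODO. */
public class SuffixLayoutSketch {

    /** Writes a non-negative int as a vInt: 7 payload bits per byte, high bit set while more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Sequential layout: [vInt length][suffix bytes] for each term, back to back. */
    static byte[] encodeSequential(String[] suffixes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : suffixes) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            writeVInt(out, b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Reaching suffix n requires decoding every earlier length first: a linear scan. */
    static String nthSequential(byte[] block, int n) {
        int pos = 0;
        for (int i = 0; ; i++) {
            int len = 0;
            int shift = 0;
            byte b;
            do {
                b = block[pos++];
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (i == n) {
                return new String(block, pos, len, StandardCharsets.UTF_8);
            }
            pos += len; // skip over this suffix's bytes
        }
    }

    /** Random-access layout: suffix i occupies bytes[offsets[i], offsets[i + 1]). */
    record RandomAccessBlock(byte[] bytes, int[] offsets) {
        static RandomAccessBlock of(String[] sortedSuffixes) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int[] offsets = new int[sortedSuffixes.length + 1];
            for (int i = 0; i < sortedSuffixes.length; i++) {
                byte[] b = sortedSuffixes[i].getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
                offsets[i + 1] = offsets[i] + b.length;
            }
            return new RandomAccessBlock(out.toByteArray(), offsets);
        }

        /** Direct access to the Nth suffix: two offset reads, no scan. */
        String nth(int n) {
            return new String(bytes, offsets[n], offsets[n + 1] - offsets[n], StandardCharsets.UTF_8);
        }

        /** Binary search in unsigned byte order; returns the index, or -(insertionPoint + 1) if absent. */
        int seek(String target) {
            byte[] t = target.getBytes(StandardCharsets.UTF_8);
            int lo = 0;
            int hi = offsets.length - 2;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int cmp = Arrays.compareUnsigned(bytes, offsets[mid], offsets[mid + 1], t, 0, t.length);
                if (cmp < 0) {
                    lo = mid + 1;
                } else if (cmp > 0) {
                    hi = mid - 1;
                } else {
                    return mid;
                }
            }
            return -(lo + 1);
        }
    }

    public static void main(String[] args) {
        String[] suffixes = {"ar", "at", "oo", "ooz"};
        System.out.println(nthSequential(encodeSequential(suffixes), 2)); // oo
        RandomAccessBlock block = RandomAccessBlock.of(suffixes);
        System.out.println(block.nth(2));      // oo
        System.out.println(block.seek("ooz")); // 3
        System.out.println(block.seek("b"));   // -3
    }
}
```

`seek` relies on the suffixes inside a block being sorted; in this sketch that holds because terms arrive in sorted order and every term in a block shares the block's prefix, so stripping that prefix preserves the order.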

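The added Javadoc states that the Lucene 9.0 postings format encodes postings in packed integer blocks for fast decode. As a simplified, hypothetical illustration of that idea (not Lucene's actual encoder; `PackedBlockSketch` and its methods are made-up names), the code below delta-encodes a few doc IDs, picks one bit width for the whole block, and packs every value at that width into `long` words. Because all values share the same width, decoding needs no per-value length parsing, unlike the vInt scheme criticized in the removed TODO.

```java
import java.util.Arrays;

/** Simplified fixed-width bit packing; a hypothetical sketch, not Lucene's on-disk layout. */
public class PackedBlockSketch {

    /** Bits needed for the widest value; OR-ing gives the same highest set bit as the max for non-negative ints. */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs non-negative ints below 2^bitsPerValue, each using exactly bitsPerValue bits, into longs. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] packed = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) v) << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two words: its high bits go to the start of the next word.
                packed[word + 1] |= ((long) v) >>> (64 - shift);
            }
            bitPos += bitsPerValue;
        }
        return packed;
    }

    /** Decodes count values; every value has the same width, so positions are computed, not parsed. */
    static int[] unpack(long[] packed, int bitsPerValue, int count) {
        long mask = (1L << bitsPerValue) - 1;
        int[] out = new int[count];
        long bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long bits = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                bits |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (bits & mask);
            bitPos += bitsPerValue;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21};
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev; // gaps are smaller than raw IDs, so they need fewer bits
            prev = docIds[i];
        }
        int bits = bitsRequired(deltas); // largest gap is 12, so 4 bits
        long[] packed = pack(deltas, bits);
        System.out.println(bits + " bits/value, " + packed.length + " long(s)"); // 4 bits/value, 1 long(s)
        System.out.println(Arrays.toString(unpack(packed, bits, deltas.length))); // [3, 4, 1, 12, 1]
    }
}
```

Real postings formats pick their own block sizes and a more elaborate layout; this sketch only shows why a fixed width per block makes decoding a tight loop.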