comment

ChrisHegarty · ChrisHegarty · commit afa5b3449b1a · 2025-05-23T09:21:03.000+01:00
diff --git a/server/src/main/java/org/elasticsearch/index/codec/postings/Lucene90BlockTreeTermsWriter.java b/server/src/main/java/org/elasticsearch/index/codec/postings/Lucene90BlockTreeTermsWriter.java
@@ -68,37 +68,10 @@
 import static org.apache.lucene.backward_codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.TERMS_META_EXTENSION;
 import static org.apache.lucene.util.fst.FSTCompiler.getOnHeapReaderWriter;
 
-/*
-  TODO:
-
-    - Currently there is a one-to-one mapping of indexed
-      term to term block, but we could decouple the two, ie,
-      put more terms into the index than there are blocks.
-      The index would take up more RAM but then it'd be able
-      to avoid seeking more often and could make PK/FuzzyQ
-      faster if the additional indexed terms could store
-      the offset into the terms block.
-
-    - The blocks are not written in true depth-first
-      order, meaning if you just next() the file pointer will
-      sometimes jump backwards.  For example, block foo* will
-      be written before block f* because it finished before.
-      This could possibly hurt performance if the terms dict is
-      not hot, since OSs anticipate sequential file access.  We
-      could fix the writer to re-order the blocks as a 2nd
-      pass.
-
-    - Each block encodes the term suffixes packed
-      sequentially using a separate vInt per term, which is
-      1) wasteful and 2) slow (must linear scan to find a
-      particular suffix).  We should instead 1) make
-      random-access array so we can directly access the Nth
-      suffix, and 2) bulk-encode this array using bulk int[]
-      codecs; then at search time we can binary search when
-      we seek a particular term.
-*/
-
 /**
+ * Required for the writing side of ES812PostingsFormat. Based on Lucene 9.0 postings format, which encodes postings
+ * in packed integer blocks for fast decode.
+ *
  * Block-based terms index and dictionary writer.
  *
  * <p>Writes terms dict and index, block-encoding (column stride) each term's metadata for each set