Closed
Description
It seems that #12699 has inadvertently broken reading term dictionaries created in Lucene 9.8 or earlier.
To reproduce the bug, index wikibigall with LuceneUtil on Lucene 9.8 and force-merge.
Then attempt to read the created index using a wildcard query:
Path path = Paths.get("/data/local/lucene/indices/wikibigall.lucene-main.opt.Lucene90.dvfields.nd6.72652M/index");
try (FSDirectory dir = FSDirectory.open(path);
     DirectoryReader reader = DirectoryReader.open(dir)) {
  IndexSearcher searcher = new IndexSearcher(reader);
  searcher.count(new WildcardQuery(new Term("body", "*fo*")));
}
This results in a stack trace similar to the following:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
at org.apache.lucene.store.ByteArrayDataInput.readByte(ByteArrayDataInput.java:136)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:110)
at org.apache.lucene.store.ByteArrayDataInput.readVInt(ByteArrayDataInput.java:114)
at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame.load(IntersectTermsEnumFrame.java:158)
at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame.load(IntersectTermsEnumFrame.java:149)
at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.pushFrame(IntersectTermsEnum.java:203)
at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum._next(IntersectTermsEnum.java:531)
at org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:373)
at org.apache.lucene.search.MultiTermQueryConstantScoreBlendedWrapper$1.rewriteInner(MultiTermQueryConstantScoreBlendedWrapper.java:111)
at org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper$RewritingWeight.rewrite(AbstractMultiTermQueryConstantScoreWrapper.java:179)
at org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper$RewritingWeight.bulkScorer(AbstractMultiTermQueryConstantScoreWrapper.java:220)
at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:930)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:678)
at org.apache.lucene.search.IndexSearcher.lambda$4(IndexSearcher.java:636)
at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$0(TaskExecutor.java:118)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at org.apache.lucene.search.TaskExecutor$TaskGroup.invokeAll(TaskExecutor.java:153)
at org.apache.lucene.search.TaskExecutor.invokeAll(TaskExecutor.java:76)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:640)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:607)
at org.apache.lucene.search.IndexSearcher.count(IndexSearcher.java:423)
at Corruption.main(Corruption.java:18)
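The top frames show readVInt asking for a fourth byte from a 3-byte array. As a rough illustration of how a variable-length int decoder can run off the end of a buffer when the reader expects more data than the writer stored, here is a standalone sketch. It is a simplified stand-in written for this report, not Lucene's ByteArrayDataInput; the class name VIntSketch and the byte values are made up, and it does not claim to show the actual root cause.

```java
public class VIntSketch {

    // Decodes one variable-length int starting at `start`: 7 payload bits per
    // byte, low-order group first, high bit set means "another byte follows".
    // Hypothetical helper for illustration only.
    public static int readVInt(byte[] bytes, int start) {
        int pos = start;
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = bytes[pos++]; // plain array access, no explicit bounds check
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: this was the last byte
            }
        }
        throw new IllegalStateException("VInt longer than 5 bytes");
    }

    public static void main(String[] args) {
        // A well-formed two-byte VInt: the decoder stops at the second byte.
        byte[] complete = {(byte) 0x85, 0x01};
        System.out.println(readVInt(complete, 0));

        // Every byte claims that another one follows, so the decoder walks
        // past the end of the 3-byte array and the array access throws.
        byte[] truncated = {(byte) 0x80, (byte) 0x80, (byte) 0x80};
        try {
            readVInt(truncated, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ran off the buffer: " + e.getMessage());
        }
    }
}
```

The same kind of failure would be expected if the reading code assumes an extra field that older writers never emitted, which is one possible reading of the trace above.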
We are currently not sure whether this also affects indices created with Lucene 9.9 and read with Lucene 9.9.
EDIT: This failure does NOT occur for indices created by 9.9 and read by 9.9.
NOTE: This also fails with just a prefix wildcard query. It seems that all multi-term queries could be affected.
Will provide more example stack traces in issue comments.
Version and environment details
Lucene 9.9 reading Lucene 9.8 indices.