Describe the bug
Lucene fixed a performance bug: https://issues.apache.org/jira/browse/LUCENE-7419
The report states that TokenStream.end() is quite costly, and the reporter attributes the cost to the getAttribute() call inside end(). The issue is marked as a Blocker and, according to the report, affects 5.5.5, 6.2, and 7.0. The buggy end() method is as follows:
    public void end() throws IOException {
        clearAttributes(); // LUCENE-3849: don't consume dirty atts
        PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
        if (posIncAtt != null) {
            posIncAtt.setPositionIncrement(0);
        }
    }
Elemental uses Lucene 4.10.4. I checked the 4.10.4 source code, and its end() method is identical to the buggy code:
    public void end() throws IOException {
        clearAttributes(); // LUCENE-3849: don't consume dirty atts
        PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
        if (posIncAtt != null) {
            posIncAtt.setPositionIncrement(0);
        }
    }
As a result, this bug should also affect 4.10.4.
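To make the cost pattern concrete, here is a simplified, self-contained model. It is not Lucene code: EndCostSketch, Source, PosIncAttr, OffsetAttr, endWithLookup, and endWithHook are all hypothetical names, and modelling the attribute lookup as a map access is an assumption about where the cost comes from. It contrasts an end() that looks up an attribute on every call with one that lets each attribute reset itself. The actual patch attached to LUCENE-7419 may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EndCostSketch {

    /** Hypothetical stand-in for an attribute; not Lucene's AttributeImpl. */
    interface Attr {
        void clear(); // reset to the per-token default
        void end();   // reset to the end-of-stream value
    }

    static final class PosIncAttr implements Attr {
        int positionIncrement = 1;
        @Override public void clear() { positionIncrement = 1; }
        @Override public void end() { positionIncrement = 0; }
    }

    static final class OffsetAttr implements Attr {
        int startOffset;
        int endOffset;
        @Override public void clear() { startOffset = 0; endOffset = 0; }
        @Override public void end() { clear(); }
    }

    /** Hypothetical stand-in for an attribute source; counts lookups by class. */
    static final class Source {
        private final Map<Class<? extends Attr>, Attr> attrs = new LinkedHashMap<>();
        int lookups = 0;

        <T extends Attr> T add(Class<T> type, T attr) {
            attrs.put(type, attr);
            return attr;
        }

        <T extends Attr> T get(Class<T> type) {
            lookups++; // every call pays for a map lookup
            return type.cast(attrs.get(type));
        }

        /** Same shape as the reported end(): clear everything, then look up one attribute. */
        void endWithLookup() {
            for (Attr a : attrs.values()) a.clear();
            PosIncAttr posInc = get(PosIncAttr.class);
            if (posInc != null) {
                posInc.positionIncrement = 0;
            }
        }

        /** Alternative: each attribute knows its own end-of-stream value, so no lookup. */
        void endWithHook() {
            for (Attr a : attrs.values()) a.end();
        }
    }

    private static Source newSource() {
        Source s = new Source();
        s.add(PosIncAttr.class, new PosIncAttr());
        s.add(OffsetAttr.class, new OffsetAttr());
        return s;
    }

    /** Number of by-class lookups performed by `calls` invocations of end(). */
    static int lookupsFor(boolean useHook, int calls) {
        Source s = newSource();
        for (int i = 0; i < calls; i++) {
            if (useHook) s.endWithHook(); else s.endWithLookup();
        }
        return s.lookups;
    }

    /** Position increment left behind after a single end() call. */
    static int posIncAfter(boolean useHook) {
        Source s = newSource();
        if (useHook) s.endWithHook(); else s.endWithLookup();
        int before = s.lookups;
        int value = s.get(PosIncAttr.class).positionIncrement;
        s.lookups = before; // do not count this inspection
        return value;
    }

    public static void main(String[] args) {
        System.out.println("lookup-based end(), 1000 calls: " + lookupsFor(false, 1000) + " lookups");
        System.out.println("hook-based end(), 1000 calls: " + lookupsFor(true, 1000) + " lookups");
    }
}
```

Both variants leave the position increment at 0. Only the lookup-based variant pays for one lookup per end() call, which is the shape of the code in 4.10.4 quoted above.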
To Reproduce
In the Lucene bug report (LUCENE-7419), Michael McCandless mentioned that the bug was found through Elasticsearch:
"This is the apparent source of the very unexpected slowdown here: elastic/elasticsearch#19867 (comment)"
He also explained how to reproduce it.
Elemental calls the buggy method at the following locations:
<--XMLToQuery.phraseQuery
<--XMLToQuery.nearQuery
<--XMLToQuery.getTerm
<--MarkableTokenFilter.incrementToken
<--MarkableTokenFilter.incrementToken
<--RangeIndexWorker.analyzeContent
The Lucene bug is fixed in 6.2.