[BUG] org.apache.lucene.analysis.TokenStream.end() can slow down elemental

**Describe the bug**

Lucene fixed a performance bug: https://issues.apache.org/jira/browse/LUCENE-7419

The report says that TokenStream.end() is quite costly, and the reporter attributes the cost to the getAttribute() call inside end(). The issue is marked as a blocker. According to the bug report, it affects 5.5.5, 6.2, and 7.0. The buggy end() method is as follows:

  

>     public void end() throws IOException {
>       clearAttributes(); // LUCENE-3849: don't consume dirty atts
>       PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
>       if (posIncAtt != null) {
>         posIncAtt.setPositionIncrement(0);
>       }
>     }

Elemental uses Lucene 4.10.4. I checked the Lucene 4.10.4 source, and its end() method is identical to the buggy code:


>     public void end() throws IOException {
>       clearAttributes(); // LUCENE-3849: don't consume dirty atts
>       PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
>       if (posIncAtt != null) {
>         posIncAtt.setPositionIncrement(0);
>       }
>     }

As a result, this bug should also affect 4.10.4.

**To Reproduce**

In the Lucene bug report (LUCENE-7419), Michael McCandless mentioned that this bug was found by Elasticsearch:

"This is the apparent source of the very unexpected slowdown here: https://github.com/elastic/elasticsearch/pull/19867#issuecomment-240841821"

He also explained how to reproduce the bug.


Elemental calls the buggy method at the following locations:

   <--XMLToQuery.phraseQuery
   <--XMLToQuery.nearQuery
   <--XMLToQuery.getTerm
   <--MarkableTokenFilter.incrementToken
   <--MarkableTokenFilter.incrementToken
   <--RangeIndexWorker.analyzeContent

The Lucene bug is fixed in 6.2.
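
To illustrate the pattern under discussion, here is a minimal, self-contained Java sketch. It does not use Lucene: ToySource, ToyAttribute, and ToyPosInc are hypothetical stand-ins that I made up for this issue. endWithLookup() mirrors the end() method quoted above (clear everything, then look up one attribute by class on every call). endWithoutLookup() lets each attribute reset itself for end-of-stream in a single pass, which is my understanding of the direction the upstream fix took. Please treat it as a sketch under those assumptions, not as Lucene's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EndSketch {

    /** Hypothetical stand-in for an attribute implementation (not a Lucene class). */
    public static class ToyAttribute {
        public void clear() {}
        // By default, ending a stream simply clears the attribute.
        public void end() { clear(); }
    }

    /** Hypothetical position-increment attribute: default 1, but 0 at end-of-stream. */
    public static class ToyPosInc extends ToyAttribute {
        public int positionIncrement = 1;
        @Override public void clear() { positionIncrement = 1; }
        @Override public void end() { positionIncrement = 0; }
    }

    /** Hypothetical attribute container that counts by-class lookups. */
    public static class ToySource {
        private final Map<Class<?>, ToyAttribute> attributes = new LinkedHashMap<>();
        public int lookups = 0;

        public <T extends ToyAttribute> T add(T att) {
            attributes.put(att.getClass(), att);
            return att;
        }

        public <T extends ToyAttribute> T get(Class<T> type) {
            lookups++; // every call pays for a map lookup
            return type.cast(attributes.get(type));
        }

        // Pattern from the quoted end(): clear all, then fetch one attribute by class.
        public void endWithLookup() {
            for (ToyAttribute a : attributes.values()) a.clear();
            ToyPosInc posInc = get(ToyPosInc.class);
            if (posInc != null) posInc.positionIncrement = 0;
        }

        // Alternative: one pass, each attribute sets its own end-of-stream state.
        public void endWithoutLookup() {
            for (ToyAttribute a : attributes.values()) a.end();
        }
    }

    public static void main(String[] args) {
        ToySource source = new ToySource();
        ToyPosInc posInc = source.add(new ToyPosInc());

        for (int i = 0; i < 1000; i++) source.endWithLookup();
        System.out.println("lookups after endWithLookup: " + source.lookups);

        int before = source.lookups;
        for (int i = 0; i < 1000; i++) source.endWithoutLookup();
        System.out.println("extra lookups after endWithoutLookup: " + (source.lookups - before));
        System.out.println("positionIncrement at end: " + posInc.positionIncrement);
    }
}
```

Both variants leave the position increment at 0, but only the first one performs a by-class lookup per call. Whether that lookup is the dominant cost in Elemental's call sites listed above would still need to be measured.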

