Skip to content

Conversation

@javanna
Copy link
Member

@javanna javanna commented Mar 4, 2025

The old lucene versions plugin allows users to read indices created by ancient Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene 6.x, some special logic is required around postings format support. That revolves around reading of FSTs, but has a consequence of requiring quite a few fork of other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes. It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is and some other classes are only needed in tests hence are moved to the test folder.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Mar 4, 2025
throw new IllegalArgumentException(formatName);
}

private class FieldsWriter extends FieldsConsumer {
Copy link
Member Author

@javanna javanna Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to have a FieldsWriter that throws exceptions, we can simply throw exception directly when a fields writer is currently requested, as that's only used for writing.

* Holds all state required for {@link Lucene50PostingsReader} to produce a {@link
* org.apache.lucene.index.PostingsEnum} without re-seeking the terms dict.
*/
public static final class IntBlockTermState extends BlockTermState {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is public, we can rely on the Lucene original variant where needed (only in tests with the removal of Lucene50PostingsReader)

The old lucene versions plugin allows users to read indices created by ancient
Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene
6.x, some special logic is required around postings format support. That revolves
around reading of FSTs, but has a consequence of requiring quite a few fork of
other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes.
It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is
and some other classes are only needed in tests hence are moved to the test folder.
@javanna javanna force-pushed the refactoring/old_lucene_versions_legacy_adapting_docs branch from f08c513 to d56f308 Compare March 5, 2025 07:57
@javanna
Copy link
Member Author

javanna commented Mar 5, 2025

@drempapis if it makes reviewing easier I can split this into a couple smaller PRs, let me know what you think.

*/
public abstract class LegacyAdaptingPerFieldPostingsFormat extends PostingsFormat {
@UpdateForV10(owner = UpdateForV10.Owner.SEARCH_FOUNDATIONS)
public final class LegacyAdaptingPerFieldPostingsFormat extends PostingsFormat {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this no longer needs to be abstract and subclassed, because the two only subclasses we had did exactly the same thing in their getPostingsFormat method:

        if (formatName.equals("Lucene50")) {
            return new BWCLucene50PostingsFormat();
        } else {
            return new BWCCodec.EmptyPostingsFormat();
        }     

This logic is now incorporated in this class.

* </ul>
* </dl>
* This is a fork of {@link org.apache.lucene.backward_codecs.lucene50.Lucene50PostingsFormat} that allows to read from indices
* written by versions older than Lucene 7.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good entry!

public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
PostingsReaderBase postingsReader = new Lucene50PostingsReader(state);
// The postings reader is the original one from Lucene 5.0
PostingsReaderBase postingsReader = new org.apache.lucene.backward_codecs.lucene50.Lucene50PostingsReader(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't an import be used here? The xpack equivalent class has been deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I am not sure why I did not do so, fixing

Copy link
Contributor

@drempapis drempapis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@javanna javanna added the auto-backport Automatically create backport pull requests when merged label Mar 10, 2025
@javanna javanna merged commit 03d8bdf into elastic:main Mar 10, 2025
17 checks passed
@javanna javanna deleted the refactoring/old_lucene_versions_legacy_adapting_docs branch March 10, 2025 19:00
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
…tic#124053)

The old lucene versions plugin allows users to read indices created by ancient
Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene
6.x, some special logic is required around postings format support. That revolves
around reading of FSTs, but has a consequence of requiring quite a few fork of
other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes.
It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is
and some other classes are only needed in tests hence are moved to the test folder.
javanna added a commit that referenced this pull request Mar 11, 2025
…#124053) (#124509)

* Docs and simplifications to support for Lucene ancient versions (#124053)

The old lucene versions plugin allows users to read indices created by ancient
Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene
6.x, some special logic is required around postings format support. That revolves
around reading of FSTs, but has a consequence of requiring quite a few fork of
other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes.
It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is
and some other classes are only needed in tests hence are moved to the test folder.

* iter

* iter
albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Mar 13, 2025
…tic#124053)

The old lucene versions plugin allows users to read indices created by ancient
Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene
6.x, some special logic is required around postings format support. That revolves
around reading of FSTs, but has a consequence of requiring quite a few fork of
other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes.
It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is
and some other classes are only needed in tests hence are moved to the test folder.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
…tic#124053)

The old lucene versions plugin allows users to read indices created by ancient
Elasticsearch version, starting from 5.0. Especially for 5.x which relied on Lucene
6.x, some special logic is required around postings format support. That revolves
around reading of FSTs, but has a consequence of requiring quite a few fork of
other Lucene classes due to their visibility.

This commit attempts to add javadocs to clarify the intent of some of these classes.
It also includes some simplifications, in that Lucene50PostingsReader can be reused as-is
and some other classes are only needed in tests hence are moved to the test folder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >refactoring :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants