@burqen burqen commented Nov 12, 2025

Previously both DeduplicatingFieldInfosFormat and TSDBSyntheticIdCodec.RewriteFieldInfosFormat would iterate over FieldInfos.

This is now optimised by letting TSDBSyntheticIdCodec.RewriteFieldInfosFormat extend DeduplicatingFieldInfosFormat in order to let RewriteFieldInfosFormat utilise the iteration done by DeduplicatingFieldInfosFormat.

Also let TSDBSyntheticIdCodec extend DeduplicateFieldInfosCodec so that we can simplify the codec wrapping to always use only one of them.

Additional changes:

  • Elasticsearch versioned codecs are now always wrapped by DeduplicateFieldInfosCodec instead of extending it. This makes it possible for TSDBSyntheticIdCodec to extend DeduplicateFieldInfosCodec and, hopefully as a side effect, makes the codecs easier to reason about.
  • Move DeduplicateFieldInfosCodec to upper level.
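
The single-loop idea described above can be sketched as a template method: the base format owns the only iteration over the field infos and calls a per-field hook that a subclass overrides. This is a minimal sketch, not the PR's code. `FieldInfo`, `DeduplicatingFormat`, `RewritingFormat`, the `synthetic` attribute, and the `iterations` counter are simplified, hypothetical stand-ins; only the hook name `processFieldInfo` is taken from the diff under review.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SingleLoopSketch {

    // Hypothetical stand-in for Lucene's FieldInfo: a name plus string attributes.
    record FieldInfo(String name, Map<String, String> attributes) {}

    // Stand-in for the deduplicating format: it owns the only loop over the fields.
    static class DeduplicatingFormat {
        private final Map<String, String> pool = new HashMap<>();
        int iterations = 0; // counts visited fields, for demonstration only

        final List<FieldInfo> read(List<FieldInfo> raw) {
            List<FieldInfo> result = new ArrayList<>(raw.size());
            for (FieldInfo fi : raw) {
                iterations++;
                Map<String, String> attrs = new HashMap<>();
                fi.attributes().forEach((k, v) -> attrs.put(dedup(k), dedup(v)));
                // Subclasses get their turn inside the same iteration.
                result.add(processFieldInfo(new FieldInfo(dedup(fi.name()), attrs)));
            }
            return result;
        }

        // Returns the pooled instance of an equal string seen earlier, if any.
        private String dedup(String s) {
            return pool.computeIfAbsent(s, k -> k);
        }

        // Hook for subclasses; the default leaves the field untouched.
        protected FieldInfo processFieldInfo(FieldInfo fi) {
            return fi;
        }
    }

    // Stand-in for the rewriting format: no second loop, only the hook.
    static class RewritingFormat extends DeduplicatingFormat {
        @Override
        protected FieldInfo processFieldInfo(FieldInfo fi) {
            if (!fi.name().equals("_id")) {
                return fi;
            }
            Map<String, String> attrs = new HashMap<>(fi.attributes());
            attrs.put("synthetic", "true"); // hypothetical rewrite
            return new FieldInfo(fi.name(), attrs);
        }
    }

    public static void main(String[] args) {
        RewritingFormat format = new RewritingFormat();
        List<FieldInfo> out = format.read(List.of(
            new FieldInfo("_id", Map.of()),
            new FieldInfo("host.name", Map.of("kind", "keyword"))));
        System.out.println(out);
        System.out.println("fields visited: " + format.iterations);
    }
}
```

With two input fields the counter ends at 2, whereas two independent formats that each looped over the fields would visit them four times in total.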

No codec should extend DeduplicateFieldInfosCodec; instead, always wrap
it.
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @burqen, I've created a changelog YAML for you.

@burqen burqen changed the title from "Single loop for FielfInfo processing" to "Single loop for FieldfInfo processing" Nov 12, 2025
@burqen burqen requested review from fcofdez and tlrx November 12, 2025 14:43
Member

@tlrx tlrx left a comment

LGTM

return codec;
String name = e.getValue().getName();
Codec codec = e.getValue();
return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(name, codec) : new DeduplicateFieldInfosCodec(name, codec);
Member

nit:

Suggested change
return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(name, codec) : new DeduplicateFieldInfosCodec(name, codec);
return useTsdbSyntheticId ? new TSDBSyntheticIdCodec(codec) : new DeduplicateFieldInfosCodec(codec);

Comment on lines 22 to 25
protected DeduplicateFieldInfosCodec(String name, Codec delegate) {
super(name, delegate);
this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat());
}
Member

Suggested change
protected DeduplicateFieldInfosCodec(String name, Codec delegate) {
super(name, delegate);
this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat());
}
protected DeduplicateFieldInfosCodec(Codec delegate) {
super(delegate.getName(), delegate);
this.fieldInfosFormat = createFieldInfosFormat(delegate.fieldInfosFormat());
}


protected void validateFieldInfos(FieldInfos fieldInfos) {}

protected FieldInfo processFieldInfo(FieldInfo fi) {
Member

nit:

Suggested change
protected FieldInfo processFieldInfo(FieldInfo fi) {
protected FieldInfo wrapFieldInfo(FieldInfo fi) {


private static Codec unwrappedCodec(CodecService codecService, String codecName) {
Codec codec = codecService.codec(codecName);
if (codec instanceof DeduplicateFieldInfosCodec deduplicatingCodec) {
Member

I think this can be:

Suggested change
if (codec instanceof DeduplicateFieldInfosCodec deduplicatingCodec) {
if (codec instanceof FilterCodec filtered) {

Contributor Author

Unfortunately not, the delegate is hidden from us there.

Contributor

@fcofdez fcofdez left a comment

I'm afraid that we should revert the change around not extending DeduplicateFieldInfosCodec in the default codecs. The reason is that Lucene would use SPI to load the Codec and it will just instantiate the codec with the no-args constructor and thus we won't get to deduplicate the fields. This only applies when a node is restarted for example and we need to read the codec from the SegmentInfo and I guess that it applies to search nodes in serverless too.

@burqen
Contributor Author

burqen commented Nov 13, 2025

> I'm afraid that we should revert the change around not extending DeduplicateFieldInfosCodec in the default codecs. The reason is that Lucene would use SPI to load the Codec and it will just instantiate the codec with the no-args constructor and thus we won't get to deduplicate the fields. This only applies when a node is restarted for example and we need to read the codec from the SegmentInfo and I guess that it applies to search nodes in serverless too.

I can see that Lucene is used to load the codecs here, but they will all be wrapped afterwards. Is there some other place as well where the codecs are service loaded outside of CodecService? https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/codec/CodecService.java#L66-L69

@fcofdez
Contributor

fcofdez commented Nov 13, 2025

> Is there some other place as well where the codecs are service loaded outside of CodecService?

Yes, in Lucene, when a commit is read from disk (see https://github.com/apache/lucene/blob/e02bdb4c3c547488342b423e1b9b2b25519bd427/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L409-L412). We kind of rely implicitly that for reading a segment with a particular codec we can use the SPI loaded codec.
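
The concern above can be illustrated with a toy model of name-based lookup. This is a minimal sketch under stated assumptions: every type below is a hypothetical stand-in, and the `SPI` map only imitates a service registry that instantiates codecs through their no-arg constructors. It is not Lucene's actual loader, which may differ in details such as instance caching.

```java
import java.util.Map;
import java.util.function.Supplier;

class SpiLookupSketch {

    // Hypothetical stand-in for a codec; deduplicates() marks the behaviour under discussion.
    interface Codec {
        String name();
        boolean deduplicates();
    }

    // A plain codec with a no-arg constructor, as a service registry would require.
    static class BaseCodec implements Codec {
        public BaseCodec() {}
        @Override public String name() { return "Base"; }
        @Override public boolean deduplicates() { return false; }
    }

    // Wrapping approach: deduplication lives in a wrapper that keeps the delegate's name.
    static class DedupWrapper implements Codec {
        private final Codec delegate;
        DedupWrapper(Codec delegate) { this.delegate = delegate; }
        @Override public String name() { return delegate.name(); }
        @Override public boolean deduplicates() { return true; }
    }

    // Extending approach: deduplication is part of the class that the registry instantiates.
    static class ExtendingCodec implements Codec {
        public ExtendingCodec() {}
        @Override public String name() { return "Extending"; }
        @Override public boolean deduplicates() { return true; }
    }

    // Imitation of a name -> no-arg-constructor registry (an assumption, not Lucene's loader).
    static final Map<String, Supplier<Codec>> SPI = Map.of(
        "Base", BaseCodec::new,
        "Extending", ExtendingCodec::new);

    static Codec forName(String name) {
        Supplier<Codec> factory = SPI.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown codec: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        Codec writer = new DedupWrapper(new BaseCodec());
        String recorded = writer.name();   // what segment metadata would store
        Codec reader = forName(recorded);  // what a restart would resolve by name
        System.out.println("writer deduplicates: " + writer.deduplicates());
        System.out.println("reader deduplicates: " + reader.deduplicates());
        System.out.println("extending codec by name: " + forName("Extending").deduplicates());
    }
}
```

In this model the wrapper exists only on the write path: the segment records the name "Base", and a lookup by that name yields the bare codec. A codec that has the behaviour built into its own class keeps it after lookup.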

@burqen
Contributor Author

burqen commented Nov 13, 2025

> > Is there some other place as well where the codecs are service loaded outside of CodecService?

> Yes, in Lucene, when a commit is read from disk (see https://github.com/apache/lucene/blob/e02bdb4c3c547488342b423e1b9b2b25519bd427/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L409-L412). We kind of rely implicitly that for reading a segment with a particular codec we can use the SPI loaded codec.

Got it! Thanks for pointing that out. I'll revert and find some other approach.
