Contributor

@Kubik42 Kubik42 commented Jul 31, 2025

[DRAFT] (no tests yet)

Consider the following edge case:

"os": {
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "foo": {
          "type": "keyword",
          "ignore_above": 100
        }
      }
    }
  }
}

If synthetic source is enabled, we'll have name.foo track source (current behavior). If a value in name.foo trips ignore_above, we'll create an extra StoredField so that name.foo can continue tracking source.

This, however, becomes a problem when there are multiple keyword multi fields and all of them trip ignore_above. In that case, each of them creates an additional StoredField, so we end up storing the source multiple times, which is not space efficient.

This change designates one of the keyword multi fields as the one responsible for tracking source. Upon tripping ignore_above, this field will create a StoredField (same behavior as now), while the remaining keyword multi fields will not (new behavior).
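To make the before/after difference concrete, here is a minimal toy sketch. It does not use any Elasticsearch classes: `KeywordSubField`, `fallbackWritersBefore`, and `fallbackWritersAfter` are hypothetical names invented for illustration, and picking the first sub-field as the designated one is an assumption, since the description above does not say how the designated field is chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Elasticsearch classes. */
final class DesignatedSourceTrackerSketch {

    /** Hypothetical stand-in for a keyword multi field with an ignore_above limit. */
    static final class KeywordSubField {
        final String name;
        final int ignoreAbove;

        KeywordSubField(String name, int ignoreAbove) {
            this.name = name;
            this.ignoreAbove = ignoreAbove;
        }

        /** A value trips ignore_above when it is longer than the limit. */
        boolean trips(String value) {
            return value.length() > ignoreAbove;
        }
    }

    /** Current behavior: every sub-field that trips ignore_above writes its own fallback copy. */
    static List<String> fallbackWritersBefore(List<KeywordSubField> fields, String value) {
        List<String> writers = new ArrayList<>();
        for (KeywordSubField field : fields) {
            if (field.trips(value)) {
                writers.add(field.name);
            }
        }
        return writers;
    }

    /** Proposed behavior: only the designated sub-field may write a fallback copy. */
    static List<String> fallbackWritersAfter(List<KeywordSubField> fields, String value) {
        List<String> writers = new ArrayList<>();
        if (fields.isEmpty()) {
            return writers;
        }
        // Assumption: the first sub-field is the designated one; the PR text does not state the rule.
        KeywordSubField designated = fields.get(0);
        if (designated.trips(value)) {
            writers.add(designated.name);
        }
        return writers;
    }

    public static void main(String[] args) {
        List<KeywordSubField> fields = Arrays.asList(
            new KeywordSubField("name.foo", 3),
            new KeywordSubField("name.bar", 5)
        );
        String value = "a long operating system name";
        System.out.println(fallbackWritersBefore(fields, value)); // [name.foo, name.bar]
        System.out.println(fallbackWritersAfter(fields, value));  // [name.foo]
    }
}
```

In this toy model the value exceeds both limits, so the current behavior yields two fallback copies while the proposed behavior yields one, written by the designated field only.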

@lkts changed the title from "[DRAFT] Dont track source for keyworld fields that are multi-field, unless psecifically designated so" to "[DRAFT] Dont track source for keyworld fields that are multi-field, unless specifically designated so" on Jul 31, 2025
/**
* Returns whether the given builder is synthetic source compatible. To be compatible, the builder must
* track source in some way, whether that be via the "store" param or using doc values.
*/
private boolean isSyntheticSourceCompatible(KeywordFieldMapper.Builder keywordBuilder) {
Contributor

Do you think we can make MultiFields.Builder an interface and do this stuff in a decorator or something inside TextFieldMapper? I know we have hasSyntheticSourceCompatibleKeywordField already but this is growing more and more.

Contributor Author

I agree that it's growing too much. I was also thinking about moving this logic somewhere closer to where it belongs. I'll see what I can do.

/**
* Returns whether the given builder should be used to track source for synthetic source purposes.
*/
private boolean shouldTrackSource(KeywordFieldMapper.Builder keywordBuilder) {
Contributor

I am struggling with this name. Isn't this something like isUsedForSyntheticSourceByParent?

Contributor Author

Yeah... my thinking was to not expose this edge case as much through naming, but I can also see someone misunderstanding what this is used for. I'll change it.

);
}

public Builder shouldTrackSourceForSyntheticSource(boolean shouldTrackSourceForSyntheticSource) {
Contributor

Same - usedByParentFieldForSyntheticSource?
