

Kubik42
Contributor

@Kubik42 Kubik42 commented Sep 5, 2025

This addresses #134096

More specifically, this PR does the following:

  • refactors how ignore_above is initialized. FieldTypes and FieldMappers no longer hold a plain integer for ignore_above, because that has proven error-prone. For example, in Text fields incorrectly fallback to source in block loader for logsdb indices #134096, we checked against the wrong default value. ignore_above is now represented by a new class, IgnoreAbove, which doesn't allow the default value to be set directly and instead infers it from IndexMode and IndexVersion.
  • introduces the IgnoreAbove class, which provides utility functions such as isIgnored() and isSet() that are reused across several field mappers. This eliminates the need to store ignoreAboveDefault and removes duplicated code such as string.length() > ignoreAbove
  • introduces TextRollingUpgradeIT, which is a nearly identical copy of MatchOnlyTextRollingUpgradeIT. This BWC test was previously missing, hence its addition. The addition of this test also helps validate Don't store keyword multi fields when they trip ignore_above #132962
  • adds some unit test coverage, especially around the use of syntheticSourceDelegate inside of blockLoader(). See my comments for more details
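A minimal sketch of the IgnoreAbove idea described above, for illustration only. The class name IgnoreAboveSketch, the simplified Mode enum standing in for IndexMode, and the default values (no limit for standard indices, 8191 for LogsDB) are assumptions rather than the merged Elasticsearch code, and the IndexVersion check is left out.

```java
/**
 * Hypothetical sketch of an IgnoreAbove-style value object; not the actual
 * Elasticsearch class. The enum, defaults and method bodies are assumptions.
 */
public final class IgnoreAboveSketch {

    /** Simplified stand-in for IndexMode (assumption). */
    enum Mode { STANDARD, LOGSDB }

    // Assumed defaults: effectively unlimited for standard indices, 8191 for LogsDB.
    static final int DEFAULT = Integer.MAX_VALUE;
    static final int DEFAULT_LOGSDB = 8191;

    private final Integer value;     // explicitly configured value, or null
    private final int defaultValue;  // inferred; callers cannot pass it in

    IgnoreAboveSketch(Integer value, Mode mode) {
        this.value = value;
        // The PR also takes the index created version into account; omitted here.
        this.defaultValue = mode == Mode.LOGSDB ? DEFAULT_LOGSDB : DEFAULT;
    }

    /** Effective limit: the configured value if present, otherwise the inferred default. */
    int get() {
        return value != null ? value : defaultValue;
    }

    /** Stands in for ad hoc "ignoreAbove != ignoreAboveDefault" checks. */
    boolean isSet() {
        return get() != defaultValue;
    }

    /** Stands in for ad hoc "string.length() > ignoreAbove" checks. */
    boolean isIgnored(String s) {
        return s.length() > get();
    }

    public static void main(String[] args) {
        IgnoreAboveSketch unset = new IgnoreAboveSketch(null, Mode.LOGSDB);
        IgnoreAboveSketch explicit = new IgnoreAboveSketch(10, Mode.STANDARD);
        System.out.println(unset.get() + " " + unset.isSet());                          // 8191 false
        System.out.println(explicit.isSet() + " " + explicit.isIgnored("hello world")); // true true
    }
}
```

With this definition, an explicit value that happens to equal the inferred default reports isSet() == false, exactly like the comparison it replaces.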

@elasticsearchmachine
Collaborator

Hi @Kubik42, I've created a changelog YAML for you.

@Kubik42 Kubik42 force-pushed the kubik42-isignoreaboveset branch from f0bcd32 to 8843756 Compare September 5, 2025 23:22
}

public boolean isIgnoreAboveSet() {
    return ignoreAbove != ignoreAboveDefaultValue;
}
Contributor Author

@Kubik42 Kubik42 Sep 5, 2025


I'm not a huge fan of storing ignoreAboveDefaultValue, but it's better than the alternatives:

  • add indexMode and indexCreatedVersion as class fields in KeywordFieldType and then call getIgnoreAboveDefaultValue() every time isIgnoreAboveSet() is called
  • pass indexMode and indexCreatedVersion as parameters into isIgnoreAboveSet()
  • decide whether ignore_above is set in the constructor, i.e. add an extra class-level field called isIgnoreAboveSet. This is error-prone because it must be manually set to false in three of the five constructors that KeywordFieldType has.

Perhaps we could revisit the default value for ignore_above in the future and default everything to a single value rather than picking and choosing based on the index mode, but that's out of scope for this change.
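A hypothetical sketch of the option chosen in this thread: the field type derives the default once, in a canonical constructor, stores it next to ignoreAbove, and compares the two later. KeywordFieldTypeSketch, its constructors, and the default values are illustrative assumptions, not the real KeywordFieldType.

```java
/**
 * Hypothetical sketch: store the ignore_above default alongside the value.
 * Names, constructors and defaults are assumptions for illustration.
 */
public final class KeywordFieldTypeSketch {

    // Assumed defaults for illustration only.
    static final int DEFAULT = Integer.MAX_VALUE;
    static final int DEFAULT_LOGSDB = 8191;

    private final int ignoreAbove;
    private final int ignoreAboveDefaultValue;

    // Canonical constructor: the only place the default is derived.
    KeywordFieldTypeSketch(Integer configuredIgnoreAbove, boolean logsdb) {
        this.ignoreAboveDefaultValue = logsdb ? DEFAULT_LOGSDB : DEFAULT;
        this.ignoreAbove = configuredIgnoreAbove != null ? configuredIgnoreAbove : ignoreAboveDefaultValue;
    }

    // Convenience constructors delegate instead of re-deriving the default.
    KeywordFieldTypeSketch() {
        this(null, false);
    }

    boolean isIgnoreAboveSet() {
        return ignoreAbove != ignoreAboveDefaultValue;
    }

    public static void main(String[] args) {
        System.out.println(new KeywordFieldTypeSketch().isIgnoreAboveSet());           // false
        System.out.println(new KeywordFieldTypeSketch(256, false).isIgnoreAboveSet()); // true
        System.out.println(new KeywordFieldTypeSketch(null, true).isIgnoreAboveSet()); // false
    }
}
```

Compared with a separately maintained isIgnoreAboveSet flag, this keeps a single source of truth as long as every constructor delegates to the canonical one.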

Member


I think keeping the default ignore_above value around isn't too bad here. It's better than the alternatives.

Member


> Perhaps we could revisit the default value for ignore_above in the future and default everything to a single value rather than picking and choosing based on the index mode, but that's out of scope for this change.

Also that would require more input from other teams.

private final FieldValues<String> scriptValues;
private final boolean isDimension;
private final boolean isSyntheticSource;
private final IndexMode indexMode;
Contributor Author


Removed; it wasn't being used anywhere.

/**
 * Generates a string containing a random number of random length alphas, all delimited by space.
 */
public static String randomAlphasDelimitedBySpace(int maxAlphas, int minCodeUnits, int maxCodeUnits) {
Contributor Author


This helps the BWC tests by generating randomized strings that are closer to what customers would actually index, i.e. space-separated tokens.
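A possible implementation of this helper, assuming the token count is drawn uniformly from 1 to maxAlphas and each token is made of lowercase ASCII letters with a length between minCodeUnits and maxCodeUnits inclusive. It is a hypothetical reimplementation, not the code from the PR, and it expects maxAlphas >= 1 and 1 <= minCodeUnits <= maxCodeUnits.

```java
import java.util.concurrent.ThreadLocalRandom;

public final class RandomAlphasSketch {

    /**
     * Generates a string of 1..maxAlphas random lowercase tokens, each between
     * minCodeUnits and maxCodeUnits characters long, delimited by single spaces.
     * Hypothetical reimplementation for illustration.
     */
    public static String randomAlphasDelimitedBySpace(int maxAlphas, int minCodeUnits, int maxCodeUnits) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        int count = random.nextInt(1, maxAlphas + 1); // upper bound is exclusive
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int length = random.nextInt(minCodeUnits, maxCodeUnits + 1);
            for (int j = 0; j < length; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Output varies from run to run.
        System.out.println(randomAlphasDelimitedBySpace(5, 1, 8));
    }
}
```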

Member


Let's keep this in TextRollingUpgradeIT? It's the only user for now.

@Kubik42 Kubik42 marked this pull request as ready for review September 5, 2025 23:37
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@Kubik42 Kubik42 marked this pull request as draft September 8, 2025 15:09
@Kubik42 Kubik42 force-pushed the kubik42-isignoreaboveset branch from 9c6eee7 to 577aa8b Compare September 8, 2025 22:00
@Kubik42 Kubik42 changed the title Refactor how ignore_above is set Fixed text fields block loader Sep 8, 2025
@Kubik42 Kubik42 force-pushed the kubik42-isignoreaboveset branch 4 times, most recently from 0a92f63 to 6f8b187 Compare September 8, 2025 23:10
@Kubik42
Contributor Author

Kubik42 commented Sep 9, 2025

Update: while I was brainstorming how to cleanly handle ignore_above being set at the index level, I dug around for the original motivation for these changes and found #113538.

tl;dr: when an index-level ignore_above is configured, we use it as the default for this.ignoreAbove. As a result, ignoreAbove != ignoreAboveDefault evaluates to false even though ignore_above is effectively set. This is a problem because we use that expression to determine whether ignore_above is set. This used to be an issue until it was fixed in #113570.
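A small sketch reconstructing the failure mode described in this comment, assuming that when the field mapping omits ignore_above, both the field's effective ignoreAbove and its ignoreAboveDefault are taken from the index-level setting. The class and method names are hypothetical.

```java
/**
 * Hypothetical illustration of why "ignoreAbove != ignoreAboveDefault" misleads
 * when the default itself comes from an index-level ignore_above setting.
 */
public final class IndexLevelIgnoreAboveSketch {

    /** Misleading check: the "default" was itself taken from the index-level setting. */
    static boolean isIgnoreAboveSet(Integer fieldLevel, int indexLevel) {
        int ignoreAboveDefault = indexLevel;
        int ignoreAbove = fieldLevel != null ? fieldLevel : ignoreAboveDefault;
        return ignoreAbove != ignoreAboveDefault;
    }

    /** What actually happens to a value at indexing time under the same assumption. */
    static boolean isIgnored(String value, Integer fieldLevel, int indexLevel) {
        int ignoreAbove = fieldLevel != null ? fieldLevel : indexLevel;
        return value.length() > ignoreAbove;
    }

    public static void main(String[] args) {
        String value = "x".repeat(150);
        // Field mapping omits ignore_above; the index-level setting is 100.
        System.out.println(isIgnoreAboveSet(null, 100)); // false
        System.out.println(isIgnored(value, null, 100)); // true
    }
}
```

The check reports "not set" while a 150-character value is still ignored. This only illustrates why the bare comparison is misleading; it does not reproduce how #113570 addressed it.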

@Kubik42 Kubik42 force-pushed the kubik42-isignoreaboveset branch 3 times, most recently from a725363 to b07ddf4 Compare September 9, 2025 06:18
@Kubik42 Kubik42 changed the title Fixed text fields block loader Fixed a bug where text fields in LogsDB indices did not use their keyword multi fields for block loading Sep 10, 2025
Member

@martijnvg martijnvg left a comment


Left two more comments, LGTM otherwise.

@Kubik42 Kubik42 added v8.19.5 v8.18.8 v9.0.8 auto-backport Automatically create backport pull requests when merged v9.1.5 labels Sep 10, 2025
@Kubik42 Kubik42 merged commit 2c3bee7 into elastic:main Sep 10, 2025
33 of 34 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

| Branch | Result |
|---|---|
| 8.19 | Commit could not be cherrypicked due to conflicts |
| 9.1 | Commit could not be cherrypicked due to conflicts |
| 8.18 | Commit could not be cherrypicked due to conflicts |
| 9.0 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 134253

Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Sep 10, 2025
…word multi fields for block loading (elastic#134253)

* Added keyword multi field with ignore_above to match only text bwc tests

* Rework ignore_above

* Added unit tests

* Undo changed to match only text bwc test

* formatting

* Removed indexMode from field type

* Added another test case

* Fixed failing bwc tests

* Improved msg

* Added additional tests

* Added IgnoreAbove record, addressed index-level ignore above

* Fixed test typos

* Added IgnoreAboveTest

* Enforce at least one value or defaultValue to always be non-null when IgnoreAbove is initialized

* When initializing IgnoreAbove, dont use defaultValue from builder - this fixes failing BWC test

* Fixed typo

* Switched IgnoreAbove to constructor only, removed the ability to set default directly

* Update docs/changelog/134253.yaml

* Update 134253.yaml

* Move IgnoreAbove into Mapper and make it final, move everything new out of IndexSettings and into IgnoreAbove

* Fixed typo

* Added helpful constructor to IgnoreAbove

* Added helpful constructor to IgnoreAbove

(cherry picked from commit 2c3bee7)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java
#	server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
@Kubik42
Contributor Author

Kubik42 commented Sep 11, 2025

💚 All backports created successfully

Status Branch Result
9.0

Questions ?

Please refer to the Backport tool documentation

Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Sep 11, 2025
…word multi fields for block loading (elastic#134253)


(cherry picked from commit 2c3bee7)

# Conflicts:
#	modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
#	qa/rolling-upgrade/src/javaRestTest/java/org/elasticsearch/upgrades/TextRollingUpgradeIT.java
#	server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java
#	server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java
#	server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java
#	server/src/test/java/org/elasticsearch/index/mapper/KeywordFieldTypeTests.java
#	server/src/test/java/org/elasticsearch/index/mapper/MultiFieldsTests.java
#	x-pack/plugin/wildcard/src/main/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapper.java
Kubik42 added a commit that referenced this pull request Sep 11, 2025
…ir keyword multi fields for block loading (#134253) (#134516)

* Fixed a bug where text fields in LogsDB indices did not use their keyword multi fields for block loading (#134253)
@Kubik42 Kubik42 deleted the kubik42-isignoreaboveset branch September 11, 2025 17:29
@Kubik42
Contributor Author

Kubik42 commented Sep 11, 2025

💚 All backports created successfully

Status Branch Result
8.19


Kubik42 added a commit to Kubik42/elasticsearch that referenced this pull request Sep 12, 2025
Kubik42 added a commit that referenced this pull request Sep 12, 2025
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 19, 2025

Labels

auto-backport Automatically create backport pull requests when merged backport pending >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.5 v9.1.5 v9.2.0

3 participants