Skip to content

Conversation

gmarouli
Copy link
Contributor

In this PR we attempt to fix #134770, we have identified the issue at the retrieval of the field value for the second time in order to populate the keyword multi-field.

We noticed that when the we performed the second retrieval it was not using the recently optimised value, but the value of the initial buffer. We added a new variable that "remembers" the optimised value and allows us to reuse it.

@gmarouli gmarouli added :Core/Infra/Core Core issues without another label auto-backport Automatically create backport pull requests when merged v8.19.5 labels Sep 16, 2025
@gmarouli gmarouli requested a review from a team as a code owner September 16, 2025 09:24
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team v9.2.0 labels Sep 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@gmarouli gmarouli added the >bug label Sep 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@gmarouli gmarouli requested a review from martijnvg September 16, 2025 11:57
- match:
hits.hits.0._source.foo: "c:\\windows\\system32\\svchost.exe"

- do:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd add a comment here pointing to the bug issue, so it's immediately clear why we are calling this twice (a future dev might be tempted to "clean up" things and remove the second call)

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One Q and one idea to increase test coverage but LGTM

var text = parser.getValueAsText();
assertThat(text, Matchers.notNullValue());
assertTextRef(text.bytes(), "bar\"baz\"");
// Retrieve the value for a second time to ensure the last value is available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make it sense to test the reset of the lastOptimizedValue too? e.g. calling nextFieldName/nextToken + getValueAsText and ensure the content is not stale?

Copy link
Contributor

@jordan-powers jordan-powers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks for fixing this Mary! I had one suggestion for testing, but overall LGTM!

// Retrieve the value for a second time to ensure the last value is available
text = parser.getValueAsText();
assertThat(text, Matchers.notNullValue());
assertTextRef(text.bytes(), "bar\"baz\"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider updating testGetValueRandomized() to also double-retrieve the value from the parser.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Mary for fixing this!

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 17, 2025
@elasticsearchmachine elasticsearchmachine merged commit 942e341 into elastic:main Sep 17, 2025
34 checks passed
@gmarouli gmarouli deleted the bug-fix/unicode-parser-optimisation branch September 17, 2025 14:36
@elasticsearchmachine
Copy link
Collaborator

elasticsearchmachine commented Sep 17, 2025

💔 Backport failed

Status Branch Result
8.19
9.1

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 134790

gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
… (elastic#134790)

In this PR we attempt to fix elastic#134770, we have identified the issue at
the retrieval of the field value for the second time in order to
populate the keyword multi-field.

We noticed that when the we performed the second retrieval it was not
using the recently optimised value, but the value of the initial buffer.
We added a new variable that "remembers" the optimised value and allows
us to reuse it.
gmarouli added a commit that referenced this pull request Sep 29, 2025
…lue. (#135184)

In #134790 we introduced a bug that was caught by our tests.;

The problem manifested itself when a multi field mapper would call `getText()` which would set `_tokenIncomplete` to `false` after the `getOptimisedText()`. This would evaluate the condition `_currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0` to false and the `lastOptimisedValue` would not be reset.

We changed the code to always reset the `lastOptimisedValue` when a `next*()` method is called.

Furthermore, we introduced a randomised unit test that creates two `XContentParsers` and runs one as a baseline using none of the optimised code and the other one is accessing both optimised and non optimised in a random pattern. This test was able to catch both #134770 & #135256.

Fixes #135256
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team v8.19.5 v9.1.5 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Behavior Change Introduced by Optimizing How Escape Sequences Are Handled

5 participants