Skip to content

Conversation

gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Sep 22, 2025

In #134790 we introduced a bug that was caught by our tests.;

The problem manifested itself when a multi field mapper would call getText() which would set _tokenIncomplete to false after the getOptimisedText(). This would evaluate the condition _currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0 to false and the lastOptimisedValue would not be reset.

We changed the code to always reset the lastOptimisedValue when a next*() method is called.

Furthermore, we introduced a randomised unit test that creates two XContentParsers and runs one as a baseline using none of the optimised code and the other one is accessing both optimised and non optimised in a random pattern. This test was able to catch both #134770 & #135256.

Fixes #135256

@gmarouli gmarouli requested a review from a team as a code owner September 22, 2025 11:33
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.2.0 labels Sep 22, 2025
@gmarouli gmarouli requested a review from martijnvg September 22, 2025 11:33
@gmarouli gmarouli added >bug :Core/Infra/Core Core issues without another label v9.1.5 v8.19.5 labels Sep 22, 2025
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team and removed needs:triage Requires assignment of a team area label labels Sep 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mary!

I think you also need to undo a part #135172 did: https://github.com/elastic/elasticsearch/pull/135172/files#diff-8ef1741dbe67a9d21e8154754d17238802a9b21344f997067befe4f618f985feR145 ?

I think it would be great if we can get a unit test that reproduces the issue. For example a json document that we know caused the logsdb randomized tests to fail. If we need a DotExpandingXContentParser to reproduce that should be ok. We should reproduce a the bug, which is that an incorrect value is returned for a specific field.

I think the unit test coverage needs to be improved. Maybe we should create a unit test that parses a random json document. First with a parser that never uses the uff-8 parsing optimization and secondly with a parser that uses the utf-8 parsing optimization. Both parsers should produce the exact same result. (I think we can let both parsers something like a map of maps). In order to make it more evil, we can randomly invoke text() or optimizedText() multiple times.

@gmarouli gmarouli marked this pull request as draft September 22, 2025 13:22
@gmarouli gmarouli changed the title Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. WIP - Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Sep 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've updated the changelog YAML for you.

@gmarouli gmarouli added >non-issue and removed >bug labels Sep 25, 2025
@gmarouli
Copy link
Contributor Author

Changed to non-issue to remove the changelog.

@gmarouli gmarouli marked this pull request as ready for review September 25, 2025 08:05
@gmarouli gmarouli changed the title WIP - Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Sep 25, 2025
@gmarouli gmarouli requested a review from martijnvg September 25, 2025 08:19
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @gmarouli! I left a few comments around testing.

Maybe also add a yaml test similar to reproduction in #135256?

@gmarouli
Copy link
Contributor Author

Thanks for the feedback @martijnvg , it is much simpler now and I learned a few extra things. I extended the existing test to capture both bugs we encountered.

@gmarouli gmarouli requested a review from martijnvg September 29, 2025 08:42
@gmarouli gmarouli added the auto-backport Automatically create backport pull requests when merged label Sep 29, 2025
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one testing suggestion, otherwise LGTM.

@gmarouli gmarouli added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Sep 29, 2025
@gmarouli gmarouli merged commit 5420780 into elastic:main Sep 29, 2025
31 of 34 checks passed
@gmarouli gmarouli deleted the bug-fix-utf8-optimiser branch September 29, 2025 12:46
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 135184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Core/Infra/Core Core issues without another label >non-issue Team:Core/Infra Meta label for core/infra team v8.19.5 v9.1.5 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESUTF8StreamJsonParser does not reset correctly the last optimised value.

3 participants