Contributor

@parkertimmins parkertimmins commented Jun 18, 2025

There is an issue with flattened fields under synthetic source: if a key has a scalar value and a duplicate key has an object value, one of the values is left out of the produced synthetic source.

This fixes the issue by replacing the problematic object with paths to each of its keys. Each path is the concatenation of all keys leading down to a given scalar, joined by a period (.), for example foo.bar.baz. This applies recursively, so every value within the object, no matter how deeply nested, is accessible through a fully specified path.

For example, if the following flattened field value is indexed:

{
   "a": {
       "b": 5,
       "b": {
          "c": 10,
          "d": { "e": 15 }
       }
   }
}

The following synthetic source will be produced:

{
   "a": {
      "b": 5,
      "b.c": 10,
      "b.d.e": 15
   }
}
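
The recursive path concatenation described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class `DottedPathSketch` and the method `collectPaths` are hypothetical names, and the sketch works on in-memory maps rather than on the flattened field's stored key/value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DottedPathSketch {

    // Hypothetical helper: recursively copies every scalar under `object` into
    // `out`, keyed by the period-joined path of keys leading to it, starting
    // from `prefix`.
    static void collectPaths(String prefix, Map<String, Object> object, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : object.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                collectPaths(path, nested, out);
            } else {
                out.put(path, value);
            }
        }
    }

    public static void main(String[] args) {
        // The object value that collides with the scalar "b": 5.
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("e", 15);
        Map<String, Object> conflicting = new LinkedHashMap<>();
        conflicting.put("c", 10);
        conflicting.put("d", d);

        Map<String, Object> a = new LinkedHashMap<>();
        a.put("b", 5);                      // the scalar keeps its plain key
        collectPaths("b", conflicting, a);  // the object is replaced by dotted paths
        System.out.println(a);              // prints {b=5, b.c=10, b.d.e=15}
    }
}
```

The scalar and the object are passed separately here because a Java map cannot hold the duplicate key "b" that the indexed document contains.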

Fixes #122936

next = nextValue == null ? KeyValue.EMPTY : new KeyValue(nextValue);

var startPrefix = curr.prefix.diff(openObjects);
if (startPrefix.prefix.isEmpty() == false && startPrefix.prefix.getFirst().equals(lastScalarSingleLeaf)) {
Contributor

What if the conflict doesn't happen on the first part of the prefix, e.g.

field {
  path {
    to: 10
    to {
      foo: bar
    }
  }
}

Would this be caught here?

Contributor Author

Yep, so this will become:

field {
  path {
    to: 10
    to.foo: bar
  }
}

When it gets to the first key/value field.path.to|10, it will take the else block and traverse down into the object, adding field and path to the openObjects context. When it reaches the key/value field.path.to.foo|bar, that object will still be open; seeing that lastScalarSingleLeaf has a value of to, and that to is the first token in startPrefix (to.foo), it will make a concatenated path.

(Updated a test to cover this situation to verify it.)
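
To make the outcome of that walk concrete, here is a simplified sketch. It is an assumption-laden stand-in, not the PR's code: it builds an in-memory tree instead of streaming with an open-objects context, and `ConflictPathSketch` and `insert` are hypothetical names. It also assumes keys arrive sorted, so a scalar such as field.path.to is seen before field.path.to.foo.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConflictPathSketch {

    // Hypothetical helper: inserts one dotted key into the tree. It descends
    // while each key names an object (or is absent); on reaching a key that
    // already holds a scalar, it stores the rest of the path as a single
    // period-joined key next to that scalar.
    static void insert(Map<String, Object> root, String dottedKey, Object value) {
        List<String> keys = Arrays.asList(dottedKey.split("\\."));
        Map<String, Object> current = root;
        for (int i = 0; i < keys.size() - 1; i++) {
            Object existing = current.get(keys.get(i));
            if (existing instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) existing;
                current = child;
            } else if (existing == null) {
                Map<String, Object> child = new LinkedHashMap<>();
                current.put(keys.get(i), child);
                current = child;
            } else {
                // Scalar/object mismatch: concatenate the remaining keys.
                current.put(String.join(".", keys.subList(i, keys.size())), value);
                return;
            }
        }
        current.put(keys.get(keys.size() - 1), value);
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        insert(root, "field.path.to", 10);
        insert(root, "field.path.to.foo", "bar");
        System.out.println(root); // prints {field={path={to=10, to.foo=bar}}}
    }
}
```

The conflict is detected at whatever depth the scalar sits, which is the property the review question asks about.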

// THEN
assertEquals(
- "{\"a\":\"value_a\",\"a\":{\"b\":\"value_b\",\"b\":{\"c\":\"value_c\"},\"d\":\"value_d\"}}",
+ "{\"a\":\"value_a\",\"a.b\":\"value_b\",\"a.b.c\":\"value_c\",\"a.d\":\"value_d\"}",
Contributor Author

Interestingly, there was already a test with a scalar/object mismatch, but it produced duplicate keys. When the XContent was converted to JSON, these duplicate keys originally threw an error, but now the duplicates are just dropped. (Something must have changed in the XContent code since the issue was opened to cause this change from an error to deduplication.)

@parkertimmins parkertimmins marked this pull request as ready for review June 18, 2025 19:32
@parkertimmins parkertimmins requested review from kkrik-es and lkts June 18, 2025 19:33
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 18, 2025
@parkertimmins parkertimmins added auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings labels Jun 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jun 18, 2025
@parkertimmins parkertimmins added needs:triage Requires assignment of a team area label v8.19.0 v9.0.4 v8.18.4 and removed Team:StorageEngine needs:triage Requires assignment of a team area label labels Jun 18, 2025
@parkertimmins parkertimmins requested a review from lkts June 20, 2025 03:31
@parkertimmins parkertimmins merged commit 245dc07 into elastic:main Jun 20, 2025
29 checks passed
@parkertimmins parkertimmins deleted the parker/flattened-scalar-object-mismatch branch June 20, 2025 19:20
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.0
8.18 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129600

parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request Jun 20, 2025
…ect mismatch (elastic#129600)

There is an issue where for Flattened fields with synthetic source, if there is a key with a scalar value, and a duplicate key with an object value, one of the values will be left out of the produced synthetic source. This fixes the issue by replacing the object with paths to each of its keys. These paths consist of the concatenation of all keys going down to a given scalar, joined by a period. For example, they are of the form foo.bar.baz. This applies recursively, so that every value within the object, no matter how nested, will be accessible through a full specified path.
parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request Jun 20, 2025
…ect mismatch (elastic#129600)

(cherry picked from commit 245dc07)

# Conflicts:
#	docs/reference/elasticsearch/mapping-reference/flattened.md
@parkertimmins
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19
8.18

Questions ?

Please refer to the Backport tool documentation

parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request Jun 20, 2025
…ect mismatch (elastic#129600)

(cherry picked from commit 245dc07)
elasticsearchmachine pushed a commit that referenced this pull request Jun 20, 2025
…ect mismatch (#129600) (#129792)

elasticsearchmachine pushed a commit that referenced this pull request Jun 20, 2025
…lar/object mismatch (#129600) (#129794)

* Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600)

(cherry picked from commit 245dc07)

* remove methods not available in java version

* skip testing console-result in docs
elasticsearchmachine pushed a commit that referenced this pull request Jun 20, 2025
…lar/object mismatch (#129600) (#129793)

* Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600)

(cherry picked from commit 245dc07)

# Conflicts:
#	docs/reference/elasticsearch/mapping-reference/flattened.md

* remove methods not available in java version

* skip testing console-result in docs
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
…ect mismatch (elastic#129600)

julian-elastic pushed a commit to julian-elastic/elasticsearch that referenced this pull request Jun 24, 2025
…ect mismatch (elastic#129600)

mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
…ect mismatch (elastic#129600)

@martijnvg
Member

martijnvg commented Jun 30, 2025

@parkertimmins Did this fix get backported to the 9.0 branch? EDIT: I think it did. However, this PR still has the backport pending label.

@parkertimmins
Contributor Author

@martijnvg Yep, it was backported to 8.18, 8.19, and 9.0. I'll go ahead and remove the backport-pending label.

@parkertimmins
Contributor Author

A bug was found in v9.0.2, where attempting to get the synthetic source of a flattened field produced error messages like "Current context not Object but root" and "Can not write a field name, expecting a value" (depending on whether just the flattened field or the whole source was produced). It turns out that the bug was fixed as a side effect of this PR. I'll document the issue here in case anyone comes across it:

The cause of the issue was that some opening braces for objects were missing from the synthetic source of the flattened field. The JSON therefore had more closing braces than opening braces, causing subsequent writes to go to the root context (hence the error message).

The bug is in shared(): it should have an else clause with a break statement. This code is used to decide how many braces to open. The current path is compared against the previous path to see which objects have already been opened. shared() is meant to find the prefix on which two paths match; that matching prefix is exactly the set of objects that are already open. But shared() does not actually find a matching prefix; instead, it finds matching keys at any level in the two paths.

For example, assume the previous path is a.b.c and the current path is b.b.c. shared(a.b.c, b.b.c) should return [], but currently returns [b] since the second key matches. Since we are deciding how many objects to open for the current path b.b.c, and there is no common prefix with a.b.c, we should open objects for both the first and second b. But because shared() returned [b] and b.b.c starts with b, it incorrectly concludes that the first b is already an open object.
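
The difference between the two behaviors can be sketched like this. The code is reconstructed from the description in this comment, not copied from the Elasticsearch source; `SharedPrefixSketch`, `sharedBuggy`, and `sharedFixed` are hypothetical names. It assumes each path is passed as its list of object keys without the leaf, so a.b.c becomes [a, b] and b.b.c becomes [b, b].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedPrefixSketch {

    // Behavior as described above: collects keys that match at the same
    // position anywhere in the two paths, without stopping at a mismatch.
    static List<String> sharedBuggy(List<String> prev, List<String> curr) {
        List<String> shared = new ArrayList<>();
        for (int i = 0; i < Math.min(prev.size(), curr.size()); i++) {
            if (prev.get(i).equals(curr.get(i))) {
                shared.add(prev.get(i));
            }
        }
        return shared;
    }

    // With the else/break added: stops at the first mismatch, so the result
    // is a true common prefix.
    static List<String> sharedFixed(List<String> prev, List<String> curr) {
        List<String> shared = new ArrayList<>();
        for (int i = 0; i < Math.min(prev.size(), curr.size()); i++) {
            if (prev.get(i).equals(curr.get(i))) {
                shared.add(prev.get(i));
            } else {
                break;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<String> prev = Arrays.asList("a", "b"); // object keys of a.b.c
        List<String> curr = Arrays.asList("b", "b"); // object keys of b.b.c
        System.out.println(sharedBuggy(prev, curr)); // prints [b]
        System.out.println(sharedFixed(prev, curr)); // prints []
    }
}
```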

Here's some code with a test showing the behavior along with a fix: https://github.com/elastic/elasticsearch/compare/v9.0.2...parkertimmins:elasticsearch:parker/flattened-test-from-v9.0.2?expand=1

parkertimmins added a commit that referenced this pull request Jul 1, 2025
There was a bug in previous version where flattened fields would produce incorrect synthetic source with too few opening braces. This bug was fixed as a side effect of #129600. Adding this test to confirm. See #129600 for a full explanation.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 2, 2025
There was a bug in previous version where flattened fields would produce incorrect synthetic source with too few opening braces. This bug was fixed as a side effect of elastic#129600. Adding this test to confirm. See elastic#129600 for a full explanation.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
There was a bug in previous version where flattened fields would produce incorrect synthetic source with too few opening braces. This bug was fixed as a side effect of elastic#129600. Adding this test to confirm. See elastic#129600 for a full explanation.
Labels
auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.18.4 v8.19.0 v9.0.4 v9.1.0
Development

Successfully merging this pull request may close these issues.

Synthetic _source can't be retrieved with overlapping keys in flattened field
5 participants