Skip to content

Conversation

@jordan-powers
Copy link
Contributor

As a result of the randomized testing enabled in #121462, we found that we currently fail to parse documents with arrays of objects containing counted_keyword fields.

For example, attempting to index this document with this mapping would produce an error:
Mapping:

"mappings":  {
  "properties": {
    "parent": {
      "type": "object",
      "properties": { 
        "child": {
          "type": "counted_keyword"
        }
      }
    }
  }
}

Document:

{ "parent": [ {"child": "a"}, {"child": "b"} ] }

Error:
"type":"illegal_argument_exception","reason":"DocValuesField \"parent.child_count\" appears more than once in this document (only one value is allowed per field)

This PR fixes this issue by using a custom docvalues field to store the count instead of the built-in lucene BinaryDocValues. This custom CountsBinaryDocValuesField has logic to handle multiple values for the same field.

@jordan-powers jordan-powers added >bug auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings v8.19.0 v9.1.0 labels Feb 3, 2025
@jordan-powers jordan-powers self-assigned this Feb 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and i think this is backwards compatible as well since doc values have the same format. @martijnvg do you see any problems with BWC here?

@martijnvg martijnvg added the test-full-bwc Trigger full BWC version matrix tests label Feb 4, 2025
@martijnvg
Copy link
Member

and i think this is backwards compatible as well since doc values have the same format.

Yes, I think this change doesn't break bwc. Since keep storing the values as single binary array (even though the values may have come from multiple array elements or leaf fields with the same name), so the format doesn't change.

@jordan-powers jordan-powers enabled auto-merge (squash) February 4, 2025 16:06
@Override
public BytesRef binaryValue() {
try {
int bytesSize = (counts.size() + 1) * 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a constant with a meaningful name for 5?

@jordan-powers jordan-powers merged commit 13b743c into elastic:main Feb 4, 2025
17 checks passed
@jordan-powers jordan-powers deleted the counted_keyword_multi_subobject_fix branch February 4, 2025 19:46
@jordan-powers
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Feb 4, 2025
As a result of the randomized testing enabled in #121462, we found that
we currently fail to parse documents with arrays of objects containing
counted_keyword fields. This PR fixes this issue by using a custom
docvalues field to store the count instead of the built-in lucene
BinaryDocValues. This custom CountsBinaryDocValuesField has logic to
handle multiple values for the same field.

(cherry picked from commit 13b743c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine test-full-bwc Trigger full BWC version matrix tests v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants