Skip to content

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Sep 27, 2024

Add per-object param for controlling source recording, in line with #112706 for leaf fields.

Document how to keep synthetic source per field, object or index.

Related to #112012

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/ObjectMapper.java
#	server/src/test/java/org/elasticsearch/index/mapper/DocumentParserTests.java
#	server/src/test/java/org/elasticsearch/index/mapper/ObjectMapperTests.java
@kkrik-es kkrik-es marked this pull request as ready for review September 30, 2024 14:54
@kkrik-es kkrik-es requested a review from a team as a code owner September 30, 2024 14:54
@kkrik-es kkrik-es requested review from lkts and martijnvg and removed request for a team and lkts September 30, 2024 14:54
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I left a comment about what we say we guarantee with synthetic source keep arrays.

====== Keeping the origical source

It is possible to record the original source of an object or field, at extra storage cost, using param
`synthetic_source_keep`. The default value is `none`; setting it to `arrays` leads to storing the original source for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we shouldn't guarantee that we store original source when synthetic_source_keep is set to arrays is enabled? Maybe we we should just guarantee the array ordering is guaranteed? This gives us better ways to reduce the overhead later on without breaking bwc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to main bwc with things like field: [1, 2, [3], [[4, [5]]], ["5"], 6] , then that makes things more difficult. Ideally we would synthesize this as: field: [1, 2, 3, 5, 5, 6] (assuming this is a number field).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is how it's implemented right now. Are you suggesting that we don't document the current behavior? I think it'll be a breaking change if we change it to return [1, 2, 3, 5, 5, 6] in the example above, whether we document it or not..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can be explicit that we don't guarantee the exact form.. is this what you have in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can be explicit that we don't guarantee the exact form.. is this what you have in mind?

Apologies, yes, this is what I mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the wording to reflect that this may change, ptal.

@kkrik-es kkrik-es added auto-backport Automatically create backport pull requests when merged and removed auto-backport-and-merge labels Oct 3, 2024
subobjects auto:
- requires:
cluster_features: ["mapper.subobjects_auto"]
cluster_features: ["mapper.subobjects_auto", "mapper.bwc_workaround_9_0"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding gte_v8.16.0 as cluster feature avoid adding the new mapper.bwc_workaround_9_0 cluster feature?

Copy link
Contributor Author

@kkrik-es kkrik-es Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If added in skip, it will disable the test in main. If added here, it won't fix the test failures.

This is controlled through param `synthetic_source_keep` with the following option:

- `none`: synthetic source diverges from the original source as described above (default).
- `arrays`: arrays of the corresponding field or object preserve the original element ordering and duplicate elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kkrik-es kkrik-es requested a review from a team as a code owner October 3, 2024 15:14
@kkrik-es kkrik-es merged commit dd20248 into elastic:main Oct 3, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 113690

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Oct 3, 2024

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Oct 3, 2024
* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913.

* Revert "fix bwc"

This reverts commit f4ddb0e.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform

(cherry picked from commit dd20248)

# Conflicts:
#	rest-api-spec/build.gradle
elasticsearchmachine pushed a commit that referenced this pull request Oct 3, 2024
* Add object param for keeping synthetic source (#113690)

* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913.

* Revert "fix bwc"

This reverts commit f4ddb0e.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform

(cherry picked from commit dd20248)

# Conflicts:
#	rest-api-spec/build.gradle

* Update build.gradle

* Update MapperFeatures.java

* Update 20_synthetic_source.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 4, 2024
* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913.

* Revert "fix bwc"

This reverts commit f4ddb0e.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants