jordan-powers
Contributor

In #131314 we fixed match_only_text fields with ignore_above keyword multi-fields in the case that the keyword multi-field is stored. However, the issue is still present if the keyword field is not stored, but instead has doc values.

This patch fixes that case.

Follow-up to #131314.

@jordan-powers jordan-powers self-assigned this Jul 16, 2025
@jordan-powers jordan-powers added the >non-issue, auto-backport, :StorageEngine/Mapping, v8.19.0, v9.1.0, and v9.2.0 labels Jul 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Contributor

@lkts lkts left a comment

LGTM

  - match: { "hits.total.value": 1 }
  - match:
      hits.hits.0._source.foo: "Apache Lucene powers Elasticsearch"

Contributor

It took me a while to convince myself that there won't ever be duplicate values when some values come from doc_values and some from the original field. It might be nice to have a test that covers this case. Something like:

synthetic_source match_only_text as multi-field with ignored doc_values keyword as parent with multiple values:
  - do:
      indices.create:
        index: synthetic_source_test
        body:
          settings:
            index:
              mapping.source.mode: synthetic
          mappings:
            properties:
              foo:
                type: keyword
                store: false
                doc_values: true
                ignore_above: 10
                fields:
                  text:
                    type: match_only_text

  - do:
      index:
        index: synthetic_source_test
        id: "1"
        refresh: true
        body:
          foo: ["Apache Lucene powers Elasticsearch", "Apache"]

  - do:
      search:
        index: synthetic_source_test
        body:
          query:
            match_phrase:
              foo.text: apache lucene

  - match: { "hits.total.value": 1 }
  - match:
      hits.hits.0._source.foo: ["Apache", "Apache Lucene powers Elasticsearch"]
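
To make the no-duplicates argument concrete, here is a minimal Python sketch of the reasoning. It is a simplified model and not the actual Elasticsearch implementation. The function names are hypothetical, and it assumes that values no longer than `ignore_above` end up in doc values (sorted and deduplicated), while longer values are kept separately in their original order. Under that assumption the two groups are split purely by length, so a value can never appear in both.

```python
def split_keyword_values(values, ignore_above):
    """Model (assumption) of how an ignore_above keyword field partitions values.

    Values within the limit are treated as doc values: sorted and deduplicated.
    Longer values are treated as ignored and kept in their original order.
    """
    doc_values = sorted({v for v in values if len(v) <= ignore_above})
    ignored = [v for v in values if len(v) > ignore_above]
    return doc_values, ignored


def synthetic_source_values(values, ignore_above):
    """Rebuild the field: doc values first, then the ignored originals."""
    doc_values, ignored = split_keyword_values(values, ignore_above)
    return doc_values + ignored


values = ["Apache Lucene powers Elasticsearch", "Apache"]
doc_values, ignored = split_keyword_values(values, ignore_above=10)

# The partition is by length, so the two groups cannot share a value.
print(set(doc_values) & set(ignored))  # set()
print(synthetic_source_values(values, ignore_above=10))
# ['Apache', 'Apache Lucene powers Elasticsearch']
```

With the document from the suggested test, the model yields the same ordering the test expects: the short value from doc values first, then the value that exceeded `ignore_above`.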

Contributor

@parkertimmins parkertimmins left a comment

I had one test suggestion, but looks good. Nice work!

@jordan-powers jordan-powers enabled auto-merge (squash) July 16, 2025 19:38
@jordan-powers jordan-powers force-pushed the fix_match_only_text_multi_fields_3 branch from 16a8e7f to ea6e60f Compare July 17, 2025 02:04
@jordan-powers jordan-powers disabled auto-merge July 17, 2025 02:06
@jordan-powers jordan-powers merged commit 7a01565 into elastic:main Jul 17, 2025
34 checks passed
jordan-powers added a commit to jordan-powers/elasticsearch that referenced this pull request Jul 17, 2025
In elastic#131314 we fixed match_only_text fields with ignore_above keyword
multi-fields in the case that the keyword multi-field is stored. However,
the issue is still present if the keyword field is not stored, but instead
has doc values.

This patch fixes that case.
@elasticsearchmachine
Collaborator

💚 Backport successful

Branches: 8.19, 9.1

elasticsearchmachine pushed a commit that referenced this pull request Jul 17, 2025
In #131314 we fixed match_only_text fields with ignore_above keyword
multi-fields in the case that the keyword multi-field is stored. However,
the issue is still present if the keyword field is not stored, but instead
has doc values.

This patch fixes that case.
@jordan-powers jordan-powers deleted the fix_match_only_text_multi_fields_3 branch July 28, 2025 16:55