Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Apr 22, 2025

Fixes a sneaky bug in single value query that happens when run against a keyword field that:

  • Is defined on every field
  • Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index with two documents:

{"a": "foo"}
{"a": ["foo", "bar"]}

I don't think this is particularly likely in production, but it's quite likely in tests. Which is where I hit this - in the serverless tests we index an index with four documents into three shards and two of the documents look just like this. So about 1/3 or the time we triggered this bug.

Mechanically this is triggered by the SingleValueMatchQuery incorrectly rewriting itself to MatchAll in the scenario above. This fixes that.

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
@nik9000 nik9000 requested review from dnhatn and iverase April 22, 2025 11:11
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000 nik9000 added auto-backport Automatically create backport pull requests when merged v9.0.1 and removed v9.0.0 labels Apr 22, 2025
Comment on lines 218 to 229
if (v instanceof Double n) {
fields.add(new DoubleField("foo", n, Field.Store.NO));
} else if (v instanceof Float n) {
fields.add(new DoubleField("foo", n, Field.Store.NO));
} else if (v instanceof Number n) {
long l = n.longValue();
fields.add(new LongField("foo", l, Field.Store.NO));
} else if (v instanceof String s) {
fields.add(new KeywordField("foo", v.toString(), Field.Store.NO));
} else {
throw new UnsupportedOperationException();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: could be simplified with switch like in #120875

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Nik!

@nik9000 nik9000 merged commit c2fdc06 into elastic:main Apr 22, 2025
17 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
8.x
9.0

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
* ESQL: Fix sneaky bug in single value query (#127146)

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.

* Revert "Switch"

This reverts commit cc7805d.
elasticsearchmachine pushed a commit that referenced this pull request Apr 22, 2025
* ESQL: Fix sneaky bug in single value query (#127146)

Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every field
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 or the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.

* Revert "Switch"

This reverts commit cc7805d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.1 v8.19.0 v9.0.1 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants