Skip to content

Conversation

craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Oct 18, 2024

It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: #107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

  • Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
  • For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Fixes #107357
Fixes #116799

@craigtaverner craigtaverner added >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Oct 18, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@craigtaverner craigtaverner force-pushed the enrich_strict_range_types branch from a5859c8 to 498adc3 Compare November 8, 2024 16:41
And removed incorrect warnings check from another test. But that test is not providing any warnings, so we have a bug in enrich warnings production.
@craigtaverner craigtaverner added auto-backport Automatically create backport pull requests when merged v8.17.0 v8.16.1 labels Nov 13, 2024
@craigtaverner craigtaverner marked this pull request as ready for review November 13, 2024 20:34
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@craigtaverner craigtaverner changed the title Added strict range type checks for ENRICH Added stricter range type checks and runtime warnings for ENRICH Nov 14, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

…rce.

This should also cover the many places in test code we use null configurations.
Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just left a comment about some runtime validations

Comment on lines 94 to 99
if (inputDataType == DataType.UNSUPPORTED) {
throw new EsqlIllegalArgumentException("ENRICH cannot match on unsupported input data type");
}
if (fieldType == null) {
throw new EsqlIllegalArgumentException("ENRICH cannot match on non-existent field");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these cases really happen? Shouldn't they be prevented at Verification time?
In case, it would be good to have a test for them.

Some nodes don't have the Source field in the transport request
Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
@craigtaverner craigtaverner merged commit f3cd482 into elastic:main Nov 19, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.16 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 115091

craigtaverner added a commit to craigtaverner/elasticsearch that referenced this pull request Nov 20, 2024
…stic#115091)

It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
elasticsearchmachine pushed a commit that referenced this pull request Nov 20, 2024
#115091) (#117130)

* Added stricter range type checks and runtime warnings for ENRICH (#115091)

It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: #107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).

* Fix compile error likely due to mismatched ordering of backports
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Nov 20, 2024
…stic#115091)

It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…stic#115091)

It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
@craigtaverner
Copy link
Contributor Author

Backported to 8.17 in #117130, and we'll not bother with 8.16 at this point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.1 v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENRICH docs for ES|QL describe limitations which are not valid ES|QL: Enrich with wrong field type throws NumberFormatException

3 participants