-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Added stricter range type checks and runtime warnings for ENRICH #115091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added stricter range type checks and runtime warnings for ENRICH #115091
Conversation
Hi @craigtaverner, I've created a changelog YAML for you. |
a5859c8
to
498adc3
Compare
And removed incorrect warnings check from another test. But that test is not providing any warnings, so we have a bug in enrich warnings production.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @craigtaverner, I've updated the changelog YAML for you. |
…rce. This should also cover the many places in test code we use null configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just left a comment about some runtime validations
if (inputDataType == DataType.UNSUPPORTED) { | ||
throw new EsqlIllegalArgumentException("ENRICH cannot match on unsupported input data type"); | ||
} | ||
if (fieldType == null) { | ||
throw new EsqlIllegalArgumentException("ENRICH cannot match on non-existent field"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these cases really happen? Shouldn't they be prevented at Verification time?
In case, it would be good to have a test for them.
Some nodes don't have the Source field in the transport request
Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
💔 Backport failed
You can use sqren/backport to manually backport by running |
…stic#115091) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
#115091) (#117130) * Added stricter range type checks and runtime warnings for ENRICH (#115091) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: #107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time). * Fix compile error likely due to mismatched ordering of backports
…stic#115091) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
…stic#115091) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
Backported to 8.17 in #117130, and we'll not bother with 8.16 at this point |
It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: #107357
This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.
However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.
In order to not create a backwards compatibility problem, we have compromised with the following:
Fixes #107357
Fixes #116799