-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ES|QL] Implicit casting string literal to intervals in EsqlScalarFunction and GroupingFunction #115814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Implicit casting string literal to intervals in EsqlScalarFunction and GroupingFunction #115814
Conversation
Hi @fang-xing-esql, I've created a changelog YAML for you. |
@elasticmachine update branch |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
Aren't you the greatest? ❤️ |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some first comments.
Generally, IMO this functionality can be interesting, but starting on this path, I'm thinking "BUCKET(datetime, "1 day")
now works, but how about BUCKET(float, "100")
" or "EVAL x = "1953-09-02T00:00:00.000Z" + "2 days"
, but how about EVAL x = "1" + 2"
?"
And if we'd consider autocasting for numbers, we might need to do it for all functions. IPs?
I guess it's all doable, just wondering if we commit to it now.
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Outdated
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Outdated
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I've left few comments. The one that I'd like you to have a careful look at is the one of getTargetType
method that returns an UNSUPPORTED
data type.
* resolving this reference before implicit casting may cause this reference to have customMessage=true, it prevents further | ||
* attempts to resolve this reference. | ||
*/ | ||
new ImplicitCasting(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union types was/is tricky, too much back and forth and too many places where things are handled carefully. If @craigtaverner and @alex-spies are ok with this change, it's ok with me as well; they know best the union types area.
// If the function takes strings as input, there is no need to cast a string literal to it. | ||
// Return UNSUPPORTED, so that ImplicitCasting doesn't process it. | ||
if (isString(type)) { | ||
return UNSUPPORTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strange way of saying that there shouldn't be an automatic conversion/casting for a specific data type.
Returning here a data type UNSUPPORTED can have double meaning:
- the actual target type is UNSUPPORTED
- the conversion shouldn't take place
I know this code existed before this PR, it might work and get the job done, but it's a confusing code; it is not clearly expressing the intention.
If this method - getTargetType
- would have lived inside Analyzer and be private
I would have been ok with it, but it's a public method, it can be accessed from anywhere in code and this means that this code duality I mentioned above can be confusing to anyone else wanting to re-use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strange way of saying that there shouldn't be an automatic conversion/casting for a specific data type.
Returning here a data type UNSUPPORTED can have double meaning:
- the actual target type is UNSUPPORTED
- the conversion shouldn't take place
Understood, UNSUPPORTED
is used to indicate that implicit casting is not supported for this argument here. I was thinking of using NULL
, however it may lead people to interpret it as casting to a NULL
. Before having a better solution, I'll add more comments here to indicate the purpose of using UNSUPPORTED
here, perhaps we may want to add a flag in ArgSignature
to indicate whether to do implicit casting for this argument or not, with a targetType if yes, but it adds one more attribute to ArgSignature
, I'm not quite sure if it worths it or not.
If this method -
getTargetType
- would have lived inside Analyzer and beprivate
I would have been ok with it, but it's a public method, it can be accessed from anywhere in code and this means that this code duality I mentioned above can be confusing to anyone else wanting to re-use it.
I can make getTargetType
a private method in EsqlFunctionRegistry
. Actually Analyzer
doesn't call it, it is public because AbstractFunctionTestCase
calls it. AbstractFunctionTestCase
generates docs for automatically from annotations, however not all of the functions are registered in EsqlFunctionRegistry
, like operators, so AbstractFunctionTestCase
calls some EsqlFunctionRegistry
methods separately to generate docs for them. I'll make it a private method here, as this target type is not externalized in the generated docs.
@craigtaverner and @alex-spies Could you help me take a look at the change regarding to moving I don't recall whether there is a specific reason that Thank you! |
Thank you @astefan ! I'll wait for a second pair of eyes from union type's perspective. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits.
I find the messaging around implicit casting a bit inconsistent, though if we'll want to improve this, we can do it subsequently. So:
STATS BY BUCKET(birth_date, "1 w")
will work. But if I mistype, for STATS BY BUCKET(birth_date, "1w")
I'll get "... second argument of [BUCKET(birth_date, \"1w\")] must be [integral, date_period or time_duration] ..."
, which is not actually true with autocasting.
ES_TO_TYPE = Collections.unmodifiableMap(map); | ||
// DATETIME has different esType and typeName, add an entry in NAME_TO_TYPE with date as key | ||
map = TYPES.stream().collect(toMap(DataType::typeName, t -> t)); | ||
map.put("date", DataType.DATETIME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not go into NAME_OR_ALIAS_TO_TYPE
below? date
is an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into NAME_OR_ALIAS_TO_TYPE
a bit further, and it is used by inline_cast
only for now. It should have date
I think, so that we can support ::date
, today only ::datetime
is supported. I opened a separate issue to do it, as it requires additional tests, and I'd like to address it separately.
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Show resolved
Hide resolved
private static Expression castStringLiteralToTemporalAmount(Expression from) { | ||
try { | ||
TemporalAmount result = maybeParseTemporalAmount(from.fold().toString().strip()); | ||
if (result == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is null
acceptable in any case here?
I find reading the flow of calls a bit "uneven": castString...()
will try to maybeParse...()
, which if it returns null
is OK, but if it can't, it'll throw. Can maybeParse...()
throw on empty value? Do we need to "maybe" parse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybeParseTemporalAmount
might return a null, if the literal string does not follow the format of a temporal(there is space between integer and unit), for example "1", "1w", "" etc. It doesn't throw on empty value. The verification of a non temporal format string is deferred to resolveType()
to return a proper message. Here are a few examples that the error message is from resolveType()
:
stats max(emp_no) by bucket(emp_no, "5")
returns second argument of [bucket(emp_no, "5")] must be [numeric], found value ["5"] type [keyword]
stats max(emp_no) by bucket(hire_date, "5")
returns second argument of [bucket(hire_date, "5")] must be [integral, date_period or time_duration], found value ["5"] type [keyword]
stats max(emp_no) by bucket(hire_date, "5w")
returns second argument of [bucket(hire_date, "5w")] must be [integral, date_period or time_duration], found value ["5w"] type [keyword]
stats max(emp_no) by bucket(hire_date, "")
returns second argument of [bucket(hire_date, "")] must be [integral, date_period or time_duration], found value [""] type [keyword]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't throw on empty value.
Yes, I was wondering if it could.
The (small, can be ignored ;) ) issue I see is that maybeParseTemporalAmount()
is only invoked if isTemporalAmount()
. Meaning that it has to be one.
But the failure case is not treated uniformly:
- If it's not well formatted because there's just one token (or zero, but that's just theoretical, I think, b/c of the grammar), it returns a
null
. - If it's not well formatted because the first token isn't a number, or the second one isn't a defined unit, it'll throw.
And I'm not sure why, since also in the former case, as written, it has to be a temporal amount and it cannot be just one token (or empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't throw on empty value.
Yes, I was wondering if it could.
The (small, can be ignored ;) ) issue I see is that
maybeParseTemporalAmount()
is only invoked ifisTemporalAmount()
. Meaning that it has to be one. But the failure case is not treated uniformly:
- If it's not well formatted because there's just one token (or zero, but that's just theoretical, I think, b/c of the grammar), it returns a
null
.- If it's not well formatted because the first token isn't a number, or the second one isn't a defined unit, it'll throw.
And I'm not sure why, since also in the former case, as written, it has to be a temporal amount and it cannot be just one token (or empty).
Thanks for explaining it, now I understand the comments better! The validations of Bucket
's second argument happen in two places, one in ImplicitCasting
, the other in Bucket.resolveType()
. Bucket
is kind of unique, as its second argument - buckets
's data type depends on its first argument - field's data type, and it is checked in resolveType()
, ImplicitCasting
doesn't check the data type of the other arguments in the same function in order to decide which data type to convert to.
If it has zero or one token - ImplicitCasting
keeps it unchanged, and defer it to Bucket.resolveType()
throws a proper error message depending on the data type of its first argument, the error message may suggest a temporal or not - this is just for the purpose of getting a proper error message.
If it has more than one token - ImplicitCasting
validates the string, and throw error for invalid values.
I guess we'll also want to update the table here, to include grouping function(s)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Since ImplicitCasting is about string literals, it should not interact with union-types, which is only about FieldAttributes. As a separate concern, it might be nice to have tests that use both union-types and implicit casting in the same test, but I do not expect there to be any issues.
Yes, this table is updated by this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I didn't review the PR in full, just chiming in here on the discussion of the resolution optimizer batch.
Regarding union types and the placement of ImplicitCasting, I agree with Craig that this shouldn't affect things. In fact, I believe that, in an ideal world, their relative placement in the resolution batch shouldn't even matter.
I left a comment because depending on the order here may not be solving all edge cases. Please don't let this stop you from merging this PR - but if we end up solving this by amending the verification/resolution of STATS (so that we don't prematurely mark STATS as unresolveable in case ImplicitCasting wasn't applied before resolution), we may be able to preserve the original order of the resolution rules.
Sorry, I wasn't clear enough. The table (in this PR) mentions the following functions: |
Oh, the table header is changed to this: |
Thanks for reviewing @alex-spies ! I have some ideas about what is going on with the example you shared, I was afraid that this is a regression, and then I found it happened on main as well. I agreed that although moving I don't have a good idea of a quick fix for it yet, I'm thinking of adding another attribute(or an equivalent machnism) in the |
💔 Backport failedThe backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…ction and GroupingFunction (elastic#115814) * implicit casting from string literals to datetime intervals (cherry picked from commit b37a829)
…ction and GroupingFunction (elastic#115814) * implicit casting from string literals to datetime intervals
…ction and GroupingFunction (elastic#115814) * implicit casting from string literals to datetime intervals
Resolves #115352
Previously, we support
ImplicitCasting
from string literals toDateTime
,IP
,Version
andBoolean
forEsqlScalarFunction
EsqlArithmeticOperation
andBinaryComparison
. This PR expands the scope ofImplicitCasting
from string literals toTemporal
types -DATE_PERIOD
andTIME_DURATION
in the following scenarios:GroupingFunction
-Bucket
takesTemporal
as inputEsqlScalarFunction
-Date_Trunc
andNeg
takeTemporal
as inputThis also enables named parameters for
DATE_PERIOD
andTIME_DURATION
without having to specify their types the either the query or the params.Here are some examples:
The support to
GroupingFunction
-Bucket
is newly added.ImplicitCasting
is moved beforeResolveRefs
, because a reference(UnresolvedAttribute) is created for aBucket
inAggregate
, resolving this reference beforeImplicitCasting
may cause this reference to havecustomMessage=true
, and it prevents further attempts to resolve this reference. IfBucket
's type is resolved byImplicitCasting
beforeResolveRefs
,ResolveRefs
can resolve the reference, and it won't be marked ascustomMessage=true
.EsqlFunctionRegistry
is updated to supportTemporal
as target data types.EsqlDataTypeConverter
is refactored to support theImplicitCasting
toTemporal
. There are twoTemporal
types -DATE_PERIOD
andTIME_DURATION
, when users provide a string without data type, we have to derive the correct type(whether it is a period or duration) according to its qualifier. The existing methods inEsqlDataTypeConverter
couldn't do it.parseTemporalAmount(Expression e)
is created for this purpose.Rate
is anAggregateFunction
under snapshot, it also takesTemporal
as input, however it is not covered by this PR. This PR doesn't enableImplicitCasting
forAggregateFunction
, it will be evaluated when there is a need.Implicit casting from string literals to
Temporal
is not supported in arithmetic operations(+
or-
) to reduce confusions.Updated implicit casting docs. Reorganized the implicit casting table with these changes below:
BinaryComparison
,ArithmeticOperation
andInListPredicate
intoOperator
.GroupingFunction
.GEO_POINT
andGEO_SHAPE
, as there are quite a few geo spatial functions are notEsqlScalarFunction
s, and we don't support the implicit casting for those functions yet, so these types are not fully supported by implicit casting yet. If they are required, they can be added later.DATE_PERIOD
andTIME_DURATION
ES|QL: document date_period and time_duration data types #112592, that I didn't forget, this document will be added in a separate PR.