Skip to content

Conversation

fang-xing-esql
Copy link
Member

@fang-xing-esql fang-xing-esql commented Oct 29, 2024

Resolves #115352

Previously, we support ImplicitCasting from string literals to DateTime, IP, Version and Boolean for EsqlScalarFunction EsqlArithmeticOperation and BinaryComparison. This PR expands the scope of ImplicitCasting from string literals to Temporal types - DATE_PERIOD and TIME_DURATION in the following scenarios:

GroupingFunction - Bucket takes Temporal as input
EsqlScalarFunction - Date_Trunc and Neg take Temporal as input

This also enables named parameters for DATE_PERIOD and TIME_DURATION without having to specify their types the either the query or the params.

Here are some examples:

FROM sample_data
| STATS min = min(@timestamp) BY bucket = BUCKET(@timestamp, "30 minutes")

FROM sample_data
| EVAL t = DATE_TRUNC("1 day", @timestamp)

{
  "query": "from sample_data | stats min = min(@timestamp) BY bucket = BUCKET(@timestamp, ?interval) | SORT min",
  "params": [{"interval": "1 minute"}]
}

{
  "query": "from sample_data | eval x = date_trunc(?interval, @timestamp)",
  "params": [{"interval": "1 minute"}]
}
  1. The support to GroupingFunction - Bucket is newly added. ImplicitCasting is moved before ResolveRefs, because a reference(UnresolvedAttribute) is created for a Bucket in Aggregate, resolving this reference before ImplicitCasting may cause this reference to have customMessage=true, and it prevents further attempts to resolve this reference. If Bucket's type is resolved by ImplicitCasting before ResolveRefs, ResolveRefs can resolve the reference, and it won't be marked as customMessage=true.

  2. EsqlFunctionRegistry is updated to support Temporal as target data types.

  3. EsqlDataTypeConverter is refactored to support the ImplicitCasting to Temporal. There are two Temporal types - DATE_PERIOD and TIME_DURATION, when users provide a string without data type, we have to derive the correct type(whether it is a period or duration) according to its qualifier. The existing methods in EsqlDataTypeConverter couldn't do it. parseTemporalAmount(Expression e) is created for this purpose.

  4. Rate is an AggregateFunction under snapshot, it also takes Temporal as input, however it is not covered by this PR. This PR doesn't enable ImplicitCasting for AggregateFunction, it will be evaluated when there is a need.

  5. Implicit casting from string literals to Temporal is not supported in arithmetic operations(+ or -) to reduce confusions.

  6. Updated implicit casting docs. Reorganized the implicit casting table with these changes below:

  • Combined BinaryComparison, ArithmeticOperation and InListPredicate into Operator.
  • Added GroupingFunction.
  • Removed numeric types to reduce confusion, as we don't want to promote string literal to numeric casting automatically.
  • Removed GEO_POINT and GEO_SHAPE, as there are quite a few geo spatial functions are not EsqlScalarFunctions, and we don't support the implicit casting for those functions yet, so these types are not fully supported by implicit casting yet. If they are required, they can be added later.
  1. I owe a document about the usage of DATE_PERIOD and TIME_DURATION ES|QL: document date_period and time_duration data types #112592, that I didn't forget, this document will be added in a separate PR.

@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql
Copy link
Member Author

@elasticmachine update branch

@fang-xing-esql fang-xing-esql added auto-backport Automatically create backport pull requests when merged ES|QL-ui Impacts ES|QL UI labels Oct 30, 2024
@fang-xing-esql fang-xing-esql marked this pull request as ready for review October 30, 2024 21:44
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 30, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@stratoula
Copy link

Aren't you the greatest? ❤️

@fang-xing-esql
Copy link
Member Author

@elasticmachine update branch

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some first comments.

Generally, IMO this functionality can be interesting, but starting on this path, I'm thinking "BUCKET(datetime, "1 day") now works, but how about BUCKET(float, "100")" or "EVAL x = "1953-09-02T00:00:00.000Z" + "2 days", but how about EVAL x = "1" + 2"?"
And if we'd consider autocasting for numbers, we might need to do it for all functions. IPs?
I guess it's all doable, just wondering if we commit to it now.

@fang-xing-esql
Copy link
Member Author

Thanks for reviewing @bpintea @astefan !! Your comments have been addressed, and doc changes are also included in this PR. If you could take another look when you get a chance, it will be very appreciated. Thank you!

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I've left few comments. The one that I'd like you to have a careful look at is the one of getTargetType method that returns an UNSUPPORTED data type.

* resolving this reference before implicit casting may cause this reference to have customMessage=true, it prevents further
* attempts to resolve this reference.
*/
new ImplicitCasting(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union types was/is tricky, too much back and forth and too many places where things are handled carefully. If @craigtaverner and @alex-spies are ok with this change, it's ok with me as well; they know best the union types area.

// If the function takes strings as input, there is no need to cast a string literal to it.
// Return UNSUPPORTED, so that ImplicitCasting doesn't process it.
if (isString(type)) {
return UNSUPPORTED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange way of saying that there shouldn't be an automatic conversion/casting for a specific data type.

Returning here a data type UNSUPPORTED can have double meaning:

  • the actual target type is UNSUPPORTED
  • the conversion shouldn't take place

I know this code existed before this PR, it might work and get the job done, but it's a confusing code; it is not clearly expressing the intention.

If this method - getTargetType - would have lived inside Analyzer and be private I would have been ok with it, but it's a public method, it can be accessed from anywhere in code and this means that this code duality I mentioned above can be confusing to anyone else wanting to re-use it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange way of saying that there shouldn't be an automatic conversion/casting for a specific data type.

Returning here a data type UNSUPPORTED can have double meaning:

  • the actual target type is UNSUPPORTED
  • the conversion shouldn't take place

Understood, UNSUPPORTED is used to indicate that implicit casting is not supported for this argument here. I was thinking of using NULL, however it may lead people to interpret it as casting to a NULL. Before having a better solution, I'll add more comments here to indicate the purpose of using UNSUPPORTED here, perhaps we may want to add a flag in ArgSignature to indicate whether to do implicit casting for this argument or not, with a targetType if yes, but it adds one more attribute to ArgSignature, I'm not quite sure if it worths it or not.

If this method - getTargetType - would have lived inside Analyzer and be private I would have been ok with it, but it's a public method, it can be accessed from anywhere in code and this means that this code duality I mentioned above can be confusing to anyone else wanting to re-use it.

I can make getTargetType a private method in EsqlFunctionRegistry. Actually Analyzer doesn't call it, it is public because AbstractFunctionTestCase calls it. AbstractFunctionTestCase generates docs for automatically from annotations, however not all of the functions are registered in EsqlFunctionRegistry, like operators, so AbstractFunctionTestCase calls some EsqlFunctionRegistry methods separately to generate docs for them. I'll make it a private method here, as this target type is not externalized in the generated docs.

@fang-xing-esql
Copy link
Member Author

@craigtaverner and @alex-spies Could you help me take a look at the change regarding to moving ImplicitCasting before ResolveRefs in Analyzer from union type's perspective? ImplicitCasting was after ResolveRefs and ResolveUnionTypes before this PR.

I don't recall whether there is a specific reason that ImplicitCasting has to be after ResolveUnionTypes, promoting it before ResolveRefs is mainly for resolving Bucket, which is referenced in an aggregation function. Existing tests all passed, and I couldn't think of a scenario that we have to keep the original order.

Thank you!

@fang-xing-esql
Copy link
Member Author

LGTM, but I've left few comments. The one that I'd like you to have a careful look at is the one of getTargetType method that returns an UNSUPPORTED data type.

Thank you @astefan ! I'll wait for a second pair of eyes from union type's perspective.

@fang-xing-esql fang-xing-esql removed the request for review from costin November 8, 2024 19:14
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits.
I find the messaging around implicit casting a bit inconsistent, though if we'll want to improve this, we can do it subsequently. So:
STATS BY BUCKET(birth_date, "1 w") will work. But if I mistype, for STATS BY BUCKET(birth_date, "1w") I'll get "... second argument of [BUCKET(birth_date, \"1w\")] must be [integral, date_period or time_duration] ...", which is not actually true with autocasting.

ES_TO_TYPE = Collections.unmodifiableMap(map);
// DATETIME has different esType and typeName, add an entry in NAME_TO_TYPE with date as key
map = TYPES.stream().collect(toMap(DataType::typeName, t -> t));
map.put("date", DataType.DATETIME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not go into NAME_OR_ALIAS_TO_TYPE below? date is an alias.

Copy link
Member Author

@fang-xing-esql fang-xing-esql Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into NAME_OR_ALIAS_TO_TYPE a bit further, and it is used by inline_cast only for now. It should have date I think, so that we can support ::date, today only ::datetime is supported. I opened a separate issue to do it, as it requires additional tests, and I'd like to address it separately.

private static Expression castStringLiteralToTemporalAmount(Expression from) {
try {
TemporalAmount result = maybeParseTemporalAmount(from.fold().toString().strip());
if (result == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is null acceptable in any case here?
I find reading the flow of calls a bit "uneven": castString...() will try to maybeParse...(), which if it returns null is OK, but if it can't, it'll throw. Can maybeParse...() throw on empty value? Do we need to "maybe" parse?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybeParseTemporalAmount might return a null, if the literal string does not follow the format of a temporal(there is space between integer and unit), for example "1", "1w", "" etc. It doesn't throw on empty value. The verification of a non temporal format string is deferred to resolveType() to return a proper message. Here are a few examples that the error message is from resolveType():

stats max(emp_no) by bucket(emp_no, "5") returns second argument of [bucket(emp_no, "5")] must be [numeric], found value ["5"] type [keyword]

stats max(emp_no) by bucket(hire_date, "5") returns second argument of [bucket(hire_date, "5")] must be [integral, date_period or time_duration], found value ["5"] type [keyword]

stats max(emp_no) by bucket(hire_date, "5w") returns second argument of [bucket(hire_date, "5w")] must be [integral, date_period or time_duration], found value ["5w"] type [keyword]

stats max(emp_no) by bucket(hire_date, "") returns second argument of [bucket(hire_date, "")] must be [integral, date_period or time_duration], found value [""] type [keyword]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't throw on empty value.

Yes, I was wondering if it could.

The (small, can be ignored ;) ) issue I see is that maybeParseTemporalAmount() is only invoked if isTemporalAmount(). Meaning that it has to be one.
But the failure case is not treated uniformly:

  • If it's not well formatted because there's just one token (or zero, but that's just theoretical, I think, b/c of the grammar), it returns a null.
  • If it's not well formatted because the first token isn't a number, or the second one isn't a defined unit, it'll throw.

And I'm not sure why, since also in the former case, as written, it has to be a temporal amount and it cannot be just one token (or empty).

Copy link
Member Author

@fang-xing-esql fang-xing-esql Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't throw on empty value.

Yes, I was wondering if it could.

The (small, can be ignored ;) ) issue I see is that maybeParseTemporalAmount() is only invoked if isTemporalAmount(). Meaning that it has to be one. But the failure case is not treated uniformly:

  • If it's not well formatted because there's just one token (or zero, but that's just theoretical, I think, b/c of the grammar), it returns a null.
  • If it's not well formatted because the first token isn't a number, or the second one isn't a defined unit, it'll throw.

And I'm not sure why, since also in the former case, as written, it has to be a temporal amount and it cannot be just one token (or empty).

Thanks for explaining it, now I understand the comments better! The validations of Bucket's second argument happen in two places, one in ImplicitCasting, the other in Bucket.resolveType(). Bucket is kind of unique, as its second argument - buckets's data type depends on its first argument - field's data type, and it is checked in resolveType(), ImplicitCasting doesn't check the data type of the other arguments in the same function in order to decide which data type to convert to.

If it has zero or one token - ImplicitCasting keeps it unchanged, and defer it to Bucket.resolveType() throws a proper error message depending on the data type of its first argument, the error message may suggest a temporal or not - this is just for the purpose of getting a proper error message.

If it has more than one token - ImplicitCasting validates the string, and throw error for invalid values.

@bpintea
Copy link
Contributor

bpintea commented Nov 11, 2024

I guess we'll also want to update the table here, to include grouping function(s)?

Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Since ImplicitCasting is about string literals, it should not interact with union-types, which is only about FieldAttributes. As a separate concern, it might be nice to have tests that use both union-types and implicit casting in the same test, but I do not expect there to be any issues.

@fang-xing-esql
Copy link
Member Author

I guess we'll also want to update the table here, to include grouping function(s)?

Yes, this table is updated by this PR.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I didn't review the PR in full, just chiming in here on the discussion of the resolution optimizer batch.

Regarding union types and the placement of ImplicitCasting, I agree with Craig that this shouldn't affect things. In fact, I believe that, in an ideal world, their relative placement in the resolution batch shouldn't even matter.

I left a comment because depending on the order here may not be solving all edge cases. Please don't let this stop you from merging this PR - but if we end up solving this by amending the verification/resolution of STATS (so that we don't prematurely mark STATS as unresolveable in case ImplicitCasting wasn't applied before resolution), we may be able to preserve the original order of the resolution rules.

@bpintea
Copy link
Contributor

bpintea commented Nov 13, 2024

I guess we'll also want to update the table here, to include grouping function(s)?

Yes, this table is updated by this PR.

Sorry, I wasn't clear enough. The table (in this PR) mentions the following functions:
ScalarFunction | BinaryComparison | ArithmeticOperation | InListPredicate | AggregateFunction
BUCKET (a grouping, not aggregate) is also affected by the PR, but not in the list.

@fang-xing-esql
Copy link
Member Author

I guess we'll also want to update the table here, to include grouping function(s)?

Yes, this table is updated by this PR.

Sorry, I wasn't clear enough. The table (in this PR) mentions the following functions: ScalarFunction | BinaryComparison | ArithmeticOperation | InListPredicate | AggregateFunction BUCKET (a grouping, not aggregate) is also affected by the PR, but not in the list.

Oh, the table header is changed to this:
||ScalarFunction*|Operator*|<<esql-group-functions, GroupingFunction>>|<<esql-agg-functions, AggregateFunction>>

@fang-xing-esql
Copy link
Member Author

fang-xing-esql commented Nov 13, 2024

Disclaimer: I didn't review the PR in full, just chiming in here on the discussion of the resolution optimizer batch.

Regarding union types and the placement of ImplicitCasting, I agree with Craig that this shouldn't affect things. In fact, I believe that, in an ideal world, their relative placement in the resolution batch shouldn't even matter.

I left a comment because depending on the order here may not be solving all edge cases. Please don't let this stop you from merging this PR - but if we end up solving this by amending the verification/resolution of STATS (so that we don't prematurely mark STATS as unresolveable in case ImplicitCasting wasn't applied before resolution), we may be able to preserve the original order of the resolution rules.

Thanks for reviewing @alex-spies ! I have some ideas about what is going on with the example you shared, I was afraid that this is a regression, and then I found it happened on main as well. I agreed that although moving ImplicitCasting before ResolveRefs solves the problem for Bucket with a field, however it doesn't solve the problem for the other situations that involves eval like your example shows. resolveAggregate may mark a reference with a customMessage, and prevent it from being resolved further, this happens not only to Bucket.

I don't have a good idea of a quick fix for it yet, I'm thinking of adding another attribute(or an equivalent machnism) in the UnresolvedAttribute to indicate - please retry resolving this UnresolvedAttribute, other than not considering resolving it when there is a customMessage. However, considering that UnresolvedAttribute is used quite pervasively, I'm leaning toward addressing it separately. I'm going to open an issue for the example you showed if there isn't one yet.

@fang-xing-esql fang-xing-esql changed the title [ES|QL] Implicit casting string literal to intervals [ES|QL] Implicit casting string literal to intervals in EsqlScalarFunction and GroupingFunction Nov 13, 2024
@fang-xing-esql fang-xing-esql merged commit b37a829 into elastic:main Nov 13, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 115814

@fang-xing-esql
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

fang-xing-esql added a commit to fang-xing-esql/Elasticsearch that referenced this pull request Nov 13, 2024
…ction and GroupingFunction (elastic#115814)

* implicit casting from string literals to datetime intervals

(cherry picked from commit b37a829)
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Nov 13, 2024
…ction and GroupingFunction (elastic#115814)

* implicit casting from string literals to datetime intervals
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
…ction and GroupingFunction (#115814) (#116782)

* implicit casting from string literals to datetime intervals

(cherry picked from commit b37a829)
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…ction and GroupingFunction (elastic#115814)

* implicit casting from string literals to datetime intervals
@fang-xing-esql fang-xing-esql deleted the implicit-casting-string-literal-to-intervals branch December 5, 2024 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ES|QL] Support implicit casting from string literals to intervals - Date_Period and Time_Duration in EsqlQueryRequest params

8 participants