Skip to content

Conversation

@astefan
Copy link
Contributor

@astefan astefan commented Dec 6, 2023

Adds Analyzer level default limit for certain queries using mv_expand.
Fixes #101266

Fixes other small issues with different types of queries using mv_expand.
Fixes #102084
Fixes #102061

@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Dec 6, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Hi @astefan, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

*
* from test
* | sort emp_no
* | limit 10000
Copy link
Contributor

@luigidellaquila luigidellaquila Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this doesn't work: adding a limit there changes the query semantics by filtering out candidate records too early.
In this specific case, the | sort emp_no will be overridden by | sort first_name, so it can be completely removed and the query will become

from test
| mv_expand first_name
| sort first_name
| limit 15

In general, the problem is not that we cannot push LIMIT past MV_EXPAND; the following transformation is absolutelly valid:
| mv_expand x | limit 10 -> | limit 10 | mv_expand x | limit 10
and also the default limit is valid as long as it's higher than the original limit:
| mv_expand x | limit 10 -> | limit 10000 | mv_expand x | limit 10

The real problem here is that we cannot completely remove the limit 10 after mv_expand, so we end up trying to push it down it again in an infinite loop.
My biggest concern is that even if we find a way to avoid it now, we'll eventually add new rules to the planner and we'll hardly take this infinite loop into account, so it will eventually happen again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this doesn't work: adding a limit there changes the query semantics by filtering out candidate records too early.

Agree with you, but as the planner works now - where a TopN is needed - a limit is required at that specific place in the query. As I mentioned in the code comments for now I decided to use the implicit limit.

In this specific case, the | sort emp_no will be overridden by | sort first_name

What do you mean "overriden"? The sorts do not change. If you meant "the sort first_name anyway changes the order of the final documents, so it makes sort emp_no obsolete" then this is not true. sort emp_no followed by mv_expand and then sort first_name has to happen in this specific order for things to make sense.

cannot push LIMIT past MV_EXPAN

I think this is a misunderstanding on your part. The code in the optimizer doesn't push limit, but it makes a copy of limit and pushes the copy. The limit after mv_expand doesn't disappear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "overriden"?

I mean that the following operations do not depend on the input order (none of the existing commands do, apart from LIMIT), so the fact that you sort records by emp_no doesn't make any difference: data will be re-sorted by first_name afterwards anyway

The code in the optimizer doesn't push limit, but it makes a copy of limit and pushes the copy

conceptually there is no big difference, you are adding a new limit before mv_expand (that can be considered a pushdown) and you are also preserving the original limit.

@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:QL (Deprecated) Meta label for query languages team label Jan 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@costin
Copy link
Member

costin commented Jan 3, 2024

I believe this is superseded by #103582 - if so, let's close this one.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

Projects

None yet

7 participants