[DON'T MERGE] Proof of Concept: ES|QL approximate query execution #131828
base: main
Just a shallow check, but it makes sense to me. I would wait for someone else's opinion, though, in case this extra query could cause problems somewhere.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/approximate/Approximate.java
* off at the leftmost STATS function, followed by "| STATS COUNT(*)".
* This value can be used to pick a good sample probability.
*/
public LogicalPlan countPlan() {
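The javadoc fragment above describes the count plan as the query cut at the leftmost STATS, followed by `| STATS COUNT(*)`. The sketch below illustrates only that truncation on a toy model. It is hypothetical: the class name `CountPlanSketch` and the list-of-command-strings representation are assumptions for illustration, not the real ES|QL `LogicalPlan` tree or the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * HYPOTHETICAL toy model: a query is a list of pipe-separated command
 * strings, not the real ES|QL LogicalPlan tree.
 */
public final class CountPlanSketch {

    /** Keeps every command before the leftmost STATS and appends STATS COUNT(*). */
    static List<String> countPlan(List<String> commands) {
        List<String> result = new ArrayList<>();
        for (String command : commands) {
            if (command.startsWith("STATS ")) {
                break; // cut off at the leftmost STATS; everything after it is dropped
            }
            result.add(command);
        }
        result.add("STATS COUNT(*)"); // count the rows that would reach that STATS
        return result;
    }

    public static void main(String[] args) {
        List<String> query = List.of(
                "FROM data",
                "WHERE my_function(x) < 1",
                "STATS c = COUNT() BY y",
                "SORT c");
        // prints: FROM data | WHERE my_function(x) < 1 | STATS COUNT(*)
        System.out.println(String.join(" | ", countPlan(query)));
    }
}
```

Note that the WHERE (and any EVAL) before the STATS is kept, which is exactly why this count query can cost about as much as the original query.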
This extra query is probably my main "concern". It looks OK, but it still executes the EVALs and WHEREs, which could end up running the full query anyway (?). It looks a bit "dangerous".
One idea: could we use some kind of Lucene statistics for this? I don't know whether we have them, or whether what we have is enough. Even if they were only approximate, they might let us avoid this extra query. That would be a separate block of work, though.
I get your concern. That's exactly why I wanted some early feedback.
The extra query is pretty similar to the extra query of the inline join subplan though.
In the case of
FROM data | STATS COUNT()
I guess we can get the count directly from Lucene.
But for a more complicated
FROM data | WHERE my_function(x) < 1 | STATS COUNT()
that's obviously not possible.
We can use sampling again, though, to get an approximate count, which is good enough for setting the probability.
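The sampling idea in this comment can be sketched as follows, assuming Bernoulli sampling (each document kept independently with probability `p`): evaluate the filter only on sampled documents, then scale the matching count by `1/p`. Everything here is hypothetical: `SampledCountSketch`, `estimateCount`, modelling documents as plain `double` values, and the predicate standing in for `my_function(x) < 1`. It is not the PR's code.

```java
import java.util.Random;
import java.util.function.DoublePredicate;

/** HYPOTHETICAL sketch: estimate a filtered count from a Bernoulli sample. */
public final class SampledCountSketch {

    /** Keeps each doc with probability p, counts filter matches, scales by 1/p. */
    static double estimateCount(double[] docs, DoublePredicate filter, double p, long seed) {
        Random random = new Random(seed);
        long sampledMatches = 0;
        for (double doc : docs) {
            // Short-circuit: the (possibly expensive) filter only runs on sampled docs.
            if (random.nextDouble() < p && filter.test(doc)) {
                sampledMatches++;
            }
        }
        return sampledMatches / p; // inverse-probability scaling to the full population
    }

    public static void main(String[] args) {
        Random data = new Random(42);
        double[] docs = new double[100_000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = data.nextDouble() * 2; // uniform in [0, 2)
        }
        DoublePredicate filter = x -> x < 1; // stand-in for my_function(x) < 1
        long exact = 0;
        for (double doc : docs) {
            if (filter.test(doc)) {
                exact++;
            }
        }
        double estimate = estimateCount(docs, filter, 0.02, 7);
        System.out.println("exact=" + exact + " estimate=" + estimate);
    }
}
```

With `p = 0.02` the filter runs on roughly 2% of the documents, and the estimate is typically within a few percent of the exact count, which is plenty for picking a sample probability.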
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/approximate/Approximate.java
Proof of concept for approximate query execution
This is for gathering early feedback; not for merging!
This is targeting queries of the form
Approximating rewrites it to
The sample probability is chosen so that the approximate results are based on ~1000 docs. It is determined from the total result count:
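Assuming the probability is simply the ~1000-doc target divided by that total count, capped at 1, the computation could look like the sketch below. The class, method, and constant names are hypothetical, and the PR's exact formula may differ.

```java
/**
 * HYPOTHETICAL sketch. ASSUMPTION: p = min(1, TARGET_DOCS / totalCount),
 * with TARGET_DOCS = 1000 taken from the "~1000 docs" in the description.
 */
public final class SampleProbabilitySketch {

    static final long TARGET_DOCS = 1000;

    static double sampleProbability(long totalCount) {
        if (totalCount <= TARGET_DOCS) {
            return 1.0; // few rows: no sampling needed, the query runs exactly
        }
        return (double) TARGET_DOCS / totalCount; // expected sample size is ~TARGET_DOCS
    }

    public static void main(String[] args) {
        long total = 4675;
        double p = sampleProbability(total);
        System.out.println("p=" + p + " expectedSampledDocs=" + (p * total));
    }
}
```

Capping at 1 keeps small result sets exact instead of producing a probability above 1.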
You can use this as follows
With "approximate": false, the (correct) results are as follows (based on "documents_found": 4675):
With "approximate": true, the (approximate) results look like this (based on "documents_found": 990):
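As a rough consistency check on those two document counts, assume Bernoulli sampling over the 4675 documents with p = 1000/4675 (both assumptions, not stated in the PR). The number of sampled documents N is then binomial:

$$
p = \frac{1000}{4675} \approx 0.2139,\qquad
\mathbb{E}[N] = 4675\,p = 1000,\qquad
\sigma_N = \sqrt{4675\,p\,(1-p)} = \sqrt{1000 \cdot 0.7861} \approx 28.0
$$

Under these assumptions, 990 documents is about 0.36 standard deviations below the expected 1000, well within normal sampling noise.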