Blog post on query cancellation #75
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,328 @@ | ||||||||
| # Query Cancellation | ||||||||
|
|
||||||||
| ## The Challenge of Cancelling Long-Running Queries | ||||||||
|
|
||||||||
| Have you ever tried to cancel a query that just wouldn't stop? | ||||||||
| In this post, we'll take an in-depth look at why that can happen in DataFusion and what the community did to resolve the problem. | ||||||||
|
|
||||||||
| ### Understanding Rust's Async Model | ||||||||
|
|
||||||||
| To really understand the cancellation problem you need to be somewhat familiar with Rust's asynchronous programming model. | ||||||||
|
||||||||
| DataFusion, somewhat unconventionally, [uses the Rust async system and the tokio task pool](https://docs.rs/datafusion/latest/datafusion/#thread-scheduling-cpu--io-thread-pools-and-tokio-runtimes) for CPU-intensive processing. | |
| To really understand the cancellation problem you need to understand the DataFusion execution model, and thus Rust's asynchronous programming model. |
alamb marked this conversation as resolved.
I was looking for docs on poll_proceed and made_progress and I'm wondering if those functions have been renamed to has_budget_remaining and consume_budget recently? https://docs.rs/tokio/latest/tokio/task/coop/index.html#functions
Or, am I looking in the wrong place?
poll_proceed/made_progress are the functions Tokio itself uses to consume budget in resources like Channel. consume_budget is implemented using those two as well.
At the moment they're still pub(crate) in tokio. I'm working on a PR to make them accessible at tokio-rs/tokio#7405. I wrote this post under the assumption that this will get merged. We shouldn't publish this post until that actually lands.
The DataFusion PR apache/datafusion#16398 currently contains three variants: one with the manual counter, one using has_budget_remaining/consume_budget as approximation, and one using poll_proceed/made_progress which doesn't compile just yet.
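For readers following the thread, here is a minimal, std-only sketch of what a "manual counter" variant could look like in principle. It is not the code from the DataFusion PR and it does not use Tokio's budget functions; `BudgetedSum`, `BUDGET`, and `run_counting_yields` are hypothetical names invented for this illustration. The idea shown is only that a CPU-bound `poll` returns `Poll::Pending` after a fixed amount of work, so the executor regains control and has a chance to observe cancellation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical budget: units of work allowed per `poll` call before yielding.
const BUDGET: usize = 128;

/// A CPU-bound future that sums `next..end`, yielding every `BUDGET` items.
struct BudgetedSum {
    next: u64,
    end: u64,
    total: u64,
}

impl Future for BudgetedSum {
    type Output = u64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        // All fields are `Unpin`, so a plain mutable reference is fine here.
        let this = &mut *self;
        let mut budget = BUDGET;
        while this.next < this.end {
            if budget == 0 {
                // Budget exhausted: ask to be polled again, then yield.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            budget -= 1;
            this.total += this.next;
            this.next += 1;
        }
        Poll::Ready(this.total)
    }
}

/// Waker that does nothing; the toy executor below simply polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: drives a future to completion and counts how often it yielded.
fn run_counting_yields<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut yields = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            // A real runtime could drop the future here if the task was cancelled.
            Poll::Pending => yields += 1,
        }
    }
}

fn main() {
    let (total, yields) = run_counting_yields(BudgetedSum { next: 0, end: 1000, total: 0 });
    println!("total = {total}, yields = {yields}");
}
```

Each `Poll::Pending` is a point where a runtime may stop polling a cancelled task. A fixed counter like this is independent of the runtime's own budget accounting, which is the gap the Tokio functions discussed in this thread are meant to close.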
Thanks for the context!
I think it would also help here in the intro to emphasize other learnings a potential reader might get (to convince them to read more):
So maybe something like