Commit 42b7527
authored
feat: run VortexFileArrayStream on dedicated IoDispatcher (#1232)
This PR separates IO for the `LayoutBatchStream` from the calling
thread's runtime. This is beneficial for a few reasons
1. Now decoding into Arrow will not block the executor where IO occurs,
which previously could lead to timeouts
2. It allows us to use non-Tokio runtimes such as compio which uses
io_uring
To enable (2), we revamp the `VortexReadAt ` trait to return `!Send`
futures that can be launched on thread-per-core runtimes. We also
enforce the `VortexReadAt` implementers are clonable, again so they can
be shared across several concurrent futures. We implement a clonable
`TokioFile` type, and un-implement for things like `Vec<u8>` and `&[u8]`
that don't have cheap cloning.
Although read operations are now `!Send`, it is preferable
for`LayoutBatchStream` to be `Send` so that it can be polled from Tokio
or any other runtime. To achieve this, we add an `IoDispatcher`, which
sits in front of either a tokio or compio executor to dispatch read
requests.
A couple of things to consider:
* **Cancellation safety**: On Drop, the dispatcher will join its worker
threads, gracefully awaiting any already-submitted tasks before it
completes. It's probably more preferable to find a way to force an
immediate shutdown, so we may want to ferry a `shutdown_now` channel and
`select!` over both of them inside of the inner loop of the dispatcher
* **Async runtime safety**: The way you can implement cross-runtime IO
is that you provide both a reader (`VortexReadAt`) and a `IoDispatcher`
to the `LayoutBatchStreamBuilder`. There is nothing at compile time that
enforces that the reader and the dispatcher are compatible. Perhaps we
should parameterize `VortexReadAt` on a runtime and then we can do
type-level checks to ensure that all IO operations can only be paired
with the right runtime. That would likely inflate the PR a good bit1 parent 437392b commit 42b7527
File tree
36 files changed
+1174
-530
lines changed- .github/workflows
- .zed
- bench-vortex
- benches
- src
- pyvortex
- src
- vortex-array/src
- vortex-serde
- src
- chunked_reader
- file
- read
- builder
- layouts
- io
- dispatcher
- messages
- stream_writer
36 files changed
+1174
-530
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
6 | | - | |
7 | | - | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | | - | |
13 | | - | |
| 12 | + | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
| |||
140 | 140 | | |
141 | 141 | | |
142 | 142 | | |
143 | | - | |
| 143 | + | |
144 | 144 | | |
145 | 145 | | |
146 | 146 | | |
| |||
182 | 182 | | |
183 | 183 | | |
184 | 184 | | |
185 | | - | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
0 commit comments