Commit 42b7527

feat: run VortexFileArrayStream on dedicated IoDispatcher (#1232)
This PR separates IO for the `LayoutBatchStream` from the calling thread's runtime. This is beneficial for a few reasons:

1. Decoding into Arrow no longer blocks the executor where IO occurs, which previously could lead to timeouts.
2. It allows us to use non-Tokio runtimes such as compio, which uses io_uring.

To enable (2), we revamp the `VortexReadAt` trait to return `!Send` futures that can be launched on thread-per-core runtimes. We also require `VortexReadAt` implementers to be cloneable, so that they can be shared across several concurrent futures. We implement a cloneable `TokioFile` type, and remove the implementations for types like `Vec<u8>` and `&[u8]` that don't have cheap cloning.

Although read operations are now `!Send`, it is preferable for `LayoutBatchStream` to be `Send` so that it can be polled from Tokio or any other runtime. To achieve this, we add an `IoDispatcher`, which sits in front of either a tokio or compio executor to dispatch read requests.

A couple of things to consider:

* **Cancellation safety**: On Drop, the dispatcher joins its worker threads, gracefully awaiting any already-submitted tasks before it completes. It would probably be preferable to find a way to force an immediate shutdown, so we may want to ferry a `shutdown_now` channel and `select!` over both channels inside the inner loop of the dispatcher.
* **Async runtime safety**: To implement cross-runtime IO, you provide both a reader (`VortexReadAt`) and an `IoDispatcher` to the `LayoutBatchStreamBuilder`. Nothing at compile time enforces that the reader and the dispatcher are compatible. Perhaps we should parameterize `VortexReadAt` on a runtime, so that type-level checks can ensure that every IO operation is paired with the right runtime. That would likely inflate the PR a good bit.
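The dispatcher idea described above can be illustrated with a minimal, std-only sketch. This is not the PR's `IoDispatcher`: `SketchDispatcher`, `dispatch`, and `read_range` are hypothetical names, and a plain worker thread with `std::sync::mpsc` channels stands in for the tokio or compio executor. It only shows the general shape: callers hand a `Send` closure to a dedicated thread, the closure is free to create `!Send` state (here an `Rc`) on that thread, and `Drop` joins the worker after already-submitted tasks finish.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the PR's `IoDispatcher`; names and shape are assumptions.
type Task = Box<dyn FnOnce() + Send + 'static>;

pub struct SketchDispatcher {
    submit: Option<mpsc::Sender<Task>>,
    worker: Option<JoinHandle<()>>,
}

impl SketchDispatcher {
    pub fn new() -> Self {
        let (submit, inbox) = mpsc::channel::<Task>();
        let worker = thread::spawn(move || {
            // The loop ends once the sender is dropped and the queue is drained.
            for task in inbox {
                task();
            }
        });
        Self { submit: Some(submit), worker: Some(worker) }
    }

    /// Ship a `Send` closure to the worker thread and get a receiver for its result.
    /// The closure itself may build `!Send` values once it runs on the worker.
    pub fn dispatch<F, R>(&self, work: F) -> mpsc::Receiver<R>
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let task: Task = Box::new(move || {
            // Ignore the send error if the caller already dropped the receiver.
            let _ = tx.send(work());
        });
        self.submit
            .as_ref()
            .expect("sender is only taken in Drop")
            .send(task)
            .expect("worker thread exited unexpectedly");
        rx
    }
}

impl Drop for SketchDispatcher {
    fn drop(&mut self) {
        // Closing the channel lets the worker finish queued tasks and exit;
        // joining then waits for that graceful shutdown.
        drop(self.submit.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Hypothetical read operation: the `Rc` is `!Send` and never leaves the worker thread.
pub fn read_range(data: Vec<u8>, start: usize, len: usize) -> Vec<u8> {
    let local = Rc::new(data);
    local[start..start + len].to_vec()
}
```

A caller would create one `SketchDispatcher`, call `dispatch(|| read_range(bytes, offset, len))`, and wait on the returned receiver. Because `Drop` blocks until queued work completes, this sketch has the same graceful-shutdown behavior that the cancellation-safety note describes; an immediate shutdown would need an extra signal checked inside the worker loop.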
1 parent 437392b commit 42b7527

File tree

36 files changed

+1174
-530
lines changed

.github/workflows/ci.yml

Lines changed: 6 additions & 7 deletions
@@ -2,15 +2,15 @@ name: CI
 
 on:
   push:
-    branches: [ "develop" ]
-  pull_request: { }
-  workflow_dispatch: { }
+    branches: ["develop"]
+  pull_request: {}
+  workflow_dispatch: {}
 
 permissions:
   actions: read
   contents: read
-  checks: write # audit-check creates checks
-  issues: write # audit-check creates issues
+  checks: write # audit-check creates checks
+  issues: write # audit-check creates issues
 
 env:
   CARGO_TERM_COLOR: always
@@ -140,7 +140,7 @@ jobs:
     name: "miri"
     runs-on: ubuntu-latest
     env:
-      MIRIFLAGS: -Zmiri-strict-provenance -Zmiri-symbolic-alignment-check -Zmiri-backtrace=full
+      MIRIFLAGS: -Zmiri-strict-provenance -Zmiri-symbolic-alignment-check -Zmiri-backtrace=full -Zmiri-disable-isolation
     steps:
       - uses: rui314/setup-mold@v1
       - uses: actions/checkout@v4
@@ -182,4 +182,3 @@ jobs:
       - name: "Make sure no files changed after regenerating"
         run: |
           test -z "$(git status --porcelain)"
-

.zed/tasks.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// Static tasks configuration.
+//
+// Example:
+[
+  {
+    "label": "clippy (workspace)",
+    "command": "cargo clippy --all-targets --all-features",
+    "shell": "system"
+  }
+]
