2023-08-24 kernel meeting notes #27
zachschuermann started this conversation in General
summary
main java threads:
main rust threads:
attendees
@tdas @wjones127 @nicklan @ryan-johnson-databricks @schuermannator
notes stream
question from td for will: async
object storage layer doesn't even offer a sync API
nick: `block_on` is dangerous to use. calling it from the wrong thread can cause it to panic, and it blocks the current thread
if we want a truly small/clean/sync API, doing block_on isn't the right call. instead just disallow async
many rust projects just build an async API then throw a sync API on top (postgres etc. i think does this - zach)
the other way (making a sync API first, then adding async on top) is very hard
we want to be able to issue reads and then wait for the first (or prefetching etc)
hypothesis - there might not be a shim, more like parallel implementations (pure logic shared sans IO)
--> is this @wjones127's PR? yes! to review
more code than ideal but a decent way to appease delta-rs and FFI
consuming the iterator that produces addfiles
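A minimal sketch of the "pure logic shared sans IO" hypothesis above, not the code from the PR under review. The `Action` type, `live_files`, and `live_files_sync` are hypothetical names, and the replay rule is deliberately simplified. The point is only that the add-file reconciliation never touches IO, so a sync front end and an async front end can both call it.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified log action; not the real kernel type.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Add(String),
    Remove(String),
}

/// Pure logic, no IO: walk commits newest-first and keep each path
/// whose first appearance is an add.
fn live_files<I>(commits_newest_first: I) -> Vec<String>
where
    I: IntoIterator<Item = Vec<Action>>,
{
    let mut seen = HashSet::new();
    let mut live = Vec::new();
    for commit in commits_newest_first {
        for action in commit {
            match action {
                Action::Add(path) => {
                    // Only the newest mention of a path counts.
                    if seen.insert(path.clone()) {
                        live.push(path);
                    }
                }
                Action::Remove(path) => {
                    // A newer remove hides any older add of the same path.
                    seen.insert(path);
                }
            }
        }
    }
    live
}

/// Sync front end: `read_commit` is any blocking reader supplied by the caller.
/// An async front end would await its reads and then call `live_files`
/// on the results in the same way.
fn live_files_sync(
    versions_newest_first: &[u64],
    read_commit: impl Fn(u64) -> Vec<Action>,
) -> Vec<String> {
    live_files(versions_newest_first.iter().map(|&v| read_commit(v)))
}
```

With this split, the duplicated code between the two implementations is limited to the thin drivers that fetch commits.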
ryan: still - what do we gain from async? pragmatically.
in the delta-kernel prototype from ryan: we had a thread-friendly version of the kernel that tracks a queue of work items that 'need doing' (i'm going to need a parquet file read, i'm going to need json parsed - no IO). a helper thread picks items from the queue and does them. blocking from the kernel's perspective but not actually blocking externally.
zach observation: can we encode the above in async rust? is it easier to do with queues? tradeoffs?
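A rough sketch of the queue idea using only `std` channels and one helper thread. This is an assumption about the shape of the design, not ryan's prototype: `WorkItem`, `spawn_helper`, and `kernel_read` are hypothetical names, and the "read" and "parse" bodies are stand-ins that perform no real IO.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical work items: things the kernel needs done but does not do itself.
enum WorkItem {
    ReadParquet { path: String, reply: mpsc::Sender<Vec<u8>> },
    ParseJson { text: String, reply: mpsc::Sender<usize> },
}

/// Spawn a helper thread that drains the queue until every sender is dropped.
fn spawn_helper() -> (mpsc::Sender<WorkItem>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<WorkItem>();
    let handle = thread::spawn(move || {
        for item in rx {
            match item {
                WorkItem::ReadParquet { path, reply } => {
                    // Stand-in for a real file read: echo the path bytes.
                    let _ = reply.send(path.into_bytes());
                }
                WorkItem::ParseJson { text, reply } => {
                    // Stand-in for real JSON parsing: count the lines.
                    let _ = reply.send(text.lines().count());
                }
            }
        }
    });
    (tx, handle)
}

/// Kernel-side call: enqueue the request, then block on the reply.
/// Only this caller waits; the helper can keep serving other items.
fn kernel_read(queue: &mpsc::Sender<WorkItem>, path: &str) -> Vec<u8> {
    let (reply, result) = mpsc::channel();
    queue
        .send(WorkItem::ReadParquet { path: path.to_string(), reply })
        .expect("helper thread is gone");
    result.recv().expect("helper dropped the reply")
}
```

The tradeoff against async Rust is visible even at this size: the queue version needs no executor, but every request type needs its own reply channel and enum variant.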
will: thought he could reduce the duplicate code more than he did.
would like to explore: async API in rust then FFI does sync wrapper around that (which requires an executor - no matter how simple)
same concern: do we want to depend on an executor (even lightweight one like block_on or smol)
nick: thing it buys: we get to use async crates.
does it buy us that much if we need to build sync version and async version?
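To make the executor question concrete, here is a minimal sketch of "async API, sync wrapper on top" using only `std`. The `block_on` below is a hand-rolled single-future executor, not the one from any particular crate, and `read_version_async` / `read_version_sync` are hypothetical names whose body is a stand-in for real IO.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll the future on the current thread and park
/// until the waker fires. The loop also tolerates spurious wakeups.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async API; the body stands in for a real object-store read.
async fn read_version_async(version: u64) -> String {
    format!("{version:020}.json")
}

/// Sync wrapper of the kind an FFI layer could expose.
fn read_version_sync(version: u64) -> String {
    block_on(read_version_async(version))
}
```

Even this small executor blocks the calling thread for the whole call. It also only drives futures that do not depend on a specific runtime, so futures from crates that expect their own reactor would likely still need that runtime behind the wrapper.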
idea:
if you do async io, you can provide an engine in the tableclient?
will question: how to wrap in C++? require C++17 at least? which build system? DuckDB uses CMake for extensions.
care about C++ std for headers, library deps, ABI
zach question: strawman: start with sync
summarize: driving questions
`Stream`s in the interface