Rust InterOp Architecture Decision: The role of delta-kernel and/or delta-rs in delta-dotnet
#79
-
Kernel's FFI types are the future. I haven't spent as much time as I should have investigating the kernel code, because its lack of write support is a current limiter. I've helped with some Python adapters, but that's it.
-
Thanks for typing this up so neatly. I think Delta Kernel will support all the table features, but I'm not sure it will ever help with an operation like Z Ordering:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp")
dt.optimize.z_order(["country"])
```

So perhaps Delta Kernel will be useful for all the table features, while delta-rs remains useful for Z Ordering-type operations. Or perhaps we can use Delta Kernel alone and implement Z Ordering natively in C#.
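For a sense of what "implement Z Ordering natively" would involve: Z Ordering sorts rows by a Z-order (Morton) key built by interleaving the bits of several column values, so rows that are close in every chosen column tend to end up in the same files. The sketch below is the textbook bit-interleaving technique, not necessarily how delta-rs implements it, and it assumes column values are first mapped to small non-negative integers by ranking them. `morton_key`, `rank_encode`, and `z_order` are hypothetical helper names; a real implementation would also have to rewrite the Parquet files and commit to the Delta log, which this only hints at by computing the sort order.

```python
from typing import Any


def morton_key(values: list[int], bits: int = 16) -> int:
    """Interleave the bits of several non-negative ints into one Z-order key.

    Bit i of dimension d ends up at position i * len(values) + d, so the key
    alternates between dimensions from the least to the most significant bit.
    """
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{v} does not fit in {bits} unsigned bits")
    n = len(values)
    key = 0
    for i in range(bits):
        for d, v in enumerate(values):
            key |= ((v >> i) & 1) << (i * n + d)
    return key


def rank_encode(rows: list[dict[str, Any]], column: str) -> dict[Any, int]:
    """Map each distinct value of `column` (e.g. a country string) to its sorted rank."""
    return {value: rank for rank, value in enumerate(sorted({r[column] for r in rows}))}


def z_order(rows: list[dict[str, Any]], columns: list[str], bits: int = 16) -> list[dict[str, Any]]:
    """Return the rows sorted by the Morton key of their rank-encoded columns."""
    encoders = {c: rank_encode(rows, c) for c in columns}
    return sorted(rows, key=lambda r: morton_key([encoders[c][r[c]] for c in columns], bits))


rows = [
    {"country": "US", "year": 2021},
    {"country": "DE", "year": 2023},
    {"country": "US", "year": 2023},
    {"country": "DE", "year": 2021},
]
# Prints one dict per line in the order DE/2021, US/2021, DE/2023, US/2023.
for row in z_order(rows, ["country", "year"]):
    print(row)
```

Ranking first keeps every dimension in the same small integer range, so no single column dominates the key's high bits.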
-
Hey @mightyshazam,
Jotting down some updates, hoping to get your thoughts on some options + questions below.
✏️Updates
I spent a lazy Friday reading through the `Bridge` implementation. First of all, really great work. I especially love the DataFusion support we inherit from `delta-rs`: running SQL on top of Delta via C# is something I never thought I'd see 🙂.

The Tokio runtime passing the cancellation token from C# to Rust is super smooth, I love how everything is async through and through, and the unsafe code is neatly tucked away in the `Bridge`.

It's obvious you've put a lot of effort and thoughtfulness into this project. I really appreciate it; it's fantastic for the .NET community.
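To illustrate the general shape of that pattern (the host hands a cancellation token to an async runtime running on its own thread, and the result or cancellation comes back through a completion callback), here is a loose Python analogy. It is not the `Bridge` code: `CancellationToken`, `run_bridged`, and the `(result, error)` callback signature are hypothetical stand-ins for the C# token, the Tokio side, and the FFI callback, and polling the token is just one simple way to propagate cancellation.

```python
import asyncio
import threading
from typing import Awaitable, Callable, Optional


class CancellationToken:
    """Hypothetical stand-in for a C# CancellationToken: a thread-safe flag."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def is_cancelled(self) -> bool:
        return self._event.is_set()


# (result, error) -> None, playing the role of the completion callback into the host.
Callback = Callable[[Optional[object], Optional[str]], None]


async def _watch(token: CancellationToken, task: "asyncio.Future[object]", poll: float) -> None:
    """Poll the host's token and cancel the in-flight work once it is set."""
    while not task.done():
        if token.is_cancelled:
            task.cancel()
            return
        await asyncio.sleep(poll)


def run_bridged(
    work: Callable[[], Awaitable[object]],
    token: CancellationToken,
    on_complete: Callback,
    poll: float = 0.01,
) -> threading.Thread:
    """Run `work` on its own event-loop thread and report back via `on_complete`."""

    async def main() -> None:
        task = asyncio.ensure_future(work())
        watcher = asyncio.create_task(_watch(token, task, poll))
        try:
            result = await task
        except asyncio.CancelledError:
            on_complete(None, "cancelled")
        except Exception as exc:  # surface failures to the host instead of killing the thread
            on_complete(None, repr(exc))
        else:
            on_complete(result, None)
        finally:
            watcher.cancel()

    thread = threading.Thread(target=asyncio.run, args=(main(),))
    thread.start()
    return thread


async def slow_scan() -> str:
    await asyncio.sleep(10)  # pretend this is a long table scan
    return "rows"


results: list = []
token = CancellationToken()
worker = run_bridged(slow_scan, token, lambda res, err: results.append((res, err)))
token.cancel()  # the host gives up, e.g. because a request was aborted
worker.join()
print(results)  # [(None, 'cancelled')]
```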
As @MrPowers mentioned earlier, before I knew about `delta-dotnet` I'd been dabbling in the work the `delta-kernel` folks have done, and used a very similar approach to yours to convert the FFI into C# via ClangSharp. I spent a weekend throwing together some working C# code against the `delta-kernel` FFI that converts a Delta table into an Arrow table, but it's nowhere near the thoughtfulness or effort you've put into the `Bridge`, `Runtime`, etc., and the underlying Rust code.

My observation is that today `delta-rs` is a lot more feature-rich than `delta-kernel` (e.g. SQL support via DataFusion). I also understand that `delta-rs` has slowly started adopting `delta-kernel` too.

Given that `delta-dotnet` already works just fine for reads and writes, I'm wondering what your vision is for the project. There are two high-level options I see.
Option 1: `delta-dotnet` keeps its dependency on `delta-rs`, with no dependency on `delta-kernel`

- Keep the existing dependency on `delta-rs` for the foreseeable future.
- As `delta-rs` adopts `delta-kernel`, the `Bridge` evolves its FFI in lockstep with `delta-rs` and gets new `delta-kernel` features anyway.
- Use the saved effort to build more handy C#-native features on top of the `Bridge`, e.g. expose a C# DataFrame API (which works against Arrow) via `DataFrame.FromArrowRecordBatch`.

✅ Pros:
- … `delta-dotnet`, enjoy as `delta-rs` folks do most of the hard work 🙂
- … `delta-dotnet` implementation is more or less feature-complete for most use cases; just keep it on cruise control, add new features, etc.
- … `delta-rs`, being Rust, is performant as-is via `P/Invoke`, and hopefully supports multiple writers via partitioning (presumably; I haven't benchmarked it yet and didn't see a `delta-dotnet` unit test for it).
- … `TOML` file you enable that feature, so hopefully there's a method the `Bridge` exposes that takes in params to connect to each cloud storage)

❌ Cons:
- … `delta-dotnet` architecture tied to `delta-rs` for good, rather than to `delta-kernel` (which is more flexible).

Option 2: interface out `delta-dotnet`'s dependency on `delta-rs`, and slowly migrate to `delta-kernel` as it matures

Step 0 - Add interfaces to everything under `Table/*.cs`, and slowly migrate to `delta-kernel`, starting with Arrow-based read support, which already works. Mark unsupported methods with `NotSupportedException` or something similar until the Kernel supports them.

Step 1 - When `delta-kernel` adds write support, adopt that.

Step 2 - Phase out the dependency on `delta-rs`? Of course, we'd need the same SQL capabilities DataFusion provides so we don't break users. Maybe the Kernel could add support for something similar via scan.

✅ Pros:
- … `delta-kernel`

❌ Cons:
- … `delta-rs` Parquet -> Arrow reader is already fast.
- … `delta-kernel` support, csproj wrappers, build pipelines, maintenance, etc.
- … `DataFusion`, AKA SQL support.

Would love to hear your thoughts on which option you prefer, and whether there's a third (or further) logical option.
I'm personally happy to help implement both options (especially Option 2, since I have a good amount of context on the Kernel from digging into it), but only if there's a clear value-add in this new architecture.
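To make Option 2's Step 0 concrete (put an interface in front of the table operations, back it with `delta-rs` today, and let a `delta-kernel` backend throw for anything it can't do yet), here is a Python sketch of the shape only. `TableBackend`, `DeltaRsBackend`, `KernelBackend`, `Table`, and their methods are hypothetical names, both backends are in-memory fakes sharing one dict as the "object store", and Python's `NotImplementedError` plays the role of C#'s `NotSupportedException`.

```python
from abc import ABC, abstractmethod

Rows = list[dict]


class TableBackend(ABC):
    """Hypothetical interface that the Table/*.cs surface would program against."""

    @abstractmethod
    def read(self, path: str) -> Rows: ...

    @abstractmethod
    def write(self, path: str, rows: Rows) -> None: ...


class DeltaRsBackend(TableBackend):
    """Stands in for today's delta-rs-backed Bridge: reads and writes both work."""

    def __init__(self, storage: dict[str, Rows]) -> None:
        self._storage = storage  # fake "object store" shared by both backends

    def read(self, path: str) -> Rows:
        return list(self._storage.get(path, []))

    def write(self, path: str, rows: Rows) -> None:
        self._storage.setdefault(path, []).extend(rows)


class KernelBackend(TableBackend):
    """Stands in for a delta-kernel-backed implementation: read-only for now."""

    def __init__(self, storage: dict[str, Rows]) -> None:
        self._storage = storage

    def read(self, path: str) -> Rows:
        return list(self._storage.get(path, []))

    def write(self, path: str, rows: Rows) -> None:
        # Analogue of throwing NotSupportedException until the Kernel catches up.
        raise NotImplementedError("write is not supported by the kernel backend yet")


class Table:
    """Caller-facing facade; the backend behind it is a swappable detail."""

    def __init__(self, path: str, backend: TableBackend) -> None:
        self._path = path
        self._backend = backend

    def to_rows(self) -> Rows:
        return self._backend.read(self._path)

    def append(self, rows: Rows) -> None:
        self._backend.write(self._path, rows)


storage: dict[str, Rows] = {}
Table("tmp", DeltaRsBackend(storage)).append([{"country": "US"}])
reader = Table("tmp", KernelBackend(storage))
print(reader.to_rows())  # [{'country': 'US'}]
try:
    reader.append([{"country": "DE"}])
except NotImplementedError as exc:
    print(f"unsupported: {exc}")
```

Callers only ever see `Table`, so swapping a backend per operation (Step 1) or dropping one entirely (Step 2) wouldn't change the public surface.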
Option 1 doesn't really need "new architecture work" (you've done it already), but we could focus on pushing the limits of `delta-dotnet` and enabling new use cases: beefing up the examples folder for Azure Blob etc. using `TokenCredential`, patterns for `N` parallel threads writing to the same table against `N` partitions, samples showing how to deserialize Event Hubs/Kafka partitions (basically KDI in C#), making sure things work in a container without weird problems, adding a `DataFrame` API to extend `Table`, and so on.
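For the "`N` parallel threads writing to the same table against `N` partitions" sample, the core idea is to route every record to exactly one partition so each writer only ever adds files under its own partition. A standard-library sketch, assuming a hypothetical table partitioned by a derived `bucket` column; `write_partition` is a placeholder for whatever `delta-dotnet`/`delta-rs` append call a real sample would use, and whether concurrent commits to one table succeed still depends on delta-rs's conflict handling, which this does not exercise.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

Rows = list[dict]


def partition_of(key: str, n: int) -> int:
    """Stable bucket for `key`; crc32 avoids Python's per-process hash randomization."""
    return zlib.crc32(key.encode("utf-8")) % n


def split_by_partition(records: Rows, key_field: str, n: int) -> dict[int, Rows]:
    """Route every record to exactly one bucket, tagging it with a `bucket` column."""
    buckets: dict[int, Rows] = defaultdict(list)
    for record in records:
        p = partition_of(record[key_field], n)
        buckets[p].append({**record, "bucket": p})
    return dict(buckets)


def write_partition(partition: int, rows: Rows) -> tuple[int, int]:
    """HYPOTHETICAL placeholder for a real append of `rows` into partition bucket=partition."""
    return partition, len(rows)


def parallel_write(records: Rows, key_field: str, n: int) -> dict[int, int]:
    """Fan out one writer per non-empty bucket; returns rows written per bucket."""
    buckets = split_by_partition(records, key_field, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(write_partition, p, rows) for p, rows in buckets.items()]
        return dict(f.result() for f in futures)


# Events keyed by device, e.g. deserialized from Event Hubs/Kafka messages.
events = [{"device": f"dev-{i % 5}", "value": i} for i in range(100)]
print(parallel_write(events, "device", n=4))
```

Hashing on the message key mirrors how Kafka and Event Hubs assign keys to partitions, so a consumer per source partition could map naturally onto a writer per table partition.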