@NatElkins NatElkins commented Jul 11, 2025

Summary

This PR adds a DuckDB driver for Apache Arrow ADBC. The driver enables DuckDB.NET to work with Apache Arrow format data through the ADBC interface.

Implementation

The driver covers the main ADBC operations:

  • Connection management - Opening/closing DuckDB connections
  • Query execution - Running SQL queries and returning Arrow format results
  • Metadata operations - Schema discovery using information_schema
  • Prepared statements - Statement preparation and reuse
  • Parameter binding - Bind Arrow arrays/batches as query parameters
  • Data type conversion - Mapping between DuckDB and Arrow types

Technical Details

Since DuckDB deprecated its Arrow C API, the driver converts DuckDB query results to Arrow format row by row using the Apache Arrow C# library. Parameter binding converts the Arrow values in each bound row to DuckDB parameters.
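
The row-by-row conversion described above amounts to pivoting the rows of a result reader into per-column value lists that columnar builders can consume. The following is a minimal, language-neutral sketch in Python, not the driver's C# code. The name `rows_to_columns` and the sample data are hypothetical, and `None` stands in for SQL NULL.

```python
from typing import Any, Iterable, List, Sequence


def rows_to_columns(column_count: int, rows: Iterable[Sequence[Any]]) -> List[List[Any]]:
    """Pivot row-oriented results into one value list per column.

    A real driver would append to a typed Arrow builder per column and
    track validity for NULLs; plain lists keep the sketch self-contained.
    """
    columns: List[List[Any]] = [[] for _ in range(column_count)]
    for row in rows:
        if len(row) != column_count:
            raise ValueError("row width does not match the schema")
        for index, value in enumerate(row):
            columns[index].append(value)
    return columns


# Three rows of (id, name) become two columns of three values each.
batch = rows_to_columns(2, [(1, "a"), (2, None), (3, "c")])
```

Every value is touched once per row, which is why a zero-copy or IPC-based path is attractive for large results.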

Testing

The driver includes comprehensive tests covering all supported functionality.

- Created DuckDB driver structure following ADBC patterns
- Implemented connection, statement, and database classes
- Added Arrow data conversion from DuckDB to Arrow format
- Added metadata operations (GetInfo, GetObjects, GetTableSchema)
- Created test project with basic tests
- Fixed various compilation issues
@NatElkins NatElkins changed the title DuckDB ADBC driver implementation C# DuckDB ADBC driver implementation Jul 11, 2025
@NatElkins NatElkins changed the title C# DuckDB ADBC driver implementation feat(csharp/drivers): add DuckDB ADBC driver implementation Jul 11, 2025
@NatElkins NatElkins changed the title feat(csharp/drivers): add DuckDB ADBC driver implementation feat(csharp/src/Drivers): add DuckDB ADBC driver implementation Jul 11, 2025
…ding

- Add complete DuckDB ADBC driver implementation
- Support query execution and result streaming
- Implement parameter binding for prepared statements
- Add Arrow data type conversions
- Include transaction and metadata support
- Fix Apache Arrow API compatibility issues
- Create simplified GetObjects implementation
- Driver builds successfully for .NET 8.0
- 12 of 13 existing DuckDB tests pass

Still TODO:
- Fix compilation for netstandard2.0 target
- Complete GetObjects implementation
- Fix parameter binding test compilation errors
- Add conditional compilation for DateOnly/TimeOnly
- Fix DateOnly/TimeOnly compatibility for netstandard2.0
- Fix dictionary conversion for netstandard2.0
- Fix async using statements in tests
- Add decimal to double conversion for numeric literals
- Fix binary data conversion to handle Stream types
- Improve parameter binding error messages
- Fix null reference warnings in tests

The driver now builds successfully for all target frameworks. Some tests
are still failing due to parameter binding and type conversion issues
that need further investigation.

Fixed binary data handling where DuckDB returns UnmanagedMemoryStream for BLOB types. The stream needs to be read and converted to a byte array before it is passed to the Arrow converters.

- Added proper handling for BLOB data in DuckDBArrowStream
- Check field type and read Stream data into byte arrays for BinaryType/FixedSizeBinaryType fields
- All 20 DuckDB ADBC tests now pass
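
The BLOB fix above can be illustrated with a short sketch: when a value arrives as a stream rather than raw bytes, drain it into a byte array before handing it to a binary column builder. This is a Python illustration under stated assumptions, not the driver's C# code. `io.BytesIO` stands in for .NET's UnmanagedMemoryStream, and `normalize_blob` is a hypothetical helper name.

```python
import io
from typing import Any


def normalize_blob(value: Any) -> Any:
    """Return bytes for stream-like or buffer-like BLOB values.

    None (SQL NULL) and non-binary values pass through unchanged.
    """
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    if hasattr(value, "read"):
        # Drain the stream once; closing it remains the caller's job.
        return value.read()
    return value


blob = normalize_blob(io.BytesIO(b"\x00\x01\x02"))
```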
@CurtHagenlocher
Contributor

CurtHagenlocher commented Jul 11, 2025

Thanks, @NatElkins! I haven't seen any indication that DuckDB plans on dropping ADBC support from their existing code (although it currently supports only the 1.0.0 spec). Do you have a reference for this?

- Fixed CS8602 null reference warnings by adding null-forgiving operator where appropriate
- Added XML documentation comments to all public members
- Suppressed System.Diagnostics.DiagnosticSource warning for net6.0 target

Build now completes with 0 warnings.

Changed target frameworks from 'net8.0;net6.0;netstandard2.0' to 'netstandard2.0;net472;net8.0' to follow the same pattern as the majority of other ADBC drivers. This also eliminates the need for SuppressTfmSupportBuildWarnings since we're no longer targeting net6.0.
@NatElkins
Author

@CurtHagenlocher I am still very much in the messing-around stage. However, the current Arrow APIs in the main DuckDB repo have been marked as deprecated, and apparently the way forward is to use a new community extension. See this blog post.

I just whipped something up to play around with it for now, but I'll probably work to use the Arrow IPC integration for this driver. Might require some changes in DuckDB.NET, not sure yet.

I'm only going down this route because it wasn't clear to me how I could use a C# ADBC connector with DuckDB today. If I'm missing something, though, please let me know!

@CurtHagenlocher
Contributor

You can use the DuckDB driver directly through the C API. Look in https://github.com/apache/arrow-adbc/tree/main/csharp/test/Apache.Arrow.Adbc.Tests for some tests using DuckDB.

Implemented support for DuckDB's new Arrow IPC format using the nanoarrow extension.
This provides a more performant path for data transfer when enabled.

Features:
- Added DuckDBArrowIpcStream, which uses the to_arrow_ipc() function
- Added UseArrowIpc option (default: false due to type mapping differences)
- Automatic fallback to row-by-row conversion if IPC is unavailable
- Tests for both IPC and fallback modes
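
The opt-in and fallback behavior listed above can be sketched as follows. This is a Python illustration of the control flow only, not the driver's C# code. `open_result_stream`, `IpcUnavailableError`, and the two reader callables are hypothetical names, and the sketch assumes the IPC reader fails eagerly when it is opened rather than partway through iteration.

```python
from typing import Any, Callable, Iterable, Iterator


class IpcUnavailableError(Exception):
    """Hypothetical error raised when the IPC path cannot be used."""


def open_result_stream(
    use_arrow_ipc: bool,
    ipc_reader: Callable[[], Iterable[Any]],
    row_reader: Callable[[], Iterable[Any]],
) -> Iterator[Any]:
    """Use the IPC path only when opted in; fall back if it is unavailable."""
    if use_arrow_ipc:
        try:
            return iter(ipc_reader())
        except IpcUnavailableError:
            pass  # fall through to row-by-row conversion
    return iter(row_reader())


def failing_ipc() -> Iterable[Any]:
    # Simulates a missing extension.
    raise IpcUnavailableError("extension not installed")


# With IPC requested but unavailable, the row-by-row reader is used instead.
result = list(open_result_stream(True, failing_ipc, lambda: [("row", 1)]))
```

Keeping the option off by default means callers see one consistent type mapping unless they explicitly ask for the IPC path.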

Note: Arrow IPC support is currently experimental and disabled by default because:
- Type mappings differ between IPC and row-by-row modes (e.g., Decimal64 vs Decimal128)
- It requires the nanoarrow extension to be installed
- Some edge cases in null handling need further investigation

Future work:
- Add native Arrow C API support to DuckDB.NET for zero-copy transfers
- Resolve type mapping differences for full compatibility
- Enable by default once stable