Skip to content

Conversation

Dan-J-D
Copy link

@Dan-J-D Dan-J-D commented Oct 3, 2025

Refactor open_table_with_storage_options to remove 'aws.' prefixes from options.

Description

Fixes issue where all s3 options are prefixed with 'aws.' and it not using the setting from OPTIONS correctly.

Ex.

ctx.sql("
    CREATE EXTERNAL TABLE deltatable
    STORED AS DELTA
    LOCATION 's3://mybucket/delta-table'
    OPTIONS (
        'aws.ALLOW_HTTP' 'true',
        'aws.ENDPOINT_URL' 'http://localhost:8333',
        'aws.ACCESS_KEY_ID' 'user1',
        'aws.SECRET_ACCESS_KEY' 'secret1'
    )",
)

Related Issue(s)

None that i'm aware of.

Documentation

https://datafusion.apache.org/user-guide/cli/datasources.html#s3

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Oct 3, 2025
@github-actions
Copy link

github-actions bot commented Oct 3, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@Dan-J-D Dan-J-D changed the title Refactor storage options handling in open_table fix: storage options handling in open_table Oct 3, 2025
@codecov
Copy link

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.08%. Comparing base (c5eb4c2) to head (759a901).

Files with missing lines Patch % Lines
crates/core/src/delta_datafusion/mod.rs 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3807       +/-   ##
===========================================
+ Coverage   18.79%   76.08%   +57.29%     
===========================================
  Files          74      145       +71     
  Lines       12036    45327    +33291     
  Branches    12036    45327    +33291     
===========================================
+ Hits         2262    34487    +32225     
+ Misses       9588     9148      -440     
- Partials      186     1692     +1506     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Refactor open_table_with_storage_options to remove 'aws.' prefixes from options.

Signed-off-by: Dan-J-D <[email protected]>
Signed-off-by: Dan-J-D <[email protected]>
This reverts commit 422916f.

Signed-off-by: Dan-J-D <[email protected]>
This reverts commit 4794f74.

Signed-off-by: Dan-J-D <[email protected]>
@rtyler
Copy link
Member

rtyler commented Oct 5, 2025

I'm a bit confused here, why can't the user just call OPTIONS with something like ALLOW_HTTP instead of putting the aws. prefix there?

@Dan-J-D
Copy link
Author

Dan-J-D commented Oct 5, 2025

I'm a bit confused here, why can't the user just call OPTIONS with something like ALLOW_HTTP instead of putting the aws. prefix there?

When you just put ALLOW_HTTP, it prefixes it with format. and lowercases it. So the output options value would be format.allow_http, and just to follow the spec of ballista docs. I thought it would make more sense to prefix with aws. and then remove the prefix when opening the delta table.

@roeap
Copy link
Collaborator

roeap commented Oct 7, 2025

Hey @Dan-J-D - thanks for opening this!

As I am right now working on the datafusion table provider I noticed that we may need to re-think a bit how we integrate with datafusion's resources, specifically object stores. Recently I opened #3810 specifically for the TableProviderfactory.

I.e. looking through databricks and datafusion documentation for the CRATE TABLE commmand i think this property bag might not be the way we want to integrate credentials. Rather we need to make the table provider work with the datafusion sessions and have it leverage whatever stores are available on that.

Right now there is unfortunately no good way to achieve what we are trying to do here and it is definitely a more that valid use case!

There is more ... the DeltaTable which is returned by the open_table* functions currently implements TableProvider but should eventually not. The main reason being that it is a lazy structure that might not have the required state loaded to provide all info needed by a table provider.

There are some good news - once we leverage the datafusion session to get resources and do some smaller tweaks on the factory, things should just start to work as intended 🤞.

@Dan-J-D
Copy link
Author

Dan-J-D commented Oct 7, 2025

Hey @Dan-J-D - thanks for opening this!

As I am right now working on the datafusion table provider I noticed that we may need to re-think a bit how we integrate with datafusion's resources, specifically object stores. Recently I opened #3810 specifically for the TableProviderfactory.

I.e. looking through databricks and datafusion documentation for the CRATE TABLE commmand i think this property bag might not be the way we want to integrate credentials. Rather we need to make the table provider work with the datafusion sessions and have it leverage whatever stores are available on that.

Right now there is unfortunately no good way to achieve what we are trying to do here and it is definitely a more that valid use case!

There is more ... the DeltaTable which is returned by the open_table* functions currently implements TableProvider but should eventually not. The main reason being that it is a lazy structure that might not have the required state loaded to provide all info needed by a table provider.

There are some good news - once we leverage the datafusion session to get resources and do some smaller tweaks on the factory, things should just start to work as intended 🤞.

Yea sounds good, I've been expanding it in my local project. Are yall planning on supporting datafusion-ballista? I have been spending sometime to get it working together properly.

@roeap
Copy link
Collaborator

roeap commented Oct 18, 2025

Are yall planning on supporting datafusion-ballista?

Some work has been done in the past to support this. IIRC, it's mainly the codecs required to (de-)serialize our scans as datafusion protos.

The hope is that a great integration with datafusion is enough to also support ballista. (which might not be totally true). Right now though - unless someone from the community steps up - there is not much capacity to make this work a priority.

We would certainly do our best though to address any (specific / scoped) issues raised on this repo that help with that integration.

@Dan-J-D
Copy link
Author

Dan-J-D commented Oct 18, 2025

We would certainly do our best though to address any (specific / scoped) issues raised on this repo that help with that integration.

Alright, sounds good! I'll close t his PR seeing as there needs to be a bigger rewrite.

@Dan-J-D Dan-J-D closed this Oct 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binding/rust Issues for the Rust crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants