Closed
Description
Background
Users of datafusion-python can create table providers from at least two different pathways:
- A table provider created via PyCapsule (i.e. custom providers implemented in Rust or elsewhere, exposed to Python via `__datafusion_table_provider__`).
- A view-based provider created via `into_view()` on an existing `DataFrame`, then registered with a `SessionContext`.
These two types serve similar roles (they supply data / logical plans to DataFusion), but currently they behave differently, which can lead to confusion.
Problem / Confusion
- A user might reasonably expect that a table provider object created with `into_view()` could be registered with a session context in the same way as a PyCapsule-exposed provider, but that may not always work (or may not be documented).
- There is risk of mismatch in how the internals treat providers from the two sources (views vs PyCapsules).
- Without a unified type or interface, it's unclear whether certain operations should/can be supported for both.
- The divergence might cause unexpected errors or surprising behavior for the user, especially around registration, reuse, or compatibility of providers.
 
Desired Behavior / Suggestion
- Define a single common `PyTableProvider` (or similarly named abstraction) that works identically whether created via `into_view()` or via a PyCapsule / external source.
- Ensure that `SessionContext.register_table_provider(...)` accepts this common type regardless of source.
- Document clearly:
  - what kinds of table providers are accepted (views, PyCapsules, external)
  - how to obtain them from each path
  - equivalence or limitations (if any)
- Possibly enhance the implementation so that a view-based provider can be converted (or wrapped) into the same internal abstraction that a PyCapsule provider uses.
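
To make the suggestion concrete, here is a minimal pure-Python sketch of what "one common type, accepted regardless of source" could look like. It does not import datafusion and is not the library's actual implementation: `UnifiedTableProvider`, `ToySessionContext`, `FakeView`, `FakeDataFrame`, and `FakeCapsuleProvider` are all hypothetical stand-ins invented for illustration. Only the names `__datafusion_table_provider__`, `into_view()`, and `register_table_provider` come from this issue.

```python
from typing import Any, Dict


class FakeView:
    """Stand-in for whatever `into_view()` returns (hypothetical)."""


class FakeDataFrame:
    """Stand-in for a DataFrame; only `into_view()` is modeled."""

    def into_view(self) -> FakeView:
        return FakeView()


class FakeCapsuleProvider:
    """Stand-in for an external provider exposing the PyCapsule hook."""

    def __datafusion_table_provider__(self) -> object:
        # A real provider would return a PyCapsule; a plain object is
        # enough to illustrate the dispatch.
        return object()


class UnifiedTableProvider:
    """Hypothetical common wrapper for both kinds of provider."""

    def __init__(self, inner: Any, source: str) -> None:
        self.inner = inner
        self.source = source  # "capsule" or "view"

    @classmethod
    def coerce(cls, obj: Any) -> "UnifiedTableProvider":
        # Already unified: pass through unchanged.
        if isinstance(obj, cls):
            return obj
        # PyCapsule path: the object advertises the export hook.
        hook = getattr(obj, "__datafusion_table_provider__", None)
        if callable(hook):
            return cls(hook(), "capsule")
        # View path: the object came from `into_view()`.
        if isinstance(obj, FakeView):
            return cls(obj, "view")
        raise TypeError(f"unsupported table provider: {type(obj).__name__}")


class ToySessionContext:
    """Toy registry; accepts any supported provider under one method."""

    def __init__(self) -> None:
        self.tables: Dict[str, UnifiedTableProvider] = {}

    def register_table_provider(self, name: str, provider: Any) -> None:
        self.tables[name] = UnifiedTableProvider.coerce(provider)


ctx = ToySessionContext()
ctx.register_table_provider("external", FakeCapsuleProvider())
ctx.register_table_provider("view", FakeDataFrame().into_view())
```

The point of the sketch is that the caller uses the same registration call for both sources, and the normalization to a single internal type happens in one place, which is also where unsupported inputs can be rejected with a clear error.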
 
Benefits
- Reduced confusion for users.
 - More consistency in the API.
 - Easier to reason about table providers across different parts of a codebase.
 - Potentially fewer bugs when mixing providers from different sources.
 
Context
This issue is motivated by this comment: #1016 (comment)