Commit f930181

docs: update documentation to reflect removal of TableProvider and usage of Table instead
1 parent f9a3a22 commit f930181

11 files changed: +163 -251 lines changed

docs/source/user-guide/data-sources.rst

Lines changed: 4 additions & 2 deletions
@@ -152,9 +152,11 @@ as Delta Lake. This will require a recent version of
 .. code-block:: python
 
     from deltalake import DeltaTable
+    from datafusion import Table
 
     delta_table = DeltaTable("path_to_table")
-    ctx.register_table("my_delta_table", delta_table)
+    table = Table.from_capsule(delta_table.__datafusion_table_provider__())
+    ctx.register_table("my_delta_table", table)
     df = ctx.table("my_delta_table")
     df.show()
 
@@ -167,7 +169,7 @@ work with custom table providers from Python libraries such as Delta Lake.
     :py:meth:`~datafusion.context.SessionContext.register_table_provider` is
     deprecated. Use
     :py:meth:`~datafusion.context.SessionContext.register_table` with a
-    :py:class:`~datafusion.TableProvider` instead.
+    :py:class:`~datafusion.Table` instead.
 
 On older versions of ``deltalake`` (prior to 0.22) you can use the
 `Arrow DataSet <https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html>`_

docs/source/user-guide/io/table_provider.rst

Lines changed: 19 additions & 24 deletions
@@ -46,45 +46,40 @@ A complete example can be found in the `examples folder <https://github.com/apac
         }
     }
 
-Once you have this library available, you can instantiate the Rust-backed
-provider directly in Python and register it with the
-:py:meth:`~datafusion.context.SessionContext.register_table` method.
-Objects implementing ``__datafusion_table_provider__`` are accepted as-is, so
-there is no need to build a Python ``TableProvider`` wrapper just to integrate
-with DataFusion.
-
-When you need to register a DataFusion
-:py:class:`~datafusion.dataframe.DataFrame`, call
-:py:meth:`~datafusion.dataframe.DataFrame.into_view` to obtain an in-memory
-view. This is equivalent to the legacy ``TableProvider.from_dataframe()``
-helper.
+Once you have this library available, you can construct a
+:py:class:`~datafusion.Table` in Python and register it with the
+``SessionContext``. Tables can be created either from the PyCapsule exposed by your
+Rust provider or from an existing :py:class:`~datafusion.dataframe.DataFrame`.
+Call the provider's ``__datafusion_table_provider__()`` method to obtain the capsule
+before constructing a ``Table``. The ``Table.from_view()`` helper is
+deprecated; instead use ``Table.from_dataframe()`` or ``DataFrame.into_view()``.
 
 .. note::
 
     :py:meth:`~datafusion.context.SessionContext.register_table_provider` is
     deprecated. Use
     :py:meth:`~datafusion.context.SessionContext.register_table` with the
-    provider instance or view returned by
-    :py:meth:`~datafusion.dataframe.DataFrame.into_view` instead.
+    resulting :py:class:`~datafusion.Table` instead.
 
 .. code-block:: python
 
-    from datafusion import SessionContext
+    from datafusion import SessionContext, Table
 
     ctx = SessionContext()
     provider = MyTableProvider()
 
+    capsule = provider.__datafusion_table_provider__()
+    capsule_table = Table.from_capsule(capsule)
+
     df = ctx.from_pydict({"a": [1]})
-    view_provider = df.into_view()
+    view_table = Table.from_dataframe(df)
+    # or: view_table = df.into_view()
 
-    ctx.register_table("provider_table", provider)
-    ctx.register_table("view_table", view_provider)
+    ctx.register_table("capsule_table", capsule_table)
+    ctx.register_table("view_table", view_table)
 
-    ctx.table("provider_table").show()
+    ctx.table("capsule_table").show()
     ctx.table("view_table").show()
 
-The capsule-based helpers remain available for advanced integrations that need
-to manipulate FFI objects explicitly. ``TableProvider.from_capsule()`` continues
-to wrap an ``FFI_TableProvider`` (and will stay available as a compatibility
-alias if it is renamed in :issue:`1`), while ``TableProvider.from_dataframe()``
-simply forwards to :py:meth:`DataFrame.into_view` for convenience.
+Both ``Table.from_capsule()`` and ``Table.from_dataframe()`` create
+table providers that can be registered with the SessionContext using ``register_table()``.
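
The registration flow documented in this file can be illustrated without DataFusion installed. The sketch below is a toy model only: `FakeProvider`, `SketchTable`, and `SketchContext` are hypothetical stand-ins, a plain string takes the place of the PyCapsule, and the branching inside `register_table` is an assumption about how a context might accept either a wrapped table or an object exporting `__datafusion_table_provider__`. It is not the library's implementation.

```python
from typing import Any


class FakeProvider:
    """Hypothetical stand-in for a Rust-backed provider (not a DataFusion class)."""

    def __datafusion_table_provider__(self) -> str:
        # A real provider returns a PyCapsule; a string stands in for it here.
        return "capsule:fake"


class SketchTable:
    """Hypothetical stand-in for the ``Table`` wrapper."""

    def __init__(self, capsule: Any) -> None:
        self.capsule = capsule

    @classmethod
    def from_capsule(cls, capsule: Any) -> "SketchTable":
        return cls(capsule)


class SketchContext:
    """Hypothetical stand-in for a session context; it only tracks names."""

    def __init__(self) -> None:
        self.tables = {}

    def register_table(self, name: str, table: Any) -> None:
        # Assumed behaviour: accept a wrapper directly, wrap any object that
        # exports the capsule method, and reject everything else.
        if isinstance(table, SketchTable):
            self.tables[name] = table
        elif hasattr(table, "__datafusion_table_provider__"):
            capsule = table.__datafusion_table_provider__()
            self.tables[name] = SketchTable.from_capsule(capsule)
        else:
            raise TypeError("expected a table or an object exporting a provider")


ctx = SketchContext()
provider = FakeProvider()

# Explicit path: obtain the capsule first, then wrap it.
explicit = SketchTable.from_capsule(provider.__datafusion_table_provider__())
ctx.register_table("explicit", explicit)

# Implicit path: hand over the exporting object and let the context wrap it.
ctx.register_table("implicit", provider)
```

Both names end up bound to a wrapper around the same capsule value, which is the point of the explicit `Table.from_capsule()` step shown in the documentation change: the wrapper, not the raw provider, is what gets registered.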

python/datafusion/__init__.py

Lines changed: 0 additions & 2 deletions
@@ -50,7 +50,6 @@
 from .io import read_avro, read_csv, read_json, read_parquet
 from .plan import ExecutionPlan, LogicalPlan
 from .record_batch import RecordBatch, RecordBatchStream
-from .table_provider import TableProvider
 from .user_defined import (
     Accumulator,
     AggregateUDF,
AggregateUDF,
@@ -88,7 +87,6 @@
     "SessionContext",
     "Table",
     "TableFunction",
-    "TableProvider",
     "WindowFrame",
     "WindowUDF",
     "catalog",

python/datafusion/catalog.py

Lines changed: 96 additions & 17 deletions
@@ -20,15 +20,17 @@
 from __future__ import annotations
 
 from abc import ABC, abstractmethod
-from typing import TYPE_CHECKING, Protocol
+from typing import TYPE_CHECKING, Any, Protocol
+
+import warnings
 
 import datafusion._internal as df_internal
+from datafusion._internal import EXPECTED_PROVIDER_MSG
 from datafusion.utils import _normalize_table_provider
 
 if TYPE_CHECKING:
     import pyarrow as pa
 
-    from datafusion import TableProvider
     from datafusion.context import TableProviderExportable
 
 try:
@@ -131,12 +133,12 @@ def table(self, name: str) -> Table:
         return Table(self._raw_schema.table(name))
 
     def register_table(
-        self, name: str, table: Table | TableProvider | TableProviderExportable
+        self, name: str, table: Table | TableProviderExportable | Any
     ) -> None:
         """Register a table or table provider in this schema.
 
         Objects implementing ``__datafusion_table_provider__`` are also supported
-        and treated as :class:`TableProvider` instances.
+        and treated as table provider instances.
         """
         provider = _normalize_table_provider(table)
         return self._raw_schema.register_table(name, provider)
@@ -151,31 +153,108 @@ class Database(Schema):
     """See `Schema`."""
 
 
+_InternalRawTable = df_internal.catalog.RawTable
+_InternalTableProvider = df_internal.TableProvider
+
+# Keep in sync with ``datafusion._internal.TableProvider.from_view``.
+_FROM_VIEW_WARN_STACKLEVEL = 2
+
+
 class Table:
-    """DataFusion table."""
+    """DataFusion table or table provider wrapper."""
 
-    def __init__(self, table: df_internal.catalog.RawTable) -> None:
-        """This constructor is not typically called by the end user."""
-        self.table = table
+    __slots__ = ("_table",)
+
+    def __init__(
+        self,
+        table: _InternalRawTable | _InternalTableProvider | Table,
+    ) -> None:
+        """Wrap a low level table or table provider."""
+
+        if isinstance(table, Table):
+            table = table.table
+
+        if not isinstance(table, (_InternalRawTable, _InternalTableProvider)):
+            raise TypeError(EXPECTED_PROVIDER_MSG)
+
+        self._table = table
+
+    def __getattribute__(self, name: str) -> Any:
+        """Restrict provider-specific helpers to compatible tables."""
+
+        if name == "__datafusion_table_provider__":
+            table = object.__getattribute__(self, "_table")
+            if not hasattr(table, "__datafusion_table_provider__"):
+                raise AttributeError(name)
+        return object.__getattribute__(self, name)
 
     def __repr__(self) -> str:
         """Print a string representation of the table."""
-        return self.table.__repr__()
+        return repr(self._table)
 
-    @staticmethod
-    def from_dataset(dataset: pa.dataset.Dataset) -> Table:
-        """Turn a pyarrow Dataset into a Table."""
-        return Table(df_internal.catalog.RawTable.from_dataset(dataset))
+    @property
+    def table(self) -> _InternalRawTable | _InternalTableProvider:
+        """Return the wrapped low level table object."""
+        return self._table
+
+    @classmethod
+    def from_dataset(cls, dataset: pa.dataset.Dataset) -> Table:
+        """Turn a :mod:`pyarrow.dataset` ``Dataset`` into a :class:`Table`."""
+
+        return cls(_InternalRawTable.from_dataset(dataset))
+
+    @classmethod
+    def from_capsule(cls, capsule: Any) -> Table:
+        """Create a :class:`Table` from a PyCapsule exported provider."""
+
+        provider = _InternalTableProvider.from_capsule(capsule)
+        return cls(provider)
+
+    @classmethod
+    def from_dataframe(cls, df: Any) -> Table:
+        """Create a :class:`Table` from tabular data."""
+
+        from datafusion.dataframe import DataFrame as DataFrameWrapper
+
+        dataframe = df if isinstance(df, DataFrameWrapper) else DataFrameWrapper(df)
+        return dataframe.into_view()
+
+    @classmethod
+    def from_view(cls, df: Any) -> Table:
+        """Deprecated helper for constructing tables from views."""
+
+        from datafusion.dataframe import DataFrame as DataFrameWrapper
+
+        if isinstance(df, DataFrameWrapper):
+            df = df.df
+
+        provider = _InternalTableProvider.from_view(df)
+        warnings.warn(
+            "Table.from_view is deprecated; use DataFrame.into_view or "
+            "Table.from_dataframe instead.",
+            category=DeprecationWarning,
+            stacklevel=_FROM_VIEW_WARN_STACKLEVEL,
+        )
+        return cls(provider)
 
     @property
     def schema(self) -> pa.Schema:
         """Returns the schema associated with this table."""
-        return self.table.schema
+        return self._table.schema
 
     @property
     def kind(self) -> str:
         """Returns the kind of table."""
-        return self.table.kind
+        return self._table.kind
+
+    def __datafusion_table_provider__(self) -> Any:
+        """Expose the wrapped provider for FFI integrations."""
+
+        exporter = getattr(self._table, "__datafusion_table_provider__", None)
+        if exporter is None:
+            msg = "Underlying object does not export __datafusion_table_provider__()"
+            raise AttributeError(msg)
+        return exporter()
 
 
 class CatalogProvider(ABC):
@@ -233,15 +312,15 @@ def table(self, name: str) -> Table | None:
         ...
 
     def register_table(  # noqa: B027
-        self, name: str, table: Table | TableProvider | TableProviderExportable
+        self, name: str, table: Table | TableProviderExportable | Any
     ) -> None:
         """Add a table to this schema.
 
         This method is optional. If your schema provides a fixed list of tables, you do
         not need to implement this method.
 
         Objects implementing ``__datafusion_table_provider__`` are also supported
-        and treated as :class:`TableProvider` instances.
+        and treated as table provider instances.
         """
 
     def deregister_table(self, name: str, cascade: bool) -> None:  # noqa: B027
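
The `__getattribute__` override added to `Table` in this file makes `hasattr(table, "__datafusion_table_provider__")` reflect whether the wrapped object can actually export a capsule, even though the wrapper class always defines the method. A minimal sketch of that guard in isolation follows; `RawTableStub`, `ProviderStub`, and `Wrapper` are hypothetical names introduced for illustration and are not part of DataFusion.

```python
from typing import Any


class RawTableStub:
    """Hypothetical low-level table with no capsule exporter."""


class ProviderStub:
    """Hypothetical low-level provider that can export a capsule."""

    def __datafusion_table_provider__(self) -> str:
        # A string stands in for the PyCapsule a real provider would return.
        return "capsule"


class Wrapper:
    __slots__ = ("_table",)

    def __init__(self, table: Any) -> None:
        self._table = table

    def __getattribute__(self, name: str) -> Any:
        # Hide the exporter when the wrapped object cannot provide one, so
        # hasattr() on the wrapper mirrors the capability of the wrapped object.
        if name == "__datafusion_table_provider__":
            table = object.__getattribute__(self, "_table")
            if not hasattr(table, "__datafusion_table_provider__"):
                raise AttributeError(name)
        return object.__getattribute__(self, name)

    def __datafusion_table_provider__(self) -> Any:
        return self._table.__datafusion_table_provider__()


with_exporter = Wrapper(ProviderStub())
without_exporter = Wrapper(RawTableStub())

print(hasattr(with_exporter, "__datafusion_table_provider__"))     # True
print(hasattr(without_exporter, "__datafusion_table_provider__"))  # False
```

Because `hasattr` treats an `AttributeError` as "attribute missing", code that detects exportable providers by checking for the method will skip wrappers around objects that have no exporter.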

python/datafusion/context.py

Lines changed: 8 additions & 10 deletions
@@ -48,7 +48,6 @@
     import pandas as pd
     import polars as pl  # type: ignore[import]
 
-    from datafusion import TableProvider
     from datafusion.catalog import CatalogProvider, Table
     from datafusion.expr import SortKey
     from datafusion.plan import ExecutionPlan, LogicalPlan
@@ -752,25 +751,24 @@ def register_view(self, name: str, df: DataFrame) -> None:
         self.ctx.register_table(name, view)
 
     def register_table(
-        self, name: str, table: Table | TableProvider | TableProviderExportable
+        self, name: str, table: Table | TableProviderExportable | Any
     ) -> None:
-        """Register a Table or TableProvider.
+        """Register a :py:class:`~datafusion.Table` with this context.
 
         The registered table can be referenced from SQL statements executed against
         this context.
 
         Plain :py:class:`~datafusion.dataframe.DataFrame` objects are not supported;
         convert them first with :meth:`datafusion.dataframe.DataFrame.into_view` or
-        :meth:`datafusion.TableProvider.from_dataframe`.
+        :meth:`datafusion.Table.from_dataframe`.
 
         Objects implementing ``__datafusion_table_provider__`` are also supported
-        and treated as :py:class:`~datafusion.TableProvider` instances.
+        and treated as table provider instances.
 
         Args:
             name: Name of the resultant table.
-            table: DataFusion :class:`Table`, :class:`TableProvider`, or any object
-                implementing ``__datafusion_table_provider__`` to add to the session
-                context.
+            table: DataFusion :class:`Table` or any object implementing
+                ``__datafusion_table_provider__`` to add to the session context.
         """
         provider = _normalize_table_provider(table)
         self.ctx.register_table(name, provider)
@@ -793,14 +791,14 @@ def register_catalog_provider(
         self.ctx.register_catalog_provider(name, provider)
 
     def register_table_provider(
-        self, name: str, provider: Table | TableProviderExportable | TableProvider
+        self, name: str, provider: Table | TableProviderExportable | Any
     ) -> None:
         """Register a table provider.
 
         Deprecated: use :meth:`register_table` instead.
 
         Objects implementing ``__datafusion_table_provider__`` are also supported
-        and treated as :py:class:`~datafusion.TableProvider` instances.
+        and treated as table provider instances.
         """
         warnings.warn(
             "register_table_provider is deprecated; use register_table",

python/datafusion/dataframe.py

Lines changed: 9 additions & 9 deletions
@@ -60,7 +60,7 @@
     import polars as pl
     import pyarrow as pa
 
-    from datafusion.table_provider import TableProvider
+    from datafusion.catalog import Table
 
 from enum import Enum
 

@@ -315,8 +315,8 @@ def __init__(self, df: DataFrameInternal) -> None:
         """
         self.df = df
 
-    def into_view(self) -> TableProvider:
-        """Convert ``DataFrame`` into a ``TableProvider`` view for registration.
+    def into_view(self) -> Table:
+        """Convert ``DataFrame`` into a :class:`~datafusion.Table` for registration.
 
         This is the preferred way to obtain a view for
         :py:meth:`~datafusion.context.SessionContext.register_table` for several reasons:
@@ -325,13 +325,13 @@ def into_view(self) -> TableProvider:
            ``DataFrame.into_view()`` method without intermediate delegations.
         2. **Clear semantics**: The ``into_`` prefix follows Rust conventions,
            indicating conversion from one type to another.
-        3. **Canonical method**: Other approaches like ``TableProvider.from_dataframe``
+        3. **Canonical method**: Other approaches like ``Table.from_dataframe``
            delegate to this method internally, making this the single source of truth.
-        4. **Deprecated alternatives**: The older ``TableProvider.from_view`` helper
+        4. **Deprecated alternatives**: The older ``Table.from_view`` helper
            is deprecated and issues warnings when used.
 
-        ``datafusion.TableProvider.from_dataframe`` calls this method under the hood,
-        and the older ``TableProvider.from_view`` helper is deprecated.
+        ``datafusion.Table.from_dataframe`` calls this method under the hood, and the
+        older ``Table.from_view`` helper is deprecated.
 
         The ``DataFrame`` remains valid after conversion, so it can still be used for
         additional queries alongside the returned view.
@@ -345,9 +345,9 @@ def into_view(self) -> TableProvider:
             >>> df.collect()  # The DataFrame is still usable
             >>> ctx.sql("SELECT value FROM values_view").collect()
         """
-        from datafusion.table_provider import TableProvider as _TableProvider
+        from datafusion.catalog import Table as _Table
 
-        return _TableProvider(self.df.into_view())
+        return _Table(self.df.into_view())
 
     def __getitem__(self, key: str | list[str]) -> DataFrame:
         """Return a new :py:class`DataFrame` with the specified column or columns.
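
The `into_view` docstring in this file notes that the older `from_view` helper is deprecated and issues warnings when used, and the `catalog.py` change emits that warning with `stacklevel=2` so it is attributed to the caller rather than to the helper itself. The sketch below reproduces only that pattern using the standard `warnings` module; `into_view_sketch` and `from_view_sketch` are hypothetical names, and a list copy stands in for building a view.

```python
import warnings


def into_view_sketch(rows):
    """Hypothetical preferred helper; copying a list stands in for a view."""
    return list(rows)


def from_view_sketch(rows):
    """Hypothetical deprecated helper that forwards to the preferred one."""
    warnings.warn(
        "from_view_sketch is deprecated; use into_view_sketch instead.",
        category=DeprecationWarning,
        # stacklevel=2 points the warning at the caller of this function.
        stacklevel=2,
    )
    return into_view_sketch(rows)


# Record warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = from_view_sketch([1, 2])
```

After the `with` block, `caught` holds a single `DeprecationWarning` record and `result` is the same value the preferred helper would have returned, which is how a test suite can confirm that a deprecated alias still works while steering callers to the replacement.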
