
Commit f0837de

docs: update README and user guide to reflect register_view method for DataFrame registration
Parent: c31395f

File tree: 3 files changed (+62 −12)

README.md
docs/source/user-guide/common-operations/views.rst
src/dataframe.rs

README.md

Lines changed: 3 additions & 6 deletions

@@ -81,7 +81,7 @@ This produces the following chart:
 
 ## Registering a DataFrame as a View
 
-You can use the `into_view` method to convert a DataFrame into a view and register it with the context.
+You can use SessionContext's `register_view` method to convert a DataFrame into a view and register it with the context.
 
 ```python
 from datafusion import SessionContext, col, literal
@@ -98,11 +98,8 @@ df = ctx.from_pydict(data, "my_table")
 # Filter the DataFrame (for example, keep rows where a > 2)
 df_filtered = df.filter(col("a") > literal(2))
 
-# Convert the filtered DataFrame into a view
-view = df_filtered.into_view()
-
-# Register the view with the context
-ctx.register_table("view1", view)
+# Register the dataframe as a view with the context
+ctx.register_view("view1", df_filtered)
 
 # Now run a SQL query against the registered view
 df_view = ctx.sql("SELECT * FROM view1")
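
Read as a whole, the updated README snippet now registers the filtered DataFrame in a single call. For reference, here is a hedged, self-contained version of the snippet as it reads after this change; the `data` dictionary is an assumption, since the diff context never shows how `my_table` is populated:

```python
from datafusion import SessionContext, col, literal

ctx = SessionContext()

# Assumed sample data; the diff only shows `ctx.from_pydict(data, "my_table")`.
data = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}
df = ctx.from_pydict(data, "my_table")

# Filter the DataFrame (for example, keep rows where a > 2)
df_filtered = df.filter(col("a") > literal(2))

# Register the DataFrame as a view with the context (the new one-step API)
ctx.register_view("view1", df_filtered)

# Now run a SQL query against the registered view
df_view = ctx.sql("SELECT * FROM view1")
df_view.show()
```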

docs/source/user-guide/common-operations/views.rst

Lines changed: 3 additions & 6 deletions

@@ -19,7 +19,7 @@
 Registering Views
 ======================
 
-You can use the ``into_view`` method to convert a DataFrame into a view and register it with the context.
+You can use the context's ``register_view`` method to register a DataFrame as a view
 
 .. code-block:: python
 
@@ -37,11 +37,8 @@ You can use the ``into_view`` method to convert a DataFrame into a view and regi
 # Filter the DataFrame (for example, keep rows where a > 2)
 df_filtered = df.filter(col("a") > literal(2))
 
-# Convert the filtered DataFrame into a view
-view = df_filtered.into_view()
-
-# Register the view with the context
-ctx.register_table("view1", view)
+# Register the dataframe as a view with the context
+ctx.register_view("view1", df_filtered)
 
 # Now run a SQL query against the registered view
 df_view = ctx.sql("SELECT * FROM view1")

src/dataframe.rs

Lines changed: 56 additions & 0 deletions

@@ -52,6 +52,9 @@ use crate::{
     expr::{sort_expr::PySortExpr, PyExpr},
 };
 
+// https://github.com/apache/datafusion-python/pull/1016#discussion_r1983239116
+// - we have not decided on the table_provider approach yet
+// this is an interim implementation
 #[pyclass(name = "TableProvider", module = "datafusion")]
 pub struct PyTableProvider {
     provider: Arc<dyn TableProvider>,
@@ -71,6 +74,57 @@ impl PyTableProvider {
 /// A PyDataFrame is a representation of a logical plan and an API to compose statements.
 /// Use it to build a plan and `.collect()` to execute the plan and collect the result.
 /// The actual execution of a plan runs natively on Rust and Arrow on a multi-threaded environment.
+///
+/// # Methods
+///
+/// - `new`: Creates a new PyDataFrame.
+/// - `__getitem__`: Enable selection for `df[col]`, `df[col1, col2, col3]`, and `df[[col1, col2, col3]]`.
+/// - `__repr__`: Returns a string representation of the DataFrame.
+/// - `_repr_html_`: Returns an HTML representation of the DataFrame.
+/// - `describe`: Calculate summary statistics for a DataFrame.
+/// - `schema`: Returns the schema from the logical plan.
+/// - `into_view`: Convert this DataFrame into a Table that can be used in register_table. We have not finalized on PyTableProvider approach yet.
+/// - `select_columns`: Select columns from the DataFrame.
+/// - `select`: Select expressions from the DataFrame.
+/// - `drop`: Drop columns from the DataFrame.
+/// - `filter`: Filter the DataFrame based on a predicate.
+/// - `with_column`: Add a new column to the DataFrame.
+/// - `with_columns`: Add multiple new columns to the DataFrame.
+/// - `with_column_renamed`: Rename a column in the DataFrame.
+/// - `aggregate`: Aggregate the DataFrame based on group by and aggregation expressions.
+/// - `sort`: Sort the DataFrame based on expressions.
+/// - `limit`: Limit the number of rows in the DataFrame.
+/// - `collect`: Executes the plan, returning a list of `RecordBatch`es.
+/// - `cache`: Cache the DataFrame.
+/// - `collect_partitioned`: Executes the DataFrame and collects all results into a vector of vector of RecordBatch maintaining the input partitioning.
+/// - `show`: Print the result, 20 lines by default.
+/// - `distinct`: Filter out duplicate rows.
+/// - `join`: Join two DataFrames.
+/// - `join_on`: Join two DataFrames based on expressions.
+/// - `explain`: Print the query plan.
+/// - `logical_plan`: Get the logical plan for this DataFrame.
+/// - `optimized_logical_plan`: Get the optimized logical plan for this DataFrame.
+/// - `execution_plan`: Get the execution plan for this DataFrame.
+/// - `repartition`: Repartition the DataFrame based on a logical partitioning scheme.
+/// - `repartition_by_hash`: Repartition the DataFrame based on a hash partitioning scheme.
+/// - `union`: Calculate the union of two DataFrames, preserving duplicate rows.
+/// - `union_distinct`: Calculate the distinct union of two DataFrames.
+/// - `unnest_column`: Unnest a column in the DataFrame.
+/// - `unnest_columns`: Unnest multiple columns in the DataFrame.
+/// - `intersect`: Calculate the intersection of two DataFrames.
+/// - `except_all`: Calculate the exception of two DataFrames.
+/// - `write_csv`: Write the DataFrame to a CSV file.
+/// - `write_parquet`: Write the DataFrame to a Parquet file.
+/// - `write_json`: Write the DataFrame to a JSON file.
+/// - `to_arrow_table`: Convert the DataFrame to an Arrow Table.
+/// - `__arrow_c_stream__`: Convert the DataFrame to an Arrow C Stream.
+/// - `execute_stream`: Execute the DataFrame and return a RecordBatchStream.
+/// - `execute_stream_partitioned`: Execute the DataFrame and return partitioned RecordBatchStreams.
+/// - `to_pandas`: Convert the DataFrame to a Pandas DataFrame.
+/// - `to_pylist`: Convert the DataFrame to a Python list.
+/// - `to_pydict`: Convert the DataFrame to a Python dictionary.
+/// - `to_polars`: Convert the DataFrame to a Polars DataFrame.
+/// - `count`: Execute the DataFrame to get the total number of rows.
 #[pyclass(name = "DataFrame", module = "datafusion", subclass)]
 #[derive(Clone)]
 pub struct PyDataFrame {
@@ -179,6 +233,8 @@ impl PyDataFrame {
     /// Disabling the clippy lint, so we can use &self
     /// because we're working with Python bindings
     /// where objects are shared
+    /// https://github.com/apache/datafusion-python/pull/1016#discussion_r1983239116
+    /// - we have not decided on the table_provider approach yet
     #[allow(clippy::wrong_self_convention)]
     fn into_view(&self) -> PyDataFusionResult<PyTable> {
         // Call the underlying Rust DataFrame::into_view method.
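
Note that the Rust changes add only comments and doc text: `into_view` stays in place next to the new `register_view`, so both registration paths remain available to Python callers. Below is a minimal sketch of their equivalence, assuming the view names are free (`view_new` and `view_old` are illustrative, and the sample data is assumed):

```python
from datafusion import SessionContext, col, literal

ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3]}, "t")  # assumed sample data
df_filtered = df.filter(col("a") > literal(1))

# New convenience method documented by this commit:
ctx.register_view("view_new", df_filtered)

# Interim PyTableProvider path described in the comments above:
view = df_filtered.into_view()        # DataFrame -> table usable with register_table
ctx.register_table("view_old", view)

# Both registrations should answer the same query.
assert (ctx.sql("SELECT count(*) FROM view_new").to_pydict()
        == ctx.sql("SELECT count(*) FROM view_old").to_pydict())
```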
