docs/website/docs/general-usage/dataset-access/dataset.md

Lines changed: 13 additions & 20 deletions
@@ -16,9 +16,9 @@ Here's a full example of how to retrieve data from a pipeline and load it into a
 ## Getting started

-Assuming you have a `Pipeline` object (let's call it `pipeline`), you can obtain a `ReadableDataset` and access your tables as `ReadableRelation` objects.
+Assuming you have a `Pipeline` object (let's call it `pipeline`), you can obtain a `Dataset`, which contains the credentials and schema for your destination dataset. You can construct a query and execute it on the dataset to obtain a `Relation`, which you can use to retrieve data from the `Dataset`.

-**Note:** The `ReadableDataset` and `ReadableRelation` objects are **lazy-loading**. They will only query and retrieve data when you perform an action that requires it, such as fetching data into a DataFrame or iterating over the data. This means that simply creating these objects does not load data into memory, making your code more efficient.
+**Note:** The `Dataset` and `Relation` objects are **lazy-loading**. They will only query and retrieve data when you perform an action that requires it, such as fetching data into a DataFrame or iterating over the data. This means that simply creating these objects does not load data into memory, making your code more efficient.

 ### Access the dataset
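The lazy-loading note changed in the hunk above can be illustrated with a small sketch. It does not use `dlt` and is not its implementation: `LazyRelation` is a hypothetical stand-in built on Python's standard `sqlite3` module, and it only shows the idea that a relation stores a query and touches the database when data is actually requested.

```python
import sqlite3
from typing import Iterator, List, Tuple


class LazyRelation:
    """Hypothetical stand-in for a lazy relation (not dlt's implementation).

    Creating the object only stores the SQL text; the query runs only when
    a method that needs data is called.
    """

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self.conn = conn
        self.sql = sql
        self.executions = 0  # how many times the query has actually run

    def fetchall(self) -> List[Tuple]:
        # Fetching is an action that requires data, so the query runs here.
        self.executions += 1
        return self.conn.execute(self.sql).fetchall()

    def __iter__(self) -> Iterator[Tuple]:
        # Iterating also triggers the query (once iteration starts).
        self.executions += 1
        yield from self.conn.execute(self.sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])

relation = LazyRelation(conn, "SELECT id, name FROM items ORDER BY id")
print(relation.executions)  # 0: creating the relation ran no query
rows = relation.fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
print(relation.executions)  # 1
```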
@@ -27,13 +27,17 @@ Assuming you have a `Pipeline` object (let's call it `pipeline`), you can obtain
 ### Access tables as dataset

-You can access tables in your dataset using either attribute access or item access.
+The simplest way to get a `Relation` from a `Dataset` is to request a full-table relation:
-Once you have a `ReadableRelation`, you can read data in various formats and sizes.
+Once you have a `Relation`, you can read data in various formats and sizes.

 ### Fetch the entire table
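The full-table relation introduced in this hunk can be sketched in a few lines, again on the standard `sqlite3` module rather than `dlt`. `TableRelation` and its `to_sql()` and `limit()` methods are hypothetical illustrations, not `dlt`'s API: the relation starts as `SELECT * FROM <table>`, and `limit()` folds a row cap into the query before anything is read.

```python
import sqlite3
from typing import List, Optional, Tuple


class TableRelation:
    """Hypothetical full-table relation (illustration only, not dlt's API)."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 row_limit: Optional[int] = None) -> None:
        self.conn = conn
        self.table = table
        self.row_limit = row_limit

    def to_sql(self) -> str:
        # Quote the table identifier; a full-table relation selects every column.
        sql = 'SELECT * FROM "{}"'.format(self.table.replace('"', '""'))
        if self.row_limit is not None:
            sql += " LIMIT {}".format(int(self.row_limit))
        return sql

    def limit(self, n: int) -> "TableRelation":
        # Return a new relation instead of mutating this one; still no query runs.
        return TableRelation(self.conn, self.table, n)

    def fetchall(self) -> List[Tuple]:
        # Only here is the database actually queried.
        return self.conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(10)])

full = TableRelation(conn, "items")
first_three = full.limit(3)
print(first_three.to_sql())  # SELECT * FROM "items" LIMIT 3
print(len(first_three.fetchall()))  # 3
```

Returning a new object from `limit()` keeps the original full-table relation reusable, and pushing the cap into the SQL means only the requested rows are read.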
@@ -55,7 +59,7 @@ Loading full tables into memory without limiting or iterating over them can cons
 ## Lazy loading behavior

-The `ReadableDataset` and `ReadableRelation` objects are **lazy-loading**. This means that they do not immediately fetch data when you create them. Data is only retrieved when you perform an action that requires it, such as calling `.df()`, `.arrow()`, or iterating over the data. This approach optimizes performance and reduces unnecessary data loading.
+The `Dataset` and `Relation` objects are **lazy-loading**. This means that they do not immediately fetch data when you create them. Data is only retrieved when you perform an action that requires it, such as calling `.df()`, `.arrow()`, or iterating over the data. This approach optimizes performance and reduces unnecessary data loading.

 ## Iterating over data in chunks
@@ -73,7 +77,7 @@ To handle large datasets efficiently, you can process data in smaller chunks.
-The methods available on the ReadableRelation correspond to the methods available on the cursor returned by the SQL client. Please refer to the [SQL client](./sql-client.md#supported-methods-on-the-cursor) guide for more information.
+The methods available on the `Relation` correspond to the methods available on the cursor returned by the SQL client. Please refer to the [SQL client](./sql-client.md#supported-methods-on-the-cursor) guide for more information.

 ## Connection Handling
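Because the relation's methods mirror those of a database cursor, chunked reading follows the DB-API `fetchmany()` pattern. The sketch below uses Python's standard `sqlite3` cursor directly; `iter_chunks` is a hypothetical helper, not a `dlt` function, and shows how a generator can hand out fixed-size chunks so the caller processes one chunk at a time instead of loading the full result.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_chunks(conn: sqlite3.Connection, sql: str,
                chunk_size: int) -> Iterator[List[Tuple]]:
    """Hypothetical helper: yield query results in lists of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    cursor = conn.execute(sql)
    try:
        while True:
            chunk = cursor.fetchmany(chunk_size)  # standard DB-API cursor method
            if not chunk:
                break
            yield chunk
    finally:
        cursor.close()  # runs when the generator is exhausted or closed early


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(7)])

for chunk in iter_chunks(conn, "SELECT id FROM events ORDER BY id", chunk_size=3):
    print(len(chunk))  # prints 3, 3, then 1
```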
@@ -171,20 +175,9 @@ Note: `delta` tables are by default on autorefresh which is implemented by delta
 ## Advanced usage

-### Using custom SQL queries to create `ReadableRelations`
-
-You can use custom SQL queries directly on the dataset to create a `ReadableRelation`:
-When using custom SQL queries with `dataset()`, methods like `limit` and `select` won't work. Include any filtering or column selection directly in your SQL query.
-:::
-
-
-### Loading a `ReadableRelation` into a pipeline table
+### Loading a `Relation` into a pipeline table

-Since the `iter_arrow` and `iter_df` methods are generators that iterate over the full `ReadableRelation` in chunks, you can use them as a resource for another (or even the same) `dlt` pipeline:
+Since the `iter_arrow` and `iter_df` methods are generators that iterate over the full `Relation` in chunks, you can use them as a resource for another (or even the same) `dlt` pipeline:
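The idea of feeding a chunked generator into another load step can be shown without `dlt`. In the docs the consumer is a `dlt` pipeline resource; this sketch swaps in a plain `sqlite3` copy so it runs with only the standard library. `source_chunks`, `load_chunks`, and the `items_copy` table are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Iterator, List, Tuple


def source_chunks(conn: sqlite3.Connection, chunk_size: int) -> Iterator[List[Tuple]]:
    # Generator over the source table; one chunk is fetched per step.
    cursor = conn.execute("SELECT id, name FROM items ORDER BY id")
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield chunk


def load_chunks(dst: sqlite3.Connection, chunks: Iterator[List[Tuple]]) -> int:
    """Hypothetical loader: insert each chunk into dst.items_copy as it arrives."""
    dst.execute("CREATE TABLE IF NOT EXISTS items_copy (id INTEGER, name TEXT)")
    total = 0
    for chunk in chunks:
        dst.executemany("INSERT INTO items_copy VALUES (?, ?)", chunk)
        total += len(chunk)
    dst.commit()
    return total


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE items (id INTEGER, name TEXT)")
src.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item-{i}") for i in range(5)])

dst = sqlite3.connect(":memory:")
loaded = load_chunks(dst, source_chunks(src, chunk_size=2))
print(loaded)  # 5
```

Passing the generator rather than a materialized list keeps the loader independent of the source's size; the same function would work if the source and destination shared one connection.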
@@ -198,7 +191,7 @@ Visit the [Native Ibis integration](./ibis-backend.md) guide to learn more.
 - **Memory usage:** Loading full tables into memory without iterating or limiting can consume significant memory, potentially leading to crashes if the dataset is large. Always consider using limits or chunked iteration.

-- **Lazy evaluation:** `ReadableDataset` and `ReadableRelation` objects delay data retrieval until necessary. This design improves performance and resource utilization.
+- **Lazy evaluation:** `Dataset` and `Relation` objects delay data retrieval until necessary. This design improves performance and resource utilization.

 - **Custom SQL queries:** When executing custom SQL queries, remember that additional methods like `limit()` or `select()` won't modify the query. Include all necessary clauses directly in your SQL statement.