PyIceberg interfaces closely with Daft DataFrames (see also: [Daft integration with Iceberg](https://docs.daft.ai/en/stable/io/iceberg/)), which provide a full, lazily optimized query engine interface on top of PyIceberg tables.

<!-- prettier-ignore-start -->

!!! note "Requirements"
    This requires [Daft to be installed](index.md).

<!-- prettier-ignore-end -->

A table can be read into a Daft DataFrame:

```python
df = table.to_daft()  # equivalent to `daft.read_iceberg(table)`
```
PyIceberg interfaces closely with Polars DataFrames and LazyFrames; the LazyFrame API provides a full, lazily optimized query engine interface on top of PyIceberg tables.

<!-- prettier-ignore-start -->

!!! note "Requirements"
    This requires [`polars` to be installed](index.md).

```shell
pip install "pyiceberg[polars]"
```
<!-- prettier-ignore-end -->

PyIceberg data can be analyzed and accessed through Polars using either a DataFrame or a LazyFrame.
If your code uses the Apache Iceberg scan API to retrieve data and then analyzes the resulting DataFrame in Polars, use `table.scan().to_polars()`.
If you want to rely on Polars' own high-performance filtering and retrieval, use `table.to_polars()`, which exports the Iceberg table as a LazyFrame.

```python
# Get a LazyFrame
iceberg_table.to_polars()

# Get a DataFrame
iceberg_table.scan().to_polars()
```
#### Working with Polars DataFrame

PyIceberg makes it easy to filter data from a large table and pull it into a local Polars DataFrame. Only the Parquet files relevant to the query are fetched, and the filter is applied while reading them. This reduces IO, which improves performance and lowers cost.
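A minimal sketch of this pattern, assuming `table` is an already loaded PyIceberg table with hypothetical columns `VendorID` and `passenger_count`: the row filter and the column projection are passed to `scan()`, so only matching data is read before `to_polars()` materializes it as an eager DataFrame.

```python
def load_busy_trips(table):
    """Read only the matching rows and columns of an Iceberg table into Polars.

    `table` is assumed to be a loaded `pyiceberg.table.Table`; the column
    names below are hypothetical placeholders for your own schema.
    """
    scan = table.scan(
        # Filter pushed into the scan, so data files that cannot contain
        # matching rows are skipped instead of downloaded.
        row_filter="passenger_count >= 2",
        # Project only the columns the analysis needs.
        selected_fields=("VendorID", "passenger_count"),
    )
    # Materialize the already filtered scan as an eager Polars DataFrame.
    return scan.to_polars()
```

For example, `df = load_busy_trips(catalog.load_table("default.taxi_trips"))`, where the table identifier is hypothetical.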
```python
# Expire old snapshots, but always keep last 10 and at least 5 total
```
| uri | <https://rest-catalog/ws> | URI identifying the REST server |
| ugi | t-1234:secret | Hadoop UGI for the Hive client |
| credential | t-1234:secret | Credential to use for the OAuth2 credential flow when initializing the catalog |
| token | FEW23.DFSDF.FSDF | Bearer token value to use for the `Authorization` header |
| warehouse | myWarehouse | Warehouse location or identifier to request from the catalog service. May be used to determine server-side overrides, such as the warehouse location. |
| snapshot-loading-mode | refs | The snapshots to return in the body of the metadata. Setting the value to `all` returns the full set of snapshots currently valid for the table. Setting the value to `refs` returns only the snapshots referenced by branches or tags. |
| `header.X-Iceberg-Access-Delegation` | `vended-credentials` | Signal to the server that the client supports delegated access via a comma-separated list of access mechanisms. The server may choose to supply access via any or none of the requested mechanisms. When using `vended-credentials`, the server provides temporary credentials to the client. When using `remote-signing`, the server signs requests on behalf of the client. (default: `vended-credentials`) |
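As a sketch of how these properties fit together, the dictionary below reuses the example values from the table; the catalog name `rest` is a hypothetical choice. `load_catalog` contacts the server when it initializes the catalog, so it is wrapped in a function that is only called once a REST server is reachable.

```python
# Example values from the table above; replace them with your own deployment's.
REST_CATALOG_PROPERTIES = {
    "type": "rest",
    "uri": "https://rest-catalog/ws",
    "warehouse": "myWarehouse",
    # Only return snapshots referenced by branches or tags.
    "snapshot-loading-mode": "refs",
    # Comma-separated list of delegated-access mechanisms; here just the default.
    "header.X-Iceberg-Access-Delegation": "vended-credentials",
}


def connect_rest_catalog():
    """Load the REST catalog (requires a reachable REST server)."""
    # Imported lazily so the properties can be inspected without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("rest", **REST_CATALOG_PROPERTIES)
```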
#### Headers in REST Catalog

To configure custom headers in REST Catalog, include them in the catalog properties with keys of the form `header.<Header-Name>`. This ensures that all HTTP requests to the REST service include the specified headers.
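Conceptually, every property whose key starts with `header.` becomes one HTTP header, with the prefix stripped from the name. The helper below is a hypothetical illustration of that convention, not PyIceberg's internal code, and `X-Client-Name` is a made-up custom header.

```python
from typing import Dict

HEADER_PREFIX = "header."


def headers_from_properties(properties: Dict[str, str]) -> Dict[str, str]:
    """Map `header.<Header-Name>` catalog properties to HTTP header names and values.

    Illustrative only: mirrors the documented convention, not the library's implementation.
    """
    return {
        key[len(HEADER_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(HEADER_PREFIX)
    }


properties = {
    "uri": "https://rest-catalog/ws",  # not a header, so it is ignored
    "header.X-Iceberg-Access-Delegation": "vended-credentials",
    "header.X-Client-Name": "analytics-notebook",  # hypothetical custom header
}
print(headers_from_properties(properties))
```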
| rest.sigv4-enabled | true | Sign requests to the REST Server using AWS SigV4 protocol |
| rest.signing-region | us-east-1 | The region to use when SigV4 signing a request |
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |

<!-- markdown-link-check-enable-->

#### Common Integrations & Examples