mkdocs/docs/configuration.md
Lines changed: 42 additions & 27 deletions
@@ -105,33 +105,6 @@ You can also set the FileIO explicitly:
 
 For the FileIO there are several configuration options available:
 
-### PyArrow FileSystem Extra Properties
-
-When using `PyArrowFileIO`, any properties with filesystem specific prefixes that are not explicitly handled by PyIceberg will be passed to the underlying PyArrow filesystem implementations.
-
-To use these properties, follow the format:
-
-```txt
-{fs_scheme}.{parameter_name}
-```
-
-- {fs_scheme} is the filesystem scheme (e.g., s3, hdfs, gcs).
-- {parameter_name} must match the name expected by the PyArrow filesystem.
-- Property values must use the correct type expected by the underlying filesystem (e.g., string, integer, boolean).
-
-Below are examples of supported prefixes and how the properties are passed through:
-
-<!-- markdown-link-check-disable -->
-
-| Property Prefix | FileSystem | Example | Description |
-| --------------- | ---------- | ------- | ----------- |
-| `s3.` | [S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) | `s3.load_frequency=900` | Passed as `load_frequency=900` to S3FileSystem |
-| `hdfs.` | [HadoopFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html) | `hdfs.replication=3` | Passed as `replication=3` to HadoopFileSystem |
-| `gcs.` | [GcsFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.GcsFileSystem.html) | `gcs.project_id=test` | Passed as `project_id='test'` to GcsFileSystem |
-| `adls.` | [AzureFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.AzureFileSystem.html) | `adls.account_name=foo` | Passed as `account_name=foo` to AzureFileSystem |
-| `oss.` | [S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) | `oss.connect_timeout=30.0` | Passed as `connect_timeout=30.0` to S3FileSystem |
-| `file.` | [LocalFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.LocalFileSystem.html) | `file.use_mmap=true` | Passed as `use_mmap=True` to LocalFileSystem |
 | pyarrow.use-large-types-on-read | True | Use large PyArrow types i.e. [large_string](https://arrow.apache.org/docs/python/generated/pyarrow.large_string.html), [large_binary](https://arrow.apache.org/docs/python/generated/pyarrow.large_binary.html) and [large_list](https://arrow.apache.org/docs/python/generated/pyarrow.large_list.html) field types on table scans. The default value is True. |
 
 <!-- markdown-link-check-enable-->
+#### Advanced FileSystem Configuration
+
+When using `PyArrowFileIO`, you can **pass additional configuration properties directly to the underlying PyArrow filesystem implementations**. This feature enables you to use any PyArrow filesystem option without requiring explicit PyIceberg support.
+
+PyIceberg first processes its own supported properties for each filesystem, then passes any remaining properties with the appropriate prefix directly to the PyArrow filesystem constructor. This approach ensures:
+
+1. PyIceberg's built-in properties take precedence
+2. Advanced PyArrow options are automatically supported
+3. New PyArrow features become available immediately
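
The precedence rule described in the added text can be pictured with a small sketch. This is not PyIceberg's implementation: the function `split_filesystem_properties` and the `handled_keys` set are hypothetical names used only for illustration, and which keys PyIceberg actually handles itself is defined by the library, not by this example.

```python
from typing import Any, Dict, Set, Tuple


def split_filesystem_properties(
    properties: Dict[str, Any],
    fs_scheme: str,
    handled_keys: Set[str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate explicitly handled properties from pass-through ones.

    `handled_keys` is a stand-in for the properties PyIceberg supports
    itself (an assumption made for this sketch).
    """
    prefix = f"{fs_scheme}."
    handled: Dict[str, Any] = {}
    passthrough: Dict[str, Any] = {}
    for key, value in properties.items():
        if not key.startswith(prefix):
            continue  # belongs to another filesystem or to the catalog
        if key in handled_keys:
            handled[key] = value  # built-in properties take precedence
        else:
            # Strip the prefix so the name matches the constructor keyword.
            passthrough[key[len(prefix):]] = value
    return handled, passthrough


properties = {
    "s3.endpoint": "http://localhost:9000",  # assumed to be handled explicitly
    "s3.load_frequency": 900,                # not handled, so passed through
    "gcs.project_id": "test",                # different scheme, ignored here
}
handled, passthrough = split_filesystem_properties(properties, "s3", {"s3.endpoint"})
print(handled)      # {'s3.endpoint': 'http://localhost:9000'}
print(passthrough)  # {'load_frequency': 900}
```

In this sketch the pass-through dictionary is what would be forwarded as keyword arguments to the filesystem constructor.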
+
+##### Configuration Format
+
+Use this format for additional properties:
+
+```txt
+{fs_scheme}.{parameter_name}={value}
+```
+
+Where:
+
+- `{fs_scheme}` is the filesystem scheme (e.g., `s3`, `hdfs`, `gcs`, `adls`, `oss`, `file`)
+- `{parameter_name}` must match the exact parameter name expected by the PyArrow filesystem constructor
+- `{value}` must be the correct type expected by the underlying filesystem (string, integer, boolean, etc.)
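
To make the key format concrete, here is a minimal sketch that splits a property key into its scheme and parameter name, and shows values written with native Python types. The helper `parse_property_key` and the `KNOWN_SCHEMES` set are hypothetical; the schemes are simply the ones listed in the text, and the example keys come from the table in this diff.

```python
from typing import Tuple

# Schemes named in the list above.
KNOWN_SCHEMES = {"s3", "hdfs", "gcs", "adls", "oss", "file"}


def parse_property_key(key: str) -> Tuple[str, str]:
    """Split '{fs_scheme}.{parameter_name}' into its two parts."""
    fs_scheme, separator, parameter_name = key.partition(".")
    if not separator or not parameter_name or fs_scheme not in KNOWN_SCHEMES:
        raise ValueError(f"not a filesystem property key: {key!r}")
    return fs_scheme, parameter_name


# Values carry the type the filesystem expects: int, float, and bool here.
properties = {
    "s3.load_frequency": 900,
    "oss.connect_timeout": 30.0,
    "file.use_mmap": True,
}

for key, value in properties.items():
    fs_scheme, parameter_name = parse_property_key(key)
    print(fs_scheme, parameter_name, type(value).__name__)
```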
+
+##### Supported Prefixes and FileSystems
+
+<!-- markdown-link-check-disable -->
+
+| Property Prefix | FileSystem | Example | Description |
+| --------------- | ---------- | ------- | ----------- |
+| `s3.` | [S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) | `s3.load_frequency=900` | Passed as `load_frequency=900` to S3FileSystem |
+| `hdfs.` | [HadoopFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html) | `hdfs.replication=3` | Passed as `replication=3` to HadoopFileSystem |
+| `gcs.` | [GcsFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.GcsFileSystem.html) | `gcs.project_id=test` | Passed as `project_id='test'` to GcsFileSystem |
+| `adls.` | [AzureFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.AzureFileSystem.html) | `adls.account_name=foo` | Passed as `account_name=foo` to AzureFileSystem |
+| `oss.` | [S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) | `oss.connect_timeout=30.0` | Passed as `connect_timeout=30.0` to S3FileSystem |
+| `file.` | [LocalFileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.LocalFileSystem.html) | `file.use_mmap=true` | Passed as `use_mmap=True` to LocalFileSystem |
+
+<!-- markdown-link-check-enable -->
+
+**Note:** Refer to the PyArrow documentation for each filesystem to understand the available parameters and their expected types. Property values are passed directly to PyArrow, so they must match the exact parameter names and types expected by the filesystem constructors.
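
The note's warning about exact parameter names can be illustrated without PyArrow installed. The sketch below uses a hypothetical `StandInFileSystem` class in place of a real PyArrow filesystem; it only demonstrates that keyword arguments forwarded unchanged are rejected by the constructor when a name does not match. A real PyArrow constructor may additionally reject values of the wrong type, which this stand-in does not model.

```python
from typing import Any, Dict


class StandInFileSystem:
    """Hypothetical stand-in for a PyArrow filesystem constructor."""

    def __init__(self, *, use_mmap: bool = False) -> None:
        self.use_mmap = use_mmap


def build_filesystem(passthrough: Dict[str, Any]) -> StandInFileSystem:
    # Forward the remaining properties as keyword arguments, unchanged.
    return StandInFileSystem(**passthrough)


# A matching name is accepted.
fs = build_filesystem({"use_mmap": True})
print(fs.use_mmap)

# A misspelled name is rejected by the constructor itself.
error_message = ""
try:
    build_filesystem({"use_memory_map": True})
except TypeError as exc:
    error_message = str(exc)
    print(f"rejected: {error_message}")
```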