feat: allow default scheme and netloc for schemeless path (apache#2291)
# Rationale for this change
For HDFS, it's common to read the scheme and netloc from configuration and have paths contain only the path portion of the URI. This PR adds the `DEFAULT_SCHEME` and `DEFAULT_NETLOC` properties to configure these defaults.
For example:
```python
from pyiceberg.catalog import load_catalog

# load_catalog takes catalog properties as keyword arguments
catalog = load_catalog(
    "default",
    **{
        # ...other catalog properties...
        "DEFAULT_SCHEME": "hdfs",
        "DEFAULT_NETLOC": "ltx1-yugioh-cluster01.linkfs.prod-ltx1.atd.prod.linkedin.com:9000",
    },
)
```
Or, when not using a catalog:
```python
from pyiceberg.table import StaticTable

static_table = StaticTable.from_metadata(
    "/warehouse/wh/nyc.db/taxis/metadata/00002-6ea51ce3-62aa-4197-9cf8-43d07c3440ca.metadata.json",
    properties={
        # Defaults applied to the schemeless metadata path above
        "DEFAULT_SCHEME": "hdfs",
        "DEFAULT_NETLOC": "ltx1-yugioh-cluster01.linkfs.prod-ltx1.atd.prod.linkedin.com:9000",
    },
)
```
Previously, schemeless paths were always assumed to refer to the local filesystem. This PR allows schemeless paths to be resolved against the HDFS filesystem instead.
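The resolution rule can be illustrated with a minimal sketch in plain Python using `urllib.parse.urlparse`. This is not pyiceberg's implementation: `resolve_location` and `to_uri` are hypothetical helpers, the host `namenode.example.com:9000` is made up, and falling back to `file` with an empty netloc when no defaults are set is an assumption based on the previous local-filesystem behavior. Only the property names `DEFAULT_SCHEME` and `DEFAULT_NETLOC` come from this PR.

```python
from urllib.parse import urlparse

# Property names from this PR; the fallback values ("file", "") are assumptions.
DEFAULT_SCHEME = "DEFAULT_SCHEME"
DEFAULT_NETLOC = "DEFAULT_NETLOC"


def resolve_location(location: str, properties: dict[str, str]) -> tuple[str, str, str]:
    """Hypothetical helper: split a location into (scheme, netloc, path).

    Paths that already carry a scheme are returned as parsed; schemeless paths
    pick up the configured defaults, falling back to the local filesystem.
    """
    uri = urlparse(location)
    if uri.scheme:
        # An explicit scheme in the path always takes precedence over the defaults.
        return uri.scheme, uri.netloc, uri.path
    scheme = properties.get(DEFAULT_SCHEME, "file")
    netloc = properties.get(DEFAULT_NETLOC, "")
    return scheme, netloc, location


def to_uri(location: str, properties: dict[str, str]) -> str:
    """Hypothetical helper: rebuild a full URI from the resolved parts."""
    scheme, netloc, path = resolve_location(location, properties)
    return f"{scheme}://{netloc}{path}"


# Usage with a hypothetical namenode address
props = {DEFAULT_SCHEME: "hdfs", DEFAULT_NETLOC: "namenode.example.com:9000"}
print(to_uri("/warehouse/wh/nyc.db/taxis", props))  # hdfs://namenode.example.com:9000/warehouse/wh/nyc.db/taxis
print(to_uri("/warehouse/wh/nyc.db/taxis", {}))  # file:///warehouse/wh/nyc.db/taxis
```

Keeping an explicit scheme authoritative means that setting the defaults cannot silently redirect paths such as `s3://...` that already name their filesystem.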
# Are these changes tested?
Tested in a test environment at LinkedIn and with unit tests.
# Are there any user-facing changes?
No user-facing changes by default. If these properties are set, file paths that lack a scheme or netloc will use the specified defaults.
---------
Co-authored-by: Tom McCormick <[email protected]>