Description
Apache Iceberg version
0.10.0 (latest release)
Please describe the bug 🐞
Starting from version 20, PyArrow has support for Azure filesystems.
ABFS URIs have this format: abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<file_name>
But the PyArrow library expects the following path format for Azure: abfs[s]://<file_system>/<file_name>.
As you can see, the "@<account_name>.<dfs|blob>.core.windows.net" part prevents users from using the PyArrow file IO in an Azure environment. This issue can be fixed in PyIceberg by removing the account_name part.
The proposed fix is just meant to start a conversation around the issue; I am not 100% sure how and where this should be fixed.
Similar issues do not occur with the fsspec file IO.
Examples
We have a very basic setup with RestCatalog:
```python
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.io import ADLS_ACCOUNT_NAME, PY_IO_IMPL


def create_iceberg_catalog():
    CATALOG_URI = "https://lakehouse.../catalog"
    catalog_config = {
        "uri": CATALOG_URI,
        PY_IO_IMPL: "pyiceberg.io.pyarrow.PyArrowFileIO",
        ADLS_ACCOUNT_NAME: "lakehouseaccount",
    }
    return RestCatalog("lakehouse", **catalog_config)
```
When we create a table "testns.testtable", it is assigned the following location: abfss://[email protected]/testns/testtable
Then, when we try to append data to the table:
```python
import random

import pyarrow as pa

data = pa.table(
    {
        "id": pa.array(range(5), type=pa.int32()),  # 'id' is int32 to match the Iceberg schema
        "value": [random.choice(["Heads", "Tails"]) for _ in range(5)],
    }
)
table.append(data)
```
it throws the following exception:
OSError: ListBlobsByHierarchy failed for prefix='aip_test/test_table-xxx/metadata/snap-xxx.avro'. GetFileInfo is unable to determine whether the path exists. Azure Error: [InvalidResourceName] 400 The specified resource name contains invalid characters.
This is because exists() method is called:
```
File ~/.official-venvs/amd64.ipykernel-default.master/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py:368, in PyArrowFile.create(self, overwrite)
    366     if not overwrite and self.exists() is True:
```
And it expects the URI without "@lakehouseaccount.dfs.core.windows.net". When we monkey-patch PyArrowFile.__init__, everything works fine:
```python
from pyarrow.fs import FileSystem

from pyiceberg.io.pyarrow import ONE_MEGABYTE, PyArrowFile

PyArrowFile.old_init = PyArrowFile.__init__


def patched_init(self, location: str, path: str, fs: FileSystem, buffer_size: int = ONE_MEGABYTE):
    # Call the original __init__ method
    self.old_init(location, path, fs, buffer_size)
    # Strip the "@<account_name>.<suffix>" segment from the path
    self._path = remove_section_between_at_and_slash(path)
    print("Logging: PyArrowFile initialized")


PyArrowFile.__init__ = patched_init
```
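For reference, `remove_section_between_at_and_slash` is our own helper, not a PyIceberg or PyArrow function. A minimal sketch of it, assuming all it needs to do is drop the "@<account_name>.<dfs|blob>.core.windows.net" segment between the first "@" and the next "/", could look like this:

```python
import re


def remove_section_between_at_and_slash(path: str) -> str:
    """Drop the '@<account_name>.<dfs|blob>.core.windows.net' segment,
    e.g. 'container@account.dfs.core.windows.net/a/b' -> 'container/a/b'."""
    # Remove the first '@' and everything after it up to the next '/'
    return re.sub(r"@[^/]+", "", path, count=1)


print(remove_section_between_at_and_slash(
    "aip_test@lakehouseaccount.dfs.core.windows.net/testns/testtable"
))
# aip_test/testns/testtable
```

A path without an "@" passes through unchanged, so the patch is harmless for non-Azure filesystems.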
It does not matter how, or with which engine, the table was created and written before: none of the PyArrow file IO methods work, including those on the read path, so scanning a non-empty table is impossible as well. We tested this by creating a table with the fsspec file IO and reading it with the PyArrow file IO.
It is hard to test this behavior with Azurite, because Azurite URIs are different and do not contain the "@<account_name>" part.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time