Open
Description
Apache Iceberg version
0.10.0
Please describe the bug 🐞
Since 0.10.0, it is possible to pass a botocore session to a REST catalog, so:
import io
import os
import pandas as pd
import pyarrow as pa
from boto3 import Session
from pyiceberg.catalog import load_catalog
boto3_session = Session(profile_name='a_profile', region_name='us-east-1')
catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    },
)
table = catalog.load_table("namespace.a_table")
json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())
table.overwrite(arrow_table)
Everything works until we call table.overwrite():
OSError: When reading information for key 'metadata/snap-6778585584222594295-0-3ae9518f-fd1c-488f-b3d2-4ca1724317a1.avro' in bucket '2c8e7acb-67a1-4dc9-8ym9eg38966b8bazzfjn487w5o9wruse1b--table-s3': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
As a workaround, we can export the session's credentials as environment variables before the write:
boto3_session = Session(profile_name='a_profile', region_name='us-east-1')
catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    },
)
table = catalog.load_table("namespace.a_table")
json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())
credentials = boto3_session.get_credentials().get_frozen_credentials()
os.environ["AWS_ACCESS_KEY_ID"] = credentials.access_key
os.environ["AWS_SECRET_ACCESS_KEY"] = credentials.secret_key
if credentials.token:
    os.environ["AWS_SESSION_TOKEN"] = credentials.token
table.overwrite(arrow_table)
This works, but it defeats the purpose of passing a botocore session.
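If the environment-variable workaround is used anyway, it can at least be scoped so the credentials do not stay in the process environment. The following is a minimal sketch, not part of PyIceberg: temporary_aws_env is a hypothetical helper name, and whether the underlying S3 client keeps working after the variables are restored is not something this report establishes.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_aws_env(access_key, secret_key, token=None):
    """Hypothetical helper: set AWS credential variables, then restore the old values."""
    new_values = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_SESSION_TOKEN": token,
    }
    # Remember what was there before so it can be put back afterwards.
    saved = {name: os.environ.get(name) for name in new_values}
    try:
        for name, value in new_values.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        yield
    finally:
        # Restore the previous state, removing variables that did not exist.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


# Intended usage with the objects from the report (not executed here):
# with temporary_aws_env(credentials.access_key, credentials.secret_key, credentials.token):
#     table.overwrite(arrow_table)
```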
We can still call .schema() and similar metadata methods, so it seems the overwrite method is not using the proper SigV4Adapter (pyiceberg/catalog/rest/__init__.py).
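Another possible workaround, untested against S3 Tables, is to hand the same credentials to the file IO layer as catalog properties instead of environment variables. The sketch below assumes the documented PyIceberg S3 FileIO keys (s3.access-key-id, s3.secret-access-key, s3.session-token, s3.region); s3_fileio_properties and FakeCredentials are hypothetical names used only for illustration, and FakeCredentials stands in for the object returned by get_frozen_credentials().

```python
from collections import namedtuple


def s3_fileio_properties(frozen_credentials, region):
    """Hypothetical helper: map frozen credentials to S3 FileIO property keys."""
    props = {
        "s3.access-key-id": frozen_credentials.access_key,
        "s3.secret-access-key": frozen_credentials.secret_key,
        "s3.region": region,
    }
    # Only temporary credentials carry a session token.
    if frozen_credentials.token:
        props["s3.session-token"] = frozen_credentials.token
    return props


# Stand-in with the same attribute names as botocore's frozen credentials.
FakeCredentials = namedtuple("FakeCredentials", ["access_key", "secret_key", "token"])
example = s3_fileio_properties(FakeCredentials("AKIA_EXAMPLE", "secret", None), "us-east-1")

# Intended usage with the objects from the report (not executed here):
# catalog = load_catalog("catalog", type="rest", ..., **s3_fileio_properties(credentials, "us-east-1"))
```

This still copies credentials out of the session, so it shares the main drawback of the environment-variable approach: they are not refreshed when the session rotates them.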
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time