@talatuyarer

Adds support in DuckDB’s Iceberg extension to attach user-provided “extra headers” to all HTTP requests made to the REST catalog (including the initial /v1/config call and subsequent table/namespace operations).

This aligns DuckDB with how other Iceberg clients treat REST catalogs. Example usage:

CREATE SECRET iceberg_secret_single (
	TYPE ICEBERG,
	CLIENT_ID 'admin',
	CLIENT_SECRET 'password',
	OAUTH2_SERVER_URI 'http://127.0.0.1:8181/v1/oauth/tokens',
	EXTRA_HTTP_HEADERS MAP {'X-Custom-Header': 'custom-value'}
);

I also fixed a bug that occurs when the catalog returns a prefix containing slashes: DuckDB URL-encoded those slashes, which it should not do.

@talatuyarer
Author

Hi @Tmonster, could you review my PR?

@Tmonster
Collaborator

Hi @talatuyarer,

Yes, I will take a look. I've just been super busy with some things at the moment.

Collaborator

@Tmonster Tmonster left a comment

Looks good to me.
Can you add some tests to make sure they are sent in the request as well?

In a test like test/sql/local/irc/test_table_information_requests.test you can see how we check the request. To check the headers, I think something like

select request.headers from duckdb_logs_parsed('http');

should work

path_components.push_back(component);

// If the component contains slashes, split it into multiple segments
if (component.find('/') != string::npos) {
Collaborator

Any specific reason why you split into multiple segments here?
I have had issues in the past where warehouse names contain slashes and if they are split up into separate components then the attach doesn't work.

Author

IRCEndpointBuilder::AddPathComponent() is treating the entire prefix as a single component and URL-encoding it. GetURLEncoded() calls StringUtil::URLEncode(component) for each path component. This is correct for individual components, but when BigLake returns a prefix like projects/1057666841514/catalogs/biglake-public-nyc-taxi-iceberg, it contains multiple path segments separated by /.

When we call AddPathComponent(catalog.prefix) with this multi-segment prefix, it treats the entire string (including the / characters) as a single component and URL-encodes it, turning / into %2F.
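To make the difference concrete, here is a minimal standalone sketch, not the extension's actual code. `UrlEncode` is a stand-in written for this example (the real `StringUtil::URLEncode` may treat some characters differently), and `EncodeAsSingleComponent` / `EncodePerSegment` are hypothetical helper names used only for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Stand-in for StringUtil::URLEncode (assumption): percent-encode everything
// except the RFC 3986 unreserved characters.
std::string UrlEncode(const std::string &input) {
    static const char *hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : input) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Encoding the whole prefix as one component also escapes the separators.
std::string EncodeAsSingleComponent(const std::string &prefix) {
    return UrlEncode(prefix);
}

// Encoding each '/'-separated segment on its own keeps the separators intact.
std::string EncodePerSegment(const std::string &prefix) {
    std::string out;
    std::istringstream stream(prefix);
    std::string segment;
    bool first = true;
    while (std::getline(stream, segment, '/')) {
        if (!first) {
            out += '/';
        }
        out += UrlEncode(segment);
        first = false;
    }
    return out;
}
```

With the BigLake prefix above, the first helper yields `projects%2F1057666841514%2Fcatalogs%2F...`, while the second leaves the slashes in place.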

If you want, I can create a prefix-specific function. @Tmonster, what do you think?

Author

FYI, I added an AddPrefixComponent method.

@talatuyarer talatuyarer requested a review from Tmonster January 30, 2026 04:31
@talatuyarer
Author

talatuyarer commented Jan 30, 2026

@Tmonster Thank you for the pointer. I also added tests and ran make format to fix the formatting issue, FYI.

@talatuyarer
Author

@Tmonster @Tishj This is ready to merge :)

@Tmonster
Collaborator

Tmonster commented Feb 4, 2026

Thanks! Just set up a test against some other IRC catalogs. Will take another look when that finishes

@Tmonster
Collaborator

Tmonster commented Feb 4, 2026

Looks like some of the cloud tests are failing. You can see the run here. Seems like the s3tables attach is failing. They return an already encoded s3 prefix. The response to the /config endpoint looks something like this

{
  "defaults": {
    "write.object-storage.partitioned-paths": "false",
    "s3.delete-enabled": "false",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "write.object-storage.enabled": "true",
    "prefix": "arn%3Aaws%3As3tables%3Aus-east-2%3A<account_id>%3Abucket%2Ficeberg-testing",
    "rest-metrics-reporting-enabled": "false"
  },
  "overrides": {}
}

And the url we hit now for namespaces is

/iceberg/v1/arn%3Aaws%3As3tables%3Aus-east-2%3A<account_id>%3Abucket%2Ficeberg-testing/namespaces

The url we hit for table listing is

/iceberg/v1/arn%3Aaws%3As3tables%3Aus-east-2%3A<account_id>%3Abucket/iceberg-testing/namespaces/default/tables

Notice the difference in bucket%2Ficeberg-testing and bucket/iceberg-testing

It seems the URL builder in GetTables is building the following components in the GetTable request

  path_components = {
    std::__1::vector<std::__1::string, std::__1::allocator<std::__1::string> > = size=6 {
      [0] = "v1"
      [1] = "arn:aws:s3tables:us-east-2:<account_id>:bucket"
      [2] = "iceberg-testing"
      [3] = "namespaces"
      [4] = "default"
      [5] = "tables"
    }
  }

But in the GetSchemas the URL builder has the following components.

  path_components = {
  std::__1::vector<std::__1::string, std::__1::allocator<std::__1::string> > = size=3 {
    [0] = "v1"
    [1] = "arn:aws:s3tables:us-east-2:<account_id>:bucket/iceberg-testing"
    [2] = "namespaces"
  }
}
host = "s3tables.us-east-2.amazonaws.com/iceberg"
params = size=0 {}
}

It seems like here you missed AddPrefixComponent.

I think the fix here is to write some functionality to detect if the prefix is already encoded or not? If it is encoded, decode it and add it as the prefix. Also, S3Tables is super easy to set up here for debugging. It really is just a matter of creating a bucket in the S3Tables console.
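A rough sketch of what that detection could look like, as a standalone example rather than a patch. All names here (`LooksPercentEncoded`, `PercentDecode`, `PrefixToComponents`) are hypothetical, and the "contains a well-formed %XX escape" check is only a heuristic. It assumes an already encoded prefix (S3 Tables style) should be decoded and kept as one component so that re-encoding restores `%2F`, while an unencoded prefix with slashes (which is what the BigLake example earlier in this thread suggests) is split into segments.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Convert one hex digit to its numeric value; the caller guarantees isxdigit(c).
static int HexValue(unsigned char c) {
    if (std::isdigit(c)) {
        return c - '0';
    }
    return std::tolower(c) - 'a' + 10;
}

// True if position i starts a well-formed "%XX" escape.
static bool IsEscapeAt(const std::string &s, size_t i) {
    return s[i] == '%' && i + 2 < s.size() &&
           std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
           std::isxdigit(static_cast<unsigned char>(s[i + 2]));
}

// Heuristic (assumption): the prefix is "already encoded" if it contains
// at least one well-formed "%XX" escape.
bool LooksPercentEncoded(const std::string &s) {
    for (size_t i = 0; i < s.size(); i++) {
        if (IsEscapeAt(s, i)) {
            return true;
        }
    }
    return false;
}

// Decode "%XX" escapes; anything else is copied through unchanged.
std::string PercentDecode(const std::string &s) {
    std::string out;
    for (size_t i = 0; i < s.size(); i++) {
        if (IsEscapeAt(s, i)) {
            out += static_cast<char>(HexValue(s[i + 1]) * 16 + HexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Return the raw (decoded) path components to hand to the URL builder.
std::vector<std::string> PrefixToComponents(const std::string &prefix) {
    if (LooksPercentEncoded(prefix)) {
        // Already encoded: decode and keep as ONE component, so encoding it
        // again later reproduces the %2F the catalog sent.
        return {PercentDecode(prefix)};
    }
    // Not encoded: treat each '/' as a real path separator.
    std::vector<std::string> parts;
    std::istringstream stream(prefix);
    std::string segment;
    while (std::getline(stream, segment, '/')) {
        if (!segment.empty()) {
            parts.push_back(segment);
        }
    }
    return parts;
}
```

The trade-off is that an unencoded prefix that happens to contain a literal `%41`-style sequence would be misclassified, so this would need checking against what each catalog actually returns.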

I don't know what BigLake returns (encoded or decoded); it would be nice if you could share that here.

Also, can you add a test where you explicitly add an Authorization header? And then make sure it gets overridden?
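For reference, the precedence such a test would check could be sketched like this. It is a standalone illustration, not the extension's code: `HeaderMap` and `MergeHeaders` are hypothetical names, and it assumes the intended behavior is that the Authorization header produced by the catalog's own auth flow wins over one supplied through EXTRA_HTTP_HEADERS.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// HTTP field names are case-insensitive, so compare keys without regard to case.
struct CaseInsensitiveLess {
    bool operator()(const std::string &a, const std::string &b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
    }
};

using HeaderMap = std::map<std::string, std::string, CaseInsensitiveLess>;

// Start from the user's extra headers, then let the auth headers replace
// any entry with the same (case-insensitive) name.
HeaderMap MergeHeaders(const HeaderMap &extra_headers, const HeaderMap &auth_headers) {
    HeaderMap merged = extra_headers;
    for (const auto &entry : auth_headers) {
        merged.erase(entry.first); // drop a user-supplied variant such as "authorization"
        merged.insert(entry);
    }
    return merged;
}
```

A test along these lines would pass a user-supplied `authorization` value together with a custom header and expect only the catalog's token to survive, with the custom header left untouched.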
