feat(catalog): Implement register_table for glue catalog #1568

Open, wants to merge 1 commit into base: main
72 changes: 66 additions & 6 deletions crates/catalog/glue/src/catalog.rs
@@ -624,15 +624,75 @@ impl Catalog for GlueCatalog {
}
}

/// Asynchronously registers an existing table into the Glue Catalog.
Contributor

Why add "Asynchronously"? Do you mean that the function is async? If so, we typically don't include this word.

///
/// Converts the provided table identifier and metadata location into a
/// Glue-compatible table representation, and attempts to create the
/// corresponding table in the Glue Catalog.
///
/// # Returns
/// Returns `Ok(Table)` if the table is successfully registered and loaded.
/// If the registration fails due to validation issues, existing table conflicts,
/// metadata problems, or errors during the registration or loading process,
/// an `Err(...)` is returned.
async fn register_table(
&self,
- _table_ident: &TableIdent,
- _metadata_location: String,
table: &TableIdent,
Contributor

Suggested change
- table: &TableIdent,
+ table_ident: &TableIdent,

metadata_location: String,
) -> Result<Table> {
- Err(Error::new(
- ErrorKind::FeatureUnsupported,
- "Registering a table is not supported yet",
- ))
let db_name = validate_namespace(table.namespace())?;
let table_name = table.name();

let metadata = TableMetadata::read_from(&self.file_io, &metadata_location).await?;

let table_input = convert_to_glue_table(
table_name,
metadata_location.clone(),
&metadata,
metadata.properties(),
None,
)?;

let builder = self
.client
.0
.create_table()
.database_name(&db_name)
.table_input(table_input);
let builder = with_catalog_id!(builder, self.config);

let result = builder.send().await;

match result {
Ok(_) => {
self.load_table(table).await.map_err(|e| {
Contributor

Why not just use Table::builder()....build() so we don't need to load the table here?

ref:

Error::new(
ErrorKind::Unexpected,
format!(
"Table {}.{} created but failed to load: {e}",
db_name, table_name
),
)
Contributor

We should use .with_source(e) to expose the cause
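
A minimal sketch of the idea behind this suggestion, using only the standard library: the cause is attached as the error's source rather than formatted into the message, so callers can still inspect it. `CatalogError`, its `with_source` method, and `load_failed` are hypothetical stand-ins for illustration, not the iceberg `Error` type.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical stand-in for a catalog error type (an assumption for this
/// sketch, not the iceberg `Error`).
#[derive(Debug)]
pub struct CatalogError {
    message: String,
    source: Option<Box<dyn StdError + Send + Sync + 'static>>,
}

impl CatalogError {
    pub fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
            source: None,
        }
    }

    /// Attach the underlying cause instead of interpolating it into the message.
    pub fn with_source(
        mut self,
        source: impl Into<Box<dyn StdError + Send + Sync + 'static>>,
    ) -> Self {
        self.source = Some(source.into());
        self
    }
}

impl fmt::Display for CatalogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl StdError for CatalogError {
    /// Expose the attached cause through the standard error chain.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        self.source
            .as_deref()
            .map(|e| e as &(dyn StdError + 'static))
    }
}

/// Hypothetical helper mirroring the "created but failed to load" case.
pub fn load_failed(table: &str, cause: std::io::Error) -> CatalogError {
    CatalogError::new(format!("Table {table} created but failed to load")).with_source(cause)
}
```

With this shape the message stays short, and the original error remains reachable through `source()` for logging or matching.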

})
}
Err(err) => {
let service_err = err.as_service_error();

if service_err.map(|e| e.is_entity_not_found_exception()) == Some(true) {
Err(Error::new(
ErrorKind::NamespaceNotFound,
format!("Database {} does not exist", db_name),
))
} else if service_err.map(|e| e.is_already_exists_exception()) == Some(true) {
Err(Error::new(
ErrorKind::TableAlreadyExists,
format!("Table {}.{} already exists", db_name, table_name),
))
} else {
Err(from_aws_sdk_error(err))
}
}
Contributor

@CTTY CTTY Jul 31, 2025

Hmm, it seems that iceberg-rust doesn't handle AWS errors very explicitly. The existing approach is just to attach the AWS SDK error as a source and return ErrorKind::Unexpected: https://github.com/apache/iceberg-rust/blob/2bc03c28268f15472353b828602bb5efd3bf9513/crates/catalog/glue/src/error.rs#L24-L23

Ideally, we want to handle every error listed for CreateTable: https://docs.rs/aws-sdk-glue/latest/aws_sdk_glue/operation/create_table/enum.CreateTableError.html

The error handling in this PR looks good to me, and we should update the existing error handling in create_table to match. That should be done in a follow-up.
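
The explicit handling discussed here could be factored into a single mapping function that both create_table and register_table share. The sketch below is a simplified illustration under stated assumptions: `ServiceError`, `Kind`, and `classify` are hypothetical stand-ins, not the aws-sdk-glue or iceberg types, and only the two cases this PR handles are modelled explicitly.

```rust
/// Hypothetical stand-in for the service errors returned by CreateTable.
/// Only the two cases handled in this PR are modelled; the rest fall
/// under `Other`.
#[derive(Debug)]
pub enum ServiceError {
    EntityNotFound,
    AlreadyExists,
    Other(String),
}

/// Hypothetical stand-in for the catalog-level error kinds used in this PR.
#[derive(Debug, PartialEq, Eq)]
pub enum Kind {
    NamespaceNotFound,
    TableAlreadyExists,
    Unexpected,
}

/// Map a service error to a catalog error kind and a readable message.
pub fn classify(err: &ServiceError, db: &str, table: &str) -> (Kind, String) {
    match err {
        ServiceError::EntityNotFound => (
            Kind::NamespaceNotFound,
            format!("Database {db} does not exist"),
        ),
        ServiceError::AlreadyExists => (
            Kind::TableAlreadyExists,
            format!("Table {db}.{table} already exists"),
        ),
        // Anything not handled explicitly stays an unexpected error.
        ServiceError::Other(msg) => (Kind::Unexpected, format!("CreateTable failed: {msg}")),
    }
}
```

Using an exhaustive `match` means that adding another variant to the stand-in enum forces the mapping to be revisited at compile time.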

}
}

async fn update_table(&self, _commit: TableCommit) -> Result<Table> {
29 changes: 29 additions & 0 deletions crates/catalog/glue/tests/glue_catalog_test.rs
@@ -367,3 +367,32 @@ async fn test_list_namespace() -> Result<()> {

Ok(())
}

#[tokio::test]
async fn test_register_table() -> Result<()> {
let catalog = get_catalog().await;
let namespace = NamespaceIdent::new("test_register_table".into());
set_test_namespace(&catalog, &namespace).await?;

let creation = set_table_creation(Some("s3a://warehouse/hive/test_register_table".into()), "my_table")?;
let table = catalog.create_table(&namespace, creation).await?;
let metadata_location = table
.metadata_location()
.expect("Expected metadata location to be set")
.to_string();

catalog.drop_table(table.identifier()).await?;
let ident = TableIdent::new(namespace.clone(), "my_table".to_string());

let registered = catalog
.register_table(&ident, metadata_location.clone())
.await?;

assert_eq!(registered.identifier(), &ident);
assert_eq!(
registered.metadata_location().as_deref(),
Some(metadata_location.as_str())
);

Ok(())
}