Skip to content

[#9647]feat(catalog-lakehouse-generic): Add external Delta table support for generic lakehouse catalog#9678

Open
jerryshao wants to merge 3 commits intoapache:mainfrom
jerryshao:issue-9647
Open

[#9647]feat(catalog-lakehouse-generic): Add external Delta table support for generic lakehouse catalog#9678
jerryshao wants to merge 3 commits intoapache:mainfrom
jerryshao:issue-9647

Conversation

@jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

This commit implements support for external Delta Lake tables in the generic lakehouse catalog, allowing users to register and manage metadata for existing Delta tables.

Features:

  • External Delta table registration and metadata management
  • Schema stored from user CREATE TABLE request
  • Metadata-only DROP operation (preserves physical data)
  • Comprehensive validation and error messages
  • Integration with Delta Kernel 3.3.0 for table creation

Implementation:

  • DeltaConstants: Delta table format constant
  • DeltaTableDelegator: ServiceLoader integration for Delta format
  • DeltaTableOperations: Table lifecycle operations (242 lines)
    • CREATE: Requires external=true and location properties
    • LOAD: Retrieves metadata from entity store
    • DROP: Removes metadata only, preserves data
    • ALTER: Not supported (throws UnsupportedOperationException)
    • PURGE: Not supported for external tables

Testing:

  • 4 unit tests in TestDeltaTableOperations
  • 7 integration tests in CatalogGenericCatalogDeltaIT
    • Physical Delta table creation with Delta Kernel
    • Registration and metadata operations
    • Validation of external-only constraint
    • Verification of metadata-only drop behavior

Documentation:

  • lakehouse-generic-delta-table.md: Complete user guide
    • Table operations and examples
    • Data type mappings
    • Troubleshooting and best practices
    • Integration with Spark and Delta Lake APIs

Dependencies:

  • Delta Kernel API 3.3.0 and defaults
  • Hadoop 3.x for Configuration

Limitations:

  • External tables only (managed tables require Delta 4.0 CommitCoordinator)
  • ALTER not supported (use Delta Lake APIs directly)
  • Schema validation not enforced (user responsibility)
  • Partitioning informational only (managed by Delta log)

Why are the changes needed?

Fix: #9647

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT and IT added.

…catalog

This commit implements support for external Delta Lake tables in the
generic lakehouse catalog, allowing users to register and manage
metadata for existing Delta tables.

Features:
- External Delta table registration and metadata management
- Schema stored from user CREATE TABLE request
- Metadata-only DROP operation (preserves physical data)
- Comprehensive validation and error messages
- Integration with Delta Kernel 3.3.0 for table creation

Implementation:
- DeltaConstants: Delta table format constant
- DeltaTableDelegator: ServiceLoader integration for Delta format
- DeltaTableOperations: Table lifecycle operations (242 lines)
  * CREATE: Requires external=true and location properties
  * LOAD: Retrieves metadata from entity store
  * DROP: Removes metadata only, preserves data
  * ALTER: Not supported (throws UnsupportedOperationException)
  * PURGE: Not supported for external tables

Testing:
- 4 unit tests in TestDeltaTableOperations
- 7 integration tests in CatalogGenericCatalogDeltaIT
  * Physical Delta table creation with Delta Kernel
  * Registration and metadata operations
  * Validation of external-only constraint
  * Verification of metadata-only drop behavior

Documentation:
- lakehouse-generic-delta-table.md: Complete user guide
  * Table operations and examples
  * Data type mappings
  * Troubleshooting and best practices
  * Integration with Spark and Delta Lake APIs

Dependencies:
- Delta Kernel API 3.3.0 and defaults
- Hadoop 3.x for Configuration

Limitations:
- External tables only (managed tables require Delta 4.0 CommitCoordinator)
- ALTER not supported (use Delta Lake APIs directly)
- Schema validation not enforced (user responsibility)
- Partitioning informational only (managed by Delta log)

All tests passing (11/11):
- Unit tests: 4/4 ✅
- Integration tests: 7/7 ✅
@roryqi
Copy link
Contributor

roryqi commented Jan 12, 2026

Do our target users use DeltaLake 3.3?

@jerryshao
Copy link
Contributor Author

Do our target users use DeltaLake 3.3?

We don't bind to a specific version of Delta, choosing 3.3 is only for testing purposes.

+ " this field.");
}

if (distribution != null && !distribution.equals(Distributions.NONE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concern about !distribution.equals(Distributions.NONE). Maybe table has the distribution, it will be confused if we specify the distribution is none.

@Override
public Table createTable(
NameIdentifier ident,
Column[] columns,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the columns be different from the underlying location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible. But since the query engine only honors the location to read the table, so this inconsistency only affects the display.

@jerryshao jerryshao self-assigned this Feb 2, 2026
…ble creation

Change Delta table operations to throw IllegalArgumentException instead
of logging warnings when unsupported parameters (partitions, distribution,
sort orders, indexes) are specified during CREATE TABLE. This provides
clearer, more immediate feedback to users.

Changes:
- DeltaTableOperations: Replace LOG.warn() with Preconditions.checkArgument()
  for partitions, distribution, sort orders, and indexes validation
- Remove unused Logger field and imports
- Update JavaDoc from 'Ignored Parameters' to 'Disallowed Parameters'
- Add 4 new unit tests in TestDeltaTableOperations to verify exceptions
- Replace integration test testCreateDeltaTableWithIgnoredParameters with
  4 specific exception tests in CatalogGenericCatalogDeltaIT

All tests passing:
- Unit tests: 8/8 ✅
- Integration tests: 11/11 ✅

This change improves user experience by failing fast with clear error
messages instead of silently ignoring parameters that won't take effect.
"The root directory of the generic table.",
"The directory of the table. For managed table, if this is not specified"
+ " in the table property, it will use the one in catalog / schema level and "
+ "concatenate with the table name. For external table, this property is"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to contradict the configuration for the Lance table. An external Lance table can still use the concat policy and does not have such a limitation. Do you need to change Lance along the way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Lance REST support create/register table without location provided?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we can set the properties location in catalogs or schemas, then if location is not set in table properties, we can still refer to the final location for the table.

Currently, all table register/created by Lance REST are external tables, and the table property location is optional as long as we set the property on its parents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] Support managing external table in generic catalog

3 participants