[#9647]feat(catalog-lakehouse-generic): Add external Delta table support for generic lakehouse catalog#9678
[#9647]feat(catalog-lakehouse-generic): Add external Delta table support for generic lakehouse catalog#9678jerryshao wants to merge 3 commits intoapache:mainfrom
Conversation
…catalog This commit implements support for external Delta Lake tables in the generic lakehouse catalog, allowing users to register and manage metadata for existing Delta tables. Features: - External Delta table registration and metadata management - Schema stored from user CREATE TABLE request - Metadata-only DROP operation (preserves physical data) - Comprehensive validation and error messages - Integration with Delta Kernel 3.3.0 for table creation Implementation: - DeltaConstants: Delta table format constant - DeltaTableDelegator: ServiceLoader integration for Delta format - DeltaTableOperations: Table lifecycle operations (242 lines) * CREATE: Requires external=true and location properties * LOAD: Retrieves metadata from entity store * DROP: Removes metadata only, preserves data * ALTER: Not supported (throws UnsupportedOperationException) * PURGE: Not supported for external tables Testing: - 4 unit tests in TestDeltaTableOperations - 7 integration tests in CatalogGenericCatalogDeltaIT * Physical Delta table creation with Delta Kernel * Registration and metadata operations * Validation of external-only constraint * Verification of metadata-only drop behavior Documentation: - lakehouse-generic-delta-table.md: Complete user guide * Table operations and examples * Data type mappings * Troubleshooting and best practices * Integration with Spark and Delta Lake APIs Dependencies: - Delta Kernel API 3.3.0 and defaults - Hadoop 3.x for Configuration Limitations: - External tables only (managed tables require Delta 4.0 CommitCoordinator) - ALTER not supported (use Delta Lake APIs directly) - Schema validation not enforced (user responsibility) - Partitioning informational only (managed by Delta log) All tests passing (11/11): - Unit tests: 4/4 ✅ - Integration tests: 7/7 ✅
|
Do our target users use DeltaLake 3.3? |
We don't bind to a specific version of Delta, choosing 3.3 is only for testing purposes. |
| + " this field."); | ||
| } | ||
|
|
||
| if (distribution != null && !distribution.equals(Distributions.NONE)) { |
There was a problem hiding this comment.
I have concern about !distribution.equals(Distributions.NONE). Maybe table has the distribution, it will be confused if we specify the distribution is none.
| @Override | ||
| public Table createTable( | ||
| NameIdentifier ident, | ||
| Column[] columns, |
There was a problem hiding this comment.
Will the columns be different from the underlying location?
There was a problem hiding this comment.
Yes, it is possible. But since the query engine only honors the location to read the table, so this inconsistency only affects the display.
…ble creation Change Delta table operations to throw IllegalArgumentException instead of logging warnings when unsupported parameters (partitions, distribution, sort orders, indexes) are specified during CREATE TABLE. This provides clearer, more immediate feedback to users. Changes: - DeltaTableOperations: Replace LOG.warn() with Preconditions.checkArgument() for partitions, distribution, sort orders, and indexes validation - Remove unused Logger field and imports - Update JavaDoc from 'Ignored Parameters' to 'Disallowed Parameters' - Add 4 new unit tests in TestDeltaTableOperations to verify exceptions - Replace integration test testCreateDeltaTableWithIgnoredParameters with 4 specific exception tests in CatalogGenericCatalogDeltaIT All tests passing: - Unit tests: 8/8 ✅ - Integration tests: 11/11 ✅ This change improves user experience by failing fast with clear error messages instead of silently ignoring parameters that won't take effect.
...generic/src/main/java/org/apache/gravitino/catalog/lakehouse/delta/DeltaTableOperations.java
Outdated
Show resolved
Hide resolved
| "The root directory of the generic table.", | ||
| "The directory of the table. For managed table, if this is not specified" | ||
| + " in the table property, it will use the one in catalog / schema level and " | ||
| + "concatenate with the table name. For external table, this property is" |
There was a problem hiding this comment.
This seems to contradict the configuration for the Lance table. An external Lance table can still use the concat policy and does not have such a limitation. Do you need to change Lance along the way?
There was a problem hiding this comment.
Does Lance REST support create/register table without location provided?
There was a problem hiding this comment.
I mean, we can set the properties location in catalogs or schemas, then if location is not set in table properties, we can still refer to the final location for the table.
Currently, all table register/created by Lance REST are external tables, and the table property location is optional as long as we set the property on its parents.
What changes were proposed in this pull request?
This commit implements support for external Delta Lake tables in the generic lakehouse catalog, allowing users to register and manage metadata for existing Delta tables.
Features:
Implementation:
Testing:
Documentation:
Dependencies:
Limitations:
Why are the changes needed?
Fix: #9647
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UT and IT added.