-
Notifications
You must be signed in to change notification settings - Fork 123
feat: Add CreateTable API with typestate pattern #1504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a5081f9 to
a6f5715
Compare
This commit introduces a complete CreateTable API for Delta Kernel Rust,
enabling programmatic creation of Delta tables with proper metadata tracking
and validation.
The API uses a typestate pattern to enforce correct usage at compile time:
1. **CreateTableBuilder** (configuration stage)
- Configure table properties, schema, partition columns
- Builder methods: with_table_properties(), with_partition_columns()
- Validates configuration before transitioning
2. **CreateTableTransaction** (commit stage)
- Commits Protocol, Metadata, and CommitInfo actions
- Returns Transaction for future data operations
- Enforces that metadata is committed before data operations
This pattern is extensible for future table features:
- Add builder methods for clustering, constraints, invariants
- Add table feature flags to Protocol action
- Extend validation in build() method
- Maintain type safety throughout
Example extensibility:
```rust
CreateTableBuilder::new(path, schema, engine_info)
.with_table_properties(props)
.with_partition_columns(vec!["date", "region"]) // ✓ Supported
.with_clustering(vec!["user_id"]) // Future
.with_check_constraints(constraints) // Future
.build(engine)?
```
- test_create_simple_table: Basic table creation with events schema
- test_create_table_with_properties: Financial table with Delta properties
- test_create_table_already_exists: Duplicate creation prevention
- test_create_table_multiple_properties: Builder pattern chaining
**Coverage**: Verifies tables load correctly and schemas are preserved
- test_commit_info_is_written_to_log: Validates all 3 actions present
* Parses actual JSON log file
* Verifies Protocol (minReaderVersion=3, minWriterVersion=7)
* Verifies Metadata (id, schemaString, createdTime)
* Verifies CommitInfo (timestamp, operation, engineInfo, txnId, kernelVersion)
- test_log_action_order: Ensures correct action ordering
**Coverage**: Guarantees Delta protocol compliance and complete audit trail
- test_create_table_empty_schema: Schema validation
- Realistic schemas: DECIMAL types, TIMESTAMP, DATE, BOOLEAN
**Coverage**: Prevents invalid table configurations
```rust
use delta_kernel::table_manager::TableManager;
use delta_kernel::schema::{StructType, StructField, DataType};
use delta_kernel::committer::FileSystemCommitter;
use std::sync::Arc;
use std::collections::HashMap;
// Define schema
let schema = Arc::new(StructType::try_new(vec![
StructField::new("user_id", DataType::LONG, false),
StructField::new("event_type", DataType::STRING, false),
StructField::new("timestamp", DataType::TIMESTAMP, false),
])?);
// Configure table properties
let mut properties = HashMap::new();
properties.insert("delta.appendOnly".to_string(), "true".to_string());
properties.insert("delta.checkpointInterval".to_string(), "10".to_string());
// Create table with typestate pattern
let create_txn = TableManager::create_table(
"/path/to/table",
schema,
"MyApplication/1.0"
)
.with_table_properties(properties)
.build(&engine)?;
// Commit metadata (creates _delta_log/00000000000000000000.json)
let txn = create_txn.commit_metadata(&engine, Box::new(FileSystemCommitter::new()))?;
// Future: Use returned transaction for data operations
// txn.add_files(data);
// txn.commit(&engine)?;
```
- kernel/src/actions/mod.rs: Protocol version constants
- kernel/src/transaction/create_table.rs: CommitInfo generation
- kernel/tests/create_table.rs: Comprehensive test suite (7 tests)
None - this is a new API addition.
- Add support for partition columns in builder
- Add clustering configuration
- Add check constraints and invariants
- Support column mapping modes
- Enable schema evolution from CreateTableTransaction
a6f5715 to
7b507b8
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1504 +/- ##
==========================================
- Coverage 84.67% 84.66% -0.01%
==========================================
Files 122 124 +2
Lines 32741 32873 +132
Branches 32741 32873 +132
==========================================
+ Hits 27722 27832 +110
- Misses 3674 3683 +9
- Partials 1345 1358 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
| pub fn create_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fly by comment:
- Kernel Java: We have TableManager as our one entry point into Kernel. That is: all Snapshot loading (which is used for reads and writes), all table creation, and all streaming goes through this API (it exposes APIs to other classes, like SnapshotBuiler, CreateTableTransactionBuilder, etc.)
- Kernel Rust: No TableManager, instead has APIs on underlying classes/traits/impls e.g. SnapshotBuilder
So I encourage: either fully adopting TableManager, or not adopting TableManager, and deleting this TableManager.create_table API
| path: impl AsRef<str>, | ||
| schema: SchemaRef, | ||
| engine_info: impl Into<String>, | ||
| ) -> CreateTableBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Kernel Rust have TransctionBuilders yet?
Though verbose, Kernel Java's CreateTableTransactionBuilder naming convention makes it clear that: this is a builder for a transaction (yields a Transaction) that should be used to create a table.
The callout being: my_create_table_instance_builder.build() might make a caller think that that has "built" the table -- wheras instead all it has done is "built" the Transaction instance
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
| pub fn commit_metadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the API flow we want.
For example: In general, I believe the two kernels have aligned on: I have a Transaction, I commit, and then I can get a PostCommitSnapshot (just a Snapshot). From there I can do whatever want! Perhaps use that postCommitSnapshot to do some maintence operations (publish, write CRC, write checkpoint). Perhaps use it to start a new transaction.
Yet this API hardcodes: We write 0000.json, and then load a Snapshot, and then return a Transaction.
This means I can't write 000.crc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in the PR description:
Returns Transaction for future data operations
Enforces that metadata is committed before data operations
I don't think we want this?
CREATE TABLE AS SELECT -- we want to either compelte or fail -- if we have created the table ... but didn't insert the data actions ... and then the data actions txn fails ... then that SQL statement was not atomic? it is half-complete?
| let protocol = Protocol::try_new( | ||
| TABLE_FEATURES_MIN_READER_VERSION, | ||
| TABLE_FEATURES_MIN_WRITER_VERSION, | ||
| Some(Vec::<String>::new()), // readerFeatures (empty for now) | ||
| Some(Vec::<String>::new()), // writerFeatures (empty for now) | ||
| )?; | ||
|
|
||
| // Get current timestamp | ||
| let created_time = current_time_ms()?; | ||
|
|
||
| // Create Metadata action | ||
| let metadata = Metadata::try_new( | ||
| None, // name | ||
| None, // description | ||
| (*self.schema).clone(), | ||
| self.partition_columns, | ||
| created_time, | ||
| self.table_properties, | ||
| )?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing the tableProperties -> protocol parsing.
See, for example, Kernel Java :: https://github.com/delta-io/delta/blob/765a91688893b07c1e38835ee427b1eb9aa6357b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionMetadataFactory.java#L93
For example, what if my table properties has:
delta.feature.XXX = supported--> this should turn on support for table featureXXX-- right? // @zachschuermann and @OussamaSaoudi -- have you all aligned on this?delta.appendOnly = true--> this should addappendOnlyto the protocoldelta.enableInCommitTimestamps = true--> this should add ICT to the protocol, and also upgrade the writer version to 7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in the PR description:
Add table feature flags to Protocol action
Can we add checks to at least throw? So we explicitly only support a very limited initial use case, and we can safely add more support later?
| ) -> DeltaResult<Transaction> { | ||
| // Create Protocol action with table features support | ||
| let protocol = Protocol::try_new( | ||
| TABLE_FEATURES_MIN_READER_VERSION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this to be the default. Typically, if I just do a simpel CREATE, it creates the table with delta version (1, 2). This gives it the maximum possible allowed reader and writer set (of connectors).
Now, if some catalog has an opinion, perhaps it can inject some defaults. Kernel Java's Committer supports that. We can chat offline.
| partition_columns: Vec<String>, | ||
| } | ||
|
|
||
| impl CreateTableTransaction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also want to think about: How can CREATE and DDL (ALTER TABLE SET TBLPROPERTIES) share, or even DML (a MERGE that writes a new wider type and so we auto-enable type widening) --- how do all of these code paths share common schema, protocol, metadata, table property validation?
| /// // Future: txn.add_files(...); txn.commit(engine)?; | ||
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also curious (as we chatted offline): How will this eventually extend to support CTAS?
- Kernel Rust has its own transaction logic/impl/class today for data writes
- This is now doing create here (separately from data write logic)
Where will the logic go that does both? (I ask with a worry about duplication / consolidation of logic) 👍
feat: Add CreateTable API with typestate pattern
What changes are proposed in this pull request?
This PR introduces a complete
CreateTableAPI for Delta Kernel Rust, enabling programmatic creation of Delta tables with proper metadata tracking, validation, and compile-time safety guarantees.Motivation
Currently, Delta Kernel Rust lacks a public API for creating new Delta tables. Users need to:
This PR provides a high-level, type-safe API similar to the Java Kernel's
CreateTableTransactionBuilder, making it easy to create Delta tables programmatically while ensuring protocol compliance.Design: Typestate Pattern
The API uses a typestate pattern to enforce correct usage at compile time:
1.
CreateTableBuilder(Configuration Stage)pub struct CreateTableBuilder { /* ... */ }- Configure table properties, schema, partition columns
with_table_properties(),with_partition_columns()DeltaResult<CreateTableTransaction>2.
CreateTableTransaction(Commit Stage)pub struct CreateTableTransaction { /* ... */ }- Commits
Protocol,Metadata, andCommitInfoactionsTransactionfor future data operationsThis pattern is extensible for future table features:
build()methodChanges
New Files
kernel/src/table_manager.rs: Entry point with static factory methodTableManager::create_table()kernel/src/transaction/create_table.rs: Core implementation ofCreateTableBuilderandCreateTableTransactionkernel/tests/create_table.rs: Comprehensive test suite.Modified Files
kernel/src/actions/mod.rs: Added protocol version constantsTABLE_FEATURES_MIN_READER_VERSION = 3TABLE_FEATURES_MIN_WRITER_VERSION = 7kernel/src/lib.rs: Exported new API (TableManager,CreateTableBuilder,CreateTableTransaction)kernel/src/transaction/mod.rs: Addedcreate_tablemodule exportkernel/examples/write-table/src/main.rs: Updated to use newCreateTableAPIKey Implementation Details
operation: "CREATE TABLE",engineInfo,timestamp,txnId,kernelVersion)How was this change tested?
Comprehensive three-layer test coverage:
1. High-Level Integration Tests (Snapshot Validation)
test_create_simple_table: Basic table creation with events schematest_create_table_with_properties: Financial table with Delta propertiestest_create_table_already_exists: Duplicate creation preventiontest_create_table_multiple_properties: Builder pattern chainingCoverage: Verifies tables load correctly and schemas are preserved
2. Mid-Level Specification Tests (Log File Structure)
test_commit_info_is_written_to_log: Validates all 3 actions present00000000000000000000.json)Protocol(minReaderVersion=3,minWriterVersion=7)Metadata(id,schemaString,createdTime)CommitInfo(timestamp,operation,engineInfo,txnId,kernelVersion)test_log_action_order: Ensures correct action ordering (Protocol → Metadata → CommitInfo)Coverage: Guarantees Delta protocol compliance and complete audit trail
3. Low-Level Validation Tests (Constraint Enforcement)
test_create_table_empty_schema: Schema validationCoverage: Prevents invalid table configurations
Example Usage
This PR affects the following public APIs
Introduces new CreateTableTransaction and CreateTableBuilder APIs.
Future Work
CreateTableTransaction