Skip to content

feat: add register table for catalogs #1550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Elbehery
Copy link

  1. PositionDeleteFileWriter (crates/iceberg/src/writer/base_writer/position_delete_writer.rs)
  • Schema Implementation: Fixed schema with file_path (String) and pos (Long) columns
  • Field IDs: Correct Iceberg spec field IDs (2147483546, 2147483545)
  • Writer Integration: Full IcebergWriter trait implementation
  • Test Coverage: Unit tests for schema validation, batch creation, and writer functionality
  • Module Integration: Properly exposed in mod.rs
  1. Equality Delete Parsing (crates/iceberg/src/arrow/caching_delete_file_loader.rs)
  • RecordBatch Processing: Converts Arrow batches to Iceberg predicates
  • Multi-type Support: Handles all Arrow data types (String, Int, Float, Date, etc.)
  • Logical Operations: Constructs proper AND/OR predicate trees
  • Error Handling: Robust handling of missing field IDs and null values
  1. LZ4 Compression Support (crates/iceberg/src/puffin/)
  • Dependency Added: lz4_flex = "0.11" in Cargo.toml
  • Compression Logic: lz4_flex::compress_prepend_size() implementation
  • Decompression Logic: lz4_flex::decompress_size_prepended() implementation
  • Test Updates: All LZ4 tests now expect success instead of FeatureUnsupported
  1. Delete Manifest Processing (crates/iceberg/src/transaction/snapshot.rs)
  • SnapshotProducer Extensions: Added added_delete_files field
  • Validation Logic: add_delete_files() with content type validation
  • Manifest Generation: write_delete_manifest() for delete entries
  • Summary Integration: Delete files included in snapshot summaries
  1. Partition Support (crates/iceberg/src/writer/file_writer/parquet_writer.rs)
  • Hive-style Parsing: Extracts col=value patterns from file paths
  • Type Conversion: Converts string values to appropriate Literal types
  • Base64 Support: Handles binary partition values
  • Integration: Fully integrated into parquet_files_to_data_files()
  1. NaN Value Counts (crates/iceberg/src/writer/file_writer/parquet_writer.rs)
  • Statistics Extraction: Reads Parquet row group statistics
  • NaN Detection: Uses NanValueCountVisitor for float/double columns
  • Async Implementation: Proper async file I/O handling
  • Integration: Passed to DataFileBuilder construction
  1. Deletion Vector Support (crates/iceberg/src/arrow/caching_delete_file_loader.rs)
  • Puffin Detection: is_puffin_file() helper function
  • Blob Parsing: parse_delete_vector_blob() with RoaringTreemap
  • Index Integration: Added to PopulatedDeleteFileIndex
  • Context Enum: New DelVecs variant in DeleteFileContext
  1. Case-Insensitive Predicate Binding (crates/iceberg/src/scan/)
  • Context Propagation: Added case_sensitive to FileScanTask, ManifestEntryContext, ManifestFileContext
  • Filter Integration: Used in build_equality_delete_predicate()
  • End-to-End Flow: Properly passed through scan planning pipeline

Fixes #1518

@Elbehery Elbehery force-pushed the 20250724_register_tbl branch from a32a7e7 to c457dfe Compare July 24, 2025 11:00
@liurenjie1024
Copy link
Contributor

Hi, @Elbehery Is this pr really for resigtering table for sql catalog? I see most changes are unrelated.

@Elbehery
Copy link
Author

Hi, @Elbehery Is this pr really for resigtering table for sql catalog? I see most changes are unrelated.

oh deep apologies, would you kindly review or correct my path ?

i tried as much as i can, i am very new to Icerberg :/

@liurenjie1024
Copy link
Contributor

These two prs are implementing register_table for memory and rest catalog, you could take them as a reference:
#1549
#1521

@Elbehery
Copy link
Author

These two prs are implementing register_table for memory and rest catalog, you could take them as a reference: #1549 #1521

thanks so much 🙏🏽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement register_table for sql catalog
2 participants