Description
Please describe the end goal of this project
We plan to introduce components that enable pluggable indexing (#20644) and multiple composable data formats in OpenSearch. This issue tracks the work for that feature.
Proposals for various workstreams
Abstractions in Core
- Introduce Lucene-agnostic interfaces (Indexer, CatalogSnapshot, FieldCapability, etc.)
- Introduce DataFormat Plugin and associated interfaces (DataFormat, IndexingExecutionEngine, etc.)
- DataFormat registry
- Introduce Lucene-agnostic Searcher/Reader interfaces
- Indexer implementation for data formats along with store management
- Add Merge Handler and Committer interfaces and orchestration logic
- Add Catalog Snapshot and File Manager implementation
- Update mappers to work with the new interfaces
Composable Data Format Plugin
- Multiplexing implementation
- Add row ID management for coherence across data formats
- Failure handling across data formats during indexing
- Refresh orchestration
Commons for Data formats
- Writer pool management to handle concurrency
Parquet Data Format Plugin
- Introduce Arrow-based implementation to buffer incoming documents
- arrow-rs-based indexing and native writer management
- JNI Bridge for leveraging arrow-rs
- Field Handlers
- Compaction in Parquet
Lucene Data Format Plugin [TBD]
- Field Handlers
- IndexWriter integration to work with composable data formats
Remote Store Integration
- Update Remote Store to use Catalog Snapshot
- Introduce format awareness to manage multiple data formats and paths
- Handle snapshot, restore, and recovery
- Lucene-agnostic data integrity handling
Supporting References
Issues
Related component
Indexing