[META] OpenSearch Pluggable Indexing #20876

@mgodwan

Please describe the end goal of this project

We are planning to introduce components that enable pluggable indexing (#20644) and multiple composable data formats in OpenSearch. This issue tracks the work related to the feature.

Proposals for various workstreams

Abstractions in Core

  • Introduce Lucene-agnostic interfaces (Indexer, CatalogSnapshot, FieldCapability, etc.)
  • Introduce DataFormat Plugin and associated interfaces (DataFormat, IndexingExecutionEngine, etc.)
  • DataFormat registry
  • Introduce Lucene-agnostic Searcher/Reader interfaces
  • Indexer implementation for data formats, along with store management
  • Add Merge Handler and Committer interfaces and orchestration logic
  • Add Catalog Snapshot and File Manager implementation
  • Update mappers to work with the new interfaces
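
To make the DataFormat plugin and registry items above more concrete, here is a minimal sketch of how a registry might hand out format-specific engines. The type names `DataFormat` and `IndexingExecutionEngine` come from the list above, but every method signature, the `DataFormatRegistry` class, and the toy `CountingFormat` are assumptions for illustration only, not the proposed OpenSearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public final class DataFormatRegistrySketch {

    /** Hypothetical shape of a data format contributed by a plugin. */
    interface DataFormat {
        String name();
        IndexingExecutionEngine newEngine();
    }

    /** Hypothetical format-specific engine that accepts documents. */
    interface IndexingExecutionEngine {
        void index(Map<String, Object> document);
        int bufferedDocs();
    }

    /** Hypothetical registry keyed by format name; rejects duplicates. */
    static final class DataFormatRegistry {
        private final Map<String, DataFormat> formats = new LinkedHashMap<>();

        void register(DataFormat format) {
            if (formats.putIfAbsent(format.name(), format) != null) {
                throw new IllegalArgumentException("duplicate data format: " + format.name());
            }
        }

        Optional<DataFormat> lookup(String name) {
            return Optional.ofNullable(formats.get(name));
        }
    }

    /** Toy format whose engine only counts the documents it receives. */
    static final class CountingFormat implements DataFormat {
        private final String name;

        CountingFormat(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public IndexingExecutionEngine newEngine() {
            return new IndexingExecutionEngine() {
                private int count;

                @Override
                public void index(Map<String, Object> document) {
                    count++;
                }

                @Override
                public int bufferedDocs() {
                    return count;
                }
            };
        }
    }

    public static void main(String[] args) {
        DataFormatRegistry registry = new DataFormatRegistry();
        registry.register(new CountingFormat("parquet"));
        registry.register(new CountingFormat("lucene"));

        IndexingExecutionEngine engine = registry.lookup("parquet").orElseThrow().newEngine();
        engine.index(Map.of("title", "hello"));
        System.out.println("buffered docs: " + engine.bufferedDocs());
    }
}
```

Keeping the registry keyed by name means core code never references a concrete format class, which is one way to keep the core Lucene-agnostic.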

Composable Data Format Plugin

  • Multiplexing implementation
  • Add support for Row Id Management for coherence across data formats
  • Failure handling across data formats during indexing
  • Refresh orchestration
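
The multiplexing, row ID, and failure-handling items above could interact roughly as in the sketch below: one row ID is assigned per document, the document is written to every format under that ID, and a failure in any format rolls back the formats that already accepted it. All names here (`FormatWriter`, `MultiplexingWriter`, `InMemoryWriter`) and the rollback strategy itself are assumptions for illustration; the issue does not specify how coherence is achieved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class MultiplexingSketch {

    /** Hypothetical per-format writer that stores rows under a shared row ID. */
    interface FormatWriter {
        void add(long rowId, Map<String, Object> doc) throws Exception;
        void rollback(long rowId);
        int rowCount();
    }

    /** Toy writer; optionally rejects empty documents to simulate a failure. */
    static final class InMemoryWriter implements FormatWriter {
        private final List<Long> rows = new ArrayList<>();
        private final boolean failOnEmptyDoc;

        InMemoryWriter(boolean failOnEmptyDoc) {
            this.failOnEmptyDoc = failOnEmptyDoc;
        }

        @Override
        public void add(long rowId, Map<String, Object> doc) {
            if (failOnEmptyDoc && doc.isEmpty()) {
                throw new IllegalStateException("rejected row " + rowId);
            }
            rows.add(rowId);
        }

        @Override
        public void rollback(long rowId) {
            rows.remove(Long.valueOf(rowId)); // remove by value, not by index
        }

        @Override
        public int rowCount() {
            return rows.size();
        }
    }

    /** Hypothetical multiplexer: same row ID in every format, or in none. */
    static final class MultiplexingWriter {
        private final List<FormatWriter> writers;
        private long nextRowId = 0;

        MultiplexingWriter(List<FormatWriter> writers) {
            this.writers = writers;
        }

        /** Returns the assigned row ID, or -1 if any format rejected the document. */
        long add(Map<String, Object> doc) {
            long rowId = nextRowId;
            List<FormatWriter> applied = new ArrayList<>();
            try {
                for (FormatWriter writer : writers) {
                    writer.add(rowId, doc);
                    applied.add(writer);
                }
            } catch (Exception e) {
                // Undo the partial write so all formats stay coherent.
                for (FormatWriter writer : applied) {
                    writer.rollback(rowId);
                }
                return -1;
            }
            nextRowId++; // the ID is consumed only after every format succeeded
            return rowId;
        }
    }

    public static void main(String[] args) {
        InMemoryWriter first = new InMemoryWriter(false);
        InMemoryWriter second = new InMemoryWriter(true);
        MultiplexingWriter multiplexer = new MultiplexingWriter(List.of(first, second));

        System.out.println("row id: " + multiplexer.add(Map.of("field", "value")));
        System.out.println("row id after failure: " + multiplexer.add(Map.of()));
        System.out.println("rows per format: " + first.rowCount() + ", " + second.rowCount());
    }
}
```

In this sketch a row ID is reused after a failed write, so IDs stay dense; a real implementation might instead burn the ID and mark the row as deleted.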

Commons for Data formats

  • Writer Pool Management to handle concurrency
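
One possible reading of writer pool management is a checkout/return pool, where each indexing thread takes exclusive ownership of a writer for the duration of a write and idle writers are reused. The sketch below assumes that design; the `WriterPool` class and its methods are hypothetical and not taken from the issue.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public final class WriterPoolSketch {

    /** Hypothetical pool: a writer is owned by one thread between acquire and release. */
    static final class WriterPool<W> {
        private final ConcurrentLinkedQueue<W> idle = new ConcurrentLinkedQueue<>();
        private final Supplier<W> factory;
        private final AtomicInteger created = new AtomicInteger();

        WriterPool(Supplier<W> factory) {
            this.factory = factory;
        }

        /** Reuses an idle writer if one exists, otherwise creates a new one. */
        W acquire() {
            W writer = idle.poll();
            if (writer == null) {
                created.incrementAndGet();
                writer = factory.get();
            }
            return writer;
        }

        /** Returns a writer to the pool so another thread can pick it up. */
        void release(W writer) {
            idle.add(writer);
        }

        int createdCount() {
            return created.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a real format writer.
        WriterPool<StringBuilder> pool = new WriterPool<>(StringBuilder::new);

        Runnable indexTask = () -> {
            for (int i = 0; i < 1000; i++) {
                StringBuilder writer = pool.acquire();
                try {
                    writer.append('x'); // safe: no other thread holds this writer
                } finally {
                    pool.release(writer);
                }
            }
        };

        Thread t1 = new Thread(indexTask);
        Thread t2 = new Thread(indexTask);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println("writers created: " + pool.createdCount());
    }
}
```

Because a writer is never shared between threads while checked out, the writers themselves do not need internal locking; the number of writers created is bounded by the peak number of concurrent indexing threads.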

Parquet Data Format Plugin

  • Introduce Arrow-based implementation to buffer incoming documents
  • arrow-rs-based indexing and native writer management
  • JNI bridge for leveraging arrow-rs
  • Field Handlers
  • Compaction in Parquet

Lucene Data Format Plugin [TBD]

  • Field Handlers
  • IndexWriter integration to work with composable data formats

Remote Store Integration

  • Update Remote Store to use Catalog Snapshot
  • Introduce format awareness to manage multiple data formats and paths
  • Handle snapshot, restore, and recovery
  • Lucene-agnostic data integrity handling
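
As a rough illustration of format awareness, the sketch below models a catalog snapshot as an immutable map from data format name to file names and derives one remote path per file, namespaced by format. The `CatalogSnapshot` name comes from the list above, but its fields, the `remotePaths` method, the `base/format/file` layout, and the file names are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public final class CatalogSnapshotSketch {

    /** Hypothetical immutable view of the files that make up one generation. */
    static final class CatalogSnapshot {
        private final long generation;
        private final Map<String, List<String>> filesByFormat;

        CatalogSnapshot(long generation, Map<String, List<String>> filesByFormat) {
            this.generation = generation;
            // Defensive, sorted copy so path listing is deterministic.
            TreeMap<String, List<String>> copy = new TreeMap<>();
            filesByFormat.forEach((format, files) -> copy.put(format, List.copyOf(files)));
            this.filesByFormat = copy;
        }

        long generation() {
            return generation;
        }

        /** Assumed layout: each format gets its own directory under the base path. */
        List<String> remotePaths(String basePath) {
            List<String> paths = new ArrayList<>();
            filesByFormat.forEach((format, files) -> {
                for (String file : files) {
                    paths.add(basePath + "/" + format + "/" + file);
                }
            });
            return paths;
        }
    }

    public static void main(String[] args) {
        CatalogSnapshot snapshot = new CatalogSnapshot(
            3L,
            Map.of(
                "parquet", List.of("seg_0.parquet"),
                "lucene", List.of("_0.cfs")
            )
        );
        for (String path : snapshot.remotePaths("shard0")) {
            System.out.println(path);
        }
    }
}
```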

Supporting References

#20644

Issues

#18416

Related component

Indexing

Metadata

Labels: Meta, lucene, untriaged
