
[Feature] Add Global Index for Bitmap and ANN search #5

@lxy-9602

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

In modern analytical and AI-driven workloads, efficient data retrieval is critical, especially for high-cardinality filters and similarity-based queries. Traditional file-level or partition-level metadata (e.g., min/max statistics) often falls short when queries involve selective predicates on non-partition columns or require nearest-neighbor lookups in vector spaces. The result is excessive I/O, scans of irrelevant files, and poor end-to-end latency.

To address this, we introduce global indexing in Paimon—a unified, table-wide index structure that spans all data files across partitions and snapshots.

Solution

Proposed Solution
We propose building global indexes in Paimon on top of Row Tracking (which assigns each record a stable, globally unique row ID) and Data Evolution (which keeps row IDs consecutive, without gaps).
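One reason gap-free row IDs help: if each data file covers a contiguous row-ID range, a file can be described by just its first row ID and row count, and a global row ID returned by an index can be resolved to a (file, offset) pair with a binary search and a subtraction. The sketch below illustrates this under that assumption; `FileRowRange` and `locate` are hypothetical names, not Paimon's API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRowRange:
    """Hypothetical per-file metadata: the file's first global row ID and row count."""
    file_name: str
    first_row_id: int
    row_count: int


def locate(files: list[FileRowRange], row_id: int) -> tuple[str, int]:
    """Map a global row ID to (file name, offset within that file).

    Assumes `files` is sorted by first_row_id. Because row IDs are consecutive,
    a binary search over range starts finds the owning file, and the offset is
    simply row_id - first_row_id.
    """
    starts = [f.first_row_id for f in files]
    i = bisect_right(starts, row_id) - 1  # last file starting at or before row_id
    if i < 0 or row_id >= files[i].first_row_id + files[i].row_count:
        raise KeyError(f"row id {row_id} is not covered by any file")
    f = files[i]
    return f.file_name, row_id - f.first_row_id
```

For example, with files covering row IDs 0-99, 100-149 and 150-349, `locate(files, 120)` returns `("f1", 20)`.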

We will support two index types:

  • Bitmap-based inverted indexes for fast scalar filtering (e.g., type = X);
  • Vector indexes, built on our in-house vector engine Lumina, for efficient DiskANN-based approximate nearest-neighbor (ANN) search.
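A bitmap inverted index keeps, for each distinct column value, a bitmap of the global row IDs holding that value; an equality filter is a lookup, IN is a bitwise OR, and conjunctions across columns are a bitwise AND. The toy sketch below uses Python ints as uncompressed bitsets; real bitmap indexes typically use compressed bitmaps (such as Roaring), and the class and function names here are hypothetical rather than Paimon's implementation.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy global bitmap index: column value -> bitmap of global row IDs.

    Bit i of a bitmap is set when the row with global row ID i holds the value.
    """

    def __init__(self) -> None:
        self._bitmaps: dict[object, int] = defaultdict(int)

    def add(self, row_id: int, value: object) -> None:
        self._bitmaps[value] |= 1 << row_id

    def eq(self, value: object) -> int:
        """Bitmap for `col = value` (0 if the value never occurs)."""
        return self._bitmaps.get(value, 0)

    def in_(self, values) -> int:
        """Bitmap for `col IN (values)`: the union of the per-value bitmaps."""
        result = 0
        for v in values:
            result |= self.eq(v)
        return result


def row_ids(bitmap: int) -> list[int]:
    """Expand a bitmap into the ascending list of row IDs whose bits are set."""
    ids = []
    while bitmap:
        low = bitmap & -bitmap  # isolate the lowest set bit
        ids.append(low.bit_length() - 1)
        bitmap ^= low  # clear it and continue
    return ids
```

After indexing a `type` column with values `a, b, a, c, b, a` for row IDs 0 to 5, `row_ids(idx.eq("a"))` yields `[0, 2, 5]`; a predicate such as `type = 'a' AND region = 'x'` becomes `row_ids(type_idx.eq("a") & region_idx.eq("x"))`.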

Both index construction and lookup are distributed, allowing analytical engines such as Spark and StarRocks to skip irrelevant data and fetch only matching records, which can dramatically improve query performance.
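One common pattern for distributed vector lookup is scatter-gather: the index is split into shards (for instance, by row-ID range), each shard returns its local top-k, and a coordinator keeps the global top-k from those candidates. The sketch below assumes that pattern; it substitutes an exact brute-force L2 scan for the DiskANN/Lumina index purely to keep it self-contained, and all names are hypothetical.

```python
import heapq
import math


def l2(a, b) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_search(shard: dict, query, k: int) -> list:
    """Local top-k on one shard; `shard` maps global row ID -> vector.

    Exact brute-force scan standing in for an ANN index, which would
    return approximate neighbors instead.
    """
    return heapq.nsmallest(k, ((l2(vec, query), rid) for rid, vec in shard.items()))


def global_search(shards: list, query, k: int) -> list[int]:
    """Coordinator: gather each shard's local top-k, keep the global top-k row IDs."""
    candidates = []
    for shard in shards:  # a real engine would query shards in parallel
        candidates.extend(shard_search(shard, query, k))
    return [rid for _, rid in heapq.nsmallest(k, candidates)]
```

Asking each shard for k candidates is enough for exact merging, since the global top-k can contain at most k rows from any single shard. The returned global row IDs are what the engine would then use to fetch only the matching records.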

Anything else?

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests