Skip to content

[Feature] Introduce BTree global index #6834

@steFaiz

Description

@steFaiz

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

At present, for scalar indexes we mainly support the Bitmap index. Many thanks to @leaves12138 for the implementation, which has established a solid framework for scalar indexing. However, a conventional Bitmap index has significant limitations and does not work well for scenarios with high data cardinality, such as int, double, and string types.

A global B-Tree index can be built on any comparable data type, providing efficient point lookups and range queries. In addition, it is more amenable to distributed parallel processing during updates and reads. Therefore, this issue proposes implementing a distributed global B-Tree index.

Solution

The basic implementation will be built on the SST FileFormat introduced in #6734, providing point lookup and range query capabilities to cover most common SQL filter predicates.

index construction

Index construction can be efficiently implemented via range shuffle in Flink or Spark: different writer tasks are responsible for writing index files for their assigned key ranges, and a commit task then performs a unified commit of all generated files. As below:

Image

index query

The metadata of B-Tree index files will record information such as the file’s min key, max key, and whether it contains nulls (hasNulls). During index planning, we can use this metadata to prune candidate files, then query the remaining files in parallel and merge the results.

We will bring more details in future PRs.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions