Skip to content

Can we make brute-force faster in the highly filtered case? #130927

@benwtrent

Description

@benwtrent

Description

@jimczi 's PR #130251 shows that we can make significant gains for flat indices when queried with a filter by not doing an exhaustive search of the scorers to build a bitset.

We periodically skip the vector index structure and brute-force if we know the number of vector ops to explore the index is likely higher than just scoring all the vectors referenced by a filter.

One idea is to use Weight#count(LeafReaderContext), and if it provides a count that isn't -1 and something < knnThreshold, don't build the bitset and simply iterate.

I am not sure we can use Scorer#iterator()#count(), as that can just be crazy inaccurate...but maybe its cheaper to check before building the bitset and such?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions