Combining face search with context search #7388
Replies: 3 comments 5 replies
-
cc @mertalev Note that while we do support pgvector to some degree, we're primarily focused on pgvecto.rs for the search features.
-
There's a general issue with joining on one-to-many relations. TypeORM's take on this with take/skip also isn't compatible with setting a custom sort order, whereas we need to sort by vector similarity. It would have to be an ugly triple-nested query: sort by vector similarity, then select distinct assets from that, and then sort again by vector similarity in the outermost query. And even then it would duplicate an asset if the first row of a page has the same asset as the last row of the previous page.
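To make the page-boundary problem concrete, here is a minimal in-memory sketch. The `Row` shape, the sample rows, and `pageOfAssets` are hypothetical and only mimic a joined asset/face result set that is paged by row and then de-duplicated per page; this is not the project's actual query.

```typescript
// Hypothetical joined rows (one row per asset/face pair), already sorted by vector distance.
interface Row {
  assetId: string;
  distance: number;
}

const rows: Row[] = [
  { assetId: 'a', distance: 0.1 },
  { assetId: 'b', distance: 0.2 },
  { assetId: 'b', distance: 0.25 }, // same asset, matched through a second face
  { assetId: 'c', distance: 0.3 },
];

// Page by row (skip/take), then keep distinct asset IDs within that page only.
function pageOfAssets(sorted: Row[], skip: number, take: number): string[] {
  const slice = sorted.slice(skip, skip + take);
  return Array.from(new Set(slice.map((row) => row.assetId)));
}

const firstPage = pageOfAssets(rows, 0, 2);
const secondPage = pageOfAssets(rows, 2, 2);

// Asset 'b' straddles the page boundary, so it shows up on both pages.
console.log(firstPage, secondPage);
```

De-duplicating inside a page cannot see what the previous page already returned, which is why the duplicate survives even with a distinct step.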
-
Added with #7521
-
The new update (v1.95) brought amazing improvements to the search feature; however, it's currently not possible to enter a context query and select faces at the same time.
A question about this was already asked in the update discussion and answered with "technical limitation" (#7252 (reply in thread)).
I believe being able to select a face and enter a context query together would be extremely useful, so I dug into the code. I think the technical limitation is that face search orders the query results by embedding similarity, and so does smart search (with context). Ordering by both at once is indeed a technical limitation, but one that could be solved in a few ways.
The authors of pgvector mention hybrid search in the README and recommend RRF (Reciprocal Rank Fusion) for this problem, together with an example written in Python that uses a raw SQL query (https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search_rrf.py).
Looking at that query, it seems impossible to reproduce exactly with the current ORM (TypeORM). I think a workable alternative would be to run two queries and combine the results with this algorithm in code, although that raises some performance and pagination issues.
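As a rough illustration of combining two result lists in code, here is a minimal RRF sketch. The function name, the sample asset IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration, not taken from the project.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item, with rank starting at 1.
// `k` dampens the influence of top ranks; 60 is a common default, assumed here.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest combined score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical asset IDs: one list ordered by face similarity, one by context similarity.
const byFace = ['a', 'b', 'c', 'd'];
const byContext = ['c', 'a', 'e', 'b'];

const fused = reciprocalRankFusion([byFace, byContext]);
console.log(fused);
```

Items that rank well in both lists rise to the top, while items present in only one list are still kept with a lower score.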
One solution would be to fetch many results (more than one page) from both queries and then paginate in TypeScript, but that would be quite slow. Another would be to combine only the corresponding pair of pages and sort just those results, which would probably be less accurate (especially for small pages) but would be performant and should work.
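The first option (over-fetch, fuse, then paginate in code) could look roughly like the sketch below. The `Fetch` type, the in-memory stubs, and the fixed `poolSize` are hypothetical stand-ins for the two real database queries.

```typescript
// A fetcher returns the top `limit` asset IDs for one ordering (stand-in for a DB query).
type Fetch = (limit: number) => string[];

// Minimal RRF helper; k = 60 is an assumed default.
function rrf(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Fetch a fixed candidate pool from both queries, fuse once, then slice out the requested page.
// Keeping `poolSize` constant keeps the fused order stable from page to page.
function fusedPage(fetchFace: Fetch, fetchContext: Fetch, page: number, size: number, poolSize = 100): string[] {
  const ranked = rrf([fetchFace(poolSize), fetchContext(poolSize)]);
  return ranked.slice((page - 1) * size, page * size);
}

// Hypothetical data standing in for the two result sets.
const faceResults = ['a', 'b', 'c', 'd'];
const contextResults = ['c', 'a', 'e', 'b'];
const fetchFace: Fetch = (limit) => faceResults.slice(0, limit);
const fetchContext: Fetch = (limit) => contextResults.slice(0, limit);

console.log(fusedPage(fetchFace, fetchContext, 1, 2));
console.log(fusedPage(fetchFace, fetchContext, 2, 2));
```

The trade-off in this sketch is that both queries are re-run for every page and results beyond the pool are never reached, which matches the slowness concern above.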
I have considered working on this feature, but first I would love to get some feedback, or a green light, from more experienced maintainers of this project.