# How to Perform Hybrid Search with Multiple Vector Stores

Ragbits comes with a special type of vector store called [`HybridSearchVectorStore`][ragbits.core.vector_stores.hybrid.HybridSearchVectorStore], which allows you to combine multiple vector stores into a single search index. It acts as a single vector store but internally manages querying and updating multiple vector stores during operations like storing, searching, and deleting entries.

The main use cases for a hybrid vector store are:

* **Combining Different Modalities**: You can combine multiple vector stores that hold different types of data, such as text and images. This lets you store several modality-specific vectors for the same entry (for example, an image embedding and a text embedding of its description) and search them together.
* **Combining Different Types of Embeddings**: You can combine multiple vector stores that hold different types of embeddings, such as dense and sparse embeddings. This lets you store several embeddings for the same entry and search them simultaneously.

!!! info
    <!-- TODO: Remove this once sparse embedding support in Vector Stores is implemented -->
    Sparse embedding support in Vector Stores is an upcoming feature of Ragbits. The examples below will be updated to show how to use it with hybrid search once it is available.

## Using a Hybrid Vector Store with Different Modalities

To create a hybrid vector store, pass the vector stores you want to combine to the constructor of the [`HybridSearchVectorStore`][ragbits.core.vector_stores.hybrid.HybridSearchVectorStore] class. For example, this creates two in-memory vector stores, one for text and one for images:

```python
from ragbits.core.vector_stores.hybrid import HybridSearchVectorStore
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore
from ragbits.core.embeddings.vertex_multimodal import VertexAIMultimodelEmbedder
from ragbits.core.embeddings import EmbeddingType  # assumed import path for the TEXT/IMAGE enum

embedder = VertexAIMultimodelEmbedder()

# Both stores share the same multimodal embedder, but each embeds a different modality.
vector_store_text = InMemoryVectorStore(embedder=embedder, embedding_type=EmbeddingType.TEXT)
vector_store_image = InMemoryVectorStore(embedder=embedder, embedding_type=EmbeddingType.IMAGE)

vector_store_hybrid = HybridSearchVectorStore(vector_store_text, vector_store_image)
```

You can then use the `vector_store_hybrid` object to store, search, and delete entries, just as you would use a regular vector store, or pass it to [Ragbits' Document Search](../document_search/ingest-documents.md). When you store an entry in the hybrid vector store, it will be stored in all the vector stores it contains. In this case, one will store the text embedding and the other will store the image embedding.
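
For instance, here is a minimal sketch of storing a multimodal entry in this hybrid store and then searching it. The `VectorStoreEntry` import path and fields, the `retrieve` call, and the result attributes are assumptions based on the general Ragbits vector store interface and may vary between versions; like the other snippets in this guide, the `await` calls need to run inside an async function:

```python
import uuid

from ragbits.core.vector_stores.base import VectorStoreEntry  # assumed import path

# One entry carrying both modalities: the text store embeds the text,
# the image store embeds the image bytes.
entry = VectorStoreEntry(
    id=uuid.uuid4(),
    text="A red bicycle leaning against a brick wall",
    image_bytes=open("bicycle.jpg", "rb").read(),  # hypothetical local file
)

await vector_store_hybrid.store([entry])

# The query is embedded and run against both stores, and the results are merged.
results = await vector_store_hybrid.retrieve("red bicycle")
for result in results:
    print(result.entry.id, result.score)  # assumed result attributes
```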

## Using a Hybrid Vector Store with Different Types of Embeddings

<!-- TODO: Change this example to dense and sparse embeddings once sparse embedding support in Vector Stores is implemented -->
Similarly, you can create a hybrid vector store with different types of embeddings. For example, this creates two in-memory vector stores: one using an embedding model from OpenAI and one using an embedding model from Mistral:

```python
from ragbits.core.vector_stores.hybrid import HybridSearchVectorStore
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore
from ragbits.core.embeddings.litellm import LiteLLMEmbedder

vector_store_openai = InMemoryVectorStore(embedder=LiteLLMEmbedder(model="text-embedding-ada-002"))
vector_store_mistral = InMemoryVectorStore(embedder=LiteLLMEmbedder(model="mistral/mistral-embed"))

vector_store_hybrid = HybridSearchVectorStore(vector_store_openai, vector_store_mistral)
```

You can then use the `vector_store_hybrid` object to store, search, and delete entries, just as you would use a regular vector store, or pass it to [Ragbits' Document Search](../document_search/ingest-documents.md). When you store an entry in the hybrid vector store, it will be stored in all the vector stores it contains. In this case, one will store the embedding computed with the OpenAI model and the other will store the embedding computed with the Mistral model.
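
For example, passing the hybrid store to Document Search looks roughly like this. The sketch assumes that the `DocumentSearch` constructor accepts the store through the `vector_store` parameter and that `ingest` and `search` are asynchronous; the exact signatures and the source URI are illustrative and may differ between Ragbits versions:

```python
from ragbits.document_search import DocumentSearch

# Document Search treats the hybrid store like any other vector store:
# each ingested chunk is written to both underlying stores, and queries
# are run against both of them.
document_search = DocumentSearch(vector_store=vector_store_hybrid)

await document_search.ingest("file://docs/**/*.md")  # hypothetical source pattern
results = await document_search.search("What is hybrid search?")
```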

Note that you can pass an arbitrary number of vector stores to the `HybridSearchVectorStore` constructor, and they can be of any type as long as they implement the `VectorStore` interface. For example, this combines three vector stores: a Chroma vector store, a Qdrant vector store, and a PgVector vector store:

```python
import uuid

import asyncpg
from chromadb import EphemeralClient
from qdrant_client import AsyncQdrantClient

from ragbits.core.vector_stores.hybrid import HybridSearchVectorStore
from ragbits.core.vector_stores.chroma import ChromaVectorStore
from ragbits.core.vector_stores.qdrant import QdrantVectorStore
from ragbits.core.vector_stores.pgvector import PgVectorStore
from ragbits.core.vector_stores.base import VectorStoreEntry  # assumed import path
from ragbits.core.embeddings.litellm import LiteLLMEmbedder

# Runs inside an async function; top-level `await` is shown for brevity.
postgres_pool = await asyncpg.create_pool("postgresql://user:password@localhost/db")

vector_store_hybrid = HybridSearchVectorStore(
    ChromaVectorStore(
        client=EphemeralClient(),
        index_name="chroma_example",
        embedder=LiteLLMEmbedder(),
    ),
    QdrantVectorStore(
        client=AsyncQdrantClient(location=":memory:"),
        index_name="qdrant_example",
        embedder=LiteLLMEmbedder(),
    ),
    PgVectorStore(
        client=postgres_pool,
        table_name="postgres_example",
        vector_size=1536,
        embedder=LiteLLMEmbedder(),
    ),
)

# The entry will be stored in all three vector stores
await vector_store_hybrid.store([VectorStoreEntry(id=uuid.uuid4(), text="Example entry")])
```

## Specifying the Retrieval Strategy for a Hybrid Vector Store

When you search a hybrid vector store, you can specify a retrieval strategy to determine how the results from the different vector stores are combined. Ragbits comes with the following retrieval strategies:

* [`OrderedHybridRetrivalStrategy`][ragbits.core.vector_stores.hybrid_strategies.OrderedHybridRetrivalStrategy]: Returns the results from the vector stores ordered by their score. If the same entry is found in multiple vector stores, the highest score is used, or the scores are summed if the `sum_scores` parameter is set to `True`. This is the default strategy.
* [`ReciprocalRankFusion`][ragbits.core.vector_stores.hybrid_strategies.ReciprocalRankFusion]: Combines the results from the vector stores using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm, which prioritizes entries that appear near the top of the results from individual vector stores. If the same entry is found in multiple vector stores, the scores are summed by default, or the highest score is used if the `sum_scores` parameter is set to `False`.
* [`DistributionBasedScoreFusion`][ragbits.core.vector_stores.hybrid_strategies.DistributionBasedScoreFusion]: Combines the results from the vector stores using the [Distribution-Based Score Fusion](https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18) algorithm, which normalizes the scores from the individual vector stores so they can be compared and combined sensibly. If the same entry is found in multiple vector stores, the highest score is used, or the scores are summed if the `sum_scores` parameter is set to `True`.

Note that summing the scores from individual stores boosts the entries found in multiple stores. This can be useful when searching through multiple types of embeddings, but may not be desirable when searching through multiple modalities, since entries containing both text and image embeddings would have an advantage over those containing only one.
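
For instance, the `sum_scores` behaviour is chosen when a strategy is instantiated. The snippet below is a sketch that assumes `sum_scores` is a constructor argument of each strategy class, with the defaults described in the list above:

```python
from ragbits.core.vector_stores.hybrid_strategies import (
    DistributionBasedScoreFusion,
    OrderedHybridRetrivalStrategy,
    ReciprocalRankFusion,
)

# Keep the highest score for entries found in several stores (the default behaviour).
ordered = OrderedHybridRetrivalStrategy()

# Boost entries found in several stores by summing their scores instead.
ordered_summed = OrderedHybridRetrivalStrategy(sum_scores=True)

# Reciprocal Rank Fusion sums rank-based scores by default; keep only the highest score instead.
rrf = ReciprocalRankFusion(sum_scores=False)

# Normalize scores per store before combining; the highest score wins by default.
dbsf = DistributionBasedScoreFusion()
```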

To specify a retrieval strategy for a hybrid vector store, pass it as the `retrieval_strategy` parameter to the constructor of the [`HybridSearchVectorStore`][ragbits.core.vector_stores.hybrid.HybridSearchVectorStore] class. For example, this creates a hybrid vector store that uses the `DistributionBasedScoreFusion` retrieval strategy:

```python
from ragbits.core.vector_stores.hybrid import HybridSearchVectorStore
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore
from ragbits.core.vector_stores.hybrid_strategies import DistributionBasedScoreFusion
from ragbits.core.embeddings.litellm import LiteLLMEmbedder
from ragbits.core.embeddings import EmbeddingType  # assumed import path for the TEXT/IMAGE enum

embedder = LiteLLMEmbedder()

vector_store_text = InMemoryVectorStore(embedder=embedder, embedding_type=EmbeddingType.TEXT)
vector_store_image = InMemoryVectorStore(embedder=embedder, embedding_type=EmbeddingType.IMAGE)

vector_store_hybrid = HybridSearchVectorStore(
    vector_store_text,
    vector_store_image,
    retrieval_strategy=DistributionBasedScoreFusion(),
)
```