Commit 396160a

akashmangoai authored and fm1320 committed
lancedb support added
1 parent cacc74e commit 396160a

2 files changed: +86 -0 lines changed

adalflow/adalflow/utils/lazy_import.py

Lines changed: 5 additions & 0 deletions
@@ -74,6 +74,11 @@ class OptionalPackages(Enum):
         "Please install faiss with: pip install faiss-cpu (or faiss if you use GPU)",
     )
 
+    LANCEDB = (
+        "lancedb",
+        "Please install lancedb with: pip install lancedb",
+    )
+
     # db library
     SQLALCHEMY = (
         "sqlalchemy",

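For context, an entry such as ``LANCEDB`` pairs a module name with an install hint, so that a missing optional dependency can be reported with an actionable message. The sketch below illustrates that pattern in isolation. The enum is a trimmed stand-in, the ``JSON`` member exists only to demonstrate the success path, and ``import_optional`` is a hypothetical helper rather than AdalFlow's actual loader.

```python
import importlib
from enum import Enum
from types import ModuleType


class OptionalPackages(Enum):
    """Trimmed stand-in: each value is (module name, install hint)."""

    LANCEDB = (
        "lancedb",
        "Please install lancedb with: pip install lancedb",
    )
    # Standard-library module, included only so the success path can run anywhere.
    JSON = ("json", "json ships with Python")


def import_optional(package: Enum) -> ModuleType:
    """Hypothetical helper: import the module, or raise with the install hint."""
    module_name, install_hint = package.value
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the hint so the user knows how to fix the environment.
        raise ImportError(f"{module_name} is not available. {install_hint}") from exc


# Success path: the module is importable, so it is returned as usual.
json_module = import_optional(OptionalPackages.JSON)
print(json_module.dumps({"ok": True}))
```

Deferring the import to the point of use keeps ``lancedb`` out of the hard requirements, so users who never touch the LanceDB retriever do not need it installed.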
docs/source/tutorials/retriever.rst

Lines changed: 81 additions & 0 deletions
@@ -369,6 +369,87 @@ The printout is:
 By default, the score is a simulated probability in the range ``[0, 1]`` computed from cosine similarity. The higher the score, the more relevant the document is to the query.
 You can check the retriever for more types of scores.

+LanceDBRetriever
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To perform semantic search using LanceDB, we will use :class:`LanceDBRetriever<components.retriever.lancedb_retriever.LanceDBRetriever>`.
+``LanceDBRetriever`` is designed for efficient vector-based retrieval with LanceDB and accepts embeddings as either ``List[float]`` or ``np.ndarray``.
+LanceDB supports in-memory and disk-based configurations and can handle large-scale data with high retrieval speed.
+
+.. note ::
+    The ``lancedb`` package is optional. Ensure you have it installed in your environment to use ``LanceDBRetriever``.
+
+We will prepare the document embeddings using the ``content`` field.
+
+.. code-block:: python
+
+    from adalflow.core.embedder import Embedder
+    from adalflow.core.types import ModelClientType
+
+    model_kwargs = {
+        "model": "text-embedding-3-small",
+        "dimensions": 256,
+        "encoding_format": "float",
+    }
+
+    embedder = Embedder(model_client=ModelClientType.OPENAI(), model_kwargs=model_kwargs)
+    output = embedder(input=[doc["content"] for doc in documents])
+    documents_embeddings = [x.embedding for x in output.data]
+
+After initializing the LanceDB retriever, we can add documents and perform retrievals. The ``top_k`` hyperparameter is set when the retriever is initialized.
+
+.. code-block:: python
+
+    from adalflow.components.retriever import LanceDBRetriever
+    retriever = LanceDBRetriever(embedder=embedder, dimensions=256, db_uri="/tmp/lancedb", top_k=2)
+
+    print(retriever)
+
+The printout:
+
+.. code-block::
+
+    LanceDBRetriever(
+      top_k=2, dimensions=256, total_documents=0
+      (embedder): Embedder(
+        model_kwargs={'model': 'text-embedding-3-small', 'dimensions': 256, 'encoding_format': 'float'},
+        (model_client): OpenAIClient()
+      )
+    )
+
+We can add documents to LanceDB and use the retriever for query-based searches.
+
+.. code-block:: python
+
+    documents = [
+        {
+            "title": "The Impact of Renewable Energy on the Economy",
+            "content": "Renewable energy technologies not only help in reducing greenhouse gas emissions but also contribute significantly to the economy by creating jobs."
+        },
+        {
+            "title": "Understanding Solar Panels",
+            "content": "Solar panels convert sunlight into electricity by allowing photons, or light particles, to knock electrons free from atoms."
+        },
+        {
+            "title": "Pros and Cons of Solar Energy",
+            "content": "While solar energy offers substantial environmental benefits, such as reducing carbon footprints and pollution, it also has downsides."
+        },
+        {
+            "title": "Renewable Energy and Its Effects",
+            "content": "Renewable energy sources like wind, solar, and hydro power play a crucial role in combating climate change."
+        }
+    ]
+
+    # Add documents to LanceDB
+    retriever.add_documents(documents)
+
+    # Perform retrieval queries
+    output_1 = retriever.retrieve(query="What are the benefits of renewable energy?")
+    output_2 = retriever.retrieve(query="How do solar panels impact the environment?")
+    print("Query 1 Results:", output_1)
+    print("Query 2 Results:", output_2)
+
+With this setup, ``LanceDBRetriever`` provides embedding-based document retrieval backed by LanceDB, which is suited to large-scale data.
+
 BM25Retriever
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 So the semantic search works pretty well. We will see how :class:`BM25Retriever<components.retriever.bm25_retriever.BM25Retriever>` works in comparison.
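
To make the retrieval step in the new section concrete: the retriever returns the ``top_k`` most relevant documents for a query embedding. The unchanged context at the top of this hunk describes the default score, for the preceding retriever, as a simulated probability in ``[0, 1]`` derived from cosine similarity. Whether ``LanceDBRetriever`` scores the same way is not shown in this commit. The sketch below is a brute-force illustration of the idea only. The function names, the ``(cos + 1) / 2`` rescaling, and the toy 3-dimensional vectors are assumptions, and none of this reflects LanceDB's index or the retriever's actual implementation.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: List[float], doc_embeddings: List[List[float]], top_k: int = 2
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_k highest-scoring documents."""
    scored = [
        # Assumed rescaling: map cosine from [-1, 1] into [0, 1].
        (index, (cosine_similarity(query, embedding) + 1.0) / 2.0)
        for index, embedding in enumerate(doc_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for the 256-dimensional vectors used in the tutorial.
doc_embeddings = [
    [1.0, 0.0, 0.0],   # same direction as the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
    [-1.0, 0.0, 0.0],  # opposite direction
]
query_embedding = [1.0, 0.0, 0.0]

results = top_k_by_cosine(query_embedding, doc_embeddings, top_k=2)
for index, score in results:
    print(index, round(score, 3))
```

A vector database such as LanceDB exists to avoid this linear scan at scale, but the contract is the same: a query vector goes in, and the ``top_k`` closest documents with their scores come out.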
