docs/source/tutorials/retriever.rst (81 additions & 0 deletions)
@@ -369,6 +369,87 @@ The printout is:
By default, the score is a simulated probability in the range ``[0, 1]`` based on cosine similarity. The higher the score, the more relevant the document is to the query.
You can check the retriever for more types of scores.

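To make the score range concrete, here is a minimal sketch of how a cosine similarity in ``[-1, 1]`` can be mapped onto ``[0, 1]``. The shift-and-scale mapping ``(similarity + 1) / 2`` and the helper names ``cosine_similarity`` and ``to_unit_interval`` are assumptions made for illustration; the retriever's actual conversion may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; ranges over [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def to_unit_interval(similarity: float) -> float:
    """Assumed mapping: shift and scale [-1, 1] onto [0, 1], then clamp."""
    return min(1.0, max(0.0, (similarity + 1.0) / 2.0))


query = [1.0, 0.0]
print(to_unit_interval(cosine_similarity(query, [2.0, 0.0])))   # same direction
print(to_unit_interval(cosine_similarity(query, [0.0, 3.0])))   # orthogonal
print(to_unit_interval(cosine_similarity(query, [-1.0, 0.0])))  # opposite direction
```

Under this mapping, vectors pointing the same way score highest and vectors pointing in opposite directions score lowest, which matches the "higher is more relevant" reading of the score.
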
LanceDBRetriever
373
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To perform semantic search using LanceDB, we will use :class:`LanceDBRetriever<components.retriever.lancedb_retriever.LanceDBRetriever>`.
The ``LanceDBRetriever`` is designed for efficient vector-based retrieval with LanceDB and accepts embeddings as either ``List[float]`` or ``np.ndarray``.
LanceDB supports in-memory and disk-based configurations and can handle large-scale data with high retrieval speed.

.. note::
    The ``lancedb`` package is optional. Ensure you have it installed in your environment to use ``LanceDBRetriever``.

We will prepare the document embeddings using the ``content`` field.

.. code-block:: python

    output = embedder(input=[doc["content"] for doc in documents])
    documents_embeddings = [x.embedding for x in output.data]

After initializing the LanceDB retriever, we can add documents and perform retrievals. The retriever's top-k hyperparameter can be set during initialization.

.. code-block:: python

    from adalflow.components.retriever import LanceDBRetriever

We can add documents to LanceDB and use the retriever for query-based searches.

.. code-block:: python

    documents = [
        {
            "title": "The Impact of Renewable Energy on the Economy",
            "content": "Renewable energy technologies not only help in reducing greenhouse gas emissions but also contribute significantly to the economy by creating jobs."
        },
        {
            "title": "Understanding Solar Panels",
            "content": "Solar panels convert sunlight into electricity by allowing photons, or light particles, to knock electrons free from atoms."
        },
        {
            "title": "Pros and Cons of Solar Energy",
            "content": "While solar energy offers substantial environmental benefits, such as reducing carbon footprints and pollution, it also has downsides."
        },
        {
            "title": "Renewable Energy and Its Effects",
            "content": "Renewable energy sources like wind, solar, and hydro power play a crucial role in combating climate change."
        }
    ]

    # Add documents to LanceDB
    retriever.add_documents(documents)

    # Perform retrieval queries
    output_1 = retriever.retrieve(query="What are the benefits of renewable energy?")
    output_2 = retriever.retrieve(query="How do solar panels impact the environment?")
    print("Query 1 Results:", output_1)
    print("Query 2 Results:", output_2)

This setup allows the ``LanceDBRetriever`` to serve as an efficient tool for large-scale, embedding-based document retrieval with LanceDB.

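Conceptually, embedding-based retrieval stores one vector per document and returns the ``top_k`` documents whose vectors are closest to the query vector. The sketch below illustrates that idea with a brute-force, in-memory index. It is not LanceDB and not the ``LanceDBRetriever`` API: ``ToyVectorIndex``, its methods, and the hand-written three-dimensional embeddings are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a vector store (brute-force search)."""

    def __init__(self, top_k: int = 2) -> None:
        self.top_k = top_k
        self._rows: List[Tuple[str, List[float]]] = []

    def add(self, title: str, embedding: Sequence[float]) -> None:
        # Store the title next to a copy of its embedding.
        self._rows.append((title, list(embedding)))

    def search(self, query_embedding: Sequence[float]) -> List[Tuple[str, float]]:
        # Score every stored row, then keep the top_k highest similarities.
        scored = [(title, cosine(query_embedding, emb)) for title, emb in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[: self.top_k]


# Made-up embeddings; the axes loosely stand for (economy, solar, climate).
index = ToyVectorIndex(top_k=2)
index.add("The Impact of Renewable Energy on the Economy", [0.9, 0.1, 0.3])
index.add("Understanding Solar Panels", [0.0, 1.0, 0.1])
index.add("Pros and Cons of Solar Energy", [0.1, 0.8, 0.5])
index.add("Renewable Energy and Its Effects", [0.2, 0.3, 0.9])

# A made-up query embedding that leans toward the "solar" axis.
results = index.search([0.0, 1.0, 0.2])
for title, score in results:
    print(f"{score:.3f}  {title}")
```

A real vector database replaces the linear scan in ``search`` with an index built for large collections, but the inputs (one embedding per document, one query embedding, a ``top_k`` limit) stay the same.
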
BM25Retriever
373
454
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
374
455
So the semantic search works pretty well. We will see how :class:`BM25Retriever<components.retriever.bm25_retriever.BM25Retriever>` works in comparison.