Commit dc1d8d2

[Docs] - Update Colbert doc to reflect notebook updates (#567)
* retriever
* missed-dotenv
* set-prerequisites
* testing-ok
* package
1 parent 56d9e17 commit dc1d8d2

3 files changed

+75
-40
lines changed

docs/modules/colbert/pages/index.adoc

Lines changed: 9 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -21,10 +21,17 @@ To get started using ColBERT with RAGStack and Astra DB, see the xref:examples:c
2121

2222
The `colbert` module provides a vanilla implementation for ColBERT retrieval. It is not tied to any specific framework and can be used with any of the RAGStack packages.
2323

24+
To install the `ragstack-ai-colbert` package:
25+
+
26+
[source,bash]
27+
----
28+
pip install ragstack-ai-colbert
29+
----
30+
2431
To use ColBERT with LangChain or LlamaIndex, install ColBERT as an extra:
2532

26-
* `ragstack-ai-langchain[colbert]`
27-
* `ragstack-ai-llamaindex[colbert]`
33+
* `pip install "ragstack-ai-langchain[colbert]"`
34+
* `pip install "ragstack-ai-llamaindex[colbert]"`
2835

2936
== How is ColBERT different from RAG?
3037

docs/modules/examples/pages/colbert.adoc

Lines changed: 65 additions & 37 deletions
@@ -15,48 +15,55 @@ Use ColBERT, Astra DB, and RAGStack to:
1515

1616
For more information, see xref:colbert:index.adoc[].
1717

18-
== Prerequisites
19-
2018
[NOTE]
2119
====
2220
To run ragstack-ai-colbert in a Windows environment, use https://learn.microsoft.com/en-us/windows/wsl/install[Windows Subsystem for Linux].
2321
====
2422

25-
Import the ragstack-ai-colbert package:
23+
== Prerequisites
24+
25+
. Install dependencies:
26+
+
2627
[source,bash]
2728
----
28-
pip install ragstack-ai-colbert
29+
pip install ragstack-ai-colbert python-dotenv
30+
----
31+
+
32+
. Create a `.env` file in your application directory with the following environment variables:
33+
+
34+
[source,bash]
35+
----
36+
ASTRA_DB_APPLICATION_TOKEN=AstraCS: ...
37+
ASTRA_DB_ID=2eab82dc-9032-45ba-aeb0-a481b6f9458d
2938
----
39+
+
40+
[NOTE]
41+
====
42+
In an Astra API endpoint like `https://2eab82dc-9032-45ba-aeb0-a481b6f9458d-us-east-1.apps.astra.datastax.com`, the `ASTRA_DB_ID` is `2eab82dc-9032-45ba-aeb0-a481b6f9458d`.
43+
====
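
The note above can be sketched in code. This is a minimal illustration, not part of RAGStack: the helper name `database_id_from_endpoint` is hypothetical, and it assumes the endpoint host always begins with the 36-character database UUID, as in the example URL.

```python
import re
from urllib.parse import urlparse

# A UUID is five hyphen-separated hex groups: 8-4-4-4-12 characters.
_UUID_PREFIX = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def database_id_from_endpoint(endpoint: str) -> str:
    """Return the database ID that prefixes an Astra API endpoint host.

    Hypothetical helper: assumes the host has the form
    <database-id>-<region>.apps.astra.datastax.com.
    """
    host = urlparse(endpoint).hostname or ""
    match = _UUID_PREFIX.match(host)
    if match is None:
        raise ValueError(f"No database ID found in endpoint: {endpoint!r}")
    return match.group(0)


endpoint = (
    "https://2eab82dc-9032-45ba-aeb0-a481b6f9458d"
    "-us-east-1.apps.astra.datastax.com"
)
print(database_id_from_endpoint(endpoint))
```

The printed value is what you would place in `ASTRA_DB_ID` in the `.env` file.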
3044

3145
== Prepare data and create embeddings
3246

33-
. Prepare documents for chunking.
47+
. Import dependencies and load environment variables.
3448
+
3549
[source,python]
3650
----
37-
arctic_botany_dict = {
38-
"Introduction to Arctic Botany": "Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.",
39-
"Arctic Plant Adaptations": "Plants in the Arctic have developed unique adaptations to endure the extreme climate. Perennial growth, antifreeze proteins, and a short growth cycle are among the evolutionary solutions. These adaptations not only allow the plants to survive but also to reproduce in short summer months. Arctic plants often have small, dark leaves to absorb maximum sunlight, and some species grow in cushion or mat forms to resist cold winds. Understanding these adaptations provides insights into the resilience of Arctic flora.",
40-
"The Tundra Biome": "The Arctic tundra is a vast, treeless biome where the subsoil is permanently frozen. Here, the vegetation is predominantly composed of dwarf shrubs, grasses, mosses, and lichens. The tundra supports a surprisingly rich biodiversity, adapted to its cold, dry, and windy conditions. The biome plays a crucial role in the Earth's climate system, acting as a carbon sink. However, it's sensitive to climate change, with thawing permafrost and shifting vegetation patterns.",
41-
"Arctic Plant Biodiversity": "Despite the challenging environment, the Arctic boasts a significant variety of plant species, each adapted to its niche. From the colorful blooms of Arctic poppies to the hardy dwarf willows, these plants form a complex ecosystem. The biodiversity of Arctic flora is vital for local wildlife, providing food and habitat. This diversity also has implications for Arctic peoples, who depend on certain plant species for food, medicine, and materials.",
42-
"Climate Change and Arctic Flora": "Climate change poses a significant threat to Arctic botany, with rising temperatures, melting permafrost, and changing precipitation patterns. These changes can lead to shifts in plant distribution, phenology, and the composition of the Arctic flora. Some species may thrive, while others could face extinction. This dynamic is critical to understanding future Arctic ecosystems and their global impact, including feedback loops that may exacerbate global warming.",
43-
"Research and Conservation in the Arctic": "Research in Arctic botany is crucial for understanding the intricate balance of this ecosystem and the impacts of climate change. Scientists conduct studies on plant physiology, genetics, and ecosystem dynamics. Conservation efforts are focused on protecting the Arctic's unique biodiversity through protected areas, sustainable management practices, and international cooperation. These efforts aim to preserve the Arctic flora for future generations and maintain its role in the global climate system.",
44-
"Traditional Knowledge and Arctic Botany": "Indigenous peoples of the Arctic have a deep connection with the land and its plant life. Traditional knowledge, passed down through generations, includes the uses of plants for nutrition, healing, and materials. This body of knowledge is invaluable for both conservation and understanding the ecological relationships in Arctic ecosystems. Integrating traditional knowledge with scientific research enriches our comprehension of Arctic botany and enhances conservation strategies.",
45-
"Future Directions in Arctic Botanical Studies": "The future of Arctic botany lies in interdisciplinary research, combining traditional knowledge with modern scientific techniques. As the Arctic undergoes rapid changes, understanding the ecological, cultural, and climatic dimensions of Arctic flora becomes increasingly important. Future research will need to address the challenges of climate change, explore the potential for Arctic plants in biotechnology, and continue to conserve this unique biome. The resilience of Arctic flora offers lessons in adaptation and survival relevant to global challenges."
46-
}
47-
arctic_botany_texts = list(arctic_botany_dict.values())
51+
import os
52+
import logging
53+
import nest_asyncio
54+
from dotenv import load_dotenv
55+
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore
56+
57+
load_dotenv()
4858
----
4959
+
5060
. Set up the ColBERT and Astra configurations.
5161
+
5262
[source,python]
5363
----
54-
from getpass import getpass
55-
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore
56-
5764
keyspace="default_keyspace"
58-
database_id=getpass("Enter your Astra Database Id:")
59-
astra_token=getpass("Enter your Astra Token:")
65+
database_id=os.getenv("ASTRA_DB_ID")
66+
astra_token=os.getenv("ASTRA_DB_APPLICATION_TOKEN")
6067
6168
database = CassandraDatabase.from_astra(
6269
astra_token=astra_token,
@@ -72,6 +79,24 @@ vector_store = ColbertVectorStore(
7279
)
7380
----
7481
+
82+
. Prepare documents for chunking.
83+
+
84+
[source,python]
85+
----
86+
arctic_botany_dict = {
87+
"Introduction to Arctic Botany": "Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.",
88+
"Arctic Plant Adaptations": "Plants in the Arctic have developed unique adaptations to endure the extreme climate. Perennial growth, antifreeze proteins, and a short growth cycle are among the evolutionary solutions. These adaptations not only allow the plants to survive but also to reproduce in short summer months. Arctic plants often have small, dark leaves to absorb maximum sunlight, and some species grow in cushion or mat forms to resist cold winds. Understanding these adaptations provides insights into the resilience of Arctic flora.",
89+
"The Tundra Biome": "The Arctic tundra is a vast, treeless biome where the subsoil is permanently frozen. Here, the vegetation is predominantly composed of dwarf shrubs, grasses, mosses, and lichens. The tundra supports a surprisingly rich biodiversity, adapted to its cold, dry, and windy conditions. The biome plays a crucial role in the Earth's climate system, acting as a carbon sink. However, it's sensitive to climate change, with thawing permafrost and shifting vegetation patterns.",
90+
"Arctic Plant Biodiversity": "Despite the challenging environment, the Arctic boasts a significant variety of plant species, each adapted to its niche. From the colorful blooms of Arctic poppies to the hardy dwarf willows, these plants form a complex ecosystem. The biodiversity of Arctic flora is vital for local wildlife, providing food and habitat. This diversity also has implications for Arctic peoples, who depend on certain plant species for food, medicine, and materials.",
91+
"Climate Change and Arctic Flora": "Climate change poses a significant threat to Arctic botany, with rising temperatures, melting permafrost, and changing precipitation patterns. These changes can lead to shifts in plant distribution, phenology, and the composition of the Arctic flora. Some species may thrive, while others could face extinction. This dynamic is critical to understanding future Arctic ecosystems and their global impact, including feedback loops that may exacerbate global warming.",
92+
"Research and Conservation in the Arctic": "Research in Arctic botany is crucial for understanding the intricate balance of this ecosystem and the impacts of climate change. Scientists conduct studies on plant physiology, genetics, and ecosystem dynamics. Conservation efforts are focused on protecting the Arctic's unique biodiversity through protected areas, sustainable management practices, and international cooperation. These efforts aim to preserve the Arctic flora for future generations and maintain its role in the global climate system.",
93+
"Traditional Knowledge and Arctic Botany": "Indigenous peoples of the Arctic have a deep connection with the land and its plant life. Traditional knowledge, passed down through generations, includes the uses of plants for nutrition, healing, and materials. This body of knowledge is invaluable for both conservation and understanding the ecological relationships in Arctic ecosystems. Integrating traditional knowledge with scientific research enriches our comprehension of Arctic botany and enhances conservation strategies.",
94+
"Future Directions in Arctic Botanical Studies": "The future of Arctic botany lies in interdisciplinary research, combining traditional knowledge with modern scientific techniques. As the Arctic undergoes rapid changes, understanding the ecological, cultural, and climatic dimensions of Arctic flora becomes increasingly important. Future research will need to address the challenges of climate change, explore the potential for Arctic plants in biotechnology, and continue to conserve this unique biome. The resilience of Arctic flora offers lessons in adaptation and survival relevant to global challenges."
95+
}
96+
97+
arctic_botany_texts = list(arctic_botany_dict.values())
98+
----
99+
+
75100
. Connect to Astra and ingest embeddings.
76101
+
77102
[source,python]
@@ -94,40 +119,40 @@ Python::
94119
+
95120
[source,python]
96121
----
97-
import logging
98-
import nest_asyncio
99122
nest_asyncio.apply()
100123
101-
logging.getLogger('cassandra').setLevel(logging.ERROR) # workaround to suppress logs
102-
from ragstack_colbert import ColbertRetriever
103-
retriever = ColbertRetriever(
104-
vector_store=db, embedding_model=colbert
105-
)
124+
logging.getLogger("cassandra").setLevel(logging.ERROR) # workaround to suppress logs
125+
retriever = vector_store.as_retriever()
106126
107-
answers = retriever.retrieve("What's arctic botany", k=2)
108-
for answer in answers:
109-
print(f"Rank: {answer.rank} Score: {answer.score} Text: {answer.data.text}\n")
127+
answers = retriever.text_search("What's arctic botany", k=2)
128+
for rank, (answer, score) in enumerate(answers):
129+
print(f"Rank: {rank} Score: {score} Text: {answer.text}\n")
110130
----
111-
131+
+
112132
Result::
113133
+
114134
[source,plain]
115135
----
116-
Rank: 0 Score: 5.266005039215088 Text: Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.
136+
#> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==
137+
#> Input: . What's arctic botany, True, None
138+
#> Output IDs: torch.Size([9]), tensor([ 101, 1, 2054, 1005, 1055, 2396, 2594, 17018, 102])
139+
#> Output Mask: torch.Size([9]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1])
140+
141+
Rank: 0 Score: 5.266004428267479 Text: Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.
117142
118-
Rank: 1 Score: 3.9489646703004837 Text: Research in Arctic botany is crucial for understanding the intricate balance of this ecosystem and the impacts of climate change. Scientists conduct studies on plant physiology, genetics, and ecosystem dynamics. Conservation efforts are focused on protecting the Arctic's unique biodiversity through protected areas, sustainable management practices, and international cooperation. These efforts aim to preserve the Arctic flora for future generations and maintain its role in the global climate system.
143+
Rank: 1 Score: 5.266004309058189 Text: Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.
119144
----
120145
======
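
The sample result above lists the same passage at both rank 0 and rank 1. The source does not say why; one possible cause, offered only as an assumption, is that the same texts were ingested more than once. The following sketch collapses such duplicates before display. It assumes only what the loop above shows: results are `(chunk, score)` pairs and each chunk has a `.text` attribute. `StubChunk`, `dedupe_by_text`, and the scores are made-up stand-ins, not part of `ragstack_colbert`.

```python
from dataclasses import dataclass


@dataclass
class StubChunk:
    """Hypothetical stand-in for a retrieved chunk; only `.text` is assumed."""

    text: str


def dedupe_by_text(results):
    """Keep the best-scoring entry per distinct text, highest score first."""
    best = {}
    for chunk, score in results:
        current = best.get(chunk.text)
        if current is None or score > current[1]:
            best[chunk.text] = (chunk, score)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)


# Illustrative data only; these scores are invented for the sketch.
sample = [
    (StubChunk("Arctic botany is the study of plant life in the Arctic."), 5.27),
    (StubChunk("Arctic botany is the study of plant life in the Arctic."), 5.26),
    (StubChunk("Research in Arctic botany is crucial."), 3.95),
]

for rank, (chunk, score) in enumerate(dedupe_by_text(sample)):
    print(f"Rank: {rank} Score: {score} Text: {chunk.text}")
```

In practice you would pass the list returned by `retriever.text_search(...)` instead of `sample`, provided it has the shape assumed here.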
121146

122147
== Retrieve embeddings with the LangChain retriever
123148

124149
Alternatively, use the ColBERT extra with the `ragstack-ai-langchain` package to retrieve documents.
125150

126-
. Install the RAGStack LangChain package with the ColBERT extra.
151+
. Install the RAGStack Langchain package with the ColBERT extra.
127152
+
128153
[source,bash]
129154
----
130-
pip install ragstack-ai-langchain[colbert]
155+
pip install "ragstack-ai-langchain[colbert]"
131156
----
132157
+
133158
. Run the LangChain retriever against the indexed embeddings.
@@ -145,7 +170,10 @@ lc_vector_store = LangchainColbertVectorStore(
145170
embedding_model=embedding_model,
146171
)
147172
148-
docs = lc_vector_store.similarity_search("what kind fish lives shallow coral reefs atlantic, india ocean, red sea, gulf of mexico, pacific, and arctic ocean")
173+
docs = lc_vector_store.similarity_search(
174+
"what kind fish lives shallow coral reefs atlantic, india ocean, "
175+
"red sea, gulf of mexico, pacific, and arctic ocean"
176+
)
149177
print(f"first answer: {docs[0].page_content}")
150178
----
151179

docs/modules/examples/pages/llama-astra.adoc

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ DB Access Token] with Database Administrator permissions.
2020
Install the following dependencies:
2121
[source,bash]
2222
----
23-
pip install ragstack-ai
23+
pip install ragstack-ai python-dotenv
2424
----
2525
See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details.
2626
