Commit ec2fe67

Merge pull request #4 from ydb-platform/notebook_example
Add notebook example
2 parents a03e1ba + d2517e4 commit ec2fe67

1 file changed

+354
-0
lines changed

examples/basic_example.ipynb

Lines changed: 354 additions & 0 deletions
@@ -0,0 +1,354 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "5db7cbde-9577-4d4a-9a78-304b4ab6be4f",
6+
"metadata": {},
7+
"source": [
8+
"# YDB Vector Store Example"
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"id": "97b34308-e3ee-4fea-b3cc-d56520b7cb61",
14+
"metadata": {},
15+
"source": [
16+
"## Setup\n",
17+
"\n",
18+
"First, set up a local YDB with [docker compose file](https://github.com/ydb-platform/langchain-ydb/blob/main/docker/docker-compose.yml) using command: `docker compose up -d --wait`\n",
19+
"\n"
20+
]
21+
},
22+
{
23+
"cell_type": "markdown",
24+
"id": "9cbac6d9-7aa0-4008-82d0-d393993932ac",
25+
"metadata": {},
26+
"source": [
27+
"Install `langchain-ydb` python package"
28+
]
29+
},
30+
{
31+
"cell_type": "code",
32+
"execution_count": 1,
33+
"id": "4bca83db-1051-49f4-a7dd-7b007ea454a8",
34+
"metadata": {},
35+
"outputs": [],
36+
"source": [
37+
"!pip install -qU langchain-ydb"
38+
]
39+
},
40+
{
41+
"cell_type": "markdown",
42+
"id": "6f79b8c7-7ce3-4d10-8b43-087231e3c5ff",
43+
"metadata": {},
44+
"source": [
45+
"Then prepare embeddings model to work with:"
46+
]
47+
},
48+
{
49+
"cell_type": "code",
50+
"execution_count": 2,
51+
"id": "f2404de2-7d63-4682-ba9a-4ccf48de3a90",
52+
"metadata": {},
53+
"outputs": [],
54+
"source": [
55+
"!pip install -qU langchain-huggingface\n",
56+
"\n",
57+
"from langchain_huggingface import HuggingFaceEmbeddings\n",
58+
"\n",
59+
"embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")"
60+
]
61+
},
62+
{
63+
"cell_type": "markdown",
64+
"id": "962f6371-a4c6-4188-80c3-2379740b2747",
65+
"metadata": {},
66+
"source": [
67+
"Finally, create YDB Vector Store:"
68+
]
69+
},
70+
{
71+
"cell_type": "code",
72+
"execution_count": 3,
73+
"id": "0761e64b-a65e-4170-9e81-1a318968e694",
74+
"metadata": {},
75+
"outputs": [],
76+
"source": [
77+
"from langchain_ydb.vectorstores import YDB, YDBSettings\n",
78+
"\n",
79+
"vector_store = YDB(\n",
80+
" embeddings,\n",
81+
" config=YDBSettings(\n",
82+
" table=\"langchain_ydb_example_notebook\",\n",
83+
" drop_existing_table=True,\n",
84+
" ),\n",
85+
")"
86+
]
87+
},
88+
{
89+
"cell_type": "markdown",
90+
"id": "9a52b3b1-3443-40ec-9bda-5214ed554a6f",
91+
"metadata": {},
92+
"source": [
93+
"## Operations with YDB Vector Store"
94+
]
95+
},
96+
{
97+
"cell_type": "markdown",
98+
"id": "5d9b41bd-b124-45c7-be47-fa8467ed508a",
99+
"metadata": {},
100+
"source": [
101+
"Prepare data to work with:"
102+
]
103+
},
104+
{
105+
"cell_type": "code",
106+
"execution_count": 4,
107+
"id": "bacab9ef-2f2b-4639-ad0b-e236240c4fa2",
108+
"metadata": {},
109+
"outputs": [],
110+
"source": [
111+
"data = [\n",
112+
" (\n",
113+
" \"The Earth revolves around the Sun once every 365.25 days.\",\n",
114+
" {\"category\": \"astronomy\"}\n",
115+
" ),\n",
116+
" (\n",
117+
" \"Water boils at 100 degrees Celsius at standard atmospheric pressure.\",\n",
118+
" {\"category\": \"science\"}\n",
119+
" ),\n",
120+
" (\n",
121+
" \"Light travels at approximately 299,792 kilometers per second in a vacuum.\",\n",
122+
" {\"category\": \"science\"}\n",
123+
" ),\n",
124+
" (\n",
125+
" \"The Great Wall of China is over 13,000 miles long.\",\n",
126+
" {\"category\": \"history\"}\n",
127+
" ),\n",
128+
" (\n",
129+
" \"Mount Everest is the highest mountain in the world, standing at 29,032 feet.\",\n",
130+
" {\"category\": \"geography\"}\n",
131+
" ),\n",
132+
" (\n",
133+
" \"The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 \"\n",
134+
" \"million square kilometers.\",\n",
135+
" {\"category\": \"geography\"}\n",
136+
" ),\n",
137+
" (\n",
138+
" \"The human body contains 206 bones.\",\n",
139+
" {\"category\": \"biology\"}\n",
140+
" ),\n",
141+
" (\n",
142+
" \"The Pacific Ocean is the largest ocean on Earth, covering more than \"\n",
143+
" \"63 million square miles.\",\n",
144+
" {\"category\": \"geography\"}\n",
145+
" ),\n",
146+
" (\n",
147+
" \"The speed of sound in air is around 343 meters per second at \"\n",
148+
" \"room temperature.\",\n",
149+
" {\"category\": \"science\"}\n",
150+
" ),\n",
151+
" (\n",
152+
" \"A leap year occurs every four years to help synchronize the calendar year \"\n",
153+
" \"with the solar year.\",\n",
154+
" {\"category\": \"astronomy\"}\n",
155+
" ),\n",
156+
" (\n",
157+
" \"The cheetah is the fastest land animal, capable of running up to 75 miles per \"\n",
158+
" \"hour.\",\n",
159+
" {\"category\": \"biology\"}\n",
160+
" ),\n",
161+
" (\n",
162+
" \"Venus is the hottest planet in our solar system, with surface temperatures of \"\n",
163+
" \"around 467 degrees Celsius.\",\n",
164+
" {\"category\": \"astronomy\"}\n",
165+
" ),\n",
166+
" (\n",
167+
" \"Honey never spoils. Archaeologists have found pots of honey in \"\n",
168+
" \"ancient Egyptian tombs that are over 3,000 years old and still edible.\",\n",
169+
" {\"category\": \"history\"}\n",
170+
" ),\n",
171+
" (\n",
172+
" \"The heart of a resting adult pumps about 70 milliliters of blood per beat.\",\n",
173+
" {\"category\": \"biology\"}\n",
174+
" ),\n",
175+
" (\n",
176+
" \"The blue whale is the largest animal on Earth, growing up to \"\n",
177+
" \"100 feet long and weighing as much as 200 tons.\",\n",
178+
" {\"category\": \"biology\"}\n",
179+
" ),\n",
180+
" (\n",
181+
" \"The Eiffel Tower in Paris was completed in 1889 and was the tallest structure \"\n",
182+
" \"in the world until 1930.\",\n",
183+
" {\"category\": \"history\"}\n",
184+
" ),\n",
185+
" (\n",
186+
" \"Sharks have been around for over 400 million years, surviving several mass \"\n",
187+
" \"extinction events.\",\n",
188+
" {\"category\": \"biology\"}\n",
189+
" ),\n",
190+
" (\n",
191+
" \"Bananas are berries, while strawberries are not. Botanically, berries \"\n",
192+
" \"come from the ovary of a single flower with seeds embedded in the flesh.\",\n",
193+
" {\"category\": \"biology\"}\n",
194+
" ),\n",
195+
" (\n",
196+
" \"Tokyo is the most populous city in the world, with a population of over 37 \"\n",
197+
" \"million people in the metropolitan area.\",\n",
198+
" {\"category\": \"geography\"}\n",
199+
" ),\n",
200+
" (\n",
201+
" \"The Mona Lisa, painted by Leonardo da Vinci, is one of the most famous \"\n",
202+
" \"works of art and is displayed in the Louvre Museum in Paris.\",\n",
203+
" {\"category\": \"art\"}\n",
204+
" )\n",
205+
"]\n",
206+
"\n",
207+
"\n",
208+
"texts = [row[0] for row in data]\n",
209+
"metadatas = [row[1] for row in data]\n"
210+
]
211+
},
212+
{
213+
"cell_type": "markdown",
214+
"id": "875b12e6-400c-4547-bd01-a4722da0e380",
215+
"metadata": {},
216+
"source": [
217+
"Insert this data to vector store:"
218+
]
219+
},
220+
{
221+
"cell_type": "code",
222+
"execution_count": 5,
223+
"id": "89786747-1b63-49b0-bb70-e47916b751e5",
224+
"metadata": {},
225+
"outputs": [
226+
{
227+
"name": "stderr",
228+
"output_type": "stream",
229+
"text": [
230+
"Inserting data...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 11.76it/s]\n"
231+
]
232+
}
233+
],
234+
"source": [
235+
"ids = vector_store.add_texts(texts, metadatas)"
236+
]
237+
},
238+
{
239+
"cell_type": "markdown",
240+
"id": "78c12e5c-c022-495d-bbf3-de3ae07e91b0",
241+
"metadata": {},
242+
"source": [
243+
"Similarity search:"
244+
]
245+
},
246+
{
247+
"cell_type": "code",
248+
"execution_count": 6,
249+
"id": "c723670f-c562-4e82-a383-2802fc781141",
250+
"metadata": {},
251+
"outputs": [
252+
{
253+
"data": {
254+
"text/plain": [
255+
"[Document(metadata={'category': 'geography'}, page_content='Tokyo is the most populous city in the world, with a population of over 37 million people in the metropolitan area.'),\n",
256+
" Document(metadata={'category': 'history'}, page_content='The Great Wall of China is over 13,000 miles long.')]"
257+
]
258+
},
259+
"execution_count": 6,
260+
"metadata": {},
261+
"output_type": "execute_result"
262+
}
263+
],
264+
"source": [
265+
"vector_store.similarity_search(\"Any facts about Tokyo?\", k=2)"
266+
]
267+
},
268+
{
269+
"cell_type": "markdown",
270+
"id": "d9936976-15aa-41fd-aab1-4a986493fa1c",
271+
"metadata": {},
272+
"source": [
273+
"Similarity search with score:"
274+
]
275+
},
276+
{
277+
"cell_type": "code",
278+
"execution_count": 7,
279+
"id": "4103cb60-cf28-4c8c-8a82-c4b573801efd",
280+
"metadata": {},
281+
"outputs": [
282+
{
283+
"name": "stdout",
284+
"output_type": "stream",
285+
"text": [
286+
"[SIM=0.508] biology \t | The blue whale is the largest animal on Earth, growing up to 100 feet long and weighing as much as 200 tons.\n",
287+
"[SIM=0.373] history \t | The Great Wall of China is over 13,000 miles long.\n",
288+
"[SIM=0.339] geography \t | The Pacific Ocean is the largest ocean on Earth, covering more than 63 million square miles.\n",
289+
"[SIM=0.305] geography \t | The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 million square kilometers.\n"
290+
]
291+
}
292+
],
293+
"source": [
294+
"result = vector_store.similarity_search_with_score(\"What objects are huge?\", k=4)\n",
295+
"for res, score in result:\n",
296+
" print(f\"[SIM={score:.3f}] {res.metadata['category']} \\t | {res.page_content}\")"
297+
]
298+
},
299+
{
300+
"cell_type": "markdown",
301+
"id": "e8f05729-d876-4758-ad75-26593d326b67",
302+
"metadata": {},
303+
"source": [
304+
"Similarity search with score and filter:"
305+
]
306+
},
307+
{
308+
"cell_type": "code",
309+
"execution_count": 8,
310+
"id": "ea643668-d5f8-4ac6-8fda-51a81f886cc2",
311+
"metadata": {},
312+
"outputs": [
313+
{
314+
"name": "stdout",
315+
"output_type": "stream",
316+
"text": [
317+
"[SIM=0.339] geography \t | The Pacific Ocean is the largest ocean on Earth, covering more than 63 million square miles.\n",
318+
"[SIM=0.305] geography \t | The Amazon Rainforest is the largest tropical rainforest, covering over 5.5 million square kilometers.\n",
319+
"[SIM=0.265] geography \t | Mount Everest is the highest mountain in the world, standing at 29,032 feet.\n",
320+
"[SIM=0.234] geography \t | Tokyo is the most populous city in the world, with a population of over 37 million people in the metropolitan area.\n"
321+
]
322+
}
323+
],
324+
"source": [
325+
"result = vector_store.similarity_search_with_score(\n",
326+
" \"What objects are huge?\", filter={\"category\":\"geography\"}\n",
327+
")\n",
328+
"for res, score in result:\n",
329+
" print(f\"[SIM={score:.3f}] {res.metadata['category']} \\t | {res.page_content}\")"
330+
]
331+
}
332+
],
333+
"metadata": {
334+
"kernelspec": {
335+
"display_name": "Python 3 (ipykernel)",
336+
"language": "python",
337+
"name": "python3"
338+
},
339+
"language_info": {
340+
"codemirror_mode": {
341+
"name": "ipython",
342+
"version": 3
343+
},
344+
"file_extension": ".py",
345+
"mimetype": "text/x-python",
346+
"name": "python",
347+
"nbconvert_exporter": "python",
348+
"pygments_lexer": "ipython3",
349+
"version": "3.13.2"
350+
}
351+
},
352+
"nbformat": 4,
353+
"nbformat_minor": 5
354+
}
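
The notebook's last two cells call `similarity_search_with_score` with and without a `filter`. As a rough illustration of what such a call does conceptually, the sketch below ranks a few toy rows in plain Python. It assumes cosine similarity as the score, which the notebook does not confirm; the 2-D vectors, the `search_with_score` helper, and its `metadata_filter` parameter (standing in for the `filter` argument) are all hypothetical. It is not the `langchain-ydb` implementation, which runs the search inside YDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_with_score(query_vec, rows, k=4, metadata_filter=None):
    """Return the k rows most similar to query_vec as (text, metadata, score).

    rows is a list of (text, metadata, vector) tuples. When metadata_filter
    is given, only rows whose metadata matches every key/value pair are kept.
    """
    candidates = [
        (text, meta, cosine_similarity(query_vec, vec))
        for text, meta, vec in rows
        if metadata_filter is None
        or all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    # Highest similarity first, then keep the top k.
    candidates.sort(key=lambda item: item[2], reverse=True)
    return candidates[:k]


# Toy 2-D "embeddings" (made up for illustration, not real model output).
rows = [
    ("Pacific Ocean", {"category": "geography"}, [1.0, 0.0]),
    ("Blue whale", {"category": "biology"}, [0.9, 0.1]),
    ("Mona Lisa", {"category": "art"}, [0.0, 1.0]),
    ("Mount Everest", {"category": "geography"}, [0.6, 0.8]),
]
query = [1.0, 0.0]

top = search_with_score(query, rows, k=2)
geo = search_with_score(query, rows, k=2, metadata_filter={"category": "geography"})

for text, meta, score in geo:
    print(f"[SIM={score:.3f}] {meta['category']} \t | {text}")
```

As in the notebook's output, the filter narrows the candidate set before ranking, so a lower-scoring geography row can appear in the filtered result even though a biology row outranks it in the unfiltered one.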
