Commit bb2c2fd

mpb159753 and eyurtsev authored
docs: Add openGauss vector store documentation (#30742)
Hey LangChain community! 👋 Excited to propose official documentation for our new openGauss integration that brings powerful vector capabilities to the stack!

### What's Inside 📦

1. **Full Integration Guide**
   Introducing [langchain-opengauss](https://pypi.org/project/langchain-opengauss/) on PyPI - your new toolkit for:
   - 🔍 Native hybrid search (vectors + metadata)
   - 🚀 Production-grade connection pooling
   - 🧩 Automatic schema management
2. **Rigorous Testing Passed** ✅
   ![Benchmark Results](https://github.com/user-attachments/assets/ae3b21f7-aeea-4ae7-a142-f2aec57936a0)
   - 100% non-async test coverage

ps: Current implementation resides in my personal repository: https://github.com/mpb159753/langchain-opengauss, How can I transfer process to langchain-ai org??

*Keen to hear your thoughts and make this integration shine!* ✨

---------

Co-authored-by: Eugene Yurtsev <[email protected]>
Co-authored-by: Eugene Yurtsev <[email protected]>
1 parent 913c896 commit bb2c2fd

3 files changed: +397 -0 lines changed
Lines changed: 373 additions & 0 deletions
@@ -0,0 +1,373 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "raw",
5+
"id": "1957f5cb",
6+
"metadata": {},
7+
"source": [
8+
"---\n",
9+
"sidebar_label: openGauss\n",
10+
"---"
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"id": "ef1f0986",
16+
"metadata": {},
17+
"source": [
18+
"# openGauss VectorStore\n",
19+
"\n",
20+
"This notebook covers how to get started with the openGauss VectorStore. [openGauss](https://opengauss.org/en/) is a high-performance relational database with native vector storage and retrieval capabilities. This integration enables ACID-compliant vector operations within LangChain applications, combining traditional SQL functionality with modern AI-driven similarity search."
22+
]
23+
},
24+
{
25+
"cell_type": "markdown",
26+
"id": "36fdc060",
27+
"metadata": {},
28+
"source": [
29+
"## Setup\n",
30+
"\n",
31+
"### Launch openGauss Container"
32+
]
33+
},
34+
{
35+
"metadata": {},
36+
"cell_type": "markdown",
37+
"source": [
38+
"```bash\n",
39+
"docker run --name opengauss \\\n",
40+
" -d \\\n",
41+
" -e GS_PASSWORD='MyStrongPass@123' \\\n",
42+
" -p 8888:5432 \\\n",
43+
" opengauss/opengauss-server:latest\n",
44+
"```"
45+
],
46+
"id": "e006fdc593107ef5"
47+
},
48+
{
49+
"cell_type": "markdown",
50+
"id": "a51b3f07b83b8a1d",
51+
"metadata": {},
52+
"source": "### Install langchain-opengauss"
53+
},
54+
{
55+
"cell_type": "markdown",
56+
"id": "ad030f666e228cc8",
57+
"metadata": {},
58+
"source": [
59+
"```bash\n",
60+
"pip install langchain-opengauss\n",
61+
"```"
62+
]
63+
},
64+
{
65+
"cell_type": "markdown",
66+
"id": "4d14f2f5f8ab0df7",
67+
"metadata": {},
68+
"source": [
69+
"**System Requirements**:\n",
70+
"- openGauss ≥ 7.0.0\n",
71+
"- Python ≥ 3.8\n",
72+
"- psycopg2-binary"
73+
]
74+
},
75+
{
76+
"cell_type": "markdown",
77+
"id": "9695dee7",
78+
"metadata": {},
79+
"source": [
80+
"### Credentials\n",
81+
"\n",
82+
"Connect using your openGauss credentials, for example the `GS_PASSWORD` value set when launching the container."
83+
]
84+
},
85+
{
86+
"cell_type": "markdown",
87+
"id": "93df377e",
88+
"metadata": {},
89+
"source": [
90+
"## Initialization\n",
91+
"\n",
92+
"import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n",
93+
"\n",
94+
"<EmbeddingTabs/>"
95+
]
96+
},
97+
{
98+
"cell_type": "code",
99+
"execution_count": null,
100+
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
101+
"metadata": {
102+
"tags": []
103+
},
104+
"outputs": [],
105+
"source": [
106+
"from langchain_opengauss import OpenGauss, OpenGaussSettings\n",
107+
"\n",
108+
"# Configure with schema validation\n",
109+
"config = OpenGaussSettings(\n",
110+
" table_name=\"test_langchain\",\n",
111+
" embedding_dimension=384,\n",
112+
" index_type=\"HNSW\",\n",
113+
" distance_strategy=\"COSINE\",\n",
114+
")\n",
115+
"vector_store = OpenGauss(embedding=embeddings, config=config)"
116+
]
117+
},
118+
{
119+
"cell_type": "markdown",
120+
"id": "ac6071d4",
121+
"metadata": {},
122+
"source": [
123+
"## Manage vector store\n",
124+
"\n",
125+
"### Add items to vector store\n"
126+
]
127+
},
128+
{
129+
"cell_type": "code",
130+
"execution_count": null,
131+
"id": "17f5efc0",
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"from langchain_core.documents import Document\n",
136+
"\n",
137+
"document_1 = Document(page_content=\"foo\", metadata={\"source\": \"https://example.com\"})\n",
138+
"\n",
139+
"document_2 = Document(page_content=\"bar\", metadata={\"source\": \"https://example.com\"})\n",
140+
"\n",
141+
"document_3 = Document(page_content=\"baz\", metadata={\"source\": \"https://example.com\"})\n",
142+
"\n",
143+
"documents = [document_1, document_2, document_3]\n",
144+
"\n",
145+
"vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
146+
]
147+
},
148+
{
149+
"cell_type": "markdown",
150+
"id": "c738c3e0",
151+
"metadata": {},
152+
"source": "### Update items in vector store\n"
153+
},
154+
{
155+
"cell_type": "code",
156+
"execution_count": null,
157+
"id": "f0aa8b71",
158+
"metadata": {},
159+
"outputs": [],
160+
"source": [
161+
"updated_document = Document(\n",
162+
" page_content=\"qux\", metadata={\"source\": \"https://another-example.com\"}\n",
163+
")\n",
164+
"\n",
165+
"# If the ID already exists, the existing document is updated\n",
166+
"vector_store.add_documents(documents=[updated_document], ids=[\"1\"])"
167+
]
168+
},
169+
{
170+
"cell_type": "markdown",
171+
"id": "dcf1b905",
172+
"metadata": {},
173+
"source": "### Delete items from vector store\n"
174+
},
175+
{
176+
"cell_type": "code",
177+
"execution_count": null,
178+
"id": "ef61e188",
179+
"metadata": {},
180+
"outputs": [],
181+
"source": [
182+
"vector_store.delete(ids=[\"3\"])"
183+
]
184+
},
185+
{
186+
"cell_type": "markdown",
187+
"id": "c3620501",
188+
"metadata": {},
189+
"source": [
190+
"## Query vector store\n",
191+
"\n",
192+
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.\n",
193+
"\n",
194+
"### Query directly\n",
195+
"\n",
196+
"Performing a simple similarity search can be done as follows:"
199+
]
200+
},
201+
{
202+
"cell_type": "code",
203+
"execution_count": null,
204+
"id": "aa0a16fa",
205+
"metadata": {},
206+
"outputs": [],
207+
"source": [
208+
"results = vector_store.similarity_search(\n",
209+
" query=\"thud\", k=1, filter={\"source\": \"https://another-example.com\"}\n",
210+
")\n",
211+
"for doc in results:\n",
212+
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
213+
]
214+
},
215+
{
216+
"cell_type": "markdown",
217+
"id": "3ed9d733",
218+
"metadata": {},
219+
"source": "If you want to execute a similarity search and receive the corresponding scores you can run:\n"
220+
},
221+
{
222+
"cell_type": "code",
223+
"execution_count": null,
224+
"id": "5efd2eaa",
225+
"metadata": {},
226+
"outputs": [],
227+
"source": [
228+
"results = vector_store.similarity_search_with_score(\n",
229+
" query=\"thud\", k=1, filter={\"source\": \"https://example.com\"}\n",
230+
")\n",
231+
"for doc, score in results:\n",
232+
"    print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
233+
]
234+
},
235+
{
236+
"cell_type": "markdown",
237+
"id": "0c235cdc",
238+
"metadata": {},
239+
"source": [
240+
"### Query by turning into retriever\n",
241+
"\n",
242+
"You can also transform the vector store into a retriever for easier usage in your chains."
245+
]
246+
},
247+
{
248+
"cell_type": "code",
249+
"execution_count": null,
250+
"id": "f3460093",
251+
"metadata": {},
252+
"outputs": [],
253+
"source": [
254+
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 1})\n",
255+
"retriever.invoke(\"thud\")"
256+
]
257+
},
258+
{
259+
"cell_type": "markdown",
260+
"id": "901c75dc",
261+
"metadata": {},
262+
"source": [
263+
"## Usage for retrieval-augmented generation\n",
264+
"\n",
265+
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
266+
"\n",
267+
"- [Tutorials](/docs/tutorials/)\n",
268+
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
269+
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
270+
]
271+
},
272+
{
273+
"cell_type": "markdown",
274+
"id": "069f1b5f",
275+
"metadata": {},
276+
"source": [
277+
"## Configuration\n",
278+
"\n",
279+
"### Connection Settings\n",
280+
"| Parameter | Default | Description |\n",
281+
"|---------------------|-------------------------|--------------------------------------------------------|\n",
282+
"| `host` | localhost | Database server address |\n",
283+
"| `port` | 8888 | Database connection port |\n",
284+
"| `user` | gaussdb | Database username |\n",
285+
"| `password` | - | Complex password string |\n",
286+
"| `database` | postgres | Default database name |\n",
287+
"| `min_connections` | 1 | Connection pool minimum size |\n",
288+
"| `max_connections` | 5 | Connection pool maximum size |\n",
289+
"| `table_name` | langchain_docs | Name of the table for storing vector data and metadata |\n",
290+
"| `index_type`        | IndexType.HNSW          | Vector index algorithm type. Options: HNSW or IVFFLAT. Default is HNSW. |\n",
291+
"| `vector_type`       | VectorType.vector       | Type of vector representation to use. Default is Vector. |\n",
292+
"| `distance_strategy` | DistanceStrategy.COSINE | Vector similarity metric to use for retrieval. Options: euclidean (L2 distance), cosine (angular distance, ideal for text embeddings), manhattan (L1 distance for sparse data), negative_inner_product (dot product for normalized vectors). Default is cosine. |\n",
293+
"| `embedding_dimension` | 1536                  | Dimensionality of the vector embeddings. |\n",
294+
"\n",
295+
"### Supported Combinations\n",
296+
"\n",
297+
"| Vector Type | Dimensions | Index Types | Supported Distance Strategies |\n",
298+
"|-------------|------------|--------------|---------------------------------------|\n",
299+
"| vector      | ≤2000      | HNSW/IVFFLAT | COSINE/EUCLIDEAN/MANHATTAN/INNER_PROD |\n",
300+
"\n"
301+
]
302+
},
303+
{
304+
"cell_type": "markdown",
305+
"id": "6a7b7b7c4f5a03e1",
306+
"metadata": {},
307+
"source": [
308+
"## Performance Optimization\n",
309+
"\n",
310+
"### Index Tuning Guidelines\n",
311+
"**HNSW Parameters**:\n",
312+
"- `m`: 16-100 (balance between recall and memory)\n",
313+
"- `ef_construction`: 64-1000 (must be > 2*m)\n",
314+
"\n",
315+
"**IVFFLAT Recommendations**:\n",
316+
"```python\n",
317+
"import math\n",
318+
"\n",
319+
"lists = min(\n",
320+
" int(math.sqrt(total_rows)) if total_rows > 1e6 else int(total_rows / 1000),\n",
321+
" 2000, # openGauss maximum\n",
322+
")\n",
323+
"```\n",
324+
"\n",
325+
"### Connection Pooling\n",
326+
"```python\n",
327+
"OpenGaussSettings(min_connections=3, max_connections=20)\n",
328+
"```\n"
329+
]
330+
},
331+
{
332+
"cell_type": "markdown",
333+
"id": "6b581b499ffed641",
334+
"metadata": {},
335+
"source": [
336+
"## Limitations\n",
337+
"- `bit` and `sparsevec` vector types are currently in development\n",
338+
"- Maximum vector dimensions: 2000 for `vector` type"
339+
]
340+
},
341+
{
342+
"cell_type": "markdown",
343+
"id": "8a27244f",
344+
"metadata": {},
345+
"source": [
346+
"## API reference\n",
347+
"\n",
348+
"For detailed documentation of all openGauss VectorStore features and configurations, head to the API reference: https://python.langchain.com/api_reference/en/latest/vectorstores/opengauss.OpenGuass.html"
349+
]
350+
}
351+
],
352+
"metadata": {
353+
"kernelspec": {
354+
"display_name": "Python 3 (ipykernel)",
355+
"language": "python",
356+
"name": "python3"
357+
},
358+
"language_info": {
359+
"codemirror_mode": {
360+
"name": "ipython",
361+
"version": 3
362+
},
363+
"file_extension": ".py",
364+
"mimetype": "text/x-python",
365+
"name": "python",
366+
"nbconvert_exporter": "python",
367+
"pygments_lexer": "ipython3",
368+
"version": "3.10.12"
369+
}
370+
},
371+
"nbformat": 4,
372+
"nbformat_minor": 5
373+
}
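
A note on the notebook's "Index Tuning Guidelines" cell: the IVFFLAT `lists` heuristic and the HNSW parameter ranges listed there could be wrapped in small helpers along the lines below. This is a minimal sketch, not part of `langchain-opengauss`; the names `recommended_lists`, `check_hnsw_params`, and `MAX_LISTS` are hypothetical, and clamping the result to at least 1 is an added assumption that the notebook does not state.

```python
import math

# Upper bound taken from the "openGauss maximum" comment in the notebook.
MAX_LISTS = 2000


def recommended_lists(total_rows: int) -> int:
    """Return an IVFFLAT `lists` value following the notebook's heuristic."""
    if total_rows > 1_000_000:
        # Large tables: square root of the row count.
        lists = int(math.sqrt(total_rows))
    else:
        # Smaller tables: one list per 1000 rows.
        lists = total_rows // 1000
    # Assumption: never return fewer than 1 list, since the raw
    # formula yields 0 for tables with fewer than 1000 rows.
    return max(1, min(lists, MAX_LISTS))


def check_hnsw_params(m: int, ef_construction: int) -> bool:
    """Return True if the pair sits inside the ranges listed in the notebook."""
    return (
        16 <= m <= 100
        and 64 <= ef_construction <= 1000
        and ef_construction > 2 * m
    )
```

For example, `recommended_lists(500_000)` follows the rows-per-thousand branch, while any table large enough to exceed the cap is held at `MAX_LISTS`. Keeping the checks as plain functions makes it easy to validate values before passing them to whatever index options the installed package version accepts.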

docs/scripts/vectorstore_feat_table.py

Lines changed: 11 additions & 0 deletions
@@ -140,6 +140,17 @@ def get_vectorstore_table():
140140
"Local/Cloud": "Local",
141141
"IDs in add Documents": True,
142142
},
143+
"openGauss": {
144+
"Delete by ID": True,
145+
"Filtering": True,
146+
"similarity_search_by_vector": True,
147+
"similarity_search_with_score": True,
148+
"asearch": False,
149+
"Passes Standard Tests": True,
150+
"Multi Tenancy": False,
151+
"Local/Cloud": "Local",
152+
"IDs in add Documents": True,
153+
},
143154
"QdrantVectorStore": {
144155
"Delete by ID": True,
145156
"Filtering": True,
