This repository was archived by the owner on Nov 10, 2025. It is now read-only.

Commit 55b59ef

capemox, AayushTyagi1, and github-actions[bot] authored
Add Couchbase as a tool (#264)
* - Added CouchbaseFTSVectorStore as a CrewAI tool.
  - Wrote a README to setup the tool.
  - Wrote test cases.
  - Added Couchbase as an optional dependency in the project.

* Fixed naming in some places. Added docstrings. Added instructions on how to create a vector search index.

* Fixed pyproject.toml

* error handling and response format
  - Removed unnecessary ImportError for missing 'couchbase' package.
  - Changed response format from a concatenated string to a JSON array for search results.
  - Updated error handling to return error messages instead of raising exceptions in certain cases.
  - Adjusted tests to reflect changes in response format and error handling.

* Update dependencies in pyproject.toml and uv.lock
  - Changed pydantic version from 2.6.1 to 2.10.6 in both pyproject.toml and uv.lock.
  - Updated crewai-tools version from 0.42.2 to 0.42.3 in uv.lock.
  - Adjusted pydantic-core version from 2.33.1 to 2.27.2 in uv.lock, reflecting the new pydantic version.

* Removed restrictive pydantic version and updated uv.lock
* synced lockfile
* regenerated lockfile
* updated lockfile
* regenerated lockfile
* Update tool specifications for
* Fix test cases

---------

Co-authored-by: AayushTyagi1 <tyagiaayush5@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent a3a5bdc commit 55b59ef

7 files changed, +887 -29 lines changed


crewai_tools/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
     CodeDocsSearchTool,
     CodeInterpreterTool,
     ComposioTool,
+    CouchbaseFTSVectorSearchTool,
     CrewaiEnterpriseTools,
     CSVSearchTool,
     DallETool,

crewai_tools/tools/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 from .code_docs_search_tool.code_docs_search_tool import CodeDocsSearchTool
 from .code_interpreter_tool.code_interpreter_tool import CodeInterpreterTool
 from .composio_tool.composio_tool import ComposioTool
+from .couchbase_tool.couchbase_tool import CouchbaseFTSVectorSearchTool
 from .crewai_enterprise_tools.crewai_enterprise_tools import CrewaiEnterpriseTools
 from .csv_search_tool.csv_search_tool import CSVSearchTool
 from .dalle_tool.dalle_tool import DallETool
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# CouchbaseFTSVectorSearchTool
## Description
Couchbase is a NoSQL database with vector search capabilities. Users can store and query vector embeddings. You can learn more about Couchbase vector search here: https://docs.couchbase.com/cloud/vector-search/vector-search.html

This tool performs semantic search using Couchbase. Use it to find documents that are semantically similar to a given query.

## Installation
Install the crewai_tools package by executing the following command in your terminal:

```shell
uv pip install 'crewai[tools]'
```

## Setup
Before instantiating the tool, you need a Couchbase cluster. Use one of the following options:
- Create a cluster on [Couchbase Capella](https://docs.couchbase.com/cloud/get-started/create-account.html), Couchbase's cloud database solution.
- Create a [local Couchbase server](https://docs.couchbase.com/server/current/getting-started/start-here.html).

You will need to create a bucket, scope and collection on the cluster. Then, [follow this guide](https://docs.couchbase.com/python-sdk/current/hello-world/start-using-sdk.html) to create a Couchbase Cluster object and load documents into your collection.

Follow the docs below to create a vector search index on Couchbase.
- [Create a vector search index on Couchbase Capella.](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html)
- [Create a vector search index on your local Couchbase server.](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html)

Ensure that the `Dimension` field in the index matches the embedding model. For example, OpenAI's `text-embedding-3-small` model has an embedding dimension of 1536, so the `Dimension` field in the index must be 1536.

## Example
To use the CouchbaseFTSVectorSearchTool, follow this example:

```python
from crewai import Agent
from crewai_tools import CouchbaseFTSVectorSearchTool

# Instantiate a Couchbase Cluster object from the Couchbase SDK.
# `cluster` and `embed_fn` are assumed to be defined before this point.

tool = CouchbaseFTSVectorSearchTool(
    cluster=cluster,
    collection_name="collection",
    scope_name="scope",
    bucket_name="bucket",
    index_name="index",
    embedding_function=embed_fn
)

# Adding the tool to an agent
rag_agent = Agent(
    name="rag_agent",
    role="You are a helpful assistant that can answer questions with the help of the CouchbaseFTSVectorSearchTool.",
    llm="gpt-4o-mini",
    tools=[tool],
)
```

## Arguments
- `cluster`: An initialized Couchbase `Cluster` instance.
- `bucket_name`: The name of the Couchbase bucket.
- `scope_name`: The name of the scope within the bucket.
- `collection_name`: The name of the collection within the scope.
- `index_name`: The name of the search index (vector index).
- `embedding_function`: A function that takes a string and returns its embedding (list of floats).
- `embedding_key`: Name of the field in the search index storing the vector. (Optional, defaults to 'embedding')
- `scoped_index`: Whether the index is scoped (True) or cluster-level (False). (Optional, defaults to True)
- `limit`: The maximum number of search results to return. (Optional, defaults to 3)
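
The `embedding_function` argument can be any callable that maps a string to a list of floats whose length equals the `Dimension` configured in the index. As a minimal sketch of that contract, the following uses a toy hash-based embedding built only from the standard library. The names `toy_embed` and `DIMENSION` are hypothetical, and the resulting vectors carry no semantic meaning, so a real embedding model should be substituted in practice.

```python
import hashlib
from typing import List

DIMENSION = 1536  # assumption: must equal the `Dimension` set in the search index


def toy_embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: deterministic, but not semantic."""
    values: List[float] = []
    counter = 0
    while len(values) < DIMENSION:
        # Hash the text together with a counter to produce as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) to a float in [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return values[:DIMENSION]


vector = toy_embed("What is Couchbase?")
print(len(vector))
```

A function with this shape can be passed as `embedding_function=toy_embed` to check the wiring end to end before switching to a real model.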
Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
import json
import os
from typing import Any, Optional, Type, List, Dict, Callable

try:
    import couchbase.search as search
    from couchbase.cluster import Cluster
    from couchbase.options import SearchOptions
    from couchbase.vector_search import VectorQuery, VectorSearch

    COUCHBASE_AVAILABLE = True
except ImportError:
    COUCHBASE_AVAILABLE = False
    search = Any
    Cluster = Any
    SearchOptions = Any
    VectorQuery = Any
    VectorSearch = Any

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, SkipValidation


class CouchbaseToolSchema(BaseModel):
    """Input for CouchbaseTool."""

    query: str = Field(
        ...,
        description="The query used to retrieve relevant information from the Couchbase database. Pass only the query, not the question.",
    )

class CouchbaseFTSVectorSearchTool(BaseTool):
    """Tool to search the Couchbase database"""

    model_config = {"arbitrary_types_allowed": True}
    name: str = "CouchbaseFTSVectorSearchTool"
    description: str = "A tool to search the Couchbase database for relevant information on internal documents."
    args_schema: Type[BaseModel] = CouchbaseToolSchema
    cluster: SkipValidation[Optional[Cluster]] = None
    # No trailing commas here: `= None,` would make the default the tuple (None,).
    collection_name: Optional[str] = None
    scope_name: Optional[str] = None
    bucket_name: Optional[str] = None
    index_name: Optional[str] = None
    embedding_key: Optional[str] = Field(
        default="embedding",
        description="Name of the field in the search index that stores the vector"
    )
    scoped_index: Optional[bool] = Field(
        default=True,
        description="Specify whether the index is scoped. Is True by default."
    )
    limit: Optional[int] = Field(default=3)
    embedding_function: SkipValidation[Callable[[str], List[float]]] = Field(
        default=None,
        description="A function that takes a string and returns a list of floats. This is used to embed the query before searching the database."
    )

    def _check_bucket_exists(self) -> bool:
        """Check if the bucket exists in the linked Couchbase cluster"""
        bucket_manager = self.cluster.buckets()
        try:
            bucket_manager.get_bucket(self.bucket_name)
            return True
        except Exception:
            return False

    def _check_scope_and_collection_exists(self) -> bool:
        """Check if the scope and collection exist in the linked Couchbase bucket
        Raises a ValueError if either is not found"""
        scope_collection_map: Dict[str, Any] = {}

        # Get a list of all scopes in the bucket
        for scope in self._bucket.collections().get_all_scopes():
            scope_collection_map[scope.name] = []

            # Get a list of all the collections in the scope
            for collection in scope.collections:
                scope_collection_map[scope.name].append(collection.name)

        # Check if the scope exists
        if self.scope_name not in scope_collection_map.keys():
            raise ValueError(
                f"Scope {self.scope_name} not found in Couchbase "
                f"bucket {self.bucket_name}"
            )

        # Check if the collection exists in the scope
        if self.collection_name not in scope_collection_map[self.scope_name]:
            raise ValueError(
                f"Collection {self.collection_name} not found in scope "
                f"{self.scope_name} in Couchbase bucket {self.bucket_name}"
            )

        return True

    def _check_index_exists(self) -> bool:
        """Check if the Search index exists in the linked Couchbase cluster
        Raises a ValueError if the index does not exist"""
        if self.scoped_index:
            all_indexes = [
                index.name for index in self._scope.search_indexes().get_all_indexes()
            ]
            if self.index_name not in all_indexes:
                raise ValueError(
                    f"Index {self.index_name} does not exist. "
                    "Please create the index before searching."
                )
        else:
            all_indexes = [
                index.name for index in self.cluster.search_indexes().get_all_indexes()
            ]
            if self.index_name not in all_indexes:
                raise ValueError(
                    f"Index {self.index_name} does not exist. "
                    "Please create the index before searching."
                )

        return True

    def __init__(self, **kwargs):
        """Initialize the CouchbaseFTSVectorSearchTool.

        Args:
            **kwargs: Keyword arguments to pass to the BaseTool constructor and
                      to configure the Couchbase connection and search parameters.
                      Requires 'cluster', 'bucket_name', 'scope_name',
                      'collection_name', 'index_name', and 'embedding_function'.

        Raises:
            ValueError: If required parameters are missing, the Couchbase cluster
                        cannot be reached, or the specified bucket, scope,
                        collection, or index does not exist.
        """
        super().__init__(**kwargs)
        if COUCHBASE_AVAILABLE:
            try:
                if not self.cluster:
                    raise ValueError("Cluster instance must be provided")

                if not self.bucket_name:
                    raise ValueError("Bucket name must be provided")

                if not self.scope_name:
                    raise ValueError("Scope name must be provided")

                if not self.collection_name:
                    raise ValueError("Collection name must be provided")

                if not self.index_name:
                    raise ValueError("Index name must be provided")

                if not self.embedding_function:
                    raise ValueError("Embedding function must be provided")

                self._bucket = self.cluster.bucket(self.bucket_name)
                self._scope = self._bucket.scope(self.scope_name)
                self._collection = self._scope.collection(self.collection_name)
            except Exception as e:
                raise ValueError(
                    "Error connecting to couchbase. "
                    "Please check the connection and credentials"
                ) from e

            # check if bucket exists
            if not self._check_bucket_exists():
                raise ValueError(
                    f"Bucket {self.bucket_name} does not exist. "
                    "Please create the bucket before searching."
                )

            self._check_scope_and_collection_exists()
            self._check_index_exists()
        else:
            import click

            if click.confirm(
                "The 'couchbase' package is required to use the CouchbaseFTSVectorSearchTool. "
                "Would you like to install it?"
            ):
                import subprocess

                subprocess.run(["uv", "add", "couchbase"], check=True)
            else:
                raise ImportError(
                    "The 'couchbase' package is required to use the CouchbaseFTSVectorSearchTool. "
                    "Please install it with: uv add couchbase"
                )

    def _run(self, query: str) -> str:
        """Execute a vector search query against the Couchbase index.

        Args:
            query: The search query string.

        Returns:
            A JSON string containing the search results, or an error
            message string if the search fails.
        """
        query_embedding = self.embedding_function(query)
        fields = ["*"]

        search_req = search.SearchRequest.create(
            VectorSearch.from_vector_query(
                VectorQuery(
                    self.embedding_key,
                    query_embedding,
                    self.limit
                )
            )
        )

        try:
            if self.scoped_index:
                search_iter = self._scope.search(
                    self.index_name,
                    search_req,
                    SearchOptions(
                        limit=self.limit,
                        fields=fields,
                    )
                )
            else:
                search_iter = self.cluster.search(
                    self.index_name,
                    search_req,
                    SearchOptions(
                        limit=self.limit,
                        fields=fields
                    )
                )

            json_response = []

            for row in search_iter.rows():
                json_response.append(row.fields)
        except Exception as e:
            return f"Search failed with error: {e}"

        return json.dumps(json_response, indent=2)
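
Because `_run` returns a JSON array string on success and a plain `Search failed with error: ...` message on failure, code that consumes the tool's output has to tell the two cases apart. The following is a minimal sketch of one way to do that using only the standard library. `parse_search_output` and `ERROR_PREFIX` are hypothetical names introduced for illustration and are not part of the tool.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Assumption: mirrors the error string returned by `_run` in this commit.
ERROR_PREFIX = "Search failed with error:"


def parse_search_output(raw: str) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return (rows, error); error is None when the search succeeded."""
    if raw.startswith(ERROR_PREFIX):
        # Failure path: strip the prefix and surface the underlying message.
        return [], raw[len(ERROR_PREFIX):].strip()
    try:
        rows = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], f"Unexpected non-JSON output: {exc}"
    if not isinstance(rows, list):
        return [], "Unexpected output: expected a JSON array"
    return rows, None


rows, error = parse_search_output('[{"text": "hello"}]')
print(rows, error)
```

Returning the error as a value rather than raising matches the tool's own choice to report failures as strings, which keeps an agent loop running when a single search fails.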

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -98,6 +98,9 @@ apify = [
 databricks-sdk = [
     "databricks-sdk>=0.46.0",
 ]
+couchbase = [
+    "couchbase>=4.3.5",
+]
 mcp = [
     "mcp>=1.6.0",
     "mcpadapt>=0.1.9",
