Add MongoDB Vector Search Tool #319
@blink1073 nice work. Can we try an approach that uses MongoDB's APIs directly instead of wrapping another package just to call them? I don't want the dependency overhead of LangChain here.
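For reference, querying Atlas Vector Search through the MongoDB APIs directly comes down to a `$vectorSearch` aggregation stage. The sketch below only builds the pipeline as plain dicts, so it runs without a cluster. The function and parameter names, the 10x `numCandidates` default, and dropping the embedding field from results are illustrative assumptions, not this PR's code.

```python
from typing import Any, Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str,
    path: str,
    limit: int = 4,
    num_candidates: Optional[int] = None,
    pre_filter: Optional[dict[str, Any]] = None,
    include_embeddings: bool = False,
) -> list[dict[str, Any]]:
    """Hypothetical helper: build an aggregation pipeline that runs $vectorSearch."""
    if num_candidates is None:
        # Assumption: oversample 10x the limit; Atlas requires numCandidates >= limit.
        num_candidates = limit * 10
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")

    stage: dict[str, Any] = {
        "index": index_name,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if pre_filter:
        # Filtered fields must be declared as "filter" fields in the vector index.
        stage["filter"] = pre_filter

    pipeline: list[dict[str, Any]] = [
        {"$vectorSearch": stage},
        # Copy the similarity score into a regular document field.
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]
    if not include_embeddings:
        # Drop the (large) embedding array from the returned documents.
        pipeline.append({"$project": {path: 0}})
    return pipeline
```

With PyMongo, the returned list would be passed to `collection.aggregate(pipeline)`; the query vector still has to come from whatever embedding model was used to populate `path`.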
Not a problem, I'll just end up duplicating some code from
@lorenzejay I've made the requested changes while preserving the API.
- Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
- Updated README.md for MongoDB vector search tool with additional context.
- Modified subprocess command in vector_search.py for package installation.
- Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
- Deleted unused tests/utils.py file.
@blink1073 I pushed some fixes. I'm testing with this:

```python
from crewai import Agent, Crew, Task
from crewai_tools import MongoDBVectorSearchConfig, MongoDBVectorSearchTool

if __name__ == "__main__":
    # Customize the query parameters.
    query_config = MongoDBVectorSearchConfig(limit=10)
    tool = MongoDBVectorSearchTool(
        database_name="sample_mflix",
        collection_name="embedded_movies",
        connection_string="<>",
        query_config=query_config,
        vector_index_name="_id_",
        generative_model="gpt-4o",
    )

    # Add the tool to an agent.
    rag_agent = Agent(
        name="rag_agent",
        role="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        goal="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        backstory="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        llm="gpt-4o-mini",
        tools=[tool],
    )
    task = Task(
        name="rag_task",
        description="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool. the query: {query}",
        expected_output="The answer to the question",
        agent=rag_agent,
    )
    crew = Crew(agents=[rag_agent], tasks=[task], verbose=True)
    res = crew.kickoff(inputs={"query": "tell me about the movie: From Hand to Mouth"})
    print("res", res)
```

I'm getting pretty poor results on the default DB. Any help?
I'll take a look. Here's the integration test I wrote, which runs nightly using our credentials: mongodb-labs/ai-ml-pipeline-testing#71.
Ah, I see the difference: what you're using it for is actually a follow-up capability, searching within the database itself. This initial PR covers vector search only, which is what my example does: it creates embeddings for each page of the PDF and then runs the query against those embeddings.
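The vector-search flow described above (embed each page, embed the query, rank pages by similarity) can be illustrated in memory. This is a toy sketch: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and brute-force cosine ranking stands in for the approximate nearest-neighbor search Atlas performs. None of it is the tool's actual implementation.

```python
import math
from collections import Counter


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for a real embedding model: term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_pages(pages: list[str], query: str, k: int = 3) -> list[int]:
    """Embed every page and the query, then return indices of the k most similar pages."""
    vocab = sorted({word for page in pages for word in page.lower().split()})
    page_vectors = [toy_embed(page, vocab) for page in pages]  # one embedding per page
    query_vector = toy_embed(query, vocab)  # words outside the vocabulary are ignored
    ranked = sorted(
        range(len(pages)),
        key=lambda i: cosine_similarity(query_vector, page_vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

Because ranking is purely by semantic closeness to the query, this style of search answers "what is similar to X" well, but it is not an exact-match lookup on a field.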
I updated the crewAI dependency.
To clarify, as part of INTPYTHON-332 we would add a
- Removed `auth0-python` package.
- Updated `crewai` version to 0.140.0 and adjusted its dependencies.
- Changed `json-repair` version to 0.25.2.
- Updated `litellm` version to 1.72.6.
- Modified dependency markers for several packages to improve compatibility with Python versions.
Good work here!
I dropped a few comments; let me know what you think.
…ling and new dimensions field
- Added logging for error handling in the _run method and during client cleanup.
- Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
- Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
- Cleaned up import statements and improved code readability.
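A minimal sketch of the "return JSON formatted results and handle exceptions gracefully" pattern from the commit above. It assumes a hypothetical `search` callable in place of the tool's real query logic; `run_search_as_json` is an illustrative name, not the tool's actual `_run` signature.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def run_search_as_json(search: Callable[[str], list[dict[str, Any]]], query: str) -> str:
    """Hypothetical _run-style wrapper: JSON-encode results; log and report failures."""
    try:
        results = search(query)
        # default=str serializes non-JSON types (e.g. ObjectId, datetime) as strings.
        return json.dumps(results, default=str)
    except Exception as exc:
        # Log the full traceback, but hand the agent a short JSON error instead of raising.
        logger.exception("Vector search failed for query %r", query)
        return json.dumps({"error": str(exc)})
```

Returning an error payload rather than raising keeps a failed lookup from aborting the agent's run, while the log still records the traceback.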
Just a heads up: I'm still working on updating our integration tests in mongodb-labs/ai-ml-pipeline-testing#71. I'll let y'all know when I get it working.
@blink1073 can you try running this? I'm getting poor results:

```python
from crewai import Agent, Crew, Task
from crewai_tools import MongoDBVectorSearchTool

if __name__ == "__main__":
    tool = MongoDBVectorSearchTool(
        database_name="sample_mflix",
        collection_name="embedded_movies",
        connection_string="<>",
        embedding_key="plot_embedding",
    )
    agent = Agent(
        role="MongoDBVectorSearchTool",
        goal="You are a helpful assistant that can answer questions about the MongoDB database.",
        backstory="You are a helpful assistant that can answer questions about the MongoDB database.",
        tools=[tool],
        llm="gpt-4.1",
    )
    task = Task(
        description="get the movies with the director Alfred J. Goulding, use no filters",
        expected_output="The movies with the director Alfred J. Goulding",
        agent=agent,
    )
    crew = Crew(
        agents=[agent],
        tasks=[task],
        verbose=True,
    )
    result = crew.kickoff()
    print("result", result)
```
I'm using the default collection you get when you create a new Mongo instance.
We had that same conversation last month. ;)
Do you have an example of it working, then? Something I can run to confirm? Let's bring this home today.
Yes, the integration test is now passing: https://github.com/mongodb-labs/ai-ml-pipeline-testing/pull/71/files#diff-5c01b996bf644e0a14a5aa2a00ec357d24dbe961c3157919a979bc762f1344c4
LGTM.
Excellent, thank you both!
* INTPYTHON-580 Design and Implement MongoDBVectorSearchTool
* add implementation
* wip
* wip
* finish tests
* add todo
* refactor to wrap langchain-mongodb
* cleanup
* address review
* Fix usage of EnvVar class
* inline code
* lint
* lint
* fix usage of SearchIndexModel
* Refactor: Update EnvVar import path and remove unused tests.utils module
  - Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
  - Updated README.md for MongoDB vector search tool with additional context.
  - Modified subprocess command in vector_search.py for package installation.
  - Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
  - Deleted unused tests/utils.py file.
* update the crewai dep and the lockfile
* chore: update package versions and dependencies in uv.lock
  - Removed `auth0-python` package.
  - Updated `crewai` version to 0.140.0 and adjusted its dependencies.
  - Changed `json-repair` version to 0.25.2.
  - Updated `litellm` version to 1.72.6.
  - Modified dependency markers for several packages to improve compatibility with Python versions.
* refactor: improve MongoDB vector search tool with enhanced error handling and new dimensions field
  - Added logging for error handling in the _run method and during client cleanup.
  - Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
  - Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
  - Cleaned up import statements and improved code readability.
* address review
* update tests
* debug
* fix test
* fix test
* fix test
* support azure openai

---------

Co-authored-by: lorenzejay
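The squash message mentions fixing the `SearchIndexModel` usage and adding a `dimensions` field. In an Atlas Vector Search index definition, the embedding size is declared as `numDimensions`, so a config field like `dimensions` plausibly feeds that value; that link is an assumption, and the helper below is hypothetical rather than the tool's code.

```python
from typing import Any, Optional


def vector_index_definition(
    path: str,
    dimensions: int,
    similarity: str = "cosine",
    filter_paths: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an Atlas Vector Search index definition."""
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    fields: list[dict[str, Any]] = [
        {
            "type": "vector",
            "path": path,
            # Must equal the length of the vectors produced by the embedding model.
            "numDimensions": dimensions,
            "similarity": similarity,
        }
    ]
    for filter_path in filter_paths or []:
        # Fields that may be used in the $vectorSearch "filter" option.
        fields.append({"type": "filter", "path": filter_path})
    return {"fields": fields}
```

With PyMongo, the definition would be wrapped as `SearchIndexModel(definition=..., name=..., type="vectorSearch")` and passed to `collection.create_search_index(...)`. A mismatch between `numDimensions` and the actual embedding length makes documents unsearchable, which is one reason to make the size configurable.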
I am a maintainer of `langchain-mongodb`, which I wrapped to create a crewAI tool.