Conversation

blink1073
Contributor

I am a maintainer of langchain-mongodb, which I wrapped to create a crewAI tool.

@lorenzejay lorenzejay self-requested a review June 4, 2025 15:23
@lorenzejay
Contributor

@blink1073 nice work. Can we try an approach where we are using the direct apis from mongo over wrapping another package just to call this? Don't want the dependency overhead of langchain here.
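
For context, calling MongoDB directly mostly means building a `$vectorSearch` aggregation stage by hand. A minimal sketch, assuming an Atlas Vector Search index already exists; the helper name `build_vector_search_pipeline`, the index name `vector_index`, the field `embedding`, and the oversampling factor are illustrative assumptions, not code from this PR:

```python
from typing import Any, Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str,
    path: str,
    limit: int = 4,
    oversampling_factor: int = 10,
    pre_filter: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline with a $vectorSearch stage (hypothetical helper)."""
    stage: dict[str, Any] = {
        "index": index_name,          # name of the vector search index
        "path": path,                 # document field that holds the embedding
        "queryVector": query_vector,  # embedding of the user's query
        "numCandidates": limit * oversampling_factor,
        "limit": limit,
    }
    if pre_filter:
        stage["filter"] = pre_filter
    return [
        {"$vectorSearch": stage},
        # Attach the similarity score to each returned document.
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline(
    query_vector=[0.1, 0.2, 0.3],  # placeholder vector, not a real embedding
    index_name="vector_index",
    path="embedding",
    limit=5,
)
```

With a live connection, the resulting list would be passed to PyMongo's `collection.aggregate(pipeline)`.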

@blink1073
Contributor Author

Not a problem, I'll just end up duplicating some code from langchain-mongodb.

@blink1073
Contributor Author

@lorenzejay I've made the requested changes while preserving the API.

@lorenzejay lorenzejay self-assigned this Jun 9, 2025
- Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
- Updated README.md for MongoDB vector search tool with additional context.
- Modified subprocess command in vector_search.py for package installation.
- Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
- Deleted unused tests/utils.py file.
@lorenzejay
Contributor

lorenzejay commented Jun 10, 2025

@blink1073 I pushed some fixes. Use crewai==0.126.0.

using this:

if __name__ == "__main__":
    from crewai import Agent, Task, Crew
    from crewai_tools import MongoDBVectorSearchConfig, MongoDBVectorSearchTool

    # Setup custom embedding model and customize the parameters.
    query_config = MongoDBVectorSearchConfig(limit=10)
    tool = MongoDBVectorSearchTool(
        database_name="sample_mflix",
        collection_name="embedded_movies",
        connection_string="<>",
        query_config=query_config,
        vector_index_name="_id_",
        generative_model="gpt-4o",
    )

    # Adding the tool to an agent
    rag_agent = Agent(
        name="rag_agent",
        role="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        goal="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        backstory="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool.",
        llm="gpt-4o-mini",
        tools=[tool],
    )
    task = Task(
        name="rag_task",
        description="You are a helpful assistant that can answer questions with the help of the MongoDBVectorSearchTool. the query: {query}",
        expected_output="The answer to the question",
        agent=rag_agent,
    )
    crew = Crew(agents=[rag_agent], tasks=[task], verbose=True)
    res = crew.kickoff(inputs={"query": "tell me about the movie: From Hand to Mouth"})
    print("res", res)

I'm getting pretty poor results on the default DB. Any help?

[Screenshot: 2025-06-10 at 11:10:33 AM]

@blink1073
Contributor Author

I'll take a look. Here's the integ test I had written that will run using our creds nightly: mongodb-labs/ai-ml-pipeline-testing#71.

@blink1073
Contributor Author

Ah, I see the difference. What you're using it for is actually a follow-up capability: searching within the database itself. This initial PR is for vector search only, which is what my example does. It creates embeddings for each page of the PDF and then runs the query against those embeddings.
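
To illustrate the embed-then-query flow, here is a toy sketch. It assumes a stand-in `embed` function based on word counts instead of a real embedding model, and an in-memory list instead of a MongoDB collection; all names and the sample pages are hypothetical:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One "page" per entry, each embedded once up front.
pages = [
    "mongodb stores documents in collections",
    "vector search finds semantically similar text",
    "crews are made of agents and tasks",
]
page_vectors = [embed(page) for page in pages]


def search(query: str, limit: int = 1) -> list[str]:
    """Embed the query and return the most similar pages."""
    query_vector = embed(query)
    ranked = sorted(
        range(len(pages)),
        key=lambda i: cosine(query_vector, page_vectors[i]),
        reverse=True,
    )
    return [pages[i] for i in ranked[:limit]]


top = search("how does vector search work")
```

The query is matched by similarity to the stored embeddings, not by field values, which is why a question such as "movies by a given director" is a different kind of lookup.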

@blink1073
Contributor Author

I updated the crewai dep.

@blink1073
Contributor Author

To clarify, as part of INTPYTHON-332 we would add a mongodb_search_tool directory which could be used to perform your query.

- Removed `auth0-python` package.
- Updated `crewai` version to 0.140.0 and adjusted its dependencies.
- Changed `json-repair` version to 0.25.2.
- Updated `litellm` version to 1.72.6.
- Modified dependency markers for several packages to improve compatibility with Python versions.
Contributor

@lucasgomide lucasgomide left a comment

Good work here!

I dropped a few comments; let me know what you think.

lorenzejay and others added 2 commits July 8, 2025 10:10
…ling and new dimensions field

- Added logging for error handling in the _run method and during client cleanup.
- Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
- Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
- Cleaned up import statements and improved code readability.
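
As a rough illustration of the behavior described in this commit, the sketch below shows a run function that checks the embedding size against a `dimensions` setting, returns JSON, and logs failures instead of raising. The names `SearchConfig` and `run_search`, the default values, and the choice to return an empty string on error are assumptions for this example, not the tool's actual code:

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger(__name__)


@dataclass
class SearchConfig:
    """Hypothetical stand-in for the tool's query configuration."""

    limit: int = 4
    dimensions: int = 1536  # expected embedding vector size (assumed default)


def run_search(
    query: str,
    embed_fn: Callable[[str], list[float]],
    search_fn: Callable[[list[float], int], list[dict[str, Any]]],
    config: SearchConfig,
) -> str:
    """Return search results as a JSON string, or "" if anything fails."""
    try:
        vector = embed_fn(query)
        if len(vector) != config.dimensions:
            raise ValueError(
                f"expected {config.dimensions} dimensions, got {len(vector)}"
            )
        results = search_fn(vector, config.limit)
        # default=str keeps non-JSON types (for example ObjectId) serializable.
        return json.dumps(results, default=str)
    except Exception as exc:
        logger.error("Vector search failed: %s", exc)
        return ""


config = SearchConfig(limit=2, dimensions=3)
ok = run_search(
    "hello",
    lambda q: [0.1, 0.2, 0.3],                      # fake embedding function
    lambda v, k: [{"text": "hi", "score": 0.9}][:k],  # fake search backend
    config,
)
bad = run_search("hello", lambda q: [0.1], lambda v, k: [], config)
```

Here `ok` holds a JSON array and `bad` is empty because the fake embedding has the wrong length; the error is logged instead of propagating to the agent.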
@blink1073 blink1073 requested a review from lucasgomide July 8, 2025 18:29
@blink1073
Contributor Author

Just a heads up: I'm still working on updating our integration tests in mongodb-labs/ai-ml-pipeline-testing#71. I'll let y'all know when I get it working.

@lorenzejay
Contributor

@blink1073 can you try running this? I'm getting poor results:

if __name__ == "__main__":
    from crewai import Agent, Task, Crew
    from crewai_tools import MongoDBVectorSearchTool

    tool = MongoDBVectorSearchTool(
        database_name="sample_mflix",
        collection_name="embedded_movies",
        connection_string="<>",
        embedding_key="plot_embedding",
    )

    agent = Agent(
        role="MongoDBVectorSearchTool",
        goal="You are a helpful assistant that can answer questions about the MongoDB database.",
        backstory="You are a helpful assistant that can answer questions about the MongoDB database.",
        tools=[tool],
        llm="gpt-4.1",
    )

    task = Task(
        description="get the movies with the director Alfred J. Goulding, use no filters",
        expected_output="The movies with the director Alfred J. Goulding",
        agent=agent,
    )

    crew = Crew(
        agents=[agent],
        tasks=[task],
        verbose=True,
    )
    result = crew.kickoff()
    print("result", result)

@lorenzejay
Contributor

I'm using the default collection you get when you create a new Mongo instance.

@blink1073
Contributor Author

We had that same conversation last month. ;)

#319 (comment)

@lorenzejay
Contributor

Do you have an example of it working then? Something I can run to confirm? Let's bring this home today.

@blink1073
Contributor Author

Yes, the integration test is now passing: https://github.com/mongodb-labs/ai-ml-pipeline-testing/pull/71/files#diff-5c01b996bf644e0a14a5aa2a00ec357d24dbe961c3157919a979bc762f1344c4

Contributor

@lorenzejay lorenzejay left a comment

LGTM.

@lorenzejay lorenzejay merged commit d12ba28 into crewAIInc:main Jul 9, 2025
4 checks passed
@blink1073
Contributor Author

Excellent, thank you both!

mplachta pushed a commit to mplachta/crewAI-tools that referenced this pull request Aug 27, 2025
* INTPYTHON-580 Design and Implement MongoDBVectorSearchTool

* add implementation

* wip

* wip

* finish tests

* add todo

* refactor to wrap langchain-mongodb

* cleanup

* address review

* Fix usage of EnvVar class

* inline code

* lint

* lint

* fix usage of SearchIndexModel

* Refactor: Update EnvVar import path and remove unused tests.utils module

- Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
- Updated README.md for MongoDB vector search tool with additional context.
- Modified subprocess command in vector_search.py for package installation.
- Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
- Deleted unused tests/utils.py file.

* update the crewai dep and the lockfile

* chore: update package versions and dependencies in uv.lock

- Removed `auth0-python` package.
- Updated `crewai` version to 0.140.0 and adjusted its dependencies.
- Changed `json-repair` version to 0.25.2.
- Updated `litellm` version to 1.72.6.
- Modified dependency markers for several packages to improve compatibility with Python versions.

* refactor: improve MongoDB vector search tool with enhanced error handling and new dimensions field

- Added logging for error handling in the _run method and during client cleanup.
- Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
- Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
- Cleaned up import statements and improved code readability.

* address review

* update tests

* debug

* fix test

* fix test

* fix test

* support azure openai

---------

Co-authored-by: lorenzejay <[email protected]>