INTPYTHON-580 Design and Implement MongoDBVectorSearchTool #1
Conversation
LGTM with comments.
collection_name='example_collections',
connection_string="<your_mongodb_connection_string>",
query_config=query_config,
index_name="my_vector_index",
I'd call this vector_index_name. It is explicit and will avoid backwards-compatibility issues if this gets adoption and we want access to other search types.
Done
from crewai_tools import MongoDBVectorSearchConfig, MongoDBVectorSearchTool

# Setup custom embedding model and customize the parameters.
query_config = MongoDBVectorSearchConfig(limit=10, oversampling_factor=2)
It feels like you could combine the two examples: "MongoDBVectorSearchTool provides a number of configurable parameters. The kwarg query_config takes a MongoDBVectorSearchConfig. For example...."
On the vector index, is this automatically created? Is it clear what will be vectorized? It's worth noting that embedding models can embed any text, from plain text to embedded JSON.
I like showing the simplest case so they can copy-paste and get rolling. No, the vector index has to be explicitly created using create_vector_search_index. I don't follow the embedded json part; the type annotation for add_texts is texts: Iterable[str].
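Since the index is not created automatically, it may help to see what such an index describes. The following is a minimal sketch of the index definition document in the Atlas Vector Search format, not the body of create_vector_search_index itself (its parameters are not shown in this thread). The helper name build_vector_index_definition, the field path "embedding", and the dimension 1536 are assumptions chosen for illustration.

```python
from typing import Any


def build_vector_index_definition(
    dimensions: int,
    path: str = "embedding",  # hypothetical name of the field holding the vectors
    similarity: str = "cosine",
) -> dict[str, Any]:
    """Return an Atlas Vector Search index definition for a single vector field."""
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


# 1536 is only an example size; it must match the embedding model in use.
definition = build_vector_index_definition(dimensions=1536)
```

The dimension count has to match the output size of the embedding model, which is why it is a required argument in this sketch.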
Just thoughts. This looks great. Wrapping langchain_mongodb was a brilliant move.
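For readers unfamiliar with the limit and oversampling_factor settings in the quoted example, here is a rough sketch of how they could map onto a $vectorSearch aggregation stage. It assumes the candidate pool is limit multiplied by oversampling_factor, which is how I understand the wrapped library to use the factor; the helper name and the "embedding" path are hypothetical and not taken from the PR.

```python
from typing import Any


def build_vector_search_stage(
    query_vector: list[float],
    index_name: str,
    limit: int = 10,
    oversampling_factor: int = 2,
    path: str = "embedding",  # hypothetical vector field name
) -> dict[str, Any]:
    """Build a $vectorSearch stage; numCandidates is assumed to be limit * factor."""
    return {
        "$vectorSearch": {
            "index": index_name,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": limit * oversampling_factor,
            "limit": limit,
        }
    }


stage = build_vector_search_stage([0.1, 0.2, 0.3], index_name="my_vector_index")
```

With the example values limit=10 and oversampling_factor=2, the stage would consider 20 candidates and return at most 10 documents.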
query: str = Field(
    ...,
    description="The query to search retrieve relevant information from the MongoDB database. Pass only the query, not the question.",
What's the difference between query and question? Is that a CrewAI thing?
I was following prior examples.
This version contains a dedicated fix for CrewAIAdapter when the schema doesn't allow null
- Changed import of EnvVar from tests.utils to crewai.tools in multiple files.
- Updated README.md for MongoDB vector search tool with additional context.
- Modified subprocess command in vector_search.py for package installation.
- Cleaned up test_generate_tool_specs.py to improve mock patching syntax.
- Deleted unused tests/utils.py file.
crewAIInc#331)
* refactor: remove token validation from EnterpriseActionKitToolAdapter and CrewaiEnterpriseTools
This commit simplifies the initialization of the EnterpriseActionKitToolAdapter and CrewaiEnterpriseTools by removing the explicit validation for the enterprise action token. The token can now be set to None without raising an error, allowing for more flexible usage.
* added loggers for monitoring
* fixed typo
* feat: add package_dependencies explicitly in the Tools
* feat: collect package_dependencies from Tool to add in tool.specs.json
* feat: add default value in run_params Tool specs
* fix: support getting boolean values
This commit also refactors the tests to make it easier to define new attributes on a Tool.
crewAIInc#332) (crewAIInc#333) We’re currently using the JSON Schema standard for these fields
This change allows accessing tools by name (tools["tool_name"]) in addition to index (tools[0]), making it more intuitive and convenient to work with multiple tools without needing to remember their position in the list
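As an illustration of the behaviour described in this commit message, here is a minimal sketch of a list that also accepts a tool name as a key. The class names Tool and NamedToolList are hypothetical stand-ins and are not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Stand-in for a real tool; only the name matters for this sketch."""
    name: str


class NamedToolList(list):
    """A list of tools that supports lookup by position or by tool name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Linear scan by name; a real implementation might cache a name index.
            for tool in self:
                if tool.name == key:
                    return tool
            raise KeyError(key)
        # Integers and slices fall through to normal list behaviour.
        return super().__getitem__(key)


tools = NamedToolList([Tool("search"), Tool("scrape")])
by_name = tools["scrape"]
by_index = tools[1]
```

Subclassing list keeps existing index-based code working unchanged, which seems to be the point of the change.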
* Add Oxylabs tools
* Review updates
* Add package_dependencies attribute
* feat: support complex filters on ToolCollection
* refactor: use the proper tool collection method to filter tools in CrewAiEnterpriseTools
* feat: allow filtering available MCP tools
* refactor: remove token validation from EnterpriseActionKitToolAdapter and CrewaiEnterpriseTools
This commit simplifies the initialization of the EnterpriseActionKitToolAdapter and CrewaiEnterpriseTools by removing the explicit validation for the enterprise action token. The token can now be set to None without raising an error, allowing for more flexible usage.
* added loggers for monitoring
* fixed typo
* fix: enhance token handling in EnterpriseActionKitToolAdapter and CrewaiEnterpriseTools
This commit improves the handling of the enterprise action token by allowing it to be fetched from environment variables if not provided. It adds checks to ensure the token is set before making API requests, enhancing robustness and flexibility.
* removed redundancy
* test: add new test for environment token fallback in CrewaiEnterpriseTools
This update introduces a new test case to verify that the environment token is used when no token is provided during the initialization of CrewaiEnterpriseTools. Additionally, minor formatting adjustments were made to existing assertions for consistency.
* test: update environment token test to clear environment variables
This change modifies the test for CrewaiEnterpriseTools to ensure that the environment variables are cleared before setting the test token. This ensures a clean test environment and prevents potential interference from other tests.
* drop redundancy
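The environment-variable fallback described above could look roughly like the sketch below. The variable name EXAMPLE_ENTERPRISE_ACTION_TOKEN and both function names are hypothetical; the commit message does not name the real ones.

```python
import os
from typing import Optional

# Hypothetical variable name, used only for this illustration.
TOKEN_ENV_VAR = "EXAMPLE_ENTERPRISE_ACTION_TOKEN"


def resolve_token(token: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit token, otherwise fall back to the environment."""
    return token or os.environ.get(TOKEN_ENV_VAR)


def require_token(token: Optional[str] = None) -> str:
    """Check that a token is available before making an API request."""
    resolved = resolve_token(token)
    if not resolved:
        raise ValueError(f"No token provided and {TOKEN_ENV_VAR} is not set")
    return resolved
```

Splitting the lookup from the check lets initialization accept None while the request path still fails early when no token is available.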
…crewAIInc#346)
* feat: add support for parsing actions list from environment variables
This commit introduces a new function, _parse_actions_list, to handle the parsing of a string representation of a list of tool names from environment variables. The CrewaiEnterpriseTools now utilizes this function to filter tools based on the parsed actions list, enhancing flexibility in tool selection. Additionally, a new test case is added to verify the correct usage of the environment actions list.
* test: simplify environment actions list test setup
This commit refactors the test for CrewaiEnterpriseTools to streamline the setup of environment variables. The environment token and actions list are now set in a single patch.dict call, improving readability and reducing redundancy in the test code.
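To make the parsing step concrete, here is one possible sketch of turning a string from the environment into a list of tool names. It is not the PR's _parse_actions_list; the accepted formats (a JSON array or a comma-separated string), the function name, and the variable name EXAMPLE_ACTIONS_LIST are all assumptions.

```python
import json
import os
from typing import Optional

# Hypothetical variable name, used only for this illustration.
ACTIONS_ENV_VAR = "EXAMPLE_ACTIONS_LIST"


def parse_actions_list(raw: Optional[str]) -> Optional[list[str]]:
    """Parse '["a", "b"]' or 'a, b' into a list of names; None if unset or empty."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON, so treat the value as a comma-separated string.
        parsed = text.split(",")
    if isinstance(parsed, str):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return None
    names = [str(item).strip() for item in parsed]
    return [name for name in names if name]


# Returns None when the variable is not set, meaning "do not filter".
selected = parse_actions_list(os.environ.get(ACTIONS_ENV_VAR))
```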
…andling (crewAIInc#351)
- Added TYPE_CHECKING imports for FirecrawlApp to enhance type safety.
- Updated configuration keys in FirecrawlCrawlWebsiteTool and FirecrawlScrapeWebsiteTool to camelCase for consistency.
- Introduced error handling in the _run methods of both tools to ensure FirecrawlApp is properly initialized before usage.
- Adjusted parameters passed to crawl_url and scrape_url methods to use 'params' instead of unpacking the config dictionary directly.
Signed-off-by: Emmanuel Ferdman <[email protected]>
)
* refactor: enhance schema handling in EnterpriseActionTool
- Extracted schema property and required field extraction into separate methods for better readability and maintainability.
- Introduced methods to analyze field types and create Pydantic field definitions based on nullability and requirement status.
- Updated the _run method to handle required nullable fields, ensuring they are set to None if not provided in kwargs.
* refactor: streamline nullable field handling in EnterpriseActionTool
- Removed commented-out code related to handling required nullable fields for clarity.
- Simplified the logic in the _run method to focus on processing parameters without unnecessary comments.
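The nullable-field handling described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the EnterpriseActionTool code: it treats a field as nullable when its JSON Schema lists "null" as a type or as an anyOf/oneOf option, and both function names are hypothetical.

```python
from typing import Any


def is_nullable(field_schema: dict[str, Any]) -> bool:
    """Return True if a JSON Schema field explicitly allows null."""
    field_type = field_schema.get("type")
    if field_type == "null":
        return True
    if isinstance(field_type, list) and "null" in field_type:
        return True
    return any(
        isinstance(option, dict) and option.get("type") == "null"
        for key in ("anyOf", "oneOf")
        for option in field_schema.get(key, [])
    )


def fill_required_nullable(
    schema: dict[str, Any], kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of kwargs with missing required nullable fields set to None."""
    properties = schema.get("properties", {})
    filled = dict(kwargs)
    for name in schema.get("required", []):
        if name not in filled and is_nullable(properties.get(name, {})):
            filled[name] = None
    return filled


example_schema = {
    "properties": {
        "title": {"type": "string"},
        "note": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "required": ["title", "note"],
}
filled_kwargs = fill_required_nullable(example_schema, {"title": "x"})
```

Fields that are required but not nullable are left alone, so a missing value there still surfaces as a validation error downstream.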
- Removed `auth0-python` package.
- Updated `crewai` version to 0.140.0 and adjusted its dependencies.
- Changed `json-repair` version to 0.25.2.
- Updated `litellm` version to 1.72.6.
- Modified dependency markers for several packages to improve compatibility with Python versions.
* - Added CouchbaseFTSVectorStore as a CrewAI tool.
  - Wrote a README to setup the tool.
  - Wrote test cases.
  - Added Couchbase as an optional dependency in the project.
* Fixed naming in some places. Added docstrings. Added instructions on how to create a vector search index.
* Fixed pyproject.toml
* error handling and response format
  - Removed unnecessary ImportError for missing 'couchbase' package.
  - Changed response format from a concatenated string to a JSON array for search results.
  - Updated error handling to return error messages instead of raising exceptions in certain cases.
  - Adjusted tests to reflect changes in response format and error handling.
* Update dependencies in pyproject.toml and uv.lock
  - Changed pydantic version from 2.6.1 to 2.10.6 in both pyproject.toml and uv.lock.
  - Updated crewai-tools version from 0.42.2 to 0.42.3 in uv.lock.
  - Adjusted pydantic-core version from 2.33.1 to 2.27.2 in uv.lock, reflecting the new pydantic version.
* Removed restrictive pydantic version and updated uv.lock
* synced lockfile
* regenerated lockfile
* updated lockfile
* regenerated lockfile
* Update tool specifications for
* Fix test cases
---------
Co-authored-by: AayushTyagi1 <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…ling and new dimensions field
- Added logging for error handling in the _run method and during client cleanup.
- Introduced a new 'dimensions' field in the MongoDBVectorSearchConfig for embedding vector size.
- Refactored the _run method to return JSON formatted results and handle exceptions gracefully.
- Cleaned up import statements and improved code readability.
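A rough sketch of the pattern this commit describes, returning JSON and logging failures instead of raising, is shown below. The run_search wrapper and the shape of the error payload are assumptions; the commit message does not say what the tool actually returns on failure.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def run_search(search: Callable[[str], list[dict[str, Any]]], query: str) -> str:
    """Run a search callable and always return a JSON string."""
    try:
        results = search(query)
        # default=str covers values json cannot encode natively, such as ids or dates.
        return json.dumps(results, default=str)
    except Exception as exc:
        logger.error("Vector search failed: %s", exc)
        # Hypothetical error shape; the real tool may return something else.
        return json.dumps({"error": str(exc)})


def fake_search(query: str) -> list[dict[str, Any]]:
    """Stand-in for a real vector search call."""
    return [{"text": "hello", "score": 0.9}]


def failing_search(query: str) -> list[dict[str, Any]]:
    """Stand-in for a search that raises."""
    raise RuntimeError("boom")


ok_output = run_search(fake_search, "hi")
error_output = run_search(failing_search, "hi")
```

Returning a string in both cases keeps the tool's output type stable for the calling agent.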
Integration tests: mongodb-labs/ai-ml-pipeline-testing#71