-
Notifications
You must be signed in to change notification settings - Fork 0
feat(llm):vector db #56
base: agenticrag/dev
Are you sure you want to change the base?
Conversation
…nd Pipeline Pooling (apache#48)
Co-authored-by: Copilot <[email protected]> Co-authored-by: Linyu <[email protected]>
…ode Initialization (apache#51)
- Replace VectorIndex.from_index_file() with VectorStoreBase.from_name() - Use get_vector_index_class() factory function - Use get_vector_index_info() method for consistent interface - Fix ImportError after merge
- Remove VectorIndex import and usage - Update clean_all_graph_index() to use get_vector_index_class() - Use VectorStoreBase.clean() method with graph_name parameter - Fix remaining ImportError after merge
|
@codecov-ai-reviewer review |
|
On it! We are reviewing the PR and will provide feedback shortly. |
Summary of ChangesHello @fantasy-lotus, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/operators/index_op/gremlin_example_index_query.py
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py
Show resolved
Hide resolved
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements vector database support, introducing Milvus and Qdrant as alternatives to the existing Faiss implementation. The changes establish a common interface for vector stores, add configuration management for vector database backends, and update embedding models to support dimension detection and batch processing.
Key changes:
- Adds abstract
VectorStoreBaseclass and implementations for Faiss, Milvus, and Qdrant - Introduces
IndexConfigfor managing vector database connection settings - Enhances embedding models with
get_embedding_dim()method and improved batch processing - Updates operators and nodes to use the new vector store abstraction
Reviewed Changes
Copilot reviewed 50 out of 51 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
hugegraph-llm/src/hugegraph_llm/indices/vector_index/base.py |
Defines abstract base class for vector stores |
hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py |
Refactored Faiss implementation conforming to base interface |
hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py |
New Milvus vector store implementation |
hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py |
New Qdrant vector store implementation |
hugegraph-llm/src/hugegraph_llm/config/index_config.py |
Configuration class for vector database settings |
hugegraph-llm/src/hugegraph_llm/models/embeddings/*.py |
Enhanced embedding classes with dimension detection and batch processing |
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py |
Utility functions for vector index selection and initialization |
hugegraph-llm/src/hugegraph_llm/operators/index_op/*.py |
Updated operators to use new vector store abstraction |
hugegraph-llm/pyproject.toml |
Added optional vectordb dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and valuable refactoring to support multiple vector database backends (Faiss, Milvus, Qdrant), moving from a concrete Faiss implementation to a flexible VectorStoreBase abstraction. The changes are extensive, touching configuration, the Gradio UI, core operators, and embedding models. While the overall direction is excellent, I've identified a critical issue with ID generation in the Qdrant implementation that could lead to data loss, along with several high and medium-severity issues related to configuration safety, code clarity, and maintainability. Addressing these points will solidify this new abstraction and ensure its robustness.
hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py
Outdated
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py
Outdated
Show resolved
Hide resolved
hugegraph-llm/src/hugegraph_llm/operators/index_op/build_semantic_index.py
Outdated
Show resolved
Hide resolved
…tic_index.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ctor_store.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <[email protected]>
…isions The previous implementation used loop index (i) as point ID, which caused severe data loss when add() was called multiple times - IDs would restart from 0 and overwrite existing points. Fixed by using uuid.uuid4() to generate unique IDs for each point across all add operations, ensuring data integrity. This resolves a critical bug that could lead to inconsistent vector index and missing embeddings in Qdrant storage.
… improvements Resolved conflicts by prioritizing: - Local vector index implementations (VectorStoreBase, multi-backend support) - Local embedding improvements (batch size handling, dynamic dimension detection) - External flow/node architecture from apache/main - Kept mypy and ruff configurations from local branch All previous fixes maintained: - Qdrant UUID fix for point IDs - Embedding batch size auto-splitting - Vector config UI restoration - ImportError fixes for new vector API
…ength, signature mismatch) - Remove unused 'os' import from graph_index_utils.py - Fix trailing whitespace in multiple files - Break long lines to comply with 120 char limit - Add batch_size parameter to OllamaEmbedding.async_get_texts_embeddings for consistency - Improve pylint comment placement for better readability This improves the pylint score from 9.35/10 to 9.36/10
Fixed warnings: - W0718 (broad-exception-caught): Added pylint disable comments for legitimate broad exception handling in error recovery paths where multiple exception types need to be caught - R1711 (useless-return): Removed redundant return statement in utils.py - R0912 (too-many-branches): Suppressed for apply_vector_engine_backend() which handles multiple vector DB backends - E0401 (import-error): Suppressed for optional dependencies (pymilvus, qdrant_client) that may not be installed These changes improve pylint score from 9.36/10 to 9.38/10 while maintaining code quality and error handling robustness.
No description provided.