Skip to content
This repository was archived by the owner on Dec 28, 2025. It is now read-only.

Conversation

@fantasy-lotus
Copy link
Collaborator

No description provided.

weijinglin and others added 30 commits September 16, 2025 14:54
- Replace VectorIndex.from_index_file() with VectorStoreBase.from_name()
- Use get_vector_index_class() factory function
- Use get_vector_index_info() method for consistent interface
- Fix ImportError after merge
- Remove VectorIndex import and usage
- Update clean_all_graph_index() to use get_vector_index_class()
- Use VectorStoreBase.clean() method with graph_name parameter
- Fix remaining ImportError after merge
Copilot AI review requested due to automatic review settings October 27, 2025 09:50
@github-actions
Copy link

@codecov-ai-reviewer review

@codecov-ai
Copy link

codecov-ai bot commented Oct 27, 2025

On it! We are reviewing the PR and will provide feedback shortly.

@gemini-code-assist
Copy link

Summary of Changes

Hello @fantasy-lotus, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the hugegraph-llm project by integrating robust support for external vector databases, Milvus and Qdrant. The core vector indexing logic has been refactored into a modular design, allowing for easier addition of new vector store backends. This change introduces new configuration settings and updates the user interface to enable seamless selection and management of different vector engines, thereby improving the scalability and adaptability of the LLM application's vector search capabilities.

Highlights

  • Vector Database Integration: Introduced support for external vector databases, specifically Milvus and Qdrant, alongside the existing Faiss implementation, providing more flexibility for vector indexing.
  • Modular Vector Indexing: Refactored the vector index management by introducing a VectorStoreBase abstract class and separate implementations for Faiss, Milvus, and Qdrant, enhancing extensibility and maintainability.
  • Configuration and UI Updates: Added new configuration options for Milvus and Qdrant in config.md and index_config.py, and updated the Gradio UI (configs_block.py) to allow users to select and configure their preferred vector engine.
  • Embedding API Enhancements: Modified embedding models (LiteLLM, Ollama, OpenAI) to dynamically determine embedding dimensions and handle batch processing more efficiently, improving performance and robustness.
  • Build System and Linting: Updated pyproject.toml to include optional dependencies for vector databases and added configurations for mypy and ruff for improved code quality and consistency.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Oct 27, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements vector database support, introducing Milvus and Qdrant as alternatives to the existing Faiss implementation. The changes establish a common interface for vector stores, add configuration management for vector database backends, and update embedding models to support dimension detection and batch processing.

Key changes:

  • Adds abstract VectorStoreBase class and implementations for Faiss, Milvus, and Qdrant
  • Introduces IndexConfig for managing vector database connection settings
  • Enhances embedding models with get_embedding_dim() method and improved batch processing
  • Updates operators and nodes to use the new vector store abstraction

Reviewed Changes

Copilot reviewed 50 out of 51 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
hugegraph-llm/src/hugegraph_llm/indices/vector_index/base.py Defines abstract base class for vector stores
hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py Refactored Faiss implementation conforming to base interface
hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py New Milvus vector store implementation
hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py New Qdrant vector store implementation
hugegraph-llm/src/hugegraph_llm/config/index_config.py Configuration class for vector database settings
hugegraph-llm/src/hugegraph_llm/models/embeddings/*.py Enhanced embedding classes with dimension detection and batch processing
hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py Utility functions for vector index selection and initialization
hugegraph-llm/src/hugegraph_llm/operators/index_op/*.py Updated operators to use new vector store abstraction
hugegraph-llm/pyproject.toml Added optional vectordb dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and valuable refactoring to support multiple vector database backends (Faiss, Milvus, Qdrant), moving from a concrete Faiss implementation to a flexible VectorStoreBase abstraction. The changes are extensive, touching configuration, the Gradio UI, core operators, and embedding models. While the overall direction is excellent, I've identified a critical issue with ID generation in the Qdrant implementation that could lead to data loss, along with several high and medium-severity issues related to configuration safety, code clarity, and maintainability. Addressing these points will solidify this new abstraction and ensure its robustness.

fantasy-lotus and others added 9 commits October 27, 2025 19:04
…tic_index.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ctor_store.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…isions

The previous implementation used loop index (i) as point ID, which caused
severe data loss when add() was called multiple times - IDs would restart
from 0 and overwrite existing points.

Fixed by using uuid.uuid4() to generate unique IDs for each point across
all add operations, ensuring data integrity.

This resolves a critical bug that could lead to inconsistent vector index
and missing embeddings in Qdrant storage.
… improvements

Resolved conflicts by prioritizing:
- Local vector index implementations (VectorStoreBase, multi-backend support)
- Local embedding improvements (batch size handling, dynamic dimension detection)
- External flow/node architecture from apache/main
- Kept mypy and ruff configurations from local branch

All previous fixes maintained:
- Qdrant UUID fix for point IDs
- Embedding batch size auto-splitting
- Vector config UI restoration
- ImportError fixes for new vector API
…ength, signature mismatch)

- Remove unused 'os' import from graph_index_utils.py
- Fix trailing whitespace in multiple files
- Break long lines to comply with 120 char limit
- Add batch_size parameter to OllamaEmbedding.async_get_texts_embeddings for consistency
- Improve pylint comment placement for better readability

This improves the pylint score from 9.35/10 to 9.36/10
Fixed warnings:
- W0718 (broad-exception-caught): Added pylint disable comments for legitimate
  broad exception handling in error recovery paths where multiple exception
  types need to be caught
- R1711 (useless-return): Removed redundant return statement in utils.py
- R0912 (too-many-branches): Suppressed for apply_vector_engine_backend()
  which handles multiple vector DB backends
- E0401 (import-error): Suppressed for optional dependencies (pymilvus,
  qdrant_client) that may not be installed

These changes improve pylint score from 9.36/10 to 9.38/10 while maintaining
code quality and error handling robustness.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants