feat(llm):vector db #56

fantasy-lotus · 2025-10-27T09:50:57Z

No description provided.

github-actions · 2025-10-27T09:51:10Z

@codecov-ai-reviewer review

codecov-ai · 2025-10-27T09:51:16Z

On it! We are reviewing the PR and will provide feedback shortly.

gemini-code-assist · 2025-10-27T09:51:51Z

Summary of Changes

Hello @fantasy-lotus, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the hugegraph-llm project by integrating robust support for external vector databases, Milvus and Qdrant. The core vector indexing logic has been refactored into a modular design, allowing for easier addition of new vector store backends. This change introduces new configuration settings and updates the user interface to enable seamless selection and management of different vector engines, thereby improving the scalability and adaptability of the LLM application's vector search capabilities.

Highlights

Vector Database Integration: Introduced support for external vector databases, specifically Milvus and Qdrant, alongside the existing Faiss implementation, providing more flexibility for vector indexing.
Modular Vector Indexing: Refactored the vector index management by introducing a VectorStoreBase abstract class and separate implementations for Faiss, Milvus, and Qdrant, enhancing extensibility and maintainability.
Configuration and UI Updates: Added new configuration options for Milvus and Qdrant in config.md and index_config.py, and updated the Gradio UI (configs_block.py) to allow users to select and configure their preferred vector engine.
Embedding API Enhancements: Modified embedding models (LiteLLM, Ollama, OpenAI) to dynamically determine embedding dimensions and handle batch processing more efficiently, improving performance and robustness.
Build System and Linting: Updated pyproject.toml to include optional dependencies for vector databases and added configurations for mypy and ruff for improved code quality and consistency.
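To make the modular design described above more concrete, here is a minimal sketch of an abstract vector store with one toy backend. The class name VectorStoreBase and the method names add() and get_vector_index_info() appear in this PR; the signatures, the search() method, and the InMemoryVectorStore class are assumptions made purely for illustration and are not the PR's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class VectorStoreBase(ABC):
    """Interface each backend implements (signatures are assumed, not the PR's)."""

    @abstractmethod
    def add(self, vectors: List[List[float]], props: List[Any]) -> None:
        """Store vectors together with their associated properties."""

    @abstractmethod
    def search(self, query: List[float], top_k: int) -> List[Any]:
        """Return the properties of the top_k nearest vectors."""

    @abstractmethod
    def get_vector_index_info(self) -> Dict[str, Any]:
        """Report basic statistics about the index."""


class InMemoryVectorStore(VectorStoreBase):
    """Hypothetical toy backend using brute-force squared L2 distance."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: List[List[float]] = []
        self._props: List[Any] = []

    def add(self, vectors: List[List[float]], props: List[Any]) -> None:
        if len(vectors) != len(props):
            raise ValueError("vectors and props must have the same length")
        for vec in vectors:
            if len(vec) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(vec)}")
        self._vectors.extend(vectors)
        self._props.extend(props)

    def search(self, query: List[float], top_k: int) -> List[Any]:
        def squared_distance(vec: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, query))

        # Rank stored vectors by distance to the query and keep the closest ones.
        order = sorted(range(len(self._vectors)),
                       key=lambda i: squared_distance(self._vectors[i]))
        return [self._props[i] for i in order[:top_k]]

    def get_vector_index_info(self) -> Dict[str, Any]:
        return {"embed_dim": self.dim, "vector_count": len(self._vectors)}


store = InMemoryVectorStore(dim=2)
store.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]], ["a", "b", "c"])
nearest = store.search([0.9, 0.9], top_k=2)
print(nearest)  # ['b', 'a']
```

Callers that depend only on the base class can switch backends without changes, which is the extensibility benefit the summary refers to.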

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py

hugegraph-llm/src/hugegraph_llm/models/embeddings/litellm.py

hugegraph-llm/src/hugegraph_llm/nodes/index_node/semantic_id_query_node.py

hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py

hugegraph-llm/src/hugegraph_llm/operators/index_op/gremlin_example_index_query.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py

coderabbitai · 2025-10-27T09:52:48Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Copilot

Pull Request Overview

This PR implements vector database support, introducing Milvus and Qdrant as alternatives to the existing Faiss implementation. The changes establish a common interface for vector stores, add configuration management for vector database backends, and update embedding models to support dimension detection and batch processing.

Key changes:

Adds abstract VectorStoreBase class and implementations for Faiss, Milvus, and Qdrant
Introduces IndexConfig for managing vector database connection settings
Enhances embedding models with get_embedding_dim() method and improved batch processing
Updates operators and nodes to use the new vector store abstraction
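The dimension detection and batch processing mentioned in the key changes could look roughly like the sketch below. Only the method name get_embedding_dim() comes from this PR; the BatchedEmbedding wrapper, the embed_fn callable, the probe-and-cache strategy, and the fake provider are assumptions for illustration, not the PR's implementation.

```python
from typing import Callable, List, Optional

# Stand-in type for a provider call that embeds a batch of texts.
EmbedFn = Callable[[List[str]], List[List[float]]]


class BatchedEmbedding:
    """Hypothetical wrapper that splits requests and detects the vector size."""

    def __init__(self, embed_fn: EmbedFn, batch_size: int = 32) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._embed_fn = embed_fn
        self._batch_size = batch_size
        self._dim: Optional[int] = None

    def get_embedding_dim(self) -> int:
        # Probe the provider once with a short text and cache the result,
        # instead of hard-coding a dimension per model.
        if self._dim is None:
            self._dim = len(self._embed_fn(["probe"])[0])
        return self._dim

    def get_texts_embeddings(self, texts: List[str]) -> List[List[float]]:
        results: List[List[float]] = []
        # Send at most batch_size texts per provider call.
        for start in range(0, len(texts), self._batch_size):
            results.extend(self._embed_fn(texts[start:start + self._batch_size]))
        return results


calls: List[int] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Fake provider: records the batch size and returns 3-dimensional vectors."""
    calls.append(len(batch))
    return [[float(len(text)), 0.0, 1.0] for text in batch]


model = BatchedEmbedding(fake_embed, batch_size=2)
vectors = model.get_texts_embeddings(["a", "bb", "ccc", "dddd", "eeeee"])
dim = model.get_embedding_dim()
print(len(vectors), dim, calls)  # 5 3 [2, 2, 1, 1]
```

Detecting the dimension at runtime matters for backends that need the vector size when a collection is created.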

Reviewed Changes

Copilot reviewed 50 out of 51 changed files in this pull request and generated 8 comments.

File	Description
`hugegraph-llm/src/hugegraph_llm/indices/vector_index/base.py`	Defines abstract base class for vector stores
`hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py`	Refactored Faiss implementation conforming to base interface
`hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py`	New Milvus vector store implementation
`hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py`	New Qdrant vector store implementation
`hugegraph-llm/src/hugegraph_llm/config/index_config.py`	Configuration class for vector database settings
`hugegraph-llm/src/hugegraph_llm/models/embeddings/*.py`	Enhanced embedding classes with dimension detection and batch processing
`hugegraph-llm/src/hugegraph_llm/utils/vector_index_utils.py`	Utility functions for vector index selection and initialization
`hugegraph-llm/src/hugegraph_llm/operators/index_op/*.py`	Updated operators to use new vector store abstraction
`hugegraph-llm/pyproject.toml`	Added optional vectordb dependencies
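Because the vector database packages are optional dependencies, code that selects a backend needs to handle a missing package gracefully. The sketch below shows one way to check availability before importing. The package names pymilvus and qdrant_client are named elsewhere in this PR; the faiss import name, the require_engine() helper, and its packages parameter are assumptions for illustration and are not the PR's get_vector_index_class() implementation.

```python
import importlib.util
from typing import Dict, Optional

# Engine name -> top-level package the backend imports.
ENGINE_PACKAGES: Dict[str, str] = {
    "faiss": "faiss",
    "milvus": "pymilvus",
    "qdrant": "qdrant_client",
}


def require_engine(engine: str, packages: Optional[Dict[str, str]] = None) -> str:
    """Return the package backing `engine`, or raise an actionable error."""
    registry = ENGINE_PACKAGES if packages is None else packages
    if engine not in registry:
        raise ValueError(f"unknown vector engine {engine!r}; choose from {sorted(registry)}")
    package = registry[engine]
    # find_spec returns None when a top-level package is not installed,
    # so the check does not import the (possibly heavy) dependency itself.
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"vector engine {engine!r} needs the optional package {package!r}; "
            "install the optional vector database dependencies first"
        )
    return package


# Engines whose packages are importable in the current environment.
available = [name for name, pkg in ENGINE_PACKAGES.items()
             if importlib.util.find_spec(pkg) is not None]
print(available)
```

Checking up front lets a UI or configuration layer report a clear message instead of failing with a bare ImportError deep inside a backend module.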

hugegraph-llm/src/hugegraph_llm/indices/vector_index/base.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/app.py

gemini-code-assist

Code Review

This pull request introduces a significant and valuable refactoring to support multiple vector database backends (Faiss, Milvus, Qdrant), moving from a concrete Faiss implementation to a flexible VectorStoreBase abstraction. The changes are extensive, touching configuration, the Gradio UI, core operators, and embedding models. While the overall direction is excellent, I've identified a critical issue with ID generation in the Qdrant implementation that could lead to data loss, along with several high and medium-severity issues related to configuration safety, code clarity, and maintainability. Addressing these points will solidify this new abstraction and ensure its robustness.

hugegraph-llm/src/hugegraph_llm/indices/vector_index/qdrant_vector_store.py

hugegraph-llm/src/hugegraph_llm/config/index_config.py

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/admin_block.py

hugegraph-llm/src/hugegraph_llm/flows/get_graph_index_info.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/base.py

hugegraph-llm/src/hugegraph_llm/indices/vector_index/milvus_vector_store.py

hugegraph-llm/src/hugegraph_llm/operators/index_op/build_semantic_index.py

…tic_index.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…ctor_store.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Co-authored-by: Copilot <[email protected]>

…isions

The previous implementation used loop index (i) as point ID, which caused severe data loss when add() was called multiple times: IDs would restart from 0 and overwrite existing points.

Fixed by using uuid.uuid4() to generate unique IDs for each point across all add operations, ensuring data integrity.

This resolves a critical bug that could lead to inconsistent vector index and missing embeddings in Qdrant storage.
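The failure mode described in this commit message can be reproduced without a running Qdrant server. The sketch below uses a hypothetical FakeCollection class with upsert-by-ID semantics as a stand-in for a real collection; it is an illustration of the bug and the fix, not the PR's code and not the qdrant_client API.

```python
import uuid
from typing import Dict, List


class FakeCollection:
    """Hypothetical stand-in for a point store where upsert overwrites by ID."""

    def __init__(self) -> None:
        self.points: Dict[str, List[float]] = {}

    def upsert(self, point_id: object, vector: List[float]) -> None:
        self.points[str(point_id)] = vector


def add_with_loop_index(collection: FakeCollection, vectors: List[List[float]]) -> None:
    # Buggy variant: IDs restart at 0 on every call, so a later batch
    # overwrites points written by an earlier one.
    for i, vector in enumerate(vectors):
        collection.upsert(i, vector)


def add_with_uuid(collection: FakeCollection, vectors: List[List[float]]) -> None:
    # Fixed variant: a fresh random UUID per point never reuses an earlier ID.
    for vector in vectors:
        collection.upsert(uuid.uuid4(), vector)


buggy = FakeCollection()
add_with_loop_index(buggy, [[1.0], [2.0]])
add_with_loop_index(buggy, [[3.0]])  # overwrites the point stored under ID 0

fixed = FakeCollection()
add_with_uuid(fixed, [[1.0], [2.0]])
add_with_uuid(fixed, [[3.0]])

print(len(buggy.points), len(fixed.points))  # 2 3
```

One trade-off of random IDs is that adding the same item twice creates two points rather than updating one, so deduplication has to happen elsewhere if it is needed.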

… improvements

Resolved conflicts by prioritizing:
- Local vector index implementations (VectorStoreBase, multi-backend support)
- Local embedding improvements (batch size handling, dynamic dimension detection)
- External flow/node architecture from apache/main
- Kept mypy and ruff configurations from local branch

All previous fixes maintained:
- Qdrant UUID fix for point IDs
- Embedding batch size auto-splitting
- Vector config UI restoration
- ImportError fixes for new vector API

…ength, signature mismatch)

- Remove unused 'os' import from graph_index_utils.py
- Fix trailing whitespace in multiple files
- Break long lines to comply with 120 char limit
- Add batch_size parameter to OllamaEmbedding.async_get_texts_embeddings for consistency
- Improve pylint comment placement for better readability

This improves the pylint score from 9.35/10 to 9.36/10

Fixed warnings:
- W0718 (broad-exception-caught): Added pylint disable comments for legitimate broad exception handling in error recovery paths where multiple exception types need to be caught
- R1711 (useless-return): Removed redundant return statement in utils.py
- R0912 (too-many-branches): Suppressed for apply_vector_engine_backend() which handles multiple vector DB backends
- E0401 (import-error): Suppressed for optional dependencies (pymilvus, qdrant_client) that may not be installed

These changes improve pylint score from 9.36/10 to 9.38/10 while maintaining code quality and error handling robustness.

weijinglin and others added 30 commits September 16, 2025 14:54

Refactor: Refactor Scheduler to Support Dynamic Workflow Scheduling a…

dae3e24

…nd Pipeline Pooling (apache#48)

Refactor: Refactor hugegraph-ai to using CGraph & port some usecases …

41aeae5

…in web demo (apache#49)

Refactor: text2germlin with PCgraph framework (apache#50)

78011d3

Co-authored-by: Copilot <[email protected]> Co-authored-by: Linyu <[email protected]>

Refactor RAG Workflow: Modularize Flows, Add Streaming, and Improve N…

85e1296

…ode Initialization (apache#51)

feat(llm): index curd test passed

591a0d1

feat(llm): some type bug && revert to FaissVectorIndex

0692bca

feat(llm): some type bug

70da993

feat(llm): some type bug(from mypy)

68fd974

feat(llm): add License header

997d2e2

feat(llm): import sort && change name

58777ff

feat(llm): vector db finished

1a14a4b

feat(llm): updata llm

9f7d64f

feat(llm): nexpected-keyword-arg,unused-import

f470605

feat(llm): fit unitest

fce80a9

feat(llm): use lambda

f3a8a26

style: format code with black line-length 120

77dd386

fix(security): add URL validation to avoid potential SSRF in test_api…

8f3ba72

…_connection

small fix

f90c1d5

fix url

3561876

fix

a1c128e

fix

7eedb18

chore: mark vectordb optional

68e06fc

fix cycle import & add docs

6d088f7

fix

12bf415

fix

6d7c9ed

fix

f6fa0b7

fix schema g & prompt g

6daf82c

fix black

17c72bc

fix

08f6857

fix: resolve leftover conflict markers and deps in hugegraph-llm/pypr…

b78b051

…oject.toml

fantasy-lotus added 3 commits October 27, 2025 16:43

fix: update GetGraphIndexInfoFlow to use new vector index API

4ba51ca

- Replace VectorIndex.from_index_file() with VectorStoreBase.from_name()
- Use get_vector_index_class() factory function
- Use get_vector_index_info() method for consistent interface
- Fix ImportError after merge

fix: update graph_index_utils to use new vector index API

66b0961

- Remove VectorIndex import and usage
- Update clean_all_graph_index() to use get_vector_index_class()
- Use VectorStoreBase.clean() method with graph_name parameter
- Fix remaining ImportError after merge

fix conflicts

9690da7

Copilot AI review requested due to automatic review settings October 27, 2025 09:50

github-actions bot added llm ml python-client labels Oct 27, 2025