
chore: support multiple source search in chat #7906

Merged
appflowy merged 2 commits into main from chat_embed_fallback_search on May 10, 2025
Conversation


@appflowy appflowy commented May 10, 2025

Support keyword search when chatting with a page

Summary by Sourcery

Implement multiple source search support for chat functionality, allowing retrieval of documents from different sources with configurable search parameters

New Features:

  • Support keyword search across multiple vector stores
  • Introduce configurable retriever with multiple source support
  • Add flexible document retrieval with score thresholding

Enhancements:

  • Refactor retriever architecture to support multiple sources
  • Improve document search flexibility with configurable parameters
  • Add support for retrieving documents from different storage backends

Chores:

  • Restructure chat and retriever modules
  • Update interfaces for document retrieval
  • Modify existing search implementations


sourcery-ai bot commented May 10, 2025

Reviewer's Guide

This pull request implements support for keyword search from multiple document sources within the chat feature. It introduces a MultipleSourceRetrieverStore trait, which is implemented by the existing SqliteVectorStore (after refactoring its search logic) and a new Tantivy-based store (MultiSourceVSTanvityImpl). A new MultipleSourceRetriever queries these stores concurrently via an AFRetriever trait (refactored from a struct) and aggregates their results. Core chat components like LLMChatController, LLMChat, and ConversationalRetrieverChain have been updated to utilize this new extensible retrieval architecture. Additionally, Tantivy search capabilities were augmented with limit and score_threshold parameters to align with the retriever's requirements.
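The core idea of the guide above — query several stores behind one trait and merge whatever each source returns — can be sketched in simplified form. This is an illustrative, synchronous stand-in, not the actual AppFlowy code: the real MultipleSourceRetrieverStore trait is async and takes additional parameters (workspace_id, rag_ids, full_search), and the names Document and RetrieverStore below are placeholders for the real types.

```rust
// Simplified sketch of multi-source retrieval: each store implements a common
// trait, and the retriever queries all of them, merging successful results.
#[derive(Debug, Clone, PartialEq)]
struct Document {
    content: String,
    score: f32,
}

// Stand-in for the PR's MultipleSourceRetrieverStore trait (real one is async).
trait RetrieverStore {
    fn retriever_name(&self) -> &'static str;
    fn read_documents(
        &self,
        query: &str,
        limit: usize,
        score_threshold: f32,
    ) -> Result<Vec<Document>, String>;
}

// Query every store; a failing source contributes nothing instead of
// failing the whole retrieval, mirroring the aggregation described above.
fn retrieve_from_all(
    stores: &[Box<dyn RetrieverStore>],
    query: &str,
    limit: usize,
    score_threshold: f32,
) -> Vec<Document> {
    stores
        .iter()
        .flat_map(|store| {
            store
                .read_documents(query, limit, score_threshold)
                .unwrap_or_default()
        })
        .collect()
}
```

In the PR itself the stores are queried concurrently with futures (see the join_all discussion further down); the sequential loop here only shows the trait-based fan-out and merge.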

File-Level Changes

Change Details Files
Established a new framework for multi-source document retrieval in chat.
  • Defined the MultipleSourceRetrieverStore trait to abstract document sources.
  • Implemented MultipleSourceRetriever for concurrent querying of multiple stores.
  • Refactored AFRetriever from a concrete struct to a trait for retriever abstraction.
frontend/rust-lib/flowy-ai/src/local_ai/chat/retriever/mod.rs
frontend/rust-lib/flowy-ai/src/local_ai/chat/retriever/multi_source_retriever.rs
frontend/rust-lib/flowy-ai/src/local_ai/chat/chains/conversation_chain.rs
Integrated SQLite and Tantivy as document sources into the new multi-source retrieval framework.
  • Adapted SqliteVectorStore to implement MultipleSourceRetrieverStore, extracting its document reading logic.
  • Introduced MultiSourceVSTanvityImpl as a Tantivy-based implementation of MultipleSourceRetrieverStore.
  • Added SqliteVecRetriever implementing the AFRetriever trait for direct SQLite vector store usage.
frontend/rust-lib/flowy-ai/src/embeddings/store.rs
frontend/rust-lib/flowy-core/src/deps_resolve/chat_deps.rs
frontend/rust-lib/flowy-ai/src/local_ai/chat/retriever/sqlite_retriever.rs
Updated core chat components to utilize the new multi-source retrieval system.
  • LLMChatController now manages and provides MultipleSourceRetrieverStore instances.
  • LLMChat is initialized with and uses multiple retriever sources.
  • ConversationalRetrieverChain now relies on the AFRetriever trait.
  • The create_retriever factory function was modified to instantiate MultipleSourceRetriever.
frontend/rust-lib/flowy-ai/src/local_ai/chat/llm_chat.rs
frontend/rust-lib/flowy-ai/src/local_ai/chat/mod.rs
frontend/rust-lib/flowy-ai/src/local_ai/chat/chains/conversation_chain.rs
frontend/rust-lib/flowy-core/src/server_layer.rs
frontend/rust-lib/flowy-ai/tests/main.rs
Enhanced Tantivy search functionality with result limiting and scoring.
  • Tantivy search functions (DocumentTantivyState::search, tanvity_local_search) now accept limit and score_threshold parameters.
  • Tantivy search result items (TanvitySearchResponseItem) now include a score field.
frontend/rust-lib/flowy-search-pub/src/tantivy_state.rs
frontend/rust-lib/flowy-server/src/util.rs
frontend/rust-lib/flowy-search-pub/src/entities.rs
frontend/rust-lib/flowy-search/src/document/local_search_handler.rs
frontend/rust-lib/flowy-server/src/af_cloud/impls/search.rs
frontend/rust-lib/flowy-server/src/local_server/impls/search.rs
Adjusted the chat's Question-Answering prompt.
  • Removed the instruction for the LLM to respond "I don't know" if context is insufficient.
frontend/rust-lib/flowy-ai/src/local_ai/chat/format_prompt.rs
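The new limit and score_threshold parameters on the Tantivy search path gate results after scoring. A minimal sketch of that gating, under the assumption that hits arrive as (score, doc_id) pairs — the function name and shape here are illustrative, not the actual DocumentTantivyState::search code:

```rust
// Illustrative post-search filtering with `limit` and `score_threshold`:
// sort hits by score (highest first), drop anything below the threshold,
// then keep at most `limit` results.
fn apply_limit_and_threshold(
    mut hits: Vec<(f32, String)>, // hypothetical (score, doc_id) pairs
    limit: usize,
    score_threshold: f32,
) -> Vec<(f32, String)> {
    hits.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
    hits.retain(|(score, _)| *score >= score_threshold);
    hits.truncate(limit);
    hits
}
```

Exposing the score on each result item (the new score field on TanvitySearchResponseItem) is what makes this kind of thresholding possible on the caller's side.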


@sourcery-ai sourcery-ai bot left a comment


Hey @appflowy - I've reviewed your changes - here's some feedback:

  • The _full_search parameter in MultipleSourceRetrieverStore::read_documents is not utilized by its current implementations; ensure its intended effect is realized or remove it if redundant.
  • The modification to QA_CONTEXT_TEMPLATE removing the specific 'I don't know' instruction may alter the LLM's response behavior when context is insufficient.
  • Consider making the new search parameters (limit, score_threshold) configurable rather than hardcoded in general search handlers and chat retriever initialization.
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good


Comment on lines +120 to +121
debug_assert!(embedding.len() == 1);
let query_embedding = embedding.first().unwrap();

suggestion (bug_risk): Reconsider relying solely on debug_assert for a critical unwrap.

debug_assert is omitted in release builds, so unwrap can still panic. Add a runtime check or use a safe pattern (e.g., if let) to guarantee query_embedding is always valid.

Suggested implementation:

    // Safely extract the first embedding; if unavailable, return early.
    let query_embedding = if let Some(query_embedding) = embedding.first() {
        query_embedding
    } else {
        return Ok(Vec::new());
    };

Depending on your application's requirements, you might want to log or handle the case where more than one embedding is returned. If exactly one embedding is always expected, consider adding an explicit runtime check for that case.

}

async fn create_chat_if_not_exist(&self, info: LLMChatInfo) -> FlowyResult<()> {
debug_assert!(!self.retriever_sources.read().await.is_empty());

suggestion (bug_risk): Consider handling empty retriever_sources more gracefully.

Instead of relying on debug_assert, implement error handling or a fallback when retriever_sources is empty to avoid runtime issues in production.

Suggested change (replace the debug_assert with an explicit runtime check):

    debug_assert!(!self.retriever_sources.read().await.is_empty());

becomes:

    {
      let retriever_sources_guard = self.retriever_sources.read().await;
      if retriever_sources_guard.is_empty() {
        return Err(FlowyError::local_ai().with_context("Retriever sources are empty"));
      }
    }

docs.len(),
);
results.extend(docs);
} else {

suggestion: Improve error handling in document retrieval aggregation.

Use pattern matching on the Err variant instead of unwrap_err to make the error handling more explicit.
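As a sketch of what this suggestion means, here is a small stand-alone helper (the function name and String-based types are hypothetical, not from the PR) that matches on the Result directly rather than checking is_ok and calling unwrap_err:

```rust
// Illustrative only: matching on the Result makes both branches explicit,
// instead of an `if result.is_ok()` check followed by `unwrap_err()`.
fn merge_result(
    results: &mut Vec<String>,
    res: Result<Vec<String>, String>,
) -> Option<String> {
    match res {
        Ok(docs) => {
            // Successful source: fold its documents into the aggregate.
            results.extend(docs);
            None
        }
        // Failing source: surface the error without touching the aggregate.
        Err(e) => Some(format!("retrieval failed: {e}")),
    }
}
```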

}
}

#[async_trait]

issue (complexity): Consider extracting the embedding generation, similarity search, and result conversion logic into a shared helper function to reduce code duplication.

Consider extracting the code for embedding generation, similarity search, and converting results to documents into a shared helper function. This reduces duplicated logic between MultipleSourceRetrieverStore and the VectorStore implementation. For example, you could add a method to SqliteVectorStore like:

impl SqliteVectorStore {
    async fn search_and_convert(
        &self,
        workspace_id: &str,
        query: &str,
        rag_ids: &[String],
        limit: i32,
        score_threshold: f32,
    ) -> FlowyResult<Vec<Document>> {
        let vector_db = self.vector_db.upgrade()
            .ok_or_else(|| FlowyError::internal().with_context("Vector database not initialized"))?;

        let embedder = self.create_embedder()?;
        let request = GenerateEmbeddingsRequest::new(
            embedder.model().name().to_string(),
            EmbeddingsInput::Single(query.to_string()),
        );

        let embedding = embedder.embed(request).await?.embeddings;
        if embedding.is_empty() {
            return Ok(Vec::new());
        }
        let query_embedding = embedding.first().unwrap();

        let results = vector_db.search_with_score(
            workspace_id,
            rag_ids,
            query_embedding,
            limit,
            score_threshold,
        ).await?;

        let documents = results.into_iter().map(|result| {
            let mut metadata = HashMap::new();
            if let Some(map) = result.metadata.as_ref().and_then(|v| v.as_object()) {
                for (key, value) in map {
                    metadata.insert(key.clone(), value.clone());
                }
            }
            Document::new(result.content).with_metadata(metadata)
        }).collect();

        Ok(documents)
    }
}

Then update your trait implementations to call this helper. For example, in MultipleSourceRetrieverStore:

#[async_trait]
impl MultipleSourceRetrieverStore for SqliteVectorStore {
    async fn read_documents(
        &self,
        workspace_id: &Uuid,
        query: &str,
        limit: usize,
        rag_ids: &[String],
        score_threshold: f32,
        _full_search: bool,
    ) -> FlowyResult<Vec<Document>> {
        self.search_and_convert(
            &workspace_id.to_string(),
            query,
            rag_ids,
            limit as i32,
            score_threshold,
        ).await
    }

    fn retriever_name(&self) -> &'static str {
        "Sqlite Multiple Source Retriever"
    }
}

And similarly in your VectorStore implementation, delegate to the helper function. This keeps functionality intact while reducing duplicate code and complexity.

})
.collect::<Vec<_>>();

let search_results = join_all(search_futures).await;

issue (complexity): Consider using iterator combinators like flat_map to simplify the error handling and document merging after join_all.

Consider refactoring the manual loop with iterator combinators to reduce verbosity. For example, after using join_all you can chain an iterator to filter out errors and flatten documents. This change keeps the functionality intact while making the code more concise:

async fn retrieve_documents(&self, query: &str) -> Result<Vec<Document>, Box<dyn Error>> {
    trace!(
      "[VectorStore] filters: {:?}, retrieving documents for query: {}",
      self.rag_ids,
      query,
    );

    let search_futures = self.vector_stores.iter().map(|vector_store| {
        let vector_store = vector_store.clone();
        let query = query.to_string();
        let num_docs = self.num_docs;
        let full_search = self.full_search;
        let rag_ids = self.rag_ids.clone();
        let workspace_id = self.workspace_id;
        let score_threshold = self.score_threshold;

        async move {
            vector_store
                .read_documents(&workspace_id, &query, num_docs, &rag_ids, score_threshold, full_search)
                .await
                .map(|docs| (vector_store.retriever_name(), docs))
        }
    });

    let results: Vec<Document> = join_all(search_futures)
        .await
        .into_iter()
        .flat_map(|res| match res {
            Ok((retriever_name, docs)) => {
                trace!("[VectorStore] {} found {} results", retriever_name, docs.len());
                docs
            },
            Err(e) => {
                error!("[VectorStore] Failed to retrieve documents: {}", e);
                vec![]
            }
        })
        .collect();

    Ok(results)
}

This approach leverages flat_map to filter out errors and merge the documents into one vector, reducing the explicit loop and improving readability.

@appflowy appflowy merged commit b4d48a6 into main on May 10, 2025
20 checks passed
@appflowy appflowy deleted the chat_embed_fallback_search branch on May 10, 2025 at 07:28