
[Bug]: CondensePlusContextChatEngine.astream_chat silently aborts generation and yields 'Empty Response' when Retriever returns 0 nodes #20894

@jefersonlop3s

Bug Description

When using CondensePlusContextChatEngine with an asynchronous stream (astream_chat), if the provided Retriever (e.g., QueryFusionRetriever or VectorIndexRetriever) returns 0 nodes (which can happen frequently with valid metadata filters like tenant IDs), the Chat Engine silently aborts the LLM generation process.

Instead of passing the system prompt and the user query to the LLM with an empty context, the synthesizer (called via _arun_c3 -> asynthesize) receives an empty node list and skips the LLM API call entirely. It instantly returns a hardcoded "Empty Response" string wrapped in an AsyncStreamingResponse.
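The short-circuit can be illustrated with a minimal, hypothetical simplification of the synthesizer's guard (this is an illustration of the observed behavior, not the actual LlamaIndex source):

```python
def synthesize(nodes, query, call_llm):
    # With zero retrieved nodes, a canned string is returned and the
    # LLM call is never dispatched -- no exception, no log, no request.
    if not nodes:
        return "Empty Response"
    return call_llm(query, nodes)

# A retriever that matched 0 nodes short-circuits before the LLM:
print(synthesize([], "Hello, can you help me?", lambda q, n: "LLM answer"))
# -> Empty Response
```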

This is highly problematic for production RAG systems (such as multi-tenant architectures), where a user might ask a general question or have an empty vector space on their first day. Instead of the LLM answering naturally, leveraging its baseline knowledge and the system prompt, the application receives a silent "Empty Response" in under one second, with no exception raised, masking the failure.
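A possible application-side workaround is to probe the retriever first and fall back to a direct LLM stream when it returns 0 nodes. The sketch below uses stub classes in place of the real retriever and LLM so it runs standalone; `safe_chat` and both stubs are hypothetical names, not LlamaIndex APIs:

```python
import asyncio

class EmptyRetriever:
    """Stub standing in for a retriever whose filters match 0 nodes."""
    async def aretrieve(self, query):
        return []

class StubLLM:
    """Stub standing in for an LLM client exposing a token stream."""
    async def astream_chat(self, message):
        for tok in ["Hello! ", "Yes, I can help."]:
            yield tok

async def safe_chat(retriever, llm, message, engine_call=None):
    # Probe retrieval first; on 0 nodes, bypass the synthesizer (which
    # would short-circuit to "Empty Response") and stream from the LLM.
    nodes = await retriever.aretrieve(message)
    if not nodes:
        return "".join([tok async for tok in llm.astream_chat(message)])
    # Normal path would be: await engine_call(message)
    return await engine_call(message)

print(asyncio.run(safe_chat(EmptyRetriever(), StubLLM(), "Hello")))
# -> Hello! Yes, I can help.
```

This keeps the engine usable while the underlying short-circuit is unfixed, at the cost of one extra retrieval call per turn.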

llamaindex_bug_report.md

Version

Python 3.12.x
LlamaIndex v0.10.x
LLM Provider: Agnostic (tested with Ollama, but reproducible via OpenAI due to the synthesizer logic)

Steps to Reproduce

import asyncio
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.llms.ollama import Ollama # or OpenAI

async def main():
    # 1. Create an empty index (or one where filters will yield 0 nodes)
    index = VectorStoreIndex.from_documents([])
    retriever = index.as_retriever(similarity_top_k=2)

    # 2. Set up the LLM
    llm = Ollama(model="qwen2.5:0.5b")  # Ensure the model is running locally

    # 3. Build the CondensePlusContextChatEngine
    engine = CondensePlusContextChatEngine.from_defaults(
        retriever=retriever,
        llm=llm,
        system_prompt="You are a helpful AI."
    )

    # 4. Attempt to trigger the async stream
    print("Sending query...")
    chat_stream = await engine.astream_chat("Hello, can you help me?")

    full_text = ""
    async for token in chat_stream.async_response_gen():
        full_text += token

    print(f"Final Response: '{full_text}'")
    # Expected output: "Hello! Yes, I can help you..."
    # Actual output: "Empty Response" (instantly, no LLM call dispatched)

if __name__ == "__main__":
    asyncio.run(main())

Relevant Logs/Tracebacks

INFO:src.agent:📂 Initializing local search tool (RAG)...
INFO:src.agent:   ✓ RAG tool initialized successfully
INFO:src.engine_builder:⚙️  Configuring Hybrid Search (Vector + BM25)...
INFO:src.engine_builder:   🔍 Tenant Filter applied: Jeferson
WARNING:src.engine_builder:   ⚠️  No documents found in ChromaDB for BM25.
INFO:src.engine_builder:   🧠 Engine initialized via OLLAMA with model qwen2.5:0.5b

[STARTING ASTREAM_CHAT TRACE]

INFO:llama_index.core.chat_engine.condense_plus_context:Condensed question: Hello Sovereign! This is a test being sent directly from N8N via Webhook in the Cybrid network.

# Execution HALTS in less than 1.2s without dispatching any POST request to /api/chat

==== DEBUG RESPONSE ====
TYPE: <class 'llama_index.core.chat_engine.types.StreamingAgentChatResponse'>
FULL TEXT FINAL: 'Empty Response'
chat_stream.response: 'Empty Response'
