Context chunking in an Agent system #10039
Replies: 6 comments
The issue you're hitting is a common pattern: accumulating all context before analysis defeats the purpose of chunking.

**Solution: State-Based Incremental Processing**

Instead of letting the pipeline accumulate chunks, use shared state to process them incrementally:

```python
state = {
    "invoices_to_process": ["inv_001", "inv_002", ...],
    "partial_results": [],
    "current_index": 0,
}

def process_next_chunk():
    idx = state["current_index"]
    if idx >= len(state["invoices_to_process"]):
        return aggregate_results(state["partial_results"])

    # Process a single chunk with a small context window
    chunk = load_invoice(state["invoices_to_process"][idx])
    result = analyze_chunk(chunk)

    # Store the partial result and advance
    state["partial_results"].append(result)
    state["current_index"] += 1
    return process_next_chunk()  # or return control to the orchestrator
```

Why this works: each LLM call only ever sees one chunk, and the loop state lives in code rather than in the model's conversation context.

**Alternative: MapReduce Pattern**

```python
# Map phase: independent partial analyses, one small context each
partial_results = [analyze_chunk(chunk) for chunk in chunks]

# Reduce phase: aggregate the structured results
final = aggregate(partial_results)  # works on JSON, not on model context
```

The key is keeping each LLM call focused on a small context, and handling aggregation in code (not in the model's context window). More on state-based agent patterns: https://github.com/KeepALifeUS/autonomous-agents
Great question — this is a classic Map-Reduce problem in agent-based RAG systems. The core issue is that Haystack's agent loop accumulates all tool outputs into the conversation context before making the next LLM call, so your chunking strategy gets negated. Two approaches that work well in practice:

**1. Tool-Level Aggregation** (recommended for your invoice use case)

Instead of returning raw invoice data from your tool, have the tool itself perform the partial analysis and return only the structured summary:

```python
@component
class InvoiceAnalyzerTool:
    """Analyzes a chunk of invoices and returns structured findings."""

    @component.output_types(analysis=str)
    def run(self, invoice_ids: List[str], query: str):
        # Load only this chunk
        invoices = self.load_invoices(invoice_ids)

        # LLM call with SMALL context — just this chunk
        partial = self.llm.run(
            prompt=f"Analyze these {len(invoices)} invoices for: {query}\n"
                   f"Return ONLY key findings as bullet points.\n{invoices}"
        )
        return {"analysis": partial}  # compact summary, not raw data
```

The agent then sees N small summaries instead of N full invoice payloads. The final LLM call aggregates findings from structured summaries rather than raw documents.

**2. Pipeline-as-Tool Pattern**

Wrap an entire Map-Reduce pipeline as a single agent tool. The agent calls one tool, but internally it fans out:

```python
from haystack import Pipeline

class MapReduceInvoiceTool:
    def __init__(self):
        self.map_pipeline = Pipeline()     # processes individual chunks
        self.reduce_pipeline = Pipeline()  # aggregates partial results

    def run(self, invoice_ids: List[str], query: str):
        chunks = [invoice_ids[i:i + 10] for i in range(0, len(invoice_ids), 10)]

        # Map phase — each chunk gets its own small-context LLM call
        partial_results = []
        for chunk in chunks:
            result = self.map_pipeline.run({"invoices": chunk, "query": query})
            partial_results.append(result["summary"])

        # Reduce phase — aggregate the compact summaries
        final = self.reduce_pipeline.run({"partials": partial_results, "query": query})
        return final["answer"]
```

Why this fixes your problem: the agent's context window only ever sees the final aggregated answer (or at most, the list of compact partial summaries). The heavy lifting happens inside the tool, not in the agent's conversation loop.

Bonus tip for invoice analysis specifically: consider extracting structured fields (amount, date, vendor, status) into a DataFrame first, then only sending the LLM the rows that actually match the user's query. For questions like "total outstanding from vendor X," you often don't need LLM analysis at all — just pandas aggregation, with the LLM formatting the answer.

Happy to share more implementation details if you're using Haystack 2.x pipelines for this!
I'm building an Agent system for generating analyses of accounts receivable. I have a database of invoices, and for every question the user asks, the Agent queries the relevant invoices, analyzes them, and then answers the question.

I figured that if a query returned a large number of invoices, the analysis would be compromised by the large context. Does this make sense? Based on that assumption, I created a tool that first gets the entire list of invoices and splits it into ranges of N invoices (chunks), so that each chunk is sent to another tool that returns the corresponding invoices. The idea is to run partial analyses on these invoice subsets and, at the end, concatenate each partial analysis into one final analysis.

The problem is, when I looked at the logs, I realized the current behavior is not chunk generation -> partial analysis -> chunk generation -> partial analysis, and so on. Instead, all chunks are generated and concatenated into the Agent's LLM context, and only then is a full analysis made, which defeats the entire idea, since the context is still large. What would be a solution to this?