[Feature Request] Real-time Streaming Events During Context Reduction #1511

@tahitimoon

Description

Problem Statement

When using SummarizingConversationManager with stream_async(), there's no way to notify users in real-time when context compression is happening.

Current Behavior

# In agent.py (lines 716-726)
except ContextWindowOverflowException as e:
    self.conversation_manager.reduce_context(self, e=e)  # Blocks for 30+ seconds
    # ... retry event loop

The reduce_context() call is synchronous and blocking. During this time:

  • No events are yielded to the async iterator
  • The event loop is blocked
  • Users see a frozen UI with no feedback

Expected Behavior

Users should receive streaming events during context compression:

  • conversation_compacting - when compression starts
  • conversation_compacted - when compression completes

This allows frontend applications to show "Compacting conversation..." feedback to users during the 30+ second wait.
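A consumer-side sketch of what this could look like. Everything below is illustrative: `fake_stream()` stands in for `Agent.stream_async()`, and the event key names follow this proposal rather than any existing SDK API.

```python
import asyncio

# Stand-in for Agent.stream_async(): yields normal data chunks, plus the two
# proposed compaction events around a (simulated) context reduction.
async def fake_stream():
    yield {"data": "partial answer..."}
    yield {"conversation_compacting": True, "message_count": 47}
    await asyncio.sleep(0)  # the 30+ second summarization would happen here
    yield {"conversation_compacted": True, "message_count": 25}
    yield {"data": "final answer"}

async def consume():
    lines = []
    async for event in fake_stream():
        if "conversation_compacting" in event:
            lines.append("Compacting conversation...")
        elif "conversation_compacted" in event:
            lines.append(f"Compaction complete ({event['message_count']} messages)")
        else:
            lines.append(event["data"])
    return lines

print(asyncio.run(consume()))
```

With events interleaved into the stream, the frontend can render status lines in the same loop it already uses for token chunks.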

Proposed Solutions

Option 1: Hook-based Events (Minimal Change)

Add new hook events for context reduction:

# New hook events (sketch)
from dataclasses import dataclass

@dataclass
class BeforeContextReductionEvent:
    agent: Agent
    exception: ContextWindowOverflowException

@dataclass
class AfterContextReductionEvent:
    agent: Agent
    original_message_count: int
    compressed_message_count: int
    removed_count: int

This allows users to subscribe to these events via HookProvider.
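A minimal, self-contained sketch of the hook pattern. The `HookRegistry` class, its `add_callback`/`invoke` methods, and the trimmed event fields are all illustrative stand-ins, not strands' actual `HookProvider` API.

```python
from dataclasses import dataclass

# Simplified stand-ins for the proposed events (fields trimmed for brevity)
@dataclass
class BeforeContextReductionEvent:
    message_count: int

@dataclass
class AfterContextReductionEvent:
    original_message_count: int
    compressed_message_count: int

class HookRegistry:
    """Dispatches events to callbacks registered per event type."""
    def __init__(self):
        self._callbacks = {}

    def add_callback(self, event_type, fn):
        self._callbacks.setdefault(event_type, []).append(fn)

    def invoke(self, event):
        for fn in self._callbacks.get(type(event), []):
            fn(event)

# A subscriber that turns the events into user-facing status lines
registry = HookRegistry()
status = []
registry.add_callback(BeforeContextReductionEvent,
                      lambda e: status.append(f"Compacting {e.message_count} messages..."))
registry.add_callback(AfterContextReductionEvent,
                      lambda e: status.append(f"Compacted to {e.compressed_message_count} messages"))

registry.invoke(BeforeContextReductionEvent(message_count=47))
registry.invoke(AfterContextReductionEvent(original_message_count=47,
                                           compressed_message_count=25))
print(status)
```

Note that callbacks fired this way still run on the blocked thread; hooks solve *notification*, but pairing them with Option 3 would be needed to actually deliver the events mid-reduction.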

Option 2: Yield Events During Exception Handling

Modify _execute_event_loop_cycle to yield events during context reduction:

except ContextWindowOverflowException as e:
    # Yield event before reduction
    yield {"context_reduction": "starting", "message_count": len(self.messages)}
    
    self.conversation_manager.reduce_context(self, e=e)
    
    # Yield event after reduction
    yield {"context_reduction": "completed", "new_message_count": len(self.messages)}
    
    # Retry
    async for event in self._execute_event_loop_cycle(...):
        yield event
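
The yield-around-reduction flow above can be sketched as a self-contained async generator. Names and event shapes are illustrative, and `del messages[:-2]` stands in for the real `reduce_context()`:

```python
import asyncio

async def event_loop_cycle(messages):
    # Stand-in for catching ContextWindowOverflowException: treat more
    # than 3 messages as an overflow that needs reduction.
    if len(messages) > 3:
        yield {"context_reduction": "starting", "message_count": len(messages)}
        del messages[:-2]  # stand-in for reduce_context()
        yield {"context_reduction": "completed", "new_message_count": len(messages)}
        async for event in event_loop_cycle(messages):  # retry
            yield event
        return
    yield {"data": "model response"}

async def run():
    messages = ["m1", "m2", "m3", "m4", "m5"]
    return [event async for event in event_loop_cycle(messages)]

print(asyncio.run(run()))
```

The caller sees the "starting" and "completed" events in order, followed by the retried cycle's output, all through the one async iterator it was already consuming.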

Option 3: Async reduce_context

Make reduce_context async and run summarization in a thread pool:

async def reduce_context_async(self, agent: Agent, e: Exception | None = None):
    # get_running_loop() is preferred over the deprecated get_event_loop()
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, super().reduce_context, agent, e)

This would unblock the event loop, but requires significant API changes.
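A self-contained demonstration that `run_in_executor` keeps the loop responsive; the 0.2 s `time.sleep` stands in for the 30+ second summarization call, and the heartbeat task stands in for the events that should keep streaming.

```python
import asyncio
import time

def blocking_reduce():
    time.sleep(0.2)  # stand-in for the 30+ second summarization LLM call
    return "summary"

async def heartbeat(ticks):
    # Simulates streaming events that should keep flowing during reduction
    while True:
        ticks.append("tick")
        await asyncio.sleep(0.05)

async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the event loop keeps ticking
    result = await loop.run_in_executor(None, blocking_reduce)
    hb.cancel()
    return result, len(ticks)

result, tick_count = asyncio.run(main())
print(result, tick_count)  # ticks accumulated while the reduction ran
```

Because the heartbeat keeps firing while the reduction runs, intermediate events (like `conversation_compacting`) could be yielded on time instead of after the fact.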

Environment

  • Python 3.13
  • strands-agents SDK 1.22.0
  • strands-agents-tools 0.2.19

Related Code

class NotifyingSummarizingConversationManager(SummarizingConversationManager):
    """Custom manager that emits events - but they can't be consumed during blocking."""

    def __init__(self, event_queue: asyncio.Queue, **kwargs):
        super().__init__(**kwargs)
        self._event_queue = event_queue

    def reduce_context(self, agent, e=None, **kwargs):
        self._event_queue.put_nowait({"type": "conversation_compacting"})
        super().reduce_context(agent, e=e, **kwargs)  # Blocks 30s+
        self._event_queue.put_nowait({"type": "conversation_compacted"})
        # Problem: both events are only consumed AFTER the blocking call returns
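This failure mode can be reproduced without the SDK: an event queued with `put_nowait()` before a blocking call is only consumed after the call returns, because the consumer coroutine cannot be scheduled while the loop's thread is blocked. In the sketch below, the 0.2 s `time.sleep` stands in for the summarization.

```python
import asyncio
import time

async def demo():
    queue = asyncio.Queue()
    consumed = []  # (event, seconds since start when it was consumed)
    start = time.monotonic()

    async def consumer():
        while True:
            event = await queue.get()
            consumed.append((event, time.monotonic() - start))

    task = asyncio.create_task(consumer())
    await asyncio.sleep(0)  # let the consumer start waiting on the queue

    queue.put_nowait("conversation_compacting")
    time.sleep(0.2)  # blocking reduce_context(): the whole event loop is frozen
    queue.put_nowait("conversation_compacted")

    await asyncio.sleep(0.01)  # only now can the consumer run and drain the queue
    task.cancel()
    return consumed

events = asyncio.run(demo())
print(events)  # both events arrive only after the blocking call ends
```

Both events are timestamped after the 0.2 s block, which is exactly why the queue-based workaround cannot show "Compacting conversation..." while compaction is in progress.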

Use Case

Scenario: Financial Research Assistant with Long Conversations
We are building a financial research assistant that helps users analyze stocks, retrieve financial data, and generate investment insights. The conversations often become very long because:

  1. Multiple tool calls: Each query may trigger 5-10 tool calls (fetching income statements, analyst estimates, news, etc.)
  2. Rich context: Users ask follow-up questions that require understanding previous analysis
  3. Extended sessions: A single research session can last 20+ minutes with dozens of messages

The Problem:
When context overflow occurs and SummarizingConversationManager.reduce_context() is triggered:

  • The summarization process takes 30+ seconds (calling the LLM to generate a summary)
  • During this time, the UI appears frozen with no feedback
  • Users may think the application crashed and refresh the page

What We Need:
A way to emit streaming events during context reduction so we can:

  • Show "Compacting conversation..." message when reduction starts
  • Display "Compaction complete (47 → 25 messages)" when it finishes

Impact:
This would significantly improve UX for any application using SummarizingConversationManager with stream_async(), especially chatbot interfaces where users expect real-time feedback.


Metadata

Labels: enhancement (New feature or request)