
refactor: Migrate from programmatic to streaming subscriptions #349

Closed
SoulSniper-V2 wants to merge 12 commits into dapr:main from SoulSniper-V2:fix/migrate-programmatic-subscriptions


@SoulSniper-V2
Contributor

@SoulSniper-V2 SoulSniper-V2 commented Jan 7, 2026

Description

Migrates message subscriptions from the legacy subscribe_with_handler (programmatic) to dapr_client.subscribe (streaming).

This fixes #348 by using persistent gRPC connections, which removes the requirement for exposing app ports and improves reliability.

Issue reference

Closes: #348

Checklist

  • Created/updated tests
  • Tested this change against all the quickstarts
  • Extended the documentation

Implementation

  • Replaced subscribe_with_handler with dapr_client.subscribe and a background consumer thread.
  • Added graceful shutdown logic to join threads on exit.
  • Added error logging for failed retries.
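The background-consumer and graceful-shutdown steps above can be sketched with the standard library alone. This is a minimal sketch and not this PR's code. `FakeSubscription`, `start_consumer` and `stop_consumer` are hypothetical names. The fake assumes a subscription that yields messages when iterated and stops iterating once `close()` is called. The `RuntimeError` on a join timeout mirrors the "zombie thread detection" item from a later commit in this PR.

```python
import logging
import queue
import threading
from typing import Callable, Iterator, Optional

logger = logging.getLogger(__name__)


class FakeSubscription:
    """Hypothetical stand-in for a streaming subscription (not the Dapr SDK class)."""

    def __init__(self) -> None:
        self._messages: "queue.Queue[Optional[str]]" = queue.Queue()

    def push(self, message: str) -> None:
        self._messages.put(message)

    def close(self) -> None:
        self._messages.put(None)  # sentinel that ends iteration

    def __iter__(self) -> Iterator[str]:
        while True:
            message = self._messages.get()  # blocks until a message or close()
            if message is None:
                return
            yield message


def start_consumer(sub: FakeSubscription, handler: Callable[[str], None]) -> threading.Thread:
    """Drain `sub` on a daemon thread; handler errors are logged, not raised."""

    def run() -> None:
        for message in sub:
            try:
                handler(message)
            except Exception:
                logger.exception(f"Handler failed for message {message!r}")

    thread = threading.Thread(target=run, name="pubsub-consumer", daemon=True)
    thread.start()
    return thread


def stop_consumer(sub, thread: threading.Thread, timeout: float = 5.0) -> None:
    """Close the stream, then join; a thread still alive after `timeout` is a zombie."""
    sub.close()
    thread.join(timeout)
    if thread.is_alive():
        raise RuntimeError(f"{thread.name} did not stop within {timeout}s")


# Usage: one message in, then a graceful shutdown.
received = []
sub = FakeSubscription()
t = start_consumer(sub, received.append)
sub.push("hello")
stop_consumer(sub, t)
print(received)  # ['hello']
```

Closing the stream before joining matters. A consumer blocked waiting for the next message would otherwise never observe a shutdown flag.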

Verification

  • Unit Tests: Updated tests/workflow/test_message_router.py to mock streaming subscriptions. All 77 unit tests passed.
  • Static Analysis: ruff passed.

@SoulSniper-V2 SoulSniper-V2 requested review from a team as code owners January 7, 2026 15:06
@SoulSniper-V2 SoulSniper-V2 force-pushed the fix/migrate-programmatic-subscriptions branch from 37c2439 to 46666cb Compare January 7, 2026 15:10
Collaborator

@sicoyle sicoyle left a comment


Thank you so much for your interest in contributing and for adding tests—this is great 🙌

I have a few comments so far. Also, would you mind splitting the streaming logic out of registration.py into separate functions, or possibly a separate file (e.g., subscription.py)? With these great additions, that file is starting to get a bit heavy, and this would help keep things more maintainable.

Did you test these changes against some of the quickstarts/examples by chance?

@SoulSniper-V2
Contributor Author

SoulSniper-V2 commented Jan 8, 2026

Thanks for the review @sicoyle! 🙌

I moved the streaming logic into subscription.py like you suggested - definitely makes registration.py much cleaner. Also went ahead and removed the legacy code path and addressed the variable naming/logging nits.

Ran the message router tests locally and everything looks good.

@SoulSniper-V2
Contributor Author

Hi @sicoyle, just checking in on this. I have addressed the feedback regarding the refactor into subscription.py and the cleanup of the legacy paths. Let me know if there’s anything else needed to move this toward a final review or to start the CI builds.

@sicoyle
Collaborator

sicoyle commented Jan 27, 2026

> Hi @sicoyle, just checking in on this. I have addressed the feedback regarding the refactor into subscription.py and the cleanup of the legacy paths. Let me know if there’s anything else needed to move this toward a final review or to start the CI builds

Thank you! Could you please correct the lint/build failures? I just need to test these changes locally before merge (since I don't have automation yet running on PRs 🙃 )

Collaborator

@sicoyle sicoyle left a comment


Would you mind splitting up subscribe_message_bindings a bit? At almost 400 lines with several nested functions, it’s getting a little hard to follow and review. Breaking it into smaller pieces would make it much easier to maintain and reason about 🙏

@sicoyle sicoyle changed the title refactor: Migrate from programmatic to streaming pub/sub subscriptions refactor: Migrate from programmatic to streamin subscriptions Jan 27, 2026
@sicoyle sicoyle changed the title refactor: Migrate from programmatic to streamin subscriptions refactor: Migrate from programmatic to streaming subscriptions Jan 27, 2026
@SoulSniper-V2
Contributor Author

Addressed all feedback. Split the function into smaller helpers, added constants, fixed exception handling, and renamed variables for clarity. Tests pass.

Arush Wadhawan and others added 5 commits January 27, 2026 21:24
Signed-off-by: Arush Wadhawan <soulsniper@Arushs-MacBook-Air.local>
Signed-off-by: Arush Wadhawan <soulsniper@Arushs-MacBook-Air.local>
Signed-off-by: Arush Wadhawan <soulsniper@Arushs-MacBook-Air.local>
Signed-off-by: Arush Wadhawan <warush23+github@gmail.com>
- Split subscribe_message_bindings into focused helper functions
- Add module-level constants for delivery modes and status codes
- Use WorkflowStatus enum instead of magic strings
- Add proper validation for dead_letter_topics (ValueError if multiple)
- Fix _resolve_event_loop to properly raise on missing loop
- Add zombie thread detection (raise RuntimeError on timeout)
- Move json import to top level
- Rename variables for clarity:
  - grouped -> bindings_by_topic_key
  - b -> binding
  - plan -> binding_schema_pairs
  - preferred -> matching_ce_type_pairs
- Replace logger.info with logger.debug for internal operations
- Restore underscore prefixes for internal functions
- Remove nested try/except where possible
- Add proper docstrings explaining function purposes

Signed-off-by: Arush Wadhawan <warush23+github@gmail.com>
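The validation items in this commit could look roughly like the sketch below. The constant values, the `Binding` shape and the `(pubsub_name, topic)` grouping key are assumptions for illustration. The PR's actual `_validate_delivery_mode` and `_validate_dead_letter_topics` may differ. The dead-letter rule presumably exists because each per-topic streaming subscription carries a single dead-letter topic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical module-level constants replacing magic strings.
DELIVERY_MODE_SYNC = "sync"
DELIVERY_MODE_ASYNC = "async"
VALID_DELIVERY_MODES = (DELIVERY_MODE_SYNC, DELIVERY_MODE_ASYNC)


@dataclass(frozen=True)
class Binding:
    """Hypothetical binding: one handler routed from a pub/sub topic."""

    pubsub_name: str
    topic: str
    name: str
    dead_letter_topic: Optional[str] = None


def validate_delivery_mode(delivery_mode: str) -> None:
    if delivery_mode not in VALID_DELIVERY_MODES:
        raise ValueError(
            f"delivery_mode must be one of {VALID_DELIVERY_MODES}, got {delivery_mode!r}"
        )


def validate_dead_letter_topics(bindings: Iterable[Binding]) -> None:
    """Assumption: one stream per (pubsub, topic) supports a single dead-letter topic."""
    bindings_by_topic_key: Dict[Tuple[str, str], List[Binding]] = defaultdict(list)
    for binding in bindings:
        bindings_by_topic_key[(binding.pubsub_name, binding.topic)].append(binding)
    for (pubsub_name, topic), group in bindings_by_topic_key.items():
        dead_letters = {b.dead_letter_topic for b in group if b.dead_letter_topic}
        if len(dead_letters) > 1:
            raise ValueError(
                f"Multiple dead_letter_topics for {pubsub_name}/{topic}: {sorted(dead_letters)}"
            )
```

Failing fast with `ValueError` at registration time surfaces a misconfiguration before any consumer thread starts. Otherwise one dead-letter topic would be chosen silently.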
@SoulSniper-V2 SoulSniper-V2 force-pushed the fix/migrate-programmatic-subscriptions branch from 478053b to 9143489 Compare January 28, 2026 02:26
Contributor

@CasperGN CasperGN left a comment


Thank you for this!

Left an ask to use f-strings consistently rather than %s.

    else:
        setattr(parsed, METADATA_KEY, metadata)
except Exception:
    logger.debug("Could not attach %s to payload; continuing.", METADATA_KEY)
Contributor


Can you convert all these strings (including the error message ones further down below) to f-strings so we're consistent?

There's a mix in this file as well.
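For context on the trade-off: with `%s`-style arguments, the `logging` module defers interpolation until a record is actually emitted. An f-string is formatted eagerly, even when the level is disabled. The sketch below shows that both styles render the same text and that only the f-string pays the formatting cost for a disabled level. The logger name and the `METADATA_KEY` value are illustrative, not the PR's.

```python
import logging

METADATA_KEY = "_message_metadata"  # illustrative value, not the PR's constant


class ListHandler(logging.Handler):
    """Collects rendered messages so the two styles can be compared."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class Expensive:
    """Counts how often it is converted to a string."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive"


logger = logging.getLogger("fstring_demo")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.DEBUG)
# %-style: interpolated only when the record is emitted.
logger.debug("Could not attach %s to payload; continuing.", METADATA_KEY)
# f-string style, as requested in review: interpolated before the call.
logger.debug(f"Could not attach {METADATA_KEY} to payload; continuing.")

logger.setLevel(logging.INFO)  # DEBUG is now disabled
logger.debug("value: %s", Expensive())  # __str__ never runs
logger.debug(f"value: {Expensive()}")  # __str__ runs anyway

print(handler.messages[0] == handler.messages[1], Expensive.calls)  # True 1
```

For cheap debug messages like the ones in this file, the difference is negligible. Consistency, which is what the review asks for, is the stronger argument.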

Signed-off-by: Arush Wadhawan <warush23+github@gmail.com>
@SoulSniper-V2
Contributor Author

Converted all logging to f-strings.

@CasperGN
Contributor

@sicoyle this lgtm now.

Thank you @SoulSniper-V2!

Copilot AI review requested due to automatic review settings February 26, 2026 23:26
@github-actions

Metadata Schema Compatibility Report

Breaking Metadata Schema Changes

  • Removed property agent_metadata from (root)
  • Removed property max_iterations from (root)
  • Removed property schema_version from (root)
  • Removed property tool_choice from (root)
  • Removed property statestore from AgentMetadata
  • Removed property component_name from LLMMetadata
  • Removed property statestore from MemoryMetadata
  • Removed property type from MemoryMetadata
  • Removed property name from PubSubMetadata
  • Removed property statestore from RegistryMetadata
  • Removed property tool_args from ToolMetadata
  • Removed property tool_description from ToolMetadata
  • Removed property tool_name from ToolMetadata
  • New required field version in (root) (did not exist in previous version)
  • New required field type in MemoryStoreMetadata (did not exist in previous version)
  • New required field resource_name in PubSubMetadata (did not exist in previous version)
  • New required field args in ToolMetadata (did not exist in previous version)
  • New required field description in ToolMetadata (did not exist in previous version)
  • New required field name in ToolMetadata (did not exist in previous version)

This is informational — breaking changes may be intentional for a new release.

Contributor

Copilot AI left a comment


Pull request overview

Migrates workflow message routing from legacy programmatic pub/sub subscriptions (subscribe_with_handler) to streaming subscriptions (dapr_client.subscribe) to use persistent gRPC connections and improve reliability.

Changes:

  • Added a new workflow/utils/subscription.py module implementing streaming subscription consumption in background threads with optional async delivery mode.
  • Refactored workflow/utils/registration.py to build MessageRouteBindings and delegate subscription setup to the new streaming utilities.
  • Updated tests/workflow/test_message_router.py to mock dapr_client.subscribe instead of subscribe_with_handler.
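"Composite routing per topic" suggests one streaming subscription per topic that fans messages out to several handlers. Below is a minimal sketch, assuming each handler is keyed by a CloudEvent `type`, with an optional catch-all. `TopicRouter`, `register` and `dispatch` are hypothetical names, not the PR's API. The returned strings stand in for Dapr's success/retry/drop statuses.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[[Dict[str, Any]], str]


class TopicRouter:
    """Routes messages from a single topic stream to handlers by CloudEvent type."""

    def __init__(self) -> None:
        self._by_type: Dict[str, Handler] = {}
        self._fallback: Optional[Handler] = None

    def register(self, handler: Handler, ce_type: Optional[str] = None) -> None:
        if ce_type is None:
            self._fallback = handler  # catch-all for unmatched types
        else:
            self._by_type[ce_type] = handler

    def dispatch(self, event: Dict[str, Any]) -> str:
        handler = self._by_type.get(event.get("type", ""), self._fallback)
        if handler is None:
            return "drop"  # assumption: unroutable messages are dropped, not retried
        return handler(event)


router = TopicRouter()
router.register(lambda event: "success", ce_type="order.created")
print(router.dispatch({"type": "order.cancelled"}))  # drop (no route yet)
router.register(lambda event: "retry")  # catch-all
print(router.dispatch({"type": "order.created"}))  # success
print(router.dispatch({"type": "order.cancelled"}))  # retry
```

One subscription per topic, rather than one per handler, keeps the number of open gRPC streams proportional to topics instead of handlers.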

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.

  • tests/workflow/test_message_router.py: Updates unit tests to mock streaming subscriptions and validate subscription call parameters.
  • dapr_agents/workflow/utils/subscription.py: New streaming subscription implementation: composite routing per topic, consumer threads, async worker queue option, shutdown helpers.
  • dapr_agents/workflow/utils/registration.py: Refactors binding collection and replaces legacy subscription logic with calls into the new streaming subscription module.
Comments suppressed due to low confidence (1)

dapr_agents/workflow/utils/registration.py:316

  • register_message_routes no longer accepts a subscribe override, but there are still internal call sites passing subscribe=... (e.g. dapr_agents/workflow/runners/base.py calls it with subscribe=subscribe). This will raise TypeError: got an unexpected keyword argument 'subscribe' at runtime. Either update all call sites to stop passing subscribe, or keep the parameter (possibly deprecated) and handle/ignore it for streaming subscriptions.
def register_message_routes(
    *,
    dapr_client: DaprClient,
    targets: Optional[Iterable[Any]] = None,
    routes: Optional[Iterable[PubSubRouteSpec]] = None,
    loop: Optional[asyncio.AbstractEventLoop] = None,
    delivery_mode: Literal["sync", "async"] = "sync",
    queue_maxsize: int = 1024,
    deduper: Optional[DedupeBackend] = None,
    scheduler: Optional[SchedulerFn] = None,
    wf_client: Optional[wf.DaprWorkflowClient] = None,
    await_result: bool = False,
    await_timeout: Optional[int] = None,
    fetch_payloads: bool = True,
    log_outcome: bool = True,
) -> List[Callable[[], None]]:
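Copilot's second option, keeping `subscribe` as a deprecated and ignored parameter, could look like the sketch below. This is a trimmed-down hypothetical stand-in that only borrows the name `register_message_routes`. The real signature has many more parameters, and the route-building body is elided.

```python
import warnings
from typing import Any, Callable, List, Optional


def register_message_routes(
    *,
    dapr_client: Any,
    subscribe: Optional[Callable[..., Any]] = None,  # deprecated, ignored
) -> List[Callable[[], None]]:
    """Hypothetical, trimmed-down shim that tolerates the legacy `subscribe` kwarg."""
    if subscribe is not None:
        warnings.warn(
            "register_message_routes(subscribe=...) is deprecated and ignored; "
            "routes now use streaming subscriptions via dapr_client.subscribe.",
            DeprecationWarning,
            stacklevel=2,
        )
    closers: List[Callable[[], None]] = []
    # Build bindings and start one streaming consumer per topic here, appending
    # each consumer's shutdown callable to `closers`.
    return closers
```

If no external callers depend on the parameter, updating the internal call sites is the simpler fix. An example is `dapr_agents/workflow/runners/base.py`.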


Comment on lines +195 to 197
_ = _resolve_event_loop(
    loop
)  # Parity with message registrar; FastAPI does not require it yet.

Copilot AI Feb 26, 2026


_mount_http_bindings calls _resolve_event_loop(loop) purely for “parity”, but this can raise in normal FastAPI setups where routes are registered from a synchronous context (common with Python 3.12+ where asyncio.get_event_loop() can fail). Since HTTP route mounting doesn’t require an event loop here, consider removing this call or making it non-fatal (e.g., only validate in truly async-required paths).

Suggested change, removing these lines:
_ = _resolve_event_loop(
    loop
)  # Parity with message registrar; FastAPI does not require it yet.

Copilot uses AI. Check for mistakes.
Comment on lines +431 to +435
loop.call_soon_threadsafe(
    queue.put_nowait,
    (binding.handler, parsed),
)
return TopicEventResponse(STATUS_SUCCESS)

Copilot AI Feb 26, 2026


In async delivery mode the handler schedules queue.put_nowait(...) via loop.call_soon_threadsafe(...) and immediately returns success. If the queue is full, put_nowait will raise QueueFull in the event loop, and the message will still be ACKed (lost) with no retry. Add explicit backpressure handling: catch QueueFull and return retry (or block with an awaited put), and ensure failures to enqueue do not ACK the message.

Suggested change, replacing:
loop.call_soon_threadsafe(
    queue.put_nowait,
    (binding.handler, parsed),
)
return TopicEventResponse(STATUS_SUCCESS)
with:
if loop.is_running():
    # Backpressure-aware enqueue: block until the item is queued
    fut = asyncio.run_coroutine_threadsafe(
        queue.put((binding.handler, parsed)),
        loop,
    )
    try:
        fut.result()
    except Exception:
        logger.exception(
            "Failed to enqueue workflow task for handler %s; "
            "requesting retry.",
            binding.name,
        )
        return TopicEventResponse(STATUS_RETRY)
    return TopicEventResponse(STATUS_SUCCESS)
# If the loop is not running, fall through to the sync path below.
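The same backpressure idea can be given a bounded wait, shown here as a runnable standard-library sketch. Blocking on `fut.result()` with no timeout could stall the consumer thread indefinitely if the worker stops draining the queue. In this version, a full queue becomes a retry after `timeout` seconds. `enqueue_from_thread`, `run_demo` and the status strings are hypothetical stand-ins for the PR's `TopicEventResponse` handling. `asyncio.run_coroutine_threadsafe`, `asyncio.wait_for` and `asyncio.Queue` are standard-library APIs.

```python
import asyncio
import threading
from typing import Any, Tuple

STATUS_SUCCESS = "success"  # hypothetical stand-ins for TopicEventResponse statuses
STATUS_RETRY = "retry"


def enqueue_from_thread(
    loop: asyncio.AbstractEventLoop,
    queue: "asyncio.Queue[Any]",
    item: Any,
    timeout: float = 1.0,
) -> str:
    """Put `item` on a loop-owned queue from a non-loop thread.

    Success is reported only once the item is in the queue, so a full queue
    (or a stopped loop) becomes a retry instead of an ACK that loses the message.
    """
    fut = asyncio.run_coroutine_threadsafe(
        asyncio.wait_for(queue.put(item), timeout), loop
    )
    try:
        # Extra margin so a loop that is not running cannot block us forever.
        fut.result(timeout=timeout + 1.0)
    except Exception:
        fut.cancel()
        return STATUS_RETRY
    return STATUS_SUCCESS


async def _make_queue(maxsize: int) -> "asyncio.Queue[Any]":
    # Create the queue on the loop that will own it.
    return asyncio.Queue(maxsize=maxsize)


def run_demo() -> Tuple[str, str]:
    loop = asyncio.new_event_loop()
    worker = threading.Thread(target=loop.run_forever, daemon=True)
    worker.start()
    try:
        queue = asyncio.run_coroutine_threadsafe(_make_queue(1), loop).result()
        first = enqueue_from_thread(loop, queue, "msg-1", timeout=0.1)
        second = enqueue_from_thread(loop, queue, "msg-2", timeout=0.1)  # queue full
        return first, second
    finally:
        loop.call_soon_threadsafe(loop.stop)
        worker.join()
        loop.close()
```

With a queue of size 1 and nothing draining it, `run_demo()` should yield a success for the first message and a retry for the second. The retry lets the broker redeliver the message rather than dropping it.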

_validate_delivery_mode(delivery_mode)
_validate_dead_letter_topics(bindings)

resolved_loop = _resolve_event_loop(loop)

Copilot AI Feb 26, 2026


subscribe_message_bindings resolves an event loop unconditionally (resolved_loop = _resolve_event_loop(loop)), even for delivery_mode='sync'. In Python 3.12+ this can raise when called from a synchronous context with loop=None, despite sync mode being able to operate via asyncio.run(...). Consider only requiring an existing/running loop for delivery_mode='async', and otherwise creating a new loop or bypassing loop resolution in sync mode.

Suggested change, replacing:
resolved_loop = _resolve_event_loop(loop)
with:
if delivery_mode == DELIVERY_MODE_ASYNC:
    resolved_loop = _resolve_event_loop(loop)
else:
    # In sync mode we can rely on asyncio.run(...) and do not require
    # an existing/running event loop; avoid resolving it unconditionally.
    resolved_loop = loop
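The mode-dependent resolution proposed in this comment could be fleshed out as below. The sketch assumes that sync delivery runs each async handler with `asyncio.run(...)` on the consumer thread, while async delivery needs a loop the caller already owns. `resolve_loop_for_mode` and `deliver_sync` are hypothetical helpers, not the PR's `_resolve_event_loop`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

DELIVERY_MODE_SYNC = "sync"
DELIVERY_MODE_ASYNC = "async"


def resolve_loop_for_mode(
    delivery_mode: str, loop: Optional[asyncio.AbstractEventLoop] = None
) -> Optional[asyncio.AbstractEventLoop]:
    """Only async delivery needs an existing loop; sync delivery may return None."""
    if delivery_mode != DELIVERY_MODE_ASYNC:
        return loop  # sync mode: asyncio.run(...) per message, no loop required
    if loop is not None:
        return loop
    try:
        # Avoids asyncio.get_event_loop(), whose no-loop behavior varies by version.
        return asyncio.get_running_loop()
    except RuntimeError as exc:
        raise RuntimeError(
            "delivery_mode='async' requires a running event loop or an explicit loop="
        ) from exc


def deliver_sync(handler: Callable[[Any], Awaitable[Any]], payload: Any) -> Any:
    """Sync delivery: run the handler coroutine to completion on a fresh loop."""
    return asyncio.run(handler(payload))
```

Calling `asyncio.run` from the consumer thread is safe only because that thread has no running loop of its own. This is the property that lets sync mode skip loop resolution entirely.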

Comment on lines +561 to +563
mock_sub = MagicMock()
mock_sub.__iter__.return_value = []
mock_client.subscribe.return_value = mock_sub

Copilot AI Feb 26, 2026


mock_sub.__iter__.return_value is set to [], which is iterable but not an iterator; iterating the subscription in the consumer thread may raise a TypeError and the test won’t reliably exercise the streaming loop. Prefer mock_sub.__iter__.return_value = iter([]).

Comment on lines +596 to +598
mock_sub = MagicMock()
mock_sub.__iter__.return_value = []
mock_client.subscribe.return_value = mock_sub

Copilot AI Feb 26, 2026


mock_sub.__iter__.return_value = [] returns a list rather than an iterator; the consumer thread’s for msg in sub: can fail with TypeError and exit. Set this to iter([]) to accurately model an empty subscription stream.
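For reference, the `unittest.mock` documentation states that the return value of `MagicMock.__iter__()` can be any iterable and need not be an iterator. So `[]` already iterates correctly. The practical difference is that a list can be iterated repeatedly, while `iter([...])` is exhausted after the first pass. A quick standalone check:

```python
from unittest.mock import MagicMock

# A list as return value: every iteration starts fresh.
mock_sub = MagicMock()
mock_sub.__iter__.return_value = ["a", "b"]
first_pass = list(mock_sub)
second_pass = list(mock_sub)

# An iterator as return value: consumed by the first loop.
mock_iter_sub = MagicMock()
mock_iter_sub.__iter__.return_value = iter(["a", "b"])
iter_first = list(mock_iter_sub)
iter_second = list(mock_iter_sub)

print(first_pass, second_pass)  # ['a', 'b'] ['a', 'b']
print(iter_first, iter_second)  # ['a', 'b'] []
```

Either form models an empty stream for a consumer thread that iterates the subscription once. `iter([])` additionally mirrors a real stream that cannot be replayed.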

Collaborator


I addressed

@sicoyle
Collaborator

sicoyle commented Mar 5, 2026

@SoulSniper-V2 this is super close! Would you mind addressing the copilot feedback please? 🙏

@github-actions

github-actions bot commented Mar 5, 2026 with the same Metadata Schema Compatibility Report as above (identical list of breaking changes).

@1Ninad
Contributor

1Ninad commented Mar 9, 2026

Hi @SoulSniper-V2 @sicoyle, just checking - are you still planning to address the Copilot feedback on this PR?
If you are unable to continue, I would be happy to help work on the requested changes and push an update. Please let me know.

@sicoyle sicoyle mentioned this pull request Mar 10, 2026
4 tasks
@sicoyle
Collaborator

sicoyle commented Mar 10, 2026

thank you so much for your contributions @SoulSniper-V2 🤗 I went ahead and cherry picked your changes in this PR and addressed all of the remaining PR feedback here so we can get this change into the next release 🎉
#500

@sicoyle sicoyle closed this Mar 10, 2026


Development

Successfully merging this pull request may close these issues.

Migrate from programmatic to streaming pub/sub subscriptions

5 participants