
Conversation


@DNXie DNXie commented Oct 3, 2025

Migrated from #177
Context: #160

Add batch routing to Service to improve request throughput and maintain session-aware routing.

  • Added a new @service_endpoint decorator that supports routing configuration (router, batch_size, batch_timeout).

  • Introduced ServiceEndpointProperty to distinguish between @endpoint and @service_endpoint.

  • Centralized endpoint-to-router mapping in Service (self.routers) with support for both plain routers and batchers.

  • Updated ServiceInterface to register endpoints through _set_router, ensuring consistent setup for both standard and service endpoints.

  • Extended _call and _get_replica to handle batch routing, session routing, and fallback routing in a unified way.

  • Enhanced Service.stop to gracefully shut down any active batchers in addition to replicas.

  • Added integration tests to validate:

    • Round-robin distribution with and without batching
    • Correct batch flushing when batch_size is reached
    • Independent coexistence of multiple endpoints with different batch sizes/routers
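A minimal sketch of how a @service_endpoint-style decorator might attach this routing configuration to an endpoint function. The parameter names mirror the description above; the defaults, the `_routing` attribute, and the example `generate` endpoint are illustrative assumptions, not the actual API.

```python
# Hypothetical sketch: a decorator that attaches routing configuration to an
# endpoint function, in the spirit of @service_endpoint. Only the parameter
# names (router, batch_size, batch_timeout) come from the PR description.
import functools

def service_endpoint(router="round_robin", batch_size=1, batch_timeout=0.01):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            return fn(*args, **kwargs)
        # Routing metadata a Service could later read when building its
        # endpoint-to-router mapping (self.routers).
        inner._routing = {
            "router": router,
            "batch_size": batch_size,
            "batch_timeout": batch_timeout,
        }
        return inner
    return wrap

@service_endpoint(batch_size=8, batch_timeout=0.05)
def generate(prompt):
    return prompt.upper()
```

Storing the configuration on the wrapped function is one simple way for the Service to distinguish batched endpoints from plain ones at registration time.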

Test

pytest tests/unit_tests/test_service.py
pytest tests/unit_tests/test_router.py
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 3, 2025
Instead of selecting a replica immediately, incoming requests are enqueued
and grouped into batches. Once a batch is ready (either reaching the maximum
size or exceeding the maximum wait time), the batcher makes a single routing
decision for the whole batch.
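The enqueue-and-flush behavior quoted above can be sketched as a small helper: drain up to `batch_size` requests from a queue, but flush whatever has accumulated once `batch_timeout` seconds elapse. The helper name and signature are assumptions for illustration, not the PR's batcher code.

```python
# Simplified sketch of a batcher's flush condition. Not the actual batcher:
# a real one would loop forever and route each flushed batch to a replica.
import queue
import time

def collect_batch(q, batch_size, batch_timeout):
    batch = []
    deadline = time.monotonic() + batch_timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # maximum wait time exceeded: flush what we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # queue stayed empty until the deadline
    return batch
```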
Contributor commented:

Is there a way to base the wait time on the status of the replica instead of a fixed time? For example, if the replica is still busy, we can let the batch grow larger, but if the replica is free for some minimum time interval, then we can send the batch.

@DNXie DNXie (Member Author) commented Oct 7, 2025:

Definitely. We could make the batch timeout adaptive based on replica load (e.g., wait longer when replicas are busy and flush earlier when they’re idle). I’d prefer to land this current version first, then explore that as a follow-up improvement once the base batching logic is stable. Just added a TODO in the while loop.

# TODO: make timeout adaptive based on replica load.
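As a sketch of the reviewer's suggestion, the flush timeout could scale with replica load: wait longer when replicas are busy, flush sooner when they are idle. This helper and its load metric are hypothetical follow-up material, not part of this PR.

```python
# Hypothetical adaptive flush timeout: grow the wait as more replicas are
# busy, capped at max_factor times the base timeout.
def adaptive_timeout(base_timeout, busy_replicas, total_replicas, max_factor=4.0):
    if total_replicas == 0:
        return base_timeout
    load = busy_replicas / total_replicas  # fraction of busy replicas, in [0, 1]
    return base_timeout * (1.0 + (max_factor - 1.0) * load)
```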

@DNXie DNXie changed the title from "[WIP] Add Batch routing support via @service_endpoint with configurable batch size and timeout" to "Add Batch routing support via @service_endpoint with configurable batch size and timeout" Oct 8, 2025
session_id=None,
function=self.function,
args=args,
kwargs={},
Member Author commented:
@allenwang28 Do we want to support kwargs here?

results = [results] * len(batch)
else:
# scalar result, broadcast to batch size
results = [results] * len(batch)
Member Author commented:
@allenwang28 Do we want to handle when the returned results have different length or a scalar?
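One way to frame the question above is a small normalization helper that maps a replica's batched return value back onto individual requests, sketching the broadcast behavior in the quoted snippet. The helper name and the mismatched-length policy are assumptions, not decided behavior.

```python
# Illustrative sketch: map one batched return value onto each request in the
# batch. A list whose length matches the batch is unpacked one result per
# request; anything else (a scalar, or a length-mismatched list) is broadcast
# unchanged to every request, as in the quoted diff.
def scatter_results(results, batch):
    if isinstance(results, list) and len(results) == len(batch):
        return results
    return [results] * len(batch)
```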

@DNXie DNXie marked this pull request as ready for review October 8, 2025 19:33
@DNXie DNXie requested a review from allenwang28 October 8, 2025 19:33