allenwang28
Contributor

This diff makes a few changes:

  • Turns Service into an actor so that it can run in its own process and scale out better.
    • This turns _call, __initialize__, _call_all (enqueue on all healthy replicas), start_session, get_metrics, get_metrics_summary, terminate_session, and stop into endpoints.
  • Introduces a ServiceInterface. This retains the existing session APIs, but now creates endpoints over the actor_def endpoints. It specifically introduces:
    • choose(): run this endpoint on a replica
    • call(): run this endpoint on all replicas
  • Fixes a bug where sess_id was passed as a positional arg; it is now a kwarg. cc @joecummings
  • Refactors service.py into:
    • interface.py (including session and ServiceInterface),
    • metrics.py (including ServiceMetrics),
    • service.py (the base Service actor),
  • Adds new tests.
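
To make the choose()/call() semantics and the sess_id kwarg concrete, here is a minimal sketch in plain asyncio. It does not use the real Service or ServiceInterface code: `ToyReplica`, `ToyEndpoint`, `ToyServiceInterface`, the `generate` endpoint, and the round-robin and sticky-session policies are all assumptions made for illustration.

```python
import asyncio
import itertools
from typing import Optional


class ToyReplica:
    """Hypothetical stand-in for one replica of the served actor."""

    def __init__(self, idx: int) -> None:
        self.idx = idx

    async def generate(self, prompt: str) -> str:
        # Hypothetical endpoint; tags the result with the replica index.
        return f"replica{self.idx}:{prompt}"


class ToyEndpoint:
    """Hypothetical per-endpoint handle exposing choose() and call()."""

    def __init__(self, service: "ToyServiceInterface", name: str) -> None:
        self._service = service
        self._name = name

    async def choose(self, *args, sess_id: Optional[str] = None, **kwargs):
        # Run the endpoint on a single replica. sess_id is keyword-only,
        # so it cannot be mistaken for one of the endpoint's own arguments.
        replica = self._service._pick(sess_id)
        return await getattr(replica, self._name)(*args, **kwargs)

    async def call(self, *args, **kwargs):
        # Run the endpoint on every replica and gather the results in order.
        coros = [
            getattr(replica, self._name)(*args, **kwargs)
            for replica in self._service.replicas
        ]
        return await asyncio.gather(*coros)


class ToyServiceInterface:
    """Hypothetical interface that builds endpoints over the replicas."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [ToyReplica(i) for i in range(num_replicas)]
        self._round_robin = itertools.cycle(self.replicas)
        self._sessions = {}

    def _pick(self, sess_id: Optional[str]) -> ToyReplica:
        if sess_id is None:
            # Assumed policy: round-robin when no session is given.
            return next(self._round_robin)
        if sess_id not in self._sessions:
            # Assumed policy: pin each session to one replica.
            self._sessions[sess_id] = next(self._round_robin)
        return self._sessions[sess_id]

    def __getattr__(self, name: str) -> ToyEndpoint:
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyEndpoint(self, name)


async def main():
    service = ToyServiceInterface(num_replicas=3)
    one = await service.generate.choose("hi")
    everyone = await service.generate.call("hi")
    sticky = await service.generate.choose("hi", sess_id="s1")
    return one, everyone, sticky


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this toy version, calls that pass the same sess_id land on the same replica, while calls without one rotate across replicas. The real routing policy may differ.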

@meta-cla meta-cla bot added the CLA Signed label Aug 25, 2025
@allenwang28 allenwang28 changed the title from "[Service] Service in proc" to "[Service] Turns Service into an Actor and splits service into its own files" Aug 25, 2025
Member

@joecummings joecummings left a comment

One general question, but overall looks good

Member

Why is this needed?

Contributor Author

Added a comment in the docstring, but here are a few reasons:

  • primarily so you don't have to interact with the Service actor, which IMO can be annoying
  • pairs the proc_mesh with the Service actor
  • later we might want to pass a reference to the service to other actors, without moving where it's placed. This doubles as a handle so we can make calls to it
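
A rough sketch of the handle idea, assuming nothing about the real classes: `ToyProcMesh`, `ToyServiceActor`, `ToyHandle`, `shutdown`, and the call counter are hypothetical placeholders, not the actual ServiceInterface code. It only shows one object that keeps the process mesh and the actor together and forwards calls, so other components can hold the handle without touching the actor directly.

```python
import asyncio
from dataclasses import dataclass


class ToyProcMesh:
    """Hypothetical placeholder for the processes hosting the service."""

    def __init__(self) -> None:
        self.stopped = False

    async def stop(self) -> None:
        self.stopped = True


class ToyServiceActor:
    """Hypothetical placeholder for the Service actor."""

    def __init__(self) -> None:
        self.calls = 0

    async def get_metrics_summary(self) -> dict:
        # Invented metric for the sketch: counts how often it was queried.
        self.calls += 1
        return {"calls": self.calls}

    async def stop(self) -> None:
        pass


@dataclass
class ToyHandle:
    """Keeps the proc mesh and the actor together behind one object."""

    proc_mesh: ToyProcMesh
    actor: ToyServiceActor

    async def get_metrics_summary(self) -> dict:
        # Forward to the actor so callers never touch it directly.
        return await self.actor.get_metrics_summary()

    async def shutdown(self) -> None:
        # Stop the actor first, then the processes that host it.
        await self.actor.stop()
        await self.proc_mesh.stop()


async def worker(handle: ToyHandle) -> dict:
    # Another component receives the handle; the service itself stays put.
    return await handle.get_metrics_summary()


if __name__ == "__main__":
    handle = ToyHandle(ToyProcMesh(), ToyServiceActor())
    print(asyncio.run(worker(handle)))
    asyncio.run(handle.shutdown())
```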

Contributor Author

moved to ServiceInterface

@allenwang28 allenwang28 marked this pull request as ready for review August 25, 2025 21:54
@allenwang28 allenwang28 merged commit ce1ed98 into meta-pytorch:main Aug 25, 2025
4 checks passed
@allenwang28 allenwang28 deleted the service_in_proc branch August 25, 2025 22:00