-
Notifications
You must be signed in to change notification settings - Fork 24
Add as_actor
single-actor mode
#195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Wondering what is the motivation behind this? Why would one choose an actor creation directly over service? Single actor seems to be a special case of service? |
For context: #173 You're right though, the rationale is that only vLLM should be a service currently. Trainer for e.g. will not really take advantage of fault tolerance or routing, so we should always expect it to be a singleton. |
src/forge/controller/actor.py
Outdated
num_replicas: int | None = None, | ||
procs: int | None = None, | ||
**service_kwargs, | ||
procs: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also hosts: int
, with_gpu: bool
and num_replicas: int | None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of put them all in **kwargs
since only procs
is required for both service and actor. Do you think it is better to explicitly list them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please explicitly list them
src/forge/controller/actor.py
Outdated
class_attrs["num_replicas"] = 1 | ||
cfg = ServiceConfig(**filter_config_params(ServiceConfig, class_attrs)) | ||
|
||
service_cls = type(f"{cls.__name__}Service", (cls,), {"_service_config": cfg}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still need service_cls
here? can the logic of as_service()
be:
@classmethod
async def as_service(cls, **actor_kwargs) -> "ServiceInterface":
service = Service(cfg, cls, actor_kwargs)
await service.__initialize__()
return ServiceInterface(service, cls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! Removed.
src/forge/controller/actor.py
Outdated
cfg = ProcessConfig(**filter_config_params(ProcessConfig, class_attrs)) | ||
|
||
logger.info("Spawning single actor %s", cls.__name__) | ||
actor = await cls.launch(process_config=cfg, **actor_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm maybe we can modify the def launch()
above to simplify things?
Like this:
@classmethod
async def launch(cls, *args, **kwargs) -> "ForgeActor":
proc_mesh = await get_proc_mesh(process_config=ProcessConfig(procs=cls._procs, hosts=cls._hosts, with_gpu=cls._with_gpu))
actor_name = kwargs.pop("name", cls.__name__)
actor = await proc_mesh.spawn(actor_name, cls, *args, **kwargs)
actor._proc_mesh = proc_mesh
if hasattr(proc_mesh, "_hostname") and hasattr(proc_mesh, "_port"):
host, port = proc_mesh._hostname, proc_mesh._port
await actor.set_env.call(addr=host, port=port)
await actor.setup.call()
return actor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/forge/controller/actor.py
Outdated
num_replicas: int | None = None, | ||
procs: int | None = None, | ||
**service_kwargs, | ||
procs: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please explicitly list them
src/forge/controller/actor.py
Outdated
# dynamically create a configured subclass for consistency | ||
cls = type(f"{cls.__name__}Service", (cls,), {"_service_config": cfg}) | ||
class_attrs = {k: v for k, v in cls.__dict__.items() if not k.startswith("__")} | ||
if "procs" not in class_attrs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow up comment on explicit attributes, this for e.g. is unclear and can be pretty brittle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in the latest version
src/forge/controller/actor.py
Outdated
proc_mesh = await get_proc_mesh(process_config=process_config) | ||
# Build process config from class attributes with defaults | ||
cfg = ProcessConfig( | ||
procs=getattr(cls, "procs", 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally try and use getattr
as little as possible. If it's used too much it can mask real errors that can be really hard to debug later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fallback when the user doesn’t specify configs via .options()
. In this case, the original ForgeActor
class doesn’t have attributes like procs
. If we are getting rid of getattr
, one way I can think of is to add these attributes to ForgeActor
class like
class ForgeActor(Actor):
procs: int = 1
hosts: int | None = None
with_gpus: bool = False
num_replicas: int = 1
def __init__(self, *args, **kwargs):
But either way, it means the default values are specified in three places:
- In types.py
- As default values in
.options()
- As attributes on the
ForgeActor
class OR here inlaunch
.
I’m not sure if there’s a cleaner way to handle this. I’ve updated the code accordingly (get rid of getattr
), please take a look and let me know if you have any suggestions or improvements.
src/forge/controller/actor.py
Outdated
actor = await cls.launch(**actor_kwargs) | ||
|
||
# Patch shutdown to bypass endpoint system | ||
actor.shutdown = types.MethodType( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this is a hack, we shouldn't be doing this. I'm guessing it's because we want to preserve the ability to
svc = MyActor.as_service()
await svc.shutdown()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as_service
returns a ServiceInterface
. So when we call service.shutdown()
, we are actually calling ServiceInterface.shutdown
The reason I have to do this hacky thing is:
Without it, actor.shutdown()
gives me this error:
RuntimeError: Actor <class 'tests.unit_tests.test_service.Counter'>.shutdown is not annotated as an endpoint. To call it as one, add a @endpoint decorator to it, or directly wrap it in one as_endpoint(obj.method).call(...)
If I simply decorate shutdown
with @endpoint
, we'd have to call it like
await actor.shutdown.call()
But it would still give error:
AssertionError("Called shutdown on a replica with no proc_mesh.")
Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see. Ok in that case, I think what we should do is not do actor.shutdown()
for now, and just rely on eg
await RLTrainer.stop(trainer)
for now. Maybe what we can do next is have the provisioner keep track of all of the proc meshes, and do a global shutdown()
? Including all the services etc. we can discuss more, just want to unblock this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Done!
src/forge/controller/actor.py
Outdated
|
||
@classmethod | ||
async def launch(cls, *, process_config: ProcessConfig, **kwargs) -> "ForgeActor": | ||
async def launch(cls, **kwargs) -> "ForgeActor": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add *args here? This solves the *args related TODO that's listed here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in launch
and as_actor
. Also tested in test_as_actor_with_kwargs_config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ty @DNXie!
src/forge/controller/actor.py
Outdated
# Option C: skip options, use the default service config with num_replicas=1, procs=1 | ||
service = await MyForgeActor.as_service(...) | ||
await service.shutdown() | ||
Returns a dynamically created subclass of this ForgeActor with bound configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns a dynamically created subclass of this ForgeActor with bound configuration. | |
Returns a version of ForgeActor with configured resource attributes. |
tests/unit_tests/test_service.py
Outdated
self.kwargs = kwargs | ||
|
||
@endpoint | ||
async def get_args(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove these tests, i think it's fine without
Summary:
This PR adds
ForgeActor.as_actor()
and refactorsForgeActor.options()
and.as_service()
to improve configuration handling and dynamic subclassing for actors and services. Context: #173Key changes include:
as_actor()
to support launching a single actor directly.Service
andActor
num_hosts
tohosts
andnum_procs
toprocs
inProcessConfig
..options()
now stores all configuration parameters as class attributes rather than building a full config object immediately..as_actor()
or.as_service()
calls, ensuring each configuration remains isolated..options()
is not called.Changes in behavior:
So the single actor initialization from
become
Usage Examples 1 (Actor):
Log:
Usage Examples 2 (Service):
Log when
num_replicas=3
The printed class name is its original class name instead of xxService (See #193)
INFO forge.controller.actor:actor.py:123 Spawning Service Actor for Counter INFO forge.controller.actor:actor.py:207 Spawning single actor Counter INFO forge.controller.actor:actor.py:207 Spawning single actor Counter INFO forge.controller.actor:actor.py:207 Spawning single actor Counter
Usage Examples 3 (Positional argument):
This means you can now do:
instead of having to use keyword-only arguments like:
Test