-
Notifications
You must be signed in to change notification settings - Fork 16
Provisioner cleanup #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provisioner cleanup #311
Conversation
except ImportError as e: | ||
print(f"Warning: Monarch meta/fb inetrnal imports failed: {e}") | ||
print("Monarch functionality will be limited") | ||
# This means there is an error with MAST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it silently fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, yes. (However some sort of logging would be idea.) until we figure out what would be a right way to segregate meta-only code blocks. These dependencies are meta internal and are installed via an internal fbpkg monarch build. The env setup for mast requires a separate installation script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added in a block that says "if imports failed and you're trying to use MAST, print an error messaging say imports failed and that you should check your build was correct"
src/forge/controller/provisioner.py
Outdated
If None, an address will be detected. | ||
Returns: | ||
A proc mesh. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the actual class, e.g. ProcMesh?
also add a typehint?
src/forge/controller/provisioner.py
Outdated
env_vars["HYPERACTOR_MESSAGE_DELIVER_TIMEOUT_SECS"] = "600" | ||
env_vars["HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT_SECS"] = "600" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need these both? what's the difference between these two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch! Removed the DELIVER which was a typo
This PR implements a few cleanups after #285
Changes:
get_remote_info()
- given a host mesh, spawn a proc and a setup actor, get the host port and address of the hostmesh, and kill the proc. This happens regardless of the underlying launcher. Slight refactoring in the bootstrapcontroller.proc_mesh.get_proc_mesh
etc. and redirects everything to just useprovisioner.get_proc_mesh
spawn_procs(...)
but I see that the prior procs die still. The intention was to enable the colocation of the vLLM controller and worker. We do this tracking because Monarch doesn't yet let you get thehost_mesh
from theproc_mesh
(but it will eventually)get_proc_mesh
- this way you can colocate things