Enables dynamic GPU allocation for local workloads #91
Conversation
Pull Request Overview
This PR introduces dynamic GPU allocation for local workloads by implementing a GPU resource manager that tracks and distributes GPUs instead of hard-coding GPU assignments. This enables spawning multiple replicas of GPU-intensive services like vLLM.
- Adds a `GpuResourceManager` singleton actor to track and allocate GPU resources (see the sketch after this list)
- Updates process and service types to include a `num_gpus` field for GPU allocation requests
- Integrates GPU allocation into process mesh creation and service configuration
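For illustration, here is a minimal sketch of the allocate/release bookkeeping such a manager might implement. Only the name `GpuResourceManager` comes from the PR; the method names and the plain-Python form (the real version is a Monarch singleton actor) are simplifying assumptions, not the actual implementation.

```python
# Hypothetical sketch of GpuResourceManager bookkeeping. The PR's real
# version is a Monarch singleton actor; names and structure here are
# illustrative assumptions only.


class GpuResourceManager:
    def __init__(self, total_gpus: int) -> None:
        # All GPU ids start out free. A single owner of this state is
        # what makes the "no race conditions" argument below work.
        self._free: set[int] = set(range(total_gpus))
        self._allocated: set[int] = set()

    def allocate(self, num_gpus: int) -> list[int]:
        """Hand out `num_gpus` free GPU ids, or fail if too few remain."""
        if num_gpus > len(self._free):
            raise RuntimeError(
                f"Requested {num_gpus} GPUs, only {len(self._free)} free"
            )
        gpu_ids = sorted(self._free)[:num_gpus]
        self._free.difference_update(gpu_ids)
        self._allocated.update(gpu_ids)
        return gpu_ids

    def release(self, gpu_ids: list[int]) -> None:
        """Return previously allocated GPU ids to the free pool."""
        for gpu_id in gpu_ids:
            if gpu_id not in self._allocated:
                raise ValueError(f"GPU {gpu_id} was not allocated")
            self._allocated.discard(gpu_id)
            self._free.add(gpu_id)
```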
Reviewed Changes
Copilot reviewed 16 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit_tests/test_service.py | Updates test configurations to specify `gpus_per_replica=0` for CPU-only tests |
| tests/unit_tests/test_gpu_manager.py | Comprehensive test suite for GPU manager allocation and release functionality |
| src/forge/types.py | Adds `num_gpus` field to `ProcessConfig` and `gpus_per_replica` to `ServiceConfig` |
| src/forge/controller/service/spawn.py | Updates import paths after service module restructuring |
| src/forge/controller/service/service.py | Updates import paths after service module restructuring |
| src/forge/controller/service/metrics.py | Updates import paths after service module restructuring |
| src/forge/controller/service/__init__.py | Creates service module package with proper exports |
| src/forge/controller/proc_mesh.py | Integrates GPU allocation into process mesh creation and environment setup |
| src/forge/controller/custom_actors/service_registry.py | Stub implementation for future service tracking functionality |
| src/forge/controller/custom_actors/gpu_manager.py | Core GPU manager actor implementation with allocation and release logic |
| src/forge/controller/custom_actors/__init__.py | Exports GPU manager utility functions |
| src/forge/controller/__init__.py | Updates exports after service module restructuring |
| apps/sft_v2/main.py | Updates comment to reflect correct module path |
| apps/sft_v2/llama3_8b.yaml | Adds GPU allocation configuration and fixes tokenizer path |
| apps/rl/llama3_8b.yaml | Adds GPU allocation configuration for trainer and replay buffer |
| apps/grpo/main.py | Adds GPU allocation to service configuration and removes hardcoded device assignment |
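To make the type changes concrete, the additions to `src/forge/types.py` might look roughly like the sketch below. The field names `num_gpus` and `gpus_per_replica` come from the review summary; the surrounding dataclass shape and the default of `0` (CPU-only, matching the updated tests) are assumptions.

```python
# Rough sketch of the additions to src/forge/types.py described above.
# Field names are from the PR summary; everything else is assumed.
from dataclasses import dataclass


@dataclass
class ProcessConfig:
    # ... existing fields elided ...
    num_gpus: int = 0  # GPUs to request from the GPU manager for this process


@dataclass
class ServiceConfig:
    # ... existing fields elided ...
    gpus_per_replica: int = 0  # 0 keeps replicas CPU-only, as in the tests
```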
These are a lot of changes, but they seem to be mostly renaming and import changes. I like the addition of the GPU manager.
Context for this PR - I want to be able to demo a service that spawns N replicas of vLLM. For that to work, we need dynamic GPU allocation; our current approach hard-codes GPU assignment.
So this PR does a few things:
- Adds `GpuResourceManager`, a singleton actor (or "controller") responsible for tracking and releasing GPUs. This only works for local proc meshes for now; it will expand to multi-host once/if needed. Since it's a singleton, there's no concern about race conditions etc.
- Adds `num_gpus` too, which is the signal for receiving GPU ids (see the sketch after this list)
- `hostmesh` APIs - currently broken for some NYI stuff in Monarch. We probably need to work through a redesign of proc mesh management/allocation

Next up:
- Wiring up `get_proc_mesh()` correctly
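A hedged sketch of how the `num_gpus` signal might flow end to end, reusing the `GpuResourceManager` sketch from earlier. The review summary says proc_mesh.py integrates allocation into "process mesh creation and environment setup"; exporting `CUDA_VISIBLE_DEVICES` is one plausible way to do that setup, and the helper name below is illustrative, not the PR's actual API.

```python
# Hypothetical flow: turn a num_gpus request into a process environment.
# Helper name and CUDA_VISIBLE_DEVICES handoff are assumptions, not the
# PR's actual get_proc_mesh() integration.
import os


def setup_process_env(manager: "GpuResourceManager", num_gpus: int) -> list[int]:
    """Allocate GPUs from the singleton manager and expose them to the process."""
    gpu_ids = manager.allocate(num_gpus)
    # Restrict the spawned process to exactly the GPUs it was granted.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return gpu_ids


manager = GpuResourceManager(total_gpus=8)
granted = setup_process_env(manager, num_gpus=2)  # e.g. [0, 1]
manager.release(granted)  # free them when the process mesh is torn down
```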