Description
When Data Designer is used on a SLURM-managed GPU cluster, it should be able to automatically manage the model servers required to run generation and preview jobs.
What this feature should do
Automatically spin up and tear down model servers on SLURM
- Launch model servers (e.g. via vLLM) as SLURM jobs when needed (see the sketch after this list).
- Shut them down when they are no longer in use.
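As a rough illustration of the lifecycle piece, a launcher could submit an sbatch script that runs vLLM's OpenAI-compatible server and cancel the job on teardown. This is a minimal sketch assuming vLLM's `vllm serve` entrypoint; the function names, resource values, and time limit are illustrative, not a proposed API.

```python
import subprocess
import textwrap

def launch_model_server(model: str, gpus: int, port: int = 8000) -> str:
    """Submit a SLURM job that serves `model` with vLLM and return the job ID."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=dd-model-server
        #SBATCH --gres=gpu:{gpus}
        #SBATCH --time=04:00:00
        vllm serve {model} --port {port} --tensor-parallel-size {gpus}
    """)
    # `sbatch --parsable` reads the script from stdin and prints only the job ID.
    result = subprocess.run(
        ["sbatch", "--parsable"],
        input=script, text=True, capture_output=True, check=True,
    )
    return result.stdout.strip()

def shutdown_model_server(job_id: str) -> None:
    """Tear the server down once no generation or preview work references it."""
    subprocess.run(["scancel", job_id], check=True)
```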
Support interactive preview workflows
- Allow users to interactively query models for Data Designer preview jobs.
- Support streaming responses (a streaming sketch follows this list).
- Keep model servers alive for the duration of an interactive session, then clean them up.
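For the preview path, one option is to talk to the vLLM server's OpenAI-compatible endpoint and stream tokens back as they are generated. A minimal sketch, assuming the server from the previous snippet is reachable at a known node and port (both hypothetical here):

```python
from openai import OpenAI

# Hypothetical address of a vLLM server running as a SLURM job.
client = OpenAI(base_url="http://node0123:8000/v1", api_key="not-needed")

def preview_stream(prompt: str, model: str):
    """Yield response chunks as they arrive so previews feel interactive."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

# The server stays up for the whole interactive session; when the user is
# done previewing, the SLURM job can be cancelled (e.g. via scancel).
```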
Support large-scale batch generation
- Scale model servers up and down to efficiently execute Data Designer jobs.
- Execute work within a user-defined GPU budget for the job.
  - Users explicitly specify how many GPUs they are making available to a Data Designer job.
  - Data Designer uses only those GPUs and does not require manual placement or provisioning.
Data Designer determines how to:
- Split work across models.
- Scale model replicas.
- Assign GPUs to each model instance (a simple allocation sketch follows this list).
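To make the allocation step concrete, here is one possible heuristic, stated only as an illustration: start with one replica per model, then add replicas round-robin until the GPU budget is exhausted. The function name and the strategy are assumptions, not a committed design.

```python
def plan_replicas(gpu_budget: int, gpus_per_model: dict[str, int]) -> dict[str, int]:
    """Choose replica counts per model so total GPU use stays within the budget."""
    if sum(gpus_per_model.values()) > gpu_budget:
        raise ValueError("GPU budget too small to run one replica of each model")
    # One replica per model to start, then add replicas while the budget allows.
    replicas = {name: 1 for name in gpus_per_model}
    used = sum(gpus_per_model.values())
    progress = True
    while progress:
        progress = False
        for name, need in gpus_per_model.items():
            if used + need <= gpu_budget:
                replicas[name] += 1
                used += need
                progress = True
    return replicas

# Example: a 16-GPU budget shared by a 4-GPU model and a 2-GPU model
# yields 3 and 2 replicas respectively (3*4 + 2*2 = 16).
```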
Provide a simple user-facing configuration
Users specify:
- Which models they want to use.
- The total number of GPUs available to the job (and optionally per-model GPU needs).
Data Designer handles model lifecycle, scaling, and GPU utilization automatically; a hypothetical configuration sketch follows.
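As a strawman for how small that configuration could be (every key and model name below is illustrative only):

```python
# Hypothetical configuration surface; key names are illustrative, not an API.
slurm_config = {
    "gpu_budget": 16,  # total GPUs the job may use
    "models": {
        "meta-llama/Llama-3.1-8B-Instruct": {"gpus_per_replica": 1},
        "mistralai/Mixtral-8x7B-Instruct-v0.1": {"gpus_per_replica": 4},
    },
}
# Data Designer takes it from here: launching servers as SLURM jobs, scaling
# replicas within the budget, and cleaning everything up when the job ends.
```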
Outcome
From the user’s perspective, running Data Designer on SLURM should require no manual model orchestration. Users declare their model needs and GPU budget, and Data Designer automatically provisions, scales, and cleans up model servers within those constraints.