Feature Request: Support for NVIDIA Multi-Instance GPU (MIG) #1744

@zhuangqh

Description

Currently, KAITO allocates full GPUs to model workloads. For smaller LLMs (e.g., Phi-3, or Mistral 7B with quantization), a full A100 or H100 is often overkill, leading to underutilized hardware and higher costs. We propose adding support for NVIDIA Multi-Instance GPU (MIG), allowing users to partition a single physical GPU into multiple instances and run several small models concurrently on a single node.

Use Case

Efficient Multi-Model Hosting: A user wants to run 4 small model instances (e.g., 4x Phi-4-mini) on a single Azure Standard_ND96asr_v4 node (which has 8x A100 GPUs). By using MIG, they could theoretically host dozens of small models on that same node by partitioning each GPU.

Proposed Scope: BYO (Bring Your Own) Node First

To reduce initial complexity, this feature will focus on BYO Nodes:

  • The infrastructure admin is responsible for enabling MIG mode on the GPUs and creating the partitions (e.g., 1g.10gb) at the OS/Node level.
  • KAITO will focus on discovery and scheduling rather than hardware partitioning management.

Technical Challenges

  1. Resource Specification: Kubernetes represents MIG instances as unique extended resources (e.g., nvidia.com/mig-1g.10gb). KAITO's current Resource spec needs to handle these instead of standard nvidia.com/gpu.
  2. VRAM Estimator: The internal KAITO estimator must be updated to understand that a 1g.10gb profile has specific memory limits, preventing it from trying to schedule a model that requires 20GB of VRAM onto a 10GB MIG partition.
  3. Workflow Decoupling: This should be decoupled from the main Node Autoprovisioning (NAP) workflow initially, as NAP requires complex integration with cloud-specific APIs to request MIG-enabled pools.
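
For challenge 1, the MIG profile name itself encodes everything the scheduler needs: the number of compute slices and the memory size. A minimal sketch of parsing it (the function name and API are illustrative, not KAITO's actual code):

```python
import re

# Matches MIG extended-resource names such as "nvidia.com/mig-1g.10gb" or
# "nvidia.com/mig-3g.20gb"; group 1 = compute slices, group 2 = memory in GB.
_MIG_RESOURCE = re.compile(r"^nvidia\.com/mig-(\d+)g\.(\d+)gb$")

def parse_mig_resource(name: str):
    """Return (compute_slices, memory_gb) for a MIG resource name, or None
    if the name is not a MIG extended resource (e.g., plain nvidia.com/gpu)."""
    m = _MIG_RESOURCE.match(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

The memory figure extracted here is what the VRAM estimator (challenge 2) would compare against the model's estimated footprint.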

Proposed API Changes

Add support for MIG resource strings in the Workspace CRD.

apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: phi-3-mig
spec:
  offloading: false
  resource:
    instanceType: "" # Leave empty for BYO / Manual scheduling
    labelSelector:
      matchLabels:
        apps: mig-enabled-node
    # New capability: requesting specific MIG resources
    resourceRequests:
      nvidia.com/mig-1g.10gb: 1 
  inference:
    preset:
      name: "phi-3-mini-4k-instruct"

Implementation Strategy

  • Partition Management: Customer Managed. The user configures the MIG strategy (Single Strategy or Mixed Strategy) on the node. KAITO simply consumes the available resources advertised by the NVIDIA Device Plugin.
  • Scheduler: KAITO should ensure the count in the resource request aligns with how many MIG instances are required per replica.
  • Estimator Logic: Enhance the estimator to map nvidia.com/mig-x.ygb to a memory capacity of y GB. If the model's estimated footprint exceeds y GB, the admission controller should reject the Workspace creation with a descriptive error.
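
The admission check described above reduces to a comparison between the estimated footprint and the memory capacity parsed from the MIG resource name. A sketch under the assumption that requests arrive as a plain name-to-count mapping (the helper name and return shape are hypothetical, not KAITO's actual interface):

```python
import re

# Capture the memory size (in GB) from a MIG extended-resource name.
_MIG_RE = re.compile(r"^nvidia\.com/mig-\d+g\.(\d+)gb$")

def check_workspace_fits(resource_requests: dict, estimated_vram_gb: float):
    """Return (ok, message); ok is False when the model's estimated VRAM
    footprint exceeds the per-instance memory of a requested MIG profile."""
    for name in resource_requests:
        m = _MIG_RE.match(name)
        if m is None:
            continue  # not a MIG resource; handled by existing full-GPU logic
        capacity_gb = int(m.group(1))
        if estimated_vram_gb > capacity_gb:
            return False, (
                f"model needs ~{estimated_vram_gb}GB VRAM but {name} "
                f"provides only {capacity_gb}GB per instance"
            )
    return True, ""
```

This is exactly the scenario from the challenges above: a model estimated at 20GB requesting nvidia.com/mig-1g.10gb would be rejected with a descriptive error, while a ~7GB model would pass.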

Labels: enhancement (New feature or request)