Description
Currently, KAITO allocates full GPUs to model workloads. For smaller LLMs (e.g., Phi-3, or Mistral 7B with quantization), a full A100 or H100 is often overkill, leading to underutilized hardware and higher costs. We propose adding support for NVIDIA MIG (Multi-Instance GPU), allowing users to partition a single physical GPU into multiple isolated instances and run several small models concurrently on a single node.
Use Case
Efficient Multi-Model Hosting: A user wants to run 4 small model instances (e.g., 4x Phi-4-mini) on a single Azure Standard_ND96asr_v4 node (which has 8x A100 GPUs). With MIG, each A100 can be partitioned into up to seven instances, so the same node could theoretically host dozens of small models (up to 56) by partitioning each GPU.
Proposed Scope: BYO (Bring Your Own) Node First
To reduce initial complexity, this feature will focus on BYO Nodes:
- The infrastructure admin is responsible for enabling MIG mode on the GPUs and creating the partitions (e.g., `1g.10gb`) at the OS/node level.
- KAITO will focus on discovery and scheduling rather than hardware partitioning management.
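For discovery, KAITO can rely on what the NVIDIA device plugin already advertises. On a node with MIG partitions created under the mixed strategy, the plugin exposes each profile as its own extended resource; a sketch of what the node status might look like (counts and profiles are illustrative):

```yaml
# Excerpt of a node's status as reported by the NVIDIA device plugin
# (mixed strategy); the counts and profiles below are illustrative.
status:
  allocatable:
    nvidia.com/mig-1g.10gb: "4"
    nvidia.com/mig-2g.20gb: "2"
```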
Technical Challenges
- Resource Specification: Kubernetes represents MIG instances as unique extended resources (e.g., `nvidia.com/mig-1g.10gb`). KAITO's current `Resource` spec needs to handle these instead of the standard `nvidia.com/gpu`.
- VRAM Estimator: The internal KAITO estimator must be updated to understand that a `1g.10gb` profile has specific memory limits, preventing it from trying to schedule a model that requires 20GB of VRAM onto a 10GB MIG partition.
- Workflow Decoupling: This should be decoupled from the main Node Autoprovisioning (NAP) workflow initially, as NAP requires complex integration with cloud-specific APIs to request MIG-enabled pools.
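To make the estimator point concrete, here is a minimal sketch of how a MIG resource name could be mapped to its memory capacity. The helper name and the regex are assumptions for illustration, not existing KAITO code; they assume the standard `<N>g.<M>gb` profile naming.

```python
import re
from typing import Optional

# Matches MIG extended-resource names such as "nvidia.com/mig-1g.10gb";
# the second capture group is the partition's memory size in GB.
MIG_RESOURCE_RE = re.compile(r"^nvidia\.com/mig-(\d+)g\.(\d+)gb$")

def mig_memory_gb(resource_name: str) -> Optional[int]:
    """Return the partition's memory in GB, or None for non-MIG resources."""
    match = MIG_RESOURCE_RE.match(resource_name)
    if match is None:
        return None  # e.g. the plain "nvidia.com/gpu" resource
    return int(match.group(2))

print(mig_memory_gb("nvidia.com/mig-1g.10gb"))  # 10
print(mig_memory_gb("nvidia.com/mig-3g.40gb"))  # 40
print(mig_memory_gb("nvidia.com/gpu"))          # None
```

The estimator could then compare a model's estimated VRAM footprint against this capacity before admitting the Workspace.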
Proposed API Changes
Add support for MIG resource strings in the KAITO `Workspace` CRD.
```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: phi-3-mig
spec:
  offloading: false
  resource:
    instanceType: "" # Leave empty for BYO / manual scheduling
    labelSelector:
      matchLabels:
        apps: mig-enabled-node
    # New capability: requesting specific MIG resources
    resourceRequests:
      nvidia.com/mig-1g.10gb: 1
  inference:
    preset:
      name: "phi-3-mini-4k-instruct"
```
Implementation Strategy
- Partition Management: Customer Managed. The user configures the MIG strategy (Single Strategy or Mixed Strategy) on the node. KAITO simply consumes the available resources advertised by the NVIDIA Device Plugin.
- Scheduler: KAITO should ensure the `count` in the resource request aligns with how many MIG instances are required per replica.
- Estimator Logic:
  - Enhance the estimator to map `nvidia.com/mig-x.ygb` to a memory capacity of `y` GB.
  - If the model's estimated footprint > `y` GB, the admission controller should reject the Workspace creation with a descriptive error.
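The admission check above can be sketched as follows. The function name and error wording are hypothetical, meant only to show the rejection behavior under the assumption that the estimator has already produced a footprint number:

```python
def validate_workspace(estimated_vram_gb: float, profile_capacity_gb: int) -> None:
    """Reject (raise) if the model cannot fit in the requested MIG partition.

    Hypothetical admission-time check; in KAITO this would run in the
    admission webhook with values from the VRAM estimator.
    """
    if estimated_vram_gb > profile_capacity_gb:
        raise ValueError(
            f"model needs ~{estimated_vram_gb}GB VRAM but the requested MIG "
            f"partition provides only {profile_capacity_gb}GB; request a "
            f"larger profile or a full GPU"
        )

# A 20GB model cannot fit on a 1g.10gb partition:
try:
    validate_workspace(20.0, 10)
except ValueError as err:
    print(err)

# But it fits on a 3g.40gb partition (no exception raised):
validate_workspace(20.0, 40)
```

Failing fast at admission, rather than letting the pod OOM at load time, gives the user a descriptive error at Workspace creation.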