Add configurable allocation policy (packed/distributed) for replicated and MIG resources by wkd-woo · Pull Request #1621 · NVIDIA/k8s-device-plugin

wkd-woo · 2026-02-11T09:05:26Z

Summary

Add --allocation-policy flag (ALLOCATION_POLICY env) with distributed (default) and packed options
packed mode bin-packs replicated/MIG devices onto the fewest physical GPUs, freeing the remaining GPUs for full-GPU workloads
Default behavior (distributed) is unchanged — no breaking changes

Motivation

The current distributedAlloc was designed for time-slicing, where distributing replicas across physical GPUs avoids compute contention. However, MIG devices also fall into this code path simply because AlignedAllocationSupported() returns false for them — not because distributed allocation is the right strategy.

MIG instances are hardware-isolated partitions with dedicated SMs and memory. Packing them onto fewer physical GPUs has no performance penalty, and frees up remaining GPUs for full-GPU workloads:

┌─────────────────────────────────────────────────────────────────────┐
│                Distributed Allocation (current default)              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  GPU 0            GPU 1            GPU 2            GPU 3           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ ████░░░░ │    │ ████░░░░ │    │ ████░░░░ │    │ ░░░░░░░░ │      │
│  │ MIG 1/5  │    │ MIG 1/5  │    │ MIG 1/5  │    │ MIG 0/5  │      │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│   ⚠ partial       ⚠ partial       ⚠ partial       ░ empty          │
│                                                                     │
│  → Full GPU request arrives: only 1 GPU available (GPU 3)           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    Packed Allocation (bin-packing)                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  GPU 0            GPU 1            GPU 2            GPU 3           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ ████████ │    │ ░░░░░░░░ │    │ ░░░░░░░░ │    │ ░░░░░░░░ │      │
│  │ MIG 3/5  │    │ MIG 0/5  │    │ MIG 0/5  │    │ MIG 0/5  │      │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│   ✓ packed         ★ free          ★ free          ★ free           │
│                                                                     │
│  → Full GPU request arrives: 3 GPUs available (GPU 1, 2, 3)        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
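The two diagrams can be reproduced with a small simulation. This is only a sketch under simplifying assumptions (identical GPUs, one MIG instance per request, ties broken by lowest GPU index); `place` and `freeGPUs` are hypothetical helpers, not functions from the plugin.

```go
package main

import "fmt"

// place simulates single-instance requests on identical GPUs and returns how
// many instances end up in use on each GPU. Hypothetical helper, not plugin code.
func place(policy string, gpus, slotsPerGPU, requests int) []int {
	used := make([]int, gpus)
	for r := 0; r < requests; r++ {
		best := -1
		for g := 0; g < gpus; g++ {
			if used[g] >= slotsPerGPU {
				continue // no free instance on this GPU
			}
			if best == -1 {
				best = g
				continue
			}
			if policy == "packed" && used[g] > used[best] {
				best = g // packed: prefer the busiest GPU that still has room
			}
			if policy != "packed" && used[g] < used[best] {
				best = g // distributed: prefer the least-used GPU
			}
		}
		if best == -1 {
			break // every GPU is full
		}
		used[best]++
	}
	return used
}

// freeGPUs counts GPUs with no instance in use.
func freeGPUs(used []int) int {
	n := 0
	for _, u := range used {
		if u == 0 {
			n++
		}
	}
	return n
}

func main() {
	// Scenario from the diagrams: 4 GPUs, 5 MIG instances each, 3 requests.
	for _, policy := range []string{"distributed", "packed"} {
		used := place(policy, 4, 5, 3)
		fmt.Printf("%-11s usage=%v free GPUs=%d\n", policy, used, freeGPUs(used))
	}
}
```

With 4 GPUs, 5 instances per GPU, and 3 requests, the distributed policy leaves one GPU empty and the packed policy leaves three, matching the figures above.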

Relates to #491

Design

getPreferredAllocation
├─ Full GPU (AlignedAllocationSupported && no annotations)
│   └─ alignedAlloc (topology-based, unaffected by this change)
│
└─ MIG / time-slicing / MPS
    ├─ allocationPolicy=packed  → packedAlloc (new)
    └─ allocationPolicy=distributed (default) → distributedAlloc (existing, unmodified)
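The decision tree above could be expressed roughly as follows. This is a minimal sketch, not the PR's code: `selectAllocator` and its boolean parameters are hypothetical stand-ins for the checks `getPreferredAllocation` performs, and the returned strings simply name the function that would be called.

```go
package main

import "fmt"

// selectAllocator mirrors the branch order in the design tree. The signature
// is an assumption made for illustration; the real plugin inspects device
// objects rather than taking booleans.
func selectAllocator(alignedSupported, hasAnnotations bool, policy string) string {
	// Full GPUs take the topology-aware path before the policy is consulted.
	if alignedSupported && !hasAnnotations {
		return "alignedAlloc"
	}
	// MIG, time-slicing, and MPS devices honor the configured policy.
	if policy == "packed" {
		return "packedAlloc"
	}
	// Any other value falls back to the existing default.
	return "distributedAlloc"
}

func main() {
	fmt.Println(selectAllocator(true, false, "packed"))       // full GPU node
	fmt.Println(selectAllocator(false, false, "packed"))      // MIG node, packed
	fmt.Println(selectAllocator(false, false, "distributed")) // MIG node, default
}
```

Because the aligned check comes first, setting `packed` on a full-GPU node never reaches the policy branch.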

Key decisions:

distributedAlloc is completely untouched — packedAlloc is a separate function following the existing alignedAlloc/distributedAlloc pattern
Full GPU nodes are unaffected — alignedAlloc is selected before allocationPolicy is ever checked, so setting packed on a full-GPU node has no effect
Flag applies uniformly — when packed is set, it applies to MIG, time-slicing, and MPS. Silently ignoring a user-set flag for specific device types would be inconsistent
Per-node config supported — works with existing config-manager + ConfigMap + node label (nvidia.com/device-plugin.config) mechanism via YAML config

Usage

CLI flag / Environment variable

--allocation-policy=packed
# or
ALLOCATION_POLICY=packed

Config file (per-node via ConfigMap + node label)

version: v1
flags:
  migStrategy: mixed
  plugin:
    allocationPolicy: packed

kubectl label node mig-node nvidia.com/device-plugin.config=mig-packed
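For illustration, the flag and environment variable could be resolved and validated along these lines. This is a sketch using only the standard library; `resolvePolicy` is a hypothetical helper, and both the precedence (flag over environment) and the rejection of unknown values are assumptions rather than behavior confirmed by this PR.

```go
package main

import (
	"fmt"
	"os"
)

const (
	policyDistributed = "distributed"
	policyPacked      = "packed"
)

// resolvePolicy picks the flag value if set, then ALLOCATION_POLICY, then the
// default. getenv is injected so the lookup can be stubbed out.
func resolvePolicy(flagValue string, getenv func(string) string) (string, error) {
	v := flagValue
	if v == "" {
		v = getenv("ALLOCATION_POLICY")
	}
	if v == "" {
		return policyDistributed, nil // default keeps current behavior
	}
	switch v {
	case policyDistributed, policyPacked:
		return v, nil
	}
	return "", fmt.Errorf("invalid allocation policy %q: want %q or %q",
		v, policyDistributed, policyPacked)
}

func main() {
	policy, err := resolvePolicy("", os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("allocation policy:", policy)
}
```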

Test plan

TestDistributedAlloc (6 cases) — existing distributed behavior regression
TestDistributedAllocIsDefault — default consistency verified over 10 iterations
TestPackedAlloc (6 cases) — bin-packing: same GPU priority, overflow to next GPU
TestPackedVsDistributedContrast — two strategies produce different results on same input
TestFullGPUNodeIgnoresAllocationPolicy (3 sub-cases) — full GPU / MIG / replicated branch path verification
All existing internal/rm/ tests pass unchanged
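The "same GPU priority, overflow to next GPU" behavior that TestPackedAlloc covers can be sketched at the device-ID level. This is not the PR's packedAlloc: the `device` type and `packedPick` are hypothetical, and the sketch assumes every GPU exposes the same number of instances, so the GPU with the fewest available devices is treated as the most occupied one.

```go
package main

import (
	"fmt"
	"sort"
)

// device is a hypothetical stand-in for a replicated or MIG device.
type device struct {
	ID  string
	GPU int // index of the parent physical GPU
}

// packedPick returns up to size device IDs, draining the most occupied GPU
// first and overflowing to the next one only when it runs out.
func packedPick(available []device, size int) []string {
	if size <= 0 {
		return nil
	}
	byGPU := map[int][]string{}
	for _, d := range available {
		byGPU[d.GPU] = append(byGPU[d.GPU], d.ID)
	}
	gpus := make([]int, 0, len(byGPU))
	for g := range byGPU {
		gpus = append(gpus, g)
	}
	// Fewest available devices first; ties broken by GPU index.
	sort.Slice(gpus, func(i, j int) bool {
		a, b := gpus[i], gpus[j]
		if len(byGPU[a]) != len(byGPU[b]) {
			return len(byGPU[a]) < len(byGPU[b])
		}
		return a < b
	})
	picked := make([]string, 0, size)
	for _, g := range gpus {
		ids := byGPU[g]
		sort.Strings(ids) // deterministic order within a GPU
		for _, id := range ids {
			if len(picked) == size {
				return picked
			}
			picked = append(picked, id)
		}
	}
	return picked // may be shorter than size if too few devices exist
}

func main() {
	available := []device{
		{"g0-a", 0}, {"g0-b", 0},
		{"g1-a", 1}, {"g1-b", 1}, {"g1-c", 1},
	}
	fmt.Println(packedPick(available, 2)) // stays on GPU 0
	fmt.Println(packedPick(available, 3)) // fills GPU 0, overflows to GPU 1
}
```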

copy-pr-bot · 2026-02-11T09:05:30Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

The existing distributedAlloc spreads replicated devices evenly across physical GPUs, which was designed for time-slicing where workloads compete for shared compute. MIG instances, however, are hardware-isolated and do not suffer from contention when packed onto the same GPU.

This adds a --allocation-policy flag (env: ALLOCATION_POLICY) with two options: "distributed" (default, preserving current behavior) and "packed" (bin-packing onto fewest physical GPUs). The packed policy helps free up entire GPUs for full-GPU workloads in mixed clusters.

The flag applies uniformly to all non-aligned allocation paths (MIG, time-slicing, MPS) and can be configured per-node via ConfigMap and the nvidia.com/device-plugin.config node label.

Signed-off-by: 장재영B <jae.j@tossinvest.com>

wkd-woo force-pushed the feature/packed-allocation-policy branch from 421e3c9 to 6686d1a on February 11, 2026 09:06

wkd-woo wants to merge 1 commit into NVIDIA:main from wkd-woo:feature/packed-allocation-policy
