Add configurable allocation policy (packed/distributed) for replicated and MIG resources#1621

Open
wkd-woo wants to merge 1 commit into NVIDIA:main from wkd-woo:feature/packed-allocation-policy


wkd-woo commented Feb 11, 2026

Summary

  • Add --allocation-policy flag (ALLOCATION_POLICY env) with distributed (default) and packed options
  • packed mode bin-packs replicated/MIG devices onto the fewest physical GPUs, freeing the remaining GPUs for full-GPU workloads
  • Default behavior (distributed) is unchanged, so there are no breaking changes

Motivation

The current distributedAlloc was designed for time-slicing, where distributing replicas across physical GPUs avoids compute contention. However, MIG devices also fall into this code path simply because AlignedAllocationSupported() returns false for them, not because distributed allocation is the right strategy for MIG.

MIG instances are hardware-isolated partitions with dedicated SMs and memory. Packing them onto fewer physical GPUs has no performance penalty, and frees up remaining GPUs for full-GPU workloads:

┌─────────────────────────────────────────────────────────────────────┐
│                Distributed Allocation (current default)              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  GPU 0            GPU 1            GPU 2            GPU 3           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ ████░░░░ │    │ ████░░░░ │    │ ████░░░░ │    │ ░░░░░░░░ │      │
│  │ MIG 1/5  │    │ MIG 1/5  │    │ MIG 1/5  │    │ MIG 0/5  │      │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│   ⚠ partial       ⚠ partial       ⚠ partial       ░ empty          │
│                                                                     │
│  → Full GPU request arrives: only 1 GPU available (GPU 3)           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    Packed Allocation (bin-packing)                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  GPU 0            GPU 1            GPU 2            GPU 3           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ ████████ │    │ ░░░░░░░░ │    │ ░░░░░░░░ │    │ ░░░░░░░░ │      │
│  │ MIG 3/5  │    │ MIG 0/5  │    │ MIG 0/5  │    │ MIG 0/5  │      │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘      │
│   ✓ packed         ★ free          ★ free          ★ free           │
│                                                                     │
│  → Full GPU request arrives: 3 GPUs available (GPU 1, 2, 3)        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
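The arithmetic in the two diagrams can be reproduced with a toy slot model. This is a simplification, not the plugin's logic: it assumes 4 GPUs with 5 interchangeable slots each (matching the "MIG x/5" boxes) and single-slot requests, and it ignores real MIG profile sizes and placement constraints. Distributed places each request on the emptiest GPU, packed on the fullest GPU that still has room; ties go to the lowest index. place and freeGPUs are hypothetical names.

```go
package main

import "fmt"

const slotsPerGPU = 5 // matches the "MIG x/5" boxes in the diagrams

// place returns the index of the GPU a single-slot request lands on, or -1.
// packed=true picks the fullest GPU that still has room (bin-packing);
// packed=false picks the emptiest GPU (spreading). Ties keep the lowest index.
func place(used []int, packed bool) int {
	best := -1
	for i, u := range used {
		if u >= slotsPerGPU {
			continue
		}
		if best == -1 || (packed && u > used[best]) || (!packed && u < used[best]) {
			best = i
		}
	}
	return best
}

// freeGPUs simulates single-slot requests and counts GPUs left completely
// empty, i.e. still usable for a full-GPU workload.
func freeGPUs(numGPUs, requests int, packed bool) int {
	used := make([]int, numGPUs)
	for r := 0; r < requests; r++ {
		if i := place(used, packed); i >= 0 {
			used[i]++
		}
	}
	free := 0
	for _, u := range used {
		if u == 0 {
			free++
		}
	}
	return free
}

func main() {
	fmt.Println("distributed:", freeGPUs(4, 3, false)) // 1
	fmt.Println("packed:", freeGPUs(4, 3, true))       // 3
}
```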

Relates to #491

Design

getPreferredAllocation
├─ Full GPU (AlignedAllocationSupported && no annotations)
│   └─ alignedAlloc (topology-based, unaffected by this change)
│
└─ MIG / time-slicing / MPS
    ├─ allocationPolicy=packed  → packedAlloc (new)
    └─ allocationPolicy=distributed (default) → distributedAlloc (existing, unmodified)
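The branching in this tree can be reduced to a few lines. This is a hypothetical sketch: choosePath, AllocationPolicy and its constants are illustrative names, the two booleans stand in for AlignedAllocationSupported() and the presence of replica annotations, and the real getPreferredAllocation returns device IDs rather than allocator names.

```go
package main

import "fmt"

// AllocationPolicy is a hypothetical type for the two values of --allocation-policy.
type AllocationPolicy string

const (
	PolicyDistributed AllocationPolicy = "distributed"
	PolicyPacked      AllocationPolicy = "packed"
)

// choosePath mirrors the decision tree: it returns the name of the allocator
// that would handle the request.
func choosePath(alignedSupported, hasAnnotations bool, policy AllocationPolicy) string {
	// Full GPUs: topology-aware alignedAlloc, selected before the policy is read.
	if alignedSupported && !hasAnnotations {
		return "alignedAlloc"
	}
	// MIG / time-slicing / MPS: the policy decides.
	if policy == PolicyPacked {
		return "packedAlloc"
	}
	// Default, including an unset (empty) policy.
	return "distributedAlloc"
}

func main() {
	fmt.Println(choosePath(true, false, PolicyPacked))  // alignedAlloc
	fmt.Println(choosePath(false, false, PolicyPacked)) // packedAlloc
	fmt.Println(choosePath(false, true, ""))            // distributedAlloc
}
```

Because the full-GPU check comes first, a packed setting on a full-GPU node never reaches the policy branch.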

Key decisions:

  • distributedAlloc is completely untouched: packedAlloc is a separate function following the existing alignedAlloc/distributedAlloc pattern
  • Full GPU nodes are unaffected: alignedAlloc is selected before allocationPolicy is ever checked, so setting packed on a full-GPU node has no effect
  • Flag applies uniformly: when packed is set, it applies to MIG, time-slicing, and MPS alike. Silently ignoring a user-set flag for specific device types would be inconsistent
  • Per-node config supported: works with the existing config-manager + ConfigMap + node label (nvidia.com/device-plugin.config) mechanism via YAML config

Usage

CLI flag / Environment variable

--allocation-policy=packed
# or
ALLOCATION_POLICY=packed
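A minimal sketch of how the flag and environment variable could resolve to a policy, assuming the flag takes precedence over ALLOCATION_POLICY and that unknown values are rejected rather than ignored. It uses Go's standard flag package so it stays self-contained; the plugin's actual CLI wiring may differ, and resolvePolicy is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolvePolicy is a hypothetical helper: an explicit flag wins, then the
// ALLOCATION_POLICY environment variable, then the "distributed" default.
func resolvePolicy(flagValue, envValue string) (string, error) {
	v := flagValue
	if v == "" {
		v = envValue
	}
	switch v {
	case "":
		return "distributed", nil
	case "distributed", "packed":
		return v, nil
	default:
		return "", fmt.Errorf("invalid allocation policy %q: must be \"distributed\" or \"packed\"", v)
	}
}

func main() {
	// The standard flag package accepts both -allocation-policy and --allocation-policy.
	policyFlag := flag.String("allocation-policy", "", "distributed (default) or packed")
	flag.Parse()

	policy, err := resolvePolicy(*policyFlag, os.Getenv("ALLOCATION_POLICY"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("allocation policy:", policy)
}
```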

Config file (per-node via ConfigMap + node label)

version: v1
flags:
  migStrategy: mixed
  plugin:
    allocationPolicy: packed

kubectl label node mig-node nvidia.com/device-plugin.config=mig-packed

Test plan

  • TestDistributedAlloc (6 cases) — existing distributed behavior regression
  • TestDistributedAllocIsDefault — default consistency verified over 10 iterations
  • TestPackedAlloc (6 cases) — bin-packing: same GPU priority, overflow to next GPU
  • TestPackedVsDistributedContrast — two strategies produce different results on same input
  • TestFullGPUNodeIgnoresAllocationPolicy (3 sub-cases) — full GPU / MIG / replicated branch path verification
  • All existing internal/rm/ tests pass unchanged


copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

The existing distributedAlloc spreads replicated devices evenly across
physical GPUs, which was designed for time-slicing where workloads
compete for shared compute. MIG instances, however, are hardware-isolated
and do not suffer from contention when packed onto the same GPU.

This adds a --allocation-policy flag (env: ALLOCATION_POLICY) with two
options: "distributed" (default, preserving current behavior) and
"packed" (bin-packing onto the fewest physical GPUs). The packed policy
helps free up entire GPUs for full-GPU workloads in mixed clusters.

The flag applies uniformly to all non-aligned allocation paths (MIG,
time-slicing, MPS) and can be configured per-node via ConfigMap and
the nvidia.com/device-plugin.config node label.

Signed-off-by: 장재영B <jae.j@tossinvest.com>
wkd-woo force-pushed the feature/packed-allocation-policy branch from 421e3c9 to 6686d1a on February 11, 2026 09:06