What you would like to be added?
We need a way to define groups of pods within a PodCliqueScalingGroup (PCSG) that share the exact same GPU(s). When the PCSG is scaled, all GPU-sharing groups should be replicated together.
Use Case
Within a single PCSG, define multiple "GPU-sharing groups" where pods share GPUs:
Each sharing group has 2+ pods (e.g., a primary-shadow pair)
Pods within a sharing group access the same GPU(s)
Different sharing groups use different GPU sets
When scaling the PCSG, all groups are replicated together
Example:
PCSG (replicas: 2)
├── Replica 0
│   ├── Sharing Group "worker1": primary-1 + shadow-1 → share GPU 0-7
│   └── Sharing Group "worker2": primary-2 + shadow-2 → share GPU 8-15
└── Replica 1
    ├── Sharing Group "worker3": primary-3 + shadow-3 → share GPU 16-23
    └── Sharing Group "worker4": primary-4 + shadow-4 → share GPU 24-31
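To make the expected scaling behavior concrete, here is a minimal Python sketch that expands a replica count into the layout shown in the tree. It is only an illustration of the request, not a proposed API: `SharingGroup`, `expand_pcsg`, `GPUS_PER_GROUP`, and `GROUPS_PER_REPLICA` are hypothetical names, and the flat GPU numbering (0-31) is taken from the example rather than from how GPUs are actually addressed on nodes.

```python
from dataclasses import dataclass

# Assumptions taken from the example above, not from any existing API.
GPUS_PER_GROUP = 8      # each sharing group owns one 8-GPU set
GROUPS_PER_REPLICA = 2  # each PCSG replica contains two sharing groups


@dataclass(frozen=True)
class SharingGroup:
    """Hypothetical model: pods in one group all see the same GPU set."""
    name: str
    pods: tuple[str, ...]
    gpus: range


def expand_pcsg(replicas: int) -> list[list[SharingGroup]]:
    """Return one list of sharing groups per PCSG replica.

    Every replica gets the same number of groups, and no two groups
    (within or across replicas) are given overlapping GPU sets.
    """
    layout = []
    for replica in range(replicas):
        groups = []
        for slot in range(GROUPS_PER_REPLICA):
            index = replica * GROUPS_PER_REPLICA + slot  # 0-based group index
            first_gpu = index * GPUS_PER_GROUP
            groups.append(SharingGroup(
                name=f"worker{index + 1}",
                pods=(f"primary-{index + 1}", f"shadow-{index + 1}"),
                gpus=range(first_gpu, first_gpu + GPUS_PER_GROUP),
            ))
        layout.append(groups)
    return layout


for replica, groups in enumerate(expand_pcsg(replicas=2)):
    for group in groups:
        pods = " + ".join(group.pods)
        print(f"replica {replica}: {group.name}: {pods} "
              f"-> GPU {group.gpus.start}-{group.gpus.stop - 1}")
```

Scaling the PCSG from 2 to 3 replicas in this sketch would add a third pair of groups with their own disjoint GPU sets, which is the "replicated together" behavior requested above.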
Current Gap
No way to specify which pods within a PCSG should share GPUs
No mechanism to define multiple independent GPU-sharing groups
Traditional GPU requests allocate separate GPUs to each pod
Why is this needed?
Dynamo GPU memory service