GPU page should discuss the specific restrictions of socket/GPU binding on GPU nodes. #834

@wwarriner

Description

What would you like to see added?

CPU affinity appears in gres.conf and is mapped based on hardware architecture. Affinity means that only certain physical cores are associated with each physical GPU on any GPU node for performance reasons. Cores mapped to a GPU have faster access to that GPU than cores not mapped. On the hardware, this mapping cannot be changed as it is part of the physical layout of the devices. Slurm cannot determine this on its own, so it must be instructed via gres.conf.

In practice, CPU affinity limits the ratio of cores to GPUs when requesting GPUs for jobs. Setting aside QoS, if a researcher requests a single GPU and more cores than shown in the table below, the job may be spread across multiple nodes. If they try to force the higher core count onto a single node with --nodes=1, the job will get stuck in the queue with ReqNodeNotAvail.

The table below ignores QoS limits.

| partition | max cores:gpu from affinity | max cores for 1 gpu |
|---|---|---|
| pascal* | 14:1 | 14 |
| ampere* | 64:1 | 64 |
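
For the docs page, the arithmetic behind the table could be illustrated with a small check like the one below. This is only a sketch: the function names are hypothetical, the `*` in the partition names is assumed to be a prefix wildcard, and the core limit is assumed to scale linearly with the number of GPUs requested, which should be confirmed against gres.conf before publishing. QoS limits are ignored, as in the table.

```python
# Cores-per-GPU ratios copied from the table above (QoS limits ignored).
MAX_CORES_PER_GPU = {"pascal": 14, "ampere": 64}


def _ratio(partition: str) -> int:
    """Look up the cores:GPU ratio, treating table entries as name prefixes (assumption)."""
    for prefix, ratio in MAX_CORES_PER_GPU.items():
        if partition.startswith(prefix):
            return ratio
    raise KeyError(f"no affinity ratio known for partition {partition!r}")


def max_cores(partition: str, gpus: int = 1) -> int:
    """Largest single-node core count for a GPU request.

    Assumes the limit scales linearly with the GPU count.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return _ratio(partition) * gpus


def within_affinity(partition: str, cores: int, gpus: int = 1) -> bool:
    """True if the core request stays within the affinity limit for the GPU request."""
    return cores <= max_cores(partition, gpus)
```

For example, `within_affinity("pascal", 16)` returns `False`, which corresponds to the case described above where a single-GPU job asks for more cores than are mapped to that GPU.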
