During training, the amount of GPU memory required fluctuates constantly. For example, when I use DAPO to train Qwen2.5-Math-7B on four A100 GPUs, each card needs nearly 60 GB of GPU memory at the peak, but less than 1 GB at the low point. Since I am currently sharing an 8-card A100 machine with several other people, the freed GPU memory is often claimed by other processes immediately after it is released, which then causes an Out Of Memory (OOM) error when training needs it back. I would therefore like a way to pin (reserve) the GPU memory for the duration of training.
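In case it helps, a common workaround (not a feature of any particular training framework) is to hold the memory yourself with a placeholder tensor, deleting it just before a known memory peak and re-allocating it afterwards. The sketch below assumes a PyTorch setup; `reserve_bytes` and the helper names are illustrative, not existing DAPO options:

```python
"""Sketch of pinning GPU memory on a shared machine by holding a
placeholder tensor. Assumption: PyTorch training process; names here
(reserve, placeholder_numel) are hypothetical helpers, not framework APIs."""

GiB = 1024 ** 3


def placeholder_numel(reserve_bytes: int, elem_size: int = 1) -> int:
    """Number of elements needed for a placeholder of ~reserve_bytes bytes
    (uint8 elements are 1 byte each, so the mapping is direct)."""
    return reserve_bytes // elem_size


def reserve(reserve_bytes: int, device: str = "cuda:0"):
    """Allocate a uint8 tensor occupying ~reserve_bytes on `device`.

    Keep a reference to the returned tensor so the memory stays claimed;
    `del` it (and call torch.cuda.empty_cache() only if you really want to
    return memory to the driver) right before a peak phase, then re-reserve.
    """
    import torch  # local import so the sizing helper is usable without torch

    return torch.empty(
        placeholder_numel(reserve_bytes), dtype=torch.uint8, device=device
    )
```

Note that PyTorch's caching allocator already retains freed memory inside the process rather than returning it to the driver, so memory only becomes visible to other users' processes when something explicitly releases it (e.g. a `torch.cuda.empty_cache()` call, or an inference engine's sleep/offload mode); if the framework exposes a switch for that release step, disabling it may achieve the same pinning effect without a placeholder.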