
How can I pin the GPU memory? #35

@congli01

During model training, the amount of GPU memory required keeps changing. For example, when I use DAPO to train Qwen2.5-Math-7B on four A100 GPUs, each card needs nearly 60 GB of GPU memory at peak, but less than 1 GB at its lowest point. I'm currently sharing an 8-GPU A100 machine with several other people, so the following often happens during training: as soon as my job releases GPU memory, someone else's job takes it. When my job needs that memory again, it fails with an out-of-memory (OOM) error. I would therefore like to pin the GPU memory (keep it reserved) for the whole training run.