
Make Elastic Training Flexible to GPU Memory  #40

@ZeyaWang

Description

Currently, the user must manually choose a local batch-size upper bound in adaptdl so that too large a batch size does not cause out-of-memory errors on the available GPU. AdaptDL should automate this choice, so that elastic training can never exceed the GPU memory limit.
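One way this could be automated is to probe for the largest local batch size that fits in GPU memory, e.g. with a binary search over candidate sizes. A minimal sketch, not adaptdl's actual API: the `fits_in_memory` probe is a hypothetical callback that in practice would run a trial forward/backward pass and catch CUDA out-of-memory errors.

```python
def max_batch_size(fits_in_memory, lo=1, hi=65536):
    """Binary-search the largest batch size for which fits_in_memory() is True.

    Assumes the probe is monotone: if batch size b fits, every b' < b fits.
    Returns 0 if even `lo` does not fit.
    """
    if not fits_in_memory(lo):
        return 0
    # Invariant: `lo` always fits; shrink [lo, hi] until they meet.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits_in_memory(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Hypothetical probe: pretend the GPU fits at most 384 samples per batch.
print(max_batch_size(lambda b: b <= 384))  # prints 384
```

This requires only O(log hi) trial runs at startup, after which the discovered bound can cap the batch sizes that elastic scaling is allowed to propose.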

Metadata

Labels

enhancement (New feature or request)
