Open
Description
During failure handling, if a pipeline does not have enough nodes, Oobleck is supposed to borrow nodes from other pipelines or merge pipelines.
An earlier version had a prototype of this behavior, but it was lost during the refactoring to the colossalai backend. As a result, when no pipeline template exists for the number of nodes remaining in a pipeline, training terminates with an error in OobleckPlugin._instantiate_pipelines().
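To make the expected behavior concrete, here is a minimal sketch of a borrow-then-merge policy. It is not Oobleck's code: `rebalance`, `_borrow`, `_merge`, and the representation of a pipeline as a list of node names are all hypothetical, and it assumes a pipeline is valid exactly when its node count matches one of the available template sizes.

```python
from typing import List, Set


def _borrow(pipelines: List[List[str]], i: int, template_sizes: Set[int]) -> bool:
    """Try to grow pipeline i to a template size by taking nodes from one donor.

    The donor must itself still match a template after giving up the nodes.
    """
    current = len(pipelines[i])
    for target in sorted(s for s in template_sizes if s > current):
        need = target - current
        for j, donor in enumerate(pipelines):
            if j != i and (len(donor) - need) in template_sizes:
                pipelines[i].extend(donor[-need:])  # move the donor's last nodes
                del donor[-need:]
                return True
    return False


def _merge(pipelines: List[List[str]], i: int, template_sizes: Set[int]) -> bool:
    """Try to merge pipeline i with one other pipeline into a template size."""
    for j, other in enumerate(pipelines):
        if j != i and (len(pipelines[i]) + len(other)) in template_sizes:
            pipelines[i].extend(other)
            del pipelines[j]
            return True
    return False


def rebalance(pipelines: List[List[str]], template_sizes: Set[int]) -> List[List[str]]:
    """Return pipelines whose sizes all match a template, or raise RuntimeError."""
    result = [list(p) for p in pipelines if p]  # copy and drop empty pipelines
    i = 0
    while i < len(result):
        if len(result[i]) in template_sizes or _borrow(result, i, template_sizes):
            i += 1
        elif _merge(result, i, template_sizes):
            i = 0  # indices shifted after the merge, so rescan from the start
        else:
            raise RuntimeError(
                f"no pipeline template for {len(result[i])} node(s), "
                "and neither borrowing nor merging can fix it"
            )
    return result
```

With template sizes `{2, 3, 4}`, a pipeline reduced to a single node next to a healthy 4-node pipeline would borrow one node, giving pipelines of 3 and 2 nodes. Borrowing is tried first in this sketch because it keeps the number of pipelines unchanged; merging is only a fallback.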