DAOS-16362 pydaos: ensure checkpoint path is created #17489
Ticket title is 'pytorch checkpoint module'
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17489/1/execution/node/1067/log
b356ea2 to bd6677d
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17489/3/testReport/
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17489/3/execution/node/1029/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17489/3/execution/node/1067/log
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17489/4/testReport/
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17489/4/execution/node/1013/log
Some uses of Checkpoint assume that the path to the checkpoint file will be created along with all missing parent directories. For instance, the DLIO benchmark writes checkpoints as `/prefix/global_epochX_stepY/layer-Z.pt`. This commit adds an `ensure_path` parameter to call `mkdirall` before writing the checkpoint file.

Features: pytorch
Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com>
eb23a4c to 951b03b
Some uses of Checkpoint assume that the path to the checkpoint file will be created along with all missing parent directories.
For instance, the DLIO benchmark writes checkpoints as `/prefix/global_epochX_stepY/layer-Z.pt`.

This commit adds an `ensure_path` parameter to call `mkdirall` before writing the checkpoint file.

Features: pytorch
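
To illustrate the behaviour described in this PR, here is a minimal sketch on a local filesystem. It is not the pydaos implementation: `write_checkpoint` is a hypothetical helper, and `os.makedirs(..., exist_ok=True)` stands in for the `mkdirall` call, on the assumption that `mkdirall` creates every missing parent directory. Only the `ensure_path` name comes from the PR description.

```python
import os
import tempfile


def write_checkpoint(path, payload, ensure_path=True):
    """Write payload (bytes) to path.

    Hypothetical helper for illustration only; not the pydaos Checkpoint API.
    """
    parent = os.path.dirname(path)
    if ensure_path and parent:
        # Local-filesystem stand-in for mkdirall: create all missing
        # parent directories, and do nothing if they already exist.
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        f.write(payload)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as prefix:
        # DLIO-style layout: <prefix>/global_epochX_stepY/layer-Z.pt
        target = os.path.join(prefix, "global_epoch1_step2", "layer-3.pt")
        write_checkpoint(target, b"weights")
        print(os.path.exists(target))
```

With `ensure_path=False`, the same call fails with `FileNotFoundError` when the `global_epochX_stepY` directory does not exist yet, which is the situation the DLIO layout runs into.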