Commit 620e073

fix
Signed-off-by: oliver könig <[email protected]>
1 parent c507dda commit 620e073

1 file changed: +1 -0 lines changed
nemo_run/core/execution/templates/ft_launcher_dgxc.j2

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ echo "[FT-Setup] Starting training on $(hostname)..."
 # Note: In high-scale K8s, writing to a single file from 1000 pods can cause lock contention.
 # If scale is small, this is fine.
 if [ -n "$FAULT_TOL_JOB_RESULTS_FILE" ]; then
+    mkdir -p "$(dirname "$FAULT_TOL_JOB_RESULTS_FILE")"
     echo "$(hostname) $(date +%s) X" >> "$FAULT_TOL_JOB_RESULTS_FILE"
 fi
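
The commit message does not say what failure prompted the change, but the likely reason is that `>>` creates a missing file and never a missing parent directory. The following is a minimal sketch of that behavior, not part of the template: the scratch directory from `mktemp -d` and the `results/run-1/job_results` path are hypothetical stand-ins for whatever the job environment sets in `FAULT_TOL_JOB_RESULTS_FILE`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch location; the real path comes from the job environment.
workdir="$(mktemp -d)"
FAULT_TOL_JOB_RESULTS_FILE="$workdir/results/run-1/job_results"

# Without the parent directory, the append redirection itself fails,
# so nothing is written and the command returns non-zero.
if ! (echo "probe" >> "$FAULT_TOL_JOB_RESULTS_FILE") 2>/dev/null; then
    echo "append failed: parent directory is missing"
fi

# With the guard added in this commit, the same append succeeds.
# mkdir -p is a no-op when the directory already exists.
if [ -n "$FAULT_TOL_JOB_RESULTS_FILE" ]; then
    mkdir -p "$(dirname "$FAULT_TOL_JOB_RESULTS_FILE")"
    echo "$(hostname) $(date +%s) X" >> "$FAULT_TOL_JOB_RESULTS_FILE"
fi
```

Because `mkdir -p` succeeds when the directory is already present, running the guard from many pods at once should be safe for directory creation; it does not address the lock-contention concern noted in the template comment.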
