Potential race condition w/ NFS #20

@liningpan

Thanks for creating dSQ!

If I understand correctly, when dSQ saves a job status entry it assumes that appending to a file is atomic. However, this is not true on NFS filesystems. Although unlikely, it is theoretically possible to corrupt a status file when several tasks append to it at once, unless additional locking (e.g. flock/fcntl) is used. I just want to raise this potential issue since Grace now uses NFSv3 for home folders and scratch.

https://www.man7.org/linux/man-pages/man2/open.2.html

O_APPEND may lead to corrupted files on NFS filesystems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can't be done without a race condition.

dSQ/dSQBatch.py

Lines 121 to 129 in 2d37334

with open(
    path.join(args.status_dir[0], "job_{}_status.tsv".format(jid)), "a"
) as out_status:
    print(
        "{Array_Task_ID}\t{Exit_Code}\t{Hostname}\t{T_Start}\t{T_End}\t{T_Elapsed:.02f}\t{Task}".format(
            **out_dict
        ),
        file=out_status,
    )
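
One possible mitigation, as a minimal sketch rather than a tested patch for dSQ: take an exclusive advisory lock before appending and release it after the data is flushed. The helper name `append_status_line` is hypothetical and not part of dSQ. The sketch uses `fcntl.lockf` (POSIX byte-range locks) and assumes that the NFS mount has locking enabled (i.e. it is not mounted with `nolock`) and that every writer to the status file uses the same helper, since advisory locks only protect cooperating processes.

```python
import fcntl
import os


def append_status_line(status_path, line):
    """Append one line to status_path while holding an exclusive advisory lock.

    Hypothetical helper, not part of dSQ. Only serializes writers that
    also take the lock; it does nothing against non-cooperating processes.
    """
    with open(status_path, "a") as out_status:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(out_status, fcntl.LOCK_EX)
        try:
            # The file may have grown while we waited for the lock,
            # so reposition at the current end before writing.
            out_status.seek(0, os.SEEK_END)
            out_status.write(line + "\n")
            # Push the data out before releasing the lock so the next
            # writer sees the updated file.
            out_status.flush()
            os.fsync(out_status.fileno())
        finally:
            fcntl.lockf(out_status, fcntl.LOCK_UN)
```

In `dSQBatch.py` the `print(..., file=out_status)` call could then become something like `append_status_line(status_file, formatted_line)`, where both argument names are placeholders for the path and the formatted TSV row shown above. I have not measured whether the extra lock traffic matters for very large job arrays.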
