Skip to content

Conversation

bnikolic
Copy link

@bnikolic bnikolic commented Sep 29, 2025

On rare occasions shutil.copyfile fails when source file is on AWS Lustre implementation because the sendfile(2) syscall returns ENODATA. This patch works around this by disabling the sendfile(2) implementation in the same way as if sendfile(2) was not available.

@bedevere-app
Copy link

bedevere-app bot commented Sep 29, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@picnixz picnixz changed the title GH-139416: Fix copyfile failure due to sendfile + Lustre GH-139416: Fix portability of sendfile(2) support detection for Lustre filesystems Sep 29, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a NEWS entry.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with this but is it possible that ENODATA could be sent for other reasons where we shouldn't fall back to the legacy sendfile?

Improve whitespace

Co-authored-by: Bénédikt Tran <[email protected]>
@python-cla-bot
Copy link

python-cla-bot bot commented Sep 29, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bnikolic
Copy link
Author

Ah, there is one thing that comes to mind, thank you: what if ENODATA comes after some number of blocks have already been written. Need to check if that is possible and/or how to hanlde. Will convert this to draft for time being.

@bnikolic bnikolic marked this pull request as draft September 29, 2025 15:26
@picnixz
Copy link
Member

picnixz commented Sep 29, 2025

Ah, there is one thing that comes to mind, thank you: what if ENODATA comes after some number of blocks have already been written. Need to check if that is possible and/or how to hanlde. Will convert this to draft for time being.

Thank you for investigating this yes.

Comment on lines 221 to 229
dstpos = os.lseek(outfd, 0, os.SEEK_CUR)
if dstpos > 0:
# Some data has already been written but we use
# sendfile in a mode that does not update the
# input fd position when reading. Hence seek the
# input fd to the correct position before falling
# back on POSIX read/write method. Since sendfile
# requires mmapable infd, it should also be seekable
os.lseek(infd, dstpos, os.SEEK_SET)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dstpos = os.lseek(outfd, 0, os.SEEK_CUR)
if dstpos > 0:
# Some data has already been written but we use
# sendfile in a mode that does not update the
# input fd position when reading. Hence seek the
# input fd to the correct position before falling
# back on POSIX read/write method. Since sendfile
# requires mmapable infd, it should also be seekable
os.lseek(infd, dstpos, os.SEEK_SET)
# 'infd' and 'outfd' are assumed to be seekable,
# as they are checked to be "regular" files.
if offset > 0:
# Some data has already been written but we use
# sendfile in a mode that does not update the
# input fd position when reading. Hence seek the
# input fd to the correct position before falling
# back on POSIX read/write method.
os.lseek(infd, offset, os.SEEK_SET)

AFAIK, offset would match dstpos right? (or maybe offset + 1, I didn't think about it much here)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check this but I think this would not work, because offset is updated only at the bottom with the return value of sendfile() (which is the number of bytes actually sent). If are catching the exception then no return value (I think it would be available in C). So this is why the offset is not used in in the next condition but rather the return of lseek, even if it is just to check no data have been written.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. I think I would leave the dstpos = os.lseek(outfd, 0, os.SEEK_CUR) in case sendfile has failed after writing some data. This would be consistent with the usage below (Give up on first call and if no data was copied and subsequent line) which was I think done for same reason.

side_effect=syscall) as m2:
with self.get_files() as (src, dst):
with self.assertRaises(_GiveupOnFastCopy) as cm:
self.zerocopy_fun(src, dst)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with (
    unittest.mock.patch('os.fstat', return_value=mock):
    unittest.mock.patch(self.PATCHPOINT, create=True,
                        side_effect=syscall):
    self.get_files() as (src, dst)
):
    self.assertRaises(_GiveupOnFastCopy, self.zerocopy_func, src, dst)

You'll need to dedent the rest of the code as well. Also, don't bind context managers if they are not used.

@bnikolic
Copy link
Author

bnikolic commented Oct 1, 2025

As this got a bit more complicated than I expected I'm going to see if I can back-port this into one of our production-like runs just to check nothing unexpected happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants