
Conversation

@barneygale
Contributor

@barneygale barneygale commented Mar 23, 2025

Make [Writable]Path._copy_from() yield (target, source, part_size) tuples when copying files and directories. A tuple with part_size=0 is emitted for every path encountered, and further tuples with part_size>0 may be emitted when copying regular files.

This should allow anyio.Path to wrap _copy_from() and make it cancelable.

@barneygale
Contributor Author

Sorry it took me ages to get around to this Thomas. I'm happy to adjust the patch to make your life as easy as possible, so let me know what you'd like!

@graingert
Copy link
Contributor

Amazing! I'll try to integrate this into an anyio PR asap

@barneygale
Contributor Author

FWIW, the _copy_from() method has an underscore prefix because I haven't yet found a good justification to add it to the public pathlib.Path interface, and "it's needed in the ABCs" doesn't cut it while the ABCs are still private. Frustratingly it's the only method in this category.

I wonder if converting it to a generator changes the picture? It might be useful for non-anyio users who want to track the progress of large copies. If so, I guess we'd rename it to iter_copy_from or similar.
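For example, a caller outside anyio could drive such a generator to report progress. This is a sketch: `report_progress` and its `total_bytes` parameter are invented here for illustration, not part of pathlib.

```python
def report_progress(events, total_bytes):
    """Consume (target, source, part_size) tuples, printing a running
    percentage for each chunk copied. Returns the total bytes copied."""
    copied = 0
    for target, source, part_size in events:
        if part_size > 0:
            copied += part_size
            print(f"{copied / total_bytes:.0%}  {target}")
    return copied
```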

target = self.with_segments(target)
ensure_distinct_paths(self, target)
target._copy_from(self, **kwargs)
list(target._copy_from(self, **kwargs)) # Consume generator.
Contributor

@graingert graingert Mar 26, 2025


Per the recipes in the itertools docs, the best way to consume a generator is collections.deque:

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

It's a neat little shortcut when the deque has maxlen=0:

if (maxlen == 0)
    return consume_iterator(it);

Suggested change
list(target._copy_from(self, **kwargs)) # Consume generator.
collections.deque(target._copy_from(self, **kwargs), maxlen=0) # Consume generator.

@graingert
Contributor

looking really good, ideally I'd like to see progress support for move(_into) and rmdir

also, it looks like move will double the storage usage until it finishes copying and starts deleting. Would it be possible to move one file at a time? If I cancel the operation, I expect to have about the same amount of storage consumed in total

eg Path._copy_from(remove_source=True)

(I get this is a huge ask!)
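A per-file move along the lines proposed here might look like the following sketch. It is entirely hypothetical: `move_with_progress` is invented for illustration, and it reads each file into memory in one go, which a real implementation would avoid.

```python
from pathlib import Path

def move_with_progress(source: Path, target: Path):
    """Hypothetical copy-then-delete move that removes each source file as
    soon as its copy completes, so storage use never doubles overall."""
    yield target, source, 0
    if source.is_dir():
        target.mkdir(exist_ok=True)
        for child in source.iterdir():
            yield from move_with_progress(child, target / child.name)
        source.rmdir()  # empty once all children have been moved
    else:
        data = source.read_bytes()  # sketch only: real code would chunk this
        target.write_bytes(data)
        yield target, source, len(data)
        source.unlink()  # reclaim space immediately after this file copies
```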

@barneygale
Contributor Author

That feels a tad dangerous to me, because a failed transfer would leave a user with two incomplete directory trees they'd need to reconcile.

(Path.move() tries to use os.replace() first, which avoids the issue when the source and target are on the same fs)

@graingert
Contributor

ok, that's fair, but for move I'd still like progress for the delete; perhaps deletes could be tuples of (path, -1)
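That convention might look like this. A sketch only: `remove_with_progress` and the -1 marker are the suggestion above, not anything pathlib implements.

```python
from pathlib import Path

def remove_with_progress(path: Path):
    """Hypothetical deletion progress: yield (path, -1) for each entry
    after it has been removed."""
    if path.is_dir():
        for child in path.iterdir():
            yield from remove_with_progress(child)
        path.rmdir()
    else:
        path.unlink()
    yield path, -1
```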
