Labels: caching (Issues that would be fixed or improved by a separate, customizable caching layer.)
Description
The is_dir check is fairly expensive, since each call has to query the storage service. However, at least for S3 and Azure, when entries are created by the client's _list_dir method, the listing already tells you whether each entry is a directory or a file, so the result can be set immediately on the created CloudPath instance.
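As background for the S3 case: when ListObjectsV2 is called with Delimiter="/", each response page separates entries into CommonPrefixes (directory-like prefixes) and Contents (objects). The following is a minimal sketch of reading that type information from a single page. It assumes a hand-written dict in place of a real boto3 response, and classify_listing_page and example_page are hypothetical names used only for illustration.

```python
from typing import Iterator, Tuple


def classify_listing_page(bucket: str, page: dict) -> Iterator[Tuple[str, bool]]:
    """Yield (uri, is_dir) pairs from one ListObjectsV2 page fetched with Delimiter="/".

    Hypothetical helper for illustration; not part of cloudpathlib.
    """
    # With a delimiter, S3 rolls keys below the next "/" up into CommonPrefixes,
    # so every entry here is a "directory".
    for common_prefix in page.get("CommonPrefixes", []):
        yield f"s3://{bucket}/{common_prefix['Prefix']}", True
    # Objects directly under the prefix are returned in Contents: treat them as files.
    for obj in page.get("Contents", []):
        yield f"s3://{bucket}/{obj['Key']}", False


# Hand-written page shaped like a ListObjectsV2 response (assumption: no network access).
example_page = {
    "CommonPrefixes": [{"Prefix": "data/images/"}],
    "Contents": [{"Key": "data/readme.txt"}, {"Key": "data/table.csv"}],
}

# Prints each URI followed by "dir" or "file".
for uri, is_dir in classify_listing_page("my-bucket", example_page):
    print(uri, "dir" if is_dir else "file")
```

No extra request per entry is needed to learn its type; the information comes back with the listing itself.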
For example, in S3Client._list_dir you could write something like:
```python
paginator = self.client.get_paginator("list_objects_v2")
for result in paginator.paginate(
    Bucket=cloud_path.bucket, Prefix=prefix, Delimiter="/", MaxKeys=1000
):
    # sub directory names
    for result_prefix in result.get("CommonPrefixes", []):
        path = S3Path(f"s3://{cloud_path.bucket}/{result_prefix.get('Prefix')}")
        path._is_dir = True
        yield path
    # files in the directory
    for result_key in result.get("Contents", []):
        path = S3Path(f"s3://{cloud_path.bucket}/{result_key.get('Key')}")
        path._is_dir = False
        yield path
```

and modify S3Path.is_dir:
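```python
def is_dir(self) -> bool:
    # Only query the service if the listing did not already record the answer;
    # getattr guards against paths that were never given an _is_dir attribute.
    if getattr(self, "_is_dir", None) is None:
        self._is_dir = self.client._is_file_or_dir(self) == "dir"
    return self._is_dir
```

This makes a HUGE performance difference if you need to call is_dir on the entries returned from iterdir or glob (in my case, when implementing a file dialog that works for cloud paths).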
Not sure if this particular implementation is the best way to do this, but something like this is needed.
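To make the effect of the proposed lazy caching concrete, here is a small self-contained sketch that counts simulated metadata requests. FakeClient, CachedPath and the uri attribute are hypothetical stand-ins, not cloudpathlib classes; the method names _list_dir and _is_file_or_dir are borrowed from the issue, and a request is modeled as a counter increment rather than a real network call.

```python
from typing import Iterator, List, Optional


class FakeClient:
    """Hypothetical stand-in for a cloud client that counts metadata requests."""

    def __init__(self, dirs: List[str], files: List[str]) -> None:
        self.dirs = set(dirs)
        self.files = set(files)
        self.requests = 0

    def _is_file_or_dir(self, path: "CachedPath") -> Optional[str]:
        self.requests += 1  # each call models one round trip to the service
        if path.uri in self.dirs:
            return "dir"
        if path.uri in self.files:
            return "file"
        return None

    def _list_dir(self) -> Iterator["CachedPath"]:
        # The listing already knows each entry's type, so record it up front.
        for uri in sorted(self.dirs):
            yield CachedPath(uri, self, is_dir=True)
        for uri in sorted(self.files):
            yield CachedPath(uri, self, is_dir=False)


class CachedPath:
    """Hypothetical path class with a lazily cached is_dir result."""

    def __init__(self, uri: str, client: FakeClient, is_dir: Optional[bool] = None) -> None:
        self.uri = uri
        self.client = client
        self._is_dir = is_dir  # None means "not known yet"

    def is_dir(self) -> bool:
        if self._is_dir is None:
            # Only pay for a request when the listing did not tell us.
            self._is_dir = self.client._is_file_or_dir(self) == "dir"
        return self._is_dir


client = FakeClient(dirs=["s3://b/a/"], files=["s3://b/x.txt", "s3://b/y.txt"])
entries = list(client._list_dir())
print([p.is_dir() for p in entries], client.requests)  # prints: [True, False, False] 0

standalone = CachedPath("s3://b/a/", client)
standalone.is_dir()
standalone.is_dir()
print(client.requests)  # prints: 1
```

Entries produced by the listing answer is_dir with zero requests, and a path built directly pays for one request and then reuses the cached value. The trade-off is staleness: a cached answer will not notice if the bucket changes after the listing.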