Merged
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -97,7 +97,7 @@ You can also skip providers you do not have accounts for by commenting them out

### Test rigs

Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnositc way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py).
Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnostic way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py).

**Almost none of the tests instantiate `CloudPath` or a `*Client` class directly.**

@@ -196,7 +196,7 @@ Here's a checklist from the PR template to make sure that you did all the requir

If you are not a maintainer, a maintainer will have to approve your PR to run the test suite in GitHub Actions. No need to ping a maintainer, it will be seen as part of our regular review.

Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failiures. A mainter will take the following steps:
Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failures. A maintainer will take the following steps:

- Create a branch off the main repo for your PR's changes
- Merge your PR into that new branch
@@ -210,7 +210,7 @@ For example, see a [repo-local branch running the live tests in this PR](https:/

### Adding dependencies

We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library depencies are tracked in `pyproject.toml`.
We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library dependencies are tracked in `pyproject.toml`.

Dependencies that are only needed for building documentation, development, linting, formatting, or testing can be added to `requirements-dev.txt`, and are not subject to the same scrutiny.

@@ -307,7 +307,7 @@ To see how it is used in PR, you can [see an example here](https://github.com/dr

### Exceptions

Different backends may raise different exception classses when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features.
Different backends may raise different exception classes when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features.



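The Exceptions section above describes catching whatever a backend SDK raises and re-raising a single backend-agnostic exception. A minimal sketch of that wrap-and-raise pattern — the class names and the `download` helper here are stand-ins for illustration, not cloudpathlib's actual hierarchy in `exceptions.py`:

```python
class CloudPathException(Exception):
    """Stand-in for a backend-agnostic base exception."""


class CloudPathNotExistsError(CloudPathException):
    """Stand-in for a 'file not found' exception users can catch."""


def download(key: str, backend_store: dict) -> bytes:
    # Catch whatever the backend raises for a missing object and re-raise
    # one exception type that is the same no matter which backend is in use.
    try:
        return backend_store[key]
    except KeyError as exc:  # stand-in for a provider-specific error class
        raise CloudPathNotExistsError(f"{key} does not exist") from exc
```

Callers can then write `except CloudPathNotExistsError` once, without knowing which provider SDK raised the underlying error.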
2 changes: 1 addition & 1 deletion HISTORY.md
@@ -113,7 +113,7 @@ Includes all changes from v0.18.0.

- API change: Add `ignore` parameter to `CloudPath.copytree` in order to match `shutil` API. ([Issue #145](https://github.com/drivendataorg/cloudpathlib/issues/145), [PR #272](https://github.com/drivendataorg/cloudpathlib/pull/272))
- Use the V2 version for listing objects `list_objects_v2` in `S3Client`. ([Issue #155](https://github.com/drivendataorg/cloudpathlib/issues/155), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Add abilty to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Add ability to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Prevent data loss when renaming by skipping files that would be renamed to the same thing. ([Issue #277](https://github.com/drivendataorg/cloudpathlib/issues/277), [PR #278](https://github.com/drivendataorg/cloudpathlib/pull/278))
- Speed up common `glob`/`rglob` patterns. ([Issue #274](https://github.com/drivendataorg/cloudpathlib/issues/274), [PR #276](https://github.com/drivendataorg/cloudpathlib/pull/276))

2 changes: 1 addition & 1 deletion cloudpathlib/client.py
@@ -44,7 +44,7 @@ def __init__(
if isinstance(file_cache_mode, str):
file_cache_mode = FileCacheMode(file_cache_mode)

# if not explcitly passed to client, get from env var
# if not explicitly passed to client, get from env var
if file_cache_mode is None:
file_cache_mode = FileCacheMode.from_environment()

2 changes: 1 addition & 1 deletion cloudpathlib/s3/s3client.py
@@ -322,7 +322,7 @@ def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None:
)

elif file_or_dir == "dir":
# try to delete as a direcotry instead
# try to delete as a directory instead
bucket = self.s3.Bucket(cloud_path.bucket)

prefix = cloud_path.key
2 changes: 1 addition & 1 deletion docs/docs/authentication.md
@@ -125,7 +125,7 @@ As noted above, you can also call `.set_as_default_client()` on the client objec

## Other S3 `ExtraArgs` in `boto3`

The S3 SDK, `boto3` supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instatiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args.
The S3 SDK, `boto3`, supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instantiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args.

The args supported for uploads are the same as `boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`, see the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer) for the latest, but as of the time of writing, these are:

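The `extra_args` passage above says args are forwarded "insofar as those methods support the specific args." A toy sketch of that filtering step — the allowed set here is a made-up subset for illustration; the authoritative lists are `boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS` and `ALLOWED_DOWNLOAD_ARGS`:

```python
# Hypothetical subset of boto3's ALLOWED_UPLOAD_ARGS, for illustration only
ALLOWED_UPLOAD_ARGS = {"ACL", "ServerSideEncryption", "Metadata"}


def filter_extra_args(extra_args: dict, allowed: set) -> dict:
    # Pass on only the extra args that this specific transfer method supports;
    # unsupported keys are silently dropped rather than raising
    return {k: v for k, v in extra_args.items() if k in allowed}
```

Under this sketch, a client configured with both upload-only and download-only args would forward a different subset to each operation.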
8 changes: 4 additions & 4 deletions docs/docs/caching.ipynb
@@ -32,7 +32,7 @@
"\n",
"The cache logic also supports writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed.\n",
"\n",
"**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n",
"**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n",
"\n",
"```python\n",
"with my_cloud_path.open(\"w\") as f:\n",
@@ -269,7 +269,7 @@
"\n",
"However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine.\n",
"\n",
"We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)"
"We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)"
]
},
{
@@ -433,7 +433,7 @@
" - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`.\n",
" - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files.\n",
"\n",
"However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n"
"However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n"
]
},
{
@@ -647,7 +647,7 @@
"source": [
"### File cache mode: `\"persistent\"`\n",
"\n",
"If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n",
"If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n",
"\n",
"Local cache file exists after file is closed for reading.\n",
"\n",
8 changes: 4 additions & 4 deletions docs/docs/script/caching.py
@@ -24,7 +24,7 @@
#
# The cache logic also supports writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed.
#
# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:
# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:
#
# ```python
# with my_cloud_path.open("w") as f:
@@ -85,7 +85,7 @@
#
# However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine.
#
# We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)
# We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)

from cloudpathlib import S3Client

@@ -159,7 +159,7 @@
# - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`.
# - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files.
#
# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared.
# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared.
#

#
@@ -280,7 +280,7 @@

# ### File cache mode: `"persistent"`
#
# If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.
# If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.
#
# Local cache file exists after file is closed for reading.
#
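The caching docs above state two rules: if `local_cache_dir` is given without `file_cache_mode`, the mode defaults to `"persistent"`, and `"persistent"` without a cache dir raises `InvalidConfigurationException`; separately, an empty `CLOUDPATHLIB_LOCAL_CACHE_DIR` counts as unset. A sketch of that resolution logic in one function — this is an illustrative re-implementation, not cloudpathlib's actual code, and it raises a plain `ValueError` in place of the library's exception:

```python
import os


def resolve_cache_settings(local_cache_dir=None, file_cache_mode=None):
    # An empty CLOUDPATHLIB_LOCAL_CACHE_DIR is treated the same as unset
    if local_cache_dir is None:
        local_cache_dir = os.environ.get("CLOUDPATHLIB_LOCAL_CACHE_DIR") or None

    # A cache dir with no explicit mode defaults to "persistent"
    if file_cache_mode is None and local_cache_dir is not None:
        file_cache_mode = "persistent"

    # "persistent" with no cache dir is an invalid configuration
    if file_cache_mode == "persistent" and local_cache_dir is None:
        raise ValueError('file_cache_mode="persistent" requires local_cache_dir')

    return local_cache_dir, file_cache_mode
```

For example, `resolve_cache_settings(local_cache_dir="/tmp/cache")` yields the `"persistent"` mode automatically, while `resolve_cache_settings(file_cache_mode="persistent")` with no cache dir raises.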
2 changes: 1 addition & 1 deletion tests/mock_clients/mock_adls_gen2.py
@@ -9,7 +9,7 @@

class MockedDataLakeServiceClient:
def __init__(self, test_dir, adls):
# root is parent of the test specific directort
# root is parent of the test specific directory
self.root = test_dir.parent
self.test_dir = test_dir
self.adls = adls
2 changes: 1 addition & 1 deletion tests/mock_clients/mock_s3.py
@@ -12,7 +12,7 @@
TEST_ASSETS = Path(__file__).parent.parent / "assets"
DEFAULT_S3_BUCKET_NAME = "bucket"

# Since we don't contol exactly when the filesystem finishes writing a file
# Since we don't control exactly when the filesystem finishes writing a file
# and the test files are super small, we can end up with race conditions in
# the tests where the updated file is modified before the source file,
# which breaks our caching logic
4 changes: 2 additions & 2 deletions tests/performance/cli.py
@@ -20,7 +20,7 @@
logger.remove()
logger.add(lambda msg: tqdm.write(msg, end=""), colorize=True)

# get environement variables
# get environment variables
load_dotenv(find_dotenv())

# enumerate cloudpathlib implementations
@@ -97,7 +97,7 @@ def _format_row(r):
return table


@cli.command(short_help="Runs peformance test suite against a specific backend and bucket.")
@cli.command(short_help="Runs performance test suite against a specific backend and bucket.")
def run(
backend: CloudEnum,
bucket: Optional[str] = None,
4 changes: 2 additions & 2 deletions tests/test_caching.py
@@ -202,7 +202,7 @@ def test_loc_dir(rig: CloudProviderTestRig, tmpdir):
file_cache_mode=FileCacheMode.persistent, **rig.required_client_kwargs
)

# automatically set to persitent if not specified
# automatically set to persistent if not specified
client = rig.client_class(local_cache_dir=tmpdir, **rig.required_client_kwargs)
assert client.file_cache_mode == FileCacheMode.persistent

@@ -458,7 +458,7 @@ def test_manual_cache_clearing(rig: CloudProviderTestRig):
assert cp._local.exists()
assert cp.client._local_cache_dir.exists()

# clears the file itself, but not the containg folder
# clears the file itself, but not the containing folder
cp.clear_cache()

assert not cp._local.exists()
6 changes: 3 additions & 3 deletions tests/test_cloudpath_file_io.py
@@ -342,9 +342,9 @@ def test_is_dir_is_file(rig, tmp_path):
assert not test_case.is_dir()

# does not exist (same behavior as pathlib.Path that does not exist)
non_existant = rig.create_cloud_path("dir_0/not_a_file")
assert not non_existant.is_file()
assert not non_existant.is_dir()
non_existent = rig.create_cloud_path("dir_0/not_a_file")
assert not non_existent.is_file()
assert not non_existent.is_dir()


def test_file_read_writes(rig, tmp_path):
2 changes: 1 addition & 1 deletion tests/test_cloudpath_manipulation.py
@@ -202,7 +202,7 @@ def test_parser(rig):
with pytest.raises(NotImplementedError):
rig.create_cloud_path("a/b/c").parser
else:
# always posixpath since our dispath goes to PurePosixPath
# always posixpath since our dispatch goes to PurePosixPath
assert rig.create_cloud_path("a/b/c").parser == posixpath


2 changes: 1 addition & 1 deletion tests/test_gs_specific.py
@@ -71,7 +71,7 @@ def _calculate_b64_wrapped_md5_hash(contents: str) -> str:
b64string = b64encode(contents_md5_bytes).decode()
return b64string

# if USE_LIVE_CLOUD this doesnt have any effect
# if USE_LIVE_CLOUD this doesn't have any effect
expected_hash = _calculate_b64_wrapped_md5_hash(contents)
monkeypatch.setenv("MOCK_EXPECTED_MD5_HASH", expected_hash)

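The `_calculate_b64_wrapped_md5_hash` helper shown partially in the hunk above can be completed as a self-contained function; the `md5` line is inferred from the visible `contents_md5_bytes` usage, so treat it as a reconstruction rather than the exact test code:

```python
from base64 import b64encode
from hashlib import md5


def calculate_b64_wrapped_md5_hash(contents: str) -> str:
    # GCS-style checksums are the raw MD5 digest bytes, base64-encoded,
    # so hash the contents and base64-wrap the digest (not the hexdigest)
    contents_md5_bytes = md5(contents.encode()).digest()
    return b64encode(contents_md5_bytes).decode()
```

The test then sets this expected value in `MOCK_EXPECTED_MD5_HASH` so the mock client can report the same hash a live bucket would.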