diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 54c98962..b37da474 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -97,7 +97,7 @@ You can also skip providers you do not have accounts for by commenting them out ### Test rigs -Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnositc way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py). +Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnostic way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py). **Almost none of the tests instantiate `CloudPath` or a `*Client` class directly.** @@ -196,7 +196,7 @@ Here's a checklist from the PR template to make sure that you did all the requir If you are not a maintainer, a maintainer will have to approve your PR to run the test suite in GitHub Actions. No need to ping a maintainer, it will be seen as part of our regular review. -Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failiures. A mainter will take the following steps: +Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failures. A maintainer will take the following steps: - Create a branch off the main repo for your PR's changes - Merge your PR into that new branch @@ -210,7 +210,7 @@ For example, see a [repo-local branch running the live tests in this PR](https:/ ### Adding dependencies -We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library depencies are tracked in `pyproject.toml`. +We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library dependencies are tracked in `pyproject.toml`. Dependencies that are only needed for building documentation, development, linting, formatting, or testing can be added to `requirements-dev.txt`, and are not subject to the same scrutiny. @@ -307,7 +307,7 @@ To see how it is used in PR, you can [see an example here](https://github.com/dr ### Exceptions -Different backends may raise different exception classses when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features. +Different backends may raise different exception classes when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features.
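For illustration, a minimal sketch of what this buys users (the bucket and path are hypothetical, and whether a given failure surfaces this way depends on it being an error we understand; `CloudPathException` is the shared base class in [`exceptions.py`](cloudpathlib/exceptions.py)):

```python
from cloudpathlib import CloudPath
from cloudpathlib.exceptions import CloudPathException

try:
    CloudPath("s3://example-bucket/missing.txt").read_text()
except CloudPathException as exc:
    # the same except clause works whether the path is S3, GS, or Azure
    print(f"cloud operation failed: {exc}")
```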
diff --git a/HISTORY.md b/HISTORY.md index d6cfd234..4060f641 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -113,7 +113,7 @@ Includes all changes from v0.18.0. - API change: Add `ignore` parameter to `CloudPath.copytree` in order to match `shutil` API. ([Issue #145](https://github.com/drivendataorg/cloudpathlib/issues/145), [PR #272](https://github.com/drivendataorg/cloudpathlib/pull/272)) - Use the V2 version for listing objects `list_objects_v2` in `S3Client`. ([Issue #155](https://github.com/drivendataorg/cloudpathlib/issues/155), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302)) - - Add abilty to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302)) + - Add ability to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302)) - Prevent data loss when renaming by skipping files that would be renamed to the same thing. ([Issue #277](https://github.com/drivendataorg/cloudpathlib/issues/277), [PR #278](https://github.com/drivendataorg/cloudpathlib/pull/278)) - Speed up common `glob`/`rglob` patterns. ([Issue #274](https://github.com/drivendataorg/cloudpathlib/issues/274), [PR #276](https://github.com/drivendataorg/cloudpathlib/pull/276)) diff --git a/cloudpathlib/client.py b/cloudpathlib/client.py index 1b6c32eb..c4305fc3 100644 --- a/cloudpathlib/client.py +++ b/cloudpathlib/client.py @@ -44,7 +44,7 @@ def __init__( if isinstance(file_cache_mode, str): file_cache_mode = FileCacheMode(file_cache_mode) - # if not explcitly passed to client, get from env var + # if not explicitly passed to client, get from env var if file_cache_mode is None: file_cache_mode = FileCacheMode.from_environment() diff --git a/cloudpathlib/s3/s3client.py b/cloudpathlib/s3/s3client.py index db130e82..87e45a17 100644 --- a/cloudpathlib/s3/s3client.py +++ b/cloudpathlib/s3/s3client.py @@ -322,7 +322,7 @@ def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None: ) elif file_or_dir == "dir": - # try to delete as a direcotry instead + # try to delete as a directory instead bucket = self.s3.Bucket(cloud_path.bucket) prefix = cloud_path.key diff --git a/docs/docs/authentication.md b/docs/docs/authentication.md index 36018532..2732f1fe 100644 --- a/docs/docs/authentication.md +++ b/docs/docs/authentication.md @@ -125,7 +125,7 @@ As noted above, you can also call `.set_as_default_client()` on the client objec ## Other S3 `ExtraArgs` in `boto3` -The S3 SDK, `boto3` supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instatiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args. +The S3 SDK, `boto3`, supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instantiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args.
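A short sketch of that keyword (the `ServerSideEncryption` value is just an example; any of the allowed args listed below can be passed the same way):

```python
from cloudpathlib import S3Client

# example ExtraArgs; "ServerSideEncryption" is one of boto3's ALLOWED_UPLOAD_ARGS
client = S3Client(extra_args={"ServerSideEncryption": "AES256"})
client.set_as_default_client()  # optional: subsequent S3Path objects use this client
```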
The args supported for uploads are the same as `boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`, see the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer) for the latest, but as of the time of writing, these are: diff --git a/docs/docs/caching.ipynb b/docs/docs/caching.ipynb index a1b98951..f2be92c5 100644 --- a/docs/docs/caching.ipynb +++ b/docs/docs/caching.ipynb @@ -32,7 +32,7 @@ "\n", "The cache logic also support writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed.\n", "\n", - "**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n", + "**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n", "\n", "```python\n", "with my_cloud_path.open(\"w\") as f:\n", @@ -269,7 +269,7 @@ "\n", "However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine.\n", "\n", - "We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)" + "We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as if it were not set.)"
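A minimal sketch of the `local_cache_dir` pattern described in that cell (the bucket and file names are hypothetical):

```python
from cloudpathlib import S3Client

# all downloads for paths built from this client land in ./data and survive restarts
client = S3Client(local_cache_dir="data")
cp = client.CloudPath("s3://example-bucket/image.png")
cp.fspath  # first access downloads into ./data; later accesses reuse the cached copy
```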
] }, { @@ -433,7 +433,7 @@ " - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`.\n", " - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files.\n", "\n", - "However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n" + "However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n" ] }, { @@ -647,7 +647,7 @@ "source": [ "### File cache mode: `\"persistent\"`\n", "\n", - "If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n", + "If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n", "\n", "Local cache file exists after file is closed for reading.\n", "\n",
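Putting the two cells above into one sketch (the cache directory is hypothetical; per the `client.py` hunk above, a plain string for `file_cache_mode` is accepted and converted to the enum internally):

```python
from cloudpathlib import S3Client

# "persistent" requires local_cache_dir; omitting it raises InvalidConfigurationException
client = S3Client(local_cache_dir="data", file_cache_mode="persistent")

# manual clearing, for the cases where you do want to manage the cache yourself
S3Client.get_default_client().clear_cache()
```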
diff --git a/docs/docs/script/caching.py b/docs/docs/script/caching.py index d8bb5b69..688f11f0 100644 --- a/docs/docs/script/caching.py +++ b/docs/docs/script/caching.py @@ -24,7 +24,7 @@ # # The cache logic also support writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed. # -# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example: +# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example: # # ```python # with my_cloud_path.open("w") as f: @@ -85,7 +85,7 @@ # # However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine. # -# We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.) +# We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as if it were not set.) from cloudpathlib import S3Client @@ -159,7 +159,7 @@ # - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`. # - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files. # -# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared. +# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared. # # @@ -280,7 +280,7 @@ # ### File cache mode: `"persistent"` # -# If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`. +# If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`. # # Local cache file exists after file is closed for reading. 
# diff --git a/tests/mock_clients/mock_adls_gen2.py b/tests/mock_clients/mock_adls_gen2.py index aefdb735..8d17b662 100644 --- a/tests/mock_clients/mock_adls_gen2.py +++ b/tests/mock_clients/mock_adls_gen2.py @@ -9,7 +9,7 @@ class MockedDataLakeServiceClient: def __init__(self, test_dir, adls): - # root is parent of the test specific directort + # root is parent of the test specific directory self.root = test_dir.parent self.test_dir = test_dir self.adls = adls diff --git a/tests/mock_clients/mock_s3.py b/tests/mock_clients/mock_s3.py index 4fb37f94..9f75f950 100644 --- a/tests/mock_clients/mock_s3.py +++ b/tests/mock_clients/mock_s3.py @@ -12,7 +12,7 @@ TEST_ASSETS = Path(__file__).parent.parent / "assets" DEFAULT_S3_BUCKET_NAME = "bucket" -# Since we don't contol exactly when the filesystem finishes writing a file +# Since we don't control exactly when the filesystem finishes writing a file # and the test files are super small, we can end up with race conditions in # the tests where the updated file is modified before the source file, # which breaks our caching logic diff --git a/tests/performance/cli.py b/tests/performance/cli.py index dc090437..4ef00456 100644 --- a/tests/performance/cli.py +++ b/tests/performance/cli.py @@ -20,7 +20,7 @@ logger.remove() logger.add(lambda msg: tqdm.write(msg, end=""), colorize=True) -# get environement variables +# get environment variables load_dotenv(find_dotenv()) # enumerate cloudpathlib implementations @@ -97,7 +97,7 @@ def _format_row(r): return table -@cli.command(short_help="Runs peformance test suite against a specific backend and bucket.") +@cli.command(short_help="Runs performance test suite against a specific backend and bucket.") def run( backend: CloudEnum, bucket: Optional[str] = None, diff --git a/tests/test_caching.py b/tests/test_caching.py index feb613a6..5223df71 100644 --- a/tests/test_caching.py +++ b/tests/test_caching.py @@ -202,7 +202,7 @@ def test_loc_dir(rig: CloudProviderTestRig, tmpdir): file_cache_mode=FileCacheMode.persistent, **rig.required_client_kwargs ) - # automatically set to persitent if not specified + # automatically set to persistent if not specified client = rig.client_class(local_cache_dir=tmpdir, **rig.required_client_kwargs) assert client.file_cache_mode == FileCacheMode.persistent @@ -458,7 +458,7 @@ def test_manual_cache_clearing(rig: CloudProviderTestRig): assert cp._local.exists() assert cp.client._local_cache_dir.exists() - # clears the file itself, but not the containg folder + # clears the file itself, but not the containing folder cp.clear_cache() assert not cp._local.exists() diff --git a/tests/test_cloudpath_file_io.py b/tests/test_cloudpath_file_io.py index d367e1ae..16c835f9 100644 --- a/tests/test_cloudpath_file_io.py +++ b/tests/test_cloudpath_file_io.py @@ -342,9 +342,9 @@ def test_is_dir_is_file(rig, tmp_path): assert not test_case.is_dir() # does not exist (same behavior as pathlib.Path that does not exist) - non_existant = rig.create_cloud_path("dir_0/not_a_file") - assert not non_existant.is_file() - assert not non_existant.is_dir() + non_existent = rig.create_cloud_path("dir_0/not_a_file") + assert not non_existent.is_file() + assert not non_existent.is_dir() def test_file_read_writes(rig, tmp_path): diff --git a/tests/test_cloudpath_manipulation.py b/tests/test_cloudpath_manipulation.py index b9e70669..9e314299 100644 --- a/tests/test_cloudpath_manipulation.py +++ b/tests/test_cloudpath_manipulation.py @@ -202,7 +202,7 @@ def test_parser(rig): with pytest.raises(NotImplementedError): 
rig.create_cloud_path("a/b/c").parser else: - # always posixpath since our dispath goes to PurePosixPath + # always posixpath since our dispatch goes to PurePosixPath assert rig.create_cloud_path("a/b/c").parser == posixpath diff --git a/tests/test_gs_specific.py b/tests/test_gs_specific.py index f17d0898..048d9580 100644 --- a/tests/test_gs_specific.py +++ b/tests/test_gs_specific.py @@ -71,7 +71,7 @@ def _calculate_b64_wrapped_md5_hash(contents: str) -> str: b64string = b64encode(contents_md5_bytes).decode() return b64string - # if USE_LIVE_CLOUD this doesnt have any effect + # if USE_LIVE_CLOUD this doesn't have any effect expected_hash = _calculate_b64_wrapped_md5_hash(contents) monkeypatch.setenv("MOCK_EXPECTED_MD5_HASH", expected_hash) diff --git a/tests/test_s3_specific.py b/tests/test_s3_specific.py index 4b12f7b9..d9edc94e 100644 --- a/tests/test_s3_specific.py +++ b/tests/test_s3_specific.py @@ -126,10 +126,10 @@ def _execute_on_subprocess_and_observe(use_threads): return max_threads - # usually ~3 threads are spun up whe use_threads is False + # usually ~3 threads are spun up when use_threads is False assert _execute_on_subprocess_and_observe(use_threads=False) < 5 - # usually ~15 threads are spun up whe use_threads is True + # usually ~15 threads are spun up when use_threads is True assert _execute_on_subprocess_and_observe(use_threads=True) > 10
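To round out the test hunks above, a hypothetical example of the provider-agnostic style the rig fixture enables (the file name and contents are made up; `rig.create_cloud_path` is the same helper used in the tests above):

```python
def test_write_read_roundtrip(rig):
    # the rig builds provider-specific paths, so the test never
    # instantiates a CloudPath or *Client class directly
    p = rig.create_cloud_path("dir_0/roundtrip.txt")
    p.write_text("hello")
    assert p.read_text() == "hello"
```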