Merged
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -97,7 +97,7 @@ You can also skip providers you do not have accounts for by commenting them out

### Test rigs

Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnositc way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py).
Since we want behavior parity across providers, nearly all of the tests are written in a provider-agnostic way. Each test is passed a test rig as a fixture, and the rig provides the correct way for generating cloudpaths for testing. The test rigs are defined in [`conftest.py`](tests/conftest.py).

**Almost none of the tests instantiate `CloudPath` or a `*Client` class directly.**

@@ -196,7 +196,7 @@ Here's a checklist from the PR template to make sure that you did all the requir

If you are not a maintainer, a maintainer will have to approve your PR to run the test suite in GitHub Actions. No need to ping a maintainer, it will be seen as part of our regular review.

Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failiures. A mainter will take the following steps:
Even once the tests run, two jobs will fail. This is expected. The failures are: (1) The live tests, and (2) the install tests. Both of these require access to the live backends, which are not available to outside contributors. If everything else passes, you can ignore these failures. A maintainer will take the following steps:

- Create a branch off the main repo for your PR's changes
- Merge your PR into that new branch
@@ -210,7 +210,7 @@ For example, see a [repo-local branch running the live tests in this PR](https:/

### Adding dependencies

We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library depencies are tracked in `pyproject.toml`.
We want `cloudpathlib` to be as lightweight as possible. Our strong preference is to not take any external dependencies for the library outside of the official software development kit (SDK) for the cloud provider. If you want to add a dependency, please open an issue to discuss it first. Library dependencies are tracked in `pyproject.toml`.

Dependencies that are only needed for building documentation, development, linting, formatting, or testing can be added to `requirements-dev.txt`, and are not subject to the same scrutiny.

@@ -307,7 +307,7 @@ To see how it is used in PR, you can [see an example here](https://github.com/dr

### Exceptions

Different backends may raise different exception classses when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features.
Different backends may raise different exception classes when something goes wrong. To make it easy for users to catch exceptions that are agnostic of the backend, we generally will catch and raise a specific exception from [`exceptions.py`](cloudpathlib/exceptions.py) for any exception that we understand. You can add new exceptions to this file if any are needed for new features.



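The Exceptions section above describes catching whatever a backend SDK raises and re-raising a single backend-agnostic exception. A minimal sketch of that wrap-and-raise pattern — the class names and the `download` helper here are stand-ins for illustration, not cloudpathlib's actual hierarchy in `exceptions.py`:

```python
class CloudPathException(Exception):
    """Stand-in for a backend-agnostic base exception."""


class CloudPathNotExistsError(CloudPathException):
    """Stand-in for a 'file not found' exception users can catch."""


def download(key: str, backend_store: dict) -> bytes:
    # Catch whatever the backend raises for a missing object and re-raise
    # one exception type that is the same no matter which backend is in use.
    try:
        return backend_store[key]
    except KeyError as exc:  # stand-in for a provider-specific error class
        raise CloudPathNotExistsError(f"{key} does not exist") from exc
```

Callers can then write `except CloudPathNotExistsError` once, without knowing which provider SDK raised the underlying error.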
2 changes: 1 addition & 1 deletion HISTORY.md
@@ -113,7 +113,7 @@ Includes all changes from v0.18.0.

- API change: Add `ignore` parameter to `CloudPath.copytree` in order to match `shutil` API. ([Issue #145](https://github.com/drivendataorg/cloudpathlib/issues/145), [PR #272](https://github.com/drivendataorg/cloudpathlib/pull/272))
- Use the V2 version for listing objects `list_objects_v2` in `S3Client`. ([Issue #155](https://github.com/drivendataorg/cloudpathlib/issues/155), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Add abilty to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Add ability to use `.exists` to check for a raw bucket/container (no additional path components). ([Issue #291](https://github.com/drivendataorg/cloudpathlib/issues/291), [PR #302](https://github.com/drivendataorg/cloudpathlib/pull/302))
- Prevent data loss when renaming by skipping files that would be renamed to the same thing. ([Issue #277](https://github.com/drivendataorg/cloudpathlib/issues/277), [PR #278](https://github.com/drivendataorg/cloudpathlib/pull/278))
- Speed up common `glob`/`rglob` patterns. ([Issue #274](https://github.com/drivendataorg/cloudpathlib/issues/274), [PR #276](https://github.com/drivendataorg/cloudpathlib/pull/276))

2 changes: 1 addition & 1 deletion cloudpathlib/client.py
@@ -44,7 +44,7 @@ def __init__(
if isinstance(file_cache_mode, str):
file_cache_mode = FileCacheMode(file_cache_mode)

# if not explcitly passed to client, get from env var
# if not explicitly passed to client, get from env var
if file_cache_mode is None:
file_cache_mode = FileCacheMode.from_environment()

2 changes: 1 addition & 1 deletion cloudpathlib/s3/s3client.py
@@ -322,7 +322,7 @@ def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None:
)

elif file_or_dir == "dir":
# try to delete as a direcotry instead
# try to delete as a directory instead
bucket = self.s3.Bucket(cloud_path.bucket)

prefix = cloud_path.key
2 changes: 1 addition & 1 deletion docs/docs/authentication.md
@@ -125,7 +125,7 @@ As noted above, you can also call `.set_as_default_client()` on the client objec

## Other S3 `ExtraArgs` in `boto3`

The S3 SDK, `boto3` supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instatiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args.
The S3 SDK, `boto3`, supports a set of `ExtraArgs` for uploads, downloads, and listing operations. When you instantiate a client, you can pass the `extra_args` keyword argument with any of those extra args that you want to set. We will pass these on to the upload, download, and list methods insofar as those methods support the specific args.

The args supported for uploads are the same as `boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS`, see the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer) for the latest, but as of the time of writing, these are:

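The `extra_args` passage above says args are forwarded "insofar as those methods support the specific args." A toy sketch of that filtering step — the allowed set here is a made-up subset for illustration; the authoritative lists are `boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS` and `ALLOWED_DOWNLOAD_ARGS`:

```python
# Hypothetical subset of boto3's ALLOWED_UPLOAD_ARGS, for illustration only
ALLOWED_UPLOAD_ARGS = {"ACL", "ServerSideEncryption", "Metadata"}


def filter_extra_args(extra_args: dict, allowed: set) -> dict:
    # Pass on only the extra args that this specific transfer method supports;
    # unsupported keys are silently dropped rather than raising
    return {k: v for k, v in extra_args.items() if k in allowed}
```

Under this sketch, a client configured with both upload-only and download-only args would forward a different subset to each operation.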
8 changes: 4 additions & 4 deletions docs/docs/caching.ipynb
@@ -32,7 +32,7 @@
"\n",
"The cache logic also supports writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed.\n",
"\n",
"**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n",
"**Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:\n",
"\n",
"```python\n",
"with my_cloud_path.open(\"w\") as f:\n",
@@ -269,7 +269,7 @@
"\n",
"However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine.\n",
"\n",
"We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)"
"We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)"
]
},
{
@@ -433,7 +433,7 @@
" - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`.\n",
" - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files.\n",
"\n",
"However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n"
"However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared.\n"
]
},
{
@@ -647,7 +647,7 @@
"source": [
"### File cache mode: `\"persistent\"`\n",
"\n",
"If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n",
"If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `\"persistent\"` automatically. Conversely, if you set the mode to `\"persistent\"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.\n",
"\n",
"Local cache file exists after file is closed for reading.\n",
"\n",
8 changes: 4 additions & 4 deletions docs/docs/script/caching.py
@@ -24,7 +24,7 @@
#
# The cache logic also supports writing to cloud files seamlessly in addition to reading. We do this by tracking when a `CloudPath` is opened and on the close of that file, we will upload the new version to the cloud if it has changed.
#
# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text edior, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:
# **Warning** we don't upload files that weren't opened for write by `cloudpathlib`. For example, if you edit a file in the cache manually in a text editor, `cloudpathlib` won't know to update that file on the cloud. If you want to write to a file in the cloud, you should use the `open` or `write` methods, for example:
#
# ```python
# with my_cloud_path.open("w") as f:
@@ -85,7 +85,7 @@
#
# However, sometimes I don't want to have to re-download files I know won't change. For example, in the LADI dataset, I may want to use the images in a Jupyter notebook and every time I restart the notebook I want to always have the downloaded files. I don't want to ever re-download since I know the LADI images won't be changing on S3. I want these to be there, even if I restart my whole machine.
#
# We can do this just by using a `Client` that does all the downloading/uploading to a specfic folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)
# We can do this just by using a `Client` that does all the downloading/uploading to a specific folder on our local machine. We set the cache folder by passing `local_cache_dir` to the `Client` when instantiating. You can also set a default for all clients by setting the `CLOUDPATHLIB_LOCAL_CACHE_DIR` to a path. (This is only recommended with (1) an absolute path, so you know where the cache is no matter where your code is running, and (2) if you only use the default client for one cloud provider and don't instantiate multiple. In this case, the clients will use the same cache dir and could overwrite each other's content. Setting `CLOUDPATHLIB_LOCAL_CACHE_DIR` to an empty string will be treated as it not being set.)

from cloudpathlib import S3Client

@@ -159,7 +159,7 @@
# - `*Client.clear_cache()` - All files downloaded by this specific client instance will be removed from the cache. If you didn't create a client instance yourself, you can get the one that is used by a cloudpath with `CloudPath.client` or get the default one for a particular provider with `get_default_client`, for example by calling `S3Client.get_default_client().clear_cache()`.
# - By deleting the cached file itself or the containing directory using any normal method. To see where on a disk the cache is, you can use `CloudPath.fspath` for an individual file or use `*Client._local_cache_dir` for the client's cache. You can then use any method you like to delete these local files.
#
# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing beahvior to the most appropriate one for your use case below, you can have the cache automatically cleared.
# However, for most cases, you shouldn't need to manage the file cache manually. By setting the automatic cache clearing behavior to the most appropriate one for your use case below, you can have the cache automatically cleared.
#

#
@@ -280,7 +280,7 @@

# ### File cache mode: `"persistent"`
#
# If `local_cache_dir` is specificed, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.
# If `local_cache_dir` is specified, but `file_cache_mode` is not, then the mode is set to `"persistent"` automatically. Conversely, if you set the mode to `"persistent"` explicitly, you must also pass `local_cache_dir` or the `Client` will raise `InvalidConfigurationException`.
#
# Local cache file exists after file is closed for reading.
#
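The caching docs above state two rules: if `local_cache_dir` is given without `file_cache_mode`, the mode defaults to `"persistent"`, and `"persistent"` without a cache dir raises `InvalidConfigurationException`; separately, an empty `CLOUDPATHLIB_LOCAL_CACHE_DIR` counts as unset. A sketch of that resolution logic in one function — this is an illustrative re-implementation, not cloudpathlib's actual code, and it raises a plain `ValueError` in place of the library's exception:

```python
import os


def resolve_cache_settings(local_cache_dir=None, file_cache_mode=None):
    # An empty CLOUDPATHLIB_LOCAL_CACHE_DIR is treated the same as unset
    if local_cache_dir is None:
        local_cache_dir = os.environ.get("CLOUDPATHLIB_LOCAL_CACHE_DIR") or None

    # A cache dir with no explicit mode defaults to "persistent"
    if file_cache_mode is None and local_cache_dir is not None:
        file_cache_mode = "persistent"

    # "persistent" with no cache dir is an invalid configuration
    if file_cache_mode == "persistent" and local_cache_dir is None:
        raise ValueError('file_cache_mode="persistent" requires local_cache_dir')

    return local_cache_dir, file_cache_mode
```

For example, `resolve_cache_settings(local_cache_dir="/tmp/cache")` yields the `"persistent"` mode automatically, while `resolve_cache_settings(file_cache_mode="persistent")` with no cache dir raises.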
2 changes: 1 addition & 1 deletion tests/mock_clients/mock_adls_gen2.py
@@ -9,7 +9,7 @@

class MockedDataLakeServiceClient:
def __init__(self, test_dir, adls):
# root is parent of the test specific directort
# root is parent of the test specific directory
self.root = test_dir.parent
self.test_dir = test_dir
self.adls = adls
2 changes: 1 addition & 1 deletion tests/mock_clients/mock_s3.py
@@ -12,7 +12,7 @@
TEST_ASSETS = Path(__file__).parent.parent / "assets"
DEFAULT_S3_BUCKET_NAME = "bucket"

# Since we don't contol exactly when the filesystem finishes writing a file
# Since we don't control exactly when the filesystem finishes writing a file
# and the test files are super small, we can end up with race conditions in
# the tests where the updated file is modified before the source file,
# which breaks our caching logic
4 changes: 2 additions & 2 deletions tests/performance/cli.py
@@ -20,7 +20,7 @@
logger.remove()
logger.add(lambda msg: tqdm.write(msg, end=""), colorize=True)

# get environement variables
# get environment variables
load_dotenv(find_dotenv())

# enumerate cloudpathlib implementations
@@ -97,7 +97,7 @@ def _format_row(r):
return table


@cli.command(short_help="Runs peformance test suite against a specific backend and bucket.")
@cli.command(short_help="Runs performance test suite against a specific backend and bucket.")
def run(
backend: CloudEnum,
bucket: Optional[str] = None,
4 changes: 2 additions & 2 deletions tests/test_caching.py
@@ -202,7 +202,7 @@ def test_loc_dir(rig: CloudProviderTestRig, tmpdir):
file_cache_mode=FileCacheMode.persistent, **rig.required_client_kwargs
)

# automatically set to persitent if not specified
# automatically set to persistent if not specified
client = rig.client_class(local_cache_dir=tmpdir, **rig.required_client_kwargs)
assert client.file_cache_mode == FileCacheMode.persistent

@@ -458,7 +458,7 @@ def test_manual_cache_clearing(rig: CloudProviderTestRig):
assert cp._local.exists()
assert cp.client._local_cache_dir.exists()

# clears the file itself, but not the containg folder
# clears the file itself, but not the containing folder
cp.clear_cache()

assert not cp._local.exists()
6 changes: 3 additions & 3 deletions tests/test_cloudpath_file_io.py
@@ -342,9 +342,9 @@ def test_is_dir_is_file(rig, tmp_path):
assert not test_case.is_dir()

# does not exist (same behavior as pathlib.Path that does not exist)
non_existant = rig.create_cloud_path("dir_0/not_a_file")
assert not non_existant.is_file()
assert not non_existant.is_dir()
non_existent = rig.create_cloud_path("dir_0/not_a_file")
assert not non_existent.is_file()
assert not non_existent.is_dir()


def test_file_read_writes(rig, tmp_path):
2 changes: 1 addition & 1 deletion tests/test_cloudpath_manipulation.py
@@ -202,7 +202,7 @@ def test_parser(rig):
with pytest.raises(NotImplementedError):
rig.create_cloud_path("a/b/c").parser
else:
# always posixpath since our dispath goes to PurePosixPath
# always posixpath since our dispatch goes to PurePosixPath
assert rig.create_cloud_path("a/b/c").parser == posixpath


2 changes: 1 addition & 1 deletion tests/test_gs_specific.py
@@ -71,7 +71,7 @@ def _calculate_b64_wrapped_md5_hash(contents: str) -> str:
b64string = b64encode(contents_md5_bytes).decode()
return b64string

# if USE_LIVE_CLOUD this doesnt have any effect
# if USE_LIVE_CLOUD this doesn't have any effect
expected_hash = _calculate_b64_wrapped_md5_hash(contents)
monkeypatch.setenv("MOCK_EXPECTED_MD5_HASH", expected_hash)

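The `_calculate_b64_wrapped_md5_hash` helper shown partially in the hunk above can be completed as a self-contained function; the `md5` line is inferred from the visible `contents_md5_bytes` usage, so treat it as a reconstruction rather than the exact test code:

```python
from base64 import b64encode
from hashlib import md5


def calculate_b64_wrapped_md5_hash(contents: str) -> str:
    # GCS-style checksums are the raw MD5 digest bytes, base64-encoded,
    # so hash the contents and base64-wrap the digest (not the hexdigest)
    contents_md5_bytes = md5(contents.encode()).digest()
    return b64encode(contents_md5_bytes).decode()
```

The test then sets this expected value in `MOCK_EXPECTED_MD5_HASH` so the mock client can report the same hash a live bucket would.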