Commit 1200550

pjbull and ljyanesm authored
Refactor GS authentication to use default credentials (#514) (#526)
* Refactor GS authentication to use default credentials (#514)

  * Refactor authentication to use default credentials for Google Cloud Storage client

    This enables federated identity
    Updates HISTORY.md

  * Keeps same functionality for API whilst enhancing the env var alternative
  * Simplifies credential handling logic, and updates docstring
  * Updates HISTORY.md

  ---------

  Co-authored-by: ljyanesm <[email protected]>

* bonus: test fixes and docs
* Mock default auth
* fix history

---------

Co-authored-by: Luis Yanes <[email protected]>
Co-authored-by: ljyanesm <[email protected]>
1 parent b21cac1 commit 1200550

6 files changed

+231
-43
lines changed

CONTRIBUTING.md

Lines changed: 143 additions & 0 deletions
@@ -110,6 +110,149 @@ When the tests finish, if it is using a live server, the test files will be dele
 
 If you want to speed up your testing during development, you may comment out some of the rigs in [`conftest.py`](tests/conftest.py). Don't commit this change, and make sure you run against all the rigs before submitting a PR.
 
+### Test Fixtures and Rigs
+
+The test suite uses a comprehensive set of fixtures and test rigs to ensure consistent behavior across all cloud providers. Here's a detailed overview of the available fixtures and their properties:
+
+#### Core Fixtures
+
+**`assets_dir`** - Path to the test assets directory containing sample files and directories used across all tests.
+
+**`live_server`** - Boolean indicating whether to use live cloud servers (controlled by `USE_LIVE_CLOUD=1` environment variable).
+
+**`wait_for_mkdir`** - Fixture that patches `os.mkdir` to wait for directory creation, useful for tests that are sometimes flaky due to filesystem timing.
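
As a rough illustration of what a fixture like `wait_for_mkdir` does, the sketch below wraps `os.mkdir` so the call returns only once the directory is visible on disk. It is not the fixture from `conftest.py`: the names `patched_mkdir_waits`, `demo_mkdir_wait`, `timeout`, and `poll_interval` are assumptions made for this example, and it uses `unittest.mock.patch` instead of a pytest fixture so that it runs on its own.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from unittest import mock


@contextmanager
def patched_mkdir_waits(timeout=2.0, poll_interval=0.01):
    """Hypothetical helper: patch os.mkdir so it waits until the directory exists."""
    original_mkdir = os.mkdir  # keep a reference to the real function

    def mkdir_and_wait(path, *args, **kwargs):
        original_mkdir(path, *args, **kwargs)
        deadline = time.monotonic() + timeout
        # Poll until the new directory is visible or the timeout expires.
        while not os.path.isdir(path) and time.monotonic() < deadline:
            time.sleep(poll_interval)

    with mock.patch("os.mkdir", mkdir_and_wait):
        yield


def demo_mkdir_wait():
    """Create a directory under the patch and report whether it exists."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "new_dir")
        with patched_mkdir_waits():
            os.mkdir(target)
        return os.path.isdir(target)
```

The patch is scoped to the `with` block, so `os.mkdir` is restored afterwards; a pytest fixture could probably get the same scoping from `monkeypatch`.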
+
+#### Cloud Provider Test Rigs
+
+The `CloudProviderTestRig` class is the foundation for all cloud provider testing. Each rig provides:
+
+- **`path_class`**: The CloudPath subclass for the provider (e.g., `S3Path`, `AzureBlobPath`)
+- **`client_class`**: The Client subclass for the provider (e.g., `S3Client`, `AzureBlobClient`)
+- **`drive`**: The bucket/container name for the provider
+- **`test_dir`**: Unique test directory name generated from session UUID, module name, and function name
+- **`live_server`**: Whether the rig uses live cloud servers
+- **`required_client_kwargs`**: Additional client configuration parameters
+- **`cloud_prefix`**: The cloud prefix for the provider (e.g., `s3://`, `az://`)
+
+Each rig provides a `create_cloud_path(path, client=None)` method that constructs cloud paths with the proper prefix and test directory structure.
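
To make the path construction concrete, here is a minimal sketch that builds the URI string a rig might hand to its path class. `SketchRig` and `create_cloud_path_str` are hypothetical names, and the layout (prefix, then drive, then test directory, then the relative path) is an assumption inferred from the description above, not a copy of `CloudProviderTestRig`.

```python
from dataclasses import dataclass


@dataclass
class SketchRig:
    """Hypothetical stand-in for a test rig; holds only what path building needs."""

    cloud_prefix: str  # e.g. "s3://"
    drive: str  # bucket or container name
    test_dir: str  # unique per-test directory

    def create_cloud_path_str(self, path: str) -> str:
        """Join prefix, drive, test directory, and the relative path."""
        return f"{self.cloud_prefix}{self.drive}/{self.test_dir}/{path}"


rig = SketchRig(cloud_prefix="s3://", drive="bucket", test_dir="abc123-test_mod-test_fn")
print(rig.create_cloud_path_str("dir_0/file0_0.txt"))
# s3://bucket/abc123-test_mod-test_fn/dir_0/file0_0.txt
```

Keeping the test directory inside every path is what would let tests from different modules and functions share one bucket without colliding.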
+
+#### Available Test Rigs
+
+**Azure Rigs:**
+- **`azure_rig`**: Tests Azure Blob Storage with mocked or live backend
+- **`azure_gen2_rig`**: Tests Azure Data Lake Storage Gen2 with mocked or live backend
+  - Has `is_adls_gen2 = True` flag for tests that need to skip ADLS Gen2 specific behavior
+  - Uses `AZURE_STORAGE_CONNECTION_STRING` or `AZURE_STORAGE_GEN2_CONNECTION_STRING` environment variables
+
+**Google Cloud Storage:**
+- **`gs_rig`**: Tests Google Cloud Storage with mocked or live backend
+  - Uses `LIVE_GS_BUCKET` environment variable for live testing
+
+**Amazon S3:**
+- **`s3_rig`**: Tests AWS S3 with mocked or live backend
+  - Uses `LIVE_S3_BUCKET` environment variable for live testing
+- **`custom_s3_rig`**: Tests S3-compatible services (MinIO, Ceph, etc.)
+  - Has `is_custom_s3 = True` flag for tests that need to skip AWS-specific behavior
+  - Uses `CUSTOM_S3_BUCKET`, `CUSTOM_S3_ENDPOINT`, `CUSTOM_S3_KEY_ID`, `CUSTOM_S3_SECRET_KEY` environment variables
+
+**Local Storage Rigs:**
+- **`local_azure_rig`**: Tests Azure Blob Storage using local filesystem simulation
+- **`local_gs_rig`**: Tests Google Cloud Storage using local filesystem simulation
+- **`local_s3_rig`**: Tests S3 using local filesystem simulation
+
+**HTTP/HTTPS Rigs:**
+- **`http_rig`**: Tests HTTP endpoints with local test server
+- **`https_rig`**: Tests HTTPS endpoints with local test server and self-signed certificates
+  - Both use `HttpProviderTestRig` subclass with additional HTTP-specific functionality
+
+#### Fixture Unions
+
+The test suite uses `pytest-cases` fixture unions to run tests against multiple providers:
+
+- **`rig`**: Runs tests against all cloud providers (Azure Blob, Azure ADLS Gen2, GCS, S3, Custom S3, Local Azure, Local S3, Local GCS, HTTP, HTTPS)
+- **`azure_rigs`**: Runs tests against both Azure Blob and Azure ADLS Gen2
+- **`s3_like_rig`**: Runs tests against AWS S3 and Custom S3 (for S3-compatible services)
+- **`http_like_rig`**: Runs tests against HTTP and HTTPS endpoints
+
+#### HTTP Server Fixtures
+
+**`http_server`** and **`https_server`** (from `tests/http_fixtures.py`):
+- Start local HTTP/HTTPS test servers with custom request handlers
+- Support PUT, DELETE, POST, GET, and HEAD methods for comprehensive testing
+- Use self-signed certificates for HTTPS testing
+- Automatically clean up server directories after tests
+
+#### Mock Clients
+
+Located in `tests/mock_clients/`, these provide local filesystem-based implementations of cloud SDKs:
+
+- **`mock_azureblob.py`**: Mock Azure Blob Storage client
+- **`mock_adls_gen2.py`**: Mock Azure Data Lake Storage Gen2 client
+- **`mock_gs.py`**: Mock Google Cloud Storage client
+- **`mock_s3.py`**: Mock AWS S3 client
+- **`utils.py`**: Utility functions for mock clients (e.g., `delete_empty_parents_up_to_root`)
+
+#### Test Assets
+
+Located in `tests/assets/`, the test assets provide a consistent set of files and directories:
+
+```
+tests/assets/
+├── dir_0/
+│   ├── file0_0.txt
+│   ├── file0_1.txt
+│   └── file0_2.txt
+└── dir_1/
+    ├── file_1_0.txt
+    └── dir_1_0/
+        └── file_1_0_0.txt
+```
+
+These assets are automatically copied to each test rig's directory and provide a predictable file structure for testing file operations, directory traversal, and other functionality.
+
+#### Utility Fixtures
+
+**`utilities_dir`**: Path to test utilities directory containing SSL certificates for HTTPS testing.
+
+**`_sync_filesystem()`**: Utility function that forces filesystem synchronization to stabilize tests, especially important on Windows where `os.sync()` is not available.
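
One way such a helper could behave is sketched below: call `os.sync()` where the platform provides it, and otherwise pause briefly so pending writes can settle. The name `sync_filesystem_sketch`, the `fallback_sleep` parameter, and the sleep-based fallback are all assumptions for illustration; the actual `_sync_filesystem()` in the test suite may work differently.

```python
import os
import time


def sync_filesystem_sketch(fallback_sleep: float = 0.05) -> str:
    """Flush pending writes where possible; otherwise pause briefly.

    Returns the strategy that was used, so callers can log it.
    """
    if hasattr(os, "sync"):
        os.sync()  # available on Unix-like systems
        return "os.sync"
    # Assumption: a short pause stands in for a sync on platforms without os.sync.
    time.sleep(fallback_sleep)
    return "sleep"
```

Checking with `hasattr` rather than inspecting the platform name keeps the sketch correct on any system that lacks `os.sync`, not only Windows.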
+
+#### Environment Variables for Live Testing
+
+When `USE_LIVE_CLOUD=1` is set, the following environment variables control live cloud testing:
+
+- **Azure**: `AZURE_STORAGE_CONNECTION_STRING`, `AZURE_STORAGE_GEN2_CONNECTION_STRING`, `LIVE_AZURE_CONTAINER`
+- **Google Cloud**: `LIVE_GS_BUCKET` (requires Google Cloud credentials)
+- **AWS S3**: `LIVE_S3_BUCKET` (requires AWS credentials)
+- **Custom S3**: `CUSTOM_S3_BUCKET`, `CUSTOM_S3_ENDPOINT`, `CUSTOM_S3_KEY_ID`, `CUSTOM_S3_SECRET_KEY`
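
Before a live run it can help to check which of these variables are still unset. The sketch below does that with the variable names listed above; the helper `missing_live_vars`, the `REQUIRED_LIVE_VARS` mapping, and the way variables are grouped per provider are assumptions for this example and are not part of the test suite.

```python
import os

# Variable names come from the list above; the grouping per provider is an assumption.
REQUIRED_LIVE_VARS = {
    "azure": ["AZURE_STORAGE_CONNECTION_STRING", "LIVE_AZURE_CONTAINER"],
    "azure_gen2": ["AZURE_STORAGE_GEN2_CONNECTION_STRING", "LIVE_AZURE_CONTAINER"],
    "gs": ["LIVE_GS_BUCKET"],
    "s3": ["LIVE_S3_BUCKET"],
    "custom_s3": [
        "CUSTOM_S3_BUCKET",
        "CUSTOM_S3_ENDPOINT",
        "CUSTOM_S3_KEY_ID",
        "CUSTOM_S3_SECRET_KEY",
    ],
}


def missing_live_vars(provider, environ=None):
    """Return the variables still unset for a live run against `provider`."""
    environ = os.environ if environ is None else environ
    if environ.get("USE_LIVE_CLOUD") != "1":
        return []  # live testing is off, so nothing is required
    return [name for name in REQUIRED_LIVE_VARS[provider] if not environ.get(name)]


print(missing_live_vars("gs", {"USE_LIVE_CLOUD": "1"}))
# ['LIVE_GS_BUCKET']
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching the real process environment.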
+
+#### Using Test Rigs in Your Tests
+
+When writing tests, use the rig's `create_cloud_path()` method to create cloud paths:
+
+```python
+def test_file_operations(rig):
+    # Create a path to an existing file in the test assets
+    cp = rig.create_cloud_path("dir_0/file0_0.txt")
+
+    # Create a path to a non-existent file
+    cp2 = rig.create_cloud_path("path/that/does/not/exist.txt")
+
+    # Get a client instance
+    client = rig.client_class()
+```
+
+For provider-specific tests, you can check rig properties:
+
+```python
+def test_azure_specific_feature(azure_rig):
+    if azure_rig.is_adls_gen2:
+        # Skip or test ADLS Gen2 specific behavior
+        pass
+    else:
+        # Test Azure Blob Storage specific behavior
+        pass
+```
+
 ### Authoring tests
 
 We want our test suite coverage to be comprehensive, so PRs need to add tests if they add new functionality. If you are adding a new feature, you will need to add tests for it. If you are changing an existing feature, you will need to update the tests to match the new behavior.

HISTORY.md

Lines changed: 5 additions & 1 deletion
@@ -1,9 +1,13 @@
 # cloudpathlib Changelog
 
+## UNRELEASED
+
+- Fixed issue with GS credentials, using default auth enables a wider set of authentication methods in GS (Issue [#390](https://github.com/drivendataorg/cloudpathlib/issues/390), PR [#514](https://github.com/drivendataorg/cloudpathlib/pull/514), thanks @ljyanesm)
+- Added support for http(s) urls with `HttpClient`, `HttpPath`, `HttpsClient`, and `HttpsPath`. (Issue [#455](https://github.com/drivendataorg/cloudpathlib/issues/455), PR [#468](https://github.com/drivendataorg/cloudpathlib/pull/468))
+
 ## v0.21.1 (2025-05-14)
 
 - Fixed `rmtree` fail on Azure with no `hns` and more than 256 blobs to drop (Issue [#509](https://github.com/drivendataorg/cloudpathlib/issues/509), PR [#508](https://github.com/drivendataorg/cloudpathlib/pull/508), thanks @alikefia)
-- Added support for http(s) urls with `HttpClient`, `HttpPath`, `HttpsClient`, and `HttpsPath`. (Issue [#455](https://github.com/drivendataorg/cloudpathlib/issues/455 ), PR [#468](https://github.com/drivendataorg/cloudpathlib/pull/468))
 
 ## v0.21.0 (2025-03-03)
 

cloudpathlib/gs/gsclient.py

Lines changed: 16 additions & 13 deletions
@@ -15,6 +15,7 @@
 from google.auth.credentials import Credentials
 from google.api_core.retry import Retry
 
+from google.auth import default as google_default_auth
 from google.auth.exceptions import DefaultCredentialsError
 from google.cloud.storage import Client as StorageClient
 
@@ -51,18 +52,14 @@ def __init__(
     ):
         """Class constructor. Sets up a [`Storage
         Client`](https://googleapis.dev/python/storage/latest/client.html).
-        Supports the following authentication methods of `Storage Client`.
+        Supports, in this order, the following authentication methods of `Storage Client`.
 
-        - Environment variable `"GOOGLE_APPLICATION_CREDENTIALS"` containing a
-          path to a JSON credentials file for a Google service account. See
-          [Authenticating as a Service
-          Account](https://cloud.google.com/docs/authentication/production).
-        - File path to a JSON credentials file for a Google service account.
-        - OAuth2 Credentials object and a project name.
         - Instantiated and already authenticated `Storage Client`.
+        - OAuth2 Credentials object and a project name.
+        - File path to a JSON credentials file for a Google service account.
+        - Google Cloud SDK default credentials. See [How Application Default Credentials works](https://cloud.google.com/docs/authentication/application-default-credentials)
 
-        If multiple methods are used, priority order is reverse of list above
-        (later in list takes priority). If no authentication methods are used,
+        If no authentication methods are used,
         then the client will be instantiated as anonymous, which will only have
         access to public buckets.
 
@@ -91,18 +88,24 @@ def __init__(
             timeout (Optional[float]): Cloud Storage [timeout value](https://cloud.google.com/python/docs/reference/storage/1.39.0/retry_timeout)
             retry (Optional[google.api_core.retry.Retry]): Cloud Storage [retry configuration](https://cloud.google.com/python/docs/reference/storage/1.39.0/retry_timeout#configuring-retries)
         """
-        if application_credentials is None:
-            application_credentials = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
-
+        # don't check `GOOGLE_APPLICATION_CREDENTIALS` since `google_default_auth` already does that
+        # use explicit client
         if storage_client is not None:
             self.client = storage_client
+        # use explicit credentials
         elif credentials is not None:
             self.client = StorageClient(credentials=credentials, project=project)
+        # use explicit credential file
         elif application_credentials is not None:
             self.client = StorageClient.from_service_account_json(application_credentials)
+        # use default credentials based on SDK precedence
         else:
             try:
-                self.client = StorageClient()
+                # use `google_default_auth` instead of `StorageClient()` since it
+                # handles precedence of creds in different locations properly
+                credentials, default_project = google_default_auth()
+                project = project or default_project  # use explicit project if present
+                self.client = StorageClient(credentials=credentials, project=project)
             except DefaultCredentialsError:
                 self.client = StorageClient.create_anonymous_client()
 
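
The precedence introduced by this change can be summarised with a small, dependency-free sketch. It mirrors the branch order in the diff above but does not create any Google client: `choose_auth_method`, `DefaultCredentialsMissing`, and the injected `default_auth` callable are hypothetical stand-ins (for the constructor logic, `DefaultCredentialsError`, and `google.auth.default`, respectively), used so the example runs without the Google libraries installed.

```python
class DefaultCredentialsMissing(Exception):
    """Hypothetical stand-in for google.auth.exceptions.DefaultCredentialsError."""


def choose_auth_method(
    default_auth,
    storage_client=None,
    credentials=None,
    application_credentials=None,
    project=None,
):
    """Return (method, project) following the branch order shown in the diff.

    `default_auth` is an injected callable assumed to behave like
    `google.auth.default()`: it returns `(credentials, project_id)` or raises.
    """
    if storage_client is not None:
        return ("explicit client", None)
    if credentials is not None:
        return ("explicit credentials", project)
    if application_credentials is not None:
        return ("service account file", None)
    try:
        _creds, default_project = default_auth()
    except DefaultCredentialsMissing:
        return ("anonymous", None)
    # An explicitly passed project wins over the one found with the defaults.
    return ("default credentials", project or default_project)


def fake_default_auth():
    """Pretend default credentials were found for project 'adc-project'."""
    return (object(), "adc-project")


print(choose_auth_method(fake_default_auth, project="mine"))
# ('default credentials', 'mine')
```

Injecting the lookup as a callable is a design choice of this sketch only; it makes the fallback to anonymous access easy to exercise by passing a function that raises.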
