Commit c0d20fa
pjbull and fafnirZ authored
Add md5 hash property to GSPath (#483) (#490)
* Add md5 hash property to GSPath (#483)
* feat: added md5_hash property in GSClient._get_metadata, added md5 property inside GSPath
* docs: update supported methods and properties table in root README
* chore: running make format
* feat: updating code since linter wouldnt let me do it all in one line like the etag property
* chore: rename variable to match the variable naming in GSPath.stat()
* feat: update LocalGSPath to mimic local/implementations/azure.py for md5 property
* feat: updating MockBlob to include md5_hash property
* docs: updated history.md
* chore: reverting to pre PEP604 syntax
* test: added md5 test case, update MockBlob to take its value from an environment variable
* chore: applying format and lint
* Fix tests for md5 hash

---------

Co-authored-by: Jacky Xie <[email protected]>
1 parent 0c8d0c4 commit c0d20fa

7 files changed: +46 -2 lines changed

HISTORY.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## Unreleased

 - Fixed `CloudPath(...) / other` to correctly attempt to fall back on `other`'s `__rtruediv__` implementation, in order to support classes that explicitly support the `/` with a `CloudPath` instance. Previously, this would always raise a `TypeError` if `other` were not a `str` or `PurePosixPath`. (PR [#479](https://github.com/drivendataorg/cloudpathlib/pull/479))
+- Add `md5` property to `GSPath`, updated LocalGSPath to include `md5` property, updated mock_gs.MockBlob to include `md5_hash` property.
 - Fixed an uncaught exception on Azure Gen2 storage accounts with HNS enabled when used with `DefaultAzureCredential`. (Issue [#486](https://github.com/drivendataorg/cloudpathlib/issues/486))

 ## v0.20.0 (2024-10-18)

README.md

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ Most methods and properties from `pathlib.Path` are supported except for the one
 | `bucket` ||||
 | `container` ||||
 | `key` ||||
-| `md5` ||| |
+| `md5` ||| |

 ----

cloudpathlib/gs/gsclient.py

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@ def _get_metadata(self, cloud_path: GSPath) -> Optional[Dict[str, Any]]:
                 "size": blob.size,
                 "updated": blob.updated,
                 "content_type": blob.content_type,
+                "md5_hash": blob.md5_hash,
             }

     def _download_file(self, cloud_path: GSPath, local_path: Union[str, os.PathLike]) -> Path:

cloudpathlib/gs/gspath.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
import os
22
from pathlib import Path
33
from tempfile import TemporaryDirectory
4-
from typing import TYPE_CHECKING
4+
from typing import TYPE_CHECKING, Optional
55

66
from ..cloudpath import CloudPath, NoStatError, register_path_class
77

@@ -95,3 +95,10 @@ def blob(self) -> str:
9595
@property
9696
def etag(self):
9797
return self.client._get_metadata(self).get("etag")
98+
99+
@property
100+
def md5(self) -> Optional[str]:
101+
meta = self.client._get_metadata(self)
102+
if not meta:
103+
return None
104+
return meta.get("md5_hash", None)
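Reviewer note: since the new `md5` property is typed `Optional[str]` and, per the test in this commit, carries the base64-encoded MD5 digest, a caller might use it to check a local copy against the remote object. The following is only a minimal sketch under that assumption; `local_md5_b64` and `matches_remote` are hypothetical helper names, not part of cloudpathlib, and the example deliberately avoids any network call.

```python
import base64
import hashlib
from pathlib import Path
from typing import Optional


def local_md5_b64(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the base64-encoded MD5 digest of a local file (hypothetical helper)."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_remote(path: Path, remote_md5: Optional[str]) -> bool:
    """Compare a local file against a value such as the one `GSPath.md5` returns.

    `remote_md5` may be None because the property is typed Optional[str];
    in that case no comparison is possible and this returns False.
    """
    return remote_md5 is not None and local_md5_b64(path) == remote_md5
```

In use, one would pass `some_gs_path.md5` as `remote_md5`. Returning `False` on `None` is a design choice for this sketch; a caller that needs to distinguish "no hash available" from "hash mismatch" would want a different return type.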

cloudpathlib/local/implementations/gs.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,10 @@ def blob(self) -> str:
5353
def etag(self):
5454
return self.client._md5(self)
5555

56+
@property
57+
def md5(self) -> str:
58+
return self.client._md5(self)
59+
5660

5761
LocalGSPath.__name__ = "GSPath"
5862

tests/mock_clients/mock_gs.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
from datetime import datetime, timedelta
2+
import os
23
from pathlib import Path, PurePosixPath
34
import shutil
45
from tempfile import TemporaryDirectory
@@ -95,6 +96,10 @@ def upload_from_filename(self, filename, content_type=None):
9596
def etag(self):
9697
return "etag"
9798

99+
@property
100+
def md5_hash(self):
101+
return os.environ.get("MOCK_EXPECTED_MD5_HASH", "md5_hash")
102+
98103
@property
99104
def size(self):
100105
path = self.bucket / self.name

tests/test_gs_specific.py

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -49,3 +49,29 @@ def test_as_url(gs_rig):
4949
assert "X-Goog-Date" in query_params
5050
assert "X-Goog-SignedHeaders" in query_params
5151
assert "X-Goog-Signature" in query_params
52+
53+
54+
@pytest.mark.parametrize(
55+
"contents",
56+
[
57+
"hello world",
58+
"another test case",
59+
],
60+
)
61+
def test_md5_property(contents, gs_rig, monkeypatch):
62+
def _calculate_b64_wrapped_md5_hash(contents: str) -> str:
63+
# https://cloud.google.com/storage/docs/json_api/v1/objects
64+
from base64 import b64encode
65+
from hashlib import md5
66+
67+
contents_md5_bytes = md5(contents.encode()).digest()
68+
b64string = b64encode(contents_md5_bytes).decode()
69+
return b64string
70+
71+
# if USE_LIVE_CLOUD this doesnt have any effect
72+
expected_hash = _calculate_b64_wrapped_md5_hash(contents)
73+
monkeypatch.setenv("MOCK_EXPECTED_MD5_HASH", expected_hash)
74+
75+
p: GSPath = gs_rig.create_cloud_path("dir_0/file0_0.txt")
76+
p.write_text(contents)
77+
assert p.md5 == expected_hash
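Reviewer note: the test above builds the expected value by base64-encoding the raw MD5 digest, which differs from the hex string printed by tools such as `md5sum`. For readers who need to compare the two forms, here is a small sketch of the conversion; `b64_md5_to_hex` is a hypothetical helper introduced only for illustration and is not part of this commit.

```python
import base64
import hashlib


def b64_md5_to_hex(b64_digest: str) -> str:
    """Convert a base64-encoded MD5 digest to its hex form (hypothetical helper)."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(b64_digest, validate=True)
    if len(raw) != 16:
        # An MD5 digest is always 16 bytes; anything else is not an MD5 value.
        raise ValueError(f"expected a 16-byte MD5 digest, got {len(raw)} bytes")
    return raw.hex()


if __name__ == "__main__":
    # Round trip: encode a digest the same way the test does, then convert it back to hex.
    digest = hashlib.md5("hello world".encode()).digest()
    b64 = base64.b64encode(digest).decode()
    print(b64_md5_to_hex(b64) == hashlib.md5(b"hello world").hexdigest())
```

The length check is an extra safeguard assumed for this sketch, so that an unrelated base64 string is not silently treated as a hash.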
