Generating a hash for an empty file always returns None. This is undocumented, differs from the behaviour of the usual hashing algorithms, and contradicts the SPDX standard.
Example:

    from commoncode.hash import sha1
    from hashlib import sha1 as sha1_hashlib
    from tempfile import NamedTemporaryFile

    with NamedTemporaryFile() as temporary_file:
        temporary_file.write(b'')
        temporary_file.seek(0)
        # commoncode: prints None for the empty file
        print(sha1(location=temporary_file.name))
        # hashlib: prints the SHA-1 hex digest of the empty input
        # (the usedforsecurity keyword requires Python 3.9+)
        print(sha1_hashlib(temporary_file.read(), usedforsecurity=False).hexdigest())

The reason seems to be that https://github.com/aboutcode-org/commoncode/blob/878be6140deac30e2b95fb0fad9eb8feca015fc8/src/commoncode/hash.py#L38 does not check msg is not None, but effectively bool(msg), which is False for empty input as well as for None.
Replacing the line with

    self.h = msg is not None and hmodule(msg).digest()[:self.digest_size] or None

(as well as replacing the same pattern in sha1_git_hasher) seems to fix this issue.
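The difference between the two checks can be reproduced without commoncode. The following is only a minimal sketch using hashlib directly; the helper names digest_truthy and digest_explicit are hypothetical and are not part of commoncode, and the first helper merely imitates the truthiness pattern described above rather than copying the library's code.

```python
import hashlib


def digest_truthy(msg=None):
    # Hypothetical helper imitating the truthiness pattern described above:
    # empty bytes are falsy, so the `and` short-circuits and None is returned.
    return msg and hashlib.sha1(msg).digest() or None


def digest_explicit(msg=None):
    # Hypothetical helper with an explicit None check:
    # only a missing message yields None, empty input is still hashed.
    return hashlib.sha1(msg).digest() if msg is not None else None


print(digest_truthy(b""))          # None
print(digest_explicit(b"").hex())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(digest_explicit(None))       # None
```

For non-empty input both helpers return the same digest, so under these assumptions only the empty-input case changes.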