Description
Describe the bug
The deepdiff library is unable to detect timezone changes in datetimes within arrays.
When `ignore_order=True` and a datetime is present within an array, deepdiff takes an execution path that builds a `DeepHash` of the items. Because of how `datetime_normalize` is implemented, every datetime has its timezone set to UTC:
`helper.py`, line 627 (commit c718369):

```python
obj = obj.replace(tzinfo=datetime.timezone.utc)
```
This behavior causes issues in our use case, as we rely on deepdiff to detect changes in datetime objects, including changes to their timezones.
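The quoted line uses `datetime.replace`, which attaches the new `tzinfo` without converting the clock time. The following is a minimal standard-library sketch of that single step. The function name `replace_with_utc` is hypothetical and is not part of deepdiff. It shows that the step maps a naive datetime and aware datetimes with different offsets onto the same value:

```python
from datetime import datetime, timedelta, timezone


def replace_with_utc(dt: datetime) -> datetime:
    # Hypothetical helper mirroring only the quoted line:
    # overwrite tzinfo with UTC, without converting the wall-clock time.
    return dt.replace(tzinfo=timezone.utc)


naive = datetime(2020, 8, 31, 13, 14, 1)
aware_utc = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
aware_plus2 = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone(timedelta(hours=2)))

# Before normalization the three values are pairwise distinct.
print(naive == aware_utc)        # False: naive never equals aware
print(aware_utc == aware_plus2)  # False: 13:14:01+02:00 is 11:14:01 UTC

# After replace(tzinfo=utc) they collapse to a single value.
normalized = {replace_with_utc(d) for d in (naive, aware_utc, aware_plus2)}
print(len(normalized))  # 1
```

A hash built from the normalized value therefore cannot distinguish these three datetimes.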
To Reproduce
Here's a minimal example demonstrating the issue. The data structure holds datetimes in a list. When only the timezone of the datetimes changes, `deepdiff` does not report a difference:
```python
from datetime import datetime, timezone
from typing import List

from deepdiff import DeepDiff


class Simple:
    def __init__(self, date: datetime):
        self.dates: List[datetime] = [date]

    @staticmethod
    def construct(add_timezone: bool = False) -> "Simple":
        if add_timezone:
            date = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
        else:
            date = datetime(2020, 8, 31, 13, 14, 1)
        return Simple(date=date)


def test_simple():
    old = Simple.construct()
    new = Simple.construct(add_timezone=True)
    diff = DeepDiff(old, new, ignore_order=True)
    assert bool(diff) == True  # <-- This assertion fails


def test_simple_array():
    old = [datetime(2020, 8, 31, 13, 14, 1)]
    new = [datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)]
    diff = DeepDiff(old, new, ignore_order=True)
    assert bool(diff) == True  # <-- This assertion fails
```
Execution path
- diff.py (725): _diff_iterable -> _diff_iterable_with_deephash
- diff.py (1269): _diff_iterable_with_deephash -> _create_hashtable
- diff.py (1085): _create_hashtable -> DeepHash
- deephash.py (216): DeepHash -> (DeepHash)self._hash
- deephash.py (535): (DeepHash)self._hash -> (DeepHash)self._prep_datetime
- deephash.py (478): (DeepHash)self._prep_datetime -> datetime_normalize
- helper.py (627): datetime_normalize
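To see why this path hides the change, here is a deliberately simplified, hypothetical model of an unordered comparison. It is not deepdiff's code. It only mimics the idea of keying each item by a normalized value and then comparing the two key sets. The names `normalize` and `toy_unordered_diff` are illustrative.

```python
from datetime import datetime, timezone


def normalize(obj):
    # Stand-in for the timezone step of datetime_normalize (helper.py, line 627).
    if isinstance(obj, datetime):
        return obj.replace(tzinfo=timezone.utc)
    return obj


def toy_unordered_diff(t1: list, t2: list, use_normalize: bool = True) -> dict:
    # Hypothetical model of an ignore_order comparison: key every item by its
    # (optionally normalized) value, then compare the two key sets.
    key = normalize if use_normalize else (lambda x: x)
    keys1 = {key(x) for x in t1}
    keys2 = {key(x) for x in t2}
    diff = {}
    if keys2 - keys1:
        diff["added"] = list(keys2 - keys1)
    if keys1 - keys2:
        diff["removed"] = list(keys1 - keys2)
    return diff


old = [datetime(2020, 8, 31, 13, 14, 1)]
new = [datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)]

print(toy_unordered_diff(old, new))                             # {}
print(bool(toy_unordered_diff(old, new, use_normalize=False)))  # True
```

With the UTC replacement, both lists produce identical keys and the model reports nothing. Without it, the naive and aware datetimes remain distinct items.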
Expected behavior
I expect deepdiff to detect changes to datetime objects, including differences in their timezones, and report the data structures as different.
Is there a specific reason why datetime_normalize replaces the timezone with UTC? Would it be reasonable to modify or remove this behavior?
Alternatively, could the logic be adjusted so that the timezone replacement is applied only when `truncate_datetime` is explicitly set? For example, the timezone-replacement block could be indented under `if truncate_datetime:`:
```python
import datetime

# time_to_seconds is the existing helper defined alongside this function in helper.py.


def datetime_normalize(truncate_datetime, obj):
    if truncate_datetime:
        if truncate_datetime == 'second':
            obj = obj.replace(microsecond=0)
        elif truncate_datetime == 'minute':
            obj = obj.replace(second=0, microsecond=0)
        elif truncate_datetime == 'hour':
            obj = obj.replace(minute=0, second=0, microsecond=0)
        elif truncate_datetime == 'day':
            obj = obj.replace(hour=0, minute=0, second=0, microsecond=0)
        # Proposed change: this block is now indented under `if truncate_datetime:`
        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        elif isinstance(obj, datetime.time):
            obj = time_to_seconds(obj)
    return obj
```
This approach ensures that timezone replacement occurs only when truncate_datetime is explicitly set.
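The effect of the proposal on the timezone can be checked in isolation. The following is a condensed, hypothetical restatement of only the datetime branch of the proposed function, with the `datetime.time` branch omitted. `proposed_normalize` and `_TRUNCATE` are illustrative names, not deepdiff code. By default, a naive datetime and an aware one stay distinct. When truncation is requested, the timezone is still overwritten, so a timezone-only change would be masked in that mode.

```python
import datetime

# Fields zeroed for each truncation level, as in the proposed function.
_TRUNCATE = {
    "second": dict(microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
}


def proposed_normalize(truncate_datetime, obj):
    # Timezone replacement happens only when truncate_datetime is set.
    if truncate_datetime:
        obj = obj.replace(**_TRUNCATE.get(truncate_datetime, {}))
        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
    return obj


naive = datetime.datetime(2020, 8, 31, 13, 14, 1)
aware = datetime.datetime(2020, 8, 31, 13, 14, 1, tzinfo=datetime.timezone.utc)

print(proposed_normalize(None, naive) == proposed_normalize(None, aware))          # False
print(proposed_normalize("second", naive) == proposed_normalize("second", aware))  # True
```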
A fix implementing this approach has been submitted in PR #517.
OS, DeepDiff version and Python version (please complete the following information):
- Python Version 3.10
- DeepDiff Version 8.0.1 (note: the same logic is present on the main branch)