DeepDiff Fails to Detect Timezone Changes in Arrays with ignore_order=True #516

@3schwartz

Describe the bug

The deepdiff library is unable to detect timezone changes in datetimes within arrays.

When ignore_order=True and a datetime is present within an array, deepdiff takes an execution path that hashes each item with DeepHash. On that path, datetime_normalize overwrites the tzinfo of every datetime with UTC:

obj = obj.replace(tzinfo=datetime.timezone.utc)

This behavior causes issues in our use case, as we rely on deepdiff to detect changes in datetime objects, including changes to their timezones.
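The effect of that `replace` call can be seen with the standard library alone. The snippet below is a minimal sketch that does not call deepdiff; the variable names are illustrative only. It shows that a naive datetime and two aware datetimes with the same wall-clock fields become indistinguishable once `tzinfo` is overwritten with UTC, which is presumably why their hashes collide.

```python
from datetime import datetime, timedelta, timezone

# Same wall-clock fields, three different timezone states.
naive = datetime(2020, 8, 31, 13, 14, 1)
aware_utc = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
aware_plus2 = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone(timedelta(hours=2)))

# Before normalization the three values are all distinct.
assert naive != aware_utc
assert aware_utc != aware_plus2

# replace() swaps tzinfo without converting the time, so the wall-clock
# fields stay the same and the original timezone information is lost.
normalized = [d.replace(tzinfo=timezone.utc) for d in (naive, aware_utc, aware_plus2)]

# After normalization all three compare equal and hash identically.
all_equal = normalized[0] == normalized[1] == normalized[2]
distinct_hashes = len({hash(d) for d in normalized})
```

Note that `replace(tzinfo=...)` differs from `astimezone(...)`: the latter converts the instant to the target zone, so `aware_plus2` would remain distinct from `aware_utc`.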

To Reproduce

Here’s a minimal example demonstrating the issue, in two variants: an object holding datetimes within an array, and a plain array of datetimes. In both, a timezone is added to the datetime, and deepdiff does not recognize the change:

from datetime import datetime, timezone
from typing import List

from deepdiff import DeepDiff


class Simple:
    def __init__(self, date: datetime):
        self.dates: List[datetime] = [date]

    @staticmethod
    def construct(add_timezone: bool = False) -> "Simple":
        if add_timezone:
            date = datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)
        else:
            date = datetime(2020, 8, 31, 13, 14, 1)
        return Simple(date=date)

def test_simple():
    old = Simple.construct()
    new = Simple.construct(add_timezone=True)

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff)  # <-- This assertion fails

def test_simple_array():
    old = [datetime(2020, 8, 31, 13, 14, 1)]
    new = [datetime(2020, 8, 31, 13, 14, 1, tzinfo=timezone.utc)]

    diff = DeepDiff(old, new, ignore_order=True)

    assert bool(diff)  # <-- This assertion fails

Execution path

  • diff.py (725): _diff_iterable -> _diff_iterable_with_deephash
  • diff.py (1269): _diff_iterable_with_deephash -> _create_hashtable
  • diff.py (1085): _create_hashtable -> DeepHash
  • deephash.py (216): DeepHash -> (DeepHash)self._hash
  • deephash.py (535): (DeepHash)self._hash -> (DeepHash)self._prep_datetime
  • deephash.py (478): (DeepHash)self._prep_datetime -> datetime_normalize
  • helper.py (627): datetime_normalize

Expected behavior

I expect deepdiff to detect changes to datetime objects, including differences in their timezones, and report the data structures as different.

Is there a specific reason why datetime_normalize replaces the timezone with UTC? Would it be reasonable to modify or remove this behavior?

Alternatively, could the logic be adjusted to apply the timezone replacement only when truncate_datetime is explicitly set? For example, the type checks could be indented under the `if truncate_datetime:` branch:

def datetime_normalize(truncate_datetime, obj):
    if truncate_datetime:
        if truncate_datetime == 'second':
            obj = obj.replace(microsecond=0)
        elif truncate_datetime == 'minute':
            obj = obj.replace(second=0, microsecond=0)
        elif truncate_datetime == 'hour':
            obj = obj.replace(minute=0, second=0, microsecond=0)
        elif truncate_datetime == 'day':
            obj = obj.replace(hour=0, minute=0, second=0, microsecond=0)

        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        elif isinstance(obj, datetime.time):
            obj = time_to_seconds(obj)
    return obj

This approach ensures that timezone replacement occurs only when truncate_datetime is explicitly set.
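To illustrate the intended effect, here is a self-contained sketch of the conditional logic using only the standard library. `normalize_sketch` is a hypothetical stand-in for `datetime_normalize`, not the library's code: it handles only two truncation levels and omits the `datetime.time` branch, because `time_to_seconds` is a deepdiff helper.

```python
import datetime


def normalize_sketch(truncate_datetime, obj):
    """Hypothetical stand-in for datetime_normalize (datetime branch only)."""
    if truncate_datetime:
        if truncate_datetime == 'second':
            obj = obj.replace(microsecond=0)
        elif truncate_datetime == 'minute':
            obj = obj.replace(second=0, microsecond=0)
        # The timezone is overwritten only when truncation was requested.
        if isinstance(obj, datetime.datetime):
            obj = obj.replace(tzinfo=datetime.timezone.utc)
    return obj


naive = datetime.datetime(2020, 8, 31, 13, 14, 1)
aware = datetime.datetime(2020, 8, 31, 13, 14, 1, tzinfo=datetime.timezone.utc)

# Without truncation the objects pass through untouched and stay distinct.
distinct_without_truncation = normalize_sketch(None, naive) != normalize_sketch(None, aware)

# With truncation the timezone is still normalized, so they collapse.
equal_with_truncation = normalize_sketch('second', naive) == normalize_sketch('second', aware)
```

One consequence of this design is that callers who set truncate_datetime still lose timezone differences; only the default path changes.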

PR #517 implements the approach proposed above.

OS, DeepDiff version and Python version (please complete the following information):

  • Python Version 3.10
  • DeepDiff Version 8.0.1 (Note: the same logic is present on the main branch)
