comparison dataframes in assertions.py #65

@t3rn0

DISCLAIMER
I came here without deep knowledge of data-detective. The following question is based purely on the contents of the assertions.py file
(https://github.com/Tinkoff/data-detective/blob/master/data-detective-airflow/data_detective_airflow/test_utilities/assertions.py)

The function assert_frame_equal compares two dataframes. How? By computing a scalar value for each dataframe with to_bytes and comparing the two values. This method has some limitations:

import pandas as pd

# to_bytes is the helper that assert_frame_equal relies on in assertions.py
a = pd.DataFrame([1, 1, 2, 4])
b = pd.DataFrame([3, 2, 4, 3])
c = pd.DataFrame([2, 4])
d = pd.DataFrame([1, [1, 2], 4])
e = pd.DataFrame([{"a": 2}, {"a": 4}])

assert to_bytes(a) == to_bytes(b) == to_bytes(c) == to_bytes(d) == to_bytes(e)  # passes

You probably only compare dataframes of the same size and type, so dataframes C, D, and E are unlikely to meet in a single test. But dataframes A and B comparing as equal makes one wonder.
Have you considered such cases?
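
For contrast, here is a minimal sketch of an element-wise check, assuming pandas' own `pandas.testing.assert_frame_equal` is an acceptable baseline. The helper name `frames_differ` is hypothetical and is not part of data-detective. The sketch only shows that A and B are reported as different when values are compared directly.

```python
import pandas as pd
from pandas.testing import assert_frame_equal as pd_assert_frame_equal

a = pd.DataFrame([1, 1, 2, 4])
b = pd.DataFrame([3, 2, 4, 3])


def frames_differ(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if pandas' element-wise check finds a difference."""
    try:
        pd_assert_frame_equal(left, right)
    except AssertionError:
        return True
    return False


a_vs_b = frames_differ(a, b)            # same shape, different values
a_vs_copy = frames_differ(a, a.copy())  # identical content
```

With this kind of check, `a_vs_b` is `True` and `a_vs_copy` is `False`, which is the behaviour I would expect from a function named assert_frame_equal.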
