Merged
Changes from 3 commits
9 changes: 9 additions & 0 deletions pandas/tests/io/test_common.py
@@ -674,3 +674,12 @@ def test_pickle_reader(reader):
    # GH 22265
    with BytesIO() as buffer:
        pickle.dump(reader, buffer)


@td.skip_if_no("pyarrow")
def test_pyarrow_read_csv_datetime_dtype():
    data = "date,id\n20/12/2025,a\n,b\n31/12/2020,c"
    df = pd.read_csv(
        StringIO(data), parse_dates=["date"], dayfirst=True, dtype_backend="pyarrow"
    )
    assert (df["date"].dtype) == "datetime64[s]"
Member

Can you build an expected DataFrame and use tm.assert_frame_equal?

Contributor Author

I'm struggling a bit with the dtype casting. I tried two methods:

    # put dtype string[pyarrow] on the Series
    expected = pd.DataFrame(
        {
            "date": pd.Series(
                pd.to_datetime(["20/12/2025", pd.NaT, "31/12/2020"], dayfirst=True),
            ),
            "id": pd.Series(["a", "b", "c"], dtype="string[pyarrow]"),
        },
    )
    
    ###############
    
    # cast dtype using .astype()
    expected["id"] = expected["id"].astype("string[pyarrow]")

This returns the following error:

E       AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="id") are different
E
E       Attribute "dtype" are different
E       [left]:  StringDtype(storage=pyarrow, na_value=<NA>)
E       [right]: string[pyarrow]

As a band-aid fix, I also tried casting the same column in df to string[pyarrow]:

@td.skip_if_no("pyarrow")
def test_pyarrow_read_csv_datetime_dtype():
    data = "date,id\n20/12/2025,a\n,b\n31/12/2020,c"
    df = pd.read_csv(
        StringIO(data), parse_dates=["date"], dayfirst=True, dtype_backend="pyarrow"
    )
    expected = pd.DataFrame(
        {
            "date": pd.Series(
                pd.to_datetime(["20/12/2025", pd.NaT, "31/12/2020"], dayfirst=True),
            ),
            "id": pd.Series(["a", "b", "c"], dtype="string[pyarrow]"),
        },
    )
    expected["id"] = expected["id"].astype("string[pyarrow]")
    df["id"] = df["id"].astype("string[pyarrow]")

    assert tm.assert_frame_equal(expected, df)
    assert (df["date"].dtype) == "datetime64[s]"

But for some reason, pytest returns:

>       assert tm.assert_frame_equal(expected, df)
E       AssertionError

It's hard to tell what exactly the error is, since the output isn't verbose.

Member

Based on the simplified bug report, I don't think we need the string column, only the "date" column.

Contributor Author

@mroeschke - Tried this:

@td.skip_if_no("pyarrow")
def test_pyarrow_read_csv_datetime_dtype():
    # GH 59904
    data = '"date"\n"20/12/2025"\n""\n"31/12/2020"'
    result = pd.read_csv(
        StringIO(data), parse_dates=["date"], dayfirst=True, dtype_backend="pyarrow"
    )
    expected_dict = {
        "date": pd.Series(
            pd.to_datetime(["20/12/2025", pd.NaT, "31/12/2020"], dayfirst=True)
        )
    }
    expected = pd.DataFrame(expected_dict)

    assert (result["date"].dtype) == "datetime64[s]"
    assert tm.assert_frame_equal(expected, result)

It still raises an AssertionError:

>       assert tm.assert_frame_equal(expected, result)
E       AssertionError

pandas/tests/io/test_common.py:696: AssertionError

Contributor Author

@KevsterAmp KevsterAmp Nov 14, 2024

Finally saw the problem lol
tm.assert_frame_equal should be called without assert: it returns None when the frames match, so wrapping it in assert always fails. That's why it was showing a bare AssertionError 😆

    assert tm.assert_frame_equal(expected, result)  # raises AssertionError

    tm.assert_frame_equal(expected, result)  # passes

Fixed it now, and the test is passing.
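
The pitfall found in this thread can be reproduced without pandas. The sketch below uses a hypothetical helper, `check_equal`, which is not part of pandas. It only mimics the contract of an assert-style function: raise on mismatch, return None on success.

```python
def check_equal(left, right):
    """Hypothetical assert-style helper: raise on mismatch, return None otherwise."""
    if left != right:
        raise AssertionError(f"{left!r} != {right!r}")


# Correct usage: call the helper directly. It raises only on a mismatch.
check_equal([1, 2, 3], [1, 2, 3])

# Incorrect usage: the helper returns None on success, and None is falsy,
# so the outer `assert` fails even though the values are equal.
try:
    assert check_equal([1, 2, 3], [1, 2, 3])
except AssertionError:
    wrapped_failed = True
else:
    wrapped_failed = False
```

Because the outer assert fails on the None return value rather than inside the helper, no dtype or value diff is reported, which matches the unhelpful output seen earlier in the thread.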