warntimestamp in convert too expensive #559

@visr

Description

#172 added a warning. I think it is too expensive, and it may be better to just document the behavior instead.

I am reading a 7,760,500-row table written by pandas, which defaults to nanosecond resolution, and want to convert the timestamps to DateTime; I don't need sub-millisecond precision. The conversion worked out of the box but took 75 seconds. Profiling showed that almost all of the time was spent in warntimestamp generating the log message; without the log message it takes 0.05 seconds.
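
For context, a minimal sketch of the default read path that hits the warning; the file name "data.arrow" and the column name time are hypothetical:

using Arrow, Dates

# Hypothetical file written by pandas with nanosecond-resolution timestamps.
tbl = Arrow.Table("data.arrow")  # convert = true is the default
times = collect(tbl.time)        # materializes DateTime values; each element goes
                                 # through convert(DateTime, ts), and with it warntimestamp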

The following benchmark shows the difference:

using Chairmarks, Arrow, Dates

# alternative to convert that doesn't have warntimestamp
function to_datetime(x::Arrow.Timestamp{U, nothing})::DateTime where {U}
    x_since_epoch = Arrow.periodtype(U)(x.x)    # raw tick count as a Period, e.g. Nanosecond
    ms_since_epoch = Dates.toms(x_since_epoch)  # milliseconds since the Unix epoch
    ut_instant = Dates.UTM(ms_since_epoch + Arrow.UNIX_EPOCH_DATETIME)  # shift to DateTime's internal epoch
    return DateTime(ut_instant)
end

const ts = Arrow.Timestamp{Arrow.Flatbuf.TimeUnit.NANOSECOND, nothing}(1764288000000000000)
@b convert(DateTime, ts)  # 6.525 μs (119 allocs: 6.719 KiB)
@b to_datetime(ts)  # 1.332 ns

I now avoid this by passing convert = false and applying the to_datetime function above, but I think more people will run into this performance pitfall.
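
For anyone hitting the same pitfall, this is roughly how the workaround looks (again with a hypothetical file and column name):

using Arrow, Dates

tbl = Arrow.Table("data.arrow"; convert = false)  # keep the raw Arrow.Timestamp values
times = to_datetime.(tbl.time)                    # fast elementwise conversion, no log message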
