Skip to content

Order date handling is wrong causing order numbers to be non-deterministic #105

@escholer-pyx

Description

@escholer-pyx

The raw order data provides the date and time of every order, but the staging model truncates the time portion using {{ dbt.date_trunc('day','ordered_at') }}. This removes the time of day the order was placed, which is very useful information for downstream analysis. It also causes the orders model to be non-deterministic because it calculates the customers order number as:

row_number() over (
            partition by customer_id
            order by ordered_at asc
        ) as customer_order_number

but if a customer places two orders on the same day, they will have the same ordered_at value (after its truncated to just the date) and the sort order -and therefore the customer_order_number - for customers with multiple orders on the same day are non-deterministic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions