Commit ca9f9ca

Initial release commit

1 parent: d23d026

426 files changed (+11106, -1188 lines changed)

README.md

Lines changed: 155 additions & 1 deletion

@@ -1 +1,155 @@

-# Multi-Cycle, Multi-Touch Revenue and Cost Attribution

# RA Attribution for dbt

This dbt package provides a multi-touch, multi-cycle marketing attribution model. It helps marketers understand how much each online marketing channel contributes to order revenue, and the cost of, and return on investment from, the ad spend that led to those conversions.

## Supported Data Sources and Warehouse Target

The package assumes the following data sources:

* Orders, user registrations, customer LTV measures and currency FX rates come from tables replicated from a custom application database.
* Marketing touchpoints are sourced from Snowplow.
* Ad spend data comes via Fivetran from Google Ads, Facebook Ads and Snapchat Ads.

The high-level data flow diagram below shows how these sources fit together.

![High-level data flow diagram](img/image-20220213-224936.png)

The package uses `dbt_utils` cross-database SQL macros wherever possible. However, it assumes Snowflake as the target data warehouse and has not yet been tested on BigQuery or other dbt-supported warehouses.

Credit is also due to Fivetran for their community-released Google Ads, Facebook Ads and Snapchat Ads dbt packages, whose functionality and code have been incorporated into this package.

### Dependencies

* dbt Core 1.0.1 or higher

* `dbt_utils` 0.8.0 or higher

* `fivetran_utils` 0.3.2 or higher

* Snowflake data warehouse

* Fivetran for Google Ads, Facebook Ads and Snapchat Ads API replication

* Snowplow

* Orders, Order Lines, Users, Currency Rates and Customer LTV table extracts from your custom application database

### How to Run this Package

1. Configure Fivetran to replicate your Facebook Ads, Google Ads and/or Snapchat Ads data into your Snowflake data warehouse.

2. Configure Fivetran to replicate your Orders, Order Details, Users, Currency Conversion Rates and Customer Lifetime Value tables into Snowflake as well. Then map the columns in your incoming tables to the columns expected by the `STG_CUSTOM_EVENTS_ORDER_EVENTS`, `STG_CUSTOM_EVENTS_REGISTRATION_EVENTS`, `STG_CUSTOM_LTV_CUSTOMER_LTV` and `STG_CURRENCY_RATES` dbt models.

    1. Sample data for these models is included as seed files in the package. When the `attribution_demo_mode` configuration variable is set to `true` (the default), the package uses these seed files when you run it.

3. Set the configuration variables in `dbt_project.yml` to point to your Snowflake database and schemas.

4. Run the package using `dbt build`.

### DAG Lineage Graphs

#### Overall Package Running in Demo Mode

![DAG lineage graph for the overall package running in demo mode](img/image-20220214-001921.png)

#### Integration and Warehouse Layers

![DAG lineage graph for the integration and warehouse layers](img/image-20220214-002146.png)

## Supported Attribution Models

| Model Name | Description |
| --- | --- |
| First Click | Attributes 100% of each first order, subsequent order and account opening to the first marketing or non-marketing touchpoint over a 30-day (default) look-back window |
| Last Click | Attributes 100% of each first order, subsequent order and account opening to the last marketing or non-marketing touchpoint over a 30-day (default) look-back window |
| Last Non-Direct Click | Attributes 100% of each first order, subsequent order and account opening to the last marketing touchpoint over a 30-day (default) look-back window |
| Last Paid Click | Attributes 100% of each first order, subsequent order and account opening to the last paid marketing touchpoint over a 30-day (default) look-back window |
| Even Click | Attributes an equal share of each first order, subsequent order and account opening to every touchpoint over a 30-day (default) look-back window |
| Time-Decay | Shares the credit for each first order, subsequent order and account opening across all channels on the conversion path within the time-decay period. A channel receives less credit the further back in time it was interacted with (0.5, 0.25, 0.125 and so on), and each day's credit is shared across all touchpoints on that day. Uses a 30-day (default) look-back window and a 7-day (default) time-decay look-back window |
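
The models in the table above can be illustrated with a short Python sketch. This is not the package's SQL, and it makes several assumptions:

* `Touchpoint`, `attribute()` and the model keys (`first_click`, `time_decay` and so on) are hypothetical names.
* Touchpoints are assumed to be already limited to the look-back window and sorted oldest first.
* The time-decay branch uses one reading of the description: credit halves for each day further back, and each day's share is split evenly among that day's touchpoints. It does not model the 7-day time-decay window.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Touchpoint:
    days_before: int    # whole days between this touchpoint and the conversion
    is_marketing: bool  # False for Direct (non-marketing) touchpoints
    is_paid: bool       # True for paid ad clicks


def attribute(touchpoints, model):
    """Return the share of one conversion credited to each touchpoint.

    Hypothetical helper: `touchpoints` must be sorted oldest first and already
    limited to the look-back window.
    """
    n = len(touchpoints)
    credit = [0.0] * n
    if n == 0:
        return credit

    def last_matching(predicate):
        # Index of the most recent touchpoint satisfying `predicate`, or None.
        for i in range(n - 1, -1, -1):
            if predicate(touchpoints[i]):
                return i
        return None

    if model == "first_click":
        credit[0] = 1.0
    elif model == "last_click":
        credit[-1] = 1.0
    elif model in ("last_non_direct_click", "last_paid_click"):
        if model == "last_non_direct_click":
            qualifies = lambda t: t.is_marketing
        else:
            qualifies = lambda t: t.is_paid
        i = last_matching(qualifies)
        if i is not None:  # if nothing qualifies, this sketch credits no touchpoint
            credit[i] = 1.0
    elif model == "even_click":
        credit = [1.0 / n] * n
    elif model == "time_decay":
        # Assumed reading: a day's weight halves for each day further back
        # (0.5, 0.25, 0.125, ...) and is split evenly across that day's touchpoints.
        per_day = Counter(t.days_before for t in touchpoints)
        raw = [0.5 ** (t.days_before + 1) / per_day[t.days_before] for t in touchpoints]
        total = sum(raw)
        credit = [w / total for w in raw]  # normalise so the shares sum to 1
    else:
        raise ValueError(f"unknown attribution model: {model}")
    return credit


path = [Touchpoint(2, True, True), Touchpoint(1, True, False), Touchpoint(0, False, False)]
print(attribute(path, "last_non_direct_click"))  # [0.0, 1.0, 0.0]
```

Each model returns a list of shares rather than a single winner. Single-touch and multi-touch models can then feed the same revenue-weighting step.
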

## Conversion Measures and Currencies

The attribution model in this package is a multi-cycle, multi-touch revenue attribution model that attributes:

* new account openings

* the count and local/global currency value of first and repeat orders

* customer LTV (spend in the 30, 60, 90, 180 and 365 days since the first order), attributed on first-order conversion

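The package reads these LTV measures from the customer LTV table in the custom application database rather than computing them itself. To illustrate what the 30/60/90/180/365-day columns (`ltv_30d_local_currency` and so on in the sample seed) represent, the sketch below sums each customer's order revenue within N days of their first order. Some details are assumptions:

* `ltv_by_window()` and its tuple input are hypothetical.
* Windows are treated as inclusive of the boundary.

```python
from collections import defaultdict
from datetime import datetime, timedelta

LTV_WINDOWS_DAYS = (30, 60, 90, 180, 365)


def ltv_by_window(orders):
    """orders: iterable of (user_id, order_time, revenue) tuples.

    Returns {user_id: {days: revenue within `days` of the first order}}.
    Hypothetical sketch; each window includes its boundary day.
    """
    by_user = defaultdict(list)
    for user_id, order_time, revenue in orders:
        by_user[user_id].append((order_time, revenue))

    result = {}
    for user_id, user_orders in by_user.items():
        first_order_time = min(t for t, _ in user_orders)
        result[user_id] = {
            days: sum(r for t, r in user_orders if t - first_order_time <= timedelta(days=days))
            for days in LTV_WINDOWS_DAYS
        }
    return result


orders = [
    ("u1", datetime(2021, 9, 1), 100.0),
    ("u1", datetime(2021, 9, 20), 50.0),   # day 19: counted from the 30-day window on
    ("u1", datetime(2021, 11, 15), 25.0),  # day 75: counted from the 90-day window on
]
print(ltv_by_window(orders)["u1"])  # {30: 150.0, 60: 150.0, 90: 175.0, 180: 175.0, 365: 175.0}
```

Each window contains the shorter ones, so the values never decrease from 30 to 365 days. The sample `customer_ltv_ndays.csv` seed rows follow the same pattern.
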

## Account Opening, First and Repeat Order Conversion Cycles

Each conversion type has its own conversion cycle. Account openings and first orders are assumed to occur at most once per customer, while repeat orders can occur zero or more times.

![Account opening, first order and repeat order conversion cycles](img/image-20220208-212618.png)

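First orders occur at most once per customer and repeat orders zero or more times. Each order can therefore be classified by comparing it with the customer's earliest order. Below is a minimal sketch with these assumptions:

* `classify_orders()` and its `(order_id, user_id, order_time)` tuple layout are hypothetical.
* Order lines are assumed to be rolled up to one row per order first.
* Ties on the same timestamp are broken by `order_id`, which may not match what the package does.

```python
from datetime import datetime


def classify_orders(orders):
    """orders: iterable of (order_id, user_id, order_time) tuples, one per order.

    Returns {order_id: "first_order" or "repeat_order"}.
    """
    classification = {}
    seen_users = set()
    # Sort by user, then time (order_id breaks ties) so each user's earliest order comes first.
    for order_id, user_id, order_time in sorted(orders, key=lambda o: (o[1], o[2], o[0])):
        if user_id in seen_users:
            classification[order_id] = "repeat_order"
        else:
            classification[order_id] = "first_order"
            seen_users.add(user_id)
    return classification


orders = [
    ("o2", "u1", datetime(2021, 9, 8, 12, 0)),
    ("o1", "u1", datetime(2021, 9, 1, 5, 13)),
    ("o3", "u2", datetime(2021, 9, 2, 9, 30)),
]
print(classify_orders(orders)["o2"])  # repeat_order
```
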
## Package Configuration Variables

All configuration variables are contained within the `dbt_project.yml` dbt configuration file. The same file holds the configuration options for the included Fivetran Google Ads, Facebook Ads and Snapchat Ads modules.

| Category | Variable | Default | Purpose |
| --- | --- | --- | --- |
| Data Sources | `stg_custom_events_schema` | `CUSTOM_DB_EXTRACT` | Custom event table schema |
| Data Sources | `stg_custom_events_database` | `RA_DATA_WAREHOUSE_DEV` | Custom event table database |
| Data Sources | `stg_custom_ltv_schema` | `CUSTOM_DB_EXTRACT` | Custom LTV table schema |
| Data Sources | `stg_custom_ltv_database` | `RA_DATA_WAREHOUSE_DEV` | Custom LTV table database |
| Data Sources | `stg_custom_currency_schema` | `CUSTOM_DB_EXTRACT` | Custom currency FX table schema |
| Data Sources | `stg_custom_currency_database` | `RA_DATA_WAREHOUSE_DEV` | Custom currency FX table database |
| Enabled Sources | `attribution_warehouse_ad_campaign_sources` | `['facebook_ads','google_ads']` | Ad campaign sources (from `facebook_ads`, `google_ads` and `snapchat_ads`) that are in scope for ad spend attribution |
| Enabled Sources | `attribution_warehouse_ad_group_sources` | `['facebook_ads','google_ads']` | Ad group sources (from `facebook_ads`, `google_ads` and `snapchat_ads`) that are in scope for ad spend attribution |
| Enabled Sources | `attribution_warehouse_ad_sources` | `['facebook_ads','google_ads']` | Ad sources (from `facebook_ads`, `google_ads` and `snapchat_ads`) that are in scope for ad spend attribution |
| Enabled Sources | `attribution_warehouse_click_id_sources` | `['google_ads']` | Sources that provide click IDs for matching to Snowplow clicks |
| Enabled Sources | `attribution_warehouse_currency_rate_sources` | `['custom_currency_rates']` | Source of currency rates |
| Enabled Sources | `attribution_warehouse_event_sources` | `['custom_events_order','custom_events_registration','snowplow_events_all']` | Event sources for revenue attribution |
| Enabled Sources | `attribution_warehouse_ltv_sources` | `['custom_ltv_customer']` | LTV sources for LTV measures |
| Model Parameters | `attribution_create_account_event_type` | `user_registration` | Event name for registration events |
| Model Parameters | `attribution_conversion_event_type` | `confirmed_order` | Event name for order events |
| Model Parameters | `attribution_global_currency` | `USD` | Currency code for global amounts |
| Model Parameters | `attribution_lookback_days_window` | `30` | How many days back a session can be and still be eligible for attribution |
| Model Parameters | `attribution_time_decay_days_window` | `7` | Number of days over which the time-decay model decays the value of conversions |
| Model Parameters | `attribution_include_conversion_session` | `true` | Whether the session containing the conversion event is in scope for attribution |
| Model Parameters | `attribution_match_offline_conversions_to_sessions` | `true` | Whether orders and registrations are matched to Snowplow sessions |
| Model Parameters | `attribution_max_session_hours` | `24` | Maximum length of a session, in hours, for it to be considered for matching |
| Model Parameters | `attribution_demo_mode` | `true` | Set to `true` to source events from the package's sample seed files |
| Measures | `attribution_models` | _see `dbt_project.yml`_ | List of model names to be appended to measures |
| Measures | `attribution_input_measures` | _see `dbt_project.yml`_ | List of attribution input measures |
| Measures | `attribution_output_conversion_measures` | _see `dbt_project.yml`_ | List of attribution output conversion measures |
| Measures | `attribution_output_revenue_measures` | _see `dbt_project.yml`_ | List of attribution output revenue measures |
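
Two variables together decide which sessions can receive credit for a conversion: `attribution_lookback_days_window` and `attribution_include_conversion_session`. The Python sketch below shows that filter under several assumptions, and the package's SQL may draw the boundaries differently:

* `eligible_sessions()` and the `(session_id, session_start)` tuple shape are hypothetical.
* The window is measured back from the conversion timestamp.
* Only sessions that start at or before the conversion qualify.

```python
from datetime import datetime, timedelta


def eligible_sessions(sessions, conversion_time, conversion_session_id,
                      lookback_days=30, include_conversion_session=True):
    """sessions: iterable of (session_id, session_start) tuples for one user.

    Returns the session_ids that may be credited with the conversion, oldest first.
    """
    window_start = conversion_time - timedelta(days=lookback_days)
    eligible = []
    for session_id, session_start in sorted(sessions, key=lambda s: s[1]):
        if not (window_start <= session_start <= conversion_time):
            continue  # outside the look-back window, or after the conversion
        if session_id == conversion_session_id and not include_conversion_session:
            continue  # the session containing the conversion has been excluded
        eligible.append(session_id)
    return eligible


sessions = [("s1", datetime(2021, 8, 1)), ("s2", datetime(2021, 8, 20)), ("s3", datetime(2021, 9, 3))]
conv = datetime(2021, 9, 3, 12, 0)
print(eligible_sessions(sessions, conv, "s3"))  # ['s2', 's3']
print(eligible_sessions(sessions, conv, "s3", include_conversion_session=False))  # ['s2']
```
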

## Matching Orders and User Registrations to Snowplow Sessions

We want to use first and repeat orders, plus user registrations, as the conversion events that we attribute across sessions. To do this, we create our own `confirmed_order` events from the order transactions and `user_registration` events from the customer records in the custom application database extract.

### Matching Orders and User Registration Events to Snowplow Sessions

These `confirmed_order` and `user_registration` events do not have Snowplow `domain_session_id` values. We therefore attempt to match them to existing Snowplow-derived sessions using the following rules:

1. We first aggregate all Snowplow events that have a `domain_session_id` value. This gives us the start timestamp, end timestamp and `domain_session_id` of each Snowplow session.

2. We then extend the end of each Snowplow-derived session by up to 30 minutes, but no later than the start timestamp of the same user's next session.

3. We then attempt to match each order and user registration event to one of the Snowplow-derived sessions. The extra (up to) 30 minutes allows for a user who visits a checkout page but does not complete the checkout until up to 30 minutes later; 30 minutes is the commonly used session inactivity timeout.

    1. The length of each session is calculated in hours. Only sessions of 24 hours or less (by default) are considered for matching orders and account openings. As in step 5, any orders or new accounts that are not matched are assigned their own session.

    2. Two variables in the `dbt_project.yml` file control the matching of orders and new accounts to Snowplow sessions:

        * `attribution_match_offline_conversions_to_sessions` (default: `true`) controls whether new accounts and orders are matched to existing Snowplow sessions. If this variable is set to `false`, no matching takes place at all.

        * `attribution_max_session_hours` (default: `24`) sets the maximum length, in hours, that a session can run for and still be eligible for matching to order or new account events.

4. If we find a match, the order or user registration event takes the `session_id` and `session_type` ("Snowplow") of the matching Snowplow session.

5. If we don't find a match, we use `md5(concat(event_id, user_id))` as the `session_id` for the `confirmed_order` or `user_registration` event, and set its `session_type` to "dbt Generated".
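
The matching rules above can be sketched in Python. This is an illustration, not the package's implementation, and it rests on these assumptions:

* `Session`, `build_sessions()` and `match_session()` are hypothetical names.
* IDs are treated as strings.
* Session length is measured before the 30-minute extension.
* `hashlib.md5` over the concatenated strings is assumed to produce the same hex string as the SQL `md5(concat(event_id, user_id))`.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_EXTENSION = timedelta(minutes=30)  # "up to 30 minutes" added to each session's end


@dataclass
class Session:
    session_id: str
    user_id: str
    start: datetime
    end: datetime


def build_sessions(events):
    """Step 1: aggregate (domain_session_id, user_id, event_time) rows into sessions."""
    sessions = {}
    for session_id, user_id, event_time in events:
        if session_id is None:
            continue  # only events with a domain_session_id form Snowplow sessions
        s = sessions.get(session_id)
        if s is None:
            sessions[session_id] = Session(session_id, user_id, event_time, event_time)
        else:
            s.start = min(s.start, event_time)
            s.end = max(s.end, event_time)
    return sorted(sessions.values(), key=lambda s: (s.user_id, s.start))


def match_session(event_id, user_id, event_time, sessions,
                  match_enabled=True, max_session_hours=24):
    """Steps 2-5: return (session_id, session_type) for an order or registration event."""
    if match_enabled:
        user_sessions = [s for s in sessions if s.user_id == user_id]  # already sorted by start
        for i, s in enumerate(user_sessions):
            if s.end - s.start > timedelta(hours=max_session_hours):
                continue  # too long to be considered for matching
            # Step 2: extend by up to 30 minutes, but not past the user's next session start.
            window_end = s.end + SESSION_EXTENSION
            if i + 1 < len(user_sessions):
                window_end = min(window_end, user_sessions[i + 1].start)
            if s.start <= event_time <= window_end:
                return s.session_id, "Snowplow"
    # Step 5: no match, so derive an id analogous to md5(concat(event_id, user_id)).
    return hashlib.md5((event_id + user_id).encode("utf-8")).hexdigest(), "dbt Generated"


t0 = datetime(2021, 9, 3, 16, 36)
events = [("s1", "u1", t0), ("s1", "u1", t0 + timedelta(minutes=5)),
          ("s2", "u1", t0 + timedelta(minutes=20)), (None, "u1", t0)]
sessions = build_sessions(events)
print(match_session("e1", "u1", t0 + timedelta(minutes=10), sessions))  # ('s1', 'Snowplow')
```

In this example, session `s1` is extended only to the start of `s2` (20 minutes after `t0`), not by the full 30 minutes. An order placed 30 minutes after `t0` would therefore match `s2` instead.
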

## Glossary

| Name | Definition | Marketing Channel Data Source |
| --- | --- | --- |
| Conversion Session | Session in which one or more conversion events happened, for example a first or repeat order, or a new customer registration | UTM Source, Medium, Campaign etc. for the landing page view (first page view in session)<br><br>Note that the attribution model can be configured to include or exclude conversion sessions from the set of sessions to which conversions can be attributed.<br><br>This option is set via the `attribution_include_conversion_session` variable in the `dbt_project.yml` configuration file. It is `true` by default, i.e. conversion sessions are within scope for attribution of conversions.<br><br>Rationale for providing this option is that conversion sessions |
| Marketing Touchpoint Session | Session in which the landing website page or first mobile app interaction can be attributed to the Direct channel | |
| Account Opening | Conversion event, occurring only once over the lifetime of a user, containing the user's registration event; may also contain marketing and non-marketing touchpoints, and a first order | UTM Source, Medium, Campaign etc. for the landing page view (first page view in session), or none if the event did not happen within 30 minutes of a web or mobile app session |
| First Order Conversion, First Order Revenue | Conversion event, occurring only once over the lifetime of a user, containing the user's first confirmed order | UTM Source, Medium, Campaign etc. for the landing page view (first page view in session), or none if the event did not happen within 30 minutes of a web or mobile app session |
| Repeat Order Conversion, Repeat Order Revenue | Conversion event that may occur zero, one or more times over the lifetime of a user, containing one or more confirmed orders for that user other than their first confirmed order | UTM Source, Medium, Campaign etc. for the landing page view (first page view in session), or none if the event did not happen within 30 minutes of a web or mobile app session |

data/currency_rates.csv

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-currency_rate_id,currency_rate_date,base_currency_code,quote_currency_code,currency_rate
+CURRENCY_RATE_ID,CURRENCY_RATE_DATE,BASE_CURRENCY_CODE,QUOTE_CURRENCY_CODE,CURRENCY_RATE
 38701111ca2081f43b98266db55d2300,2021-09-02,GBP,USD,0.78
 5e1027a5d7fc439b80ce38c2dce6633a,2021-09-03,GBP,USD,0.78
 689477416cf1749af582d938a9444560,2021-09-04,GBP,USD,0.78

data/customer_ltv_ndays.csv

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-user_id,local_currency,global_currency,ltv_30d_local_currency,ltv_30d_global_currency,ltv_60d_local_currency,ltv_60d_global_currency,ltv_90d_local_currency,ltv_90d_global_currency,ltv_180d_local_currency,ltv_180d_global_currency,ltv_365d_local_currency,ltv_365d_global_currency
+USER_ID,LOCAL_CURRENCY,GLOBAL_CURRENCY,LTV_30D_LOCAL_CURRENCY,LTV_30D_GLOBAL_CURRENCY,LTV_60D_LOCAL_CURRENCY,LTV_60D_GLOBAL_CURRENCY,LTV_90D_LOCAL_CURRENCY,LTV_90D_GLOBAL_CURRENCY,LTV_180D_LOCAL_CURRENCY,LTV_180D_GLOBAL_CURRENCY,LTV_365D_LOCAL_CURRENCY,LTV_365D_GLOBAL_CURRENCY
 1329690,GBP,USD,239.82,239.82,239.82,239.82,355.13,355.13,544.95,544.95,544.95,544.95
 1329726,GBP,USD,646.45,646.45,646.45,646.45,646.45,646.45,646.45,646.45,646.45,646.45
 1329696,GBP,USD,128.41,128.41,756.11,756.11,1611.61,1611.61,1611.61,1611.61,1611.61,1611.61

data/events.csv

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-event_id,domain_session_id,user_id,event,event_time,page_title,page_url_path,page_url,marketing_term,marketing_content,marketing_medium,marketing_campaign,marketing_source,marketing_click_id,page_url_query,referer_url_host,user_ipaddress,useragent
+EVENT_ID,DOMAIN_SESSION_ID,USER_ID,EVENT,EVENT_TIME,PAGE_TITLE,PAGE_URL_PATH,PAGE_URL,MARKETING_TERM,MARKETING_CONTENT,MARKETING_MEDIUM,MARKETING_CAMPAIGN,MARKETING_SOURCE,MARKETING_CLICK_ID,PAGE_URL_QUERY,REFERER_URL_HOST,USER_IPADDRESS,USERAGENT
 3428cf5e-560a-4ba8-ac05-a0dcf3699010,f1d7b1d5-ae7c-464c-8e0e-b8807d4fd18a,1259696,se,2021-12-03 07:58:52.383,,,,,,,,,,,,192.168.0.1,
 e1ab4720-a9ea-442e-bdf5-1bc9dd46fa49,fd9e2529-1a5d-41c5-873a-810616e65322,1259696,pp,2021-09-03 16:36:12.383,,,,,,,,,,,retailer.com,192.168.0.1,
 f92dad79-0268-4445-8dd8-5d7583c9d1a4,fd9e2529-1a5d-41c5-873a-810616e65322,1259696,pp,2021-09-03 16:37:40.159,,,,,,,,,,,retailer.com,192.168.0.1,

data/orders.csv

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-order_line_id,order_id,user_id,user_created_date,order_date,local_currency,global_currency,revenue_local_currency,revenue_global_currency
+ORDER_LINE_ID,ORDER_ID,USER_ID,USER_CREATED_DATE,ORDER_DATE,LOCAL_CURRENCY,GLOBAL_CURRENCY,REVENUE_LOCAL_CURRENCY,REVENUE_GLOBAL_CURRENCY
 1607746952,2947678,1329682,2021-09-01 04:08:08.739 +0000,2021-09-01 05:13:54.714 +0000,GBP,USD,1.57333333,1.57333333
 1607746833,2947678,1329682,2021-09-01 04:08:08.739 +0000,2021-09-01 05:13:54.714 +0000,GBP,USD,8.32000000,8.32000000
 1607746931,2947678,1329682,2021-09-01 04:08:08.739 +0000,2021-09-01 05:13:54.727 +0000,GBP,USD,12.90666667,12.90666667

Lines changed: 89 additions & 0 deletions

@@ -0,0 +1,89 @@

version: 2

jobs:
  build:
    docker:
      - image: cimg/python:3.9.9
      - image: circleci/postgres:9.6.5-alpine-ram

    steps:
      - checkout

      - run:
          name: setup_creds
          command: |
            echo "$BIGQUERY_SERVICE_ACCOUNT_JSON" > "${HOME}/bigquery-service-key.json"

      - restore_cache:
          key: deps1-{{ .Branch }}

      - run:
          name: "Setup dbt"
          command: |
            python3 -m venv dbt_venv
            . dbt_venv/bin/activate

            python -m pip install --upgrade pip setuptools
            python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery

            mkdir -p ~/.dbt
            cp integration_tests/ci/sample.profiles.yml ~/.dbt/profiles.yml

      - run:
          name: "Run Tests - Postgres"
          environment:
            POSTGRES_TEST_HOST: localhost
            POSTGRES_TEST_USER: root
            POSTGRES_TEST_PASS: ''
            POSTGRES_TEST_PORT: 5432
            POSTGRES_TEST_DBNAME: circle_test
          command: |
            . dbt_venv/bin/activate
            cd integration_tests
            dbt --warn-error deps --target postgres
            dbt --warn-error run-operation create_source_table --target postgres
            dbt --warn-error seed --target postgres --full-refresh
            dbt --warn-error test --target postgres

      - run:
          name: "Run Tests - Redshift"
          command: |
            . dbt_venv/bin/activate
            echo `pwd`
            cd integration_tests
            dbt --warn-error deps --target redshift
            dbt --warn-error run-operation create_source_table --target redshift
            dbt --warn-error seed --target redshift --full-refresh
            dbt --warn-error test --target redshift

      - run:
          name: "Run Tests - Snowflake"
          command: |
            . dbt_venv/bin/activate
            echo `pwd`
            cd integration_tests
            dbt --warn-error deps --target snowflake
            dbt --warn-error run-operation create_source_table --target snowflake
            dbt --warn-error seed --target snowflake --full-refresh
            dbt --warn-error test --target snowflake

      - run:
          name: "Run Tests - BigQuery"
          environment:
            BIGQUERY_SERVICE_KEY_PATH: "/home/circleci/bigquery-service-key.json"
          command: |
            . dbt_venv/bin/activate
            echo `pwd`
            cd integration_tests
            dbt --warn-error deps --target bigquery
            dbt --warn-error run-operation create_source_table --target bigquery
            dbt --warn-error seed --target bigquery --full-refresh
            dbt --warn-error test --target bigquery

      - save_cache:
          key: deps1-{{ .Branch }}
          paths:
            - "dbt_venv"

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
* @clrcrl
