This dbt package:
- Was adapted by the SageData team from Snowplow Fractribution v0.2.0 to run on Redshift (see the changelog for details).
- Uses page view and conversion events to perform attribution modelling on your Snowplow data.
- Removes the need for a Python script or Docker image to create the final output table.
- Is a BASE package that creates the base tables. Please see the SageData-snowplow-attribution package, which creates follow-up tables specific to your data needs and desired attribution model.
Please refer to the doc site for a full breakdown of the original package, and see the SageData-snowplow-attribution Git repo for the package that creates the subsequent tables and views.
This package supports Redshift.
The original snowplow-fractribution v0.2.0 package currently supports Snowflake, Databricks and BigQuery.
Warehouse | dbt versions | snowplow-fractribution version |
---|---|---|
Snowflake, BigQuery, Databricks | >=1.3.0 to <2.0.0 | 0.2.0 |
Snowflake | >=1.0.0 to <2.0.0 | 0.1.0 |
- A dataset of web events from the Snowplow JavaScript Tracker, and familiarity with the snowplow-web dbt package
- The `snowplow_web_page_views` table generated by the snowplow-web package, available in your warehouse
- A `conversions` dataset available in your warehouse (a dataset containing revenue data by user)
- A `channel spend` dataset available in your warehouse (a dataset containing total spend by channel)
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages.
Please see below, or refer to the doc site, for details on how to configure and run the package. Unlike the original, this adaptation does not require the Python script that creates the final output table.
The package contains multiple models that feed the final attribution calculation:
Model | Description |
---|---|
snowplow_fractribution_channel_counts | A count of sessions per channel, campaign, source and medium |
snowplow_fractribution_channel_spend | The amount spent on advertising for each channel |
snowplow_fractribution_conversions_by_customer_id | Each conversion and associated revenue per customer_id |
snowplow_fractribution_path_summary | For each unique path, a summary of associated conversions, non-conversions and revenue |
snowplow_fractribution_paths_to_conversion | Customer id and the paths the customer has followed that have led to conversion |
snowplow_fractribution_paths_to_non_conversion | Customer id and the paths the customer has followed that have not led to conversion |
snowplow_fractribution_sessions_by_customer_id | Channels per session by customer id |
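Once the models have been built, a quick sanity check is to query the path summary for the highest-revenue paths. This is only a sketch: the schema name and the column names (`transformed_path`, `conversions`, `non_conversions`, `revenue`) are assumptions based on the descriptions above, so adjust them to your warehouse.

```sql
-- Hypothetical sanity check: top converting paths by revenue.
-- Schema and column names are assumptions; adapt to your setup.
select
    transformed_path,
    conversions,
    non_conversions,
    revenue
from derived.snowplow_fractribution_path_summary
order by revenue desc
limit 10;
```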
- Fractribution is dependent on the `snowplow_web_page_views` model created by the snowplow_web dbt package. Run snowplow_web if you do not have data in the `snowplow_web_page_views` table for the period you will run Fractribution over.
- Configure the `conversion_clause` macro to filter your raw Snowplow events to successful conversion events.
- Configure the `conversion_value` macro to return the value of the conversion event (a sketch of both conversion macros follows this list).
- Configure the default `channel_classification` macro to yield your expected channels. The ROAS / attribution calculations will run against these channel definitions.
- Configure the `channel_spend` macro to query your own spend data if you do not wish to use the default values.
- Overwrite the default variables provided by the package in your dbt_project.yml file, if necessary. E.g. make sure your `snowplow__page_views_source` and `snowplow__conversions_source` are aligned to what is available in your warehouse, and update `snowplow__conversion_window_start_date` and `snowplow__conversion_window_end_date` if you don't want the default of the last 30 days (see the example dbt_project.yml after this list).
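As an illustration, here is a minimal sketch of the two conversion macros, assuming conversions are Snowplow transaction events with their revenue in a `tr_total` column. Both the event name and the column are assumptions, and depending on your dispatch configuration the overrides may need to be named `default__conversion_clause` / `default__conversion_value`:

```sql
-- macros/fractribution_overrides.sql (hypothetical example)

{% macro conversion_clause() %}
    -- keep only successful conversion events
    event_name = 'transaction' and tr_total > 0
{% endmacro %}

{% macro conversion_value() %}
    -- the revenue to attribute to each conversion
    tr_total
{% endmacro %}
```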
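And a sketch of the variable overrides in dbt_project.yml. The schema/table names and dates are placeholders, and this assumes the package's variables are scoped under the `snowplow_fractribution` project name:

```yml
# dbt_project.yml (illustrative values only)
vars:
  snowplow_fractribution:
    snowplow__page_views_source: "my_schema.snowplow_web_page_views"
    snowplow__conversions_source: "my_schema.conversions"
    snowplow__conversion_window_start_date: "2023-01-01"
    snowplow__conversion_window_end_date: "2023-01-31"
```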
Running (for detailed instructions check out our Quick Start guide)
- Ensure the setup steps have been completed above.
- Run `dbt run`, or `dbt run --select snowplow_fractribution`.
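If you prefer not to hard-code the conversion window, dbt also lets you pass variables at run time; the dates below are placeholders:

```bash
# Run only this package's models, overriding the conversion window for this run
dbt run --select snowplow_fractribution \
  --vars '{snowplow__conversion_window_start_date: "2023-01-01", snowplow__conversion_window_end_date: "2023-01-31"}'
```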
Some changes from Google's original Fractribution code are noted below.
- Temporary UDFs have been converted to persistent / permanent UDFs
- Some temporary tables have been converted to permanent tables
- Users without a user_id are treated as 'anonymous' ('f') users, and the domain_userid is used to identify these sessions
- Users with a user_id are treated as identified ('u') users
- Templating is now run almost entirely within dbt rather than the custom SQL / Jinja templating in the original Fractribution project
- Channel changes and contributions within a session can be considered using the `snowplow__consider_intrasession_channels` variable.
In Google Analytics (Universal Analytics), a new session is started whenever the campaign source changes (i.e. the referrer of a campaign-tagged URL), and Fractribution makes use of this. Snowplow uses activity-based sessionisation rather than campaign-based sessionisation. Setting `snowplow__consider_intrasession_channels` to `false` takes only the campaign information from the first page view in a given Snowplow session, and does not give credit to other channels in the converting session if they occur after the initial page view.
You can specify a list of channels in the `snowplow__channels_to_exclude` variable to exclude them from the analysis (if the list is kept empty, all channels are kept). For example, you may want to exclude the 'Direct' channel, as in the sketch below.
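A sketch of both variables set together in dbt_project.yml (again assuming the vars are scoped under `snowplow_fractribution`):

```yml
# dbt_project.yml
vars:
  snowplow_fractribution:
    # only credit the channel of the first page view in each session
    snowplow__consider_intrasession_channels: false
    # drop the 'Direct' channel from the analysis
    snowplow__channels_to_exclude: ['Direct']
```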
The snowplow-fractribution package is Copyright 2022 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.