This dbt package:
- Uses data from modified snowplow-fractribution-redshift package to do First touch, Last touch attribution, as well as define all intermediary touchpoints so you can apply whatever attribution model you like.
- This package also uses base tables created by snowplow_web
Please refer to the SageData doc site for SageData documentation. Please refer to the Snowplow doc site for a Snowplow related documentation and full breakdown of the fractribution package.
The snowplow-attribution-redshift package currently supports only Redshift.
- A dataset of web events from the Snowplow Javascript Tracker and familiarity with the snowplow-web dbt package
- The
snowplow_web_page_views
table generated by the snowplow-web package available in your warehouse - The
snowplow_web_sessions
table generated by the snowplow-web package available in your warehouse - The
snowplow_web_users
table generated by the snowplow-web package available in your warehouse - The
snowplow_web_user_mapping
table generated by the snowplow-web package available in your warehouse - A
conversions
dataset available in your warehouse (a dataset containing revenue data by users). - A
channel spend
dataset available in your warehouse (a dataset containing total spend by channel) - A
snowplow_fractribution_conversions_by_customer_id
dataset created by snowplow-fractribution package
Check dbt Hub for the latest installation instructions, or read the dbt docs for more information on installing packages.
Please see below or refer to the doc site for details on how to configure and run the original Snowplow package.
The package contains multiple models that are created from the base snowplow-fractribution-redshift script
Model | Description |
---|---|
sagedata_conversions_w_touchpoints | Table containing all of the conversions with all the First and Last touchpoints that have led to those conversions |
sagedata_channel_spend_day | The amount spent on advertising for each channel by day |
sagedata_page_views_to_conversion | Table that shows all the pageviews that have happened prior to the conversionwithin the conversion window |
snowplow_web_page_views_w_channel | A view on top of snowplow_web_page_views, but with added Channel Classification from macro channel_classification |
snowplow_web_sessions_w_channel | A view on top of snowplow_web_sessions, but with added Channel Classification from macro channel_classification |
snowplow_web_users_w_channel | A view on top of snowplow_web_users, but with added Channel Classification from macro channel_classification |
- Fractribution is dependent on the snowplow_web_page_views model created by the snowplow_web dbt package. Run snowplow_web if you do not have data in the snowplow_web_page_views table for the period of time you will run fractribution for.
- Configure the
conversion_clause
macro to filter your raw Snowplow events to successful conversion events. - Configure the
conversion_value
macro to return the value of the conversion event. - Configure the default
channel_classification
macro to yield your expected channels. The ROAS calculations / attribution calculations will run against these channel definitions. - Configure the channel_spend macro to query your own spend data if you do not wish to use the default values.
- Overwrite default variables provided by the package in your dbt_project.yml file, if necessary. E.g.: make sure your
snowplow__page_views_source
andsnowplow__conversions_source
are aligned to what is available in your warehouse, and updatesnowplow__conversion_window_start_date
andsnowplow__conversion_window_end_date
if you don't want the default of the last 30 days.
Running (for detailed instructions check out our Quick Start guide)
- Ensure the setup steps have been completed above.
- Run
dbt run
, ordbt run --select sagedata_snowplow_attribution
.
There are some changes from Google's Fractribution code that have been noted below.
- Temporary UDFs have been converted to persistent / permanent UDFs
- Some temporary tables converted to permanent tables
- Users without a user_id are treated as 'anonymous' ('f') users and the domain_userid is used to identify these sessions
- Users with a user_id are treated as identified ('u') users
- Templating is now run almost entirely within dbt rather than the custom SQL / Jinja templating in the original Fractribution project
- Channel changes and contributions within a session can be considered using the
snowplow__consider_intrasession_channels
variable.
In Google Analytics (Universal Analytics) a new session is started if a campaign source changes (referrer of campaign tagged URL) which is used in Fractribution. Snowplow utilises activity based sessionisation rather than campaign based sessionisation. Setting snowplow__consider_intrasession_channels
to false
will take only the campaign information from the first page view in a given Snowplow session and not give credit to other channels in the converting session if they occur after the initial page view.
You can specify a list of channels for the variable snowplow__channels_to_exclude
to exclude them from analysis (if kept empty all channels are kept). For example, users may want to exclude the 'Direct' channel from the analysis.
The snowplow-fractribution package is Copyright 2022 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.