
Commit 214fd12

Merge pull request #3851 from segmentio/reverse-etl
Reverse ETL
2 parents 9ecfaca + 488e019 commit 214fd12

File tree

7 files changed: +264 −6 lines changed

src/_data/sidenav/main.yml

Lines changed: 9 additions & 1 deletion
```diff
@@ -241,6 +241,14 @@ sections:
       title: Rate Limits
     - path: /connections/regional-segment
       title: Regional Segment
+  - section_title: Reverse ETL
+    section:
+      - path: /reverse-etl
+        title: Reverse ETL Overview
+      - path: /reverse-etl/bigquery-setup
+        title: BigQuery Reverse ETL Setup
+      - path: /reverse-etl/snowflake-setup
+        title: Snowflake Reverse ETL Setup
   - section_title: Profiles
     section:
     - path: /profiles
@@ -397,7 +405,7 @@ sections:
   - path: /privacy/account-deletion
     title: Account & Data Deletion
   - path: /privacy/hipaa-eligible-segment
-    title: HIPAA Eligible Segment 
+    title: HIPAA Eligible Segment
   - path: /privacy/faq
     title: Privacy FAQs
 - section_title: Protocols
```

src/_includes/icons/reverse-etl.svg

Lines changed: 1 addition & 0 deletions

src/reverse-etl/bigquery-setup.md

Lines changed: 40 additions & 0 deletions
---
title: BigQuery Reverse ETL Setup
---

Set up BigQuery as your Reverse ETL source. You can also choose to [set up Snowflake](/docs/reverse-etl/snowflake-setup/) as your source.

> warning ""
> You need to be an account admin to set up the Segment BigQuery connector, and you need write permissions for the `__segment_reverse_etl` dataset.

To set up the Segment BigQuery connector:
1. Navigate to **IAM & Admin > Service Accounts** in BigQuery.
2. Click **+ Create Service Account** to create a new service account.
3. Enter your **Service account name** and a description of what the account will do.
4. Click **Create and Continue**.
5. In the **Grant this service account access to project** section, select the *BigQuery User* role to add.
6. Click **+ Add another role** and add the *BigQuery Job User* role.
7. Click **Continue**.
8. Click **Done**.
9. Search for the service account you just created.
10. When your service account pulls up, click the 3 dots under **Actions** and select **Manage keys**.
11. Click **Add Key > Create new key**.
12. In the pop-up window, select **JSON** for the key type and click **Create**. The file downloads.
13. Copy all the content within the file you just created and downloaded.
14. Navigate to the Segment UI and paste the credentials you copied in step 13 into the **Enter your credentials** section.
15. Enter your **Data Location**.
16. Click **Test Connection** to see if the connection works. If the connection fails, make sure you have the right permissions and credentials and try again.
17. Click **Create Source** if the test connection is successful.

Once you've added BigQuery as a source, you can [add a model](/docs/reverse-etl/reverse-etl/#step-2-add-a-model).
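The JSON key downloaded in step 12 must be pasted in whole. As a hedged, illustrative sketch (not a Segment tool), you can sanity-check the file before pasting it; the required field names below are standard for Google Cloud service-account keys:

```python
import json

# Fields present in every Google Cloud service-account JSON key.
REQUIRED_FIELDS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "token_uri",
}

def check_service_account_key(raw: str) -> list:
    """Return a list of problems found in a service-account key JSON string."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    return problems
```

An empty list means the key looks structurally complete, for example `check_service_account_key(open("key.json").read())`.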

## Constructing your own role or policy

When you construct your own role or policy, Segment needs the following permissions:

Permission | Details
---------- | --------
`bigquery.datasets.create` | Allows Segment to create and manage a `__segment_reverse_etl` dataset for tracking state between syncs.
`bigquery.datasets.get` | Allows Segment to determine whether the aforementioned dataset exists.
`bigquery.jobs.create` | Allows Segment to execute queries on any datasets and tables your model query references, and to manage the tables Segment uses for tracking.

The `bigquery.datasets.*` permissions can be scoped to only the `__segment_reverse_etl` dataset. If you don't wish to grant `bigquery.datasets.create` access, you can create this dataset yourself, but Segment still needs `bigquery.datasets.get` access.
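As a hedged sketch (not an official Segment or Google tool), you can audit a custom role by comparing the permissions it grants, for example as listed by `gcloud iam roles describe`, against the required set above:

```python
# Permissions Segment Reverse ETL needs, per the table above.
SEGMENT_REQUIRED = {
    "bigquery.datasets.create",  # optional if you pre-create the dataset
    "bigquery.datasets.get",
    "bigquery.jobs.create",
}

def missing_permissions(granted, dataset_precreated=False):
    """Return required permissions absent from an iterable of granted ones."""
    required = set(SEGMENT_REQUIRED)
    if dataset_precreated:
        # The docs allow omitting datasets.create when you create
        # __segment_reverse_etl yourself.
        required.discard("bigquery.datasets.create")
    return sorted(required - set(granted))
```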
src/reverse-etl/images/RETL_Doc_Illustration.png

Binary file added (1.6 MB)

src/reverse-etl/index.md

Lines changed: 149 additions & 0 deletions
---
title: Reverse ETL
beta: true
---

> info ""
> Reverse ETL is in beta and Segment is actively working on this feature. Segment's [First-Access and Beta terms](https://segment.com/legal/first-access-beta-preview/) govern this feature. If you'd like to learn more, reach out to your CSM, AE, or SE.

Reverse ETL (Extract, Transform, Load) extracts data from a data warehouse and loads it into a third-party destination. Reverse ETL allows you to connect your data warehouse to the tools that Marketing, Sales, Support, Product, Analytics, and other business teams use. For example, with Reverse ETL you can sync rows from Snowflake to Salesforce. Reverse ETL supports event and object data, including customer profile data, subscriptions, product tables, shopping cart tables, and more.

As Segment is actively developing this feature, Segment welcomes your feedback on your experience with Reverse ETL. Click the button below to submit your feedback.

{% include components/button-fill.html modifier="expand" text="Submit feedback" href="https://airtable.com/shriQgvkRpBCDN955" %}

## Example use cases
Use Reverse ETL when you want to:
* Sync lead scores created in the warehouse to Salesforce to customize interactions with prospects and optimize sales opportunities.
* Sync audiences and other data built in the warehouse to Braze, Iterable, Hubspot, or Salesforce Marketing Cloud for personalized marketing campaigns.
* Connect Google Sheets to a view in the warehouse so other business teams have access to up-to-date reports.
* Sync enriched data to Mixpanel for a more complete view.
* Send data in the warehouse back into Segment as events that can be activated in all supported destinations, including Twilio Engage and other platforms.
* Pass offline or enriched data to conversion APIs like Facebook, Google Ads, TikTok, or Snapchat.

## Getting started
There are four components to Reverse ETL: Sources, Models, Destinations, and Mappings.

![Reverse ETL overview image](images/RETL_Doc_Illustration.png)

Follow these four steps to set up Reverse ETL and learn what each component is about:
1. [Add a Source](#step-1-add-a-source)
2. [Add a Model](#step-2-add-a-model)
3. [Add a Destination](#step-3-add-a-destination)
4. [Create Mappings](#step-4-create-mappings)

### Step 1: Add a Source
A Source is where your data originates from. Traditionally in Segment, a [Source](/docs/connections/sources/#what-is-a-source) is a website, server library, mobile SDK, or cloud application that can send data into Segment. In Reverse ETL, your data warehouse is the Source.

> info ""
> Reverse ETL supports BigQuery and Snowflake as sources, and Segment is actively working on adding more. If you'd like to request Segment to add a particular source, note it on the [feedback form](https://airtable.com/shriQgvkRpBCDN955){:target="_blank"}.

> warning ""
> You need to be a user that has both read and write access to the warehouse.

To add your warehouse as a source:

1. Navigate to **Reverse ETL** in the Segment app.
2. Click **Add Source**.
3. Select the source you want to add. You can choose between **BigQuery** and **Snowflake**.
   * If you choose to use Snowflake, run the queries listed in the [Snowflake Reverse ETL setup guide](/docs/reverse-etl/snowflake-setup/) to set up the Segment Snowflake connector. Segment recommends using the `ACCOUNTADMIN` role to execute all the commands.
   * If you choose to use BigQuery, use the permissions outlined in the [BigQuery Reverse ETL setup guide](/docs/reverse-etl/bigquery-setup/) to create a service account and generate JSON credentials, which you then copy into the Segment UI when creating a Reverse ETL source.
4. Add the account information for your source.
   * For Snowflake users: Learn more about the Snowflake Account ID [here](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html){:target="_blank"}.
5. Click **Test Connection** to see if the connection works.
6. Click **Create Source** if the test connection is successful.

After you add your data warehouse as a source, you can [add a model](#step-2-add-a-model) to your source.

### Step 2: Add a Model
Models are SQL queries that define sets of data you want to synchronize to your Reverse ETL destinations. After you add your source, you can add a model.

To add your first model:
1. Navigate to **Reverse ETL > Sources**. Select your source and click **Add Model**.
2. Click **SQL Editor** as your modeling method. (Segment will add more modeling methods in the future.)
3. Enter the SQL query that defines your model. Your model is used to map data to your Reverse ETL destinations.
4. Choose a column to use as the unique identifier for each row in the **Unique Identifier column** field.
   * The unique identifier should be a column with unique values per row to ensure checkpointing works as expected. It can potentially be a primary key. This column is used to detect new, updated, and deleted rows.
5. Click **Preview** to see a preview of the results of your SQL query. The preview data is extracted from the first 10 rows of your warehouse.
6. Click **Next**.
7. Enter your **Model Name**.
8. Select the schedule type for the times you want the model's data to be extracted from your warehouse. You can choose from:
   * **Interval**: Extractions run on a selected time cycle.
   * **Day and time**: Extractions run at specific times on selected days of the week.
9. Select how often you want the schedule to sync in **Schedule configuration**.
   * For an **Interval** schedule type, you can choose from: 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 12 hours, or 1 day.
     * 15 minutes is considered real-time for warehouse syncs.
   * For a **Day and time** schedule type, you can choose the day(s) and time you'd like the schedule to sync. You can only choose to sync the extraction at the top of the hour.
10. Click **Create Model**.

To add multiple models to your source, repeat steps 1-10 above.
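Conceptually, the unique identifier lets a sync engine diff successive query results to find new, updated, and deleted rows. A simplified sketch of that idea (Segment's actual checkpointing is internal; this only illustrates why the column must hold unique values):

```python
def diff_rows(previous, current, id_column="id"):
    """Classify rows as added/updated/deleted between two query snapshots.

    Each snapshot is a list of dicts. Rows are matched on the unique
    identifier column, so duplicate IDs would make the result ambiguous.
    """
    prev = {row[id_column]: row for row in previous}
    curr = {row[id_column]: row for row in current}
    added = [curr[k] for k in curr.keys() - prev.keys()]
    deleted = [prev[k] for k in prev.keys() - curr.keys()]
    updated = [curr[k] for k in curr.keys() & prev.keys() if curr[k] != prev[k]]
    return {"added": added, "updated": updated, "deleted": deleted}
```

For example, comparing `[{"id": 1}, {"id": 2, "score": 20}]` against `[{"id": 2, "score": 25}, {"id": 3}]` reports id 3 as added, id 2 as updated, and id 1 as deleted.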

### Step 3: Add a Destination
Once you've added a model, you need to add a destination. In Reverse ETL, destinations are the business tools or apps you use that Segment syncs the data from your warehouse to.

> info ""
> Depending on the destination, you may need to know certain endpoints and have specific credentials to configure the destination.
>
> If you'd like to request Segment to add a particular destination, please note it on the [feedback form](https://airtable.com/shriQgvkRpBCDN955){:target="_blank"}.

To add your first destination:
1. Navigate to **Reverse ETL > Destinations**.
2. Click **Add Destination**.
3. Select the destination you want to connect to.
4. Select the source you want to connect the destination to.
5. Enter the **Destination name** and click **Create Destination**.
6. Enter the required information on the **Settings** tab of the destination.

### Step 4: Create Mappings
After you've added a destination, you can create mappings from your warehouse to the destination. Mappings enable you to map the data you extract from your warehouse to the fields in your destination.

To create a mapping:
1. Go to the **Mappings** tab of the destination and click **Add Mapping**.
2. Select the model to sync from.
3. Select the **Action** you want to sync and click **Next**.
   * Actions determine the information sent to the destination. The list of Actions is unique to each destination.
4. In the **Select record to map and send** section, select which records to send to your destination after Segment completes extracting data based on your model. You can choose from:
   * Added records
   * Updated records
   * Added or updated records
   * Deleted records
5. Select a test record to preview the fields that you can map to your destination in the **Add test record** field.
6. Define how to map the record columns from your model to your destination in the **Select Mappings** section.
   * You map the fields that come from your source to fields that the destination expects to find. Fields on the destination side depend on the type of action selected.
7. Click **Create Mapping**.
8. Select the destination you'd like to enable on the **My Destinations** page under **Reverse ETL > Destinations**.
9. Turn on the **Mapping State** toggle to enable the destination. Events that match the trigger condition in the mapping are sent to the destination.
   * If you disable the mapping state, events that match the trigger condition in the mapping aren't sent to the destination.

To add multiple mappings from your warehouse to your destination, repeat steps 1-9 above.
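The column-to-field mapping in step 6 amounts to renaming model columns into the schema the chosen action expects. A minimal sketch of the concept; the field names here are hypothetical, not any real destination's action schema:

```python
def apply_mapping(record, mapping):
    """Rename a model row's columns to destination field names.

    `mapping` is {destination_field: source_column}. Columns the mapping
    doesn't mention are dropped, mirroring how unmapped fields aren't sent.
    """
    return {dest: record[src] for dest, src in mapping.items() if src in record}

# Hypothetical mapping for an "update contact" style action.
mapping = {"email": "email_address", "lead_score": "score"}
row = {"email_address": "jane@example.com", "score": 87, "internal_id": 42}
print(apply_mapping(row, mapping))
```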

## Using Reverse ETL
After you've followed [all four steps](/docs/reverse-etl/reverse-etl/#getting-started) and set up your source, model, destination, and mappings for Reverse ETL, your data extracts and syncs to your destination(s) right away if you chose an interval schedule. If you set your data to extract at a specific day and time, the extraction takes place then.

### Run status and observability
You can check the status of your data extractions and see details of your syncs. You can click into failed records to view additional details on the error, sample payloads to help you debug the issue, and recommended actions.

To check the status of your extractions:
1. Navigate to **Reverse ETL > Destinations**.
2. Select the destination you want to view.
3. Select the mapping you want to view.
4. Click the sync you want to view to get details of the sync. You can view:
   * The status of the sync
   * How long it took for the sync to complete
   * The load results: how many records were successfully synced, and how many records were updated, deleted, or new

### Edit your model

To edit your model:
1. Navigate to **Reverse ETL > Sources**.
2. Select the source with the model you want to edit.
3. On the overview tab, click **Edit** to edit your query.
4. Click the **Settings** tab to edit the model name or change the schedule settings.

### Edit your mapping

To edit your mapping:
1. Navigate to **Reverse ETL > Destinations**.
2. Select the destination with the mapping you want to edit.
3. Select the **...** three dots and click **Edit mapping**. If you want to delete your mapping, select **Delete**.

src/reverse-etl/snowflake-setup.md

Lines changed: 60 additions & 0 deletions
---
title: Snowflake Reverse ETL Setup
beta: true
---

Set up Snowflake as your Reverse ETL source. You can also choose to [set up BigQuery](/docs/reverse-etl/bigquery-setup/) as your source.

At a high level, when you set up Snowflake for Reverse ETL, the configured user/role needs read permissions for any resources (databases, schemas, tables) the query needs to access. Segment keeps track of changes to your query results with a managed schema (`__SEGMENT_REVERSE_ETL`), which requires the configured user to have write permissions for that schema.

## Set up guide
Follow the instructions below to set up the Segment Snowflake connector. Segment recommends you use the `ACCOUNTADMIN` role to execute all the commands below.

1. Log in to your Snowflake account.
2. Navigate to *Worksheets*.
3. Enter and run the code below to create a database.
   Segment uses the database specified in your connection settings to create a schema called `__segment_reverse_etl` to avoid collision with your data. The schema is used for tracking changes to your model query results between syncs.
   You can reuse an existing database if desired. Segment recommends you use the same database across all the models attached to this source to keep all the state tracking tables in one place.

   ```sql
   -- not required if another database is being reused
   CREATE DATABASE segment_reverse_etl;
   ```
4. Enter and run the code below to create a virtual warehouse.
   Segment Reverse ETL needs to execute queries on your Snowflake account, which requires a virtual warehouse to handle the compute. You can also reuse an existing warehouse.

   ```sql
   -- not required if reusing another warehouse
   CREATE WAREHOUSE segment_reverse_etl
     WITH WAREHOUSE_SIZE = 'XSMALL'
       WAREHOUSE_TYPE = 'STANDARD'
       AUTO_SUSPEND = 600 -- 10 minutes (AUTO_SUSPEND is specified in seconds)
       AUTO_RESUME = TRUE;
   ```
5. Enter and run the code below to create specific roles for Reverse ETL.
   All Snowflake access is specified through roles, which are then assigned to the user you'll create later.

   ```sql
   -- create role
   CREATE ROLE segment_reverse_etl;

   -- warehouse access
   GRANT USAGE ON WAREHOUSE segment_reverse_etl TO ROLE segment_reverse_etl;

   -- database access
   GRANT USAGE ON DATABASE segment_reverse_etl TO ROLE segment_reverse_etl;
   GRANT CREATE SCHEMA ON DATABASE segment_reverse_etl TO ROLE segment_reverse_etl;
   ```
6. Enter and run the code below to create the username and password combination that will be used to execute queries. Make sure to replace `my_strong_password` with your own password.

   ```sql
   -- create user
   CREATE USER segment_reverse_etl_user
     MUST_CHANGE_PASSWORD = FALSE
     DEFAULT_ROLE = segment_reverse_etl
     PASSWORD = 'my_strong_password'; -- Do not use this example password

   -- role access
   GRANT ROLE segment_reverse_etl TO USER segment_reverse_etl_user;
   ```
7. Follow the steps listed in the [Add a Source](/docs/reverse-etl/reverse-etl/#step-1-add-a-source) section to finish adding Snowflake as a source.
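Before adding the source in Segment, you can optionally verify the new user from your own machine with the `snowflake-connector-python` package. A hedged sketch: the parameter names match that connector's documented `connect()` arguments, the identifiers come from the script above (unquoted Snowflake identifiers resolve case-insensitively), and the account identifier and password are placeholders you must replace:

```python
def segment_reverse_etl_connection_params(account: str, password: str) -> dict:
    """Assemble connection parameters matching the setup script above."""
    return {
        "account": account,  # your account identifier, e.g. "myorg-myaccount"
        "user": "segment_reverse_etl_user",
        "password": password,
        "role": "segment_reverse_etl",
        "warehouse": "segment_reverse_etl",
        "database": "segment_reverse_etl",
    }

# Uncomment to test (requires `pip install snowflake-connector-python`):
# import snowflake.connector
# conn = snowflake.connector.connect(
#     **segment_reverse_etl_connection_params("myorg-myaccount", "my_strong_password"))
# conn.cursor().execute("SELECT CURRENT_ROLE()")
# conn.close()
```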

src/segment-app/iam/roles.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -53,17 +53,17 @@ The following roles are only available to Segment Business Tier accounts.
 #### Tracking Plan Admin
 * Edit access to all Tracking Plans in Protocols.
-* **Scope:** Grants access to *all* Tracking Plans. 
+* **Scope:** Grants access to *all* Tracking Plans.
 
 #### Tracking Plan Read-only
 * Read access to all Tracking Plans in Protocols.
 * **Scope:** Grants access to *all* Tracking Plans.
 
-#### Warehouse Admin
-* Edit access to all warehouses and warehouse settings.
+#### Warehouse Destination Admin
+* Edit access to warehouse destinations and warehouse destination settings. *(For example, Redshift, Postgres, BigQuery)*
 * **Scope:** Grants access to *all* warehouses.
 
-#### Warehouse Read-only
-* Read access to all warehouses and warehouse settings.
+#### Warehouse Destination Read-only
+* Read-only access to warehouse destinations and warehouse destination settings. *(For example, Redshift, Postgres, BigQuery)*
 * **Scope:** Grants access to *all* warehouses.
```