
Commit 970462a

Merge pull request #3906 from segmentio/profiles-sync-public-beta
Profiles Sync Public Beta
2 parents e5c8c0c + 1769014 commit 970462a


5 files changed (+554 -0 lines changed)


src/_data/sidenav/main.yml

Lines changed: 19 additions & 0 deletions
@@ -281,6 +281,25 @@ sections:
 title: Settings
 - path: /profiles/identity-resolution/ecommerce-example
 title: E-Commerce Example
+- section_title: Profiles Sync
+slug: profiles/profiles-sync
+section:
+- path: /profiles/profiles-sync
+title: Setup
+- path: /profiles/profiles-sync/sample-queries
+title: Sample Queries
+- path: /profiles/profiles-sync/tables
+title: Tables & Materialized Views
+- path: /profiles/profile-api
+title: Profile API
+- path: /profiles/profile-api-limits
+title: Profile API Limits
+- path: /profiles/debugger
+title: Profile Debugger
+- path: /profiles/profiles-gdpr
+title: Profiles and GDPR
+- path: /profiles/faqs
+title: Profiles FAQs
 - path: /profiles/profile-api
 title: Profile API
 - path: /profiles/profile-api-limits

src/_sass/components/_markdown.scss

Lines changed: 6 additions & 0 deletions
@@ -245,6 +245,12 @@
   tr {
     border: 1px solid color(gray-300);
   }
+  th > code {
+    color: #696f8c;
+    font-weight: 600;
+    font-size: 10px;
+    background-color: inherit;
+  }
 }
 
 table.settings {

src/profiles/profiles-sync/index.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
---
title: Profiles Sync Setup
beta: true
plan: profiles
---

> info "Profiles Sync Beta"
> Profiles Sync is in beta and Segment is actively working on this feature. Segment's [First-Access and Beta terms](https://segment.com/legal/first-access-beta-preview/) govern this feature. To learn more, reach out to your CSM, AE, or SE.

Profiles Sync connects identity-resolved customer profiles to a data warehouse of your choice.

With a continual flow of synced Profiles, teams can enrich and use these data sets as the basis for new audiences and models. Profiles Sync addresses a number of use cases, with applications for machine learning, identity graph monitoring, and attribution analysis. View [Profiles Sync Sample Queries](/docs/profiles/profiles-sync/sample-queries) for an in-depth guide to Profiles Sync applications.

On this page, you’ll learn how to set up Profiles Sync, enable historical backfill, and adjust settings for warehouses that you’ve connected to Profiles Sync.

## Initial Profiles Sync setup

> info "Identity Resolution Setup"
> To use Profiles Sync, you must first set up [Identity Resolution](/docs/profiles/identity-resolution/).

To set up Profiles Sync, you’ll first create a warehouse, then connect the warehouse within the Segment app.

Before you begin, prepare for setup with these tips:

- To connect your warehouse to Segment, you must have read and write permissions for the warehouse Destination you choose.
- During Step 2, you’ll copy credentials between Segment and your warehouse Destination. To streamline setup, open your Segment workspace in one browser tab and your warehouse account in another.
- Make sure to add any IP addresses that Segment displays to your warehouse Destination’s allowlist.

### Step 1: Create a warehouse

You’ll first choose the Destination warehouse to which Segment will sync Profiles. Profiles Sync supports the Snowflake, Redshift, BigQuery, Azure, and Postgres warehouse Destinations. Your initial setup depends on the warehouse you choose.

The following table shows the supported Profiles Sync warehouse Destinations and the required steps for each. Select a warehouse, view its Segment documentation, then complete the warehouse’s required steps before moving to Step 2 of Profiles Sync setup:

| Warehouse Destination | Required steps |
| ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Snowflake](/docs/connections/storage/catalog/snowflake/#getting-started) | 1. Create a virtual warehouse. <br> 2. Create a database. <br> 3. Create a role for Segment. <br> 4. Create a user for Segment. <br> 5. Test the user and credentials. |
| [Redshift](/docs/connections/storage/catalog/redshift/#getting-started) | 1. Choose an instance. <br> 2. Provision a new Redshift cluster. |
| [BigQuery](/docs/connections/storage/catalog/bigquery/) | 1. Create a project and enable BigQuery. <br> 2. Create a service account for Segment. |
| [Azure](/docs/connections/storage/catalog/azuresqldw/) | 1. Sign up for an Azure subscription. <br> 2. Provision a dedicated SQL pool. |
| [Postgres](/docs/connections/storage/catalog/postgres/) | Follow the steps in the [Postgres getting started](/docs/connections/storage/catalog/postgres/) section. |

Once you’ve finished the required steps for your chosen warehouse, you’re ready to connect your warehouse to Segment. Because you’ll next enter credentials from the warehouse you just created, **leave the warehouse tab open to streamline setup.**

### Step 2: Connect the warehouse and enable Profiles Sync

With your warehouse configured, you can now connect it to Segment.

During this step, you’ll copy credentials from the warehouse you just set up and enter them into the Segment app. The specific credentials you’ll enter depend on the warehouse you chose during Step 1.

Segment may also display IP addresses you’ll need to allowlist in your warehouse. Make sure to copy the IP addresses and enter them into your warehouse account.

Follow these steps to connect your warehouse:

1. In your Segment workspace, navigate to **Profiles > Profiles Sync**.
2. Select **Add warehouse**, choose the warehouse you just set up, then select **Next**.
3. Segment shows an IP address to allowlist. Add it to your warehouse Destination’s allowlist.
4. Segment prompts you to enter specific warehouse credentials. Enter them, then select **Test Connection**.
5. If the connection test succeeds, Segment enables the **Next** button. Select it.
   * If the connection test fails, verify that you’ve correctly entered the warehouse credentials, then try again.
6. Select **Next** on the **Sync schedule** page. Segment displays the Profiles Sync overview page.

At this point, Segment enables live syncs for your account.

#### Using historical backfill

Profiles Sync sends Profiles to your warehouse hourly, beginning after you complete setup. You can also use backfill to sync historical Profiles to your warehouse.

By default, the initial warehouse sync that Segment runs during setup includes identity graph updates, external ID mapping tables, and two months of events table data. Reach out to Segment support if your use case exceeds the scope of this initial backfill.

## Working with synced warehouses

After setup, you can monitor your syncs and maintain your connected warehouse from the Profiles Sync page.

### Monitor Profiles Sync

You can view warehouse sync information in the overview section of the Profiles Sync page. Segment displays the dates and times of the last and next syncs, as well as your sync frequency.

In the Syncs table, you’ll find reports on individual syncs. Segment lists your most recent syncs first. The following table shows the information Segment tracks for each sync:

| Field       | Definition |
| ----------- | ---------- |
| Sync status | - `Success`: all rows synced correctly <br> - `Partial success`: some rows synced correctly <br> - `Failed`: no rows synced correctly |
| Duration    | Length of the sync, in minutes |
| Start time  | The date and time when the sync began |
| Synced rows | The number of rows synced to the warehouse |

Selecting a row from the Syncs table opens a pane with detailed sync information. In this view, you’ll see the sync’s status, duration, and start time. Segment also breaks down the total rows synced into identity graph tables, event type tables, and event tables.

If the sync failed, Segment shows any available error messages in the sync report.

### Settings and maintenance

The **Settings** tab of the Profiles Sync page contains tools that can help you monitor and maintain your synced warehouse.

#### Disable or delete a warehouse

In the **Basic settings** tab, you can disable warehouse syncs or delete your connected warehouse altogether.

To disable syncs, toggle **Sync status** off. Segment retains your warehouse credentials but stops further Profiles syncs. To resume syncs, toggle **Sync status** back on at any time.

To delete your warehouse, toggle **Sync status** off, then select **Delete warehouse**. Segment doesn’t retain credentials for deleted warehouses; to reconnect a deleted warehouse, you must set it up as a new warehouse.

#### Connection settings

In the **Connection settings** tab, you can verify your synced warehouse’s credentials and view IP addresses you’ll need to allowlist so that Segment can successfully sync Profiles.

If you have write access, you can verify that your warehouse is successfully connected to Segment by entering your password and then selecting **Test Connection**.

> info "Changing your synced warehouse"
> If you’d like to change the warehouse connected to Profiles Sync, [reach out to Segment support](https://segment.com/help/contact/).

<!-- Verify that this doesn't need to be changed -->

#### Sync schedule

Segment supports hourly syncs.

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
---
title: Profiles Sync Sample Queries
beta: true
plan: profiles
---

On this page, you’ll find queries that you can run with Profiles Sync to address common use cases.

> info ""
> The examples in this guide are based on a Snowflake installation. If you’re using another warehouse, you may need to adjust the syntax.

## About example schemas

The queries on this page use two example schemas:

- `ps_segment`, a schema where Segment lands data
- `ps_materialize`, a schema with the materializations you produce

These schema names may not match your own.

## Monitor and diagnose identity graphs

These queries let you monitor and diagnose identity graphs, which give you insight into unified customer profiles generated by [identity resolution](/docs/profiles/identity-resolution/).

### Show how many profiles Segment creates and merges per hour

This example queries the `id_graph_updates` table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered each profile change:

```sql
SELECT
  DATE_TRUNC('hour', timestamp) AS hr,
  -- A row whose canonical ID equals its own ID is a profile creation;
  -- otherwise, the profile was merged into another profile
  CASE
    WHEN canonical_segment_id = segment_id
      THEN 'profile creation'
    ELSE 'profile merge'
  END AS profile_event,
  triggering_event_type,
  COUNT(DISTINCT triggering_event_id) AS event_count
FROM ps_segment.id_graph_updates
GROUP BY 1, 2, 3
ORDER BY 1
```

### Isolate profiles that have reached an identifier's configured limit

Segment’s [configurable identifier limits](/docs/profiles/identity-resolution/identity-resolution-settings/) let you set the maximum number of values a profile can hold for identifiers like email. These limits help prevent two separate users from being merged into a single Profile.

The following query lets you view Profiles that have reached the configured limit for the email identifier:

```sql
WITH agg AS (
  SELECT
    canonical_segment_id,
    -- Normalize case and whitespace so variants of one address count once
    COUNT(DISTINCT LOWER(TRIM(external_id_value))) AS value_count,
    LISTAGG(external_id_value, ', ') AS external_id_values
  FROM ps_materialize.external_id_mapping
  WHERE external_id_type = 'email'
  GROUP BY 1
)
SELECT
  canonical_segment_id,
  external_id_values,
  value_count
FROM agg
WHERE value_count >= 5 -- set to your configured limit
```

## Reconstruct a profile's traits

Use the query in this section to trace a merged profile’s trait values back to the profiles and sources that set them.

### Identify the source of a trait value for a profile and its merged child profiles

When a merge occurs, Segment selects a single value for each trait and associates it with the merged profile. Which value Segment selects depends on how you materialize the `profile_traits` table.

However, you can break out a profile to see the trait values that existed before the merge, which lets you identify where a particular trait value originated.

The following example inspects a particular profile, `use_XXX`, and trait, `trait_1`. For the profile and each profile Segment has merged into it, the query returns the most recently observed value of `trait_1` and the ID of the source that sent it:

```sql
SELECT * FROM (
  SELECT
    ids.canonical_segment_id,
    ident.segment_id,
    ident.event_source_id,
    ident.trait_1,
    -- Rank each segment_id's observations from newest to oldest
    ROW_NUMBER() OVER (PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) AS rn
  FROM ps_segment.identifies AS ident
  INNER JOIN ps_materialize.id_graph AS ids
    ON ids.segment_id = ident.segment_id
    AND ids.canonical_segment_id = 'use_XXX'
    AND ident.trait_1 IS NOT NULL
) AS latest
WHERE rn = 1
```

## Measure and model your customer base

The queries in this section return customer-level views of your data, which you can use to measure your customer base or as inputs to models.

### List all customers with their merged profiles, external IDs, or traits

Each of the following snippets returns a full list of your customers, along with:

- The profile IDs merged into each customer:

  ```sql
  SELECT
    canonical_segment_id,
    LISTAGG(segment_id, ', ') AS associated_segment_ids
  FROM ps_materialize.id_graph
  GROUP BY 1
  ```

- The external IDs associated with each customer:

  ```sql
  SELECT
    canonical_segment_id,
    LISTAGG(external_id_value || ' (' || external_id_type || ')', ', ') AS associated_external_ids
  FROM ps_materialize.external_id_mapping
  GROUP BY 1
  ```

- Each customer’s traits:

  ```sql
  SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL
  ```

### Show all pages visited by a user

To get complete user histories, join event tables to the identity graph and aggregate or filter with `id_graph.canonical_segment_id`:

```sql
SELECT
  id_graph.canonical_segment_id,
  pages.*
FROM ps_segment.pages
INNER JOIN ps_materialize.id_graph
  ON id_graph.segment_id = pages.segment_id
WHERE id_graph.canonical_segment_id = 'use_XXX'
```

### Show the complete history of a trait or audience membership associated with a customer

Suppose you want to track when users enter and exit the audience `aud_1`. The following query returns every recorded entrance and exit:

```sql
SELECT
  id_graph.canonical_segment_id,
  identifies.aud_1,
  identifies.timestamp
FROM ps_segment.identifies
INNER JOIN ps_materialize.id_graph
  ON id_graph.segment_id = identifies.segment_id
  AND identifies.aud_1 IS NOT NULL
ORDER BY 1, 3
```

This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.

## Frequently asked questions

#### Can I view Engage Audience membership and Computed Trait values in my warehouse?

Yes. Engage sends Audience membership updates (as booleans) and Computed Trait value updates as traits on Identify calls, which Segment forwards to your data warehouse.

The column name corresponds to the Audience or Trait key shown on the settings page.

Surface these values the same way as any other trait value:

- The Trait’s complete history is in `identifies`.
- The Trait’s current state for each customer is in `profile_traits`.

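For example, here is a minimal sketch that reads the current state of the `aud_1` audience from `profile_traits`, using the example `ps_materialize` schema from this page. It assumes that your `profile_traits` table exposes a `canonical_segment_id` column and a column named after the audience key; check your own materialization’s columns before running it.

```sql
-- Current aud_1 membership for each customer.
-- Assumption: profile_traits has canonical_segment_id and a column named
-- after the audience key (aud_1 here); adjust to match your materialization.
SELECT
  canonical_segment_id,
  aud_1
FROM ps_materialize.profile_traits
WHERE merged_to IS NULL -- same filter this page uses to list customers' traits
```
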
#### What is the relationship between `segment_id` and `canonical_segment_id`? Are they unique?

Identity merges change Segment’s understanding of who performed historical events.

For example, if `profile_b` completed a “Product Purchased” event but Segment later determines that `profile_b` should be merged into `profile_a`, Segment then attributes that “Product Purchased” event to `profile_a`.

With that in mind, here's how to differentiate between `segment_id` and `canonical_segment_id`:

- `segment_id` is a unique identifier representing Segment’s understanding of who performed an action at the time the action happened.
- `canonical_segment_id` is a unique identifier representing Segment’s current understanding of who performed that action.

The mapping between these two identifiers materializes in your `id_graph` table. If a profile has not been merged away, its `segment_id` is equivalent to its `canonical_segment_id`. If a profile has been merged away, `id_graph` maps its `segment_id` to the `canonical_segment_id` of the profile it was ultimately merged into.

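As a quick check of this mapping, the following sketch lists every profile that has been merged away, using the example `ps_materialize` schema from this page. It relies only on the rule described above: a profile that hasn’t been merged away has a `segment_id` equal to its `canonical_segment_id`.

```sql
-- Profiles that Segment has merged into another profile.
-- Unmerged profiles map to themselves in id_graph, so any row where
-- the two IDs differ belongs to a merged-away profile.
SELECT
  segment_id,
  canonical_segment_id
FROM ps_materialize.id_graph
WHERE segment_id <> canonical_segment_id
```
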
Because `id_graph` maps every `segment_id` to its current `canonical_segment_id`, you can retrieve a customer’s complete event history by joining an event table, like `product_purchased`, to `id_graph`.

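A minimal sketch of that join, assuming a `product_purchased` event table in the example `ps_segment` schema that carries a `segment_id` column like the `pages` and `identifies` tables on this page. It counts each customer’s purchases under their current profile, so in the earlier example, the purchase that `profile_b` made counts toward `profile_a`:

```sql
-- Purchases rolled up to each customer's current (canonical) profile.
-- Assumption: ps_segment.product_purchased exists and has a segment_id column.
SELECT
  id_graph.canonical_segment_id,
  COUNT(*) AS purchase_count,
  -- How many original segment_ids contributed purchases to this customer
  COUNT(DISTINCT product_purchased.segment_id) AS contributing_segment_ids
FROM ps_segment.product_purchased
INNER JOIN ps_materialize.id_graph
  ON id_graph.segment_id = product_purchased.segment_id
GROUP BY 1
```
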
For more information, view the [Profiles Sync tables guide](/docs/profiles/profiles-sync/tables/).

#### Should I expect discrepancies between Profile data in Segment (the Profile API or the UI) and Profiles Sync data in my warehouse?

Profiles Sync mimics the materialization performed by Segment Profiles. You should expect a user’s merges, external IDs, and traits to match whether you query them in the warehouse, retrieve them from the Profile API, or view them in the UI.

The following edge cases might drive slight (<0.01%) variation:

- Data processed by Profiles hasn’t yet landed in Profiles Sync.
- If you rebuild `profile_traits` or materialize it non-incrementally, Profiles Sync fully recalculates each user’s traits. As a result, all traits reflect the most recently observed value for fully merged users.

By contrast, Segment Profiles and incrementally built Profiles Sync materializations don’t combine already-computed traits across two merged profiles at the moment of merge. Instead, they keep one profile’s traits across the board.
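To illustrate what a full recalculation means, here is a minimal sketch for a single trait, assuming the `trait_1` column and the example schemas used earlier on this page. For each canonical profile, it picks the most recently observed non-null value across every `segment_id` that maps to that profile. It illustrates the behavior described above; it isn’t the logic Segment uses to build `profile_traits`.

```sql
-- Sketch only: a full (non-incremental) recalculation of trait_1.
-- Every segment_id merged into a canonical profile contributes observations,
-- and the newest non-null value wins.
SELECT
  canonical_segment_id,
  trait_1
FROM (
  SELECT
    ids.canonical_segment_id,
    ident.trait_1,
    ROW_NUMBER() OVER (
      PARTITION BY ids.canonical_segment_id
      ORDER BY ident.timestamp DESC
    ) AS rn
  FROM ps_segment.identifies AS ident
  INNER JOIN ps_materialize.id_graph AS ids
    ON ids.segment_id = ident.segment_id
  WHERE ident.trait_1 IS NOT NULL
) AS latest
WHERE rn = 1
```
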
