import Link from '@docusaurus/Link';
# UID2 Databricks Clean Room Integration Guide

This guide is for advertisers and data providers who want to manage their raw UID2s in a Databricks environment.

[**GWH__MC01 "Amazon Web Services, Google Cloud Platform, or Microsoft Azure." -- which do we use? Or, any and all?**]

[**GWH__MC02 Is it for EUID also? I think not?**]

## Databricks Listing?

[**GWH__MC03 where do Databricks users go to get more information about UID2 integration?**]

## Functionality

The following table summarizes the functionality available with the UID2 Databricks integration.

| Encrypt Raw UID2 to UID2 Token for Sharing | Decrypt UID2 Token to Raw UID2 | Generate UID2 Token from DII | Refresh UID2 Token | Map DII to Raw UID2s |
| :--- | :--- | :--- | :--- | :--- |
| ✅ | ✅ | —* | — | ✅ |

*You cannot use Databricks to generate a UID2 token directly from <Link href="../ref-info/glossary-uid#gl-dii">DII</Link>. However, you can convert DII to a raw UID2, and then encrypt the raw UID2 into a UID2 token.

### Key Benefits

Here are some key benefits of integrating with Databricks for your UID2 processing:

- Native support for managing UID2 workflows within a Databricks data clean room.
- Secure identity interoperability between partner datasets.
- Direct lineage and observability for all UID2-related transformations and joins, for auditing and traceability.
- Streamlined integration between UID2 identifiers and The Trade Desk activation ecosystem.
- Self-service support for marketers and advertisers through Databricks.

## Integration Steps

At a high level, the following are the steps to set up your Databricks integration and process your data:

1. [Create a clean room for UID2 collaboration](#create-clean-room-for-uid2-collaboration).
1. [Send your Databricks sharing identifier to your UID2 contact](#send-sharing-identifier-to-uid2-contact).
1. [Add data to the clean room](#add-data-to-the-clean-room).
1. [Map DII](#map-dii) by running the clean room notebook.

### Create Clean Room for UID2 Collaboration

As a starting point, create a Databricks clean room—a secure environment for you to collaborate with UID2 to process your data.

Follow the steps in [Create clean rooms](https://docs.databricks.com/aws/en/clean-rooms/create-clean-room) in the Databricks documentation. Use the correct sharing identifier based on the [UID2 environment](../getting-started/gs-environments) you want to connect to: see [UID2 Sharing Identifiers](#uid2-sharing-identifiers).

:::important
After you've created a clean room, you cannot change its collaborators. If you have the option to set clean room collaborator aliases—for example, if you're using the Databricks Python SDK to create the clean room—your collaborator alias must be `creator` and the UID2 collaborator alias must be `collaborator`. If you're creating the clean room using the Databricks web UI, the correct collaborator aliases are set for you.
:::

#### UID2 Sharing Identifiers

UID2 sharing identifiers can change. Be sure to check this page for the latest sharing identifiers.

| Environment | UID2 Sharing Identifier |
| :--- | :--- |
| Production | `aws:us-east-2:21149de7-a9e9-4463-b4e0-066f4b033e5d:673872910525611:010d98a6-8cf2-4011-8bf7-ca45940bc329` |

### Send Sharing Identifier to UID2 Contact

Find the sharing identifier for the Unity Catalog metastore that is attached to the Databricks workspace where you'll work with the clean room. Send the sharing identifier to your UID2 contact.

The sharing identifier is a string in this format: `<cloud>:<region>:<uuid>`.
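To avoid a failed clean-room setup, it can help to sanity-check the identifier before sending it. The following is a minimal Python sketch under stated assumptions: the helper name is hypothetical, the identifier value is fictitious, and the accepted cloud prefixes (`aws`, `azure`, `gcp`) are an assumption, not an official list.

```python
import uuid

def check_sharing_identifier(identifier: str) -> dict:
    """Hypothetical helper: sanity-check a sharing identifier of the
    form <cloud>:<region>:<uuid> before sending it to your UID2 contact."""
    cloud, region, metastore_id = identifier.split(":", 2)
    if cloud not in {"aws", "azure", "gcp"}:  # assumed set of cloud prefixes
        raise ValueError(f"unexpected cloud prefix: {cloud}")
    uuid.UUID(metastore_id)  # raises ValueError if the last part is not a UUID
    return {"cloud": cloud, "region": region, "metastore_id": metastore_id}

# Fictitious identifier, for illustration only.
parts = check_sharing_identifier(
    "aws:us-west-2:11111111-2222-3333-4444-555555555555")
print(parts["region"])  # us-west-2
```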

For information on how to find the sharing identifier, see [Request the recipient's sharing identifier](https://docs.databricks.com/aws/en/delta-sharing/create-recipient#step-1-request-the-recipients-sharing-identifier) in the Databricks documentation.

[**GWH__MC04 just noting that I changed the above: just the link copy, not the link itself. You had "Get access in the Databricks-to-Databricks model" but the link in your file went to the above. LMK if I need to change anything.**]

### Add Data to the Clean Room

Add one or more tables or views to the clean room. You can use any names for the schema, tables, and views. Tables and views must follow the schema detailed in [Input Table](#input-table).

### Map DII

Run the `identity_map_v3` clean room [notebook](https://docs.databricks.com/aws/en/notebooks/) to map email addresses, phone numbers, or their respective hashes to raw UID2s.

## Running the Clean Room Notebook

This section provides details to help you use your Databricks clean room to process your DII into raw UID2s, including the following:

- [Notebook Parameters](#notebook-parameters)
- [Input Table](#input-table)
- [DII Format and Normalization](#dii-format-and-normalization)
- [Output Table](#output-table)
- [Output Table Schema](#output-table-schema)

### Notebook Parameters

You can use the `identity_map_v3` notebook to map DII in any table or view that you've added to the `creator` catalog of the clean room.

The notebook has two parameters, `input_schema` and `input_table`. Together, these two parameters identify the table or view in the clean room that contains the DII to be mapped.

For example, to map DII in the clean room table named `creator.default.emails`, set `input_schema` to `default` and `input_table` to `emails`.

| Parameter Name | Description |
| :--- | :--- |
| `input_schema` | The schema containing the table or view. |
| `input_table` | The name of the table or view containing the DII to be mapped. |

### Input Table

The input table or view must have the two columns shown in the following table. The table or view can have additional columns, but the notebook uses only these two.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `INPUT` | string | The DII to map. |
| `INPUT_TYPE` | string | The type of DII to map. Allowed values: `email`, `email_hash`, `phone`, and `phone_hash`. |
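To illustrate the shape of the input data, here is a minimal Python sketch. The rows and email address are fictitious, the helper name is hypothetical, and the hash computation (Base64-encoded SHA-256 of the already normalized email) follows the convention UID2 documents for hashed DII.

```python
import base64
import hashlib

def email_hash(normalized_email: str) -> str:
    """Hypothetical helper: Base64-encoded SHA-256 of an already
    normalized email address, for use with INPUT_TYPE 'email_hash'."""
    digest = hashlib.sha256(normalized_email.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

# Fictitious rows matching the INPUT / INPUT_TYPE columns.
rows = [
    {"INPUT": "user@example.com", "INPUT_TYPE": "email"},
    {"INPUT": email_hash("user@example.com"), "INPUT_TYPE": "email_hash"},
    {"INPUT": "+12345678901", "INPUT_TYPE": "phone"},
]
print(len(rows))  # 3
```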

### DII Format and Normalization

The normalization requirements depend on the type of DII you're processing, as follows:

- **Email address**: The notebook normalizes the data using the UID2 [Email Address Normalization](../getting-started/gs-normalization-encoding#email-address-normalization) rules.
- **Phone number**: You must normalize the phone number before mapping it with the notebook, using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding#phone-number-normalization) rules.
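Since phone numbers must arrive already normalized, a minimal sketch of E.164-style cleanup may help. This is an illustration only: the helper name is hypothetical, the single-country assumption (prepending `1` to 10-digit numbers) is mine, and production code should use country-aware logic instead.

```python
import re

def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Hypothetical sketch of E.164-style normalization: '+' followed by
    digits only, with no spaces, dashes, or parentheses."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:  # assumption: treat 10 digits as a US/CA number
        digits = default_country_code + digits
    return "+" + digits

print(normalize_phone("(234) 567-8901"))  # +12345678901
```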

### Output Table

If the clean room has an output catalog, the mapped DII is written to a table in the output catalog. Output tables are stored for 30 days.

For details, see [Overview of output tables](https://docs.databricks.com/aws/en/clean-rooms/output-tables#overview-of-output-tables) in the Databricks documentation.

### Output Table Schema

The following table provides information about the structure of the output data, including field names and values.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `UID` | string | The value is one of the following:<ul><li>**DII was successfully mapped**: The UID2 associated with the DII.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `PREV_UID` | string | The value is one of the following:<ul><li>**DII was successfully mapped and the current raw UID2 was rotated in the last 90 days**: the previous raw UID2.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `REFRESH_FROM` | timestamp | The value is one of the following:<ul><li>**DII was successfully mapped**: The timestamp (in epoch seconds) indicating when this UID2 should be refreshed.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `UNMAPPED` | string | The value is one of the following:<ul><li>**DII was successfully mapped**: `NULL`.</li><li>**Otherwise**: The reason why the identifier was not mapped: `OPTOUT`, `INVALID IDENTIFIER`, or `INVALID INPUT TYPE`.<br/>For details, see [Values for the UNMAPPED Column](#values-for-the-unmapped-column).</li></ul> |
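Because `REFRESH_FROM` is expressed in epoch seconds, deciding when to re-map a raw UID2 is a simple timestamp comparison. A minimal sketch, with a hypothetical helper name and fictitious timestamps:

```python
import time

def needs_refresh(refresh_from, now=None):
    """Hypothetical helper: True if the raw UID2 should be re-mapped.
    refresh_from is REFRESH_FROM in epoch seconds, or None for NULL."""
    if refresh_from is None:
        return False  # unmapped rows have nothing to refresh
    current = now if now is not None else time.time()
    return current >= refresh_from

print(needs_refresh(1_700_000_000, now=1_800_000_000))  # True
```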

#### Values for the UNMAPPED Column

The following table shows possible values for the `UNMAPPED` column.

| Value | Meaning |
| :--- | :--- |
| `NULL` | The DII was successfully mapped. |
| `OPTOUT` | The user has opted out. |
| `INVALID IDENTIFIER` | The email address or phone number is invalid. |
| `INVALID INPUT TYPE` | The value of `INPUT_TYPE` is invalid. Valid values for `INPUT_TYPE` are: `email`, `email_hash`, `phone`, `phone_hash`. |
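When post-processing the output, the `UNMAPPED` column makes it easy to separate mapped rows from failures and to count failure reasons. A small sketch with fictitious rows:

```python
from collections import Counter

# Fictitious output rows, shaped like the output table schema.
rows = [
    {"UID": "rawuid2_a", "UNMAPPED": None},
    {"UID": None, "UNMAPPED": "OPTOUT"},
    {"UID": None, "UNMAPPED": "INVALID IDENTIFIER"},
    {"UID": None, "UNMAPPED": "OPTOUT"},
]

mapped = [r for r in rows if r["UNMAPPED"] is None]
reasons = Counter(r["UNMAPPED"] for r in rows if r["UNMAPPED"] is not None)
print(len(mapped), dict(reasons))  # 1 {'OPTOUT': 2, 'INVALID IDENTIFIER': 1}
```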

<!--

------------------- BELOW IS A COPY OF SNOWFLAKE DOC HEADINGS ------------------------------

## Testing in the Integ Environment

Define types of <Link href="../ref-info/glossary-uid#gl-dii">DII</Link>?

Does the service normalize? (for email / for phone)

A successful query returns ...?

#### Examples

Mapping request examples in this section:

The input and output data in these examples is fictitious, for illustrative purposes.

The following query illustrates how to map a single email address, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single email:

The following query illustrates how to map multiple email addresses, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for multiple emails:

The following query illustrates how to map a phone number, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
```

Query results for a single phone number:

The following query illustrates how to map multiple phone numbers, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
```

Query results for multiple phone numbers:

The following table identifies each item in the response, including `NULL` values.

The following query illustrates how to map a single email address hash, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single hashed email:

The following query illustrates how to map multiple email address hashes, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for multiple hashed emails:

The following table identifies each item in the response, including `NULL` values.

The following query illustrates how to map a single phone number hash, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single hashed phone number:

The following query illustrates how to map multiple phone number hashes, using the [default database and schema names](#database-and-schema-names).

```sql
```