Commit 34b7a30: doc development
1 parent f08783a commit 34b7a30

3 files changed: +514 −129 lines

docs/guides/integration-databricks.md

Lines changed: 127 additions & 128 deletions
# UID2 Databricks Clean Room Integration Guide

This guide is for advertisers and data providers who want to manage their raw UID2s in a Databricks environment.

[**GWH__MC01 "Amazon Web Services, Google Cloud Platform, or Microsoft Azure." -- which do we use? Or, any and all?**]

[**GWH__MC02 Is it for EUID also? I think not?**]

## Databricks Listing?

[**GWH__MC03 where do Databricks users go to get more information about UID2 integration?**]

## Functionality

The following table summarizes the functionality available with the UID2 Databricks integration.

| Encrypt Raw UID2 to UID2 Token for Sharing | Decrypt UID2 Token to Raw UID2 | Generate UID2 Token from DII | Refresh UID2 Token | Map DII to Raw UID2s |
| :--- | :--- | :--- | :--- | :--- |
| ✅ | ✅ | —* | — | ✅ |

*You cannot use Databricks to generate a UID2 token directly from <Link href="../ref-info/glossary-uid#gl-dii">DII</Link>. However, you can convert DII to a raw UID2, and then encrypt the raw UID2 into a UID2 token.

### Key Benefits

Here are some key benefits of integrating with Databricks for your UID2 processing:

- Native support for managing UID2 workflows within a Databricks data clean room.
- Secure identity interoperability between partner datasets.
- Direct lineage and observability for all UID2-related transformations and joins, for auditing and traceability.
- Streamlined integration between UID2 identifiers and The Trade Desk activation ecosystem.
- Self-service support for marketers and advertisers through Databricks.

## Integration Steps

At a high level, the following are the steps to set up your Databricks integration and process your data:

1. [Create a clean room for UID2 collaboration](#create-clean-room-for-uid2-collaboration).
1. [Send your Databricks sharing identifier to your UID2 contact](#send-sharing-identifier-to-uid2-contact).
1. [Add data to the clean room](#add-data-to-the-clean-room).
1. [Map DII](#map-dii) by running the clean room notebook.

### Create Clean Room for UID2 Collaboration

As a starting point, create a Databricks clean room&#8212;a secure environment for you to collaborate with UID2 to process your data.

Follow the steps in [Create clean rooms](https://docs.databricks.com/aws/en/clean-rooms/create-clean-room) in the Databricks documentation. Use the correct sharing identifier based on the [UID2 environment](../getting-started/gs-environments) you want to connect to: see [UID2 Sharing Identifiers](#uid2-sharing-identifiers).

:::important
After you've created a clean room, you cannot change its collaborators. If you have the option to set clean room collaborator aliases&#8212;for example, if you're using the Databricks Python SDK to create the clean room&#8212;your collaborator alias must be `creator` and the UID2 collaborator alias must be `collaborator`. If you're creating the clean room using the Databricks web UI, the correct collaborator aliases are set for you.
:::

#### UID2 Sharing Identifiers

UID2 sharing identifiers can change. Be sure to check this page for the latest sharing identifiers.

| Environment | UID2 Sharing Identifier |
| :--- | :--- |
| Production | `aws:us-east-2:21149de7-a9e9-4463-b4e0-066f4b033e5d:673872910525611:010d98a6-8cf2-4011-8bf7-ca45940bc329` |
| Integration | `aws:us-east-2:4651b4ea-b29c-42ec-aecb-2377de70bbd4:2366823546528067:c15e03bf-a348-4189-92e5-68b9a7fb4018` |

### Send Sharing Identifier to UID2 Contact

Find the sharing identifier for the Unity Catalog metastore that is attached to the Databricks workspace where you'll work with the clean room. Send the sharing identifier to your UID2 contact.

The sharing identifier is a string in this format: `<cloud>:<region>:<uuid>`.
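Before sending the identifier, you can sanity-check its shape. The following Python sketch is illustrative only: the helper name, the accepted cloud prefixes (`aws`, `azure`, `gcp`), and the identifier shown are assumptions, not part of any UID2 or Databricks tooling.

```python
import re

# Hypothetical helper: check that a value matches the
# <cloud>:<region>:<uuid> metastore sharing identifier shape.
UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
SHARING_ID_RE = re.compile(rf"^(aws|azure|gcp):[a-z0-9-]+:{UUID_RE}$")

def looks_like_sharing_identifier(value: str) -> bool:
    """Return True if value has the documented three-part format."""
    return SHARING_ID_RE.match(value.strip()) is not None

# Made-up example value, for illustration only:
print(looks_like_sharing_identifier(
    "aws:us-east-2:b169dd9d-c27a-4b54-a78a-eb02b8b00b43"))  # True
print(looks_like_sharing_identifier("not-an-identifier"))   # False
```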
For information on how to find the sharing identifier, see [Request the recipient's sharing identifier](https://docs.databricks.com/aws/en/delta-sharing/create-recipient#step-1-request-the-recipients-sharing-identifier) in the Databricks documentation.

[**GWH__MC04 just noting that I changed the above: just the link copy, not the link itself. You had "Get access in the Databricks-to-Databricks model" but the link in your file went to the above. LMK if I need to change anything.**]

### Add Data to the Clean Room

Add one or more tables or views to the clean room. You can use any names for the schema, tables, and views. Tables and views must follow the schema detailed in [Input Table](#input-table).

### Map DII

Run the `identity_map_v3` clean room [notebook](https://docs.databricks.com/aws/en/notebooks/) to map email addresses, phone numbers, or their respective hashes to raw UID2s.
## Running the Clean Room Notebook

This section provides details to help you use your Databricks clean room to process your DII into raw UID2s, including the following:

- [Notebook Parameters](#notebook-parameters)
- [Input Table](#input-table)
- [DII Format and Normalization](#dii-format-and-normalization)
- [Output Table](#output-table)
- [Output Table Schema](#output-table-schema)

### Notebook Parameters

You can use the `identity_map_v3` notebook to map DII in any table or view that you've added to the `creator` catalog of the clean room.

The notebook has two parameters, `input_schema` and `input_table`. Together, these two parameters identify the table or view in the clean room that contains the DII to be mapped.

For example, to map DII in the clean room table named `creator.default.emails`, set `input_schema` to `default` and `input_table` to `emails`.

| Parameter Name | Description |
| :--- | :--- |
| `input_schema` | The schema containing the table or view. |
| `input_table` | The name of the table or view containing the DII to be mapped. |
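The `creator.default.emails` example can be sketched in code. This hypothetical helper (not part of the notebook) splits a three-part Unity Catalog name into the two notebook parameters:

```python
# Hypothetical helper: derive identity_map_v3 parameters from a
# three-part <catalog>.<schema>.<table> name in the clean room.
def notebook_params(full_name: str) -> dict:
    catalog, schema, table = full_name.split(".")
    if catalog != "creator":
        # identity_map_v3 reads from tables added to the creator catalog.
        raise ValueError("expected a table in the creator catalog")
    return {"input_schema": schema, "input_table": table}

params = notebook_params("creator.default.emails")
print(params)  # {'input_schema': 'default', 'input_table': 'emails'}
```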
### Input Table

The input table or view must have the two columns shown in the following table. The table or view can have additional columns, but the notebook doesn't use them.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `INPUT` | string | The DII to map. |
| `INPUT_TYPE` | string | The type of DII to map. Allowed values: `email`, `email_hash`, `phone`, and `phone_hash`. |
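As an illustration of this schema, here is a hedged Python sketch; the validator is hypothetical, not part of the notebook, and the sample values are made up.

```python
# Allowed INPUT_TYPE values, per the input table schema above.
ALLOWED_INPUT_TYPES = {"email", "email_hash", "phone", "phone_hash"}

def validate_input_row(row: dict) -> None:
    """Raise ValueError if a candidate row doesn't fit the input schema."""
    missing = {"INPUT", "INPUT_TYPE"} - row.keys()
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if row["INPUT_TYPE"] not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"invalid INPUT_TYPE: {row['INPUT_TYPE']!r}")

validate_input_row({"INPUT": "user@example.com", "INPUT_TYPE": "email"})  # passes

try:
    validate_input_row({"INPUT": "+12345678901", "INPUT_TYPE": "sms"})
except ValueError as err:
    print(err)  # invalid INPUT_TYPE: 'sms'
```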
### DII Format and Normalization

The normalization requirements depend on the type of DII you're processing, as follows:

- **Email address**: The notebook normalizes the data using the UID2 [Email Address Normalization](../getting-started/gs-normalization-encoding#email-address-normalization) rules.
- **Phone number**: You must normalize the phone number before mapping it with the notebook, using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding#phone-number-normalization) rules.
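As a rough illustration of these conventions, the sketch below assumes the email rules described on the linked page (trim whitespace, lowercase, and, for `gmail.com` addresses, drop dots and any `+suffix` in the local part) and the UID2 convention that a hash is the Base64-encoded SHA-256 of the normalized value. The linked normalization pages are authoritative; this is only a sketch.

```python
import base64
import hashlib

def normalize_email(email: str) -> str:
    """Sketch of UID2 email normalization (see the linked rules)."""
    email = email.strip().lower()
    local, _, domain = email.partition("@")
    if domain == "gmail.com":
        # gmail.com only: drop any +suffix, then remove dots.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"

def email_hash(email: str) -> str:
    """Base64-encoded SHA-256 of the normalized address."""
    digest = hashlib.sha256(normalize_email(email).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

print(normalize_email("  Jane.Doe+promo@GMAIL.com "))  # janedoe@gmail.com
```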
### Output Table

If the clean room has an output catalog, the mapped DII is written to a table in the output catalog. Output tables are stored for 30 days.

For details, see [Overview of output tables](https://docs.databricks.com/aws/en/clean-rooms/output-tables#overview-of-output-tables) in the Databricks documentation.

### Output Table Schema

The following table provides information about the structure of the output data, including field names and values.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `UID` | string | The value is one of the following:<ul><li>**DII was successfully mapped**: The UID2 associated with the DII.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `PREV_UID` | string | The value is one of the following:<ul><li>**DII was successfully mapped and the current raw UID2 was rotated in the last 90 days**: the previous raw UID2.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `REFRESH_FROM` | timestamp | The value is one of the following:<ul><li>**DII was successfully mapped**: The timestamp (in epoch seconds) indicating when this UID2 should be refreshed.</li><li>**Otherwise**: `NULL`.</li></ul> |
| `UNMAPPED` | string | The value is one of the following:<ul><li>**DII was successfully mapped**: `NULL`.</li><li>**Otherwise**: The reason why the identifier was not mapped: `OPTOUT`, `INVALID IDENTIFIER`, or `INVALID INPUT TYPE`.<br/>For details, see [Values for the UNMAPPED Column](#values-for-the-unmapped-column).</li></ul> |
#### Values for the UNMAPPED Column

The following table shows possible values for the `UNMAPPED` column.

| Value | Meaning |
| :--- | :--- |
| `NULL` | The DII was successfully mapped. |
| `OPTOUT` | The user has opted out. |
| `INVALID IDENTIFIER` | The email address or phone number is invalid. |
| `INVALID INPUT TYPE` | The value of `INPUT_TYPE` is invalid. Valid values for `INPUT_TYPE` are: `email`, `email_hash`, `phone`, `phone_hash`. |
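A downstream consumer of the output table might branch on these columns as follows. This is a hypothetical post-processing sketch, not part of the notebook; the function name and return strings are made up.

```python
import time

def classify(row: dict) -> str:
    """Interpret one output-table row using the semantics above."""
    if row["UNMAPPED"] is not None:
        # NULL UNMAPPED means success; anything else is the failure reason.
        return f"not mapped: {row['UNMAPPED']}"
    if row["REFRESH_FROM"] is not None and row["REFRESH_FROM"] <= time.time():
        # REFRESH_FROM is in epoch seconds.
        return "mapped; refresh this UID2 now"
    return "mapped; UID2 current"

print(classify({"UID": None, "PREV_UID": None,
                "REFRESH_FROM": None, "UNMAPPED": "OPTOUT"}))
# not mapped: OPTOUT
```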
<!--
------------------- BELOW IS A COPY OF SNOWFLAKE DOC HEADINGS ------------------------------
## Testing in the Integ Environment
Does the service normalize? (for email / for phone)

A successful query returns ...?

#### Examples

Mapping request examples in this section:

The input and output data in these examples is fictitious, for illustrative purposes only.

The following query illustrates how to map a single email address, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single email:

The following query illustrates how to map multiple email addresses, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for multiple emails:

The following query illustrates how to map a phone number, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
```

Query results for a single phone number:

The following query illustrates how to map multiple phone numbers, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
```

Query results for multiple phone numbers:

The following query illustrates how to map a single email address hash, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single hashed email:

The following query illustrates how to map multiple email address hashes, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for multiple hashed emails:

The following query illustrates how to map a single phone number hash, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for a single hashed phone number:

The following query illustrates how to map multiple phone number hashes, using the [default database and schema names](#database-and-schema-names).

```sql
```

Query results for multiple hashed phone numbers:

Query results:

xxx
-->
