Commit f08783a

databricks guide
1 parent 0ff3f82 commit f08783a

2 files changed

+387
-1
lines changed

Lines changed: 385 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,385 @@
1+
---
2+
title: Databricks Integration
3+
sidebar_label: Databricks
4+
pagination_label: Databricks Integration
5+
description: Information about integrating with UID2 through Databricks.
6+
hide_table_of_contents: false
7+
sidebar_position: 04
8+
displayed_sidebar: docs
9+
---
10+
11+
import Link from '@docusaurus/Link';
12+
13+
# UID2 Databricks Clean Room Integration Guide
14+
15+
Overview general info plus define audience.
16+
17+
## Databricks listing?
18+
19+
xxx
20+
21+
## Functionality
22+
23+
xxx
24+
25+
### Key Benefits
26+
27+
xxx
28+
29+
## Summary of Integration Steps
30+
31+
------------------- MATT GUIDE, BEGIN ------------------------------
32+
33+
34+
## Summary of Integration Steps
35+
36+
At a high level, the following are the steps to set up your Databricks integration and process your data:
37+
38+
1. Create a clean room and invite UID2 as a collaborator.
39+
1. Send your sharing identifier to your UID2 contact.
40+
1. Add data to the clean room.
41+
1. Run the clean room notebook to map directly identifying information (DII).
42+
43+
## Step 1: Create a clean room and invite UID2 as a collaborator
44+
45+
Follow the steps in Create clean rooms in the Databricks documentation. Use the correct sharing identifier from the table below, based on the UID2 Environment you wish to connect to.
46+
UID2 sharing identifiers can change. Be sure to check this page for the latest sharing identifiers.
47+
48+
| Environment | UID2 Sharing Identifier |
49+
| :--- | :--- |
50+
| Production | aws:us-east-2:21149de7-a9e9-4463-b4e0-066f4b033e5d:673872910525611:010d98a6-8cf2-4011-8bf7-ca45940bc329 |
51+
| Integration | aws:us-east-2:4651b4ea-b29c-42ec-aecb-2377de70bbd4:2366823546528067:c15e03bf-a348-4189-92e5-68b9a7fb4018 |
52+
53+
:::note
54+
Once you've created a clean room, you cannot change its collaborators.
55+
56+
If you have the option to set clean room collaborator aliases (for example, if you’re using the Databricks Python SDK [**GWH__MC is this the UID2 Python SDK? Or a Databricks SDK?**] to create the clean room), your collaborator alias must be `creator` and the UID2 collaborator alias must be `collaborator`. If you create the clean room using the Databricks web UI, the correct collaborator aliases are set for you.
57+
:::
58+
59+
## Step 2: Send your sharing identifier to your UID2 contact
60+
61+
Find the sharing identifier for the Unity Catalog metastore that is attached to the Databricks workspace where you’ll work with the clean room. Send the sharing identifier to your UID2 contact.
62+
The sharing identifier is a string in this format: `<cloud>:<region>:<uuid>`.
63+
64+
For information on how to find the sharing identifier, see Get access in the Databricks-to-Databricks model in the Databricks documentation.
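
Before sending the identifier, it can help to confirm that it matches the `<cloud>:<region>:<uuid>` format. The following is a minimal Python sketch of such a check, not part of the UID2 or Databricks tooling. The function name `parse_sharing_identifier` and the sample value are hypothetical, and the only assumption is the three-field format stated above.

```python
import uuid
from typing import NamedTuple


class SharingIdentifier(NamedTuple):
    cloud: str
    region: str
    metastore_uuid: str


def parse_sharing_identifier(value: str) -> SharingIdentifier:
    """Split a `<cloud>:<region>:<uuid>` string and check its shape."""
    parts = value.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <cloud>:<region>:<uuid>, got {value!r}")
    cloud, region, raw_uuid = parts
    # uuid.UUID raises ValueError if the last field cannot be parsed as a UUID.
    uuid.UUID(raw_uuid)
    return SharingIdentifier(cloud, region, raw_uuid)


# Made-up identifier, for illustration only.
example = parse_sharing_identifier("aws:us-west-2:123e4567-e89b-12d3-a456-426614174000")
print(example.cloud, example.region)
```

The check is purely syntactic: it cannot tell whether the identifier belongs to the metastore attached to your workspace.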
65+
66+
## Step 3: Add data to the clean room
67+
68+
Add one or more tables or views to the clean room. You can use any names for the schema, tables, and views. Tables and views must follow the schema described in [Input Table](#uptohere).
69+
70+
## Step 4: Run the clean room notebook to map DII
71+
72+
Run the `identity_map_v3` clean room notebook to map DII to UID2s. Details about this notebook are given in the next section.
73+
## Map DII
74+
The `identity_map_v3` clean room notebook maps DII to UID2s.
75+
### Notebook Parameters
76+
The `identity_map_v3` notebook can be used to map DII in any table or view that has been added to the creator catalog of the clean room.
77+
The notebook has two parameters, `input_schema` and `input_table`. Together they identify the table or view in the clean room that contains the DII to be mapped.
78+
For example, to map DII in the clean room table named `creator.default.emails`, set `input_schema` to `default` and `input_table` to `emails`.
79+
| Parameter Name | Description |
| :--- | :--- |
| `input_schema` | The schema containing the table or view. |
| `input_table` | The name of the table or view containing the DII to be mapped. |
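
As an illustration of how the two parameters relate to a fully qualified table name, here is a small Python sketch. The helper `notebook_parameters` is hypothetical and is not part of the notebook; it assumes only what is stated above, namely that the table lives in the `creator` catalog and that the name has the form `creator.<schema>.<table>`.

```python
def notebook_parameters(full_table_name: str) -> dict[str, str]:
    """Derive input_schema and input_table from creator.<schema>.<table>."""
    parts = full_table_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected creator.<schema>.<table>, got {full_table_name!r}")
    catalog, schema, table = parts
    if catalog != "creator":
        # The notebook maps DII only in tables added to the creator catalog.
        raise ValueError(f"expected the creator catalog, got {catalog!r}")
    return {"input_schema": schema, "input_table": table}


params = notebook_parameters("creator.default.emails")
print(params)
```

This sketch does not handle names that contain dots inside backtick-quoted identifiers.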
85+
### Input Table
86+
The input table or view must have two columns: `INPUT` and `INPUT_TYPE`. The table or view can have additional columns, but the notebook does not use them.
87+
| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `INPUT` | string | The DII to map. |
| `INPUT_TYPE` | string | The type of DII to map. Allowed values: `email`, `email_hash`, `phone`, and `phone_hash`. |
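
The following Python sketch shows one way to shape records into the two required columns before loading them into a table. It is an optional pre-check under stated assumptions, not the notebook's own validation: the helper `to_input_row` and the sample values are hypothetical, and only the column names and allowed `INPUT_TYPE` values come from the table above.

```python
ALLOWED_INPUT_TYPES = {"email", "email_hash", "phone", "phone_hash"}


def to_input_row(dii: str, input_type: str) -> dict[str, str]:
    """Build one input row with the INPUT and INPUT_TYPE columns."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported INPUT_TYPE: {input_type!r}")
    return {"INPUT": dii, "INPUT_TYPE": input_type}


# Fictitious values, for illustration only.
rows = [
    to_input_row("user@example.com", "email"),
    to_input_row("+12345678901", "phone"),
]
print(rows)
```

Rejecting unknown types up front avoids rows that would otherwise come back with `INVALID INPUT TYPE` in the output.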
96+
### DII Format
97+
If the DII is an email address, the notebook normalizes the data using the UID2 Email Address Normalization rules.
98+
If the DII is a phone number, you must normalize it before mapping it with the notebook, using the UID2 Phone Number Normalization rules.
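
Because phone numbers are not normalized by the notebook, a quick shape check before loading can catch obvious problems. The Python sketch below assumes that a normalized phone number follows the general E.164 shape (a plus sign, a country code that does not start with zero, and at most 15 digits in total). That assumption and the helper name `looks_normalized` are not taken from this guide; the UID2 Phone Number Normalization rules remain the authoritative definition.

```python
import re

# Assumed E.164 shape: "+", a non-zero leading digit, then up to 14 more digits.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def looks_normalized(phone: str) -> bool:
    """Return True if the value has the assumed E.164 shape."""
    return E164_PATTERN.fullmatch(phone) is not None


for candidate in ["+12345678901", "1234", "+1 (234) 567-8901"]:
    print(candidate, looks_normalized(candidate))
```

A value that passes this check can still be rejected by the service, for example if the number does not exist.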
99+
### Output Table
100+
If the clean room has an output catalog, the mapped DII will be written to a table in the output catalog. Output tables are stored for 30 days. For more information, see Overview of output tables in the Databricks documentation.
101+
#### Output Table Schema
102+
| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `UID` | string | The value is one of the following:<br/>- DII was successfully mapped: The UID2 associated with the DII.<br/>- Otherwise: `NULL`. |
| `PREV_UID` | string | The value is one of the following:<br/>- DII was successfully mapped and the current raw UID2 was rotated in the last 90 days: The previous raw UID2.<br/>- Otherwise: `NULL`. |
| `REFRESH_FROM` | timestamp | The value is one of the following:<br/>- DII was successfully mapped: The timestamp (in epoch seconds) indicating when this UID2 should be refreshed.<br/>- Otherwise: `NULL`. |
| `UNMAPPED` | string | The value is one of the following:<br/>- DII was successfully mapped: `NULL`.<br/>- Otherwise: The reason why the identifier was not mapped: `OPTOUT`, `INVALID IDENTIFIER`, or `INVALID INPUT TYPE`.<br/>For details, see Values for the UNMAPPED Column. |
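
To illustrate how `REFRESH_FROM` might be used once the output has been read, the following Python sketch compares the value to the current time. It assumes the value has been read as an integer number of epoch seconds, with `NULL` represented as `None`; the helper `needs_refresh` is hypothetical and is not part of the notebook.

```python
import time
from typing import Optional


def needs_refresh(refresh_from: Optional[int], now: Optional[float] = None) -> bool:
    """Return True once the current time has reached REFRESH_FROM."""
    if refresh_from is None:
        # Unmapped rows have no UID2, so there is nothing to refresh.
        return False
    current = time.time() if now is None else now
    return current >= refresh_from


# Fictitious REFRESH_FROM value, checked against a fixed "now" for repeatability.
print(needs_refresh(1735689600, now=1735689600))
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it can be omitted.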
126+
#### Values for the UNMAPPED Column
127+
The following table shows possible values for the UNMAPPED column.
128+
| Value | Meaning |
| :--- | :--- |
| `NULL` | The DII was successfully mapped. |
| `OPTOUT` | The user has opted out. |
| `INVALID IDENTIFIER` | The email address or phone number is invalid. |
| `INVALID INPUT TYPE` | The value of `INPUT_TYPE` is invalid. Valid values for `INPUT_TYPE` are: `email`, `email_hash`, `phone`, `phone_hash`. |
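
As a sketch of how these values could be handled downstream, the Python example below separates mapped rows from unmapped ones and counts the reasons. It assumes the output rows have already been read into dictionaries keyed by column name, with `NULL` represented as `None`. The helper `summarize_output` and the sample rows are hypothetical, and the values are fictitious.

```python
from collections import Counter


def summarize_output(rows):
    """Split output rows into mapped rows and a count of unmapped reasons."""
    mapped = [row for row in rows if row["UNMAPPED"] is None]
    reasons = Counter(row["UNMAPPED"] for row in rows if row["UNMAPPED"] is not None)
    return mapped, reasons


# Fictitious rows, for illustration only.
sample_rows = [
    {"UID": "2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU=", "UNMAPPED": None},
    {"UID": None, "UNMAPPED": "OPTOUT"},
    {"UID": None, "UNMAPPED": "INVALID IDENTIFIER"},
    {"UID": None, "UNMAPPED": "INVALID IDENTIFIER"},
]
mapped, reasons = summarize_output(sample_rows)
print(len(mapped), dict(reasons))
```

Counting reasons separately makes it easier to tell input problems (`INVALID IDENTIFIER`, `INVALID INPUT TYPE`) apart from opt-outs, which are expected.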
138+
139+
140+
141+
142+
143+
144+
145+
------------------- MATT GUIDE, END ------------------------------
146+
147+
148+
------------------- BELOW IS A COPY OF SNOWFLAKE DOC HEADINGS ------------------------------
149+
150+
151+
152+
xxx
153+
154+
## Testing in the Integ Environment
155+
156+
xxx
157+
158+
## Shared Objects/Functions?
159+
160+
xxx
161+
162+
### Database and Schema Names
163+
164+
Query examples?
165+
166+
xxx
167+
168+
### Map DII
169+
170+
Define types of <Link href="../ref-info/glossary-uid#gl-dii">DII</Link>?
171+
172+
Does the service normalize? (for email / for phone)
173+
174+
175+
A successful query returns ...?
176+
177+
178+
179+
#### Examples
180+
181+
Mapping request examples in this section:
182+
183+
- [Single Unhashed Email](#mapping-request-example---single-unhashed-email)
184+
- [Multiple Unhashed Emails](#mapping-request-example---multiple-unhashed-emails)
185+
- [Single Unhashed Phone Number](#mapping-request-example---single-unhashed-phone-number)
186+
- [Multiple Unhashed Phone Numbers](#mapping-request-example---multiple-unhashed-phone-numbers)
187+
- [Single Hashed Email](#mapping-request-example---single-hashed-email)
188+
- [Multiple Hashed Emails](#mapping-request-example---multiple-hashed-emails)
189+
- [Single Hashed Phone Number](#mapping-request-example---single-hashed-phone-number)
190+
- [Multiple Hashed Phone Numbers](#mapping-request-example---multiple-hashed-phone-numbers)
191+
192+
:::note
193+
The input and output data in these examples is fictitious, for illustrative purposes only. The values provided are not real values.
194+
:::
195+
196+
#### Mapping Request Example - Single Unhashed Email
197+
198+
The following query illustrates how to map a single email address, using the [default database and schema names](#database-and-schema-names).
199+
200+
```sql
201+
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3('[email protected]', 'email'));
202+
```
203+
204+
Query results for a single email:
205+
206+
```
207+
+----------------------------------------------+--------------------------------------------------+--------------+----------+
208+
| UID | PREV_UID | REFRESH_FROM | UNMAPPED |
209+
+----------------------------------------------+--------------------------------------------------+--------------+----------+
210+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
211+
+----------------------------------------------+--------------------------------------------------+--------------+----------+
212+
```
213+
214+
#### Mapping Request Example - Multiple Unhashed Emails
215+
216+
The following query illustrates how to map multiple email addresses, using the [default database and schema names](#database-and-schema-names).
217+
218+
```sql
219+
select a.ID, a.EMAIL, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
220+
select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(EMAIL, 'email') t) m
221+
on a.ID=m.ID;
222+
```
223+
224+
Query results for multiple emails:
225+
226+
The following table identifies each item in the response, including `NULL` values for `NULL` or improperly formatted emails.
227+
228+
```
229+
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
230+
| ID | EMAIL | UID | PREV_UID | REFRESH_FROM | UNMAPPED |
231+
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
232+
| 1 | [email protected] | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
233+
| 2 | [email protected] | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL | 1735689600 | NULL |
234+
| 3 | [email protected] | NULL | NULL | NULL | OPTOUT |
235+
| 4 | invalid-email | NULL | NULL | NULL | INVALID IDENTIFIER |
236+
| 5 | NULL | NULL | NULL | NULL | INVALID IDENTIFIER |
237+
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
238+
```
239+
240+
#### Mapping Request Example - Single Unhashed Phone Number
241+
242+
The following query illustrates how to map a phone number, using the [default database and schema names](#database-and-schema-names).
243+
244+
You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.
245+
246+
```sql
247+
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3('+12345678901', 'phone'));
248+
```
249+
250+
Query results for a single phone number:
251+
252+
```
253+
+----------------------------------------------+----------+--------------+----------+
254+
| UID | PREV_UID | REFRESH_FROM | UNMAPPED |
255+
+----------------------------------------------+----------+--------------+----------+
256+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | NULL | 1735689600 | NULL |
257+
+----------------------------------------------+----------+--------------+----------+
258+
```
259+
260+
#### Mapping Request Example - Multiple Unhashed Phone Numbers
261+
262+
The following query illustrates how to map multiple phone numbers, using the [default database and schema names](#database-and-schema-names).
263+
264+
You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.
265+
266+
```sql
267+
select a.ID, a.PHONE, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
268+
select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(PHONE, 'phone') t) m
269+
on a.ID=m.ID;
270+
```
271+
272+
Query results for multiple phone numbers:
273+
274+
The following table identifies each item in the response, including `NULL` values for `NULL` or invalid phone numbers.
275+
276+
```
277+
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
278+
| ID | PHONE | UID | PREV_UID | REFRESH_FROM | UNMAPPED |
279+
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
280+
| 1 | +12345678901 | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
281+
| 2 | +61491570006 | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL | 1735689600 | NULL |
282+
| 3 | +56789123001 | NULL | NULL | NULL | OPTOUT |
283+
| 4 | 1234 | NULL | NULL | NULL | INVALID IDENTIFIER |
284+
| 5 | NULL | NULL | NULL | NULL | INVALID IDENTIFIER |
285+
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
286+
```
287+
288+
#### Mapping Request Example - Single Hashed Email
289+
290+
The following query illustrates how to map a single email address hash, using the [default database and schema names](#database-and-schema-names).
291+
292+
```sql
293+
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(BASE64_ENCODE(SHA2_BINARY('[email protected]', 256)), 'email_hash'));
294+
```
295+
296+
Query results for a single hashed email:
297+
298+
```
299+
+----------------------------------------------+----------------------------------------------+--------------+----------+
300+
| UID | PREV_UID | REFRESH_FROM | UNMAPPED |
301+
+----------------------------------------------+----------------------------------------------+--------------+----------+
302+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
303+
+----------------------------------------------+----------------------------------------------+--------------+----------+
304+
```
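
For readers preparing hashes outside SQL, the following Python sketch mirrors the `BASE64_ENCODE(SHA2_BINARY(..., 256))` expression in the query above: a SHA-256 digest of the input string, Base64-encoded. The helper name `base64_sha256` is hypothetical, and the sketch assumes the input has already been normalized and is encoded as UTF-8.

```python
import base64
import hashlib


def base64_sha256(normalized_value: str) -> str:
    """Return the Base64-encoded SHA-256 digest of an already-normalized value."""
    digest = hashlib.sha256(normalized_value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Fictitious address, for illustration only.
email_hash = base64_sha256("user@example.com")
print(len(email_hash))
```

The result is always 44 characters long, because a 32-byte digest encodes to 44 Base64 characters including padding.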
305+
306+
#### Mapping Request Example - Multiple Hashed Emails
307+
308+
The following query illustrates how to map multiple email address hashes, using the [default database and schema names](#database-and-schema-names).
309+
310+
```sql
311+
select a.ID, a.EMAIL_HASH, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
312+
select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(EMAIL_HASH, 'email_hash') t) m
313+
on a.ID=m.ID;
314+
```
315+
316+
Query results for multiple hashed emails:
317+
318+
The following table identifies each item in the response, including `NULL` values for `NULL` hashes.
319+
320+
```
321+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
322+
| ID | EMAIL_HASH | UID | PREV_UID | REFRESH_FROM | UNMAPPED |
323+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
324+
| 1 | LdhtUlMQ58ZZy5YUqGPRQw5xUMS5dXG5ocJHYJHbAKI= | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
325+
| 2 | /XJSTajB68SCUyuc3ePyxSLNhxrMKvJcjndq8TuwW5g= | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL | 1735689600 | NULL |
326+
| 3 | UebesrNN0bQkm/QR7Jx7eav+UDXN5Gbq3zs1fLBMRy0= | NULL | NULL | NULL | OPTOUT |
327+
| 4 | NULL | NULL | NULL | NULL | INVALID IDENTIFIER |
328+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
329+
```
330+
331+
#### Mapping Request Example - Single Hashed Phone Number
332+
333+
The following query illustrates how to map a single phone number hash, using the [default database and schema names](#database-and-schema-names).
334+
335+
```sql
336+
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(BASE64_ENCODE(SHA2_BINARY('+12345678901', 256)), 'phone_hash'));
337+
```
338+
339+
Query results for a single hashed phone number:
340+
341+
```
342+
+----------------------------------------------+----------------------------------------------+--------------+----------+
343+
| UID | PREV_UID | REFRESH_FROM | UNMAPPED |
344+
+----------------------------------------------+----------------------------------------------+--------------+----------+
345+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
346+
+----------------------------------------------+----------------------------------------------+--------------+----------+
347+
```
348+
349+
#### Mapping Request Example - Multiple Hashed Phone Numbers
350+
351+
The following query illustrates how to map multiple phone number hashes, using the [default database and schema names](#database-and-schema-names).
352+
353+
```sql
354+
select a.ID, a.PHONE_HASH, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
355+
select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(PHONE_HASH, 'phone_hash') t) m
356+
on a.ID=m.ID;
357+
```
358+
359+
Query results for multiple hashed phone numbers:
360+
361+
The following table identifies each item in the response, including `NULL` values for `NULL` hashes.
362+
363+
```
364+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
365+
| ID | PHONE_HASH | UID | PREV_UID | REFRESH_FROM | UNMAPPED |
366+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
367+
| 1 | LdhtUlMQ58ZZy5YUqGPRQw5xUMS5dXG5ocJHYJHbAKI= | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600 | NULL |
368+
| 2 | /XJSTajB68SCUyuc3ePyxSLNhxrMKvJcjndq8TuwW5g= | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL | 1735689600 | NULL |
369+
| 3 | UebesrNN0bQkm/QR7Jx7eav+UDXN5Gbq3zs1fLBMRy0= | NULL | NULL | NULL | OPTOUT |
370+
| 4 | NULL | NULL | NULL | NULL | INVALID IDENTIFIER |
371+
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
372+
```
373+
374+
### Monitor Raw UID2 Refresh and Regenerate Raw UID2s
375+
376+
xxx
377+
378+
#### Targeted Input Table
379+
380+
xxx
381+
382+
383+
Query results:
384+
385+
xxx

sidebars.js

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -234,7 +234,8 @@ const fullSidebar = [
234234
],
235235
},
236236

237-
'guides/integration-aws-entity-resolution',
237+
'guides/integration-databricks',
238+
'guides/integration-aws-entity-resolution',
238239
'guides/integration-advertiser-dataprovider-endpoints',
239240
],
240241
},
