---
title: Databricks Integration
sidebar_label: Databricks
pagination_label: Databricks Integration
description: Information about integrating with UID2 through Databricks.
hide_table_of_contents: false
sidebar_position: 04
displayed_sidebar: docs
---

import Link from '@docusaurus/Link';

# UID2 Databricks Clean Room Integration Guide

Overview general info plus define audience.

## Databricks listing?

xxx

## Functionality

xxx

### Key Benefits

xxx

------------------- MATT GUIDE, BEGIN ------------------------------

## Summary of Integration Steps

At a high level, the following are the steps to set up your Databricks integration and process your data:

1. Create a clean room and invite UID2 as a collaborator.
1. Send your sharing identifier to your UID2 contact.
1. Add data to the clean room.
1. Run the clean room notebook to map directly identifying information (DII).

## Step 1: Create a clean room and invite UID2 as a collaborator

Follow the steps in Create clean rooms in the Databricks documentation. Use the correct sharing identifier from the following table, based on the UID2 environment you want to connect to.

UID2 sharing identifiers can change. Be sure to check this page for the latest sharing identifiers.

| Environment | UID2 Sharing Identifier |
| :--- | :--- |
| Production | `aws:us-east-2:21149de7-a9e9-4463-b4e0-066f4b033e5d:673872910525611:010d98a6-8cf2-4011-8bf7-ca45940bc329` |
| Integration | `aws:us-east-2:4651b4ea-b29c-42ec-aecb-2377de70bbd4:2366823546528067:c15e03bf-a348-4189-92e5-68b9a7fb4018` |

:::note
Once you've created a clean room, you cannot change its collaborators.

If you have the option to set clean room collaborator aliases (for example, if you’re using the Databricks Python SDK to create the clean room), your collaborator alias must be `creator` and the UID2 collaborator alias must be `collaborator`. If you’re creating the clean room using the Databricks web UI, the correct collaborator aliases are set for you.
:::

## Step 2: Send your sharing identifier to your UID2 contact

Find the sharing identifier for the Unity Catalog metastore that is attached to the Databricks workspace where you’ll work with the clean room, and send it to your UID2 contact.

The sharing identifier is a string in this format: `<cloud>:<region>:<uuid>`.

For information on how to find the sharing identifier, see Get access in the Databricks-to-Databricks model in the Databricks documentation.
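
To look up this value yourself, you can call the Databricks SQL `current_metastore()` function from a notebook or the SQL editor in the workspace you'll use with the clean room. It returns the ID of the attached Unity Catalog metastore as a `<cloud>:<region>:<uuid>` string. This is a minimal sketch that assumes a Unity Catalog-enabled workspace; the column alias is arbitrary.

```sql
-- Returns the sharing identifier of the Unity Catalog metastore attached
-- to the current workspace, in the form <cloud>:<region>:<uuid>.
SELECT current_metastore() AS sharing_identifier;
```

Send your UID2 contact the returned string, not the UID2 sharing identifier from Step 1.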

## Step 3: Add data to the clean room

Add one or more tables or views to the clean room. You can use any names for the schema, tables, and views. Tables and views must follow the schema detailed in [Input Table](#input-table).
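
If your DII is stored in columns with other names, one way to meet the required schema is to create a view that renames those columns, and then add the view to the clean room. The following Databricks SQL sketch assumes a hypothetical source table `main.crm.customers` with an `email` column and a `phone_e164` column that already holds phone numbers normalized according to the UID2 rules. All catalog, schema, table, view, and column names are placeholders.

```sql
-- Hypothetical names: main.crm.customers is the source table and
-- main.crm.uid2_input is the view you would add to the clean room.
CREATE OR REPLACE VIEW main.crm.uid2_input AS
SELECT email AS INPUT, 'email' AS INPUT_TYPE
FROM main.crm.customers
WHERE email IS NOT NULL
UNION ALL
-- phone_e164 is assumed to hold phone numbers already normalized per the
-- UID2 rules, because the notebook does not normalize phone numbers.
SELECT phone_e164 AS INPUT, 'phone' AS INPUT_TYPE
FROM main.crm.customers
WHERE phone_e164 IS NOT NULL;
```

The `IS NOT NULL` filters are a design choice, not a requirement. They skip rows that contain no DII to map.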

## Step 4: Run the clean room notebook to map DII

Run the `identity_map_v3` clean room notebook to map DII to UID2s. For details about this notebook, see the next section.

## Map DII

The `identity_map_v3` clean room notebook maps DII to UID2s.

### Notebook Parameters

You can use the `identity_map_v3` notebook to map DII in any table or view that has been added to the `creator` catalog of the clean room.

The notebook has two parameters, `input_schema` and `input_table`. Together, they identify the table or view in the clean room that contains the DII to be mapped.

For example, to map DII in the clean room table named `creator.default.emails`, set `input_schema` to `default` and `input_table` to `emails`.

| Parameter Name | Description |
| :--- | :--- |
| `input_schema` | The schema containing the table or view. |
| `input_table` | The name of the table or view containing the DII to be mapped. |

### Input Table

The input table or view must have two columns: `INPUT` and `INPUT_TYPE`. The table or view can have additional columns, but the notebook doesn't use them.

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `INPUT` | string | The DII to map. |
| `INPUT_TYPE` | string | The type of DII to map. Allowed values: `email`, `email_hash`, `phone`, and `phone_hash`. |
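
Before you run the notebook, you might want to check the input for rows that can't be mapped. The following Databricks SQL sketch counts rows per `INPUT_TYPE`, flags any type that isn't one of the four allowed values, and counts `NULL` inputs. `main.crm.uid2_input` is a hypothetical placeholder for your own table or view.

```sql
-- main.crm.uid2_input is a placeholder for your own input table or view.
-- Rows whose INPUT_TYPE is not one of the four allowed values would come
-- back with UNMAPPED = 'INVALID INPUT TYPE'.
SELECT
  INPUT_TYPE,
  INPUT_TYPE IN ('email', 'email_hash', 'phone', 'phone_hash') AS is_allowed_type,
  COUNT(*) AS row_count,
  count_if(INPUT IS NULL) AS null_inputs
FROM main.crm.uid2_input
GROUP BY INPUT_TYPE
ORDER BY row_count DESC;
```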

### DII Format

If the DII is an email address, the notebook normalizes the data using the UID2 [Email Address Normalization](../getting-started/gs-normalization-encoding.md#email-address-normalization) rules.

If the DII is a phone number, you must normalize it using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules before you map it with the notebook.
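
For the `email_hash` and `phone_hash` input types, you hash the DII yourself. The following Databricks SQL sketch assumes that the notebook expects the encoding UID2 uses for hashed DII, which is a Base64-encoded SHA-256 digest of the normalized value. It also assumes that `normalized_email` and `phone_e164`, hypothetical columns of a hypothetical `main.crm.customers` table, are already normalized according to the UID2 rules. Confirm the expected hash format with your UID2 contact before you rely on it.

```sql
-- sha2(..., 256) returns a hex string; unhex() converts it back to the raw
-- 32-byte digest, and base64() encodes those bytes (not the hex text).
-- Table, view, and column names are hypothetical; inputs must be normalized.
CREATE OR REPLACE VIEW main.crm.uid2_hashed_input AS
SELECT base64(unhex(sha2(normalized_email, 256))) AS INPUT,
       'email_hash' AS INPUT_TYPE
FROM main.crm.customers
WHERE normalized_email IS NOT NULL
UNION ALL
SELECT base64(unhex(sha2(phone_e164, 256))) AS INPUT,
       'phone_hash' AS INPUT_TYPE
FROM main.crm.customers
WHERE phone_e164 IS NOT NULL;
```

The `unhex` step matters. Base64-encoding the 64-character hex string directly would produce a different, 88-character value instead of the 44-character encoding of the 32-byte digest.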

### Output Table

If the clean room has an output catalog, the notebook writes the mapping results to a table in the output catalog. Output tables are stored for 30 days. For more information, see Overview of output tables in the Databricks documentation.

#### Output Table Schema

| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| `UID` | string | The value is one of the following:<ul><li>DII was successfully mapped: The UID2 associated with the DII.</li><li>Otherwise: `NULL`.</li></ul> |
| `PREV_UID` | string | The value is one of the following:<ul><li>DII was successfully mapped and the current raw UID2 was rotated in the last 90 days: The previous raw UID2.</li><li>Otherwise: `NULL`.</li></ul> |
| `REFRESH_FROM` | timestamp | The value is one of the following:<ul><li>DII was successfully mapped: The timestamp (in epoch seconds) indicating when this UID2 should be refreshed.</li><li>Otherwise: `NULL`.</li></ul> |
| `UNMAPPED` | string | The value is one of the following:<ul><li>DII was successfully mapped: `NULL`.</li><li>Otherwise: The reason why the identifier was not mapped: `OPTOUT`, `INVALID IDENTIFIER`, or `INVALID INPUT TYPE`.</li></ul>For details, see Values for the UNMAPPED Column. |
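
Because each mapped row has a `REFRESH_FROM` value, you can use the output table to find UID2s that are due for refresh. The following Databricks SQL sketch uses a hypothetical output table name, `output_catalog.output_schema.identity_map_output`. Replace it with the table that your notebook run created. The sketch also assumes that `REFRESH_FROM` has the `timestamp` data type shown in the schema. If the column holds epoch seconds as a number instead, wrap it in `timestamp_seconds(REFRESH_FROM)` before comparing.

```sql
-- output_catalog.output_schema.identity_map_output is a hypothetical name;
-- use the output table that your notebook run created.
SELECT
  COUNT(*) AS mapped_rows,
  count_if(REFRESH_FROM <= current_timestamp()) AS due_for_refresh,
  count_if(PREV_UID IS NOT NULL) AS rotated_in_last_90_days
FROM output_catalog.output_schema.identity_map_output
-- Only successfully mapped rows have UID and REFRESH_FROM values.
WHERE UNMAPPED IS NULL;
```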

#### Values for the UNMAPPED Column

The following table shows possible values for the `UNMAPPED` column.

| Value | Meaning |
| :--- | :--- |
| `NULL` | The DII was successfully mapped. |
| `OPTOUT` | The user has opted out. |
| `INVALID IDENTIFIER` | The email address or phone number is invalid. |
| `INVALID INPUT TYPE` | The value of `INPUT_TYPE` is invalid. Valid values for `INPUT_TYPE` are: `email`, `email_hash`, `phone`, and `phone_hash`. |
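
To see how many rows fall into each of these outcomes after a notebook run, you can group the output table by `UNMAPPED`. This Databricks SQL sketch uses `output_catalog.output_schema.identity_map_output` as a hypothetical placeholder for your output table. `coalesce` gives the `NULL` group (successfully mapped rows) a readable label.

```sql
-- output_catalog.output_schema.identity_map_output is a hypothetical name.
SELECT
  coalesce(UNMAPPED, 'MAPPED') AS outcome,  -- NULL means the DII was mapped
  COUNT(*) AS row_count
FROM output_catalog.output_schema.identity_map_output
GROUP BY coalesce(UNMAPPED, 'MAPPED')
ORDER BY row_count DESC;
```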

------------------- MATT GUIDE, END ------------------------------

------------------- BELOW IS A COPY OF SNOWFLAKE DOC HEADINGS ------------------------------

xxx

## Testing in the Integ Environment

xxx

## Shared Objects/Functions?

xxx

### Database and Schema Names

Query examples?

xxx

### Map DII

Define types of <Link href="../ref-info/glossary-uid#gl-dii">DII</Link>?

Does the service normalize? (for email / for phone)

A successful query returns ...?

#### Examples

Mapping request examples in this section:

- [Single Unhashed Email](#mapping-request-example---single-unhashed-email)
- [Multiple Unhashed Emails](#mapping-request-example---multiple-unhashed-emails)
- [Single Unhashed Phone Number](#mapping-request-example---single-unhashed-phone-number)
- [Multiple Unhashed Phone Numbers](#mapping-request-example---multiple-unhashed-phone-numbers)
- [Single Hashed Email](#mapping-request-example---single-hashed-email)
- [Multiple Hashed Emails](#mapping-request-example---multiple-hashed-emails)
- [Single Hashed Phone Number](#mapping-request-example---single-hashed-phone-number)
- [Multiple Hashed Phone Numbers](#mapping-request-example---multiple-hashed-phone-numbers)

:::note
The input and output data in these examples is fictitious, for illustrative purposes only. The values provided are not real values.
:::

#### Mapping Request Example - Single Unhashed Email

The following query illustrates how to map a single email address, using the [default database and schema names](#database-and-schema-names).

```sql
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3('[email protected]', 'email'));
```

Query results for a single email:

```
+----------------------------------------------+----------------------------------------------+--------------+----------+
| UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED |
+----------------------------------------------+----------------------------------------------+--------------+----------+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL     |
+----------------------------------------------+----------------------------------------------+--------------+----------+
```

#### Mapping Request Example - Multiple Unhashed Emails

The following query illustrates how to map multiple email addresses, using the [default database and schema names](#database-and-schema-names).

```sql
select a.ID, a.EMAIL, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
  select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(EMAIL, 'email') t) m
  on a.ID=m.ID;
```

Query results for multiple emails:

The following table identifies each item in the response, including `NULL` values for `NULL` or improperly formatted emails.

```
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| ID | EMAIL                | UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED           |
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| 1  | [email protected]    | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL               |
| 2  | [email protected]    | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL                                         | 1735689600   | NULL               |
| 3  | [email protected]    | NULL                                         | NULL                                         | NULL         | OPTOUT             |
| 4  | invalid-email        | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
| 5  | NULL                 | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
+----+----------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
```

#### Mapping Request Example - Single Unhashed Phone Number

The following query illustrates how to map a phone number, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3('+12345678901', 'phone'));
```

Query results for a single phone number:

```
+----------------------------------------------+----------+--------------+----------+
| UID                                          | PREV_UID | REFRESH_FROM | UNMAPPED |
+----------------------------------------------+----------+--------------+----------+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | NULL     | 1735689600   | NULL     |
+----------------------------------------------+----------+--------------+----------+
```

#### Mapping Request Example - Multiple Unhashed Phone Numbers

The following query illustrates how to map multiple phone numbers, using the [default database and schema names](#database-and-schema-names).

You must normalize phone numbers using the UID2 [Phone Number Normalization](../getting-started/gs-normalization-encoding.md#phone-number-normalization) rules.

```sql
select a.ID, a.PHONE, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
  select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(PHONE, 'phone') t) m
  on a.ID=m.ID;
```

Query results for multiple phone numbers:

The following table identifies each item in the response, including `NULL` values for `NULL` or invalid phone numbers.

```
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| ID | PHONE        | UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED           |
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| 1  | +12345678901 | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL               |
| 2  | +61491570006 | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL                                         | 1735689600   | NULL               |
| 3  | +56789123001 | NULL                                         | NULL                                         | NULL         | OPTOUT             |
| 4  | 1234         | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
| 5  | NULL         | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
+----+--------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
```

#### Mapping Request Example - Single Hashed Email

The following query illustrates how to map a single email address hash, using the [default database and schema names](#database-and-schema-names).

```sql
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(BASE64_ENCODE(SHA2_BINARY('[email protected]', 256)), 'email_hash'));
```

Query results for a single hashed email:

```
+----------------------------------------------+----------------------------------------------+--------------+----------+
| UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED |
+----------------------------------------------+----------------------------------------------+--------------+----------+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL     |
+----------------------------------------------+----------------------------------------------+--------------+----------+
```

#### Mapping Request Example - Multiple Hashed Emails

The following query illustrates how to map multiple email address hashes, using the [default database and schema names](#database-and-schema-names).

```sql
select a.ID, a.EMAIL_HASH, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
  select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(EMAIL_HASH, 'email_hash') t) m
  on a.ID=m.ID;
```

Query results for multiple hashed emails:

The following table identifies each item in the response, including `NULL` values for `NULL` hashes.

```
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| ID | EMAIL_HASH                                   | UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED           |
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| 1  | LdhtUlMQ58ZZy5YUqGPRQw5xUMS5dXG5ocJHYJHbAKI= | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL               |
| 2  | /XJSTajB68SCUyuc3ePyxSLNhxrMKvJcjndq8TuwW5g= | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL                                         | 1735689600   | NULL               |
| 3  | UebesrNN0bQkm/QR7Jx7eav+UDXN5Gbq3zs1fLBMRy0= | NULL                                         | NULL                                         | NULL         | OPTOUT             |
| 4  | NULL                                         | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
```

#### Mapping Request Example - Single Hashed Phone Number

The following query illustrates how to map a single phone number hash, using the [default database and schema names](#database-and-schema-names).

```sql
select UID, PREV_UID, REFRESH_FROM, UNMAPPED from table(UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(BASE64_ENCODE(SHA2_BINARY('+12345678901', 256)), 'phone_hash'));
```

Query results for a single hashed phone number:

```
+----------------------------------------------+----------------------------------------------+--------------+----------+
| UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED |
+----------------------------------------------+----------------------------------------------+--------------+----------+
| 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL     |
+----------------------------------------------+----------------------------------------------+--------------+----------+
```

#### Mapping Request Example - Multiple Hashed Phone Numbers

The following query illustrates how to map multiple phone number hashes, using the [default database and schema names](#database-and-schema-names).

```sql
select a.ID, a.PHONE_HASH, m.UID, m.PREV_UID, m.REFRESH_FROM, m.UNMAPPED from AUDIENCE a LEFT JOIN(
  select ID, t.* from AUDIENCE, lateral UID2_PROD_UID_SH.UID.FN_T_IDENTITY_MAP_V3(PHONE_HASH, 'phone_hash') t) m
  on a.ID=m.ID;
```

Query results for multiple hashed phone numbers:

The following table identifies each item in the response, including `NULL` values for `NULL` hashes.

```
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| ID | PHONE_HASH                                   | UID                                          | PREV_UID                                     | REFRESH_FROM | UNMAPPED           |
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
| 1  | LdhtUlMQ58ZZy5YUqGPRQw5xUMS5dXG5ocJHYJHbAKI= | 2ODl112/VS3x2vL+kG1439nPb7XNngLvOWiZGaMhdcU= | vP9zK2mL7fR4tY8qN3wE6xB0dH5jA1sC+nI/oGuMeVa= | 1735689600   | NULL               |
| 2  | /XJSTajB68SCUyuc3ePyxSLNhxrMKvJcjndq8TuwW5g= | IbW4n6LIvtDj/8fCESlU0QG9K/fH63UdcTkJpAG8fIQ= | NULL                                         | 1735689600   | NULL               |
| 3  | UebesrNN0bQkm/QR7Jx7eav+UDXN5Gbq3zs1fLBMRy0= | NULL                                         | NULL                                         | NULL         | OPTOUT             |
| 4  | NULL                                         | NULL                                         | NULL                                         | NULL         | INVALID IDENTIFIER |
+----+----------------------------------------------+----------------------------------------------+----------------------------------------------+--------------+--------------------+
```

### Monitor Raw UID2 Refresh and Regenerate Raw UID2s

xxx

#### Targeted Input Table

xxx

Query results:

xxx