articles/purview/tutorial-custom-types.md
Lines changed: 54 additions & 56 deletions
````diff
@@ -5,7 +5,7 @@ author: adinastoll
 ms.author: adnegrau
 ms.service: purview
 ms.topic: tutorial
-ms.date: 02/02/2023
+ms.date: 03/08/2023
 ---
 
 
````
````diff
@@ -44,7 +44,7 @@ An *asset* is a metadata element that describes a digital or physical resource.
 
 * Data sources such as databases, files, and data feed.
 * Analytical models and processes.
-* Bussiness policies and terms.
+* Business policies and terms.
 * Infrastructure like the server.
 
 Microsoft Purview provides users a flexible *type system* to expand the definition of the asset to include new kinds of resources as they become relevant. Microsoft Purview relies on the [Type System](https://atlas.apache.org/2.0.0/TypeSystem.html) from Apache Atlas. All metadata objects (assets) managed by Microsoft Purview are modeled using type definitions. Understanding the Type System is fundamental to create new custom types in Microsoft Purview.
````
````diff
@@ -138,41 +138,41 @@ Based on the JSON type definition, let's look at some properties:
 
 * **ServiceType** field is useful when browsing assets *by source type* in Microsoft Purview. The *service type* will be an entry point to find all assets that belong to the same *service type* - as defined on their type definition. In the below screenshot of Purview UI, the user limits the result to be the entities specified with *Azure SQL Database* in **serviceType**:
 
-:::image type="content" source="./media/tutorial-custom-types/browse-assets.png" alt-text="Screenshot of the portal showing the path from Data Catalog to Browse to By source type and the asset highlighted.":::
+:::image type="content" source="./media/tutorial-custom-types/browse-assets.png" alt-text="Screenshot of the portal showing the path from Data Catalog to Browse to By source type and the asset highlighted.":::
 
-> [!NOTE]
-> **Azure SQL Database** is defined with the same *serviceType* as **Azure SQL Table**.
+> [!NOTE]
+> **Azure SQL Database** is defined with the same *serviceType* as **Azure SQL Table**.
 
 * **SuperTypes** describes the *"parent"* types you want to "*inherit*" from.
 
 * **schemaElementsAttributes** from **options** influences what appears in the **Schema** tab of your asset in Microsoft Purview.
 
-Below you can see an example of how the **Schema** tab looks like for an asset of type Azure SQL Table:
+Below you can see an example of how the **Schema** tab looks like for an asset of type Azure SQL Table:
 
-:::image type="content" source="./media/tutorial-custom-types/schema-tab.png" alt-text="Screenshot of the schema tab for an Azure SQL Table asset.":::
+:::image type="content" source="./media/tutorial-custom-types/schema-tab.png" alt-text="Screenshot of the schema tab for an Azure SQL Table asset.":::
 
-**relationshipAttributeDefs** are calculated through the relationship type definitions. In our JSON, we can see that **schemaElementsAttributes** points to the relationship attribute called **columns** - which is one of elements from **relationshipAttributeDefs** array, as shown below:
+* **relationshipAttributeDefs** are calculated through the relationship type definitions. In our JSON, we can see that **schemaElementsAttributes** points to the relationship attribute called **columns** - which is one of elements from **relationshipAttributeDefs** array, as shown below:
-Each relationship has its own definition. The name of the definition is found in **relationshipTypeName** attribute. In this case, it's *azure_sql_table_columns*.
+Each relationship has its own definition. The name of the definition is found in **relationshipTypeName** attribute. In this case, it's *azure_sql_table_columns*.
 
-* The **cardinality** of this relationship attribute is set to *SET, which suggests that it holds a list of related assets.
-* The related asset is of type *azure_sql_column*, as visible in the *typeName* attribute.
+* The **cardinality** of this relationship attribute is set to *SET, which suggests that it holds a list of related assets.
+* The related asset is of type *azure_sql_column*, as visible in the *typeName* attribute.
 
-In other words, the *columns* relationship attribute relates the Azure SQL Table to a list of Azure SQL Columns that show up in the Schema tab.
+In other words, the *columns* relationship attribute relates the Azure SQL Table to a list of Azure SQL Columns that show up in the Schema tab.
 
 ## Example of a *relationship Type definition*
 
````
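The hunk above explains that an option on the entity type names the relationship attribute that feeds the **Schema** tab. A minimal Python sketch of that lookup follows. It assumes a trimmed-down, hypothetical excerpt of the `azure_sql_table` definition: the option key is spelled `schemaElementsAttribute` as in the intrinsic-schema paragraph (the diff also uses other spellings), and the `superTypes` value, the `array<...>` wrapper on `typeName` (an Apache Atlas convention for SET attributes), and the `other_relation` entry are assumptions rather than values copied from Purview.

```python
from typing import Any, Optional

# Hypothetical, simplified excerpt of an azure_sql_table entity type definition.
# superTypes, the array<...> wrapper and the "other_relation" entry are assumptions.
AZURE_SQL_TABLE_DEF: dict[str, Any] = {
    "name": "azure_sql_table",
    "superTypes": ["DataSet"],
    "serviceType": "Azure SQL Database",
    "options": {"schemaElementsAttribute": "columns"},
    "relationshipAttributeDefs": [
        {
            "name": "other_relation",  # hypothetical entry for the lookup to skip
            "typeName": "example_type",
            "cardinality": "SINGLE",
            "relationshipTypeName": "example_relationship",
        },
        {
            "name": "columns",
            "typeName": "array<azure_sql_column>",
            "cardinality": "SET",
            "relationshipTypeName": "azure_sql_table_columns",
        },
    ],
}


def schema_relationship(type_def: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the relationship attribute named by options.schemaElementsAttribute."""
    attr_name = type_def.get("options", {}).get("schemaElementsAttribute")
    if attr_name is None:
        return None  # the type declares no intrinsic schema
    for rel in type_def.get("relationshipAttributeDefs", []):
        if rel.get("name") == attr_name:
            return rel
    return None


def element_type(rel: dict[str, Any]) -> str:
    """Unwrap an array<...> typeName to the related entity type name."""
    type_name = rel["typeName"]
    if type_name.startswith("array<") and type_name.endswith(">"):
        return type_name[len("array<"):-1]
    return type_name


rel = schema_relationship(AZURE_SQL_TABLE_DEF)
if rel is not None:
    # prints: azure_sql_table_columns SET azure_sql_column
    print(rel["relationshipTypeName"], rel["cardinality"], element_type(rel))
```

Returning `None` when the option is absent mirrors the non-intrinsic case, where the schema comes from a separate attached-schema relationship instead.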
````diff
@@ -224,30 +224,27 @@ Below you can see a simplified JSON result:
 * **cardinality** is either SINGLE, SET or LIST.
 
 * **isContainer** is a boolean and applies to containment relationship category. When set to true in one end, it indicates that this end is the container of the other end. Therefore:
-* Ony *Composition* or *Aggregation* category relationship can and should have in one end *isContainer* set to true.
-* *Association* category relationship should not have *isContainer* property set to true in any end.
+* Only the *Composition* or *Aggregation* category relationships can and should have in one end *isContainer* set to true.
+* *Association* category relationship shouldn't have *isContainer* property set to true in any end.
 
 * **endDef2** is the second end of the definition and describes, similarly to endDef1, the properties of the second part of the relationship.
 
 ## Schema tab
 
 ### What is **Schema** in Microsoft Purview?
-Schema is an important concept which reflects how data is stored and organized in the data store. It reflects the structure of the data as well as the data restrictions of the elements that construct the structure.
+Schema is an important concept that reflects how data is stored and organized in the data store. It reflects the structure of the data and the data restrictions of the elements that construct the structure.
 
-Elements on the same schema can be classified differently (due to their content). Also, different transformation (lineage) can happen to only a subset of elements. Due to these aspects, Purview allows to model schema and schema elements **as entities**, hence schema is usually a relationship attribute to the data asset entity. Examples of schema elements are: **columns** of a table, **json properties** of json schema, **xml elements** of xml schema etc.
+Elements on the same schema can be classified differently (due to their content). Also, different transformation (lineage) can happen to only a subset of elements. Due to these aspects, Purview can model schema and schema elements **as entities**, hence schema is usually a relationship attribute to the data asset entity. Examples of schema elements are: **columns** of a table, **json properties** of json schema, **xml elements** of xml schema etc.
 
 There are two types of schemas:
-* Intrinsic Schema:
 
-Some systems are intrinsic to schema. For example, when you create a SQL Table, the system will require you to define the columns that construct the table; in this sense, schema of a table is reflected by its columns.
+* **Intrinsic Schema** - Some systems are intrinsic to schema. For example, when you create a SQL Table, the system requires you to define the columns that construct the table; in this sense, schema of a table is reflected by its columns.
 
-For data store with predefined schema, Purview uses the corresponding relationship between the data asset and the schema elements to reflect the schema. This relationship attribute is specified by the keyword **schemaElementsAttribute** in **options** property of the entity type definition.
+For data store with predefined schema, Purview uses the corresponding relationship between the data asset and the schema elements to reflect the schema. This relationship attribute is specified by the keyword **schemaElementsAttribute** in **options** property of the entity type definition.
 
-* Non Intrinsic Schema:
+* **Non Intrinsic Schema** - Some systems don't enforce such schema restrictions, but users can use it to store structural data by applying some schema protocols to the data. For example, Azure Blobs store binary data and do not care about the data in the binary stream. Therefore, it's unaware of any schema, but the user can serialize their data with schema protocols like json before storing it in the blob. In this sense, schema is maintained by some extra protocols and corresponding validation enforced by the user.
 
-Some systems don't enforce such schema restrictions, but users can use it to store structural data by applying some schema protocols to the data. For example, Azure Blobs store binary data and does not care about the data in the binary stream. Therefore, it is unaware of any schema, but the user can serialize their data with schema protocols like json before storing it in the blob. In this sense, schema is maintained by some extra protocols and corresponding validation enforced by the user.
-
-For data store without inherent schema, schema model is independent of this data store. For such cases, Purview defines an interface for schema and a relationship between DataSet and schema, called **dataset_attached_schemas** - this extends any entity type that inherits form DataSet to have an **attachedSchema** relationship attribute to link to their schema representation.
+For data store without inherent schema, schema model is independent of this data store. For such cases, Purview defines an interface for schema and a relationship between DataSet and schema, called **dataset_attached_schemas** - this extends any entity type that inherits form DataSet to have an **attachedSchema** relationship attribute to link to their schema representation.
 
 ### Example of **Schema tab**
 The Azure SQL Table example from above has an intrinsic schema. The information that shows up in the Schema tab of the Azure SQL Table comes from the Azure SQL Column themselves.
````
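The *cardinality* and *isContainer* rules in the hunk above can be checked mechanically before a definition is posted. A minimal Python sketch, assuming relationship definitions are plain dicts with `relationshipCategory`, `endDef1` and `endDef2` keys as in the Atlas JSON, and reading "in one end" as exactly one end. The `custom_parent_to_child` name and the `children`/`parent` end names are hypothetical.

```python
from typing import Any

CATEGORIES = {"ASSOCIATION", "AGGREGATION", "COMPOSITION"}
CARDINALITIES = {"SINGLE", "SET", "LIST"}


def check_relationship_def(rel_def: dict[str, Any]) -> list[str]:
    """Return rule violations for a relationship definition; empty means it passes."""
    problems: list[str] = []
    category = rel_def.get("relationshipCategory")
    if category not in CATEGORIES:
        problems.append(f"unknown relationshipCategory: {category!r}")

    ends = [rel_def.get("endDef1", {}), rel_def.get("endDef2", {})]
    for index, end in enumerate(ends, start=1):
        if end.get("cardinality") not in CARDINALITIES:
            problems.append(f"endDef{index}: cardinality must be SINGLE, SET or LIST")

    # Count how many ends claim to be the container of the other end.
    containers = sum(1 for end in ends if end.get("isContainer", False))
    if category == "ASSOCIATION" and containers:
        problems.append("ASSOCIATION must not set isContainer on any end")
    if category in {"AGGREGATION", "COMPOSITION"} and containers != 1:
        problems.append(f"{category} needs isContainer set to true on exactly one end")
    return problems


# Hypothetical 1:n definition for the parent/child scenario in this tutorial.
parent_child = {
    "name": "custom_parent_to_child",
    "relationshipCategory": "COMPOSITION",
    "endDef1": {"type": "custom_type_parent", "name": "children",
                "cardinality": "SET", "isContainer": True},
    "endDef2": {"type": "custom_type_child", "name": "parent",
                "cardinality": "SINGLE", "isContainer": False},
}

print(check_relationship_def(parent_child))  # []
```

The parent end is the container and holds a SET of children, while each child end holds a SINGLE parent, which is the 1:n shape the scenario describes.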
````diff
@@ -307,7 +304,7 @@ Here's a simplified JSON result:
 
 Azure SQL Table used *schemaElementAttribute* to point to a relationship consisting of a list of Azure SQL Columns. The type definition of a column has *schemaAttributes* defined.
 
-In this way, the Schema tab in the table will display the attribute(s) listed in the *schemaAttributes* of the related assets.
+In this way, the Schema tab in the table displays the attribute(s) listed in the *schemaAttributes* of the related assets.
 
 ## Create custom type definitions
 
````
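The hunk above says the table's **Schema** tab shows the attributes that the related column type lists in *schemaAttributes*. A minimal Python sketch of that projection, assuming (as in Apache Atlas, where type-definition `options` is a string-to-string map) that the list is stored as a JSON-encoded string. The column entities, their attribute values, and the choice to always show `name` are hypothetical.

```python
import json
from typing import Any

# Assumed options on the column type definition; Atlas options values are strings.
COLUMN_TYPE_OPTIONS = {"schemaAttributes": '["data_type"]'}

# Hypothetical column entities reached through the table's "columns" relationship.
COLUMNS: list[dict[str, Any]] = [
    {"typeName": "azure_sql_column",
     "attributes": {"name": "id", "data_type": "int", "description": None}},
    {"typeName": "azure_sql_column",
     "attributes": {"name": "email", "data_type": "nvarchar", "description": "contact"}},
]


def schema_tab_rows(columns: list[dict[str, Any]],
                    column_options: dict[str, str]) -> list[dict[str, Any]]:
    """Project each related column onto the attributes listed in schemaAttributes."""
    shown = json.loads(column_options.get("schemaAttributes", "[]"))
    rows = []
    for column in columns:
        attrs = column["attributes"]
        row = {"name": attrs["name"]}  # assumption: the column name is always shown
        row.update({key: attrs.get(key) for key in shown})
        rows.append(row)
    return rows


for row in schema_tab_rows(COLUMNS, COLUMN_TYPE_OPTIONS):
    print(row)
```

Attributes not listed in *schemaAttributes*, such as `description` here, are left out of the rows.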
````diff
@@ -326,9 +323,9 @@ Now that we gained an understanding of type definitions in general, let us creat
 
 ### Scenario
 
-* In this tutorial, we would like to model a 1:n relationship between two types, called *custom_type_parent* and *custom_type_child*.
+In this tutorial, we would like to model a 1:n relationship between two types, called *custom_type_parent* and *custom_type_child*.
 
-* A *custom_type_child* should reference one parent, whereas a *custom_type_parent* can reference a list of children.
+A *custom_type_child* should reference one parent, whereas a *custom_type_parent* can reference a list of children.
 
 They should be linked together through a 1:n relationship.
 
````
````diff
@@ -343,7 +340,7 @@ They should be linked together through a 1:n relationship.
 POST https://{{ENDPOINT}}.purview.azure.com/catalog/api/atlas/v2/types/typedefs
 ```
 
-with the body:
+With the body:
 
 ```json
 {
````
````diff
@@ -374,7 +371,7 @@ They should be linked together through a 1:n relationship.
 POST https://{{ENDPOINT}}.purview.azure.com/catalog/api/atlas/v2/types/typedefs
 ```
 
-with the body:
+With the body:
 
 ```json
 {
````
````diff
@@ -405,7 +402,7 @@ They should be linked together through a 1:n relationship.
 POST https://{{ENDPOINT}}.purview.azure.com/catalog/api/atlas/v2/types/typedefs
 ```
 
-with the body:
+With the body:
 
 ```json
 {
````
````diff
@@ -440,7 +437,7 @@ They should be linked together through a 1:n relationship.
 POST https://{{ENDPOINT}}.purview.azure.com/catalog/api/atlas/v2/entity
 ```
 
-with the body:
+With the body:
 
 ```json
 
````
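The four request hunks above share one pattern: a `POST` to `types/typedefs` or `entity` under the account's `/catalog/api/atlas/v2/` path, with a JSON body. A minimal Python sketch that only builds such requests with the standard library follows. It assumes you already hold an Azure AD bearer token (acquiring one is not shown). The account name `my-account`, the `<token>` placeholder, and the `superTypes`, `serviceType` and attribute values in both bodies are illustrative assumptions, not the tutorial's exact payloads.

```python
import json
import urllib.request
from typing import Any


def build_post(endpoint: str, path: str, body: dict[str, Any],
               token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Atlas v2 API of a Purview account."""
    url = f"https://{endpoint}.purview.azure.com/catalog/api/atlas/v2/{path}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Illustrative entity type definition; the field values are assumptions.
typedef_body = {
    "entityDefs": [{
        "name": "custom_type_parent",
        "superTypes": ["DataSet"],
        "serviceType": "Sample-Custom-Types",
        "attributeDefs": [],
    }]
}

# Illustrative entity; the qualifiedName follows the pattern shown in the tutorial's NOTE.
entity_body = {
    "entity": {
        "typeName": "custom_type_parent",
        "attributes": {
            "qualifiedName": "custom//custom_type_parent:First_parent_object",
            "name": "First_parent_object",
        },
    }
}

typedef_req = build_post("my-account", "types/typedefs", typedef_body, "<token>")
entity_req = build_post("my-account", "entity", entity_body, "<token>")
print(typedef_req.get_method(), typedef_req.full_url)
# Sending would be urllib.request.urlopen(typedef_req); it is deliberately not called here.
```

Keeping request construction separate from sending makes the URLs and bodies easy to inspect before anything reaches the catalog.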
````diff
@@ -518,48 +515,49 @@ They should be linked together through a 1:n relationship.
 1. Select *By source type*.
 1. Select *Sample-Custom-Types*.
 
-:::image type="content" source="./media/tutorial-custom-types/custom-types-objects.png" alt-text="Screenshot showing the path from the Data Catalog to Browse assets with the filter narrowed to Sample-Custom-Types.":::
+:::image type="content" source="./media/tutorial-custom-types/custom-types-objects.png" alt-text="Screenshot showing the path from the Data Catalog to Browse assets with the filter narrowed to Sample-Custom-Types.":::
 
 1. Select the *First_parent_object*:
 
-:::image type="content" source="./media/tutorial-custom-types/first-parent-object.png" alt-text="Screenshot of the First_parent_object page.":::
+:::image type="content" source="./media/tutorial-custom-types/first-parent-object.png" alt-text="Screenshot of the First_parent_object page.":::
 
 1. Select the *Properties* tab:
 
-:::image type="content" source="./media/tutorial-custom-types/children.png" alt-text="Screenshot of the properties tab with the related assets highlighted, showing one child asset.":::
+:::image type="content" source="./media/tutorial-custom-types/children.png" alt-text="Screenshot of the properties tab with the related assets highlighted, showing one child asset.":::
 
 1. You can see the *First_child_object* being linked there.
 
 1. Select the *First_child_object*:
 
-:::image type="content" source="./media/tutorial-custom-types/first-child-object.png" alt-text="Screenshot of the First_child_object page, showing the overview tab.":::
+:::image type="content" source="./media/tutorial-custom-types/first-child-object.png" alt-text="Screenshot of the First_child_object page, showing the overview tab.":::
 
 1. Select the *Properties* tab:
 
-:::image type="content" source="./media/tutorial-custom-types/parent.png" alt-text="Screenshot of the properties page, showing the related assets with a single parent asset.":::
+:::image type="content" source="./media/tutorial-custom-types/parent.png" alt-text="Screenshot of the properties page, showing the related assets with a single parent asset.":::
 
 1. You can see the Parent object being linked there.
 
 1. Similarly, you can select the *Related* tab and will see the relationship between the two objects:
 
-:::image type="content" source="./media/tutorial-custom-types/relationship.png" alt-text="Screenshot of the Related tab, showing the relationship between the child and parent.":::
+:::image type="content" source="./media/tutorial-custom-types/relationship.png" alt-text="Screenshot of the Related tab, showing the relationship between the child and parent.":::
 
 1. You can create multiple children by initializing a new child asset and inititialzing a relationship
 
->[!NOTE]
->The *qualifiedName* is unique per asset, therefore, the second child should be called differently, such as: *custom//custom_type_child:Second_child_object*
+>[!NOTE]
+>The *qualifiedName* is unique per asset, therefore, the second child should be called differently, such as: *custom//custom_type_child:Second_child_object*
 
-:::image type="content" source="./media/tutorial-custom-types/two_children.png" alt-text="Screenshot of the First_parent_object, showing the two child assets highlighted.":::
+:::image type="content" source="./media/tutorial-custom-types/two_children.png" alt-text="Screenshot of the First_parent_object, showing the two child assets highlighted.":::
 
->[!TIP]
-> If you delete the *First_parent_object* you will notice that the children will also be removed, due to the *COMPOSITION* relationship that we chose in the definition.
+>[!TIP]
+> If you delete the *First_parent_object* you will notice that the children will also be removed, due to the *COMPOSITION* relationship that we chose in the definition.
 
 ## Limitations
 
-There are several known limitations when working with custom types which will be enhanced in future, such as:
+There are several known limitations when working with custom types that will be enhanced in future, such as:
 * Relationship tab looks different compared to built-in types
````
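The NOTE and TIP in the last hunk describe two behaviours: each child needs its own *qualifiedName*, and deleting a parent in a *COMPOSITION* relationship also removes its children. The toy in-memory Python model below mimics just those two rules. It is not how Purview stores or deletes entities. The parent *qualifiedName* and the first child's name are hypothetical, extrapolated from the `custom//custom_type_child:Second_child_object` pattern in the NOTE.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Toy in-memory model of the parent/child COMPOSITION (not Purview's storage).

    It only mimics two behaviours: a qualifiedName can be used once, and
    deleting a parent removes the children it contains.
    """
    parents: dict[str, list[str]] = field(default_factory=dict)  # parent -> children
    children: dict[str, str] = field(default_factory=dict)       # child -> parent

    def _ensure_unused(self, qualified_name: str) -> None:
        if qualified_name in self.parents or qualified_name in self.children:
            raise ValueError(f"qualifiedName already in use: {qualified_name}")

    def add_parent(self, qualified_name: str) -> None:
        self._ensure_unused(qualified_name)
        self.parents[qualified_name] = []

    def add_child(self, qualified_name: str, parent: str) -> None:
        self._ensure_unused(qualified_name)
        if parent not in self.parents:
            raise KeyError(f"unknown parent: {parent}")
        self.children[qualified_name] = parent
        self.parents[parent].append(qualified_name)

    def delete_parent(self, parent: str) -> list[str]:
        """Delete a parent and cascade to its children; return the removed children."""
        removed = self.parents.pop(parent)
        for child in removed:
            del self.children[child]
        return removed


catalog = ToyCatalog()
parent = "custom//custom_type_parent:First_parent_object"  # hypothetical qualifiedName
catalog.add_parent(parent)
catalog.add_child("custom//custom_type_child:First_child_object", parent)
catalog.add_child("custom//custom_type_child:Second_child_object", parent)
print(catalog.delete_parent(parent))  # both child qualifiedNames
print(catalog.children)  # {}
```

Reusing `First_child_object` for the second child would raise `ValueError` in this model, which is the same reason the NOTE gives for naming the second child differently.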