articles/purview/tutorial-custom-types.md
Lines changed: 19 additions & 17 deletions
@@ -5,10 +5,9 @@ author: adinastoll
5
5
ms.author: adnegrau
6
6
ms.service: purview
7
7
ms.topic: how-to
8
-
ms.date: 03/08/2023
8
+
ms.date: 03/14/2023
9
9
---
10
10
11
-
12
11
# Type definitions and how to create custom types
13
12
14
13
This tutorial will explain what type definitions are, how to create custom types, and how to initialize assets of custom types in Microsoft Purview.
@@ -31,15 +30,16 @@ For this tutorial you'll need:
31
30
* Apache Atlas endpoint of your Microsoft Purview account. To get your Apache Atlas endpoint, follow the *Apache Atlas endpoint* section from [here](tutorial-atlas-2-2-apis.md#atlas-endpoint).
32
31
33
32
> [!NOTE]
34
-
> Before moving to the hands-on part of the tutorial, the first four sections will explain what System Type is and how it is used in Microsoft Purview.
33
+
> Before moving to the hands-on part of the tutorial, the first four sections will explain what a System Type is and how it is used in Microsoft Purview.
35
34
> All the REST API calls described further will use the **bearer token** and the **endpoint** which are described in the prerequisites.
36
35
>
37
36
> To skip directly to the steps, use these links:
38
37
>
39
38
> * [Create custom type definitions](#create-definitions)
40
39
> * [Initialize assets of custom types](#initialize-assets-of-custom-types)
41
40
42
-
## What is *asset* and *type* in Microsoft Purview
41
+
## What are *asset* and *type* in Microsoft Purview?
42
+
43
43
An *asset* is a metadata element that describes a digital or physical resource. The digital or physical resources that are expected to be cataloged as assets include:
44
44
45
45
* Data sources such as databases, files, and data feeds.
@@ -51,8 +51,8 @@ Microsoft Purview provides users a flexible *type system* to expand the definiti
51
51
52
52
Essentially, a *Type* can be seen as a *Class* from Object Oriented Programming (OOP):
53
53
54
-
* It defines the properties that represent that type
55
-
* Each type is uniquely identified by its *name*
54
+
* It defines the properties that represent that type.
55
+
* Each type is uniquely identified by its *name*.
56
56
* A *type* can inherit from a *superType*. This is equivalent to inheritance in OOP. A type that extends a superType inherits the attributes of the superType.
57
57
58
58
You can see all type definitions in your Microsoft Purview account by sending a `GET` request to the [All Type Definitions](/rest/api/purview/catalogdataplane/types/get-all-type-definitions) endpoint:
@@ -61,7 +61,7 @@ You can see all type definitions in your Microsoft Purview account by sending a
61
61
GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedefs
62
62
```
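As a rough illustration, the request above could be sent from Python's standard library as sketched here. The function names `typedefs_request` and `fetch_json` are hypothetical helpers, not part of any Purview SDK, and the `endpoint` and `token` arguments stand for the Apache Atlas endpoint host and bearer token from the prerequisites.

```python
import json
import urllib.request


def typedefs_request(endpoint: str, token: str) -> urllib.request.Request:
    """Build the GET request that lists all type definitions.

    `endpoint` is assumed to be the host part of the Apache Atlas endpoint,
    and `token` the bearer token obtained in the prerequisites.
    """
    url = f"https://{endpoint}/catalog/api/atlas/v2/types/typedefs"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_json(typedefs_request(endpoint, token))` with real values would return the parsed response. Building the request separately from sending it keeps the URL and header logic easy to inspect without network access.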
63
63
64
-
Apache Atlas has few pre-defined system types that are commonly used as supertypes.
64
+
Apache Atlas has a few predefined system types that are commonly used as supertypes.
65
65
66
66
For example:
67
67
@@ -73,7 +73,7 @@ For example:
73
73
74
74
* **Lineage**: Lineage information helps one understand the origin of data and the transformations it may have gone through before arriving in a file or table. Lineage is calculated through *DataSet* and *Process*: DataSets (input of process) impact some other DataSets (output of process) through Process.
75
75
76
-
:::image type="content" source="./media/tutorial-custom-types/base-model-diagram.png" alt-text="Diagram showing the relationships between system types." border="false":::
76
+
:::image type="content" source="./media/tutorial-custom-types/base-model-diagram.png" alt-text="Diagram showing the relationships between system types." border="false" lightbox="./media/tutorial-custom-types/base-model-diagram.png":::
77
77
78
78
## Example of a *Type* definition
79
79
@@ -86,7 +86,7 @@ GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedef/name/{name}
86
86
```
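A minimal sketch of how the URL for this request might be assembled in Python, using **azure_sql_table** as the type name as this tutorial does. The helper name `typedef_by_name_url` is hypothetical, and percent-encoding the name is a defensive assumption rather than something the API documentation requires.

```python
from urllib.parse import quote


def typedef_by_name_url(endpoint: str, name: str) -> str:
    """Return the URL that fetches a single type definition by its name.

    The name is percent-encoded so that unusual characters cannot change
    the shape of the path (an assumption made for safety in this sketch).
    """
    encoded_name = quote(name, safe="")
    return f"https://{endpoint}/catalog/api/atlas/v2/types/typedef/name/{encoded_name}"
```

The resulting URL is sent as a `GET` request with the same bearer token as every other call in this tutorial.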
87
87
88
88
>[!TIP]
89
-
> The **{name}** property tells which defintion you are interested in. In this case, you should use **azure_sql_table**.
89
+
> The **{name}** property tells which definition you are interested in. In this case, you should use **azure_sql_table**.
90
90
91
91
Below you can see a simplified JSON result:
92
92
@@ -149,7 +149,7 @@ Based on the JSON type definition, let's look at some properties:
149
149
150
150
Below you can see an example of what the **Schema** tab looks like for an asset of type Azure SQL Table:
151
151
152
-
:::image type="content" source="./media/tutorial-custom-types/schema-tab.png" alt-text="Screenshot of the schema tab for an Azure SQL Table asset.":::
152
+
:::image type="content" source="./media/tutorial-custom-types/schema-tab.png" alt-text="Screenshot of the schema tab for an Azure SQL Table asset." lightbox="./media/tutorial-custom-types/schema-tab.png":::
153
153
154
154
* **relationshipAttributeDefs** are calculated through the relationship type definitions. In our JSON, we can see that **schemaElementsAttribute** points to the relationship attribute called **columns**, which is one of the elements of the **relationshipAttributeDefs** array, as shown below:
155
155
@@ -231,8 +231,9 @@ Below you can see a simplified JSON result:
231
231
232
232
## Schema tab
233
233
234
-
### What is **Schema** in Microsoft Purview?
235
-
Schema is an important concept that reflects how data is stored and organized in the data store. It reflects the structure of the data and the data restrictions of the elements that construct the structure.
234
+
### What is *Schema* in Microsoft Purview?
235
+
236
+
Schema is an important concept that reflects how data is stored and organized in the data store. It reflects the structure of the data and the data restrictions of the elements that construct the structure.
236
237
237
238
Elements on the same schema can be classified differently (due to their content). Also, different transformations (lineage) can happen to only a subset of elements. Due to these aspects, Purview can model schema and schema elements **as entities**, hence schema is usually a relationship attribute to the data asset entity. Examples of schema elements are: **columns** of a table, **JSON properties** of a JSON schema, **XML elements** of an XML schema, and so on.
238
239
@@ -242,20 +243,21 @@ There are two types of schemas:
242
243
243
244
For a data store with a predefined schema, Purview uses the corresponding relationship between the data asset and the schema elements to reflect the schema. This relationship attribute is specified by the keyword **schemaElementsAttribute** in the **options** property of the entity type definition.
244
245
245
-
* **Non Intrinsic Schema** - Some systems don't enforce such schema restrictions, but users can use it to store structural data by applying some schema protocols to the data. For example, Azure Blobs store binary data and do not care about the data in the binary stream. Therefore, it's unaware of any schema, but the user can serialize their data with schema protocols like json before storing it in the blob. In this sense, schema is maintained by some extra protocols and corresponding validation enforced by the user.
246
+
* **Non Intrinsic Schema** - Some systems don't enforce such schema restrictions, but users can use them to store structured data by applying some schema protocols to the data. For example, Azure Blobs store binary data and don't care about the data in the binary stream. Therefore, they're unaware of any schema, but the user can serialize their data with schema protocols like JSON before storing it in the blob. In this sense, schema is maintained by some extra protocols and corresponding validation enforced by the user.
247
+
248
+
For a data store without an inherent schema, the schema model is independent of the data store. For such cases, Purview defines an interface for schema and a relationship between DataSet and schema, called **dataset_attached_schemas**. This extends any entity type that inherits from DataSet to have an **attachedSchema** relationship attribute that links to its schema representation.
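The two cases can be summarized in a small Python sketch that inspects a type definition, already parsed from JSON into a dictionary, and reports which relationship attribute carries its schema. The function name `schema_attribute_name` is hypothetical, and the fallback is a simplification: it assumes the type inherits from DataSet and does not verify that.

```python
def schema_attribute_name(type_def: dict) -> str:
    """Return the relationship attribute that links an asset to its schema.

    Intrinsic schema: the type definition names the attribute through
    options["schemaElementsAttribute"].
    Non-intrinsic schema: fall back to "attachedSchema", which this sketch
    assumes is available because the type inherits from DataSet.
    """
    options = type_def.get("options") or {}
    return options.get("schemaElementsAttribute") or "attachedSchema"
```

For the Azure SQL Table definition discussed earlier, where **schemaElementsAttribute** points to **columns**, this lookup yields `columns`.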
246
249
247
-
For data store without inherent schema, schema model is independent of this data store. For such cases, Purview defines an interface for schema and a relationship between DataSet and schema, called **dataset_attached_schemas** - this extends any entity type that inherits form DataSet to have an **attachedSchema** relationship attribute to link to their schema representation.
250
+
### Example of *Schema tab*
248
251
249
-
### Example of **Schema tab**
250
252
The Azure SQL Table example from above has an intrinsic schema. The information that shows up in the Schema tab of the Azure SQL Table comes from the Azure SQL Columns themselves.
251
253
252
254
Selecting one column item, we would see the following:
253
255
254
-
:::image type="content" source="./media/tutorial-custom-types/azure-sql-column.png" alt-text="Screenshot of the addressID column page with the properties tab open and the data type highlighted.":::
256
+
:::image type="content" source="./media/tutorial-custom-types/azure-sql-column.png" alt-text="Screenshot of the addressID column page with the properties tab open and the data type highlighted." lightbox="./media/tutorial-custom-types/azure-sql-column.png":::
255
257
256
258
The question is, how did Microsoft Purview select the *data_type* property from the column and show it in the Schema tab of the table?
257
259
258
-
:::image type="content" source="./media/tutorial-custom-types/schema-tab-data-type.png" alt-text="Screenshot of the Azure SQL Table page with the schema page open.":::
260
+
:::image type="content" source="./media/tutorial-custom-types/schema-tab-data-type.png" alt-text="Screenshot of the Azure SQL Table page with the schema page open." lightbox="./media/tutorial-custom-types/schema-tab-data-type.png":::
259
261
260
262
You can get the type definition of an Azure SQL Column by making a `GET` request to the [endpoint](/rest/api/purview/catalogdataplane/types/get-type-definition-by-name):